RICE - Reach, Impact, Confidence, Effort - is a reasonable prioritization framework for most software products. You estimate how many users a feature will reach, how much impact it will have, how confident you are in those estimates, and how much work it takes. Divide, rank, ship in order.
AI products break this framework in three specific ways. First, confidence in AI features is structurally lower than in deterministic software - model outputs are probabilistic, accuracy varies across user segments, and what works in the lab often behaves differently in production. Second, AI features have compounding dependencies - a better embedding model enables five downstream features you couldn't build before. Third, the feedback loop between shipping and learning is longer and noisier than in traditional software.
After three years of building AI products across Life Sciences and Healthcare, I've developed a modified framework that accounts for these realities.
What Breaks in Standard RICE
The Confidence component in RICE is typically a gut-check percentage - 80% if you've validated with users, 50% if you haven't, 20% if it's speculative. For AI features, confidence needs to be decomposed into at least three separate dimensions:
- Technical confidence: Can we actually build this to the required accuracy threshold? Do we have the data? Does a similar capability exist in research?
- Product confidence: Will users engage with this feature given AI's current limitations? Will they trust the output enough to act on it?
- Business confidence: Will the accuracy level we can achieve actually create value, or does it need to be better than X% to matter?
Collapsing these into a single confidence number hides the most important risks. A feature can have high technical confidence (we know we can build it) and low business confidence (the accuracy we can achieve isn't good enough to change user behavior).
The AI Prioritization Framework: RICE-E
I've added one dimension to RICE that changes everything: Extractability. This measures how much learning you get from building a feature regardless of whether it succeeds commercially.
The formula: Score = ((Reach x Impact x Confidence) / Effort) x Extractability Multiplier
Extractability is a multiplier from 0.5 to 2.0:
- 2.0 - Building this creates reusable infrastructure, labeled datasets, or model capabilities that enable 3+ future features
- 1.5 - Building this creates significant learnings about user behavior or model performance that will improve future prioritization
- 1.0 - Standard feature, learnings are feature-specific
- 0.5 - One-off capability that creates technical debt or data silos that slow down future work
This multiplier changes the ranking meaningfully. A feature with modest RICE scores but high extractability - say, building your first human feedback collection loop - often should be prioritized over a higher-RICE feature that produces isolated value.
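To make the ranking shift concrete, here's a minimal sketch of RICE-E scoring. The features, numbers, and field names are illustrative, not from a real backlog:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    reach: float           # users reached per quarter
    impact: float          # e.g. 0.25 / 0.5 / 1 / 2 / 3 scale
    confidence: float      # combined confidence, as a fraction 0.0-1.0
    effort: float          # person-months
    extractability: float  # 0.5-2.0 multiplier

def rice_e_score(f: Feature) -> float:
    """Score = ((Reach x Impact x Confidence) / Effort) x Extractability."""
    return (f.reach * f.impact * f.confidence) / f.effort * f.extractability

# Hypothetical backlog: the feedback loop has a lower raw RICE score (350 vs 625)
# but its 2.0 extractability multiplier pushes it to the top.
backlog = [
    Feature("smart summaries", reach=5000, impact=1.0, confidence=0.5,
            effort=4, extractability=1.0),
    Feature("human feedback loop", reach=1500, impact=1.0, confidence=0.7,
            effort=3, extractability=2.0),
]
for f in sorted(backlog, key=rice_e_score, reverse=True):
    print(f"{f.name}: {rice_e_score(f):.0f}")
```

In this sketch the feedback loop ranks first (700 vs 625) even though standard RICE would rank it second, which is exactly the reordering the multiplier is meant to produce.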
Decomposing Confidence for AI
Use this three-part confidence assessment for every AI feature:
- Technical Confidence (T): Based on existing research, data availability, and model capability benchmarks. Score 0-100%.
- Product Confidence (P): Based on user research, comparable AI features in market, and your understanding of user trust in AI outputs in this domain. Score 0-100%.
- Business Confidence (B): Based on analysis of what accuracy threshold creates value - if we need 95% accuracy but can only achieve 80%, business confidence should reflect that gap. Score 0-100%.
Combined confidence = T x P x B, with each expressed as a fraction. This is multiplicative, not additive, because all three need to be true for the feature to succeed.
A clinical AI feature I worked on had T=90% (the ML was well-understood), P=60% (physicians were skeptical of AI recommendations), and B=40% (the accuracy we could achieve wasn't good enough for clinical decision support without human review). Combined confidence: 22%. RICE would have missed this entirely.
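The arithmetic is trivial but worth encoding so the multiplication is applied consistently across the backlog. A minimal sketch, using the clinical example above:

```python
def combined_confidence(technical: float, product: float, business: float) -> float:
    """Multiplicative combination: all three dimensions must hold for the
    feature to succeed, so a weak link drags the whole score down."""
    return technical * product * business

# The clinical decision-support example: T=90%, P=60%, B=40%
c = combined_confidence(0.90, 0.60, 0.40)
print(f"{c:.0%}")  # 22%
```

Note how a single 40% dimension pulls three otherwise-plausible scores down to 22%, which is the signal a single gut-check confidence number would have hidden.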
The Dependency Map: Sequencing for Compounding Value
Standard RICE treats features as independent. AI features often aren't. Before scoring, map your feature backlog into a dependency graph:
- Which features require labeled data that another feature will generate?
- Which features reuse embeddings, fine-tuned models, or retrieval infrastructure built for something else?
- Which features become possible only after you've learned from user behavior on an earlier feature?
Features that enable downstream capabilities should be prioritized earlier than their individual RICE score suggests. You're not just building one feature - you're building the platform that makes future features cheaper and faster.
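One lightweight way to enforce this sequencing is a topological sort over the dependency graph before applying scores. A sketch with a hypothetical backlog (feature names and edges are invented for illustration):

```python
from graphlib import TopologicalSorter

# Map each feature to the set of features it depends on.
deps = {
    "human feedback loop": set(),
    "labeled dataset": {"human feedback loop"},   # generated by the feedback loop
    "fine-tuned ranker": {"labeled dataset"},     # trained on that dataset
    "smart summaries": set(),                     # independent
}

# Any valid ordering respects the dependency edges; ties among
# "ready" features can then be broken by RICE-E score.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

The point isn't the sort itself, it's that enabling features surface before their dependents regardless of their individual scores, so the ranking step only ever breaks ties among features that are actually buildable now.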
Accuracy Thresholds as Acceptance Criteria
Every AI feature on your roadmap should have an explicit accuracy threshold in its definition. Not "we'll improve accuracy" but "this feature ships when precision is above 85% and recall is above 78% on our held-out test set, validated on the customer's own data."
This threshold should feed directly into your RICE confidence score. If your current model is at 70% precision and you need 85%, your technical confidence should reflect the gap - not assume the jump will happen.
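A ship gate like this can be a few lines of code in your evaluation pipeline. A minimal sketch, using the precision/recall thresholds from the example above (metric names and values are illustrative):

```python
def ship_ready(metrics: dict[str, float], thresholds: dict[str, float]) -> bool:
    """True only if every metric clears its acceptance threshold
    on the held-out test set."""
    return all(metrics[name] >= floor for name, floor in thresholds.items())

thresholds = {"precision": 0.85, "recall": 0.78}
current = {"precision": 0.70, "recall": 0.80}

print(ship_ready(current, thresholds))  # False: precision is 15 points short
```

Making the gate executable keeps the threshold from quietly eroding into an aspiration, and the size of the gap (here, 70% vs 85% precision) is exactly what should discount your technical confidence score.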
Quarterly Recalibration Ritual
AI roadmaps go stale faster than traditional software roadmaps. Model capabilities improve, new foundation models change what's possible, user trust shifts as AI becomes more familiar. I run a quarterly recalibration:
- Re-score every backlog item with updated technical confidence based on current model benchmarks
- Check whether accuracy thresholds we set 6 months ago still reflect business requirements
- Re-evaluate dependency maps - what new capabilities have we built that enable previously low-ranked features?
- Retire features that were blocked on model capability if that capability now exists off-the-shelf
Practical Prioritization Checklist
- [ ] Decomposed confidence into Technical, Product, and Business components
- [ ] Assigned Extractability multiplier with written justification
- [ ] Mapped dependencies - what does this enable and what does it require?
- [ ] Defined accuracy threshold as acceptance criteria
- [ ] Verified Effort estimate accounts for data labeling, model evaluation, and monitoring setup - not just engineering time
- [ ] Validated with at least one domain expert that the accuracy threshold is achievable given available data
My take
RICE is a starting point, not a complete system for AI products. The key adaptations: decompose confidence into three components, add an extractability multiplier to capture compounding value, map dependencies before ranking, and treat accuracy thresholds as hard acceptance criteria rather than aspirations. Run quarterly recalibration because the space moves faster than your annual planning cycle.