The MVP concept for software is clean: find the riskiest assumption, build the smallest thing that tests it, learn, iterate. The logic holds because a basic version of the feature is still the feature. A text editor without spell-check is still a text editor.
AI products have a different problem. The model quality is constitutive of the product, not an enhancement. A clinical decision support tool that's right 60% of the time isn't a minimal version of one that's right 90% of the time - it's a tool that might create patient harm. A recommendation engine that's mostly wrong won't teach you about user adoption; it'll teach you that users don't trust wrong recommendations, which you already know.
Defining MVP for AI products requires reconfiguring the framework, not abandoning it.
The Two-Dimension MVP for AI
AI product MVP has two independent dimensions that need separate definition:
Dimension 1: Minimum Viable Accuracy
What is the lowest accuracy level at which the AI creates net positive value for users? Below this threshold, the AI is actively harmful (worse than the baseline experience), and shipping it doesn't test user adoption - it tests user tolerance for bad AI.
Define minimum viable accuracy by mapping the user decision it supports and the consequences of different error types:
- What does the user do when the AI is correct? What does the user do when it's wrong?
- What's the cost of false positives vs false negatives in user terms (not model terms)?
- Is there a human check in the workflow that catches AI errors before they become user-visible problems?
- What was the baseline error rate before the AI? (A 70% accurate AI is better than a 50% baseline; it's worse than a 90% baseline.)
For most enterprise AI use cases, minimum viable accuracy is somewhere between 75% and 90%, depending on stakes and oversight. For high-stakes, low-oversight contexts (autonomous decisions, no human review), it's higher. For low-stakes, high-oversight contexts (suggestions that always go through human review), it can be lower.
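The baseline comparison above can be made concrete with a small expected-value calculation. This is an illustrative sketch, not a standard formula: the value and cost figures are hypothetical placeholders for numbers you'd derive from your own user research, and asymmetric false-positive/false-negative costs would need separate weighting.

```python
# Minimal sketch of a minimum-viable-accuracy check. All numbers
# here are assumptions to be replaced with values from user research.

def net_value_per_case(accuracy: float, value_correct: float,
                       cost_wrong: float) -> float:
    """Expected user value of one AI-assisted decision.

    value_correct: value to the user when the AI is right.
    cost_wrong: cost to the user when the AI is wrong (fold in
    asymmetric FP/FN costs by weighting each error rate separately).
    """
    return accuracy * value_correct - (1 - accuracy) * cost_wrong


def clears_floor(ai_accuracy: float, baseline_accuracy: float,
                 value_correct: float, cost_wrong: float) -> bool:
    """The AI is viable only if it beats the pre-AI baseline."""
    ai = net_value_per_case(ai_accuracy, value_correct, cost_wrong)
    baseline = net_value_per_case(baseline_accuracy, value_correct, cost_wrong)
    return ai > baseline


# A 70%-accurate AI beats a 50% baseline but not a 90% one,
# matching the rule of thumb above.
print(clears_floor(0.70, 0.50, value_correct=1.0, cost_wrong=2.0))  # True
print(clears_floor(0.70, 0.90, value_correct=1.0, cost_wrong=2.0))  # False
```

The point of writing it down is that "accuracy" alone never answers the viability question - the same 70% model clears the floor in one context and fails it in another, depending entirely on the baseline and the error costs.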
Dimension 2: Minimum Viable Scope
This is the traditional MVP dimension applied to AI: what's the smallest slice of the problem that the AI can handle at minimum viable accuracy?
AI products often become viable by scoping down - not reducing quality, but reducing the problem space to one where high quality is achievable with available data and capability.
A clinical documentation AI that covers all specialties at 70% accuracy is below minimum viable accuracy. The same AI covering only cardiology documentation at 88% accuracy is at minimum viable accuracy for a more constrained scope. The MVP is the cardiology-only version - it tests the riskiest assumption (will physicians use AI-assisted documentation?) without creating harm from low accuracy.
The Riskiest Assumption Framework for AI
The classic MVP asks: what's the riskiest assumption? For AI products, the riskiest assumptions are usually:
- Technical feasibility: Can we achieve minimum viable accuracy on this problem with available data?
- User behavior: Will users engage with AI assistance in this workflow? Will they trust and act on AI recommendations?
- Workflow integration: Can the AI feature be embedded in the existing workflow without requiring behavioral changes users won't make?
- Business value: Does the improvement the AI delivers create enough value to justify the cost and complexity?
These assumptions have different MVPs. Testing technical feasibility requires a working model on representative data. Testing user behavior requires a real deployment with real users, not a prototype. Testing workflow integration requires building the integration, not just the model.
Identify which assumption is riskiest for your specific product, and design the MVP to test that one specifically.
Wizard of Oz: The AI MVP That Isn't AI
One underused MVP approach for AI products: fake the AI with humans first.
Build the user-facing experience exactly as designed - the interface, the workflow, the output format. But instead of a model producing the outputs, have humans produce them. Ship that to a small set of users.
This tests the user behavior and workflow integration assumptions without requiring a working model. If users don't engage with AI-assisted outputs even when those outputs are perfect (human-generated), you've learned something important before investing in model development. If users engage enthusiastically, you've de-risked the product side and can invest in the model with more confidence.
Wizard of Oz doesn't test technical feasibility - you'll still need to build the model. But it lets you sequence the riskiest product assumptions ahead of the riskiest technical ones when user behavior is more uncertain than model capability.
The Scope Decision Matrix
When defining MVP scope for AI, use this matrix to find the intersection of "achievable accuracy" and "sufficient value":
| Scope Option | Achievable Accuracy | Value at This Scope | MVP Candidate? |
|---|---|---|---|
| Full problem space | 65% | High (if it works) | No - below accuracy threshold |
| High-volume, standard cases only | 88% | Medium (covers 60% of cases) | Yes |
| Specific document type only | 93% | Low (covers 20% of cases) | Possibly - depends on value of that 20% |
| Single-question answering only | 91% | Medium (useful for common questions) | Yes - clean learning vehicle |
The MVP is the scope where you can achieve minimum viable accuracy and where the value at that scope is sufficient to generate real learning about whether users will adopt and whether the product creates business value.
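The matrix above is just a two-condition filter, which can be sketched as code. The accuracy and value figures are the hypothetical ones from the table; the 0.85 floor is an assumed threshold, not a universal constant.

```python
# Sketch of the scope decision matrix as a filter, using the
# hypothetical figures from the table above.

MIN_VIABLE_ACCURACY = 0.85  # assumed floor for this product

scope_options = [
    {"scope": "Full problem space", "accuracy": 0.65, "value": "high"},
    {"scope": "High-volume, standard cases only", "accuracy": 0.88, "value": "medium"},
    {"scope": "Specific document type only", "accuracy": 0.93, "value": "low"},
    {"scope": "Single-question answering only", "accuracy": 0.91, "value": "medium"},
]


def mvp_candidates(options, floor, sufficient_value=("medium", "high")):
    """Keep only scopes that clear the accuracy floor AND deliver
    enough value to generate real adoption learning."""
    return [o["scope"] for o in options
            if o["accuracy"] >= floor and o["value"] in sufficient_value]


print(mvp_candidates(scope_options, MIN_VIABLE_ACCURACY))
# ['High-volume, standard cases only', 'Single-question answering only']
```

Note that "Specific document type only" fails the filter despite the highest accuracy - high accuracy on a scope too small to generate learning is not an MVP, which is the whole argument of this section.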
What MVP Is Not for AI Products
- Not: a working notebook demo with clean data
- Not: a model that achieves target accuracy on the test set but hasn't been deployed
- Not: a UI prototype without a working model
- Not: a POC that worked on one customer's data but hasn't been tested on another's
An AI MVP is in production with real users generating real interactions that you can learn from. It has monitoring so you can see what's happening. It has a feedback mechanism so users can signal when the AI is wrong. And it achieves minimum viable accuracy on its defined scope.
What matters here
Define minimum viable accuracy before defining scope - it's the floor below which you're creating harm rather than learning. Define scope to find the sub-problem where you can achieve minimum viable accuracy with available data. Use Wizard of Oz to test user behavior and workflow assumptions before model capability if the former is more uncertain. An AI MVP is in production with real users, real monitoring, and real feedback mechanisms - not a demo or a test set result.