The MVP concept for software is clean: find the riskiest assumption, build the smallest thing that tests it, learn, iterate. The logic holds because a basic version of the feature is still the feature. A text editor without spell-check is still a text editor.
AI products have a different problem. The model quality is constitutive of the product, not an enhancement. A clinical decision support tool that's right 60% of the time isn't a minimal version of one that's right 90% of the time - it's a tool that might create patient harm. A recommendation engine that's mostly wrong won't teach you about user adoption; it'll teach you that users don't trust wrong recommendations, which you already know.
Defining MVP for AI products requires reconfiguring the framework, not abandoning it.
The Two-Dimension MVP for AI
AI product MVP has two independent dimensions that need separate definition:
Dimension 1: Minimum Viable Accuracy
What is the lowest accuracy level at which the AI creates net positive value for users? Below this threshold, the AI is actively harmful (worse than the baseline experience), and shipping it doesn't test user adoption - it tests user tolerance for bad AI.
Define minimum viable accuracy by mapping the user decision it supports and the consequences of different error types:
- What does the user do when the AI is correct? What does the user do when it's wrong?
- What's the cost of false positives vs false negatives in user terms (not model terms)?
- Is there a human check in the workflow that catches AI errors before they become user-visible problems?
- What was the baseline error rate before the AI? (A 70% accurate AI is better than a 50% baseline; it's worse than a 90% baseline.)
For most enterprise AI use cases, minimum viable accuracy is somewhere between 75% and 90%, depending on stakes and oversight. For high-stakes, low-oversight contexts (autonomous decisions, no human review), it's higher. For low-stakes, high-oversight contexts (suggestions that always go through human review), it can be lower.
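The baseline comparison above can be made concrete with a small expected-value calculation. This is an illustrative sketch, not a standard formula: the value and cost figures are hypothetical placeholders for numbers you'd derive from your own user research, and asymmetric false-positive/false-negative costs would need separate weighting.

```python
# Minimal sketch of a minimum-viable-accuracy check. All numbers
# here are assumptions to be replaced with values from user research.

def net_value_per_case(accuracy: float, value_correct: float,
                       cost_wrong: float) -> float:
    """Expected user value of one AI-assisted decision.

    value_correct: value to the user when the AI is right.
    cost_wrong: cost to the user when the AI is wrong (fold in
    asymmetric FP/FN costs by weighting each error rate separately).
    """
    return accuracy * value_correct - (1 - accuracy) * cost_wrong


def clears_floor(ai_accuracy: float, baseline_accuracy: float,
                 value_correct: float, cost_wrong: float) -> bool:
    """The AI is viable only if it beats the pre-AI baseline."""
    ai = net_value_per_case(ai_accuracy, value_correct, cost_wrong)
    baseline = net_value_per_case(baseline_accuracy, value_correct, cost_wrong)
    return ai > baseline


# A 70%-accurate AI beats a 50% baseline but not a 90% one,
# matching the rule of thumb above.
print(clears_floor(0.70, 0.50, value_correct=1.0, cost_wrong=2.0))  # True
print(clears_floor(0.70, 0.90, value_correct=1.0, cost_wrong=2.0))  # False
```

The point of writing it down is that "accuracy" alone never answers the viability question - the same 70% model clears the floor in one context and fails it in another, depending entirely on the baseline and the error costs.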
Dimension 2: Minimum Viable Scope
This is the traditional MVP dimension applied to AI: what's the smallest slice of the problem that the AI can handle at minimum viable accuracy?
AI products often become viable by scoping down - not reducing quality, but reducing the problem space to one where high quality is achievable with available data and capability.
A clinical documentation AI that covers all specialties at 70% accuracy is below minimum viable accuracy. The same AI covering only cardiology documentation at 88% accuracy is at minimum viable accuracy for a more constrained scope. The MVP is the cardiology-only version - it tests the riskiest assumption (will physicians use AI-assisted documentation?) without creating harm from low accuracy.
The Riskiest Assumption Framework for AI
The classic MVP asks: what's the riskiest assumption? For AI products, the riskiest assumptions are usually:
- Technical feasibility: Can we achieve minimum viable accuracy on this problem with available data?
- User behavior: Will users engage with AI assistance in this workflow? Will they trust and act on AI recommendations?
- Workflow integration: Can the AI feature be embedded in the existing workflow without requiring behavioral changes users won't make?
- Business value: Does the improvement the AI delivers create enough value to justify the cost and complexity?
These assumptions have different MVPs. Testing technical feasibility requires a working model on representative data. Testing user behavior requires a real deployment with real users, not a prototype. Testing workflow integration requires building the integration, not just the model.
Identify which assumption is riskiest for your specific product, and design the MVP to test that one specifically.
Wizard of Oz: The AI MVP That Isn't AI
One underused MVP approach for AI products: fake the AI with humans first.
Build the user-facing experience exactly as designed - the interface, the workflow, the output format. But instead of a model producing the outputs, have humans produce them. Ship that to a small set of users.
This tests the user behavior and workflow integration assumptions without requiring a working model. If users don't engage with AI-assisted outputs even when those outputs are perfect (human-generated), you've learned something important before investing in model development. If users engage enthusiastically, you've de-risked the product side and can invest in the model with more confidence.
Wizard of Oz doesn't test technical feasibility - you'll still need to build the model. But it lets you sequence the riskiest product assumptions ahead of the riskiest technical ones when user behavior is more uncertain than model capability.
The Scope Decision Matrix
When defining MVP scope for AI, use this matrix to find the intersection of "achievable accuracy" and "sufficient value":
| Scope Option | Achievable Accuracy | Value at This Scope | MVP Candidate? |
|---|---|---|---|
| Full problem space | 65% | High (if it works) | No - below accuracy threshold |
| High-volume, standard cases only | 88% | Medium (covers 60% of cases) | Yes |
| Specific document type only | 93% | Low (covers 20% of cases) | Possibly - depends on value of that 20% |
| Single-question answering only | 91% | Medium (useful for common questions) | Yes - clean learning vehicle |
The MVP is the scope where you can achieve minimum viable accuracy and where the value at that scope is sufficient to generate real learning about whether users will adopt and whether the product creates business value.
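The matrix above is just a two-condition filter, which can be sketched as code. The accuracy and value figures are the hypothetical ones from the table; the 0.85 floor is an assumed threshold, not a universal constant.

```python
# Sketch of the scope decision matrix as a filter, using the
# hypothetical figures from the table above.

MIN_VIABLE_ACCURACY = 0.85  # assumed floor for this product

scope_options = [
    {"scope": "Full problem space", "accuracy": 0.65, "value": "high"},
    {"scope": "High-volume, standard cases only", "accuracy": 0.88, "value": "medium"},
    {"scope": "Specific document type only", "accuracy": 0.93, "value": "low"},
    {"scope": "Single-question answering only", "accuracy": 0.91, "value": "medium"},
]


def mvp_candidates(options, floor, sufficient_value=("medium", "high")):
    """Keep only scopes that clear the accuracy floor AND deliver
    enough value to generate real adoption learning."""
    return [o["scope"] for o in options
            if o["accuracy"] >= floor and o["value"] in sufficient_value]


print(mvp_candidates(scope_options, MIN_VIABLE_ACCURACY))
# ['High-volume, standard cases only', 'Single-question answering only']
```

Note that "Specific document type only" fails the filter despite the highest accuracy - high accuracy on a scope too small to generate learning is not an MVP, which is the whole argument of this section.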
What MVP Is Not for AI Products
- Not: a working notebook demo with clean data
- Not: a model that achieves target accuracy on the test set but hasn't been deployed
- Not: a UI prototype without a working model
- Not: a POC that worked on one customer's data but hasn't been tested on another's
An AI MVP is in production with real users generating real interactions that you can learn from. It has monitoring so you can see what's happening. It has a feedback mechanism so users can signal when the AI is wrong. And it achieves minimum viable accuracy on its defined scope.
What matters here
Define minimum viable accuracy before defining scope - it's the floor below which you're creating harm rather than learning. Define scope to find the sub-problem where you can achieve minimum viable accuracy with available data. Use Wizard of Oz to test user behavior and workflow assumptions before model capability if the former is more uncertain. An AI MVP is in production with real users, real monitoring, and real feedback mechanisms - not a demo or a test set result.