User research for AI products has the same goal as user research for any product: understand what users need, what they do, and what they think, so you can build something that works. Most of the methods transfer; a few differences matter enough to change how you run the research.

I've run user research for AI products in clinical settings, enterprise automation, and consumer contexts. The differences that consistently matter enough to change methodology are: users can't easily articulate preferences for AI outputs they haven't experienced, trust is dynamic and changes with exposure, and prototype testing misses the critical AI-specific interaction patterns.

What Changes in AI User Research

Users Can't Tell You What They Want From AI They Haven't Seen

Standard discovery asks: what do you need? Users can tell you what they need in their current workflows. They can't reliably tell you what they want from an AI assistant in that same workflow - because they haven't experienced it, and their mental model of what AI can and can't do is often inaccurate.

Ask users what problems they have, not what AI solutions they want. "I spend three hours per week synthesizing reports from five different systems" is a research finding you can design for. "I'd like an AI that summarizes my reports" is a solution specification from a user who doesn't know what's possible - often leading you toward the obvious solution rather than the right one.

Trust Is Dynamic, Not a Fixed Attribute

Standard research often treats attitudes as stable: users either trust AI or they don't, users either want AI assistance or they don't. Trust in AI is dynamic - it changes based on specific interactions, especially negative ones. A user who has a bad experience with an AI recommendation often becomes more skeptical of all recommendations for a period, regardless of quality.

This means trust questions in surveys are snapshot measurements, not stable trait measurements. When you see high trust scores in your research, verify whether those scores persist after exposure to realistic AI error rates - not just the model's best performance.
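
One practical way to do this is to assemble the stimulus set for a research session so its error rate matches what you measure in production, not a best-case demo. A minimal sketch in Python, assuming you have pools of known-correct and known-incorrect outputs and a measured production error rate (all names here are illustrative):

```python
import random

def build_stimulus_set(correct_outputs, incorrect_outputs, n_items, error_rate, seed=0):
    """Sample a session stimulus set whose error rate matches production,
    rather than showing only the model's best outputs."""
    rng = random.Random(seed)
    n_errors = round(n_items * error_rate)
    items = (rng.sample(incorrect_outputs, n_errors)
             + rng.sample(correct_outputs, n_items - n_errors))
    rng.shuffle(items)
    return items

# Example: a 20-item session at a measured 15% production error rate.
correct = [f"good_output_{i}" for i in range(50)]
incorrect = [f"bad_output_{i}" for i in range(20)]
session_items = build_stimulus_set(correct, incorrect, n_items=20, error_rate=0.15)
```

Trust measured against this kind of set tells you how attitudes hold up under realistic conditions, which is the number that predicts behavior after launch.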

Prototype Testing Misses the Core AI Interaction

A static mockup can show what an AI recommendation looks like. It can't show how users respond when the recommendation is wrong, when the confidence level is low, or when the AI contradicts what the user was expecting. These are the interactions that determine whether users adopt or abandon the AI feature.

AI user research needs to include live AI outputs - which requires a working (or Wizard of Oz) model, not just a prototype.
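
A Wizard of Oz setup can stand in for a working model early on: a facilitator plays the model behind the same interface the participant uses, which lets you stage latency, low-confidence answers, and errors before the real model exists. A minimal sketch, assuming a single generate(prompt) interface (the class and method names are hypothetical):

```python
class WizardOfOzModel:
    """Stand-in for a live model: a facilitator supplies the 'AI' responses.
    The participant-facing UI calls this exactly as it would call a real
    model, so error and low-confidence interactions can be staged."""

    def __init__(self, canned_responses=None):
        # Pre-scripted responses keyed by prompt keep the wizard
        # consistent across participants.
        self.canned = canned_responses or {}

    def generate(self, prompt: str) -> str:
        if prompt in self.canned:
            return self.canned[prompt]
        # Otherwise the facilitator sees the input and improvises live.
        return input(f"[wizard] participant asked: {prompt!r}\n[wizard] respond: ")

model = WizardOfOzModel({"summarize Q3 report": "Q3 revenue grew 4%, driven by..."})
```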

Adapted Research Methods

Contextual Inquiry First

Watch users do their actual work before any AI involvement. Document:

  • Where they currently make decisions (explicit choice points)
  • Where they express uncertainty or seek additional information
  • What information sources they trust and why
  • What errors in their current workflow look like and how they handle them
  • Where they feel they're wasting time or operating below their capability

This gives you the baseline you'll compare AI performance against, the decision points where AI could add value, and the error handling mental models that will shape how users respond to AI errors.
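
A lightweight coding schema makes these observations comparable across sessions. A sketch of one possible structure - the field names and kind values are illustrative, mapping to the five items above:

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    """One coded observation from a contextual-inquiry session."""
    participant: str
    timestamp: str   # offset into the session recording
    kind: str        # "decision_point" | "uncertainty" | "trusted_source"
                     # | "error_handling" | "wasted_time"
    note: str
    sources: list[str] = field(default_factory=list)

session_log = [
    Observation("P3", "00:12:40", "decision_point",
                "Chose which of five system reports to reconcile first"),
    Observation("P3", "00:19:05", "uncertainty",
                "Re-checked totals against billing before signing off",
                sources=["billing system"]),
]

# Decision points aggregated across sessions are the candidate sites
# where AI could add value - the thing to probe in later sessions.
decision_points = [o for o in session_log if o.kind == "decision_point"]
```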

AI-Assisted Think-Aloud Sessions

The standard think-aloud protocol (ask users to verbalize their thinking while performing tasks) adapts well to AI products, but it requires live AI outputs. Run users through their actual workflow with the AI feature enabled and ask them to verbalize:

  • What they notice about the AI output
  • Whether it matches their expectation
  • What they would do with the recommendation
  • What would make them trust it more or less

Include deliberate errors in the session: show users cases where the AI is wrong. This is the most valuable part of the research - user behavior after an AI error is a better predictor of adoption or abandonment than behavior when the AI is correct.
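
To keep error exposure consistent across participants, you can script which tasks show a wrong output rather than waiting for the live model to fail. A sketch of one way to do that, wrapping any live model behind the same interface (all names here are hypothetical):

```python
class ErrorInjectingSession:
    """Wraps a live model for think-aloud sessions, substituting scripted
    wrong outputs at predetermined task indices so every participant sees
    the same errors at the same points in the workflow."""

    def __init__(self, live_model, scripted_errors):
        self.live_model = live_model            # anything with .generate(prompt)
        self.scripted_errors = scripted_errors  # {task_index: wrong_output}
        self.task_index = 0

    def generate(self, prompt: str) -> str:
        if self.task_index in self.scripted_errors:
            output = self.scripted_errors[self.task_index]
        else:
            output = self.live_model.generate(prompt)
        self.task_index += 1
        return output

class StubModel:
    def generate(self, prompt: str) -> str:
        return f"live answer to: {prompt}"

# Task 2 shows a deliberately wrong output; everything else is live.
session = ErrorInjectingSession(StubModel(), {2: "Q3 revenue fell 40% (scripted error)"})
for task in ["task 0", "task 1", "task 2", "task 3"]:
    print(session.generate(task))
```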

Trust Calibration Interviews

After a period of real usage (not a one-time session), interview users about their trust calibration:

  • "When do you trust the AI recommendation without checking further?"
  • "When do you always verify before acting on the recommendation?"
  • "Can you give me an example of when the AI was wrong? How did that affect your behavior afterward?"
  • "If you had to explain to a colleague when to trust this AI and when to double-check, what would you say?"

Users who have developed accurate trust calibration (trusting high-confidence outputs, verifying low-confidence ones) are the model for your target user experience. Users who are over-trusting or under-trusting are the risk cases your design needs to address.
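
If you log model confidence alongside whether the user verified before acting, you can classify calibration from behavior rather than self-report. A rough sketch - the confidence threshold and the 20% tolerance are illustrative assumptions, not validated cutoffs:

```python
def classify_calibration(events, confidence_threshold=0.8):
    """Classify a user's trust calibration from logged events.
    Each event: (model_confidence, user_verified_before_acting).
    Calibrated users verify low-confidence outputs and accept
    high-confidence ones; deviations flag over- or under-trust."""
    over = sum(1 for conf, verified in events
               if conf < confidence_threshold and not verified)
    under = sum(1 for conf, verified in events
                if conf >= confidence_threshold and verified)
    if over > under and over > len(events) * 0.2:
        return "over-trusting"
    if under > over and under > len(events) * 0.2:
        return "under-trusting"
    return "calibrated"

# One low-confidence output acted on without checking tips this user
# into the over-trusting bucket.
print(classify_calibration([(0.95, False), (0.55, True), (0.45, False), (0.90, False)]))
```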

Diary Studies for Long-Running AI Features

For AI features embedded in ongoing workflows (not single-session tasks), diary studies capture how usage and trust change over time better than point-in-time research. Ask users to log:

  • Daily: how many times they used the AI feature and for what
  • When notable: cases where the AI was helpful, cases where it was wrong, behavioral changes they made in response
  • Weekly: overall confidence in the AI this week vs last week, and why it changed

A diary study run for 2-4 weeks gives you a view of trust dynamics that no single interview can capture.
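
A simple entry schema keeps the diary data analyzable. One possible shape, with a helper that pulls out the weekly confidence trend per participant (the field names and the 1-5 confidence scale are assumptions):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class DiaryEntry:
    participant: str
    day: int                # study day, e.g. 1-28 for a four-week study
    uses: int               # times the AI feature was used that day
    notable: str | None     # helpful case, error, or behavior change, if any
    weekly_confidence: int | None = None  # 1-5, logged once per week

def weekly_confidence_trend(entries):
    """Average out to one confidence rating per participant per week, so
    you can see whether trust rises, dips after errors, or recovers."""
    trend = defaultdict(dict)
    for e in entries:
        if e.weekly_confidence is not None:
            week = (e.day - 1) // 7 + 1
            trend[e.participant][week] = e.weekly_confidence
    return dict(trend)

entries = [
    DiaryEntry("P1", 3, 4, "AI summary saved a manual reconciliation"),
    DiaryEntry("P1", 7, 2, None, weekly_confidence=4),
    DiaryEntry("P1", 10, 1, "Wrong total in summary; now spot-checking every output"),
    DiaryEntry("P1", 14, 3, None, weekly_confidence=2),
]
print(weekly_confidence_trend(entries))  # {'P1': {1: 4, 2: 2}}
```

Note the trust dip after the day-10 error in the example - exactly the dynamic a point-in-time interview would miss.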

Specific Research Questions for AI Products

Add these to your standard research when building AI features:

  • How does the user's existing trust in AI tools (or distrust) affect their initial engagement with this feature?
  • What information does the user need to see alongside the AI recommendation to make a good decision about whether to act on it?
  • How do users respond when the AI is wrong? Do they adapt, abandon, or escalate?
  • Are there user segments whose workflows are fundamentally incompatible with AI assistance in this area?
  • What language do users use to describe AI failure? (This is the vocabulary you need for error messaging.)
  • Would users trust AI in this specific domain even if they trust AI generally?

What Doesn't Change

Standard user research fundamentals still apply:

  • Talk to real users in their actual environment, not proxies or hypothetical users
  • Use behavioral evidence over stated preferences wherever possible
  • Recruit across the distribution of users, not just the enthusiasts
  • Test with your real target population - don't let demographics drift in recruiting
  • Synthesize across multiple research inputs before drawing conclusions

What matters here

Focus discovery on user problems, not AI solution preferences. Treat trust as dynamic - measure it after exposure to realistic error rates, not just best-case performance. Use live AI outputs in research sessions, not static prototypes, to capture the critical trust-formation interactions. Add trust calibration interviews after a period of real usage. Run diary studies for ongoing AI features to capture how trust changes over time. Build your research program around two questions: where does AI add value in this workflow, and what does trust formation look like for this user population?

