If 2023 was the year everyone discovered LLMs could write surprisingly good text, and 2024 was the year enterprises started asking "but how do we actually use this?", then 2025 and 2026 are the years agentic AI is moving from research paper to production system.

I was at HIMSS26 earlier this year, and the shift was unmistakable. Three years ago, the healthcare AI conversation was about predictive models - will this patient be readmitted? Two years ago, it was about generative models - can AI write my clinical note? This year, the conversation was almost entirely about agents.

Epic announced Agent Factory, their platform for building and deploying clinical AI agents across their EHR workflows. Amazon Health AI demonstrated agents that autonomously managed prior authorization workflows. Microsoft Copilot Health showed agents coordinating care transitions across multiple provider systems. Salesforce Agentforce Health launched with pre-built agents for patient engagement, revenue cycle, and care management.

The pattern is clear. Agentic AI is no longer a research concept. It is the product.


What Actually Makes an AI System "Agentic"?

The word "agent" gets applied to almost everything right now, so let me be precise. An agentic AI system has four defining characteristics:

  1. Goal-directedness: The system is given an objective, not just a prompt. It works toward that objective across multiple steps.
  2. Tool use: The system can call external tools - APIs, databases, browsers, code executors - to gather information or take actions.
  3. Planning: The system can decompose a complex goal into subtasks and sequence those subtasks appropriately.
  4. Feedback incorporation: The system can observe the results of its actions and adjust its plan accordingly.

A chatbot that answers questions is not an agent. A system that receives a goal like "schedule a follow-up appointment for this patient, verify their insurance, send a reminder, and update the care plan" - and executes all of that autonomously - is an agent.

The distinction matters enormously for product design, risk assessment, and infrastructure requirements.
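The four characteristics map cleanly onto a control loop: plan, act through a tool, observe, repeat. A minimal sketch, where `llm_plan` and the tool callables are hypothetical stand-ins rather than any real framework's API:

```python
# Minimal agent loop illustrating the four characteristics.
# `llm_plan` and the entries in `tools` are illustrative stand-ins.

def run_agent(goal, tools, llm_plan, max_steps=10):
    """Work toward `goal` (goal-directedness) by repeatedly planning,
    acting through a tool, and observing the result."""
    history = []  # feedback incorporation: observations from prior steps
    for _ in range(max_steps):
        step = llm_plan(goal, history)        # planning: choose next action
        if step["action"] == "finish":
            return step["result"]
        tool = tools[step["action"]]          # tool use
        observation = tool(**step["args"])    # act in the world
        history.append((step, observation))   # feed the result back in
    raise RuntimeError("Step budget exhausted before goal was met")
```

A chatbot is the degenerate case where the loop runs once and calls no tools; everything agentic lives in the repetition and the tool calls.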

The Architecture Stack

Agentic AI systems have a recognizable architecture regardless of the domain they operate in. Understanding this stack helps PMs have more productive conversations with engineering and make better build-versus-buy decisions.

The Core LLM (The Brain)

At the center is a large language model that handles reasoning, planning, and natural language understanding. In production systems, this is typically GPT-4o, Claude 3.5/3.7, or Gemini 1.5/2.0 - models with strong instruction-following, tool-calling capabilities, and large context windows.

The model doesn't need to know everything. Its job is to reason about what it knows, decide what it needs to find out, and choose which tools to call.

The Tool Layer

Tools are the hands of the agent. Common tool categories:

  • Retrieval tools: Vector search, SQL queries, document retrieval (this is where RAG lives)
  • Action tools: API calls, form submissions, database writes, email sends
  • Computation tools: Code execution, calculators, data transformation
  • Perception tools: Web browsing, document parsing, image analysis
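These categories matter operationally: action tools mutate the world, while retrieval and computation tools generally do not, and that distinction is what determines which calls need approval gates. A sketch of a tool registry that makes the distinction explicit (all names are my own, not from any particular framework):

```python
# Illustrative tool registry; names and structure are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    category: str        # "retrieval", "action", "computation", "perception"
    fn: Callable
    mutates_state: bool  # action tools change the world; retrieval does not

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool):
    REGISTRY[tool.name] = tool

register(Tool("vector_search", "retrieval", lambda q: [], mutates_state=False))
register(Tool("send_email", "action", lambda to, body: None, mutates_state=True))

# Only state-mutating tools need approval gates and audit logging.
gated = [t.name for t in REGISTRY.values() if t.mutates_state]
```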

Memory

Agents need memory across multiple time horizons:

  • In-context memory: The current conversation and task state (limited by context window)
  • External memory: Vector databases, key-value stores, structured databases that persist beyond a single session
  • Procedural memory: The agent's "skills" - fine-tuned behaviors, system prompts, few-shot examples
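The three tiers can be kept cleanly separate in code. A sketch, assuming a dict-backed store stands in for whatever external database you actually use (class and method names are illustrative):

```python
# Illustrative split of the three memory tiers; all names are hypothetical.
class AgentMemory:
    def __init__(self, system_prompt, store=None):
        self.procedural = system_prompt   # skills: prompts, few-shot examples
        self.in_context = []              # current task state, window-limited
        self.external = store or {}       # persists beyond a single session

    def remember(self, key, value):
        self.external[key] = value        # in production: vector DB / KV store

    def recall(self, key, default=None):
        return self.external.get(key, default)

    def add_turn(self, turn, window=20):
        self.in_context.append(turn)
        self.in_context = self.in_context[-window:]  # crude window truncation
```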

Orchestration

For complex workflows, you often need a meta-agent (orchestrator) that manages multiple specialized sub-agents. The orchestrator breaks down the goal, assigns sub-tasks, monitors progress, and handles failures.

This is exactly what Epic's Agent Factory does - it provides an orchestration layer so that a "schedule and authorize" workflow can spawn a scheduling agent, an insurance verification agent, and a notification agent in parallel, then consolidate results.
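The fan-out/consolidate pattern is straightforward to sketch. The sub-agent names below mirror the scheduling example, but the code is a generic illustration, not Epic's implementation:

```python
# Hypothetical orchestrator sketch: fan a goal out to sub-agents in
# parallel, then consolidate results and failures.
from concurrent.futures import ThreadPoolExecutor

def orchestrate(goal, subagents, timeout=30):
    """`subagents` maps a sub-task name to a callable (stand-ins here)."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, goal) for name, fn in subagents.items()}
        results, failures = {}, {}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=timeout)
            except Exception as exc:      # monitor progress, handle failures
                failures[name] = str(exc)
    return results, failures
```

The orchestrator's real value is the failure half of the return value: a consolidated view of which sub-tasks need retrying or human attention.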

Use Cases Across Industries

The architecture is industry-agnostic. The applications differ by domain context and risk tolerance.

Healthcare

Prior authorization automation, clinical documentation, care coordination, patient triage, medication reconciliation. High regulatory complexity means agents must operate in human-in-the-loop mode for decision-making tasks. The agent handles the routine 80% of the workflow; the clinician reviews and approves.

Financial Services

Loan underwriting workflows, fraud investigation, portfolio rebalancing, customer onboarding, regulatory reporting. Agents here face a different risk profile - speed and accuracy on financial decisions, auditability for regulators, and real-time action in markets where latency costs money.

Legal

Contract review, due diligence, regulatory compliance monitoring, document discovery. Legal agents are currently in the "augmentation" phase - they surface relevant precedents and flag issues, but attorneys make the calls. Full autonomy in legal contexts is years away.

Retail and E-Commerce

Inventory management, dynamic pricing, returns processing, supplier negotiation, customer service resolution. Lower stakes per transaction means higher autonomy is acceptable. Shopify's AI is already handling a significant portion of customer service tickets autonomously.

Education

Adaptive tutoring, curriculum generation, assessment design, student support. Agents here are particularly interesting because the "action" of good teaching is deeply contextual - the agent needs to understand not just the content but the student's emotional state, learning history, and motivation level.

The Agentic Risk Spectrum

Not all agentic actions carry the same risk. I map them on two axes: reversibility and consequence magnitude.

  • Low consequence + reversible: Draft an email, summarize a document, generate a report. Full autonomy is fine.
  • Low consequence + irreversible: Send an email, post content, create a calendar event. Autonomy with notification is acceptable.
  • High consequence + reversible: Execute a trade, submit a form, update a database record. Human review recommended.
  • High consequence + irreversible: Administer medication instructions, make a lending decision, delete records. Human approval required.

Product managers building agentic systems must explicitly map every action the agent can take onto this grid. The grid determines your human-in-the-loop architecture.
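The grid translates directly into a policy function: classify each action once, up front, and the human-in-the-loop mode falls out mechanically. A sketch with hypothetical action names and policy labels:

```python
# The two risk axes from the grid, mapped to a HITL policy.
# Action names and policy labels are illustrative.

def approval_mode(high_consequence: bool, irreversible: bool) -> str:
    if not high_consequence:
        return "notify" if irreversible else "autonomous"
    return "human_approval" if irreversible else "human_review"

# Every action the agent can take gets classified explicitly.
ACTIONS = {
    "draft_email":   approval_mode(False, False),  # low + reversible
    "send_email":    approval_mode(False, True),   # low + irreversible
    "update_record": approval_mode(True, False),   # high + reversible
    "delete_record": approval_mode(True, True),    # high + irreversible
}
```

The point of making this a function rather than a case-by-case judgment is auditability: when a regulator or a stakeholder asks why an action ran without review, the answer is a policy, not a vibe.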

The HIMSS26 pattern: Every major clinical AI agent demonstrated at HIMSS26 - without exception - had a human review step for high-consequence actions. The innovation wasn't removing the human. It was reducing the human's cognitive load from reading, deciding, and executing to simply reviewing and approving. That's the right place to start.

What PMs Get Wrong About Agentic AI

Mistake 1: Treating agents like features

An agent is a system, not a feature. It requires ongoing monitoring, error handling, rollback capabilities, and audit logging. The product surface area is much larger than a single-turn LLM feature. Staff your team accordingly.

Mistake 2: Under-investing in tool reliability

Agents are only as reliable as their tools. If your scheduling API is flaky, your scheduling agent will be flaky. Before building agent orchestration, audit your tool layer for reliability, latency, and error behavior. Fix the tools first.
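A cheap first step in hardening the tool layer is wrapping flaky tools with retries and backoff before any agent ever calls them. A minimal sketch (the wrapper and its defaults are illustrative, not a prescription):

```python
# Illustrative retry wrapper for a flaky tool; tune retries/backoff per tool.
import time

def reliable(fn, retries=3, backoff=0.5,
             transient=(TimeoutError, ConnectionError)):
    """Retry `fn` on transient errors with exponential backoff."""
    def wrapped(*args, **kwargs):
        delay = backoff
        for attempt in range(retries):
            try:
                return fn(*args, **kwargs)
            except transient:
                if attempt == retries - 1:
                    raise              # give up: surface the failure
                time.sleep(delay)
                delay *= 2
    return wrapped
```

Retries paper over transience, not brokenness: if a tool needs this wrapper to hit double-digit success rates, fix the tool.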

Mistake 3: No graceful degradation

What happens when the agent fails mid-task? Does it leave the system in a consistent state? Does it notify the right human? Does it log enough context for a human to pick up where it left off? Design the failure path as carefully as the success path.
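One concrete failure-path pattern is to checkpoint completed steps so a human can resume from exactly where the agent stopped. A sketch, with hypothetical step and state shapes:

```python
# Illustrative failure path: run steps, and on error emit a resumable
# checkpoint instead of leaving the task in an unknown state.
import json
import logging

def run_with_failure_path(task_id, steps, state,
                          log=logging.getLogger("agent")):
    """`steps` is a list of (name, fn) pairs mutating `state` (stand-ins)."""
    completed = []
    for name, fn in steps:
        try:
            fn(state)
            completed.append(name)
        except Exception as exc:
            checkpoint = {"task": task_id, "completed": completed,
                          "failed_step": name, "state": state,
                          "error": str(exc)}
            log.error("agent task failed: %s", json.dumps(checkpoint))
            return {"status": "needs_human", **checkpoint}
    return {"status": "done", "completed": completed}
```

The checkpoint answers the three questions in the paragraph above: the state is explicit, the right human gets the log, and "completed" tells them where to pick up.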

Mistake 4: Prompt injection blindness

Agents that process external content - emails, documents, web pages - are vulnerable to prompt injection attacks, where malicious content instructs the agent to take unintended actions. This isn't hypothetical; it has happened in production systems. Sanitize external inputs before they reach the LLM.
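Two inexpensive defenses are delimiting external content as data and screening it for obvious instruction patterns before it reaches the model. A heuristic sketch; the delimiter tag and the pattern list are my own, and this reduces injection risk rather than eliminating it:

```python
# Heuristic injection screen; patterns and tag names are illustrative.
import re

SUSPICIOUS = re.compile(
    r"(?i)\b(ignore (all|previous|prior) instructions|"
    r"you are now|system prompt)\b")

def wrap_untrusted(content: str) -> str:
    """Mark external content as data, never instructions.
    Raises instead of passing likely injections to the LLM."""
    if SUSPICIOUS.search(content):
        raise ValueError("possible prompt injection; route to human review")
    return ("<untrusted_document>\n"   # system prompt tells the model to
            + content +                # treat this span as data only
            "\n</untrusted_document>")
```

Pattern matching alone is easy to evade, so treat this as defense in depth alongside the risk grid: a compromised agent should still lack the authority to take high-consequence actions unreviewed.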

How to Start Building

My recommended progression for teams new to agentic AI:

  1. Start with a well-defined, bounded workflow. Don't start with "automate everything." Start with one specific workflow that has clear inputs, outputs, and success criteria.
  2. Map the tool layer first. Before writing any agent code, list every tool the agent needs and verify each one is reliable and well-documented.
  3. Build human-in-the-loop from day one. Start with every consequential action requiring human approval. Remove the human only when you have data showing the agent is reliable enough.
  4. Instrument everything. Log every agent action, tool call, and decision. You will need this data to debug failures and build trust with stakeholders.
  5. Run shadow mode before live mode. Let the agent run in parallel with the existing workflow for 2-4 weeks. Compare outcomes. Build confidence before giving it the wheel.
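Step 5 is worth making concrete: shadow mode is just running both paths on the same cases and measuring agreement. A sketch where the agent, the baseline workflow, and the comparison function are all hypothetical callables:

```python
# Illustrative shadow-mode harness; all callables are stand-ins.
def shadow_run(cases, agent, baseline, compare):
    """Run the agent alongside the existing workflow; report agreement."""
    agreements = 0
    records = []
    for case in cases:
        a, b = agent(case), baseline(case)
        match = compare(a, b)
        agreements += match
        records.append({"case": case, "agent": a,
                        "baseline": b, "match": match})
    return agreements / len(cases), records
```

The agreement rate is your go/no-go number, and the per-case records are where you find the disagreements worth reading before you ever give the agent the wheel.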

Agentic AI is not the distant future. It's HIMSS26 hallways, it's Shopify support queues, it's loan processing at regional banks. The question for product managers isn't whether agentic AI is coming to your industry - it's whether you'll be the team that shapes how it arrives or the team that inherits someone else's decisions.

Start building. Build carefully. But start now.

