The Reason Your AI Agent Sucks Has Nothing to Do with AI

Most product managers I know have tried building an AI agent. A lot of them gave up after the first attempt.

The agent rambled. It asked irrelevant questions. It confidently produced outputs that missed the point entirely. The PM walked away thinking the technology wasn’t ready, or that they needed a prompt engineer, or that this whole “AI agent” thing was overhyped.

In most of those cases, the model wasn’t the problem. The instructions were.

Writing effective agent instructions is a skill, and PMs are uniquely positioned to develop it, because it draws on work we already know how to do. Defining requirements. Writing specs. Managing edge cases. The structure of a good agent brief looks a lot like the structure of good product work. You’re telling something exactly what to do, what not to do, and what happens when things go sideways.

But most people write agent instructions like Slack messages. A sentence or two, some context, hope for the best.

Here’s how to do it better using a real example: a PM assistant that guides you through Amazon’s PRFAQ process.


My thoughts in your inbox


Why Instructions Are the Product

When I was building Amazon CodeWhisperer, we learned quickly that the quality of AI output is upstream of everything. Better inputs, better outputs. That sounds obvious. But actually writing instructions with the same rigor you’d apply to a product spec? Most teams skip that entirely.

An agent instruction set is a product specification. You’re defining behavior, setting constraints, anticipating failure modes. The agents that work well aren’t better because the model is smarter. They’re better because someone took the time to write a clear brief.

Think about onboarding a capable new hire who doesn’t know your context. They might be brilliant. But without clear direction – what matters, what’s in scope, how to communicate, what to do when they hit a wall – they’ll produce work that misses the mark. Because they were set up to guess.

Agents are the same. Except they guess faster, and at scale.


The Five Components

Over the past year, building and evaluating agents at Capital One and in side projects, I’ve landed on five things that almost every effective agent brief needs. Skip one and you’ll feel it.

Role

Role is where most instructions start and where most instructions stop. That’s the problem.

Defining the role means more than giving the agent a title. It means giving it an identity, a purpose, and a sense of what success looks like. “You are a helpful assistant” is not a role definition. It’s an opt-out.

A real role definition answers: What does this agent exist to do? What specific expertise does it bring? What disposition does it have toward the work?

Compare these two:

“You are a product management assistant.”

or

“You are an experienced product manager who specializes in helping teams define new product ideas clearly. Your job is to help users produce rigorous, customer-focused product documents by asking the right questions and pushing back when thinking is vague or untested.”

The second version tells the agent how to behave, not just what it is. It sets up a disposition – experienced, rigorous, willing to push back. That disposition shapes every response.

Scope

Scope is where most instructions fall apart, usually because scope is the uncomfortable part of product work. Saying what something does is easy. Saying what it doesn’t do requires real decisions.

An agent without boundaries will follow the conversation wherever it goes. In practice, that means your PRFAQ agent starts giving opinions on go-to-market strategy. Or it comments on your pricing model when you just asked it to tighten a paragraph. The agent is “being helpful” in ways that actively undermine the thing it was supposed to do.

Good scope definition covers three things: what the agent does, what it explicitly doesn’t do, and when it stops. That last one is easy to miss. A lot of agents have no exit condition – they just keep going. For a document-producing agent, the exit condition might be “when all required sections are complete and the user confirms they’re satisfied.” Define the end state or you’ll never reach it.

Context

Context is the situational intelligence you give the agent about the environment it’s operating in. Who is it talking to? What do they already know? What stage of work are they in?

This is where PMs have a natural advantage. We’re trained to think about where someone is in a workflow and what they need at that moment. An agent needs the same thing.

For a PRFAQ agent, the relevant context includes: the user is likely in early ideation, not refinement. They may not know the PRFAQ format. The document will be scrutinized by leadership. And the whole point of the process is to force clarity – not rubber-stamp existing thinking. Without that context, the agent can’t calibrate whether to guide gently or push hard, whether to explain the format or assume familiarity, whether to treat early answers as final or as starting points to interrogate.

Tone

Tone is the component that feels optional until you ignore it and the agent sounds completely wrong.

It’s not just about being polite or casual. Tone is about register. Is the agent a peer reviewing your thinking or a tool executing a template? Does it explain its reasoning or just produce output?

For a PM tool used by experienced product people, the tone should be peer-level. Direct, efficient, willing to challenge. “This assumes customers will switch platforms without explaining why they would” is a better response than “Great! You might want to consider exploring customer motivations in a bit more detail.”

For a tool used by people new to product thinking, the tone shifts – more explanatory, more willing to scaffold the process.

Get this wrong and the agent is technically accurate but practically useless. People won’t engage with something that feels off.

Error Handling

Error handling is the most overlooked component and often the most important one.

What does the agent do when it doesn’t know? When the user gives an ambiguous answer? When the user goes off-script?

Agents without explicit error handling default to one of two failure modes: confident hallucination (they make something up that sounds plausible) or paralytic over-qualification (so many caveats that the output is useless).

Good error handling gives the agent a default behavior for uncertainty. For most PM-facing agents, that default should be: ask a clarifying question rather than guess. State what’s missing. Don’t produce weak output when better output just requires better input.

This is also where you handle scope creep in real-time. If a user asks something outside the agent’s scope, “I’m not going to help with that” is incomplete. The agent should explain briefly and redirect: “That’s outside what I’m set up to cover here. If you want to dig into go-to-market strategy, that’s worth a separate conversation. For now, let’s make sure this PRFAQ is solid.”


The PRFAQ Agent: Putting It Together

Amazon’s PRFAQ process is one of the most useful frameworks in product management. If you haven’t used it, you write the press release for a product before you build it. You describe the product as if it’s already launched and successful, then answer the questions customers and internal stakeholders would ask. It forces you to think backward from customer value rather than forward from feature ideas.

It’s also genuinely hard to write well. The press release needs to be specific without being speculative. The FAQs need to anticipate real concerns without turning into a defensive wall. Most PRFAQs I’ve reviewed – and I’ve reviewed a lot of them at Microsoft (Vision Press Release), AWS (PRFAQ), and Capital One (Launch Announcement FAQ) – fail the same ways. Too much feature description, not enough customer problem, vague success metrics, and external FAQs that read like marketing copy.

A well-built agent fixes most of those failure modes by guiding the process, asking the right questions in the right order, pushing back on weak answers, and making sure every section meets the standard before moving forward.

Designing the State Machine

Before writing a single instruction, think about the structure of the interaction. A PRFAQ agent isn’t a chatbot – it’s a process guide. The conversation has a defined shape: you move through sections, complete them, and advance.

That’s a state machine. The agent has a current state (which section it’s in), valid transitions (moving forward when the current section is complete), and a terminal condition (document done). Defining those states up front is what keeps the agent from getting derailed.

The states for a PRFAQ agent: Orientation, Product Idea Capture, Customer Definition, Press Release Draft, Internal FAQ, External FAQ, Review and Refinement, Output.

Each state has an entry condition, a set of tasks, and an exit condition. The agent doesn’t move forward until requirements are met. That’s how you prevent the most common PRFAQ failure – rushing past weak assumptions because the user wants to reach the output.

The Instructions

Here’s a full example of agent instructions for a PRFAQ assistant, written using the five-component framework:

prfaq-instructions.md
Markdown
**ROLE**
You are an experienced Amazon-trained product manager who specializes in the PRFAQ process. You've reviewed hundreds of PRFAQs and you know exactly what makes them work and what makes them fail. Your job is to guide the user through producing a rigorous, customer-obsessed PRFAQ by asking focused questions, challenging weak thinking, and making sure every section meets a high standard before moving on.
You are a thought partner and a quality gate, not a template-filler. You will push back on vague answers. You will ask follow-up questions when something isn't clear. You will not let the user skip sections or gloss over hard questions.
**SCOPE**
You help users complete a PRFAQ document. This includes capturing the core product idea, defining the target customer and problem with precision, drafting each section of the press release, generating and stress-testing internal and external FAQs, reviewing the completed document for weak areas, and producing the final formatted output.
You do not help with go-to-market strategy, pricing decisions, roadmap planning, or competitive analysis \- even if those topics come up naturally. If they do, acknowledge them briefly and redirect: "That's worth thinking through, but it's outside what we're doing here. Let's finish the PRFAQ first."
You work through sections in order. You do not skip ahead. You do not move to the next section until the current one is complete.
**CONTEXT**
Users coming to you are product managers or product thinkers with an idea they want to develop. They may or may not know the PRFAQ format. Assume they are intelligent and have thought about their idea, but assume their thinking needs sharpening.
The PRFAQ format was developed at Amazon to force customer-back thinking. The press release should read like a real press release from a future launch day. The FAQs should reflect genuine uncertainty and hard questions, not softballs. The document will be read by senior stakeholders who will scrutinize customer assumptions and challenge anything unvalidated.
Write the document in the user's voice, not yours. Your job is to extract and refine their thinking, not substitute your own.
**TONE**
You are a peer reviewer, not a coach. Direct, efficient, and willing to challenge. When the user gives a vague answer, say so specifically: "That's still pretty broad \- can you tell me about one specific person who would use this and what their day looks like today without this product?"
When the user's thinking is good, say so briefly and move on. Don't over-praise. Don't pad responses with encouragement.
Be clear about where you are in the process at all times. At the start of each section, say which section you're working on and why it matters. When a section is complete, confirm it explicitly before moving on.
**ERROR HANDLING**
When the user gives an answer that's incomplete, ambiguous, or too vague to use, ask a specific clarifying question rather than guessing. Say what's missing: "I don't have enough to write the customer problem section yet. What does the user do today when they face this problem, and what's frustrating about it?"
When the user asks something outside your scope, answer briefly if it's a quick clarification, then redirect to the current section.
When the user wants to skip a section, explain why it matters and ask if they want to revisit it after completing the rest. Don't skip it silently.
When you're not confident about a claim the user makes \- especially a customer assumption \- flag it as an assumption and note it will need validation. Don't validate it with confident language yourself.
If the user seems stuck, offer a prompt: "Would it help if I suggested a few ways to frame the customer problem based on what you've told me so far?"
---
**STATE MANAGEMENT**
You maintain a current state throughout the conversation. You always know which section you're in, what information you've captured, what's still needed before the current section is complete, and what sections remain.
Begin in Orientation. Transition to each new state only when the previous one is complete. Confirm each transition with the user.
**State: Orientation** Introduce yourself and the process in 3-4 sentences. Explain what a PRFAQ is and why it's valuable. Tell the user what to expect: you'll work through the document section by section, ask questions, and push back if something isn't clear. Ask: "What's the idea you want to work on today?"
**State: Product Idea Capture** Ask the user to describe their idea in 2-3 sentences: what it does and who it's for. If the description is too technical or feature-focused, prompt them to describe it in terms of what problem it solves for a real person. Capture the core idea before moving on.
**State: Customer Definition** Who specifically benefits? Not "enterprise customers" or "developers" \- who is the specific person with the specific problem? What does their day look like today? What are they doing to solve this problem now, and why is that frustrating? This is the most important section. Don't rush it.
**State: Press Release Draft** Work through the press release element by element: headline, subhead, location and dateline, opening paragraph, customer problem paragraph, solution paragraph, customer quote, and call to action. For each element, ask focused questions, draft based on the user's answers, and ask for approval before moving on.
**State: Internal FAQ** Generate 5-7 questions a senior leader would ask when reviewing this PRFAQ. Include the uncomfortable ones: What's the evidence this is a real customer problem? Why are we the right team to solve this? What does success look like in 12 months? What does failure look like? Ask the user to answer each one. Push back on answers that are vague or rely on untested assumptions.
**State: External FAQ** Generate 5-7 questions that customers or the press might ask. These should reflect genuine curiosity and skepticism, not softballs. Work through them the same way as the internal FAQs.
**State: Review and Refinement** Review the full document. Flag anything weak, vague, or built on unvalidated assumptions. Be specific: "The customer problem section says customers are 'frustrated by slow tools' but doesn't say why existing solutions are slow or what the user is actually trying to accomplish. Can we sharpen that?" Offer the user the option to revise any section before output.
**State: Output** Produce the complete PRFAQ as a formatted document. Include a brief note at the end identifying the top 2-3 assumptions that will need validation.

What You Get

That instruction set is about 900 words. Most people’s agent instructions are 50. The output quality difference is substantial – not because the model is smarter, but because the model knows what it’s doing.

The agent won’t let users skip hard sections. It pushes back on vague answers with specific asks. It maintains context across a long conversation. It produces a structured document, not a stream of suggestions, and it flags assumptions rather than validating them with false confidence. And because it’s a state machine, it’s predictable. You know roughly how the conversation is going to go. You know what the end state looks like. That predictability is what makes an agent useful as a workflow tool rather than just an interesting conversation.

PRFAQ Assist will ask the user challenging questions to create a strong PRFAQ.

The best agents for product management aren’t open-ended assistants. They’re structured processes with AI embedded in each step. The PM’s job is to design the process, define the standard for each step, and write the instructions that enforce it.

The Skill Nobody’s Talking About

Everyone’s focused on which model is best. Which tool is most capable. Whether to use Claude or GPT or Gemini.

That’s mostly noise.

The skill that will matter is writing a clear brief for an AI system. That’s requirements definition. Knowing what good output looks like before you ask for it. Making explicit the things most people leave implicit.

After 25+ years building developer tools – Visual Studio at Microsoft, CodeWhisperer at AWS, the AI platform work at Capital One – the pattern I keep seeing is the same one. Teams that get real value from AI are the teams that know what they want. They’ve defined success before they start. The teams that don’t are waiting for the model to figure it out.

It won’t.

Write the brief.

P.S. Here is a PRFAQ for a fictitious product of a fictitious company that I created using these exact instructions.

PRFAQ: MeetingGhost

I. Press Release

Headline: SevenStack Launches MeetingGhost: The AI Executive Proxy That Attends Meetings and Makes Decisions So You Don’t Have To

Subhead: By leveraging a personalized Eisenhower Matrix and human-sounding voice technology, MeetingGhost autonomously handles status updates and non-financial approvals in real-time, giving Directors up to 15 hours of their week back.

DATELINE: SEATTLE — June 26, 2026 — SevenStack today announced the general availability of MeetingGhost, the first AI executive proxy designed to attend, participate in, and make decisions during calendar meetings on behalf of busy professionals. Built for Director-level leaders at Fortune 1000 companies, MeetingGhost uses a custom-configured Eisenhower Matrix to distinguish between status updates that can be handled autonomously and strategic issues that require human intervention. By deploying a digital twin into meeting rooms, executives can reclaim 15 hours of their workweek, ensuring they never miss a critical update—or a family dinner—ever again.

The Problem: The “Context Debt” Crisis

For the modern Director of Software Engineering at a Fortune 1000 company, the 40-hour work week is a myth. Trapped in a cycle of back-to-back meetings and constant triple-bookings, these leaders are forced to choose between being informed and being present for their families. Today, “catching up” on missed meetings means late nights watching 2x speed recordings and triaging status updates long after the workday should have ended. This chronic “context debt” creates a high-stakes vulnerability: when a Senior VP pings for a critical update, the Director often lacks the immediate information to respond. The true cost of this meeting culture isn’t just lost productivity—it is the missed dinners, the skipped soccer games, and the persistent burnout of a life where work never actually stops.

The Solution: The Digital Proxy

MeetingGhost bridges the gap between being “in the room” and being “at the table.” When a Director is triple-booked, MeetingGhost joins the secondary meetings as an active digital avatar, using a natural, human-sounding voice. Powered by the executive’s specific project context, MeetingGhost provides real-time status updates, approves logistical dependencies, and answers technical questions. If a project is on track, MeetingGhost confirms it. If a low-stakes deadline needs a 24-hour shift, MeetingGhost authorizes it. To ensure total control, every decision is “provisional”—the user receives a Slack notification and a two-minute scannable brief to verify or override the Ghost’s actions.

Customer Quote

“Before MeetingGhost, I was the bottleneck. If I was in a high-stakes planning session, my team was often stalled in another room waiting for a simple status update or a logistical approval,” said David Miller, Director of Platform Engineering at a global logistics firm. “Now, my MeetingGhost proxy handles those tactical blockers in real-time. Last week, I was at my son’s playoff game while my ‘Ghost’ approved a non-critical environment shift and provided our latest sprint metrics to the VP of Product. The best part? I received a two-minute, scannable brief on my drive home that caught me up on everything my proxy did and said. I’m no longer the bottleneck, and I’m no longer in the dark.”

Pricing and Availability

Starting today, MeetingGhost is available to individual leaders and teams via a bottom-up, self-service model. SevenStack offers a 30-day free trial for any professional looking to reclaim their calendar. The platform features seamless, one-click integration with Google Calendar, Microsoft Outlook/Exchange, and Apple Calendar. To start your trial and deploy your first executive proxy via a simple 5-minute AI onboarding interview, visit SevenStack.ai/MeetingGhost.


II. Internal FAQ

Q1: If the Ghost makes a mistake, who is liable?

A: The user remains ultimately responsible. To mitigate risk, all decisions are “Provisional.” When a Ghost makes an assertion or approval, it triggers an immediate Slack/SMS to the user. If the user does not confirm within a 2-hour window, MeetingGhost notifies the meeting participants that the decision is on “Hold” until human verification.

Q2: How do we prevent this from becoming “emotionally blind” automation?

A: Our proprietary algorithm flags “Sentiment Anomalies.” It analyzes tone and hesitation in the room (e.g., “The team agreed, but the Lead Dev sounded frustrated”). This ensures the Director isn’t just getting data, but also the “vibe” of the meeting in their daily brief.

Q3: Is the setup too high-friction for a busy executive?

A: No. We provide role-based templates (e.g., “Standard Engineering Director”). During setup, an AI-led “Ghost Interview” asks 10 questions about the user’s priorities and decision boundaries to build the initial Eisenhower Matrix.

Q4: How do we handle the “Ghost-only” meeting paradox?

A: MeetingGhost identifies when a recurring meeting has devolved into an AI-to-AI data exchange. When this happens, it proactively suggests the meeting be canceled and converted into an automated asynchronous report.

Q5: How do we satisfy Enterprise Security (CISO) requirements?

A: MeetingGhost operates with a Non-Human ID (NHID) within the enterprise identity system. It is a credentialed, verifiable, and auditable user. All data is hosted in a secure VPC with full audit logs for every interaction.


III. External FAQ

Q1: Isn’t it disrespectful to send an AI to a meeting instead of showing up yourself?

A: MeetingGhost is for meetings where you are currently multitasking or “half-listening.” In those scenarios, a Ghost—which listens to 100% of the conversation—is actually a more respectful and higher-utility participant than a distracted human. We discourage its use for 1-on-1s or creative brainstorms.

Q2: Will this AI replace my job?

A: No. A Director’s value is in leadership, culture, and strategy. MeetingGhost automates the “tactical tax” of your job so you can focus on the work that actually justifies your headcount.

Q3: What if someone asks the Ghost a question it doesn’t know the answer to?

A: MeetingGhost is programmed for safety. If a question is outside its configured parameters, it does not guess. It informs the room: “That is outside my current authorization; I will flag this for [User] to respond by EOD.”

Q4: How do I know the Ghost won’t hallucinate?

A: MeetingGhost has hard “Fact-Check” guardrails. If it cannot verify a fact against the user’s documents or connected tools (Jira, GitHub, etc.), it will raise it as a conflict rather than making an assertion.


Validation Next Steps

This PRFAQ assumes three things that must be validated through a pilot or technical proof-of-concept:

  1. Sentiment Accuracy: Can the AI reliably detect “frustration” or “hesitation” in a group setting without high false-positive rates?
  2. The “Uncanny Valley” Response: Do human participants actually feel comfortable talking to an AI avatar, or does it stifle the conversation?
  3. Override Latency: Is 2 hours the “Goldilocks” window, or does it still feel too slow for fast-moving engineering teams?

Part of the Product Craft series – practical approaches to product management for software products and developer tools.

Leave a comment

I’m Doug

Doug Seven

Welcome to my digital workshop. A space dedicated to the art of building category-defining platforms and the teams that power them. Here, I invite you to join me in exploring the intersection of Generative AI, developer experience, and the craftsmanship required to scale technical innovation with a human touch. Let’s build something extraordinary!

Let’s connect