In my last post, I talked about the difference between launching and landing—how shipping code is just the beginning, and the real challenge is ensuring your product achieves its intended outcomes. But here’s the question that naturally follows: How do you know if you’ve landed successfully?
Most product managers jump straight to metrics. We instrument everything, build dashboards, and track numbers religiously. But here’s the trap: we often measure what’s easy to measure rather than what actually matters. We end up with vanity metrics that look impressive in slide decks but don’t tell us whether we’re delivering real value to users.
The Goals, Signals, and Metrics (GSM) framework solves this problem by forcing us to work backward from user outcomes to measurements. It’s simple in concept but requires real discipline in practice. And when done right, it becomes your telemetry system for knowing whether you’ve truly landed.
The Restaurant That Measured the Wrong Things
Imagine you’re opening a new restaurant. Your goal is simple: create a memorable dining experience that keeps guests coming back. So you instrument your business, tracking table turnover rates and average check sizes. Those numbers look great—you’re flipping tables quickly and upselling appetizers like crazy.
But three months later, you’re confused. Despite strong metrics, you’re not getting repeat customers. Reviews mention feeling rushed. Guests say they felt pressured to order more than they wanted.
What went wrong? You optimized for the wrong metrics. Table turnover and check size are easy to measure, but they don’t actually capture “memorable dining experience.” In fact, optimizing for them actively worked against your goal.
Here’s what the GSM framework would have revealed:
Goal: Guests have a memorable dining experience
Signals (what would that actually look like?):
- Guests are engaged in conversation, not checking their phones anxiously
- They try new dishes with enthusiasm rather than playing it safe
- They linger over dessert instead of rushing out
- They talk about bringing friends next time
Metrics (how we measure those signals):
- Return visit rate within 30 days
- Party size growth (solo diners becoming groups)
- Specific dish reorder rate (they loved it enough to get it again)
- Average time spent at table (within reasonable bounds)
- Review sentiment mentioning the experience, not just the food
Notice the difference? These metrics connect directly to behaviors that indicate success. They’re harder to game. If you optimize for these, you’re much more likely to achieve your actual goal.
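If it helps to see the shape of this, here is a minimal sketch of how a GSM definition might be captured in code, using the restaurant example above. The class and field names are mine, not an official GSM schema.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    """A proxy measurement for a signal (names here are illustrative)."""
    name: str
    description: str = ""

@dataclass
class Signal:
    """An observable behavior that would indicate the goal is being met."""
    description: str
    metrics: list[Metric] = field(default_factory=list)

@dataclass
class Goal:
    """A user outcome, stated independently of any feature."""
    outcome: str
    signals: list[Signal] = field(default_factory=list)

# The restaurant example, expressed in this structure.
dining = Goal(
    outcome="Guests have a memorable dining experience",
    signals=[
        Signal(
            description="Guests linger over dessert instead of rushing out",
            metrics=[
                Metric("avg_time_at_table", "Average time spent at table, within reasonable bounds"),
                Metric("return_visit_rate_30d", "Return visit rate within 30 days"),
            ],
        ),
        Signal(
            description="They talk about bringing friends next time",
            metrics=[Metric("party_size_growth", "Solo diners becoming groups")],
        ),
    ],
)
```

Even if you never run this, writing the framework down in this shape makes it obvious when a metric has no signal above it, which is usually the tell that you are measuring what is easy rather than what matters.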
Why This Framework Matters for Product Managers
The GSM progression forces intentionality. It prevents us from falling into the “measure what’s easy” trap and helps us identify when our metrics might actually work against our goals.
Think about an e-commerce checkout flow. A metric-first approach might say “reduce checkout time” and celebrate when users blast through in 30 seconds. But what if they’re rushing because the flow is confusing? What if faster checkout correlates with higher return rates because users didn’t review their orders carefully?
The GSM framework would start differently:
Goal: Users make confident purchases they’re happy with
Signals: Users review order details, understand shipping costs, and feel informed about their decision
Metrics: Completion rate, error rate during checkout, return rate, customer satisfaction scores, time spent on the order review page
Now you’re measuring things that actually connect to successful outcomes. Speed might still matter, but only in service of confidence and satisfaction.
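One way to pressure-test the speed assumption is to check whether checkout time and returns actually move together. Here is a rough sketch with made-up numbers and field names; your own event data will look different.

```python
from statistics import correlation, mean  # statistics.correlation needs Python 3.10+

# Hypothetical order records: checkout duration in seconds, and whether the order came back.
orders = [
    {"checkout_seconds": 28, "returned": True},
    {"checkout_seconds": 95, "returned": False},
    {"checkout_seconds": 31, "returned": True},
    {"checkout_seconds": 140, "returned": False},
    {"checkout_seconds": 44, "returned": False},
    {"checkout_seconds": 205, "returned": False},
]

durations = [o["checkout_seconds"] for o in orders]
returns = [1.0 if o["returned"] else 0.0 for o in orders]

# A strongly negative value here would mean the fastest checkouts are the ones
# coming back as returns, which is exactly what a bare "reduce checkout time"
# metric would hide.
print(f"checkout-time vs. return correlation: {correlation(durations, returns):.2f}")
print(f"mean checkout time: {mean(durations):.0f}s")
```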
This matters because product managers live and die by metrics. We need them for stakeholder updates, prioritization decisions, and understanding impact. But the wrong metrics lead to the wrong decisions. They create a false sense of success while your product slowly fails to deliver real value.
Building Your Telemetry System
The GSM framework works as a three-step process, and the order matters.
Step 1: Define Clear Goals
Goals should describe user outcomes, not feature outputs. They should answer “what value are we creating for users?” not “what are we building?”
Good goals:
- “Developers find and fix security vulnerabilities before code reaches production”
- “Users discover relevant content without searching”
- “New team members become productive in their first week”
Bad goals:
- “Implement AI-powered code scanning”
- “Increase search traffic by 20%”
- “Build onboarding tutorial”
The bad examples aren’t wrong—they might be perfectly reasonable features. But they’re outputs, not outcomes. They describe what you’re doing, not what value you’re creating.
Step 2: Identify Observable Signals
Signals are the bridge between aspiration and measurement. They describe what you’d observe if users were actually experiencing the value you want to create.
This step requires imagination. Sit with your team and ask: “If our goal is being achieved, what would we see users doing? How would their behavior change?”
Here’s the key insight: signals often can’t be measured directly, and that’s okay. “Users feel confident in their code” isn’t directly measurable, but it’s a valid signal. The point is to articulate what success looks like before we worry about how to measure it.
Brainstorm multiple signals for each goal. Different signals reveal different dimensions of success. “Developers trust the tool” might show up as both “they adopt recommendations quickly” and “they configure it for more repos.”
Step 3: Choose Proxy Metrics
Now we get practical. For each signal, identify metrics that serve as proxies—imperfect but useful measurements that indicate whether the signal is present.
A few principles:
Use multiple metrics per signal. No single metric tells the whole story. Acceptance rate alone doesn’t tell you if developers trust your tool—they might accept suggestions blindly or out of pressure. Combine it with time-to-accept, modification rate, and outcome quality.
Balance leading and lagging indicators. Leading indicators (like PR merge rate) tell you quickly if something’s wrong. Lagging indicators (like production CVE reduction) tell you if you’re achieving real impact. You need both.
Watch for gaming. Any metric can be gamed. “PRs opened” is easily inflated. “User logins” can be automated. Choose metrics that are hard to manipulate without actually delivering value, and combine metrics that check each other.
Consider collection cost. Some metrics require expensive instrumentation or manual analysis. Start with high-value, low-effort metrics. Add more sophisticated measurements as you validate your approach.
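To make the multiple-metrics principle concrete, here is a rough sketch that reads acceptance rate alongside time-to-accept and modification rate. The records and field names are invented for illustration; no single number below is meant to stand alone.

```python
from statistics import median

# Hypothetical review outcomes for AI-suggested fixes (field names are illustrative).
suggestions = [
    {"accepted": True,  "hours_to_accept": 2.0,  "modified_before_merge": False},
    {"accepted": True,  "hours_to_accept": 30.0, "modified_before_merge": True},
    {"accepted": False, "hours_to_accept": None, "modified_before_merge": False},
    {"accepted": True,  "hours_to_accept": 4.5,  "modified_before_merge": False},
]

accepted = [s for s in suggestions if s["accepted"]]

acceptance_rate = len(accepted) / len(suggestions)
median_hours_to_accept = median(s["hours_to_accept"] for s in accepted)
modification_rate = sum(s["modified_before_merge"] for s in accepted) / len(accepted)

# Read together: a high acceptance rate paired with a high modification rate suggests
# developers are taking the suggestions but not trusting them as-is.
print(f"acceptance rate:        {acceptance_rate:.0%}")
print(f"median hours to accept: {median_hours_to_accept:.1f}")
print(f"modification rate:      {modification_rate:.0%}")
```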
Example: SevenStack’s AI Code Review Agent
Let me show you how this plays out with a concrete product example. At our fictional company SevenStack, we’ve built an AI Code Review agent. It’s an agentic AI that autonomously scans source code repositories looking for issues: code smells, known security vulnerabilities (CVEs), deprecated patterns. When it finds something, it creates a feature branch, fixes the issue, and opens a pull request, just like a human developer would.
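Since SevenStack is fictional, the sketch below is purely illustrative. Every function is a hypothetical stub; the point is only the shape of the loop, not a real API.

```python
# Illustrative stubs only: SevenStack is fictional, so none of these functions
# correspond to a real library or service.

def scan_repository(repo: str) -> list[dict]:
    """Pretend scanner: returns findings such as code smells, CVEs, deprecated patterns."""
    return [{"id": "CVE-XXXX-0000", "file": "app/auth.py", "kind": "vulnerability"}]

def create_branch(repo: str, name: str) -> None: ...
def apply_fix(repo: str, finding: dict) -> None: ...
def open_pull_request(repo: str, branch: str, title: str) -> None: ...

def review_agent(repo: str) -> None:
    for finding in scan_repository(repo):
        branch = f"fix/{finding['id'].lower()}"
        create_branch(repo, branch)    # work on a feature branch, like a human would
        apply_fix(repo, finding)       # apply the automated remediation
        open_pull_request(repo, branch, f"Fix {finding['kind']}: {finding['id']}")

review_agent("sevenstack/payments-service")
```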
The naive metrics approach would track activity: PRs opened, issues found, lines of code changed. These are easy to measure and look impressive in reports. “Our AI agent opened 1,000 PRs last month!”
But stop and think: does 1,000 PRs mean success or spam? If developers are rejecting 98% of them, we’re not helping—we’re creating noise and wasting their time. If they’re auto-merging without review, we might be introducing new problems. The activity metrics don’t tell us whether we’re achieving anything meaningful.
Let’s apply GSM:
Goal: Developers ship secure, high-quality code with less manual effort
Notice this goal focuses on developer outcomes. We’re not trying to “maximize automated fixes” or “increase PR throughput.” We’re trying to help developers do their job better.
Signals (what would success look like?):
- Developers trust AI-generated fixes and merge them quickly without anxiety
- Security vulnerabilities get resolved before reaching production
- Developers spend less time on tedious fixes, more on feature work
- Code quality improves without adding to developer burden
These signals paint a picture. We’re looking for trust, impact, efficiency, and quality improvement. Notice how different this is from “AI agent is busy.”
Metrics (how we measure those signals):
For the trust signal:
- PR merge rate (what percentage of AI PRs get merged?)
- Time from PR creation to merge (are developers confident enough to merge quickly?)
- Modification rate (do PRs get merged as-is, or do devs rewrite them?)
- PR approval rate without changes requested
For the security impact signal:
- CVE resolution time (how quickly do vulnerabilities get fixed?)
- Reduction in security scan failures in production
- Mean time to remediation for critical issues
- Percentage of CVEs caught before production deployment
For the developer productivity signal:
- Estimated time saved (based on issue complexity and typical fix time)
- Reduction in manual code review comments about issues the AI could catch
- Developer survey scores on tool helpfulness
- Time spent reviewing AI PRs vs. reviewing human PRs for similar issues
For the quality without burden signal:
- Code quality scores trending over time (from linters, analyzers)
- Percentage of AI PRs merged without modification
- Developer time investment per merged fix
- Test coverage and pass rates on AI-modified code
Now look at what we’ve built. We’re measuring activity (PRs created), but only as context for effectiveness (PRs merged). We’re tracking leading indicators (merge rate) and lagging indicators (production CVE reduction). We’re checking whether we’re saving developer time or just shifting where they spend it.
Most importantly, notice what we’re NOT doing: we’re not celebrating 1,000 PRs opened as a success metric. If we have 1,000 PRs opened but only a 2% merge rate, our telemetry immediately tells us we’re off course. The agent is working hard but not delivering value.
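In code, that off-course check can be as small as the sketch below. The 50% threshold is an illustrative starting point, not a benchmark.

```python
def trust_check(prs_opened: int, prs_merged: int, min_merge_rate: float = 0.5) -> str:
    """Flag when the agent is generating activity without earning trust.
    The threshold is illustrative; tune it against your own baseline."""
    if prs_opened == 0:
        return "no activity yet"
    merge_rate = prs_merged / prs_opened
    if merge_rate < min_merge_rate:
        return f"OFF COURSE: {prs_opened} PRs opened, only {merge_rate:.0%} merged"
    return f"on track: {merge_rate:.0%} of {prs_opened} PRs merged"

# The scenario from this section: lots of activity, almost no adoption.
print(trust_check(prs_opened=1000, prs_merged=20))
# -> OFF COURSE: 1000 PRs opened, only 2% merged
```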
This is the power of GSM. It helps you see the difference between activity and impact.
Landing Requires Telemetry
Remember the “7 minutes of terror” from the last post—that’s what landing looks like. You’ve launched your product into the market. Now comes the crucial period where you discover if it actually works.
Your GSM framework is your telemetry dashboard during those critical minutes. Without it, you’re flying blind. You might see “high engagement” or “lots of activity” and think you’ve succeeded, when actually you’re veering off course.
With proper telemetry, you know:
- Are users experiencing the value we intended? (Goals)
- Are we seeing the behaviors that indicate success? (Signals)
- What do the specific measurements tell us? (Metrics)
And critically, you can course-correct mid-flight. If your metrics show high activity but low trust (lots of PRs opened, few merged), you know to adjust your approach before you crater into the surface.
Making It Practical
Don’t try to boil the ocean. Start with one feature and one goal. Workshop the signals with your team and, crucially, with actual users. They’ll often identify signals you missed.
Instrument your metrics iteratively. Start with the easiest, highest-value measurements. Add more sophisticated instrumentation as you validate your approach and learn what matters most.
Review your GSM framework regularly—quarterly at minimum. Ask: Are these metrics still telling us what we need to know? Are we learning what we expected? What are we missing?
Be willing to evolve your metrics as you learn. The best frameworks adapt based on what you discover. If a metric stops being meaningful—maybe because you’ve fully solved that problem, or because you learned it doesn’t correlate with actual success—replace it with something more useful.
And watch for the common pitfall: falling in love with a metric. I’ve seen teams continue tracking something long after it stopped being relevant, just because it was familiar or looked good in reports. Your metrics serve your goals, not the other way around.
The Difference Between Launching and Landing
The GSM framework is simple: start with user outcomes, identify observable signals of success, then choose metrics that proxy for those signals. But simple doesn’t mean easy. It requires discipline to avoid jumping straight to convenient metrics. It requires honesty to admit when your measurements don’t actually connect to your goals.
The payoff is knowing whether you’ve truly landed. Not just whether you shipped code or generated activity, but whether you created real value for real users.
So here’s my challenge: look at the metrics you’re tracking today. For each one, trace it back. What signal does it represent? What goal does that signal serve? If you can’t make that connection clearly, you might be measuring the wrong things.
The difference between launching and landing is knowing whether you’ve achieved your objectives. The GSM framework gives you the telemetry to know for sure.
What metrics are you tracking that don’t connect to user outcomes? I’d love to hear about the vanity metrics you’ve eliminated or the signals you’ve discovered that really matter. Share your experiences in the comments.