The way that Wheeler explains this is that you’ve got a voice of the customer (in this case, the goal) and a voice of the process. If the process in its current state can’t deliver what the customer expects, then you need to find a way to shift the process. If the process can deliver what the customer expects, but not consistently, then you can reduce the variation in the process. So goals tell you where you want to end up, but how you get there will depend on what the current process is capable of delivering.
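A minimal sketch of that comparison, assuming a "higher is better" metric. It computes the voice of the process as XmR natural process limits (average ± 2.66 × average moving range) and checks them against a single goal. The helper names and the three-way decision rule are illustrative simplifications of the quote, not Wheeler's own procedure.

```python
from statistics import mean


def natural_process_limits(values):
    """Voice of the process: XmR natural process limits (average +/- 2.66 * average moving range)."""
    moving_ranges = [abs(curr - prev) for prev, curr in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def compare_to_goal(values, goal):
    """Compare the voice of the process with a 'higher is better' goal (hypothetical helper)."""
    lower, center, _upper = natural_process_limits(values)
    if center < goal:
        # As it runs today, the process is centred below what the customer expects.
        return "shift the process"
    if lower < goal:
        # Centred on or above the goal, but routine variation still produces misses.
        return "reduce variation"
    return "process already delivers the goal"


weekly_output = [10, 12, 11, 13, 12, 11, 12]  # illustrative numbers
for goal in (14, 10, 7):
    print(goal, compare_to_goal(weekly_output, goal))
```

With these numbers the three goals land in the three different cases: 14 calls for shifting the process, 10 for reducing variation, and 7 is already met.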
As a side note, this blog itself is incredible.
Another hidden gem on the internet: high value, and it doesn’t seem to have a presence online elsewhere.
North Star Framework (NSF)
Two writers whose work I stumbled on through Twitter, back in 2020, were @cedric and John Cutler.
I have been putting the ideas shared here into practice ever since.
I am working on a vertical B2B SaaS product in a niche segment. I am the IC PM responsible for delivering the overall product outcomes.
I did read Cedric’s comment about the North Star Metric being the wrong metric.
Still, I chose to invest in the framework because it gave me a way to represent a causal process map of the product, which will be tested through correlations between the input metrics. Because the team has bought into the final NSM and the framework, I have leverage to invest in process control charts for the individual metrics of the framework.
As @minhthanh3145 points out, the point is to give the team a clear map of the land and to understand what work will move the needle. I would go so far as to say it lets us decentralise product development to each individual contributor.
I will have more to say once we have implemented the XmR charts and the WBR that we eventually run on this data.
I will venture to make a prediction: within 6-12 months of adopting the process control worldview and the WBR, your entire team and/or company will reject the North Star Metric framework.
This will be faster if — in the process of adopting the WBR — you discover certain things about the causal model of your business that do not match your original assumptions. If that occurs, then shortly after, you will realise that the causal model of your business is richer and more complex than you originally expected, and will begin to appreciate how many levers really do exist to pursue sometimes conflicting goals.
Anyway, let me know if my prediction comes true. I have a very small sample right now (N=3), so I’m not super confident of this, but I’m confident enough to do this foolish thing of predicting in public!
Yes, here’s another phrasing from A3: Avoid Memos With An Agenda
Target
This is where we state what needs to change, in terms of outcomes, not methods. We should also write by when we expect certain things to happen. Think “specific, measurable, realistic, and time-bound”. For the website example, it could be as simple as “During the next six months we should reduce average page load time by 1.2 seconds.”
What I’m proposing here might seem a lot like a goal-setting exercise. If you have read my article on statistical process control, you may be surprised by that; don’t I equate goals with wishful thinking? Am I not strongly against that? Yes. When I speak of targets, I do it in the Allen Ward sense. Targets are un-ambitious and flexible; we should be fairly sure we’ll over-deliver on a target, and we should be willing to change to a different target if it seems meaningful, or trade off targets against each other.
Targets are a way to characterise the direction we are going in; they are not something to measure progress against. Targets are just as much wishful thinking as goals, but with targets we acknowledge that and don’t pretend otherwise.
Targets are a way to characterise the direction we are going in; they are not something to measure progress against.
Will revert.
I am definitely not betting against your prediction.
This was my attempt to get a team that isn’t data-oriented to start becoming more data-centric. My intentions stay masked, and the NSF gives me a nice cover.
The challenge as a PM for a growing product is that growth will continue for a while even if we focus only on maintenance. Additional S-curves of growth are only likely if we invest time and effort in important work.
Without a rigorous WBR I would be running blind on future development. I have benefited from having the WBR as a process when I ran a division with a no-code tool. Now, as an IC, I am beginning to navigate this by building a narrative through the NSF.
You can think of the NSF as the top-of-the-funnel concept, with everything ultimately leading to the WBR as the bottom line.
One of Deming’s most powerful demonstrations shows how adjusting an input in response to variation that is within the process limits actually makes the outputs more varied. Imagine a funnel that drops a ball within a certain radius. If you move the funnel to center the drop on a target, but you are adjusting for variation that is within normal operational limits, the overall accuracy drops substantially.
If you haven’t seen this experiment in action, imagine that the funnel is centered and the ball lands to the north of the target. If we only have normal variation, that miss tells us nothing about where the next ball will land, and the next drop is unlikely to land as far north again. But if we react to each drop as if the data were meaningful, we move the funnel to the south to compensate for the error. Now the next drop is centered south of the target, so it is more likely to err towards the south, and in the worst case it lands twice as far from the center as it otherwise would. It doesn’t take many repetitions to realize how much you have destabilized the process.
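A small simulation makes the funnel effect easy to see. This sketch assumes one dimension only (the north/south axis), a target at 0, and Gaussian noise. "Rule 1" leaves the funnel alone; "Rule 2" moves the funnel from its current position by the opposite of each miss. Under Rule 2 each landing becomes the difference of two independent noise draws, so its variance doubles and the spread grows by about √2.

```python
import random
from statistics import pstdev


def simulate_funnel(n_drops, adjust, sigma=1.0, seed=0):
    """One-dimensional funnel experiment (north-south axis only); the target sits at 0.

    adjust=False: leave the funnel over the target (Deming's Rule 1).
    adjust=True:  after each drop, move the funnel from where it is by the
                  opposite of the miss (Deming's Rule 2, "compensate for the error").
    """
    rng = random.Random(seed)
    funnel = 0.0
    landings = []
    for _ in range(n_drops):
        landing = funnel + rng.gauss(0.0, sigma)  # routine, common-cause noise
        landings.append(landing)
        if adjust:
            funnel -= landing  # the miss relative to a target at 0 is just `landing`
    return landings


untouched = pstdev(simulate_funnel(20_000, adjust=False))
tampered = pstdev(simulate_funnel(20_000, adjust=True))
print(f"spread without adjusting: {untouched:.2f}")
print(f"spread when compensating: {tampered:.2f}")
print(f"ratio: {tampered / untouched:.2f}")
```

Both runs use the same seed, so they share the same noise and differ only in the adjustment rule. The printed ratio should come out close to √2 ≈ 1.41.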
Going back to the question around this comment: “The gist of it is that most people don’t see variation as a problem.”
Is the challenge that they don’t see variation as a problem, or more that people don’t have an intuition / understanding that there should be a normal amount of variation? Because without that understanding it’s even harder to explain the concept of exceptional / beyond normal variation (which is what it’s really all about).
To me this is the biggest challenge I see with data analysis. Even very smart, experienced people, who have worked with all sorts of charts, reporting, and analysis, will still often react to the smallest change in a trend as if it were a meaningful signal to explore rather than just normal variation.
I’ll second @cedric’s point that this is a great way to formulate this. This is one of the reasons for the exhortation to “go to gemba” in Lean Manufacturing. Yet another manifestation of this is trying to get people to generate a random sequence and showing how non-random it is compared to a truly random sequence. In all cases, our unawareness of reality is so vast that we are not even wrong (as the kids say today).
The only way I’ve found to address this is to embrace the old notion that “the fool that persists in his folly will become wise”. Have people make predictions and then show them how those predictions compare to reality so they can become disabused of their notions. Here is an attempt to sketch out what that typical learning progression looks like:
- Prediction that the outputs will equal the target if we try hard enough (the default optimization worldview)
- Prediction that the outputs will change in 1:1 correspondence to a specific input if we try hard enough to change that input
- Prediction that the outputs will change in some formulaic correspondence to a set of inputs if we try hard enough to change them
- Prediction that the outputs are random and that it is a waste of time to worry about them
- Prediction that the outputs are non-random to a certain degree, but only based on inputs outside of our control so there is no point trying to do anything about them
- Prediction that the outputs are non-random and will fall within a certain range based on inputs, of which at least some are in our control (which is where the process worldview really starts and can be built on)
The main obstacle to this learning is that the desire to make things better leads to changes that interrupt this learning progression. @cedric’s observation that “You Aren’t Learning If You Don’t Close the Loops” says this much better than I can.
The answer is to create circumstances cheaply and quickly that make it obvious that the optimization worldview does not work. I suspect @cedric’s future essays on this topic will address this, and I look forward to seeing what he has learned.
I don’t interpret this as saying that individual engineers should each be separately held accountable for specific product outcomes. Instead, it is saying that the team as a whole should identify the outcomes their product delivers and identify the inputs within their control that seem to influence that. You’ve identified 2 hypotheses within the archetypes you mentioned:
- Teaching junior engineers about product development is less valuable than having them do $OTHER_DEVELOPMENT.
- Teaching senior engineers not interested in product development about product development does not provide any benefits.
The underpinning issue though, as you know, is that if it is difficult to define the outcome, then SPC as defined here isn’t really workable. I have an answer I use for those circumstances, but I suspect you’ll hate it!
If we can’t focus on a measurable outcome, the substitute we have to use is the HiPPO: the opinion of the highest-paid person who cares about what we are doing. Use a 5- or 7-point Likert scale for how they feel about what we are delivering, track that over time, and start building a model of which input factors influence it.
Once you have a feel for that (and some confidence that your assessment of the HiPPO is accurate), you can then make more effective tradeoff decisions between what increases the HiPPO, what increases team satisfaction, and what creates business value. Many people argue with me about this because it shouldn’t have to be this way, and they are right, but I’ve found this to be useful in my own work and when I help my clients with their work, so I hope it helps or inspires useful ideas on your own. Good luck.
So … this is my interpretation of Graban’s book, but I’ll just post a couple of screenshots here and we can discuss interpretations. I’m … not entirely sure if this is the right reading of the following sections:
I suspect that there are a couple of reasons why PBCs didn’t work for that lab.
First, they were tracking just 5 KPIs, which is not enough to map the entire system and give you enough levers to pull. Often a visual map of metric relationships is needed to see how value flows through the overall system.
Second, the KPIs were too detached from the processes that mattered. Percent of tests delivered by 7 am is a top level output metric. It needs to be decomposed into its components until you get to a point where you’re measuring something controllable.
People might not understand variation but they intuitively understand anomalies. Frame variation as anomalies, and you’ll get far more adoption.
Apologies in advance if it’s been covered here, because I have a lot of catching up to do on the forum. What is the prevailing thinking on this approach in business situations that are highly variable and experimental, or where outcomes are very lumpy?
For example: launching a new product. The team needs to nail down positioning, hire people, pre-sell or drum up some market interest, etc. The activities are not entirely repeated, and something like sales efforts will change as the product is developed and converges.
My point is, I suppose, that a lot of this seems to be about, well, “process control.” So is it applicable when there is hardly a “process” to control yet? When things are still experimental and one-off?
I think my (short) answer is here:
I have a longer one, but it’ll depend on some stories that I hope to tell in the WBR essay, which I’m currently working on.
I actually think these tools are underleveraged in experimental contexts. You do need to think about how you make something more than a one-off, but once you’ve done that the process behavior chart method can detect significant changes much more quickly than other methods.
For instance, nailing a product positioning might look like a one-off task. How could you reframe it so there is daily data? Maybe you have a newsletter or product signup page, and you count new signups per day. Then you can iterate the positioning language on the page and see which gets more traction. Tim Ferriss did a similar set of experiments with Google Ads to determine what versions of his book titles got the most interest.
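A minimal sketch of how that could look, assuming daily signup counts, a baseline period under the old positioning, and made-up numbers. It applies two of Wheeler's detection rules: a single value outside the natural process limits, or eight successive values on the same side of the central line. The function names are hypothetical.

```python
from statistics import mean


def xmr_limits(baseline):
    """Central line and natural process limits computed from a baseline period."""
    moving_ranges = [abs(curr - prev) for prev, curr in zip(baseline, baseline[1:])]
    center = mean(baseline)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def first_signal(baseline, new_values, run_length=8):
    """Return (day index, reason) for the first signal in new_values, or None."""
    lower, center, upper = xmr_limits(baseline)
    side, run = 0, 0
    for day, value in enumerate(new_values):
        if value > upper or value < lower:
            return day, "outside limits"
        current = (value > center) - (value < center)  # +1 above, -1 below, 0 on the line
        if current != 0 and current == side:
            run += 1
        else:
            side, run = current, (1 if current != 0 else 0)
        if run >= run_length:
            return day, f"{run_length} in a row on one side"
    return None


before = [20, 23, 19, 22, 21, 20, 24, 18, 22, 21]  # signups/day, old positioning (illustrative)
after = [23, 24, 22, 25, 23, 24, 26, 23]          # signups/day, new positioning (illustrative)
print(first_signal(before, after))
```

Here no single day after the change breaches the limits; the run rule is what picks up the sustained, modest lift on the eighth day.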
@paul, @cedric, and @colin.hahn, I had a similar question about WBR-style processes, but specifically for products without a tight feedback loop in terms of time, and particularly in fuzzier areas such as marketing or design. For example, a car manufacturer makes many granular decisions on details such as the sound of a door closing; these decisions operate in aggregate and may not bear results for a year or more.
Or software choices when it’s big enterprise and hybrid deployment rather than pure SaaS (so again longer turnaround times and unclear signals).
Perhaps this sounds like I have no clue about what WBRs are supposed to achieve. I think I have some clue, and some appreciation for their use in these scenarios:
- stable situations (per Paul’s question)
- evolving situations that have nice tight feedback loops (e.g. pure SaaS development with decent monitoring, though I wouldn’t take it as far as the infamous Google example of multiple A/B tests to pick a color)
- perhaps situations that have longer feedback cycles but where you are dealing with fewer, bigger decisions that you could more easily hypothesize, model, and test.
I am reading and enjoying American Icon, the book about Alan Mulally turning around Ford that I think Cedric recommended. So far it says that there are WBR-style meetings where execs have to present, but it hasn’t gotten into details. I can see very well how finance decisions would be helped through that process. But I still don’t quite get how data would definitively point you towards certain detailed design decisions, nor some of the more “artful” aspects of marketing!
Maybe nobody tries to do that with WBRs! And that’s fine. Personally I am not sure if I would want to try to do it anyway. But I do wonder if, for example, Toyota and Ford, or for that matter an enterprise software company with longer cycles (asking for a friend), have tried it.
One thing that has helped expand my understanding of process behavior charts is being really explicit about what the process is intended to deliver. In your example of design decisions about a door closing: what is the actual output that the process is intended to deliver?
- If the deliverable is a decision about what to build, then you could track the performance of your decision processes. This would be similar to how a DevOps-influenced IT team tracks how long it takes to deliver a story point, how often that story point needs to be reworked because specs were ill-defined early on, etc. In this case, you are measuring how well your organization makes design decisions, not necessarily whether any particular decision was correct.
- If the deliverable is a way to produce an approved design with appropriate consistency, you could track the quality, delivery, and cost metrics like a manufacturing process. For instance, if one of your design specs is that closing the door should produce no more than X dB of noise, then you can measure the percentage of conforming parts each day and determine if your variation will allow you to meet your production targets or if your process is too inconsistent to deliver that quality spec at the level you need. This approach gets really interesting when you start combining the metrics: you can look at whether a reliable process (quality) can achieve the volume you need (delivery), and you can try experiments to see if changing the process actually improves delivery, and whether that new process has a negative impact on quality.
- If your deliverable is a unique milestone in a project plan, you can still examine how the project itself is performing. Track the actual-vs-projected time for each milestone on the plan. A process chart will be able to show if there is normal variation (in general we all made the same degree of error in our estimates), or if there are multiple “processes” in play (which would reveal that estimates from one team tend to be off more, allowing you to better forecast how the rest of the timeline might play out). You can also quickly see if the variation is drifting, which might reveal that a team is getting burned out, etc.
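A minimal sketch of the milestone idea, assuming a hypothetical record format of (team, projected_days, actual_days) in plan order and made-up numbers. It charts the actual/projected ratio for each milestone on one XmR chart, flags ratios outside the natural process limits, and averages the ratio by team so that a team whose estimates run differently stands out.

```python
from statistics import mean


def xmr_limits(values):
    """Central line and natural process limits (average +/- 2.66 * average moving range)."""
    moving_ranges = [abs(curr - prev) for prev, curr in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def slippage(milestones):
    """milestones: (team, projected_days, actual_days) tuples in plan order (hypothetical format)."""
    return [(team, actual / projected) for team, projected, actual in milestones]


def outside_limits(milestones):
    """Milestones whose actual/projected ratio falls outside the natural process limits."""
    ratios = slippage(milestones)
    lower, _, upper = xmr_limits([r for _, r in ratios])
    return [(i, team, r) for i, (team, r) in enumerate(ratios) if not (lower <= r <= upper)]


def mean_slippage_by_team(milestones):
    """Average actual/projected ratio per team."""
    by_team = {}
    for team, ratio in slippage(milestones):
        by_team.setdefault(team, []).append(ratio)
    return {team: mean(ratios) for team, ratios in by_team.items()}


plan = [
    ("A", 10, 10), ("A", 10, 11), ("A", 10, 10), ("A", 10, 11),
    ("A", 10, 10), ("A", 10, 11), ("A", 10, 10), ("B", 10, 20),
]
print(outside_limits(plan))
print(mean_slippage_by_team(plan))
```

A single flagged milestone is only a prompt to look; the "multiple processes" reading becomes convincing when the same team's milestones keep showing up away from the rest.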
@colin.hahn, thank you for the thoughtful reply! It did help me with my question, though not perhaps in quite the way it was intended.
It was indeed the “whether a design decision was correct” that I was looking to get at. Or wondering whether organizations with long feedback loops do attempt to judge that with data in a serious way.
Amazon has very short feedback loops both on its marketplace business and its services. So an Amazon software PM’s question “did I build the right thing” can be measured quite directly, for example in terms of sales in the relevant area.
But much heavy manufacturing and some B2B software doesn’t have such tight feedback loops. Decisions are harder to tie to their ultimate results. The intent of a nice door closing mechanism is to:
- Sell more cars
- Retain more loyal customers
- Maintain the brand (for example people renting a car who then like it and talk to their friends about it)
But none of these is easy to tie back to a specific design decision. For that, I think there is a big element of product taste. That is, in the wide sense that @cedric described in This is What Product Taste Looks Like - Commoncog. As discussed in that post, it is not at all easy to define what product taste is, and even less describe a reliable sequence of steps to achieve product taste.
I don’t think that taste needs to be devoid of data. There are good research methods in marketing, and as I think about it more, there probably would be a way to tie back the experience of a nice door “clunk” to the higher-level factors which do eventually drive sales. I got to see some of that kind of insights work at HTC (in the glory days of that company). Heady stuff. Perhaps too heady sometimes.
Trying to make those decisions on data alone wouldn’t work well, in my opinion.
And I think it is both easier and more valid to use data to improve operational processes, as with your suggested approaches. Not only that, but focusing on operations seems to be the only reliable way to scale a business of any decent size, beyond the stage when a good inspiration will mostly keep you going.
Just my two cents on this. In my previous early-stage startup, I worked with a product advisor who was a senior PM at Amazon.
He was leading an initiative to build some form of education platform. But I distinctly recall that their development took almost a year, during which they quietly built it up to the point where they thought it was sufficient, and only then did they release it or start any sort of GTM motion.
I didn’t get a sense that their feedback loop was very short, but this probably applies only to new products. I don’t know how much PMs are empowered at Amazon, or whether most of the causal models about which levers to push to get customers to engage in desirable behaviors reside only in the executives’ heads.
Thanks! Yes, that makes sense. The new products especially must take longer. I had in mind (as a lazy, unexamined impression) tweaks to the shopping experience or new features within AWS services. But it’s clear in Working Backwards and elsewhere that whole new products take a lot more time and thought, and some art.
Ironically, I should have remembered, because I was working this Bezos quote into something I was writing:
Often, when a memo isn’t great, it’s not the writer’s inability to recognize the high standard, but instead a wrong expectation on scope: they mistakenly believe a high-standards, six-page memo can be written in one or two days or even a few hours, when really it might take a week or more! They’re trying to perfect a handstand in just two weeks, and we’re not coaching them right. The great memos are written and re-written, shared with colleagues who are asked to improve the work, set aside for a couple of days, and then edited again with a fresh mind. They simply can’t be done in a day or two.
(Amazon letter to shareholders, 2017)
Overall, though, this exchange here has helped clarify for me that good work with data doesn’t try to replace the art and intuition that goes into product work too.
I think one reason that XmR charts aren’t really popular, even in companies that train people on Six Sigma, is that they get lost in the flood of the various types of control charts that are available.
People get flummoxed or overwhelmed by the classic flow chart that’s used to decide which control chart to use in which instance.
Don Wheeler simplifies this greatly by showing that we can use the XmR chart for basically everything. I’ve taught classes on XmR charts to Six Sigma Black Belts (and MBBs) who have said to me after class that “our executives’ eyes glaze over by the time I’m explaining the 2nd type of control chart.”
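For reference, the arithmetic behind an XmR chart is short, which is part of why one chart type can cover so much. With individual values $X_1, \dots, X_n$ and the standard scaling constants for ranges of two consecutive values:

```latex
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, &
\overline{mR} &= \frac{1}{n-1}\sum_{i=2}^{n} \lvert X_i - X_{i-1} \rvert, \\
\text{UNPL} &= \bar{X} + 2.66\,\overline{mR}, &
\text{LNPL} &= \bar{X} - 2.66\,\overline{mR}, \\
\text{URL} &= 3.268\,\overline{mR}. &&
\end{aligned}
```

Values beyond the natural process limits (UNPL, LNPL) on the X chart, or moving ranges above the upper range limit (URL) on the mR chart, are treated as signals; everything inside is routine variation. The 2.66 comes from $3/d_2$ with $d_2 = 1.128$ for ranges of two values.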
But I think the biggest challenge is one of organizational and executive culture. They are bothered by variation… the problem is that they can’t distinguish between signal and noise so they react to everything.
They think reacting to everything is good management.
XmR charts are, therefore, a solution to a problem that most executives don’t see as a problem.
They’re comfortable with reports that show a percentage change from last period. Or they are OK with seeing tables of numbers, such as “bowling charts.”
I think it’s more a matter that leaders don’t see their current management methods as a problem.
That includes their practice of reacting to all of the variation in a metric. They don’t like the variation, and without a tool to distinguish signal from noise (the XmR chart), they are happy to continue reacting and demanding answers or actions.
They think that’s good management and refuse to think it could be better.
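To make the contrast concrete, here is a minimal sketch comparing a "percentage change from last period" report with XmR limits on the same made-up weekly series. The 5% reaction threshold is an assumption standing in for whatever triggers a "why did this drop?" email; the XmR check uses only the first detection rule (a value outside the natural process limits).

```python
from statistics import mean


def pct_change_alerts(values, threshold=0.05):
    """Periods whose change from the previous period exceeds the threshold (hypothetical 5% rule)."""
    return [
        i for i in range(1, len(values))
        if abs(values[i] / values[i - 1] - 1) > threshold
    ]


def xmr_signals(values):
    """Periods falling outside the XmR natural process limits."""
    moving_ranges = [abs(curr - prev) for prev, curr in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return [i for i, v in enumerate(values) if abs(v - center) > spread]


weekly_metric = [100, 94, 103, 97, 105, 96, 101, 99, 104, 95]  # illustrative numbers
print(len(pct_change_alerts(weekly_metric)), "percentage-change alerts")
print(len(xmr_signals(weekly_metric)), "XmR signals")
```

On this series the percentage-change report fires on 8 of the 9 week-over-week comparisons, while the XmR chart finds no signal at all: the same routine noise, read two different ways.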