I'm really curious about why SPC is such niche knowledge, limited primarily to the manufacturing sector. In the 1990s, GE popularized Six Sigma techniques, and there was a lot of interest in expanding those methods to other domains. In my own survey of the literature, I found a number of thinkers who explicitly transferred these techniques to the domains of healthcare, service sector work, and education. And then… it all sort of disappeared? I'm not involved enough in healthcare to know if SPC is still practiced there, but I've seen enough to be confident that it's all but forgotten in the service sector and education.
I feel like I must be missing something. I'm fully on the WBR train. So if it didn't stick in these sectors, either there is some complication that makes it harder, or less powerful, or… what? Business folks just decided they didn't like using tools that would make them better at their jobs? That sounds pretty stupid. So what's a better explanation of what happened? Why aren't XmR charts as common in business as Pivot Tables?
The very last chapter of Mark Graban's Measures of Success contains some details about why it's so hard to teach. (Graban has been teaching this for years, mostly in hospital settings.) The gist of it is that most people don't see variation as a problem.
If they don't internalise this, they're not going to use XmR charts. Which makes it hard to do everything else: it's hard to teach them the WBR, it's hard to teach them the process control worldview, and it's unlikely that they will take the entire approach seriously.
The structure of this essay is actually set up to deal with this assumption. Nearly the entire first half is written to illustrate the problem with variation, and to show what amazing things you can do once you have a way of dealing with it.
This post is really, really good! The way you describe how many companies fail to be data-driven resonates a lot with what I've seen. As does that thing where you look at a chart and the number goes up and you don't know what it means. Totally resonates!
Being data-driven, in this sense of pursuing knowledge, is one of the things I'm really interested in (since I wanna start my own company and I'd like to reduce my reliance on luck and faith and deities a bit). So far my success with building causal models has (probably) been because I focus on conducting JTBD interviews and customer discovery very frequently and intentionally. When doing that, it very much feels like I'm slowly building up my intuition about what kind of levers we can pull to get users to engage in certain desired behaviors.
I suspect that once I acquire the skillsets that allow me to use data to build up my causal model, I'd come to feel that intuition faster and more clearly (and more methodically, I guess). Very, very excited about this series and the upcoming software that will help people become more data-driven.
Can you say more about this? If anything, I would have expected the opposite reaction: that people are so focused on making charts that go up and to the right that they overreact when there's any wiggle, regardless of whether it's signal or noise.
That's exactly right! With the optimisation worldview you might think "ahh, metrics are just for improving some conversion rate." With the process control worldview, metrics become an opportunity for accelerating product/business intuition.
Btw, do you still think the North Star Metric Framework is compatible with this approach to metrics?
@colin.hahn So … this is my interpretation of Graban's book, but I'll just post a couple of screenshots here and we can discuss interpretations. I'm … not entirely sure if this is the right reading of the following sections:
Edited to add: I have noticed the "should I really put in the work to adopt this new method of thinking?" resistance, since XmR charts and dealing with variation really do force you to think differently about reality … though it's usually traded off against a "but you no longer have to treat your business like a black box!" pull factor.
Haha, in my mind I've somehow always thought of the North Star Metric Framework as less of a data-driven tool and more of a way to encapsulate your causal model so that you can communicate it and align with your team. I used the NSMF in the past after I'd developed a causal model for my product, and I found it to be a really great strategy deployment tool. But I wouldn't use it right from the start.
Right now my only tool is to just "get out of the building" and talk to a bunch of customers, in the process trying to build up intuitions about what makes them tick. That's been the only strategy I've employed (with success) so far. But in certain environments you don't get to talk to your customers that much, so I'm not sure if I'd be able to thrive in such an environment (though I've never been in one).
Would being data-driven in such orgs make sense? Or, to put it another way: do you think that being data-driven, in the sense that you mentioned, could completely replace customer discovery and actually talking to people?
I think one wonderful thing about the idea of "knowledge" is that you can just ask yourself "what can give me more marginal knowledge about the customer?" And sometimes (often!) this is "talking to the customer", and sometimes this is "instrumenting the product so we can see what the customer actually finds valuable."
So I don't see any reason you should give one up when you can also do the other.
This is one point I wanted to make, but I guess I'll save it for a later essay. During our podcast together, I asked Colin if the methods behind the WBR would work for a pre-product-market-fit product. And he said something to the effect of "Of course! Often you need to get a whole bunch of things right to get a successful product. (Cedric's note: like if you're launching a streaming service you need to ensure the video selection is large enough and the streaming is fast enough and the latency of the services behind it is low, etc.) If you don't instrument those things, then how do you know if your product failed because the idea was bad or because your execution was bad?"
And I think that's a damn good point. It's worth recalling that while Amazon Prime itself was an intuitive bet, they instrumented the hell out of it just to make sure they had actual knowledge of consumer behaviour + program behaviour + financial performance over the life of the entire bet, up to and beyond the point it was proven two years later.
And all throughout that process they were subjecting program costs to process control: constantly iterating to see if they could bring overall costs of Prime down to a controllable, predictable level, before it bankrupted the company.
A huge deal. I think one of the more depressing things we learnt investigating this body of work is that if you don't have the power to structure incentives, you aren't really going to be able to execute the full dream scenario described in the "What If It Doesn't Have To Be This Way?" section. Every example I gave at the end of the essay (Amazon, Koch, Ford, etc.) was of a CEO-led initiative, which enabled (forced?) the various departments to work together.
Interesting. I feel like Graban himself doesn't have a clear answer. He's recognized that there is a skill around understanding the behavior of the system as a system. There's a chicken-and-egg problem where PBCs make it easier to see the system perspective, but you need to be looking from a system perspective to appreciate what PBCs are giving you.
I work in software engineering, and while I don't want to say "it would never work", it definitely feels harder. All the things that are easily measured in software (lines of code, stories completed, PRs requested) are things that I am at best neutral towards. The things I do want more of are famously difficult to measure (impact and outcome, from "Measuring developer productivity? A response to McKinsey").
On the flip side, I just made an XmR chart for my weight loss efforts so far this year, and it seems perfect for that!
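For anyone who wants to try the same thing, the arithmetic behind an XmR chart is small enough to fit in a few lines. Here is a rough sketch in Python (the weight readings are made-up numbers for illustration; 2.66 and 3.268 are the standard XmR scaling constants for two-point moving ranges):

```python
def xmr_limits(values):
    """Compute XmR (individuals & moving range) chart limits.

    Returns (centre, lnpl, unpl, mr_bar, mr_upper), where lnpl/unpl are the
    lower/upper natural process limits and mr_upper is the moving-range limit.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = sum(values) / len(values)
    # Moving ranges: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    unpl = centre + 2.66 * mr_bar   # upper natural process limit
    lnpl = centre - 2.66 * mr_bar   # lower natural process limit
    mr_upper = 3.268 * mr_bar       # upper limit for the moving-range chart
    return centre, lnpl, unpl, mr_bar, mr_upper


def signals(values):
    """Indices of points outside the natural process limits (detection rule 1 only)."""
    _, lnpl, unpl, _, _ = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lnpl or v > unpl]


# Hypothetical weekly weigh-ins:
weights = [80.0, 80.0, 80.0, 80.0, 90.0]
centre, lnpl, unpl, mr_bar, mr_upper = xmr_limits(weights)
```

Points that land outside the two limits are worth investigating; everything inside them is routine variation, which is the whole point of the chart.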
@erikwiffin: In your role, what degree of ownership do you have over impact/outcome? I ask because, depending on what your work entails, what you measure to show those will change.
If your role involves vetting stories for relevance, then your metrics might be around customer satisfaction with stories delivered and cycle time for stories. If your work is less customer-facing and more about delivering code that does what the product manager specifies, then your metrics might be around cycle time, rework rates (how many times did this have to be adjusted because we had ambiguous specs), and the like.
Going back to the question around this comment: "The gist of it is that most people don't see variation as a problem."
Is the challenge that they don't see variation as a problem, or more that people don't have an intuition/understanding that there should be a normal amount of variation? Because without that understanding it's even harder to explain the concept of exceptional, beyond-normal variation (which is what it's really all about).
To me this is the biggest challenge I see with data analysis. Even very smart, experienced people, people who have worked with all sorts of charts, reporting, analysis, and so on, will still often react to the smallest change in a trend as if it's a meaningful signal to explore, as opposed to just normal variation.
I think this is precisely it! Damn, that's a great articulation.
I suspect the best thing to do re: software is to make software engineers directly responsible for product outcomes. Basically teach them the process control worldview, and then empower them to materially bend the numbers that capture some form of user happiness.
I know this is easy to say, hard to do (politics and org design, etc.), but this was basically the insight behind the argument here: early Amazon's engineers knew they were responsible for attempting to bend either price, selection, or convenience on their core flywheel.
It doesn't cover all the edge cases we've found from practice, but it definitely gives you more than what I've given you in this essay.
Edited to add: two observations he makes that I quite like / didn't think to make:
He uses it in a software engineering self-improvement / team-improvement context. Perhaps because I'm such a business nerd, I did not think to use these methods in this way!
He points out that using these charts a lot and internalizing the worldview changes you a little: you become a lot more aware of, and accepting of, the role of randomness and luck in reality.
Perhaps it's not an accident that I've been talking about luck a fair bit after using SPC methods in practice last year.
@colin.hahn Thankfully more like the first. CSAT probably won't work (we run a content business; customers wouldn't distinguish between satisfaction with the content and the software that delivered that content), but cycle time sounds difficult but possible. Stories are of varying size, to the point that I'd expect that to overwhelm any process variation. And that's not even touching how gameable that metric is (create a bunch of extremely small stories, and cycle time goes to zero).
@cedric "Hard to do" is understating it! I'm trying to implement SPC for myself and my team, not change the entire org chart. I'm also not 100% sure I think that's a good idea. Product development is a useful skill for engineers, but it's not the only skill. Should the junior engineer just getting up to speed on our tech stack also need to learn product development? What about the senior engineer who is really good at building things, but doesn't care what those things are? Those are both archetypes of engineers that I would expect to see on healthy engineering teams, but I feel like they'd be driven out by making them directly responsible for product outcomes.
Thanks for the practitioner's guide though - I'll read through it and see if anything clicks.
Highlighting the parts from the practitioner's guide that jump out to me:
The above also leads to something known as the report card effect: if you try to aggregate too many physical processes into one summary metric, that metric will always be a stable process, meaning it loses its power as an indicator of when something goes wrong. You must look into processes in reasonable detail in order to have meaningful metrics. If you summarise too many things into one number, you average out all the useful signals into noise.
Process behaviour charts are not useful only for time series. They're very common for time series, but you can also apply them to other things. … When dealing with data points attached to people, a common trick is to order the data points alphabetically by name. This is in practice the same as ordering them at random (since we can think of names as randomly assigned to people) but looks nicer in a report.
When you have a stable process such as this, you don't have to re-compute the process limits each week. One of the defining features of a stable process is that any given week, statistically, looks like any other week. Because of this, you can just extend the process limits you have already computed indefinitely into the future.
Goals are wishful thinking [aspirations?], and do not on their own improve things: they only make things worse.
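The "extend the limits indefinitely" point is easy to sketch in code. A rough illustration, assuming you have a stable baseline period (the weekly figures below are hypothetical, and 2.66 is the standard XmR scaling constant): compute the limits once from the baseline, then check every future week against the frozen limits instead of recomputing them.

```python
def frozen_limits(baseline):
    """Compute XmR natural process limits once from a stable baseline period.

    Returns (lnpl, unpl), the lower/upper natural process limits.
    """
    centre = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return centre - 2.66 * mr_bar, centre + 2.66 * mr_bar


def classify(new_points, limits):
    """Label each future point as routine variation or a signal against frozen limits."""
    lnpl, unpl = limits
    return ["signal" if v < lnpl or v > unpl else "routine" for v in new_points]


baseline = [10, 12, 11, 13, 11, 12]        # hypothetical stable weeks
limits = frozen_limits(baseline)           # computed once
labels = classify([12, 16, 11], limits)    # later weeks checked against the same limits
```

You would only recompute the limits when you have evidence the process itself has changed (after a deliberate improvement, say), not every week.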
This is classic Deming. But how does this square with Amazon's OP1/OP2 goal setting? @cedric
Cool… I pulled that articulation basically from your essay and tweets in general.
Continuing on that: for me, one of the big insights from your writing on this topic is that XmR charts are a highly effective way to explain the idea of normal variation to people who don't intuitively think that way yet. When you just use an average, it's visually a single line, and it's easier to fall for the trap of thinking that "normal" means a single line, as opposed to normal being a range between two lines.
As food for thought, here are a couple of directions you might go based on that:
You could try instrumentation around the elements that go into customer satisfaction. Amazon did something similar by defining what customers are looking for (low price, availability, etc.) and then creating ways to track it (what percentage of key products are listed at a price equal to or lower than the X key competitors'? what percentage of product pages can be delivered within two days?).
If planning your team's workload is an issue, you could measure the variation in story size, actual vs. predicted: how much time did you think this story was going to take, and how much did it actually take? This could improve the team's estimation capabilities, or identify where the team is consistently missing the information needed to accurately gauge the effort required for specific features.
I wouldn't worry as much about story size being gameable if you can supplement it with customer-driven metrics. If your team is constantly delivering in fast cycles (because they've gamed that metric) but there's no impact on the customer metrics, that tells a story too.