I'm really curious about why SPC is such niche knowledge, limited primarily to the manufacturing sector. In the 1990s, GE popularized Six Sigma techniques, and there was a lot of interest in expanding those methods to other domains. In my own survey of the literature, I found a number of thinkers who explicitly transferred these techniques to the domains of healthcare, service sector work, and education. And then… it all sort of disappeared? I'm not involved enough in healthcare to know if SPC is still practiced there, but I've seen enough to be confident that it's all but forgotten in the service sector and education.
I feel like I must be missing something. I'm fully on the WBR train. So if it didn't stick in these sectors, either there is some complication that makes it harder, or less powerful, or… what? Business folks just decided they didn't like using tools that would make them better at their jobs? That sounds pretty stupid. So what's a better explanation of what happened? Why aren't XmR charts as common in business as Pivot Tables?
The very last chapter of Mark Graban's Measures of Success contains some details about why it's so hard to teach. (Graban has been teaching this for years, mostly in hospital settings.) The gist of it is that most people don't see variation as a problem.
If they don't internalise this, they're not going to use XmR charts. Which makes it hard to do everything else: it's hard to teach them the WBR, it's hard to teach them the process control worldview, and it's unlikely that they will take the entire approach seriously.
The structure of this essay is actually set up to deal with this assumption. Nearly the entire first half is written to illustrate the problem with variation, and to show what amazing things you can do once you have a way of dealing with it.
This post is really, really good! The way you describe how many companies fail to be data-driven resonates a lot with what I've seen. As does that thing where you look at a chart and the number goes up and you don't know what it means. Totally resonates!
Being data-driven, in this sense of pursuing knowledge, is one of the things I'm really interested in (since I wanna start my own company and I'd like to reduce my reliance on luck and faith and deities a bit). So far my success with building causal models has (probably) been because I focus on conducting JTBD interviews and customer discovery very frequently and intentionally. When doing that, it very much feels like I'm slowly building up my intuition about what kind of levers we can pull to get users to engage in certain desired behaviors.
I suspect that once I acquire the skillsets that allow me to use data to build up my causal model, I'd come to feel that intuition faster and more clearly (and more methodically, I guess). Very, very excited about this series and the upcoming software that will help people become more data-driven.
Can you say more about this? If anything, I would have expected the opposite reaction: that people are so focused on making charts that go up and to the right that they overreact when there's any wiggle, regardless of whether it's signal or noise.
That's exactly right! With the optimisation worldview you might think "ahh, metrics are just for improving some conversion rate." With the process control worldview, metrics become an opportunity for accelerating product/business intuition.
Btw, do you still think the North Star Metric Framework is compatible with this approach to metrics?
@colin.hahn So … this is my interpretation of Graban's book, but I'll just post a couple of screenshots here and we can discuss interpretations. I'm … not entirely sure if this is the right reading of the following sections:
Edited to add: I have noticed the "should I really put in the work to adopt this new method of thinking?" resistance, since XmR charts and dealing with variation really do force you to think differently about reality … though it's usually traded off against a "but you no longer have to treat your business like a black box!" pull factor.
Haha, in my mind I've somehow always thought of the North Star Metric Framework as less of a data-driven tool and more of a way to encapsulate your causal model so that you can communicate it and align with your team. I used the NSMF in the past after I'd developed a causal model for my product, and I found it to be a really great strategy deployment tool. But I wouldn't use it right from the start.
Right now my only tool is to just "get out of the building" and talk to a bunch of customers, in the process trying to build up intuitions about what makes them tick. That's been the only strategy I've employed (with success) so far. But in certain environments you don't get to talk to your customers that much, so I'm not sure if I'd be able to thrive in such an environment (though I've never been in one).
Would being data-driven in such orgs make sense? Or, to put it another way: do you think that being data-driven, in the sense that you mentioned, could completely replace customer discovery and actually talking to people?
I think one wonderful thing about the idea of "knowledge" is that you can just ask yourself "what can give me more marginal knowledge about the customer?" And sometimes (often!) this is "talking to the customer", and sometimes this is "instrumenting the product so we can see what the customer actually finds valuable."
So I don't see any reason you should give one up when you can also do the other.
This is one point I wanted to make, but I guess I'll save it for a later essay. During our podcast together, I asked Colin if the methods behind the WBR would work for a pre-product-market-fit product. And he said something to the effect of "Of course! Often you need to get a whole bunch of things right to get a successful product. (Cedric's note: like if you're launching a streaming service you need to ensure the video selection is large enough and the streaming is fast enough and the latency of the services behind it is low, etc.) If you don't instrument those things, then how do you know if your product failed because the idea was bad or because your execution was bad?"
And I think that's a damn good point. It's worth recalling that while Amazon Prime itself was an intuitive bet, they instrumented the hell out of it just to make sure they had actual knowledge of consumer behaviour + program behaviour + financial performance over the life of the entire bet, up to and beyond the point it was proven two years later.
And all throughout that process they were subjecting program costs to process control: constantly iterating to see if they could bring overall costs of Prime down to a controllable, predictable level, before it bankrupted the company.
A huge deal. I think one of the more depressing things we learnt investigating this body of work is that if you don't have the power to structure incentives, you aren't really going to be able to execute the full dream scenario described in the "What If It Doesn't Have To Be This Way?" section. Every example I gave at the end of the essay (Amazon, Koch, Ford, etc.) was of a CEO-led initiative, which enabled (forced?) the various departments to work together.
Interesting. I feel like Graban himself doesn't have a clear answer. He's recognized that there is a skill around understanding the behavior of the system as a system. There's a chicken-and-egg problem where PBCs make it easier to see the system perspective, but you need to be looking from a system perspective to appreciate what PBCs are giving you.
I work in software engineering, and while I don't want to say "it would never work", it definitely feels harder. All the things that are easily measured in software (lines of code, stories completed, PRs requested) are things that I am at best neutral towards. The things I do want more of are famously difficult to measure (impact and outcome, from "Measuring developer productivity? A response to McKinsey").
On the flip side, I just made an XmR chart for my weight loss efforts so far this year, and it seems perfect for that!
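For anyone who wants to try the same thing, the arithmetic behind an XmR chart is small enough to fit in a few lines. Here is a rough sketch in Python (the weight readings are made-up numbers for illustration; 2.66 and 3.268 are the standard XmR scaling constants for two-point moving ranges):

```python
def xmr_limits(values):
    """Compute XmR (individuals & moving range) chart limits.

    Returns (centre, lnpl, unpl, mr_bar, mr_upper), where lnpl/unpl are the
    lower/upper natural process limits and mr_upper is the moving-range limit.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = sum(values) / len(values)
    # Moving ranges: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    unpl = centre + 2.66 * mr_bar   # upper natural process limit
    lnpl = centre - 2.66 * mr_bar   # lower natural process limit
    mr_upper = 3.268 * mr_bar       # upper limit for the moving-range chart
    return centre, lnpl, unpl, mr_bar, mr_upper


def signals(values):
    """Indices of points outside the natural process limits (detection rule 1 only)."""
    _, lnpl, unpl, _, _ = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lnpl or v > unpl]


# Hypothetical weekly weigh-ins:
weights = [80.0, 80.0, 80.0, 80.0, 90.0]
centre, lnpl, unpl, mr_bar, mr_upper = xmr_limits(weights)
```

Points that land outside the two limits are worth investigating; everything inside them is routine variation, which is the whole point of the chart.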
@erikwiffin: In your role, what degree of ownership do you have over impact/outcome? I ask because, depending on what your work entails, what you measure to show those will change.
If your role involves vetting stories for relevance, then your metrics might be around customer satisfaction with stories delivered and cycle time for stories. If your work is less customer-facing and more about delivering code that does what the product manager specifies, then your metrics might be around cycle time, rework rates (how many times did this have to be adjusted because we had ambiguous specs), and the like.
Going back to the question around this comment: "The gist of it is that most people don't see variation as a problem."
Is the challenge that they don't see variation as a problem, or more that people don't have an intuition/understanding that there should be a normal amount of variation? Because without that understanding it's even harder to explain the concept of exceptional, beyond-normal variation (which is what it's really all about).
To me this is the biggest challenge I see with data analysis. Even very smart, experienced people, people who have worked with all sorts of charts, reporting, analysis, and so on, will still often react to the smallest change in a trend as if it's a meaningful signal to explore, as opposed to just normal variation.
I think this is precisely it! Damn, that's a great articulation.
I suspect the best thing to do re: software is to make software engineers directly responsible for product outcomes. Basically teach them the process control worldview, and then empower them to materially bend the numbers that capture some form of user happiness.
I know this is easy to say, hard to do (politics and org design, etc.), but this was basically the insight behind the argument here: early Amazon's engineers knew they were responsible for attempting to bend either price, selection, or convenience on their core flywheel.
It doesn't cover all the edge cases we've found from practice, but it definitely gives you more than what I've given you in this essay.
Edited to add: two observations he makes that I quite like / didn't think to make:
He uses it in a software engineering self-improvement / team-improvement context. Perhaps because I'm such a business nerd, I did not think to use these methods in this way!
He points out that using these charts a lot and internalizing the worldview changes you a little: you become a lot more aware of, and accepting of, the role of randomness and luck in reality.
Perhaps it's not an accident that I've been talking about luck a fair bit after using SPC methods in practice last year.
@colin.hahn Thankfully more like the first. CSAT probably won't work (we run a content business; customers wouldn't distinguish between satisfaction with the content and the software that delivered that content), but cycle time sounds difficult but possible. Stories are of varying size, to the point that I'd expect that to overwhelm any process variation. And that's not even touching how gameable that metric is (create a bunch of extremely small stories, and cycle time goes to zero).
@cedric "Hard to do" is understating it! I'm trying to implement SPC for myself and my team, not change the entire org chart. I'm also not 100% sure I think that's a good idea. Product development is a useful skill for engineers, but it's not the only skill. Should the junior engineer just getting up to speed on our tech stack also need to learn product development? What about the senior engineer who is really good at building things, but doesn't care what those things are? Those are both archetypes of engineers that I would expect to see on healthy engineering teams, but I feel like they'd be driven out by making them directly responsible for product outcomes.
Thanks for the practitioner's guide though - I'll read through it and see if anything clicks.
Highlighting the parts from the practitioner's guide that jump out to me:
The above also leads to something known as the report card effect: if you try to aggregate too many physical processes into one summary metric, that metric will always be a stable process, meaning it loses its power as an indicator of when something goes wrong. You must look into processes in reasonable detail in order to have meaningful metrics. If you summarise too many things into one number, you average out all the useful signals into noise.
Process behaviour charts are not useful only for time series. They're very common for time series, but you can also apply them to other things. … When dealing with data points attached to people, a common trick is to order the data points alphabetically by name. This is in practice the same as ordering them at random (since we can think of names as randomly assigned to people) but looks nicer in a report.
When you have a stable process such as this, you don't have to re-compute the process limits each week. One of the defining features of a stable process is that any given week, statistically, looks like any other week. Because of this, you can just extend the process limits you have already computed indefinitely into the future.
Goals are wishful thinking [aspirations?], and do not on their own improve things: they only make things worse.
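The "extend the limits indefinitely" point is easy to sketch in code. A rough illustration, assuming you have a stable baseline period (the weekly figures below are hypothetical, and 2.66 is the standard XmR scaling constant): compute the limits once from the baseline, then check every future week against the frozen limits instead of recomputing them.

```python
def frozen_limits(baseline):
    """Compute XmR natural process limits once from a stable baseline period.

    Returns (lnpl, unpl), the lower/upper natural process limits.
    """
    centre = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return centre - 2.66 * mr_bar, centre + 2.66 * mr_bar


def classify(new_points, limits):
    """Label each future point as routine variation or a signal against frozen limits."""
    lnpl, unpl = limits
    return ["signal" if v < lnpl or v > unpl else "routine" for v in new_points]


baseline = [10, 12, 11, 13, 11, 12]        # hypothetical stable weeks
limits = frozen_limits(baseline)           # computed once
labels = classify([12, 16, 11], limits)    # later weeks checked against the same limits
```

You would only recompute the limits when you have evidence the process itself has changed (after a deliberate improvement, say), not every week.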
This is classic Deming. But how does this square with Amazon's OP1/OP2 goal setting? @cedric
Cool… I pulled that articulation basically from your essay and tweets in general.
Continuing on that: for me, one of the big insights from your writing on this topic is that XmR charts are a highly effective way to explain the idea of normal variation to people who don't intuitively think that way yet. When you just use an average, it's visually a single line, and it's easier to fall for the trap of thinking that "normal" means a single line, as opposed to normal being a range between two lines.
As food for thought, here are a couple of directions you might go based on that:
You could try instrumentation around the elements that go into customer satisfaction. Amazon did something similar by defining what customers are looking for (low price, availability, etc.) and then creating ways to track it (what percentage of key products are listed at a price equal to or lower than the X key competitors'? what percentage of product pages can be delivered within two days?).
If planning your team's workload is an issue, you could measure the variation in story size, actual vs. predicted: how much time did you think this story was going to take, and how much did it actually take? This could improve the team's estimation capabilities, or identify where the team is consistently missing the information needed to accurately gauge the effort required for specific features.
I wouldn't worry as much about story size being gameable if you can supplement it with customer-driven metrics. If your team is constantly delivering in fast cycles (because they've gamed that metric) but there's no impact on the customer metrics, that tells a story too.