Can you comment on mixing the two techniques? It looked like, from your forum chart (by the way, the newsletter links definitely worked on me!), that you're using the SPC limit lines to determine whether an experiment is statistically significant. In the same way, can you use hypothesis testing (assume there has been no change) to determine whether an observational data point is out of band? Perhaps mathematically the limit lines are actually the exact same value as a p-value of 5%?!
That said, is that the same concept as "risk" in the quoted sections? I wasn't sure how to interpret that.
I tried to find a meme expressing horror at statistics, but I found this instead.
The two approaches are completely different.
The Experimental method I think you're quite familiar with: "statistical significance" means "the observed result from the sample drawn is unlikely to be the result of sampling error alone", or p < .05.
The Observational approach is different. It asks: is this data I'm looking at drawn from one probability distribution or from multiple distributions? The intuition here is that for the vast majority of real-world data, routine variation will fall within three sigma units of the average. But if the data you're looking at is drawn from multiple distributions (e.g. an outside factor has impacted your metric, or the process itself has changed its distribution), then you're going to get data that falls outside three sigma, and that's a signal to investigate, or a sign that your intervention has successfully changed the process behaviour.
Edited to add: there is more information on the "one distribution vs multiple distributions" question in this blog post. And there's an article by Wheeler on the topic here: The Four Questions of Data Analysis.
I quite like his illustration of this question, which basically asks: you have a collection of 50 randomly drawn beads; did the beads come from random picks out of one bowl (aka one probability distribution) or random picks out of many bowls?
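To make the "one bowl vs many bowls" check concrete, here's a minimal sketch in Python. The data is made up, and the 2.66 constant is the standard XmR scaling factor that converts the average moving range into an estimate of three sigma for individual values:

```python
import numpy as np

def xmr_limits(values):
    """Natural process limits for an XmR (individuals) chart.

    2.66 is the standard scaling constant that converts the average
    moving range into an estimate of three sigma for individual values.
    """
    v = np.asarray(values, dtype=float)
    mr_bar = np.abs(np.diff(v)).mean()  # average moving range
    centre = v.mean()
    return centre, centre - 2.66 * mr_bar, centre + 2.66 * mr_bar

# Made-up metric values; one point is injected from a "different bowl"
data = [210, 198, 240, 185, 220, 205, 560, 215, 190, 230]
centre, lnpl, unpl = xmr_limits(data)
signals = [x for x in data if x < lnpl or x > unpl]
print(f"average={centre:.0f}, limits=({lnpl:.0f}, {unpl:.0f}), signals={signals}")
# -> the 560 falls outside the limits: a hint the data isn't from one bowl
```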
At the end of the day, though, both methods, while different, aim to figure out the same thing, which is "I have done X, does it cause Y?" With the process behaviour chart approach, I'm looking for a change in the process as a result of some action I've taken. With the experimental approach, I'm looking for an effect to show up in a population given a change in one of the groups. Both methods can be used in a business, of course, but the statistical concept of "a change has occurred" is different in each.
Oh, the "risk" in the quoted sections refers to the risk of a false alarm. Wheeler is trying to justify the extreme conservatism of the process behaviour chart approach, which actually rejects a ton of signal in favour of not giving a false alarm. (i.e. if you see something that breaks one of the three rules, it's highly likely to be exceptional variation, but just because you don't see any special points doesn't mean a change hasn't occurred.)
Thanks for the clarification. I should have been less lazy and reviewed the previous articles in this series. I now understand the statistical difference underpinning the two approaches.
I think I'm still getting caught up on the difference between an experiment and a change in process. I first assumed that SPC was meant more for monitoring a process for changes (essentially to keep it in its lane, and to respond quickly if it goes out of band). So the method appears reactive rather than proactive.
So my confusion with this chart was that it looks like we're trying to tell whether changing a process makes a statistically significant difference:
But I'm still not sure which to use when trying to improve a given metric. Am I running an experiment, or am I changing a process? Is the major difference the existence of a control group?
The point about multiple distributions is a great one. Immediately I start to think about isolating each distribution, but this is likely impossible unless we change one potential source of variation at a time.
Hmm, what's the nature of your confusion? This is going to be useful for me, as I've been explaining XmR charts to various friends and startup folk recently. (I'm starting to see them as a powerful but useful crutch to get people used to the idea of "there's such a thing as routine and special variation"!)
Some more notes, in the hopes of perhaps resolving more questions:
In manufacturing, yes, process behaviour charts are intended to help keep a process in lane, and to respond quickly if out of band. But it's actually more accurate to say that XmR charts do two things: a) they allow you to separate signal from noise when observing a metric, and b) they allow you to characterise a process's behaviour. This more generalised take allows you to use XmR charts in slightly different ways.
For instance! I've taken the following graph of Monthly New Newsletter Subscribers from Newsletter Growth Rate Puzzle:
You'll notice that there's just routine variation here, but the XmR chart tells us a lot of things:
Monthly new subscribers can occasionally, if rarely, go as high as 516 or even as low as 0 (!). I shouldn't be surprised if this happens, because it's all routine variation.
The average is 207 new subscribers a month.
The vast majority of the time, monthly new subscribers will track between 50 and 350 a month.
This is what it means, concretely, when I say "process behaviour charts help characterise a process."
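As a back-of-envelope check (my own arithmetic, not from the chart itself), those numbers hang together: the upper limit sits about three sigma above the average, and the symmetric lower limit would be negative, which is presumably why it gets floored at 0:

```python
average, upper = 207, 516
sigma = (upper - average) / 3        # roughly 103 subscribers per sigma unit
lower = average - (upper - average)  # 207 - 309 = -102, floored to 0
print(sigma, max(lower, 0))          # -> 103.0 0
```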
One way you can use the Forum User Pageviews graph is "ok, I know I made a change on the 18th of October, so let me place a divider there and draw a new XmR chart after that point."
In this case I'm not trying to detect exceptional variation. (There is one point of exceptional variation, on Oct 22, so I happen to know that my change has resulted in a real outcome. But even if there wasn't, it can still be useful to just do this "place a divider and plot a new XmR chart" move.)
Rather, my goal is to characterise this new process, like in Step 2. I'm trying to see how the new process behaviour is different, given that I'm now regularly pasting links to forum topics in the newsletter. Concretely, I want to see where the new limit lines end up, where the new average is, and so on. At about 6 new points the limit lines will begin to gel, and at about 10 points they will start to harden. In practice I'll probably just make a new change at the 6-point mark (assuming I have the time to do so, since that's near my wedding). This is one of the limitations I discussed in the article: the observational studies approach that XmR charts belong to requires you to wait before you make your next change.
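In code, the "place a divider" move is just computing fresh limits on the post-change points. A minimal sketch, with hypothetical pageview numbers (and the same helper as in the earlier sketch):

```python
import numpy as np

def xmr_limits(values):
    """Average and natural process limits for an individuals chart."""
    v = np.asarray(values, dtype=float)
    mr_bar = np.abs(np.diff(v)).mean()  # average moving range
    return v.mean(), v.mean() - 2.66 * mr_bar, v.mean() + 2.66 * mr_bar

# Hypothetical daily pageview counts, split at the date of the change
before = [210, 198, 240, 185, 220, 205, 195]
after = [320, 340, 305, 360, 330, 345]  # six points: limits begin to gel

for label, series in [("before", before), ("after", after)]:
    avg, lnpl, unpl = xmr_limits(series)
    print(f"{label}: average={avg:.0f}, limits=({lnpl:.0f}, {unpl:.0f})")
```

The point is simply to compare where the new average and limit lines land relative to the old ones.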
This is, incidentally, the PDSA loop in action. You can hopefully see how repeated trial-and-error cycles like this, with feedback, help me build an intuition for what works and what doesn't in moving the needle for various metrics I care about. (Also, keep in mind that we're running multiple such changes in parallel; this is just one of the levers we've found that we can pull.)
Yeap! Most of the time in business we're running "trial and error" cycles, instead of rigorous "two group" experiments. The benefit of "two group" experiments is that they can detect subtle changes relatively quickly, given a large enough sample size. The tradeoff is that setting up an experiment takes a fair bit of work.
On the other hand, trial and error cycles are something we intuitively do: change a thing, wait to see if the change is good or bad. XmR charts just make this process easier, since it can sometimes be hard to eyeball the data in the weeks after you've made the change to figure out whether the change is real.
It's not mentioned anywhere, but this article seems to be about the difference between causal and statistical inference.
Or rather: it's about the difference between the group that wants to make causal conclusions from statistical observations but doesn't know the general theory, and the group that wants to make causal conclusions from experiments but doesn't know how their methods relate to the purely observational ones.
I'm trying to think of a reading that specifically explains this distinction, but I can't think of one. If you can get a copy, I might recommend just reading Chapter 1 of Causality by Judea Pearl, which explains causal networks and gives the basic tool for unifying experimental and observational inference in a common framework.
Your response was great, and really clarified it. Thanks for putting that all together.
My confusion was just in nomenclature: people in business often use the term "experiment" very loosely. Like, starting a business itself could even be considered an "experiment": let's just launch this and see if it sticks.
I understand now that we're talking about a scientific, control-group-style experiment.
A follow-up question: how can we track non-linear growth in these charts? For example, your new-subscribers chart assumes roughly linear growth. For compounding growth, perhaps chart the percentage?
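A minimal sketch of that "chart the percentage" idea, with made-up numbers (the log-transform alternative is my own assumption, not something from the article):

```python
import numpy as np

# For compounding growth the month-over-month growth rate is roughly
# constant, so an XmR chart of the percentage change (or of the log of
# the metric) stays meaningful where a chart of raw counts would trend.
subscribers = [100, 112, 125, 141, 158, 178, 200, 224]  # made-up compounding series
pct_change = 100 * np.diff(subscribers) / np.array(subscribers[:-1])
log_values = np.log(subscribers)  # alternative: plot the XmR chart of these
print([f"{p:.1f}%" for p in pct_change])  # hovers around ~12% each month
```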