How to Become Data Driven - Commoncog

eric · March 2, 2023, 5:39pm

I shared this response in an email to Cedric, and he suggested I share it here as well:

I hadn’t gotten around to reading this post until now, but I love that insight that understanding variation is key.

I was part of the revenue forecasting finance team at Google that set the estimates for how much money we’d make so that the company could decide how much it could plan to spend and maintain margins. When I got there in 2008, the errors were 10-20% on the annual forecast; a few years later, we were rarely off by more than 0.5-1% at the quarterly level, and 1-2% for the year, unless something had drastically changed (aka exceptional variation in the form of global economy changes).

A lot of the forecasting improvement was low-hanging fruit addressing normal variation - splitting up forecasts by country/region that behaved differently, understanding seasonality better (holidays around the world, weekdays vs. weekends, one-off events like the World Cup or the Olympics), tuning different algorithms for different products, using year-on-year and year-on-2year trends to understand outliers, etc.

As part of that, I developed my understanding of our revenue driver metrics so I could glance at a day when actuals were off from our forecast, and figure out in a few minutes whether it was something to look into more or not. I was the guy the VPs/execs looked at to tell them whether this was “normal variation” or something that needed greater investigation.

Data-driven decisions without that intense understanding of the underlying data often just mean leaders seizing on the numbers that support their pre-existing biases.

[Cedric asked me how I developed that understanding without looking at variation charts]

In our case, it was looking at the year-on-year data split by country/region that was most helpful, plus splitting revenue out by key subcomponents (e.g. click-through rate and cost-per-click). We had an internal dashboard that allowed us to split the revenue data along all of those axes, and it was incredibly helpful in building that intuition - I later learned from the team that built and maintained that dashboard that I was their #1 user by a mile.
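To make the "split revenue into subcomponents" idea concrete, here is a minimal sketch. It assumes revenue can be written as impressions × click-through rate × cost-per-click, so the y/y log change in revenue is the sum of the log changes in each driver. The function name and all numbers are hypothetical, made up for illustration; this is not the internal dashboard's logic.

```python
import math

def yoy_log_contributions(this_year: dict, last_year: dict) -> dict:
    """Return each driver's y/y log change; these add up to the revenue log change
    when revenue is the product of the drivers."""
    return {name: math.log(this_year[name] / last_year[name]) for name in this_year}

def revenue(drivers: dict) -> float:
    # Assumed identity: revenue = impressions * CTR * CPC
    return drivers["impressions"] * drivers["ctr"] * drivers["cpc"]

# Hypothetical numbers for one country on one day
last_year = {"impressions": 1_000_000, "ctr": 0.050, "cpc": 0.40}
this_year = {"impressions": 1_100_000, "ctr": 0.050, "cpc": 0.38}

contributions = yoy_log_contributions(this_year, last_year)
total_log_change = math.log(revenue(this_year) / revenue(last_year))

for name, value in contributions.items():
    print(f"{name:12s} {value:+.4f}")
print(f"{'revenue':12s} {total_log_change:+.4f}")
```

Reading the output, you can see at a glance whether a revenue move came from volume, click-through rate, or price per click, which is the kind of question the split answers.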

There’s an accumulation of little things like knowing that quarterly comparisons are tricky because quarters are different lengths (Q1 is 90 days due to February, or 91 in a leap year, compared to 91 days in Q2 or 92 days in Q3 and Q4), or knowing how Catholic countries in Southern and Eastern Europe have a much bigger slowdown due to Easter. There’s keeping track of yearly events so you can look at whether it’s this year that’s the outlier or last year when looking at weird y/y trends. There’s looking at the subcomponents as I mentioned - I remember one day when all the monitoring alerts fired because our revenue per query tanked, and we quickly realized it was because Michael Jackson had died, and we had a spike of queries about Michael Jackson that had no ads on them.
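Two of those checks are easy to sketch in code. The first normalizes quarterly revenue by the number of days in the quarter, so a 90-day Q1 is not compared directly to a 92-day Q3. The second shows how a burst of queries with no ads lowers revenue per query even when revenue itself has not moved. Function names and figures are hypothetical, chosen only to illustrate the arithmetic.

```python
import calendar

def days_in_quarter(year: int, quarter: int) -> int:
    """Count calendar days in a quarter (1-4) of the given year."""
    first_month = 3 * (quarter - 1) + 1
    return sum(calendar.monthrange(year, month)[1]
               for month in range(first_month, first_month + 3))

def revenue_per_day(quarter_revenue: float, year: int, quarter: int) -> float:
    """Put quarters of different lengths on a comparable per-day basis."""
    return quarter_revenue / days_in_quarter(year, quarter)

def revenue_per_query(revenue: float, queries: int) -> float:
    return revenue / queries

# Quarter lengths in a non-leap year and a leap-year Q1
print([days_in_quarter(2023, q) for q in (1, 2, 3, 4)])
print(days_in_quarter(2024, 1))

# Hypothetical day: revenue is flat, but 2,500 extra queries carry no ads
normal_rpq = revenue_per_query(1000.0, 10_000)
spike_rpq = revenue_per_query(1000.0, 12_500)
print(normal_rpq, spike_rpq)
```

In the second case the ratio drops by a fifth while the numerator is unchanged, which is why looking at the subcomponents settles quickly whether an alert reflects a real revenue problem.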

All of those intuitions were later codified into the forecasting algorithms: rather than doing a global forecast, the team automated y/y forecasts by country and region and summed them up, taking into account seasonal variations, holidays, and other discontinuous events like the Olympics or World Cup. They were also passed on as knowledge to new people joining the revenue forecasting team, so it was no longer up to them to learn it as we did.
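As a rough illustration of the bottom-up idea (forecast each region from its own y/y trend, adjust for known events, then sum), here is a minimal sketch. It is not the team's actual algorithm: the `RegionInput` fields, the region codes, and every number are assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass
class RegionInput:
    last_year_revenue: float       # same period one year earlier
    yoy_growth: float              # recent y/y growth rate, e.g. 0.10 for +10%
    event_adjustment: float = 1.0  # hypothetical multiplier for holidays or one-off events

def forecast_region(region: RegionInput) -> float:
    """Grow last year's revenue by the region's own y/y rate, then apply the event factor."""
    return region.last_year_revenue * (1 + region.yoy_growth) * region.event_adjustment

def forecast_total(regions: dict) -> float:
    """Sum the per-region forecasts instead of forecasting one global number."""
    return sum(forecast_region(region) for region in regions.values())

# Hypothetical inputs: three regions with different growth and event effects
regions = {
    "US": RegionInput(last_year_revenue=100.0, yoy_growth=0.10),
    "IT": RegionInput(last_year_revenue=20.0, yoy_growth=0.05, event_adjustment=0.97),
    "BR": RegionInput(last_year_revenue=10.0, yoy_growth=0.30, event_adjustment=1.02),
}

for code, region in regions.items():
    print(code, round(forecast_region(region), 2))
print("total", round(forecast_total(regions), 2))
```

Keeping each region's growth rate and event factor separate means a slowdown in one country or a shifted holiday only changes that region's term, and the total follows from the sum.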