Where are the robustness tests?
A manifesto for cricket analysis in the latter half of the 2020s
Cricket analysis has entered a new era. The recently concluded T20 World Cup was the first to feature “advanced” player profiles that booted out traditional metrics such as averages and strike-rates in favour of condensed predictive ratings called the “Skill Scale”. Eight months previously, the World Test Championship final broadcast was able to quantify physical attributes of the pitch to deliver novel insights about the state of play. The Fox broadcast has been lauded for using pose estimation techniques to plot movements made by players and deliver biomechanical insights.
This is not all. Outside the ivory tower, a whole new world of analysis has been opened up by the hobbyist fan, who has made zealous use of granular cricket data made available by Himanish Ganjoo and the democratization of coding tasks made possible by the advent of large language models. Arnav Jain has been doing the work of a whole firm by himself, conjuring up a new metric every day, and on the way, finding interesting angles such as here about the recent homogenization of T20 batting and here about the absence of batting peaks in the 2020s. Divyansh Peswani has invented a wonderful new model that captures the “scoring synergy” between pairs of batters at the crease and allows for batting-order reshuffles. Omkar Walunj has harnessed a fascinating tool that helps predict a team’s best XI from an array of possible combinations. At Best of Cricket, Tarutr Malhotra and Co. have reverse-engineered the ICC’s Skill Scale and promulgated new metrics to suggest batting-order fixes to the India Women’s ODI team.
This is great news! Only two years ago, I lamented on these very pages that cricket analysis was fast becoming either a niche subculture of social media or an occasional curiosity on ESPNcricinfo. That it has turned into neither of these things is entirely due to the benevolence of a cricket-addled cosmologist and the unbridled enthusiasm of my fellow hobbyists. But as the quantity of cricket analysis grows, the quality we expect of it ought to rise too. Gone are the days when one could reasonably expect to be appreciated for merely having the curiosity to run a cumbersome piece of data analysis for no extrinsic reward. In other words, it stands to reason that now is the time to set out the bounds of good cricket data analysis. What constitutes these bounds?
In my day job, I am training to be an economist. My education in this field has doubtless indoctrinated me in the ways of the world, but one way it has helped me make sense of thought systems is this: economists try to go from correlation to causation in highly complicated analytical environments. Most findings that pop up in natural data are simple correlations, and this has instilled in economists a certain Cartesian skepticism - your favourite economist is an eager old crook who is pining to wail that your framework is merely capturing correlation, not causation - which creates a natural criterion for good economic papers. Piggybacking off this standard, I propose that good cricket analysis must satisfy three criteria: 1) interestingness, 2) internal logical consistency, and 3) robustness.
Good analysis is interesting analysis. It is not interesting to suggest that a team raises its chances of winning a T20 by scoring more boundaries than its opponent, because the claim is trivial: of course scoring more boundaries than you concede helps, since cricket matches are won by scoring more runs than the opposition. A related example arises in the form of “applied” analysis that takes well-known tropes and uses them to shine a light on ongoing cricket. Did New Zealand make a mistake by failing to bowl a second over of offspin against Abhishek Sharma despite his documented weakness against this type of bowling? That may be logically correct, but it is not novel enough to be interesting.
In the same vein, good analysis must be logically consistent. To be logically consistent is to make sure that every implication drawn follows from first principles, i.e., from definitions and axioms. Any analysis of team decision-making that does not consider what rationales the team might have followed in making a decision fails this criterion. To return to the above example, it is legitimate to ask whether the trade-off of bowling a second over of part-time offspin, with a right-hander at the other end, is worth it. Further - to take an example from my own work - any analysis of the deleterious impact of dew on scoring patterns that does not address the mechanisms through which dew can hinder scoring fails this requirement too. Put simply, it is important to show that premise A leads to premise B, which leads to the conclusion. Any good analysis must be self-contained.
It is the requirement of robustness, however, that I want to spend more time on. By robustness, I mean the quality of being resistant to small changes in the assumptions that make up a piece of analysis. Many analysts avoid the word “assumption” as if it were taboo to make one and scandalous to utter it, but simplifying assumptions are everywhere. They are essential for tractability - in complex environments, the monster is simply too hard to tame without simplifications. But assumptions that are made implicitly, without being stated explicitly, are the worst enemy of all academic inquiry - for they leave open the possibility that a study's findings reflect not the underlying truth but artefacts of the assumptions. The same data can be used to support wildly different conclusions, depending on the specification used.
In 1975, Isaac Ehrlich published a seminal article in the American Economic Review in which he used previously unexploited data to argue that the existence of the death penalty deterred murders in America. Later studies conducted in different settings failed to bear out the same conclusion. Eight years later, Edward Leamer, in the aptly named paper “Let’s Take the Con out of Econometrics”, reran the analysis on Ehrlich’s data with a new set of assumptions and specifications. Depending on the specification used, he could produce estimates ranging from a large deterrent effect on crime to a large increase in murders. Later work illustrated that whether a study found a deterrent effect or the opposite could be explained by the analyst’s political bias. The moral of the story is that it pays to be honest about assumptions while working with data. In Leamer’s words, “The mapping from data to conclusions is many-to-many.”
How does this link to cricket? Because in recent times, much of cricket analysis has been of the predictive variety (as opposed to the inferential variety). The focus has been on conjuring up “summary statistics” of qualities which traditional metrics don’t fully capture, and in doing so, it has brought machine learning into its ambit. The basic idea is to write down an objective function that satisfies certain desirable properties - for instance, that when evaluated at a point, the function must return a higher output if the speed of scoring is greater - and then to optimize this function. These new metrics are then used to answer interesting questions. In other words, these metrics are predictions - RAAR is a prediction of whatever we intuitively call “impact”. But the same set of properties can be satisfied by multiple objective functions. We have no way of claiming that the objective function we have used is the “best” one, because our only way of understanding it is through non-unique axiomatic characterizations. So predictions come with error bounds, and analyses that run on noisy predictions tend to return noisy results.
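The non-uniqueness point can be made concrete with a toy sketch. The two "impact" functions below are invented for illustration (neither is RAAR or any published metric), and the par strike rate is an assumed number; both functions satisfy the same axiom - at equal runs, faster scoring returns a higher output - yet they rank the same pair of innings in opposite orders.

```python
PAR_SR = 1.3  # assumed par scoring rate, runs per ball (hypothetical)

def impact_linear(runs, balls):
    """Runs above what a par batter would score off the same balls."""
    return runs - PAR_SR * balls

def impact_scaled(runs, balls):
    """Runs weighted by strike rate relative to par."""
    return runs * (runs / balls) / PAR_SR

innings_a = (60, 40)  # 60 off 40 balls, SR 1.50
innings_b = (30, 15)  # 30 off 15 balls, SR 2.00

# Both functions satisfy the axiom: at fixed runs, fewer balls -> higher score.
assert impact_linear(60, 40) > impact_linear(60, 50)
assert impact_scaled(60, 40) > impact_scaled(60, 50)

# Yet they disagree about which innings had more "impact".
print(impact_linear(*innings_a), impact_linear(*innings_b))  # innings B ranked higher
print(impact_scaled(*innings_a), impact_scaled(*innings_b))  # innings A ranked higher
```

Any conclusion that flips when we swap one admissible objective function for another is resting on the choice of function, not on the data.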
An example is helpful here. I will use an economics example since the purpose of this article is not to debunk a famous cricket analytics study. The Gini coefficient is a well-known measure of inequality which satisfies certain desirable properties. However, one property it does not satisfy is the following: a rich-to-poor cash transfer which occurs at a lower point in the income distribution must produce greater equality than a quantitatively equivalent rich-to-poor cash transfer which occurs at a higher point in the distribution. Another inequality measure which does satisfy this property is the Theil Index, which is consequently used in analyses where this quality is desirable. Indeed, the best analyses are those which show that the conclusion that is claimed can be obtained from using a multitude of inequality measures.
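The Gini-versus-Theil contrast can be checked numerically. In the sketch below (income numbers invented for illustration), the same-sized rich-to-poor transfer is made once at the bottom of the distribution and once at the top, with the same rank gap between donor and recipient; the Gini coefficient falls by the same amount in both cases, while the Theil index falls more for the bottom transfer.

```python
import numpy as np

def gini(x):
    """Gini coefficient via the rank formula on a sorted income vector."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return 2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n

def theil(x):
    """Theil T index: mean of (x/mu) * ln(x/mu)."""
    x = np.asarray(x, dtype=float)
    s = x / x.mean()
    return np.mean(s * np.log(s))

base = np.arange(10.0, 101.0, 10.0)  # hypothetical incomes 10, 20, ..., 100

low = base.copy();  low[0] += 2;  low[1] -= 2    # transfer from 20 to 10
high = base.copy(); high[8] += 2; high[9] -= 2   # same-size transfer, 100 to 90

# Gini falls identically: its response depends only on the rank gap
# between donor and recipient, which is 1 in both cases.
print(gini(base) - gini(low), gini(base) - gini(high))

# Theil falls more for the bottom transfer, rewarding equalization
# where incomes are lowest.
print(theil(base) - theil(low), theil(base) - theil(high))
```

An analyst who cares about transfers at the bottom of the distribution would therefore reach for the Theil index, and a robust study would report both.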
This is the essence of robustness tests. Does your analysis replicate under varying specifications, or is it vulnerable to small tweaks in the set-up? Does your conclusion show up in various environments, or is it a relic of the conditions characterizing your event? What if you changed your objective function to raise the rate at which slow scoring is penalized? What if you controlled for a potential confounder? Is the relationship truly causal, or is the correlation you have discovered spurious? Robustness tests answer questions like these. I think there is a sense among writer-analysts that it is tough to appeal to their readership while exhibiting the kind of caution endorsed in this article. But I disagree. As Karthik Krishnaswamy pointed out in the excellent recent interview by Sparsh Telang, an abundance of skepticism is an effective way of actualizing the thesis-antithesis-synthesis style of writing.
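One of these questions - what happens when you control for a confounder - can be demonstrated in a few lines. The simulation below is entirely invented: a lurking variable z drives both the "treatment" x and the outcome y, while the true effect of x on y is zero. The naive specification finds a large effect; adding the confounder makes it vanish, which is exactly the kind of specification check Leamer had in mind.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data: z confounds x and y; the true effect of x on y is zero.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = 1.0 * z + 0.5 * rng.normal(size=n)

def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

naive = ols(y, x)          # specification 1: y on x alone
controlled = ols(y, x, z)  # specification 2: add the confounder z

print(f"naive effect of x:      {naive[1]:.3f}")       # spuriously large
print(f"controlled effect of x: {controlled[1]:.3f}")  # close to zero
```

A finding that survives only in the specification without the control is an artefact of the set-up, not a property of the world.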
The history of statistics is the history of learning to be honest about assumptions. Cricket analysis has been given a massive shot in the arm in the last 12 months by a surge of interest. But when everybody is an analyst, the bar for methodological care rises. The next big leap forward will come when cricket analysts break free from the chains of narrative certainty.