Speak Up For Statistics



I am glad to see a "5 Levels" episode on machine learning (ML) from @WIRED, but as a statistician I am also concerned. Here is the link to the clip.

The final discussant raised concerns that ML practice sometimes skips the steps of problem formulation and data collection, which can diminish the value of ML products. To me, these genuine concerns point to exactly what the statistics field, or at least statistical thinking, contributes.

I firmly believe that a pinch of emphasis on statistics can mitigate these concerns about ML. Given this belief, I am upset that the word "stat" was mentioned only once in this clip, if that. This lack of recognition worries me about the future of both ML and statistics.

To be clear, I am not upset with the discussants, but rather with the phenomenon that statistics always misses the media spotlight and may even have a negative reputation. When asked what I do, I have gotten very different reactions depending on whether I answered "stat" or "ML", even though it is the same work.

It got me wondering whether there is an implicit hierarchy in which ML is seen as superior to, or cooler than, stat. I would conjecture yes, especially recalling the discussion on @theeffortreport about @rdpeng's preference for using "ML" over "stat" in his power sentence.

Beyond the impressions people have already formed of ML and stat practitioners, the media's habit of magnifying ML and understating stat is snowballing, and I wonder whether it will eventually lead to an avalanche of ML malpractice.

This line of thinking may seem pessimistic, but I consider it reasonable. We are promoting the cool idea of ML without considering how other fields could help address the problems it suffers from. Are we creating a vicious cycle that, in turn, amplifies those very problems?

A thought experiment: if we do not intervene in some way now, I can only imagine more talent entering ML programs, which traditionally focus on making implementation easier, and less talent entering stat programs, which cover the analytic process.

Such an imbalance in the workforce would reduce the exposure of ML practice to statistical thinking and exacerbate the current problems.

I think many approaches could improve current ML practice, including curriculum redesign, short courses, etc. I would advocate for more exposure of statistics in the media. As statisticians, we should stand firmly behind our reputation and proudly own it. Let's build a cool reputation for statistics so that we can attract and recruit more talent who understand the process and focus on statistical thinking.

Or at least, we can remind people that there is a whole discipline, with a history of more than a hundred years, that focuses on the entire analytic process, in which using computational devices to infer an answer is only one part, not the whole.

I hope more people will pay attention to statistics and be inspired by what has already been studied, instead of reinventing the wheel.

A final remark to conclude this lengthy thread: I am happy to see that all the discussants in the clip are women, which signals a bright future for female machine learning experts.