Mean vs. Median: A runner friend posted this last night in that social media space which must not be named: “Just read that Americans average 4000 calories per day. Mind boggled. Also really want to know the standard deviation.”
The immediate responses were mostly on the order of OMG THE JUNK-FOOD-SWALLOWING AMERICAN FATTIES. So I countered with “This could be a right-skewed distribution, in which case the average is greater than the median.” In plain English, this means that if a lot of Americans eat, say, a reasonable 2000-to-3000-calorie-per-day diet and a smaller group eats far more than that, the heavy eaters pull the average up, possibly to around 4000, even though most people are eating fewer calories than the average. If you still don’t get it, read this. “Typical” or “average” numbers can only be trusted when they come from bell-shaped, symmetrical curves. Furthermore, the standard deviation will tell you something in the case of a skewed distribution, but not a lot.
Naturally, my comment was glossed over and there were more responses on the order of OH THE HORRORS OF OBESITY AND THE AMERICAN FAST FOOD INDUSTRY. This one was my favorite: “I see the way a lot of ‘average’ eaters eat. It’s not good. Not good at all.” Oh, sweet mother of statistics. I had to reply. “It’s not about what average eaters consume, but the average calories consumed by this population sample. Again, the average without a mode and median, i.e. whether or not the distribution is a bell curve, is pretty pointless. Sorry, how statistics are presented in the media [and understood by the public] is a bit of a peeve of mine.”
The next time someone gives you a mean/average number to make you think it’s typical, ask for the mode and median as well.
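To make the mean-versus-median point concrete, here is a minimal sketch in Python. The ten calorie figures are invented purely for illustration (they are not survey data); they are chosen so that most people sit in the 2000-to-3000 range while a few heavy eaters drag the mean up to 4000.

```python
import statistics

# Hypothetical daily calorie intake for ten people (made-up numbers,
# not real data): most eat 2000-3000, a few eat far more.
calories = [2000, 2200, 2500, 2500, 2500, 2800, 3000, 6000, 8000, 8500]

mean_cal = statistics.mean(calories)      # pulled up by the heavy eaters
median_cal = statistics.median(calories)  # the middle of the sorted list
mode_cal = statistics.mode(calories)      # the most common value
spread = statistics.stdev(calories)       # sample standard deviation

# How many people actually eat less than the "average"?
below_mean = sum(1 for c in calories if c < mean_cal)

print(f"mean={mean_cal}, median={median_cal}, mode={mode_cal}")
print(f"{below_mean} of {len(calories)} people eat less than the mean")
print(f"standard deviation: {spread:.0f}")
```

In this toy sample the mean is 4000, yet the median is 2650, the mode is 2500, and seven of the ten people eat less than the "average". That is the whole complaint in a few lines: in a right-skewed sample, the mean alone says very little about the typical eater.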
Risk vs. Uncertainty: Both terms refer to unknown outcomes, but there is a difference between risk and uncertainty. Please don’t use them interchangeably. If you’re uncertain whether something will succeed or not, that is a risk, not an uncertainty, i.e. you risk failure. Risk is “a set of possibilities each with quantified probabilities and quantified losses.” If you are out driving today, the risk you take is getting into an accident. There is no uncertainty involved here – you always run the risk of an accident when in a moving vehicle, a flood during rainy or hurricane season, the failure of an oil and gas well, a broken heart if in a romantic relationship, etc. Uncertainty is then used to describe the range of outcomes, i.e. How Badly? Will you be in a simple fender bender or will you be killed? Will your house flood by 0.2 inches or 20 feet? Will your failed well release 2 gallons of oil or 2 billion gallons? Will you get over your breakup in two days or be wounded forever? (Who knows? That’s where Bayes comes in. What, you don’t think the a priori probability distribution of your reaction to being dumped can be calculated?)
I bring this up because of something Engineer Tim said today as a response to error bars in engineering calculations: “It’s really hard to talk to the public about uncertainty and fragility; they just want to hear it’s fixed and safe.” Ack. Ok, look: 1) Nothing about human life implies 100% safety at all times. Just being alive means you are at risk of failure with any set of outcomes. 2) “Fixed” and “safe” are illusions when there is risk, uncertainty and (in)variables that are truly unknown. You may predict a set of outcomes for a low- or no-risk situation, but how do we know all factors have been taken into account? We never do.
Yes, great engineering, proper precautions, understanding the risks and uncertainties and a robust post-failure response mechanism may greatly alleviate things, but much of human living inherently involves risk with an uncertain set of outcomes. Furthermore, there are truly things we do not know or have not taken into account, but let us not use the words “risk” and “uncertainty” to characterize them.
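For the code-minded, the quoted definition of risk (“a set of possibilities each with quantified probabilities and quantified losses”) can be sketched as a small table. Everything here is an assumption for illustration: the `Outcome` class is a hypothetical helper, and the probabilities and dollar figures for a day of driving are invented, not actuarial data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One possibility, with a quantified probability and loss."""
    name: str
    probability: float  # chance of this outcome (all must sum to 1)
    loss: float         # cost in dollars if it happens


# Invented numbers for a day of driving, for illustration only.
outcomes = [
    Outcome("no accident", 0.90, 0),
    Outcome("fender bender", 0.09, 1_000),
    Outcome("serious crash", 0.01, 50_000),
]

# Sanity check: the possibilities should cover everything.
total_probability = sum(o.probability for o in outcomes)

# The risk, boiled down to one number: the probability-weighted loss.
expected_loss = sum(o.probability * o.loss for o in outcomes)

# The "How Badly?" question: the range the loss can actually take.
smallest_loss = min(o.loss for o in outcomes)
largest_loss = max(o.loss for o in outcomes)

print(f"expected loss: about ${expected_loss:.0f}")
print(f"range of outcomes: ${smallest_loss:.0f} to ${largest_loss:.0f}")
```

The expected loss comes out at roughly $590, which no single outcome in the table actually produces. The one-number summary and the range of outcomes answer different questions, and neither covers the things that never made it into the table in the first place.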
Suggested reading: What is the difference between risk and uncertainty?; Defining risk versus uncertainty; The stock market: risk vs. uncertainty (“Whereas risk is quantifiable randomness, uncertainty isn’t.”)
#OverlyHonestMethods: A few days ago, scientists on Twitter started this hashtag to describe how experiments are really done and how data is collected in the real world. Some of the tweets bordered on confessional about the real number of samples used, unrepresentative population samples and questionable methodology, but the majority of them showed how experimentation is done and the limits of data capture and analysis. Popular Science rounded up a number of #overlyhonestmethods tweets into an article entitled #OverlyHonestMethods Hashtag Reveals How Science Is Really Done.
Which, of course, makes me wonder how the public thinks we do science.
Scientists seem to have one of three responses to this meme: 1) Hahahaha, story of my life!; 2) This is bad PR, as laypeople will get the impression we’re a bunch of frauds because they lack the context of knowing how science is really done [e.g. @EruptionsBlog]; and 3) We owe it to the public to teach them that the range of science spans knowns, experimentation, uncertainty and unknowns, as well as hypotheses that are built upon or changed as more data is captured and better techniques are developed [e.g. @SciObservatory].
I am of the (first and) third opinion. I find no problem in telling the public
what we know: Evolution is an observable fact; childhood vaccinations prevent horrible diseases,
what we don’t know but can guess at: Geophysical reservoir thickness calculations bear hundreds of feet of uncertainty based on the ability to seismically image top and base of the reservoir at great depths; these uncertainty bounds change as we drill more wells and the seismic data set becomes clearer or murkier,
what we don’t know at all: I have no clue whether aliens exist or not, and
that we are human: The best measurements I made in my field area in Mexico using the Lacoste-Romberg gravimeter came from when I was sitting, which is easier on my back, and not kneeling in a tight spot.
Science is not done with all the data possible (in which case it wouldn’t be science but pure knowledge), nor in a neat and linear fashion by automatons in white lab coats. Most of the time, we work very hard to acquire enough good data, make sense of what we do have, offer intelligent theories to explain any observable trend and even state that we must not over-science a problem until we have more data. Whether it works or not is immaterial; the science lies in our having robustly attempted to find an answer from observations.
In fact, the onus is on the consumers of science and engineering to appreciate the concepts of statistics and uncertainty I wrote about earlier in this post AND on scientists and engineers to be transparent about our methods. If the intermediate result is the misunderstanding of science and the outing of some bad science, so be it. I’d rather that laypeople think about what science is and may be than be completely uninterested or categorically “confident” of nothing of import.
What questions and concerns do you have about science?