Mean vs. Median: A runner friend posted this last night in that social media space which must not be named: “Just read that Americans average 4000 calories per day. Mind boggled. Also really want to know the standard deviation.”
The immediate responses were mostly on the order of OMG THE JUNK-FOOD-SWALLOWING AMERICAN FATTIES. So, I countered with “This could be a right-skewed distribution, in which case the average is greater than the median.” In plain English, this means that if there are a lot of Americans who eat, say, a reasonable 2000-to-3000-calorie-per-day diet and a smaller number who eat far more than that, the heavy eaters can pull the average number of calories consumed up to around 4000 even though most people are eating fewer calories than the average. If you still don’t get it, read this. “Typical” or “average” numbers can only be trusted from bell/symmetrical curves. Furthermore, the standard deviation will tell you something in the case of a skewed distribution, but not a lot.
Naturally, my comment was glossed over and there were more responses on the order of OH THE HORRORS OF OBESITY AND THE AMERICAN FAST FOOD INDUSTRY. This one was my favorite: “I see the way a lot of ‘average’ eaters eat. It’s not good. Not good at all.” Oh, sweet mother of statistics. I had to reply. “It’s not about what average eaters consume, but the average calories consumed by this population sample. Again, the average without a mode and median, i.e. whether or not the distribution is a bell curve, is pretty pointless. Sorry, how statistics are presented in the media [and understood by the public] is a bit of a peeve of mine.”
The next time someone gives you a mean/average number to make you think it’s typical, ask for the mode and median as well.
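To make the mean-versus-median point concrete, here is a minimal sketch in Python. The ten daily calorie figures are invented purely for illustration (they are not real survey data), and the function name summarize is my own. The numbers are chosen so that a few very heavy eaters drag the mean up to 4000 while most of the sample eats far less.

```python
import statistics

# Hypothetical daily calorie intakes for ten people (made-up numbers,
# NOT real data): seven moderate eaters and three heavy eaters.
daily_calories = [2000, 2200, 2400, 2500, 2600, 2800, 3000, 5000, 8000, 9500]


def summarize(values):
    """Return the mean, median, population standard deviation, and the
    number of values that fall below the mean."""
    mean = statistics.mean(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
        "below_mean": sum(1 for v in values if v < mean),
    }


summary = summarize(daily_calories)
print(f"mean:   {summary['mean']}")
print(f"median: {summary['median']}")
print(f"stdev:  {summary['stdev']:.0f}")
print(f"{summary['below_mean']} of {len(daily_calories)} people eat less than the mean")
```

In this made-up sample the mean is 4000 but the median is 2700, and seven of the ten people eat less than the "average." The standard deviation comes out large, but by itself it does not tell you which side the long tail is on.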
Suggested reading: Mean, median, mode and range; Summary statistics for skewed distributions; Life is log-normal!
***
Risk vs. Uncertainty: Both of the terms reference unknown outcomes, but there is a difference between risk and uncertainty. Please don’t use them interchangeably. If you’re uncertain whether something will succeed or not, that is a risk, not an uncertainty, i.e. you risk failure. Risk is “a set of possibilities each with quantified probabilities and quantified losses.” If you are out driving today, the risk you take is getting in an accident. There is no uncertainty involved here – you always run the risk of an accident when in a moving vehicle, a flood during rainy or hurricane season, the failure of an oil and gas well, a broken heart if in a romantic relationship, etc. Uncertainty is then used to describe the range of outcomes, i.e. How Badly? Will you be in a simple fender bender or will you be killed? Will your house flood by 0.2 inches or 20 feet? Will your failed well release 2 gallons of oil or 2 billion gallons? Will you get over your breakup in two days or be wounded forever? (Who knows? That’s where Bayes comes in. What, you don’t think the a priori probability distribution of your reaction to being dumped can be calculated?)
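As a rough sketch of the quoted definition of risk ("a set of possibilities each with quantified probabilities and quantified losses"), the Python below lists a few outcomes for a day of driving. The outcome names, probabilities, and dollar losses are all hypothetical placeholders, not real accident statistics, and Outcome, expected_loss, and loss_range are names I made up for this example. It computes one common summary of risk, the probability-weighted loss, alongside the spread of how bad things could get.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    probability: float  # chance of this outcome (hypothetical)
    loss: float         # cost in dollars if it happens (hypothetical)


def expected_loss(outcomes):
    """Probability-weighted loss over a complete set of outcomes."""
    total_p = sum(o.probability for o in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(o.probability * o.loss for o in outcomes)


def loss_range(outcomes):
    """Smallest and largest loss among outcomes that can actually occur."""
    losses = [o.loss for o in outcomes if o.probability > 0]
    return min(losses), max(losses)


# Made-up numbers for one day of driving; illustration only.
driving_today = [
    Outcome("no accident", 0.990, 0.0),
    Outcome("fender bender", 0.009, 2_000.0),
    Outcome("serious crash", 0.001, 100_000.0),
]

print(f"expected loss: ${expected_loss(driving_today):.2f}")
print("range of losses: $%.0f to $%.0f" % loss_range(driving_today))
```

The expected loss is a single small number, while the range of losses runs from nothing to catastrophic, which is why the "How Badly?" question deserves its own answer.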
I bring this up because of something Engineer Tim said today as a response to error bars in engineering calculations: “It’s really hard to talk to the public about uncertainty and fragility; they just want to hear it’s fixed and safe.” Ack. Ok, look: 1) Nothing about human life implies 100% safety at all times. Just being alive means you are at risk of failure with any set of outcomes. 2) “Fixed” and “safe” are illusions when there is risk, uncertainty and (in)variables that are truly unknown. You may predict a set of outcomes for a low- or no-risk situation, but how do we know all factors have been taken into account? We never do.
Yes, great engineering, proper precautions, understanding the risks and uncertainties and a robust post-failure response mechanism may greatly alleviate things, but much of human living inherently involves risk with an uncertain set of outcomes. Furthermore, there are truly things we do not know or have not taken into account, but let us not use the words “risk” and “uncertainty” to characterize them.
Suggested reading: What is the difference between risk and uncertainty?; Defining risk versus uncertainty; The stock market: risk vs. uncertainty (“Whereas risk is quantifiable randomness, uncertainty isn’t.”)
***
#OverlyHonestMethods: A few days ago, scientists on Twitter started this hashtag to describe how real-life experiments are done and how data is collected in the real world. Some of the tweets bordered on confessional about the real number of samples used, unrepresentative population samples and questionable methodology, but the majority of them displayed how experimentation is done and the limits of data capture and analysis. Popular Science rounded up a number of #overlyhonestmethods tweets into an article entitled #OverlyHonestMethods Hashtag Reveals How Science Is Really Done.
Which, of course, makes me wonder how the public thinks we do science.
Scientists seem to have one of three responses to this meme: 1) Hahahaha, story of my life!, 2) This is bad PR as laypeople will get the impression we’re a bunch of frauds because they lack the context of knowing how science is really done [e.g. @EruptionsBlog] and 3) We owe this to the public to teach them that the range of science is –>known, experimentation, uncertainty, unknowns as well as hypotheses that are built upon or changed as more data is captured and better techniques are developed<– [e.g. @SciObservatory].
I am of the (first and) third opinion. I find no problem in telling the public
what we know: Evolution is an observable fact; childhood vaccinations prevent horrible diseases,
what we don’t know but can guess at: Geophysical reservoir thickness calculations bear hundreds of feet of uncertainty based on the ability to seismically image top and base of the reservoir at great depths; these uncertainty bounds change as we drill more wells and the seismic data set becomes clearer or murkier,
what we don’t know at all: I have no clue whether aliens exist or not, and
we are human: The best measurements I made in my field area in Mexico using the Lacoste-Romberg gravimeter came from when I was sitting, which is easier on my back, and not kneeling in a tight spot.
Science is not done with all the data possible (in which case it wouldn’t be science but pure knowledge) and in a neat and linear fashion by automatons in white lab coats. Most of the time, we work very hard to acquire enough good data, make sense of what we do have, offer intelligent theories to explain any observable trend and even state that we must not over-science a problem until we have more data. Whether it works or not is immaterial; the science is in the fact that we robustly attempted to find an answer from observations.
In fact, the onus is on the consumers of science and engineering to appreciate the concepts of statistics and uncertainty I wrote about earlier in this post AND on scientists and engineers to be transparent about our methods. If the intermediate result is the misunderstanding of science and the outing of some bad science, so be it. I’d rather that laypeople think about what science is and may be, rather than being completely uninterested or categorically “confident” of nothing of import.
What questions and concerns do you have about science?
Hi Maitri
Nice post, hot topic.
Discussions involving ‘calories’, ‘Americans’, or ‘statistics’ can become quite emotionally charged in some contexts and many circles. Good for you for diving into one involving all three topics at once, trying to bring the tone down and sticking to science.
Also, nice collections of readings.
Although my favourite is your ‘we are human’: having done a field thesis myself with a Lacoste-Romberg (in Tuscany), I totally get it. A fitting and funny example.
Matteo
Just to be clear, I commented that it is “really hard to talk to the public” about these things, but I did not say we shouldn’t or that I didn’t recommend it. To the contrary, I’m solidly with you that scientists and engineers need to work to raise the public to our level of understanding. Will we succeed? Well, we only know that we won’t if we don’t try.
In my work, we say Risk = Probability x Consequences. We also use Confidence as an expression of how much error resides in our numbers. So while I may calculate the Probability of any given event, imperfect data sets will limit the Confidence of my calculation. We routinely express a design value as having 50% Confidence or 90% Confidence as a way of quantifying how much error could reside in our numbers.
Thanks for blogging about this. This topic needs more attention and discussion.
Peace,
Tim
I think your notes on the OverlyHonestMethods hashtag are great.
In my mind it captures perfectly the philosophical argument of anti-intellectualism. One of the problems most people have with science is they view it as sort of this ivory tower: “our word is law” approach. But they tend to forget most scientists are people, and often we’re fairly open about the failings of our methods when confronted. But like most humans we like to save face when we can.
Most intelligent people realize the scientific data is available for them to interpret themselves; with the internet, this is more true now than ever before.
What really scares me is scientists with political agendas, because often they aren’t overly honest.
To relate this to personal experience, I have no intention of going out on the boat (or field), but I damn well want to know everything that went wrong in the acquisition report (ob notes).
Impressed am I! I wish I could talk about things like this as succinctly as you. Alas, I am one of those non-scientists who wants to really understand and only then extrapolate things into real-world steps forward, but it is very hard to do. I follow for about 25% of it and then get lost. This is why we rely on scientists like you who can bridge the gap and talk to us laypeople!
I agree that there is too much jumping to conclusions when faced with data or studies, and that a lot of people can gloss over a paper, find the points that support their preconceived notion, and then stick it under someone else’s nose (or a lot of noses, thanks to social media) and say SEE? EVIDENCE. I think there should be a free course via Coursera or something on “How to read journal papers for non-scientists” and link to it every time someone does a stupid on the internet.
Andrea: This is why I say the burden is on laypeople to show interest AND on scientists to explain as clearly as possible.
Incidentally, those papers that Aaron Swartz freed from JSTOR are yours and mine –> paid for by us <-- and access to a lot of that information will help you learn and help us make progress and teach. Scientific publishing as a business will ensure that non-scientists get only bits of the information and arrive at the wrong conclusions. Notice that my scientific posts on FB are rarely liked or commented on? Says something, huh?
Brilliant