Statistics Made Simple (Here’s What Professors Won’t Tell You)

Ahmed Salama | OCT, 2024

The Statistical Literacy Crisis

Statistical illiteracy is more dangerous in today's data-driven world than traditional illiteracy. Consider these facts:
A hospital nearly rejected a successful COVID treatment because staff misinterpreted p-values
A Fortune 500 firm lost $2 million acting on "statistically significant" findings that didn't matter in the real world
Thousands of individuals make poor health and financial decisions because they cannot detect fudged statistics
After 15 years of translating statistics into plain language for executives, physicians, and policymakers, I've learned a revolutionary secret: You don't need hard math to think statistically. You need the right mental models.
This guide will provide you with:
The 20% of statistical principles that cover 80% of real-world decisions
Actionable frameworks used by leading data scientists (without the jargon)
Warning signs for detecting statistical manipulation
Free tools that do the heavy math for you
Chapter 1: The Averages You Need
1.1 The Averages Trilogy - Mean, Median, Mode Unmasked
The Problem: Using the wrong average misleads you into poor choices. When a large chain store calculated "average" customer spending, they forgot that:
Mean: $85 (distorted by some big spenders)
Median: $32 (the typical customer)
Mode: $19.99 (most commonly purchased)
When to Use Which:
Average Type | Best For | Watch Out For
Mean | Normally distributed data | Outliers skewing results
Median | House prices, anything that has outliers | Losing sensitivity to actual values
Mode | Survey responses, top-selling products | Multiple modes causing confusion
Pro Tip: Always plot your data first. The histogram never lies.
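To see the three averages pull apart, here is a minimal Python sketch. The basket totals are made up for illustration (they are not the retailer's data); they are simply chosen so that a couple of big spenders drag the mean far above the median.

```python
import statistics

# Hypothetical basket totals in dollars (illustrative only):
# most customers spend little, two big spenders pull the mean upward.
spend = [19.99, 19.99, 19.99, 25.00, 32.00, 32.50, 40.00, 60.00, 250.00, 350.51]

mean_spend = statistics.mean(spend)      # sensitive to the two large baskets
median_spend = statistics.median(spend)  # the middle customer
mode_spend = statistics.mode(spend)      # the most common basket total

print(f"mean:   {mean_spend:.2f}")
print(f"median: {median_spend:.2f}")
print(f"mode:   {mode_spend:.2f}")
```

Drop the two largest baskets and rerun it: the mean moves a lot, while the median and mode barely change.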
1.2 Variation - The Most Important Concept Nobody Teaches
Standard deviation isn't a formula - it's your reality check. At Uber, we learned:
Average ETA: 8 minutes
Standard deviation: 4 minutes
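This meant (assuming ETAs are roughly normally distributed):
68% of rides: 4-12 minutes
95% of rides: 0-16 minutes
That "30 minute ETA" sat 5.5 standard deviations above the average, so it was a system glitch, not normal variation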
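The arithmetic behind those ranges is short enough to check by hand or in a few lines of Python. This sketch uses the 8-minute mean and 4-minute standard deviation from the example and assumes ETAs are roughly normal; the helper names `band` and `z_score` are mine, chosen for illustration.

```python
mean_eta = 8.0  # minutes, from the example above
sd_eta = 4.0    # minutes, from the example above

def band(k):
    """Range covering mean +/- k standard deviations, floored at 0 minutes."""
    return max(0.0, mean_eta - k * sd_eta), mean_eta + k * sd_eta

def z_score(value):
    """How many standard deviations a value sits from the mean."""
    return (value - mean_eta) / sd_eta

one_sd = band(1)      # covers about 68% of rides under the normal assumption
two_sd = band(2)      # covers about 95% of rides under the normal assumption
z_30 = z_score(30.0)  # how unusual a 30-minute ETA is

print(one_sd, two_sd, z_30)
```

A common rule of thumb is to treat anything beyond about 3 standard deviations as worth investigating; the 30-minute ETA is well past that.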
Practical Applications:
Developing realistic customer expectations
Identifying genuine anomalies vs. normal variation
Comparing groups effectively (are ranges comparable?)
Chapter 2: Relationships and Causation
2.1 Correlation - Finding the Real Connections
The golden rule: Correlation ≠ Causation, but it's often a clue worth pursuing.
Modern Examples That Mislead People:
"VPN users have better credit scores" (Both are linked to tech savviness)
"People who buy organic food live longer" (Wealth/education are the drivers, not buying organic food)
"Video gamers are more violent" (Third factor: young men)
The Correlation Checklist:
Is there a hidden third factor?
Does the timing fit?
Has anyone ever done a controlled experiment?
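The first checklist question can be made concrete with a small simulation. In this sketch, a hidden factor (labeled `wealth`) drives two otherwise unrelated variables; all numbers are synthetic and the variable names are only borrowed from the organic-food example above. Subtracting the hidden factor works here because the data are built with a coefficient of exactly 1; with real data you would estimate that relationship, for example by regression.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
wealth = [rng.gauss(0, 1) for _ in range(n)]      # hidden third factor
organic = [w + rng.gauss(0, 1) for w in wealth]   # driven by wealth plus noise
lifespan = [w + rng.gauss(0, 1) for w in wealth]  # also driven by wealth plus noise

# Raw correlation: looks like organic food "predicts" lifespan.
raw_r = pearson(organic, lifespan)

# Remove the part explained by wealth, then correlate what is left over.
adj_r = pearson([o - w for o, w in zip(organic, wealth)],
                [l - w for l, w in zip(lifespan, wealth)])

print(f"raw correlation:      {raw_r:.2f}")
print(f"after removing wealth: {adj_r:.2f}")
```

The raw correlation is clearly positive even though neither variable affects the other; once the hidden factor is accounted for, it collapses toward zero.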
2.2 The Hierarchy of Evidence
Not all statistical proof is equal:
Randomized Controlled Trials (Gold standard)
Natural Experiments (Next best)
Observational Studies (Flawed but everywhere)
Anecdotes (Worthless statistically)
Case Study: When Airbnb tested stripping out profile photos to cut down on discrimination:
Observational data said it wouldn't work
Randomized test showed it reduced bias by 21%
Chapter 3: Statistical Testing Made Practical
3.1 p-Values - What They Really Mean
Forget the textbook definition. Here's what top researchers think:
"A p-value tells you how surprised you should be by the data if your initial assumption was correct."
Interpretation Framework:
p < 0.01: Strong evidence
p < 0.05: Moderate evidence
p > 0.05: Inconclusive (not "no effect"!)
Common Mistakes:
Thinking p=0.04 means "96% true"
Ignoring effect size (a very small effect can be "significant")
Multiple testing problems (run 20 tests where nothing is going on and, on average, 1 will come out p<0.05 by chance)
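The multiple-testing mistake is easy to quantify. This sketch assumes 20 independent tests in which no real effect exists. The Bonferroni threshold at the end is one common correction that I am adding for illustration; it is not something this guide prescribes.

```python
alpha = 0.05  # significance threshold per test
tests = 20    # number of independent tests, all with no real effect

# Average number of "significant" results produced by chance alone.
expected_false_positives = tests * alpha

# Probability that at least one of the tests comes out "significant".
p_at_least_one = 1 - (1 - alpha) ** tests

# Bonferroni correction (an illustrative choice): a stricter per-test threshold.
bonferroni_alpha = alpha / tests

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"chance of at least one:   {p_at_least_one:.0%}")
print(f"Bonferroni threshold:     {bonferroni_alpha}")
```

So with 20 looks at pure noise, a false alarm is more likely than not.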
3.2 Confidence Intervals - Your Safety Net
A 95% confidence interval means:
"If we repeated the study 100 times, about 95 of the intervals would contain the true value."
How to Use Them:
Wider interval = More uncertainty
Comparing groups? Check for overlap
Sample size matters - small studies produce wide intervals
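Both claims (the "95 out of 100" reading and the effect of sample size) can be checked by simulation. The sketch below draws samples from a made-up normal population and builds intervals with the normal approximation (mean ± 1.96 standard errors); the population values and helper names are assumptions for illustration only.

```python
import random
from statistics import NormalDist, fmean, stdev

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96
TRUE_MEAN, TRUE_SD = 8.0, 4.0      # hypothetical population
rng = random.Random(42)            # fixed seed so the run is repeatable

def draw(n):
    """One random sample of size n from the hypothetical population."""
    return [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]

def ci95(sample):
    """Approximate 95% confidence interval for the mean."""
    center = fmean(sample)
    half_width = Z95 * stdev(sample) / len(sample) ** 0.5
    return center - half_width, center + half_width

def coverage(n, trials=2000):
    """Share of repeated studies whose interval contains the true mean."""
    hits = 0
    for _ in range(trials):
        low, high = ci95(draw(n))
        hits += low <= TRUE_MEAN <= high
    return hits / trials

def width(interval):
    return interval[1] - interval[0]

small_width = width(ci95(draw(25)))   # small study
large_width = width(ci95(draw(400)))  # study 16x larger
share_covering = coverage(100)

print(f"interval width, n=25:  {small_width:.2f}")
print(f"interval width, n=400: {large_width:.2f}")
print(f"share of intervals containing the true mean: {share_covering:.1%}")
```

Width shrinks with the square root of the sample size, so 16 times the data buys an interval roughly 4 times narrower.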
Chapter 4: Modern Statistical Pitfalls
4.1 Big Data, Bigger Nonsense
More data means more ways to fool yourself:
False patterns: Random noise looks real with enough data
Selection bias: Your sample isn't representative
Overfitting: Models that are too good at fitting past data and nothing else
Defense Plan:
Always have a holdout set
Simulate how your analysis would perform on new data
Remember: Humans > Algorithms at spotting nonsense
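Here is a toy sketch of why the holdout set matters. The data are pure noise, and the "overfit model" simply memorizes every training point; both are deliberately extreme, hypothetical constructions meant to show the gap between training error and holdout error.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: y is pure noise around 10, so x carries no real signal.
data = [(x, 10 + rng.gauss(0, 2)) for x in range(200)]
rng.shuffle(data)
train, holdout = data[:150], data[150:]  # keep 25% aside, untouched

train_mean = fmean(y for _, y in train)
memory = dict(train)  # the "overfit" model: a lookup table of training points

def overfit_predict(x):
    """Perfect recall on seen x values, falls back to the mean otherwise."""
    return memory.get(x, train_mean)

def baseline_predict(x):
    """Always predicts the training mean."""
    return train_mean

def mse(predict, rows):
    """Mean squared error of a predictor on (x, y) rows."""
    return fmean((predict(x) - y) ** 2 for x, y in rows)

print("overfit  train:", mse(overfit_predict, train),
      "holdout:", round(mse(overfit_predict, holdout), 2))
print("baseline train:", round(mse(baseline_predict, train), 2),
      "holdout:", round(mse(baseline_predict, holdout), 2))
```

The memorizing model scores a perfect zero on the data it has seen and does no better than the plain average on the data it has not, which is exactly what a holdout set is there to expose.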
4.2 AI's Statistical Blind Spots
Large language models:
Hallucinate fake stats with confidence
Can't distinguish correlation from causation
Inherit and scale our biases
Protection Plan:
Always verify critical stats with original sources
Ask "How could this be wrong?"
Use AI for exploration, not answers
Chapter 5: Your Statistical Toolkit
5.1 The Essential Analyses
1. A/B Testing Framework:
Select a single metric to improve
Randomize properly
Calculate sample size in advance
Watch for surprise side effects
2. Cohort Analysis:
Compare groups over time
Reveal hidden patterns in retention, engagement etc.
3. Trend Analysis:
Always adjust for seasonality before reading a trend
Compare to multiple baselines
Watch for changepoints
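For the "calculate sample size in advance" step of the A/B testing framework, a rough estimate takes only a few lines. This sketch uses the standard normal-approximation formula for comparing two conversion rates; the baseline and target rates, the 5% significance level, and the 80% power are illustrative assumptions, and `sample_size_per_group` is a name I made up.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect p_base -> p_target
    with a two-sided test (normal approximation for two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_target - p_base) ** 2)

# Hypothetical scenarios: a 10% baseline conversion rate.
n_small_lift = sample_size_per_group(0.10, 0.12)  # hoping for 12%
n_large_lift = sample_size_per_group(0.10, 0.15)  # hoping for 15%

print("per group, 10% -> 12%:", n_small_lift)
print("per group, 10% -> 15%:", n_large_lift)
```

The smaller the lift you want to detect, the more users you need, and the growth is steep: halving the detectable difference roughly quadruples the sample.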
5.2 Free Tools That Do the Math For You
Tool | Best For | Learning Curve
JASP | GUI-based statistical analysis | Low
RStudio (with swirl) | Learning while doing | Medium
Google Sheets | Quick analyses | Very low
Observable | Visual exploration | Medium
Chapter 6: Developing Statistical Intuition
6.1 The Daily Practice
Build your statistical spidey-sense by:
Questioning headlines: "Says who? How? Why?"
Estimating first: Guess before calculating
Spotting patterns: Play statistical games like "Guess the Correlation"
6.2 Statistical Fallacy Bingo
Common mistakes to watch for:
Cherry-picking | P-hacking | Survivorship Bias
Overfitting | Texas Sharpshooter | Gambler's Fallacy
Multiple Comparisons | Causation Errors | Base Rate Neglect
Conclusion: Statistics as a Superpower
In an era of disinformation, statistical thinking is not just useful - it's essential for:
Making better personal decisions
Uncovering lies
Driving business value
Being a responsible citizen
Your Action Plan:
Today: Apply one concept from this guide at work
This Week: Find and debunk a misleading statistic
This Month: Master a new analysis technique inside and out
Free Resource: [Download the Statistical Thinking Field Guide] - with cheat sheets, case studies, and practice exercises.
Remember: You don't need to be a math genius to think statistically. You just need curiosity, skepticism, and the right mental models. Now go out there and make more intelligent decisions!
Introduction: The Statistical Literacy Crisis
In our data-drenched world, statistical illiteracy is more hazardous than the traditional sort of illiteracy. Just consider these real-world implications:
A hospital nearly rejected an effective COVID treatment because workers misinterpreted p-values
A Fortune 500 organization lost $2 million acting on "statistically significant" findings that were not significant in practice
Thousands of individuals make poor money/health decisions due to not being able to recognize cooked statistics
15 years of explaining statistics to executives, doctors, and policy makers have impressed upon me a radical insight: You don't require advanced math to think statistically. You require the appropriate mental models.
This guide will give you:
The 20% of statistical concepts that solve 80% of real-life choices
Practical models used by top data scientists (jargon-free)
Red flags for detecting statistical manipulation
Freeware utilities that get the heavy math done
Chapter 1: The Essentials You Really Need
1.1 The Averages Trilogy - Mean, Median, Mode Simplified
The Issue: The wrong average leads to poor decisions. When one big-box retailer looked at "average" customer spend, they forgot that:
Mean: $85 (skewed by a few big spenders)
Median: $32 (actual typical customer)
Mode: $19.99 (most common buy)
When to Use Which:
Average Type | Best For | Watch Out For
Mean | Normally distributed data | Outliers warping results
Median | Income, house prices, anything with outliers | Becoming insensitive to true values
Mode | Questionnaire responses, bestsellers | Multiple modes causing confusion
Pro Tip: Always plot your data first. The histogram never lies.
1.2 Variation - The Most Important Concept Nobody Teaches
Standard deviation isn't an equation - it's your reality check. What we learned at Uber was:
Average ETA: 8 minutes
Standard deviation: 4 minutes
This equated to:
68% of rides: 4-12 minutes
95% of rides: 0-16 minutes
That "30 minute ETA" was a system glitch
Practical Applications:
Setting reasonable customer expectations
Identifying actual anomalies vs. daily variation
Correctly comparing groups (are the ranges similar?)
Chapter 2: Relationships and Causation
2.1 Correlation - Identifying the Real Connections
The golden rule: Correlation ≠ Causation, but it's usually a hint worth following.
Modern Examples That Catch People Out:
"VPN users have better credit scores" (Both are correlated with tech savvy)
"Those who purchase organic food live longer" (Wealth/education are the true drivers)
"Video gamers are more violent" (Third variable: young males)
The Correlation Checklist:
Could there be a third, unnamed factor?
Does the timing fit?
Has someone done a controlled experiment?
2.2 The Hierarchy of Evidence
Not all statistical proof is equal:
Randomized Controlled Trials (Gold standard)
Natural Experiments (Next best)
Observational Studies (Problematic but widespread)
Anecdotes (Worthless statistically)
Case Study: When Airbnb tested eliminating profile photos to minimize discrimination:
Observational information showed it wouldn't succeed
Randomized trial showed it lowered bias by 21%
Chapter 3: Statistical Testing Made Practical
3.1 p-Values - What They Really Mean
Forget the textbook definition. Here's what top researchers actually believe:
"A p-value tells you how surprised you should be by the data if your initial assumption was correct."
Interpretation Framework:
p < 0.01: Strong evidence
p < 0.05: Moderate evidence
p > 0.05: Inconclusive (not "no effect"!)
Common Mistakes:
Assuming p=0.04 implies "96% true"
Not considering effect size (a minuscule effect can be "significant")
Multiple testing issues (run 20 tests where nothing is going on and, on average, 1 will come out p<0.05 by chance)
3.2 Confidence Intervals - Your Safety Net
A 95% confidence interval means:
"If we repeated the study 100 times, about 95 of the intervals would contain the true value."
How to Use Them:
Larger interval = Greater uncertainty
Are you comparing groups? Look for overlap
Sample size matters - small studies produce wide intervals
Chapter 4: Contemporary Statistical Snafus
4.1 Big Data, Bigger Nonsense
The more data you have, the more traps you can fall into:
Spurious patterns: With lots of data, random noise starts to look significant
Selection bias: Your data are not representative
Overfitting: Models that fit old data perfectly and nothing else
Defense Strategy:
Always confirm you have a holdout dataset
Simulate your analysis on fresh data
Reminder: Humans > Algorithms at detecting nonsense
4.2 AI's Statistical Blind Spots
Large language models:
Easily hallucinate meaningless stats
Can't distinguish correlation from causation
Inherit and amplify our biases
Protection Plan:
Always cross-check important stats with original sources
Ask "How could this be wrong?"
Use AI for exploration, not conclusions
Chapter 5: Your Statistical Toolkit
5.1 The Must-Have Analyses
1. A/B Testing Framework:
Select one thing to optimize
Randomize correctly
Calculate sample size in advance
Test for unintended side effects
2. Cohort Analysis:
Compare cohorts over time
Reveal subtle patterns in retention, engagement, etc.
3. Trend Analysis:
Always adjust for seasonality before reading a trend
Compare to several baselines
Look for changepoints
5.2 Free Tools That Do Math For You
Tool | Best For | Learning Curve
JASP | GUI statistical analysis | Low
RStudio (with swirl) | Learning by doing | Medium
Google Sheets | Rapid analyses | Very low
Observable | Data visualization | Medium
Chapter 6: Developing Statistical Intuition
6.1 The Daily Practice
Build your statistical spidey-sense by:
Questioning headlines: "Says who? How? Why?"
Estimating first: Guess before calculating
Spotting patterns: Play statistical games like "Guess the Correlation"
6.2 Statistical Fallacy Bingo
Common mistakes to watch for:
Cherry-picking | P-hacking | Survivorship Bias
Overfitting | Texas Sharpshooter | Gambler's Fallacy
Multiple Comparisons | Causation Errors | Base Rate Neglect
Conclusion: Statistics as a Superpower
In an age of misinformation, statistical thinking isn't just useful - it's essential for:
Making better personal decisions
Spotting manipulation
Driving business value
Being an informed citizen
Your Action Plan:
Today: Apply one tip from this guide to your work
This Week: Find and discredit a misleading statistic
This Month: Master one new analysis skill
Remember: You don't need to be a mathematician to think statistically. You just need curiosity, skepticism, and the right mental models. Now go make better decisions.
