Anscombe’s Quartet

Yale statistician Frank Anscombe devised this demonstration in 1973. Here are four datasets, each with 11 (x,y) points:

I              II             III            IV
x     y        x     y        x     y        x     y
10.0  8.04     10.0  9.14     10.0  7.46     8.0   6.58
8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
13.0  7.58     13.0  8.74     13.0  12.74    8.0   7.71
9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
11.0  8.33     11.0  9.26     11.0  7.81     8.0   8.47
14.0  9.96     14.0  8.10     14.0  8.84     8.0   7.04
6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
4.0   4.26     4.0   3.10     4.0   5.39     19.0  12.50
12.0  10.84    12.0  9.13     12.0  8.15     8.0   5.56
7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

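Each set produces nearly the same summary statistics: the mean of x is 9 and the mean of y is about 7.50, the standard deviations match to two decimal places, and the correlation between x and y is about 0.816. But their graphs are strikingly different: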

https://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg
Image: Wikimedia Commons

The lesson, Anscombe said, is to “make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.”
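
The "calculations" half of that advice can be sketched in a few lines of Python. This is only an illustration, not Anscombe's own procedure: the names QUARTET, pearson_r, and summarize are invented for this example, and the standard deviation is assumed to be the sample (n - 1) version.

```python
from math import sqrt
from statistics import mean, stdev

# The four datasets, copied from the table above.
# Sets I-III share the same x values; set IV has its own.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X_FOUR = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]

QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_FOUR, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                    5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarize(xs, ys):
    """Return the summary statistics mentioned in the text for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_x": stdev(xs),  # sample standard deviation (n - 1)
        "sd_y": stdev(ys),
        "r": pearson_r(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

# Print one row per dataset, rounded to two decimal places.
for name, s in summaries.items():
    print(f"{name:>3}  mean_x={s['mean_x']:.2f}  mean_y={s['mean_y']:.2f}  "
          f"sd_x={s['sd_x']:.2f}  sd_y={s['sd_y']:.2f}  r={s['r']:.2f}")
```

Rounded to two decimal places, the four rows should come out the same, which is exactly why the numbers alone cannot tell the datasets apart.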

Justin Matejka and George Fitzmaurice created a similar collection in 2017: the Datasaurus Dozen.

(Thanks, Rick.)

09/05/2021 UPDATE: Here’s an animation of the Datasaurus Dozen. (Thanks, Eric.)