We have four datasets, each containing 11 pairs of X and Y measurements. All four datasets share the same X variable and differ only in their Y values.
Here are the summary statistics for each of the four Y variables (A, B, C, D):
That's right - the means and standard deviations are all identical. Now let's go one step further and fit the four simple linear regression models Y = a + bX + noise. Remember, the X is the same in all four datasets. Here is the output for the first dataset:
|Adjusted R Square|0.317163771|

|Coefficients|Standard Error|t Stat|P-value|
Guess what? The other three regression outputs are identical!
So are the four Y variables identical???
Well, here is the answer:
To top it off, Basset et al. included one more dataset that has exactly the same summary statistics and regression estimates. Here is the scatterplot:
[You can find all the data for both Anscombe's and Basset et al.'s examples here]
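If you want to check the phenomenon yourself, here is a minimal Python sketch. It does not use the Basset et al. numbers from this post (they are not reproduced here); as a stand-in it uses sets I-III of Anscombe's quartet, which also share a single X vector, so the values it prints will not match the Adjusted R Square shown above. The helper names `fit_line` and `summarize` are made up for this illustration.

```python
from statistics import mean, stdev

# Stand-in data: Anscombe's quartet, sets I-III (one shared X vector).
# This is NOT the Basset et al. data discussed in the post.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
}


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = my - b * mx               # intercept
    r2 = sxy ** 2 / (sxx * syy)   # squared correlation
    return a, b, r2


def summarize(x, y):
    """Summary statistics of y plus the regression estimates of y on x."""
    a, b, r2 = fit_line(x, y)
    n = len(x)
    # Adjusted R-squared for a model with a single predictor.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"mean": mean(y), "sd": stdev(y), "intercept": a,
            "slope": b, "r2": r2, "adj_r2": adj_r2}


results = {name: summarize(X, y) for name, y in Y.items()}

for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the three rows agree even though the Y columns are clearly different, which is why the scatterplots are the only way to tell the datasets apart.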
Slides 3-4 from a presentation of Gramener show another neat example (with data from India) of the same phenomenon: