We have four datasets, each containing 11 pairs of X and Y measurements. All four datasets share the same X values and differ only in their Y values.
Here are the summary statistics for each of the four Y variables (A, B, C, D):

| | A | B | C | D |
|---|---|---|---|---|
| Average | 20.95 | 20.95 | 20.95 | 20.95 |
| Std | 1.495794 | 1.495794 | 1.495794 | 1.495794 |

That's right: the means and standard deviations are all identical. Now let's go one step further and fit the four simple linear regression models Y = a + bX + noise. Remember, X is the same in all four datasets. Here is the output for the first dataset:

| Regression Statistics | |
|---|---|
| Multiple R | 0.620844098 |
| R Square | 0.385447394 |
| Adjusted R Square | 0.317163771 |
| Standard Error | 1.236033081 |
| Observations | 11 |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 18.43 | 1.12422813 | 16.39347 | 5.2E-08 |
| Slope | 0.28 | 0.11785113 | 2.375879 | 0.041507 |

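The numbers in this output are tied to each other and to the summary statistics by a few standard formulas. The short check below is only a sketch: it copies the reported values and recomputes the derived ones, assuming that "Std" is the sample standard deviation (dividing by n - 1). The variable names are mine, not the spreadsheet's.

```python
import math

# Values copied from the summary table and the regression output.
n = 11                                        # Observations
std_y = 1.495794                              # Std of Y (assumed: sample standard deviation)
r_square = 0.385447394                        # R Square
intercept, se_intercept = 18.43, 1.12422813   # Intercept and its standard error
slope, se_slope = 0.28, 0.11785113            # Slope and its standard error

# Multiple R is the square root of R Square in simple linear regression.
multiple_r = math.sqrt(r_square)

# Adjusted R Square penalizes for the single predictor.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)

# Residual standard error, recovered from the spread of Y and the unexplained share.
standard_error = std_y * math.sqrt((1 - r_square) * (n - 1) / (n - 2))

# Each t statistic is the coefficient divided by its standard error.
t_intercept = intercept / se_intercept
t_slope = slope / se_slope

print(multiple_r, adjusted_r_square, standard_error, t_intercept, t_slope)
```

The P-values are left out because they require the t distribution with n - 2 = 9 degrees of freedom, which the Python standard library does not provide.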
Guess what? The other three regression outputs are identical!
So are the four Y variables identical???
Well, here is the answer:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5_Y0zRcYJ_i5bvTlyeKT6zIgrBFiPUrQztzJdHAhubvoSz4MbN5YSq9JQ8jz-rpO960jFB2kEkKtRZDuI7qM0EJfscb3XmsWAPMLjsc18a1awKof26alGKv495ZCVjHMc2N0LQg/s400/Anscombe.gif)
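One way to see how this can happen: the mean, the standard deviation, and every regression estimate depend only on the fitted line and on the size of the residuals, not on their pattern. The sketch below builds two visibly different Y variables that share all of those numbers. Everything in it is illustrative: the X values (integers 4 through 14), the two residual patterns, and the helper names `ols` and `make_y` are my assumptions, not the data or method behind the plots.

```python
import statistics

def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_square)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b, sxy ** 2 / (sxx * syy)

def make_y(x, shape, a, b, sse):
    """Return y = a + b*x + e, where e keeps the pattern of `shape` but is
    made orthogonal to the constant and to x, then rescaled so that its
    sum of squares equals `sse`."""
    a0, b0, _ = ols(x, shape)
    e = [si - (a0 + b0 * xi) for xi, si in zip(x, shape)]  # residuals of shape on x
    scale = (sse / sum(ei ** 2 for ei in e)) ** 0.5
    return [a + b * xi + scale * ei for xi, ei in zip(x, e)]

# Hypothetical inputs: these are NOT the post's data.
x = list(range(4, 15))                              # 11 assumed X values
sse = (11 - 2) * 1.236033081 ** 2                   # residual sum of squares implied by the reported standard error
curved = [(xi - 9) ** 2 for xi in x]                # smooth curvature
outlier = [1.0 if xi == 13 else 0.0 for xi in x]    # a single unusual point

y_curved = make_y(x, curved, 18.43, 0.28, sse)
y_outlier = make_y(x, outlier, 18.43, 0.28, sse)

for name, y in (("curved", y_curved), ("outlier", y_outlier)):
    a, b, r2 = ols(x, y)
    print(name, round(statistics.fmean(y), 4), round(statistics.stdev(y), 4),
          round(a, 4), round(b, 4), round(r2, 4))
```

Both printed rows agree on the mean, standard deviation, intercept, slope, and R Square, yet a scatterplot of the two series would look nothing alike, which is exactly why the plot is worth drawing before trusting the table.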
To top it off, Basset et al. included one more dataset that has exactly the same summary statistics and regression estimates. Here is the scatterplot:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh59mFVfQlnOejULHMjVh3Dc3iL1ZbtWguzCeg98xfsXONwU5kLr0YpiMbOUlbsOGaAQZVRvkhHSobdpfr5iDMQU8VWLOwS3AkIDFY0lQC6-G4BQ6Xm5hCR_s3wa_guzTkX2PjhcQ/s320/Scatter+Plot_E.gif)
[You can find all the data for both Anscombe's and Basset et al.'s examples here]