Monday, July 30, 2012

Launched new book website for Practical Forecasting book

Last week I launched a new website for my textbook Practical Time Series Forecasting. The website offers resources such as the datasets used in the book, a news block that pushes posts to the book's Facebook page, information about the book and author, and, for instructors, online forms for requesting an evaluation copy and for requesting access to solutions.

I am already anticipating my colleagues' question "what platform did you use?". Well, I did not hire a web designer, nor did I spend three months putting the website together using HTML. Instead, I used Google Sites. This is a great solution for those who like to manage their book website on their own (whether you're self-publishing or not). Very readable, clean design, integration with other Google Apps components (such as forms), and as hack-proof as it gets. Not to mention easy to update and maintain, and free hosting.

Thanks to the tools and platforms offered by Google and Amazon, self-publishing is not only a realistic option for authors; it also allows a much closer connection between the author and the book's users -- instructors, students and "independent" readers.


Wednesday, July 25, 2012

Explain/Predict in Epidemiology

Researchers in various fields have been sending me emails and reactions after reading my 2010 paper "To Explain or To Predict?". While I am aware of research methodology in a few areas, I'm learning in more detail about the scientific challenges caused by "predictive-less" areas.

In an effort to further disseminate this knowledge, I'll be posting these reactions in this blog (with the senders' approval, of course).

In a recent email, Stan Young, Assistant Director for Bioinformatics at NISS, commented about the explain/predict situation in epidemiology:
"I enjoyed reading your paper... I am interested in what I think is [epidemiologists] lack of clarity on explain/predict. They seem to take the position that no matter how many tests they compute, that any p-value <0.05 is a strong indication of something real (=explain) and that everyone should follow their policies (=predict) when, given all their analysis problems, they at the very best should consider their claims as hypothesis generating."
In a talk by epidemiology Professor Uri Goldbourt, who was a discussant in a recent "Explain or Predict" panel, I learned that modeling in epidemiology is nearly entirely descriptive. Unlike explanatory modeling, there is little underlying causal theory. And there is no prediction or evaluation of predictive power going on. Modeling typically focuses on finding correlations between measurable variables in observational studies that generalize to the population (and hence the wide use of inference, and unfortunately, a huge issue of multiple testing).

Predictive modeling has huge potential to advance research in epidemiology. Among many benefits (such as theory validation), it would bring the field closer to today's "personalized" environment: not only concentrating on "average patterns", but also generating personalized predictions for individuals.

I'd love to hear more from epidemiologists! Please feel free to post comments or to email me directly.

Tuesday, July 24, 2012

Linear regression for binary outcome: even better news

I recently attended the 8th World Congress in Probability and Statistics, where I heard an interesting talk by Andy Tsao. His talk "Naivity can be good: a theoretical study of naive regression" (Abstract #0586) was about the use of Naive Regression, which is the application of linear regression to a categorical outcome, treating the outcome as numerical. He asserted that predictions from Naive Regression will be quite good. My last post was about the "goodness" of a linear regression applied to a binary outcome in terms of the estimated coefficients. That's what explanatory modeling is about. What Dr. Tsao alerted me to is that the predictions (or, more correctly, the classifications) will be good too. In other words, it's useful for predictive modeling! In his words:
"This naivity is not blessed from current statistical or machine learning theory. However, surprisingly, it delivers good or satisfactory performances in many applications."
Note that to derive a classification from naive regression, you treat the prediction as the class probability (although it might be negative or >1), and apply a cutoff value as in any other classification method.
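To make the cutoff step concrete, here is a minimal sketch in plain Python. The data, the variable names and the 0.5 cutoff are all made up for illustration; they are not from Dr. Tsao's talk. It fits an ordinary least-squares line to a 0/1 outcome with a single predictor, treats the fitted values as (rough) class probabilities, and then applies the cutoff.

```python
# Hypothetical toy data: one predictor x and a binary outcome y coded 0/1.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for simple linear regression.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# "Naive" predictions: treated as class probabilities, although they are
# not constrained to lie between 0 and 1.
scores = [intercept + slope * xi for xi in x]

# Turn the predictions into classifications with a cutoff (0.5 is an
# assumption here; any other cutoff could be used).
cutoff = 0.5
classes = [1 if s >= cutoff else 0 for s in scores]

for xi, s, c in zip(x, scores, classes):
    print(f"x={xi}  prediction={s:.3f}  class={c}")
```

In this toy example the prediction for x=1 is negative and the prediction for x=8 exceeds 1, yet the cutoff still yields a sensible classification for every record.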


Dr. Tsao pointed me to the good old The Elements of Statistical Learning, which has a section called Linear Regression of an Indicator Matrix. There are two interesting takeaways from Dr. Tsao's talk:
  1. Naive Regression and Linear Discriminant Analysis will have the same ROC curve, meaning that the ranking of predictions will be identical.
  2. If the two groups are of equal size (n1=n2), then Naive Regression and Linear Discriminant Analysis are equivalent and therefore produce the same classifications.
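
The two takeaways can be checked numerically on a small example. The sketch below uses plain Python and hypothetical toy data with two predictors and five records per class. It assumes the textbook two-class form of linear discriminant analysis (pooled within-class covariance, equal priors, score cut at 0) and a 0.5 cutoff for the naive regression. The helper `solve` and all other names are illustrative only.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical toy data: two predictors, equal group sizes (n1 = n2 = 5).
X0 = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4)]   # class 0
X1 = [(3, 1), (4, 3), (5, 2), (5, 4), (6, 3)]   # class 1
X = X0 + X1
y = [0] * len(X0) + [1] * len(X1)

# Naive regression: least squares of the 0/1 outcome on (1, x1, x2),
# obtained by solving the normal equations.
rows = [[1.0, a, b] for a, b in X]
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
beta = solve(XtX, Xty)
ols_scores = [sum(bi * ri for bi, ri in zip(beta, r)) for r in rows]
ols_classes = [1 if s >= 0.5 else 0 for s in ols_scores]

# Two-class LDA with pooled covariance and equal priors.
def mean(points):
    return [sum(p[k] for p in points) / len(points) for k in range(2)]

def scatter(points, m):
    return [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points)
             for j in range(2)] for i in range(2)]

m0, m1 = mean(X0), mean(X1)
S0, S1 = scatter(X0, m0), scatter(X1, m1)
dof = len(X) - 2
pooled = [[(S0[i][j] + S1[i][j]) / dof for j in range(2)] for i in range(2)]
w = solve(pooled, [m1[k] - m0[k] for k in range(2)])
mid = [(m0[k] + m1[k]) / 2 for k in range(2)]
lda_scores = [sum(w[k] * (p[k] - mid[k]) for k in range(2)) for p in X]
lda_classes = [1 if s >= 0 else 0 for s in lda_scores]

def ranking(scores):
    """Record indices ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=scores.__getitem__)

print("same ranking:        ", ranking(ols_scores) == ranking(lda_scores))
print("same classifications:", ols_classes == lda_classes)
```

Identical rankings imply identical ROC curves, since an ROC curve depends only on the order of the scores. With unequal group sizes the rankings would still agree, but the two cutoffs would no longer correspond to each other, so the classifications could differ.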