Wednesday, July 27, 2011

Analytics: You want to be in Asia

Business Intelligence and Data Mining have become hot buzzwords in the West. Using Google Insights for Search to "see what the world is searching for" (see image below), we can see that the popularity of these two terms seems to have stabilized (if you expand the search to 2007 or earlier, you will see the earlier peak and also that Data Mining was hotter for a while). Click on the image to get to the actual result, with which you can interact directly. There are two very interesting insights from this search result:
  1. Looking at the "Regional Interest" for these terms, we see that the #1 country searching for these terms is India! Hong Kong and Singapore are also in the top 5. A surge of interest in Asia!
  2. Adding two similar terms that contain the word Analytics, namely Business Analytics and Data Analytics, reveals a growing interest in Analytics (whereas the two non-analytics terms have stabilized after their peak).
What to make of this? First, it means Analytics is hot. Business Analytics and Data Analytics encompass methods for analyzing data that add value to a business or any other organization. Analytics includes a wide range of data analysis methods: visual analytics, descriptive and explanatory modeling, and predictive analytics. The toolkit runs from statistical modeling, to interactive visualization (like the one shown here!), to machine-learning algorithms and more. Companies and organizations are hungry for methods that can turn their huge and growing amounts of data into actionable knowledge. And the hunger is most pressing in Asia.
Click on the image to refresh the Google Insights for Search result (in a new window).

Thursday, July 14, 2011

Designing an experiment on a spatial network: To Explain or To Predict?

Spatial data are inherently important in environmental applications. An example is collecting data from air or water quality sensors. Such data collection mechanisms introduce dependence in the collected data, driven by the spatial proximity of (or distance between) the sensors. This dependence must be taken into account not only in the data analysis stage (and there is a good statistical literature on spatial data analysis methods), but also in the design of experiments stage. Typical design questions are where to locate the sensors and how many sensors are needed.

Where does explain vs. predict come into the picture? An interesting 2006 article by Dale Zimmerman called "Optimal network design for spatial prediction, covariance parameter estimation, and empirical prediction" tells the following story:
"...criteria for network design that emphasize the utility of the network for prediction (kriging) of unobserved responses assuming known spatial covariance parameters are contrasted with criteria that emphasize the estimation of the covariance parameters themselves. It is shown, via a series of related examples, that these two main design objectives are largely antithetical and thus lead to quite different “optimal” designs" 
(Here is the freely available technical report).
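
To make the prediction-oriented criterion from the quote concrete, here is a minimal Python sketch that scores a sensor layout by its average simple-kriging variance over a grid, assuming the covariance parameters are known. Everything specific in it is an assumption for illustration and is not taken from the article: the exponential covariance model, the sill and range values, the unit-square region, and the two nine-sensor layouts. It illustrates only the prediction side of the story, not the covariance-estimation criterion.

```python
import numpy as np

def exp_cov(a, b, sill=1.0, range_=0.3):
    """Exponential covariance between two sets of 2-D points (assumed model)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)

def mean_kriging_variance(sensors, grid, sill=1.0, range_=0.3):
    """Average simple-kriging variance over `grid` for a given sensor layout.

    Assumes a known mean and known covariance parameters.
    """
    K = exp_cov(sensors, sensors, sill, range_)   # sensor-to-sensor covariances
    k = exp_cov(sensors, grid, sill, range_)      # sensor-to-grid covariances
    weights = np.linalg.solve(K, k)               # kriging weights, one column per grid point
    variance = sill - np.sum(k * weights, axis=0)
    return float(variance.mean())

# Prediction grid covering the unit square (hypothetical study region).
g = np.linspace(0.0, 1.0, 21)
grid = np.array([(x, y) for x in g for y in g])

# Layout A: nine sensors spread evenly over the region.
s = np.array([1 / 6, 1 / 2, 5 / 6])
spread = np.array([(x, y) for x in s for y in s])

# Layout B: nine sensors clustered near the center.
c = np.array([0.45, 0.50, 0.55])
clustered = np.array([(x, y) for x in c for y in c])

v_spread = mean_kriging_variance(spread, grid)
v_clustered = mean_kriging_variance(clustered, grid)
print(f"spread layout:    mean kriging variance = {v_spread:.3f}")
print(f"clustered layout: mean kriging variance = {v_clustered:.3f}")
```

Under these assumptions the evenly spread layout gives the lower average prediction variance, because no grid point is far from a sensor. A layout chosen to estimate the covariance parameters would be scored by a different criterion, which is where the two objectives can pull apart.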