Tuesday, December 20, 2011

Trading and predictive analytics

I attended today's class in the course Trading Strategies and Systems offered by Prof Vasant Dhar from NYU Stern School of Business. Luckily, Vasant is offering the elective course here at the Indian School of Business, so no need for transatlantic travel.

The topic of this class was the use of news in trading. I won't disclose any trade secrets (you'll have to attend the class for that), but here's my point: Trading is a striking example of the distinction between explanation and prediction. Generally, trading techniques are based on correlations and on "black-box" predictive models such as neural nets. In particular, text mining and sentiment analysis are used for extracting information from (often unstructured) news articles for the purpose of prediction.
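To make the "prediction, not explanation" idea concrete, here is a minimal sketch of learning word-to-direction associations from labeled headlines. It is not anything shown in the class: the headlines and labels are invented toy data, the names `TRAIN`, `train` and `predict` are hypothetical, and a simple naive Bayes classifier stands in for the neural nets mentioned above.

```python
import math
from collections import Counter

# Toy, invented training data: (headline, direction of the next price move).
# These are NOT real news items or real market outcomes.
TRAIN = [
    ("profit beats forecast", "up"),
    ("record sales growth", "up"),
    ("profit warning issued", "down"),
    ("sales miss forecast", "down"),
]


def train(examples):
    """Count words per label; returns (word_counts, label_counts, vocab)."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior for the label.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "record profit growth"))
```

The model never "understands" a headline; it only exploits which words co-occurred with which outcome, which is exactly the correlation-based, predictive use of text described above.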

Vasant mentioned the practical advantage of a machine-learning approach for extracting useful content from text over linguistics know-how. This reminded me of a famous comment by Frederick Jelinek, a prominent Natural Language Processing researcher who passed away recently:
"Whenever I fire a linguist our system performance improves" (Jelinek, 1998)
This comment was based on Jelinek's experience at IBM Research, while working on computer speech recognition and machine translation.

Jelinek's comment did not make linguists happy. He later defended this claim in a paper entitled "Some of My Best Friends are Linguists" by commenting,
"We all hoped that linguists would provide us with needed help. We were never reluctant to include linguistic knowledge or intuition into our systems; if we didn't succeed it was because we didn't find an efficient way to include it."
Note: there are some disputes regarding the exact wording of the quote ("Anytime a linguist leaves the group the recognition rate goes up") and its timing -- see note #1 in the Wikipedia entry.

Wednesday, December 07, 2011

Polleverywhere.com -- how it worked out

Following up on my earlier post about the use of polleverywhere.com for polling in class, here is a summary of my experience using it in a data mining elective course @ ISB (38 students, after four sessions):
  • Creating polls: After a few tries and with a few very helpful tips from a PollEverywhere (PE) representative, I was able to create polls and embed them into my PowerPoint slides. This is relatively easy and user-friendly. One feature that I use a lot and that is currently missing in PE is the inclusion of a figure on the poll slide (for example, a snippet of some software output). Although you can paste the image on the PPT slide, it takes a bit of testing to place it so that it does not overlap the poll. Also, if you need to use the poll in a browser instead of the PPT (see below), the image won't be there...
  • Operation in class: PE requires a good Internet connection for the instructor and for all users, whether they vote from laptops or from other devices on the wireless network. Although wireless is generally operational in the classroom that I used, there were a few times when it was flaky, which is very disruptive (the poll does not load; students cannot respond). Secondly, I found that voting takes much longer with mobiles/laptops than with clickers. What would have taken 30 seconds with clickers can take several minutes with PE voting.
  • Student adoption: During the first session students were curious and quickly figured out how to vote. Students could vote using a browser (I created the page pollev.com/profgalit where live polls would show up), while those lacking Internet access used their mobiles to tweet via SMS (Airtel free SMS to 53000; other carriers SMS to Bangalore number 09243000111 via smstweet.in). As the sessions progressed, the number of voters dropped drastically. I suspected that this might be a result of my changing the settings to allow only registered users to vote. So I switched back to "anyone can vote", yet the voting percentage remained very low.
I have never graded voting; rather, I use it as a fun active learning tool. With clickers, the response rate was typically around 80-90%, while with PE it is currently lower than 50%. Given our occasional Internet challenges, the longer voting time, and especially the low response rate, I will be going back to clickers for now.

I foresee that PE would work nicely in a setting such as a one-time talk at a large conference, or a one-day workshop for execs. I will also mention the excellent and timely support by PE. And, of course, the low price!