Thursday, October 16, 2014

What's in a name? "Data" in Mandarin Chinese

The term "data", now popularly used in many languages, is not as innocent as it seems. The biggest controversy that I've been aware of is whether the English term "data" is singular or plural. The tone of an entire article would be different based on the author's decision.

In Hebrew, the word is plural (Netunim, with the final "im" marking the plural), so no question arises.

Today I discovered another "data" duality, this time in Mandarin Chinese. In Taiwan, the term used is 資料 (Zīliào), while in Mainland China it is 數據 (Shùjù). Which one to use? What is the difference? I did a little research and tried a few popularity tests:

  1. Google Translate from Chinese to English translates both terms to "data". But English-to-Chinese translates "data" to 數據 (Shùjù), with the other term appearing as secondary. Here we also learn that 資料 (Zīliào) means "material".
  2. Chinese Wikipedia's main data article (embarrassingly poor) is for 數據 (Shùjù) and the article for 資料 (Zīliào) redirects you to the main article.
  3. A Google search for each term leads to surprising results on the number of hits:

Search results for "data" term Zīliào

Search results for "data" term Shùjù

I asked a few colleagues from different Chinese-speaking countries and learned further that 資料 (Zīliào) translates to "information". A Google Images search for the term brings up images of "Information". This might also explain the double hit rate. A duality between data and information is especially interesting given the relationship between the two (and my related work on Information Quality with Ron Kenett).


So what about Big Data? Here too, several terms appear to be in use, yet the most popular seems to be 大数据 (Dà shùjù), which also has a reasonably respectable Wikipedia article.

Thanks to my learned colleagues Chun-houh Chen (Academia Sinica), Khim Yong Goh (National University of Singapore), and Mingfeng Lin (University of Arizona) for their input.