Rajesh Rao, Computing a Rosetta Stone for the Indus Script

Wed Jul 13 15:03:34 UTC 2011

On Jul 13, 2011, at 5:36 AM, Dominik Wujastyk wrote:

> TED talk, March 2011:
>
> begin quote:

> Rajesh Rao is fascinated by "the mother of all crossword puzzles":  
> How to decipher the 4000 year old Indus script. At TED 2011 he tells  
> how he is enlisting modern computational techniques to read the  
> Indus language, the key piece to understanding this ancient  
> civilization.

> end quote.

There is nothing new in Rao's claims, which were thoroughly debunked
(among other places) by Richard Sproat in an invited article in  
_Computational Linguistics_ less than a year ago. See Richard Sproat,  
"Ancient Symbols, Computational Linguistics, and the Reviewing  
Practices of the General Science Journals," Computational Linguistics  
36, 3 (Sept. 2010), 585-94.

You can download the full article (open access) here:

http://www.mitpressjournals.org/doi/abs/10.1162/coli_a_00011

As Richard argues, articles like the original paper by Rao in  
_Science_ that started this ball rolling should never have been  
published -- and say more about the degradation of standards in peer  
review practices (triggered in part by vastly increased information  
flows we are experiencing today) than about computational linguistics.

The flaws in Rao's work are so obvious to computational linguists
(computational linguistics, it is important to note, is not Rao's
field, which explains in part the linguistic naivete in his work)
that the same claim, that Rao's research was not properly reviewed,
was in fact made immediately after Rao's first paper appeared by a
long series of computational linguists besides Sproat, including
most prominently Mark Liberman and Fernando Pereira.

For their comments and analysis, and the related analysis by the
mathematician Cosma Shalizi, see this Language Log post, written
immediately after Rao's first paper was published:

http://languagelog.ldc.upenn.edu/nll/?p=1374

There is no need to repeat their technical arguments here. In brief,  
leaving aside mathematical niceties (for those, see the links above):  
the fact that there is order of some sort in Indus symbols has been  
known since the 1920s. GR Hunter demonstrated that using nothing more  
sophisticated than pencil and paper charts in his 1929 doctoral thesis  
on Indus signs. All that Rao has replicated using complex means is  
what any simple eyeballing of the signs makes immediately apparent.

What Hunter and Rao (and many others before him who made similar  
claims, going back to the 1960s, about the "magic of computers" in  
"deciphering" the "script") didn't bother to mention: all symbol
strings of every sort have order in them; this includes boy scout  
medals, horoscopal signs, alchemical symbols, mnemonic signs, magical  
symbols, clan signs, the signs on Kudurru stones, or conventional  
orders of saints or saint attributes in iconographical works.
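
A minimal sketch of that point, not a reproduction of Rao's, Sproat's,
or anyone else's analysis. The eight-sign inventory and the "fixed rank
order" display convention below are invented for illustration only; the
code compares a plug-in estimate of bigram conditional entropy for
strings that follow such a convention with the same kind of strings in
random order.

```python
import math
import random
from collections import Counter


def conditional_entropy(strings):
    """Plug-in estimate of H(next sign | previous sign), in bits."""
    pair_counts = Counter()
    prev_counts = Counter()
    for s in strings:
        for a, b in zip(s, s[1:]):
            pair_counts[(a, b)] += 1
            prev_counts[a] += 1
    total = sum(pair_counts.values())
    h = 0.0
    for (a, b), n in pair_counts.items():
        # weight p(a, b) times log of the conditional p(b | a)
        h -= (n / total) * math.log2(n / prev_counts[a])
    return h


rng = random.Random(0)      # fixed seed so the run is repeatable
signs = list("ABCDEFGH")    # hypothetical sign inventory

# Hypothetical nonlinguistic convention: each string shows 4 distinct
# signs, always displayed in a fixed rank order (A before B before C ...).
ordered = [sorted(rng.sample(signs, 4)) for _ in range(2000)]

# Same kind of strings, but with the 4 signs in random order.
shuffled = [rng.sample(signs, 4) for _ in range(2000)]

h_ordered = conditional_entropy(ordered)
h_shuffled = conditional_entropy(shuffled)

print(f"ordered convention: {h_ordered:.2f} bits")
print(f"random order:       {h_shuffled:.2f} bits")
```

The ordered strings come out with lower conditional entropy than the
shuffled ones even though nothing in them encodes speech, which is why
a measure of this kind cannot by itself separate linguistic from
nonlinguistic sign systems.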

You can even find order of the same sort in modern multi-symbol  
airport and highway signs. You can also show from cross-cultural
analyses of highway signs (Michael Witzel has made an interesting  
collection of these for our amusement) that there are different  
"dialects" of these symbols, none of which has to do with them  
supposedly encoding different "languages."

As Farmer, Sproat, and Witzel showed in 2004, the kind of order that  
you find in Indus symbols shows up as well in the order of 'blazons'  
or medieval heraldic signs -- which obviously doesn't suggest that  
heraldic signs encode "language", as ordinarily understood.

Sproat and his students are now embarked on a project studying the
various orders in different types of nonlinguistic signs, funded by  
grants from the National Science Foundation.

More sensationalist nonsense has been written about the so-called  
Indus script than about any other pseudo-script I can think of --
grossly skewing our understanding of Indus civilization -- although
the recent nonsense about "Pictish language" (inspired by Rao's work)
comes close. On this, see again Liberman's trenchant remarks in the
Language Log:

http://languagelog.ldc.upenn.edu/nll/?p=2227

See also here, where Liberman also points to Sproat's definitive  
article in _Computational Linguistics_, which "poses the question that  
I [Liberman] was too polite to ask":

> How is it that papers that are so trivially and demonstrably wrong  
> get published in journals such as Science or the Proceedings of the  
> Royal Society?

I personally think that the answer to that question has to do with the  
marketing uses of sensationalism in a period in which traditional  
subscription-based journals are forced to compete with open access  
materials, and editors succumb to the temptations of publishing papers
so sensational that they are sure to get noticed in the popular press.

We know that there was fierce inside opposition at Science magazine to
publishing Rao's original paper, and yet Science, citing a "lack of
space," refused to publish even a short letter refuting the paper,
despite the widespread criticism the paper engendered from
computational linguists.

Very rushed comments above (on a deadline that has nothing to do with  
anything Indus).

Regards,
Steve Farmer