Our group has been applying a variety of techniques, drawn from computational linguistics, machine learning, dynamic programming, statistical physics and Bayesian probability, to uncover patterns and regularities in the Indus script. Each successive analysis has strengthened the case for an underlying language, though we have no way yet of finding out what that language might be. Our final aim is to abstract a grammar from the patterns and regularities we find in the script. The aspects of syntax uncovered by our study will, we feel, constrain decipherment claims which posit semantic values for the Indus signs.

Papers on quantitative studies of the Indus script from our group

  • N. Yadav, M. N. Vahia, I. Mahadevan, H. Joglekar, A statistical approach for pattern search in the Indus writing, International Journal of Dravidian Linguistics, 37, 39 (2008). PDF
  • N. Yadav, M. N. Vahia, I. Mahadevan, H. Joglekar, Segmentation of Indus texts, International Journal of Dravidian Linguistics, 37, 53 (2008). PDF
  • N. Yadav, H. Joglekar, R. P. N. Rao, M. N. Vahia, I. Mahadevan, R. Adhikari, Statistical analysis of the Indus script using n-grams, submitted for publication, available at
  • R. P. N. Rao, N. Yadav, M. N. Vahia, H. Joglekar, R. Adhikari, I. Mahadevan, Entropic evidence for linguistic structure in the Indus script, Science, 324, 1165 (2009). PDF
  • R. P. N. Rao, N. Yadav, M. N. Vahia, H. Joglekar, R. Adhikari, I. Mahadevan, A Markov model for the Indus script, PNAS, Early Edition, Aug 5 (2009). PDF

Response to criticisms of Science paper

Our paper in Science shows that the conditional entropy of bigrams is able to distinguish sequences of linguistic tokens from non-linguistic ones which occur in the real world. Specifically, we have compared several languages (Sumerian, Old Tamil, English, Sanskrit) with several non-linguistic systems (DNA sequences, protein sequences, FORTRAN computer code) and found that the conditional entropies of linguistic systems cluster within a narrow band in the allowed space, while those of non-linguistic systems cluster around the extrema, with either very high or very low values. The Indus script falls exactly in the middle of the linguistic band. In the light of our previous work, where we show that the Indus script shares several other features common to language (Zipf-Mandelbrot unigram distributions, clear presence of beginners and enders, significant pairs and triplets), this increases the probability of the linguistic hypothesis for the script, and decreases the probability of the non-linguistic hypothesis.
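As a rough illustration of the quantity discussed above, the conditional entropy of bigrams can be estimated from adjacent token pairs as H(Y|X) = -Σ P(x,y) log2 P(y|x). The sketch below is a minimal version under stated assumptions: it uses raw counts with no smoothing, and the function name `bigram_conditional_entropy` and the toy sequences are hypothetical. It is not the estimator or the data used in the paper.

```python
import math
from collections import Counter


def bigram_conditional_entropy(sequences):
    """Estimate H(Y|X) in bits from adjacent token pairs.

    `sequences` is an iterable of token sequences (strings or lists).
    Assumption: plain maximum-likelihood counts, no smoothing.
    """
    pair_counts = Counter()   # counts of (x, y) with y immediately after x
    first_counts = Counter()  # counts of x as the first element of a pair
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1

    total = sum(pair_counts.values())
    if total == 0:
        return 0.0

    h = 0.0
    for (x, y), n in pair_counts.items():
        p_xy = n / total                  # joint probability P(x, y)
        p_y_given_x = n / first_counts[x]  # conditional probability P(y | x)
        h -= p_xy * math.log2(p_y_given_x)
    return h


# A rigidly ordered toy sequence: each token fully determines the next.
rigid = bigram_conditional_entropy(["ABABABAB"])
# A toy sequence in which each token is followed by A or B equally often.
flexible = bigram_conditional_entropy(["AABBA"])
print(rigid, flexible)
```

Rigidly ordered sequences give values near zero and unconstrained sequences give values near the maximum, log2 of the number of distinct tokens. On small corpora, raw counts underestimate the entropy, so a real analysis would need some form of smoothing.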

Our paper has been criticised by several American researchers. The criticism, which has included unsubstantiated and uncollegial off-the-cuff remarks like "garbage in, garbage out" and ad hominem attacks questioning our ideological moorings (Dravidian nationalists, and then morphing to Indian nationalists after the critic learnt that only one author was a "Dravidian"!), is based mostly on misunderstanding (in the case of the honest critics) and on ignoring material in our Science paper as well as our previous work listed above. We provide a detailed clarification of the nature of our work in the link below.

Further contributions from our group on this issue will continue in journals refereed by the academic community.

Red Herrings

There has been much talk from a group of American researchers on how the Indus script is "non-linguistic". Two points made ad nauseam by those researchers are the presence of a large number of singletons (signs which occur only once) in the Indus script, and the very short length of the texts, which average about 5 signs. The latter point forms the basis of an alleged "one sentence proof" that the Indus script is non-linguistic. Below we show examples of ancient writing systems which have a large number of singletons and use texts of short lengths, and which by all accounts are known to be linguistic.


The graph below, provided by Bryan Wells, compares distributions of sign usage for Proto-Sumerian, Proto-Elamite, and the Indus script. The comparison is striking. Thus, the Indus script shares close similarities with at least two other linguistic writing systems of the ancient world in the usage of singletons. It is by no means an exception.


Short Texts

The two graphs, again provided by Bryan Wells, compare the text-length distributions in the Uruk script (above) with the Indus script (below). The mean text length in Uruk is about 7 signs, while the mean text length in the Indus script is about 5 signs. Again, we find that the Indus script shares a very close similarity with an ancient linguistic writing system, in that both use, on average, short texts.
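The two statistics at issue in this section, the number of singletons and the mean text length, are simple to compute for any corpus of sign sequences. The following is a minimal sketch; the function name `corpus_stats` and the toy corpus are hypothetical and do not represent the Indus, Uruk, Proto-Sumerian, or Proto-Elamite data.

```python
from collections import Counter


def corpus_stats(texts):
    """Return singleton and text-length statistics for a list of texts.

    Each text is a sequence of sign labels. A singleton is a sign that
    occurs exactly once in the whole corpus.
    """
    sign_counts = Counter(sign for text in texts for sign in text)
    n_singletons = sum(1 for n in sign_counts.values() if n == 1)
    n_distinct = len(sign_counts)
    return {
        "distinct_signs": n_distinct,
        "singletons": n_singletons,
        "singleton_fraction": n_singletons / n_distinct if n_distinct else 0.0,
        "mean_text_length": (
            sum(len(t) for t in texts) / len(texts) if texts else 0.0
        ),
    }


# Toy corpus of three short "texts" (hypothetical sign labels).
toy = [["a", "b", "c"], ["a", "b"], ["a", "d", "e", "f", "b"]]
stats = corpus_stats(toy)
print(stats)
```

Comparing these numbers across corpora is only meaningful when the corpora are of similar size, since the singleton fraction depends on how much text has survived.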

