Lexicostatistics
is a much abused and misunderstood technique. In my experience lexicostatistics
is quite good at:
Consequently,
lexicostatistical analyses can play an important role in interpreting the
evidence of comparative phonology and grammar when modeling language family
history.
The data
and analysis presented here reflect work I carried out in January 2010. Fifty-four
languages, representing 13 putative branches, were selected and analysed. Some of
the data and cognate scores were taken from Peiros, Ilia J. 2004. Genetičeskaja klassifikacija
avstroaziatskix jazykov [Genetic classification of the Austroasiatic languages]. Moscow:
Rossijskij gosudarstvennyj gumanitarnyj universitet [Russian State University for the
Humanities] (doctoral dissertation).
Cognate
scores were coded according to the table at this
link. The data were then processed using Jacques Guy's Glotto software, which
can be downloaded free of charge from sil.org. This
produced the matrix of percentages, which can be viewed
here. The software automatically generates the following dendrogram/Stammbaum.
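Glotto's internal algorithm is not described here, so the following is only a minimal Python sketch of the general procedure, under stated assumptions. Each language is assumed to be coded as a list holding one cognate-class label per meaning slot (None where no form is available); the score for a pair is the percentage of shared labels among the slots both languages attest; and the tree is built by UPGMA (average-linkage) clustering on 100 minus that percentage. UPGMA is a common choice for lexicostatistical dendrograms but may not be what Glotto actually implements, and the languages A to D are invented toy data, not Austroasiatic results.

```python
from itertools import combinations

# Hypothetical toy data (NOT real Austroasiatic data): one cognate-class
# label per meaning slot; None marks a missing form.
DATA = {
    "A": ["1", "1", "2", "1", None],
    "B": ["1", "1", "2", "2", "1"],
    "C": ["1", "2", "1", "2", "1"],
    "D": ["2", "2", "1", "2", "2"],
}


def cognate_percent(xs, ys):
    """Percentage of shared cognate classes over slots attested in both lists."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if not pairs:
        return 0.0
    shared = sum(1 for x, y in pairs if x == y)
    return 100.0 * shared / len(pairs)


def percentage_matrix(data):
    """Symmetric matrix of pairwise cognate percentages, keyed by (lang, lang)."""
    langs = list(data)
    pct = {}
    for a, b in combinations(langs, 2):
        pct[(a, b)] = pct[(b, a)] = cognate_percent(data[a], data[b])
    return langs, pct


def upgma(langs, pct):
    """Average-linkage (UPGMA) clustering on distance = 100 - percentage.

    Returns the tree as nested tuples of language names.
    """
    clusters = {name: [name] for name in langs}  # subtree -> its leaf names

    def dist(t1, t2):
        # Mean distance over all leaf pairs drawn from the two subtrees.
        l1, l2 = clusters[t1], clusters[t2]
        return sum(100.0 - pct[(a, b)] for a in l1 for b in l2) / (len(l1) * len(l2))

    while len(clusters) > 1:
        # Merge the closest pair of subtrees (the first pair found wins ties).
        t1, t2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters[(t1, t2)] = clusters.pop(t1) + clusters.pop(t2)
    return next(iter(clusters))


langs, pct = percentage_matrix(DATA)
tree = upgma(langs, pct)
print(tree)
```

With the toy data, A and B (75%) are joined first, then C and D (60%), so the script prints `(('A', 'B'), ('C', 'D'))`. Excluding missing slots from the denominator, rather than counting them as non-cognate, is one way to keep patchy word lists from depressing their scores.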
A
curious fact is that all branches show an underlying trend towards
higher than expected scores with Katuic and Bahnaric, a trend that declines with
geographic distance from Katuic and Bahnaric. The exception is Munda, which shows
significant Katuic-Bahnaric isoglosses despite its great distance, compared with,
say, Nicobarese. This was first pointed out by Franklin Huffman in 1978.
My
preliminary interpretation, consistent with Huffman, is that Austroasiatic
languages dispersed quickly after a long period of being in proximity and
contact, centred along the
Click here to download the most recent revision of the
data/cognate assignments.
Comments
and corrections on my data and analyses are welcome. Please email me at paulsidwell@yahoo.com.
Several
earlier trials were conducted. During April and May I worked through data sets of
24, 26, 28 and 30 languages. The data and cognate scores for the 30-language set are at this link; the earlier trials are subsets of it.
The neighbour net created from these
data is here.