Lexicostatistics is a much abused and misunderstood technique. In my experience lexicostatistics is quite good at:
Consequently, lexicostatistical analyses can play an important role in interpreting the evidence of comparative phonology and grammar when modelling language family history.
The data and analysis presented here reflect work I carried out in January 2010. Fifty-four languages, representing 13 putative branches, were selected and analysed. Some of the data and cognate scores were taken from Peiros, Ilia J. 2004. Genetičeskaja klassifikacija avstroaziatskix jazykov. Moskva: Rossijskij gosudarstvennyj gumanitarnyj universitet (doktorskaja dissertacija).
Cognate scores were coded according to the table at this link. The data was then processed using Jacques Guy's Glotto software, which can be downloaded freely from sil.org. This produced the matrix of percentages, which can be viewed here. The software automatically generates the following dendrogram (Stammbaum).
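For readers who want to see the general shape of this kind of processing, here is a minimal sketch in Python. It is not Glotto's algorithm, which is not described here; it assumes each language is coded with one cognate-class number per meaning (None where data is missing), computes the percentage of shared classes for each pair, and then groups languages by simple average linkage as a stand-in tree-building step. The language names, the codes and the function names are all invented for illustration.

```python
from itertools import combinations

def cognate_percentages(codes):
    """Pairwise percentage of meanings coded with the same cognate class.

    codes maps a language name to a list of cognate-class codes, one per
    meaning, with None for a missing item. Meanings missing in either
    language are left out of that pair's comparison.
    """
    pct = {}
    for a, b in combinations(sorted(codes), 2):
        pairs = [(x, y) for x, y in zip(codes[a], codes[b])
                 if x is not None and y is not None]
        shared = sum(1 for x, y in pairs if x == y)
        pct[frozenset((a, b))] = 100.0 * shared / len(pairs) if pairs else 0.0
    return pct

def average_linkage(pct):
    """Repeatedly merge the two clusters with the highest mean percentage.

    Returns the grouping as nested tuples. This is an assumed stand-in for
    whatever tree-building method the original software applies.
    """
    langs = sorted({lang for pair in pct for lang in pair})
    clusters = [((lang,), lang) for lang in langs]  # (members, subtree)

    def score(c1, c2):
        vals = [pct[frozenset((x, y))] for x in c1[0] for y in c2[0]]
        return sum(vals) / len(vals)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: score(clusters[ij[0]], clusters[ij[1]]))
        merged = (clusters[i][0] + clusters[j][0],
                  (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]

# Invented toy codes for four hypothetical languages and five meanings.
toy = {
    "LangA": [1, 1, 1, 1, 1],
    "LangB": [1, 1, 1, 1, 2],
    "LangC": [1, 2, 2, 1, 3],
    "LangD": [2, 2, 2, None, 3],
}
toy_pct = cognate_percentages(toy)
toy_tree = average_linkage(toy_pct)
print(toy_tree)
```

With these toy codes LangA and LangB share 4 of 5 items and LangC and LangD share 3 of the 4 items available for both, so the two pairs are grouped first and joined last. Real lexicostatistical software may treat missing items, loans and ties differently.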
The curious fact is that there is an underlying trend for all branches to show higher than expected scores with respect to Katuic and Bahnaric, declining with geographic distance from Katuic and Bahnaric (except for Munda, which shows significant Katuic-Bahnaric isoglosses despite great distance, compared to, say, Nicobarese). This was actually first pointed out by Franklin Huffman in 1978.
preliminary interpretation, consistent with Huffman, is that Austroasiatic languages dispersed quickly after a long period of being in proximity and contact, centred along the
Comments and corrections on my data and analyses are invited. Please email me at firstname.lastname@example.org
Several earlier trials were conducted. Over April and May I worked through data sets of 24, 26, 28 and 30 languages. The data and cognate scores for the 30-language set are at this link. The previous trials are subsets of that set. The neighbour net created from this data is here.