Archive for October, 2007

Philological Considerations on the Whence of the Maori

Posted on October 13th, 2007 by Simon Greenhill


Conal directs me towards a wonderful paper from J.T. Thomson in 1873, Philological Considerations on the Whence of the Maori, which is particularly relevant to the recent rates of word evolution work, and manages to beat lexicostatistics and Morris Swadesh by a good 75 years:

Primary words, i.e., those that express first wants in men in their infancy—and, equally so, tribes or nations in their infancy—are the most tenacious of existence. These are common nouns, pronouns, and verbs, but more particularly the first—such as man, woman, son, daughter, food, fruit, fish, etc.; or, I, you, he, we, etc.; or, go, come, give, kill, etc. In elucidating a subject such as this, therefore, we apply our enquiries to primary terms, which we may denominate as the fossils of the languages, so that we may, from their coincidence or approximations in different and distant communities, weigh the affinities of race or blood in the communities themselves.

But while primary words are the most lasting, yet they even are subject to slow and gradual change as ages roll on.

...and even more lexicostatistics-y:

Reverting, then, to the glossarial branch of the subject, in order to fairly weigh the respective affinities of the different races under review, as read by language, I must recall your attention to the fact stated in my former paper as to the relative number of primary words retained by an European language after eight hundred years of disconnection; these amount to only about one twenty-sixth of the whole. Mr. John Crawford, by his investigations, has declared that one fifty-seventh of the Malagasi and one-fiftieth of the Maori dictionaries were Malay, thus proving a connection whose intimacy on European experience can be approximately calculated. But I may venture to remark, from my own enquiries on the same subject, that had the above ethnographer or myself had the advantage of a critical knowledge of both or all languages, instead of only one (the Malay), double the equivalents might be found, and the approaches thus drawn nearer by half. Thus, Crawford states that out of 8,000 Malagasi words he detected only 140 Malayan; while I, out of Griffiths’ grammar, containing certainly not more than 500 words, detected eighty, in words that have had preservation throughout the whole region.
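Thomson's ratios are easy to check. Here is a quick sketch that uses only the figures given in the passage and turns each one into a percentage and a "one in N" ratio, in the 19th-century phrasing:

```python
from fractions import Fraction

# Figures as quoted by Thomson (1873); no new data is introduced here.
counts = {
    "European language after 800 years (Thomson)": (1, 26),
    "Crawford: Malay words among Malagasi words": (140, 8000),
    "Thomson: Malay words in Griffiths' grammar (at most ~500 words)": (80, 500),
}

def share(matches: int, total: int) -> Fraction:
    """Return the proportion of matching words as an exact fraction."""
    return Fraction(matches, total)

for label, (matches, total) in counts.items():
    p = share(matches, total)
    # round() on a Fraction returns the nearest int, giving the "1 in N" form.
    print(f"{label}: {float(p):.2%} (about 1 in {round(1 / p)})")
```

Crawford's 140 out of 8,000 works out to about one in 57, which matches the "one fifty-seventh" he reported. Thomson's 80 out of roughly 500 is about one in six.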




What the F***, Steven Pinker?

Posted on October 11th, 2007 by Simon Greenhill


Steven Pinker on why we curse, in “What the F***?”:

 The Clean Airwaves Act assumed that fucking is a participial adjective. But this is not correct. With a true adjective like lazy, you can alternate between Drown the lazy cat and Drown the cat which is lazy. But Drown the fucking cat is certainly not interchangeable with Drown the cat which is fucking.




Rates of word evolution: The less a word is used, the faster it evolves

Posted on October 10th, 2007 by Simon Greenhill


Today’s Nature has a couple of lovely little papers on language evolution. The first of these is Frequency of word-use predicts rates of lexical evolution throughout Indo-European history by some colleagues and friends of mine, Mark Pagel, Quentin Atkinson, and Andy Meade.

Here, Mark et al. begin to explore a rather important issue in historical linguistics: how fast do words evolve? Put simply, one common way of inferring historical relationships between languages is to look for systematic sound correspondences between words. For example, the English word “water” is related to the German word “Wasser” by a regular sound change from “t” to “s”. We can track these changes to find sets of homologous words, or “cognates”: words that descend from a common word in some ancestral language.

For example, I’ve mapped the words meaning “hand” in a number of Austronesian languages (from my research project) in the figure below. The colors mark the different cognate sets. The white set, with forms like lima, is very widespread and represents a form that’s been passed down from the ancestor of all Austronesian languages, Proto-Austronesian, which had an inferred form like *(qa)lima. You can also see a few other sets in the figure, including a few forms in Micronesia (colored red and orange) and a small blue set in Maluku.
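Before any tree-building, cognate judgements like these are usually coded as a presence/absence matrix, with one binary character per cognate set. Here is a minimal sketch of that step. The language names and set labels are hypothetical placeholders, not the data behind the map.

```python
from typing import Dict, List

# Hypothetical cognate assignments for the meaning "hand" (placeholders, not real data).
hand_cognates: Dict[str, str] = {
    "LangA": "lima",     # reflex of the widespread *(qa)lima set
    "LangB": "lima",
    "LangC": "setRed",   # a local innovation
    "LangD": "setBlue",  # another local innovation
}

def binary_coding(assignments: Dict[str, str]) -> Dict[str, List[int]]:
    """Code each cognate set as a 0/1 character (1 = language has a form in that set)."""
    sets = sorted(set(assignments.values()))
    return {
        lang: [1 if cognate_set == s else 0 for s in sets]
        for lang, cognate_set in assignments.items()
    }

matrix = binary_coding(hand_cognates)
# Column order is the sorted list of cognate-set labels.
print(sorted(set(hand_cognates.values())))
for lang, row in matrix.items():
    print(lang, row)
```

A language with two synonyms for the same meaning would simply get a 1 in more than one column. To keep things short, the dictionary input here allows only one set per language.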

So, given that we can use these cognates to track historical relationships (and do cool things with them), one of the big questions is how stable they are. Over time, words change, get replaced and die out, making it harder and harder to find the systematic sound correspondences we need. Most linguists would argue from experience that these historical signals in the lexicon are washed out by around 10,000 years. For comparison, none of the languages in the map is older than around 6,000 years, and some of them, such as the Micronesian group, can’t be older than around 2,000 years. Yet even over that span, the word for “hand” has been exceptionally stable.


[Figure: words for “hand” in Austronesian languages, colored by cognate set]

To try to quantify this, Mark, Quentin and Andy estimated the rates of evolution of a number of words in the Indo-European languages (which include most of the languages of Europe, such as English, French and German). To do this, they used a sample of basic vocabulary to estimate a phylogenetic tree representing the historical relationships among these languages. They then estimated, for each meaning, the rate at which cognate sets were replaced along the tree. In other words: how long does it take for one cognate set to be replaced by another, unrelated form?
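Pagel, Atkinson and Meade fitted statistical models of replacement on the tree. A much cruder stand-in is to count the minimum number of cognate-set replacements on a fixed tree with Fitch parsimony, then divide by the total time the tree represents. Here is a sketch of that simpler approach. The tree, the cognate labels and the total branch length are all hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree as nested 2-tuples; leaves are language names.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass: return the candidate state set at this node
    and the minimum number of state changes (replacements) in the subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1

def replacements_per_10k_years(tree: Tree, states: Dict[str, str],
                               total_tree_years: float) -> float:
    """Parsimony change count divided by the summed branch lengths, per 10,000 years."""
    _, changes = fitch(tree, states)
    return changes / total_tree_years * 10_000

# Hypothetical four-language tree and cognate-set labels for one meaning.
tree = (("L1", "L2"), ("L3", "L4"))
states = {"L1": "a", "L2": "a", "L3": "b", "L4": "b"}
# Hypothetical sum of all branch lengths, in years.
print(replacements_per_10k_years(tree, states, total_tree_years=20_000))
```

Parsimony ignores multiple hits on long branches, so it tends to underestimate fast rates. That is one reason likelihood-based models are preferred for this kind of work.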

Their results show a 100-fold difference in the rates of word evolution within their basic vocabulary sample. Some words, such as “two”, “who”, “tongue” and “night”, show a very slow rate of evolution, with around one cognate set replacement per 10,000 years. Others, such as “dirty”, “to stab” and “guts”, changed much faster, with around 9 cognate set replacements per 10,000 years.
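To get a feel for what those rates imply, one common simplifying assumption (not necessarily the model used in the paper) is that replacements happen as a constant-rate Poisson process. Under that assumption, the chance that a meaning keeps its original cognate after t years is exp(-r t), and its half-life is ln 2 / r. Here are the two rates quoted above plugged in:

```python
import math

def retention_probability(rate_per_10k: float, years: float) -> float:
    """P(no replacement in `years`) under a constant-rate Poisson model."""
    return math.exp(-rate_per_10k / 10_000 * years)

def half_life_years(rate_per_10k: float) -> float:
    """Time by which half of such words would be expected to have been replaced."""
    return math.log(2) / (rate_per_10k / 10_000)

for label, rate in [("slow (e.g. 'two')", 1.0), ("fast (e.g. 'dirty')", 9.0)]:
    print(f"{label}: half-life ~{half_life_years(rate):,.0f} years, "
          f"P(retained after 10,000 years) = {retention_probability(rate, 10_000):.4f}")
```

Under these assumptions, a slow word has roughly a 37% chance (e^-1) of keeping its cognate over 10,000 years, while a fast one has almost none.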

Big question: why is there this difference? One of the theories is that how fast a word evolves is linked to how often it’s used, so words that are used a lot don’t change as much as words that are used rarely. To test this hypothesis, Mark & co worked out how often their words were used in four large spoken and written language databases (aka “corpora”) for English, Spanish, Russian and Greek.
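Corpus frequencies are usually expressed per million words, so that corpora of different sizes can be compared. Here is a minimal sketch using a made-up toy token list. Real work would use the tokenised corpora themselves, and would also need to map inflected forms onto the meanings being studied, which this skips.

```python
from collections import Counter
from typing import Dict, Iterable, List

def per_million(tokens: Iterable[str], targets: List[str]) -> Dict[str, float]:
    """Frequency of each target word per million tokens (case-insensitive)."""
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    # Counter returns 0 for words that never occur.
    return {w: counts[w] * 1_000_000 / total for w in targets}

# Toy token list standing in for a real corpus.
toy_corpus = "the night was dark and the night was long two dogs".split()
print(per_million(toy_corpus, ["night", "two", "guts"]))
```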

In each of these four languages, there was a highly significant negative correlation (r = -0.32 to -0.41, p < 0.0001) between the frequency of words in the corpora and their rates of evolution. That is, words that are used more today have had slower rates of evolution, supporting the hypothesis above.
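The correlation itself is simple to compute. This sketch calculates Pearson's r between log frequency and replacement rate. The numbers are invented purely to show the calculation and are not the paper's data. Taking logs of frequency is also an assumption here, since raw word frequencies are heavily skewed.

```python
import math
from statistics import mean
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented illustration: frequency per million and replacements per 10,000 years.
freq_per_million = [900.0, 400.0, 120.0, 30.0, 8.0]
rate_per_10k = [0.5, 1.2, 2.0, 4.5, 8.0]

log_freq = [math.log10(f) for f in freq_per_million]
print(round(pearson_r(log_freq, rate_per_10k), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient.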

One potential flaw here was that different parts of speech are used with different frequencies, so the correlation might simply reflect differences between word classes. To check this, they recategorised their entries into classes such as nouns, verbs, pronouns, conjunctions, prepositions and special adverbs (“what”, “when”, “where”, “how”, “there” and “not”). Using a regression model, they controlled for word class and showed that the correlation between word use and rate of evolution still held for each class of word.
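One simple way to control for part of speech (not necessarily the exact regression the authors ran) is to centre both variables within each word class and then estimate a single pooled slope. By the Frisch–Waugh–Lovell theorem, this gives the same frequency coefficient as an ordinary least-squares regression with a dummy variable for each class. Again, the data below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (word class, log10 frequency, replacements per 10,000 years): invented values.
rows: List[Tuple[str, float, float]] = [
    ("noun", 2.5, 1.0), ("noun", 1.5, 3.0), ("noun", 0.5, 6.0),
    ("verb", 2.0, 2.0), ("verb", 1.0, 4.5), ("verb", 0.2, 8.0),
    ("pronoun", 3.5, 0.3), ("pronoun", 3.0, 0.6),
]

def within_class_slope(data: List[Tuple[str, float, float]]) -> float:
    """Slope of rate on log frequency after removing each class's means
    (equal to the frequency coefficient in OLS with class dummies)."""
    by_class: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for cls, x, y in data:
        by_class[cls].append((x, y))
    sxy = sxx = 0.0
    for pairs in by_class.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

print(round(within_class_slope(rows), 3))
```

Centring within classes matters because class differences alone can create a misleading pooled trend. For example, two classes that each show a positive slope can produce a negative slope when lumped together.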

This is beautifully elegant stuff, and it has a lot of potential. One of the huge debates in linguistics revolves around deep history: as mentioned above, most linguists argue that all historical signal is lost by around 10,000 years. However, as this paper shows, some words are very stable, and they might be used to push things a little bit deeper…

Update: Nature has a video interview with Mark and Quentin.




Language and scientific publication

Posted on October 10th, 2007 by Simon Greenhill


Q: Dear Nature, if the quality of writing helps someone get published, is this fair for non-native English speakers? Read the interesting debate on “Ask the Nature Editor”.

Editor Henry Gee continues the discussion on his blog:

…scientists in some disciplines keep their papers inaccessible on purpose, as they are less reports of great discoveries than placeholders in the never-ending battle between competing research groups. The relationships that such authors have with journals is congruent with that between dogs and lamp-posts. The urinous signal is meaningful to other dogs, if not to cats or horses. But it’s the same old lamp-post that gets peed on.

Huh, the urinated-upon-lamp-post theory of scientific publishing?




Wednesday wiki: Photic Sneeze Reflex

Posted on October 9th, 2007 by Simon Greenhill


Today’s fine Wikipedia article for your educational pleasure: the Photic Sneeze Reflex!

…is a medical condition by which people sneeze with sudden exposure to bright light, and possibly also to sneeze many times consecutively. It is also referred to as photic sneeze response, sun sneezing, photogenic sneezing, the photosternutatory reflex, or even whimsically as ACHOO syndrome with its related backronym Autosomal dominant Compelling Helio-Ophthalmic Outburst syndrome. The condition occurs in 17% to 35% of humans. The condition is passed along genetically as an autosomal dominant trait.
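"Autosomal dominant" means that a single copy of the allele is enough to show the trait. Here is a small Punnett-square sketch that enumerates the expected offspring of a heterozygous carrier and a non-carrier. It uses the placeholder allele symbols A (dominant, trait-causing) and a.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1: str, parent2: str) -> Counter:
    """Each parent passes on one of its two alleles with equal probability."""
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))

def p_shows_dominant_trait(parent1: str, parent2: str, dominant: str = "A") -> Fraction:
    """Probability that a child carries at least one copy of the dominant allele."""
    genotypes = offspring_genotypes(parent1, parent2)
    affected = sum(n for g, n in genotypes.items() if dominant in g)
    return Fraction(affected, sum(genotypes.values()))

# Heterozygous carrier (Aa) x non-carrier (aa).
print(offspring_genotypes("Aa", "aa"))
print(p_shows_dominant_trait("Aa", "aa"))  # 1/2
```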




Coevolution of languages and genes on the island of Sumba, eastern Indonesia

Posted on October 9th, 2007 by Simon Greenhill


There’s a rather elegant little paper on the Coevolution of languages and genes on the island of Sumba, eastern Indonesia coming out in PNAS soon:

Numerous studies indicate strong associations between languages and genes among human populations at the global scale, but all broader scale genetic and linguistic patterns must arise from processes originating at the community level. We examine linguistic and genetic variation in a contact zone on the eastern Indonesian island of Sumba, where Neolithic Austronesian farming communities settled and began interacting with aboriginal foraging societies ~3,500 years ago.

Phylogenetic reconstruction based on a 200-word Swadesh list sampled from 29 localities supports the hypothesis that Sumbanese languages derive from a single ancestral Austronesian language. However, the proportion of cognates (words with a common origin) traceable to Proto-Austronesian (PAn) varies among language subgroups distributed across the island. Interestingly, a positive correlation was found between the percentage of Y chromosome lineages that derive from Austronesian (as opposed to aboriginal) ancestors and the retention of PAn cognates.

We also find a striking correlation between the percentage of PAn cognates and geographic distance from the site where many Sumbanese believe their ancestors arrived on the island. These language–gene–geography correlations, unprecedented at such a fine scale, imply that historical patterns of social interaction between expanding farmers and resident hunter-gatherers largely explain community-level language evolution on Sumba. We propose a model to explain linguistic and demographic coevolution at fine spatial and temporal scales.
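The two quantities in that last correlation are straightforward to compute. This sketch takes a hypothetical arrival site and hypothetical localities, with made-up coordinates and made-up counts of PAn cognates out of a 200-item list. For each locality it computes the great-circle (haversine) distance from the arrival site and its PAn-cognate percentage. Those two columns are what would then be correlated.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def pan_percentage(pan_cognates: int, list_size: int = 200) -> float:
    """Share of a word list traceable to Proto-Austronesian, as a percentage."""
    return 100.0 * pan_cognates / list_size

# Hypothetical arrival site and localities: coordinates and counts are made up.
arrival_site = (-9.6, 119.3)
localities = {
    "Locality1": ((-9.6, 119.4), 120),
    "Locality2": ((-9.7, 119.9), 105),
    "Locality3": ((-10.1, 120.5), 90),
}
for name, (coords, pan) in localities.items():
    print(f"{name}: {haversine_km(arrival_site, coords):.1f} km, "
          f"{pan_percentage(pan):.1f}% PAn")
```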




New MBE: Click languages, human-specific genes, mtDNA, selection & non-coding DNA

Posted on October 8th, 2007 by Simon Greenhill


The latest Molecular Biology and Evolution looks like a cracker & I’m going to cherry-pick some of the more interesting papers from Henry’s point of view.

The first one to jump out at me (and Michael who prodded me to post this) is History of Click-Speaking Populations of Africa Inferred from mtDNA and Y Chromosome Genetic Variation (open access):

Little is known about the history of click-speaking populations in Africa. Prior genetic studies revealed that the click-speaking Hadza of eastern Africa are as distantly related to click speakers of southern Africa as are most other African populations. The Sandawe, who currently live within 150 km of the Hadza, are the only other population in eastern Africa whose language has been classified as part of the Khoisan language family. Linguists disagree on whether there is any detectable relationship between the Hadza and Sandawe click languages. We characterized both mtDNA and Y chromosome variation of the Sandawe, Hadza, and neighboring Tanzanian populations.

New genetic data show that the Sandawe and southern African click speakers share rare mtDNA and Y chromosome haplogroups; however, common ancestry of the 2 populations dates back >35,000 years. These data also indicate that common ancestry of the Hadza and Sandawe populations dates back >15,000 years. These findings suggest that at the time of the spread of agriculture and pastoralism, the click-speaking populations were already isolated from one another and are consistent with relatively deep linguistic divergence among the respective click languages.
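Dates like these come from calibrating genetic divergence against a mutation rate. The simplest version is a strict molecular clock. Two lineages that split T years ago each accumulate about μT changes per site, so they differ at d ≈ 2μT, which gives T = d / (2μ). Here is a sketch with placeholder values for d and μ, not figures from the paper (whose dating methods may well differ). Note that this dates the split of gene lineages, which can be older than the split of the populations themselves.

```python
def divergence_time_years(distance_per_site: float, rate_per_site_per_year: float) -> float:
    """Strict-clock estimate: both lineages accumulate changes, hence the factor 2."""
    return distance_per_site / (2.0 * rate_per_site_per_year)

# Placeholder values chosen only to illustrate the arithmetic.
d = 0.004        # substitutions per site separating the two lineages
mu = 1.0e-7      # substitutions per site per year
print(f"{divergence_time_years(d, mu):,.0f} years")
```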

Members of Arndt von Haeseler’s team have explored the human genome in Mapping Human Genetic History, with the rather striking finding that about one third of human genes started to evolve as human-specific lineages before the human/chimp/gorilla divergence. I’ll let their abstract explain:

The human genome is a mosaic with respect to its evolutionary history. Based on a phylogenetic analysis of 23,210 DNA sequence alignments from human, chimpanzee, gorilla, orangutan, and rhesus, we present a map of human genetic ancestry. For about 23% of our genome, we share no immediate genetic ancestry with our closest living relative, the chimpanzee. This encompasses genes and exons to the same extent as intergenic regions. We conclude that about 1/3 of our genes started to evolve as human-specific lineages before the differentiation of human, chimps, and gorillas took place. This explains recurrent findings of very old human-specific morphological traits in the fossils record, which predate the recent emergence of the human species about 5-6 MYA. Furthermore, the sorting of such ancestral phenotypic polymorphisms in subsequent speciation events provides a parsimonious explanation why evolutionary derived characteristics are shared among species that are not each other’s closest relatives.
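The 23% figure is essentially a tally of how often the best-supported tree for a stretch of DNA does not pair human with chimpanzee. This minimal sketch assumes each alignment has already been assigned a rooted three-species tree. The per-alignment results are made up, and the real analysis used five species and proper phylogenetic inference.

```python
from collections import Counter
from typing import Iterable, Tuple

# A rooted three-taxon tree written as ((x, y), z): x and y are closest relatives.
Triplet = Tuple[Tuple[str, str], str]

def fraction_discordant(trees: Iterable[Triplet]) -> float:
    """Fraction of trees in which Human and Chimp are NOT each other's closest relatives."""
    cherries = Counter(frozenset(pair) for pair, _ in trees)
    total = sum(cherries.values())
    concordant = cherries[frozenset({"Human", "Chimp"})]
    return 1 - concordant / total

# Hypothetical per-alignment results (made up, not the study's data).
trees = [
    (("Human", "Chimp"), "Gorilla"),
    (("Human", "Chimp"), "Gorilla"),
    (("Human", "Chimp"), "Gorilla"),
    (("Human", "Gorilla"), "Chimp"),
    (("Chimp", "Gorilla"), "Human"),
]
print(f"{fraction_discordant(trees):.0%} of alignments do not pair Human with Chimp")
```

Weighting each alignment by its length would turn this into a fraction of sequence rather than a fraction of alignments.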

Next, A Long-Term Evolutionary Pressure on the Amount of Noncoding DNA uses computer simulations (in silico) to explore how genomes accumulate noncoding regions (sometimes called “junk” DNA). The abstract says that their results show:

…(Under) low mutation rates, the indirect selection of variability promotes the accumulation of noncoding sequences: Even in the absence of self-replicating elements and mutational bias, noncoding sequences constituted an important fraction of the evolved genome because the indirectly selected genomes were those that were variable enough to discover beneficial mutations. On the other hand, high mutation rates lead to compact genomes, much like the viral ones, although no selective cost of genome size was applied: The indirectly selected genomes were those that were small enough for the genetic information to be reliably transmitted. Thus, the spontaneous evolution of the amount of noncoding DNA strongly depends on the mutation rate.

Our results suggest the existence of an additional pressure on the amount of noncoding DNA, namely the indirect selection of an appropriate trade-off between the fidelity of the transmission of the genetic information and the exploration of the mutational neighborhood. Interestingly, this trade-off resulted robustly in the accumulation of noncoding DNA so that the best individual leaves one offspring without mutation (or only neutral ones) per generation.
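The trade-off in that last sentence can be made concrete with a back-of-the-envelope calculation. This is my own simplification, not the authors' model. Assume each of L positions mutates independently with probability μ per replication, and count every mutation (even a neutral one) as a change. A genome is then copied without mutation with probability (1 - μ)^L. An individual with k offspring leaves, on average, k(1 - μ)^L unmutated ones. Setting that to exactly one gives the largest such genome, L* = ln k / -ln(1 - μ) ≈ ln k / μ. The values of k and μ below are arbitrary.

```python
import math

def expected_faithful_offspring(k: int, mu: float, length: float) -> float:
    """Expected number of offspring copied with no mutation at all."""
    return k * (1.0 - mu) ** length

def max_genome_length(k: int, mu: float) -> float:
    """Length L at which k * (1 - mu)**L == 1, i.e. one faithful offspring per generation."""
    return math.log(k) / -math.log1p(-mu)

k = 10  # arbitrary number of offspring
for mu in (1e-6, 1e-4, 1e-2):
    L = max_genome_length(k, mu)
    print(f"mu={mu:g}: L* ~ {L:,.0f} positions "
          f"(check: {expected_faithful_offspring(k, mu, L):.3f} faithful offspring)")
```

Higher mutation rates shrink L*, which echoes the compact, virus-like genomes in the simulations. This toy version contains no noncoding DNA at all, so it only illustrates the fidelity side of the trade-off.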

Finally, in Relative Rates of Evolution in the Coding and Control Regions of African mtDNAs, the authors explore the substitution rates at synonymous and non-synonymous sites in human mtDNA. They show that there appears to be a strong bias towards synonymous mutations in the coding region, suggestive of purifying selection. Their analysis also shows that around 3% of sites are changing faster than the expected neutral rate, sometimes by more than an order of magnitude, which may indicate positive selection at those loci.
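Deciding whether a coding-region change is synonymous comes down to translating the codon before and after the change. Human mtDNA uses the vertebrate mitochondrial genetic code (NCBI translation table 2). In that code, for example, TGA codes for tryptophan, ATA for methionine, and AGA/AGG are stops. This sketch builds that table and classifies single-codon changes. The codon pairs are illustrative, not data from the paper.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT, TTC, TTA, TTG, TCT, ...
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)
}

def is_synonymous(before: str, after: str) -> bool:
    """True if the two codons encode the same amino acid (or are both stops)."""
    return CODE[before.upper()] == CODE[after.upper()]

def tally(changes: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count codon changes as synonymous or nonsynonymous."""
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for before, after in changes:
        counts["synonymous" if is_synonymous(before, after) else "nonsynonymous"] += 1
    return counts

# Illustrative codon changes.
example = [("TTT", "TTC"), ("TGA", "TGG"), ("ATA", "ATG"), ("CTT", "ATT")]
print(tally(example))  # {'synonymous': 3, 'nonsynonymous': 1}
```

This is only the raw classification. Proper rate comparisons such as dN/dS also normalise by the number of synonymous and nonsynonymous sites available in the sequence.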



