The History of English in One Hundred Words

In the previous post, I outlined the hypothesis that dialects of Common Indo-European were spread from the Steppes west into Europe after around 3000 BC, giving rise to the ‘European dialects’ of Indo-European – what would become Germanic, along with Italic, Celtic, and Balto-Slavic. I covered some of the main causes of the creation of these European dialects – this time, I want to talk about some of the (possible) linguistic effects of these dialects entering new cultural worlds and coming into contact with new, unrelated languages.

Toto, I have a feeling we're not in the Pontic-Caspian Steppe region any more.

In particular, there’s an intriguing potential distinction in the cultural backgrounds we can infer from the vocabularies of Common Indo-European versus the European dialects. I’ve already talked about how Common Indo-European seems to have had a very ‘pastoral’ vocabulary, fitting well with the idea that they got a lot of their food from herding livestock, which they used for meat, dairy products, and things like wool and leather. (Remember that the picture for Proto-Indo-European proper, even earlier on, is much fuzzier, even by the standards of this kind of argument – which I’ll once again emphasize is inherently pretty speculative.) The ‘European dialects’, on the other hand, share a number of distinctive words related to agrarianism, the planting and growing of plants of various sorts.

One core activity for agrarians is sowing: planting seeds by, at its simplest, tossing them in a field. The word sow goes back to a Proto-Germanic verbal root *sē-, which has cognates in both Latin (serō) and Balto-Slavic (Old Slavic sěti, and Lithuanian sė́ti). That is, we have this verb in three of the four ‘core’ European dialects, in all cases meaning very much ‘to sow’ in an agricultural sense. We can tentatively reconstruct a European root *seh₁- (with *eh₁ becoming *ē fairly quickly in most of these languages; see this post on the ‘laryngeal’ consonants like *h₁).

Sowing seeds.

The remaining ‘European’ branch, Celtic, doesn’t show this verb as such, but in Old Irish (recorded from the earlier Middle Ages) we find a noun síl meaning ‘seed’. This noun is clearly derived from our *seh₁- root for ‘sowing’ (and shows a distinctively Celtic sound change of *ē to *ī). Other European dialects show similar derivatives. English seed, for instance, is from the same root, just with a different suffix, and the same goes for Latin sēmen (yes, as in semen, though in Latin this just meant ‘seed’ in general). This makes sense – a seed is what you sow – and emphasizes that the European dialects all had the same verb for ‘to sow’, in a specifically agricultural sense (and not, say, as a word meaning to ‘throw’ more generally). The fact that the various nouns show different suffixes is a reminder that these really were probably already distinct dialects: they were moving in the same direction, with lots in common, but also going their own ways in various details.

Elsewhere in Indo-European, there are a couple of possible cognates of sow, but the connections are much less close, and – crucially – the meanings are not at all agrarian. There’s a Hittite verb that can mean (among other things) ‘press’, and we have some Sanskrit nouns that could come from a verb for ‘shoot’. It’s actually not clear that any of these are really related to the European sow at all, which may have been an entirely new verb, either coined among the European dialects, or borrowed from some other, non-Indo-European language (just maybe one of the languages of ‘Old Europe’?). If sow is in fact related to either the Hittite or Sanskrit words, then the European dialects show a common semantic innovation: the agriculturalization of the word specifically to refer to sowing seeds. Either way, it suggests a growing preoccupation with plant-based agriculture among speakers of the European dialects.

One word like sow on its own doesn’t really tell the whole story, of course. There are lots of examples of this sort: words that are well attested with agrarian meanings in European languages, but either have different (and usually apparently older) senses elsewhere in Indo-European, or else just don’t have good cognates elsewhere in the family and so are under suspicion of being borrowings from some non-Indo-European languages. Some examples of ‘new’ agricultural words include corn (and its Romance-derived cognate grain, both from *ǵr̥h₂-nóm), which originally meant ‘ripened (thing)’; the root of bar-ley, which seems to have referred to some sort of grain-crop, and maybe also to flour (words like French far-ine ‘flour’ are related); and the root of the word ar-able, which was earlier on a verb meaning ‘plow’ (still found in English as an archaic verb to ear as late as Shakespeare and the King James Bible, though since lost in English – the similarity to the body part ear is entirely coincidence). [Edit: but this is also found as the root for ‘plow’, āre, in Tocharian A, which is recorded a rather long ways from Europe, out in the Tarim Basin.]

‘Caesar, I bring thee word,
Menecrates and Menas, famous pirates,
Make the sea serve them, which they ear and wound
With keels of every kind’
(Antony & Cleopatra, I.iv)

Ear means ‘plow’, here used metaphorically of ships cutting through waves.

In at least some cases, it looks like we have originally non-Indo-European words that new Indo-European speakers found difficult, and which were adopted in different ways by different dialects. One such example is bean. This comes from Proto-Germanic *bau-nō, and the *-nō bit is a fairly typical kind of suffix. The rest, *bau-, would come from something like *bʰau- if it was in Germanic before Grimm’s Law occurred. This is reminiscent of words for ‘bean’ in a few other languages, especially Latin faba and Old Slavic bobŭ, both of which could come from an older form like *bʰabʰ-ā. A fairly reasonable idea is that a word like *vav- (or something vaguely along those lines) existed in some non-Indo-European language or languages whose speakers had a tradition of growing beans. Either people speaking dialects of Indo-European borrowed the new word along with the new (to them) crop, or else people who were used to growing the crop shifted to speaking Indo-European dialects, and retained their old word since there wasn’t a good one in the new (to them) language. Either way, the result was the same: later speakers of the European dialects ended up with this non-Indo-European word for ‘bean’. This is hardly the only example of this sort in the European dialects.

Annibale Carracci's The Beaneater (Mangiafagioli)

The general picture that emerges here is probably something like this: Indo-European dialects are carried into Europe, probably through a combination of new people coming in, and the existing populace adopting new languages (possibly with periods of bilingualism involved). Europe had a stronger agrarian tradition than the Steppes, and this did not change during all of this. There may have been an increased emphasis on stock breeding and dairy products due to Steppe influence, but growing plants remained a constant and major part of European lifeways. This is reflected in the ‘agrarianization’ (not the world’s loveliest word...) of those dialects of Indo-European that ended up in Europe: they repurposed older elements and borrowed new vocabulary for newly crucial concepts like sowing, plowing, and various types of specific crops.

It is hard to say much about the older European languages spoken before the Indo-European dialects took over. How many were there? What were they like linguistically? Were they related to any other known languages? We don’t really know. They’re often (possibly misleadingly) referred to as ‘substrate languages’ – meaning, in part, that they chronologically precede and ‘lie under’ the later Indo-European dialects, and the only hints we get of them are the occasional word or linguistic feature in these dialects that we think might be a holdover from some variety or other of earlier European language.

An inscription in Etruscan, a non-Indo-European language of northern Italy, and (via Latin) the source of a couple of English words, including person.

For some areas of Europe – especially around the Mediterranean basin – we have more direct clues, in some cases even direct written records, of such non-Indo-European languages, and this kind of evidence is summed up very nicely by Don Ringe. But for the less southerly areas and the older period of ‘Old Europe’, we’ve really got very little evidence indeed, except for words of non-Indo-European origin in branches like Germanic. There are possibly quite a few such words. One traditional estimate is that around a third of the Proto-Germanic lexicon had no good Indo-European source (though other views have varied from a tenth to over half – a lot of words have rather tentative etymologies, and so might be Indo-European without certainly being so). But it’s indisputable that Germanic and its neighbouring dialects have at least some non-Indo-European words absorbed into their lexicon.

Even today, not all the older languages of Europe have been displaced by Indo-European.
Txindoki mountain, in Basque Country.

Many of these extend into domains far beyond agriculture. Unsurprisingly, many words for European wild plants and animals, like ash, hazel, thrush, and hawk, and even some fairly general words like fish and bee are probably from non-Indo-European sources. The same goes for many ‘environmental’ words like rain, freeze, and pool. Other words are perhaps less expected, and point to how complex the prehistoric linguistic situation must have been. In particular, a good number of basic verbs in Germanic have no good Indo-European origin, including not just culturally loaded things like sow, but also pretty fundamental verbs like speak, should, and drink.

The majority of the Germanic vocabulary is clearly Indo-European in origin, including a lot of the ‘core’ vocabulary (verbs like be are solidly Indo-European), but the non-Indo-European input seems to have really been pretty considerable, and forms an important part of the history of English – though one that we can unfortunately now understand only in very vague and general terms. We can’t even answer very basic questions like whether the bulk of these words came from one single source language or several.

There are also lurking sociolinguistic questions. The term ‘substrate’ is usually used to imply not just a chronological fact (the way I used the term above is a bit idiosyncratic), but specifically a lower-status language whose speakers shift to a more prestigious language. This conjures up images like Gimbutas’s stark picture of dominating Indo-European speakers and subjugated local farmers, who slowly switched to the language of the invaders while retaining a number of older words. This is not implausible, for some periods of prehistory. But it also may not be the whole story. Words like slay (originally meaning ‘strike’), town (originally ‘fortified or enclosed area’), mare (originally a general word for ‘horse’), and an Old English word hæleþ ‘hero, warrior’ all have at best dubious Indo-European etymologies. Possible borrowings of this type (there are a number of other possible examples) with meanings related to warfare, social organization, or elite culture might indicate that the dynamics of prestige and power between Indo-European and non-Indo-European speakers were more complicated and variable than the term ‘substrate’ might suggest. If true, this would hardly be surprising, since it may have taken centuries, if not millennia, for most of the non-Indo-European languages of northern Europe to disappear, and some of the societies speaking such languages may well have been regionally powerful and influential. Still, this area of linguistics is extremely speculative, and these kinds of observations mostly serve to remind us how little we know, and how varied the possibilities were during these long ages.

Castro do Zambujal, a major 'Bell Beaker Culture' site from (what is now) Portugal, which flourished c. 3000-1700 BC.
Many have associated the Bell Beaker phenomenon with Indo-European speakers, but significant aspects of it seem to have moved from west to east. Possibly it is better seen as the result of contact, including both Indo-European and non-Indo-European speakers. At any rate, there is no good reason to think Indo-European languages were always ‘superstrates’ and non-Indo-European ones always ‘substrates’ during this period.



Further Reading

On seed, I’ve simplified the etymological discussion slightly, citing just one derivative from each branch. But actually Germanic shows not just seed, but also a German word Samen, which is an exact match for Latin sēmen, as well as Balto-Slavic words such as Old Prussian semen or Old Slavic sěmę. While seed and Old Irish síl point to the divergences within the European dialects, *seh₁-men- reinforces their commonalities.

There are a large number of studies of the specially ‘European’ or ‘Northwest’ Indo-European words, from a variety of perspectives. An older but rather accessible overview of the idea of a special regional vocabulary is Antoine Meillet’s chapter ‘Le vocabulaire du Nord-ouest’ from his book Les dialectes indo-européens (also available in English translation). Norbert Oettinger’s 2003 article on ‘Neuerungen in Lexikon und Wortbildung des Nordwest-Indogermanischen’ has a very useful and more up-to-date list.

The question of ‘substrate’ languages specifically has received a lot of attention. One useful volume to mention is Markey & Greppin's When Worlds Collide, as well as the collection that Oettinger's article is from. A good deal of the earlier history of ‘substrate’ theorizing is discussed in an interesting article by Bernard Mees.

I want to really emphasize how uncertain most aspects of this are. A few cases, like the etymology of corn, are very clearly Indo-European (with only their meaning being a ‘European’ innovation), while others (such as plow) are almost certainly borrowings. But a lot of cases are less certain. Usually, some scholar over the years has at least proposed an Indo-European etymology, but these are sometimes tentative indeed: an Indo-European root is taken, some special derivational suffixes and ablaut are added, and a good deal of semantic change is assumed. How to evaluate these words partly depends on what angle we approach things from.

There are basically two questions we can ask: does a given word meet the criteria to seem like a certainly Indo-European word? and does it meet the criteria to seem like a certainly non-Indo-European word?

For the first question, we first look for solid cognates elsewhere in Indo-European. How much is enough remains an open question. Some people work with a ‘three-branch’ rule of thumb: if a word is found in three subgroups within Indo-European, it is likely to be from Proto-Indo-European. But by this logic, fish would be Indo-European, since it’s found in Germanic, Italic (Latin piscis), and Celtic (Old Irish íasc, with regular Celtic loss of *p). These are all western dialects, in long contact with each other, and surely in contact with a similar set of non-Indo-European languages. Any words they share are probably pretty old, but not necessarily Indo-European old. What we really want are cognates in a branch like Anatolian or Indo-Iranic.
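As a rough illustration, the ‘three-branch’ rule of thumb and the stricter check described above can be written as a small Python sketch. The branch labels, the function names, and the decision to treat the four ‘European’ branches as a single contact group are assumptions made for illustration only, not an established procedure.

```python
# Hypothetical sketch of the two tests discussed above.
# Assumption: the four 'European' branches count as one contact area.
EUROPEAN = {"Germanic", "Italic", "Celtic", "Balto-Slavic"}


def passes_three_branch_rule(branches):
    """Naive rule of thumb: attested in at least three branches."""
    return len(set(branches)) >= 3


def passes_stricter_test(branches):
    """Three branches, at least one of them outside the European group
    (for example Anatolian or Indo-Iranian)."""
    attested = set(branches)
    return len(attested) >= 3 and bool(attested - EUROPEAN)


# 'fish': Germanic, Italic (Latin piscis), Celtic (Old Irish iasc)
fish = {"Germanic", "Italic", "Celtic"}
print(passes_three_branch_rule(fish))  # True
print(passes_stricter_test(fish))      # False
```

On the naive rule fish counts as inherited, but on the stricter one it does not, which is exactly the worry about words shared only among neighbouring western dialects.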

If we don’t have such cognates, then we have to judge a proposed etymology by the naturalness of any meaning changes involved, and the regularity of any derivations. Many difficult words do not pass the threshold from possible Indo-European etymology to a plausible one (or more accurately, a word may be considered etymologically ‘difficult’ precisely because it doesn’t pass this threshold).

On the other side of things, there are certain features we look for that can be positive evidence of borrowing. Proto-Indo-European *b, for instance, is a very rare sound, so Germanic words with *p (the outcome of *b after Grimm’s Law) are always a bit suspect, especially as the first consonant in a word. This is one of the reasons why plow can be labelled as a likely borrowing. Some meanings are also particularly ripe for borrowing: classic types include technological words and terms for animals or plants (especially for things like horses, where a new word might enter as a special breed or variety name, and become generalized to refer to the organism in general). Sometimes features of grammatical inflection, or the presence of certain suffixes, have been proposed as signs of borrowed vocabulary (see this paper by Guus Kroonen for some Germanic examples).
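To make the Grimm’s Law reasoning a little more concrete, here is a minimal Python sketch of the consonant shift. It is heavily simplified and the simplifications are assumptions of the sketch, not claims from the literature cited here: palatovelars, Verner’s Law, and most conditioning environments are ignored, and the only exception modelled is that voiceless stops stay put after *s. The function name grimm is hypothetical.

```python
# Simplified Grimm's Law, applied segment by segment.
# Assumptions: no palatovelars, no Verner's Law; Germanic *h stands in
# for the fricative that developed from *k.
SHIFTS = {
    # voiced aspirates -> plain voiced stops
    "gʷʰ": "gʷ", "bʰ": "b", "dʰ": "d", "gʰ": "g",
    # plain voiced stops -> voiceless stops
    "gʷ": "kʷ", "b": "p", "d": "t", "g": "k",
    # voiceless stops -> fricatives
    "kʷ": "hʷ", "p": "f", "t": "þ", "k": "h",
}
VOICELESS = {"p", "t", "k", "kʷ"}
# Try longer segments first so that 'bʰ' is not read as plain 'b'.
KEYS = sorted(SHIFTS, key=len, reverse=True)


def grimm(form):
    """Return the post-Grimm's-Law shape of a pre-Germanic form."""
    out = []
    i = 0
    while i < len(form):
        for key in KEYS:
            if form.startswith(key, i):
                after_s = i > 0 and form[i - 1] == "s"
                # Voiceless stops are left unshifted directly after *s.
                keep = key in VOICELESS and after_s
                out.append(key if keep else SHIFTS[key])
                i += len(key)
                break
        else:
            # Vowels and other segments pass through unchanged.
            out.append(form[i])
            i += 1
    return "".join(out)


print(grimm("bʰau"))  # bau (the stem of 'bean' discussed in the post)
print(grimm("b"))     # p
```

Because a Germanic *p can only come out of the shift from an older *b, and *b was very rare in Proto-Indo-European, a word-initial *p is a hint that the word entered the language from elsewhere.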

Many of the ‘difficult’ words in the European dialects, or Germanic specifically, meet neither threshold. Their Indo-European etymologies are nowhere near good enough to be considered actually very likely, but they don’t really show any positive traces of being non-Indo-European either. This is a big grey area. Probably some of the words do in fact go back to Indo-European, but it seems to me unlikely in the extreme that all such words do – every tentative etymology is a roll of the dice, and we can’t always come up sixes. In general, the words I’ve given as examples in this post are ones I think probably don’t come from Common Indo-European, but they should always be taken with a grain of salt, and the difficulty of etymologizing such ‘marginal’ words should always be remembered.

An important recent synthesis dealing with all the issues discussed in this post is Rune Iversen and Guus Kroonen's ‘Talking Neolithic’. They see pre-Germanic as having been in prolonged contact with an ‘adstrate’ language in northern Europe, associating pre-Germanic with the so-called Single Grave culture (this was strongly influenced by Corded Ware, which is widely thought to be at least partly Indo-European speaking), and an important non-Indo-European language with the late or post Funnel Beaker culture. It's a very interesting take on things, and I think broadly plausible, though of course it is difficult to associate specific borrowings with specific archaeological areas.

A recent example of a more ‘Indo-Europeanizing’ approach to difficult vocabulary is this paper by Roland Schuhmann. In addition to the specific etymologies discussed (which I think illustrate the issues with difficult words well, with his proposals ranging from pretty convincing, as for care, to highly speculative, as with German Kröte ‘toad’), Schuhmann also raises the astute point that borrowed words more usually come from ‘superstrate’ languages than ‘substrate’ ones (that is, languages which have more sociocultural prestige or influence at a particular time and place), arguing that:

With these figures in mind, it soon becomes clear that if one third of the Germanic lexicon was influenced by foreign sources, this could only be due to superstrate influence, never to substrate or even adstrate. So when the advocates stick to this percentage, they should rather speak of the result of a superstrate, not of a substrate influence. (p. 379)

In any case, it’s clear that the exact numbers of non-Indo-European words in the ‘European dialects’ and in Germanic specifically will probably never be fully agreed on. The estimate of one third of the Germanic vocabulary having no Indo-European etymology is an old one, going back, it seems, to an article by Sigmund Feist in 1910 (pp. 350-351), adjusted from an earlier quantitative study by Bruno Liebich in 1899 (p. 521). More recently, Robert Mailhammer has argued that some 46.5% of the Germanic strong verbs specifically (just one area of the lexicon, but usually nouns are slightly easier to borrow than verbs) don’t have a good Indo-European etymology, and only about a fifth have really solid ones, according to his four-part categorization of how ‘good’ each etymological proposal is (this from p. 174 of his 2007 book).

Addition: the Tocharian A word for ‘plow’ is discussed on p. 262 of this article by Peyrot, which is in general a good overview of Tocharian agricultural vocabulary.

[The further reading section has been edited to include reference to Iversen & Kroonen 2017.]
