The Aryan Migration Theory: Last Word

It has been known for 150 years now that the Indo-Aryan languages came from outside of India. The evidence is overwhelming, primarily linguistic, but there is also some archeological evidence. In scholarly circles, there is no debate on the Aryan Migration Theory (AMT) and there has been little debate for 150 years. It is only among Indian nationalists and a few hacks and kooks that it is not accepted.

1. There is a substrate of a language that looks like a Munda language in the Rig Vedas. A Munda language was probably spoken in the Punjab when the Aryans migrated there. About 4% of the words in the Rig Vedas are these early Munda loans. None of these Munda loans are found outside of Indic.

They would be found all through IE if the Out of India Theory (OIT) were true. The OIT holds that Aryans inhabited North India for 8,000 years, all the while the Dravidians were in South India and Munda tongues were in East India. Obviously, the Aryans came into Punjab, and there was a mass language shift there from a Munda language to Indo-Aryan (IA). The language shift is evident in the sparse Munda loans into Vedic Sanskrit.

There are also a few place names left in North India from the original Munda language of the Greater Punjab area. There are some river names left in Eastern Punjab and Haryana where the local Indus Valley Civilization (IVC) continued for some time after the arrival of the Aryans. These names would not be there if the OIT was true.

There are a large number of IA words for local plants and animals and for agriculture that have been borrowed from the Munda language of Punjab. There would be no reason for the IA people to borrow these terms if the IA people were native to Punjab. Instead, this borrowing is precisely what we would expect to see when pastoralists from Central Asia move into the tropics, encounter new plants and animals and start farming – they borrow the terms for these new living things and technologies from the locals.

This is particularly so in the case of farming, which was left to the local people – the Sudra caste. The IA people only brought a few farming related words with them from Central Asia – the remainder were borrowed from the new locals.

40% of Hindi agricultural words still derive from an unknown pre-Munda language of the Indo-Gangetic Plain. Nahali, a small language in Madhya Pradesh, displays high levels of borrowing from earlier tongues at successively lower levels of its vocabulary. 36% of its vocabulary is of Kurku (Munda) origin and 9% is Dravidian. At the oldest level, 24% of its words have no cognates in any known language and appear to derive from the oldest language known from India.

2. There is an old set of shared loans between proto-Indo-Aryan and proto-Iranian for a number of agricultural and other cultural items that appeared in the Bactria-Margiana Archaeological Complex (BMAC) 3700-4200 YBP. The BMAC is located more or less in present day Turkmenistan. Obviously, these shared loans were picked up by the proto-Indo-Iranian people as they moved down from the steppes of Kazakhstan and Russia into the BMAC, conquering the people who lived there.

There are references in the Rig Vedas to the conquest of the BMAC peoples by the Arya. For this sequence of events to have occurred, the Indo-Aryans would have had to have moved through the BMAC during this time period and later moved into Iran and India, not the other way around. The language of the loans is not known, but it is apparently the language of the BMAC people.

So there is a BMAC or Central Asian substrate in Indo-Iranian. A possible guess for the language of the BMAC people might be a relative of the Burushaski language of northern Pakistan.

The substrate of the Rig Vedas is a Munda language. The substrate of the earlier Proto-Indo-Iranian language is the language of the BMAC. This sequence is not possible under the OIT and is only possible under the AMT.

3. There are early Indo-Aryan loans into the Caucasian language of the Mittani, who lived in northern Iraq and Syria. These loans are dated to ~3400 YBP. These loans are from an earlier form of Indo-Aryan than is used in the Vedas. Therefore, the Vedas must have been composed 3000-3500 YBP and could not have been composed any earlier.

Also, the Mittani could not possibly have come out of India as the OIT demands, since the IA loans do not show any Indic influences. Nor could the loans have come from Iran, as there are numerous IA Gods in the Mittani texts who are marginalized or do not exist in Iran. The loans must have come from somewhere else, apparently the north.

4. We have numerous references in the Vedas to battles between the Arya with their stone forts, metals, horses and chariots against the more sedentary peoples living in South Asia at the time.

5. There are pottery shards in the BMAC that resemble the shards found in the steppe culture to the north. This indicates that there is a cultural resemblance between the two cultures. The suggestion is that the shards are Indo-Aryan and appear first on the steppes and then again in the BMAC with its conquest by the Aryans.

6. The chariot appears in the Urals 4000 YBP and then spreads rapidly in many directions with the spread of IE languages, to Europe, to China via the Tocharians and of course to India and Iran via the Aryans. The horse also appears in South Asia (Pakistan) 3700 YBP in conjunction with the chariots. The modern horse is not native to South Asia, so obviously it came from outside, obviously from the Aryans. The indigenous horse of South Asia, the Siwalik horse, was long since extinct.

7. There are specific Punjabi and Uttar Pradesh loans in Vedic Indic that are not found in Iranian. Therefore, Iranian could not possibly have come from Indic as the OIT demands. The languages must have split in the BMAC, one line going to Iran and another line going to India.

8. The Soma ritual originates in the high mountains of Central Asia – the mountains of Iran, the Himalayas and the Pamirs – with the proto-Indo-Iranian peoples. The original name for the plant is a Central Asian term, amsu. This term is borrowed into Indo-Iranian and eventually becomes soma, etc. Later, it moves down into Iran and India and appears in the Vedas. Therefore, the Aryans brought the Soma ritual with them from Central Asia to Iran and India.

9. There is tremendous evidence for a common Indo-Iranian language, mythology and ritual. This shared heritage is not possible with the OIT. It is only possible if there was an Indo-Iranian people, who then split into the Iranian and Indic branches.

10. The Vedic branch of IA becomes innovated and Indianized (in particular, the retroflex consonants) after its arrival in Punjab, while the Iranian branch escapes this development because it did not enter the subcontinent then. In addition, Iranian lacks any specific Indic terms. According to the OIT, the Iranian branch must be Indianized too, or else all of the Indic terms were somehow lost in Iranian.

Since it is not, both branches came from outside India, to the northwest. Iranian languages cannot possibly have come from the Punjab. An early date for Iranian to leave India is preposterous, and Old Iranian (Avestan) is too archaic to have left India after the Vedas. All this means that Iranian and Indic must have split before the Vedas and thus, not inside India. The OIT for Iranian lies in ruins.

11. Zero specifically Indic words are found in IE languages outside of India. For the OIT to be correct, many Indic terms should be found in all the other branches of IE. After all, the Gypsies left India 1000 years ago and took a large specifically Indic set of terms with them to Europe and beyond.

12. Retroflexion. According to the OIT, all branches of IE would have had to have lost their retroflexion after they left India. How likely is that? What we do find, though, is that those branches of Iranian which move east to abut the Indic languages do acquire retroflexion. Since retroflexion is in general not present outside Indic or languages abutting Indic, it must be a late development in IE specific to Indic and cannot have been part of the original IE language as required by the OIT.

Retroflexion only affects those moving into the Indic plain and the eastern Iranian lands, but everyone moving out of South Asia somehow loses it. This does not make sense.

13. Chariots. For the OIT to make sense, chariots must be exported from India 7,000 YBP. However, chariots only appear 4000 YBP in the Urals and NW Kazakhstan and from there spread from Ukraine to Mongolia. The western IE languages retain an IE root rotho for wheel because they had already moved away before the chariot had actually been developed in the Urals. Everyone to the east uses the IA form ratha. This could only be the case if the IA languages moved south from the Urals.

Further, according to the OIT, chariots that appear in the Rig Vedas must show up in the text before they have even been invented. Linguistics shows that the word must have been innovated in proto IA at the Urals, for it is present in both branches of IA. This word, along with the invention itself, can be shown to have originated in the steppes and then been carried into India; the reverse could not possibly be the case.

14. Lack of tropical core vocabulary in IE. The core vocabulary of IE shows that the IE homeland was a temperate or even cold place. The plants and animals in the IE language include such cold weather animals, plants and weather words as the otter, beaver, wolf, bear, lynx, salmon, elk, red deer, hare, hedgehog, mouse, birch, willow, elm, fir, ash, oak, beech, juniper, poplar, apple, maple, alder, hazel, nut, linden, hornbeam, and cherry in addition to snow.

A few of these are found in South Asia, but most are not. There are no specific Indic plant and animal names found outside of India, even where these plants do occur outside of India. The OIT would assume retention of at least some of these terms, would it not? Instead, what we find is that a few core IE terms are modified inside India to apply to new plants and animals.

For instance, IE beaver bheber is adopted for the mongoose in South Asia, since beavers do not exist there. IE willow becomes reed, cane in India. So we see that IE temperate plant and animal terms are adopted for the newly encountered tropical living things in India. The flow is into India, not out of India.

For the OIT to be right, the IE languages would have had to have coined these terms after they left India. However, this is not what happened. Instead, the words were IE words from the core IE language itself, which, according to the OIT, was only spoken in India. But these plant and animal names could not possibly have been created in India because most don’t even exist there.

For the OIT to be correct, IE core vocabulary should indicate a tropical climate.

15. Early loans in very early IE. The earliest loans in IE are from Semitic languages of the Middle East. This is possible with an IE homeland in SW Russia or Anatolia, but not possible if the IE homeland is in India, as the OIT requires.

16. Typological features of IE. The typological features of IE are between Kartvelian in Georgia and Uralic in the Urals, as we would expect with an IE homeland in SW Russia, and unlikely with an IE homeland in India.

17. Skeletons. Where are the Indian bones? The OIT requires not a trickling out of India, but a massive migration out of the Punjab. Yet Indian bones look remarkably different from Middle Eastern and European bones. With the massive migration out of India required by the OIT, we should find Indian bones in all of the branches of IE. One would have to argue that the IE speakers who left India did not look like the rest of the Indian people.

18. Facial characteristics. DNA analyses of burials in the Kurgan area near the IE homeland 6000 YBP show that 60% of the early IE people there had light hair and green or blue eyes. How many Indians, even North Indians, have light hair and light eyes? Almost none. Clearly, the Kurgan peoples were a European type of people. They moved down into Iran and India and mixed with the darker folks already there, creating the present day swarthy peoples of South and Central Asia.

19. Very early Proto IA loans in Finno-Ugric. The homeland of the Finno-Ugric people is somewhere in the Urals. The homeland of the Proto Indo-Aryan people is also somewhere in the Urals, especially at the very southern end. The only way for these early PIA loans to get into Finno-Ugric is if the PIA homeland is in the Urals. It’s not possible with the OIT, which generally makes a separate Indo-Aryan branch impossible anyway.

20. Vedic is later than Hittite. For the OIT to be correct, Vedic must be the most ancient branch of IE of them all, very close to Proto IE itself. Yet Hittite, attested from 4000 YBP, is earlier than Vedic. In fact, Vedic is later than Eastern IE, Proto IA, and even pre-Vedic, so it must be a fairly late development in IE. Vedic is even later than the early forms present in Mittani 3400 YBP.

21. Sanskrit is the most ancient language in all of IE and looks a lot like the original IE language. This is the OIT claim. In fact, IE does not look much like Sanskrit at all. And Sanskrit is not even the oldest attested IA language. Vedic comes first, then Epic Sanskrit and then Classical Sanskrit, and Vedic itself cannot possibly be older than 3500 YBP. The IE language is dated to 6500-8000 YBP (I favor the earlier date). Epic Sanskrit appears only 2500 YBP and Classical Sanskrit comes even later.

22. Lack of IA archeological sites. This is a classic OIT argument. Actually, we do have quite a few sites. From the original Proto-Indo-Iranian sites in Sintashta southeast of the Urals to the BMAC in Turkmenistan to the Yaz Culture in northeast Iran to the Swat Culture in the Swat Valley of Pakistan to the Cemetery H Culture in Punjab to the Copper Hoard Culture to the south, to the Painted Grey Ware Culture to the south and east, we have a long stretch of cultures that have long been associated with the AMT by archeologists.

Cemetery H in particular shows a possible move away from IVC culture. While the pottery is of course the same, there is a new design on the pottery. On funeral urns we see a small picture of a man with a bird inside of him. This seems to indicate the Vedic belief that the souls of men could fly like birds. Cemetery H also shows a new burial style – cremation and deposit of remains in burial urns. These changes in culture are probably due to Aryan influence.

The ideal Aryan archeological site, however, has typically not yet been found. The ideal site would have the remains of horses and their furnishings, chariots, a Vedic ritual site with three fireplaces west of a river, a flimsy and primitive building pattern of bamboo huts, tools made of stone, copper and bronze, gold and silver ornaments, and food consisting of barley, milk products and the meat of cows, sheep and goats. However, the pottery style would remain local, as the Aryans did not innovate pottery.

Such a site has largely continued to elude searchers, but one has been found in Swat. Swat is mentioned in the Vedas as Aryan territory – suvastu.

23. No Aryan bones. Another OIT argument. It’s quite common for migrations to not be represented by skeletal remains. The remains of the Huns, a large force of proven invaders who conquered Hungary, have only just been found in the past 20 years. The most recent research indicates that the Aryans left language, but few genes, in India.

This is reasonable and is often the case with many migrations and invasions. The Huns left as little genetic imprint on the Hungarians as the IA people did on Indians. The Magyars also left their language in the Danube, as the IA people left their IA language in India.

24. European appearance of Indo-Iranian peoples. There is no getting around it. The speakers of Indo-Iranian (II) languages often look strikingly European. This is particularly the case with Iranians, who consider themselves White, or Europeans outside of Europe. The speakers of II languages in Afghanistan often look very European. People in northern Pakistan are some of the most European looking people in the region. Punjabis often look very European, and they look much different from the South Indians to the south.

For the OIT to be correct, this should not be the case. All across the region, all II speakers should look like South Indians, and so should the Punjabis of North India. That II speakers look so European is evidence that they are partly descended from the very European looking peoples of the Kurgan culture of southern Russia. They moved south and east into Central and South Asia, bred in with darker locals, but still retain a strong resemblance to their European roots.

25. Landsat photos indicate the drying up of the Sarasvati River 3900 YBP. A staple of the OIT argument. Since the Sarasvati is mentioned as “the great river” in the Vedas, this supposedly proves that the Vedas are much older than 4000 YBP, despite the copious linguistic evidence. The problem is that Landsat photos cannot date geological events.

Further, the Sarasvati River situation is very tricky. The situation as represented in the Vedas is the same situation as exists today. The upper Sarasvati is a significant river, and this is where the settlements were. The lower Sarasvati had already begun to dry up, and by the time of the Vedas it emptied into an inland lake. In a few places, the lower river goes underground in the alluvial Punjabi plains and disappears.

Archeological investigation indicates that settlements along the lower river were abandoned as the river dried up around this time. As you can see, the “Sarasvati River dried up” meme is a huge red herring.

26. No memories of an Aryan migration. Another OIT line. First of all, it is quite typical of most peoples to have no memories or false memories of wherever they came from. The Romans said they came from Troy. The Gypsies say they came from Egypt.

However, the Vedas do contain vague references to former habitations, such as what appears to be the BMAC and there are references to journeys over mountains and mountain passes. Many place names in Afghanistan are from proto-II words from Central Asia and often lead back to ancient Central Asian enemies of the Arya referred to in the Vedas. One of these is the Parni, associated with the BMAC and later with a northern Iranian group. They had stone forts and well-built cow stables in northern Iran that look a lot like earlier BMAC structures.

The route of migration did not take place over the high passes of the Himalayas and the Pamirs. Few groups have migrated over these treacherous mountains in the last 2000 years. Instead, the migration went from the BMAC down through northern Iran to Herat in West Afghanistan to the Gomal River near Ghazni in East Afghanistan to the Swat Valley.

There are frequent references in the Vedas to southward and eastward movements of various groups of Arya. There are no references to westward groups as would be required by the OIT. Some of these movements to the south and east are described in military terms as victorious conquests. There are also references in the late Vedas of movements of the Arya east from the Afghan/Pakistan border to Haryana, Uttar Pradesh and all the way to Bihar.

27. Archeoastronomy. OIT proponents like to push this theory. Supposedly, the positions of stars are mentioned in the Vedas. By analyzing the positions of stars in the Vedas, we can make claims about when the Vedas were written via tracking the movements of stars in ancient days.

However, archeoastronomy is a field in poor standing. All we can learn for sure from archeoastronomy is that the Vedas were written some time in the past 8,000 years. All else is up in the air.

The Indian astrophysicist Rajesh Kochhar has clearly stated that the astronomical data in the Vedas is not reliable.

28. The association of Andronovo culture with Indo-Iranians is controversial. So say the OIT proponents. This is not true.

Andronovo is a culture associated with the proto Indo-Iranians that stretched, in its formative location, around northern Kazakhstan and west into Russia to near Samara, then down to the Caspian Sea, covering most of the northwest quadrant of Kazakhstan.

Later its borders enlarged. At maximum, its northern boundary ran from Samara in the Volga Basin east to Anzhero-Sudzhensk northeast of Novosibirsk in southern Siberia.

The eastern boundary bordered on the Afanasevo Culture in eastern Kazakhstan, southern Siberia and Xinjiang. Andronovo did include part of Xinjiang in the far north where the Altai Mountains come down.

The eastern border then encompassed most of eastern Kazakhstan except the area east of Lake Balkhash, moving down to the border with Kyrgyzstan in the south, encompassing most of Uzbekistan except the far south and the northern half of Turkmenistan all the way to the southeastern shore of the Caspian Sea. The Aral Sea was the realm of the Andronovo People.

The relation of Andronovo to the Indo-Aryan people in particular, as opposed to Indo-Iranians in general, is more controversial, but has been suggested by some experts.

29. Chariots could not go over the Hindu Kush. Another OIT argument. But as noted above, the Aryans did not move down through the Hindu Kush; instead, they came east from the BMAC through northern Iran to Herat in west Afghanistan east to around Ghazni over to the Bannu region in the NWFP of Pakistan. That’s a much easier route than the Hindu Kush.

30. There was no invasion. The invasion scenario has been replaced in the past 40 years by a migration scenario. It seems more likely that instead of defeating the Dravidian people and pushing them to the south, or destroying the IVC, the Aryans merely profited from the collapse of the IVC that was already underway.

31. There was no genocide of the Dravidian people, all Indians look alike genetically. No one ever said there was a genocide of the Dravidians by the Aryans. Instead, the Aryans moved in, and there was intermingling and intermarriage with the Dravidians, the combined result being the culture of the Vedas.

32. The linguistic evidence. The case for the AMT and the total non-case for the OIT is made by the linguistic evidence. Everything else is secondary. The case was clinched by Hock 1999 (see references).

33. Indians descend overwhelmingly from the Paleolithic population of India. It’s true that 80% of Indian genes go all the way back to the Paleolithic era. But 80% of European genes go all the way back to the Paleolithic too. Same in Britain. By that logic, Europe and Britain have never experienced any migrations or invasions in the past 10,000 years, which is absurd. The Aryan genetic footprint on Indian genes, if it exists, is doubtless less than 10% of the total. It’s well known by now that the Aryans left language, but few genes, in India. Identifying genetic history with linguistic history is naive.

Keep in mind that the Aryans were probably installed as a superstrate over the existing Dravidian population. The Aryans were probably no more than 10-15% of the population genetically, and the remaining 85-90% were Dravidians.

34. How could a more primitive people like the Aryans replace the language of the more civilized people, the IVC Dravidians? So ask the OIT theorists. However, let us note that Greek speakers in the Levant, Aramaic speakers in Mesopotamia, Coptic speakers in Egypt and Romans in northern Africa all got their languages replaced by the culturally inferior Bedouins of Arabia. This sort of thing happens all the time.

35. There is no solid proof of an Aryan migration to India in archeological terms. This is true as far as it goes, but all it means is that archeologists typically refuse to characterize migrations in terms of who is migrating where. While there is no archeological proof for an Aryan migration, there is also no proof for Greek, Germanic, Italic, Celtic or Armenian migrations in those branches of IE either.

36. The Rig Veda says that the Sarasvati River flows to the sea. According to OIT folks, since the river dried up 3900 YBP, if the Vedas discuss it flowing to the sea, they must have been written before 4000 YBP. However, this statement is only in one sentence of the Vedas, and the word “sea” in question is actually samudra, which Sanskrit experts say can mean lots of things, but in this case means an inland sea or lake as formed by a river emptying into a desert. Which is what the Sarasvati did. The Sarasvati never emptied into the sea at any time.

37. Horses. OIT proponents keep claiming that they have found horse bones or evidence of horses on seals or objects at some early date. None of this has been confirmed, and some cases have involved overt fraud by Indian nationalist “scholars.” The earliest confirmed horse in the region is at Pirak 3800 YBP. Many horse remains have been found after that, but none earlier.

38. The AMT was invented by Max Muller in 1848. Muller was a British spy – agent – whatever who was sent by the British to falsify the history of India so the Indians would lose their national pride. Hence, the AMT is a British conspiracy. Yes, OIT supporters actually say this. The long version is that he was hired by the British East India Company as part of a nefarious plot to denigrate Hinduism.

First of all, the theory was not invented in 1848 nor was it invented by Muller, as it substantially predates 1848 and Muller was not the first to come up with it.

There is no evidence at all that the AMT was hatched as a British conspiracy (other popular theories say that the entire linguistic community was in on this conspiracy), nor has anyone offered any reason how or why the British could profit by making up the theory of a Bronze Age culture in India. Or why the British, who supposedly hated Indians and thought they were inferior, would invent a theory that said that Indians were in part related to the great British people.

References

Hock, Hans H. 1999. Out of India? The Linguistic Evidence. In: J. Bronkhorst & M. Deshpande, Aryan and Non-Aryan in South Asia: Evidence, Interpretation and Ideology, 1-18. Harvard Oriental Series. Opera Minora, Vol. 3. Cambridge, MA. 

Kochhar, Rajesh. 2000. The Vedic People: Their History and Geography. New Delhi: Sangam Books.


More On The Hardest Languages To Learn – Non-Indo-European Languages

Note: Unbelievably, the PC nutjobs have accused this post, a Linguistics post of all things, of racism. See here for my position statement on racism.

Caution: This post is very long. It runs to 75 pages on the Net.

This is a continuation of the earlier post. I split it up into two parts because it had gotten too long.

The post refers to which languages are the hardest for English speakers to learn, though to some extent, the ratings are applicable across languages. Most Chinese speakers would recognize Spanish as being an easy language, despite its alien nature. And even most Chinese, Navajo, Poles or Czechs acknowledge that their languages are hard to learn. To a certain extent, difficulty is independent of linguistic starting point. Some languages are just harder than others, and that’s all there is to it.

Method, Results and Conclusion. See here.

Ratings: Languages are rated 1-5, easiest to hardest. 1 = easiest, 2 = moderately easy to average, 3 = average to moderately difficult, 4 = very to extremely difficult, 5 = most difficult of all.

Time needed: Time needed to learn the language “reasonably well”: Level 1 languages = 3 months-1 year. Level 2 languages = 6 months-1 year. Level 3 languages = 1-2 years. Level 4 languages = 2 years. Level 5 languages = 3-4 years, but some may take longer.

NE Caucasian, NW Caucasian and Kartvelian

Of course the Caucasian languages like Tsez, Tabasaran, Georgian, Chechen, Ingush, Abkhaz and Circassian are some of the hardest languages on Earth to learn. Chechen, Circassian, Ingush and Abkhaz are rated 5, hardest of all.

NE Caucasian

Tsez has 64-126 different cases, making it by far the most complex case system on Earth! It is said that even native speakers have a hard time picking up the correct inflection to use sometimes.

Tabasaran is rated the 3rd most complex grammar in the world, with 48 different noun cases.

Tsez and Tabasaran are rated 5, hardest of all.

Kartvelian

One problem with Georgian is the strange alphabet: ქართულია ერთ ერთი რთული ენა. It also has lots of glottal stops that are hard for many foreigners to speak, a single verb can have up to 12 different parts, similar to Polish, consonant clusters can be huge – up to eight consonants stuck together, many consonant sounds are strange, and there are six cases and six tenses. In addition, Georgian is both highly agglutinative and highly irregular, which is the worst of two worlds. Georgian is one of the hardest languages on Earth to pronounce.

On the plus side, Georgian has borrowed a great deal of Latinate foreign vocabulary, so that will help anyone coming from a Latinate or Latinate-heavy language background.

Georgian is rated 5, hardest of all.

NW Caucasian

Ubykh, a Caucasian language of Turkey, is now extinct, but there is one second language speaker. It has more consonants than any language on Earth – 78 consonant sounds in all. Combine that with only 2 vowel sounds and a highly complex grammar, and you have one tough language. However, it does lack the convoluted case systems of the Caucasian languages next door.

Ubykh is rated 5, hardest of all.

American Indian Languages

American Indian languages are also notoriously difficult, though few try to learn them in the US anyway. In the rest of the continent, they are still learned by millions in many different nations. You almost need to learn these as a kid. It’s going to be quite hard for an adult to get full competence in them.

One problem with these languages is the multiplicity of verb forms. For instance, the standard paradigm for the overwhelming number of regular English verbs is a maximum of five forms: steal, steals, stealing, stole, stolen. Many Amerindian languages have over 1000 forms of each verb in the language.

Dene-Yeniseian

Na-Dene

Navajo has long, short and nasal vowels, a tone system, and a grammar totally unlike anything in Indo-European. A stem of only four letters or so can take enough affixes to fill a whole line of text. Some Navajo dictionaries have thousands of entries of verbs only, with no nouns. A verb has no particular form like in English – to walk. Instead, it assumes various forms depending on whether or not the action is completed, incomplete, in progress, repeated, habitual, one time only, instantaneous, or simply desired.

For instance, the verb ndideesh means to pick up or to lift up. But it varies depending on what you are picking up.

For instance:
ndideeshtiil - to pick up a slender stiff object (key, pole)
ndideeshleel - to pick up a slender flexible object (branch, rope)
ndideesh’aal - to pick up a roundish or bulky object (bottle, rock)
ndideeshgheel - to pick up a compact and heavy object (bundle, pack)
ndideeshjol - to pick up a non-compact or diffuse object (wool, hay)
ndideeshteel - to pick up something animate (child, dog)
ndideeshnil - to pick up a few small objects (a couple of berries, nuts)
ndideeshjih - to pick up a large number of small objects (a pile of berries, nuts)
ndideeshtsos - to pick up something flexible and flat (blanket, piece of paper)
ndideeshjil - to pick up something I carry on my back
ndideeshkaal - to pick up anything in a vessel
ndideeshtloh - to pick up mushy matter (mud)

But picking up is only one way of handling the 12 different consistencies. One can also bring, take, hang up, keep, carry around, turn over, etc. objects. There are about 28 different verbs one can use for handling objects. If we multiply these verbs by the consistencies, there are over 300 different verbs used just for handling objects.
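The multiplication above is easy to check. The snippet below is only a sketch in Python; the variable names are made up for illustration, and the two counts are the approximate figures quoted in the text.

```python
# Approximate figures taken from the text; the names are illustrative only.
handling_verbs = 28   # bring, take, hang up, keep, carry around, turn over, ...
consistencies = 12    # slender stiff, slender flexible, roundish, animate, ...

# Every handling verb combines with every consistency.
total_forms = handling_verbs * consistencies
print(total_forms)  # 336, which is where "over 300" comes from
```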

In Navajo textbooks, there are conjugation tables for inflecting words, but it’s pretty hard to find a pattern there. One of the most frustrating things about Navajo is that every little morpheme you add to a word seems to change everything else around it, even in both directions.

It is even said that Navajo children have a hard time learning Navajo as compared to children learning other languages, but Navajo kids definitely learn the language.

As with Hopi below, even linguists find the best Navajo grammars difficult or even impossible to understand.

Navajo is rated 5, hardest of all.

Hopi (which is Uto-Aztecan, not Na-Dene) is so difficult that even grammars describing the language are almost impossible to understand.

Hopi is rated 5, hardest of all.

Slavey, a Na-Dene language of Canada, is hard to learn. It is similar to Navajo and Apache. Verbs take up to 15 different prefixes. It also uses a completely different alphabet, a syllabic one designed for Canadian Indians.

Slavey is rated 5, hardest of all.

Burushaski

Burushaski is often thought to be a language isolate, related to no other languages. However, I think it is Dene-Caucasian. It is spoken in the Himalaya Mountains of far northern Pakistan in an area called Hunza. Its verb conjugation is complex, it has a lot of inflections, there are complicated ways of making sentences depending on many factors, and it is an ergative language, which is hard to learn for speakers of non-ergative languages. In addition, there are very few to no cognates for the vocabulary.

Haida

Haida is often thought to be a Na-Dene language, but proof of its status is lacking. If it is Na-Dene, it is the most distant member of the family. Haida is in the competition for the most complicated language on Earth, with 70 different suffixes.

Salishan

The Salishan languages spoken in the Northwest have a long reputation for being hard to learn, in part because of long strings of consonants, in one case 11 consonants long. The Salish languages are, like Chukchi, polysynthetic. Some translations treat all Salish words as either verbs or phrases. Some say that Salish languages do not contain nouns, though this is controversial. Many of the vowels and consonants are not present in most widely spoken languages.

Nuxálk is a notoriously difficult Salishan Amerindian language spoken in British Columbia. It is famous for having some really wild words and even sentences that don’t seem to have any vowels in them at all. For instance, xłp̓x̣ʷłtłpłłskʷc̓ - he had a bunchberry plant.

The Salishan languages are rated 5, hardest of all.

Kootenai

Yet the Salishans always considered the neighboring language Kootenai to be too hard to learn. Kootenai is an isolate spoken in Idaho.

Kootenai is rated 5, hardest of all.

Algonquian

Central Algonquian

Ojibwa and Cree are very hard to learn. They are written in a variety of different ways with different alphabets and syllabic systems, complicating matters even further. They are both polysynthetic and have long, short and nasal vowels and aspirated and unaspirated voiceless consonants. Words are divided into metrical feet, the rules for determining stress placement in words are quite complex and there is lots of irregularity. Vowels fall out a lot, or syncopate, within words.

Cree adds noun classifiers to the mix, and both nouns and verbs are marked as animate or inanimate. In addition, verbs are marked for transitive and intransitive, and they get different affixes depending on whether they occur in main or subordinate clauses.

Cree and Ojibwa are rated 5, hardest of all.

Plains Algonquian

Cheyenne is well-known for being a hard Amerindian language to learn. Like many polysynthetic languages, it can have very long words.

náohkêsáa’oné’seómepêhévetsêhésto’anéhe - I truly don’t know Cheyenne very well.

Cheyenne is rated 5, hardest of all.

Uto-Aztecan

Numic

Comanche is legendary for being one of the hardest Indian languages of all to learn. Reasons are unknown, but all Amerindian languages are quite difficult. I doubt if Comanche is harder than other Numic languages.

Bizarrely enough, Comanche has very strange sounds called voiceless vowels, which seem to be an oxymoron, as vowels would seem to be inherently voiced. English has something akin to voiceless vowels in the words particular and peculiar, where the vowel in the unstressed first syllable acts something like a voiceless vowel.

Comanche was used for a while by the code talkers in World War 2 – not all code talkers were Navajos. Comanche was specifically chosen because it was hard to figure out. The Comanche code was never broken.

Comanche is rated 5, hardest of all.

Quechuan

Quechua is controversial; some say it is very hard to learn, but others disagree. One argument is that there is a lot of dialectal divergence and a lack of learning materials.

On the difficulty side, some say that Quechua speakers spend their whole lives learning the language. Quechua is a controversial case, but I can’t imagine any Amerindian language getting lower than a 5.

Quechua is rated 5, hardest of all.

Oto-Manguean

Chinantec, an Indian language of southwest Mexico, is very hard for non-Chinantecs to learn. The tone system is maddeningly complex, and the syntax and morphology are very intricate.

Chinantec is rated 5, hardest of all.

Iroquoian

Cherokee is very hard to learn. In addition to everything else, it has a completely different alphabet. It’s polysynthetic, to make matters worse. It is possible to write a Cherokee sentence that somehow lacks a verb. There are five categories of verb classifiers. Verbs needing classifiers must use one. Each regular verb can have an incredible 21,262 inflected forms! All verbs contain a verb root, a pronominal prefix, a modal suffix and an aspect suffix. In addition, verbs inflect for singular, plural and also dual. Number is marked for inclusive vs. exclusive.

Cherokee also has lexical tone, with complex rules about how tones may combine with each other. Tone is not marked in the orthography.

Cherokee is rated 5, most difficult of all.

Nambikwaran

This is actually a series of closely related languages as opposed to one language, but the Nambikwara language is the most well-known of the family, with 1,200 speakers in the Brazilian Amazon.

Phonology is complex. Consonants distinguish between aspirated, plain and glottalized, common in the Americas. There are strange sounds like prestopped nasals and glottalized fricatives. There are nasal vowels and three different tones. All vowels except one have nasal, creaky-voiced and nasal-creaky counterparts, for a total of 19 vowels.

The grammar is polysynthetic with a complex evidential system.

Reportedly, Nambikwara children do not pick up the language fully until age 10 or so, one of the latest recorded ages for full competence. Nambikwara is sometimes said to be the hardest language on Earth to learn, but it has some competition.

Nambikwara definitely gets a 5 rating, hardest of all!

Witotoan

Bora, a Witotoan language spoken in Peru and Colombia near the border between the two countries, has a mind-boggling 350 different noun classes.

Bora gets a 5 rating, hardest of all.

Tucanoan

Tuyuca is a Tucanoan language spoken by 450 people in the department of Vaupés in Colombia. An article in The Economist magazine concluded that it was the hardest language on Earth to learn.

It has a simple sound system, but it’s agglutinative, and agglutinative languages are pretty hard. For instance, hóabãsiriga means I don’t know how to write. It has two forms of 1st person plural, I and you (inclusive) and I and the others (exclusive). It has between 50 and 140 noun classes, including strange ones like bark that does not cling closely to a tree, which can be extended to mean baggy trousers or wet plywood that has begun to fall apart.

Like Yamana, a nearly extinct Amerindian language of Chile, Tuyuca marks for evidentiality, that is, how it is that you know something. For instance:

Diga ape-wi. - The boy played soccer (I saw him playing).
Diga ape-hiyi. - The boy played soccer (I assume, though I did not see it firsthand).

Evidential marking is obligatory on all Tuyuca verbs and it forces you to think about how you know whatever it is you know.

Tuyuca definitely gets a 5 rating!

Australian

Australian Aborigine languages are some of the hardest languages on Earth to learn, like Amerindian or Caucasian languages.

All Australian languages are rated 5, most difficult of all.

Papuan

Tor-Kwerba

Berik is a Tor-Kwerba language spoken in the Indonesian colony of Irian Jaya in New Guinea.

Verbs take many strange endings, in many cases mandatory ones, that indicate what time of day something happened, among other things.

Telbener - He drinks in the evening.

Where a verb takes an object, it will not only be marked for time of day but for the size of the object.

Kitobana - He gives three large objects to a man in the sunlight.

Verbs may also be marked for where the action takes place in reference to the speaker.

Gwerantena - To place a large object in a low place nearby.

Berik is rated 5 - hardest of all.

Trans New Guinea

Amele is the world’s most complex language as far as verb forms go, with 69,000 finite and 860 infinitive forms.

Amele is rated 5 - hardest of all.

Afroasiatic

Semitic

Arabic has some very irregular manners of noun declension, even in the plural. For instance, the word girls changes in an unpredictable way when you say one girl, two girls and three girls, and there are two different ways to say two girls depending on context. Two girls is marked with the dual, but different dual forms can be used. All languages with duals are relatively difficult for most speakers that lack a dual in their native language.

Further, it is full of irregular plurals similar to octopus and octopi in English, whereas these forms are rare in English. When you say I love you to a man, you say it one way, and when you say it to a woman, you say it another way. On and on.

There are 28 different symbols in the alphabet, and each symbol is written differently depending on its place in the word, with separate isolated, initial, medial and final shapes for most letters. An h is written differently at the beginning of a word than at the end of a word. However, one simple aspect of it is that the medial form is usually close to the initial form.

The laryngeals, uvulars and glottalized sounds are hard for many foreigners to make and nearly impossible for them to get right.

Arabic is at least as idiomatic as French or English, so in order to speak it right you have to learn all of the expressionistic nuances.

One of the worst problems with Arabic is the dialects, which in many cases are separate languages altogether. If you learn Arabic, you often have to learn one of the dialects along with classical Arabic. All Arabic speakers speak both an Arabic dialect and Classical Arabic.

To attain anywhere near native speaker competency in Egyptian Arabic, you probably need to live in Egypt for 10 years, but Arabic speakers say that few if any second language learners ever come close to native competency. There is a huge vocabulary, and most words have a wealth of possible meanings.

Adding weight to the commonly held belief that Arabic is hard to learn is research done in Germany in 2005 which showed that Turkish children learn their language at age 2-3, German children at age 4-5, but Arabic kids did not get Arabic until age 12.

Arabic is rated 4, extremely difficult.

Maltese is a strange language, basically an Arabic language that has very heavy influence from non-Arabic tongues. It shares the problem of Gaelic that often words look one way and are pronounced another.

Maltese is rated 4, extremely difficult.

Hebrew is hard to learn according to a number of Israelis. Part of the problem may be the abjad writing system, which often leaves out vowels. Also, other than borrowings, the vocabulary is Afroasiatic, hence mostly unknown to speakers of IE languages. There are also difficult consonants as in Arabic such as pharyngeals and uvulars.

Hebrew gets a 4 for extremely difficult.

Dravidian

Malayalam, a Dravidian language of India, was recently rated the hardest language of all to learn by the World Language Research Foundation.

Malayalam words are often even hard to look up in a Malayalam dictionary.

For instance, adiyAnkaLAkkikkoNDirikkukayumANello is a word in Malayalam. It means something like “I, your servant, am sitting and mixing (which is why I cannot do what you are asking of me)”.  The part in parentheses is an example of the type of sentence where it might be used.

The word is composed of many different morphemes, including conjunctions and other affixes, with sandhi going on with some of them so they are eroded away from their basic form. There doesn’t seem to be any way to look that word up, or to write a Malayalam dictionary that lists all the possible forms, including forms like the word above. It would probably be way too huge of a book.

Tamil, a Dravidian language, is probably close to Malayalam in difficulty. Tamil has an incredible 247 characters in its alphabet. In addition, as with other languages, words are written one way and pronounced another.
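The figure of 247 is usually reached by counting every consonant-plus-vowel combination as its own character. Here is a rough tally in Python, assuming the conventional breakdown of 12 vowels, 18 consonants and one special character (the aytham). That breakdown is not spelled out in the text above, and the variable names are only illustrative.

```python
# Conventional breakdown of the Tamil script (an assumption, not from the text).
vowels = 12
consonants = 18
special = 1  # the aytham

# Each consonant combines with each vowel to give a compound character.
compound = vowels * consonants

total = vowels + consonants + compound + special
print(compound, total)  # 216 247
```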

Tamil has two completely different registers for written and spoken speech. Both Tamil and Malayalam are very hard to pronounce, are spoken very fast and have extremely complicated, nearly impenetrable scripts. If Westerners try to speak a Dravidian language in south India, more often than not the Dravidian speaker will simply address them in English rather than try to accommodate them.

Malayalam and Tamil are rated 5, most difficult of all.

Altaic

Most agree that Korean is a hard language to learn.

The alphabet, Hangul, at least is reasonable; in fact, it is quite elegant. But there are four different Romanizations – Lukoff, Yale, Horne, and McCune-Reischauer – which is preposterous. It’s best to just blow off the Romanizations and dive straight into Hangul. This way you can learn a Romanization later, and you won’t mess up your Hangul with spelling errors, as can occur if you go from Romanization to Hangul. Hangul can be learned very quickly, but learning to read Korean books and newspapers fast is another matter altogether.

Bizarrely, there are two different numeral sets used, but one is derived from Chinese so should be familiar to Chinese, Japanese or Thai speakers who use similar or identical systems.

Korean shares a problem with Japanese: if you mess up one vowel in a sentence, you render it incomprehensible. Korean has a wealth of homonyms, and this is one of the tricky aspects of the language. Any given combination of a couple of characters can have multiple meanings.

One problem is that the b, p, j, ch, t and d are pronounced differently than their English counterparts. The consonants, the pachim system and the morphing consonants at the end of the word that slide into the next word make Korean harder to pronounce than any major European language. The vocabulary is very difficult for an English speaker who does not have knowledge of either Japanese or Chinese. Japanese or Chinese will help you a lot with Korean.

Korean is agglutinative and has a subject-topic discourse structure, and the logic of these systems is difficult for English speakers to understand.

Meanwhile, Korean has an honorific system that is even wackier than that of Japanese. However, the younger generation is not using the honorifics so much, and a foreigner isn’t expected to know the honorific system anyway. Speakers of Korean can learn Japanese fairly easily.

Korean is rated by language professors as being one of the hardest languages to learn.

Korean is rated 5, hardest of all.

Japonic

Japanese also uses a symbolic alphabet, but the symbols themselves are sometimes undecipherable, in that even Japanese speakers will sometimes encounter written Japanese and will say that they don’t know how to pronounce it. I don’t mean that they mispronounce it; that would make sense. I mean they don’t have the slightest clue how to say the word! This problem is essentially nonexistent in a language like English.

There are over 2,000 frequently used characters in three different symbolic alphabets that are frequently mixed together in confusing ways. Due to the large number of frequently used symbols, it’s said that even Japanese adults learn a new symbol a day well into adulthood.

The Japanese writing system is probably crazier than the Chinese writing system. Japanese borrowed Chinese characters. But then they gave each character several pronunciations, and in some cases as many as 24. Next they made two syllabaries using another set of characters, then over the next millennia came up with all sorts of contradictory and often senseless rules about when to use the syllabaries and when to use the character set. Later on they added a Romanization to make things even worse.

Chinese uses 5-6,000 characters regularly, while Japanese only uses around 2,000. But in Chinese, each character has only one or maybe two pronunciations. In Japanese, there are complicated rules about when and how to combine the hiragana with the characters. These rules are so hard that many native speakers still have problems with them. There are also personal and place names (proper nouns) which are given completely arbitrary pronunciations often totally at odds with the usual pronunciation of the character.

Speaking Japanese is not as difficult as everyone says, and many say it’s fairly easy. However, there is a problem similar to English in that one word can be pronounced in multiple ways, like read and read in English.

There is also a class of Japanese called “honorifics” that is quite hard to master. These typically affect verbs. Honorifics vary depending on who you are and who you are talking to. In addition, gender comes into play. One wild thing about Japanese is counting forms. You actually use different numeral sets depending on what it is you are counting! There are dozens of different ways of counting things.

Japanese grammar is often said to be simple, but that does not appear to be the case on closer examination. Particles are especially vexing. Verbs engage in all sorts of wild behavior, and adverbs often act like verbs. Meanwhile, honorifics change the behavior of all words. There are particles like ha and ga that have many different meanings. One problem is that all noun modifiers, even phrases, must precede the nouns they are modifying.

It’s often said that Japanese has no case, but this is not true. Actually, there are seven cases in Japanese. The aforementioned ga is a clitic meaning nominative, made is terminative case, -no is genitive and -o is accusative.

In this sentence:

The plane that was supposed to arrive at midnight, but which had been delayed by bad weather, finally arrived at 1 AM.

Everything that modifies the noun plane (that was supposed to arrive at midnight, but which had been delayed by bad weather) must precede it:

Was supposed to arrive at midnight, but had been delayed by bad weather, the plane finally arrived at 1 AM.

Speaking Japanese is one thing, but reading and writing it is a whole new ballgame. It’s perfectly possible to know the meaning of every kanji and the meaning of every word in a sentence, but you still can’t figure out the meaning of the sentence because you can’t figure out how the sentence is stuck together in such a way as to create meaning.

However, Japanese grammar has the advantage of being quite regular. For instance, there are only four frequently used irregular verbs.

Like Chinese, the nouns are not marked for number or gender. However, while Chinese is forgiving of errors, if you mess up one vowel in a Japanese sentence, you may end up with incomprehension.

The real problem is that the Japanese you learn in class is one thing, and the Japanese of the street is another. One problem is that in street Japanese, the subject is typically not stated in a sentence. Instead it is inferred through such things as honorific terms or the choice of words you used in the sentence. Probably no one goes crazier on negatives than the Japanese. Particularly in academic writing, triple and quadruple negatives are common, and can be quite confusing.

Yet there are problems with the agglutinative nature of Japanese. It’s a completely different syntactic structure than English. Often if you translate a sentence from Japanese to English it will just look like a meaningless jumble of words. Although many Japanese learners feel it’s fairly easy to learn, surveys of language professors continue to rate Japanese as one of the hardest languages to learn. However, it’s generally agreed that Japanese is easier to learn than Korean. Japanese speakers are able to learn Korean pretty easily.

Japanese is rated 5, hardest of all.

Turkic

Turkish is often considered to be hard to learn, and it’s rated one of the hardest in surveys of language teachers. However, it’s probably easier than its reputation makes it out to be. It is agglutinative, so you can have one long word where in English you might have a sentence of shorter words. One word is Çekoslovakyalılaştıramadıklarımızdanmışsınız?, meaning, Were you one of those people whom we could not make into a Czechoslovakian? Many words have more than one meaning.

There is no verb to be, which is hard for many foreigners. Instead, the concept is wrapped onto the subject of the sentence as a -dim or -im suffix. Turkish is an imagery-heavy language, and if you try to translate straight from a dictionary, it often won’t make sense. However, the suffixation in Turkish, along with the vowel harmony, are both very precise, and there are few if any exceptions.

Turkish is a language of precision in other ways. For instance, there are eight different forms of subjunctive mood that describe various degrees of uncertainty that one has about what one is talking about. This relates to the evidentiality discussed under Tuyuca above. On Turkish news, verbs are generally marked with miş, which means that the announcer believes it to be true though he has not seen it firsthand.

The Roman alphabet and almost mathematically precise grammar really help out. A suggestion that Turkish may be easier to learn than many think is the research that shows that Turkish children attain basic grammatical mastery of Turkish at age 2-3, as compared to 4-5 for German and 12 for Arabic. The research was conducted in Germany in 2005.

In addition, Turkish has a phonetic orthography.

However, Turkish is hard for an English speaker to learn for a variety of reasons. It is agglutinative like Japanese, and all agglutinative languages are difficult for English speakers to learn. As in Japanese, you start your Turkish sentence the way you would end your English sentence. As in the Japanese example above, the subordinate clause must precede the subject, whereas in English, the subordinate clause must follow the subject. In the example below, the subordinate clause is that he will be on time.

In English, we say, “I hope that he will be on time.”

In Turkish, the sentence would read, “That he will be on time I hope.”

Turkish is rated 3, or average to moderately difficult.

Finno-Ugric

Finnic

Finnish is very hard to learn, and even long-time learners often still have problems with it. You have to know exactly which grammatical forms to use where in a sentence. In addition, Finnish has 15 cases in the singular and 16 in the plural. This is hard to learn for speakers coming from a language with little or no case.

For instance,
talo - the house
talon - house’s
taloa - some of the house
taloksi - into/as the house
talossa - in the house
talosta - from inside the house
taloon - into the house
talolla - at or on the house
talolta - from beside the house
talolle - to the house
taloista - from the houses
taloissa - in the houses
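The singular forms in the list above are just the stem plus a case ending, which can be sketched in a few lines of Python. This is an illustration only: the case labels are the standard grammatical names and are not given in the text, and the neat stem-plus-suffix pattern holds for an easy noun like talo. Many other Finnish nouns change their stem when endings are added.

```python
# Stem and endings for "talo" (house); case labels are standard names added
# here for illustration and do not appear in the text.
STEM = "talo"
SUFFIXES = {
    "genitive": "n",       # house's
    "partitive": "a",      # some of the house
    "translative": "ksi",  # into/as the house
    "inessive": "ssa",     # in the house
    "elative": "sta",      # from inside the house
    "illative": "on",      # into the house
    "adessive": "lla",     # at or on the house
    "ablative": "lta",     # from beside the house
    "allative": "lle",     # to the house
}

# Build each form by gluing the ending onto the stem.
forms = {case: STEM + suffix for case, suffix in SUFFIXES.items()}

for case, form in forms.items():
    print(f"{case:12} {form}")
```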

It gets much worse than that. This web page shows that the noun kauppa (shop) can have 2,253 forms.

A simple adjective + noun type of noun phrase of two words can be declined in up to 100 different ways.

Adjectives and nouns belong to 20 different classes. The rules governing their case declension depend on what class the substantive is in.

As with Hungarian, words can be very long. For instance, lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas which means a non-commissioned officer cadet learning to be an assistant mechanic for airplane jet engines.

Finnish, oddly enough, always puts the stress on the first syllable. Finnish vowels will be hard to pronounce for most foreigners.

However, Finnish has the advantage of being pronounced precisely as it is written. This is also part of the problem though, because if you don’t say it just right, the meaning changes. So, as with Polish, when you mangle the language, you will only achieve incomprehension. Whereas with, say, English, if a foreigner mangles the language, you can often winnow some sense out of it.

However, despite the fact that written Finnish can be easily pronounced, when learning Finnish, as in Korean, it is as if you must learn two different languages – the written language and the spoken language. A better way to put it is that there is “one language for writing and another for speaking.” You use different forms whether conversing or putting something on paper.

Nevertheless, some pronunciation is difficult, especially the contrast between short and long vowels and consonants. Check out these minimal pairs:

sydämellä vs. sydämmellä and jollekin vs. jollekkin

One easy aspect of Finnish is the way you can build many forms from a base root. From kirj-, you can build:
kirja - book
kirje - letter
kirjoittaa - to write
kirjailija - writer

Finnish verbs are very regular. The irregular verbs can almost be counted on one hand – juosta, käydä, olla, nähdä, tehdä, and a few others. In fact, Finnish in general is very regular.

As in many Asian languages, there are no masculine or feminine pronouns. One redeeming feature of Finnish is a complete lack of consonant clusters.

Finnish is rated 5, hardest of all.

Estonian has similar difficulties to Finnish, since they are closely related. Estonian has 14 cases, including strange cases such as the abessive, adessive, elative and inessive. It also has three different degrees of length, which is unusual among the world’s languages. There are short, long and extra-long vowels and consonants.

lina (linen) – short n
linna (the town’s) – long n, written as nn
`linna (into the town) – extra-long n, not written out!

There are differences in the pronunciation of the three forms above, but in rapid speech, they are hard to hear, though native speakers can make them out. Difficulties are further compounded in that extra-long sonorants (m, n, ng, l, and r) and vowels are not written out. All in all, phonemic length can be a problem in Estonian, and foreigners never seem to get it completely down.

Estonian is rated 5, hardest of all.

Ugric

It’s widely agreed that Hungarian is one of the hardest languages on Earth to learn. Even language professors agree. For one thing, there are many different forms for a single word via word modification. This enables the speaker to make his intended meaning very precise.

Hungarian is said to have an incredible 35 different cases, but the actual number is probably just 18. Verbs change depending on whether the object is definite or indefinite. There are five different types of verb conjugations. Nearly everything in Hungarian is inflected, similar to Lithuanian or Czech.

The case distinctions alone can create many different words out of one base form. For the word house, we end up with 31 different words using case forms.

házba - into the house
házban - in the house
házból - from [within] the house
házra - onto the house
házon - on the house
házról - off [from] the house
házhoz - to the house
házig - until/up to the house
háznál - at the house
háztól - [away] from the house
házzá - Translative case, where the house is the end product of a transformation, such as They turned the cave into a house.
házként - as the house, which could be used if you acted in your capacity as a house, or disguised yourself as one. He dressed up as a house for Halloween.
házért - for the house, specifically things done on its behalf, or done to get the house. They spent a lot of time fixing things up (for the house).
házul - Essive-modal case. Something like “house-ly” or “in the way/manner of a house.” The tent served as a house (in a house-ly fashion).

And we do have some basic cases:
ház - nominative. The house is down the street.
házat - accusative. The ball hit the house.
háznak - dative. The man gave the house to Mary.
házzal - Similar to instrumental, but more similar to English with. Refers to both instruments and companions.

The genitive takes 12 different declensions, depending on person and number.
házam - my house
házaim - my houses
házad - your house
házaid - your houses
háza - his/her/its house
házai - his/her/its houses
házunk - our house
házaink - our houses
házatok - your (plural) house
házaitok - your (plural) houses
házuk - their house
házaik - their houses
egyház (literally one-house) means church, as in the Catholic Church.

There are also very long words such as megszentségteleníthetetlenségeskedéseitekért. Being an agglutinative language, that word is made up of many small parts of words, or morphemes. That word means something like for your (you all possessive) repeated pretensions at being impossible to desecrate.

The preposition is stuck onto the word in this language, and this will seem strange to speakers of languages with free prepositions.

Hungarian is full of synonyms, similar to English.

For instance, there are 78 different words that mean to move: halad, jár, megy, dülöngél, lépdel, botorkál, kódorog, sétál, andalog, rohan, csörtet, üget, lohol, fut, átvág, vágtat, tipeg, libeg, biceg, poroszkál, vágtázik, somfordál, bóklászik, szedi a lábát, kitér, elszökken, betér, botladozik, őgyeleg, slattyog, bandukol, lófrál, szalad, vánszorog, kószál, kullog, baktat, koslat, kaptat, császkál, totyog, suhan, robog, rohan, kocog, cselleng, csatangol, beslisszol, elinal, elillan, bitangol, lopakodik, sompolyog, lapul, elkotródik, settenkedik, sündörög, eltérül, elódalog, kóborol, lézeng, ődöng, csavarog, lődörög, elvándorol, tekereg, kóvályog, ténfereg, özönlik, tódul, vonul, hömpölyög, ömlik, surran, oson, lépeget, mozog and mozgolódik.

Only about five of those terms are archaic and seldom used, the rest are in current use.

In addition, while most languages have names for countries that are pretty easy to figure out, in Hungarian even the names of nations are hard because they have changed the names so much. Italy becomes Olaszország, Germany becomes Németország, etc.

As in Russian and Serbo-Croatian, word order is relatively free in Hungarian. Further, there are quite a few dialects in Hungarian. Native speakers can pretty much understand them, but foreigners often have a lot of problems. Accent is very difficult in Hungarian due to the bewildering number of rules to determine accent. In addition, there are exceptions to all of these rules. Nevertheless, Hungarian is probably more regular than Polish. Hungarian spelling is also very strange for non-Hungarians, but at least the orthography is phonetic.

There are many irregularities in inflections, and even Hungarians have to learn how to spell these in school and have a hard time learning this. Hungarian phonetics is also strange, and to make matters worse, there is tons of slang.

One of the problems with Hungarian phonetics is vowel harmony. Since you stick morphemes together to make a word, the vowels that you have used in the first part of the word will influence the vowels that you will use to make up the morphemes that occur later in the word. The vowel harmony gives Hungarian the “singing effect” when it is spoken. The gy sound is hard for many foreigners to make.
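As a rough illustration of how earlier vowels steer later ones, here is a minimal Python sketch of two-way (back versus front) harmony for the Hungarian suffix meaning "in", which appears as -ban or -ben. The function names and the decision to treat i, í and é as transparent neutral vowels are assumptions made for this sketch. It ignores rounding harmony, stems that vacillate, and the exceptional stems that take back suffixes despite containing only neutral vowels.

```python
# Simplified sketch of Hungarian back/front vowel harmony.
# Assumption: i, í and é are "neutral" and are skipped when deciding.
BACK_VOWELS = set("aáoóuú")
FRONT_VOWELS = set("eöőüű")


def harmonic_class(stem: str) -> str:
    """Return 'back' or 'front' based on the last non-neutral vowel."""
    for ch in reversed(stem.lower()):
        if ch in BACK_VOWELS:
            return "back"
        if ch in FRONT_VOWELS:
            return "front"
        # Consonants and neutral vowels fall through and are skipped.
    # Stems with only neutral vowels default to front in this sketch.
    return "front"


def inessive(stem: str) -> str:
    """Attach the 'in' suffix: -ban after back stems, -ben after front stems."""
    suffix = "ban" if harmonic_class(stem) == "back" else "ben"
    return stem + suffix


for word in ["ház", "kert", "papír", "víz"]:
    print(word, "->", inessive(word))
```

The same choice has to be made for most suffixes, which is why a learner cannot simply memorize one form per ending.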

It’s hard to say, but Hungarian is probably harder to learn than even the hardest Slavic languages like Czech, Serbo-Croatian and Polish.

Hungarian is rated 5, hardest of all.

Sino-Tibetan

Sinitic

It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple. Short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you sort of hit a wall, often because the syntactic structure is so strangely different from English (isolating).

Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English with no tense or articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word order that are used to mark tense.

Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are aspect, serial verbs, a complex classifier system, syntax marked by something called topic-prominence, a strange form called the detrimental passive, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff.

The writing system uses symbols rather than letters, so it’s not really an alphabet at all. There are at least 85,000 symbols and actually many more, but you only need to know about 3-5,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need 10,000 characters, and probably less than 5% of Chinese know that many.

Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese prose. It’s as if it’s written in a different language.

It’s a real problem when you encounter a symbol you don’t know because there is no way to sound out the word. You are really and truly lost and screwed. You need to learn quite a bit of vocabulary just to speak simple sentences.

The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another.

A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is either monosyllabic or disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.

Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones, and meaning is often discerned by context. Chinese, like French and English, is heavily idiomatic.

It’s little known, but Chinese also uses different forms to count different things, like Japanese. Many agree that Chinese is the hardest to learn of all of the major languages. Language professors have rated Chinese as the hardest language on Earth to learn.

It gets a 5 rating for hardest of all.

However, Cantonese and Min Nan (Taiwanese) are even harder to learn than Mandarin. Cantonese has nine tones to Mandarin’s four, and in addition, its speakers continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in the 1950s. In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken.

Min Nan also has a more complex tone system than Mandarin, with eight tones. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor and many fewer children are being raised speaking it than before.

Cantonese and Min Nan get 5 ratings, hardest of all.

Austroasiatic

Mon-Khmer

Vietnamese is also hard to learn because to an outsider, the tones seem hard to tell apart. Therefore, foreigners often make themselves difficult to understand by not getting the tone precisely correct. It also has “creaky-voiced” tones, which are very hard for foreigners to get a grasp on. Vietnamese grammar is fairly simple, and reading Vietnamese is pretty easy once you figure out the tone marks. Words are short as in Chinese. However, the simple grammar is relative, as you can have 25 or more forms just for I, the 1st person singular pronoun.

Vietnamese gets 4, extremely difficult.

Khmer has a reputation for being hard to learn. I understand that it has one of the most complex honorifics systems of any language on Earth. Over a dozen different words mean to carry depending on what one is carrying. There are several different words for slave depending on who owned the slave and what the slave did. There are 28-30 different vowels, including sets of long and short vowels and long and short diphthongs. The vowel system is so complicated that there isn’t even agreement on exactly what it looks like.

Speaking it is not so bad, but reading and writing it is pretty difficult. For instance, you can put up to five different symbols together in one complex symbol.

Khmer gets a 5 rating, hardest of all.

Sedang, a language of Vietnam, has the highest number of vowel sounds of any language on Earth, at 55 distinct vowel sounds.

Sedang gets a 5 rating, hardest of all.

Hmong-Mien

Hmong is widely spoken in this part of California, but it’s not easy to learn. There are eight tones, and they are not easy to figure out. It’s not obviously related to any other major language but the obscure Mien.

It has some very strange consonants called voiceless nasals. We have them in English as allophones – the m in small is voiceless, but in Hmong, they put them at the front of words – the m in the word Hmong is voiceless. These can be very hard to pronounce.

Hmong gets a 5 rating, hardest of all.

Austro-Tai

Austronesian

Malayo-Polynesian

Bahasa Indonesia and the related Malaysian are fairly easy languages to learn. For one thing, the grammar is dead simple. Verbs are not marked for tense at all. And the sound system of these languages, in common with Austronesian in general, is one of the simplest on Earth. Bahasa Indonesia has few homonyms, homophones, homographs, heteronyms, etc. Words in general have only one meaning. Though the orthography is not completely phonetic, it has only a small number of exceptions. The system for converting words into nouns or verbs is regular.

Bahasa Indonesia and Malaysian get a 1 rating for very easy.

However, Tagalog is considerably harder. Tagalog is an ergative-absolutive language, not a nominative-accusative language. In the former, phrases are marked not according to subject or object as in the latter, but according to whether the verb is transitive or intransitive. The subject of a transitive verb is marked one way, and the subject of an intransitive verb and object of a transitive verb are marked a second way.
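The difference between the two alignment types can be sketched in a few lines of Python. This is only an illustration of the general idea described above, not an analysis of Tagalog itself. The role labels S, A and O (sole argument of an intransitive verb, agent of a transitive verb, object of a transitive verb) and the function name are assumptions introduced for the sketch.

```python
def case_of(role: str, alignment: str) -> str:
    """Return the case label a noun phrase gets under a given alignment.

    role: 'S' (subject of an intransitive verb),
          'A' (subject of a transitive verb),
          'O' (object of a transitive verb).
    alignment: 'accusative' or 'ergative'.
    """
    if role not in {"S", "A", "O"}:
        raise ValueError(f"unknown role: {role!r}")
    if alignment == "accusative":
        # S and A pattern together; O is the odd one out.
        return "accusative" if role == "O" else "nominative"
    if alignment == "ergative":
        # S and O pattern together; A is the odd one out.
        return "ergative" if role == "A" else "absolutive"
    raise ValueError(f"unknown alignment: {alignment!r}")


for alignment in ("accusative", "ergative"):
    print(alignment, {role: case_of(role, alignment) for role in "SAO"})
```

For a speaker of English, the hard part is that the subject of a sentence is marked differently depending on whether the verb takes an object.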

Compared to many European languages, Tagalog syntax, morphology and semantics are often quite different. Unlike Malay, verbs conjugate quite a bit in Tagalog. However, articles and the creation of adjectives from nouns are very easy. Compare ganda = beauty (noun) and maganda = beautiful (adjective).

Tagalog gets a 3 rating, average to moderately difficult.

Maori and other Polynesian languages have a reputation for being quite hard to learn, but others say they are not that hard at all, so the situation is confused. The pronunciation is simple, and there is no gender. The main problem for English speakers is that the sentence structure is backwards compared to English. In addition, macrons can cause problems.

Maori gets a 3 rating, average to moderately difficult.

Kwaio is an Austronesian language spoken in the Solomon Islands. It has four different forms of number to mark pronouns – not only the usual singular and plural, but also the rarer dual and the very rare paucal.

For instance:

1st person dual inclusive (you and I)
1st person dual exclusive (I and someone else, not you)

1st person paucal inclusive (you, I and a few others)
1st person paucal exclusive (I and a few others)

1st person plural inclusive (I, you and many others)
1st person plural exclusive (I and many others)

Pretty wild!

Kwaio gets a 5, hardest of all.

Tai-Kadai

Thai is a pretty hard language to learn. There are 75 symbols in the strange script, there are no spaces between words in the script, and vowels can come before, after, above or below consonants in any given syllable. There are five tones, including a neutral tone. Tones are determined by a variety of complex things, including a combination of tone marks, the class of consonants, if the syllable ends in a sonorant or a stop, and what the tone of the preceding syllable was.

There is a system of noun classifiers for counting various things, similar to Japanese. In addition, common to many Asian languages, there is a complicated honorifics system. The vowels are different than in many languages, and there are some unusual diphthongs: eua, euai, aui and uu. There is a contrast between aspirated and unaspirated consonants.

Consonant pronunciations vary depending on the location of the syllable in the word – for instance, s can change to t. There are many vowels which are spoken but not written. There are many consonants that are pronounced the same – for instance, there are six different t’s, not counting the s’s that turn into t’s. The Thai script is definitely one of the most difficult phonetic scripts. Nevertheless, the Thai script is easier to learn than the Japanese or Chinese character sets. In spite of all of that, the syntax is simple, like Chinese.

Thai gets a 4 rating, extremely hard to learn.

Niger-Kordofanian

Niger-Congo

Bantu

Bakjalukasha, a Bantu language spoken in Ivory Coast, is hard to learn. Many of these African languages are tonal and can be quite complex. They also divide nouns into different categories (noun classes) like Caucasian languages do. Further, they are often seriously inflected.

Bakjalukasha gets a 5 rating, hardest of all.

Nguni and Xhosa, two languages of South Africa, are quite difficult, with up to nine click sounds in both. Clicks only exist in one language outside of Africa, an Australian language, and are extremely difficult to learn. Even native speakers mess up the clicks sometimes. Nelson Mandela said he had problems making some of the click sounds in Xhosa.

Nguni and Xhosa get 5 ratings, hardest of all.

Zulu and Ndebele also have these impossible click sounds. These languages also make plurals by changing the prefix of the noun, and the manner varies according to the noun class. If you want to look up a word in the dictionary, first of all you need to discard the prefix. For instance, in Ndebele,

river = umfula
rivers = imifula

but stone = ilitshe
stones = amatshe

yet tree = isihlahla
trees = izihlahla.
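The prefix swapping and the dictionary lookup problem can be sketched in Python as follows. The prefix table contains only the three singular/plural pairs from the examples above, and the function names are made up for this sketch. Real Ndebele has more noun classes and prefix variants than this, so treat it as a toy model.

```python
# Singular -> plural prefix pairs, taken only from the three examples above.
# Assumption: a real description of Ndebele would need many more entries.
PREFIX_PAIRS = [
    ("um", "imi"),   # umfula -> imifula
    ("ili", "ama"),  # ilitshe -> amatshe
    ("isi", "izi"),  # isihlahla -> izihlahla
]


def pluralize(noun: str) -> str:
    """Swap a known singular prefix for its plural counterpart."""
    for singular, plural in PREFIX_PAIRS:
        if noun.startswith(singular):
            return plural + noun[len(singular):]
    raise ValueError(f"no known singular prefix on {noun!r}")


def dictionary_stem(noun: str) -> str:
    """Strip a known prefix (singular or plural) to get the lookup form."""
    for pair in PREFIX_PAIRS:
        for prefix in pair:
            if noun.startswith(prefix):
                return noun[len(prefix):]
    return noun


for word in ["umfula", "ilitshe", "isihlahla"]:
    print(word, "->", pluralize(word), "| stem:", dictionary_stem(word))
```

Note that the learner has to know which class a noun belongs to before either function can even be applied, which is where the real difficulty lies.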

Zulu has pitch accent, tones and clicks. There are nine different pitch accents, four tones and three clicks, but each click can be pronounced in five different ways. However, tones are not marked in writing, so it’s hard to figure out when to use them. Zulu also has depressor consonants, which lower the tone in the vowel in the following syllable. In addition, Zulu has multiple gender – 15 different genders. And some nouns behave like verbs.

Zulu and Ndebele both get 5 ratings, hardest of all.

The West African language Ga, which is usually classified as Kwa rather than Bantu, has a bad reputation for being a tough nut to crack. It is spoken in Ghana by about 600,000 people. It has two tones and engages in a strange behavior called tone terracing that is common to many West African languages. It also has many sounds that are not in any Western languages.

Ga gets a 5 rating, hardest of all.

Ndali is a Bantu language with 150,000 speakers spoken in Malawi and Tanzania. It has many strange tense forms. For instance, in the past tense:

Past tense A: He went just now.
Past tense B: He went sometime earlier today.
Past tense C: He went yesterday.
Past tense D: He went sometime before yesterday.

Future tense is marked similarly:

Future tense A: He’s going to go right away.
Future tense B: He’s going to go sometime later today.
Future tense C: He’s going to go tomorrow.
Future tense D: He’s going to go sometime after tomorrow.
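The way these tenses carve up time can be sketched as a small Python function. The labels A through D come from the lists above. The function name, its parameters and the idea of passing "immediate" as a flag are assumptions for this sketch, since the text does not say exactly where "just now" ends and "earlier today" begins.

```python
def ndali_tense(direction: str, days_away: int, immediate: bool = False) -> str:
    """Pick a tense label from the remoteness of the event.

    direction: 'past' or 'future'.
    days_away: whole days between today and the event (0 = today).
    immediate: True for 'just now' / 'right away' (only meaningful today).
    """
    if direction not in {"past", "future"}:
        raise ValueError(f"unknown direction: {direction!r}")
    if days_away < 0:
        raise ValueError("days_away must be zero or positive")

    if days_away == 0:
        letter = "A" if immediate else "B"
    elif days_away == 1:
        letter = "C"  # yesterday or tomorrow
    else:
        letter = "D"  # before yesterday or after tomorrow
    return f"{direction.capitalize()} tense {letter}"


print(ndali_tense("past", 0, immediate=True))
print(ndali_tense("past", 1))
print(ndali_tense("future", 3))
```

An English speaker can leave all of this to an adverb or to context, while an Ndali speaker has to choose one of the four forms every time.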

Ndali gets a 5, hardest of all.

For unknown reasons, Swahili is generally considered to be an easy language to learn. The US military ranks it 1, among the easiest of all languages to learn. This seems to be the typical perception. Why Swahili is so easy to learn, I am not sure. It’s a trade language, and trade languages are often fairly easy to learn. There’s also a lot of controversy about whether or not Swahili can be considered a creole, but that has not been proven. For the moment, the reasons why Swahili is so easy to learn will have to remain mysterious.

Swahili gets a 1 rating, easiest of all.

Khoisan

!Xóõ (Taa), spoken by only 4,200 Bushmen in Botswana and Namibia, is a notoriously difficult Khoisan language replete with click sounds that are all but impossible to comprehend. Taa has anywhere from 130 to 164 consonants, possibly the largest phonemic inventory of any language. Of this vast wealth of sounds, there are anywhere from 30-64 different click sounds.

In addition, there are four types of vowels: plain, pharyngealized, breathy-voiced and strident. On top of that, there are four tones. Speakers develop a lump on their larynx from making the click sounds.

Taa gets a 5 rating, hardest of all.

Eskimo-Aleut

Inuktitut is extremely hard to learn. Inuktitut is polysynthetic-agglutinative, and roots can take many suffixes, in some cases up to 700. Verbs have 63 present indicative forms, and conjugation involves 252 different inflections. However, suffixation is extremely regular. In a typical long Inuktitut text, 92% of words will occur only once. This is quite different from English and many other languages where certain words occur very frequently or at least frequently. Certain fully inflected verbs can be analyzed both as verbs and as nouns. Words can be very long.

Inuktituusuungutsialaarungnanngittuaraaluuvunga = I truly don’t know how to speak Inuktitut very well.

Inuktitut is also rated by linguists as one of the hardest languages on Earth to pronounce. Inuktitut may be as hard to learn as Navajo.

Inuktitut is rated 5, hardest of all.

Paleosiberian

Chukchi is a polysynthetic language, so clearly it must be hard to learn. In polysynthetic languages, very long words can denote an entire sentence, and it’s quite hard to take the word apart into its parts and figure out exactly what they mean and how they go together.

Chukchi gets a 5 rating, hardest of all.

Basque

Basque, of course, is just a wild language altogether. There is an old saying that the Devil tried to learn Basque, but after seven years, he only learned how to say Hello and Goodbye. There are 24 cases, and the verbs are quite complex. This is because it is an ergative language, so verbs vary according to the number of subjects and the number of objects and if any third person is involved.

If you don’t grow up speaking Basque, it’s hard to attain native speaker competence. It’s quite a bit easier to write in Basque than to speak it. Nevertheless, Basque verbs are quite regular. In fact, the entire language is quite regular. In addition, most words above the intermediate level are borrowings from large languages, so once you reach intermediate Basque, the rest is not that hard. In addition, on the plus side, pronunciation is straightforward.

Basque is rated 5, hardest of all.

Paper On Karelian Available

I have edited and rewritten a seminal paper on the present state of the Karelian language by one of the top Karelian linguists in the world, P. Zaikov. He’s a native Karelian and Russian speaker, and his English had problems. On the other hand, it was not really atrocious. So it needed a rewrite.

Rewrites of this kind are quite difficult. You try to keep as much of the phrasing and voice of the author, yet you need to redo it. You also do not want it to sound too much like you, the editor. Kudos to the real editors of the world. They have quite a job.

Karelian is a language spoken by about 90,000 speakers in the far northwest of Russia near the Finnish border. It is closely related to Finnish, but not intelligible with it. Liv, or Livvi, said to be a dialect, is actually a separate language. It is not in very good shape at all, but it may be salvageable.

Part of the problem is the language policies of Vladimir Putin, the fascist dictator of Russia, and another part of the problem is that many Karelians no longer speak the language or speak it well. In addition, the Karelian Parliament has refused to make Karelian a co-official language of Karelia. It is unique among regional Parliaments in not making the language of the region co-official.

Anyway, the paper is downloadable on my site here if you want to check it out.

It is titled The Future Of The Karelian Language In The Republic Of Karelia. The author is P. Zaikov of Petrozavodsk State University in Petrozavodsk, Russia.
