More On The Hardest Languages To Learn – Non-Indo-European Languages

Note: Unbelievably, the PC nutjobs have accused this post, a Linguistics post of all things, of racism. See here for my position statement on racism.

Caution: This post is very long. It runs to 75 pages on the Net.

This is a continuation of the earlier post. I split it up into two parts because it had gotten too long.

The post refers to which languages are the hardest for English speakers to learn, though to some extent, the ratings are applicable across languages. Most Chinese speakers would recognize Spanish as being an easy language, despite its alien nature. And even most Chinese, Navajo, Poles or Czechs acknowledge that their languages are hard to learn. To a certain extent, difficulty is independent of linguistic starting point. Some languages are just harder than others, and that’s all there is to it.

Method, Results and Conclusion. See here.

Ratings: Languages are rated 1-5, easiest to hardest. 1 = easiest, 2 = moderately easy to average, 3 = average to moderately difficult, 4 = very to extremely difficult, 5 = most difficult of all.

Time needed: Time needed to learn the language “reasonably well”: Level 1 languages = 3 months-1 year. Level 2 languages = 6 months-1 year. Level 3 languages = 1-2 years. Level 4 languages = 2 years. Level 5 languages = 3-4 years, but some may take longer.

NE Caucasian, NW Caucasian and Kartvelian

Of course the Caucasian languages like Tsez, Tabasaran, Georgian, Chechen, Ingush, Abkhaz and Circassian are some of the hardest languages on Earth to learn. Chechen, Circassian, Ingush and Abkhaz are rated 5, hardest of all.

NE Caucasian

Tsez has 64-126 different cases, giving it by far the most complex case system on Earth! It is said that even native speakers sometimes have a hard time picking the correct inflection to use.

Tabasaran is rated the 3rd most complex grammar in the world, with 48 different noun cases.

Tsez and Tabasaran are rated 5, hardest of all.

Kartvelian

One problem with Georgian is the strange alphabet: ქართულია ერთ ერთი რთული ენა. It also has lots of glottal stops that are hard for many foreigners to speak. A single verb can have up to 12 different parts, similar to Polish. Consonant clusters can be huge, with up to eight consonants stuck together, and many consonant sounds are strange. There are six cases and six tenses. In addition, Georgian is both highly agglutinative and highly irregular, which is the worst of both worlds. Georgian is one of the hardest languages on Earth to pronounce.

On the plus side, Georgian has borrowed a great deal of Latinate foreign vocabulary, so that will help anyone coming from a Latinate or Latinate-heavy language background.

Georgian is rated 5, hardest of all.

NW Caucasian

Ubykh, a Caucasian language of Turkey, is now extinct, but there is one second language speaker. It has more consonants than any language on Earth – 78 consonant sounds in all. Combine that with only 2 vowel sounds and a highly complex grammar, and you have one tough language. However, it does lack the convoluted case systems of the Caucasian languages next door.

Ubykh is rated 5, hardest of all.

American Indian Languages

American Indian languages are also notoriously difficult, though few try to learn them in the US anyway. In the rest of the continent, they are still learned by millions in many different nations. You almost need to learn these as a kid. It’s going to be quite hard for an adult to get full competence in them.

One problem with these languages is the multiplicity of verb forms. For instance, the standard paradigm for the overwhelming majority of English verbs is a maximum of five forms: steal, steals, stealing, stole, stolen. Many Amerindian languages have over 1000 forms of each verb in the language.

Dene-Yeniseian

Na-Dene

Navajo has long, short and nasal vowels, a tone system, and a grammar totally unlike anything in Indo-European. A stem of only four letters or so can take enough affixes to fill a whole line of text. Some Navajo dictionaries have thousands of entries of verbs only, with no nouns. A verb has no particular form like in English – to walk. Instead, it assumes various forms depending on whether or not the action is completed, incomplete, in progress, repeated, habitual, one time only, instantaneous, or simply desired.

For instance, the verb ndideesh means to pick up or to lift up. But it varies depending on what you are picking up.

ndideeshtiil - to pick up a slender stiff object (key, pole)
ndideeshleel - to pick up a slender flexible object (branch, rope)
ndideesh’aal - to pick up a roundish or bulky object (bottle, rock)
ndideeshgheel - to pick up a compact and heavy object (bundle, pack)
ndideeshjol - to pick up a non-compact or diffuse object (wool, hay)
ndideeshteel - to pick up something animate (child, dog)
ndideeshnil - to pick up a few small objects (a couple of berries, nuts)
ndideeshjih - to pick up a large number of small objects (a pile of berries, nuts)
ndideeshtsos - to pick up something flexible and flat (blanket, piece of paper)
ndideeshjil - to pick up something I carry on my back
ndideeshkaal - to pick up anything in a vessel
ndideeshtloh - to pick up mushy matter (mud).

But picking up is only one way of handling the 12 different consistencies. One can also bring, take, hang up, keep, carry around, turn over, etc. objects. There are about 28 different verbs one can use for handling objects. If we multiply these verbs by the consistencies, there are over 300 different verbs used just for handling objects.
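The arithmetic behind that "over 300" figure can be checked with a quick sketch. The object classes below are simply the 12 consistencies from the pick-up list above, and the count of 28 handling verbs is the approximate figure given in the text. The variable and function names are made up for illustration.

```python
# The 12 object classes ("consistencies") listed above for "to pick up".
OBJECT_CLASSES = [
    "slender stiff object",
    "slender flexible object",
    "roundish or bulky object",
    "compact and heavy object",
    "non-compact or diffuse object",
    "something animate",
    "a few small objects",
    "a large number of small objects",
    "something flexible and flat",
    "something carried on the back",
    "anything in a vessel",
    "mushy matter",
]

# "About 28" handling verbs (pick up, bring, take, hang up, ...), per the text.
HANDLING_VERB_COUNT = 28


def handling_verb_total(object_classes, verb_count):
    """Multiply the number of handling verbs by the number of object classes."""
    return len(object_classes) * verb_count


total = handling_verb_total(OBJECT_CLASSES, HANDLING_VERB_COUNT)
print(total)  # 12 classes x 28 verbs
```

With 12 classes and 28 verbs the product is 336, which is where "over 300" comes from. It is only a rough count, since the figure of 28 is itself approximate.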

In Navajo textbooks, there are conjugation tables for inflecting words, but it’s pretty hard to find a pattern there. One of the most frustrating things about Navajo is that every little morpheme you add to a word seems to change everything else around it, even in both directions.

It is even said that Navajo children have a hard time learning Navajo as compared to children learning other languages, but Navajo kids definitely learn the language.

As with Hopi below, even linguists find the best Navajo grammars difficult or even impossible to understand.

Navajo is rated 5, hardest of all.

Hopi is so difficult that even grammars describing the language are almost impossible to understand.

Hopi is rated 5, hardest of all.

Slavey, a Na-Dene language of Canada, is hard to learn. It is similar to Navajo and Apache. Verbs take up to 15 different prefixes. It also uses a completely different alphabet, a syllabic one designed for Canadian Indians.

Slavey is rated 5, hardest of all.

Burushaski

Burushaski is often thought to be a language isolate, related to no other languages. However, I think it is Dene-Caucasian. It is spoken in the Himalaya Mountains of far northern Pakistan in an area called the Hunza. Its verb conjugation is complex, it has a lot of inflections, there are complicated ways of making sentences depending on many factors, and it is an ergative language, which is hard to learn for speakers of non-ergative languages. In addition, there are very few to no cognates for the vocabulary.

Haida

Haida is often thought to be a Na-Dene language, but proof of its status is lacking. If it is Na-Dene, it is the most distant member of the family. Haida is in the competition for the most complicated language on Earth, with 70 different suffixes.

Salishan

The Salishan languages spoken in the Northwest have a long reputation for being hard to learn, in part because of long strings of consonants, in one case 11 consonants long. The Salish languages are, like Chukchi, polysynthetic. Some translations treat all Salish words as either verbs or phrases. Some say that Salish languages do not contain nouns, though this is controversial. Many of the vowels and consonants are not present in most widely spoken languages.

Nuxálk is a notoriously difficult Salishan Amerindian language spoken in British Columbia. It is famous for having some really wild words and even sentences that don’t seem to have any vowels in them at all. For instance, xłp̓x̣ʷłtłpłłskʷc̓ - he had a bunchberry plant.

The Salishan languages are rated 5, hardest of all.

Kootenai

Yet the Salishans always considered the neighboring language Kootenai to be too hard to learn. Kootenai is an isolate spoken in Idaho.

Kootenai is rated 5, hardest of all.

Algonquian

Central Algonquian

Ojibwa and Cree are very hard to learn. They are written in a variety of different ways with different alphabets and syllabic systems, complicating matters even further. They are both polysynthetic and have long, short and nasal vowels and aspirated and unaspirated voiceless consonants. Words are divided into metrical feet, the rules for determining stress placement in words are quite complex and there is lots of irregularity. Vowels fall out a lot, or syncopate, within words.

Cree adds noun classifiers to the mix, and both nouns and verbs are marked as animate or inanimate. In addition, verbs are marked as transitive or intransitive, and they get different affixes depending on whether they occur in main or subordinate clauses.

Cree and Ojibwa are rated 5, hardest of all.

Plains Algonquian

Cheyenne is well-known for being a hard Amerindian language to learn. Like many polysynthetic languages, it can have very long words.

náohkêsáa’oné’seómepêhévetsêhésto’anéhe - I truly don’t know Cheyenne very well.

Cheyenne is rated 5, hardest of all.

Uto-Aztecan

Numic

Comanche is legendary for being one of the hardest Indian languages of all to learn. Reasons are unknown, but all Amerindian languages are quite difficult. I doubt if Comanche is harder than other Numic languages.

Bizarrely enough, Comanche has very strange sounds called voiceless vowels, which seems to be an oxymoron, as vowels would seem to be inherently voiced. English has something akin to voiceless vowels in the words particular and peculiar, where the unstressed vowel of the first syllable acts something like a voiceless vowel.

Comanche was used for a while by the code talkers in World War 2 – not all code talkers were Navajos. Comanche was specifically chosen because it was hard to figure out. The Germans were never able to break the Comanche code.

Comanche is rated 5, hardest of all.

Quechuan

Quechua is controversial; some say it is very hard to learn, but others disagree. One argument is that there is a lot of dialectal divergence and a lack of learning materials.

On the difficulty side, some say that Quechua speakers spend their whole lives learning the language. Quechua is a controversial case, but I can’t imagine any Amerindian language getting lower than a 5.

Quechua is rated 5, hardest of all.

Oto-Manguean

Chinantec, an Indian language of southwest Mexico, is very hard for non-Chinantecs to learn. The tone system is maddeningly complex, and the syntax and morphology are very intricate.

Chinantec is rated 5, hardest of all.

Iroquoian

Cherokee is very hard to learn. In addition to everything else, it has a completely different alphabet. It’s polysynthetic, to make matters worse. It is possible to write a Cherokee sentence that somehow lacks a verb. There are five categories of verb classifiers. Verbs needing classifiers must use one. Each regular verb can have an incredible 21,262 inflected forms! All verbs contain a verb root, a pronominal prefix, a modal suffix and an aspect suffix. In addition, verbs inflect for singular, plural and also dual. Number is marked for inclusive vs. exclusive.

Cherokee also has lexical tone, with complex rules about how tones may combine with each other. Tone is not marked in the orthography.

Cherokee is rated 5, most difficult of all.

Nambikwaran

This is actually a series of closely related languages as opposed to one language, but the Nambikwara language is the most well-known of the family, with 1,200 speakers in the Brazilian Amazon.

Phonology is complex. Consonants distinguish between aspirated, plain and glottalized, which is common in the Americas. There are strange sounds like prestopped nasals and glottalized fricatives. There are nasal vowels and three different tones. All vowels except one have nasal, creaky-voiced and nasal-creaky counterparts, for a total of 19 vowels.

The grammar is polysynthetic with a complex evidential system.

Reportedly, Nambikwara children do not pick up the language fully until age 10 or so, one of the latest recorded ages for full competence. Nambikwara is sometimes said to be the hardest language on Earth to learn, but it has some competition.

Nambikwara definitely gets a 5 rating, hardest of all!

Witotoan

Bora, a Witotoan language spoken in Peru and Colombia near the border between the two countries, has a mind-boggling 350 different noun classes.

Bora gets a 5 rating, hardest of all.

Tucanoan

Tuyuca is a Tucanoan language spoken by 450 people in the department of Vaupés in Colombia. An article in The Economist magazine concluded that it was the hardest language on Earth to learn.

It has a simple sound system, but it’s agglutinative, and agglutinative languages are pretty hard. For instance, hóabãsiriga means I don’t know how to write. It has two forms of 1st person plural, I and you (inclusive) and I and the others (exclusive). It has between 50 and 140 noun classes, including strange ones like bark that does not cling closely to a tree, which can be extended to mean baggy trousers or wet plywood that has begun to fall apart.

Like Yamana, a nearly extinct Amerindian language of Chile, Tuyuca marks for evidentiality, that is, how it is that you know something. For instance:

Diga ape-wi. - The boy played soccer (I saw him playing).
Diga ape-hiyi. - The boy played soccer (I assume, though I did not see it firsthand).

Evidential marking is obligatory on all Tuyuca verbs and it forces you to think about how you know whatever it is you know.

Tuyuca definitely gets a 5 rating!

Australian

Australian Aborigine languages are some of the hardest languages on Earth to learn, like Amerindian or Caucasian languages.

All Australian languages are rated 5, most difficult of all.

Papuan

Tor-Kwerba

Berik is a Tor-Kwerba language spoken in the Indonesian colony of Irian Jaya in New Guinea.

Verbs take many strange endings, in many cases mandatory ones, that indicate what time of day something happened, among other things.

Telbener - He drinks in the evening.

Where a verb takes an object, it will not only be marked for time of day but for the size of the object.

Kitobana - He gives three large objects to a man in the sunlight.

Verbs may also be marked for where the action takes place in reference to the speaker.

Gwerantena - To place a large object in a low place nearby.

Berik is rated 5 - hardest of all.

Trans New Guinea

Amele is the world’s most complex language as far as verb forms go, with 69,000 finite and 860 infinitive forms.

Amele is rated 5 - hardest of all.

Afroasiatic

Semitic

Arabic has some very irregular manners of noun declension, even in the plural. For instance, the word girls changes in an unpredictable way when you say one girl, two girls and three girls, and there are two different ways to say two girls depending on context. Two girls is marked with the dual, but different dual forms can be used. All languages with duals are relatively difficult for most speakers that lack a dual in their native language.

Further, it is full of irregular plurals similar to octopus and octopi in English, whereas these forms are rare in English. When you say I love you to a man, you say it one way, and when you say it to a woman, you say it another way. On and on.

There are 28 different symbols in the alphabet and three different ways to write each symbol depending on its place in the word. Consonants are written in different ways depending on where they appear in a word. An h is written differently at the beginning of a word than you would write it at the end of a word. However, one simple aspect of it is that the medial form is always the same as the initial form.

The laryngeals, uvulars and glottalized sounds are hard for many foreigners to make and nearly impossible for them to get right.

Arabic is at least as idiomatic as French or English, so in order to speak it right you have to learn all of the expressionistic nuances.

One of the worst problems with Arabic is the dialects, which in many cases are separate languages altogether. If you learn Arabic, you often have to learn one of the dialects along with classical Arabic. All Arabic speakers speak both an Arabic dialect and Classical Arabic.

To attain anywhere near native speaker competency in Egyptian Arabic, you probably need to live in Egypt for 10 years, but Arabic speakers say that few if any second language learners ever come close to native competency. There is a huge vocabulary, and most words have a wealth of possible meanings.

Adding weight to the commonly held belief that Arabic is hard to learn is research done in Germany in 2005 which showed that Turkish children learn their language at age 2-3, German children at age 4-5, but Arabic kids did not get Arabic until age 12.

Arabic is rated 4, extremely difficult.

Maltese is a strange language, basically an Arabic language that has very heavy influence from non-Arabic tongues. It shares the problem of Gaelic that often words look one way and are pronounced another.

Maltese is rated 4, extremely difficult.

Hebrew is hard to learn according to a number of Israelis. Part of the problem may be the abjad writing system, which often leaves out vowels. Also, other than borrowings, the vocabulary is Afroasiatic, hence mostly unknown to speakers of IE languages. There are also difficult consonants as in Arabic such as pharyngeals and uvulars.

Hebrew gets a 4 for extremely difficult.

Dravidian

Malayalam, a Dravidian language of India, was recently rated the hardest language of all to learn by the World Language Research Foundation.

Malayalam words are often even hard to look up in a Malayalam dictionary.

For instance, adiyAnkaLAkkikkoNDirikkukayumANello is a word in Malayalam. It means something like “I, your servant, am sitting and mixing (which is why I cannot do what you are asking of me)”. The part in parentheses is an example of the type of sentence where it might be used.

The word is composed of many different morphemes, including conjunctions and other affixes, with sandhi going on with some of them so they are eroded away from their basic form. There doesn’t seem to be any way to look that word up, or to write a Malayalam dictionary that lists all the possible forms, including forms like the word above. It would probably be way too huge of a book.

Tamil, a Dravidian language, is probably close to Malayalam in difficulty. Tamil has an incredible 247 characters in its alphabet. In addition, as with other languages, words are written one way and pronounced another.

Tamil has two completely different registers for written and spoken speech. Both Tamil and Malayalam are very hard to pronounce, are spoken very fast and have extremely complicated, nearly impenetrable scripts. If Westerners try to speak a Dravidian language in south India, more often than not the Dravidian speaker will simply address them in English rather than try to accommodate them.

Malayalam and Tamil are rated 5, most difficult of all.

Altaic

Most agree that Korean is a hard language to learn.

The alphabet, Hangul, at least is reasonable; in fact, it is quite elegant. But there are four different Romanizations – Lukoff, Yale, Horne, and McCune-Reischauer – which is preposterous. It’s best to just blow off the Romanizations and dive straight into Hangul. This way you can learn a Romanization later, and you won’t mess up your Hangul with spelling errors, as can occur if you go from Romanization to Hangul. Hangul can be learned very quickly, but learning to read Korean books and newspapers fast is another matter altogether.

Bizarrely, there are two different numeral sets used, but one is derived from Chinese so should be familiar to Chinese, Japanese or Thai speakers who use similar or identical systems.

Korean has a problem similar to Japanese, that is, if you mess up one vowel in a sentence, you render it incomprehensible. Korean has a wealth of homonyms, and this is one of the tricky aspects of the language. Any given combination of a couple of characters can have multiple meanings.

One problem is that the b, p, j, ch, t and d are pronounced differently than their English counterparts. The consonants, the pachim system and the morphing consonants at the end of the word that slide into the next word make Korean harder to pronounce than any major European language. The vocabulary is very difficult for an English speaker who does not have knowledge of either Japanese or Chinese. Japanese or Chinese will help you a lot with Korean.

Korean is agglutinative and has a subject-topic discourse structure, and the logic of these systems is difficult for English speakers to understand.

Meanwhile, Korean has an honorific system that is even wackier than that of Japanese. However, the younger generation is not using the honorifics so much, and a foreigner isn’t expected to know the honorific system anyway. Speakers of Korean can learn Japanese fairly easily.

Korean is rated by language professors as being one of the hardest languages to learn.

Korean is rated 5, hardest of all.

Japonic

Japanese also uses a symbolic alphabet, but the symbols themselves are sometimes undecipherable, in that even Japanese speakers will sometimes encounter written Japanese and will say that they don’t know how to pronounce it. I don’t mean that they mispronounce it; that would make sense. I mean they don’t have the slightest clue how to say the word! This problem is essentially nonexistent in a language like English.

There are over 2,000 frequently used characters in three different symbolic alphabets that are frequently mixed together in confusing ways. Due to the large number of frequently used symbols, it’s said that even Japanese adults learn a new symbol a day well into adulthood.

The Japanese writing system is probably crazier than the Chinese writing system. Japanese borrowed Chinese characters. But then they gave each character several pronunciations, and in some cases as many as 24. Next they made two syllabaries using another set of characters, then over the next millennium came up with all sorts of contradictory and often senseless rules about when to use the syllabaries and when to use the character set. Later on they added a Romanization to make things even worse.

Chinese uses 5-6,000 characters regularly, while Japanese only uses around 2,000. But in Chinese, each character has only one or maybe two pronunciations. In Japanese, there are complicated rules about when and how to combine the hiragana with the characters. These rules are so hard that many native speakers still have problems with them. There are also personal and place names (proper nouns) which are given completely arbitrary pronunciations often totally at odds with the usual pronunciation of the character.

Speaking Japanese is not as difficult as everyone says, and many say it’s fairly easy. However, there is a problem similar to English in that one word can be pronounced in multiple ways, like read and read in English.

There is also a class of Japanese called “honorifics” that is quite hard to master. These typically affect verbs. Honorifics vary depending on who you are and who you are talking to. In addition, gender comes into play. One wild thing about Japanese is counting forms. You actually use different numeral sets depending on what it is you are counting! There are dozens of different ways of counting things.

Japanese grammar is often said to be simple, but that does not appear to be the case on closer examination. Particles are especially vexing. Verbs engage in all sorts of wild behavior, and adverbs often act like verbs. Meanwhile, honorifics change the behavior of all words. There are particles like ha and ga that have many different meanings. One problem is that all noun modifiers, even phrases, must precede the nouns they are modifying.

It’s often said that Japanese has no case, but this is not true. Actually, there are seven cases in Japanese. The aforementioned ga is a clitic meaning nominative, made is terminative case, -no is genitive and -o is accusative.

In this sentence:

The plane that was supposed to arrive at midnight, but which had been delayed by bad weather, finally arrived at 1 AM.

Everything between the plane and finally arrived (the relative clause) must precede the noun plane:

Was supposed to arrive at midnight, but had been delayed by bad weather, the plane finally arrived at 1 AM.

Speaking Japanese is one thing, but reading and writing it is a whole new ballgame. It’s perfectly possible to know the meaning of every kanji and the meaning of every word in a sentence, but you still can’t figure out the meaning of the sentence because you can’t figure out how the sentence is stuck together in such a way as to create meaning.

However, Japanese grammar has the advantage of being quite regular. For instance, there are only four frequently used irregular verbs.

Like Chinese, the nouns are not marked for number or gender. However, while Chinese is forgiving of errors, if you mess up one vowel in a Japanese sentence, you may end up with incomprehension.

The real problem is that the Japanese you learn in class is one thing, and the Japanese of the street is another. One problem is that in street Japanese, the subject is typically not stated in a sentence. Instead it is inferred through such things as honorific terms or the choice of words you used in the sentence. Probably no one goes crazier on negatives than the Japanese. Particularly in academic writing, triple and quadruple negatives are common, and can be quite confusing.

Yet there are problems with the agglutinative nature of Japanese. It’s a completely different syntactic structure than English. Often if you translate a sentence from Japanese to English it will just look like a meaningless jumble of words. Although many Japanese learners feel it’s fairly easy to learn, surveys of language professors continue to rate Japanese as one of the hardest languages to learn. However, it’s generally agreed that Japanese is easier to learn than Korean. Japanese speakers are able to learn Korean pretty easily.

Japanese is rated 5, hardest of all.

Turkic

Turkish is often considered to be hard to learn, and it’s rated one of the hardest in surveys of language teachers. However, it’s probably easier than its reputation makes it out to be. It is agglutinative, so you can have one long word where in English you might have a sentence of shorter words. One such word is Çekoslovakyalılaştıramadıklarımızdanmışsınız?, meaning, Were you one of those people whom we could not make into a Czechoslovakian? Many words have more than one meaning.

There is no verb to be, which is hard for many foreigners. Instead, the concept is wrapped onto the subject of the sentence as a -dim or -im suffix. Turkish is an imagery-heavy language, and if you try to translate straight from a dictionary, it often won’t make sense. However, the suffixation in Turkish, along with the vowel harmony, are both very precise, and there are few if any exceptions.

Turkish is a language of precision in other ways. For instance, there are eight different forms of subjunctive mood that describe various degrees of uncertainty that one has about what one is talking about. This relates to the evidentiality discussed under Tuyuca above. On Turkish news, verbs are generally marked with -miş, which means that the announcer believes it to be true though he has not seen it firsthand.

The Roman alphabet and almost mathematically precise grammar really help out. A suggestion that Turkish may be easier to learn than many think is the research that shows that Turkish children attain basic grammatical mastery of Turkish at age 2-3, as compared to 4-5 for German and 12 for Arabic. The research was conducted in Germany in 2005.

In addition, Turkish has a phonetic orthography.

However, Turkish is hard for an English speaker to learn for a variety of reasons. It is agglutinative like Japanese, and all agglutinative languages are difficult for English speakers to learn. As in Japanese, you start your Turkish sentence the way you would end your English sentence. As in the Japanese example above, the subordinate clause must precede the subject, whereas in English, the subordinate clause must follow the subject. In the example below, the phrase that he will be on time is a subordinate clause.

In English, we say, “I hope that he will be on time.”

In Turkish, the sentence would read, “That he will be on time I hope.”

Turkish is rated 3, or average to moderately difficult.

Finno-Ugric

Finnic

Finnish is very hard to learn, and even long-time learners often still have problems with it. You have to know exactly which grammatical forms to use where in a sentence. In addition, Finnish has 15 cases in the singular and 16 in the plural. This is hard to learn for speakers coming from a language with little or no case.

For instance,
talo - the house
talon - house’s
taloa - some of the house
taloksi - into/as the house
talossa - in the house
talosta - from inside the house
taloon - into the house
talolla - on/at the house
talolta - from beside the house
talolle - to the house
taloista - from the houses
taloissa - in the houses.

It gets much worse than that. This web page shows that the noun kauppa (shop) can have 2,253 forms.

A simple adjective + noun type of noun phrase of two words can be declined in up to 100 different ways.

Adjectives and nouns belong to 20 different classes. The rules governing their case declension depend on what class the substantive is in.

As with Hungarian, words can be very long. For instance, lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas which means a non-commissioned officer cadet learning to be an assistant mechanic for airplane jet engines.

Finnish, oddly enough, always puts the stress on the first syllable. Finnish vowels will be hard to pronounce for most foreigners.

However, Finnish has the advantage of being pronounced precisely as it is written. This is also part of the problem though, because if you don’t say it just right, the meaning changes. So, as with Polish, when you mangle the language, you will only achieve incomprehension, whereas with, say, English, if a foreigner mangles the language, you can often winnow some sense out of it.

However, despite the fact that written Finnish can be easily pronounced, when learning Finnish, as in Korean, it is as if you must learn two different languages – the written language and the spoken language. A better way to put it is that there is “one language for writing and another for speaking.” You use different forms whether conversing or putting something on paper.

Nevertheless, some pronunciation is difficult, especially the contrast between short and long vowels and consonants. Check out these minimal pairs:

sydämellä vs. sydämmellä and jollekin vs. jollekkin

One easy aspect of Finnish is the way you can build many forms from a base root. From kirj-, you can build
kirja - book
kirje - letter
kirjoittaa - to write
and kirjailija - writer.

Finnish verbs are very regular. The irregular verbs can almost be counted on one hand – juosta, käydä, olla, nähdä, tehdä, and a few others. In fact, Finnish in general is very regular.

As in many Asian languages, there are no masculine or feminine pronouns. One redeeming feature of Finnish is a complete lack of consonant clusters.

Finnish is rated 5, hardest of all.

Estonian has difficulties similar to Finnish, since they are closely related. Estonian has 14 cases, including strange cases such as the abessive, adessive, elative and inessive. It also has three different degrees of length, which is strange among the world’s languages. There are short, long and extra-long vowels and consonants.

lina – linen – short n
linna – the town’s – long n, written as nn
`linna – into the town – extra-long n, not written out!

There are differences in the pronunciation of the three forms above, but in rapid speech, they are hard to hear, though native speakers can make them out. Difficulties are further compounded in that extra-long sonorants (m, n, ng, l, and r) and vowels are not written out. All in all, phonemic length can be a problem in Estonian, and foreigners never seem to get it completely down.

Estonian is rated 5, hardest of all.

Ugric

It’s widely agreed that Hungarian is one of the hardest languages on Earth to learn. Even language professors agree. For one thing, there are many different forms for a single word via word modification. This enables the speaker to make his intended meaning very precise.

Hungarian is said to have an incredible 35 different cases, but the actual number is probably just 18. Verbs change depending on whether the object is definite or indefinite. There are five different types of verb conjugations. Nearly everything in Hungarian is inflected, similar to Lithuanian or Czech.

The case distinctions alone can create many different words out of one base form. For the word house, we end up with 31 different words using case forms.

házba – into the house
házban – in the house
házból – from [within] the house
házra – onto the house
házon – on the house
házról – off [from] the house
házhoz – to the house
házig – until/up to the house
háznál – at the house
háztól – [away] from the house
házzá – translative case, where the house is the end product of a transformation, such as They turned the cave into a house.
házként – as the house, which could be used if you acted in your capacity as a house, or disguised yourself as one. He dressed up as a house for Halloween.
házért – for the house, specifically things done on its behalf, or done to get the house. They spent a lot of time fixing things up (for the house).
házul – essive-modal case. Something like “house-ly” or “in the way/manner of a house.” The tent served as a house (in a house-ly fashion).

And we do have some basic cases:
ház – nominative. The house is down the street.
házat – accusative. The ball hit the house.
háznak – dative. The man gave the house to Mary.
házzal – similar to instrumental, but more similar to English with. Refers to both instruments and companions.

The possessive takes 12 different forms, depending on the person of the possessor and the number of the thing possessed.
házam – my house
házaim – my houses
házad – your house
házaid – your houses
háza – his/her/its house
házai – his/her/its houses
házunk – our house
házaink – our houses
házatok – your (plural) house
házaitok – your (plural) houses
házuk – their house
házaik – their houses
egyház (literally one-house) means church, as in the Catholic Church.

There are also very long words such as megszentségteleníthetetlenségeskedéseitekért. Being an agglutinative language, that word is made up of many small parts of words, or morphemes. That word means something like for your (you all possessive) repeated pretensions at being impossible to desecrate.

The preposition is stuck onto the word in this language, and this will seem strange to speakers of languages with free prepositions.

Hungarian is full of synonyms, similar to English.

For instance, there are 78 different words that mean to move: halad, jár, megy, dülöngél, lépdel, botorkál, kódorog, sétál, andalog, rohan, csörtet, üget, lohol, fut, átvág, vágtat, tipeg, libeg, biceg, poroszkál, vágtázik, somfordál, bóklászik, szedi a lábát, kitér, elszökken, betér, botladozik, őgyeleg, slattyog, bandukol, lófrál, szalad, vánszorog, kószál, kullog, baktat, koslat, kaptat, császkál, totyog, suhan, robog, rohan, kocog, cselleng, csatangol, beslisszol, elinal, elillan, bitangol, lopakodik, sompolyog, lapul, elkotródik, settenkedik, sündörög, eltérül, elódalog, kóborol, lézeng, ődöng, csavarog, lődörög, elvándorol, tekereg, kóvályog, ténfereg, özönlik, tódul, vonul, hömpölyög, ömlik, surran, oson, lépeget, mozog and mozgolódik.

Only about five of those terms are archaic and seldom used; the rest are in current use.

In addition, while most languages have names for countries that are pretty easy to figure out, in Hungarian even the names of countries are hard because they have been changed so much. Italy becomes Olaszország, Germany becomes Németország, etc.

As in Russian and Serbo-Croatian, word order is relatively free in Hungarian. Further, there are quite a few dialects in Hungarian. Native speakers can pretty much understand them, but foreigners often have a lot of problems. Accent is very difficult in Hungarian due to the bewildering number of rules to determine accent. In addition, there are exceptions to all of these rules. Nevertheless, Hungarian is probably more regular than Polish. Hungarian spelling is also very strange for non-Hungarians, but at least the orthography is phonetic.

There are many irregularities in inflections, and even Hungarians have to learn how to spell these in school and have a hard time doing so. Hungarian phonetics is also strange, and to make matters worse, there is a ton of slang.

One of the problems with Hungarian phonetics is vowel harmony. Since you stick morphemes together to make a word, the vowels that you have used in the first part of the word will influence the vowels that you will use to make up the morphemes that occur later in the word. The vowel harmony gives Hungarian the “singing effect” when it is spoken. The gy sound is hard for many foreigners to make.
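The way earlier vowels steer later ones can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a full description of Hungarian: it assumes that a stem containing any back vowel takes the back variant of a suffix and every other stem takes the front variant, and it ignores rounding harmony and the exceptions involving neutral vowels. The function name `harmonize` and the suffix tables are made up for this example.

```python
# Simplified sketch of Hungarian vowel harmony (assumption: any back vowel
# in the stem selects the back variant of the suffix; exceptions ignored).
BACK_VOWELS = set("aáoóuú")

# Hypothetical suffix tables: (back variant, front variant)
INESSIVE = ("ban", "ben")   # "in"
ILLATIVE = ("ba", "be")     # "into"


def harmonize(stem, back_form, front_form):
    """Attach the back or front variant of a suffix to the stem."""
    has_back_vowel = any(ch in BACK_VOWELS for ch in stem.lower())
    return stem + (back_form if has_back_vowel else front_form)


for stem in ("ház", "kert"):  # house, garden
    print(harmonize(stem, *INESSIVE), harmonize(stem, *ILLATIVE))
```

With these two stems the sketch produces házban and házba for the back-vowel stem, and kertben and kertbe for the front-vowel stem.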

It’s hard to say, but Hungarian is probably harder to learn than even the hardest Slavic languages like Czech, Serbo-Croatian and Polish.

Hungarian is rated 5, hardest of all.

Sino-Tibetan

Sinitic

It’s fairly easy to learn to speak Mandarin at a basic level, though the tones can be tough. This is because the grammar is very simple. Short words, no case, gender, verb inflections or tense. But with Japanese, you can keep learning, and with Chinese, you sort of hit a wall, often because the syntactic structure is so strangely different from English (isolating).

Actually, the grammar is harder than it seems. At first it seems simple, like a simplified English with no tense or articles. But the simplicity makes it difficult. No tense means there is no easy way to mark time in a sentence. Furthermore, tense is not as easy as it seems. Sure, there are no verb conjugations, but instead you must learn some particles and special word order that are used to mark tense.

Once you start digging into Chinese, there is a complex layer under all the surface simplicity. There are aspect, serial verbs, a complex classifier system, syntax marked by something called topic-prominence, a strange form called the detrimental passive, preposed relative clauses, use of verbs rather than adverbs to mark direction, and all sorts of strange stuff.

The alphabet uses symbols, so it’s not even a real alphabet. There are at least 85,000 symbols and actually many more, but you only need to know about 3-5,000 of them, and many Chinese don’t even know 1,000. To be highly proficient in Chinese, you need 10,000 characters, and probably less than 5% of Chinese know that many.

Even leaving the characters aside, the stylistic and literary constraints required to write Chinese in an eloquent or formal (literary) manner would make your head swim. And just because you can read Chinese does not mean that you can read Classical Chinese prose. It’s as if it’s written in a different language.

It’s a real problem when you encounter a symbol you don’t know because there is no way to sound out the word. You are really and truly lost and screwed. You need to learn quite a bit of vocabulary just to speak simple sentences.

The tones are often quite difficult for a Westerner to pick up. If you mess up the tones, you have said a completely different word. Often foreigners who know their tones well nevertheless do not say them correctly, and hence, they say one word when they mean another.

A major problem with Chinese is homonyms. To some extent, this is true in many tonal languages. Since Chinese uses short words and is either monosyllabic or disyllabic, there is a limited repertoire of sounds that can be used. At a certain point, all of the sounds are used up, and you are into the realm of homophones.

Tonal distinctions are one way that monosyllabic and disyllabic languages attempt to deal with the homophone problem, but it’s not good enough, since Chinese still has many homophones, and meaning is often discerned by context. Chinese, like French and English, is heavily idiomatic.

It’s little known, but Chinese also uses different forms to count different things, like Japanese. Many agree that Chinese is the hardest to learn of all of the major languages. Language professors have rated Chinese as the hardest language on Earth to learn.

Mandarin gets a 5 rating, hardest of all.

However, Cantonese and Min Nan (Taiwanese) are even harder to learn than Mandarin. Cantonese has nine tones to Mandarin’s four, and in addition, its speakers continue to use a lot of the older traditional Chinese characters that were superseded when China moved to a simplified script in the 1950s. In addition, Cantonese has verbal aspect, possibly up to 20 different varieties. Furthermore, since non-Mandarin characters are not standardized, Cantonese cannot be written down as it is spoken.

Min Nan also has a more complex tone system than Mandarin, with eight tones. Even many Taiwanese natives don’t seem to get it right these days, as it is falling out of favor and many fewer children are being raised speaking it than before.

Cantonese and Min Nan get 5 ratings, hardest of all.

Austroasiatic

Mon-Khmer

Vietnamese is also hard to learn because to an outsider, the tones seem hard to tell apart. Therefore, foreigners often make themselves difficult to understand by not getting the tone precisely correct. It also has “creaky-voiced” tones, which are very hard for foreigners to get a grasp on. Vietnamese grammar is fairly simple, and reading Vietnamese is pretty easy once you figure out the tone marks. Words are short as in Chinese. However, the simple grammar is relative, as you can have 25 or more forms just for I, the 1st person singular pronoun.

Vietnamese gets 4, extremely difficult.

Khmer has a reputation for being hard to learn. I understand that it has one of the most complex honorifics systems of any language on Earth. Over a dozen different words mean to carry depending on what one is carrying. There are several different words for slave depending on who owned the slave and what the slave did. There are 28-30 different vowels, including sets of long and short vowels and long and short diphthongs. The vowel system is so complicated that there isn’t even agreement on exactly what it looks like.

Speaking it is not so bad, but reading and writing it is pretty difficult. For instance, you can put up to five different symbols together in one complex symbol.

Khmer gets a 5 rating, hardest of all.

Sedang, a language of Vietnam, has the highest number of vowel sounds of any language on Earth, at 55 distinct vowel sounds.

Sedang gets a 5 rating, hardest of all.

Hmong-Mien

Hmong is widely spoken in this part of California, but it’s not easy to learn. There are eight tones, and they are not easy to figure out. It’s not obviously related to any other major language but the obscure Mien.

It has some very strange consonants called voiceless nasals. We have them in English as allophones – the m in small is voiceless, but in Hmong, they put them at the front of words – the m in the word Hmong is voiceless. These can be very hard to pronounce.

Hmong gets a 5 rating, hardest of all.

Austro-Tai

Austronesian

Malayo-Polynesian

Bahasa Indonesia and the related Malaysian are fairly easy languages to learn. For one thing, the grammar is dead simple. Verbs are not marked for tense at all. And the sound system of these languages, in common with Austronesian in general, is one of the simplest on Earth. Bahasa Indonesia has few homonyms, homophones, homographs, heteronyms, etc. Words in general have only one meaning. Though the orthography is not completely phonetic, it only has a small number of exceptions. The system for converting words into nouns or verbs is regular.

Bahasa Indonesia and Malaysian get a 1 rating for very easy.

However, Tagalog is considerably harder. Tagalog is an ergative-absolutive language, not a nominative-accusative language. In the former, phrases are marked not according to subject or object as in the latter, but according to whether the verb is transitive or intransitive. The subject of a transitive verb is marked one way, and the subject of an intransitive verb and object of a transitive verb are marked a second way.

Compared to many European languages, Tagalog syntax, morphology and semantics are often quite different. Unlike Malay, verbs conjugate quite a bit in Tagalog. However, articles and the creation of adjectives from nouns are very easy. Compare ganda = beauty (noun) and maganda = beautiful (adjective).

Tagalog gets a 3 rating, average to moderately difficult.

Maori and other Polynesian languages have a reputation for being quite hard to learn, but others say they are not that hard at all, so the situation is confused. The pronunciation is simple, and there is no gender. The main problem for English speakers is that the sentence structure is backwards compared to English. In addition, macrons can cause problems.

Maori gets a 3 rating, average to moderately difficult.

Kwaio is an Austronesian language spoken in the Solomon Islands. It has four different forms of number to mark pronouns – not only the usual singular and plural, but also the rarer dual and the very rare paucal.

For instance:

1 dual inclusive (you and I)
1 dual exclusive (I and someone else, not you)

1 paucal inclusive (you, I and a few others)
1 paucal exclusive (I and a few others)

1 plural inclusive (I, you and many others)
1 plural exclusive (I and many others)

Pretty wild!
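The grid above is just every non-singular number crossed with inclusive and exclusive. A minimal Python sketch, using only the category labels from the list above (the variable and function names are made up for illustration, and no actual Kwaio pronoun forms are given here), enumerates the combinations:

```python
from itertools import product

# Category labels taken from the list above; names are illustrative only.
NUMBERS = ("dual", "paucal", "plural")
CLUSIVITY = ("inclusive", "exclusive")


def first_person_nonsingular():
    """Cross every non-singular number with inclusive/exclusive."""
    return [f"1 {number} {clusivity}"
            for number, clusivity in product(NUMBERS, CLUSIVITY)]


# Adding the plain singular gives the full first-person set.
forms = ["1 singular"] + first_person_nonsingular()
for form in forms:
    print(form)
```

That comes to seven first-person categories, where English gets by with two (I and we).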

Kwaio gets a 5, hardest of all.

Tai-Kadai

Thai is a pretty hard language to learn. There are 75 symbols in the strange script, there are no spaces between words in the script, and vowels can come before, after, above or below consonants in any given syllable. There are five tones, including a neutral tone. Tones are determined by a variety of complex things, including a combination of tone marks, the class of consonants, if the syllable ends in a sonorant or a stop, and what the tone of the preceding syllable was.

There is a system of noun classifiers for counting various things, similar to Japanese. In addition, common to many Asian languages, there is a complicated honorifics system. The vowels are different than in many languages, and there are some unusual diphthongs: eua, euai, aui and uu. There is a contrast between aspirated and unaspirated consonants.

Consonant pronunciations vary depending on the location of the syllable in the word – for instance, s can change to t. There are many vowels which are spoken but not written. There are many consonants that are pronounced the same – for instance, there are six different t‘s, not counting the s‘s that turn into t‘s. The Thai script is definitely one of the most difficult phonetic scripts. Nevertheless, the Thai script is easier to learn than the Japanese or Chinese character sets. In spite of all of that, the syntax is simple, like Chinese.

Thai gets a 4 rating, extremely hard to learn.

Niger-Kordofanian

Niger-Congo

Bantu

Bakjalukasha, a Bantu language spoken in Ivory Coast, is hard to learn. Many of these African languages are tonal and can be quite complex. They also divide nouns into different categories (noun classes) like Caucasian languages do. Further, they are often seriously inflected.

Bakjalukasha gets a 5 rating, hardest of all.

Nguni and Xhosa, two languages of South Africa, are quite difficult, with up to nine click sounds in both. Clicks only exist in one language outside of Africa, an Australian language, and are extremely difficult to learn. Even native speakers mess up the clicks sometimes. Nelson Mandela said he had problems making some of the click sounds in Xhosa.

Nguni and Xhosa get 5 ratings, hardest of all.

Zulu and Ndebele also have these impossible click sounds. These languages also make plurals by changing the prefix of the noun, and the manner varies according to the noun class. If you want to look up a word in the dictionary, first of all you need to discard the prefix. For instance, in Ndebele,

river = umfula
rivers = imifula

but stone = ilitshe
stones = amatshe

yet tree = isihlahla
trees = izihlahla.
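The prefix swapping and the dictionary problem can be sketched in Python. This is only an illustration built from the three singular/plural prefix pairs shown in the examples above; the real noun class system has many more classes and sound changes, and the function names here are hypothetical.

```python
# Singular/plural prefix pairs taken from the three examples above.
# Assumption: this tiny table stands in for the full noun class system.
PREFIX_PAIRS = [("um", "imi"), ("ili", "ama"), ("isi", "izi")]


def split_noun(word):
    """Return (prefix, stem), trying the longest known prefixes first."""
    prefixes = sorted({p for pair in PREFIX_PAIRS for p in pair},
                      key=len, reverse=True)
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, word[len(prefix):]
    return "", word  # no known prefix found


def pluralize(singular):
    """Swap a known singular prefix for its plural partner."""
    prefix, stem = split_noun(singular)
    for sg, pl in PREFIX_PAIRS:
        if prefix == sg:
            return pl + stem
    return singular  # unknown class: leave the word unchanged


for noun in ("umfula", "ilitshe", "isihlahla"):
    print(noun, "->", pluralize(noun), "| dictionary stem:", split_noun(noun)[1])
```

The stem returned by `split_noun` is what you would look up, which is why a learner has to recognize the prefix before the dictionary is of any use.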

Zulu has pitch accent, tones and clicks. There are nine different pitch accents, four tones and three clicks, but each click can be pronounced in five different ways. However, tones are not marked in writing, so it’s hard to figure out when to use them. Zulu also has depressor consonants, which lower the tone in the vowel in the following syllable. In addition, Zulu has multiple gender – 15 different genders. And some nouns behave like verbs.

Zulu and Ndebele both get 5 ratings, hardest of all.

The African Bantu language Ga has a bad reputation for being a tough nut to crack. It is spoken in Ghana by about 600,000 people. It has two tones and engages in a strange behavior called tone terracing that is common to many West African languages. It also has many sounds that are not in any Western languages.

Ga gets a 5 rating, hardest of all.

Ndali is a Bantu language with 150,000 speakers spoken in Malawi and Tanzania. It has many strange tense forms. For instance, in the past tense:

Past tense A: He went just now.
Past tense B: He went sometime earlier today.
Past tense C: He went yesterday.
Past tense D: He went sometime before yesterday.

Future tense is marked similarly:

Future tense A: He’s going to go right away.
Future tense B: He’s going to go sometime later today.
Future tense C: He’s going to go tomorrow.
Future tense D: He’s going to go sometime after tomorrow.
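The two lists above amount to a lookup on how far from today the event is. Here is a rough Python sketch of that choice. The letters A to D are just the labels used in the lists above, not Ndali grammatical terms, and the function name and parameters are made up for this example.

```python
def ndali_tense(direction, days, immediate=False):
    """Pick the tense label (A-D) from the lists above.

    direction: "past" or "future"
    days:      whole days away from today (0 = today, 1 = yesterday/tomorrow)
    immediate: True for "just now" / "right away" (only matters when days == 0)
    """
    if direction not in ("past", "future"):
        raise ValueError("direction must be 'past' or 'future'")
    if days < 0:
        raise ValueError("days must be zero or positive")

    if days == 0:
        letter = "A" if immediate else "B"
    elif days == 1:
        letter = "C"
    else:
        letter = "D"
    return f"{direction.capitalize()} tense {letter}"


print(ndali_tense("past", 0, immediate=True))  # He went just now.
print(ndali_tense("future", 3))                # He's going to go sometime after tomorrow.
```

An English speaker makes none of these choices when saying he went or he will go, which is the point of the comparison.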

Ndali gets a 5, hardest of all.

For unknown reasons, Swahili is generally considered to be an easy language to learn. The US military ranks it 1, among the easiest of all languages to learn. This seems to be the typical perception. Why Swahili is so easy to learn, I am not sure. It’s a trade language, and trade languages are often fairly easy to learn. There’s also a lot of controversy about whether or not Swahili can be considered a creole, but that has not been proven. For the moment, the reasons why Swahili is so easy to learn will have to remain mysterious.

Swahili gets a 1 rating, easiest of all.

Khoisan

!Xóõ (Taa), spoken by only 4,200 Bushmen in Botswana and Namibia, is a notoriously difficult Khoisan language replete with click sounds that outsiders find nearly impossible to comprehend. Taa has anywhere from 130 to 164 consonants, possibly the largest phonemic inventory of any language. Of this vast wealth of sounds, there are anywhere from 30-64 different click sounds.

In addition, there are four types of vowels: plain, pharyngealized, breathy-voiced and strident. On top of that, there are four tones. Speakers develop a lump on their larynx from making the click sounds.

Taa gets a 5 rating, hardest of all.

Eskimo-Aleut

Inuktitut is extremely hard to learn. Inuktitut is polysynthetic-agglutinative, and roots can take many suffixes, in some cases up to 700. Verbs have 63 present indicative forms, and conjugation involves 252 different inflections. However, suffixation is extremely regular. In a typical long Inuktitut text, 92% of words will occur only once. This is quite different from English and many other languages where certain words occur very frequently or at least frequently. Certain fully inflected verbs can be analyzed both as verbs and as nouns. Words can be very long.

Inuktituusuungutsialaarungnanngittuaraaluuvunga – I truly don’t know how to speak Inuktitut very well.
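The 92% figure above is the kind of number you can check for any text. The Python sketch below shows one way to do it. It assumes the figure means the share of distinct word forms that appear exactly once, which the source of the statistic does not spell out, and it splits words on whitespace only. The function name `once_only_share` is made up for this example.

```python
from collections import Counter


def once_only_share(text):
    """Share of distinct word forms that occur exactly once in the text.

    Assumption: words are whitespace-separated and compared case-insensitively.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return 0.0
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)


sample = "the cat saw the dog"
print(once_only_share(sample))
```

In the English sample, the repeats quickly and drags the share down. In a language where nearly every word is a freshly assembled string of suffixes, very few word forms ever repeat.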

Inuktitut is also rated by linguists as one of the hardest languages on Earth to pronounce. Inuktitut may be as hard to learn as Navajo.

Inuktitut is rated 5, hardest of all.

Paleosiberian

Chukchi is a polysynthetic language, so clearly it must be hard to learn. In polysynthetic languages, very long words can denote an entire sentence, and it’s quite hard to take the word apart into its parts and figure out exactly what they mean and how they go together.

Chukchi gets a 5 rating, hardest of all.

Basque

Basque, of course, is just a wild language altogether. There is an old saying that the Devil tried to learn Basque, but after seven years, he only learned how to say Hello and Goodbye. There are 24 cases, and the verbs are quite complex. This is because it is an ergative language, so verbs vary according to the number of subjects and the number of objects and if any third person is involved.

If you don’t grow up speaking Basque, it’s hard to attain native speaker competence. It’s quite a bit easier to write in Basque than to speak it. Nevertheless, Basque verbs are quite regular. In fact, the entire language is quite regular. In addition, most words above the intermediate level are borrowings from large languages, so once you reach intermediate Basque, the rest is not that hard. In addition, on the plus side, pronunciation is straightforward.

Basque is rated 5, hardest of all.

Revisions to Races of Man Classification

Repost from the old site.

This is the chart from the paper, The Origin of Minnan & Hakka, the So-called “Taiwanese”, Inferred by HLA Study, utilized in this post.

I usually try to be very conservative about adding new races to my races of man post, but sometimes I just feel like I’m forced to. This article, and in particular the figure above, forced me to make some new splits.

The question was what to do about the Taiwanese people. Not the Taiwan aborigines – but the Hakka and Min Nan people of SE China who settled in Taiwan in the past 400 years. It turns out that they appear to be a discrete race, and that they are linked to Singapore Chinese and the Thai Chinese. In Singapore and Thailand, the Chinese form a market-dominant minority.

They are a minority of the population, but they tend to run businesses and be very wealthy. Similar cases are seen in Indonesia and the Philippines, where tiny Chinese minorities of 2-3% control up to 70% of the wealth in the nation.

So the interesting question arises – who exactly are the Chinese minorities of Thailand and Singapore? By genetic studies, we can now see that they are SE Chinese people related to the Min Nan and the Hakka.

The Min Nan and Hakka both speak languages that are called Chinese dialects, but in reality, they are completely separate languages. Both languages are doing fine – Min Nan (Southern Min) with 49 million speakers and Hakka with 34 million speakers.

Min Nan and Hakka both strangely lack official status anywhere, although Southern Min is widely spoken in Taiwan. It’s odd that some of the world’s most widely spoken languages lack official status – Min Nan is the 24th largest language, and Hakka is the 35th largest language, in terms of numbers of speakers.

Both languages are vigorous and are in good shape. Southern Min has a roman script that is fairly widely used. Hakka also has a roman script, but I am not sure how widely it is used.

Southern Min is actually a number of separate languages: Min Nan proper, Amoy, Teochew and Hainanese, at the very least.

Here is a map of the various Chinese languages. These are not Chinese dialects, but actual separate languages. Some may be dialects of other Chinese languages though. The main languages are Mandarin, Wu, Cantonese, Min, Xiang, Hakka and Gan. Ping, Hui and Jin are classed above as dialects of those larger languages. Jin is classed as a dialect of Mandarin, but it is actually a separate language with 45 million speakers, making it around the 25th largest language in the world. Min is said to be 5 separate languages, but it is actually many separate languages. The 5 separate recognized languages are Min Nan, Min Dong, Min Zhong, Min Bei and Puxian. Min Nan itself is a number of separate languages. Huizhou, or Hui, is a separate language that is actually a set of related languages. Wu is more than one language.

Ping is traditionally considered to be part of Cantonese, but it is a separate language. Mandarin is also a set of related languages instead of one language. Cantonese is also more than one language. Hakka is also more than one language.

It is nonsense to say someone speaks “Chinese”. There is no such thing as a language called “Chinese”.

Instead, there are various languages in the Chinese language family – at least 14 separate languages, and actually many more. Mandarin is by far the largest of these languages, and most of the smaller languages are suffering under the influence of Mandarin. In addition, the Chinese government favors Mandarin and does not support the other languages much, if at all.

I also split off a group called the Li and another group called the Oroqen based on the chart above.

The Li are a transitional group between the Northern Chinese and the Southern Chinese, though they live on Hainan Island in the far south of China. They speak a Tai-Kadai language called Hlai which has 667,000 speakers. Use is vigorous; the language is doing well, but it is generally not written, although a Roman script exists. Mandarin is used for writing.

The Oroqen are nomadic people who live in far northeastern China and speak a Tungusic tongue. As you can see from the chart, they are closer to the Japanese than to the NE Chinese. There are only 1,200 speakers left out of a small 7,000 population, but there are 800 monolinguals, and use is vigorous by those who speak the language.

They live by hunting and used to practice shamanism. They still lack an official script for their language, but there are radio programs in Oroqen.

The truth is that both the Oroqen people and their language are in poor shape, and most of the blame can be placed on the Communist Chinese regime, even though the regime has also done many good things for the Oroqen. The Cultural Revolution in particular was a period of insanity, stupidity and terror.

An Oroqen Race was added to the NE Asian Major Race due to the extreme divergence of these people. I also added Inner Mongolians to the Mongolian Race inside of NE Asian.

I added the Buyei to the Tai Race within the SE Asian Major Race and created a new race called SE Chinese Race, consisting of Min Nan, Hakka, Singapore Chinese and Thai Chinese. The Buyei live in southern China and northern Vietnam and speak a Tai language that has over 2 million speakers yet has no official status. Buyei language use is vigorous, and it is in good shape.

There is a romanized script, and there are newspapers in the language, but they mostly use Mandarin for writing. The Buyei language is probably made up of a few separate languages, because some of the dialects are not mutually intelligible. The language is very close to the Zhuang language.

The SE Chinese Race really consists of the descendants of the ancient Chinese people known as the Yueh. The Yueh, or Yue, formed a state in southeastern coastal China during the Warring States Period and the Spring and Autumn Period. The state lasted from about 525 BC to 334 BC. The Chinese were already involved in metallurgy and were producing excellent swords during these periods.

The new lineup looks like this:

Northeast Asian Major Race*

Japanese-Korean Race
Southern Japanese Race (Honshu Kinki – Kyushu)
Ryukyuan Race
Ainu Race***
Gilyak Race**
Northern Chinese Race (Northern Chinese – Qiang – Manchu – Hui)
Oroqen Race
Sherpa-Yakut Race
Nepalese Race (Nepali – Newari)
Mongolian Race (Mongolian – Inner Mongolian – Buryat – Kazakh)
Northern Turkic Race (Dolgan – Altai – Shor – Tofalar – Uighur – Chelkan – Soyot – Kumandin – Teleut – Hazara)***
Central Asian Race (Kirghiz – Karakalpak – Uzbek – Turkmen)
Tuva Race
Tungus Race (Even – Evenki – Russian Saami)
Siberian Race
Beringian Race (Chukchi – Aleut – Siberian Eskimo)**
Koryak-Itelmen Race
Reindeer Chukchi Race
General Tibetan Race (Tibetan – Lisu – Nu – Karen – Tujia – Hui – Akha – Burmese – Bai – Yizu – Pnar – Mizo)
Bhutanese Race
Siberian Uralic Race (Nentsy – Samoyed – Ket – Mansi – Khanty)
Nganasan Race
Uralic Race (Komi – Mari)
North American Eskimo Race

Southeast Asian Major Race*

Southern Chinese Race (Hmong – Mien – Dong – Henan Han – Yi – Naxi)
Li Race
Southeast China Race (Hakka – Min Nan – Singapore Chinese – Thai Chinese)
South China Sea Race (Filipino – Ami Taiwanese Aborigine – Guangdong Han)
Tai Race (Thai – Lao – Lahu – Aini – Deang – Blang – Shan – Dai – Vietnamese – Muong – Buyei)
Kachin Race (Kachin – Va – Nung – Lu)
General Taiwanese Aborigine Race (Atayal – Bunun – Yami)
Island SE Asian Race (Paiwan Taiwanese Aborigine – Sea Dayak – Sumatran – Balinese)
Indonesian Race (Sulawesi – Borneo – Lesser Sunda)
Malay Race (Javanese – Sarawak – Malaysia)
Zhuang Race (Senoi – Zhuang – She – Santhal – Ho – Nicobarese)
Austroasiatic Race (Mon – Khmer – Khasi – Nongtrai – Bhoi – Maram – Kynriam – Wajaintia)
Meghalaya NE Indian Race (Khasi – Garo – Lyngngam)
Philippines Negrito Race (Aeta – Ati – Palau Micronesian)
Mamanwa Philippines Negrito Race
Andaman Islands Negrito Race**
Semang Malay Negrito Race***

References

Lin M, Chu CC, Chang SL, Lee HL, Loo JH, Akaza T, Juji T, Ohashi J, Tokunaga K. March 2001. The Origin of Minnan & Hakka, the So-called “Taiwanese”, Inferred by HLA Study. Tissue Antigens 57(3): 192-9.

Response To Mike Campbell on Chinese Language Classification

An autodidact named Mike Campbell has issued a long critique of my Chinese language classification.

There are problems with his analysis.

First of all, Campbell says we need to defer to the Chinese on what is a dialect and what is a language. But top Sinologists in the West are saying that the Chinese are falling down on the job and not working according to the modern scientific definition of what is a language and what is a dialect.

The Chinese linguists operate, like Chinese medicine, according to a completely different format that is pretty much at odds with the one used in the West and in much of the rest of the world.

One element of this format is the fangyan. A fangyan has many meanings, but in Chinese it tends to mean “dialect,” or better yet, “topolect.” It also tends to mean the speech form of a given county. But the Chinese definition of the word “dialect” differs radically from the definition used by linguists elsewhere in the world. For one thing, questions of intelligibility with other lects are left out of the definition of fangyan.

Chinese linguists also use hua, which means something like “speech.” This tends to be more expansive than fangyan, but at the same time it can occur down to the level of dialect. Examples include Putonghua, Shanghaihua, Beijinghua, etc., but also Pinghua and Tuhua. It tends to be geographically based: the speech of a particular geographical location, though that location can be expansive or very restricted. The exception is Putonghua, which is just “common speech” and is spoken all over China.

The third category is yu. Yu is probably the category that Western linguists would most commonly associate with “language” or even “language family.” Yu only refers to separate languages within Chinese. Outside Chinese, the word wen tends to be used. Examples are Wuyu, Minyu, Huiyu, etc.

No one seems to quite know exactly what the Chinese classification is at any given time.

According to Campbell, we must not do anything until the Chinese act first, but they only make a new language maybe once every few years, and they are failing even at that.

Campbell states that Scots and Bavarian are dialects, not languages. He says that Scots is a dialect of English and Bavarian is a dialect of German. However, Ethnologue says that Scots is a separate language and so is Bavarian. The intelligibility of Bavarian and German is only 40%. I lack figures for Scots, but clearly intelligibility is lower than 90%.

Ethnologue is run by SIL. SIL has been granted the task of assigning all of the new ISO numbers. An ISO number means that a lect has been officially recognized by the world linguistic community as a separate language. So SIL are the linguistic scientists whom the world community has given the task of deciding what is a language and what is not. Campbell is saying that SIL does not know what they are talking about.

Campbell states that mutual intelligibility cannot be determined by talking to speakers and simply asking them whether or not they can understand “those people over there.”

According to Campbell, this is inaccurate. He says the only way to determine intelligibility is through scientific testing methods that measure percentage similarity in phonology, lexicon, morphology, syntax, etc. He also says that tonal differences are irrelevant for Chinese, because differences in tones do not impede communication, but I would beg to differ on that. Chinese speakers have told me that closely related lects with very different tones can be very difficult to understand, at least at first.

On Ethnologue’s Mexico page, extensive tests have been done on various lects spoken in small villages, determining intelligibility between one lect and another. Intelligibility testing is commonly done by simply sitting a speaker of Lect A down in front of a recorded corpus of Lect B and seeing how much they can understand.

Campbell says that intelligibility testing on human informants is inherently erroneous because as speakers of Close Lect A hear more and more of Close Lect B, they can understand it over a period of time (the exposure factor). This is the problem of interdialectal learning.

Interdialectal learning (the tendency of speakers of closely related lects to hear each other’s lects and quickly learn to speak them, muddying the waters of intelligibility), trumpeted by Campbell as a reason that intelligibility testing cannot be done on human informants, is regarded by SIL as different from inherent intelligibility. Inherent intelligibility is best regarded as a test of the ability to use the mother tongue.

In other words, when two lects are said to be “inherently unintelligible” this appears to be referring to “virgin” speakers who have not yet had the opportunity to learn each other’s dialects.

Similarly, members of Lect A may simply be bilingual in Lect B, which also invalidates intelligibility testing. However, measures have already been developed to determine bilingualism and the degree of it. A favorite one is SLOPE. SRT is also used in bilingualism testing. Like other intelligibility testing instruments, they have been subjected to tests for reliability and validity over the years.

Further, testing has evolved to the point where we can begin to ferret out bilingualism from inherent intelligibility. In Casad 1974 the author describes testing done on speakers of Mazatec, a Mexican Indian language.

Intelligibility testing was done to see how well they understood Huautla, a related language. Three female speakers had scores in the 50-60% range, and three males had scores in the 90-100% range. Huautla is a local market language that is learned as a second language by many non-Huautla in the surrounding area. I would gather that 55% represents true inherent intelligibility and the 95% speakers represent practiced bilinguals.

At any rate, in the survey, the figures were averaged together so that Mazatec speakers had 76% intelligibility with Huautla and Mazatec and Huautla were said to be separate languages.
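As a rough check on that averaging, here is a minimal sketch. The six individual scores are not given above, so I assume each speaker scored at the midpoint of the reported range; these numbers are hypothetical, not Casad's data.

```python
from statistics import mean

# Hypothetical per-speaker scores: midpoints of the ranges reported above,
# NOT the actual figures from Casad 1974.
female_scores = [55, 55, 55]  # three speakers in the 50-60% range
male_scores = [95, 95, 95]    # three speakers in the 90-100% range

# Pool all six speakers, as the survey did, and take the average.
pooled = mean(female_scores + male_scores)
print(pooled)
```

Under that assumption the pooled figure comes out at 75%, close to the 76% reported, which shows how a single average can hide two very different groups of speakers.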

Campbell also throws out a red herring in the notion that certain members of a group may simply refuse to hear the language of another group and insist that they do not understand it. Although existent, this problem has little relevance in intelligibility testing. SIL does testing with cross sections of communities.

Furthermore, SIL notes that intelligibility is typically distributed evenly across a community with regard to sex, class and age.

The SD’s for inherent intelligibility in a community are narrow, less than 15%, whereas the SD’s for bilingualism are much higher. This is because in the case of bilingualism, communities differ. Some feel a strong need to learn the other language, others feel no need at all. Further, members differ in their access to an opportunity to learn the other language, even though they may wish to learn it.
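The standard deviation point can be sketched in a few lines. The 15% cutoff comes from the figure above; the two score lists are invented for illustration, and using the sample standard deviation is my own assumption about how the spread would be computed.

```python
from statistics import stdev

def looks_bilingual(scores, sd_cutoff=15.0):
    """Return True when the spread of scores exceeds the cutoff.

    A wide spread suggests learned bilingualism rather than
    inherent intelligibility (heuristic only).
    """
    return stdev(scores) > sd_cutoff

# Invented community samples (percent understood per speaker).
narrow = [52, 55, 58, 54, 57, 56]  # everyone scores about the same
wide = [55, 52, 58, 95, 92, 98]    # some speakers have learned the other lect

print(looks_bilingual(narrow), looks_bilingual(wide))
```

This is only a screening heuristic. A flagged community would still need a proper bilingualism test before any conclusion is drawn.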

This should throw out the notion that females, the young or the old, the wealthy or the poor, will automatically give us false data on intelligibility.

Campbell hints that intelligibility is poorly defined. However, SIL has listed a hierarchy of intelligibility. SIL says that intelligibility below 70% is “unintelligible” and intelligibility over 90% is “adequately intelligible” (this usually conforms to our ideas of a dialect). Between 71-89% is what SIL calls “marginally intelligible.” Lately, SIL throws most lects with under 90% intelligibility into separate languages.
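The hierarchy above is simple enough to write down as a function. This is a sketch of the bands as I have described them, not an SIL tool. The treatment of scores of exactly 70% and exactly 90% is my assumption, since the bands quoted above do not say which side those values fall on.

```python
def sil_category(pct):
    """Map an intelligibility percentage to the label quoted above.

    Assumption: exactly 90 counts as adequate and exactly 70 as marginal.
    """
    if pct >= 90:
        return "adequately intelligible"
    if pct >= 70:
        return "marginally intelligible"
    return "unintelligible"

for score in (55, 76, 95):
    print(score, sil_category(score))
```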

Campbell recommends throwing out all intelligibility testing with informants as inherently inaccurate and focusing instead on measures of language similarity.

However, SIL notes that linguistic similarity is not an adequate single predictor of intelligibility. For instance, testing in the Philippines revealed pairs of lects with vocabulary similarity of 52, 66, 72 and 74% which had over 90% intelligibility (were inherently intelligible). Over 80% vocabulary similarity for lect pairs resulted in several cases of inherent intelligibility. So lexical similarity is not an adequate measure at all for measuring intelligibility.

In testing of Polynesian, Siouan and Buang, it was found that the higher the level of lexical similarity up to a certain point, the lower the intelligibility scores were. This is counterintuitive, but it shows once again that lexical similarity is a poor measure.

Morris Swadesh was the founder of lexicostatistics, the study of lexical similarity. Lexicostatistics has its uses, but determining between closely related languages and dialects is apparently not one of them.

This myth seems to be dying a hard death. Robert Longacre and Sarah Gudschinsky were involved in long debates with Swadesh about the validity of lexical similarity measures, and they seem to have been proven right. The latest findings calculate that any study that uses lexical similarity alone to determine intelligibility of lects has a 4.5-1 chance of failing to do so with any reliability.

Word lists still have their uses. Where word lists show similarities between lects below 60%, odds are that we are dealing with two separate languages, and there is no need to do any further intelligibility testing. And they have obvious uses in historical linguistics and in determining genetic relationships between languages.

Vocabulary similarity below 67%, though, typically reveals intelligibility estimates below 60%. Intelligibility below 60% is inadequate for all but the very simplest communication. Before any kind of even slightly complex or revealing messages can be conveyed, intelligibility usually needs to be over 85%. Casad found that 90% intelligibility on a narrative test was necessary before one could move to more complex kinds of communication. Here once again we get into the dialects.

Intelligibility is usually asymmetrical. In other words, Lect A can understand 80% of Lect B, but Lect B can only understand 70% of Lect A. There are arguments about the reasons for this, but one suggestion is that higher figures result from some sort of bilingual learning.

Campbell also points out that it is not uncommon that people speaking the same language cannot always understand each other. He asks how often we have heard a fellow English speaker of the same dialect say something and we did not catch what they were saying for some reason or other. The implication is that we need to throw out all testing with informants due to this.

SIL has actually examined this, and they often include a test called “home-town” in which people are presented with narratives within their own dialect and an intelligibility score is given for that. It is true that sometimes this is lower than 100%, but it is typically not much lower. Nevertheless, using the “home-town factors” of Lects A and B as controls in factor analysis helps greatly when moving on to actual intelligibility between Lect A and Lect B.

One thing to do is to throw out all sentences or questions that score less than 100% on home-town, since if the speakers can’t even understand these sentences well when their own people speak them, how can we measure how well they understand them when speakers of other lects speak them?
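The home-town filter described above can be sketched as follows. The item names and scores are made up for illustration, and the function name is hypothetical; it is not taken from any SIL software.

```python
def filtered_score(hometown, cross):
    """Average cross-lect score over items that scored 100% on home-town.

    hometown, cross: dicts mapping a test item to percent correct.
    """
    kept = [item for item, pct in hometown.items() if pct == 100]
    if not kept:
        raise ValueError("no item scored 100% on the home-town test")
    return sum(cross[item] for item in kept) / len(kept)

# Invented data: q3 is dropped because speakers missed it in their own lect.
hometown = {"q1": 100, "q2": 100, "q3": 80, "q4": 100}
cross = {"q1": 90, "q2": 70, "q3": 20, "q4": 80}

print(filtered_score(hometown, cross))
```

With q3 thrown out, the cross-lect score is averaged over the three remaining items only, so a badly worded question does not drag the figure down.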

Campbell suggests that there are no tests available to use on human informants that pass the smell test of empiricism. This is not the case.

One test, the Sentence Repetition Test (SRT), has been used for decades, subjected to many papers and studies, and criticized and modified in many ways.

In the case of SRT, testing group members individually has been shown to be superior to testing them in groups. When you do intelligibility testing in a group of, say, eight people, you can run into a strong personality or high-ranking male in that group who might say he understands much more than he really does for some reason or another, possibly to show off. The other less dominant group members then follow his lead and give falsely high readings on the intelligibility test.

Many linguists, led by SIL, have been leading the way in intelligibility testing for decades now. Some of the top figures in this subfield are the couple Joseph and Barbara Grimes of SIL. Joseph Grimes is a retired linguistics professor from Cornell.

In addition, a number of computer programs have been created that help the researcher to test intelligibility.

Another charge, that intelligibility testing lacks adequate controls, has been shown to be false. Bias in both experimenter and subject has been shown to be a problem, as is the case in most or all science, and measures have been undertaken to deal with it.

The notion that this subfield of Linguistics, intelligibility testing, is unscientific should be laid to rest.

Ethnologue seems to place tremendous importance on mutual intelligibility, however defined. Mutually unintelligible lects are assumed to be separate languages by Ethnologue. Their criterion for splitting dialects off into separate languages seems to be 90%. Below 90%, separate languages. Above 90%, dialects of a single language.

In conclusion, Mr. Campbell’s principal contentions in his critique are all incorrect.

First, he suggests that the very concept of mutual intelligibility between lects is impossible to define or prove. SIL has shown that the concept can be defined and tested by reliable instruments.

Second, he says that the use of human informants in mutual intelligibility testing is so prone to error that it cannot guarantee satisfactory results. This is not the case. SIL has shown, through decades of testing, that mutual intelligibility is best measured, or possibly can only be reliably measured, through intelligibility tests with human informants.

Third, he throws up a number of red herrings that supposedly prove the inherent unreliability of human informants in intelligibility testing. All of these are shown to be the very red herrings that I claim they are. It is true that unrecognized bilingualism is a problem, but it can often be ferreted out.

Fourth, he says that the only way to reliably test for intelligibility is to compare lects via tones, phonology, morphology, syntax and lexicon. This is an extremely complicated process utilizing math and computer programs and can only be undertaken by practiced linguists. In truth, such elaborate testing, while interesting, is entirely unnecessary.

Fifth, he suggests that any Western reformulations of Chinese language classification need to first defer to the Chinese. The problem here is that the Chinese have completely fallen down on the job. We cannot defer to the Chinese without upsetting our entire system of language classification. The Chinese are entitled to their system, but it is at odds with that used by the rest of the world.

References

Casad, Eugene H. 1974. Dialect Intelligibility Testing. Summer Institute of Linguistics Publications in Linguistics and Related Fields, 38. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.

Casad, Eugene H. 1992. “State of the Art: Dialect Survey Fifteen Years Later.” In Eugene H. Casad (ed.), Windows on Bilingualism, 147-58. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.

Grimes, Barbara F. 1992. “Notes on Oral Proficiency Testing (SLOPE).” In Eugene H. Casad (ed.), Windows on Bilingualism, 53-60. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.

Grimes, Joseph E. 1992. “Calibrating Sentence Repetition Tests.” In Eugene H. Casad (ed.), Windows on Bilingualism, 73-85. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.

Grimes, Joseph E. 1992. “Correlations Between Vocabulary Similarity and Intelligibility.” In Eugene H. Casad (ed.), Windows on Bilingualism, 17-32. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110. Dallas, TX: Summer Institute of Linguistics and the University of Texas at Arlington.

The Place of Mandarin in Sinitic

In the comments, James Schipper suggests that Mandarin is to Sinitic what German and Russian are to Germanic and Slavic. He also offers that most Sinitic speakers also speak Mandarin and makes a comparison with Welsh and English and Frisian and Dutch, where every Welsh speaker speaks English and every Frisian speaker speaks Dutch, and each one would rather write in English or Dutch than in Welsh or Frisian.

My comments:

English and German have 60% lexical similarity. English and French have about 25% and English has about 29% with Russian (I need to check on that one!). I need to look at some charts here.

It’s not uncommon for Chinese lects to have 5-30% lexical similarity. Further, there are deep differences in tones, and even grammar and structure. Even the pronouns can differ. But clearly they are all related to one another and they all derived from Chinese.

So yes, your analogy with Russian and German as super-languages on top of their families is correct, but it is important to note the vast differences in the lects. It was said that no one could understand Chairman Mao’s dialect, Xiang Nan (a Xiang dialect of Hunan). Apparently his secretary could understand him, but few others could. I’m not sure how he got his points across.

Further, at this point most speakers of the Sinitic languages probably speak Putonghua, which is Standard Mandarin. It’s a standard the same way that High German is Standard German and Standard Italian is the standard for that language. However, overseas, many do not speak Putonghua, and in the Cantonese area, I believe many still do not speak Putonghua.

Look at the vocabulary of English: the closest language is Frisian with 64%. Dutch is 62% and German is 60%. French is 25%. English is clearly a Germanic language. There are cases in the Chinese languages similar to the Latin layering in English. Some of them have heavy layers of non-Sinitic tongues like Zhuang or Hmong.

Besides Putonghua, you are correct that the vast majority of Sinitic speakers are native speakers of some kind of Mandarin. I believe that a lot of the older folks do not have very good Mandarin and may be monolinguals of their Sinitic tongue, but I’m not sure. The government has been pushing Putonghua very hard for the past decade or so, almost too hard. It’s been killing the smaller tongues. So it’s not quite the same as with Frisian and Welsh yet. I believe it’s pretty common in the South to find Cantonese speakers who don’t speak Mandarin, and it’s for sure the case overseas.

As far as writing, I don’t believe it’s a problem. An ideographic system was perfect for Chinese as it was the one way that all of the speakers of the various Chinese lects could communicate. My father was in China in 1946 and he said that the rickshaw drivers often could not understand each other, but they could all write Chinese, so they would communicate by writing notes.

All Chinese can write to each other, no matter what language they speak, assuming they are literate. A decade ago in a college in Henan, a professor said that the students would come to the college from all over the province and for the first month would communicate by writing notes to each other, so they all wrote a common language. In that province, every county has its own language, and there are even separate languages within counties. It took them about a month or so before they could start working out each other’s languages.

Some comment that the Chinese languages are like a Cockney accent of English. On a website, a commenter said that that’s not true. He said he can understand Cockney, but they had a speaker of an Anhui Mandarin lect as a professor at the university and no one could understand what he was talking about. So it’s quite common for the various Chinese lects to be pretty much incomprehensible to each other.

There are other comments around the Net that say that the Chinese lects are close enough to pick them up if you spend a bit of time there. That’s not really true. The differences between the Chinese lects are often as great as those between English and German. Now suppose you are an English speaker and you go to Germany. Are you going to “just pick up” German really fast? Forget it. I mean, if you stay there 3 years, maybe. Maybe! Someone else compared the differences between Chinese lects to the gulf between English and Irish. That may be too distant, but it may also be correct.

Differences between the lects encompass tones, grammar and lexicon. All of them boil down to intelligibility. The major Chinese lects regularly score around 50-60% intelligibility. That is pretty bad and certainly does not qualify them as dialects. A dialect should have 90% intelligibility or more.

This is especially true in the center and south of the country. In Anhui, Fujian, Henan, Hunan, Jiangsu and Zhejiang there is an incredible diversity of tongues. It is said in Fujian that every 3 miles the culture changes and every 6 miles the language changes.

In these parts of China, there are lots of mountains and it is very rural. Many people never left their home village to go over the mountain to talk to the people over there, so a multitude of tongues arose. I understand that in this part of China there are even incomprehensible tongues inside major cities where the downtowners can’t understand the suburbs.

A Reworking of Chinese Language Classification

Updated April 10, 2013. This post runs to 112 pages so far. On March 6, 2011, Sinologist Victor Mair took on the question of Mutual Intelligibility of Sinitic Languages.

The Chinese languages have undergone a lot of reclassification lately (Mair 1991), from one Chinese language a couple of decades ago up to 14 Chinese languages today according to the latest Ethnologue.

However, Jerry Norman, one of the world’s top experts on Chinese, says that based on mutual intelligibility, there are 350-400 separate languages within Chinese alone (Mair 1991). According to Gong Xun, a Sichuan Mandarin speaker in Deyang, China, by my criteria of distinguishing between language and dialect, there would be 300-400 separate languages in Fujian alone.

So far, 2,500 dialects of the Chinese language have been identified, and a number of them are separate languages.

I have been doing research on this issue recently. Based on the criteria of mutual intelligibility, I have expanded the 14 Chinese languages into 360 separate languages.

There are different ways of doing mutual intelligibility. I decided to put it at 90%, with >90% being dialect and <90% being a separate language. Experts in Chinese linguistics concurred that this seems to be a reasonable way to divide dialect from language (Mair 2009). This is based on what appears to be Ethnologue’s criterion for establishing the line between a dialect and a language.

In the cases below where I had intelligibility data available, a number of Chinese languages had no more than 65% intelligibility between them (Cheng 1991).

Intelligibility is hard to determine. I am not interested in typological studies of lects involving either lexicon, phonology or tones, unless this can be quantified in terms of intelligibility in a scientific way (see Cheng 1991). For the most part, what I am interested in is, “Can they understand each other?”

The data below is best regarded as a pilot study.

Reasonable, fair-minded and professional comments, additions, criticisms, elaborations, presentations of evidence, etc. are highly encouraged, as long as politics and emotions are left out of it. The purpose of the classification below is more to stimulate academic interest and sprout new thinking and theory. It is not intended to be an end-all or be-all statement on the subject, in fact, it is quite the opposite.

Interested scholars, observers or speakers of Chinese languages are encouraged to contribute any knowledge that they may have to add to or criticize this data below. So far as I know, this is the first real attempt to split Chinese beyond the 14 languages elucidated by Ethnologue.

There are lapses in the data below. I mean to present this data in outline form to make it more readable.

There are also problems with the data below. In many cases, “separate language” just means that the lect is not intelligible with Putonghua. Unfortunately, I currently lack intelligibility data within the major language groups such as Gan, Xiang, Wu and the branches of Mandarin. There is probably quite a bit of lumping still to be done below. Where lects are mutually intelligible below, I have tried to lump them into one language with various dialects.
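The lumping step can be pictured as grouping lects that are linked by high intelligibility scores. Below is a minimal sketch using a union-find structure. The lect names and scores are placeholders, not real data, and two choices are my own assumptions: a score of exactly 90% counts as dialect-level, and lects are chained together (if A links to B and B links to C, all three are lumped even when A and C were never tested).

```python
def lump(lects, scores, threshold=90):
    """Group lects into languages from pairwise intelligibility scores."""
    parent = {lect: lect for lect in lects}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in scores.items():
        if pct >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for lect in lects:
        groups.setdefault(find(lect), []).append(lect)
    return sorted(sorted(group) for group in groups.values())

# Placeholder lects and invented scores, for illustration only.
lects = ["A", "B", "C", "D"]
scores = {("A", "B"): 94, ("B", "C"): 91, ("A", "D"): 55, ("C", "D"): 60}
print(lump(lects, scores))
```

In this toy example A, B and C end up as dialects of one language and D stands alone. Because intelligibility is often asymmetrical, a fuller treatment would have to decide which direction of each pair to use.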

It is reasonable to ask what background and expertise I have to write such a post. I have a Masters Degree in Linguistics and have been employed as a salaried linguist for a US Indian tribe.

I assume it will be controversial. Keep in mind that this work is extremely tentative and should not be taken as the last word on the subject by a long shot. Some have charged that this study claims to be “accurate and precise.”

In truth, it claims nothing of the sort. Pilot studies, which is what this is, are de facto never “accurate and precise,” and you can take an extreme argument from scientific philosophy that no science is really “accurate and precise” but is simply “correct for now” or “correct until proven otherwise.”

Gan is a separate language, already identified as such. Many individual Gan lects are unintelligible to other Gan lects. In fact, it is possible that all Gan lects are unintelligible with each other, but that remains to be proven.

Outside of Gan Proper, Leping, while very diverse, is nevertheless intelligible with nearby Gan lects and with Nanchang (Campbell January 2009).

Nanchang and Anyi are apparently separate languages within Gan based on a 200 word Swadesh test (Ben Hamed 2005). Nanchang has a great deal of dialectal diversity, with several dialects covering different cities and the rural areas. Intelligibility is not known. Jiangyu, spoken in Hubei, is very strange and at least unintelligible to Putonghua speakers, as is Huarong (evidence). Huarong is surely a separate language.

Similarly, Wanzai must surely be a separate language, as must Yichun, Ji’an, Wanan, Fuzhou, Yingtan, Leiyang, Huaining and Dongkou.

Nanchang and Anyi are within the Changjing Group of Gan, which has 15 different lects. Yingtan and Leping are members of the Yingyi Group of Gan, which has 12 lects. Jiangyu and Huarong are members of the Datong Group of Gan, which has 13 lects. Yichun is a member of the Yiliu Group of Gan, which has 11 lects. Wanzai is a member of the Yiping Group of Gan, of which it is the only member.

Leiyang is a member of the Leizi Group of Gan, which has 5 lects. Wanan is a member of the Jilian Group of Gan, of which it is the only member. Ji’an is a member of the Jicha Group of Gan, which has 15 lects. Huaining is a member of the Huaiyue Group of Gan, which has 9 lects. Fuzhou is a member of the Fuguang Group of Gan, which has 15 lects. Dongkou is a member of the Dongsui Group of Gan, which has 5 lects.

Gan has 102 separate lects in it. There are 30 million speakers of the Gan languages.
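The total can be checked by adding up the group sizes listed above. The counts in this short sketch are taken straight from the two preceding paragraphs.

```python
# Number of lects in each Gan group, as listed above.
gan_groups = {
    "Changjing": 15,
    "Yingyi": 12,
    "Datong": 13,
    "Yiliu": 11,
    "Yiping": 1,
    "Leizi": 5,
    "Jilian": 1,
    "Jicha": 15,
    "Huaiyue": 9,
    "Fuguang": 15,
    "Dongsui": 5,
}

total_lects = sum(gan_groups.values())
print(len(gan_groups), total_lects)
```

The eleven groups add up to 102 lects, matching the figure given.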

Within the Min group, Northern Min (Min Bei) and Central Min (Sanminghua) have already been identified as separate languages. There are 50 million speakers of all of the Min languages (Olson 1998). Northern Min has only 0-20% intelligibility with Min Nan.

Central Min has three lects, Shaxian, Sanming and Yongan, but we don’t know if there are languages among them. Central Min has 3.5 million speakers.

Northern Min is said to be a single language, although it has 9 separate lects. Most dialects are said to be mutually intelligible, but Jianyang and Jian’ou have only about 75% intelligibility. Northern Min has 10 million speakers.

The standard dialect of Min Dong or Eastern Min is Fuzhou. Eastern Min has only 0-20% intelligibility with Min Nan. Chengguan, Yangzhong and Zhongxian are separate languages, all spoken in Youxi County (Zheng 2008).

Beyond that, Eastern Min is reported to have several other mutually unintelligible languages. One of them is Fuqing, located near Fuzhou but not intelligible with it, according to Wikipedia, but others say the two are mutually intelligible, although speakers are divided on the question.

It appears that Fuzhou speakers may understand Fuqing speakers better than the other way around. Fuzhou and Fuqing are about 65% intelligible in praxis, and it is about the same with the rest of the Houguan Group (Ngù 2009).

Ningde, Fuding and Nanping are probably other languages in this family (evidence). Of these three, Ningde is definitely a separate language. According to George Ngù, a passionate proponent of Fuzhou, “Fuzhou is not intelligible even within its many varieties.”

It’s not clear if that applies to all of Eastern Min, but it appears that it does. Therefore, Changle, Gutian, Lianjiang, Luoyuan, Minhou, Minqing, Pingnan, Pingtan, Yongtai, Fuan, Fuding, Shouning, Xiapu, Zherong and Zhouning are all separate languages.

There are two other lects lumped in with Eastern Min. Manjiang is spoken in the central part of Taishun County, and Manhua spoken in the eastern part of Cangnan County. Both of these names mean “barbarian speech.”

Both are probably mixtures of Southern Wu (Wenzhou etc.), Eastern Min, Northern Min, and maybe even pre-Sinitic languages. Manhua and Manjiang are not intelligible with Fuzhou. However, Manjiang has affinity with Shouning in phonology, vocabulary and grammar. Whether or not it is intelligible with it is not known. Min Nan speakers who have looked at Manjiang data say that it doesn’t even look like a Sinitic language.

Manhua is best dealt with as a form of Wu. I discuss it further below under Wu.

Fuding, Fuan, Shouning, Xiapu, Zherong and Zhouning are in the Funing Group of Eastern Min, which has 6 lects.

Fuzhou, Fuqing, Chengguan, Yangzhong, Zhongxian, Ningde, Changle, Gutian, Lianjiang, Luoyuan, Minhou, Minqing, Pingnan, Pingtan, Yongtai and Nanping are in the Houguan Group of Eastern Min, which has 16 lects.

Eastern Min contains 23 separate lects.

Within Min Nan, Xiamen and Teochew are separate languages (evidence). There is even a proposal to split Xiamen, Qiongwen and Teochew into three separate languages before SIL. Amoy, Taiwanese, Jinjiang, Zhangzhou, Tainan, Taibei, Yilan, Taichung, Quanzhou and Lufeng are part of the Xiamen group. Jinmen is apparently a separate language, as it has poor intelligibility with Taiwanese. A much better name for Xiamen according to the Chinese literature is Quanzhang (Campbell January 2009).

Quanzhang is a combination of Quanzhou and Zhangzhou, two of the most important dialects in the language. Xiamen has only 51% intelligibility with Teochew. Whether or not Zhangzhou and Quanzhou are intelligible in China itself is still somewhat of an open question.

Nevertheless, Quanzhou speakers in Singapore can no longer understand Taiwanese or Xiamen well, though they have partial understanding of them. They have only 30-40% intelligibility with Yilan. Nevertheless, they have good understanding of Zhangzhou. This implies that much of the understanding between at least some of the Xiamen lects was due to bilingual learning.

The Yilan dialect on Taiwan is so different that it alone has posed serious problems for the task of standardizing Taiwanese Min Nan, yet it is intelligible with the rest of Taiwanese (Campbell January 2009). Lugang is also very different, but is also intelligible (Campbell 2009).

There are some communication problems for Tainan speakers hearing Taipei, but it appears that they are still intelligible with each other (Campbell January 2009).

Jieyang, Raoping, Chaoyang, Shantou (Swatow) and Hailok’hong (Haklau) are lects in the Teochew Group (evidence) of Teochew. Teochew (Chaozhou) is the prestige version of Teochew. Chaoyang speakers can understand Jieyang, Raoping (evidence) and Shantou, but intelligibility is difficult with Haifeng and Lufeng. Shantou, Raoping, and Jieyang are then dialects of Chaoyang.

Zhangzhou and Quanzhou have marginal intelligibility with Teochew varieties. They are both spoken in Taipei, Taiwan. After all, Taiwanese itself is just a mixture between Zhangzhou and Quanzhou. The situation in Taipei was interesting. The dialects of the city were a mix of Zhangzhou and Quanzhou. The dialect of the center of the city was mixed between the two, with a slight Quanzhou lean to it. In Sulim (Shilin), people spoke with a dialect that heavily favored Zhangzhou. Other districts spoke a Tang’oann-type dialect, which is just Quanzhou mixed with a bit of Zhangzhou.

All these conditions are more common with the older generation because the new generation either does not speak Teochew at all or favors the mixed Zhangzhou-leaning “Southern” style used in the media. Hailok’hong (Haklau) is spoken down the coast between the Teochew zone and the Hong Kong area. It has marginal intelligibility with other Teochew lects. Nevertheless, Taiwanese speakers can no longer understand the pure Quanzhou spoken in the Chinese city of that name.

On the other hand, Chaoyang itself is unintelligible to some other Teochew lects. Shantou speakers cannot understand some of the other Teochew lects, and speakers of other lects often find Shantou hard to understand.

Sources report that Teochew lects can vary greatly in the pronunciation of even single words, and the tones can be quite different too.

There are claims that Teochew is intelligible with Zhangzhou and Quanzhou, but these claims appear to be incorrect (see above). That might make some sense, as the Teochew are a group of Min speakers who broke off from Zhangzhou Min about 600-1,100 years ago. They moved down to northeast Guangdong, and over hundreds of years, a heavy dose of Cantonese went in, producing modern Teochew.

Teochew has only 51% intelligibility with Xiamen.

Haifeng and Shanwei are members of the Luhai Teochew subgroup of Teochew, which differs markedly from Teochew and may be a separate language. Luhai is said to be halfway between Teochew and Zhangzhou. Luhai probably represents a later move from Zhangzhou towards northeast Guangdong by the same group that formed Teochew. This move may have occurred around 400 years ago.

Lufeng is said to have over 90% intelligibility with Xiamen, but if it is really halfway between, it should have 75% intelligibility. Intelligibility testing may be needed.
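The 75% estimate above is a simple midpoint. As a rough sketch only, assume Zhangzhou is about 100% intelligible with Xiamen (an assumption made for illustration, not a tested figure) and take the 51% Teochew-Xiamen figure quoted earlier. A lect sitting halfway between the two would then be expected to land near 75%:

```python
# Hypothetical illustration: estimate the intelligibility of a lect that
# sits "halfway between" two others by linear interpolation.
TEOCHEW_XIAMEN = 51.0      # percent, figure quoted in this post
ZHANGZHOU_XIAMEN = 100.0   # percent, ASSUMED for illustration only


def interpolate(low: float, high: float, position: float = 0.5) -> float:
    """Return the value at `position` (0 to 1) on the line from low to high."""
    return low + (high - low) * position


halfway = interpolate(TEOCHEW_XIAMEN, ZHANGZHOU_XIAMEN)
print(f"Expected halfway intelligibility: {halfway:.1f}%")
```

Under these assumptions the midpoint comes out at about 75%, well short of the reported 90%+, which is why testing would be useful here.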

The Teochew spoken in Indochina, in particular in Vietnam and Cambodia (Indochinese Teochew), may be a separate language. Some Indochinese Teochew speakers who have returned to their family villages say they could only understand 70% of the speech there.

Furthermore, intelligibility is difficult between Malay Teochew and other Teochew, such as SE Asian Teochew and Teochew on the mainland. Malay Teochew is spoken in Malaysia, Singapore and Indonesia.

The Teochew variant spoken in Malaysia is composed of many highly variant lects. Whether or not they are mutually intelligible with each other is not known. The variety spoken in Medan, Indonesia is particularly interesting. It has heavy Malay and Cantonese influence and cannot be understood by other Teochew speakers. Teochew has 10 million speakers.

Zhangping, though close to Xiamen, is a separate language according to a 200-word Swadesh test (Ben Hamed 2005).

Sanjiang appears to be a separate language. Datian, in Fujian, is also a separate language.

A version of Hokkien called Malay Hokkien is spoken in Malaysia and in Indonesia in Sumatra and Kalimantan. In Indonesia, it is spoken in the city of Medan, the state of Riau, the city of Bagansiapiapi on Sumatra and in a few places on Kalimantan, such as Kuching and especially in Brunei. Malay Hokkien is heavily laced with Teochew.

Northern Malay Hokkien is spoken from Taiping along the coast formerly all the way to Phuket but now only to Pedang in Malaysia and in Indonesia in the city of Medan, the state of Riau, the city of Bagansiapiapi on Sumatra and in a few places on Kalimantan, such as Kuching and especially in Brunei. Speakers of Northern Malay Hokkien have a hard time understanding the Southern Malay Hokkien (see Singapore Hokkien below) spoken in Kelang, Malacca and Singapore. Northern Malay Hokkien is creolized, with Malay and Thai embedded deeply in the language.

Southern Malay Hokkien is less creolized, if at all. Singapore Hokkien lies between Northern Malay Hokkien and Taiwanese on the continuum. A very pure variety of Hokkien is spoken in the Indonesian city of Bagansiapiapi. It has avoided the Mandarinization of Hokkien that is occurring elsewhere. They speak like the Hokkien speakers of Tang’oann (Tong’an), China.

Kelantan Hokkien is spoken in the Malay state of Kelantan. It is wildly creolized with Malay and is probably not intelligible with any other form of Hokkien.

The version of Hokkien spoken in the Philippines, often called Binamhue, Banlamhue or Minanhua (Philippines Hokkien) by speakers, derives from a dialect on the outskirts of Quanzhou, and it may have drifted into a separate language. At present, it is sometimes not intelligible with Quanzhou or Xiamen. That is, some Philippines Hokkien speakers claim that they can only understand about 70% of Taiwanese television.

The version of Min Nan, Singapore Hokkien (Southern Malay Hokkien), spoken in Singapore, Kelang and Malacca is similar to that spoken in Taiwan, but many Singapore Hokkien speakers have a hard time understanding Taiwanese Hokkien, while others can understand it just fine. Older Singapore Hokkien speakers can understand Taiwanese Hokkien better than younger ones. This is due to bilingual learning more than anything else because younger Singapore Hokkien speakers are no longer good at understanding other Min Nan dialects due to lack of exposure to them.

The reason that Taiwanese speakers can seem to communicate well with Singapore Hokkien speakers is that they are using a simpler vocabulary. A Singapore Hokkien speaker, if immersed in Taiwan, could pick up Taiwanese fairly quickly, within say 3 months.

An umbrella term covering Malay Hokkien, Singapore Hokkien and Philippines Hokkien may be Nusantaran Hokkien.

Another language in the same group is best called Wan’an, comprising a number of dialects and possibly languages in Wan’an County of Fujian (Branner 2008). Zhaoan, Pinghe and Yunxiao, also of Fujian, are separate languages.

Wan’an and Longyan are not mutually intelligible (Branner 2008). Longyan seems to have about 85% intelligibility with Taiwanese. Koongfu and Shizhong are apparently dialects of Longyan Min and are probably intelligible with it. Koongfu is spoken in Kanshi Township in Yongding County. Shizhong is spoken in southern Longyan County.

There are many varieties of Southern Min spoken in Western Fujian that may or may not be independent languages.

Liancheng Gutyan Junbao, Longyan Wan’an Wuzhai, Longyan Wan’an Songyang, Longyan Wan’an Tutuan, Longyan Baisha Youshui, Shiahtsuen Buhyun Liling, Shanghang Buhyun Liling, Liancheng Xuanhe Shengxing, Shanghang Gutian Laifang, Liancheng Xinquan Linguo, Liancheng Xinquan Lelian, Liancheng Pengkou Wangcheng, Liancheng Miaoqian Zhixi, Liancheng Gechuan Zhuyu, Liancheng Miaoqian Jiangshe, Liancheng Sibao Shangjian Zhenbian, Liancheng Juxi Gaoding, Liancheng Tangqian Dikeng, Liancheng Wenheng Hengming, Liancheng Xinquan Dongnancun, Liancheng Quxi Puxi Dongxiduan, Liancheng Quxi Qiaotou and Liancheng Liwu Nanban Zhangwu are spoken in Western Fujian. Shiahtsuen is spoken in Laiyuan Township in southeastern Liancheng County. (Branner 2000).

Whether these lects are dialects or separate languages is difficult to say. Speakers of many of these lects don’t understand each other at first, but after they talk to each other for a while, they start to figure out the other lect (Branner 2008). Intelligibility testing needs to be done for these lects.

Quanzhou, Zhangzhou, Singapore Hokkien, Philippines Hokkien, Xiamen, Amoy, Yilan, Tainan, Taipei, Taichung, Taiwanese, Jinjiang, Lufeng, Lugang, Jinmen, Zhangping, Koongfu, Shizhong, Nanjing, Zhaoan, Pinghe, Yunxiao, Longyan, Wan’an, Liancheng Gutyan Junbao, Longyan Wan’an Wuzhai, Longyan Wan’an Songyang, Longyan Wan’an Tutuan, Longyan Baisha Youshui, Shiahtsuen, Shanghang Buhyun Liling, Liancheng Xuanhe Shengxing, Shanghang Gutian Laifang, Liancheng Xinquan Linguo, Liancheng Xinquan Lelian, Liancheng Pengkou Wangcheng, Liancheng Miaoqian Zhixi, Liancheng Gechuan Zhuyu, Liancheng Miaoqian Jiangshe, Liancheng Sibao Shangjian Zhenbian, Liancheng Juxi Gaoding, Liancheng Tangqian Dikeng, Liancheng Wenheng Hengming, Liancheng Xinquan Dongnancun, Liancheng Quxi Puxi Dongxiduan, Liancheng Quxi Qiaotou and Liancheng Liwu Nanban Zhangwu are all members of the Quanzhuang Group of Min Nan, which has 50 lects.

Teochew, Shantou, Lufeng, Haifeng, Chaoyang, Jieyang, SE Asian Teochew and Malaysian Teochew are members of the Chaoshan Group of Min Nan, which has 12 lects.

Datian is in its own group in Min Nan.

Min Nan consists of 68 separate lects. Clearly, the dialectal relationships of Min Nan are confusing, as many of the lects are very closely related, if not fully intelligible. Intelligibility testing may be needed to sort out some of these issues. There are 30 million speakers of Southern Min.

Zhenan Min, spoken in Zhejiang Province around Pingnang and Cangnan and in the Zhoushan Islands, is a separate language. Zhenan Min contains 4 lects, Pingyang, Cangnan, Dongtou and Yuhuan, which may or may not be languages. Zhenan Min has 574,000 speakers. Zhenan Min is influenced by Eastern and Northern Min.

Qiongwen (Hainanese) is a separate language with 8 million speakers. It has the lowest intelligibility with the rest of Southern Min of any Min Nan lect. Qiongwen itself has 14 separate lects, all spoken on Hainan. Whether or not any of them are separate languages is not known.

Longyan (Branner 2008) is a separate language, apart from Southern Min. It is spoken in Longyan City’s Xinluo District and Zhangping City and has 740,000 speakers. It has heavy Hakka influence due to the large number of Hakka speakers in the surrounding areas.

Another split in Min is Leizhou. Leizhou Min is a separate language and is now recognized by some as a separate branch of Min altogether, along the lines of Southern and Northern Min. Leizhou consists of 7 different lects. Haikang appears to be just a dialect of Leizhou.

However, at least some of the other 6 Leizhou lects are very different in phonology and lexicon. Intelligibility data is not known, but they may be intelligible. Leizhou Min, with 4 million speakers, has low intelligibility with Min Nan lects and has only 50% intelligibility with Hainanese.

Shaojiang Min, or Min Gan, is said to be a completely separate high-level division of the Min language like Leizhou Min. It has four lects – Shaowu, Guangze, Jiangle and Shunchang – that are said to be mutually intelligible. There are subdialects within these larger lects. The substratum of Shaojiang is not Min, Gan or Hakka – instead, it is the ancient Baiyue language.

Puxian Min has already been identified as a separate language. Puxian has 3 separate lects. There are minor differences between these lects. However, a form of Puxian Min spoken in Singapore, Hinghwa, presently lacks full intelligibility with Puxian Min proper. Puxian speakers are a minority in Singapore, and their language has mixed a lot with Singapore Hokkien, Malay, English and other languages spoken in Singapore, resulting in a separate language.

A Min language called Longdu, located in Guangdong, is not only a separate language (evidence here and here) but seems to be in another Min category from Southern Min. It is spoken in the southwest corner of Zhongshan City in Shaxi and Dayong.

In Guangdong Province, there are other divergent lects of Min Nan. Two others, Nanlang (also spoken in Zhongshan) and Sanxiang, are also separate languages. Nanlang is spoken 10 miles southeast of Zhongshan in Cuiheng. It is also spoken in Nanlang and Zhangjiabian. Sanxiang is spoken to the south of Zhongshan in the hilly rural areas.

In Chinese, Longdu, Nanlang and Sanxiang are referred to as All-Lung, South Gourd and Three Rural, respectively. Sources give Longdu and Nanlang 100,000 speakers and Sanxiang 30,000 speakers. 14% of the population of Zhongshan speaks Min. Nanlang now has mostly elderly speakers.

All of these seem to be in the same group, Zhongshan Min, and all are spoken in the Pearl River Delta near Hong Kong. Zhongshan Min has 150,000 speakers.

This group is possibly a Northern or Eastern Min group stranded way down in Guangdong. They are sometimes referred to in old literature as “Northeastern Min”. That’s not really a category. It often means Northern Min, but sometimes it means Eastern Min. These languages have all borrowed extensively from the type of Cantonese spoken in the Pearl River Delta.

Looking at the whole picture, it appears that various immigrants speaking Puxian Min, Northern Min and Southern Min all settled around Zhongshan. These various Min elements, along with a hefty dose of Cantonese, have gone into the creation of Zhongshan Min.

Sanxiang, Nanlang and Longdu are apparently not mutually intelligible, although Nanlang is close to Longdu. Sanxiang is more divergent. Further, there are more dialects within these three languages, and dialectal divergence is considerable, with possible communication difficulties among them.

Sanxiang has at least two dialects, Phao and Tiopou. Phao is fairly uniform across a number of villages, but Tiopou is quite different. Nevertheless, there is near-full intelligibility. For now, we will just list Sanxiang, Nanlang and Longdu as separate languages, with possible dialects Phao and Tiopou (Sanxiang); Nanlang A and Nanlang B; and Longdu A and Longdu B, among them.

A very strange lect is spoken by the She people in Zhejiang, Fujian and Guangdong. The She language was originally Hmong-Mien, then added a Cantonese layer, then a Hakka layer, then a Min layer, and in Zhejiang, a Wu layer. It is best described as a Hmong-Mien language that has been Sinicized. There are probably 200,000 speakers of this language.

There is also an original She language that is non-Sinitic (Hmong-Mien) and is spoken by only about 1,000 people in Guangdong.

In Eastern Guangdong, the She speak the Chaoshan She language. They live in the Phoenix Mountains in Chao’an County in Chaozhou prefecture. Their language has had heavy contact with the Chaoshan (Teochew) Min group. This is probably a separate language, unintelligible with other She languages and also with Chaoshan Min.

Within Hakka, besides Hakka Proper (Meixian), Tingzhou is a separate language (evidence). Wuhua Hakka is intelligible with Meixian.

Fangcheng and Dabu are close to Meixian, but intelligibility data is lacking. Fengcheng has five different lects within it, but intelligibility is not known. Hong Kong Hakka is not intelligible with the Hakka spoken on Taiwan, nor with Dabu. Dongguan, spoken near Hong Kong, can understand Meixian, but Meixian cannot understand Dongguan.

Taipu or Taipo is spoken in the village of the same name in Hong Kong and is not intelligible with Meixian, nor is Wakia, also spoken in Hong Kong.

A variety of Hakka spoken in a part of Hong Kong called Shataukok is called variously Satdiugok, Sathewkok, Shataukok or Satdiukok. It is said to be different from other Hakka, and evidence indicates that Shataukok may indeed be a separate language. Shataukok has dialects within it and they are different, but they are generally mutually intelligible.

All three of these are dialects of a more or less intelligible language called Hong Kong Hakka.

Located near Hong Kong, Shenzhen/Bao’an is a separate language.

Haifeng and Lufeng, located near each other in Guangdong, appear to be dialects of a separate language called Hailufeng.

Longchuan in northeastern Guangdong is a separate language (evidence), with poor intelligibility with other Hakka lects. Longchuan has four different dialects, Huangbu, Sidu, Chetian and Tuocheng. Sidu and Tuocheng are close and probably dialects of Longchuan. Sidu Longchuan has 18,000 speakers.

Boluo and Heyuan are separate languages, not mutually intelligible. Longchuan, Boluo and Heyuan are quite distant from other Hakka. Heyuan is spoken in central Guangdong.

Huizhou is mutually intelligible with Longchuan and also with Meixian and Dabu.

Sanxiang, spoken in Zhongshan prefecture, is different from all other Hakka, but intelligibility data is lacking.

It is possible that in northern Guangdong, there may be many different Hakka languages, since dialects tend to differ from village to village, and in many cases, communication is difficult.

The Hakka spoken in Kunming, Sarawak, in Malaysia, known as Ho Po Hak, is a separate language. It is very different from the Hakka spoken in Sabah, Malaysia, and it is similar to Hopo, spoken in Hopo, near Meizhou. Hopo is not intelligible with Dabu, Hailu or Meixian. Hopo appears to be a dialect of Jiaoling. Hopo has deep influence from Teochew Min, because it is located right next to the Teochew area.

The Gannan Group (or Ninglong Group) from Southern Jiangxi, Mingxi from Western Fujian, and the Yuemin Group from Southern Fujian and Southeastern Guangdong are separate languages.

The Gannan Group contains multiple lects. One of them is Xingguo, spoken in Xingguo County in Ganzhou Prefecture (evidence). The Gannan Group is extremely diverse compared to the Hakka of Guangdong and Fujian. Gannan lects differ even from village to village.

With Gannan Hakka, we may be dealing with a situation of many different languages, as with Wu, Hui, Tuhua and Xiang. In fact, it is quite possible that with Jiangxi Hakka, every lect is a separate language, but that remains to be proven.

In Fujian Province, there is the wildly diverse Tingzhou Hakka Group mentioned above. Even within this group, there are separate languages, including Yongding, Liancheng, Changting, Xinquan, Qingliu, Mingxi, Ninghua and Shanghang (evidence). Gucheng is probably also a member of Tingzhou.

Sources say that each Hakka village in Fujian speaks its own lect, and that the lects are far enough apart to make communication from village to village very difficult. Therefore, in addition to the above, we will add Wuping, Longyan, Zhaoan, Yunxiao, Shangsixiang, Fuding, Fuan, Gucheng and Nanjing Qujiang.

Luoyuan She Hakka is spoken in Fujian. It is an extremely diverse form of Hakka that differs from all other Hakka. It must surely be a separate language.

Chengdu is spoken in Chengdu, Sichuan. It is quite different from other forms of Hakka and has poor intelligibility with other forms.

On Taiwan, the Miaoli (Four Counties), Dongshi (Dapu) and Xinzhu (Hailu) lects are not mutually intelligible, nor is the mixed Gaoxiong lect created in order that these three lects could communicate with each other. Kunbei (Zhaoan) is very different and must be a separate language. Raoping may well be a separate language, but intelligibility data is lacking. In general, speakers of other kinds of Hakka find Taiwan Hakka to be hard to understand, possibly due to Southern Min influence.

Bangka Island Indonesian Hakka, spoken on Bangka Island in Indonesia, has diverged so radically with its tones that it is now a separate language. That is, speakers of other Indonesian Hakka lects say that they cannot understand Bangka Island speakers. It’s actually said to be a Hakka creole more than anything else.

In Indonesia, two other Hakka languages are spoken, Kun Dian Indonesian Hakka, spoken in Borneo, and Belitung (Ngion Voi) Indonesian Hakka. Kun Dian Hakka is the largest Hakka group in Indonesia. Most live at Pontianak and Singkawang, where they speak two different intelligible lects, but they have spread all over Indonesia. Kun Dian Hakka is a dialect of Meixian.

Belitung Hakka is spoken mostly on Sumatra and Borneo, and is characterized by a soft way of speaking. Belitung Hakka and Bangka Hakka say they cannot understand Kun Dian Hakka, but Kun Dian speakers say they can understand the other two for the most part. East Timor Hakka is a dialect of Meixian.

Jiexi is spoken in southeast Guangdong. Dayu is spoken in southern Guangxi. Liannan is spoken in northwest Guangdong. Dongguan Qingxi is spoken in south-central Guangdong. Wengyuan is spoken in northern Guangdong. Ningdu is spoken in Jiangxi. Mengshan Xihe is spoken in eastern Guangxi. Hong Kong Hakka is spoken in Hong Kong.

Zhaoan Xiuzhuan is spoken in southern Fujian. Shanghang Pengxin, Basel Mission and Shanghang Guanzhuang Shangzhuo are spoken in West Fujian (Branner 2000).

Dayu, spoken in Jiangxi, is a separate language, not intelligible at least to Central, or Meixian, Hakka speakers.

Meixian, Wuhua and Bao’an are members of the Yuetai Group of Hakka, which has 23 lects. Within Yuetai, Wuhua and Dabu are members of the Xinghua subgroup, which has 5 lects. Xinghua has 3.4 million speakers. Bao’an and Lufeng are in the Xinhui subgroup of Yuetai, which has 9 lects. Xinhui has 2.4 million speakers.

Gaoxiong, Xinzhu, Dongshi and Miaoli are members of the Jiaying Group of Hakka, which has 7 lects.

Tingzhou, Yongding, Liancheng, Changting, Xinquan, Shanghang, Basel Mission, Shanghang Pengxin, Wuping, Ninghua, Qingliu and Mingxi are all part of the diverse Tingzhou Group of Hakka. All told, Tingzhou has 12 lects, all of which are separate languages.

Longchuan, Boluo and Heyuan are members of the Yuezhong Group of Hakka, which has 5 lects.

Huizhou is in its own subgroup of Hakka.

Xingguo and Ningdu are in the Ninglong Group of Hakka, which has 13 lects. This group is said to be very diverse, with lects differing from village to village.

Liannan and Wengyuan are members of the Yuebei Group of Hakka, which has 11 lects and must surely be a separate language.

Dayu is a member of the Yugui Group of Hakka, which has 43 lects.

Ho Po Hak, Bangka Island, Nanjing Qujiang, Jiexi, Dayu, Hong Kong, Mengshan Xihe, Zhaoan Xiuzhuan, Fuan, Fuding and Haifeng are unclassified.

There are 12 major Hakka lects and 210 Hakka lects altogether. Others claim that there are over 1000 Hakka lects spoken in China. There are 30 million speakers of the various Hakka languages. The dialect situation with Hakka, as with Min Nan, is quite confused and somewhat contradictory. Intelligibility testing could clear up some of the confusion. Some speakers report adequate intelligibility between lects, while others report difficulty.

Putonghua is Standard Mandarin, based on the Beijing dialect as of 1949, but it has since diverged wildly, and many Putonghua speakers today cannot understand Beijing. Putonghua is being promoted as the national language of China. In addition to Putonghua, there are 1,500 other dialects of Mandarin spoken in China. In general, other Mandarin dialects are not intelligible to Putonghua speakers (Campbell April 2009).

However, the Northeastern dialects and the dialects around Beijing may be more intelligible than the Mandarin dialects in the rest of the country. The implication is that there may be as many as 1,500 Mandarin languages in China. However, many of these Mandarin dialects are intelligible with at least some other Mandarin dialects. Hence, despite the lack of intelligibility with Putonghua, there is a lot of potential lumping within Mandarin.

The degree to which Mandarin dialects are intelligible to each other is very much an open question and in general is poorly investigated.

Within Mandarin, besides Putonghua, the main branch, Jinan (New Jinan), Beijing and Tianjin (evidence and here) are not intelligible with Putonghua. Tianjin may be intelligible with Beijing; on the other hand, Tianjin is looking more and more like a separate language.

For one thing, Tianjin’s tones are quite different from Putonghua’s, and its tone sandhi is much more complicated. It is also more closely related to lects 150-500 miles away, since Tianjin speakers originally came from Anhui (Lee 2002). Some reports say that Tianjin is intelligible with Putonghua, so intelligibility testing may be needed.

Jinan is not intelligible with Putonghua, but may be learned over a period of weeks to possibly months, as it is close enough. Jinan is only 65% intelligible with Beijing.

Since Beijing, Tianjin, Nanjing City, Hebei and all of NE Mandarin may be intelligible, I am just going to make a language called Northeast Mandarin and call Beijing, Tianjin, Hebei and Nanjing City dialects of NE Mandarin for now. Beijing has low intelligibility with other branches of Mandarin: 72% intelligible with Southwest Mandarin, 64% intelligible with Jilu Mandarin and Zhongyuan Mandarin and 55% intelligible with Jiaoliao Mandarin.
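To make the Beijing figures above easier to compare, here is a small sketch that ranks them and checks them against a cutoff. The 90% cutoff is an assumption borrowed from the way this post discusses Beijinghua and Putonghua; it is labeled as such in the code, and the variable and function names are illustrative only.

```python
# Intelligibility of Beijing with other Mandarin branches, as quoted above.
beijing_intelligibility = {
    "Southwest Mandarin": 72,
    "Jilu Mandarin": 64,
    "Zhongyuan Mandarin": 64,
    "Jiaoliao Mandarin": 55,
}

# ASSUMED cutoff (percent) below which two lects are treated as
# separate languages; an illustrative value, not a tested standard.
SEPARATE_LANGUAGE_CUTOFF = 90


def below_cutoff(scores: dict[str, int], cutoff: int) -> list[str]:
    """Return branch names scoring under the cutoff, lowest score first."""
    low = [name for name, pct in scores.items() if pct < cutoff]
    return sorted(low, key=lambda name: scores[name])


separate = below_cutoff(beijing_intelligibility, SEPARATE_LANGUAGE_CUTOFF)
average = sum(beijing_intelligibility.values()) / len(beijing_intelligibility)

for name in separate:
    print(f"{name}: {beijing_intelligibility[name]}%")
print(f"Average across branches: {average:.2f}%")
```

On these figures, all four branches fall under the assumed cutoff, with Jiaoliao the most distant from Beijing.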

However, many Putonghua speakers claim that Beijinghua is not inherently intelligible with Putonghua. Complaints about unintelligible taxi drivers in Beijing are legendary. At the very least, competing views of the intelligibility of Beijinghua and Putonghua deserve investigation.

On the other hand, Beijinghua may be intelligible with Hebei and Nanjing City. I think that Hebei is clearly a dialect of Beijing. The lect of Beijing’s hutongs and taxi drivers is legendary for being hard to understand. It would be interesting to see whether Tianjin and Hebei speakers can understand it. Tianjin may be a separate language, since it is not intelligible with Beijinghua.

What probably happened was that Beijinghua and Putonghua have taken separate trajectories. This has also occurred in Italian, where, though Standard Italian was based on Tuscan, Standard Italian and Tuscan have taken separate trajectories since. It is said that if you see old Tuscan men on TV in Italy, a speaker of Standard Italian from southern Italy would need subtitles to understand them, but one from northern Italy would not.

Others say that Putonghua was based on the language of the Beijing suburbs, not the city itself.

For whatever reason, Beijinghua often seems to have less than 90% intelligibility with Putonghua, though the question needs further research. Beijinghua, in its pure and least mutually intelligible form, seems to be spoken mostly in the innermost hutongs and among taxi drivers and other low income and working class people. The lect of people with more education and money is probably a lot more comprehensible.

I would describe the real, pure, Putonghua as “CCTV speech”, the lect you hear on Chinese state television. Evidence that Beijinghua lacks full intelligibility with Putonghua is here, here, here, here, here, here, here and here.

The question of whether or not Beijinghua is a separate language from Putonghua is sure to be highly controversial. Perhaps intelligibility testing could settle the question.

Beijing is in a group all of its own called the Beijing Group. It contains 43 separate lects, and may contain more than one language.

We should also note here that even Putonghua, the language that was meant to tie the nation together, seems to be evolving into regional languages.

Guangdong Putonghua is not fully intelligible to speakers of the Putonghuas of Northern China and hence is probably a separate language.

There are also varieties of Putonghua that are spoken in Singapore and Taiwan. Taiwanese Mandarin is about 80-85% intelligible with Putonghua and is a separate language (Mair July 2009). Claims that Taiwan Mandarin is fully intelligible with Putonghua are incorrect.

Shanghai Putonghua is often not intelligible with Putonghua from other regions. It has heavy interference from Shanghaihua, which seriously affects the Putonghua accent. Even after four years of exposure to it, Standard Putonghua speakers often have problems with it.

In addition, Jianghuai Mandarin Putonghua and Zhengcao Mandarin Putonghua are not intelligible with Putonghua from other areas (Campbell April 2009). These varieties of Mandarin cause a particular interference with Putonghua Mandarin that results in a severe dialectal disturbance in their Putonghua.

These Putonghuas are spoken in the regions native to the Jianghuai and Zhengcao dialects of Mandarin. Jianghuai is spoken in Anhui, Jiangsu, Hubei and to a much lesser extent Zhejiang Provinces. Zhengcao is spoken in Anhui, Henan, Shandong and Jiangsu, with one dialect spoken in Hebei.

Although it is different, Singapore Putonghua is still intelligible with Putonghua. Malay Mandarin is said to be quite different but nevertheless intelligible. However, Malay Mandarin speakers say they have to make speech adjustments with Chinese speakers; otherwise, their speech is poorly intelligible. This implies that Malay Mandarin is indeed a separate language.

Yunnan Putonghua is intelligible with Putonghua from other regions (Campbell January 2009).

Cangzhou, spoken in southeastern Hebei, is a separate language. It is only partly intelligible with Putonghua. Renqiu, Huanghua, Hejian, Cangxian, Qingxian, Xianxian, Dongguang, Haixing, Yanshan, Suning, Nanpi, Wuqiao and Mengcun, all spoken in Cangzhou prefecture, are all dialects of Cangzhou. Cangzhou shares some similarities with Tianjin, but it is only partly intelligible with it.

Jinan is a member of the Liaotai Group of the larger Jilu Group, which has 37 lects.

The Baotang Group of Jilu has 52 lects. Tianjin forms its own subgroup within Baotang. Cangzhou, Renqiu, Huanghua, Hejian, Cangxian, Qingxian, Xianxian, Dongguang, Haixing, Yanshan, Suning, Nanpi, Wuqiao and Mengcun are members of the Huangle subgroup of Baotang, which has 25 lects.

Jilu itself consists of 170 lects.

Taiwanese Mandarin, while different from Putonghua, is intelligible with it. Singapore Mandarin has fewer differences than Taiwanese. Both are dialects of Putonghua.

Luoyang, Kaifeng, Changyuan and Zhengzhou, all in Henan Province, are not intelligible with Putonghua. However, all four are mutually intelligible with each other, so they are dialects of a single language, Henan Mandarin.

Xinyang, also spoken in Henan, is a separate language and cannot be understood by Luoyang speakers. Nanyang has high but not complete intelligibility with Luoyang. After a few weeks of close contact, Luoyang speakers can understand Nanyang, but initially, comprehension is poor due to different tones. Nanyang has 15 million speakers.

Luoyang and Gushi are unintelligible. In addition, Gushi is different from Nanyang and may not be intelligible with it. Intelligibility between Xinyang, Gushi and Nanyang is not known. In general, intelligibility between many lects in Henan is not good, but after a week or two of close contact, they can start to understand each other.

In Shaanxi, Yanan, Xian, Huxian (evidence), Zhouzhi (evidence) and Hanzhong are not intelligible with Putonghua. Let us call this language Shaanxi Mandarin. Xi’an, for instance, is about 65% intelligible with other Mandarin groups. Xining, in Qinghai, seems to be very different from other Shaanxi lects, and is probably a separate language altogether (evidence here and here).

In Gansu Province, Tongwei is not intelligible with Putonghua, and Gansu Mandarin seems to be very different from other forms of Mandarin. Gansu Mandarin appears to be a separate language. However, within Gansu, there are divergent lects, such as Sale, which is unintelligible with other Gansu lects.

Bozhou (evidence), Yingshang (evidence) and Fuyang (evidence), spoken in Anhui, are at least unintelligible with Putonghua. Fuyang is very different. The lect spoken 300 km south of Jinan, around Mengcheng in rural Anhui, is said to be completely unintelligible with Putonghua, Tianjin and Beijinghua. For the time being, we will refer to this as one language, Anhui Mandarin. Intelligibility between lects of Anhui Mandarin is not known.

Anhui Mandarin Putonghua has poor intelligibility with Standard Putonghua due to its phonology. Therefore, it is a separate language.

Xian, Huxian and Zhouzhi are members of the Guanzhong Group of Zhongyuan, which has 45 lects.

Yanan, Hanzhong and Xining are members of the Qinlong Group of Zhongyuan, which has 67 lects.

Luoyang is a member of the Luoxu Group of Zhongyuan, which has 28 lects.

Kaifeng, Nanyang, Zhengzhou, Changyuan and Bozhou are members of the Zhengcao Group of Zhongyuan. The Zhengcao Group has 93 lects.

Xinyang and Gushi are in the Xinbeng subgroup of Zhongyuan, which has 20 lects.

Tongwei and Sale are part of the Longzhong Group of Zhongyuan, which has 25 lects.

Yingshang is a member of the Cailu Group of Zhongyuan, which has 30 lects.

The Mandarin spoken in Qinghai is very different from that spoken in Gansu, but it’s not known if it is a separate language. Both are usually classified as types of Zhongyuan Mandarin.

Zhongyuan has a shocking 388 lects. Zhongyuan Mandarin is not fully intelligible with Putonghua. Zhongyuan Mandarin has 130 million speakers (Olson 1998).

Yichang (evidence), Longchang (evidence), Chengdu, Chongqing (evidence), Guilin, Nanping (spoken near Mt. Wuyi; evidence), Longcheng (evidence), Luocheng (evidence), Luzhou (evidence here and here), Lingui (evidence), Jiuzhaigou (evidence), Xindu, Wenshan (evidence), Mianzhu (evidence here and here), Yangshuo (evidence), Wuhan (evidence) and Leshan (evidence) are all unintelligible with Putonghua.

Furthermore, Guilin is not intelligible with general Southwest Mandarin speech either. Wenshan at least is not intelligible with other Southwestern varieties (Johnson 2010).

Chengdu is part of a Sichuan Mandarin koine that is spoken in many of the larger cities in Yunnan. It includes Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang, Deyang and Guiyang and is broadly intelligible (Xun 2009). Ziyang is intelligible with the koine, but has a heavy accent (Xun 2009). Leshan is unintelligible with the koine, but it can be learned in a few weeks of exposure (Xun 2009).

Dali is also not intelligible with Putonghua, but that is because Tibetan Mandarin has heavy Tibetan admixture.

Chongqing speakers cannot understand Chengdu or Luzhou speakers. The many small lects around Mt. Emei are not intelligible with Chengdu, appear to be very different, and may be one or more separate languages.

Wuhan is not intelligible to speakers of Southwest Mandarin from other provinces; for instance, it is only 80% intelligible with Chengdu. The intelligibility of Wuhan and Yichang is not known.

Dahua, spoken in and around Dahua village on the Puduhe River near Dongchuan in Yunnan Province, is apparently a separate language.

Another language spoken in Yunnan, Lanping, is also not intelligible with Putonghua and neither is Kunming (evidence). Kunming is not intelligible with Taoyuan. The language spoken in Kunming is part of the Sichuan Mandarin koine that includes Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang, Deyang and Guiyang.

Chuanlan is a little-known language spoken by the Tunbao people of Guangxi Province.

Yingshan is a separate language based on a 200 word Swadesh test (Ben Hamed 2005).

Menghai (evidence) may well be a completely separate language. The mutual intelligibility of Menghai, Guiyang and Kunming is not known. Guiyang is at least not intelligible with Putonghua. Guiyang is evolving into the Sichuan Mandarin koine, which is broadly intelligible with Kunming, Bazhong, Dazhou, Neijiang, Zigong, Yibin, Luzhou, Chengdu, Mianyang and Deyang.

Shaoshan, apparently Mao Zedong’s lect, spoken in Hunan Province, is a separate language. It was said that although Mao had a secretary who could understand him well, not many others could. Another language spoken in Hunan, in Zhangjiajie County, is called Zhangjiajie Maoxi. The Maoxi are a tribal group there that speak a strange variety of Mandarin. Taoyuan in Hunan is not fully intelligible with other Southwest Mandarin lects, or at least not with Kunming.

Junhua, or military language, is a language spoken by an ethnic group on Hainan in the city of Zonghe. It is said to be “Old Mandarin”, and is probably not intelligible with other lects. It is a form of Southwest Mandarin known as the Junhua Group, which contains 4 lects.

Guilin, Luocheng, Yangshuo, Liuzhou and Lingui are members of the Guiliu Group of Southwest Mandarin, which has 57 lects. Guiliu Southwest Mandarin is at least not comprehensible with Putonghua or Chengyu Southwest Mandarin.

Leshan and Longchang are members of the Guanchi Group of Southwest Mandarin, which has 85 lects. Within Guanchi, Longchang is a member of the Renfu Group, which has 13 lects.

Yichang, Chengdu, Chongqing and Yingshan are members of the Chengyu Group of Southwest Mandarin, which has 113 lects. Chengyu Southwest Mandarin is not comprehensible with Putonghua or Guiliu Southwest Mandarin.

Menghai, Kunming, Wenshan and Guiyang are members of the Kungui Group of Southwest Mandarin. The Kungui Group itself has an incredible 95 lects.

Lanping is in the Dianxi Group of Southwest Mandarin, which has 36 lects. Within Dianxi, it is a member of the Baolu subgroup, which has 21 lects.

Taoyuan is in the Changhe Group of Southwest Mandarin, which has 14 lects.

Wuhan is a member of the Wutian Group of Southwest Mandarin, which has 9 lects.

Dali is a member of the Dianxi Group of Mandarin, which has 36 members. Within Dianxi, Dali is a member of the Yaoli Group, which has 15 members.

Nanping, Chuanlan, Shaoshan, Jiuzhaigou, Zhangjiajie Maoxi and Dahua are unclassified.

Southwest Mandarin itself has a stunning 519 lects and is not fully intelligible with Putonghua. There are 240 million speakers of Southwest Mandarin (Olson 1998).

Jianghuai Mandarin is a separate language. Yangzhou is considered to be a separate language by a 200 word Swadesh test (Ben Hamed 2005). Yangzhou has about 52% intelligibility with the other branches of Mandarin.

Nanjing (evidence here and here) is also a separate language – now mostly spoken in the suburbs, as city speech is not a separate language anymore. The city language is said to be intelligible with the general northeastern China lect spoken in Beijing and Hebei. So I will call Nanjing Suburbs a separate language. Lianyungang is a separate language, as are Yancheng and Huaian (evidence for both).

Nantong, a very strange variety of Mandarin on the border of Wu and Mandarin that shares many features with Wu languages, is a separate language, as is its sister language, Tongdong. Jinsha is a dialect of Nantong. Rugao, next to Nantong, is also a separate language. Also within Jianghuai, Hefei is considered to be a separate language by a 200 word Swadesh list (Ben Hamed 2005).

Rudong is at least not intelligible with Putonghua.

Anqing, in Anhui Province, is also not intelligible with Putonghua. In 1933, there were three different languages spoken in Tongcheng, Anhui – East Tongcheng, West Tongcheng and Tongcheng Wenli. Tongcheng Wenli was the classical-based language spoken by the educated elite of the city. Whether these three languages still exist is not known, but surely some of the speakers in 1933 are still alive.

Chuzhou, spoken in Anhui, is not intelligible with Putonghua, although it is said to be close to Nanjing. Dangtu, also spoken in Anhui, is not intelligible with Putonghua.

Dongtai is a separate language (evidence). The lects spoken in Dafeng, Taizhou, Xinghua and Haian are said to be similar to Dongtai, so for the time being, we will list them as dialects of Dongtai. Jiujiang, spoken in Jiangxi Province, is a separate language, as is Xingzi, located close by.

Intelligibility between Rudong, Anqing, Chuzhou, Dafeng, Taizhou, Xinghua, Haian and Dangtu is not known.

Yangzhou, Lianyungang, Yancheng, Huaian, Nanjing, Hefei, Anqing, the Tongchengs, Chuzhou and Dangtu are in the Hongchao Group of Jianghuai, which has 82 lects.

Dongtai, Dafeng, Taizhou, Haian, Xinghua, Jinsha, Nantong, Tongdong, Rudong and Rugao are in the Tairu Group of Jianghuai. Tairu has 11 different lects.

Jiujiang and probably Xingzi are members of the Huangxiao Group of Jianghuai, which has 20 lects.

Jianghuai is composed of an incredible 120 lects and is not fully intelligible with Putonghua. Some suggest that all of the lects of Jianghuai are mutually unintelligible, but that remains to be proven. Jianghuai Mandarin has 65 million speakers (Olson 1998).

Northeastern (Dongbei) Mandarin is a separate language. Within Northeast, Shenyang is a separate language according to a 200 word Swadesh list (Ben Hamed 2005). Harbin is often listed as intelligible with Putonghua, but some Putonghua speakers can barely understand a word of it. Harbin may be a separate language. That classification is sure to be controversial, so intelligibility testing may be required to sort it out.

Shenyang is a member of the Jishen Group of Northeastern Mandarin, which has 44 dialects. Within Jishen, Shenyang is a member of the Tongxi Group, which has 24 dialects.

Harbin is a member of the Hafu Group of Northeastern Mandarin, which has 64 lects. Within Hafu, it is a member of the Zhaofu Group, which has 18 lects.

Lanyin Mandarin in the far northwest is also a separate language (Campbell 2004). Though Lanyin is said to be intelligible with Putonghua, that does not appear to be the case. Minqin (evidence) and Lanzhou (evidence) in Gansu are not fully intelligible with Putonghua, nor is Yinchuan (evidence) in Ningxia. Intelligibility within Lanyin is not known, but Jiuquan at least appears to be a completely separate language inside Lanyin.

Jiuquan is a member of the Hexi Group of Lanyin, which has 18 lects.

Yinchuan is a member of the Yinwu Group of Lanyin, which has 12 lects.

Lanzhou is a member of the Jincheng Group of Lanyin, which has 4 lects.

Lanyin is composed of 57 separate lects. Lanyin Mandarin has 9 million speakers (Olson 1998).

The Jiaoliao Mandarin spoken in Shandong contains lects such as Qingdao (evidence here and here) and Weihai (evidence) which are not fully intelligible with Putonghua. Dalian is quite different from Putonghua. Intelligibility between Qingdao, Weihai and Dalian is not known.

Weihai and Dalian are members of the Denglian Group of Jiaoliao, which has 23 lects.

Qingdao is a member of the Qingzhou Group of Jiaoliao, which has 16 lects.

Jiaoliao is composed of 45 lects. Jiaoliao is not fully intelligible with Putonghua. Intelligibility inside of Jiaoliao is not known, but there may be multiple languages inside of it, because some Shandong Peninsula lects sound very strange even to speakers used to hearing Shandong Mandarin.

Karamay is an unclassified Mandarin language spoken in Xinjiang. The Mandarin spoken around Tiantai in Zhejiang is not intelligible with Putonghua and may be a separate language. It is also unclassified.

Mandarin has 873 million speakers. There are an incredible 1,526 lects of Mandarin.

Although it is related to Mandarin, Jin is a completely separate language. Besides the Main Jin branch, Hohhot and Baotou are apparently separate languages (evidence), as possibly is Taiyuan (evidence).

Within Hohhot Jin, there are two separate languages. One is Hohhot Xincheng Jin, a combination of Hebei Jin, Northeastern Mandarin and the Manchu language. The other is Jiucheng Hohhot Jin, spoken by the Muslim Hui minority in the city. It is related to other forms of Jin in Shanxi Province.

Yuci is a separate language from Taiyuan on a 200 word Swadesh test (Ben Hamed 2005). Fenyang, the language used in Chinese director Jia Zhangke’s movie Xiao Shan Going Home, is not intelligible with Putonghua. Jingbian, in Shanxi, is a separate language. Yulin is also a separate language.

Hohhot is a member of the Zhanghu Group of Jin, which has 29 lects.

Baotou and Yulin are members of the Dabao Group of Jin, which has 29 lects.

Taiyuan and Yuci are members of the Bingzhou Group of Jin, which has 16 lects.

Fenyang is a member of the Luliang Group of Jin, which has 17 lects.

Jingbian is a member of the Wutai Group of Jin, which has 30 lects.

Jin is composed of 171 lects, and some of them are separate languages. Jin has 48 million speakers (Olson 1998).

Besides Xiang Proper, assuming there even is such a thing, Shuangfeng and Changsha are separate languages, having only 47% intelligibility.

In fact, Changsha itself is divided into multiple languages in the city itself. We do not know how many there are, but we know that they exist. For the moment, we shall just add one lect to Changsha, and divide it into Changsha A and Changsha B, but there may be more. Furthermore, there are significant differences within the Changsha spoken in Changsha City and in the surrounding countryside.

Shuangfeng is also very different within itself, as the vocabulary changes every 10 miles or so. Intelligibility data is lacking.

Mao Zedong spoke Xiangtan, a notoriously difficult Xiang language in Hunan, about which it is said, “No one can understand.” Xiangtan itself is internally diverse, with differences between the dialect of the city and rural areas, but intelligibility data is lacking. Hengyang is apparently a separate language, as is Jishou (evidence). There is significant dialectal diversity in Hengyang, but intelligibility data is lacking.

Liuyang is a separate language, spoken in Liuyang county-level city in Changsha prefecture in Hunan. Liuyang is split into 5 divisions – Liuyang North, Liuyang South, Liuyang West, Liuyang East and Liuyang City.

Liuyang South and Liuyang East are separate languages, mutually unintelligible with the others. Liuyang City has recently arisen as a sort of a Liuyang “Putonghua” that is understandable to speakers of all Liuyang lects. So within Liuyang, we have three dialects – Liuyang City, Liuyang North and Liuyang West. Outside of Liuyang Proper, there are also two separate languages – Liuyang South and Liuyang East. None of the three Liuyang languages is intelligible with Changsha.

Even within this classification, each of the 5 Liuyang lects has multiple dialects. Each village is said to have its own lect in Liuyang.

Hengshan (evidence) is a separate language with vast dialectal divergence divided by Mount Hengshan. There are two Xiang Hengshan lects on either side of the mountain – Qianshan and Houshan – that are very different and must be separate languages. Huayuan (evidence) is at least not intelligible with Putonghua.

In the city of Yiyang, Henan Province, 3 lects are spoken. One is a Yiyang Changyi Xiang lect, another is a Yiyang Luoshao Xiang lect, and a third is Luoyang Southwest Mandarin, a dialect of Henan Mandarin, described above. All appear to be separate languages. We will call the two Xiang lects Yiyang Changyi and Yiyang Luoshao.

Baojing at least is not intelligible with Putonghua, yet it is said to be intelligible with Chengdu Southwest Mandarin.

Lingshuijiang, also spoken in Hunan by 300,000 people, may well be a separate language.

Ningxiang is said to be very different from Changsha. Given the dramatic divergence present even as background in Xiang, this must mean that Ningxiang is at least not intelligible with Changsha.

According to good sources, there is a tremendous amount of lect diversity in Western Hunan, and most of it probably involves Xiang lects, while most or all of these lects are not mutually intelligible. But until we get more data, we cannot carve any languages out of this mess yet.

Shuangfeng and Lingshuijiang are members of the Luoshao Group of Xiang, which has 21 lects.

The Changshas, Hengyang, Xiangtan, Hengshan, Ningxiang and the Liuyangs are members of the Changyi Group of Xiang, which has 32 lects.

Baojing, Jishou and Huayuan are members of the Jixu Group of Xiang, which has 8 lects.

Xiang is composed of 74 lects. Many, or possibly all, of them are separate languages. The various languages of Xiang have 50 million speakers.

Wu is a major group of diverse Chinese languages that is often divided into Northern Wu and Southern Wu. Northern Wu and Southern Wu are definitely mutually unintelligible languages. Southern Wu has 18 million speakers. In general, the list below just lists Wu lects that are utterly unintelligible with Putonghua. My opinion is that in general, the Wu lects are mostly separate languages; however, some are merely dialects of other Wu lects.

A good general rule for Zhejiang lects is that people say they can sort of understand the next city over, but two cities away is incomprehensible. For instance, in the Taizhou prefecture region, there are 4-5 unintelligible dialects across a 12 mile area. In Zhejiang, the mountains go all the way down to the sea, so there are few flat areas where language can spread out and become comprehensible.

Suzhou, Shanghaiese, Wuxi (evidence), Huzhou (evidence), Changzhou (evidence), Xiaoshan (evidence), Songjiang (evidence), Jiaxing, Hangzhou (evidence), Kunshan (evidence), Ningbo and Yixing (evidence) are separate languages.

Tongxiang also appears to be a separate language, as do Yuyao (evidence) and Zhoushan.

Qidong, spoken in the city of Qidong, is a separate language. Lvsi, Qisi or Tongdong, spoken in the nearby town of Qisi, is a separate language from Qidong. Qidong is said to be very close to Chongming, so for the time being, we will list Chongming as a dialect of Qidong.

Haimen also appears to be a dialect of Qidong. However, there are 2 lects spoken in Haimen, and they are apparently not mutually intelligible. We will leave Haimen A as a dialect of Qidong, while we will set Haimen B as a separate language as it is not intelligible with Haimen A.

There are differences between Chongming and Haimen A, but the degree of them is not known. Changyinsha is very similar to Haimen, Chongming and Qidong, so it is probably a dialect of Qidong also. Another name for Qidong is Qihai, which refers to the speech of Qidong, Haimen and Tongzhou. For the time being, we will list Haimen A, Changyinsha and Chongming as dialects of Qidong. Chongming, and hence Qidong, is not intelligible with Shanghaiese.

Zhangjiagang, Changshu and Kunshan may be intelligible with Suzhou, but data is lacking. Suzhou is only 43% intelligible with Wenzhou. None of these lects is intelligible with Shanghaiese.

Ningbo has good intelligibility with Shanghaiese, but not vice versa.

Reports vary on the intelligibility of Shanghaiese and Suzhou. Some say they understand each other well, but that is probably not the case at first due to serious differences in tones. Intelligibility testing is needed.

Pudong, the older form of the Shanghai language, is still spoken in the Pudong District of the city, but it is dying out. There is a question of whether or not it is mutually intelligible with Shanghaiese, but Shanghaiese speakers seem to feel it is not mutually intelligible (Gilliland 2006).

Several lects are spoken in the suburbs of Shanghai. Reports vary, but Shanghai residents generally report that these lects are not mutually intelligible with Shanghaiese (Gilliland 2006). They are Baoshan, Fengxian, Nanhui, Jiading, Jinshan, Pudong (or Chuansha) and Qingpu.

Hangzhou is reportedly much different from the lects of Shanghaiese, Ningbo, etc. to the northeast, and is not intelligible with Shanghaiese, nor with Suzhou. Hangzhou has 1.2 million speakers.

Changzhou and Wuxi are not intelligible with Shanghaiese or Suzhou. Changzhou and Wuxi have high, but not full, intelligibility. Changzhou and Wuxi are part of a dialect chain in which eastern Changzhou speakers can communicate with western Wuxi speakers, but as one moves further west into Wuxi or east into Changzhou, intelligibility drops off. Like Czech and Slovak, it is best then to split Wuxi and Changzhou into separate languages.

Changzhou itself has considerable dialectal divergence, though apparently all dialects are intelligible. Changzhou has 3 million speakers.

Yixing, near Changzhou, is not intelligible with Shanghaiese.

Jiangyin is spoken in Jiangyin city. It is related to Changzhou and has high intelligibility with Changzhou and Wuxi.

All of the above are in the Taihu Group.

Taizhou, centered around the city of Taizhou in Eastern Zhejiang, is composed of 11 separate lects, all of which are separate languages: Huangyan (evidence), Jiaojiang, Linhai, Sanmen, Tiantai (evidence), Wenling (evidence), Ninghai (evidence), Xianju, Leqing (evidence), Yubei and Yuhuan (evidence). (Evidence for all).

A single subgroup of Wuzhou, Yiwu, contains 18 separate languages, all mutually unintelligible. We will call them Yiwu A, Yiwu B, Yiwu C, Yiwu D, Yiwu E, Yiwu F, Yiwu G, Yiwu H, Yiwu I, Yiwu J, Yiwu K, Yiwu L, Yiwu M, Yiwu N, Yiwu O, Yiwu P, Yiwu Q and Yiwu R for the time being.

Pucheng is a separate language. Pucheng has 2 dialects, Nampo and North Dabei. Intelligibility data is not known. Pucheng is so diverse that some say it is a language isolate and is not even a part of Wu (Norman 1988).

There are two groups of Southern Wu which are said to be both highly divergent and to have very low intelligibility internally. These groups are sometimes called Jinqu and Shangli.

Jinqu consists of at least 30 languages: Jinhua, Jinhua Xiaohuang, Tangxi, Lanxi, Pujiang, Yiwus A-R, Dongyang, Pan’an, Yongkang (evidence), Wuyi (evidence), Quzhou (evidence), Longyou and Jinyun. Lanxi has 660,000 speakers (Rickard 2006). Quzhou is apparently not intelligible with Wenzhou. Jinqu is roughly equivalent to the Wuzhou Group.

Shangli contains at least 18 languages: Shangrao City, Shangrao County, Guangfeng, Yushan, Kaihua, Changshan, Jiangshan, Lishui (evidence), Suichang, Songyang, Xuanping, Qingtian (evidence here and here), Yunhe, Jingning, Longquan, Qingyuan, Taishun and Pucheng.

This group is roughly equivalent to the Longqu and Chuzhou Groups of Chuqu. Some members of this group extend beyond Zhejiang and into northeastern Jiangxi and northern Fujian.

We are going to cautiously classify all of these lects as separate languages since they are said to be much more divergent and much less mutually intelligible than Taihu, and Taihu itself seems to have pretty low internal intelligibility.

Wenzhou (evidence) is a separate language. Ouhai, Yongjia and Ruian appear to be dialects of Wenzhou, but all of them are probably separate languages, since if you go 5 miles in any direction in Wenzhou, there’s a new dialect, and it’s hard to understand people. Wenzhou is 43% intelligible with Suzhou. Wencheng (evidence) appears to be a separate language.

Wenxi is a separate language within Oujiang, not intelligible with Wenzhou. It is spoken in one town in Qingtian County.

Jinxiang also has its own Wu lect, with Mandarin influences. This is a Taihu (Northern Wu) outlier.

In addition, in Taishun County, there is also an aberrant Wu lect spoken in the town of Luoyang, influenced by both Manjiang and Oujiang Wu.

There is another Wu lect similar to Manjiang Eastern Min spoken in the town of Hedi in Qingyuan County in Lishui.

Manhua is quite different. There is a controversy over whether or not Manhua is Macro-Min or Macro-Wu. It is probably Macro-Wu based on phonology and it also shares some similar Min-like traits with other Wu lects such as those in the Chuqu group.

Within Manhua, there is a northern group spoken in the town of Yishan and a southern group spoken in the towns of Qianku and Jinxiang. Qianku is the standard for Manhua. The northern/southern divide may impede intelligibility, but we have no information yet.

Wuhu is a separate language, unintelligible with Shanghaihua.

Nanjing Wu is a separate language.

Jiaxing, Shanghaiese, Suzhou, Wuxi, Songjiang, Tongxiang, Qidong, Lvsi, Yunhe and Kunshan are all in the Hujia Group of Taihu. The Hujia Group contains 32 lects.

Changzhou, Yixing, Jiangyin and Haimen are in the Piling Group of Taihu. Piling has 12 lects. Piling has 8 million speakers.

Wenzhou, Ouhai, Yongjia, Ruian and Wencheng are in the Oujiang Group of Taihu, which also contains 12 lects.

Hangzhou has its own group, the Hangzhou Group of Taihu.

Shaoxing, Fuyang, Xiaoshan, Linan, Yuyao and Zhuji are in the Linshao Group of Taihu which also contains 12 lects.

Fenghua and Zhoushan are in the Yongjiang Group of Taihu. The Yongjiang Group contains 11 lects and has 4 million speakers.

Changxing is in the Tiaoxi Group of Taihu, which has 5 lects.

The Taihu Group is composed of 75 separate lects, many or all of which are separate languages. Taihu has 47 million speakers.

Lishui, Qingyuan, Jingning, Jinyun and Taishun are in the Chuzhou group of Chuqu, which contains 9 lects. Chuzhou has 1.5 million speakers. Chuqu itself contains 35 separate lects.

Pucheng, Shangrao County, Shangrao City, Jiangshan, Songyang, Guangfeng, Longquan, Kaihua, Changshan, Suichang, Longyou, Yushan and Quzhou are members of the Longqu Group of Chuqu, which has 14 lects and 5 million speakers (Olson 1998).

The Yiwu languages, Dongyang, Jinhua, Jinhua Xiaohuang, Lanxi, Tangxi, Wuyi, Pan’an, Pujiang and Yongkang are all members of the Wuzhou Group, which contains 27 separate languages. Wuzhou has 4 million speakers.

Nanjing Wu is unclassified.

The various Wu languages have 85 million speakers.

Within Hui, there are at least six separate languages (Hirata 1998). Actually, there are many more.

Xidi, spoken in a village at the foot of Huangshan Mountain, is a separate language. Xidi is unintelligible even to villages a few miles away. Tunxi, Wuyuan and Xiuning are separate languages. The first two are spoken in Anhui, but Xiuning is spoken in Jiangxi Province.

Within the Jingzhan Group of Hui, Jingde, Ningguo, Qimen, Chilingkou (spoken in Chiling, Qimen County), Meixi Xiang and Shitai are separate languages.

Within Qimen County itself, there are 6 different Hui lects, with low intelligibility between them. It is quite possible that we are talking about 6 different languages here. One of them appears to be Chilingkou above. The others we will just call Qimen A, Qimen B, Qimen C, Qimen D and Qimen E. All except Meixi are spoken in Anhui Province. Meixi is spoken in Meixi, Jiangxi.

Jixi, Hongmen and Shexian are separate languages. Within Shexian, there are two different languages that we will only call Shexian A and Shexian B for now. Chunan is spoken in Jiangxi, while Jixi and the Shexian languages are spoken in Anhui.

Dexing and Dongzhi are separate languages, the first spoken in Jiangxi and the second spoken in Anhui.

In the Yanzhou Group of Hui, Jiande and Chunan are separate languages. There are two other lects in the group, Suian and Shouchang. Chunan and Suian are very diverse and in all probability separate languages. Shouchang is also extremely diverse, and Jiande has some differences with Shouchang. These two are probably both separate languages too.

The Yanzhou languages are interesting because there is controversy over whether they are Wu or Hui languages. Careful examination reveals that they cannot be subsumed under Southern Wu due to their great divergence, despite having some similarities with Wu. Some authors feel that they are Hui-Wu merged lects, and their similarity with both is given as a reason for merging Wu and Hui into a supergroup.

While it is best to classify them as Hui, they are much different from most Hui lects. All are spoken in western Zhejiang. The Yanzhou Group has 4 languages. Discussion here.

Huangshan, Tunxi, Wuyuan and Xiuning are members of the Xiuyi Group of Hui, which has 6 lects.

Meixi, the Qimens, Chilingkou, Shitai, Ningguo and Jingde are members of the Jingzhan Group of Hui. Jingzhan has 12 lects, all of which are separate languages.

Jixi, Hongmen and the Shexians are members of the Jishe Group of Hui. The Jishe Group has 6 lects.

Dexing and Dongzhi are members of the Qide Group of Hui. The Qide Group has 5 lects.

Xidi is unclassified.

The various Hui languages have 3.2 million speakers. There are 34 different Hui lects, at least 24 of which are separate languages. There is a possibility that all Hui lects are separate languages, but that remains to be proven.

Cantonese is a major language spoken in the south of China. The Cantonese people are said to be a mix of the Yue people and the Han. They have great pride in their speech, which appears to be closer to ancient Chinese than Mandarin is. When Sun Yat-Sen was President of Republican China, a vote was held on which language to base Standard Chinese on. Cantonese lost to Mandarin by only one vote.

Some Cantonese activists denounce Mandarin as a pidgin language that Manchu and Mongol invaders glommed onto the Chinese of the people they conquered.

Attempts to determine intelligibility through the use of complex lexical, tonal, grammatical and phonological formulae produce results that are excessively high in terms of percentage of intelligibility. A better method is presented in Szeto 2000, in which sentences in other lects are played to speakers of Lect A, and speakers of Lect A are asked to give the basic meaning of the sentences played to them. A sentence is recorded as correct if the basic meaning was ascertained.
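The scoring idea behind this kind of listener test can be sketched in a few lines of Python. This is only an illustration of the general approach described above, not the procedure or data from Szeto 2000: the function name and the sample judgments are hypothetical, and the sketch assumes each sentence has already been marked as understood or not understood.

```python
def intelligibility_percent(judgments):
    """Return the percentage of test sentences whose basic meaning was understood.

    `judgments` is a list of booleans, one per sentence played to the listener:
    True if the listener gave the basic meaning, False otherwise.
    """
    if not judgments:
        raise ValueError("need at least one judged sentence")
    understood = sum(1 for ok in judgments if ok)
    return 100.0 * understood / len(judgments)


# Hypothetical judgments (made up for illustration) for one Lect A listener
# hearing eight sentences recorded in another lect.
sample_judgments = [True, False, False, True, False, False, False, False]
sample_score = intelligibility_percent(sample_judgments)
print(f"{sample_score:.1f}% of sentences understood")
```

In a real study, scores like this would be averaged over many listeners and sentences before any percentage was reported.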

By this better method, Standard Cantonese has only 31.3% intelligibility of Siyi, 7.2% of Hakka, 2.7% of Teochew and 2.5% of Xiamen. This paper also highlights the very important role morphological and syntactic differences play in intelligibility, even apart from phonology and other factors.

In contrast, the more complex method not relying on actual informants gives false positives. By this method, Cantonese has 54.7% intelligibility of Hakka, 47.4% of Xiamen and 43.5% of Teochew. This method falsely overestimates the intelligibility of Hakka by 7.6X, of Teochew by 16.1X and of Xiamen by 19X.
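The overestimation factors are just the formula-based percentage divided by the listener-tested percentage for each lect. A quick Python check using only the figures quoted above (the variable and function names are my own, chosen for illustration):

```python
# Percent intelligibility to Standard Cantonese speakers, as quoted in the text.
listener_test = {"Hakka": 7.2, "Teochew": 2.7, "Xiamen": 2.5}
formula_based = {"Hakka": 54.7, "Teochew": 43.5, "Xiamen": 47.4}


def overestimation_factor(formula_pct, tested_pct):
    """How many times higher the formula-based figure is than the tested one."""
    return formula_pct / tested_pct


# Compute the factor for each lect, rounded to one decimal place.
factors = {
    lect: round(overestimation_factor(formula_based[lect], listener_test[lect]), 1)
    for lect in listener_test
}

for lect, factor in factors.items():
    print(f"{lect}: overestimated by a factor of {factor}")
```

The gap is widest for the lects that are least intelligible in the listener test, since a small denominator inflates the ratio.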

Cantonese is traditionally said to have nine tones, but phonemically, there are only six tones, since the last three are just three of the first six with a voiceless stop consonant on the end. These are often called entering tones in traditional Chinese scholarship.

Entering tones have disappeared from most Mandarin lects, probably about 800 years ago due to the influence of invading Mongols speaking Turkic languages, but are still present in Cantonese, Hakka and Min. The original entering tones of Middle Chinese have merged into one or another of Mandarin’s four tones.

Traditional Chinese tones or contour tones end in a vowel or a nasal. However, in Cantonese, the entering tone has retained its original short and sharp character from Middle Chinese, so in a sense, it has a different sound quality.

Besides Standard Cantonese (the Guangzhou lect in the Yuehai Group), there is Siyi, or Sze Yup, a separate language. Siyi has 8 dialects, however, there are reports that there are intelligibility problems within the Siyi lects. In particular, Enping speakers cannot understand some other dialects. Therefore, Enping is a separate language.

Kaiping, or Chikan, is not fully intelligible with Enping until they get used to each others’ sounds. Kaiping is so different from Taishan that it is hard to imagine how they can communicate well, though there is partial intelligibility.

In Xinhui, there is a dialect called Hetang that is very divergent and has many strange features not found in other dialects. Doubtless it is less than fully intelligible with other Siyi lects.

Actually, there seem to be many more than 8 dialects of Siyi. In Taishan County alone, there are 20 townships, and there may be a different lect in each one. For certain, there are at least three distinct dialects of Taishan: Taishan A, Taishan B and Taishan C. Even the lects in Taishan County can be quite different. However, all lects in Taishan County appear to be intelligible.

Xinhui is somewhat different from Taishan, but appears to be intelligible. Heshan is said to be intelligible with Xinhui and Taishan.

Nevertheless, there are calls from Taishan speakers to split their lect off from the rest of Siyi. If Taishanese is unintelligible with the rest of Siyi, this would make sense, but that does not appear to be the case.

150 years ago, there was less, but still significant, difference between Siyi and Sanyi (Standard Cantonese), but Siyi was disparaged as a “hill dialect” of poor farmers, while Sanyi was elevated as the prestige lect of the cultured and cosmopolitan. This is why Sanyi became the Standard Cantonese lect. The Siyi incorporated this negative view into their self-image even to the point where they held overseas meetings in Sanyi speech.

There are 3.6 million speakers of Siyi.

Vietnamese Cantonese is quite different from Standard Cantonese, but it is nevertheless intelligible. Malay Cantonese is also quite different from Standard Cantonese. Both are dialects of Cantonese.

Hong Kong is a dialect of Guangzhou. Foshan and Nanhai are close to Guangzhou and may be intelligible with it. Nanhai and Shunde are mutually intelligible.

Some say that Shunde and Zhongshan are intelligible with Standard Cantonese, but others disagree. This requires further study, as they are obviously close. However, both are also said to be quite different from Standard Cantonese.

Even within Yuehai, Panyu is said to be a separate language (Chan 1981). Namlong, a poorly understood lect from the Pearl River area, is also a separate language, or at least it was one in 1949. Whether it still exists is not certain, but speakers must still be alive. Yuehai itself has 31 separate lects.

Danija, the Cantonese lect of the Tanka fisherpeople who live on boats off the coast of Guangdong, Guangxi and Hainan, may well be a separate language. In Hong Kong, another Cantonese language, Gashiau, is spoken by a group of fisherpeople related to the Danija. This language is related to Danija but apparently not intelligible with it.

Maihua, a Cantonese lect spoken on Hainan, may well be a separate language also.

Nanning is a dialect of Cantonese, easily understandable by a Standard Cantonese speaker. However, Lizhou is a separate language that is difficult for Standard Cantonese speakers to understand. Dongguan and Zhanjiang (evidence) are separate languages. Shiqi, spoken in Guangdong, is a separate language. Speakers of Standard Cantonese cannot necessarily understand Shiqi, but Shiqi people can understand Guangzhou. Shiqi is spoken in the urban part of Zhongshan City.

Huazhou is a very divergent Cantonese lect that is very hard even for other Cantonese speakers to understand. It is surely a separate language (evidence here and here).

Maoming is an extremely diverse Cantonese lect that must also be a separate language.

Beihai and Hepu are reported to be very different, but intelligibility data is not available, nor is it known to what extent these two lects differ from other Cantonese. But the Qinlian Group of which they are members must surely be a separate language.

One division holds that the Standard Cantonese (Guangzhou), Siyi, Zhongshan, Gaoyang and Guangfu groups are mutually unintelligible.

The Goulou Group of Cantonese appears to be a separate language from all of the rest of Cantonese, and probably belongs in a group of its own, linked with Pinghua and Tuhua. Yulin is a representative lect in Goulou, and is said to be the present form of Chinese that is closest to Old Chinese.

Siyi has at least 11 dialects, including the famous Taishanese (which includes Taishan A, Taishan B and Taishan C), along with Heshan, Jiangmen, Siqian, Doumen, Xinhui, Enping and Kaiping.

Nanning is in the Yongxun Group of Cantonese, which has 12 lects.

Zhanjiang and Maoming are members of the Gaoyang Group of Cantonese, which has 10 lects. Gaoyang has 5.4 million speakers.

Dongguan, Shunde, Foshan, Zhongshan, Nanhai, Panyu and Hong Kong are members of the Guangfu Group of Cantonese, which has 31 lects. Guangfu has 13 million speakers.

Shiqi is a member of the Zhongshan Group of Cantonese, which contains at least 3 lects.

Huazhou is a member of the Wuhua Group of Cantonese, which has 2 lects.

Beihai and Hepu are members of the Qinlian Group of Cantonese, which has 6 lects.

Namlong is unclassified.

There are 100 lects of Cantonese, and Cantonese has 64 million speakers.

Pinghua, now recognized as a major split off from Cantonese, is composed of Guinan and Guibei, which are separate languages. The Guibei lects are very different, but we don’t have any intelligibility data.

Guinan has 22 lects, and Guibei has 8 lects.

There is one Pinghua lect that is unclassified.

Pinghua has 31 separate lects. Pinghua has 2 million speakers.

Tuhua is a separate branch of Chinese spoken in Guangdong and Hunan Provinces. It has 26 separate lects.

In addition to Tuhua Proper, the best known of the Tuhua lects is Shaozhou, referred to here as Shaozhou Proper. Shaozhou is said to be very different from other Chinese lects. Shaozhou itself consists of many different lects which are often strikingly different from the others. Some say that Shaozhou is a branch of Min Nan, while others say it is related to Hakka.

In Lechang prefecture, there are five separate languages, Lechang Tuhua 1, Lechang Tuhua 2, Lechang Tuhua 3, Lechang Tuhua 4 and Lechang Tuhua 5, which are not fully intelligible with each other.

Additionally, many Tuhua lects are starting to splinter recently as influences from Hakka, Cantonese and Southwest Mandarin begin to affect the younger speakers such that the language of the youngest speakers is quite a bit different from the language of the older speakers.

One of the Shaozhou Tuhua lects, Longgui Tuhua, spoken in Qujiang County in Guangdong, is a separate language. Longgui Tuhua has 2,000 speakers.

Actually, Tuhua is not really a language group, but a wastebasket group for various lects derisively referred to as “tuhua” – or “farmer’s language.”

Xianghua, said to be an unclassified Chinese lect, is actually a branch of Tuhua that contains 6 lects of its own. Xianghua is a completely separate and highly diverse language that is spoken in Western Hunan.

Jiahe Tuhua is a completely separate language, unintelligible with other lects. Furthermore, there are huge dialectal differences within Jiahe Tuhua that may or may not constitute separate languages.

Jiangyong Tuhua is divided into two mutually unintelligible languages, North Jiangyong Tuhua and South Jiangyong Tuhua (Liming 2004). It is spoken in the rural areas of Jiangyong County in Hunan Province. There are multiple lects within these two languages, which have considerable distance between them.

A subdialect of North Jiangyong Tuhua, the suburban or “upper street language” dialect, was the basis for the famous nüshu, or “women’s script”, a secret language of women originating from the Shangjiangxu (Xiao River) region of northeastern Jiangyong County in Hunan Province, about which much has been written lately.

Also in Hunan, in Guiyang County, another Tuhua language is spoken – Guiyang Tuhua. This is apparently a separate language, and the northern and southern variants are so divergent that they are separate languages also – Northern Guiyang Tuhua and Southern Guiyang Tuhua. In addition, there are a lot of diverse dialects within the two Guiyang Tuhua languages, but intelligibility data is lacking.

Yantang Tuhua, one of these dialects, may well be a separate language, as may Yangshi Tuhua. Jiangyong and Guiyang are in the Tuhua branch of Tuhua. Yantang and Yangshi are unclassified.

Furthermore, initial examination suggests a number of things.

First of all, the Tuhua lects, especially those of Southern Hunan, are very diverse, possibly as diverse as Wu, Xiang and Hui. Many or all of them may well be separate languages. Further, they are poorly studied and dialectally very diverse. There are many dialects inside the known Tuhua lects, and these dialects are often very different. So there appear to be languages inside even the known Tuhua lects.

Further, there appear to be links among the Tuhua lects of Southern Hunan, the Tuhua lects of northern Guangdong and the Ping lects of northern Guangxi, which border each other. They all appear to be related, and to have descended from a common ancestor.

Danzhou is a separate language. Danzhou is spoken in the northwest of Hainan, and Hainanese speakers cannot understand it. It is related to the language spoken by the Lingao, or is the same language. Yet the Danzhou people speak 9 different lects, including lects described as Hakka, others described as Cantonese and others described as Mandarin.

Maojiahua is a form of Chinese spoken by 20,000 Hmong in the southwest of Hunan, in the northeast of Guangxi and in some areas of Hubei. It is a separate language already recognized by Ethnologue, which incorrectly lumps it in with the Hmong languages.

Linghua is an unclassified Chinese lect spoken in Yongzhou in Hunan. Linghua is a separate language. It is apparently the same as the Yongzhou Tuhua dialect.

However, the Yongzhou Tuhua language has 17 different dialects: Yongzhou Tuhua A, Yongzhou Tuhua B, Yongzhou Tuhua C, Yongzhou Tuhua D, Lanshan Tushi Tuhua, Lanjiaoshan Tuhua, Xintian Southern Rural Tuhua, Xintian Northern Rural Tuhua, Ningyuan Zhangjia Tuhua, Ningyuan Pinghua, Lanshan Shangdong Tuhua, Lanshan Taiping Tuhua, Daoxian Xianglinpu Tuhua, Daoxian Xiaojia Tuhua, Shuangpai Lijiaping Tuhua and Jianghua Baimangying Tuhua.

Of these, Lanshan Tushi Tuhua may well be a separate language.

Intelligibility between lects is not known, but dialectal divergence within Tuhua lects is typically great, and some or all of the above may be separate languages.

Pingde Yahua or Kim Mun, incorrectly classed as an unclassified Chinese lect, is actually one of the Mien languages. It is not a Sinitic language.

Wutun, or Wutunhua, is a Chinese-Mongolian-Tibetan mixed language spoken by 2,000 Tu in Qinghai Province. Whether it is a form of Chinese is controversial. Until it is proven to be Sinitic, we will not list it here.

References

Ben Hamed, Mahé. 2005. Neighbour-nets Portray the Chinese Dialect Continuum and the Linguistic Legacy of China’s Demic History. Proc. R. Soc. B 272:1015–1022.

Bodman, Nicholas C. 1988. Two Divergent Southern Min Dialects of the Sanxiang District, Zhongshan, Guangdong. BIHP 59 (2): 401-423.

Branner, David. 2000. Problems in Comparative Chinese Dialectology. The Classification of Miin and Hakka. Berlin: Walter de Gruyter.

Branner, David. 2008. Personal communication.

Campbell, Hilary. 2004. Chinese Grammar – Synchronic and Diachronic Perspectives. Oxford, UK: Oxford University Press.

Campbell, James Michael. Putonghua and Taiwanese Min Nan speaker. Taipei, Taiwan. January 2009. Personal communication.

Campbell, James Michael. Putonghua and Taiwanese Min Nan speaker. Taipei, Taiwan. April 2009. Personal communication.

曹志耘 (Cao, Zhiyun). 2002. 南部吴语语音研究 (Southern Wu Phonology Research). Beijing: Commercial Press (In Chinese).

Chan, Marjorie K.M., Lee, Douglas W. 1981. Chinatown Chinese: A Linguistic And Historical Re-evaluation. Amerasia Journal, Volume 8, Number 1.

Cheng, Chin-Chuan. 1997. Measuring Relationship Among Dialects: DOC and Related Resources. Computational Linguistics & Chinese Language Processing 2.1:41-72.

Cheng, Chin-Chuan. 1998. Extra-Linguistic Data for Understanding Dialect Mutual Intelligibility. Taipei, Taiwan: Paper delivered at the 1998 Annual Conference of the Pacific Neighborhood Consortium.

Gilliland, Joshua. 2006. Language Attitudes And Ideologies In Shanghai, China. MA Thesis. Columbus, OH: Ohio State University.

Hirata, Shoji. 1998. Aspect: A General System and its Manifestation in Mandarin Chinese. Taipei: Student Book Company.

Johnson, Eric. 2010. SIL Electronic Survey Reports 2010-027. A Sociolinguistic Introduction to the Central Taic languages of Wenshan Prefecture, China. Dallas, Texas: SIL.

Lee, Kent A. 2002. Chinese Tone Sandhi And Prosody. MA Thesis. Urbana, IL: University of Illinois at Urbana-Champaign.

Lien, Chinfa. August 17-19, 1998. Denasalization, Vocalic Nasalization and Related Issues in Southern Min: A Dialectal and Comparative Perspective. International Symposium on Linguistic Change and the Chinese Dialects Dedicated to the Memory of the Late Professor Li Fang-kuei in Seattle Washington.

Liming, Zhao. The Women’s Script of Jiangyong: An Invention of Chinese, Chapter 4. In Tao, Jie, Zheng, Bijun, Mow, Shirley L., editors. 2004. Holding Up Half the Sky: Chinese Women Past, Present, and Future. New York: Feminist Press at the City University of New York.

Mair, Victor H. 1991. What Is a Chinese ‘Dialect/Topolect’? Sino-Platonic Papers 29.

Mair, Victor H. Professor of Chinese Language and Literature, University of Pennsylvania. Philadelphia, PA. 2009. Personal communication.

Mair, Victor H. Professor of Chinese Language and Literature, University of Pennsylvania, Philadelphia, PA. July 2009. Personal communication.

McKeown, Adam. 2001. Chinese Migrant Networks and Cultural Change: Peru, Chicago, Hawaii, 1900-1936. Chicago, IL: University of Chicago Press.

Ngù, George. Eastern Min speaker. 2009. Personal communication.

Olson, James Stuart. 1998. An Ethnohistorical Dictionary of China. Westport, CT: Greenwood Publishing Group.

Rickard, Kristine. 2006. A Linguistic-phonetic Description of Lanqi Citation Tones. Proceedings of the 11th Australian International Conference on Speech Science & Technology, pp. 349-353. Edited by Paul Warren & Catherine I. Watson. University of Auckland, New Zealand. December 6-8, 2006. Auckland, NZ: Australian Speech Science & Technology Association Inc.

Szeto, Cecilia. 2000. Testing Intelligibility Among Sinitic Dialects. Proceedings of ALS2K, the 2000 Conference of the Australian Linguistic Society.

Thurgood, Graham. 2006. Sociolinguistics and Contact-induced Language Change: Hainan Cham, Anong, and Phan Rang Cham. Tenth International Conference on Austronesian Linguistics, 17-20 January 2006, Palawan, Philippines. Linguistic Society of the Philippines and SIL International.

Xun, Gong. Sichuan Mandarin and Putonghua speaker. Deyang, Sichuan, China. Personal communication. September 2009.

Zheng, Rongbin. 2008. The Zhongxian Min Dialect: A Preliminary Study of Language Contact and Stratum-Formation, pp. 517-526. Edited by Chan, Marjorie K.M. and Kang, Hana. Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20). Volume 1. Columbus, Ohio: The Ohio State University.

