
Awful English Dialects: Lancashire Dialect

Good God that is a horrible dialect! It’s spoken in Lancashire, in the north of England. It is in northwestern England, pretty far up, heading up towards Scotland. I just looked at a map and I am not sure if I recognize any cities – maybe Lancaster and Blackpool. It seems like it would be pretty cold up there. There seems to be some Norse influence on the dialect, as with a lot of the dialects in the north of England. If you head south, you start running into an expanding Scouse dialect from Liverpool, which is still very popular among young people.

I think I should be happy for Lancashire because Scouse is one of the worst dialects in all of England. I remember an interview with a boxer from Liverpool. It went on for 10 minutes, and I think I understood 25% of it. It was just awful. Americans who go to live there often never really catch on to the dialect. I remember an American on the Net said that they lived in Liverpool for some time, and after eight years, they still could not understand the young working class women, who had the worst dialects of them all.

Really, Manchester dialect is about the same as Lancashire dialect. You can listen to a recording of the hard dialect on Wikipedia. I swear I only got ~80% of it, and that’s not enough for a good conversation, believe me. I would have to see a transcript of the audio to see how many words I missed because a lot of it was just jumbles and I couldn’t even figure out how many words were in there, where one started and the other ended, much less what the words were. I am listening to John Robb, a musician and critic, who was born in Lancashire, and at times, he is maddeningly hard to understand. He is 56 years old, near my age.

Here is an audio of the dialect from a comedian called Johnny Vegas. I have never heard of this actor, and I don’t think I want to listen to his shtick if he talks in that damned dialect.

I listened to it again, and this time I got more of it. I calculated roughly 87% intelligibility, about the same as Swedish and Norwegian. I am sorry, but that is not enough for me. Also anything below 90% qualifies as a foreign language. Vegas is 46, so he is in the older generation. It looks like the pretty hard dialect is being spoken by people 40 and over. The dialect is supposedly dying out, and all I have to say is it can’t happen soon enough!


Filed under Balto-Slavic-Germanic, Britain, Dialectology, English language, Europe, Germanic, Indo-European, Indo-Hittite, Language Families, Linguistics, Regional, Sociolinguistics

Update: A Reworking of Chinese Language Classification

If you want to know where I have been the past few days, I have been working on this piece. I work on it for hours every day. So far, I have put in over 500 hours on this piece. That’s over three months of full-time work. My haters say I don’t work. The Hell I don’t work! I’d like to see them try to do this sort of work. This piece has been ridiculed by some linguist idiots on the Net. I worked a bit with linguists outside the Web. In fact, one of the top Sinologists (they have a Wikipedia entry) has been mentoring me on this project for some time now. I will not reveal this person’s name.

The number of languages has increased vastly from 365 to 526. Actually there are probably more than that. I have reason to believe that there may be 1,000-2,000 separate Chinese languages using the 90% intelligibility barrier (<90% = separate language, >90% = dialect). Just to contrast, Wikipedia says there are 14 Chinese languages (a grotesque underestimate) and the Chinese government insanely lies that there is only one Chinese language. Despite their superior IQ’s, a huge number of Chinese people fall for this idiot lie that is obviously based on political BS and not science. The Net is littered with otherwise intelligent Chinese people arguing strenuously that there is only one Chinese language. Just goes to show that you can have a high IQ and still be an idiot if your thought processes are too biased, which is the basic problem with all human thinking anyway.

Just to toot my own horn a bit and in response to my detractors, this is the most elaborate and extensive overview of the Chinese languages written in English in terms of pure classification that I have ever seen. There may well be works of this caliber or even beyond that are written in Chinese. In fact, one of the problems with this work is that so much of the original research is in Chinese. My Chinese is not very good, so it’s hard for me to read that stuff. This is not a finished product at all. This work will undergo revisions for quite some time if I keep working on it. It may not be done when I die. It’s a Herculean project.

I had to paste this in from a Word document, which is why the formatting looks so strange. But the post has received a huge update, in particular the Hokkien, Teochew, Wu and Cantonese sections. I am up to 526 languages. I would like any speakers of any Chinese language to look this over for me and add any corrections, explanations, elaborations, etc. I am especially interested in any mutual intelligibility data you might have as I have a bit of a mutual intelligibility fetish.

Warning: Very long. Runs to 87 pages in a Word document.

A Reworking of Chinese Language Classification

by Robert Lindsay

The Chinese languages have undergone a lot of reclassification lately (Mair 1991), from one Chinese language a couple of decades ago up to 14 Chinese languages today according to the latest Ethnologue.

However, Jerry Norman, one of the world’s top experts on Chinese, has stated that based on mutual intelligibility, there are 350-400 separate languages within Chinese (Mair 1991). According to Gong Xun, a Sichuan Mandarin speaker in Deyang, China, by my criteria of distinguishing between language and dialect, there would be 300-400 separate languages in Fujian alone.

So far, 2,500 dialects of the Chinese language have been identified, and a number of them are separate languages.

Based on the criteria of mutual intelligibility, I have expanded the 14 Chinese languages into 526 separate languages.

There are different ways of calculating mutual intelligibility. Mutual intelligibility is hard to determine. I am not interested in typological studies of varieties involving either lexicon, phonology or tones, unless this can be quantified in terms of mutual intelligibility in a scientific way (Cheng 1991). For the most part, what I am interested in is, “Can they understand each other?”

I decided to put the cutoff at 90%, with >90% being dialect and <90% being a separate language. This is based on what appears to be Ethnologue’s criteria for establishing the line between a dialect and a language.
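The 90% rule can be written out as a short sketch. The function name `classify_pair` and the treatment of a score of exactly 90% are assumptions made for illustration; the rule as stated only covers scores above and below the cutoff.

```python
THRESHOLD = 90.0  # percent; the cutoff chosen in the text


def classify_pair(intelligibility_pct: float) -> str:
    """Classify two varieties from their mutual intelligibility score."""
    if not 0.0 <= intelligibility_pct <= 100.0:
        raise ValueError("intelligibility must be a percentage from 0 to 100")
    if intelligibility_pct > THRESHOLD:
        return "dialect"
    if intelligibility_pct < THRESHOLD:
        return "separate language"
    # Exactly 90% is not covered by the stated rule; flagging it as
    # "borderline" is an assumption of this sketch.
    return "borderline"


# Figures of the kind quoted in this piece (65% and 80%)
print(classify_pair(65))  # separate language
print(classify_pair(80))  # separate language
```

In practice the hard part is not applying the cutoff but getting a trustworthy intelligibility score to feed into it.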

In the cases below where I had mutual intelligibility data available, a number of Chinese languages had no more than 65% intelligibility between them (Cheng 1991).

The best way to see this study is as a pilot study. The purpose of the classification below is more to stimulate academic interest and sprout new thinking and theory. It is not intended to be an end-all or be-all statement on the subject; in fact, it is quite the opposite. Pilot studies, which is what this is, are de facto never accurate and precise.

Reasonable, fair-minded, and professional comments, additions, criticisms, elaborations, presentations of evidence, etc. are highly encouraged.

I assume this paper will be controversial. Keep in mind that this work is extremely tentative and should not be taken as the last word on the subject by a long shot.

Interested scholars, observers or speakers of Chinese languages are encouraged to contribute any knowledge that they may have to add to, confirm or criticize this data below. So far as I know, this is the first real attempt to split Chinese beyond the 14 languages elucidated by Ethnologue.

There are many problems with the data below. In many cases, “separate language” just means that the variety is not intelligible with Putonghua. Unfortunately, I currently lack excellent mutual intelligibility data within the major language groups such as Gan, Xiang, Wu, and the branches of Mandarin. There is probably quite a bit of lumping still to be done below. Where varieties are mutually intelligible below, I have tried to lump them into one language with various dialects.
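One possible way to picture the lumping step is sketched here. It merges any two varieties whose score is above 90% and reports the resulting groups. The variety names A to D and their scores are placeholders, not data from this study, and the merge-by-pairs approach (a small union-find) is an assumption of the sketch rather than the procedure actually followed.

```python
from collections import defaultdict


def lump(varieties, scores, threshold=90.0):
    """Group varieties into languages; pairs scoring above threshold are merged."""
    parent = {v: v for v in varieties}

    def find(v):
        # Follow parent links to the group's representative (with path halving)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (a, b), pct in scores.items():
        if pct > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for v in varieties:
        groups[find(v)].append(v)
    return sorted(sorted(g) for g in groups.values())


# Placeholder varieties and scores, for illustration only
varieties = ["A", "B", "C", "D"]
scores = {("A", "B"): 95, ("B", "C"): 92, ("A", "C"): 91, ("C", "D"): 70}
languages = lump(varieties, scores)
print(languages)  # [['A', 'B', 'C'], ['D']]
```

Note that merging pair by pair would also lump together the two ends of a dialect chain, even if the ends cannot understand each other, so a real classification needs a judgment call that this sketch does not make.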

In many cases, we seem to be dealing with dialect chains. This is particularly the case with the Mandarin languages, incorrectly referred to as the Mandarin dialects.

For instance, in Henan each major city can understand the next city over fairly well, but at the second or third city over, you run into serious comprehension difficulties. But even there, the languages are fairly close, with intelligibility at ~70%, and after three weeks of close contact, they can communicate fairly well. In many cases, it is a matter of working out the tone changes, for tone changes are very common even among the Mandarin lects.
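A toy model of such a chain is sketched below. It assumes intelligibility falls by a fixed 10 points per city along the chain, which is purely an illustrative assumption chosen so that the third city over lands near the ~70% mentioned above; real chains will not be this regular.

```python
def chain_intelligibility(steps_apart: int, loss_per_step: float = 10.0) -> float:
    """Toy model: intelligibility drops linearly with each city along the chain."""
    if steps_apart < 0:
        raise ValueError("steps_apart must not be negative")
    return max(0.0, 100.0 - loss_per_step * steps_apart)


# Hypothetical chain of four cities; the names are placeholders
cities = ["city 0", "city 1", "city 2", "city 3"]
for i in range(1, len(cities)):
    print(f"{cities[0]} -> {cities[i]}: {chain_intelligibility(i):.0f}%")
```

Under these assumptions the neighbor is understood fairly well, while the city three steps away is already down at 70%.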


Putonghua is Standard Mandarin, based on the Beijing Mandarin dialect as of 1949, but it has since diverged wildly, and many Putonghua speakers today cannot understand Beijing Mandarin. Putonghua is being promoted as the national language of China.

In addition to Putonghua, there are 1,500 other dialects of Mandarin spoken in China. In general, other Mandarin dialects are not intelligible to Putonghua speakers (Campbell 2009). However, the Northeastern Mandarin dialects and the dialects around Beijing are more intelligible with Putonghua than the Mandarin dialects in the rest of the country.

The implication is that there may be over 1,500 Mandarin languages in China. However, many of these Mandarin dialects are intelligible with at least some other Mandarin lects. Hence, despite the lack of intelligibility with Putonghua, there is a lot of potential lumping within Mandarin.

The degree to which Mandarin dialects are intelligible to each other is very much an open question and in general is poorly investigated.

We should also note here that even Putonghua, the language that was meant to tie the nation together, seems to be evolving into regional languages.

Guangdong Putonghua is not fully intelligible to speakers of the Putonghuas of Northern China and hence is probably a separate language.

Shanghai Putonghua is often not intelligible with Putonghua from other regions. It has heavy interference from Shanghaihua, which seriously affects the Putonghua accent. Even after four years of exposure, Standard Putonghua speakers often have problems with it.

Anhui Putonghua has poor intelligibility with Standard Putonghua due to its phonology. Therefore, it is a separate language.

In addition, Jianghuai Putonghua and Zhengcao Putonghua are not intelligible with Putonghua from other areas (Campbell 2009). These varieties of Mandarin cause a particular interference with Putonghua Mandarin that results in a severe dialectal disturbance in their Putonghua.

These Putonghuas are spoken in the regions native to the Jianghuai and Zhengcao branches of Mandarin. Jianghuai Mandarin is spoken in Anhui, Jiangsu, Hubei and to a much lesser extent Zhejiang Provinces. Zhengcao Mandarin is spoken in Anhui, Henan, Shandong, and Jiangsu, with one dialect spoken in Hebei.

Tibetan Mandarin has heavy Tibetan admixture.

There are also varieties of Putonghua that are spoken in Singapore and Taiwan. Claims that Taiwan Mandarin is fully intelligible with Putonghua are incorrect. Taiwanese Mandarin is about 80-85% intelligible with Putonghua. Based on that intelligibility figure, Taiwanese Mandarin is a separate language.

Singapore Mandarin has fewer differences with Putonghua than Taiwanese Mandarin and hence is a dialect of Putonghua.

Malay Mandarin is said to be quite different but nevertheless mutually intelligible with Putonghua. Nevertheless, Malay Mandarin speakers say they have to make speech adjustments with Chinese speakers, otherwise their speech is poorly intelligible. This implies that Malay Mandarin is indeed a separate language.

Yunnan Putonghua is intelligible with Putonghua from other regions (Campbell 2009).

Mandarin has 873 million speakers. There are an incredible 1,526 varieties of Mandarin.

Beijing Jilu Mandarin has low intelligibility with other branches of Mandarin: 72% intelligible with Southwest Mandarin, 64% intelligible with Zhongyuan Mandarin and 55% intelligible with Jiaoliao Mandarin (Cheng 1997).

Putonghua was based on Beijing Dialect. However, many Putonghua speakers claim that Beijinghua is not inherently intelligible with Putonghua. Complaints about unintelligible taxi drivers in Beijing are legendary. At the very least, competing views of the intelligibility of Beijinghua and Putonghua deserve investigation.

On the other hand, Beijinghua is intelligible with Hebei Mandarin and Nanjing City Mandarin, yet Putonghua is not intelligible with Hebei.

The Beijinger variety of Beijing’s hutongs and taxi drivers is legendary for being hard to understand.

The truth is that Putonghua was never entirely based on Beijinghua. It was in terms of pronunciation but not for vocabulary. Putonghua got only 35% of its vocabulary from Beijinghua. Most of its vocabulary came from Japanese Kanji words. They used a form of Mandarin that was based on Chinese scholars who went to study in Japan at the end of the Qing Era. So Putonghua, like Standard Italian which is based on Florentine Italian of Dante circa 1400, is in a sense frozen in time.

The two lects may also have taken separate trajectories. This has also occurred in Italian, where, though Standard Italian was based on Florentine Tuscan, Standard Italian and Tuscan Italian have taken separate trajectories since. If you see old Tuscan men on TV in Italy, a speaker of Standard Italian from Southern Italy would need subtitles to understand them, but one from Northern Italy would not.

Others say that Putonghua was based on the language of the Beijing suburbs, not the city itself.

For whatever reason, Beijinghua often seems to have less than 90% intelligibility with Putonghua, though the question needs further research. Beijinghua, in its pure and least mutually intelligible form, seems to be spoken mostly in the innermost hutongs and among taxi drivers and other low-income and working class people. The variety spoken by people with more education and money is probably a lot more comprehensible.

I would describe the real, pure, Putonghua as “CCTV speech”, the variety you hear on Chinese state television. Evidence that Beijinghua lacks full intelligibility with Putonghua is here, here, here, here, here, here, here and here.

The question of whether or not Beijinghua is a separate language from Putonghua is sure to be highly controversial. Perhaps intelligibility testing could settle the question.

Jinan (New Jinan) Jilu Mandarin is not intelligible with Putonghua.

Cangzhou Jilu Mandarin, spoken in southeastern Hebei, is a separate language. It is only partly intelligible with Putonghua. Renqiu Jilu Mandarin, Huanghua, Hejian Jilu Mandarin, Cangxian Jilu Mandarin, Qingxian Jilu Mandarin, Xianxian Jilu Mandarin, Dongguang Jilu Mandarin, Haixing Jilu Mandarin, Yanshan Jilu Mandarin, Suning Jilu Mandarin, Nanpi Jilu Mandarin, Wuqiao Jilu Mandarin, and Mengcun Jilu Mandarin, all spoken in Cangzhou Prefecture, are all dialects of Cangzhou Jilu Mandarin.

Cangzhou Jilu Mandarin shares some similarities with Tianjin Jilu Mandarin and Baoding Jilu Mandarin, but it is probably not fully intelligible with either.

Tianjin Mandarin’s tones are quite different from Putonghua’s, its tone sandhi is much more complicated, and it is more closely related to varieties 150-500 miles away, since originally Tianjin Mandarin speakers came from Anhui (Lee 2002). Nevertheless, Tianjin Mandarin is a dialect of Beijing Mandarin.

Baoding Jilu Mandarin appears to be a separate language because there are people from the city who cannot speak it at all.

Beijing is in a group called the Beijing Group of Jilu Mandarin. It contains 43 separate varieties and may contain more than one language.

Jinan is a member of the Liaotai Group of Jilu Mandarin, which has 37 lects.

The Baoding Group of Jilu Mandarin has 52 lects.

Cangzhou, Renqiu, Huanghua, Hejian, Cangxian, Qingxian, Xianxian, Dongguang, Haixing, Yanshan, Suning, Nanpi, Wuqiao, and Mengcun are members of the Huangle subgroup of Baotang, which has 25 lects.

Tianjin forms its own subgroup within Baotang.

Jilu Mandarin itself consists of 154 lects.

Northeastern (Dongbei) Mandarin is generally intelligible with Putonghua.

Shenyang Northeastern Mandarin is the main dialect in this group, and it is intelligible with Harbin Northeastern Mandarin, Liaoning Northeastern Mandarin, Changchun Northeastern Mandarin, and Heilongjiang Northeastern Mandarin. Harbin Northeastern Mandarin is also intelligible with Tianjin Jilu Mandarin and Beijing Jilu Mandarin. Nanjing City Northeastern Mandarin, Hebei Northeastern Mandarin, and much of the rest of NE Mandarin are all mutually intelligible.

Shenyang is a member of the Jishen Group of Northeastern Mandarin, which has 44 lects.

Within Jishen, Shenyang is a member of the Tongxi Group, which has 24 lects.

Harbin is a member of the Hafu Group of Northeastern Mandarin, which has 64 lects.

Within Hafu, Harbin Mandarin is a member of the Zhaofu Group, which has 18 lects.

Dongbei Mandarin has 108 lects.

Zhongyuan Mandarin is a large split in Mandarin. It is not fully intelligible with Putonghua.

Nanjing Zhongyuan Mandarin (evidence) is also a separate language – now mostly spoken in the suburbs, as city speech is not a separate language anymore. The city language is intelligible with the general Northeastern China Mandarin spoken in Beijing and Hebei.

So we shall call Nanjing Suburbs Zhongyuan Mandarin a separate language.

Luoyang Zhongyuan Mandarin, Kaifeng Zhongyuan Mandarin, Changyuan Zhongyuan Mandarin, and Zhengzhou Zhongyuan Mandarin, all in Henan Province, are not intelligible with Putonghua. However, all four are mutually intelligible, so they are dialects of a single language, Henan Zhongyuan Mandarin.

Xinyang Zhongyuan Mandarin, also spoken in Henan, is a separate language and cannot be understood by Luoyang Zhongyuan Mandarin speakers.

Nanyang Zhongyuan Mandarin has high but not complete intelligibility with Luoyang Zhongyuan Mandarin. Intelligibility between Nanyang Zhongyuan Mandarin and Luoyang Zhongyuan Mandarin is probably ~70%. Nanyang Zhongyuan Mandarin has 15 million speakers.

Gushi Zhongyuan Mandarin is not intelligible with Putonghua. In addition, Gushi Zhongyuan Mandarin is different from Nanyang Zhongyuan Mandarin and is probably not intelligible with it.

Intelligibility between Xinyang Zhongyuan Mandarin and Gushi Zhongyuan Mandarin is not known.

In general, intelligibility between many varieties in Henan is not full, but after a few weeks or so of close contact, they can start to understand each other. Mutual intelligibility between Xinyang Zhongyuan Mandarin, Gushi Zhongyuan Mandarin, and Nanyang Zhongyuan Mandarin may be ~70%.

In Shaanxi, Yanan Zhongyuan Mandarin, Xian Zhongyuan Mandarin, Huxian Zhongyuan Mandarin, Zhouzhi Zhongyuan Mandarin, and Hanzhong Zhongyuan Mandarin are not intelligible with Putonghua, but they may well be intelligible with each other. Xi’an Zhongyuan Mandarin, for instance, is about 65% intelligible with other Mandarin groups. It is closest to Jinan Jilu Mandarin, with which it has 75% intelligibility (Cheng 1997). Let us call this language Shaanxi Zhongyuan Mandarin.

Xining Zhongyuan Mandarin, spoken in Xinghai, seems to be very different from other Shaanxi Zhongyuan Mandarin varieties and is probably a separate language altogether.

In Gansu Province, Gansu Zhongyuan Mandarin appears to be a separate language. Tongwei Zhongyuan Mandarin appears to be a dialect of Gansu Zhongyuan Mandarin.

However, within Gansu Zhongyuan Mandarin, there are divergent lects, such as Sale Zhongyuan Mandarin, which are unintelligible with other Gansu Mandarin lects.

Bozhou Zhongyuan Mandarin (evidence), Yingshang Zhongyuan Mandarin (evidence), and Fuyang Zhongyuan Mandarin (evidence), spoken in Anhui, are at least unintelligible with Putonghua. Fuyang Zhongyuan Mandarin is very different. The unnamed variety spoken 300 km. south of Jinan around Mengcheng in rural Anhui is said to be completely unintelligible with Putonghua, Tianjin Jilu Mandarin, and Beijinghua. For the time being, we will refer to this as one language, Anhui Zhongyuan Mandarin. Intelligibility between varieties of Anhui Zhongyuan Mandarin is not known.

The Mandarin spoken in Qinghai, Qinghai Zhongyuan Mandarin, is very different from that spoken in Gansu.

Xian, Huxian, and Zhouzhi are members of the Guanzhong Group of Zhongyuan Mandarin, which has 45 lects.

Yanan, Hanzhong, and Xining are members of the Qinlong Group of Zhongyuan Mandarin, which has 67 lects.

Luoyang is a member of the Luoxu Group of Zhongyuan Mandarin, which has 28 lects.

Kaifeng, Nanyang, Zhengzhou, Changyuan, and Bozhou are members of the Zhengcao Group of Zhongyuan Mandarin, which has 93 lects.

Xinyang and Gushi are in the Xinbeng subgroup of Zhongyuan Mandarin, which has 20 lects.

Tongwei and Sale are part of the Longzhong Group of Zhongyuan Mandarin, which has 25 lects.

Yingshang is a member of the Cailu Group of Zhongyuan Mandarin, which has 30 lects.

Zhongyuan Mandarin has a shocking 338 lects.

Zhongyuan Mandarin has 130 million speakers (Olson 1998).

Southwestern Mandarin is a huge and diverse group within Mandarin. It contains a multitude of varieties and is not fully intelligible with Putonghua.

Yichang Southwestern Mandarin, Nanping Southwestern Mandarin (spoken near Mt. Wuyi; evidence), Longcheng Southwestern Mandarin (evidence), Luocheng Southwestern Mandarin (evidence), Lingui Southwestern Mandarin (evidence), Jiuzhaigou Southwestern Mandarin (evidence), Xindu Southwestern Mandarin, Wenshan Southwestern Mandarin (evidence), Mianzhu Southwestern Mandarin (evidence here and here), and Yangshuo Southwestern Mandarin are all unintelligible with Putonghua.

Guilin Southwestern Mandarin is not intelligible with general Southwestern Mandarin speech either.

Wenshan at least is not intelligible with other Southwestern varieties (Johnson 2010).

Guiliu Southwestern Mandarin is at least not comprehensible with Putonghua or Chengdu Southwestern Mandarin.

Chengyu Southwestern Mandarin is not comprehensible with Putonghua or Guiliu Southwestern Mandarin.

Chengdu Southwestern Mandarin is part of a broadly intelligible Sichuan Southwestern Mandarin koine that is spoken in many of the larger cities in Yunnan.

It includes Ziyang Southwestern Mandarin, Kunming Southwestern Mandarin, Bazhong Southwestern Mandarin, Baojing Southwestern Mandarin, Dazhou Southwestern Mandarin, Neijiang Southwestern Mandarin, Yibin Southwestern Mandarin, Luzhou Southwestern Mandarin, Mianyang Southwestern Mandarin, Deyang Southwestern Mandarin, and Guiyang Southwestern Mandarin (Xun 2009).

Speakers of Chengdu Southwestern Mandarin say that Zigong Southwestern Mandarin and Meishan Southwestern Mandarin are not intelligible to them. Chengduhua is still very widely spoken in Chengdu by people of all ages.

Ziyang Southwestern Mandarin is intelligible with the koine but has a heavy accent.

Leshan Southwestern Mandarin is a separate language. It is unintelligible with the koine, but it can be learned in a few weeks of exposure (Xun 2009).

Intelligibility between Leshan Southwestern Mandarin and Sichuan Southwestern Mandarin may be ~70%.

Hankou Southwestern Mandarin is a separate language, with 80% intelligibility between it and Chengdu Southwestern Mandarin (Cheng 1997).

Chongqing Southwestern Mandarin is a separate language. Chongqing Southwestern Mandarin speakers cannot understand Chengdu or Luzhou speakers.

The many small Southwestern Mandarin varieties around Mt. Emei are not intelligible with Sichuan Southwestern Mandarin, appear to be very different and may be one or more separate languages.

Wuhan Southwestern Mandarin is not intelligible to speakers of Southwestern Mandarin from other provinces; for instance, it is only 80% intelligible with Chengdu Southwestern Mandarin. Once you go an hour in any direction from Wuhan, Wuhan Southwestern Mandarin is no longer intelligible.

Dali Southwestern Mandarin is spoken in the city of Dali near Kunming. The variety is still widely spoken.

Dahua Southwestern Mandarin, spoken in and around Dahua village on the Puduhe River near Dongchuan in Yunnan Province, is apparently a separate language.

Another language spoken in Yunnan, Lanping Southwestern Mandarin, is also not intelligible with Putonghua.

Chuanlan Southwestern Mandarin is a little-known language spoken by the Tunbao people of Guangxi Province.

Yingshan Southwestern Mandarin is a separate language based on a 200 word Swadesh test (Ben Hamed 2005).

Menghai Southwestern Mandarin (evidence) may well be a completely separate language.

Shaoshan Southwestern Mandarin, spoken in Hunan Province, is a separate language.

Another language spoken in Hunan in Zhangjiajie County is called Zhangjiajie Maoxi Southwestern Mandarin. The Maoxi are a tribal group there that speak a strange variety of Southwestern Mandarin.

Taoyuan Southwestern Mandarin in Hunan is not fully intelligible with other Southwest Mandarin lects, or at least not with Sichuan Southwestern Mandarin.

Gaoping Southwestern Mandarin and Baixi Southwestern Mandarin in Hunan speak mutually intelligible varieties, even though Gaoping is in Longhui County and Baixi is in Xinhua County. Although they are very far from each other, the two towns can communicate with each other in their own varieties without problems. This is because an extended family left Gaoping 150 years ago and moved to Baixi, marrying the two languages. It would be best to call this language Gaoping Southwestern Mandarin.

Xinfeng Southwestern Mandarin is traditionally categorized as Southwestern Mandarin. It is a Southwestern Mandarin dialect island spoken in Ganzhou City in Xinfeng County, Jiangxi, surrounded by Gannan Hakka lects. Over time, it has seen so much Hakka influence that it may now be characterized as a mixed dialect. Given the massive Hakka influence, Xinfeng Southwestern Mandarin is no doubt a separate language.

Gong’an Southwestern Mandarin is a very unusual Southwestern Mandarin variety spoken in Gong’an City in Hubei. Hunan is to the south. It is nearly a mixed language, having features of both Southwestern Mandarin and Xiang. As such, no doubt it is a separate language.

Guilin, Luocheng, Yangshuo, Liuzhou, and Lingui are members of the Guiliu Group of Southwestern Mandarin, which has 57 lects.

Leshan and Longchang are members of the Guanchi Group of Southwestern Mandarin, which has 85 lects.

Within Guanchi, Longchang is a member of the Renfu Group, which has 13 lects.

Yichang, Chengdu, Chongqing, and Yingshan are members of the Chengyu Group of Southwestern Mandarin, which has 113 lects.

Menghai, Kunming, Wenshan, and Guiyang are members of the Kungui Group of Southwestern Mandarin. The Kungui Group itself has an incredible 95 lects.

Lanping is in the Dianxi Group of Southwestern Mandarin, which has 36 lects.

Within Dianxi, it is a member of the Baolu subgroup, which has 21 lects.

Taoyuan is a member of the Changhe Group of Southwestern Mandarin, which has 14 lects.

Wuhan is a member of the Wutian Group of Southwestern Mandarin, which has nine lects.

Dali is a member of the Dianxi Group of Southwestern Mandarin, which has 36 members.

Within Dianxi, Dali is a member of the Yaoli Group, which has 15 members.

Nanping, Chuanlan, Shaoshan, Jiuzhaigou, Zhangjiajie Maoxi, and Dahua are unclassified.

Southwestern Mandarin itself has a stunning 519 lects. There are 240 million speakers of Southwestern Mandarin (Olson 1998).

Jianghuai Mandarin is a separate branch of Mandarin that is very different from the rest of Mandarin. It is a separate language and is not fully intelligible with Putonghua. Some say that this is not even part of Mandarin, as it is better seen as in between Mandarin and Wu.

Jianghuai Mandarin, especially the variety spoken around Taizhou, is not intelligible at all with Anhui Zhongyuan Mandarin or Sichuan Southwestern Mandarin. Jianghuai Mandarin speakers cannot even tell that the Anhui Zhongyuan Mandarin or Sichuan Southwestern Mandarin speakers are speaking Mandarin because the language is so foreign.

Yangzhou Jianghuai Mandarin is considered to be a separate language by a 200 word Swadesh test (Ben Hamed 2005). Yangzhou Jianghuai Mandarin has about 52% intelligibility with the other branches of Mandarin (Cheng 1997). Phonetically, it resembles Wu.

Lianyungang Jianghuai Mandarin is a separate language, as are Yancheng Jianghuai Mandarin and Huaian Jianghuai Mandarin.

Nantong Jianghuai Mandarin, a very strange variety of Mandarin on the border of Wu and Mandarin that shares many features with Wu languages, is a separate language.

Nantong’s sister language, Tongdong Jianghuai Mandarin, is also a separate language. Jinsha Jianghuai Mandarin is a dialect of Nantong Jianghuai Mandarin.

Rugao Jianghuai Mandarin, next to Nantong, is also a separate language.

Hefei Jianghuai Mandarin is considered to be a separate language by a 200 word Swadesh list (Ben Hamed 2005). It is not understood outside of the city.

In 1933, there were three different languages spoken in Tongcheng, Anhui – Tongcheng Wenli Jianghuai Mandarin, East Tongcheng Jianghuai Mandarin, and West Tongcheng Jianghuai Mandarin. Tongcheng Wenli Mandarin was the classical-based language spoken by the educated elite of the city. Whether these three languages still exist is not known, but surely some of the speakers in 1933 are still alive.

Chuzhou Jianghuai Mandarin, spoken in Anhui, is not intelligible with Putonghua, although it is said to be close to Nanjing Jianghuai Mandarin.

Dangtu Jianghuai Mandarin, also spoken in Anhui, is not intelligible with Putonghua.

Dongtai Jianghuai Mandarin is a separate language (evidence). Dafeng Jianghuai Mandarin, Taizhou Jianghuai Mandarin, Xinghua Jianghuai Mandarin and Haian Jianghuai Mandarin are said to be similar to Dongtai Jianghuai Mandarin, so for the time being, we will list them as dialects of Dongtai Jianghuai Mandarin.

Rudong Jianghuai Mandarin is at least not intelligible with Putonghua.

Jiujiang Jianghuai Mandarin, spoken in Jiangxi Province, is a separate language, as is Xingzi, located close by.

Intelligibility between Rudong Jianghuai Mandarin, Dafeng Jianghuai Mandarin, Taizhou Jianghuai Mandarin, Xinghua Jianghuai Mandarin, Haian Jianghuai Mandarin and Dongtai Jianghuai Mandarin is not known; however, they may be closely related.

Jianghuai Mandarin is composed of an incredible 120 varieties. It has 65 million speakers (Olson 1998).

Yangzhou, Lianyungang, Yancheng, Huaian, Nanjing, Hefei, Anqing, the Tongchengs, and Chuzhou and Dangtu are in the Hongchao Group of Jianghuai Mandarin, which has 82 lects.

Dongtai, Dafeng, Taizhou, Haian, Xinghua, Jinsha, Nantong, Tongdong, Rudong, and Rugao are in the Tairu Group of Jianghuai Mandarin. Tairu has 11 different lects.

Jiujiang and Xingzi are members of the Huangxiao Group of Jianghuai Mandarin, which has 20 lects.

Lanyin Mandarin in the far northwest is also a separate language (Campbell 2004). Though Lanyin Mandarin is said to be intelligible with Putonghua, that does not appear to be the case. Minqin Lanyin Mandarin (evidence) and Lanzhou Lanyin Mandarin (evidence) in Gansu are not fully intelligible with Putonghua, nor is Yinchuan Lanyin Mandarin (evidence) in Ningxia.

Intelligibility within Lanyin Mandarin is not known, but Jiuquan Lanyin Mandarin at least appears to be a completely separate language inside Lanyin Mandarin.

Jiuquan is a member of the Hexi Group of Lanyin Mandarin, which has 18 lects.

Yinchuan is a member of the Yinwu Group of Lanyin Mandarin, which has 12 lects.

Lanzhou is a member of the Jincheng Group of Lanyin Mandarin, which has four lects.

Lanyin Mandarin is composed of 57 separate lects. It has 9 million speakers (Olson 1998).

The Jiaoliao Mandarin spoken in Shandong as Shandong Jiaoliao Mandarin contains varieties such as Qingdao Jiaoliao Mandarin and Weihai Jiaoliao Mandarin which are not fully intelligible with Putonghua. Yantai Jiaoliao Mandarin is a dialect of Weihai Jiaoliao Mandarin. Qingdao Jiaoliao Mandarin, Weihai Jiaoliao Mandarin, Yantai Jiaoliao Mandarin and Yangzheng Jiaoliao Mandarin are all mutually intelligible. Dalian Jiaoliao Mandarin is quite different from Putonghua.

Weihai, Dalian and 21 other varieties are members of the Denglian Group of Jiaoliao Mandarin, which has 23 lects.

Jiaoliao Mandarin is composed of 45 lects. Jiaoliao is not fully intelligible with Putonghua. Intelligibility inside of Jiaoliao Mandarin is not known, but there may be multiple languages inside of it because some Shandong Peninsula varieties sound very strange even to speakers used to hearing Shandong Jiaoliao Mandarin.

Wutun, or Wutunhua, is an unclassified language, a Mandarin-Mongolian-Tibetan creole mixed language spoken by 2,000 Tu or Monguor people in Eastern Qinghai Province. The Monguors speak Bonan, a Mongolic language with heavy Tibetan and Mandarin influence. Although the government regards them as Monguor Mongolians, the group self-identifies as Tibetan.

The source of the Mandarin is not known, but it is thought that the group came from outside the region, either Jilu Mandarin speakers from Tianjin in the northeast or from a group of Southwest Mandarin-speaking Hui Muslims in Sichuan Province who converted to Lamaist Buddhism for unknown reasons. They have been in their present location since at least 1585.

This is best seen as a Mandarin language that came under heavy influence of Bonan and to a lesser extent Tibetan, after which it was changed into an agglutinative language under the influence of these two other languages. The lexicon is 60% Mandarin with the tones lost, 25% Tibetan and 10% Bonan.

Karamay is an unclassified Mandarin language spoken in Xinjiang.

The Mandarin spoken around Tiantai in Zhejiang is not intelligible with Putonghua and may be a separate language. It is also unclassified.


Although it is related to Mandarin, Jin is a completely separate language, with only 57% intelligibility with other forms of Mandarin (Cheng 1997). The differences between Jin and Mandarin are somewhat greater than the differences within Mandarin itself.

Besides the Main Jin branch, Baotou Jin is apparently a separate language, as is possibly Taiyuan Jin (evidence).

Within Hohhot Jin, there are two separate languages.

One is Hohhot Xincheng Jin, a combination of Hebei Jin, Northeastern Mandarin and the Manchu language.

The other is Jiucheng Hohhot Jin, spoken by the Muslim Hui minority in the city. It is related to other forms of Jin in Shanxi Province.

Yuci Jin is a separate language from Taiyuan on a 200 word Swadesh test (Ben Hamed 2005).

Fenyang Jin, the language used in Chinese director Jia Zhangke’s movie Xiao Shan Going Home, is not intelligible with Putonghua.

Jingbian Jin, in Shaanxi, is a separate language.

Yulin Jin is also a separate language.

Hohhot is a member of the Zhanghu Group of Jin, which has 29 lects.

Baotou and Yulin are members of the Dabao Group of Jin, which has 29 lects.

Taiyuan and Yuci are members of the Bingzhou Group of Jin, which has 16 lects.

Fenyang is a member of the Luliang Group of Jin, which has 17 lects.

Jingbian is a member of the Wutai Group of Jin, which has 30 lects.

Jin is composed of 171 lects, and some of them are separate languages. Jin has 48 million speakers (Olson 1998).


Gan is a macrolanguage spoken mostly in Jiangxi Province. The mountainous and rugged terrain of Jiangxi means that Gan is very diverse, with many mutually unintelligible varieties within it. Whether Gan is as diverse as Xiang or Hui is not known.

Outside of Gan Proper, Leping Gan is very different. It is not at all intelligible with Nanchang Gan, and hence is a separate language.

Nanchang Gan and Anyi Gan are apparently separate languages within Gan based on a 200 word Swadesh test (Ben Hamed 2005). Nanchang Gan has a great deal of dialectal diversity, with several dialects covering different cities and the rural areas. Intelligibility between these dialects is not known. Nanchang Gan is still spoken very heavily in Nanchang.

Boyang Gan is spoken in another part of Jiangxi and is apparently a separate language from Nanchang Gan.

The nine major dialectal splits in Gan are apparently not mutually intelligible. As such, they must surely be separate languages, so Yichun Gan, Ji’an Gan, Fuzhou Gan, Yingtan Gan, Leiyang Gan, Huaining Gan, Daye Gan, Wanzai Gan, and Dongkou Gan are all separate languages. There is diversity even among these groups. For instance, Ji’an is divided into Nanxiang Ji’an in the south and Baixiang Ji’an in the north. The two are not intelligible with each other.

In the Yingyi Group, Chaling Dongxian Gan in Hunan near the Jiangxi border is a variety with mixed Gan and Xiang features. The best analysis is that this is a Gan variety. Due to the heavy Xiang mixture, it is no doubt a separate Gan language.

Linchuan Gan, spoken in East-Central Jiangxi, is a very interesting Gan that differs from all others. This seems to be the remains of the old language that was brought into Jiangxi by the ancestors of the Hakka, and it indicates a possible close relationship between Gan and Hakka.

Gao’an Gan, Ducheng Gan, Yongxiu Gan, and Nancheng Gan are quite different from the rest of Gan, so they may well be separate languages.

Hukou Gan, Wuning Gan and Fengxin Gan are major splits in Northern Gan and are all probably separate languages.

Hancheng Gan is a major split in Southern Gan and as such is probably a separate language.

Nanchang and Anyi are in the Changdu Group of Gan, which has 15 different lects.

Yingtan is a member of the Yingyi Group, which has 12 lects.

Jiangyu and Huarong are members of the Datong Group of Gan, which has 13 lects.

Yichun is a member of the Yiliu Group of Gan, which has 11 lects.

Wanzai is a member of the Yiping Group of Gan, of which it is the only member.

Leiyang is a member of the Leizi Group of Gan, which has five lects.

Wanan is a member of the Jilian Group of Gan, of which it is the only member.

Ji’an is a member of the Jicha Group of Gan, which has 15 lects.

Huaining is a member of the Huaiyue Group of Gan, which has nine lects.

Fuzhou is a member of the Fuguang Group of Gan, which has 15 lects.

Dongkou is a member of the Dongsui Group of Gan, which has five lects.

Gan has 97 separate varieties in it. There are 30 million speakers of the Gan languages (Olson 1998).


Northern, Central and Eastern Min

Northern Min or Min Bei

Within the Min group, Northern Min (Min Bei), a macrolanguage, has already been identified as a separate language. There are 50 million speakers of all of the Min languages (Olson 1998). Northern Min has only 0-20% intelligibility with Min Nan.

Northern Min or Min Bei is said to be a single language. It has nine separate lects, including Shibei Northern Min in Pucheng County; Chong’an Northern Min, Wufu Northern Min, and Xingtian Northern Min in Wuyishan City; Zhenghe Northern Min and Zhenqian Northern Min in Zhenghe County; Jianyang Northern Min in Jianyang County, and Jian’ou Northern Min in Jian’ou County.

The dialects are said to be mutually intelligible, but Jianyang and Jian’ou have only about 75% intelligibility. Northern Min has 10 million speakers.

Central Min or Min Zhong

Central Min or Min Zhong is a separate language not intelligible with Northern or Eastern Min. It has three lects, Shaxian Central Min, Sanming Central Min, and Yongan Central Min, but we don’t know if there are languages among them. The tones of the three varieties are quite different. Further, there are many dialects in the interior of Sanming Prefecture, so there may be more than one language there. Central Min has 3.5 million speakers.

Eastern Min or Min Dong

The standard dialect of Min Dong, Eastern Min, Fukchiu or Fooshuw, is Fuzhou Eastern Min.

Eastern Min has only 0-20% intelligibility with Min Nan.

Within Eastern Min, Chengguan Eastern Min, Yangzhong Eastern Min, and Zhongxian Eastern Min are separate languages, all spoken in Youxi County. Zhongxian Eastern Min is spoken in the south of the county, Chengguan is spoken in the middle of the county, and Yangzhong is spoken in the north of the county. The three varieties have markedly poor intelligibility between them (Zheng 2008).

Beyond that, Eastern Min is reported to have several other mutually unintelligible languages inside of it. One of them is Fuqing Eastern Min. Fuzhou speakers can understand Fuqing speakers better than the other way around. Fuzhou and Fuqing are about 65% intelligible in praxis, and it is about the same with the rest of the Houguan Group (Ngù 2009).

Ningde Eastern Min, Fuding Eastern Min and Nanping Eastern Min are other languages in this family (evidence). There are many dialects in the Eastern Min-speaking areas of Nanping, and there may be more than one language here. Of these three, Ningde Eastern Min is definitely a separate language. According to George Ngù, a passionate proponent of Fuzhou Eastern Min, “Fuzhou is not intelligible even within its many varieties.”

It’s not clear if that applies to all of Eastern Min, but it appears that it does. Therefore, Changle Eastern Min, Gutian Eastern Min, Lianjiang Eastern Min, Luoyuan Eastern Min, Minhou Eastern Min, Minqing Eastern Min, Pingnan Eastern Min, Pingtan Eastern Min, Yongtai Eastern Min, Fu’an Eastern Min, Shouning Eastern Min, Xiapu Eastern Min, Zherong Eastern Min, and Zhouning Eastern Min are all separate languages.

Tong’an Eastern Min should probably also be included.

Matsu Eastern Min is spoken on Matsu Island off the coast of China. It is similar to but probably not intelligible with Changle Eastern Min. Matsu may well be a separate language like all the rest of Houguan.

There are two other varieties lumped in with Eastern Min – Man, Mango or Taishun Manjiang Eastern Min is spoken in the central part of Taishun County in Southern Zhejiang in the far southern end of the Wu-speaking area, and Manhua spoken in the eastern part of Cangnan County. Both of these names mean “barbarian speech.”

Both are probably mixtures of Southern Wu (Wenzhou etc.), Eastern Min, Northern Min, and maybe even pre-Sinitic languages. Manhua and Manjiang are not intelligible with Fuzhou Eastern Min. However, Manjiang has affinity with Shouning Eastern Min in phonology, vocabulary, and grammar. Whether or not it is intelligible with Shouning Eastern Min is not known.

Min Nan speakers who have looked at Manjiang data say that it doesn’t even look like a Sinitic language. It is best seen as an Eastern Min language with a very strong substratum of a Tai-Kadai or Austroasiatic language.

Manhua is best dealt with as a form of Wu. I discuss it further below under Wu.

Malaysian Eastern Min is spoken in Sibu, Sarawak and in Singapore. Its speakers were originally Fuqing and Fuzhou speakers who came in the 1800’s, and it is spoken in two lects based on those two cities. Malaysian Fuqing Eastern Min and Malaysian Fuzhou Eastern Min only have 12% intelligibility, much less than the 65% of the parent languages in China. The two Malaysian lects are obviously not the same language, but intelligibility of the two lects with the parent languages in China is not known.

Fuding, Fu’an, Shouning, Xiapu, Zherong, and Zhouning are in the Funing Group of Eastern Min, which has six lects.

Fuzhou, Fuqing, Chengguan, Yangzhong, Zhongxian, Ningde, Changle, Gutian, Lianjiang, Luoyuan, Minhou, Minqing, Pingnan, Pingtan, Yongtai, Matsu, Tong’an, and Nanping are in the Houguan Group of Eastern Min, which has 18 lects.

Taishun Manjiang is in an Eastern Min division of its own.

Eastern Min contains 24 separate lects, all of which are separate languages.

Southern Min or Min Nan


Within Min Nan or Southern Min, a macrolanguage, there are a number of separate languages. There is a proposal to split Xiamen, Qiongwen and Teochew into three separate languages before SIL. In fact, all three of those are macrolanguages also.

Amoy, Xiamen or Taiwanese Hokkien, Zhangzhou Hokkien, and Quanzhou Hokkien are part of a larger Southern Min group called Hokkien.

Amoy Hokkien and Taiwanese Hokkien are the same language, as Taiwanese is an Amoy dialect. A good name for the entire language of Amoy-Taiwanese Hokkien is Xiamen Hokkien.

Amoy, the variety spoken in Amoy city in China, is identical to certain Taiwanese dialects. It is more or less intelligible with Taiwanese, as the differences between the two are minor, akin to British and American English. There have only been 120 years of separation between Amoy and Taiwanese. Most of the differences are in modern and local vocabulary.

Amoy and Quanzhou Hokkien are no longer intelligible with each other due to lack of a standard and the dialectal variations in each. Also, Amoy has developed more modern meanings for certain words, while Quanzhou retains more of the older meanings for the same terms.

Amoy, like Taiwanese, is a mixture of Quanzhou and Zhangzhou Hokkien.

Jinmen or Kinmen Hokkien is a dialect of Amoy spoken on Jinmen Island only two miles off the coast of Amoy. It has good intelligibility with Taiwanese.

A better name for Xiamen according to the Chinese literature is Quanzhang Hokkien (Campbell 2009). This would actually be a macrolanguage. Quanzhang is a combination of Quanzhou and Zhangzhou, two of the most important varieties in the language. Xiamen has only 51% intelligibility with Teochew.

Xiamen is still widely spoken in Taiwan as Taiwanese Hokkien. However, it is in trouble as fewer young people speak it anymore. 20 years ago in Đàoviên, Taiwan, it was common to hear young women in their late teens and twenties speaking Hokkien, but now it is uncommon (Kirinputra 2014).

Within Taiwanese Hokkien, the situation regarding Taipei Hokkien in the past was interesting. The dialects of the city were a mix of Zhangzhou and Quanzhou.

The dialect of the center of the city, Taipei City Hokkien, was mixed between the two, with a slight Quanzhou lean to it.

The dialect spoken in Sulim, Sulim (Shilin) Hokkien, heavily favored Zhangzhou. Other districts spoke a Tong’an-type dialect, which is just Quanzhou mixed with Amoy.

All these conditions are more common with the older generation. The young generation either speaks the mixed Zhangzhou-leaning “Southern” style of Taiwanese Hokkien favored in the media or does not speak any Hokkien at all.

The Yilan Hokkien dialect on Taiwan is so different that it alone has posed serious problems for the task of standardizing Taiwanese, yet it is intelligible with Standard Taiwanese Hokkien. Yilan is a city in Taiwan.

Lugang Hokkien is also very different but is intelligible with Standard Taiwanese (Campbell 2009).

Elsewhere on Taiwan, there are some communication problems for Tainan Hokkien speakers hearing Taipei, but it appears that they are still intelligible with each other (Campbell 2009). Tainan is a city in Taiwan. A similar dialect is spoken in Gaoxiong as Gaoxiong Hokkien. Tainan and Gaoxiong are the prestige dialects of Taiwanese Hokkien that Standard Taiwanese is based on.

Taichung Hokkien is another dialect of Taiwanese spoken in the city of that name.

Tong’an Hokkien is said to be a dialect of Amoy, but the truth is that it is in between Amoy and Quanzhou. Tong’an Hokkien is spoken in the city of that name. A Tong’an variety is also spoken in Malaysia and Indonesia.

There are dialects within Quanzhou, including Anxi Hokkien, Shishi Hokkien, Yongding Hokkien, Dehua Hokkien, Hui’an Hokkien, Jinjiang Hokkien, Nan’an Hokkien, and Hong Kong Tanka Hokkien.

All Quanzhou dialects are apparently mutually intelligible.

There is a group of Hokkien speakers among the Tanka fisherpeople located to the north of the Four Counties area. They speak a language that resembles Anxi Hokkien. We will call this Hong Kong Tanka Hokkien for now. They communicate well with speakers from the Hokkien homeland, so it looks like their language has not changed much. Most of them arrived in Hong Kong in the 1930’s and 1940’s.

There are differences within Zhangzhou Hokkien.

Longhai Hokkien, Haikang Hokkien, Zhangpu Hokkien, Zhao’an Hokkien, Yunxiao Hokkien, Dongshan Hokkien and Yinchuan Hokkien, are all dialects of Zhangzhou Hokkien, spoken in the vicinity of the city.

Longhai Hokkien is very similar to the standard variety, while Zhangpu Hokkien is somewhat different.

Zhao’an Hokkien, Yunxiao Hokkien, and Dongshan Hokkien are all spoken in Southern Zhangzhou. They have been strongly affected by Teochew such that there is controversy over whether they are Teochew or Hokkien. Yunxiao and Dongshan have changed n → ng and t → k as in Teochew. Zhao’an resembles Teochew more than the others, as it has an ir vowel. Intelligibility data for these diverse Zhangzhou varieties is not available.

With the possible exception of the three varieties mentioned above, all Zhangzhou varieties are mutually intelligible.

Zhangzhou and Quanzhou are not fully intelligible with each other in China. Taiwanese speakers can no longer understand the pure Quanzhou spoken in the Chinese city of that name, and some Quanzhou speakers say they cannot understand Taiwanese either. Nevertheless, Taiwanese has 80% intelligibility of Quanzhou and Zhangzhou. After all, Taiwanese itself is just a mixture between Zhangzhou and Quanzhou.

Zhangzhou and Quanzhou have marginal intelligibility with Teochew.

Zhangping Hokkien, though close to Xiamen, is a separate language according to a 200 word Swadesh test (Ben Hamed 2005).

Pinghe Hokkien is said to be a separate language.

Diaspora, Nusantaran or Overseas Hokkien, that is, all Hokkien spoken outside of the area of China extending a few hundred miles up and down the coast in either direction from Amoy, could be seen as being composed of two main groups. It is a language in trouble as young people everywhere in the diaspora switch to Mandarin, and many children are not learning Hokkien. Technically, Taiwanese is included in Overseas Hokkien, but since it is merely a dialect of Amoy, we put it under Amoy instead.

50 years ago, we could learn interesting things about Overseas Hokkien forms spoken in Jakarta, Yangon, Bandung, Phuket, Trang, Cebu, and possibly Palembang and Surabaya. Now Hokkien may be extinct in Jakarta, Yangon, Palembang and Surabaya and is in trouble in Phuket, Bandung and Cebu (Kirinputra 2014).

The first group, called Eastern Hokkien, is in the north and encompasses Taiwan (Kirinputra 2014).

The second group, which we shall call Malayland Hokkien for lack of a better term, is spoken in Malaysia and in Indonesia in Sumatra and Kalimantan. Malayland is heavily laced with Teochew.

However, the Hokkien spoken in the Philippines is classed as Malayland Hokkien because it is intelligible with Southern Malayland Hokkien even though it is in the east.

Malayland is split into two languages, Northern Malayland Hokkien and Southern Malayland Hokkien. The first language, Northern Malayland Hokkien, was formerly spoken in Northern Malaysia from Taiping along the coast all the way to Phuket, Thailand, but is now spoken for the most part only to Penang and over to Terangganu in Malaysia and in Medan and other places in Northern Sumatra in Indonesia.

The language is also referred to as Penang Hokkien or Medan Hokkien, after the very similar dialects spoken in those cities. Terangganu Hokkien is different. On Penang Island, two dialects are spoken, Baba Hokkien, which is heavily creolized, and Sin Khek Hokkien, a more pure variety. There are also differences between Penang Island Hokkien and Butterworth Hokkien spoken in Butterworth just across the strait.

Hokkien is still very widely spoken in Penang, and it is possible to go through your entire day speaking nothing but Hokkien.

Northern Malayland is still spoken up into Thailand towards Phuket and in the Burmese Panhandle all the way to Rangoon. In Myanmar, the speakers are mostly elderly, and the language is dying out. Burmese Hokkien looks very much like Penang because many speakers came from Penang to Rangoon. Northern Malayland is still spoken in Surat Thani on the east side of the peninsula in Thailand by a few older speakers. On the Phuket side of the peninsula facing the Indian Ocean, it has been decimated.

All varieties of Northern Malayland are apparently mutually intelligible.

Speakers of Northern Malayland have a hard time understanding the Southern Malayland spoken in Klang and Malacca. Southern Malayland speakers in general say they cannot understand Penang.

Northern Malayland Hokkien is more of a Zhangzhou variety in terms of its accent. It is also heavily creolized, with a lot of Malay and Thai embedded deeply in the language. The differences between the two Malayland Hokkien languages are as great as between Hokkien and Teochew. Intelligibility between the two may be as low as 50%.

In Kuala Lumpur and Selangor, Southern and Northern Malayland mix, and it is difficult to say which language is being spoken here. However, the variety spoken in Selangor, Selangor Hokkien, is best described as Southern Malayland, as they cannot understand Penang well. Hokkien is still very widely spoken in Selangor.

The second language, Southern Malayland Hokkien, encompasses Southern Malaysia from Johor up to Kelantan, where it is spoken in the cities of Selangor, Kelang, Malacca, Muar, Tangkak, Segamat, Batu Pahat, Pontian, Singapore, Riau, the Riau Islands, and Johor Bahru. Kelang Hokkien and Johor Hokkien are recognized as specific dialects, and Hokkien is still very widely spoken in both cities.

It is also widely spoken in Singapore and Brunei. In Indonesia, it is spoken in the state of Riau as Riau Hokkien, which is very close to Singapore Hokkien, and the city of Bagansiapiapi on Sumatra. It is also spoken in Bangkok, Thailand and in Saigon, Vietnam, where it is dying out (Kirinputra 2014).

Southern Malayland is less creolized than Northern Malayland, if it is creolized at all. Southern Malayland is more of a Xiamen Hokkien variety, while Northern is a type of Zhangzhou.

Kelantan, Kelantanese or Kelantan Peranakan Hokkien is spoken in the Malay state of Kelantan. It is wildly creolized with Malay and is probably not intelligible with any other form of Hokkien.

The variety of Hokkien spoken in Kuching, Sarawak, Kuching Hokkien, is also very different and is said to resemble Kelantan Hokkien. Nevertheless the Hokkien dialect situation in Kelantan is poorly understood, and there are said to be two different types of Hokkien spoken in this area, Kelantan Hokkien A and Kelantan Hokkien B (Kirinputra 2014). Kelantanese is still widely spoken.

The version of Southern Malayland Hokkien spoken in Singapore is called Singapore Hokkien and is based on Amoy, and possibly even more on Jinmen, but speakers also came from Tong’an, Zhangzhou, Quanzhou, Anxi, and Hui’an. It is similar to Taiwanese, but Singaporean speakers can no longer understand Taiwanese well, though they have partial understanding of it. For instance, they have only 30-40% intelligibility with Yilan Taiwanese Hokkien.

Southern Malayland lies between Northern Malayland and Taiwanese Hokkien on the continuum.

A Singapore speaker, if immersed in Taiwan, could pick up Taiwanese fairly quickly, within three months.

Singapore has been isolated from Taiwanese for quite some time, so it has retained older features that are losing ground in mainland Hokkien varieties. Word-final unvoiced stops p, t and k are starting to be lost in Zhangzhou on the mainland and replaced with a glottal stop, whereas in Singapore, they are still preserved.

Many Malay, Cantonese and Teochew words have gone into Singapore, which hinders understanding with Taiwanese speakers. Mutual intelligibility between Singapore and Hokkien is ~55%. Similarly, Singapore is no longer intelligible with Amoy.

Singapore speakers, even the older ones, now mix a lot of Mandarin, English and Malay in with their speech. They have been isolated from the main Hokkien-speaking communities in Amoy and Taiwan for so long that they have lost many of the subtler aspects of the language spoken in these areas.

Singapore has withered into a weakened and corrupted version of the more pure Hokkien spoken in Taiwan and Fujian. Further, the language has changed a lot since the Singaporean speakers left the region, and Singaporean Hokkien speakers have not kept up with the continuously evolving Hokkien language spoken in the Hokkien homeland.

Singaporean has also become so heavily admixed with Teochew that it is more properly seen as Hokkien-Teochew than Hokkien Proper.

Singapore has good intelligibility with Philippines Hokkien.

All varieties of Southern Malayland Hokkien spoken in Malaysia and Indonesia are fully intelligible with Singapore Hokkien.

A very pure dialect of Southern Malayland is spoken in the Indonesian city of Bagansiapiapi as Bagansiapiapi or Bagan Hokkien. It has avoided the Mandarinization of Hokkien that is occurring elsewhere. It also lacks influence from Cantonese and Teochew and has fewer loans from Austronesian and English compared to neighboring Southern Malayland or Philippines Hokkien speakers (Kirinputra 2014).

Much of the good intelligibility between Bagan and Taiwanese seems to be due to bilingual learning. They speak like the Hokkien speakers of Tong’an, China. There are only a few thousand speakers remaining, and the language seems to be on its way out.

Another very pure version is the moribund Southern Malayland dialect still spoken by a few people in Saigon, Saigon Hokkien (Kirinputra 2014).

The Southern Malayland dialect spoken in Bangkok is called Bangkok Hokkien and contains Malay loans.

This seems to imply a large trading community involving Saigon, Bangkok and Malayland which exchanged words via different speech forms (Kirinputra 2014).

Intelligibility of Bangkok and Saigon with the rest of Southern Malayland is not known, but it is assumed to be full.

The version of Southern Malayland spoken in the Philippines is called Banlam-ue, Banlamhue, Binamhue, Lanlang-ue, Minnanhua or Philippines Hokkien by speakers. Although its tones are quite different from Indonesian Southern Malayland Hokkien, the two varieties are fully intelligible. Hence Philippines Hokkien is a dialect of Southern Malayland.

Philippines is not readily intelligible with Standard Hokkien. Speakers came to the Philippines long ago, so their Hokkien contains many old words that have fallen out of other Hokkien varieties. It derives from the Jinjiang and Shishi dialects on the outskirts of Quanzhou. Lanlang-ue means “our language.” Minnanhua is the name of this language in Mandarin (Kirinputra 2014).

At present, it is not intelligible with Quanzhou or Xiamen. That is, Philippines speakers claim that they can only understand about 70% of Taiwanese television.

Despite intelligibility issues, Philippines and Taiwanese have a very similar lexicon. The lexicons of both are similar to Amoy speech. Apparently the Amoy-Luzon-Taiwan trade route produced a convergence in the lexicons of these varieties (Kirinputra 2014). Philippines is full of Tagalog words. Philippines, like Northern Malayland, resembles Zhangzhou from the late 1800’s.

Philippines is spoken in Manila, Cebu, Zamboanga, Sulu, and Jolo. The standard is based on the variety spoken in Manila. Zamboanga Hokkien differs from Manila Hokkien in that it has more Spanish and Chavacano borrowings and fewer Tagalog words. The dialect on Sulu Island, Sulu Hokkien, is different from the rest of Philippines, sounding more like Amoy and Taiwanese with a trace of Singapore. Cebu Hokkien, spoken on Cebu, resembles Jolo Hokkien, which is spoken on the far southern island of Jolo.

Cebu and Jolo Islands were part of an important route for smuggling goods into the Philippines for centuries. Most of the smugglers were Hokkien Chinese. Philippines is still widely spoken on Sulu, in Zamboanga and in the Binondo region of Manila. Cebu is in trouble with a declining number of speakers. The situation with Jolo is not known.

Southern Malayland, Riau, Klang, Johor, Singapore, Saigon, Bangkok, Bagansiapiapi, Northern Malayland, Penang, Medan, Baba, Sin Khek, Terangganu, Myanmar, Kelantan, Kelantan A, Kelantan B, Kuching, Philippines, Manila, Zamboanga, Sulu, Jolo, Cebu, Yilan, Amoy, Tong’an, Jinmen, Taiwanese, Tainan, Taipei City, Sulim, Taichung, Lugang, Gaoxiong, Quanzhou, Shishi, Jinjiang, Longhai, Hui’an, Anxi, Nan’an, Dehua, Zhangzhou, Zhangpu, Yinchuan, Dongshan, Yunxiao, Zhao’an, Zhangping, and Pinghe are all part of Hokkien, which has 54 lects, eight of which are separate languages.

There are 30 million speakers of Hokkien.

Southern Min: Chaoshan Min or Teochew

Chaoshan Min or Teochew is a macrolanguage spoken in a nine-county region of Guangdong. It is also spoken a lot in Thailand. Most Overseas Chinese in Thailand speak Teochew. The Mandarin name for the language is Chaozhou, but Teochew speakers do not accept that appellation and prefer Teochew instead.

Dialects of Teochew include Chaozhou Teochew, Jieyang or Kek’iôⁿ Teochew, Puning Teochew, Chenghai Teochew, Shantou Teochew, Chaoyang Teochew, Raoping Teochew, Jindengzhan Teochew, Nanao Teochew, Huidong Teochew, Huilai Teochew, Jiexi Teochew, Dabu Teochew, and Fengshun Teochew.

Standard Teochew is based on Chaozhou Teochew or what was formerly the Fucheng language.

Chaoyang Teochew is a highly divergent Teochew lect. The other Teochew varieties cannot understand Chaoyang.

Shantou Teochew, Raoping Teochew and Jieyang Teochew are spoken outside of the Chaoyang-speaking area which hugs the coastline southwest of the Shantou area (Kirinputra 2014), which may explain why they have a hard time understanding Chaoyang.

Shantou is more intelligible with Hokkien than other types of Teochew, but intelligibility is still only 54%. However, Hokkien is utterly unintelligible with Jieyang (Kirinputra 2014). This implies that Shantou and Jieyang are quite different. The implication is that Jieyang Teochew is a separate language.

Shantou speakers cannot understand Chaozhou, as Shantou is quite a bit different from the other Teochew lects, and they also seem to have a hard time understanding other Teochew lects, as they say the Teochew changes every hour or so as you travel and becomes difficult to understand. Shantou Teochew is a separate Teochew language.

Sources report that Teochew varieties can vary greatly in the pronunciation of even single words, and the tones can be quite different too.

Intelligibility data for Raoping, Huilai Teochew, and Jindengzhan Teochew with the rest of Teochew is not known.

Teochew was formed by a group of Hokkien Min speakers who broke off from Zhangzhou Hokkien about 600-1,100 years ago. They moved down to Northeastern Guangdong, and after hundreds of years, a heavy dose of some sort of unknown substrate languages went into the language, possibly including a Cantonese-type variety, producing modern Teochew (Kirinputra 2014).

Teochew has only 51% intelligibility with Xiamen (Cheng 1997).
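
The intelligibility figures quoted so far can be lined up against a single cutoff. What follows is only a rough sketch: it assumes the 90% threshold for a separate language used elsewhere on this blog, and the function name `classify` and the data layout are made up for illustration. The percentages are the ones quoted above. The labels show what the cutoff alone would say, which is not always the same as the classification argued for in the text.

```python
# Assumed cutoff (not stated in this section): below 90% intelligibility
# is treated as a separate language.
SEPARATE_LANGUAGE_CUTOFF = 90

# Intelligibility percentages quoted in the text above.
quoted = {
    ("Jianyang", "Jian'ou"): 75,
    ("Fuzhou", "Fuqing"): 65,
    ("Malaysian Fuqing", "Malaysian Fuzhou"): 12,
    ("Taiwanese", "Quanzhou/Zhangzhou"): 80,
    ("Shantou Teochew", "Hokkien"): 54,
    ("Teochew", "Xiamen"): 51,
}


def classify(percent, cutoff=SEPARATE_LANGUAGE_CUTOFF):
    """Hypothetical helper: label a pair of lects by the cutoff alone."""
    if percent < cutoff:
        return "separate languages"
    return "dialects of one language"


for (a, b), pct in quoted.items():
    print(f"{a} / {b}: {pct}% -> {classify(pct)}")
```

Every pair quoted here falls under the assumed cutoff, including Jianyang and Jian’ou, which are otherwise described as dialects of a single Northern Min language.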

Overseas Teochew is a significant branch of Teochew that is spoken outside of the Teochew area in China in Vietnam, Cambodia, Thailand, Malaysia, Indonesia, and the Philippines. Overseas Teochew is an extremely variable macrolanguage consisting of a number of different languages.

Malayland Teochew is spoken in Malaysia, Singapore and Indonesia. Malayland Teochew, instead of being a language, is a macrolanguage composed of several languages.

The Teochew variant spoken in Malaysia, Malay Teochew, is composed of many highly variant lects. A different Teochew variety is spoken in each subregion, and varieties sometimes differ dramatically in pronunciation and tones. Whether or not they are mutually intelligible is not known.

Malay Teochew is spoken in four different places in Malaysia where there are substantial Teochew populations: two places at the southern tip of the peninsula, and Kedah and North Perak on the far northwestern coast. Malay Teochew is not intelligible with other SE Asian Teochew varieties. It has converged more with Hokkien than other types of Teochew have.

It seems logical to split at least North Perak Teochew and Kedah Teochew along with Southern Malay Teochew A and Southern Malay Teochew B for the time being.

Singapore Teochew is different from Malay, and both have undergone separate divergent influences, so each one should be regarded as a separate language. However, Singapore Teochew is similar to Shantou because most Singaporean speakers came from there. Singaporean is regarded by Teochew speakers on the mainland as a heavily corrupted and impure variety of Teochew. Singaporean is not intelligible with any of the Teochew spoken in China anymore, not even the Shantou that it came from.

It has come under such heavy influence from Singaporean Hokkien that it is now better regarded as Singaporean Teochew-Hokkien than as a pure Teochew tongue. Many of the original Teochew terms have been replaced with Hokkien words. It is also now heavily admixed with Malay, and a lot of the characteristics of Mainland Teochew have been lost.

There are variations even among Singaporean Teochew. Speakers of some of the coarser, more rural dialects can only understand 50% of the purer varieties. This is derived from the early days when only some of the immigrants from Shantou were educated and most were uneducated peasants. The peasants did not speak the same higher, more refined Shantou that the educated people did.

In time, the differences became more dramatic. As these varieties still exist, we can call them High Singaporean Teochew and Low Singaporean Teochew, two separate languages. Low Thia Khiang, the leader of Singapore’s Workers’ Party, speaks High Singaporean Teochew and is poorly understood by speakers of Low Singaporean Teochew.

The variety spoken in Medan, Indonesia on Sumatra, Medan Teochew, is particularly interesting. It has heavy Malay, Hokkien and Cantonese influence and cannot be understood by other Teochew speakers (Kirinputra 2014). The town of Brahang 12 miles from Medan speaks Teochew.

Teochew is also spoken in other places in Indonesia such as Riau, Dabo Singrep, Tanjung Penang, Bantam Island, and Pontianak.

The Teochew spoken in Indochina, in particular in Vietnam and Cambodia (Indochinese Teochew), is a macrolanguage. Some Indochinese Teochew speakers who have returned to their family villages on the mainland say they could only understand 70% of the speech there.

Cambodian Teochew speakers say that Cambodian Teochew, Vietnamese Teochew, and Thai Teochew are all separate languages, and they cannot understand each other (Tek 2016).

Thailand Teochew or Diojiu-we is spoken in Thailand. The Chinese lingua franca in Thailand is not Mandarin but Teochew. There are 5 million Chinese Thais with roots in the Teochew region, and 3 million of them speak Diojiu-we.

Teochew is spoken in the Philippines, but there is little information available about Philippines Teochew.

Chaoyang, Shantou, Raoping, Jieyang, Huilai, Jindengzhan, Thai, Cambodian, Vietnamese, Medan, Singapore, Malay, Kedah, North Perak, Southern Malay A and B, Borneo, and Philippines are part of Teochew, which has 17 lects, 12 of which are separate languages.

Teochew has 10 million speakers.

Southern Min: Hailufeng, Zhenan, Hainanese, Leizhou, Shaojiang, Puxian, Zhongshan, Coastal, She and Datian Min

Hailufeng Min

Hailok’hong, Hailufeng or Haklau Min is a separate language in Southern Min that represents a later move of Zhangzhou speakers 400-500 years ago towards Northeastern Guangdong by the same group that formed Teochew. Since then there has been convergence with Teochew (Kirinputra 2014). It also has substantial Hakka influence. Hailok’hong (Haklau) Min is spoken down the coast between the Teochew zone and the Hong Kong area.

Hailufeng Min is usually better known as Hailok’hong or Haklou Min. It has at least three dialects, Haifeng Hailufeng Min, Lufeng Hailufeng Min, and Shanwei Hailufeng Min, and has limited intelligibility of Teochew proper.

The city of Haifeng has mostly Hailufeng speakers. Lufeng is spoken in the western half of Lufeng. Shanwei is the name of the prefectural city that encompasses Lufeng and Haifeng Counties. Shanwei Min is spoken more in the urban area of Shanwei.

Intelligibility among the three main Hailufeng Min varieties is full.

There is a group of Hailufeng speakers living in Hong Kong as part of the Tanka fisherpeople community. They live in the northern part of Hong Kong, north of the Hokkien-speaking Tankas, and originally came from the Shanwei area, which is just to the north. We will call their lect Hong Kong Tanka Hailufeng Min for now. Intelligibility data for this lect is not available.

Many insist that Hailufeng is a Teochew language because this area was redistricted into the Teochew area administratively in the 20th Century. Chinese people are jealously loyal to their home districts and see all languages spoken in their district in geographical and not linguistic terms. So to admit that Hailufeng is not Teochew would be a sort of treason to the homeland if you will (Kirinputra 2014). The area where the language is spoken along the coast of Guangdong is actually to the south of the Teochew area.

Hailufeng is said to be halfway between Teochew and Zhangzhou. Hailok’hong or Haklou etymologically is Haihong + Lok’hong, which is the same thing as Haifeng + Lufeng, so it is a combination of Haifeng and Lufeng. Haklau is also cognate with Hokkien Holo and Cantonese Hoklo, referring either to Taiwanese Hokkien or Teochew. In an overall sense, it meant Hokkien + Teochew, which is a good description of the language (Kirinputra 2014). Hailufeng is still confused a lot with Hokkien in many casual descriptions.

Many Hailufeng speakers can now understand Teochew, but that is due to bilingual learning (Kirinputra 2014).

Lufeng is said to have over 90% intelligibility with Xiamen Hokkien, but if it is really halfway between Teochew and Hokkien, it should have 75% intelligibility instead. Intelligibility testing may be needed. There are 3 million speakers of Hailufeng Min.
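
One possible reading of the 75% figure, sketched below, is a simple midpoint. It assumes that Xiamen Hokkien is 100% intelligible with itself and takes the 51% Teochew-Xiamen figure (Cheng 1997) as the other end of the scale. The function name `midpoint` and the linear interpolation are assumptions made for illustration, not a method taken from any of the sources cited here.

```python
def midpoint(low_pct: float, high_pct: float) -> float:
    """Return the value halfway between two intelligibility scores (in percent)."""
    return (low_pct + high_pct) / 2


# Assumed endpoints: Teochew-Xiamen intelligibility (Cheng 1997) and
# Xiamen Hokkien with itself.
teochew_with_xiamen = 51.0
xiamen_with_xiamen = 100.0

expected = midpoint(teochew_with_xiamen, xiamen_with_xiamen)
print(f"Expected intelligibility for a halfway lect: {expected}%")
```

On this reading the expected figure comes out near 75%, which is why a reported figure of over 90% looks high and would be worth testing.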

Zhenan Min

Zhenan Min, spoken in pockets in Yixing, Anji, and Linan in Southern Jiangsu and Wenzhou in Changxing in Southern Zhejiang Province around Pingyang and Cangnan and in the Zhoushan Islands, is a separate language. Speakers are found in Anhui Guangde, Ningguo, Langxi, the eastern part of Wuhu, Jiangxi Shangrao, Yushan Island, and Guangfeng County, in addition to Pucheng on the northern border of Fujian. It is spoken along the coast far to the north of the general Min-speaking area.

Zhenan Min has 574,000-848,000 speakers. Zhenan Min is influenced by Eastern and Northern Min and has limited intelligibility with other Min languages. In the area around Wenzhou, it has come under heavy Wenzhou Wu and Manhua Wu influence. Zhenan Min is still confused with Hokkien in casual descriptions.

Intelligibility among Zhenan Min varieties is not known. Zhenan Min is a result of a migration of Hokkien speakers from Hui’an, Jinjiang, Quanzhou, Nan’an, Xiamen, and Jinmen to the area in the middle of the Ming Dynasty about 800 years ago due to pirate attacks and civil wars in the region they fled from. Once they arrived at their new home, high waves prevented them from returning, so they decided to make their new homes here in the north.

Jujiang Zhenan Min is spoken in Taishan County near the Manhua-speaking area.

Baizhang Zhenan Min is spoken as a dialect island in the south of Taishan County. It has come under severe influence from Luoyang Wu and Manhua. It is presently near extinction. Baizhang appears to be a dialect of Jujiang.

Ruoshan Zhenan Min has heavy Wu influence.

Taishun Zhenan Min has 14,000 speakers.

Dongtou Zhenan Min has 52,000 speakers.

Pingyang Zhenan Min has 243,000 speakers.

Cangnan Zhenan Min has 484,000 speakers.
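
As a rough check, the four county figures above can be added up and compared with the overall speaker total given for Zhenan Min. The sketch below assumes that the fused total reads as a range of 574,000 to 848,000, and that the four counties do not overlap. It does not claim that these four are the only places with speakers. The variable names are made up for illustration.

```python
# Speaker counts as listed above for four Zhenan Min varieties.
speakers = {
    "Taishun": 14_000,
    "Dongtou": 52_000,
    "Pingyang": 243_000,
    "Cangnan": 484_000,
}

# Assumed reading of the overall total as a low-high range.
low, high = 574_000, 848_000

total = sum(speakers.values())
print(f"Sum of the four listed varieties: {total:,}")
print(f"Falls inside the overall range: {low <= total <= high}")
```

The four listed varieties alone account for most of the upper estimate, which leaves little room for the many other pockets named in this section.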

In Yixing County, half the population speaks Zhenan Min.

Peng River, Fenwenxiang, Lake, Changxing, Liyang, Sanyang, Shiyang, Pengxi, Jujiang, Baizhang, Pingyang Aojiang, Yushan Island, Jingning, Yixing, Anji, Anhui Guangde, Taishun, Ningguo, Langxi, Northern Rui’an, Ni Island, Wuhu, Wenling Shitang, Dongtou, Ruoshan, Jiangxi Shangrao, Shengshi Island, Guangfeng, Linan, South Cangnan, Dongtou, Yuhuan, Longhai, Lengkeng, Zhangpu, Anxi, Hui’an, Kengkou, Lengkugang, and Tong’an are all part of Zhenan Min, which has at least 41 lects.

Qiongwen Min (Hainanese and Leizhou Min)

Qiongwen Min is spoken on Hainan Island and to the north on the mainland. It has two divisions, Hainanese Min and Leizhou Min.

Hainanese Min has 8 million speakers, 5 million on Hainan and 3 million more overseas. It has the lowest intelligibility with the rest of Southern Min of all the Min Nan lects.

Qiongwen itself has 16 separate lects, all spoken on Hainan. Whether any of them are separate languages is not known. It is split into various lects, which in turn are split into various sublects.

The Funcheng Group of Hainanese Min is divided into nine lects, Chengmai Hainanese Min, Dingan Hainanese Min, Haikou Hainanese Min, Changliu Hainanese Min, Lingao Hainanese Min, Qiongzhong Hainanese Min, Qionghoi Hainanese Min, Bun-Sio Hainanese Min, and Tunchang Hainanese Min.

Intelligibility data is not available for Haikou Hainanese Min and Qionghoi Hainanese Min, but most of the vocabulary is not the same in these two lects.

Haikou Hainanese Min is spoken in Haikou City and a few miles away in Qiongshan County. There are no significant differences between the language of Haikou City districts and the suburbs.

Changliu city, six miles to the west, speaks Changliu Hainanese Min, a very closely related variety which appears to be intelligible with Haikou.

In between, residents speak both Changliu and Haikou.

Changliu is closely related to Lingao Hainanese Min spoken in Lingao County, and the two are mutually intelligible.

Chengmai Hainanese Min is spoken near Haikou.

A grammar written around 1900 on the Bun-Sio dialect of Hainanese Min stated that a number of the more distant Hainanese Min varieties were “perfectly unintelligible” to Bun-Sio Hainanese Min speakers (De Souza 1903).

Bun-Sio is spoken in an area called the Bun-Sio District, also known as the Wenchang District, on Hainan. This region encompasses the far northeastern end of the island. There are also Hainanese Min speakers in Malaysia and Vietnam. These speakers speak a version of Bun-Sio which looks a lot like the type described 100 years ago.

From a glance at this grammar, Bun-Sio or Wenchang Hainanese Min has more of a Tai-Kadai substrate than Southern Min in general. There is also a trace of Cantonese and more of a Mandarin influence than in the rest of Hokkien and Teochew. All in all, it is probably acceptable to split off Bun-Sio as a separate language.

Hainanese tones also vary from region to region, once again implying more than one language. The Hainanese Min tone system does not seem to be well described.

Leizhou Min is made up of two main groups: Leizhou Min and Zhanjiang Min. Leizhou Min is a separate language, and it has a close relationship with Hainanese. Nevertheless, Leizhou consists of seven different lects. Haikang is a dialect of Leizhou.

At least some of the other six Leizhou varieties are very different in phonology and lexicon. Intelligibility data is not known, but they may be mutually intelligible. Leizhou, with four million speakers, has low intelligibility with other Min varieties and has only 85% intelligibility with Hainanese, similar to Spanish and Portuguese.

Zhanjiang Min is apparently not intelligible with Leizhou. It is spoken in Zhanjiang City in the far southwest of Guangdong. It seems to be a separate language.

Shaojiang Min or Min Gan

Shaojiang Min or Min Gan is a completely separate high-level division of Southern Min. It is spoken by about 984,000 people in Nanping County in the far northwest of Fujian, bordering the Northern Min and Wu-speaking area to the east. It has four languages inside of it (Shaowu Shaojiang Min, Guangze Shaojiang Min, Jiangle Shaojiang Min, and Shunchang Shaojiang Min) that have limited mutual intelligibility. There are subdialects within these larger lects.

The substratum of Shaojiang is not for the most part Min, Gan or Hakka. Instead, it is the ancient Baiyue language, although there are lesser Hakka and Gan influences. Others say that this is not Southern Min at all. Instead it is a division of Northern Min where Central Min is also included. This would make sense due to its location and the fact that Shaojiang split away from Northern Min several hundred years ago. These are Northern Min speakers who came under heavy influence of Hakka, Gan, and Baiyue.

Shaowu, Guangze, Jiangle, and Shunchang are all part of Shaojiang, which has four lects, all of which are separate languages.

Puxian Min

Puxian Min or Hinghua has already been identified as a separate language. It is spoken on the southeast coast of Fujian. Puxian Min is thought to have a close relationship with Hokkien. It was probably a Proto-Hokkien variety that broke away and came under serious Eastern Min influence and hence became a separate language.

It has limited intelligibility of other Min languages – for instance, Puxian Min has 60% intelligibility of Xiamen Hokkien Min, but the mutual intelligibility is lopsided, as Xiamen intelligibility with Puxian Min is lower at 30% (Terng 2016). Hence Puxian-Xiamen intelligibility is only 45% (Terng 2016).
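
The 45% figure is simply the average of the two one-way scores. A minimal sketch of that arithmetic follows. The function name `mutual_intelligibility` is made up for illustration, and taking the plain mean is an assumption about how the combined figure was reached.

```python
def mutual_intelligibility(a_of_b: float, b_of_a: float) -> float:
    """Average two one-way intelligibility scores (in percent)."""
    return (a_of_b + b_of_a) / 2


# One-way figures reported above (Terng 2016).
puxian_of_xiamen = 60.0  # how much Puxian speakers understand of Xiamen
xiamen_of_puxian = 30.0  # how much Xiamen speakers understand of Puxian

combined = mutual_intelligibility(puxian_of_xiamen, xiamen_of_puxian)
print(f"Puxian-Xiamen mutual intelligibility: {combined}%")
```

A single averaged number hides the lopsidedness, so it is worth keeping both one-way figures alongside it.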

The name is derived from the names of two different cities in China where this language is spoken – “Pu” for Putian and “Xian” for Xianyou.

Puxian Min has seven dialects. There is full intelligibility between all of the dialects, although there are some minor pronunciation and vocabulary differences (Terng 2016). The two main divisions of Puxian Min are into Putian Puxian Min and Xianyou Puxian Min, hence the name Puxian Min being a mix of the two main varieties. Both are dialects of the main Puxian Min language.

There are at least four subdialects spoken in Putian County, all subdialects of Putian Puxian Min. They are Jiangyou Putian Puxian Min, Chengli Putian Puxian Min, and two spoken in Putian City called North Putian City Puxian Min and South Putian City Puxian Min. There are other Putian Puxian varieties spoken in the county to the north and south of Putian City other than Chengli and Jiangyou, but their names are not known. We will call them North Putian County Puxian Min and South Putian County Puxian Min.

There are three dialects spoken in Xianyou County, one in Xianyou City called Xianyou City Puxian Min or Central Xianyou Puxian Min, another in the north of the county called North Xianyou County Puxian Min, and a third in the south of the county called South Xianyou County Puxian Min. All are subdialects of a single dialect of Puxian Min, Xianyou Puxian Min. All three subdialects are fully intelligible with each other with only some minor differences in pronunciation and some different vocabulary (Terng 2016).

For instance, North Xianyou kou, “to throw,” is lacking in Xianyou City.

South Xianyou has [i] and [e] for [y] and [ɵ] in Xianyou City, and North Xianyou has [θ] for Xianyou City [ɬ] (Terng 2016).

Xianyou city trades a lot with the north and south of the county, so there is a lot of contact between the subdialects. The city gets rice and rice-derived goods from the south and fish and shellfish from the south.

There is also a lot of intermarriage between speakers of the three subdialects. Most speakers of one of the Xianyou dialects have relatives who speak another of the dialects. The only research on Xianyou Puxian Min has focused on the dialect of the city, Central Xianyou, with the other two dialects being poorly known (Terng 2016).

Intelligibility between Xianyou and Putian Puxian Min is good at 90%-100%. There are some vocabulary differences.

For instance, “white”: Xianyou City pann, Chengli Putian (城里, Putian City) pa; “officer”: Xianyou City kuann, Chengli Putian kua (“melon”), are two pairs that cause some confusion. In these cases, Chengli Putian has lost nasalization that Xianyou City has retained. As we shall see below, loss of final nasalization is not just seen in Chengli Putian but in all of Putian. Nevertheless, Xianyou City intelligibility of Chengli Putian is full at 100% (Terng 2016).

There is some different vocabulary there too, and in some cases of common words, the differences are striking.

For instance, “children”: Xianyou kann en, Putian ta a; “wet”: Xianyou iunn, Putian tang. Once again we see that Xianyou has retained the older nasalization, whereas it appears that all of Putian, not just Chengli, has lost it (Terng 2016).

There are also rhyme differences between Putian and Xianyou. Xianyou has retained more rhymes at 50 rhymes, whereas Chengli Putian has 40, and Jiangyou Putian has 36 rhymes (Terng 2016).

So in addition to loss of nasalization, there may have been rhyme reduction in Putian also. It appears that Xianyou may be the older form of the Puxian Min language and that Putian broke away from it more recently.

Jiangyou Putian’s 36 rhymes versus Xianyou’s 50 rhymes leads to some difficulties in communication. Even so, Xianyou retains full intelligibility of Jiangyou at 90% (Terng 2016).

However, there is a form of Puxian Min spoken in Singapore, Hinghua Puxian Min, which lacks full intelligibility with Puxian Min in China. Hinghua Puxian Min speakers are a minority in Singapore, and their language has mixed a lot with Singapore Hokkien, Malay, English, and other languages spoken in Singapore, resulting in a separate language.

South Putian City, North Putian City, Chengli, Jiangyou, North Putian County, and South Putian County are part of Putian Puxian Min.

Xianyou City or Central, South Xianyou, and North Xianyou are part of Xianyou Puxian Min.

Xianyou City, South Xianyou, North Xianyou, South Putian City, North Putian City, Chengli, Jiangyou, North Putian County, South Putian County, and Hinghua are all part of Puxian Min, which has 10 lects, two of which are separate languages.

Zhongshan Min

In Guangdong Province in the Pearl River Delta near Hong Kong, there is a large, divergent split in Min Nan called Zhongshan Min.

Zhongshan Min, a macrolanguage, has 130,000-150,000 speakers and has limited intelligibility with other Min lects. It is located to the south of Hailufeng Min just north of the Cantonese zone along the Southern Guangdong Coast.

This group is possibly a Northern or Eastern Min group stranded far down in Guangdong. They are sometimes referred to in old literature as “Northeastern Min”. That’s not really a category. It often means Northern Min, but sometimes it means Eastern Min. These languages have all borrowed extensively from Siyi Cantonese spoken in the Pearl River Delta.

Looking at the whole picture, it appears that various immigrants speaking Puxian Min, Northern Min, and Southern Min all settled around Zhongshan. These various Min elements, along with a hefty dose of Cantonese, have gone into the creation of Zhongshan Min.

Two Zhongshan lects, Namlong or Zhangjiabian Zhongshan Min (also spoken in Zhongshan), and Sanxiang Zhongshan Min, are separate languages. Each one is a dialect island surrounded by Cantonese speakers, and all three populations are unconnected.

Namlong is spoken 10 miles southeast of Zhongshan in Cuiheng. It is also spoken in Namlong and Zhangjiabian.

Sanxiang is spoken to the south of Zhongshan in the hilly rural areas.

The third is called Longdu Min and is also a separate language. It is spoken in the southwest corner of Zhongshan City in Shaxi and Dayong.

In Chinese, Longdu, Namlong and Sanxiang are referred to as All-Lung Min, South Gourd Min, and Three Rural Min respectively. Sources give Longdu and Namlong 100,000 speakers and Sanxiang 30,000 speakers. 14% of the population of Zhongshan speaks Zhongshan Min. Namlong now has mostly elderly speakers.

Sanxiang, Namlong, and Longdu are apparently not mutually intelligible, although Namlong is close to Longdu.

Sanxiang is more divergent. Further, there are more dialects within these three languages, and dialectal divergence is considerable.

Sanxiang Min has at least two dialects, Phao Zhongshan Min and Tiopou Zhongshan Min. Phao is fairly uniform across a number of villages, but Tiopou is quite different. Nevertheless, there is near-full intelligibility between Phao and Tiopou (Bodman 1988).

For now, it is best to list Sanxiang, Namlong, and Longdu as separate languages, with possible dialects Phao, Tiopou, Namlong A, Namlong B, Longdu A, and Longdu B, among them.

Longyan Min or Coastal Min

Longyan Min or Coastal Min (Branner 2008) is a separate language. It is spoken in Longyan City’s Xinluo District and Zhangping City deep inside Fujian to the west of the Hokkien-speaking area. There is an overseas group of Coastal Min speakers in Malaysia in Penang around Parit Buntar. Although the language has been dying out in Malaysia for some time now, the language is still quite alive in Parit Buntar.

The language has anywhere from 300,000 (Branner 2008) to 740,000 speakers and has limited intelligibility with other Min languages. It has heavy Hakka influence due to the large number of Hakka speakers in the surrounding areas. Some put Coastal Min in a Southern Min Nan division of its own, others put it in Hokkien, and others put it outside of all other major Min varieties in its own Min category. The best analysis seems to be that it belongs in its own Southern Min division.

Koongfu Coastal Min and Shizhong Coastal Min are dialects of Coastal Min, but on examination, they are quite different. Koongfu is spoken in Kanshi Township in Yongding County. Shizhong is spoken in Southern Longyan County. Considering the rather extreme divergence of Coastal Min varieties in Wan’an, Koongfu Coastal Min and Shizhong Coastal Min are separate languages.

Another Coastal Min group is best called Wan’an Coastal Min. This is actually a macrolanguage comprising a number of separate languages in Wan’an County of Fujian.

Wan’an and Longyan are not mutually intelligible (Branner 2008).

Wan’an is a small township in northwestern Longyan County in Western Fujian which consists of very rugged, hard to access mountains with scattered very isolated villages made up of poor farmers. Some of these villages were visited for the first time by a Westerner only in the last 20 years (Branner 2000).

To give you an idea of how remote the area is, to walk between two villages in Wan’an would take six difficult and confusing hours down ancient cobblestone paths through dark forests. But to take a bus between the two towns that are six hours walking distance away would take three days (Branner 2000)!

There are 13 varieties of Wan’an Min spoken in Western Fujian.

Among them are Wenheng Longgang Wan’an Coastal Min, Xi Wan’an Coastal Min, Xiangxi Wan’an Coastal Min, Shikou Wan’an Coastal Min, Wuzhai Longyan Wan’an Coastal Min, Songyang Longyan Wan’an Coastal Min, Baisha Youshui Longyan Wan’an Coastal Min, Tutan Longyan Wan’an Coastal Min, Shiahtsuen Buhyun Liliing Wan’an Coastal Min, Shanghang Buhyun Liliing Wan’an Coastal Min, Shanghang Gutian Laifang Wan’an Coastal Min, Shanghang Guanzhuang Shangzhuo Wan’an Coastal Min, and Shanghang Baisha Pengxin Wan’an Coastal Min. All are spoken in Wan’an township except Shiahtsuen Buhyun Liliing, which is spoken in Laiyuan Township in Southeastern Liancheng County (Branner 2000).

With many of these lects, they don’t understand each other at first, but after they talk to each other for a while, they start to figure out the other variety (Branner 2008). Owing to difficult intelligibility from village to village, the best analysis seems to be that all of the above are separate languages. Intelligibility among the Wan’an languages is ~70%.

Coastal Min seems to have about 85% intelligibility with Taiwanese Min. The intelligibility of Coastal Min with Penang Northern Malayland Hokkien is very poor.

She Min

A very strange variety called She Min is spoken by the She people in Zhejiang, Fujian and Guangdong. The She language was originally Hmong-Mien, which then added a Cantonese layer, then a Hakka layer, next a Min layer, and in Zhejiang, a Wu layer. It is best described as a Hmong-Mien language that has been Sinicized. There are probably 200,000 speakers of this language.

Zhejiang She Min is no doubt a separate language due to the distance between it and the other two principal varieties in addition to the Wu layer.

Fujian She Min is also a separate language.

In Eastern Guangdong, the She speak Chaosan or Teochew She Min. They live in the Phoenix Mountains in Chao’an County in Chaozhou Prefecture. The language has had heavy contact with Teochew. This is probably a separate language, unintelligible with other She languages and Teochew.

There is also an original She language that is non-Sinitic (Hmong-Mien) and is spoken by only about 1,000 people in Guangdong.

Datian Min

Datian Min in Fujian is also a separate language. Datian Min is in its own group in Min Nan.


Hakka is an extremely diverse group of languages spoken in Southern China. There may be up to 1,000 lects in Hakka. The dialect situation with Hakka is quite confused and somewhat contradictory. Some speakers report adequate intelligibility between lects, while others report difficulty. There are also reports of great diversity and difficult intelligibility even from village to village in Western Fujian, Gannan County in Jiangxi and Northern Guangdong. Intelligibility testing could clear up some of the confusion.

Hakka Proper (Meixian or Moiyen, formerly Jiaying) is spoken in Mei County in Northeastern Guangdong.

Hakka is very different from all other forms of Chinese. Although Southern Min and Hakka are said to be close, Taiwanese Hokkien speakers can understand only 1% of even Taiwanese Hakka.

Meixian Hakka is the central Hakka version used as Standard Hakka. It is at least understood by 75% of Hakka speakers, so it is often used for communicating with Hakkas who speak other Hakka languages. Meixian was chosen as the standard because the region where it is spoken is one of the major strongholds of Hakka language and culture. In addition, it has preserved most of the original Hakka phonology and has less influence from Cantonese and Hokkien.

Nevertheless, Changting Hakka preserves more of the original Hakka than Meixian does.

Xingning Hakka, Zhenping Hakka, and Wuhua Hakka are all dialects of Meixian.

Wuhua Hakka or related varieties include the varieties of Wuhua County, Jiexi Hakka, Northern Bao’an Hakka, and Eastern Dongguan Hakka in Northern Guangdong; Shaoguan Hakka in Sichuan, and Tonggu Hakka in Jiangxi.

Tonggu speakers came from Wuhua a while back. Intelligibility data for these varieties is not available, but Tonggu Hakka is in its own separate group of Hakka, so it must be a separate language.

Meixian was formerly known as Jiaying Hakka. The Hakka varieties of Meixian, Pingyuan Hakka, Dabu Hakka, Xingning, Wuhua, and Jiaoling Hakka used to be included in Jiaying.

Dapu or Dabu Hakka, while close to Meixian, is a separate language. It is spoken in Dapu County, Guangdong. Dapu was the basis for Taichung Dongshi Hakka spoken in Taiwan. Actually, Dongshi Hakka was derived directly from Chisan Hakka spoken by the founder of the Hakka community in the county. However, Dongshi is now very different from Chisan. Intelligibility data for Chisan is not available.

Fengshun Hakka is a dialect of Dapu. Fengshun has five different varieties. Fengshun is also spoken in Bangkok as Bangkok Fengshun Hakka. Although it has been affected by Teochew influence in Bangkok, Bangkok Fengshun is still relatively pure.

Hopo Hakka is not intelligible with Dabu, Hailu or Meixian. Hopo Hakka has deep influence from Teochew because it is located right next to the Teochew area.

Chaoyang Hakka, Jieyang Hakka, Raoping Hakka, and Huilai Hakka are all dialects of Hopo.

Longchuan Hakka in Northeastern Guangdong is a separate language, with poor intelligibility with other Hakka lects.

Longchuan has six different lects, Huangbu Hakka, Sidu Hakka, Chetian Hakka, Huiyang Hakka, Huicheng Hakka, and Tuocheng Hakka.

Longchuan has heavy Cantonese and Teochew influence. It is mostly spoken in Huicheng District and Boluo County.

Sidu and Tuocheng are close and are probably dialects of Longchuan. Sidu has 18,000 speakers.

Intelligibility data on Huangbu Hakka, Huiyang Hakka, and Chetian Hakka is not known. Huiyang is close to Hong Kong Hakka. However, diversity is great within Longchuan, and dialects differ from village to village, with difficult intelligibility between them.

Boluo Hakka and Heyuan Hakka are separate languages, not mutually intelligible.

Longchuan, Boluo and Heyuan are quite distant from other Hakka.

Huizhou Hakka is in its own group of Hakka, so it must be a separate language. Huizhou is heavily spoken in Huizhou City. Huizhou is not intelligible with Moiyen, Taipu, Hopo, or Taiwanese.

Banshan Hakka is spoken in the Chengkang District of Tangnan town in close proximity to Jindengzhan village, where Teochew is spoken, and Changlin village in Tangnan town in Fengshun, Guangdong where Hakka called Changlin Hakka is spoken. Banshan is a dialect island surrounded by Teochew. Banshan may have significant Teochew influence. Banshan is quite probably a separate language.

Liannan Hakka and Wengyuan Hakka are both spoken in Northwest Guangdong. They are members of the Yuebai Group of Hakka, which is highly divergent.

In Northern Guangdong, there may be many different Hakka languages, since dialects tend to differ from village to village, and in many cases, communication is difficult between villages.

The Yuemin Group of Hakka from Southern Fujian and Southeastern Guangdong is a separate language.

Heyuan Hakka is spoken in Central Guangdong.

Jiexi Hakka is spoken in Southeastern Guangdong.

Dongguan Qingxi Hakka is spoken in South-Central Guangdong.

Haifeng Hakka, Lufeng Hakka, and Luhe Hakka, located near each other in Haifeng, Lufeng, and Luhe Counties in Shanwei City of Guangdong, appear to be dialects of a separate language called Hailufeng Hakka. It is spoken most heavily in Luhe County, where most people speak Hakka. This is a Hakka with heavy influence from Hailufeng Min.

Sanxiang Hakka, spoken in Zhongshan Prefecture, is different from all other Hakka. In all probability, it is a separate language.

Hong Kong Hakka is not intelligible with the Hakka spoken on Taiwan or with Dabu, and has no intelligibility of Meixian. Hong Kong Hakka is spoken in the New Territories in Sai Kung Peninsula, Shatin, Taipo, Shataukok, Tsuen Wan, Sai Kung Yam Tin Chi, Island Bridge, Ho Sheung Heung, Yen Kong, Ebara, and Eastern Yuen Long. It is close to Huiyang and Bao’an. Its speakers came to the area from the overpopulating Eastern Guangdong around 1650. By 1700, they had built more than 400 Hakka villages in the Hong Kong area. They may have come from the Huiyang area.

Intelligibility data for Hong Kong Hakka, Huiyang and Bao’an is not available.

Despite the fact that Hong Kong Hakka lects seem similar to Hakka lects spoken in Eastern and Northeastern Guangdong, many Hong Kong Hakka trace their origins to Guangxi.

Hong Kong Hakka has three principal dialects, Dongguan Hakka, Taipu Hakka, and Wakia Hakka. The language is similar to the Hakka spoken around Huiyang in Eastern Guangdong. They moved from that area to Hong Kong at the beginning of the Qing Dynasty, so they came to Hong Kong 375 years ago.

Dongguan Hakka is spoken near Hong Kong.

Taipu or Taipo Hakka is spoken in the village of the same name in Hong Kong.

Wakia Hakka is also spoken in Hong Kong.

Intelligibility between the Hong Kong varieties is not known.

A variety of Hong Kong Hakka is spoken in a part of Hong Kong called Shataukok. It is known as Shataukok, Satdiugok, Sathewkok, or Satdiukok Hakka. It is different from the rest of Hong Kong Hakka, and evidence indicates that Shataukok Hakka may indeed be a separate language.

Shataukok has a number of dialects within it, and they are different, but they may be more or less mutually intelligible. However, the MI is difficult to characterize, as it is said that speakers of other dialects can “get the gist” of what the other speakers are saying. “Getting the gist” of a variety usually implies less than 90% intelligibility.

Another variety of Hong Kong Hakka is spoken in Shuijian Village in the southern part of Yuen Long. This lect is completely different from the rest of Hong Kong Hakka. They moved to Hong Kong from Western Fujian 150 years ago. It is said to be similar to Boluo Hakka in Northeastern Guangdong, but this has not been proven.

The best name for this is Shuijian Hakka, and it is best seen as a separate language, completely apart from the rest of Hong Kong Hakka. This language is now spoken only by older people who are ashamed of their language and generally refuse to speak it with outsiders.

Located near Hong Kong, Shenzhen/Bao’an Hakka is a separate language. However, it is close to Hong Kong Hakka.

The Gannan Hakka Group spoken in Southern Jiangxi is extremely diverse compared to the Hakka of Guangdong and Fujian. Gannan Hakka varieties differ even from village to village.

With Gannan, we may be dealing with a situation of many different languages, as with Wu, Hui, Tuhua, and Xiang. In fact, it is quite possible that in Jiangxi Hakka, every variety is a separate language.

There are two separate groups there, Bendi Hakka and Keji Hakka. Bendi varieties are some of the most divergent Hakka varieties of all, while Keji varieties are more traditional, having moved out of the core Jiaying area within the last 300 years.

Xingguo Hakka is a separate language spoken in Xingguo County in Ganzhou Prefecture.

Ningdu Hakka is in all probability a separate language.

Ruijin Hakka, spoken in Southeastern Jiangxi, is very different and may well be a separate language. It looks a lot like Gan.

Xinfeng Tieshikou Hakka is in all probability a separate language, spoken in Xinfeng County by 90% of the population.

Many extremely diverse forms of Hakka are spoken in Fujian. Sources say that each Hakka village in Western Fujian speaks its own variety, and that the varieties are far enough apart to make communication from village to village very difficult.

The wildly diverse Tingzhou Hakka Group is spoken in Western Fujian. Even within this group, there are separate languages, including Tingzhou Hakka, Yongding Hakka, Liancheng Hakka, Changting Hakka, Xinquan Hakka, Qingliu Hakka, Mingxi Hakka, Taishun Hakka, Ninghua Hakka, Basel Mission Hakka, Sanhang Hakka, and probably Gucheng Hakka.

Hakka is also spoken in far Southern Zhejiang in Taishun County.

Taishun Hakka is spoken there, but it has only 1,600 elderly speakers. It has 2,600 speakers.

Taishun She Hakka is spoken by the She minority in that county.

In recent years, both have come under the heavy influence of Luoyang Wu, Zhenan Min and Manhua.

Zhaoan Xiuzhuan Hakka, spoken in Southern Fujian, is a separate language.

Luoyuan She Hakka is spoken in Western Fujian. It is an extremely diverse form of Hakka that differs from all other Hakka. It must surely be a separate language.

Therefore, we conclude that in addition to the above, we will add Wuping Hakka, Longyan Hakka, Zhaoan Hakka, Yunxiao Hakka, Shangsixiang Hakka, Fuding Hakka, Fuan Hakka, Gucheng Hakka and Nanjing Qujiang Hakka.

Within Longyan Hakka, in one county, Liancheng County, there is a huge variety of dialects, including Xinquan Linguo Liancheng Hakka, Xinquan Lelian Liancheng Hakka, Pengkou Wangcheng Liancheng Hakka, Miaoqian Zhixi Liancheng Hakka, Gechuan Zhuyu Liancheng Hakka, Miaoqian Jiangshe Liancheng Hakka, Sibao Shangjian Zhenbian Liancheng Hakka, Juxi Gaoding Liancheng Hakka, Liancheng Tangqian Dikeng Liancheng Hakka, Wenheng Hengming Liancheng Hakka, Xinquan Dongnancun Liancheng Hakka, Quxi Puxi Dongxiduan Liancheng Hakka, Quxi Qiaotou Liancheng Hakka, Xuanhe Shengxing Liancheng Hakka, and Liwu Nanban Zhangwu Liancheng Hakka (Branner 2008).

Whether these are dialects or separate languages is difficult to determine. Usually speakers cannot understand each other at first, but after a while, they figure out how to communicate with each other (Branner 2008). Communication between these villages is difficult enough that a local Mandarin dialect is used for inter-village communication (Branner 2008). This suggests that it is valid to split all of the above off into separate languages.

Hakka is also spoken in the south of Guangxi. There are 3.6 million Hakka speakers in Guangxi.

Dayu Hakka is spoken in Southern Guangxi.

Mengshan Xihe Hakka is spoken in Eastern Guangxi.

Each one is probably a separate language.

Mashan Old Naxing Hakka is spoken in Mashan Old Naxing village in Guangxi. It is located far from other Hakka and has come under the influence of other Sinitic and non-Sinitic languages such that it is now very different. It is surely a separate language.

Binyang Hakka is also spoken in Guangxi. They are Meixian speakers who came to Guangxi 400 years ago. The language is now very different from Meixian. It is quite probably a separate language.

Hakka speakers immigrated to Sichuan a long time ago.

Chengdu Hakka is spoken in Chengdu, Sichuan. It is quite different from other forms of Hakka and has poor intelligibility with other forms. At the moment, Hakka is the main means of communication in the Jinjiang, Jinniu, Chenghua, Longquanyi, Xindu, and Qingbaijiang Districts in Chengdu.

Longcheng Hakka is spoken in Longcheng by Hakka who immigrated there a long time ago. It has since come under heavy influence from Longcheng Southwestern Mandarin.

Five Hakka varieties – Longchang, Longtanshi Hakka, Yilong Hakka, Panlong Hakka, Xindu Hakka, and Huanglianguan Hakka – are the main Hakka dialect islands in Sichuan. Although they have commonalities, they are all also quite different. Quite probably all of them are separate languages.

Longtanshi Hakka speakers came from Mei County in Guangdong long ago, but now Meixian and Longtanshi are very different. It resembles Wuhua and Xingning more and has since come under heavy influence from Chengdu Southwestern Mandarin.

Yilong Hakka speakers came to Sichuan 200 years ago.

Hakka varieties are also spoken in Sansheng, Tianhui, Shiling, Xihe, Shibantan, Taixing and Longwang in Sichuan. Intelligibility data is not available for Sansheng Hakka, Tianhui Hakka, Shiling Hakka, Xihe Hakka, Shibantan Hakka, Taixing Hakka, and Longwang Hakka. All have come under heavy influence from Southwestern Mandarin.

A distinct variety of Hakka is spoken by 2,300 Hakkas in Hainan. Hainanese Hakka is distinct and unintelligible with Mainland Hakka.

On Taiwan, Sixian (Four Counties) Taiwanese Hakka, Dongshi or Dapu Taiwanese Hakka and Hailu Taiwanese Hakka are not mutually intelligible, nor is the mixed Gaoxiong Taiwanese Hakka variety created in order that these three varieties could communicate with each other.

The present koine is called Sihai Taiwanese Hakka and is a combination of Sixian Taiwanese Hakka and Hailu Taiwanese Hakka, the two most widely spoken lects. Dongshi Taiwanese Hakka comes from Dapu County, Guangdong. Hailu Hakka comes from Huizhou prefecture.

Sixian itself is currently the most widely spoken Hakka variety in Taiwan. The name comes from the four Guangdong counties of Meixian, Jiaoling, Xingning, and Pingyuan. But the Sixian speakers who came to Taiwan generally came from Jiaoling, so Sixian currently resembles Jiaoling Hakka more than Meixian. Sixian is divided into two main dialects, Miaoli Taiwanese Hakka and Liudui Taiwanese Hakka. The differences between the two appear to be great, and they may well be separate languages.

Xingning Taiwanese Hakka is also still spoken in a few places. It is probably a dialect of Sixian.

Changle Taiwanese Hakka, now almost extinct, is almost certainly a Sixian dialect. Changle speakers came from Wuhua County in Guangdong.

Zhao’an Taiwanese Hakka is very different and must be a separate language. Zhao’an comes from the Zhao’an, Pinghe, Nanjing, and Hua’an Counties of Zhangzhou prefecture in Fujian. Raoping Taiwanese Hakka in all probability is also a separate language. Raoping speakers came from Chaozhou Prefecture, specifically the Raoping and Huilai Counties in Guangdong.

Tingzhou Taiwanese Hakka is extremely different and is surely a separate language. Tingzhou comes from the Changting, Ninghua, Qingliu, Guihua, and Liancheng Counties of Tingzhou prefecture. Tingzhou and Zhao’an are the two most divergent Hakka varieties on Taiwan. Tingzhou is hardly spoken anymore and may be extinct on Taiwan.

Fengshun Taiwanese Hakka is also spoken in Taiwan, but it may be a dialect of Dapu. Fengshun came from Fengshun and Jieyang Counties in Guangdong. Fengshun still has a few speakers left on Taiwan.

Two other lects, Yongding Taiwanese Hakka and are said to be extinct on Taiwan, though each still has a few speakers. Yongding is surely a separate language, but Yongding speakers came from Yongding, Shanghang and Wuping Counties of Tingzhou prefecture of Fujian near Zhao’an.

Western Fujian Taiwanese Hakka, Zhangzhou Taiwanese Hakka, and Sixhai Taiwanese Hakka were all formerly spoken on Taiwan but have all gone extinct. No doubt all three were separate languages.

In general, speakers of other kinds of Hakka find Taiwanese Hakka to be hard to understand, possibly due to Southern Min influence. Hakka speakers make up only 5% of the population of Taiwan. Almost all are proficient in Mandarin or Hokkien, and there are few monolinguals left.

The Hakka spoken in Kuching, Sarawak, in Malaysia is known as Ho Po Hak Hakka. It is similar to Hopo Hakka, spoken in Hopo, near Meizhou.

Although Ho Po Hak speakers make up 70% of the Sarawak Hakka population, there are also speakers of Dapu, Fengshun, Huizhou, Bao’an, Dongguan, Lufeng, Wuhua, Meixian and Yongding on Sarawak. These speakers probably cannot be classed as Ho Po Hak. Intelligibility between these forms of Sarawak Hakka, Ho Po Hak and the Hakkas they are derived from is not known. Ho Po Hak is very different from the Hakka spoken in Sabah, Malaysia.

Hakka speakers make up the majority (57%) of the Chinese in Sabah where Sabah Hakka is spoken. Many arrived in the 1860’s fleeing the massacres perpetrated by the Manchus following the failed Taiping Rebellion. This group settled in Sandakan.

Others were brought from Longchuan County, Guangdong to Kudat in 1882 as laborers by the North Borneo Chartered Company. Sabah Hakka is identical to Huiyang/Fuiyong Hakka spoken in the Huiyang District of the city of Huizhou, near Shenzhen in Guangdong. Huizhou Hakka has heavy Cantonese influence. Most people in Huizhou are Hakka speakers. The main Hakka centers in Sabah are the cities of Sandakan, Kudat, Kota Kinabalu, and Tawau.

Dapu is still spoken in Malaysia and Singapore. Kuala Lumpur Dapu Hakka is very different from the Dapu spoken in China. It is now heavily creolized with Malay. It is quite probably a separate language. It is heavily spoken in the Serdang and Ampang regions of the capital.

There are also some Hakka speakers around Ipoh. It is not known what type of Hakka they speak.

In the 1800’s, there were Hakkas speaking Jiaying Hakka (Jiaying Hakka was the old name for Meixian), Yongding, Fengshun, and Jengcheng Hakka from Guangdong in Singapore, Penang, Malacca and Tel Anson on the Malay Peninsula. Whether they are still present is not known. Meixian speakers were known from Singapore as recently as 1950. A type of Huiyang is still spoken in Penang as Penang Hakka.

Bangka Island Indonesian Hakka, spoken on Bangka Island in Indonesia, has diverged so radically with its tones that it is now a separate language. That is, speakers of other Indonesian Hakka varieties say that they cannot understand Bangka Island speakers. It’s a Hakka creole more than anything else.

In Indonesia, two other major Hakka varieties are spoken, Kun Dian Indonesian Hakka, spoken in Borneo, and Belitung (Ngion Voi) Indonesian Hakka, spoken mostly on Sumatra and Borneo.

Kun Dian is the largest Hakka group in Indonesia. Most live at Pontianak and Singkawang, where they speak two different mutually intelligible lects, but they have spread all over Indonesia. Kun Dian is also spoken in Jakarta, Medan and Surabaya. Kun Dian has 80% intelligibility of Sabah (Longchuan) and Hong Kong. Kun Dian is also similar to Hopo.

Belitung is spoken mostly on Sumatra and Borneo and is characterized by a soft way of speaking. Belitung speakers mostly derived from Meixian speakers.

Belitung and Bangka Island speakers say they cannot understand Kun Dian, but Kun Dian speakers say they can understand the other two for the most part.
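One-way reports like these are easy to lose track of, so here is a minimal Python sketch of how they could be recorded. The table and the function names are my own hypothetical illustration, not anything from a source, and I am treating "for the most part" as a plain yes, which is an assumption. Only the reports in the paragraph above are encoded.

```python
# Hypothetical encoding of the speaker reports above:
# each listener maps to the set of varieties its speakers say they understand.
UNDERSTANDS = {
    "Kun Dian": {"Belitung", "Bangka Island"},  # "for the most part" taken as yes
    "Belitung": set(),        # reports not understanding Kun Dian
    "Bangka Island": set(),   # reports not understanding Kun Dian
}

def understands(listener, speaker):
    """True if speakers of `listener` report understanding `speaker`."""
    return speaker in UNDERSTANDS.get(listener, set())

def mutually_intelligible(a, b):
    """Mutual intelligibility needs understanding in both directions."""
    return understands(a, b) and understands(b, a)

for other in ("Belitung", "Bangka Island"):
    print("Kun Dian understands", other, ":", understands("Kun Dian", other))
    print(other, "understands Kun Dian :", understands(other, "Kun Dian"))
    print("Mutual:", mutually_intelligible("Kun Dian", other))
```

The point of keeping the two directions separate is that a one-way report never counts as mutual intelligibility.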

Most old people in Belitung and Singkawang are Hakka monolinguals who cannot speak Bahasa Indonesia at all. These elderly speakers have to bring interpreters with them when they go to the doctor.

A type of Meixian is spoken in East Timor as East Timor Hakka.

Although some Indonesian Hakka speakers speak a very pure Hakka similar to the Huizhou spoken on the mainland, these are mostly the oldest generation. The younger generations speak a language that is very heavily adulterated with Indonesian languages.

Wuhua, Meixian, and Dabu are members of the Xinghua subgroup of the Yuetai Group of Hakka, which has five lects. Xinghua Hakka has 3.4 million speakers (Olson 1998).

Bao’an, Lufeng, Haifeng, and Hailufeng are in the Xinhui subgroup of Yuetai Hakka, which has nine lects. Xinhui Hakka has 2.4 million speakers (Olson 1998).

The Yuetai Group of Hakka has 23 lects.

Gaoxiong, Xinzhu, Dongshi, Jiaying, and Miaoli are members of the Jiaying Group of Hakka, which has seven lects.

Tingzhou, Yongding, Liancheng, Changting, Xinquan, Basel Mission, Wuping, Ninghua, Qingliu, and Mingxi are all part of the diverse Tingzhou Group of Hakka. All told, Tingzhou Hakka has 10 lects, most of which are separate languages.

Longchuan, Boluo, and Heyuan are members of the Yuezhong Group of Hakka, which has five lects.

Huizhou is in its own subgroup of Hakka.

Xingguo and Ningdu are in the Ninglong or Gannan Group of Hakka, which has 13 lects. There may be as many as 13 different languages in this group.

Dayu is a member of the Yugui Group of Hakka, which has 43 lects.

Ho Po Hak, Bangka Island, Nanjing Qujiang, Jiexi, Hong Kong, Mengshan Xihe, Zhaoan Xiuzhuan, Fuan, Fuding, and Haifeng are unclassified.

There are 12 major Hakka varieties and 210 Hakka varieties altogether. Others claim that there are over 1,000 Hakka varieties spoken in China. There are 30 million speakers of the various Hakka languages.


Xiang is already recognized as a separate language.

Shuangfeng Xiang and Changsha Xiang are separate languages, having only 47% intelligibility (Cheng 1997).
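The intelligibility figures quoted in this piece can be checked against the 90% cutoff with a few lines of Python. This is only a rough sketch: the `classify` function and the idea of treating the cutoff as a hard line are my own illustration, and the percentages are simply the ones quoted in the text.

```python
# Cutoff used in this piece: anything below 90% intelligibility
# is treated as a separate language.
THRESHOLD = 90

# Percent intelligibility figures quoted in the text.
PAIRS = {
    ("Shuangfeng Xiang", "Changsha Xiang"): 47,  # Cheng 1997
    ("Suzhou Wu", "Wenzhou Wu"): 43,             # Cheng 1997
    ("Kun Dian Hakka", "Sabah Hakka"): 80,
}

def classify(percent, threshold=THRESHOLD):
    """Label a pair of lects by comparing its intelligibility to the cutoff."""
    if percent < threshold:
        return "separate languages"
    return "dialects of one language"

for (a, b), pct in PAIRS.items():
    print(f"{a} / {b}: {pct}% -> {classify(pct)}")
```

All three pairs fall under the cutoff, so each comes out as a pair of separate languages.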

In fact, Changsha is divided into multiple languages within the city itself. We do not know how many there are, but we know that they exist. For the moment, we shall just add one variety to Changsha, and divide it into Changsha City Xiang A and Changsha City Xiang B, but there may be more. Furthermore, there are significant differences between the Changsha spoken in Changsha City and that spoken in the surrounding countryside.

Shuangfeng is also very different within itself, as the vocabulary changes every 10 miles or so. Intelligibility data is lacking.

Lingshuijiang Xiang, also spoken in Hunan by 300,000 people, may well be a separate language.

Shuangfeng and Lingshuijiang are both part of the Luoshao group of Xiang. Shuihui Xiang and Suantang Xiang are also part of this group. However, Shuihui is so different that it is recommended to split it from Luoshao into its own group with Suantang Xiang. Suantang itself is very different. It has Southwest Mandarin and Xiang elements along with Hmong and Dong influences.

Suantang is so different that it is controversial whether it is Southwestern Mandarin or Xiang, but the best analysis seems to be that it is a Xiang variety. Clearly Shuihui Xiang and Suantang Xiang are separate languages.

Mao Zedong spoke Xiangtan Xiang, a notoriously difficult Xiang language in Hunan, about which it was said, “No one can understand it.” Xiangtan itself is internally diverse, with differences between the dialects of the city and rural areas, but intelligibility data is lacking.

Shaoshan Xiang and Lianyuan Xiang are both spoken near Xiangtan, and both are surely separate languages. There are a number of dialects within each of these languages.

Ningxiang Xiang is said to be very different from Changsha. Given the dramatic divergence present even as background in Xiang, this must mean that Ningxiang is at least not intelligible with Changsha.

Ningxiang County is split into two separate dialects, North Ningxiang Xiang and South Ningxiang Xiang. The differences between the two are great. Upper Ningxiang Xiang looks more like a Lianyuan dialect, and Lower Ningxiang Xiang looks more like a Changsha dialect.

Beyond that, Ningxiang is split into four major divisions – Chengguan Xiang, Shuangjiangkou Xiang, Huaminglou Xiang, and Liushahe Xiang. Surely each is a separate language.

Baishi Xiang, spoken near Xiangtan, is very different.

Liling Xiang is also spoken around Xiangtan and must be a separate language.

Hengyang Xiang is apparently a separate language, as is Jishou Xiang. There is significant dialectal diversity in Hengyang Xiang, but intelligibility data is lacking.

Shaodong Xiang is spoken in Shaodong County which borders Hengyang. There are transitional dialects between the two languages on the border of the two counties.

Liuyang Xiang is a separate Xiang language, actually a macrolanguage, spoken in Liuyang county-level city in Changsha prefecture east of Changsha City near the Jiangxi border in Hunan. Liuyang is split into five divisions – North Liuyang Xiang, South Liuyang Xiang, West Liuyang Xiang, East Liuyang Xiang, and Liuyang City Xiang.

South Liuyang Xiang and East Liuyang Xiang are separate languages, mutually unintelligible with the others. Liuyang City Xiang has recently arisen as a sort of a Liuyang koine that is understandable to speakers of all Liuyang lects. None of the three Liuyang languages is intelligible with Changsha. On closer observation, none of the Liuyang varieties are intelligible with each other. Therefore, North Liuyang Xiang and West Liuyang Xiang are separate languages also.

Even within this classification, each of the five Liuyang Xiang varieties has multiple dialects. Each village is said to have its own variety in Liuyang Xiang.

Hengshan Xiang is a macrolanguage with vast dialectal divergence, divided by Mount Hengshan.

There are two Hengshan varieties on either side of the mountain – Qianshan Xiang in the southeast and Houshan Xiang in the northwest – that are very different and must be separate languages.

Jiashanqiang Xiang is a transitional area in the center containing features of both languages. There are 354 villages in the Hengshan Mountain area.

Huayuan Xiang appears to be a separate language.

In the city of Yiyang, Henan Province, three Chinese varieties are spoken. One is a Yiyang Changyi Xiang variety, another is a Yiyang Luoshao Xiang variety, and a third is Luoyang Southwest Mandarin, a dialect of Henan Mandarin, described above. All appear to be separate languages.

We will call the two Xiang varieties Yiyang Changyi Xiang and Yiyang Luoshao Xiang.

Huangxu Xiang, a Xiang dialect island in the Southwestern Mandarin-speaking city of Deyang in Sichuan, is very different from the rest of Xiang and must surely be a separate language.

Quanzhou Xiang in Guangxi is another Xiang dialect island. It has extreme differences with Hunan dialects like Shuangfeng.

According to good sources, there is a tremendous amount of diversity among the varieties of Western Hunan. Most of it probably involves Xiang lects, and most or all of these varieties are not mutually intelligible. But until we get more data, we cannot carve any languages out of this mess.

Shuangfeng, Shuihui, Suantang and Lingshuijiang are members of the Luoshao Group of Xiang, which has 21 lects.

Changsha City A, Changsha City B, Changsha Rural, Hengyang, Shaodong, Xiangtan, Shaoshan, Baishi, Liling, Lianyuan, Qianshan, Houshan, Jiashanqiang, Ningxiang, Chengguan, Shuangjiangkou, Huaminglou, Liushahe, North Liuyang, South Liuyang, East Liuyang, West Liuyang, and Liuyang City are members of the Changyi Group of Xiang, which has 32 lects.

Jishou and Huayuan are members of the Jixu Group of Xiang, which has eight lects.

Xiang is composed of 74 lects. Many or possibly all of them are separate languages. The various languages of Xiang have 50 million speakers (Olson 1998).


Wu is a major group of diverse Chinese languages that is often divided into Northern Wu and Southern Wu. Southern Wu has 18 million speakers. My opinion is that in general, the Wu varieties are mostly separate languages; however, some are merely dialects of other Wu lects.

A good general rule for Zhejiang Wu varieties is that you can sort of understand the variety of the next city over, but the language of two cities away is incomprehensible. For instance, in the Taizhou Prefecture region, there are between four and five mutually unintelligible Wu varieties across a 12 mile area. In Zhejiang, the mountains go all the way down to the sea, so there are few flat areas where language can spread out and become mutually comprehensible.

Huzhou Wu, Jiaxing Wu, and Kunshan Wu are separate languages.

Although the Suzhou City administrative area is large, Suzhou Wu language is spoken only in the city proper and its suburbs. Suzhou City dwellers say that people in the suburbs have a rural or “hard” accent, while the speech of Suzhou City is called “soft.” Suzhou is presently divided into two sets of speakers, one over 50 and another under 50. Differences between age groups in Suzhou were noted as early as the 1930’s. Suzhou Wu is still very widely spoken in the area.

Suzhou is 70% similar to Shanghaihua. That is not enough for full intelligibility. Shanghainese find Suzhou to be incomprehensible. The differences between Suzhou and Shanghainese are much greater than between suburban Shanghai languages. A Shanghainese speaker would need a few months in Suzhou to learn Suzhou. This is about the same as the difference between Castilian-Catalan and Castilian-Asturian.

Suzhou is more complex phonologically and tone-wise than Shanghainese, so it is harder to learn. Even native Suzhou speakers have problems with the tones sometimes. Further, tone sandhi in Suzhou is quite complex.

Zhangjiagang Wu may be intelligible with Suzhou, but data is lacking. Suzhou is only 43% intelligible with Wenzhou (Cheng 1997). None of these varieties is intelligible with Shanghainese.

Wuxi Wu is spoken in the city of Wuxi. Wuxi is spoken in two areas, referred to as East and West Mountain. East Mountain refers to the city of Dongshan, and West Mountain refers to the city of Wuxi. Wuxi is not intelligible with Changzhou or Suzhou. Wuxi is only 20% similar to Shanghainese. Wuxi speakers can understand Shanghainese, but that is no doubt due to bilingual learning. Shanghainese do not understand Wuxi well.

Changzhou Wu is not intelligible with Shanghainese, Wuxi or Suzhou. Changzhou and Wuxi have high but not full intelligibility. Changzhou and Wuxi are part of a dialect chain in which eastern Changzhou speakers can communicate with eastern Wuxi speakers, but as one moves further west into Wuxi or east into Changzhou, intelligibility drops off. It is best then to split Wuxi and Changzhou into separate languages.

Changzhou itself has considerable dialectal divergence, though apparently all dialects are mutually intelligible.

Changzhou is the most orthodox Taihu language. It has eight tones, and compared to Suzhou, it has many more sounds and a lot more traditional vocabulary.

Changzhou has 3 million speakers.

Ningbo Wu is close to Shanghainese, and Ningbo speakers can learn Shanghainese in ~two months. This is because many Ningbo speakers moved to Shanghai in the past 100 years and Ningbo became a prestige language in Shanghai in the first part of the 20th Century, so Shanghainese has a lot of Ningbo influence in it.

Speakers of many of the local Wu varieties around Shanghai say that they can understand Shanghainese well but not the other way around.

The reason for this is complex. About 100 years ago, Suzhou became a very prestigious language in Shanghai and was widely spoken there. However, in the past century, many immigrants came to Shanghai from other parts of China. In particular, many speakers of Ningbo came to Shanghai. Ningbo is quite a bit different from either Shanghainese or Suzhou.

With speakers of Ningbo, Suzhou and Shanghainese all present in the city in large numbers, a koine needed to develop. Shanghainese was chosen as the koine, and because speakers of three different languages were communicating, Shanghainese got dramatically simplified phonologically in order for it to be better understood by everyone.

Hence, Shanghainese has evolved into a highly simplified form of Taihu. This is why many speakers of nearby Wu languages say that they can understand Shanghainese but not the other way around.

Several varieties are spoken in the suburbs of Shanghai. Reports vary, but Shanghai residents generally report that these varieties are not mutually intelligible with Shanghainese (Gilliland 2006).

Some of these languages are Baoshan Wu, Fengxian Wu, Nanhui Wu, Jiading Wu, Jinshan Wu, Pudong or Chuanshan Wu, Songjiang Wu, and Qingpu Wu.

Pudong Wu, the older form of the Shanghai language, is still spoken in the Pudong District of the city, but it is dying out. There is a question of whether or not it is mutually intelligible with Shanghainese, but Shanghainese speakers seem to feel it is not mutually intelligible (Gilliland 2006).

These Shanghai suburbs varieties above are probably not fully mutually intelligible. For instance, Fengxian is not fully intelligible with Jiading. Intelligibility between the two may be ~70%, but it only takes a few weeks’ exposure for a Fengxian speaker to learn Jiading Wu.

Qidong Wu, spoken in the city of Qidong, is a separate language. Qidong is said to be very close to Chongming Wu, so for the time being, we will list Chongming as a dialect of Qidong. Chongming, spoken on Chongming Island in the suburbs of Shanghai, is not intelligible with Shanghainese.

These varieties spoken in the suburbs of Shanghai are closer to the Old Shanghainese, which is quite a bit different from the New Shanghainese spoken in the city center nowadays.

Changyinsha Wu is very similar to Chongming and Qidong, so it is probably a dialect of Qidong also. Another name for Qidong is Qihai, which refers to the speech of Qidong, Haimen and Tongzhou. For the time being, we will list Changyinsha and Chongming as dialects of Qidong. Chongming, and hence Qidong, are not intelligible with Shanghainese.

Nanjing Wu is a separate language. It is close to Shanghainese Wu but is not fully intelligible with it.

However, there are two varieties spoken in Haimen, and they are not mutually intelligible. Haimen Wu A and Haimen Wu B are then two separate languages.

Wuhu Wu is a separate language, unintelligible with Shanghaihua.

Hangzhou Wu is reportedly much different from the varieties of Shanghainese, Ningbo, etc. to the northeast and is not intelligible with Shanghainese, nor with Suzhou. Hangzhou has 1.2 million speakers. Nevertheless, Hangzhou appears to be dying out in Hangzhou City, as only older people seem to speak the language anymore. Hangzhou is 40% similar to Shanghainese.

Yixing Wu, near Changzhou, is not intelligible with Shanghainese.

Tongxiang Wu also appears to be a separate language, as do Yuyao Wu and Zhoushan Wu.

Lvsi, Qisi or Tongdong Wu, spoken in the nearby town of Qisi, is a separate language from Qidong.

Jiangyin Wu is spoken in Jiangyin city. It is related to Changzhou and has high intelligibility with Changzhou and Wuxi. It has some definite differences with Suzhou. Nevertheless it appears to be a separate language because it cannot be understood outside the city. Many older people still speak only Jiangyin.

Jinxiang also has its own Wu variety, Jinxiang Wu, with Mandarin influences. This is a Taihu (Northern Wu) outlier spoken far to the south of the Taihu region.

Wenzhou Wu or Oujiang Wu is a macrolanguage, as it is made up of at least 14 separate languages. It is not understood outside of Wenzhou and it is not even intelligible within itself.

The standard version is spoken in Lucheng District by 1 million people and can be referred to as Lucheng Wu. Ouhai Wu, Yongjia Wu and Ruian Wu are said to be dialects of Wenzhou Wu, but Ouhai, spoken in the Ouhai District, is not intelligible with Ruian. Ruian is spoken by 1 million people in the city of Ru’ian, and is related to Pingyang Wu spoken in Pingcheng County.

Yongjia, spoken in Yongjia County, is separate too, since if you go five miles in any direction in Wenzhou, there’s a new dialect, and it’s hard to understand people.

Northern Yueqing Wu is a separate language within Wenzhou. Its speakers are separated from the rest of Yueqing city by Yangdang Mountain. Wenzhou is 43% intelligible with Suzhou. Indeed, Wenzhou, instead of being a single language, is a family of partially mutually unintelligible lects. See more evidence for that here.

Wujiang Wu is a separate language within Wenzhou that has come under serious influence of Luoyang Wu.

Wenxi Wu is a separate language within Wenzhou. It is spoken in one town in Qingtian County.

Wencheng Wu, spoken in Wencheng County, is a separate language within Wenzhou.

Chu River Wu is a closely related separate language from Wencheng spoken in Luoyang County in Zhejiang.

Since there are 11 different cities and counties in Wenzhou, and the language changes every five miles or so, it would be logical to assume that there are 11 separate languages within Wenzhou. However, closer analysis reveals at least 14 languages within Wenzhou.

So we should then split off at least one Wenzhou language for each major division. This gives us Cangnan Wu spoken in that county and Longwan Wu and Dongtu Wu spoken in those two districts. Although aberrant Wu varieties probably not a part of Wenzhou are spoken in Taishun and Cangnan, varieties of Wenzhou are also spoken there, so it makes sense to split those two off.

In addition, in Taishun County, there is an aberrant Wu variety spoken in the town of Luoyang influenced by both Manjiang Eastern Min and Oujiang Wu. We can call this Luoyang Wu. This is best seen as the southern extension of Yesou Wu. Liqu Wu is another Luoyang variety spoken in the area.

There is another Wu variety similar to Manjiang Eastern Min spoken in the town of Hedi in Qingyuan County in Lishui. We will call this Hedi Wu. In all probability, it is a separate language.

Manhua Wu, a macrolanguage, is quite different. It is spoken around Cangnan and Wuzhou City in Northern Zhejiang on the southern coast of Wuzhou City in about five townships. The word man literally means “barbarians.”

There is a controversy over whether Manhua is Macro-Min or Macro-Wu. It is probably Macro-Wu based on phonology, and it also shares some Min-like traits with other Wu varieties such as those in the Chuqu group. Some think it originated in a Southern Min variety that came under the influence of a non-Sinitic language. Word order is completely different from Chinese word order. However, the word order is changing under the influence of Mandarin, and many younger people are using a more Mandarin word order.

Some theories hold that it has Proto-Vietnamese, Austronesian, and She influences. The major components seem to be Old Cantonese, Old Chinese, and Mandarin. Some also suggest Northern Min, Eastern Min, Southern Min and especially Wu influences. It has 200,000-400,000 speakers.

Within Manhua Wu, there is a northern group spoken in the town of Yishan and a southern group spoken in the towns of Qianku (Qianku Manhua Wu) and Jinxiang (Jinxiang Manhua Wu). Qianku Manhua Wu is the standard for Manhua Wu. Although the internal differences in Manhua Wu are not great, Jinxiang Manhua Wu and Qianku Manhua Wu are not mutually intelligible. Manhua Wu is also very heavily spoken in the city of Lengkang.

All of the above are in the Taihu Group of Wu.

Taizhou Wu is a major split in Wu. It is centered around the city of Taizhou in Eastern Zhejiang and is composed of many known separate lects, all of which are separate languages, including Huangyan Wu, Jiaojiang Wu, Linhai Wu, Sanmen Wu, Tiantai Wu, Wenling Wu, Xianju Wu, Luquiao Wu, Ninghai Wu, Xiaoshan Wu, and Yuhuan Wu.

All in all, there are said to be 4-5 mutually unintelligible Wu varieties spoken in Taizhou City’s metropolitan area alone. Therefore, we will list Taizhou Wu A, Taizhou Wu B, Taizhou Wu C, Taizhou Wu D, and Taizhou Wu E. This is a region that is only 12 miles across.

Speakers of Jiaojiang Wu and Huangyan Wu cannot understand Linhai Wu. The area has split into so many mutually unintelligible languages mostly due to terrain.

For instance, Taizhou and Huangyan are only a 10 minute bus ride away from each other, but the highway was only built recently, and there is a huge mountain between the two cities. Taizhou and Jiaojiang are only another 10 minute bus ride apart, but there is a huge river separating them and it could be crossed only by boat until a ferry was built in the 1990’s.

Linhai is only 20 minutes away from Taizhou now that a new expressway was recently built that involved blasting through a few mountains that previously had separated the cities.

There are two groups of Southern Wu which are both highly divergent and have very low mutual intelligibility internally. These groups are Wuzhou Wu and Chuqu Wu.

Wuzhou Wu is another major split in Wu.

Wuzhou Wu consists of at least 30 languages: Jinhua Wu, Jinhua Xiaohuang Wu, Tangxi Wu, Lanxi Wu, Pujiang Wu, Yiwu Wu A-R, Dongyang Wu, Pan’an Wu, Yongkang Wu, Wuyi Wu, Quzhou Wu, Longyou Wu and Jinyun Wu.

It is also highly divergent, much more so than even Taihu Wu. A single subgroup of Wuzhou Wu, Yiwu Wu, contains 18 separate languages, all mutually unintelligible. We will call them Yiwu Wu A, Yiwu Wu B, Yiwu Wu C, Yiwu Wu D, Yiwu Wu E, Yiwu Wu F, Yiwu Wu G, Yiwu Wu H, Yiwu Wu I, Yiwu Wu J, Yiwu Wu K, Yiwu Wu L, Yiwu Wu M, Yiwu Wu N, Yiwu Wu O, Yiwu Wu P, Yiwu Wu Q and Yiwu Wu R for the time being.

Lanxi Wu has 660,000 speakers (Rickard 2006).

Chuqu Wu is split into two subgroups, Chuzhou Wu and Longqu Wu. It contains at least 22 languages. Some members of this group extend south beyond Zhejiang into Northeastern Jiangxi and Northern Fujian. We are going to cautiously classify almost all of Chuqu Wu as separate languages, since it is much more divergent and much less mutually intelligible than Taihu Wu, and Taihu Wu itself has low internal intelligibility.

Chuzhou Wu consists of Qingyuan Wu, Jingning Wu, Jinyun Wu, Lishui Wu, and Taishun Wu, all separate languages.

Longqu Wu consists of Pucheng Wu, Shangrao City Wu, Shangrao County Wu, Guangfeng Wu, Yushan Wu, Kaihua Wu, Changshan Wu, Jiangshan Wu, Suichang Wu, Songyang Wu, Xuanping Wu, Qingtian Wu, Yunhe Wu, Longyou Wu, Quzhou Wu and Longquan Wu, all separate languages.

Pucheng Wu has two dialects, Nampo Wu and North Dabei Wu. Intelligibility data is not known. Pucheng Wu is so diverse that some say it is a language isolate and is not even a part of Wu (Norman 1988).

Taihu Wu contains seven subgroups.

Jiaxing, Shanghainese, Baoshan, Fengxian, Nanhui, Jiading, Jinsha, Qingpu, Pudong, Suzhou, Wuxi, Songjiang, Tongxiang, Qidong, Chongming, Changyinsha, Lvsi, Yunhe, Kunshan, and 11 others are all in the Hujia Group of Taihu Wu. Hujia Wu contains 32 lects, most of which are separate languages.

Changzhou, Yixing, Jiangyin, the Haimens, and seven others are in the Piling Group of Taihu Wu, which has 12 lects. Piling Wu has 8 million speakers.

Wenzhou, Ouhai, Yongjia, Ruian, Wencheng, and seven others are in the Oujiang Group of Taihu Wu, which contains 14 separate languages.

Hangzhou has its own group, the Hangzhou Group of Taihu Wu.

Shaoxing, Fuyang, Xiaoshan, Linan, Yuyao, Zhuji, and six others are in the Linshao Group of Taihu Wu, which contains 12 lects.

Fenghua, Zhoushan, and nine others are in the Yongjiang Group of Taihu Wu. Yongjiang Wu contains 11 lects and has 4 million speakers (Olson 1998).

Changxing and four others are in the Tiaoxi Group of Taihu Wu, which has five lects.

Taihu Wu is composed of 85 separate lects, most of which are separate languages. Taihu Wu has 47 million speakers.

The Taizhous, Huangyan, Jiaojiang, Sanmen, Tiantai, Wenling, Xianju, Leping, and Yuhuan are members of the Taizhou Group of Wu, which has 13 lects, all separate languages.

The Yiwus, Dongyang, Jinhua, Jinhua Xiaohuang, Lanxi, Tangxi, Wuyi, Pan’an, Pujiang, and Yongkang are all members of the Wuzhou Group of Wu, which contains 27 lects, almost all of which are separate languages. Wuzhou Wu has 4 million speakers (Olson 1998).

Chuqu Wu has two subgroups, Chuzhou Wu and Longqu Wu.

Lishui, Qingyuan, Jingning, Jinyun, Taishun, and four others are in the Chuzhou group of Chuqu Wu, which contains nine languages. Chuzhou Wu has 1.5 million speakers.

Pucheng, Shangrao County, Shangrao City, Jiangshan, Songyang, Guangfeng, Longquan, Kaihua, Changshan, Suichang, Longyou, Yushan, Quzhou, and one other are members of the Longqu Group of Chuqu Wu, which has 14 languages and 5 million speakers (Olson 1998).

Chuqu Wu contains 24 separate lects, almost all separate languages.

Nanjing Wu is unclassified.

There are at least 216 varieties within Wu. Some say that there are hundreds of mutually unintelligible languages inside of Wu alone.

The various Wu varieties have 85 million speakers (Olson 1998).


Hui or Huizhou is a major group of many different languages with wide internal variation. There is a possibility that all Hui varieties are separate languages. Hui is spoken in the historical area of Huizhou, located mostly in Southern Anhui but also partly in Zhejiang and Jiangxi. The area is very mountainous, leading to strong differentiation among the lects. Every county in the area has its own Hui version unintelligible to outsiders.

Xidi Hui, spoken in a village at the foot of Huangshan Mountain in Anhui, is a separate language. Xidi is unintelligible even to villages a few miles away.

Tunxi Hui, Wuyuan Hui and Xiuning Hui are separate languages. The first is spoken in Anhui, but Wuyuan and Xiuning are spoken in Jiangxi Province.

Within the Jingzhan Group of Hui, Jingde Hui, Ningguo Hui, Chilingkou Hui (spoken in Chiling, Qimen County), Meixi Xiang Hui, and Shitai Hui are separate languages.

Within Qimen County itself, there are six different Hui lects with low intelligibility between them. It is quite possible that we are talking about six different languages here. One of them appears to be Chilingkou above. The others we will just call Qimen Hui A, Qimen Hui B, Qimen Hui C, Qimen Hui D and Qimen Hui E.

All except Meixi Xiang Hui are spoken in Anhui Province. Meixi Xiang Hui is spoken in Meixi, Jiangxi.

Jixi Hui and Hongmen Hui are separate languages.

Within the Shexian Group of Hui, there are two different languages that we will only call Shexian Hui A and Shexian Hui B for now. Jixi and the Shexian languages are spoken in Anhui.

Dexing Hui and Dongzhi Hui are separate languages, the first spoken in Jiangxi and the second in Anhui.

In the Yangzhou Group of Hui, Jiande Hui and Chunan Hui are separate languages. Chunan is spoken in Jiangxi. There are two other varieties in the group, Suian Hui and Shouchang Hui. Suian and Chunan are very diverse and are in all probability separate languages. Shouchang is also extremely diverse, and Jiande has some differences with Shouchang.

The Yangzhou languages are interesting because there is controversy whether they are Wu or Hui languages. Careful examination reveals that they cannot be subsumed under Southern Wu due to their great divergence from it, despite having some similarities with Wu. Some authors feel that they are Hui-Wu merged lects, and their similarity with both is given as a reason for merging Wu and Hui into a supergroup.

While it is best to classify them as Hui, they are much different from most Hui lects. All are spoken in western Zhejiang.

Jiande, Chunan, Suian and Shouchang are members of the Yangzhou Group of Hui. Yangzhou Hui has four lects, all separate languages.

Huangshan, Tunxi, Wuyuan, Xiuning, and two others are members of the Xiuyi Group of Hui, which has six lects.

Meixi Xiang, the Qimens, Chilingkou, Jingde, Ningguo, Shitai, and two others are members of the Jingzhan Group of Hui. Jingzhan Hui has 12 lects.

Jixi, Huizhou, Hongmen, the Shexians, and She are members of the Jishe Group of Hui. Jishe Hui has six lects, all separate languages.

Dexing, Dongzhi, Fuliang, and two others are members of the Qide Group of Hui. Qide Hui has five lects.

Xidi is unclassified.

There are 37 different Hui lects, at least 24 of which are separate languages. The various Hui languages have 3.2 million speakers.


Cantonese is a major language group spoken in the south of China. Cantonese speakers are said to be a mix between the Yue people and the Han. They take great pride in their speech, which is closer to ancient Chinese than Mandarin is.

Some Cantonese activists denounce Mandarin as a pidgin language spoken by Manchu and Mongol invaders glommed onto the Chinese of the people they conquered.

Various methods are used to determine intelligibility between lects. They vary in efficacy, as the following shows.

Attempts to determine intelligibility through the use of complex lexical, tonal, grammatical and phonological formulae produce results that are excessively high in terms of percentage of intelligibility.

A better method is presented in Szeto 2000, in which sentences in other varieties, say Varieties B and C, are played to speakers of Variety A, and speakers of Variety A are asked to give the basic meaning of the Variety B and C sentences played to them. A sentence is recorded as correct if the basic meaning was ascertained.
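The scoring step of this informant-based method can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Szeto's actual procedure: the `Response` record, the `intelligibility` function, and the sample responses are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One listener judgment (hypothetical record layout)."""
    variety: str      # variety the played sentence was recorded in, e.g. "B"
    understood: bool  # True if the listener gave the basic meaning


def intelligibility(responses):
    """Percent of sentences understood, per tested variety."""
    totals, correct = {}, {}
    for r in responses:
        totals[r.variety] = totals.get(r.variety, 0) + 1
        correct[r.variety] = correct.get(r.variety, 0) + int(r.understood)
    return {v: 100.0 * correct[v] / totals[v] for v in totals}


# Invented responses from Variety A listeners, for illustration only.
sample = [
    Response("B", True), Response("B", False),
    Response("B", True), Response("B", True),
    Response("C", False), Response("C", False),
    Response("C", True), Response("C", False),
]
scores = intelligibility(sample)
print(scores)
```

A sentence counts as correct only when the basic meaning was recovered, so the score is a simple proportion of understood sentences for each variety.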

By this better method, Standard Cantonese has only 31.3% intelligibility of Siyi, 7.2% of Hakka, 2.7% of Teochew and 2.5% of Xiamen (Szeto 2000). This paper also highlights the very important role morphological and syntactic differences play in intelligibility, even apart from phonology and other factors.

In contrast, the formula-based method, which relies on complex lexical, tonal, grammatical and phonological formulae rather than on actual informants, gives false positives. By this method, Cantonese has 54.7% intelligibility of Hakka, 47.45% of Teochew, and 43.5% of Hokkien. Compared with the informant figures above, this method overestimates the intelligibility of Hakka by 7.6X, of Teochew by 17.6X and of Hokkien by 17.4X.
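The size of the overestimate is just the formula-based percentage divided by the informant-based percentage. A minimal Python sketch, taking the figures exactly as quoted in the text and assuming that the Xiamen figure from the informant test is the one to pair with the Hokkien figure from the formula method (the dictionary and function names are invented for the example):

```python
# Informant-based intelligibility of each variety for Standard Cantonese
# speakers, as quoted in the text (Szeto 2000), in percent.
informant_pct = {"Hakka": 7.2, "Teochew": 2.7, "Hokkien": 2.5}

# Formula-based figures as quoted in the text, in percent.
# Assumption: "Hokkien" here corresponds to "Xiamen" in the informant test.
formula_pct = {"Hakka": 54.7, "Teochew": 47.45, "Hokkien": 43.5}


def overestimation(formula, informant):
    """How many times higher the formula figure is than the informant figure."""
    return {k: round(formula[k] / informant[k], 1) for k in informant}


factors = overestimation(formula_pct, informant_pct)
for variety, factor in factors.items():
    print(f"{variety}: {factor}X")
```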

Standard Cantonese is traditionally said to have nine tones, but phonemically there are only six tones, since the last three are just three of the first six with a voiceless stop consonant on the end.

These are often called entering tones in traditional Chinese scholarship. Entering tones disappeared from most Mandarin varieties about 800 years ago due to the influence of invading Mongols speaking Turkic languages but are still present in Cantonese, Hakka and Min.

The original entering tones of Middle Chinese have merged into other tones or into Mandarin’s four tones. Traditional Chinese tones or contour tones end in a vowel or a nasal. However, in Standard Cantonese, the entering tone has retained its original short and sharp character from Middle Chinese, so in a sense, it has a different sound quality.

One of the most well-known divisions in Cantonese is Yuehai. Yuehai contains four divisions: Guangfu, Sanyi, Zhongshan, and Guangbao.

The other major divisions of Cantonese are Goulou and Yongxun, found in the watershed of the Pearl River, and Siyi, Gaoyang, Wuhua and Qinlian.

The Guangfu division of Yuehai consists of Guangzhou Cantonese, Xiguan Guangzhou Cantonese, Sabah, Hong Kong Cantonese, Macao Cantonese, Wenzhou Cantonese, Wuzhou Cantonese, Huizhou Cantonese, Nishimura Cantonese, Dongshan Cantonese and Xiguan Cantonese.

Standard or Guangzhou Cantonese is based on the Guangzhou dialect spoken in the city of that name.

A very pure form of Cantonese is spoken in Sabah in Malaysia as Sabah Cantonese. It resembles Standard Cantonese so much that the speaker community is called Little Hong Kong.

Hong Kong Cantonese is spoken in Hong Kong. There are a few differences with Guangzhou but not enough to impair communication.

Macao Cantonese is spoken in Macao.

Xiguan Cantonese is spoken in the suburban areas of Guangzhou. It has a few differences with Guangzhou but presumably not enough to impair communication. It is spoken mostly by older people now, as young people speak Xiguan Guangzhou Cantonese, which is more properly part of Guangzhou. The dialect is dying out.

Dialects spoken in Guangzhou City include Nishimura Cantonese, Dongshan Cantonese, and some others. Dongshan is spoken in the downtown area. Nishimura is spoken by a few old people in the Nishimura zone, but it is going extinct.

Wenzhou Cantonese is very close to Guangzhou.

Huizhou Cantonese is a Cantonese variety spoken in Huizhou City, to the east of Guangzhou, to the northeast of Dongguan, and to the west of Shanwei. This is part of the Pearl River Delta. Huizhou has very heavy Hakka influence such that it is probably a separate language.

Vietnamese Cantonese is quite different from Standard Cantonese, but it is said to be nevertheless intelligible with it. However, other Standard Cantonese speakers say they cannot understand Vietnamese Cantonese very well.

Malayland Cantonese is also quite different from Standard Cantonese. Cantonese speakers who talk to Malayland speakers say that Malayland sounds like a foreign language. Therefore, Malayland appears to be a separate language. Malayland is mostly spoken in Kuala Lumpur and Ipoh, less so in Singapore. There are dialects inside of Malayland such as Kuala Lumpur Cantonese and Ipoh Cantonese.

Cantonese is the most commonly spoken Chinese language around Kuala Lumpur. Although Singapore South Malayland Hokkien is the most widely popular non-Mandarin Chinese language in Singapore, Cantonese is the most commonly spoken language in Chinatown.

The Sanyi Group of Cantonese consists of Shunde Cantonese, Panyu Cantonese, Nanhai Cantonese, Xiquiao Cantonese, Foshan Cantonese, Shiwan Cantonese, Shatin Cantonese, and Jiujiang Cantonese.

Around Foshan, Xiquiao Cantonese, Jiujiang Cantonese, Shiwan Cantonese, and Nanhai Cantonese are all spoken.

Foshan and Nanhai are close to Standard Cantonese and may be intelligible with it. Nanhai and Shunde Cantonese are mutually intelligible. Foshan, Xiquiao, and Jiujiang are quite similar to Shunde.

Panyu Cantonese is definitely a separate language (Chan 1981). Panyu Cantonese is spoken in Xiaolan and Huangpu in the Zhongshan area.

Shunde Cantonese is almost the same language as Panyu, so if Panyu is a separate language, then Shunde is also. Shunde and Panyu may well be a single language, and if Nanhai is intelligible with Shunde, then Nanhai is also a part of this language. Shunde is spoken in Daliang, Longjiang, Ronggui and Beijiao.

There is at least one separate language inside of Sanyi centered around Shunde, Panyu, and Nanhai, which together are known as the Three Counties Area.

The Zhongshan Group of Cantonese, spoken in Guangdong and composed of Shiqi Cantonese and Sanjiao Cantonese, is a separate language. Speakers of Standard Cantonese cannot necessarily understand Shiqi, but Shiqi people can understand Standard Cantonese. Shiqi is spoken in the urban part of Zhongshan City. Whether Shiqi and Sanjiao Cantonese are mutually intelligible is not known. It is best to call this language Shiqi Cantonese for now.

The Guangbao Group of Cantonese is spoken east of the Pearl River Delta in Shenzhen, Dongguan and Hong Kong. Within Guangbao are three major divisions, Dongguan Cantonese, Bao’an Cantonese, and Dapeng Cantonese.

Dongguan Cantonese is not intelligible with Standard Cantonese. It is spoken in Dongguan City. A lot of young people are forgetting how to speak it under the influence of Standard Cantonese.

Dongguan is divided into Guangcheng Cantonese, Houjie Cantonese, and Humen Cantonese. Guangcheng is spoken in the Guangcheng subdistrict. Humen is spoken in Humen Township on the east side of the Pearl River. Houjie is spoken in Houjie Township to the north of Humen.

Bao’an Cantonese is divided into Danija Cantonese, Weitou Cantonese, Gashiau Cantonese and Nantou Cantonese.

Danija Cantonese is the Cantonese variety spoken by the Tanka fisherpeople who live on boats off the coast of Guangdong, Guangxi, and Zhejiang. The Tanka People also live in Fujian and Hainan. In Fujian, they speak Fuzhou Northern Min. In Hainan, they speak some form of Hainanese Min.

Another group of Tankas in Hong Kong in Aberdeen and Taio to the north of the Hokkien-speaking area are former Hakka and Hokkien speakers who speak Weitou Cantonese, a Cantonese variety close to Standard and Dongguan but closer to Dongguan. It is not intelligible with Hong Kong Hakka.

Weitou is spoken mostly by older people in Hong Kong’s New Territories in walled villages in Yuen Long, Kam Tin, Songgang, Pinghu, Ping Shan, Shantin, Sheung Shui, Tai Tau Leng, Yan Gang, Fanling, Fanling Po Tsuen, Lam Tsuen, Taipo, and Tam Chung Tsuen, in the Bao’an District, in Shenzhen in Shangsha, Xiasha, Huanggang, Xinzhou, Fukuda, Gangxia, and Akao, in the Longgang District, in parts of Nantou, and in the Nanshan District.

Nantou Cantonese is spoken in the Namtam area of Nantou by 5,000 people. Intelligibility with the rest of Bao’an is not known.

In Hong Kong, Gashiau Cantonese is spoken by a group of fisherpeople related to the Tanka. This language is related to Danija/Weitou but is not intelligible with it.

Dapeng Cantonese is spoken on the Dapeng Peninsula in the city of Dapeng, in Hong Kong and Shenzhen, in Tung Ping Chau on the Ping Islands in Hong Kong, and in Tai Kok. It has been very heavily influenced by Hakka. It is so different that it must be a separate language. It may be related to or the same thing as the Junhua or Military Language, a mixed language now classified as Mandarin. If so, it is not Cantonese at all, and instead it is a Mandarin lect. In Hong Kong, Tung Ping Chau Dapeng is highly endangered.

The Siyi or Sze Yup Group of Cantonese is a huge group of Cantonese lects spoken in the Pearl River Delta. Siyi Cantonese is the language of the Four Counties: Enping, Kaiping, Taishan and Xinhui. Researchers have found 664 different Cantonese dialects in the Pearl River Delta area alone. 194 of them were quite similar, but another 442 of them were quite different. Since it is mostly Siyi varieties that are spoken in this area, this implies that there may be up to 664 different lects in Siyi alone.

Siyi has very low intelligibility with Standard Cantonese, 10-20%.

150 years ago, there were fewer but still significant differences between Siyi and Sanyi (Standard Cantonese), but Siyi was disparaged as a “hill dialect” of poor farmers, while Sanyi was elevated as the prestige variety of the cultured and cosmopolitan. This is why Sanyi became the Standard Cantonese variety. The Siyi incorporated this negative view into their self-image even to the point where they held overseas meetings in Sanyi.

Taishanese, Hoisonese, Hoisan Cantonese, or Toison Cantonese is spoken north of Macao in Taishan County where there are 20 townships, and there is a different lect in every township. Taishanese is the Standard Siyi dialect. As late as the early 1990’s, children in this area were still being taught in the local Taishanese lect. Taishanese is still widely spoken in Chinatowns in the US such as in San Francisco (especially Stockton Street) and in New York.

The varieties in Taishan County can be quite different. For certain, there are at least three distinct languages within Taishanese besides the standard variety, Taishan Cantonese A, Taishan Cantonese B and Taishan Cantonese C, and these three have a hard time understanding each other.

There are clearly at least 17 dialects within Taishan Proper alone. Each town has its own dialect, and in fact, each village has its own dialect. The main town dialects are Taicheng Cantonese, Dajiang Cantonese, Shuibu Cantonese, Sijiu Cantonese, Baisha Cantonese, Sanhe Cantonese, Chonglou Cantonese, Doushan Cantonese, Duhu Cantonese, Chixi Cantonese, Duanfen Cantonese, Guanghai Cantonese, Haiyan Cantonese, Wencun Cantonese, Shenjing Cantonese, Beidou Cantonese, and Chuandao Cantonese.

Baisha is spoken in Bei Hou.

Speakers of Enping Cantonese, spoken in Enping County, cannot understand some other Siyi lects. Therefore, Enping is a separate language.

Kaiping or Chikan Cantonese, spoken in Kaiping County, is not fully intelligible with Enping until speakers get used to each other’s sounds. Kaiping is so different from Taishanese that it is hard to imagine how they can communicate well, though there is partial intelligibility. There are many different dialects inside of Kaiping alone, and pronunciation varies almost from neighborhood to neighborhood. One dialect is called Gee Cantonese. However, they seem to be mostly mutually intelligible.

In Xinhui, there is a dialect called Hetang Cantonese that is very divergent and has many strange features not found in other Siyi lects. Doubtless it is less than fully intelligible with other Siyi lects.

Xinhui Cantonese is somewhat different from Taishanese but appears to be intelligible with it.

Heshan Cantonese is intelligible with Xinhui and Taishanese.

Siqian Cantonese, Doumen Cantonese and Jiangmen Cantonese are three other Siyi varieties. Intelligibility data for these three lects is not known.

The Yongxun Group of Cantonese consists of Nanning Cantonese, Yongning Cantonese, Guiping Cantonese, Chongzuo Cantonese, Ningmin Cantonese, Hengxian Cantonese, and Baise Cantonese.

Baise Cantonese must be a separate language. It is spoken in the Yongjiang District in Baise City. It is very different, having been influenced heavily by Zhuang speakers.

Conghua or Congzhou Cantonese is spoken in three different dialects in Central Guangdong. Intelligibility data is lacking.

Curiously, Nanning Cantonese is said to be intelligible with Standard Cantonese.

The Goulou Group of Cantonese is separate from all of the rest of Cantonese and is linked with Ping and Tuhua. It is made up of Yulin Cantonese, Baobai Cantonese, Lizhou Cantonese, Guangning Cantonese, Huaiji Cantonese, Fengkai Cantonese, Deqing Cantonese, Shanglin Cantonese, Binyang Cantonese, Yangshan Cantonese, Ertang Cantonese, Shuishan Cantonese, Yunan Cantonese, and Tengxian Cantonese.

Ertang Cantonese, Shuishan Cantonese and Yunan Cantonese are all spoken in Guilin City in Guangxi Province. They are under Ping influence. Ertang and Shuishan arrived in Guangxi 100 years ago from the Yangshan region of Guangdong.

Yulin Cantonese is a representative variety in Goulou Cantonese and is the existing form of Chinese that is closest to Old Chinese.

Baobai Cantonese is spoken in Baobai south of Yulin. Yulin and Baobai are mutually intelligible, but they are not intelligible with the rest of Goulou Cantonese.

Lizhou Cantonese has difficult intelligibility with Standard Cantonese. It is spoken apart from the main group, so it may be a separate language.

Wuzhou Cantonese is a very divergent Cantonese variety spoken in Wuzhou City in Eastern Guangxi that is very hard even for other Cantonese speakers to understand.

The Gaoyang Group of Cantonese is a division of Cantonese that is composed of Gaozhou Cantonese, Yangiang Cantonese, Liangiang Cantonese and Maoming Cantonese.

Maoming Cantonese is an extremely diverse Cantonese variety that must be a separate language. Intelligibility of Maoming Cantonese with Yangiang Cantonese, Liangiang Cantonese and Gaozhou Cantonese is not known.

The Wuhua Group of Cantonese consists of Huazhou Cantonese, Zhanjiang Cantonese, Maihua Cantonese and Wuchuan Cantonese.

Huazhou Cantonese, spoken next door to Maoming, also cannot be understood by Standard Cantonese speakers.

Zhanjiang Cantonese is utterly unintelligible with Standard Cantonese. They speak Zhanjiang Min in this area, and the Cantonese has heavy Min influence, hence it is probably a separate language.

Maihua Cantonese is a Cantonese variety spoken on Hainan. This is the only Cantonese variety spoken on Hainan, so for that reason alone, it may be a separate language.

The Qinlian Group of Cantonese is a division of Cantonese spoken in the Guangxi coastal areas around Qinzhou, Lianzhou, Lingshan, Beihai and Fangchenggang.

The group is divided into urban varieties which share a high degree of mutual intelligibility with each other and even with other urban varieties in the Yongxun and Gaoyang Groups but have poor intelligibility with the rural varieties.

The higher mutual intelligibility among urban varieties, even across groups, may be because the urban varieties are closer to each other than they are to the rural varieties of their own group. This may stem from histories of intense trade between cities, even across group boundaries, which drew their varieties closer together.

The urban varieties are Qinzhou Cantonese, Fangcheng Cantonese, Dongxing Cantonese, and Lingcheng Cantonese. They would seem to constitute a language called Urban Qinlian Cantonese.

The rural varieties are split into three major groups: Lianzhou Cantonese, Lingshan Cantonese, and Xiaojiang Cantonese.

Lianzhou Cantonese varieties have a Ping base with some Min and Hakka blended in. They are spoken in Hepu, the southern part of Pubei, and the coastal areas of Qinzhou. Lianzhou is so different from even the rest of the rural varieties that it is a separate language.

Hepu Cantonese is a Lianzhou Cantonese lect.

Lingshan Cantonese varieties are spoken in the countryside of Qinzhou, Lingshan and Pubei.

Xiaojiang Cantonese varieties are spoken in Pubei.

The rural varieties have poor intelligibility with the urban lects. A separate language called Rural Qinlian Cantonese seems reasonable.

Beihai Cantonese is very widely spoken in the area around Nanning as the major language. Beihai itself has five separate dialects within it, Beihai Cantonese A, Beihai Cantonese B, Beihai Cantonese C, Beihai Cantonese D and Beihai Cantonese E.

Jimmi Cantonese is an unclassified Cantonese language spoken in Jilong and Tiechong in Huidong and Erbu and Chishi in Haifeng. The popular notion is that this is a blend of Cantonese, Hakka and Min. Hailufeng Min is widely spoken in the area, and Haifeng Hakka is also spoken. Jimmi varieties appear to be mostly Cantonese with some Hakka and an even smaller trace of Min. Surely Jimmi must be a separate language.

Namlong Cantonese is an unclassified Cantonese language from the Pearl River area. It is also a separate language or at least it was in 1949. Whether it still exists is not certain, but native speakers must still be alive.

Dongguan, Shunde, Foshan, Zhongshan, Nanhai, Panyu, Xiquiao, Shiwan, Shatin, Jiujiang, Guangzhou, Vietnamese, Malayland, Macao, Hong Kong, Nishimura, Dongshan, Xiguan, Bao’an, Tanka, Shiqi, and Sanjiao are members of the Yuehai Group of Cantonese, which has 727 lects.

Yuehai itself is split into Guangfu, Zhongshan, Guangbao and Sanyi subgroups.

Guangzhou, Vietnamese, Malayland, Macao, Hong Kong, Nishimura, Dongshan, Wuzhou, Xiguan, and Tanka are members of the Guangfu Group of Yuehai, which has 10 lects.

Guangfu has 13 million speakers (Olson 1998).

Shunde, Panyu, Nanhai, Xiquiao, Foshan, Shiwan, Shatin, Jiujiang and one other are members of the Sanyi Group of Yuehai, which has eight lects.

Dongguan, Bao’an, and Dapeng are members of the Guangbao Group of Yuehai, which has three lects.

Shiqi and Sanjiao are members of the Zhongshan Group of Yuehai, which contains two lects.

Taicheng, Dajiang, Shuibu, Sijiu, Baisha, Sanhe, Chonglou, Doushan, Duhu, Chixi, Duanfen, Guanghai, Haiyan, Wencun, Shenjing, Beidou, Chuandao, Heshan, Jiangmen, Siqian, Doumen, Guzhen, Xinhui, Enping, Gee, and Kaiping are members of the Siyi Group of Cantonese, which has at least 693 lects. There are 3.6 million speakers of Siyi Cantonese.

Nanning, Yongning, Guiping, Chongzuo, Ningmin, Hengxian, Baise, and five others are members of the Yongxun Group of Cantonese, which has 12 lects.

Yongxun Cantonese has five million speakers (Olson 1998).

Zhanjiang, Gaozhou, Maoming and nine others are members of the Gaoyang Group of Cantonese, which has 12 lects.

Gaoyang Cantonese has 5.4 million speakers (Olson 1998).

Huazhou, Zhanjiang, Maihua, and Wuchuan are members of the Wuhua Group of Cantonese, which has four lects.

Yulin, Baobai, Guangning, Wuzhou, Huaiji, Fengkai, Deqing, Yunan, Shanglin, Binyang, Yangshan, Ertang, Shuishan, and Tengxian are members of the Goulou Group of Cantonese, which has at least 14 lects.

Qinzhou, Fangcheng, Dongxing, Lingcheng, Beihai, Lianzhou, Lingshan, Xiaojiang, Conghua, Nanning, and Hepu are members of the Qinlian Group of Cantonese, which has 11 lects.

Namlong is unclassified.

There are 780 lects of Cantonese, and Cantonese has 64 million speakers.


Ping, now recognized as a major split from Cantonese, is composed of Guinan Ping, Guibei Ping, and Benihua Ping. Guinan and Guibei are definitely separate languages, and Benihua appears to be one also. There is high but apparently not full intelligibility between Guinan and Guibei.

Ping has been heavily influenced by the language of the Dong people. Cantonese has almost no intelligibility of Ping.

Guinan Ping is spoken in Northern Guangxi around the city of Guilin near the Southern Mandarin-speaking area.

Guibei Ping is spoken in Southern Guangxi around the city of Nanning. It is close to Cantonese, especially Nanning Cantonese spoken in the same area. Guibei has some loans from Zhuang.

Benihua is a Ping language that has been heavily influenced by the Gong language, and as such, no doubt it is a separate language.

Guinan Ping has 22 lects.

Yongjiang Pinghua, Guandao Pinghua and Rongjiang Pinghua are members of Guibei Ping, which has 11 lects.

There is one Ping variety that is unclassified.

Ping has 34 lects. Ping has 2 million speakers.
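The Ping total can be checked by adding up the subgroup counts given above. A small Python sketch of that tally, using only the counts stated in the text (the dictionary name and the "Unclassified" label are just for illustration):

```python
# Lect counts for Ping as stated in the text.
ping_lects = {
    "Guinan Ping": 22,
    "Guibei Ping": 11,
    "Unclassified": 1,  # the one unclassified Ping variety
}

stated_total = 34
computed_total = sum(ping_lects.values())

print(f"computed {computed_total}, stated {stated_total}")
```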


Tuhua is a separate branch of Chinese spoken in Northern Guangdong, Western, Southeastern, and Northeastern Hunan Province and parts of Southern Guangxi. It has 132 separate lects. Tuhua is not really a language group but a wastebasket group for various varieties derisively referred to as tuhua – or “farmer’s language.”

Initial examination suggests a number of things.

First of all, the Tuhua lects, especially those of Southern Hunan, are very diverse, possibly as diverse as Wu, Xiang and Hui. Many or all of them may well be separate languages. If Tuhua is really as diverse as Wu, Xiang and Hui, then quite probably there is a different Tuhua language spoken in every county. Further, they are poorly studied and dialectally very diverse. There are many dialects inside the known Tuhua lects, and these dialects are often very different. So there appear to be languages inside even the known Tuhua lects.

Further, there appear to be links between the Tuhua varieties of Southeastern Hunan and northern Guangdong and the Ping language of Northern Guangxi, as they border each other. They all appear to be related and to have descended from a common ancestor.

Tuhua may have originally begun as a Sinicized form of the Yao language, and many of its speakers are still Yao people. One theory is that Tuhua is simply an extension of Ping. Another theory is that Tuhua started out as Middle Gan and then mixed with Cantonese, Hakka and Southwestern Mandarin.

Additionally, many Tuhua varieties are starting to splinter recently, as influences from Hakka, Cantonese and Southwest Mandarin begin to affect the younger speakers such that the language of the youngest speakers is quite a bit different from the language of the older speakers.

The best known of the Tuhua varieties is Shaozhou, referred to here as Shaozhou or Shaoguan Tuhua. Sometimes this name is used to describe all Tuhua varieties. It is spoken on the border of Hunan, Guangdong and Guangxi. Most of the speakers are in Northern Guangdong, but there are also some speakers in Southeastern Hunan.

Shaozhou is very different from other Chinese lects. Shaozhou consists of many different varieties which are often strikingly different from the others. Some say that Shaozhou is a branch of Min Nan, while others say it is related to Hakka.

Shaozhou is composed of eight lects, all of which appear to be separate languages. Of these, Shibei Shaozhou Tuhua and Xiangyang Shaozhou Tuhua, spoken in adjacent towns, are separate languages. Shibei has heavy Hakka influence, and Xiangyang is turning more Cantonese. Xiangyang has only been in contact with Cantonese for a few decades, while Shibei has been in contact with Hakka for centuries.

Guitou Shaozhou Tuhua and Dacun Shaozhou Tuhua are also separate languages.

Zhoutian Shaozhou Tuhua and Shitang Shaozhou Tuhua are spoken in Renhua County. They may both be separate languages.

Really all of the Shaozhou varieties seem to be separate languages, so Nanxiong Shaozhou Tuhua is also. Nanxiong apparently shares a common ancestor with Hakka.

Longgui Shaozhou Tuhua, spoken in Qujiang County in Guangdong, is a separate language. Longgui has 2,000 speakers.

Besides Shaozhou, another major split in Tuhua is Lianzhou Tuhua. It is spoken in Lianzhou County and in Liannan Autonomous Yao County in Qingyuan City in Northern Guangdong. Lianzhou is composed of Xi’an Lianzhou Tuhua, Fengyang Lianzhou Tuhua, Xingzi Lianzhou Tuhua, and Bao’an Lianzhou Tuhua. Each is spoken in a distinct township or townships, so no doubt each is a separate language.

In Lechang Prefecture in Northern Guangdong bordering Hunan, there are five separate languages, Lechang Tuhua 1, Lechang Tuhua 2, Lechang Tuhua 3, Lechang Tuhua 4 and Lechang Tuhua 5, which are not fully intelligible with each other.

Xianghua is a branch of Tuhua that contains six varieties of its own. Xianghua Tuhua is a completely separate and highly diverse language that is spoken in Western Hunan.

Also in Hunan, in northeastern Quiyang County, another Tuhua variety is spoken – Quiyang Tuhua. This must certainly be a separate language. There is a great deal of dialectal diversity within Quiyang Tuhua. Yantang Quiyang Tuhua and Yangshi Quiyang Tuhua are two of these dialects.

Xintian Tuhua, spoken in Linwu County in Southern Hunan, is a major split in Tuhua, so it is surely a separate language.

Linwu Dachong Xintian Tuhua is a form of Xintian.

Jiahe Tuhua is a completely separate language, unintelligible with other lects. Furthermore, there are huge dialectal differences within Jiahe Tuhua that may or may not constitute separate languages.

In Yongzhou County in Southeastern Hunan, Yongzhou or Xiangnan Tuhua is spoken.

It is clearly a separate language. It has at least 18 different dialects: Xintian Southern Rural Yongzhou Tuhua, Xintian Northern Rural Yongzhou Tuhua, Ningyuan Zhangjia Yongzhou Tuhua, Ningyuan Yongzhou Pinghua Tuhua, Lanshan Shangdong Yongzhou Tuhua, Lanshan Tushi Yongzhou Tuhua, Lanshan Taiping Yongzhou Tuhua, Shuangpai Lijiaping Yongzhou Tuhua, Gangyu Yongzhou Tuhua, Xiangyu Yongzhou Tuhua, Guiyang Liuhe Yongzhou Tuhua, Jianghua Sumitang Qidouhua Yongzhou Tuhua, Jianghua Baimangying Yongzhou Tuhua, Jiangyong Songbai Yongzhou Tuhua, Jiangyong Chengguan Yongzhou Tuhua, Jiangyong Taochuan Yongzhou Tuhua, Daoxian Xianglinpu Yongzhou Tuhua, Dong’an Gaofeng Yongzhou Tuhua, Dong’an Xuaqiao Yongzhou Tuhua, Dong’an Shiqishi Yongzhou Tuhua, Lengshuitan Xiaojiangqiao Yongzhou Tuhua, and Lengshuitan Lanjiaoshan Yongzhou Tuhua.

There are four main types represented here:

The first type is a Dong’an-Lengshuitan type comprising Dong’an Xuaqiao, Dong’an Gaofeng, Dong’an Shiqishi, Lengshuitan Xiaojiangqiao, Lengshuitan Lanjiaoshan, and Sumitang Qidouhua.

Of these, Dong’an Gaofeng Yongzhou and Dong’an Xuaqiao Yongzhou are spoken in separate districts, so they are in all probability separate languages. Dong’an Shiqishi Yongzhou Tuhua has Xiang and Wu influences.

The Lengshuitan varieties appear to represent at least one language. Lengshuitan Lanjiaoshan has at least one dialect, Lengshuitan Shamuqiao Lanjiaoshan Yongzhou Tuhua. It has a close relationship to Dong’an Xuaqiao Yongzhou Tuhua.

The second type is a Jiangyong-Daoxian type comprising nine lects. At least seven of them are clearly separate languages.

Daoxian Xianglinpu Yongzhou Tuhua must be a separate language, as it is named after a county.

Daoxian Xiaojia Yongzhou Tuhua must be a separate language also, as it is a major split in this group.

There are many different Yongzhou Tuhua lects in Jiangyong County, many of which are separate languages. These include Jiangyong Yunshan Yongzhou Tuhua, Jiangyong Xiaopu Yongzhou Tuhua, Jiangyong Xiacengpu Yongzhou Tuhua and Jiangyong Huilongxu Yongzhou Tuhua, all of which must surely be separate languages.

There are many dialects even within the town of Yunshan where Jiangyong Yunshan is spoken. Jiangyong Yunshan is transitional between Jiangyong Chengguan and Jiangyong Xiacengpu.

Jiangyong Xiacengpu has 21 different dialects.

Jiangyong Huilongxu is the language that was the basis for the famous nüshu, “women’s script”, a secret language of women (Liming 2004), originating from the Shangjiangxu (Xiao River) region of Northeastern Jiangyong County in Hunan, of which much has been written lately.

Jiangyong Chengguan Yongzhou Tuhua, Jiangyong Taochuan Yongzhou Tuhua, Jiangyong Cushjiang Yongzhou Tuhua, and Jiangyong Huilongxu Tuhua also appear to be separate languages.

Jiangyong Cushjiang has nine dialects.

Jiangyong Taochuan has 34 dialects, but there is a lot of uniformity between them.

Jiangyong Huilongxu has two dialects.

Jianghua Sumitang Qidouhua Yongzhou Tuhua has a reasonably close relationship to Jiangyong Songbai Yongzhou Tuhua and Jiangyong Chengguan, and all three are thought to have derived from the same base. Although it is spoken in the same county as Jianghua Baimangying, it appears to be completely different, so it must be a separate language.

Jianghua Baimangying Yongzhou Tuhua also appears to be quite different, so it is probably a separate language also.

As the other eleven main lects in this group are separate languages,

Intelligibility between varieties is not known, but dialectal divergence within Tuhua varieties is typically great, and some or all of the above may be separate languages. There are clearly at least 18 different languages here, and there may be up to 31 different languages.

The third type is a Xintian Southern Rural Yongzhou Tuhua type.

The fourth type is a Ningyuan Yongzhou Pinghua type.

There is also a group of unclassified types comprising Xintian Northern Rural Yongzhou Tuhua, Ningyuan Zhangjia Yongzhou Tuhua, Lanshan Shangdong Yongzhou Tuhua, Lanshan Taiping Yongzhou Tuhua, Guiyang Liuhe Yongzhou Tuhua, Jianghua Baimangying Yongzhou Tuhua, and Shuangpai Lijiaping Yongzhou Tuhua.

Of these, Lanshan Tushi Yongzhou Tuhua may well be a separate language. Guiyang Liuhe Yongzhou Tuhua is probably part of a separate language also, as Guiyang is a county in Southeastern Hunan. Gangyu Yongzhou Tuhua, Xiangyu Yongzhou Tuhua, Lanshan Taiping Yongzhou Tuhua, and Shuangpai Lijiaping Yongzhou Tuhua appear to represent the names of separate counties, so no doubt each one is a separate language.

Xintian Northern Rural Yongzhou Tuhua is apparently completely different from Xintian Southern Rural Yongzhou Tuhua, so it is probably a separate language also.

Another Tuhua variety spoken in Yongzhou in the southern part of the region, Huasheng Southern Yongzhou Tuhua, may have as many as 75 different dialects inside of it. This is undoubtedly a separate language.

The Tuhuas of Southern Hunan appear to be Gan/Xiang mixed languages.

Luojin Chongshan Tuhua is spoken in Yongfu in Southern Guangxi. It has a close relationship to Guibei Pinghua. It is clearly a separate language.


Danzhou is a separate group of unclassified Chinese languages. Danzhou is spoken in the northwest of Hainan, and Hainanese speakers cannot understand it. It is either related to the language spoken by the Lingao people or is the same language.

Yet the Danzhou people speak nine different lects, including varieties described as Hakka, others described as Cantonese, and others described as Mandarin, so obviously there are at least three separate languages inside Danzhou. Let us call these Danzhou Cantonese, Danzhou Hakka and Danzhou Mandarin.

Lingling or Linghua is an unclassified language spoken in Longsheng County, Guangxi. Linghua is a separate language. It is spoken by 20,000 ethnic Hmong in Taiping, Pingdeng Township in Longsheng. It is spoken only by residents inside the city as a sort of secret language. Southwestern Mandarin is used with outsiders. The language is a mixture of Hmong and Southwestern Mandarin.

Junhua or Military Language is spoken in Taoyuan County and Luidui in Pingtung County in Taiwan, Lufeng County and Huizhou City in Guangdong; Sanya, Changjiang, Danzhou, Zonghe, and Lingao in Hainan; Guangxi; around Hakka speakers in Wuping County in Zhongshan, Fujian, and other places.

On a Mandarin base, Junhua adds Hakka, Cantonese and Taiwanese. It is considered to be an Old Mandarin language and is normally placed in Southwest Mandarin in a group called the Junhua Group, which contains four lects. But others say that different Military Language varieties are either Hakka or Gan. Wherever these varieties are spoken, they are not understood by people nearby.

Junhua seems to derive from a lingua franca spoken by soldiers in the Ming Dynasty Army and was widely learned and understood by all soldiers at the time. It bears a strong resemblance to Ming Era Chinese.

Military Language is not the same language in the various areas where it is spoken.

Huping Junhua, spoken by 16,000 people in Zhongshan, is not understood by the surrounding peoples and is not considered part of Hakka. The language began in the area in the 1390’s when the Ming Dynasty sent its army to Zhongshan to put down a rebellion. Soldiers came from all over China and remained in the area after the fighting, creating a new language out of all of their languages mixed together along with local lects. Actually this is thought to be more of a Gan language with Hakka influences.

Taiwanese Junhua in Taiwan is not the same language as the Military Language elsewhere. This language also has heavy Hakka influences, but it also has Min Nan, Mandarin and even Japanese influences. Some say this is a Hakka language.

Uncertain Affiliation/Possibly Not Sinitic

Maojiahua is a language spoken by 20,000 Hmong in the southwest of Hunan, in the northeast of Guangxi and in some areas of Hubei. Ethnologue originally listed this language as a form of Chinese, but it is now listed as Eastern Xiangxi Hmong. Another argument is that this is a Chinese language with heavy Hmong influence. As the matter is not yet settled and Ethnologue lists it as Hmong, we will not list it as Chinese.

Waxiang is an unclassified Chinese variety spoken by the Waxiang ethnic group in Luxi, Guzhang and Yongshun counties in Xiangxi Tujia and Miao Autonomous Prefecture, Zhangjiajie prefecture-level city in Dayong and Chenxi, Xupu and Yuanling Counties in Huaihua prefecture-level city in Northwestern Hunan. It is nothing like the Southwestern Mandarin, Xiang, Tujia and Xo Miao Hmong languages that surround it, and none of them can understand it. There are 362,000 speakers of Waxiang.

It shows some lexical influence from the Bai language, suggesting a Bai substratum. This is either an unclassified Chinese language or a separate minority tongue, maybe related to Hmong. Others view it as a Xiang-Hmong mixed language.


Ben Hamed, Mahé. 2005. Neighbour-nets Portray the Chinese Dialect Continuum and the Linguistic Legacy of China’s Demic History. Proc. R. Soc. B 272:1015–1022.
Bodman, Nicholas C. 1988. Two Divergent Southern Min Dialects of the Sanxiang District, Zhongshan, Guangdong. BIHP 59 (2): 401-423.
Branner, David. 2000. Problems in Comparative Chinese Dialectology. The Classification of Min and Hakka. Berlin: Walter de Gruyter.
Branner, David. 2008. Personal communication.
Campbell, Hilary. 2004. Chinese Grammar – Synchronic and Diachronic Perspectives. Oxford, UK: Oxford University Press.
Campbell, James Michael. Putonghua and Taiwanese Min Nan speaker. Taipei, Taiwan. 2009. Personal communication.
曹志耘 (Cao, Zhiyun). 2002. 南部吴语语音研究 (Southern Wu Phonology Research). Beijing: Commercial Press (In Chinese).
Chan, Marjorie K.M., Lee, Douglas W. 1981. Chinatown Chinese: A Linguistic and Historical Re-evaluation. Amerasia Journal, Volume 8, Number 1.
Cheng, Chin-Chuan. 1997. Measuring Relationships among Dialects: DOC and Related Resources. Computational Linguistics & Chinese Language Processing 2.1: 41-72.
Cheng, Chin-Chuan. 1998. Extra-Linguistic Data for Understanding Dialect Mutual Intelligibility. Taipei, Taiwan: Paper delivered at the 1998 Annual Conference of the Pacific Neighborhood Consortium.
De Souza, S. C. 1903. A Manual of the Hainan Colloquial Bunsio Dialect. Singapore.
Gilliland, Joshua. 2006. Language Attitudes and Ideologies in Shanghai, China. MA Thesis. Columbus, OH: Ohio State University.
Hirata, Shoji. 1998. Aspect: A General System and its Manifestation in Mandarin Chinese. Taipei: Student Book Company.
Johnson, Eric. 2010. SIL Electronic Survey Reports 2010-027: A Sociolinguistic Introduction to the Central Taic languages of Wenshan Prefecture, China. Dallas, Texas: SIL.
Kirinputra, Láñitri. Hokkien speaker. November 2014. Personal communication.
Lee, Kent A. 2002. Chinese Tone Sandhi and Prosody. MA Thesis. Urbana, IL: University of Illinois at Urbana-Champaign.
Lien, Chinfa. August 17-19, 1998. Denasalization, Vocalic Nasalization and Related Issues in Southern Min: A Dialectal and Comparative Perspective. International Symposium on Linguistic Change and the Chinese Dialects Dedicated to the Memory of the Late Professor Li Fang-kuei. Seattle, Washington.
Liming, Zhao. The Women’s Script of Jiangyong: An Invention of Chinese, in Jie, Tao; Zheng, Bijun; and Mow, Shirley L., editors. 2004. Holding up Half the Sky: Chinese Women Past, Present, and Future, Chapter 4. New York: Feminist Press at the City University of New York.
Mair, Victor H. 1991. What Is a Chinese ‘Dialect/Topolect’? Sino-Platonic Papers: 29.
McKeown, Adam. 2001. Chinese Migrant Networks and Cultural Change: Peru, Chicago, Hawaii, 1900-1936. Chicago, IL: University of Chicago Press.
Ngù, George. Eastern Min speaker. 2009. Personal communication.
Olson, James Stuart. 1998. An Ethnohistorical Dictionary of China. Westport, CT: Greenwood Publishing Group.
Rickard, Kristine. 2006. A Linguistic-phonetic Description of Lanqi Citation Tones. Proceedings of the 11th Australian International Conference on Speech Science & Technology, pp. 349-353. Edited by Paul Warren & Catherine I. Watson. University of Auckland, New Zealand. December 6-8, 2006. Auckland, NZ: Australian Speech Science & Technology Association Inc.
Szeto, Cecilia. 2000. Testing Intelligibility among Sinitic dialects. Proceedings of ALS2K, the 2000 Conference of the Australian Linguistic Society.
Tek, Rohana. Cambodian Teochew speaker. July 2016. Personal communication.
Terng, Brice. Central Xianyou Puxian Min speaker. September 2016. Personal communication.
Thurgood, Graham. 2006. Sociolinguistics and Contact-induced Language Change: Hainan Cham, Anong, and Phan Rang Cham. Tenth International Conference on Austronesian Linguistics, January 17-20, 2006, Palawan, Philippines. Linguistic Society of the Philippines and SIL International.
Xun, Gong. Sichuan Mandarin and Putonghua speaker. Personal communication. September 2009.
Zheng, Rongbin. 2008. The Zhongxian Min Dialect: A Preliminary Study of Language Contact and Stratum-Formation, pp. 517-526. Edited by Chan, Marjorie K.M., and Kang, Hana. Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20). Volume 1. Columbus, Ohio: The Ohio State University.


Filed under Asia, Cantonese, China, Chinese language, Comparitive, Dialectology, Language Classification, Language Families, Linguistics, Mandarin, Min Nan, Regional, Sinitic, Sino-Tibetan, Sociolinguistics

The Case for Splitting off Multiple English Dialects as Separate Languages

Here (on Italian dialects – actually many of which are separate languages).

One can make an excellent case that AAVE (Ebonics), Bayou/Cajun English, Deep South English, Appalachian English, New York English, Newfoundland English, and of course Jamaican creole and Scots are separate languages. Even Scottish English and Geordie probably qualify.

A recent study found only 54% intelligibility of Geordie for Standard English speakers. The speakers were L2 English learners in the Czech Republic, but they scored 100% on the “home” test, which was a test of US television English. Another study found 42% intelligibility of Scots for native speakers of US English. Having heard Hard Scots spoken by the Scottish underclass, I would say my intelligibility of it was ~5-10% at best or possibly even less. It was almost as bad as listening to something like Greek, and I got the feeling that I was actually listening to some foreign tongue.

At any rate, 42% and 54% very well qualify both Scots and Geordie as separate languages. Scots is already split, and it sure would be nice to split Geordie, but to say people would get mad is an understatement.

Scots and Jamaican creole are already split off. There is a lie going around the intellectual circles that it is still controversial in Linguistics whether Scots and Jamaican Creole are separate languages. In fact it is not controversial at all.

I have been listening to English my whole life as an American, and I still cannot understand Bayou speech, hard Southern English, Newfoundland English or the hard forms of Appalachian English or New York English. There are some very weird forms of English spoken on the US Atlantic coastal islands that cannot be understood by anyone not from there, or at least not by me. Gullah English in South Carolina is already split as a creole.

Generally the criterion we use is mutual intelligibility. Also if you can’t pick it up pretty quickly, it’s a separate language.

A speaker of hard New York English came to my mother’s school a while back, and no one could understand him. They still could not understand him after three months of listening to him – this is how you know you are dealing with a separate language. He finally learned how to speak California English, and then he was understood.

I have been listening to hard British English my whole life, and I still cannot understand it. I even had a British girlfriend for 1.5 years, and I still could not understand her on the phone. She went to my parents’ house for dinner, stayed a couple of hours, and my brother said he didn’t understand a word she said.

You can make an excellent case that the harder forms of British English (or Australian English for that matter) are not the same language as US English. The problem is that if you tried to split them off, everyone would go insane (including a lot of very foolish linguists), and there would be a wild uproar.

Generally we use 90% as the split between language and dialect. Less than that, separate language. More than that, dialect. We use this criterion to split languages from dialects everywhere, yet if we tried to do it for English, the resulting firestorm would be so ferocious that it would not be worth it, but it would be perfectly valid scientifically. Even the very well-validated split of Scots has driven the English-speaking world half-nuts.
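
To make the rule concrete, here is a minimal sketch of the 90% cutoff applied to the two study figures quoted above. The function name and the treatment of a score of exactly 90% (counted here as a dialect) are my own assumptions for illustration, since the rule as stated only covers "less than" and "more than."

```python
# Hypothetical helper: the 90% cutoff comes from the text above; the name
# classify() and the handling of exactly 90% are assumptions for illustration.
SPLIT_THRESHOLD = 90.0


def classify(intelligibility_pct: float) -> str:
    """Label a lect as a dialect or a separate language from an intelligibility score."""
    if not 0 <= intelligibility_pct <= 100:
        raise ValueError("intelligibility must be a percentage between 0 and 100")
    # Assumption: a score of exactly 90% is treated as a dialect.
    if intelligibility_pct >= SPLIT_THRESHOLD:
        return "dialect"
    return "separate language"


# Figures quoted in this post (percent intelligibility with Standard/US English).
figures = {"Geordie": 54, "Scots": 42}

for name, pct in figures.items():
    print(f"{name}: {pct}% -> {classify(pct)}")
```

Both figures land well under the cutoff, which is the whole point of the argument above.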

I actually have a post in my drafts where I split English into ~10-15 different languages, but I have been terrified to post it. My post splitting German into 137 different languages did not go over well with the Net linguists (who are mostly loudmouths, fools, cranks, and idiots), although a major Germanist, a professor at a big university in Europe wrote me when I was only at 90 languages and said, “I think you are right!” Still, if I try to split English, I may ignite one Hell of a damned firestorm, and I’m just too chicken.


Filed under Australia, Balto-Slavic-Germanic, Britain, Canada, Caribbean, Dialectology, English language, German, Germanic, Indo-European, Indo-Hittite, Jamaica, Language Families, Linguistics, North America, Northeast, Regional, Scots, Sociolinguistics, South, South Carolina, USA

Fake Controversies, Fake Settled Questions, and Ideological Authoritarianism in Modern Linguistics, with an Emphasis on Mutual Intelligibility and the Dialect/Language Question

There is a lie going around that the dialect/language question is controversial in Linguistics. It really isn’t. Most linguists have a pretty good idea of where to draw the line. If you don’t believe me, study the internals of the Summer Institute of Linguistics change request forms for languages. The field is a lot more uniform on this question than the cranks think.

Hardly anyone thinks Valencian is a separate language. There were 5-10 experts writing in on Valencian and they were all in agreement.

Romagnolo and Emilian were split with zero controversy. All it took was a few authoritative statements by the experts in these varieties to settle the question.

In other words, the language dialect question is what is known as a fake controversy.

Really the only controversy about this question comes from nationalists and language activists.

Sadly, many linguists are nationalists, and their work has been poisoned by their ideology for a long time now. Some of the worst ones of all are in Europe.

Linguistics in the Balkans and Poland has been badly damaged by nationalist linguists for a long time, with no sign of things getting better.

Similar nonsense is going on in of all places ultra-PC Denmark and Sweden. Bornholmian and Southeast Jutnish should have been split from Danish long ago. In fact, Jutnish was split, but Danish nationalist linguists pathetically had it removed.

The many langues d’oil have never been listed and probably never will be. No doubt this is due to the state of Linguistics in ultra-nationalistic France. There are easily 10-15+ langues d’oil that could be split off.

Greek linguist nationalists have raised their ugly heads over splits in Macro-Greek.

Bulgarian Linguistics is all nationalist and has been lost in retardation forever now. No, Macedonian is not a Bulgarian dialect.

There have been some ugly and ridiculous fights in the Baltics especially with Estonian and Latvian, neither of which is a single language. I doubt that Estonian and Latvian linguists are comporting themselves well here given the fanatical nationalism that overwhelms both lands.

There are easily 350-400 languages inside of Sinitic or Chinese according to the estimate of the ultimate Sinologist Jerry Norman. The real figure is clearly closer to 1,000-2,000 separate languages. Chinese nationalism is mandatory for anyone doing Sinitic linguistics. No one wants to bring down the wrath of the Chinese government by pulling the curtain on their big lie that Chinese is one language. I am amazed that SIL even split Chinese into 14 languages without getting deluged with death threats.

Arabic is clearly more than one language, and SIL now has it split into 35 languages. This is one odd case where they may have erred by splitting too much. But no one can even do any work in this area, since Arabists and especially Arabic speakers keep insisting, often violently, that Arabic is a single language. Never mind that they routinely can’t understand each other. We have Syrians and Yemenis at my local store, and no, the Syrian Arabic speakers cannot understand hard Yemeni Arabic, sorry. Some of the Yemeni Arabic speakers have even whispered conspiratorially in my ear when the others were not around that speakers of different Yemeni Arabic varieties often cannot even understand each other, and that’s not even split by SIL. I have a feeling that the Arabic situation is more like Chinese than not.

A Swedish nationalist wiped out several well documented separate languages inside of Macro-Swedish simply by making a few dishonest change request forms. SIL pathetically fell for it.

Occitan language activists wiped out the very well-supported split of Occitan into six separate languages based on ideology. They are trying to resurrect Occitan, and they think this will only work if there is one Occitan language with many dialects under it. Splitting it up into six or more languages dooms the tongue. So this was a political argument masquerading as a linguistic one. SIL fell for it again. Pathetic.

No one has talked much about these matters in the field, but a man named Harald Hammarström has written some excellent notes about them. He also takes the language/dialect question very seriously and has proposed more scientific ways of doing the splitting.

SIL was recently granted the ability to give out new ISO codes for languages, and since then, SIL has become quite conservative, lumping varieties everywhere in sight. This is because lumping is always the easy way out, as conservatives love lumping in everything from Classification to Historical Linguistics, and the field has been taken over by radical conservatives for some time now. Splitters are kooks, clowns, and laughing stocks. One gets the impression that SIL is terrified to split off new tongues for fear of bad PR.

As noted above, the language/dialect question is not as controversial in the field as Net linguist cranks would have you believe. SIL simply decides whatever they decide, and all the linguists just shrug their shoulders and go back to Optimality Theory, threatening to kill each other over Indo-European reconstructions, scribbling barely readable SJW sociolinguistic blather, or whatever it is they are crunching their brains about.

SIL grants an ISO code or refuses to grant one, and that’s that. No ISO code, no language. The main problem is that they refuse to split many valid languages mostly out of PC fear of causing a furor. Most of the opposition to splitting off new languages comes from linguistic hacks and cranks who exist for the most part on the Internet.

Most real linguists don’t seem to care very much. I know this because I talk to real linguists all the time. When it comes to the dialect/language split, most of them find it mildly intriguing, but hardly anyone is set off. You tell them that some dialect has now been split off as a separate language or two languages have now been merged into one, and they just perk up their ears and say, “Oh, that’s interesting.” Sometimes they shrug their shoulders and say, “They (SIL) are saying this is a separate language now,” as if they really don’t care one way or another.

Linguists definitely get hot under the collar about some things, but not about the dialect/language question, which is regarded more as a quizzical oddity. Most linguists furthermore care nothing at all about the mutual intelligibility debate, which at any rate was resolved long ago by SIL way back in the 1950’s. See the influential book by Casad written way back then for the final word on the science of mutual intelligibility. Some enterprising linguists are finally starting to take mutual intelligibility seriously, but even they are being much too wishy-washy and unsciency about it. A lot of very silly statements are made like “there is no good, hard scientific way to measure mutual intelligibility, so all figures are guesswork.”

There’s no need for these theoretical shields or hyper-hedging because no one cares. No one in the field other than a few nutcases and kooks on the Internet even gives two damns about this question in the first place. The mutual intelligibility question is actually much less controversial in the field than the linguist kook loudmouths on the Net would have you believe.

We have more important things to fight about, like Everett’s resurrecting of the hated Sapir-Whorf Hypothesis; Chomsky’s Universal Grammar (defended pathetically by the Old Guard and under attack by the Everett crowd who everyone hates); not to mention Altaic; and Joseph Greenberg’s poor, regularly pummeled ghost, along with mass comparison in general.

The field is full of many a silly and pretty lie. One for instance is that Linguistics rejected the Sapir-Whorf Hypothesis long ago, and now it is regarded as a laughing stock. Actually that’s not true. Really a bunch of bullies got together and announced very arrogantly that Sapir-Whorf was crap, and then it became written in stone the way a lot of the nonsense our field believes does.

If you go back over the papers that “proved” this matter, it turns out that they never proved one thing. They just said that they proved Sapir-Whorf was nonsense, and everyone fell for it or just got in line like they were supposed to.

Not to mention that Linguistics is like an 8th Grade playground.

Let’s put it this way. If you advocate for Sapir-Whorf in academia, I pray for your soul. You also damn well better have tenure.

I don’t know how anyone advocates for Altaic these days. I would never advocate for Altaic or any remotely controversial historical linguistics hypothesis without tenure.

The field is out for blood, and they burn heretics at the stake all the time. We’ve probably incinerated more wrong thinkers than the Inquisition by now.


Filed under Afroasiatic, Altaic, Arabic, Balto-Slavic-Germanic, Chinese language, Comparitive, Danish, Denmark, Dialectology, Europe, France, Germanic, Greece, Greek, Hellenic, Indo-European, Indo-Hittite, Indo-Irano-Armeno-Hellenic, Italic, Italo-Celtic, Italo-Celtic-Tocharian, Language Classification, Language Families, Linguistics, Nationalism, Occitan, Poland, Political Science, Regional, Romance, Semitic, Sinitic, Sino-Tibetan, Sociolinguistics, Sweden

How I Determined Intelligibility For Turkic Lects

Steve: This is amazing. Well done. But how can you possibly know the degree of mutual intelligibility between two languages you don’t speak or know if something is a language or dialect when you don’t speak it? That seems strange. How is it worked out?

Linguists don’t speak all these languages we study. We just study languages, we don’t necessarily speak them. This is confused with the archaic use of the word linguist to mean polyglot. Honestly, many linguists do in fact speak more than one language, and quite a few of them have a pretty good knowledge of at least some of the languages that they study. But my mentor speaks only Turkish and English though he studies all Turkic languages. I don’t believe he has ever learned to speak any Turkic lect other than Turkish.

In reference to my paper here.

We are not looking for raw numbers. We just want to know if they can understand each other or not.

A lot of it came from talking to native speakers, and I also read a lot of papers by other linguists. I talked to other linguists a lot too. Linguists typically simply state if two lects are intelligible or not. Also there is a basic idea among linguists of what the boundary is between a language and a dialect, and I used this knowledge a lot.

Can they understand each other? Yes or no. That’s pretty much about it. Also at some degree of structural difference, we can see the difference between a language and a dialect. It’s a judgement call, but linguists are pretty good at this.

There is a subsection of very loud linguists, mostly on the Internet, who like to screech a lot about how this question cannot be answered because of this or that red herring or some odd conundrum that works its way in. The thing is, if you ask around enough, you will be able to get around all of the conundrums, and you should eventually be able to reconcile all of the divergent responses to get some sort of a holistic or “big picture” view. You finally “figure it out.” The answer to the question comes to you in a “seeing the answer as part of a larger picture” sort of way.

The worst red herring is this notion that speakers from Group A will lie and say they do not understand speakers of Group B simply because they hate them so much. If this were such a concern, you would think I would have run into it at some point. A much worse problem was ethnic nationalists who lie and say that they can understand neighboring tongues when they can’t.

The toxin called Pan-Turkism or Turkish ultranationalism comes into play here. It is almost normal for Turks to believe that there is only one Turkic language, and it is called Turkish. All of the rest of the languages simply do not exist and are dialects of Turkish. I had to deal with regular attacks by extremely aggressive Ataturkists who insisted that any Turk could easily understand any other Turkic language. Actually my adviser told me that my piece would not be popular with the Pan-Turkics at all. I don’t really care as I consider them to be pond scum.

Granted, some of it was quite controversial and I got variable reports on intelligibility for some lects like Siberian Tatar vs. Tatar, the Altai languages, Kazakh vs. Kirghiz, Crimean Tatar vs. Turkish.

Where native speakers differ on such questions, often vociferously, you simply ask enough of them, talk to some experts and try to get a feel for what the best answer to the question is.

Some cases like Gagauz vs. Turkish probably need raw intelligibility testing. That’s the only one that is up in the air right now, but it is up in the air because the lects are so close. Intelligibility between Gagauz and Turkish is somewhere between 70% and 100%. In other words, they have marginal intelligibility at worst. My Gagauz expert, who knows this language better than anyone, feels though that Turkish intelligibility of Gagauz is less than 90%, which is where I drew the line between language and dialect.
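To make the cutoff concrete, here is a minimal sketch of the decision rule in Python. Only the 90% line and the 70-100% Gagauz range come from the text above. The function name, the labels and the idea of passing a low/high range are my own illustrative assumptions, and this is not how the paper itself was put together, since most of the calls there were yes/no judgments rather than raw numbers.

```python
# Hypothetical helper. The 90% cutoff is the one stated in the text;
# every name and label here is an illustrative assumption.
DIALECT_THRESHOLD = 90.0  # percent intelligibility


def classify(low, high=None, threshold=DIALECT_THRESHOLD):
    """Classify a pair of lects from an intelligibility estimate in percent.

    Pass one figure, or a low/high range when reports disagree.
    """
    if high is None:
        high = low
    if not 0 <= low <= high <= 100:
        raise ValueError("expected 0 <= low <= high <= 100")
    if low >= threshold:
        # Even the most pessimistic report clears the line.
        return "dialects of one language"
    if high < threshold:
        # Even the most optimistic report falls short of the line.
        return "separate languages"
    # The range straddles the line, so reports alone cannot settle it.
    return "undetermined: needs intelligibility testing"


# The Gagauz vs. Turkish range reported above straddles the cutoff.
print(classify(70, 100))
```

A range that straddles the line is exactly the situation where raw intelligibility testing is worth the trouble.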

It is also starting to look like Nogay is simply a dialect of Kazakh instead of a separate language, but that might be a hard sell.

Some of these are seen as separate languages simply because they are spoken by different ethnies who do not want to be seen as part of the same group. Also they have different literary norms. Karakalpak is just a Kazakh dialect, but the speakers want to say they speak a separate language. Same with Bashkir, which is simply a dialect of Tatar. The case of Kazakh and Kirghiz is more controversial, but even here, we seem to be dealing with one language, yet the two dialects are spoken by different ethnies that have actually differentiated into two separate states, each with its own literary norm. Kazakhs wish to say they speak a language called Kazakh and Kirghiz wish to say they speak a language called Kirghiz although they are probably really just one language.

We see a similar thing with Czech and Slovak. My recent research has proven that Czech and Slovak are actually a single language. But the dialects are spoken by different ethnic groups who claim different cultures and histories and they have actually divided into two different states, and each has its own literary norm.

It is here, where dialects become languages not via science but via politics, culture, history and sociology, that Weinreich’s famous dictum that “a language is a dialect with an army and a navy” comes into play.

Scientifically, these are all simply dialects of a single tongue but we call them languages for sociological, cultural and political reasons.


Filed under Altaic, Balto-Slavic, Balto-Slavic-Germanic, Bashkir, Comparitive, Crimean Tatar, Czech, Dialectology, Gagauz, Indo-European, Indo-Hittite, Kazakh, Kipchak, Kyrgyz, Language Classification, Linguistics, Nationalism, Political Science, Slavic, Slovak, Sociolinguistics, Tatar, Turkic, Turkish, Ultranationalism

A Few Words on Language Endangerment

Carlos Lam: Congrats! However, isn’t language death a rather standard occurrence among societies?

It is, but we linguists don’t really like it. There is quite a debate going on, but the bottom line seems to be that ethnic groups and speaker groups have the right to ownership of their languages. We worry that a lot of speaker groups are being pressured into blowing up their languages prematurely. We like to study these languages and we are not real happy about seeing them vanish into the horizon. On the other hand, is cultural death a natural thing too? Both cultural death and language death are occurring at rates far beyond the normal background rates. English and some of the other major languages are like weapons of mass destruction in taking out languages. You really want a world with one language and one culture? I don’t.

The best position seems to be that speakers have the right to decide the fate of their languages. If speakers wish to continue speaking their languages, then governments and linguists should help them to preserve and continue to develop their languages. Quite a few groups do not seem to care that their languages are going extinct, or they are even driving or drove their languages extinct, and they have the full right to do so. In these cases, we will simply do salvage linguistics. There are many salvage linguistics projects going on in the world today.

You won’t get very far with linguists arguing that language death is a good thing. Most people don’t think so.

Occurring at the same time as language death is a lot of language revitalization. Even fully dead languages are being resurrected from the grave. Also in addition to language death, we are creating new languages all the time. In this piece, I created a total of net 13 new languages. And new languages are occurring on their own.

To give you an example. A group of Crimean Tatars moved from Crimea to Turkey about 200 years ago in the course of the Crimean War. They have been speaking Crimean Tatar in Turkey ever since, for 200 years now. But in that time, Crimean Tatar in Turkey and Crimean Tatar in Ukraine have diverged so much that Turkish Crimean Tatar is now, in my opinion, a fully separate tongue from the Ukrainian variety. This is because in Turkey, a lot of Turkish has gone into Turkish Crimean Tatar which is not well understood in the Ukraine. And in the Ukraine, a lot of Russian has gone in which is not well understood in Turkey. Hence, Crimean Tatar speakers in Turkey and Ukraine can no longer understand each other well.

To give you another example, there are many Kazakh speakers in China. However, Kazakh speakers in China can no longer understand Standard Kazakh broadcasts from Kazakhstan because so many Russian loans have gone into Standard Kazakh that it is no longer intelligible to Chinese Kazakh speakers. I learned this too late for my paper, otherwise I would have split Chinese Kazakh off as a separate language.

There are many cases like this.

Further, many languages are being discovered. Sonqori, Western Khalaj, Todzhin, Duha, Dukha and Siberian Tatar are just a few of the new languages that I created. Khorosani Turkic was split into three different languages. Dayi was subsumed into one of the Khorosani Turkic languages. Altai was split from one into five separate languages, but the truth is that it is six languages, not five. Salar was split into Western Salar and Eastern Salar. Ili Turki was eliminated because it does not even exist. It is simply a form of Uighur. Kabardian and Balkar, Tatar and Bashkir, Kazakh and Kirghiz were some languages that were eliminated and subsumed into single tongues such as Tatar-Bashkir, Kazakh-Kirghiz, and Kabardian-Balkar. And on and on.

Languages and of course dialects are dying all the time, but new languages are being created by humans and by linguists as we continue our splitting projects. Many lects referred to as dialects are more properly seen as separate languages. Chinese is at least 450 separate languages, only 14 of which are recognized. German may be up to 130 separate languages, only 20 of which are recognized.

There are quite a few more languages to be created out there, but there is a lot of resistance to splitters like me from more conservative linguists and especially from linguistic nationalists. For while Chinese may well be over 1,000 languages, the Chinese government is anti-scientifically insistent that there is but one Chinese language and maybe 2,000 “dialects,” most of which are probably separate languages. The German government is quite resistant to the idea that there is more than one form of German, though I believe Bavarian and Swiss German have official status in Austria and Switzerland.

Filed under Asia, Balto-Slavic-Germanic, Bashkir, Bavarian, China, Chinese language, Comparitive, Crimean Tatar, Dialectology, Europe, European, German, Germanic, Government, History, Indo-European, Indo-Hittite, Kazakh, Kyrgyz, Language Classification, Language Families, Linguistics, Regional, Sinitic, Sino-Tibetan, Sociolinguistics, Tatar, Turkey, Turkic, Ukraine

Galician, Portuguese, and the Possibility of a Third Language Between Them

Dwan Garcez: Portuguese and Galician are the same language.

This person is Portuguese, and what they are saying is Portuguese nationalism or Portuguese linguistic nationalism. Portuguese and Galician were one language until 1550 when they split. But that time period of 450 years is about the same as between Ukrainian and Russian and Belorussian and Russian. Russian, Belorussian and Ukrainian are regarded as separate languages. And that is about the same time split as between English and Scots as Scots split off from English right around that time. Scots is regarded as a separate language from English. English has only 42% intelligibility of Scots.

Boy, I do not agree with that for one second. If you want to be sure you are not understood when you go to Lisbon, speak Galician!

If you leave Galicia, you will only be understood for six miles inside the country. After that, forget it. People who live on the border in Galicia say that they can understand their friends across the border in Portugal fairly well but not completely, and they usually both speak in Spanish to avoid communication problems.

Furthermore, Ethnologue has decided that Galician and Portuguese are different languages.

Portuguese people cannot understand well the Galician/Portuguese mix spoken right around the border with Galicia. Some Portuguese can hardly understand Tras Os Montes Portuguese at all. In fact, the Alto-Minho and Tras Os Montes dialects of Portuguese are not well understood in Portugal or in most of Galicia. This is really Galician but it is not well understood to the north in Vigo and Santiago de Compostela. Residents of the Minho, though they really are Galicians, say they do not speak Galician. Their lect is even further from Portuguese. You could make a case that Alto-Minho/Tras Os Montes is a separate language, but it would be a hard sell.

Already at least one Galician dialect has been split off into a separate language. Fala is recognized as a separate language and there are good grounds for making that case.


Filed under Dialectology, Europe, Galician, Indo-European, Indo-Hittite, Italic, Italo-Celtic-Tocharian, Language Families, Linguistics, Portugal, Portuguese, Regional, Romance, Sociolinguistics, Spain

Simplification of Language with Increasing Civilization: A Result of Contact or Civilization Itself

Nice little comment here on an old post, Primitive People Have Primitive Languages and Other Nonsense? 

I would like to dedicate this post to my moronic field of study itself, Linguistics, which holds as consensus many a silly thing that has never been proved and is either untrue or probably untrue.

One of the idiocies of my field is this belief that in some way or another, most human languages are pretty much the same. They believe that no language is inherently better or worse than any other language, which itself is quite a dubious proposition right there.

They also believe, incredibly, that no language is more complex or simple than any other language. Idiocy!

Another core belief is that each language is perfectly adapted for its speakers. This leads to their rejecting claims that some languages are unsuitable for the modern world due to lack of modern vocabulary. This common belief about many minority languages is obviously true. Drop a Papuan in Manhattan, and see what good his Torricelli tongue does him. He won’t have words for most of the things around him. He won’t even have verbs for most of the actions he sees around him. His language is nearly useless in this environment.

My field also despises notions that some languages are better suited to poetry, literature or say philosophy than others or that some languages are more or less concise or exact than others or that certain concepts or ways of thinking are better expressed in one language as opposed to another. However, this is a common belief among polyglots, and I would not be surprised if it was true.

The question we are dealing with below is based on the notion that many primitive languages are exceedingly complex and the common sense observation that as languages acquire more speakers and civilization increases, one tends to see a simplification of language.

My field out and out rejects both statements.

They will tell you that primitive languages are no more complex than more civilized tongues and that there is no truth to the statement that languages simplify with greater numbers of speakers and increased civilization. However, I have run these two rejected notions by many non-linguists, and they all felt that these statements had truth to them. Once again, my field violates common sense in the name of the abstract and abstruse “we can’t prove anything about anything” scientific nihilism so common in the intellectually degraded social sciences.

Indeed, some of the most wildly complex languages of all can be found among rather primitive peoples such as Aborigines, Papuans, Amerindians and even Africans. Most language isolates like Ket, Burushaski and Basque are pretty wild. The languages of the Caucasus are insanely complex, and that region doesn’t exactly look like Manhattan. Siberian languages are often maddeningly complex.

Even in China, in the remoter parts of China, language becomes highly differentiated and probably more complex. I know an American who was able to learn Cantonese and Mandarin who told me that at age 35, for an American to learn Hokkien was virtually impossible. He tried various schemes, but they all failed. He finally started to get a hold of the language with a strict eight hour a day study schedule. Anything less resulted in failure. Hokkien speakers that he spoke to said you needed to grow up speaking Hokkien to be able to speak the language well at all. By the way, this is another common sense notion that linguists reject. They say there are no languages so difficult that it is very hard to pick them up unless you grew up with them.

The implication here is that Min Nan is even more complex than the difficult Mandarin or even the forbidding Cantonese, which even many Mandarin speakers give up trying to learn because it is too hard.

Min Nan comes out of Fujian Province, a land of forbiddingly high mountains where language differentiation is very high, and there is often difficult intelligibility even from village to village. In one area, fifteen years ago an American researcher decided to walk to a nearby village. It took him six very difficult hours over steep mountains. He could have taken the bus, but that was a four-day trip! A number of these areas had no vehicle roads until recently and others were crossed by vast rivers that had no bridges across them. Transportation was via foot. Obviously civilization in these parts of China is at a more primitive level, and it’s hard to develop Hong Kong-style cities in places with such isolating and rugged terrain.

It’s more like, “Oh, those people on the other side of the ridge? We never go there, but we heard that their language is a lot different from ours. It’s too hard to go over that range so we never go to that area.”

In the post, I theorized that as civilization increases, time becomes money, and there is a need to get one’s point across quickly, whereas more primitive peoples often spend no more than 3-4 hours a day working and the rest sitting around, playing and relaxing. A former Linguistics professor told me that one theory is that primitive people, being highly intelligent humans (all humans are highly intelligent by default), are bored by their primitive lives, so they enjoy their wildly complex languages and like to relax, hang out and play language games with them to test each other on how well they know the structures. They also like to play tricky and maybe humorous language games with their complicated languages. In other words, these languages are a source of intellectual stimulation and entertainment in an intellectually impoverished area.

Of course, my field rejects this theory as laughably ridiculous, but no one has disproven it yet, and I doubt if the hypothesis has even been tested, hence it is an open question. My field even tends to reject the notion of open questions, preferring instead to say that anything not proven (or even tested for that matter) is demonstrably false. That’s completely anti-scientific, but that’s the trend nowadays across the board as scientistic thinking replaces scientific thinking.

Of course this is in line with the terrible conservative or reactionary trend in science where Science is promoted to a fundamentalist religion and scientists decide that various things are simply proven true or proven not true. Attempts to change the consensus paradigm are regarded derisively or with out and out fury and rage, and such attempts are rejected via endless moving of goalposts with the goal of making it never possible to prove the hypothesis. If you want to see an example of this in Linguistics, look at the debate around Altaic. They have set it up so that no matter how much existing evidence we are able to gather for the theory, we will probably never be able to prove it as barriers to proof have been set up to make the question nearly unprovable.

It’s rather senseless to set up Great Wall of China-like barriers to proof in science because at some point, you are hardly proving anything new, apparently because you don’t want to.

Fringe science is one of the most hated branches of science and many scientists refer to it as pseudoscience. Practitioners of fringe science have a very difficult time as the Scientific Establishment often persecutes them, for instance trying to get them fired from professorships. Yet this Establishment is historically illiterate because many of the most stunning findings in history were made by widely ridiculed fringe scientists.

The commenter below rejects my theory that increased civilization itself results in language simplification, as it gets more important to get your point across as quickly as possible with increasing complexity and development of society. Instead he says civilization leads to increased contact between speakers of different dialects or languages, and in such cases, language must be simplified, often dramatically, in order for any decent communication to occur. Hence increased contact, not civilization in and of itself, is the driver of simplification.

I like this theory, and I think he may be onto something.

To me the simplification of languages of more ‘civilized’ people is mostly a product of language contact rather than of civilization itself. If the need arises to communicate with foreign people all of the time, for example in trade, then the language must become more simple in order to be able to be understood by more people.

Also population size matters a lot. It has been found that the greater the number of speakers, the greater the rate of language change. For example Polynesian languages, although having been isolated centuries or even millennia ago, still have only minor differences from one another.

In the case of many speakers, not all will be able to learn all the rules of a language, so they will tend to use the most common ones. And if the language is split in many dialects, then speakers of each dialect must find a compromise in order to communicate, which might come out as simple. If we add sociolects, specific registers for some occasions, sacred registers, slang etc, something that will arise in a big and stratified civilization, then the linguistic barriers people will need to overcome become greater. So it is just normal for this system to simplify after some centuries.

We don’t need to look farther than Europe. Most languages of the western half being spoken in countries with strong trade links to one another and with much of the world later in history are quite analytic, but the languages of the more isolated eastern part are still like the older Indo-European languages. Basques, living in a small isolated pocket in the Iberian Peninsula, have kept a very complex language. Icelanders, also due to isolation, have kept a quite conservative Germanic language, whereas most modern Germanic languages are ridiculously simplified. No one can argue in his sane mind that Icelanders are primitives.

On the other hand, Romanian, being spoken in the more isolated Balkans, has retained more of the complex morphology of Latin compared to West Romance languages. And of course advance of civilization won’t automatically simplify the language, as Turkish and Russian, both quite complicated languages compared to the average European tongue, don’t seem to give up their complexity nowadays.

On the other hand, indigenous people were living in a much more isolated setting compared to the modern world, the number of speakers was comparatively low, and there was no need to change. Also, neighboring tribes were often hostile to one another, so each tribal group sought to make itself look special. That is the reason why places with much inter-tribal warfare like New Guinea have so many languages which are so different from one another. When these languages need to communicate, we get ridiculously simple contact languages like Hiri Motu.

So language simplification is more a result of language contact rather than civilization itself.


Filed under Aborigines, Altaic, Amerindians, Anthropology, Applied, Asia, Basque, Cantonese, Caucasus, China, Chinese language, Cultural, Dialectology, Europe, Germanic, Indo-European, Isolates, Language Families, Language Learning, Linguistics, Mandarin, Min Nan, Near East, Papuans, Race/Ethnicity, Regional, Russian, Science, Siberian, Sinitic, Sino-Tibetan, Sociolinguistics, Turkic, Turkish

The Chinese Language: The Wily Tiger That Cannot Be Tamed

Putonghua is the official version of Mandarin which the Communist government determined was to be the official language of the nation. It was created in 1949 and modeled mostly but not entirely on the variety spoken in Beijing.

Although Putonghua seems to be killing off a lot of dialects or even microlanguages, I have a feeling that this is mandatory. Nevertheless the process of accelerated language change in China (Why?) seems to be even catching up with Putonghua. For instance, Putonghua of course was modeled on the Beijing language. However, this was Beijing Mandarin of 1949, and it was also the language of the suburbs, not of the city.

Since then, Putonghua has taken off on its own and so has Beijing Mandarin with the strange result that the hard Beijing Mandarin of hutongs in the center of the city is now often unintelligible to Putonghua speakers! So this is a case of a standard language and the lect it was modeled off taking off via independent evolution such that 70 years later, the original lect is no longer intelligible with the Standard that was modeled on that very lect!

Chinese lects are wildly different, and tones add another mess into the matter. This has shown up even in Putonghua, where some Putonghua varieties are now unintelligible with the rest of Putonghua due to severe influence of the local lects on the standard and possibly regional evolution of the standard! Hence even Putonghua seems to have split off into several languages itself! Thus Guangdong Putonghua, Anhui Putonghua, Shanghai Putonghua, Jianghuai Putonghua and Zhengcao Putonghua are no longer fully intelligible to Putonghua speakers outside the region!

In addition, Taiwan Mandarin, Tibetan Mandarin and Malay Mandarin have all taken off on their own independent evolutionary tracks such that these are no longer fully intelligible to Standard speakers either! So since 1949, Putonghua has split into at least 8 different languages that lack full intelligibility with each other!

It seems the Chinese tried to lasso that wily creature called the Chinese language to rein it in and domesticate it somehow, but the wily creature keeps slipping away due to its endlessly morphing patter.


Filed under Asia, China, Chinese language, Dialectology, Government, Language Families, Left, Linguistics, Maoism, Marxism, Politics, Regional, Sinitic, Sino-Tibetan, Sociolinguistics

Linguistic “Science”: Let’s Get the Scientists out of Science and Let the Politicians Do Science Instead

As you saw in a previous article, the Chinese government, against all reason and for purely dishonest political motives, lies and says there is only one Chinese language, when in fact linguistic science (SIL) says there are 14 Chinese languages, and Sinologists argue that there are 2,000 Chinese languages!

Linguists let this slide because we have decided to cop out on one of the more important questions of our field, the divide between a language and a dialect. We are copping out because the scientific question itself is politicized as many scientific questions are. But linguists are cowards who are afraid of big, bad politics, so we have decided to just let politicians and other professional liars decide some of the more important questions of our field.

Dig this.

If you ask a linguist what the difference between a dialect and a language is, he will either quote some flippant classroom quote from Max Weinreich 70 years ago, “A language is a dialect with an army and a navy,” or he will avoid the question altogether. The standard linguistic cop-out answer is,

Linguistic science has no way to determine what is a language and what is a dialect because the question is political and not scientific.

Brilliant! Any time there are any questions in your scientific field that you don’t want to answer because you’re cowards/sophists, you simply decide that it wasn’t a scientific question at all, instead it was a political question. Voilà! Problem solved! Now we can get back to the really important stuff, like figuring out how many Proto-Indo-European laryngeals there were. You know, stuff that everybody needs to know.

But hey, what the Chinese government lies, I mean says, goes, and all of us linguistic “scientists” (snicker) go along with this anti-scientific BS because we have decided that this particular branch of so-called science is so stupid that we can’t even figure out if a given lect is a language or a dialect because we idiotically state that there are no criteria for making such a discernment.

So of course, we throw the scientific question over to the most honest people in the whole world, the politicians! Yeah! That’ll solve the problem. How bout all of us social “scientists” get together and decide to let the world’s politicians (venerable empiricists of course) decide the most important questions of our field because we are too stupid to figure them out on our own. Let’s get the scientists out of science and let the politicians do it instead. That’s what the official determination of linguistic “science” (snicker) is.

Pitiful. Just pitiful.

You wonder why people chuckle when you say the phrase “social science.”


Filed under Chinese language, Dialectology, Government, Language Families, Linguistics, Politics, Science, Sinitic, Sino-Tibetan, Sociolinguistics