Languages of India
How India's languages are grouped into families, which ones the Constitution recognises and protects, the classical-language tag, and the ancient scripts they descend from.
India is one of the most linguistically diverse countries in the world. Its languages belong to a handful of distinct language families, each a group of languages descended from a common ancestor older than recorded history. Most Indians speak a language from the Indo-Aryan group, but several other families are indigenous to the subcontinent. Before turning to the families, two terms are worth fixing:
- Language family: a set of individual languages related through a single common ancestor.
- Dialect: a local form of a language. Many dialects can branch off one language.
Classification of Indian Languages
Think first
Many people assume Hindi is India's national language. Is it? The answer surprises most students, and so does the story of how English survived past 1965.
Indian languages fall into four major families, plus a small residual group.
The Indo-Aryan group is by far the largest: about 74% of Indians speak a language from it. It is a branch of the wider Indo-European family and came to India with the Aryans. It is sub-divided by age into three stages:
- Old Indo-Aryan: developed around 1500 BC, when Sanskrit was born. The oldest Sanskrit is the language of the Vedas. The Upanishads, Puranas and Dharmasutras are also in Sanskrit. Sanskrit grammar was systematised by Panini (around 400 BC) in his Ashtadhyayi, the oldest surviving grammar. The first epigraphic use of Sanskrit appears in the inscriptions of Rudradaman at Junagadh (Gujarat). Literary Sanskrit flowered under the Guptas. Sanskrit is one of the 22 scheduled languages.
- Middle Indo-Aryan: roughly 600 BC to 1000 AD, the age of the Prakrits, the natural, rule-light common tongues. Key Prakrits include:
  - Pali: the language of the Buddhist Tripitaka and the lingua franca of Theravada Buddhism, written in Brahmi.
  - Ardha-Magadhi (Magadhi Prakrit): the court language of the Mauryas, used in Jain Agamas and several of Ashoka's rock edicts. It is the ancestor of the eastern languages: Bengali, Assamese, Odia, Maithili and Bhojpuri.
  - Shauraseni: the language of medieval dramas and many Jain (Digambara) texts.
  - Maharashtri Prakrit: the official language of the Satavahanas and the ancestor of Marathi and Konkani.
  - Paishachi: also called Bhuta-bhasa, the medium of Gunadhya's Brihatkatha.
  - Apabhramsha, the "corrupt" dialects that emerged by the 6th–7th century, mark the transition to the modern languages.
- Modern Indo-Aryan: the present-day languages that crystallised after about 1000 AD: Hindi, Assamese, Bengali, Gujarati, Marathi, Punjabi, Rajasthani, Sindhi, Odia, Urdu and others, spoken mainly across north, west and east India.
The Dravidian group covers most of southern India and about 25% of the population. Proto-Dravidian gave rise to 21 languages, classed in three sub-groups:
- Northern group: Brahui (spoken in Baluchistan), Malto and Kurukh.
- Central group: eleven languages including Gondi, Khond, Kui and Telugu (the only one to become a major literary language).
- Southern group: seven languages: Kannada, Tamil, Malayalam, Tulu, Kodagu, Toda and Kota. Tamil is the oldest of them all.
The four major Dravidian languages, worth remembering with their tags:
- Telugu: numerically the largest of the Dravidian languages.
- Tamil: the oldest and purest form.
- Kannada.
- Malayalam: the smallest and youngest of the group.
The Sino-Tibetan group belongs to the Mongoloid family. It is spread across the Himalayas, north Bihar, north Bengal, Assam and the north-eastern frontier. These languages are older than Indo-Aryan, and their speakers are called Kiratas in early Sanskrit texts. Only about 0.6% of Indians speak them. The group splits into two branches. Tibeto-Burman includes the Kuki-Chin branch, whose most important language is Manipuri/Meitei. Siamese-Chinese includes Ahom, which is now extinct in India.
The Austric group belongs to the Austro-Asiatic sub-family. It comprises the Munda or Kol languages of central, eastern and north-eastern India, plus Mon-Khmer tongues such as Khasi and Nicobarese. These predate the Aryans, and their speakers are called Nishadas in Sanskrit literature. Santhali is the most important. Apart from Khasi and Santhali, most Austro-Asiatic languages in India are endangered.
Ranking Indian languages by speakers
Family shares tell only part of the story. Exams also ask how individual languages rank by number of speakers.
- Hindi: the most widely spoken Indian language, both within India and worldwide.
- Bengali: the second most widely spoken Indian language in the world after Hindi. Its strength comes from a double base: it is the majority language of Bangladesh as well as the language of West Bengal and parts of Assam and Tripura.
- Marathi and Telugu: the next largest, each with a very large speaker base within India. By the 2011 Census of India, Marathi ranks third and Telugu fourth, followed by Tamil.
So while Telugu is the largest Dravidian language, Bengali outranks it overall: it is already ahead within India, and its speakers in Bangladesh widen the gap worldwide.
How Indo-Aryan and Dravidian differ. Beyond different root words, their grammar is built differently:
- Dravidian structure is agglutinative: roots are strung together with little or no change of form.
- Indo-Aryan structure is inflected: a word's ending or spelling changes with its grammatical function in the sentence.
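The contrast above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the helper names `agglutinate` and `inflect` are hypothetical, the words are given in Roman transliteration, and the Tamil and Sanskrit forms are chosen because they happen to join cleanly. Real words often show sound changes at the joins that this sketch ignores.

```python
# Illustrative only: helper names are hypothetical, forms are transliterated.

def agglutinate(root, *suffixes):
    """Build a word by stringing unchanged suffixes onto an unchanged root."""
    return root + "".join(suffixes)

# Tamil (Dravidian): each suffix adds one meaning and keeps its own shape.
house = "vīṭu"
houses = agglutinate(house, "kaḷ")            # house + plural
in_houses = agglutinate(house, "kaḷ", "il")   # house + plural + locative

# Sanskrit (Old Indo-Aryan): a single ending signals case and number at once,
# so the forms are looked up as whole words rather than assembled from parts.
DEVA_SINGULAR = {
    "nominative": "devaḥ",
    "accusative": "devam",
    "instrumental": "devena",
}

def inflect(case):
    """Return the singular form of 'deva' (god) for the given case."""
    return DEVA_SINGULAR[case]

print(houses, in_houses)
print(inflect("nominative"), inflect("accusative"), inflect("instrumental"))
```

In the Tamil forms the plural marker stays visible inside the longer word, while the Sanskrit endings cannot be split into a separate "case" piece and "number" piece. That is the difference the two bullets describe.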
Previous-year questions
2021 UPSC: With reference to India, the terms 'Halbi', 'Ho' and 'Kui' pertain to
2008 UPSC: Among the Indian languages, which one is spoken maximum in the world after Hindi?
1998 UPSC: Which one of the following languages belongs to the Austric group?
Official Languages and the Eighth Schedule
Article 343(1) of the Constitution makes Hindi in the Devanagari script the official language of the Union. English was meant to be phased out 15 years after the Constitution took effect, by 26 January 1965. But protests by non-Hindi-speaking states led to the Official Languages Act, 1963. That Act kept Hindi as the official language and gave English the status of "subsidiary official language."
The Eighth Schedule lists the languages the Union and states may use for official purposes. It began with 14 languages and has grown to 22:
- The original 14: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Odia, Punjabi, Sanskrit, Tamil, Telugu, Urdu.
- Sindhi: added as the 15th by the 21st Amendment, 1967.
- Konkani, Manipuri, Nepali: added by the 71st Amendment, 1992.
- Bodo, Maithili, Dogri, Santhali: added by the 92nd Amendment, 2003.
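The growth from 14 to 22 is easy to misremember, so a quick tally may help. The sketch below simply encodes the list above as data and adds up the steps. The names `EIGHTH_SCHEDULE_STEPS`, `running_totals` and `is_scheduled` are hypothetical, chosen only for this illustration.

```python
# Languages grouped by the step that placed them on the Eighth Schedule,
# copied from the list above.
EIGHTH_SCHEDULE_STEPS = [
    ("Original list", [
        "Assamese", "Bengali", "Gujarati", "Hindi", "Kannada", "Kashmiri",
        "Malayalam", "Marathi", "Odia", "Punjabi", "Sanskrit", "Tamil",
        "Telugu", "Urdu",
    ]),
    ("21st Amendment, 1967", ["Sindhi"]),
    ("71st Amendment, 1992", ["Konkani", "Manipuri", "Nepali"]),
    ("92nd Amendment, 2003", ["Bodo", "Maithili", "Dogri", "Santhali"]),
]

def running_totals(steps):
    """Return (label, cumulative count) after each step."""
    total = 0
    totals = []
    for label, languages in steps:
        total += len(languages)
        totals.append((label, total))
    return totals

def is_scheduled(language):
    """True if the language appears anywhere on the schedule as listed here."""
    return any(language in languages for _, languages in EIGHTH_SCHEDULE_STEPS)

for label, total in running_totals(EIGHTH_SCHEDULE_STEPS):
    print(f"{label}: {total}")
```

The running totals come out as 14, 15, 18 and 22, which matches the sequence 14 + 1 + 3 + 4. A lookup such as `is_scheduled("English")` returns `False`, since English does not appear in any of the four steps.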
Important points the exam returns to:
- India has no national language. Neither the Constitution nor any Act names one. Hindi is not a national language, only an official one.
- States may pick their own official language, and it need not be on the Eighth Schedule. Examples: Kokborok (Sino-Tibetan) in Tripura, French in Puducherry, Mizo in Mizoram.
- English is the official language of Nagaland and Meghalaya, yet English itself is not on the Eighth Schedule.
The National Translation Mission (NTM) is a government scheme. It translates higher-education and knowledge texts, mostly available in English, into the 22 scheduled languages. Its aim is to make knowledge accessible and to build translation into an industry through training, databases and machine-aided translation.
Check yourself
English was due to be phased out as an official language by 26 January 1965. What actually happened?
Classical Language Status
In 2004 the Government of India created the tag of "Classical Language." The criteria are:
- High antiquity: early texts or recorded history spanning 1500–2000 years.
- A body of ancient literature treated as a valued heritage by its speakers.
- A literary tradition that is original, not borrowed from another community.
- A classical form distinct from the modern language, with a possible discontinuity between the two.
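Six languages were granted classical status between 2004 and 2014:
- Tamil: 2004 (the first).
- Sanskrit: 2005.
- Kannada: 2008.
- Telugu: 2008.
- Malayalam: 2013.
- Odia: 2014.
The government was long criticised for not including Pali, which many experts argued met every criterion. In October 2024 the Union Cabinet approved the tag for five more languages (Marathi, Pali, Prakrit, Assamese and Bengali), taking the total to eleven.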
Benefits that follow from the tag:
- Two annual international awards for eminent scholars of classical Indian languages.
- A Centre of Excellence for Studies in Classical Languages.
- Professorial chairs for classical languages in central universities, via the UGC.
Previous-year questions
2014 UPSC: Consider the following languages:
- Gujarati
- Kannada
- Telugu
Which of the above has/have been declared as 'Classical Language/Languages' by the Government?
Ancient Scripts of India
A script (writing system or orthography) is a standard set of marks for representing the sounds of a spoken language on a medium. India's two ancient scripts are Brahmi and Kharosthi. Most later Indian scripts descend from Brahmi, earning it the title "mother of Indian scripts." Urdu is the main exception, using an Arabic-derived script.
- Indus script: the short, undeciphered symbols of the Indus Valley Civilization. It is not even certain they record a language.
- Brahmi script: the oldest deciphered writing system of the subcontinent, used in the final centuries BC and early centuries AD. The best-known specimens are the rock-cut edicts of Ashoka (c. 250–232 BC). It was deciphered in 1837 by James Prinsep. It runs left to right and is an abugida: each letter is a consonant carrying an inherent vowel, and other vowels are marked by obligatory diacritics (matras).
- Gupta script: descended from Brahmi and used for Sanskrit under the Guptas. It gave rise to the Nagari, Sharada and Siddham scripts. Through them it further produced Devanagari, Gurmukhi (for Punjabi), Assamese, Bengali and the Tibetan script. All of these are collectively the Brahmic scripts.
- Kharosthi script: a sister script of Brahmi. It was used in ancient Gandhara (modern Afghanistan and Pakistan) from the 3rd century BC to the 3rd century AD to write Gandhari Prakrit and Sanskrit. It was also deciphered by James Prinsep. It is an abugida, runs mostly right to left, and has Roman-like numerals.
- Vatteluttu script: an abugida of South India, developed from Tamil-Brahmi, used by Tamil people alongside the Pallava (Grantha) and Tamil scripts.
- Kadamba script: a Brahmi descendant developed under the Kadamba dynasty (4th–6th centuries AD). It marks the birth of a dedicated script for Kannada and became the Kannada-Telugu script.
- Grantha script: used from the 6th to the 20th centuries by Tamil speakers in Tamil Nadu and Kerala to write Sanskrit and Manipravalam. It is a Brahmic script. The Malayalam, Tigalari and Sinhala scripts all descend from it.
- Urdu script: a right-to-left script that modifies the Persian (and ultimately Arabic) alphabet. It is closely tied to the Nastaliq style. In its extended form, Shahmukhi, it also writes Punjabi and Saraiki.
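The abugida principle and the left-to-right versus right-to-left contrast in the list above can be checked with Python's standard `unicodedata` module. This is a minimal sketch, not a treatment of Brahmi itself: it uses Devanagari, which the list names as a Brahmi descendant, as a stand-in, and one Arabic letter to stand for the Urdu script. The constant names and the `describe` helper are assumptions made for illustration.

```python
import unicodedata

# Devanagari stands in for the Brahmic family here (an assumption made
# for illustration; the same consonant-plus-matra principle applies).
KA = "\u0915"      # consonant letter, read "ka" with its inherent vowel
SIGN_I = "\u093F"  # dependent vowel sign (matra) for "i"
SIGN_U = "\u0941"  # dependent vowel sign (matra) for "u"
BEH = "\u0628"     # an Arabic letter, as used in the Urdu script

def describe(syllable):
    """Return the Unicode character names that make up a written syllable."""
    return [unicodedata.name(ch) for ch in syllable]

# One base consonant, different matras: "ka", "ki", "ku".
for syllable in (KA, KA + SIGN_I, KA + SIGN_U):
    print(syllable, describe(syllable))

# Matras are combining marks (category starts with "M"), not full letters.
print(unicodedata.category(KA), unicodedata.category(SIGN_I))

# Writing direction: "L" is left-to-right, "AL" is right-to-left Arabic.
print(unicodedata.bidirectional(KA), unicodedata.bidirectional(BEH))
```

The consonant keeps its identity in every syllable while the vowel is attached as a mark, which is what separates an abugida from an alphabet, where vowels are independent letters of equal standing.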
Check yourself
Why is Brahmi called the mother of Indian scripts?
The Bengali Language Movement and Mother Language Day
Language can be a question of identity strong enough to make and break states. The clearest example is the Bengali Language Movement in Pakistan. After the Partition of 1947, Bengali speakers of East Pakistan formed a majority of the new country's population. Yet the central leadership pushed Urdu as the sole state language. The demand that Bangla be recognised as one of the national languages was raised in the Constituent Assembly of Pakistan, the body drafting Pakistan's constitution, as early as 1948. The demand was rejected.
Protests grew, centred on Dhaka University. On 21 February 1952, police fired on students demonstrating for Bengali, killing several of them. The dead are remembered as the language martyrs, and the date became a day of mourning and pride. Pakistan finally accepted Bengali as a state language in 1956, but the grievance endured. The movement fed the larger Bengali nationalist current that led to the creation of Bangladesh in 1971.
In honour of this sacrifice, UNESCO, the United Nations agency for education, science and culture, proclaimed 21 February as International Mother Language Day in 1999. It has been observed worldwide every year since 2000 to promote linguistic and cultural diversity and multilingualism. A common exam trap swaps the agency: the day was declared by UNESCO, not UNICEF, which is the UN children's fund.
Previous-year questions
2021 UPSC: Consider the following statements: 1) 21st February is declared to be the International Mother Language Day by UNICEF. 2) The demand that Bangla has to be one of the national languages was raised in the Constituent Assembly of Pakistan. Which of the above statements is/are correct?
Key takeaways
- Four families: Indo-Aryan, Dravidian, Sino-Tibetan, Austric.
- Indo-Aryan ~74%, Dravidian ~25%, Sino-Tibetan ~0.6%.
- Indo-Aryan stages: Old (Sanskrit), Middle (Prakrits), Modern.
- Tamil oldest, Telugu largest Dravidian language.
- Bengali: second most spoken Indian language after Hindi.
- Bengali's edge: speakers in both India and Bangladesh.
- Dravidian = agglutinative. Indo-Aryan = inflected.
- Article 343: Hindi (Devanagari) official, English subsidiary.
- Eighth Schedule: 14 → 22 languages.
- No national language. Hindi is not one.
- English official in Nagaland and Meghalaya, but not in the Eighth Schedule.
- Classical, 2004–2014: Tamil, Sanskrit, Kannada, Telugu, Malayalam, Odia. Five more (Marathi, Pali, Prakrit, Assamese, Bengali) approved in 2024.
- Brahmi: mother of scripts, deciphered 1837 by James Prinsep.
- Kharosthi: Gandhara, right-to-left, sister of Brahmi.
- Gupta script → Nagari/Sharada/Siddham → Devanagari, Bengali, Tibetan.
- Bangla demand raised in Pakistan's Constituent Assembly, 1948.
- 21 February: Mother Language Day, declared by UNESCO (not UNICEF), 1999.