In the course of two projects of the Research Institute for Linguistics of the HAS (Hungarian generative historical syntax [OTKA NK78074] and Morphologically analysed corpus of Old and Middle Hungarian texts, representative of informal language use [OTKA 81189]), morphologically annotated searchable corpora were created from surviving Hungarian texts dating from the 12th c. to the middle of the 18th c. The morphological analyzer used in both projects is a modified version of the Humor Hungarian analyzer (Novák 2003), originally created to analyze present-day standard written Hungarian. Before the program could be used to annotate the texts morphologically, they had to be normalized, i.e. their extremely varied orthography had to be made uniform. In the course of this process, orthographical and dialectal phonetic variation was neutralized, but morphological variation was kept. Normalization and analysis are illustrated in (1). (For a list of abbreviations, see the Appendix.)
original text | hōti | utan | vallia, |
---|---|---|---|
normalized text | hite | után | vallja: |
stems | hit | után | vall |
analysis | N.PxS3 | PP | V.S3.Def |
Owing to the great variety of texts, normalization proved to be a challenging task. Normalization of words containing inessive
The structure of our essay is as follows. First, the distribution and usage of
In the course of normalization, the original texts were rewritten according to present-day orthographic rules. Therefore, it is indispensable to review the synchronic situation of inessive and illative case marking.
Similarly to some other Uralic languages, Hungarian has a system of locative cases which exhibit a three-way contrast: a separate class of case endings and postpositions is used for marking location, point of origin or target of motion. [Many Uralic languages also distinguish path of movement as a fourth type of relation. Hungarian uses locative cases (primarily superessive) and locative postpositions for this kind of relation.] The same system applies to temporal location, beginning or endpoint of events. A part of this system refers to inner relations: the suffixes
a. | illative: -ba/be ‘into’ | ||
Be-ment-em | a | mozi-ba/kert-be. | |
into-go.past-1sg | the | cinema-ill/garden-ill | |
‘I went into the cinema/garden.’ |
b. | inessive: -ban/ben ‘in’ | ||||
A | mozi-ban/kert-ben | túl | meleg | volt. | |
the | cinema-ine/garden-ine | too | hot | be.past | |
‘It was too hot in the cinema/garden.’ |
c. | elative: -ból/ből ‘out of’ | ||
Ki-jött-em | a | mozi-ból/kert-ből. | |
out-come.past-1sg | the | cinema-el/garden-el | |
‘I came out of the cinema/garden.’ |
In addition to their spatial and temporal meanings, these suffixes are also used to mark oblique arguments of verbs in a more or less idiomatic manner:
a. | hisz | valami-ben |
believe | something-ine | |
‘believe in something’ |
b. | bízik | valaki-ben |
trust | somebody-ine | |
‘trust somebody’ |
c. | kerül | valamennyi-be |
cost | some-ill | |
‘cost some amount (of money)’ |
In addition to these idiomatic uses of the suffixes, there are also lexicalized adverbs, postpositions and verbal prefixes that etymologically contain one of these suffixes, like abba(hagy) ‘stop/give up’, elébe ‘in front of — as direction’, hiába ‘in vain’, általában ‘usually, in general’, mostanában ‘nowadays, recently’ etc.
In present-day Hungarian, spoken and written language considerably differ regarding the distinction of
Another process, n-insertion, is also witnessed in speech. In sporadic cases the
written | spoken | |
---|---|---|
inessive | -ban/ben | -ba/be ~ -ban/ben |
illative | -ba/be | -ba/be (~ -ban/ben) |
How widespread are n-dropping and hypercorrection in actual language use? In a representative survey, the Hungarian National Sociolinguistic Survey (HNSS, Kontra 2003), the usage of
Another sociolinguistic survey, the Budapest Sociolinguistic Interview (BUSZI, Váradi 2003), collected data from people living in Budapest at the end of the 1980s. This survey featured grammaticality judgments, elicited spoken production (e.g., the completion of sentences), reading aloud and spontaneous speech in directed interviews. Certain test situations were explicitly designed to test
In contrast to what seems to be suggested by the results found in the HNSS, the BUSZI spontaneous speech data show that while informants dropped the
Access to the BUSZI data has recently been made possible for any researcher wishing to examine it, and we applied for an account to get real-world data concerning contemporary spoken use of
In (5), columns bAn and bA mark uses of the corresponding suffixes conforming to the written standard, i.e.,
BUSZI-2 | bAn | bA’ | bA | bA’n | sum | bAn | bA’ | bA | bA’n | INE | ILL | BAN | BA | STD | NSTD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
all case data | 1605 | 2168 | 1234 | 5 | 5012 | 0.32 | 0.43 | 0.25 | 0.00 | 0.75 | 0.25 | 0.32 | 0.68 | 0.57 | 0.43 |
informants | 874 | 1596 | 881 | 4 | 3355 | 0.26 | 0.48 | 0.26 | 0.00 | 0.74 | 0.26 | 0.26 | 0.74 | 0.52 | 0.48 |
field workers | 731 | 572 | 353 | 1 | 1657 | 0.44 | 0.35 | 0.21 | 0.00 | 0.79 | 0.21 | 0.44 | 0.56 | 0.65 | 0.35 |
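The derived columns of (5) follow arithmetically from the four raw counts. The Python sketch below recomputes them for two rows of the table. The class and field names are illustrative only and are not part of any project software; the column definitions are the ones implied by the figures in the table (INE = bAn + bA’, ILL = bA + bA’n, BAN = bAn + bA’n, BA = bA’ + bA, STD = bAn + bA, NSTD = bA’ + bA’n).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixCounts:
    """Token counts for the four categories used in table (5).

    Field names are hypothetical labels for the table columns.
    """
    ban: int         # bAn : inessive realized with final n (standard)
    ba_dropped: int  # bA’ : inessive realized without n (n-dropping)
    ba: int          # bA  : illative realized without n (standard)
    ba_added: int    # bA’n: illative realized with n (hypercorrect)

    @property
    def total(self) -> int:
        return self.ban + self.ba_dropped + self.ba + self.ba_added

    def ratios(self) -> dict:
        """Derived ratio columns, rounded to two decimals as in the table."""
        raw = {
            "INE": self.ban + self.ba_dropped,       # all inessive tokens
            "ILL": self.ba + self.ba_added,          # all illative tokens
            "BAN": self.ban + self.ba_added,         # realized with n
            "BA": self.ba_dropped + self.ba,         # realized without n
            "STD": self.ban + self.ba,               # conforms to the standard
            "NSTD": self.ba_dropped + self.ba_added, # deviates from it
        }
        return {name: round(n / self.total, 2) for name, n in raw.items()}


all_data = SuffixCounts(ban=1605, ba_dropped=2168, ba=1234, ba_added=5)
informants = SuffixCounts(ban=874, ba_dropped=1596, ba=881, ba_added=4)

print(all_data.total, all_data.ratios())
print(informants.total, informants.ratios())
```

Keeping the four raw categories separate, rather than storing only the ratios, means that any of the six derived columns can be recomputed or redefined later.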
The results show that inessive is about three times as frequent as illative. As we shall see when looking at the results for historical texts, a similar ratio holds there even when lexicalized adverbs, postpositions and verbal prefixes are included in the analysis in addition to productively case-marked words. In contrast, as we see in the BAN and BA columns, the
Mátyus and her colleagues examined how different socio-cultural factors influence n-dropping (Mátyus et al. 2010) and found that people without a degree tend to drop more n’s than people with a degree. Mátyus (2009) also points out that the presence of n-dropping depends on the exact function of
Although present-day spoken and written language mark inessive and illative differently, the standard marking of these case endings in writing causes problems for most Hungarians only in the first years of schooling. However, as manuscripts from the Old (13th c.–1526) and Middle Hungarian (1526–1772) eras show, the orthography of these suffixes was not uniform for several hundred years and seems to have been a problem for many authors.
In order to make automatic morphological annotation of the corpora of Old and Middle Hungarian texts possible, all texts were manually normalized to present-day orthography. In the course of this process, orthographic and dialectal variation was neutralized, but the identity of morphemes was kept.
The suffix pair inessive
It is a crucial question from the point of view of normalization how to deal with these alternating suffixes. How could one decide whether a specific
One solution, argued for by some of our colleagues, is to keep the original orthography and thus suppose that
Another solution is to consider any discrepancy of the
This issue is most likely to affect idiomatic oblique arguments that may have changed in the course of time. The corpus contains some examples of clear and unambiguous discrepancy between historical and contemporary argument structures. One such case is shown in the following example, where the verb megy ‘to go’ is used similarly to one use of the verb jár valahol ‘to visit a place’, in which the verb has a locative argument: ‘Going to (or visiting?) Rozsnyó a second time, Mrs. Beke said to her husband.’
Masodban | ugyan | Rozsnyon | menven | Bekene | mondotta | az | Uranak |
Másodban | ugyan | Rozsnyón | menvén | Bekéné | mondta | az | urának: |
másod | ugyan | Rozsnyó | megy | Bekéné | mond | az | úr |
Adj.Ine | Adv | N.Sup | V.PartAdv | N | V.Past.S3.Def | Det | N.PxS3.Dat |
Here the
It is of course not at all evident that this ratio of 2 to 380 (0.5%) is representative of all cases of potential data falsification by normalization, but this negligible number seems to indicate that we do not risk massive data corruption if we choose Method 2.
Nevertheless, there is a third option: to normalize the texts in a way that allows for modification but also keeps the original encoding. In this case, all instances of
This encoding ensures that the morphological analyzer can assign an analysis to the data that corresponds to what is presumably the correct interpretation of the intended meaning of the text, while the normalized form itself explicitly indicates that the original form was altered in a specific way. This makes it possible to detect and correct mistakes: if a class of instances later turns out to have been modified or left unmodified in error, they can easily be located and fixed. For example, in view of the above example, it may well be the case that Malomban ‘in … mill’ in example (7) below was normalized to an illative in error. Nonetheless, we can locate such suspicious cases in the corpus easily. Moreover, this approach makes it possible to evaluate whether the first approach mentioned above would be feasible. The meaning of the following example is: ‘But once the witness went to (or visited?) the mill in Babót with Andor Bóna’.
ha nem | egykor | az | Baboti | Malomban | ment | volt | az | fatens, | Bona | Andorral, |
hanem | egykor | a | babóti | malomba’n | ment | volt | a | fatens | Bóna | Andorral, |
hanem | egykor | a | babóti | malom | megy | van | a | fatens | Bóna | Andor |
C | Adv | Det | Adj | N.Ill | V.Past.S3 | V.Past | Det | N | N | N.Ins |
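A minimal sketch of how forms encoded as in (7) could be decoded mechanically. It assumes, on the basis of the example malomba’n and of the column labels bA’ and bA’n used in the tables of this paper, that ba’n/be’n marks an original -ban/-ben normalized to illative, and that ba’/be’ marks an original -ba/-be normalized to inessive. The function name and the purely string-based matching are illustrative assumptions. The project's analyzer is not reproduced here, and a real pipeline would have to exclude stems that merely end in these letter sequences.

```python
import re

APOSTROPHE = "\u2019"  # the ’ character that marks an altered suffix

# Word-final -ba/-be, an optional alteration mark, and an optional n.
_SUFFIX = re.compile(rf"(b[ae])({APOSTROPHE}?)(n?)$")


def decode(normalized):
    """Return (category, suffix_as_written, suffix_as_analyzed) word forms.

    Returns None when the token does not end in one of the four patterns.
    Apart from the suffix, the returned forms keep the normalized spelling.
    """
    m = _SUFFIX.search(normalized)
    if m is None:
        return None
    stem = normalized[:m.start()]
    base, mark, n = m.groups()
    if mark and n:
        # ba’n: the source had -ban/-ben, analyzed as illative
        return ("bA’n", stem + base + "n", stem + base)
    if mark:
        # ba’: the source had -ba/-be, analyzed as inessive
        return ("bA’", stem + base, stem + base + "n")
    if n:
        # unmarked -ban/-ben: written and analyzed as inessive
        return ("bAn", normalized, normalized)
    # unmarked -ba/-be: written and analyzed as illative
    return ("bA", normalized, normalized)


for token in ["malomba" + APOSTROPHE + "n", "kertbe" + APOSTROPHE,
              "moziban", "kertbe"]:
    print(token, decode(token))
```

Because both the written and the analyzed form are recoverable from a single token, the same normalized text can be fed to the analyzer under either reading.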
In the normalization process we opted for Method 3. Specifically, we marked
During the analysis, the morphological analyzer software was set to interpret all
Before discussing what our normalization revealed, it is useful to look at the possible causes of the great variation in spelling of
As Németh (2008) points out, n-dropping at the end of the inessive suffix
The first orthographic code for Hungarian was accepted only in 1832 (Szemere 1974), i.e., well after the time in which these texts were created. As Kniezsa (1952) points out, Hungarian orthography was formed relatively slowly and the lack of a norm led to the chaotic spelling of Old Hungarian and Middle Hungarian texts. He claims that the setting up of a permanent chancellery in the middle of the 13th c. was the first step towards the formation of a spelling norm. Németh (2008) adds that the writing traditions of offices were in several respects different from those of everyday correspondence.
As for the orthography of inessive
Németh (2008) claims that the latent orthographical norm of offices in the 17th–18th centuries was that all instances of the inessive and illative suffixes were written as
Printers, authorities and intellectuals were also influenced by the grammars published for Hungarian. Szathmári (1968), in his study on early Hungarian grammars, found that already Mátyás Dévai Bíró’s Orthographia Vngarica (1549) distinguishes inessive and illative case, and, from Albert Szenci Molnár (1610) on, all grammars claim that inessive case is marked by
In sum, the orthography of inessive and illative case markers was influenced by the following conflicting facts and demands.
Having normalized the texts as outlined in §3.3, we could compare texts in the corpora with respect to the orthography of word forms containing
The table in (9) below summarizes our findings for a variety of sources in the corpora. The first four pieces of text are codices containing religious texts from the Old Hungarian era: Jókai Codex, Könyvecse az szent apostoloknak méltóságokról [Booklet on the dignity of saint apostles], Festetics Codex and Guary Codex.
The witch trials court records subcorpus consists of the minutes of over a hundred witch trials. This subcorpus covers a time span over a century. The rest are selected parts of the Middle Hungarian correspondence corpus. Poppel-Batthyány is the already processed part of the correspondence of Éva Lobkowitz-Poppel mainly with members of her family (containing letters addressed to Éva Poppel as well). This corpus consists of letters written by several people. Nevertheless, this subcorpus is uniform with regard to the orthography of
Corpus | Date | Size | bAn | bA’ | bA | bA’n | sum | bAn | bA’ | bA | bA’n | INE | ILL | BAN | BA | STD | NSTD | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jókai Codex | 1370–1440 | 21945 | 414 | 38 | 153 | 3 | 608 | 0.68 | 0.06 | 0.25 | 0.00 | 0.74 | 0.26 | 0.69 | 0.31 | 0.93 | 0.07 | STD |
Könyvecse | 1521 | 8743 | 170 | 5 | 32 | 3 | 210 | 0.81 | 0.02 | 0.15 | 0.01 | 0.83 | 0.17 | 0.82 | 0.18 | 0.96 | 0.04 | STD |
Festetics Codex | 1492–1494 | 19358 | 395 | 143 | 64 | 43 | 645 | 0.61 | 0.22 | 0.10 | 0.07 | 0.83 | 0.17 | 0.68 | 0.32 | 0.71 | 0.29 | HYB |
Guary Codex | 1490–1508 | 20239 | 25 | 356 | 157 | 2 | 540 | 0.05 | 0.66 | 0.29 | 0.00 | 0.71 | 0.29 | 0.05 | 0.95 | 0.34 | 0.66 | BA |
Tamás Nádasdy ag. | 1544–1559 | 4535 | 4 | 72 | 30 | 0 | 106 | 0.04 | 0.68 | 0.28 | 0.00 | 0.72 | 0.28 | 0.04 | 0.96 | 0.32 | 0.68 | BA |
Tamás Nádasdy ?? | 1559 | 96 | 4 | 1 | 1 | 0 | 6 | 0.67 | 0.17 | 0.17 | 0.00 | 0.83 | 0.17 | 0.67 | 0.33 | 0.83 | 0.17 | ~STD |
Anna Nádasdy ag. | 1548–1558 | 2237 | 38 | 6 | 6 | 0 | 50 | 0.76 | 0.12 | 0.12 | 0.00 | 0.88 | 0.12 | 0.76 | 0.24 | 0.88 | 0.12 | ~STD |
Ferenc Nádasdy ag. | 1568–1569 | 2870 | 28 | 1 | 1 | 7 | 37 | 0.76 | 0.03 | 0.03 | 0.19 | 0.78 | 0.22 | 0.95 | 0.05 | 0.78 | 0.22 | BAN |
Pál Telegdy | 1592–1594 | 3799 | 71 | 0 | 4 | 22 | 97 | 0.73 | 0.00 | 0.04 | 0.23 | 0.73 | 0.27 | 0.96 | 0.04 | 0.77 | 0.23 | BAN |
Poppel–Batthyány | 1625–1641 | 17493 | 283 | 11 | 10 | 48 | 352 | 0.80 | 0.03 | 0.03 | 0.14 | 0.84 | 0.16 | 0.94 | 0.06 | 0.83 | 0.17 | BAN |
Witch trials | 1653–1767 | 132706 | 2399 | 42 | 62 | 638 | 3141 | 0.76 | 0.01 | 0.02 | 0.20 | 0.78 | 0.22 | 0.97 | 0.03 | 0.78 | 0.22 | BAN |
Sándor Károlyi | 1704–1722 | 14314 | 237 | 3 | 72 | 6 | 318 | 0.75 | 0.01 | 0.23 | 0.02 | 0.75 | 0.25 | 0.76 | 0.24 | 0.97 | 0.03 | STD |
The table contains the date of creation of the texts, their size in words and
The bAn columns contain the number and ratio of occurrences of standard usage of the
The INE column contains the ratio of
Concerning the orthography of
1a. Some of the texts clearly distinguish the two suffixes in a manner that to a great extent corresponds to our present-day grammatical intuition. We marked this class of documents as STD (standard) in the table. An example of this is Jókai Codex, which was written in the Old Hungarian period. Another is Könyvecse (another codex containing religious texts) from the beginning of the 16th century. Part of the Middle Hungarian correspondence, e.g., that of Sándor Károlyi from the beginning of the 18th century also belongs to this group.
1b. Some documents mostly use these suffixes as the present-day standard, but sometimes (a little more often than in cluster 1a)
2. Other sources mostly neutralize the two suffixes either as
3. Completely hypercorrect usage, i.e.,
4. The last category is made up of texts that use both forms of the suffix, generally in the way the present standard would require, but hypercorrect
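The grouping above can be approximated by simple thresholds on the ratio columns of (9). The following Python sketch is an illustration only: the threshold values (0.90, 0.10, 0.20) and the function name are hypothetical, chosen so that they reproduce the Type column of (9), and are not criteria stated by the authors.

```python
def classify(ban, ba, nstd, dominance=0.90, std_max=0.10, near_std_max=0.20):
    """Assign a text type from the BAN, BA and NSTD ratios of table (9).

    All three thresholds are assumed values, not taken from the paper.
    """
    if ba >= dominance:
        return "BA"      # both cases (almost) always written without n
    if ban >= dominance:
        return "BAN"     # both cases (almost) always written with n
    if nstd <= std_max:
        return "STD"     # close to the present-day standard
    if nstd <= near_std_max:
        return "~STD"    # mostly standard, somewhat more deviation
    return "HYB"         # both forms used, with substantial deviation


# (BAN, BA, NSTD) ratios copied from table (9).
rows = {
    "Jókai Codex": (0.69, 0.31, 0.07),
    "Festetics Codex": (0.68, 0.32, 0.29),
    "Guary Codex": (0.05, 0.95, 0.66),
    "Anna Nádasdy ag.": (0.76, 0.24, 0.12),
    "Witch trials": (0.97, 0.03, 0.22),
}
for name, (ban, ba, nstd) in rows.items():
    print(name, classify(ban, ba, nstd))
```

The dominance tests come first because a text that neutralizes the two suffixes in one direction necessarily has a high NSTD ratio as well, so checking NSTD first would misplace it among the mixed texts.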
As shown in §4.2 above, several factors must have influenced the orthography of the inessive and illative suffix in Hungarian. In the absence of Old and Middle Hungarian speech recordings, however, the distribution of n-dropping and hypercorrect
Our data confirm that the orthography of surviving texts was primarily influenced by factors other than the actual pronunciation of the inessive and illative suffixes. As the table in (9) above shows, texts conforming to the present standard orthography appear in the whole examined range of time, i.e., from the 14th c. to the 18th c. These texts clearly mark the semantic distinction of inessive and illative case.
This distinction disappears in texts dated from the first half of the 16th c. Both cases are marked by the suffix
This orthography used as a norm for official legal documents of the Middle Hungarian era seems to be rather counterintuitive from our present-day perspective, given that hypercorrect
However, it is hard to believe that this orthography corresponded to actual spoken usage, although it may have influenced spoken performance of those using it. It is also interesting to note that although
On closer inspection of the texts of the Middle Hungarian corpus, it is worth noting that a considerable number of letters come from members of the renowned Nádasdy family. Baron Tamás Nádasdy (1498–1562) made a spectacular career in the 16th century: he was the governor of Croatia and Slavonia, and the palatine of Hungary. He also set up a printing press in Sárvár, Hungary. In his autograph letters, he almost exclusively uses
Other letters in the Nádasdy family come from Anna Nádasdy, the sister of Tamás Nádasdy. Unlike her brother, she uses an orthography close to the present standard (~STD). Tamás Nádasdy’s son, Ferenc (1555–1604), however, almost exclusively used
This is rather unlikely. A much more feasible explanation is that they learnt and used different orthographic norms. The rise of the hypercorrect
At the beginning of this discussion, three ways of normalization were sketched in §3 above. According to Method 1 (§3.1), all
On the one hand, Method 1 would have suggested that in the Nádasdy family three contemporary relatives spoke three different dialects: one lacking inessive case marking, another having both inessive and illative, and a third having no illative marking. Method 2, on the other hand, would have hidden the different orthographic traditions used side by side in the 16th c.
Thus the choice of Method 3 was justified, and it helped to reveal facts about the history of spelling norms of Old and Middle Hungarian. Method 3 made it possible to calculate how much a certain body of text deviates from present-day orthography and in what ways, and revealed that
It must be emphasized, however, that the actual analysis of data by the morphological analyzer software can follow either Method 1 (respecting the original orthography) or Method 2 (following the normalization to the present-day norm). Our analyzer program was set to Method 2, i.e., all instances of
The table in (10) summarizes
Texts | bAn | bA’ | bA | bA’n | sum | bAn | bA’ | bA | bA’n | INE | ILL | BAN | BA | STD | NSTD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mid. Hung. | 3251 | 231 | 239 | 750 | 4471 | 0.73 | 0.05 | 0.05 | 0.17 | 0.78 | 0.22 | 0.89 | 0.11 | 0.78 | 0.22 |
Old Hung. | 834 | 537 | 374 | 48 | 1793 | 0.47 | 0.30 | 0.21 | 0.03 | 0.76 | 0.24 | 0.49 | 0.51 | 0.67 | 0.33 |
All | 4255 | 773 | 645 | 801 | 6474 | 0.66 | 0.12 | 0.10 | 0.12 | 0.78 | 0.22 | 0.78 | 0.22 | 0.76 | 0.24 |
BA | 29 | 428 | 187 | 2 | 646 | 0.04 | 0.66 | 0.29 | 0.00 | 0.71 | 0.29 | 0.05 | 0.95 | 0.33 | 0.67 |
BAN | 2781 | 54 | 77 | 715 | 3627 | 0.77 | 0.01 | 0.02 | 0.20 | 0.78 | 0.22 | 0.96 | 0.04 | 0.79 | 0.21 |
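The cost of analyzing the texts strictly by their original spelling can be read off (10): the tokens whose written suffix differs from the case assigned in normalization are those in the bA’ and bA’n columns. The short Python sketch below computes this for three rows of the table; the function and variable names are illustrative only.

```python
# Rows of table (10): (bAn, bA’, bA, bA’n) token counts.
ROWS = {
    "Mid. Hung.": (3251, 231, 239, 750),
    "Old Hung.": (834, 537, 374, 48),
    "All": (4255, 773, 645, 801),
}


def mismatches(counts):
    """Tokens whose written suffix differs from the normalized case."""
    ban, ba_dropped, ba, ba_added = counts
    return ba_dropped + ba_added


for name, counts in ROWS.items():
    n = mismatches(counts)
    total = sum(counts)
    print(f"{name}: {n} of {total} tokens ({n / total:.1%})")
```

For the All row this gives 1574 of 6474 tokens, i.e. about 24%, which is the NSTD ratio of that row.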
The data show that we would have misanalyzed 1574 tokens, about one quarter of all
This study shows how inessive
Based on the distribution of the standard and non-standard suffixes, texts fell into four major categories: near standard (STD), extensive n-dropping (BA), extensive hypercorrection (BAN) and mixed (HYB). As the first three types existed in the 16th c. even in sources from one family (namely the Nádasdy family), it is plausible to suggest that three orthographic norms were simultaneously present at that time. This confirms the findings of Németh (2008).
Dömötör Adrienn. 2006. Régi magyar nyelvemlékek. A kezdetektől a XVI. század végéig. Budapest: Akadémiai Kiadó.
Kniezsa István. 1952. Helyesírásunk története a könyvnyomtatás koráig. Budapest: Akadémiai Kiadó.
Kontra Miklós (ed.). 2003. Nyelv és társadalom a rendszerváltáskori Magyarországon. Budapest: Osiris.
Korompay Klára. 1991. A névszóragozás. In: Benkő Loránd (ed.), A magyar nyelv történeti nyelvtana I. Budapest: Akadémiai Kiadó. 284–318.
Korompay Klára. 1992. A névszóragozás. In: Benkő Loránd (ed.), A magyar nyelv történeti nyelvtana II/1. Budapest: Akadémiai Kiadó. 355–410.
Mátyus Kinga, Bokor Julianna, and Takács Szabolcs. 2010. „Abban a farmerba nem mehetsz színházba.” A (bAn) variabilitásának vizsgálata a BUSZI tesztfeladataiban. In: Váradi Tamás (ed.), IV. Alkalmazott Nyelvészeti Doktoranduszkonferencia. Szeged: SZTE. 85–99. (www.nytud.hu/alknyelvdok10/proceedings10.pdf)
Mátyus Kinga. 2009. Az inessivusi (bAn) nyelvtani szerepei. In: Váradi Tamás (ed.), III. Alkalmazott Nyelvészeti Doktoranduszkonferencia. Szeged: SZTE. 69–86. (www.nytud.hu/alknyelvdok09/proceedings.pdf)
Németh Miklós. 2008. Nyelvi változás és váltakozás társadalmi és műveltségi tényezők tükrében. Nyelvi változók a XVIII. században. Szeged: SZTE, Juhász Gyula Felsőoktatási Kiadó.
Novák Attila. 2003. Milyen a jó humor? In: Alexin Zoltán and Csendes Dóra (eds.), Az 1. Magyar Számítógépes Nyelvészeti Konferencia előadásai. Szeged: SZTE. 138–145.
Sinkovits Balázs. 2011. Nyelvi változók, nyelvi változások és normatív szabályozás. PhD dissertation, University of Szeged.
Szathmári István. 1968. Régi nyelvtanaink és egységesülő irodalmi nyelvünk. Budapest: Akadémiai Kiadó.
Szemere Gyula. 1974. Az akadémiai helyesírás története (1832–1954). Budapest: Akadémiai Kiadó.
Váradi Tamás. 2003. A Budapesti Szociolingvisztikai Interjú. In: Kiefer Ferenc and Siptár Péter (eds.), A magyar nyelv kézikönyve. Budapest: Akadémiai Kiadó. 339–359. (www.nytud.hu/oszt/elonyelv/adat/buszi.pdf)