Context-free language

From Wikipedia, the free encyclopedia

In formal language theory, a context-free language (CFL), also called a Chomsky type-2 language, is a language generated by a context-free grammar (CFG).

Context-free languages have many applications in programming languages; in particular, most arithmetic expressions are generated by context-free grammars.

Background

Context-free grammar

Different context-free grammars can generate the same context-free language. Intrinsic properties of the language can be distinguished from extrinsic properties of a particular grammar by comparing multiple grammars that describe the language.

Automata

The set of all context-free languages is identical to the set of languages accepted by pushdown automata, which makes these languages amenable to parsing. Further, for a given CFG, there is a direct way to produce a pushdown automaton for the grammar (and thereby the corresponding language), though going the other way (producing a grammar given an automaton) is not as direct.
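The grammar-to-automaton direction can be illustrated with a minimal sketch. It assumes the common one-state construction that accepts by empty stack, which is not necessarily the construction the cited sources use. The function name `cfg_to_pda`, the `EPSILON` constant, and the dictionary encoding of grammars and transitions are hypothetical choices made for this example.

```python
from collections import defaultdict

EPSILON = ""  # the empty string stands for "read no input symbol"


def cfg_to_pda(productions, terminals):
    """Build the transitions of a one-state PDA that accepts by empty stack.

    productions: dict mapping a nonterminal to a list of right-hand sides,
                 each right-hand side being a tuple of symbols.
    terminals:   set of terminal symbols.
    Returns a dict mapping (input symbol or EPSILON, popped stack symbol)
    to the set of symbol tuples that may be pushed in its place.
    """
    delta = defaultdict(set)
    # Expand step: pop a nonterminal without reading input and push one of
    # its right-hand sides.
    for head, bodies in productions.items():
        for body in bodies:
            delta[(EPSILON, head)].add(tuple(body))
    # Match step: read a terminal that is also on top of the stack and pop it.
    for t in terminals:
        delta[(t, t)].add(())
    return dict(delta)


# Example grammar S -> aSb | ab; the stack initially holds the start symbol S.
grammar = {"S": [("a", "S", "b"), ("a", "b")]}
transitions = cfg_to_pda(grammar, {"a", "b"})
```

The resulting automaton is nondeterministic in general, because several right-hand sides may be available for the same nonterminal.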

Examples

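An example context-free language is $L = \{a^n b^n \mid n \geq 1\}$, the language of all non-empty even-length strings, the entire first halves of which are a's, and the entire second halves of which are b's. $L$ is generated by the grammar $S \to aSb \mid ab$. This language is not regular. It is accepted by a pushdown automaton $M$ whose transition function $\delta$ maps the current state, the input symbol read, and the stack symbol popped to a new state and a string of stack symbols to push.[note 1]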
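A pushdown automaton for this language can be simulated in a few lines. The following is a minimal sketch, not the article's automaton: the state names `q0` and `q1`, the bottom marker `z`, and the function name `accepts_anbn` are assumptions made for illustration.

```python
def accepts_anbn(word):
    """Simulate a deterministic pushdown automaton for { a^n b^n : n >= 1 }."""
    stack = ["z"]   # 'z' is an assumed bottom-of-stack marker
    state = "q0"    # q0: still reading a's, q1: reading b's
    for ch in word:
        if state == "q0" and ch == "a":
            stack.append("a")      # push one 'a' for every a read
        elif ch == "b" and stack[-1] == "a":
            stack.pop()            # pop one 'a' for every b read
            state = "q1"
        else:
            return False           # no transition applies
    # Accept if at least one b was read and every a has been matched.
    return state == "q1" and stack == ["z"]
```

For instance, `accepts_anbn("aabb")` holds, while `"aab"`, `"abab"`, and the empty string are rejected.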

Unambiguous CFLs are a proper subset of all CFLs: there are inherently ambiguous CFLs. An example of an inherently ambiguous CFL is the union of $\{a^n b^m c^m d^n \mid n, m > 0\}$ with $\{a^n b^n c^m d^m \mid n, m > 0\}$. This set is context-free, since the union of two context-free languages is always context-free. But there is no way to unambiguously parse strings in the (non-context-free) subset $\{a^n b^n c^n d^n \mid n > 0\}$, which is the intersection of these two languages.[1]

Dyck language

The language of all properly matched parentheses is generated by the grammar $S \to SS \mid (S) \mid \varepsilon$.
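Membership in this language can be checked with a single counter, which plays the role of a pushdown stack holding only one kind of symbol. The sketch below is illustrative; the function name `is_dyck` is an assumption.

```python
def is_dyck(word):
    """Return True if word consists of properly matched parentheses."""
    depth = 0  # number of currently unmatched '('
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' with no matching '(' before it
                return False
        else:
            return False       # symbol outside the alphabet { '(', ')' }
    return depth == 0          # every '(' must have been closed
```

The empty string is accepted, matching the $\varepsilon$ alternative of the start symbol.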

Properties

Context-free parsing

The context-free nature of the language makes it simple to parse with a pushdown automaton.

Determining an instance of the membership problem, i.e. given a string $w$, determining whether $w \in L(G)$, where $L(G)$ is the language generated by a given grammar $G$, is also known as recognition. Context-free recognition for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to Boolean matrix multiplication, thus inheriting its complexity upper bound of $O(n^{2.3728596})$.[2][note 2] Conversely, Lillian Lee has shown $O(n^{3-\varepsilon})$ Boolean matrix multiplication to be reducible to $O(n^{3-3\varepsilon})$ CFG parsing, thus establishing a conditional lower bound for the latter.[3]

Practical uses of context-free languages also require producing a derivation tree that exhibits the structure that the grammar associates with the given string. The process of producing this tree is called parsing. Known parsers have a time complexity that is cubic in the size of the string that is parsed.

Formally, the set of all context-free languages is identical to the set of languages accepted by pushdown automata (PDA). Parser algorithms for context-free languages include the CYK algorithm and Earley's algorithm.
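As an illustration of the cubic-time behaviour, the following is a minimal sketch of a CYK-style recognizer for a grammar in Chomsky normal form. The rule encoding (`unit_rules`, `pair_rules`) and the example grammar for the language of strings a^n b^n with n ≥ 1 are assumptions made for this example, and the sketch ignores a possible rule deriving the empty string.

```python
def cyk(word, unit_rules, pair_rules, start="S"):
    """Decide whether a Chomsky normal form grammar derives word.

    unit_rules: dict mapping a terminal a to the set of nonterminals A
                with a rule A -> a.
    pair_rules: dict mapping a pair (B, C) to the set of nonterminals A
                with a rule A -> B C.
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle a rule S -> epsilon
    # table[i][l - 1] holds the nonterminals deriving word[i : i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):              # substring length
        for i in range(n - length + 1):         # substring start
            for split in range(1, length):      # length of the left part
                for b in table[i][split - 1]:
                    for c in table[i + split][length - split - 1]:
                        table[i][length - 1] |= pair_rules.get((b, c), set())
    return start in table[0][n - 1]


# Assumed example grammar: S -> A T | A B, T -> S B, A -> a, B -> b.
UNIT = {"a": {"A"}, "b": {"B"}}
PAIR = {("A", "T"): {"S"}, ("A", "B"): {"S"}, ("S", "B"): {"T"}}
```

The three nested loops over length, start position, and split point give the cubic dependence on the length of the input; the grammar size contributes a further factor.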

A special subclass of context-free languages is the deterministic context-free languages, which are defined as the set of languages accepted by a deterministic pushdown automaton and can be parsed by an LR(k) parser.[4]

See also parsing expression grammar as an alternative approach to grammar and parser.

Closure properties

The class of context-free languages is closed under the following operations. That is, if $L$ and $P$ are context-free languages, the following languages are context-free as well:

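  • the union $L \cup P$ of $L$ and $P$[5]
  • the reversal of $L$[6]
  • the concatenation $L \cdot P$ of $L$ and $P$[5]
  • the Kleene star $L^*$ of $L$[5]
  • the image $\varphi(L)$ of $L$ under a homomorphism $\varphi$[7]
  • the image $\varphi^{-1}(L)$ of $L$ under an inverse homomorphism $\varphi^{-1}$[8]
  • the circular shift of $L$ (the language $\{vu \mid uv \in L\}$)[9]
  • the prefix closure of $L$ (the set of all prefixes of strings from $L$)[10]
  • the quotient $L/R$ of $L$ by a regular language $R$[11]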
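The first few closure properties have short constructive justifications at the grammar level: a fresh start symbol is added whose rules combine the start symbols of the given grammars. The sketch below illustrates this for union, concatenation, and Kleene star. The dictionary encoding and the function names are hypothetical, and the code assumes that the two grammars use disjoint nonterminal names and that `new_start` occurs in neither.

```python
def union(g1, s1, g2, s2, new_start="S0"):
    """Grammar for the union: add the rules S0 -> s1 | s2."""
    return {**g1, **g2, new_start: [(s1,), (s2,)]}, new_start


def concatenation(g1, s1, g2, s2, new_start="S0"):
    """Grammar for the concatenation: add the rule S0 -> s1 s2."""
    return {**g1, **g2, new_start: [(s1, s2)]}, new_start


def kleene_star(g, s, new_start="S0"):
    """Grammar for the Kleene star: add the rules S0 -> S0 s | epsilon."""
    return {**g, new_start: [(new_start, s), ()]}, new_start


# Grammars map a nonterminal to a list of right-hand sides (tuples of symbols);
# the empty tuple stands for the empty string.
G1 = {"S": [("a", "S", "b"), ("a", "b")]}   # a^n b^n with n >= 1
G2 = {"T": [("c", "T"), ()]}                # c^m with m >= 0
G_UNION, START_UNION = union(G1, "S", G2, "T")
```

Each construction leaves the original rules untouched, so every derivation in the new grammar starts with one of the added rules and then continues as a derivation of the original grammars.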

Nonclosure under intersection, complement, and difference

The context-free languages are not closed under intersection. This can be seen by taking the languages $A = \{a^n b^n c^m \mid m, n \geq 0\}$ and $B = \{a^m b^n c^n \mid m, n \geq 0\}$, which are both context-free.[note 3] Their intersection is $A \cap B = \{a^n b^n c^n \mid n \geq 0\}$, which can be shown to be non-context-free by the pumping lemma for context-free languages. As a consequence, context-free languages cannot be closed under complementation, as for any languages $A$ and $B$, their intersection can be expressed by union and complement: $A \cap B = \overline{\overline{A} \cup \overline{B}}$. In particular, context-free languages cannot be closed under difference, since complement can be expressed by difference: $\overline{L} = \Sigma^* \setminus L$.[12]

However, if $L$ is a context-free language and $D$ is a regular language, then both their intersection $L \cap D$ and their difference $L \setminus D$ are context-free languages.[13]

Decidability

In formal language theory, questions about regular languages are usually decidable, but ones about context-free languages are often not. It is decidable whether such a language is finite, but not whether it contains every possible string, is regular, is unambiguous, or is equivalent to a language with a different grammar.

The following problems are undecidable for arbitrarily given context-free grammars $A$ and $B$:

  • Equivalence: is $L(A) = L(B)$?[14]
  • Disjointness: is $L(A) \cap L(B) = \emptyset$?[15] However, the intersection of a context-free language and a regular language is context-free,[16][17] hence the variant of the problem where $B$ is a regular grammar is decidable (see "Emptiness" below).
  • Containment: is $L(A) \subseteq L(B)$?[18] Again, the variant of the problem where $B$ is a regular grammar is decidable,[citation needed] while that where $A$ is regular is generally not.[19]
  • Universality: is $L(A) = \Sigma^*$?[20]
  • Regularity: is $L(A)$ a regular language?[21]
  • Ambiguity: is every grammar for $L(A)$ ambiguous?[22]

The following problems are decidable for arbitrary context-free languages:

  • Emptiness: Given a context-free grammar $A$, is $L(A) = \emptyset$?[23]
  • Finiteness: Given a context-free grammar $A$, is $L(A)$ finite?[24]
  • Membership: Given a context-free grammar $G$ and a word $w$, does $w \in L(G)$? Efficient polynomial-time algorithms for the membership problem are the CYK algorithm and Earley's algorithm.
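The emptiness problem is commonly decided by computing the set of nonterminals that derive at least one terminal string; the language is empty exactly when the start symbol is not in that set. The following is a minimal sketch of this fixed-point computation. The grammar encoding and the function name `is_empty` are assumptions made for illustration.

```python
def is_empty(productions, terminals, start):
    """Return True if the grammar generates no string at all.

    productions: dict mapping a nonterminal to a list of right-hand sides,
                 each a tuple of symbols (the empty tuple is epsilon).
    """
    generating = set()  # nonterminals known to derive some terminal string
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in generating:
                continue
            for body in bodies:
                # A nonterminal is generating if one of its right-hand sides
                # consists only of terminals and generating nonterminals.
                if all(s in terminals or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return start not in generating
```

Each pass over the rules either adds a nonterminal or terminates the loop, so the number of passes is bounded by the number of nonterminals plus one.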

According to Hopcroft, Motwani, Ullman (2003),[25] many of the fundamental closure and (un)decidability properties of context-free languages were shown in the 1961 paper of Bar-Hillel, Perles, and Shamir.[26]

Languages that are not context-free

The set $\{a^n b^n c^n d^n \mid n > 0\}$ is a context-sensitive language, but there does not exist a context-free grammar generating this language.[27] So there exist context-sensitive languages which are not context-free. To prove that a given language is not context-free, one may employ the pumping lemma for context-free languages[26] or a number of other methods, such as Ogden's lemma or Parikh's theorem.[28]
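A pumping-lemma argument of this kind can be sketched as follows, taking the language $\{a^n b^n c^n d^n \mid n > 0\}$ as an assumed example. The symbols $p$, $s$, $u$, $v$, $w$, $x$, $y$ follow the usual statement of the lemma and are introduced here for illustration only.

```latex
Suppose $L = \{a^n b^n c^n d^n \mid n > 0\}$ were context-free, and let $p$ be
the pumping length given by the lemma. Take $s = a^p b^p c^p d^p \in L$, so that
$|s| = 4p \geq p$. The lemma yields a factorization $s = uvwxy$ with
\[
  |vwx| \leq p, \qquad |vx| \geq 1, \qquad
  u v^i w x^i y \in L \ \text{ for all } i \geq 0 .
\]
Because $|vwx| \leq p$, the factor $vwx$ overlaps at most two adjacent blocks
among $a^p$, $b^p$, $c^p$, $d^p$. Choosing $i = 0$ gives the string $uwy$ with
\[
  |uwy| = 4p - |vx| < 4p ,
\]
in which at least two blocks still have length $p$ while at least one block is
shorter. Hence $uwy \notin L$, which contradicts the lemma, so $L$ is not
context-free.
```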

Notes

  1. Meaning of $\delta$'s arguments and results: $\delta(\mathrm{state}_1, \mathrm{read}, \mathrm{pop}) = (\mathrm{state}_2, \mathrm{push})$.
  2. In Valiant's paper, $O(n^{2.81})$ was the then-best known upper bound. See Matrix multiplication § Computational complexity for bound improvements since then.
  3. A context-free grammar for the language $A$ is given by the following production rules, taking $S$ as the start symbol: $S \to Sc \mid aTb \mid \varepsilon$; $T \to aTb \mid \varepsilon$. The grammar for $B$ is analogous.

References

  1. Hopcroft & Ullman 1979, p. 100, Theorem 4.7.
  2. Valiant, Leslie G. (April 1975). "General context-free recognition in less than cubic time" (PDF). Journal of Computer and System Sciences. 10 (2): 308–315. doi:10.1016/s0022-0000(75)80046-8.
  3. Lee, Lillian (January 2002). "Fast Context-Free Grammar Parsing Requires Fast Boolean Matrix Multiplication" (PDF). J ACM. 49 (1): 1–15. arXiv:cs/0112018. doi:10.1145/505241.505242. S2CID 1243491. Archived (PDF) from the original on 2003-04-27.
  4. Knuth, D. E. (July 1965). "On the translation of languages from left to right". Information and Control. 8 (6): 607–639. doi:10.1016/S0019-9958(65)90426-2.
  5. Hopcroft & Ullman 1979, p. 131, Corollary of Theorem 6.1.
  6. Hopcroft & Ullman 1979, p. 142, Exercise 6.4d.
  7. Hopcroft & Ullman 1979, pp. 131–132, Corollary of Theorem 6.2.
  8. Hopcroft & Ullman 1979, p. 132, Theorem 6.3.
  9. Hopcroft & Ullman 1979, pp. 142–144, Exercise 6.4c.
  10. Hopcroft & Ullman 1979, p. 142, Exercise 6.4b.
  11. Hopcroft & Ullman 1979, p. 142, Exercise 6.4a.
  12. Stephen Scheinberg (1960). "Note on the Boolean Properties of Context Free Languages" (PDF). Information and Control. 3 (4): 372–375. doi:10.1016/s0019-9958(60)90965-7. Archived (PDF) from the original on 2018-11-26.
  13. Beigel, Richard; Gasarch, William. "A Proof that if L = L1 ∩ L2 where L1 is CFL and L2 is Regular then L is Context Free Which Does Not use PDA's" (PDF). University of Maryland Department of Computer Science. Archived (PDF) from the original on 2014-12-12. Retrieved June 6, 2020.
  14. Hopcroft & Ullman 1979, p. 203, Theorem 8.12(1).
  15. Hopcroft & Ullman 1979, p. 202, Theorem 8.10.
  16. Salomaa (1973), p. 59, Theorem 6.7.
  17. Hopcroft & Ullman 1979, p. 135, Theorem 6.5.
  18. Hopcroft & Ullman 1979, p. 203, Theorem 8.12(2).
  19. Hopcroft & Ullman 1979, p. 203, Theorem 8.12(4).
  20. Hopcroft & Ullman 1979, p. 203, Theorem 8.11.
  21. Hopcroft & Ullman 1979, p. 205, Theorem 8.15.
  22. Hopcroft & Ullman 1979, p. 206, Theorem 8.16.
  23. Hopcroft & Ullman 1979, p. 137, Theorem 6.6(a).
  24. Hopcroft & Ullman 1979, p. 137, Theorem 6.6(b).
  25. John E. Hopcroft; Rajeev Motwani; Jeffrey D. Ullman (2003). Introduction to Automata Theory, Languages, and Computation. Addison Wesley. Here: Sect. 7.6, p. 304, and Sect. 9.7, p. 411.
  26. Yehoshua Bar-Hillel; Micha Asher Perles; Eli Shamir (1961). "On Formal Properties of Simple Phrase-Structure Grammars". Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung. 14 (2): 143–172.
  27. Hopcroft & Ullman 1979.
  28. "How to prove that a language is not context-free?".

Works cited

Further reading