layout | permalink | title |
---|---|---|
page |
/mcfl/ |
Multiple Context-Free Languages |
If you already know what multiple context-free languages and grammars are, then you can head over to the ADP for MCFL page if you're interested in how MCFGs were merged with ADP.
Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if
- it contains all context-free languages,
- it admits limited cross-serial dependencies,
- all the languages are parsable in polynomial time, and
- all the languages have constant growth; this means that the distribution of string lengths should be linear rather than supralinear.
(Wikipedia)
One example of such a language formalism is the class of multiple context-free languages (MCFL) which are generated by multiple context-free grammars (MCFG). MCFG are a restricted instance of generalized context-free grammars.
A GCFG
Composition functions have the form
, where
Productions have the form
When the composition functions are unrestricted, then the class of languages generated by such grammars is equivalent to that of unrestricted grammars which belong to type-0 of the Chomsky hierarchy and are thus equivalent to turing machines.
A GCFG grammar is called a multiple context-free grammar when the functions are only defined as concatenations of constant strings and the components of the function arguments.
A MCFG is a GCFG
- Functions are only defined as concatenations of constant strings and the components of the function arguments.
- Each component of a single function argument must only appear at most once in the right-hand side
- For all
$A \in N$ , every tuple derivable from$A$ must have the same size. - All tuples derivable from the start symbol
$S$ are 1-tuples.
The class of languages generated by MCFG is called the class of multiple context-free languages.
Example from Kato (2005) - Copy language
Alternative representation (Wild (2010), Nebel and Weinberg (2012))
The formalism provided by GCFG (and with that MCFG) makes use of rewriting functions which allows for an efficient representation in the case that functions are actually reused.
Instead of regarding MCFG as a specialization of GCFG, they can also be regarded as a direct generalization of regular CFG by adding vectors to rules. As we will later see, this is equivalent to inlining functions and can make the whole representation more compact (if functions couldn't have been reused anyway) and understandable.
An MCFG is a tuple
$G = (N, d, T, P, S)$ comprised of
$N$ , a finite set of non-terminals,
$d$ , a function from$N$ to$N$ , assigning a dimension$d(A)$ to each$A \in N$ ,
$T$ , a finite set of terminals,
$P$ , a finite set of rules and
the start symbol$S \in N$ with$d(S) = 1$ .
This representation of MCFG will further be called inlined MCFG, or short iMCFG.
Example from Kato (2005) translated to alternative representation
To distinguish between multiple uses of a nonterminal
Both representations, MCFG and iMCFG, can be automatically transformed into each other.
Apply all functions symbolically and remove them afterwards. E.g.
gets
Reverse the inlining by:
- Let
$n$ be the number of nonterminal symbols on the right side (a symbol$N$ with$dim(N) > 1$ counts as one) - Create a new
$n$ -ary function with the counted symbols in 2. and use them in the rule
E.g.
becomes
It is obvious that in some cases the original MCFG form (if there was one) cannot be reconstructed if it is not in a canonical form, e.g. if it has identical functions or functions which don't use an argument. In all other cases, the original form can be reconstructed by eliminating identical functions of the generated MCFG.