Parikhs theorem

views updated

Parikh's theorem A theorem in formal language theory that concerns the nature of context-free languages when order of letters is disregarded.

Let the alphabet Σ be the set {a1,…,an}. The letter distribution, φ(w), of a Σ-word w is the n-tuple 〈N1,…,Nn

with Ni the number of occurrences of ai in w. The Parikh image, φ(L), of a Σ-language L is {φ(w) | wL}

i.e. the set of all letter-distributions of words in L. L1 and L2 are letter-equivalent if φ(L1) = φ(L2)

Letter distributions may be added component-wise as vectors. This leads to the following: a set S of letter distributions is linear if, for some distributions d and d1,…,dk, S is the set of all sums formed from d and multiples of di. S is semilinear if it is a finite union of linear sets.

Parikh's theorem now states that if L is context-free φ(L) is semilinear. It can also be shown that φ(L) is semilinear if and only if L is letter-equivalent to a regular language. Hence any context-free language is letter-equivalent to a regular language – although not all such languages are context-free.