*David A. Tanzer*

### Decision problems, decision procedures and complexity

In Part 1, we introduced the formal idea of a language as just a set of sentences. Now let’s approach the topic of language complexity.

For any given language, there is an associated *decision problem*: given a candidate string of characters, determine whether or not it belongs to the language. A *decision procedure*, or *decider*, is a program that solves the decision problem: it returns True or False to indicate whether a given character string belongs to the language.

```
decider_for_english("This is a tomato") = True
decider_for_english("Tomato a Eso 3e") = False
```

Here is the key idea for measuring language complexity. A simple language will have a decision procedure that is straightforward and runs quickly. In contrast, a complex language calls for a lot of “thinking” on the part of the decision procedure, in order for it to assess membership in the language. So the complexity of a language will be defined by the *computational* complexity of its decision procedure.

Now let’s look at the complexity of a tiny language:

L = {“Bob likes Alice”, “Alice likes Bob”}.

```
def decider_for_L(text):
    if text == "Bob likes Alice": return True
    if text == "Alice likes Bob": return True
    return False  # none of the cases matched
```

Because this code runs quickly, the theory classifies L as a simple language. (Since we already knew that L was simple, this is a good check on the theory.) More generally, every language which is *finite* will get classified as simple, since a decision procedure can be written using this pattern.

With infinite languages, things get more interesting. Take the language ALL_A, with strings consisting only of the character ‘A’:

ALL_A = {“A”, “AA”, “AAA”, “AAAA”, …}

We can’t write a decision procedure for ALL_A using the above code pattern, as it would create a program with an infinite number of lines. That’s not feasible in reality, so we’ll rule it out.

However, we can still write a decision procedure for ALL_A. This is easy: just loop through every character in the string and check whether it equals ‘A’.
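As a sketch, assuming the naming convention from the earlier examples, such a decider might look like this:

```
def decider_for_all_a(text):
    # The empty string is not in ALL_A, which starts at "A".
    if len(text) == 0:
        return False
    # Check that every character equals 'A'.
    for ch in text:
        if ch != 'A':
            return False
    return True
```

A single pass over the input suffices, so the running time grows only linearly with the length of the string.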

Again, the theory matches our expectations. We see that ALL_A is a simple language, and the theory classifies it as such.

Now consider the language PRIME_A, with strings just containing ‘A’, where the length is a prime number:

PRIME_A = {“AA”, “AAA”, “AAAAA”, “AAAAAAA”, “AAAAAAAAAAA”, …}

We can write a decision procedure for PRIME_A, but now the code has more work to do. First it has to loop through the input, to count the number of characters N. Then it has to analyze whether N is prime. This is related to the work of factorizing N, which is not a trivial matter. And the work increases as N gets larger.
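Here is one possible sketch of such a decider, using trial division for the primality check (the helper name `is_prime` is illustrative, not from the original):

```
def is_prime(n):
    # Trial division: test candidate divisors up to the square root of n.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def decider_for_prime_a(text):
    # First, every character must be 'A'.
    for ch in text:
        if ch != 'A':
            return False
    # Then the length N must be a prime number.
    return is_prime(len(text))
```

Note the extra structure compared to the ALL_A decider: after the linear scan, the primality check does an amount of work that grows with N, which is exactly the added complexity described above.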

So the theory tells us that PRIME_A has a greater complexity than ALL_A. And indeed it does. A human decider would have to do a lot more mental processing in order to determine membership in PRIME_A as compared to membership in ALL_A.

With these ideas, you should now have a qualitative understanding of what language complexity means. In the next posts, we will refine it into a quantitative concept.

*Reposted from the Signal Beat*, *Copyright © 2021, All Rights Reserved.*

Hello… While I understand you aren’t meaning to discuss natural languages, I can’t help relating the above to classical Chomsky, and how he assumed that the learning of natural language vocabulary is decoupled from the learning of natural language grammar, and then described the latter as a complexity mystery in want of an autonomous resolution.

The most direct objection to this classical Chomsky view is to point out that natural grammars and languages continuously evolve, which is denied when languages are painted as timeless sets of potential utterances, in line with your definition here.

(or timeless sets of grammatical structures animated by a time-varying vocabulary. Grammars manifestly evolve too, if at a slower pace perhaps. Whatever)

I agree with Boris. This is not the ‘formal idea of a language’ but the ‘idea of a formal language’. But even formal languages can have important context sensitivities above the sentence level. For example many languages are extensible using definitions, so sentences in the scope of the definition may be non-sentences elsewhere.

[…] David A. Tanzer, January 30, 2021, in course Language Complexity. Cross-posted to the Azimuth Blog. […]