David A. Tanzer
Summarizing computational complexity
In Part 3 we defined, for each program P, a detailed function P'(n) that gives the worst-case number of steps that P must perform when given an input of size n. Now we want to summarize this detailed function into general complexity classes, such as linear, quadratic, and so on.
What’s in a step?
But we should clarify something before proceeding: what is meant by a ‘step’ of a program? Do we count steps in units of machine language, or in terms of higher-level statements? Earlier, we said that each iteration of the loop in the ALL_A decision procedure counted as a step. But in a more detailed view, each iteration of the loop includes multiple steps: comparing the input character to ‘A’, incrementing a counter, performing a test.
All these interpretations are plausible. Fortunately, provided that the definition of a program step is ‘reasonable’, all of them will lead to the same general classification of the program’s time complexity. Here, by reasonable I mean that the definition of a step should be such that, on a given computer, there is an absolute bound on the amount of clock time needed for the processor to complete one step.
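To make this concrete, here is a minimal sketch of an ALL_A decider carrying two step counters, one per loop iteration and one per finer-grained operation. The exact code from Part 3 may differ, but the point is that the two counts stay within a constant factor of each other, so either notion of ‘step’ leads to the same classification.

def decide_all_a(text):
    # Sketch: accept exactly the strings consisting only of 'A' characters.
    coarse_steps = 0   # one step per loop iteration
    fine_steps = 0     # count the comparison, the increment, and the loop test separately
    i = 0
    while i < len(text):
        coarse_steps += 1
        fine_steps += 3
        if text[i] != 'A':
            return False, coarse_steps, fine_steps
        i += 1
    return True, coarse_steps, fine_steps

print(decide_all_a("AAAA"))   # (True, 4, 12): the two counts differ only by a constant factor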
Approximately linear functions
The classification of programs into complexity classes such as linear, quadratic, etc., is a generalization which doesn’t require that the time complexity be exactly linear, quadratic, etc. For an example, consider the following code:
def approximately_linear(text):
    # perform a silly computation
    counter = 0
    for i in range(1, len(text) + 1):
        if i % 2 == 1:
            for j in range(10):
                counter = counter + 1
    return counter
Here are the number of steps it performs, as a function of the input length:
f(0) = 2
f(1) = 13
f(2) = 14
f(3) = 25
f(4) = 26
f(5) = 37
...
The value increases alternately by 1 and then by 11. Since it increases by 12 for every two units of input length, we could say that it is “approximately linear,” with slope “roughly equal” to 6. But in fine detail, the graph looks like a sawtooth.
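For what it’s worth, here is one step-counting convention that reproduces the table above: count the initialization, each outer iteration, each inner increment, and the return as one step each. This is an illustrative assumption, not the only reasonable choice.

import math

def f(n):
    # 1 step for counter = 0, 1 per outer iteration, 10 per odd i, 1 for the return
    return 2 + n + 10 * math.ceil(n / 2)

print([f(n) for n in range(6)])   # [2, 13, 14, 25, 26, 37]

For even n this works out to 6n + 2, and for odd n to 6n + 7, so f is squeezed between two straight lines of slope 6.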
Soon, we will explain how this function gets definitively classified as having linear complexity.
Appendix: Python machines versus Turing machines
Here we are programming and measuring complexity on a Python-like machine, rather than a pure Turing machine. This surfaces, for example, in the fact that without further ado we called len(text) to count the number of characters, and will regard this as a single step of the computation. On a true Turing machine, however, counting the length of the string takes N steps, as this operation requires that the tape be advanced one character at a time until the end of the string is detected.
This point turns out not to substantially affect the complexity classification of a language. Assuming that steps are counted reasonably, any optimal decision procedure for a language of strings, whether written in Turing machine language, Python, C#, or what have you, will end up with the same complexity classification.
Python's len function really does take a bounded amount of time, so it is fair to count it as a single step. The crux of the matter is that, in a higher-level language, a string is more than a sequence of characters: it is a data structure containing length information as well. So there is order N work that is implied just by the existence of a string. But this can be folded into the up-front cost of merely reading the input, which is a general precondition for a language decider.
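As a rough illustration, assuming a Python-style string object that stores its own length, compare a Turing-machine-style scan with the single bounded-time lookup (the two helper names below are just illustrative):

def scan_length(text):
    # Turing-machine style: advance one character at a time, order N work on every call
    n = 0
    for _ in text:
        n += 1
    return n

def lookup_length(text):
    # Python style: the string object already carries its length, so this is one bounded step
    return len(text)

Both return the same answer, but the scan pays order N on every call, while the lookup pushes that work into the one-time cost of constructing and reading the string.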
But, you may ask, what about languages which can be decided without even reading all of the input? For example, the language of strings that begin with the prefix “abc”. Ok, so you got me.
Still, as a practical matter, anything with linear or sub-linear complexity can be considered excellent and simple. The real challenges have to do with complexity that is greater than linear, which is where software performance becomes a real practical issue. So, for all intents and purposes, we may treat any implied order N costs as essentially zero, as long as they are incurred on a one-time, up-front basis, e.g., the order N work involved in constructing a string object.
Reposted from the Signal Beat, Copyright © 2021, All Rights Reserved.
Python integers have unlimited precision, so there is no “absolute bound on the amount of clock time needed for the processor to complete one step” for integer operations. So we can define a Python-complexity that differs from Turing-complexity in this specific respect (and perhaps in other respects, insofar as such different definitions can be useful).
True. Your point generalizes to many data types. For example, since strings have unbounded length, there is no bound on the running time of the single statement string.upper().
Consider a silly implementation of the tripling function:
def triple(n):
    j = 0
    for i in range(n):
        j += 3
    return j
Despite the fact that there is no bound on the size of the integers, the whole program could be implemented with linear time complexity. For example, j += 3 could be implemented as three increments by 1, since with ordinary binary representations, incrementing by 1 has an amortized cost that is constant. (Half of the time only one bit needs to be updated, a quarter of the time two bits get updated, and so on.)
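Here is a small sketch of that amortized argument, using a little-endian list of bits as the counter; the claim is that the total number of bit flips over n increments stays below 2n, so each increment costs a constant amount on average:

def increment(bits):
    # Increment a little-endian binary counter in place; return the number of bit flips.
    flips = 0
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0        # carry: flip trailing 1s back to 0
        flips += 1
        i += 1
    if i == len(bits):
        bits.append(1)     # the counter grows by one bit
    else:
        bits[i] = 1
    flips += 1             # the single 0 -> 1 flip
    return flips

bits = [0]
total = sum(increment(bits) for _ in range(10000))
print(total / 10000)       # average flips per increment; stays below 2

Each increment flips exactly one bit from 0 to 1, and every bit flipped back to 0 was set by some earlier increment, so the total is at most twice the number of increments.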
So, whenever we can put a bound on the amortized running time of statements, then Python time complexity = Turing time complexity. A more complicated picture, clearly.
I am thinking that a simple program can give a complex output from a simple input (for example the Mandelbrot set), so the complexity of the output string is not a measure of the complexity of the program. I am also thinking that Kolmogorov complexity is not a clear measure of the complexity of the program; it is only a measure of the program length (is that right for a Turing machine?). But the minimum cyclomatic complexity of a program that generates a given string, or the cyclomatic complexity of a compiled translation of the high-level program, could be a correct measure: it is a topological measure, the number of faces of the polyhedron representing the control flow graph.
There are many measures of complexity for strings besides Kolmogorov complexity (which is theoretically nice, but unfortunately uncomputable in general). I’m fond of Levin’s time-bounded complexity, which penalizes programs that take a long time to run. This is computable.