Chapter 1, Automata: The Methods and the Madness

The documents in this series are summarized from the book Introduction to Automata Theory, Languages, and Computation.[1]

Why Study Automata Theory?

Introduction to Finite Automata

Automata theory is the study of abstract computing devices or "machines".

Finite automata are a useful model for many important kinds of hardware and software.

  • Software for designing and checking the behavior of digital circuits.
  • The "lexical analyzer" of a typical compiler, that is, the compiler component that breaks the input text into logical units, such as identifiers, keywords, and punctuation.
  • Software for scanning large bodies of text, such as collections of Web pages, to find occurrences of words, phrases, or other patterns.
  • Software for verifying systems of all types that have a finite number of distinct states, such as communications protocols or protocols for secure exchange of information.

Perhaps the simplest nontrivial finite automaton is an on/off switch (Figure 1).

Figure 1. A simple finite automaton.

Figure 2 shows another finite automaton that could be part of a lexical analyzer. The job of this automaton is to recognize the keyword then. It thus needs five states, each of which represents a different position in the word then that has been reached so far. These positions correspond to the prefixes of the word, ranging from the empty string (i.e., nothing of the word has been seen so far) to the complete word. The state corresponding to the complete word then is the accepting state.

Figure 2. A finite automaton modeling recognition of then.
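
The behavior of the automaton in Figure 2 can be sketched in a few lines of code. The function name and the dead-state convention (reject as soon as a symbol has no transition) are my own illustration, not from the book:

```python
# Sketch of the five-state automaton recognizing the keyword "then".
# The states are the prefixes of "then" seen so far; the state "then"
# (the complete word) is the accepting state.
def accepts_then(word):
    state = ""                               # start state: empty prefix
    for c in word:
        if state + c == "then"[:len(state) + 1]:
            state = state + c                # advance to the next prefix state
        else:
            return False                     # no transition: dead state, reject
    return state == "then"                   # accept only in the final state
```

A usage check: `accepts_then("then")` is true, while `accepts_then("the")` and `accepts_then("thenx")` are false, since they stop short of or run past the accepting state.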

Structural Representations

There are two important notations that are not automaton-like, but play an important role in the study of automata and their applications.

  • Grammars are useful models when designing software that processes data with a recursive structure. The best-known example is a "parser", the component of a compiler that deals with the recursively nested features of the typical programming language, such as expressions: arithmetic, conditional, and so on. For instance, a grammatical rule like $E \Rightarrow E + E$ states that an expression can be formed by taking any two expressions and connecting them by a plus sign; this rule is typical of how expressions of real programming languages are formed. Later, we'll introduce context-free grammars, as they are usually called.
  • Regular Expressions also denote the structure of data, especially text strings. The patterns of strings they describe are exactly the same as what can be described by finite automata. The style of these expressions differs significantly from that of grammars, and we shall content ourselves with a simple example here. The UNIX-style regular expression [A-Z][a-z]*[ ][A-Z][A-Z] represents capitalized words followed by a space and two capital letters. This expression represents patterns in text that could be a city and state, e.g., Ithaca NY. It misses multiword city names, such as Palo Alto CA, which could be captured by the more complex expression [A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z].
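
Both expressions quoted above can be tried directly with Python's re module (this demonstration and its variable names are mine, not the book's):

```python
import re

# The simple pattern: a capitalized word, a space, and two capital
# letters (e.g., a city and a state abbreviation).
simple = re.compile(r"[A-Z][a-z]*[ ][A-Z][A-Z]")

# The extended pattern that also allows multiword city names.
multi = re.compile(r"[A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z]")

print(bool(simple.fullmatch("Ithaca NY")))     # matches
print(bool(simple.fullmatch("Palo Alto CA")))  # misses the multiword name
print(bool(multi.fullmatch("Palo Alto CA")))   # matches
```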

Automata and Complexity

Automata are essential for the study of the limits of computation. As we mentioned in the introduction to the chapter, there are two important issues:

  • What can a computer do at all? This study is called "decidability," and the problems that can be solved by computer are called "decidable."
  • What can a computer do efficiently? This study is called "intractability," and the problems that can be solved by a computer using no more time than some slowly growing function of the size of the input are called "tractable." Often, we take all polynomial functions to be "slowly growing," while functions that grow faster than any polynomial are deemed to grow too fast.

Introduction to Formal Proof

Perhaps more than other core subjects of computer science, automata theory lends itself to natural and interesting proofs, both of the deductive kind (a sequence of justified steps) and the inductive kind (recursive proofs of a parameterized statement that use the statement itself with "lower" values of the parameter).

Deductive Proofs

A deductive proof consists of a sequence of statements whose truth leads us from some initial statement, called the hypothesis or the given statement(s), to a conclusion statement. Each step in the proof must follow, by some accepted logical principle, from either the given facts, or some of the previous statements in the deductive proof, or a combination of these.

The hypothesis may be true or false, typically depending on values of its parameters. Often, the hypothesis consists of several independent statements connected by a logical AND. In those cases, we talk of each of these statements as a hypothesis, or as a given statement.

The theorem that is proved when we go from a hypothesis $H$ to a conclusion $C$ is the statement "if $H$ then $C$." We say that $C$ is deduced from $H$.

Reduction to Definitions

In many other theorems, including many from automata theory, the terms used in the statement may have implications that are less obvious.

tip

If you are not sure how to start a proof, convert all terms in the hypothesis to their definitions.

Here is an example of a theorem that is simple to prove once we have expressed its statement in elementary terms. It uses the following two definitions:

  • A set $S$ is finite if there exists an integer $n$ such that $S$ has exactly $n$ elements. We write $\|S\| = n$, where $\|S\|$ denotes the number of elements in the set $S$. If the set $S$ is not finite, we say $S$ is infinite. Intuitively, an infinite set is a set that contains more than any integer number of elements.
  • If $S$ and $T$ are both subsets of some set $U$, then $T$ is the complement of $S$ (with respect to $U$) if $S \cup T = U$ and $S \cap T = \emptyset$. That is, $T$ consists of exactly those elements of $U$ that are not in $S$.

Theorem

Let $S$ be a finite subset of some infinite set $U$. Let $T$ be the complement of $S$ with respect to $U$. Then $T$ is infinite.
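
Reducing the statement to the definitions above suggests a short proof by contradiction; the following sketch is my own phrasing of the standard argument:

```latex
\begin{proof}[Sketch]
Suppose $T$ were finite, say $\|T\| = m$, and let $\|S\| = n$.
Because $S \cup T = U$ and $S \cap T = \emptyset$, each element of $U$
lies in exactly one of $S$ and $T$, so $\|U\| = n + m$.
Then $U$ is finite, contradicting the hypothesis that $U$ is infinite.
Hence $T$ must be infinite.
\end{proof}
```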

Other Theorem Forms

Ways of Saying "If-Then"

Here are some of the other ways in which "if $H$ then $C$" might appear.

  • $H$ implies $C$.
  • $H$ only if $C$.
  • $C$ if $H$.
  • Whenever $H$ holds, $C$ follows.
    • If $H$ holds, then $C$ follows.
    • Whenever $H$ holds, $C$ holds.

In formal logic one often sees the operator $\rightarrow$ in place of "if-then". That is, the statement "if $H$ then $C$" could appear as $H \rightarrow C$ in some mathematical literature.

If-And-Only-If Statements

In formal logic, one may see the operator $\Leftrightarrow$ or $\equiv$ to denote an "if-and-only-if" statement. That is, $A \equiv B$ and $A \Leftrightarrow B$ mean the same as "$A$ if and only if $B$".

tip

When proving an if-and-only-if statement, it is important to remember that you must prove both the "if" and "only-if" parts. Sometimes, you will find it helpful to break an if-and-only-if into a succession of several equivalences. That is, to prove "$A$ if and only if $B$", you might first prove "$A$ if and only if $C$" and then prove "$C$ if and only if $B$".

Theorem

Let $x$ be a real number. Then $\lfloor x \rfloor = \lceil x \rceil$ if and only if $x$ is an integer.


$\lfloor x \rfloor$, the floor of a real number $x$, is the greatest integer less than or equal to $x$.

$\lceil x \rceil$, the ceiling of a real number $x$, is the least integer greater than or equal to $x$.
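
A quick numeric spot-check of the theorem with Python's math module (an illustration on a few sample values, not a proof):

```python
import math

# floor(x) == ceil(x) exactly when x is an integer.
for x in [3.0, -2.0, 1.5, -0.25]:
    is_integer = (x == int(x))
    assert (math.floor(x) == math.ceil(x)) == is_integer
```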

Theorems That Appear Not to Be If-Then Statements

Sometimes, we encounter a theorem that appears not to have a hypothesis. An example is the well-known fact from trigonometry:

Pythagorean Identity

Theorem

$\sin^2 \theta + \cos^2 \theta = 1$

Additional Forms of Proof

In this section, we take up several additional topics concerning how to construct proofs:

  • Proofs about sets.
  • Proofs by contradiction.
  • Proofs by counterexample.

Proving Equivalences About Sets

The Distributive Law of Union Over Intersection

Theorem

$R \cup (S \cap T) = (R \cup S) \cap (R \cup T)$


An element $x$ is in $R \cup (S \cap T)$ if and only if $x$ is in $(R \cup S) \cap (R \cup T)$.

The elements of $R \cup (S \cap T)$ are all and only the elements of $(R \cup S) \cap (R \cup T)$.
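
A concrete spot-check of the law with Python's set operators (a check on one example of my choosing, not a proof):

```python
# Verify R ∪ (S ∩ T) == (R ∪ S) ∩ (R ∪ T) on small sample sets.
R, S, T = {1, 2}, {2, 3}, {3, 4}

left = R | (S & T)           # R ∪ (S ∩ T)
right = (R | S) & (R | T)    # (R ∪ S) ∩ (R ∪ T)

assert left == right == {1, 2, 3}
```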

The Contrapositive

Every if-then statement has an equivalent form that in some circumstances is easier to prove. The contrapositive of the statement "if $H$ then $C$" is "if not $C$ then not $H$".

A statement and its contrapositive are either both true or both false, so we can prove either to prove the other.
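
The equivalence can be checked mechanically by enumerating all truth assignments; this small script is my own illustration, not part of the text:

```python
# "if H then C" is (not H) or C; its contrapositive "if not C then
# not H" is C or (not H). Check all four truth assignments.
for H in (False, True):
    for C in (False, True):
        implication = (not H) or C
        contrapositive = C or (not H)
        assert implication == contrapositive  # the two always agree
```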

Proof by Contradiction

Another way to prove a statement of the form "if $H$ then $C$" is to prove the statement: $H$ and not $C$ implies falsehood.

Counterexamples

In real life, we are not told to prove a theorem. Rather, we are faced with something that seems true (a strategy for implementing a program, for example), and we need to decide whether or not the "theorem" is true. To resolve the question, we may alternately try to prove the theorem, and if we cannot, try to prove that its statement is false.

Theorems generally are statements about an infinite number of cases, perhaps all values of their parameters. Indeed, strict mathematical convention will only dignify a statement with the title "theorem" if it has an infinite number of cases; statements that have no parameters, or that apply to only a finite number of values of their parameter(s), are called observations. It is sufficient to show that an alleged theorem is false in any one case in order to show it is not a theorem. The situation is analogous to programs, since a program is generally considered to have a bug if it fails to operate correctly for even one input on which it was expected to work.

It often is easier to prove that a statement is not a theorem than to prove it is a theorem.

Inductive Proofs

There is a special form of proof, called "inductive", that is essential when dealing with recursively defined objects.

Inductions on Integers

Suppose we are given a statement $S(n)$, about an integer $n$, to prove. One common approach is to prove two things:

  • The basis, where we show $S(i)$ for a particular integer $i$. Usually, $i = 0$ or $i = 1$, but there are examples where we want to start at some higher $i$, perhaps because the statement $S$ is false for a few small integers.
  • The inductive step, where we assume $n \geq i$, where $i$ is the basis integer, and we show that "if $S(n)$ then $S(n+1)$".

The Induction Principle: If we prove $S(i)$ and we prove that for all $n \geq i$, $S(n)$ implies $S(n+1)$, then we may conclude $S(n)$ for all $n \geq i$.
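
As a standard worked example of the principle (my own choice, not taken from the chapter), let $S(n)$ be the statement $\sum_{m=1}^{n} m = n(n+1)/2$:

```latex
\textbf{Basis} ($i = 1$): $\sum_{m=1}^{1} m = 1 = \frac{1 \cdot 2}{2}$, so $S(1)$ holds.

\textbf{Inductive step}: assume $S(n)$ for some $n \geq 1$. Then
\[
  \sum_{m=1}^{n+1} m = \frac{n(n+1)}{2} + (n+1) = \frac{(n+1)(n+2)}{2},
\]
which is exactly $S(n+1)$. By the induction principle, $S(n)$ holds for all $n \geq 1$.
```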

More General Forms of Integer Inductions

Sometimes an inductive proof is made possible only by using a more general scheme than the one proposed here, where we proved a statement $S$ for one basis value and then proved that "if $S(n)$ then $S(n+1)$". Two important generalizations of this scheme are:

  • We can use several basis cases. That is, we prove $S(i), S(i+1), \dots, S(j)$ for some $j > i$.

  • In proving $S(n+1)$, we can use the truth of all the statements

    $S(i), S(i+1), \dots, S(n)$

    rather than just using $S(n)$. Moreover, if we have proved basis cases up to $S(j)$, then we can assume $n \geq j$, rather than just $n \geq i$.

Structural Inductions

Like inductive proofs, all recursive definitions have a basis case, where one or more elementary structures are defined, and an inductive step, where more complex structures are defined in terms of previously defined structures.

Inductive construction of a tree

Definition

Basis: A single node is a tree, and that node is the root of the tree.

Induction: If $T_1, T_2, \dots, T_k$ are trees, then we can form a new tree as follows:

  • Begin with a new node $N$, which is the root of the tree.
  • Add copies of all the trees $T_1, T_2, \dots, T_k$.
  • Add edges from node $N$ to the roots of each of the trees $T_1, T_2, \dots, T_k$.

Figure 3 shows the inductive construction of a tree with the root $N$ from $k$ smaller trees.

Figure 3. Inductive construction of a tree.
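
The recursive definition translates directly into a data structure whose operations follow the same basis/induction pattern. The class and method names here are my own choices, not from the book:

```python
# Minimal sketch of the recursive tree definition.
class Tree:
    def __init__(self, children=()):
        # Basis: no children, a single node. Induction: a new root N
        # with edges to the roots of the given subtrees.
        self.children = list(children)

    def size(self):
        # Structural recursion mirrors the inductive definition:
        # one node for the root, plus the sizes of the subtrees.
        return 1 + sum(t.size() for t in self.children)

leaf = Tree()                          # basis case: a single node
t = Tree([leaf, Tree([leaf, leaf])])   # inductive case: root over subtrees
```

Functions such as `size` are typically proved correct by structural induction on exactly this definition.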

Mutual Inductions

Sometimes, we cannot prove a single statement by induction, but rather need to prove a group of statements $S_1(n), S_2(n), \dots, S_k(n)$ together by induction on $n$.

Strictly speaking, proving a group of statements is no different from proving the conjunction (logical AND) of all the statements.

However, when there are really several independent statements to prove, it is generally less confusing to keep the statements separate and to prove them all in their own parts of the basis and inductive steps. We call this sort of proof mutual induction.

We can abstract the pattern for all mutual inductions:

  • Each of the statements must be proved separately in the basis and in the inductive step.
  • If the statements are "if-and-only-if," then both directions of each statement must be proved, both in the basis and in the induction.

The Central Concepts of Automata Theory

Alphabets

Definition

An alphabet is a finite, nonempty set of symbols. Conventionally, we use the symbol $\Sigma$ for an alphabet. Common alphabets include:

  • $\Sigma = \{0, 1\}$, the binary alphabet.
  • $\Sigma = \{a, b, \dots, z\}$, the set of lower-case letters.
  • The set of all ASCII characters, or the set of all printable ASCII characters.

Strings

Definition

A string (or sometimes word) is a finite sequence of symbols chosen from some alphabet.

The Empty String

The empty string is the string with zero occurrences of symbols. This string, denoted $\epsilon$, is a string that may be chosen from any alphabet whatsoever.

Length of a String

It is often useful to classify strings by their length, that is, the number of positions for symbols in the string. For instance, 01101 has length 5.

The standard notation for the length of a string $w$ is $|w|$. For example, $|1011| = 4$.

Power of an Alphabet

If $\Sigma$ is an alphabet, we can express the set of all strings of a certain length from that alphabet by using an exponential notation. We define $\Sigma^k$ to be the set of strings of length $k$, each of whose symbols is in $\Sigma$.

Note that $\Sigma^0 = \{\epsilon\}$, regardless of what alphabet $\Sigma$ is.

The set of all strings over an alphabet $\Sigma$ is $\Sigma^*$; e.g., $\{0, 1\}^* = \{\epsilon, 0, 1, 00, 01, 10, 11, 000, \dots\}$. That is, $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \dots$.

The set of nonempty strings from alphabet $\Sigma$ is denoted $\Sigma^+$. Thus, two appropriate equivalences are:

  • $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \dots$;
  • $\Sigma^* = \Sigma^+ \cup \{\epsilon\}$.

tip

$\Sigma$ is an alphabet, while $\Sigma^1$ is a set of strings, each of length 1.
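
The sets $\Sigma^k$ can be enumerated directly for small $k$; this sketch represents strings as Python strings and is my own illustration:

```python
from itertools import product

# Sigma^k: the set of all strings of length k over the alphabet sigma.
def power(sigma, k):
    return {"".join(p) for p in product(sigma, repeat=k)}

sigma = {"0", "1"}
print(sorted(power(sigma, 0)))  # [''] — Sigma^0 is {epsilon}
print(sorted(power(sigma, 2)))  # ['00', '01', '10', '11']
```

$\Sigma^*$ itself is infinite, so only its finite slices $\Sigma^0, \Sigma^1, \dots$ can be materialized this way.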

Concatenation of Strings

Let $x$ and $y$ be strings; then $xy$ denotes the concatenation of $x$ and $y$, that is, the string formed by making a copy of $x$ and following it by a copy of $y$. More precisely, if $x$ is the string composed of $i$ symbols $x = a_1 a_2 \cdots a_i$ and $y$ is the string composed of $j$ symbols $y = b_1 b_2 \cdots b_j$, then $xy$ is the string of length $i + j$: $xy = a_1 a_2 \cdots a_i b_1 b_2 \cdots b_j$.

Languages

A set of strings all of which are chosen from some $\Sigma^*$, where $\Sigma$ is a particular alphabet, is called a language. If $\Sigma$ is an alphabet, and $L \subseteq \Sigma^*$, then $L$ is a language over $\Sigma$.

Notice that a language over $\Sigma$ need not include strings with all the symbols of $\Sigma$, so once we have established that $L$ is a language over $\Sigma$, we also know it is a language over any alphabet that is a superset of $\Sigma$.

$\emptyset$, the empty language, is a language over any alphabet.

$\{\epsilon\}$, the language consisting of only the empty string, is also a language over any alphabet.

$\emptyset \neq \{\epsilon\}$; the former has no strings and the latter has one string.

The only important constraint on what can be a language is that all alphabets are finite. Thus languages, although they can have an infinite number of strings, are restricted to consist of strings drawn from one fixed, finite alphabet.

It is common to describe a language using a "set-former":

$\{w \mid \text{something about } w\}$.

For example:

$\{w \mid w \text{ is a binary integer that is prime}\}$.

It is also common to replace ww by some expression with parameters and describe the strings in the language by stating conditions on the parameters. For example:

$\{0^n 1^n \mid n \geq 1\}$,

which is $\{01, 0011, 000111, \dots\}$.
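
A membership test for this language is easy to write as a plain predicate (a sketch of mine; as later chapters of the book show, no finite automaton can recognize this particular language):

```python
# Membership in the language {0^n 1^n | n >= 1}: the string must be
# n zeros followed by exactly n ones, for some n >= 1.
def in_language(w):
    n = len(w) // 2
    return n >= 1 and w == "0" * n + "1" * n

members = [w for w in ["01", "0011", "001", "10", ""] if in_language(w)]
print(members)  # ['01', '0011']
```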

Problems

In automata theory, a problem is the question of deciding whether a given string is a member of some particular language. It turns out, as we shall see, that anything we more colloquially call a "problem" can be expressed as membership in a language. More precisely, if $\Sigma$ is an alphabet, and $L$ is a language over $\Sigma$, then the problem $L$ is:

  • Given a string $w$ in $\Sigma^*$, decide whether or not $w$ is in $L$.
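
Concretely, "the problem $L$" is a boolean function from strings to yes/no answers. The language chosen here (binary strings with an even number of 0s) and the function name are my own illustration:

```python
# The "problem L" as a decision procedure: given w in Sigma*, decide
# whether w is in L. Here L = binary strings with an even number of 0s.
def decide(w):
    assert set(w) <= {"0", "1"}   # w must be a string over Sigma = {0, 1}
    return w.count("0") % 2 == 0

print(decide("0110"))  # True: two 0s
print(decide("0"))     # False: one 0
```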

Reference

1. J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, 2nd ed. Boston, MA, USA: Addison-Wesley, 2001.