Model Theory
In 1954 Alfred Tarski proposed the name theory of models for the study of "mutual relations between sentences of formalized theories and mathematical [structures] in which these sentences hold." This definition hides a program that was to apply metamathematical results (particularly the Compactness Theorem of first-order logic) in what Abraham Robinson in 1950 had called "the development of actual mathematics." Anatolii I. Mal'tsev had launched this program in the Soviet Union in 1940, but communications were bad in this period and the program started afresh in the late 1940s with Tarski in the United States and Robinson in Britain. Mathematical model theory in the sense of this program has been remarkably successful, particularly in its applications to group theory and geometry, and it has far outgrown Tarski's initial definition of the theory of models.
Tarski's definition rested on the fact that one can use formal languages to define classes of structures. For mathematical applications it has turned out to be just as important that one can use formal languages to define sets and relations within a single structure. But at its base, model theory is more general even than this. Arguably it stands in the same relation to the traditional theory of definitions as modern proof theory stands to the traditional theory of syllogisms.
Most sentences are true in some contexts and false in others. If S is a sentence, then by an interpretation of S we mean a parcel of information about some possible context, which is enough to make S either true or false in that context. Suppose I is an interpretation of S. If I makes S true, we call I a model of S and we say that S is true in I. "Truth-in-a-model" is honest-to-goodness truth, no less than (say) being true at 3 o'clock.
The sentence S defines a class of interpretations, namely the class of its models. A simple example is the mathematical equation
x 2 + y 2 = 1
where x and y are variables ranging over real numbers. An interpretation of this equation consists of a pair of real numbers (a, b ), where x is to name a and y to name b. Under this interpretation the sentence is true if and only if the point (a, b ) lies on the circle C of radius 1 around the origin in the cartesian plane. So the circle C is the class of models of the equation. This example assumes that we have specified what form an interpretation of the equation should take. In concrete applications of model theory one begins with such a specification.
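To make the example concrete, here is a minimal Python sketch (the encoding is ours, not part of the article's apparatus): an interpretation assigns a pair of numbers to x and y, and it is a model exactly when the corresponding point lies on the circle C.

```python
from fractions import Fraction

def is_model(a, b):
    """The interpretation assigning a to x and b to y is a model of
    x**2 + y**2 = 1 exactly when the equation holds under that assignment."""
    return a * a + b * b == 1

# (3/5, 4/5) lies on the unit circle C, so this interpretation is a model:
print(is_model(Fraction(3, 5), Fraction(4, 5)))   # True
# (1, 1) does not lie on C, so this interpretation is not a model:
print(is_model(Fraction(1, 1), Fraction(1, 1)))   # False
```

Exact rationals are used here rather than floating-point numbers, so that equality with 1 is tested exactly.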
The sentence S can come from a natural language or a formal one. The range of information that might appear in interpretations is vast. They can specify the time of utterance, the time spoken of, the place, the speaker's identity, salient objects in the context (to give reference to "the previous owner", "the latter symbol", "Peter", etc.). They can also supply meanings for words that have none. But mathematical model theory concerns itself almost entirely with interpretations of a kind called structures. A structure supplies a set of objects, called the domain or universe of the structure. Besides giving a domain, a structure interprets expressions by attaching them to elements of the domain, or to set-theoretic objects built up from elements of the domain. For example a mathematical model theorist, to interpret the sentence
The mome raths outgrabe.
would probably supply two sets X and Y, together with the information that X is the set of things that count as mome raths and Y is the set of things that count as having outgribben. This interpretation is a model of the sentence if and only if X is a subset of Y.
When the sentence S comes from a formal language of logic, one can describe precisely how the truth value of S depends on the sets or objects used to interpret symbols of S. Tarski's model-theoretic definition of truth and satisfaction is a paradigm for this kind of description. The model-theoretic truth definition was an adaptation of the truth definition that Tarski gave in 1933 for formal languages. In that earlier definition Tarski assumed that all symbols needing an interpretation already had one (in general a set-theoretic one), and so the definition was strictly not model-theoretic. But truth definitions that run along similar lines to Tarski's, for example the definitions of truth underlying Richard Montague's semantics for fragments of English, are called "model-theoretic"; probably the use of set theory and recursion on the complexity of formulas are the features that this name brings to mind.
As a discipline, model theory takes no stand at all on whether there are possible worlds or on what objects there are in the universe. If you believe in possible worlds you can study interpretations that involve possible worlds; if you don't, you probably won't. There are branches of model theory where one puts strong limits on the kinds of interpretation that are allowed: For example in recursive model theory the structures are built up from computable functions of natural numbers. But since structures are set-theoretic objects, most mathematical model theorists make free use of the axioms of Zermelo-Fraenkel set theory, including the Axiom of Choice.
One should distinguish between model theory and "mathematical modeling". Modeling a phenomenon usually involves constructing a formal theory rather than a set-theoretic structure. But there are overlaps. For example abstract state machines, introduced by the model theorist Yuri Gurevich, are set-theoretic structures used to model parallel computation. In another direction, papers in Morgan and Morrison discuss the relations between theories and structures in scientific research, with particular reference to physics and economics.
First-Order Model Theory
First-order model theory is the most developed part of model theory, and other parts of model theory tend to be generalizations or analogues of the first-order case. We begin with some preliminary definitions that rest on first-order logic.
defining structures, truth, and satisfaction
First we define signatures. A signature is a collection of symbols as follows:
(1) Relation symbols, usually
P, Q, R, R 0, R 1, R 2, ….
(2) Individual constant symbols, or more briefly constants, usually
a, b, c, c 0, c 1, c 2, ….
(3) Function symbols, usually symbols such as
F, G, H, F 0, F 1, F 2, ….
Each relation symbol and each function symbol in a signature has an arity, which is a positive integer. If a symbol has arity n, we say that the symbol is n-ary. We normally require that no symbol occurs in more than one of these three kinds, and that no relation or function symbol occurs with more than one arity. We say that a signature σ is a reduct of a signature τ (and that τ is an expansion of σ) if every constant in σ is also a constant in τ, every relation symbol of σ is also a relation symbol in τ with the same arity, and likewise for the function symbols in σ.
Let σ be a signature. A σ-structure is an ordered pair A = 〈dom(A ), f A 〉 as follows:
dom(A ) is a nonempty set, known as the domain of A.
f A is a function whose domain is the set of symbols in the signature σ.
For each constant c of σ, f A (c ) is an element of dom(A ); we write this element as c A .
For each relation symbol R of σ, f A (R ) is an n -ary relation on dom(A ), where n is the arity of R ; we write this relation as R A .
For each function symbol F of σ, f A (F ) is an n -ary function F A : dom(A )n → dom(A ), where n is the arity of F.
By a structure we mean a σ-structure for some signature σ.
If A is a τ-structure and σ is a reduct of τ then we can make A into a σ-structure by removing the symbols not in σ; the resulting σ-structure is written A |σ and called a reduct of A. Likewise A is an expansion of A |σ.
By the elements of a structure A we mean the elements of dom(A ). (For example a structure A and its reduct A |σ have the same elements.) By the cardinality of A we mean the cardinality of dom(A ).
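The definitions above can be sketched computationally. The following Python fragment (the representation is our own choice, not a standard one) models a σ-structure as a nonempty domain together with an interpretation of each symbol, with reducts obtained by forgetting symbols.

```python
class Structure:
    """A sketch of a sigma-structure: a nonempty domain plus a function
    interpreting each symbol of the signature."""

    def __init__(self, domain, interp):
        assert domain, "the domain of a structure must be nonempty"
        self.domain = set(domain)
        # symbol -> element (constants) / set of tuples (relations) / function
        self.interp = dict(interp)

    def reduct(self, sigma):
        """A|sigma: forget the symbols not in the signature sigma."""
        return Structure(self.domain,
                         {s: v for s, v in self.interp.items() if s in sigma})

# A structure for the signature of rings, with domain {0, 1} (arithmetic mod 2):
A = Structure({0, 1}, {
    "0": 0, "1": 1,
    "+": lambda x, y: (x + y) % 2,
    "*": lambda x, y: (x * y) % 2,
    "-": lambda x: (-x) % 2,
})
# The reduct to the smaller signature {0, +, -}:
B = A.reduct({"0", "+", "-"})
# A and its reduct B have the same elements, hence the same cardinality.
```

Note that the reduct changes only which symbols are interpreted; the domain, and so the cardinality, is untouched.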
For each signature σ there is a corresponding first-order language L (σ) as in the entry "First-Order Logic". Since each first-order language L is of the form L (σ) for a unique signature σ, we can also refer to σ-structures as L-structures. We borrow the following facts and definitions from the entry "First-Order Logic", under the assumption that L is a first-order language and A is an L -structure.
If ϕ is a sentence of L then ϕ is either true or false in A. If ϕ is true in A, we write A ⊧ ϕ and we call A a model of ϕ. If ϕ is false in A we write A ⊧̷ ϕ.
By a theory in L we mean a set T of sentences of L. By a model of T we mean a model of all the sentences in T. We say that T is consistent if T has a model; otherwise it is inconsistent.
Let T be a theory in L and ϕ a sentence of L. We say that ϕ is a consequence of T, and that T entails ϕ, in symbols
(1) T ⊧ ϕ,
if every L -structure that is a model of T is also a model of ϕ. The theory T is said to be complete if for every sentence ϕ of L, either ϕ or ¬ϕ is a consequence of T. The expression (1) is called a sequent ; it is valid if T does entail ϕ.
We write ϕ(x 1, … , x n ) for a formula of L whose free variables are all among x 1, … , x n . If a 1, … , a n are elements of A, we write
A ⊧ ϕ[a 1, … , a n ],
pronounced "a 1 to a n satisfy ϕ in A ", if ϕ is true in A when each free variable x i is interpreted as a name of a i . This notion can be defined set-theoretically without relying on the semantic notion "name of".
These fundamental facts and definitions allow us to use first-order sentences in order to define classes of structures, and to use first-order formulas in order to define classes of elements in structures.
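As an illustration of satisfaction over a finite structure, here is a brute-force Python sketch (the tuple encoding of formulas is ours): quantifiers are evaluated by checking every element of the domain, which is exactly the set-theoretic definition restricted to the finite case.

```python
def satisfies(domain, rels, phi, env):
    """A |= phi[env]: env maps the free variables of phi to elements."""
    op = phi[0]
    if op == "=":
        return env[phi[1]] == env[phi[2]]
    if op == "not":
        return not satisfies(domain, rels, phi[1], env)
    if op == "and":
        return all(satisfies(domain, rels, p, env) for p in phi[1:])
    if op == "exists":
        _, var, body = phi
        return any(satisfies(domain, rels, body, {**env, var: a})
                   for a in domain)
    # otherwise an atomic formula R(x_1, ..., x_n)
    return tuple(env[v] for v in phi[1:]) in rels[op]

# The structure with domain {0, 1, 2} ordered by <, and the formula
# phi(x) saying "some element is greater than x":
less = {(i, j) for i in range(3) for j in range(3) if i < j}
phi = ("exists", "y", ("<", "x", "y"))
print([a for a in range(3) if satisfies(range(3), {"<": less}, phi, {"x": a})])
# phi(A^1): the elements 0 and 1 satisfy phi; 2 does not
```

The list printed at the end is precisely the definable set ϕ(A¹) of the next subsection, computed without any semantic notion of "name of".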
defining classes of structures
We write Mod(T ) for the class of all L -structures that are models of the theory T. If A is an L -structure, we write Th(A ) for the set of all sentences ϕ of L which are true in A ; Th(A ) is known as the complete first-order theory of A. If K is a class of L -structures, we write Th(K ) for the set of those sentences of L which are true in every structure in K . We say that two L -structures A and B are elementarily equivalent, in symbols A ≡ B, if Th(A ) = Th(B ). Elementary equivalence is an equivalence relation on the class of L -structures. We say that two theories S and T in L are equivalent if Mod (S ) = Mod (T ); this is an equivalence relation on the class of theories in L.
Theorem 1 The notions Mod and Th are related as follows:
1. If T ⊆ U then Mod(T ) ⊇ Mod(U ).
2. If J ⊆ K then Th(J ) ⊇ Th(K ).
3. K ⊆ Mod(Th(K )) and Th(K ) = Th(Mod(Th(K ))).
4. T ⊆ Th(Mod(T )) and Mod(T ) = Mod(Th(Mod(T ))).
These facts are all immediate from the definitions.
The theory Th(Mod(T )) is called the deductive closure of the theory T ; it consists of all the consequences of T. A theory is said to be deductively closed if it is its own deductive closure. By 3 of Theorem 1, a theory is deductively closed if and only if it is of the form Th(K ) for some class K of structures. (In some older literature, deductive closure was included in the definition of "theory".)
A class of structures of the form Mod({ϕ}), where ϕ is a single sentence, is said to be first-order definable, or an EC class. A class of structures of the form Mod(T ), where T is a theory, is said to be first-order axiomatisable, or generalised first-order definable, or an ECΔ class. A class K of L -structures is said to be closed under elementary equivalence if every L -structure elementarily equivalent to a structure in K is also in K .
We pause for some examples.
Example 1 : Equivalence relations. We use the signature with one binary relation symbol E ; call this signature σ. We write Exy for E (x,y ). An equivalence relation is a σ-structure that is a model of the following finite theory, which we shall call T eq :
1. ∀x Exx
2. ∀x ∀y (Exy → Eyx )
3. ∀x ∀y ∀z ((Exy ∧ Eyz ) → Exz )
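For a finite σ-structure one can check the axioms of T eq directly. The following Python sketch (encoding the relation E as a set of ordered pairs is our choice) tests whether a structure is a model of the theory.

```python
def is_model_of_Teq(domain, E):
    """Check the reflexivity, symmetry, and transitivity axioms of T_eq
    in the structure with the given domain, with E a set of pairs."""
    reflexive  = all((x, x) in E for x in domain)
    symmetric  = all((y, x) in E for (x, y) in E)
    transitive = all((x, w) in E
                     for (x, y) in E for (z, w) in E if y == z)
    return reflexive and symmetric and transitive

D = {0, 1, 2}
E = {(0, 0), (1, 1), (2, 2), (0, 1), (1, 0)}   # classes {0, 1} and {2}
print(is_model_of_Teq(D, E))                    # True
print(is_model_of_Teq(D, E | {(1, 2)}))         # False: symmetry fails
```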
Example 2 : Fields. The following example has been central in the development of model theory. We adopt a signature with constants 0 and 1, binary function symbols + and · and a 1-ary function symbol −. This signature is appropriate for talking about rings, so it is known as the signature of rings. We normally write +(x,y ), ·(x,y ) and −(x ) as x + y, xy and −x respectively, and we use standard mathematical notation such as x ≠ y for ¬(x = y ). The theory of fields, T f , consists of the following sentences:
1. ∀x ∀y ∀z (x + (y + z ) = (x + y ) + z )
2. ∀x ∀y (x + y = y + x ),
3. ∀x (x + 0 = x )
4. ∀x (x + −x = 0)
5. ∀x ∀y ∀z (x (yz ) = (xy )z )
6. ∀x ∀y (xy = yx )
7. ∀x (x · 1 = x )
8. ∀x ∀y ∀z (x (y + z ) = xy + xz )
9. 0 ≠ 1
10. ∀x ∃y (x ≠ 0 → xy = 1)
We write 2 for 1 + 1, 3 for 1 + 1 + 1 and so on. A field is said to be of characteristic 0 if it is also a model of the infinitely many axioms
11. n ≠ 0
where n is any positive integer. We write x 2 for the term xx, x 3 for xxx and so on. Let t n (x,y 1, … , y n ) be the term
x n + y 1x n −1 + y 2x n −2 + … + y n −2x 2 + y n −1x + y n .
A field is said to be algebraically closed if it is a model of the infinitely many axioms
12. ∀y 1 ∀y 2 … ∀y n ∃x (t n (x, y 1, … , y n ) = 0)
where n is any positive integer. The classes of fields, fields of characteristic 0 and algebraically closed fields were all well known before these axioms were written down as first-order sentences, and the first-order sentences say exactly the same as the earlier informal definitions of those classes.
In the light of our earlier definitions, several natural questions arise. For example:
Question One. Is there an algorithm to determine whether any given sentence ϕ is a consequence of T eq ?
Question Two. For which equivalence relations A is Mod(Th(A )) first-order definable?
Question Three. If A and B are two algebraically closed fields of characteristic 0, how can we tell whether they are elementarily equivalent?
Question Four. What is an example of a class K that is closed under elementary equivalence but not first-order axiomatisable?
We will return to these questions below.
The infinite spectrum of a class K is the class of infinite cardinals κ such that K contains a structure of cardinality κ. Questions about the possible infinite spectra of classes of the form Mod(T ) were first raised by Leopold Löwenheim in 1915, and below we shall see some "Löwenheim-Skolem" theorems that describe these spectra.
defining classes of elements
The notions described above have analogues within a single structure. Suppose A is an L -structure. By an n-tuple in A we mean an ordered n -tuple (a 1, … , a n ) of elements of A. We write Φ(x 1, … , x n ) for a set Φ of formulas of L of the form ϕ(x 1, … , x n ) (the same integer n for each formula). We say that an n -tuple (a 1, … , a n ) in A realises Φ if
for all ϕ in Φ, A ⊧ ϕ[a 1, … , a n ].
We write Φ(An ) for the set of all n -tuples in A that realise Φ. If Φ contains just one formula ϕ, we write Φ(An ) as ϕ(An ) and we say that this set of n -tuples is (first-order ) definable without parameters. The sets Φ(An ) are said to be infinitarily definable without parameters, or ⋀-definable without parameters.
For the analogous notions of definability with parameters we allow the formulas ϕ to contain constants (in an expanded signature) to name some elements of A. For example if we are talking about the rational numbers in a signature whose symbols are < for the ordering and 0, 1 for the corresponding numbers, then the interval (0,1) of rational numbers strictly between 0 and 1 will be definable without parameters, the interval (3,4) will be definable with parameters, and the interval (√2, π) will not be definable at all. When model theorists talk about definable sets, they sometimes mean with parameters and sometimes without; if in doubt you have to ask.
Let T be a theory in the first-order language L, and ϕ(x 1, … , x n ) and ψ(x 1, … , x n ) formulas of L. We say that ϕ is equivalent to ψ modulo T if ϕ(An ) = ψ(An ) for every model A of T ; this is equivalent to saying that the sentence
(2) ∀x 1 …∀x n (ϕ ↔ ψ)
is a consequence of T. Likewise we say that ϕ is equivalent to ψ in the L -structure A if ϕ(An ) = ψ(An ); this is equivalent to saying that (2) is true in A.
If Φ(x 1, … , x n ) is a set of formulas of L, we can ask whether there is an L -structure A in which Φ(An ) is not empty. If the answer is Yes, we say that Φ is an n-type, or more briefly a type, and we say that the structure A realises the type. There may be other structures B for which Φ(Bn ) is empty; these structures are said to omit the type. We say that the type Φ is complete if for every formula ϕ(x 1, … , x n ) of L, exactly one of Φ ∪ {ϕ} and Φ ∪ { ¬ϕ} is a type.
For example let ℕ be the 'natural number' structure whose domain is the set of natural numbers 0, 1, 2, … , with symbols to express 0, 1, addition, multiplication and 'less than' <. Consider the infinite set Φ(x ) of formulas
0 <x, 1 <x, 2 <x, …
The set Φ(x ) is in fact a type, but it is clear that ℕ omits this type; there are no infinite natural numbers. A natural question is:
Question Five. Are there structures elementarily equivalent to ℕ which realise this type?
The answers to Questions One to Five are not obvious. Many of the techniques of model theory were devised in order to answer just such questions. Historically the first three major techniques in this area were elimination of quantifiers, back-and-forth, and the Compactness Theorem. The next three sections discuss these.
Elimination of Quantifiers
Thoralf Skolem, Charles Langford, and Alfred Tarski developed the method of elimination of quantifiers during the 1920s as a way of analyzing structures or classes of structures.
As the name indicates, the idea of elimination of quantifiers is to express as much as possible without using quantifiers. Let Φ be a set of formulas of a language L. Write Φ' for the smallest class of formulas of L such that (i) Φ⊆Φ', (ii) if ϕ is in Φ' then ¬ϕ is in Φ', and (iii) if ϕ and ψ are in Φ' then (ϕ∧ψ) and (ϕ∨ψ) are in Φ'. The formulas in Φ' are called the boolean combinations of formulas in Φ. There can be quantifiers in the formulas in Φ, but when we form boolean combinations of them we add no more quantifiers.
Let L be a first-order language and K a class of L -structures. A successful elimination of quantifiers for K consists of the following items:
(i) a set T of sentences of L that are true in all structures in K ;
(ii) a set Φ of formulas of L, called the elimination set ;
(iii) a proof that if ψ(x 1, … , x n ) is any formula of L then ψ is equivalent modulo T to a boolean combination ψ*(x 1, … , x n ) of formulas in the elimination set.
We carry out an elimination of quantifiers as follows. The class K of structures already determines the signature. We begin by choosing T to be—provisionally—a set of sentences that are clearly true in all structures in K ; our exact choice could depend on aesthetic or pedagogic considerations. We launch (ii) by including all atomic formulas in the elimination set. From this point on, we aim to prove (iii) by induction on the number of occurrences of quantifiers in ψ, with a subinduction on the complexity. If ψ*(x 1, … , x n ) and χ*(x 1, … , x n ) have been found, we can take (ψ∧χ)* to be ψ* ∧χ*, and likewise for other truth-functional combinations. We can take (∀x n ψ)* to be ¬∃x n ¬(ψ*). This leaves the case ∃x n ψ, and this is where we "eliminate the quantifier".
We first put ψ* into disjunctive normal form, and then we use the logical equivalence
∃x (ϕ1 ∨ϕ2) ≡(∃x ϕ1 ∨∃x ϕ2)
to reduce to the case of a formula ∃x n θ where θ is a conjunction of formulas in the elimination set and negations of formulas in the elimination set. The next step depends on K and perhaps on our mathematical skill. If we can find a boolean combination ϕ of formulas in the elimination set, and a proof that ϕ is equivalent to ∃x n θ modulo T, then this case is taken care of. Otherwise we have two options. First, if we can find a suitable formula ϕ that is certainly equivalent to ∃x n θ in all structures in K but we can't prove this equivalence from T, then we can add the equivalence statement to T. Second, as a last resort, we can add ∃x n θ to the elimination set. We hope to reach a point where we can prove (iii) for all formulas. When this point is reached the quantifier elimination proper is complete. If heaven favors us (and this is not guaranteed), by this stage we will also know which boolean combinations of sentences in the elimination set are true in all structures in K. Adding these to T gives a theory T′ equivalent to Th(K ). With more good luck we may find that the reductions in (iii) allow us to construct an algorithm to determine whether any given sentence of L is a consequence of T′, so that we have a decision procedure for Th(K ).
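A worked instance of the crucial ∃-step may help. In dense linear orderings without endpoints (Example 4 below), a formula ∃x (l 1 < x ∧ … ∧ x < u 1 ∧ …), with the l i and u j parameters, is equivalent modulo the theory to the quantifier-free conjunction of all the l i < u j : density supplies a witness between the bounds, and the absence of endpoints settles the cases where there are bounds on one side only. A Python sketch of this reduction (the encoding is ours):

```python
def eliminate_exists(lowers, uppers):
    """Return a quantifier-free equivalent of
    exists x (AND_i lowers[i] < x  AND  AND_j x < uppers[j])
    as a list of atomic constraints (l, '<', u); [] means 'true'."""
    return [(l, "<", u) for l in lowers for u in uppers]

def holds(constraints, val):
    """Evaluate the quantifier-free formula under a valuation of the
    parameters (val maps parameter names to numbers)."""
    return all(val[l] < val[u] for (l, _, u) in constraints)

# exists x (a < x & x < b) reduces to a < b:
print(eliminate_exists(["a"], ["b"]))     # [('a', '<', 'b')]
print(holds(eliminate_exists(["a"], ["b"]), {"a": 0, "b": 1}))   # True
# exists x (a < x) reduces to 'true': no endpoints, so a witness always exists
print(eliminate_exists(["a"], []))        # []
```

Applied bottom-up, this one step eliminates all quantifiers from formulas of the ordering language, which is why the theory of dense linear orderings without endpoints is decidable.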
Example 1 continued : Equivalence relations. For T we take the theory T eq defining the class of equivalence relations. As a first attempt at the elimination set Φ we take all atomic formulas of L. There are two kinds of atomic formula, namely (x = y ) and Exy. Trial and error shows that for every positive integer n the formula χn (x ) expressing "There are at least n elements that are in the same equivalence class as x " is not reducible to a boolean combination of atomic formulas; so we put χn in the set Φ too. Similarly we add to Φ all the sentences σm,n expressing 'There are at least m equivalence classes containing at least n members each', where m and n are any positive integers, and the sentences θm,n expressing "There are at least m equivalence classes of size exactly n ". It turns out that this is enough for an elimination set. There is an algorithm reducing each sentence to a boolean combination of sentences in Φ, and there is an algorithm determining which boolean combinations of sentences in Φ are consequences of T. Thus T is a decidable theory and we have an answer to Question One.
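Since every sentence reduces to a boolean combination of the sentences σ m,n and θ m,n , which only count equivalence classes by size, the complete theory of a finite equivalence relation is determined by how many classes it has of each size. A Python sketch (the encoding is ours) computes this profile:

```python
from collections import Counter

def from_partition(blocks):
    """The equivalence relation, as a set of pairs, with the given classes."""
    return {(x, y) for b in blocks for x in b for y in b}

def class_size_profile(domain, E):
    """Map each class size n to the number of classes of that size."""
    classes = {frozenset(y for y in domain if (x, y) in E) for x in domain}
    return Counter(len(c) for c in classes)

E1 = from_partition([{0, 1}, {2}])
E2 = from_partition([{"a"}, {"b", "c"}])
E3 = from_partition([{0}, {1}, {2}])
# E1 and E2 each have one class of size 2 and one of size 1, so they make
# the same sentences sigma_{m,n} and theta_{m,n} true:
print(class_size_profile({0, 1, 2}, E1) == class_size_profile({"a", "b", "c"}, E2))  # True
# E3 has three singleton classes, so its profile differs from E1's:
print(class_size_profile({0, 1, 2}, E1) == class_size_profile({0, 1, 2}, E3))        # False
```

Two finite equivalence relations with the same profile are elementarily equivalent; for infinite ones the profile must also record which counts are infinite.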
Example 3 : The field ℝ of real numbers. We take the signature to be the expansion of the signature of rings got by adding a binary relation symbol < for the ordering of the reals. (Without this added symbol we would need to put ∃y (x = y 2) into the elimination set; with < this formula is equivalent to (x = 0 ∨ 0 < x ).) Tarski showed that a set of axioms for Th(ℝ) is given by the theory T f of Example 2 together with an axiom saying that for every element r, either r or −r is a square, an axiom saying that r < s if and only if r ≠ s and s − r has a square root, axioms saying that −1 is not a sum of squares, and the axioms 12 for odd positive integers n. It then came to light that these axioms define the class of real-closed fields. Tarski also gave a decision procedure for the set of consequences of this theory. The elimination set is interesting: it consists exactly of the atomic formulas. As a corollary, the subsets of ℝ that are first-order definable with parameters consist of the finite unions of sets of the following kinds: singletons {a }, intervals (a,b ) (the set of elements r with a < r and r < b ), intervals (−∞,b ) (the set of all elements < b ) and intervals (a,∞) (the set of all elements > a ).
A structure A whose elements are linearly ordered by an ordering relation <A , and for which the sets first-order definable with parameters are exactly the finite unions of singletons and intervals as in Example 3, is said to be o-minimal. The knowledge that a structure is o-minimal gives powerful information about the structure. Beginning with Alex Wilkie's demonstration in 1991 that the expansion of the field of real numbers with an exponentiation function xy is o-minimal, many other o-minimal expansions of ℝ have been found, and there is promise of deep applications in real function theory.
From around 1950 more powerful and algebraic methods were found that gave largely the same information as the method of elimination of quantifiers. But it remains one of the best methods for discovering decision procedures when the theory is decidable.
Back-and-Forth
Suppose A and B are σ-structures, where σ is a signature. By a partial isomorphism from A to B we mean a function e from a subset X of dom(A ) to dom(B ) such that if ϕ(x 1, … , x n ) is an atomic formula of L and a 1, … , a n are any elements in X then
A ⊧ ϕ[a 1, … , a n ] if and only if B ⊧ ϕ[e (a 1), … , e (a n )].
If e is a partial isomorphism from A to B and the domain X of e is the whole of dom(A ), we say that e is an embedding of A into B. If e is an embedding of A into B and every element of B is of the form e (a ) for some element a of A then we say that e is an isomorphism from A to B. We say that A is isomorphic to B, in symbols A ≅ B, if there is an isomorphism from A to B. The relation ≅ is an equivalence relation on the class of L -structures, and its equivalence classes are called isomorphism types.
If A and B are isomorphic σ-structures, then A and B must be elementarily equivalent, A ≡ B. The definition of "partial isomorphism", and hence also the definition of ≅, are easily rewritten in ways that refer to the signature σ but not to any formula of the language L (σ). In the 1950s one aim of research was to find an "algebraic" description of elementary equivalence that doesn't mention formulas either. Roland Fraïssé gave essentially the following answer, which is known as the back-and-forth method.
A back-and-forth system from A to B is a set I of partial isomorphisms from A to B such that
(a) I is not empty.
(b) If i is in I and a is an element of A then there are an element b of B and a partial isomorphism j in I such that
i ∪ {〈a, b 〉} ⊆ j ;
(c) If i is in I and b is an element of B, then there are an element a of A and a partial isomorphism j in I such that
i ∪{ 〈a,b 〉} ⊆ j.
By a finite relational signature we mean a signature with only finitely many symbols, none of which are function symbols. (Constants are allowed.)
Theorem 2 If there is a back-and-forth system I from A to B then A ≡ B. If A ≡ B then for every finite relational signature σ, there is a back-and-forth system from A|σ to B|σ.
Example 4 : Dense linear orderings without endpoints. We adopt a signature σ with one binary relation symbol <, and we write x < y for <(x, y ). A dense linear ordering without endpoints is a σ-structure that is a model of the following set of sentences:
1. ∀x ¬(x < x )
2. ∀x ∀y ∀z ((x < y ∧ y < z ) → x < z )
3. ∀x ∀y (x < y ∨ x = y ∨ y < x )
4. ∀x ∀y (x < y → ∃z (x < z ∧ z < y ))
5. ∀x (∃y (y < x ) ∧ ∃z (x < z ))
We shall write this set of sentences as T dlo .
Suppose A and B are dense linear orderings without endpoints. An order-preserving partial map from A to B is a function e from a finite set X of elements of A to the domain of B such that if the elements of X are a 1, … , a n with
(3) a 1 <A a 2 <A … <A a n
then
(4) e (a 1) < B e (a 2) <B … <B e (a n ).
Write I(A,B ) for the set of all order-preserving partial maps from A to B.
One can check from the definitions that every function in I(A,B ) is a partial isomorphism from A to B. Also I(A,B ) is a back-and-forth system from A to B. Suppose for example e is in I with domain {a 1, … , a n } as in (3), and a is an element of A that is not in the domain of e. One possibility is that a <A a 1. By sentence 5 of T dlo there is an element b of B with b <B e (a 1); then e ∪{ 〈a,b 〉} is a function in I that extends e and has a in its domain. The other possibilities for a are similar, using sentences 4 and 5. The same argument, going from B to A, shows that if e is in I and b is an element of B then there is a function in I that extends e and takes some element of A to b.
By Theorem 2 it follows that A ≡ B ; so any two dense linear orderings without endpoints are elementarily equivalent, and the theory T dlo is complete.
We can say more. Suppose A and B both have countably many elements; list the elements of A as a 0, a 1, … and the elements of B as b 0, b 1, … . Let e 0 be any function in I(A,B ). There is e 1 in I(A,B ) that extends e 0 and has a 0 in its domain and b 0 in its image. Then there is e 2 that extends e 1 and has a 1 in its domain and b 1 in its image; and so on through e 3, e 4, and so on. Finally define a function e by putting e (a i ) = e i +1(a i ) for each element a i of A. By construction all elements of A are in the domain of e and all elements of B are in the image of e, and it follows that e is an isomorphism from A to B. We have proved a famous theorem of Cantor:
Theorem 3 If A and B are countable dense linear orderings without endpoints, then A and B are isomorphic.
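The zig-zag construction in the proof can be carried out mechanically. The Python sketch below (the particular enumerations, of the dyadic rationals and of all rationals in the open unit interval, are our own choice) runs finitely many steps of the back-and-forth and checks that the finite map built so far is order-preserving.

```python
from fractions import Fraction

def first_between(enum, lo, hi):
    """The first element of the enumeration strictly between lo and hi
    (None standing for 'no bound on that side')."""
    for q in enum:
        if (lo is None or lo < q) and (hi is None or q < hi):
            return q

def back_and_forth(enumA, enumB, rounds):
    """Run the zig-zag construction for the given number of rounds,
    returning the finite order-preserving map as a list of pairs."""
    pairs = []
    for k in range(rounds):
        a = enumA[k]                      # forth: put a into the domain
        if all(p != a for p, _ in pairs):
            lo = max((q for p, q in pairs if p < a), default=None)
            hi = min((q for p, q in pairs if p > a), default=None)
            pairs.append((a, first_between(enumB, lo, hi)))
        b = enumB[k]                      # back: put b into the image
        if all(q != b for _, q in pairs):
            lo = max((p for p, q in pairs if q < b), default=None)
            hi = min((p for p, q in pairs if q > b), default=None)
            pairs.append((first_between(enumA, lo, hi), b))
    return pairs

# Enumerations (without repetition) of two countable dense orders in (0, 1):
dyadics = [Fraction(odd, 2**k) for k in range(1, 12)
           for odd in range(1, 2**k, 2)]
rationals = [Fraction(p, q) for q in range(2, 40) for p in range(1, q)
             if Fraction(p, q).denominator == q]
pairs = back_and_forth(dyadics, rationals, 8)
# The finite map built so far is order-preserving:
print(all((a1 < a2) == (b1 < b2)
          for a1, b1 in pairs for a2, b2 in pairs))   # True
```

Running all countably many rounds, as in the proof, would exhaust both enumerations and yield an isomorphism.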
There are many adaptations of Fraïssé's idea. One different presentation (though with the same content) uses the idea of a game between two players who take turns to choose elements from the structures A and B. The criteria for the second player to win can be set up so that this player has a winning strategy if and only if there is a back-and-forth system from A to B.
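For finite structures the game formulation can be computed directly: the second player wins the n-round game if and only if every move by the first player, in either structure, can be answered so that the chosen pairs still form a partial isomorphism. A Python sketch (ours, not from the article), for finite linear orders:

```python
def partial_iso(pairs):
    """Do the chosen pairs form a partial isomorphism of linear orders?"""
    return all((a1 < a2) == (b1 < b2) and (a1 == a2) == (b1 == b2)
               for a1, b1 in pairs for a2, b2 in pairs)

def duplicator_wins(n, A, B, pairs=()):
    """Does the second player have a winning strategy for n more rounds
    from the position given by pairs?"""
    if not partial_iso(pairs):
        return False
    if n == 0:
        return True
    # The first player may pick in either structure; the second must answer.
    return (all(any(duplicator_wins(n - 1, A, B, pairs + ((a, b),))
                    for b in B) for a in A) and
            all(any(duplicator_wins(n - 1, A, B, pairs + ((a, b),))
                    for a in A) for b in B))

L3, L4 = range(3), range(4)
print(duplicator_wins(2, L3, L4))   # True: the orders agree on all
                                    # sentences of quantifier rank <= 2
print(duplicator_wins(3, L3, L4))   # False: a sentence of quantifier
                                    # rank 3 distinguishes them
```

This matches the known fact that finite linear orders of sizes m and k satisfy the same sentences of quantifier rank n just when m = k or both sizes are at least 2ⁿ − 1.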
In another adaptation, we require that the domains of the functions in the back-and-forth system all have cardinality ≤ n for some positive integer n, dropping the requirements (b) and (c) when i has domain of size n. The existence of a back-and-forth system of this kind corresponds (as in Theorem 2) to the condition that A and B agree in all sentences with quantifier rank ≤ n, in symbols A ≡n B. We omit the full definition of ≡n here, but we note that any sentence with at most n occurrences of quantifiers in it has quantifier rank ≤ n, and that in a finite relational signature there are only finitely many pairwise nonequivalent sentences of rank ≤ n. It follows that a class K with finite relational signature is first-order definable if and only if it is closed under ≡n for some n. This leads quickly to an answer to Question Two.
Theorem 4 The equivalence relations A with Mod(Th(A )) first-order definable are precisely the finite ones.
Back-and-forth methods are a model-theoretic generalisation of techniques developed in several areas of mathematics, notably in the study of linear orderings and abelian groups. They also adapt to some languages that are not first-order, and unlike much of first-order model theory, they work as well for finite structures as for infinite ones. This has made them useful tools of theoretical computer science, for example in database theory.
One shouldn't come away from this section with the impression that proving elementary equivalence is a matter of finding a clever model-theoretic technique. Model-theoretic ideas can help to bring to the surface the place where work has to be done, but most proofs of elementary equivalence involve substantial mathematics. For example a problem that Tarski posed in the 1950s, namely whether all free groups with more than one generator are elementarily equivalent, resisted decades of efforts. About half a century after Tarski put the problem, a positive solution was announced by Zlil Sela; besides quantifier elimination, it used a range of techniques from different parts of group theory.
The Compactness Theorem
Almost everything in first-order model theory depends on the Compactness Theorem.
Theorem 5 (Compactness Theorem) Let L be a first-order language and T a theory in L such that every finite subset of T has a model. Then T has a model.
We sketch a proof using Hintikka sets as in the entry "First Order Logic." The proof needs a little set theory in the form of infinite cardinals and ordinals. (For the special case in which L has finite or countable signature, one needs only finite numbers.) Suppose the number of formulas of L is κ. We expand the language L to a language L + by adding κ new constants, the witnesses. Each of the clauses (H1)–(H6) in the definition of a Hintikka set describes a set of requirements on a Hintikka set; for example (H4) describes, for each formula ϕ(x ) of L + and each equation (s = t ) where s and t are closed terms of L +, the requirement that if ϕ(s ) and (s = t ) are in the Hintikka set then ϕ(t ) is in the Hintikka set. We list all these requirements as (r i : i < κ), in a list of order-type κ, arranging that each requirement appears as r i for κ-many ordinals i.
Now we define a sequence of theories (T i : i ≤ κ), by induction on i, in such a way that three properties hold:
(i) If i < j ≤ κ then T i is a subset of T j .
(ii) Each theory T i has the property that every finite subset of T i has a model.
(iii) For each i < κ the number of sentences that are in T i +1 but not in T i is finite.
The intention is that T κ will be a Hintikka set.
We start by putting T 0 = T ; this ensures that (ii) holds for T 0. If i is a limit ordinal then we take T i to be ⋃j < i T j ; since (assuming (i)) every finite subset of T i is already a subset of some T j with j < i, this ensures that (ii) holds for T i provided it already holds for each T j with j < i.
Now for each ordinal i < κ we define T i +1, assuming that T i has been defined, in such a way that requirement r i will be met if it applies. (When r i doesn't apply, we put T i +1 = T i .) The details depend on r i . We consider some typical cases.
Suppose r i is the requirement (from (H1)) that if (ϕ∧ψ) is in the Hintikka set then so are ϕ and ψ. If this requirement applies, that is, if (ϕ∧ψ) is in T i , then we take T i +1 to be T i ∪ {ϕ,ψ}. It has to be checked that every finite subset of T i +1 has a model. Suppose U is a finite subset of T i +1. Put V = (U ∩ T i ) ∪ {(ϕ∧ψ)}. Then V is a finite subset of T i , so by induction hypothesis it has a model, say A. Since A is a model of (ϕ∧ψ), it is also a model of ϕ and ψ, and hence it must be a model of U.
Suppose r i is the requirement (also from (H1)) that if ¬(ϕ∧ψ) is in the Hintikka set then so is at least one of ¬ϕ and ¬ψ. If ¬(ϕ∧ψ) is in T i then r i applies. Put S 1 = T i ∪ {¬ϕ} and S 2 = T i ∪ {¬ψ}. If every finite subset of S 1 has a model then we put T i +1 = S 1. If not then there is some finite subset U of T i such that U ∪ {¬ϕ} has no model. We claim that in this case every finite subset V of S 2 has a model. For consider U ∪ (V ∩ T i ) ∪ {¬(ϕ∧ψ)}, which is a finite subset of T i and hence has a model, say B, by induction hypothesis. Then B is a model of U, and since U ∪ {¬ϕ} has no model, B must be a model of ϕ; but B is also a model of ¬(ϕ∧ψ), so it must be a model of ¬ψ and hence of V, as claimed. Hence in this case we can put T i +1 = S 2.
Suppose r i is the requirement (from (H5)) that if ∃x ϕ(x ) is in a Hintikka set then so is ϕ(c ) for some constant c. Suppose that this applies, that is, that ∃x ϕ(x ) is in T i . By (iii) the number of witnesses used in sentences in T i is less than κ, and so there must be at least one witness c not used yet. Choose such a c and put T i +1 = T i ∪ {ϕ(c )}. Let U be a finite subset of T i +1. Then (U ∩ T i ) ∪ {∃x ϕ} is a finite subset of T i , and so by induction hypothesis it has a model, say C. Since C is a model of ∃x ϕ, there is an element a of C such that C ⊧ ϕ[a ]. Let D be the same structure as C, except that c D = a. Then since c appears nowhere in sentences of T i , D is also a model of U ∩ T i . But by choice of c D it is a model of ϕ(c ) too, so it is a model of U.
Now suppose we have completed the definition of T κ as described. Suppose (ϕ∧ψ) is in T κ. Then (ϕ∧ψ) is already in T i for some i < κ. Since the requirement referring to (ϕ∧ψ) is r j for κ distinct ordinals j, it is r j for some j > i. So the requirement will have been met when T j +1 was defined, and hence T κ meets the requirement. A similar argument for each of the requirements (H1)–(H6) shows that T κ meets all these conditions for a Hintikka set. Condition (H7) holds because every finite subset of T κ has a model. So T κ is a Hintikka set, and by Metatheorem 16 in the entry "First-Order Logic," it has a model, say A. Since T is a subset of T κ, A is a model of T, proving the theorem.
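The case-split used above for sentences of the form ¬(ϕ∧ψ) can be seen in miniature in a propositional setting. The following Python sketch is our own illustration, not part of the classical proof: formulas are encoded as nested tuples, finite satisfiability is checked by brute force, and `extend` performs one step of the chain, trying S 1 first and falling back on S 2 exactly as in the argument for (H1).

```python
from itertools import product

# Formulas are nested tuples: ('atom', name), ('not', f), ('and', f, g).

def evaluate(formula, assignment):
    op = formula[0]
    if op == 'atom':
        return assignment[formula[1]]
    if op == 'not':
        return not evaluate(formula[1], assignment)
    if op == 'and':
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    raise ValueError(op)

def atoms(formula):
    if formula[0] == 'atom':
        return {formula[1]}
    return set().union(*(atoms(f) for f in formula[1:]))

def satisfiable(theory):
    """Brute-force satisfiability over the atoms occurring in the theory."""
    names = sorted(set().union(*(atoms(f) for f in theory)) or {'p'})
    return any(all(evaluate(f, dict(zip(names, vals))) for f in theory)
               for vals in product([True, False], repeat=len(names)))

def extend(theory, formula):
    """One step of the chain for a sentence not-(phi and psi): try
    S1 = theory + {not phi}, and fall back on S2 = theory + {not psi}."""
    assert formula[0] == 'not' and formula[1][0] == 'and'
    _, (_, phi, psi) = formula
    s1 = theory | {('not', phi)}
    return s1 if satisfiable(s1) else theory | {('not', psi)}

p, q = ('atom', 'p'), ('atom', 'q')
T = {p, ('not', ('and', p, q))}
T1 = extend(T, ('not', ('and', p, q)))
# Since T forces p to be true, S1 = T + {not p} is unsatisfiable, so the
# construction must take the other branch and add not-q.
assert ('not', q) in T1 and satisfiable(T1)
```

In the first-order proof the same step is carried out κ-many times along a well-ordered list of requirements; the toy above shows only the bookkeeping of a single step.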
Now we can answer Question Five. Let L + be the first-order language of the structure ℕ, but with one extra constant c. Let T be Th(ℕ) together with the infinitely many sentences
0̲ < c, 1̲ < c, 2̲ < c, …
If U is any finite subset of T, then U includes at most finitely many of the sentences n < c, and so we can choose a natural number m greater than any of the numbers n for which U mentions n. Let ℕ+ be the expansion of ℕ got by putting
c ℕ+ = m.
Then ℕ+ is a model of U. It follows that every finite subset of T has a model, and hence by the Compactness Theorem there is a model A of the whole of T. Let B be the reduct of A to the language of ℕ. Then B ≡ ℕ since T contains Th(ℕ). But also B contains the element c A which realizes the type consisting of all the formulas n < x.
This argument illustrates the model-theoretic idea behind nonstandard analysis.
We can also answer Question Four. In any signature σ, let K be the class of finite structures. If A is a structure in K and B is a σ-structure elementarily equivalent to A, then A, and hence also B, is a model of a sentence expressing "There are exactly n elements" for some finite n. So B is also in K . This shows that K is closed under elementary equivalence. But let τ be the expansion of σ got by adding infinitely many new constant symbols c 0, c 1, …, and let T′ be the theory consisting of all the sentences (c i ≠ c j ) where i < j. Since every finite subset of Th(K ) ∪ T′ has a model (expanding a structure in K ), the Compactness Theorem tells us that Th(K ) ∪ T′ has a model, and hence that Th(K ) has an infinite model. Thus K is not first-order axiomatisable.
The general setting of our proof of the Compactness Theorem has many adaptations in model theory. A structure is built in a well-ordered sequence of steps, and we list in advance what feature of the structure has to be ensured at each step. Typical examples are the construction of models of a theory that omit certain types, the construction of "existentially closed" models of a theory, and the construction of "two-cardinal" models in which some definable parts are large but other definable parts are kept small.
Substructures and Elementary Embeddings
If X is a subset of Y then the inclusion map from X to Y is the function i : X → Y such that i (x ) = x for each element x of X. Let σ be a signature and A a σ-structure. We say that a σ-structure B is a substructure of A, and that A is an extension of B, in symbols B ⊆ A, if
- dom(B ) is a subset of dom(A ),
- the inclusion map from dom(B ) to dom(A ) is an embedding of B into A.
An embedding e : A → B between L -structures (for some first-order language L ) is said to preserve a formula ϕ(x 1,… ,x n ) of L if
A ⊧ ϕ[a 1, … ,a n ] ⇒ B ⊧ ϕ[e (a 1), … ,e (a n )]
for all elements a 1, … , a n of A. We say that e is an elementary embedding if e preserves all formulas of L. If A is a substructure of B and the inclusion map is an elementary embedding, then we say that B is an elementary extension of A and that A is an elementary substructure of B, in symbols A ≼ B. Always A ≼ A. Also if A ≼ B and B ≼ C then A ≼ C. If A ≼ B then A ≡ B.
Two important facts about elementary extension are:
Theorem 6 (a ) Let A be a substructure of the L-structure B such that
for every formula ϕ(x1, … , xn) of L and all elements a1, … , an−1 of A such that B ⊧ ∃xn ϕ[a1, … , an−1] there is b in A such that B ⊧ ϕ[a1, … , an−1,b].
Then A ≼ B.
(b ) (Union of elementary chains ) Suppose α is an ordinal and for every ordinal i < α an L-structure Ai is given, so that Ai ≼ Aj whenever i < j < α. Then writing Aα for the union of all the structures Ai, we have Ai ≼ Aα for all i < α.
Part (a) of Theorem 6 can be used to prove the following important result.
Theorem 7 (Downward Löwenheim-Skolem Theorem) Let L be a first-order language, A an L-structure, X a set of elements of A, and κ an infinite cardinal number which is (i) at least as great as the number of sentences of L, (ii) at least as great as the cardinality of X and (iii) no greater than the cardinality of A. Then A has an elementary substructure of cardinality κ whose domain contains all the elements of X.
This is the result that creates the Skolem paradox. If the axioms of set theory have a model at all, then by this theorem they have a model of countable cardinality, although a sentence expressing "There are uncountably many real numbers" is true in the model!
Part (b) of Theorem 6 is useful for proving a similar result in the other direction. The argument after Theorem 5 adapts to show that every infinite structure has a proper elementary extension. By making repeated elementary extensions and using (b) to take unions at limit ordinals, we reach arbitrarily large elementary extensions.
Theorem 8 (Upward Löwenheim-Skolem Theorem) Let L be a first-order language, A an infinite L-structure and κ a cardinal number which is at least as great as (i) the cardinality of A and (ii) the number of sentences of L. Then A has an elementary extension of cardinality κ.
There is also a more algebraic construction that yields a proof of Theorem 8. It involves taking a cartesian product AI of copies of the structure A and defining a homomorphic image in terms of an ultrafilter D on the set I indexing the copies. The resulting elementary extension AI/D of A is called an ultrapower of A. Ultrapowers also yield a characterisation of ≡:
Theorem 9 The following are equivalent, for any two L-structures A and B :
(a) A ≡ B.
(b) There are a set I and an ultrafilter D on I such that AI/D is isomorphic to BI/D.
Many useful properties of ultrapowers spring from the fact that we can make them "highly saturated," that is to say, realizing many types. One can also use the Compactness Theorem and union of elementary chains to build highly saturated structures.
The upward and downward Löwenheim-Skolem Theorems led to a natural question about first-order theories: How far can a first-order theory restrict its models? Assuming that the theory T is in a countable language and has an infinite model, we know that it has a model in each infinite cardinality. So the tightest restriction possible is that in every infinite cardinality κ, T is κ-categorical, that is, it has a model of cardinality κ but all its models of cardinality κ are isomorphic to each other.
Michael Morley published the following theorem in 1965. Its main importance lies in its proof, which revolutionised the techniques of model theory and began the developments reported in the final part of this article.
Theorem 10 (Morley's Theorem) Let L be a countable first-order language and T a theory in L with infinite models. If T is κ-categorical for at least one uncountable cardinal κ then T is λ-categorical for all uncountable cardinals λ.
A theory that is κ-categorical for all uncountable κ is said to be uncountably categorical. One major effect of Morley's Theorem was to switch attention from theories to the detailed construction of their models, and a mark of this is that the models of an uncountably categorical theory are now also said to be uncountably categorical. A theory that is κ-categorical in all infinite cardinalities κ is said to be totally categorical, and so are its models.
By linear algebra, the theory of infinite dimensional vector spaces over a given finite field is totally categorical. A well-known theorem of Ernst Steinitz says that any two algebraically closed fields of the same characteristic and transcendence degree are isomorphic, and it follows that the theory of algebraically closed fields of a given characteristic is uncountably but not totally categorical. An answer to Question Three follows as well. Suppose A and B are any two algebraically closed fields of the same characteristic. Choose a cardinal κ greater than the cardinalities of both A and B. By Theorem 8, A and B have elementary extensions A′ and B′ of cardinality κ. Then A′ ≅ B′ by uncountable categoricity, and hence A ≡ A′ ≡ B′ ≡ B.
Interpolation and Definability
Let L be a first-order language containing a relation symbol R. Suppose ϕ is a sentence of L. We say that R is upwards monotone in ϕ if the following holds:
If A and B are two L -structures which are identical except that R A ⊆ R B , and A ⊧ ϕ, then B ⊧ ϕ.
Likewise we say that R is downwards monotone in ϕ if the following holds:
If A and B are two L -structures which are identical except that R A ⊆ R B , and B ⊧ ϕ, then A ⊧ ϕ.
In the late middle ages a relation symbol (a 'term' in the medieval terminology) was described as undistributed in a sentence if it was upwards monotone in the sentence, and distributed if it was downwards monotone in the sentence. For example one can symbolise 'All swans are white' as ∀x (S (x ) → W (x )); in this sentence S is distributed and W is undistributed. ("All swans are white" entails both of the sentences "All Bewick swans are white" and "All swans are non-red". The medievals said that "swans" is distributed and "white" is undistributed in "All swans are white.")
There is a syntactic test for upwards and downwards monotonicity. We say that a formula ϕ is in negation normal form if → and ↔ never occur in ϕ, and ¬ never occurs in ϕ except immediately in front of atomic formulas. Every formula is logically equivalent to one in negation normal form and with the same free variables. For example
(5) ∀x (S (x ) → W (x ))
is logically equivalent to
(6) ∀x (¬S (x ) ∨ W (x )).
When ϕ is in negation normal form, we say that an occurrence of a relation symbol R in ϕ is negative if it has ¬ immediately to the left of it, and positive otherwise. For example the occurrence of S in (6) is negative and the occurrence of W in (6) is positive. The next theorem is a straightforward consequence of the definition of satisfaction.
Theorem 11 Let L be a first-order language, R a relation symbol of L and ϕ a sentence of L in negation normal form. If R has no negative occurrences in ϕ then R is upwards monotone in ϕ. If R has no positive occurrences in ϕ then R is downwards monotone in ϕ.
Since upwards and downwards monotonicity clearly aren't affected when we pass between logically equivalent sentences, Theorem 11 confirms that S is downwards monotone and W is upwards monotone in (5).
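The passage to negation normal form and the reading-off of positive and negative occurrences are both mechanical, and can be sketched in a few lines of Python. The encoding below is our own: formulas are nested tuples, argument terms of atomic formulas are suppressed, and ↔ is omitted (it can first be rewritten in terms of → and ∧).

```python
def nnf(f, negate=False):
    """Push negations inward, driven by De Morgan and quantifier duality."""
    op = f[0]
    if op == 'atom':
        return ('not', f) if negate else f
    if op == 'not':
        return nnf(f[1], not negate)
    if op == 'imp':                      # f1 -> f2 becomes not-f1 or f2
        return nnf(('or', ('not', f[1]), f[2]), negate)
    if op in ('and', 'or'):
        dual = {'and': 'or', 'or': 'and'}[op]
        return (dual if negate else op, nnf(f[1], negate), nnf(f[2], negate))
    if op in ('forall', 'exists'):
        dual = {'forall': 'exists', 'exists': 'forall'}[op]
        return (dual if negate else op, f[1], nnf(f[2], negate))
    raise ValueError(op)

def occurrences(f, polarity='positive'):
    """Relation symbols occurring with the given polarity in an NNF formula."""
    if f[0] == 'atom':
        return {f[1]} if polarity == 'positive' else set()
    if f[0] == 'not':                    # in NNF, 'not' sits only on atoms
        return {f[1][1]} if polarity == 'negative' else set()
    if f[0] in ('forall', 'exists'):
        return occurrences(f[2], polarity)
    return occurrences(f[1], polarity) | occurrences(f[2], polarity)

# Sentence (5): forall x (S(x) -> W(x)), converted to (6).
five = ('forall', 'x', ('imp', ('atom', 'S'), ('atom', 'W')))
six = nnf(five)
assert six == ('forall', 'x', ('or', ('not', ('atom', 'S')), ('atom', 'W')))
assert occurrences(six, 'negative') == {'S'}
assert occurrences(six, 'positive') == {'W'}
```

Combined with Theorem 11, the last two assertions recover the classification of S as distributed and W as undistributed in (5).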
Unlike Theorem 11, the next theorem is deep. It is known as Lyndon's Interpolation Theorem, after Roger Lyndon who published it in 1959.
Theorem 12 Let L be a first-order language and ϕ, ψ sentences of L in negation normal form, such that ϕ entails ψ. Then there is a sentence θ of L in negation normal form such that
- ϕ entails θ and θ entails ψ,
- any relation symbol (apart from = ) with a positive occurrence in θ has positive occurrences in both ϕ and ψ, and
- any relation symbol (apart from = ) with a negative occurrence in θ has negative occurrences in both ϕ and ψ.
The sentence θ in the theorem is called a Lyndon interpolant.
The following immediate consequence of Lyndon's Interpolation Theorem is called Craig's Interpolation Theorem. It was proved by William Craig before Lyndon's theorem was known.
Theorem 13 Let L be a first-order language and ϕ, ψ sentences of L such that ϕ entails ψ. Then there is a sentence θ of L such that
- ϕ entails θ and θ entails ψ,
- any relation symbol (apart from = ) that occurs in θ occurs in both ϕ and ψ.
The sentence θ here is called a Craig interpolant.
We give two applications of these interpolation theorems.
laws of distribution
A syllogistic sentence is a sentence of one of the forms ∀x (R (x ) → S (x )), ∀x (R (x ) → ¬S (x )), ∃x (R (x )∧S (x )) and ∃x (R (x ) ∧ ¬S (x )), where R and S are different relation symbols. For each syllogistic sentence and each relation symbol in it, Theorem 11 tells us that the relation symbol is distributed or that it is undistributed. A syllogism is a sequent of the form ϕ, ψ ⊧ χ where each of ϕ, ψ and χ is a syllogistic sentence, three relation symbols are used, and each of them occurs in two sentences. For example one syllogism is
(7) ∀x (P (x ) → Q (x )), ∀x (R (x ) → Q (x )) ⊧ ∃x (P (x ) ∧ R (x )).
This syllogism happens to be invalid, but there are many examples of valid syllogisms.
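Invalidity here can even be checked by machine. Because syllogistic sentences are monadic and contain no equality, a countermodel, if one exists at all, already exists on a small domain; the brute-force Python search below (our own sketch) finds one for (7) on a one-element domain.

```python
from itertools import product

def countermodel(domain):
    """Search all interpretations of P, Q, R over the domain for one making
    both premises of (7) true and its conclusion false."""
    subsets = list(product([False, True], repeat=len(domain)))
    n = len(domain)
    for P, Q, R in product(subsets, repeat=3):
        premise1 = all((not P[i]) or Q[i] for i in range(n))   # forall x (P(x) -> Q(x))
        premise2 = all((not R[i]) or Q[i] for i in range(n))   # forall x (R(x) -> Q(x))
        conclusion = any(P[i] and R[i] for i in range(n))      # exists x (P(x) and R(x))
        if premise1 and premise2 and not conclusion:
            return P, Q, R
    return None

# Already on a one-element domain, interpreting P, Q and R as the empty
# relation makes both premises vacuously true and the conclusion false.
assert countermodel([0]) is not None
```

Of course the point of the distribution laws is to avoid such searches by reading validity off from the syntax.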
Late medieval logicians looked for criteria to tell when a syllogism is valid. Two of their criteria were the following, known as the laws of distribution :
If a relation symbol occurs in both premises, then it must be distributed in at least one of them. If a relation symbol occurs in a premise and the conclusion, and is undistributed in the premise, it must be undistributed in the conclusion.
Why do these criteria work? The answer is Lyndon's Interpolation Theorem. We illustrate with the invalid syllogism (7) above, which fails the first distribution law by having Q undistributed in both premises. The same recipe works for all cases.
Suppose for contradiction that (7) is valid. Then after a small rearrangement we have
∀x (P (x ) → Q (x )) ⊧ ∀x (R (x ) → Q (x )) → ∃x (P (x ) ∧ R (x )).
Convert the sentences to negation normal form, and let θ be a Lyndon interpolant for the resulting sequent. Since Q occurs only positively on the left and only negatively on the right, it never occurs in θ. So we can introduce a new relation symbol Q′, and we have a valid sequent
∀x (P (x ) → Q′ (x )) ⊧ θ.
Hence by combining the sequents again we infer that the sequent
∀x (P (x ) → Q′ (x )) ⊧ ∀x (R (x ) → Q (x )) → ∃x (P (x ) ∧ R (x ))
is valid, and hence so is
∀x (P (x ) → Q′ (x )), ∀x (R (x ) → Q (x )) ⊧ ∃x (P (x ) ∧ R (x )).
But this sequent is not valid: the two premises have no relation symbols in common, so they cannot establish any non-trivial relationship between P and R. Concretely, interpreting P, Q′, Q and R all as the empty relation makes both premises vacuously true while the conclusion ∃x (P (x ) ∧ R (x )) is false. This contradiction shows that (7) is invalid.
The Port-Royal Logic of Arnauld and Nicole (1662) explains that (7) is invalid because Q "may be taken for two different parts of the same whole" (Rule I in their III.iii). This is vague and not properly justified, but one can see our argument above as a repair of the Port-Royal argument.
explicit and implicit definability
Suppose L is a first-order language, one of whose symbols is a relation symbol R of arity n, and T is a theory in L. One can ask whether R is redundant in T, in the sense that in any model A of T the relation R A is determined by the rest of A. Here are two different ways of making this notion of redundancy precise. (We write σ for the signature of the language L but with R removed.)
(a) T has a consequence of the form
(8) ∀x 1 …∀x n (R (x 1, … , x n ) ↔ ϕ(x 1, … , x n ))
where ϕ is a formula of L (σ).
(b) Whenever A and B are two L -structures that are models of T, and A |σ = B |σ, we have A = B.
When (a) holds we say that R is explicitly definable in T, and the sentence (8) is called an explicit definition of R in T. When (b) holds we say that R is implicitly definable in T.
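Condition (b) can be illustrated in a degenerate finite setting (our own example, with σ empty and the domain fixed in advance, so that any two models automatically have the same σ-reduct). Let T say that R is the identity relation, ∀x ∀y (R (x,y ) ↔ x = y ); then over a fixed domain every model of T interprets R identically:

```python
from itertools import combinations, product

domain = [0, 1]
pairs = list(product(domain, repeat=2))

def is_model(R):
    # checks forall x forall y (R(x,y) <-> x = y)
    return all(((x, y) in R) == (x == y) for x, y in pairs)

# Enumerate every binary relation on the domain and keep the models of T.
models = [set(R) for n in range(len(pairs) + 1)
          for R in combinations(pairs, n) if is_model(set(R))]

# All models of T over this domain interpret R identically (here there is
# exactly one), which is what condition (b) requires; the defining sentence
# itself is an explicit definition as in condition (a).
assert len(models) == 1 and models[0] == {(0, 0), (1, 1)}
```

The substance of Beth's Theorem below is that this agreement between (a) and (b) is no accident of the example but holds for every first-order theory.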
It turns out that explicit definability and implicit definability are equivalent. (This is for first-order logic; part (b) in the theorem below fails for many other logics.)
Theorem 14 Let L be a first-order language, R a relation symbol of L and T be a theory in L.
(a) If R is explicitly definable in T then R is implicitly definable in T.
(b) If R is implicitly definable in T then R is explicitly definable in T.
Part (a) of the theorem, or more strictly its contrapositive, is known as Padoa's method, after Alessandro Padoa who was a researcher in Giuseppe Peano's school around 1900. The proof is straightforward.
Part (b) is called Beth's Theorem. It was proved by Evert Beth in 1953, but the following derivation from Craig's Interpolation Theorem is due to Craig. Assume that R is implicitly definable in T. Let T′ be T but written with a new relation symbol R′ in place of R, and let L + be L with R′ added. Then the statement that R is implicitly definable in T implies the following:
Suppose an L +-structure A is a model of T ∪ T′. Then R A ⊆ R′ A .
We can rewrite this as a sequent:
(9) T ∪ T′ ⊧ ∀x 1 …∀x n (R (x 1, … , x n ) → R′ (x 1, … , x n )).
Add n new constants c 1, … , c n to the language L +. By (9), using Metatheorem 10 of the entry "First-Order Logic",
(10) T ∪ T′ ⊧ R (c 1, … , c n ) → R′ (c 1, … , c n ).
Now by the Compactness Theorem there are finite subsets U, U′ of T, T′ respectively, such that
(11) U ∪ U′ ⊧ R (c 1, … , c n ) → R′ (c 1, … , c n ).
Adding sentences to U and U′ if necessary, we can suppose that U′ is the same as U except that R is replaced by R′. Write ψ for the conjunction of the sentences in U, and ψ′ for ψ with R replaced by R′. Then after some rearrangement (11) gives
(12) ψ ∧ R (c 1, … , c n ) ⊧ ψ′ → R′ (c 1, … , c n ).
Now apply Craig's Interpolation Theorem to find an interpolant θ such that the following sequents are valid:
(13) ψ ∧ R (c 1, … , c n ) ⊧ θ,
(14) θ ⊧ ψ′ → R′ (c 1, … , c n ).
Since R occurs only on the left of (12) and R′ occurs only on the right, neither symbol occurs in θ. So, replacing R′ by R throughout (14) (this turns ψ′ back into ψ and leaves θ unchanged), we have the following valid sequent:
(15) θ ⊧ ψ → R (c 1, … , c n ).
By (13) and (15),
ψ ⊧ R (c 1, … , c n ) ↔ θ.
Now let ϕ be θ but with each constant c i replaced by the variable x i . (If necessary we first change the bound variables of θ so that no quantifier in θ binds any of these variables x i .) Then by Metatheorem 10 again, we have
(16) T ⊧ ∀x 1 …∀x n (R (x 1, … , x n )↔ ϕ(x 1, … , x n ))
as claimed.
There are many model-theoretic results that are close relatives of the examples above, either in their statements or in their proofs. For example a preservation theorem is a theorem stating that some syntactic condition is necessary and sufficient for a formula to be preserved under some algebraic operation. Here follows a typical preservation theorem. We say that a formula ϕ is a ∀ formula if ϕ has the form ∀y 1 …∀y n ψ where ψ is quantifier-free.
Theorem 15 (Łoś-Tarski Theorem) Let L be a first-order language, ϕ(x1, … , xn) a formula of L and T a theory in L. Then the following are equivalent:
(a) For every embedding e : A → B between models of T and all elements a1, … , an of A,
B ⊧ ϕ[e (a 1), … , e (a n )] ⇒ A ⊧ ϕ[a 1, … , a n ].
(b) ϕ is equivalent modulo T to a ∀ formula ψ(x 1, … , x n ).
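A small finite example (ours, not from the text) shows why the restriction to ∀ formulas in (b) is needed. Let B be the linear order ({0, 1, 2}, <) and A its substructure on {0, 2}. The existential formula ∃z (x < z ∧ z < y ) holds of the pair (0, 2) in B but fails in A, so it is not preserved in the direction required by (a) and cannot be equivalent, in all structures, to a ∀ formula:

```python
B = {0, 1, 2}
A = {0, 2}   # a substructure of B under the inherited order

def between(domain, x, y):
    # evaluates exists z (x < z and z < y) in the structure (domain, <)
    return any(x < z < y for z in domain)

# The witness z = 1 lies in B but not in A, so the formula is not
# preserved downward along the inclusion embedding.
assert between(B, 0, 2)
assert not between(A, 0, 2)
```

Quantifier-free formulas, by contrast, are evaluated pointwise and survive the passage to any substructure, which is the easy direction (b) ⇒ (a) of the theorem.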
Extensions of First-Order Logic
The structures that we defined above are sometimes called first-order structures because of their connection with first-order languages. There are other kinds of structure that have analogous connections with other kinds of formal language. Here are two important examples.
(a) Suppose L is a first-order language. Let I be a nonempty set carrying a binary relation R, and for each element i of I let A i be an L -structure. Then, given an element i of I and a sentence ϕ of L, we can ask whether A j ⊧ ϕ for all j such that Rij. We can introduce a sentence □ϕ which counts as true in A i if and only if the answer is Yes. Indexed families of structures of this type appear in modal logic, temporal logic and some logics of action.
(b) We can consider structures with two domains, where the second domain is a set of subsets of the first domain. Structures of this kind appear in second-order logic, where we have first-order and second-order variables ranging respectively over the first domain and the second domain. They are also found in topological logics, where the second domain contains (for example) a base of open sets for a topology over the first domain.
Sortal structures and languages are a less drastic extension. A sortal signature lists a set of "sorts" and may put restrictions on the function symbols in terms of the sorts. Each sortal structure with this signature has a family of domains, one for each sort. The corresponding sortal language has separate variables for each sort. Sortal structures have some natural mathematical applications. For example a vector space involves a set of vectors and a set of scalars; we can multiply a vector by a scalar, but in general we can't multiply two vectors. So it is natural to work with one sort for vectors and another sort for scalars, and to restrict multiplication so that two vectors can't be multiplied.
If the only changes we make in passing from first-order to sortal are those just described, then the resulting languages behave very much as ordinary first-order languages, and the model theory of sortal structures and languages is hardly distinguishable from ordinary first-order model theory. If we put further restrictions the situation may change; for example if we require that the elements of one sort are exactly the sets of elements of some other sort, then we move into second-order logic.
But even for ordinary first-order structures we need not restrict ourselves to first-order languages. For example we can add to first-order logic
- quantifiers Q κx that express 'There are at least κ elements x such that …';
- infinitary conjunctions ⋀i <κ ϕ i of formulas, meaning 'ϕ0 and ϕ1 and …', and likewise infinitary disjunctions;
- transitive closure operators that express, given a formula ϕ(x,y ), the property 'There is a finite sequence a 1, … , a n such that ϕ(x,a 1) and ϕ(a 1,a 2) and ϕ(a 2,a3) and … and ϕ(a n −1,a n ) and ϕ(a n ,y )'.
For example the models of the infinitary disjunction
⋁0<n <ω ∃x 1 …∃x n ∀y (y = x 1 ∨ … ∨ y = x n )
are exactly the finite structures. In section 5 we saw that there is no first-order sentence defining this class.
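The transitive closure operator, likewise, expresses reachability, the standard example of a property available with this operator but not in first-order logic. Over a finite binary relation the closure can be computed by iterating composition to a fixed point, as in this Python sketch of ours:

```python
def transitive_closure(pairs):
    """Iterate relational composition until no new pairs appear."""
    closure = set(pairs)
    while True:
        new = {(x, w) for x, y in closure for z, w in closure if y == z} - closure
        if not new:
            return closure
        closure |= new

edges = {(1, 2), (2, 3), (3, 4)}
assert (1, 4) in transitive_closure(edges)       # 4 reachable from 1
assert (4, 1) not in transitive_closure(edges)   # but not conversely
```

The iteration may need more rounds on larger relations, and it is exactly this unbounded iteration that no single first-order formula can simulate.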
Some of these extensions of first-order logic, using first-order structures, have an elegant and well-developed model theory. But the general truth seems to be that none of them work as smoothly as first-order logic. In 1969 Per Lindström proved some theorems that capture this fact. He showed that if a logic ℒ contains first-order logic and obeys some of the metatheorems that hold for first-order logic, then ℒ must be equivalent to first-order logic, in the sense that for every sentence ϕ of ℒ there is a first-order sentence with exactly the same models as ϕ. For example
Theorem 16 Let ℒ be a logic that contains first-order logic. Suppose that ℒ has the properties:
(a) Every sentence of ℒ with infinite models has a model that is at most countable.
(b) If T is a set of sentences of ℒ and every finite subset of T has a model then T has a model.
Then ℒ is equivalent to first-order logic.
A fuller account would explain more precisely what is meant by a "logic that contains first-order logic".
Stability and Geometric Model Theory
Mathematical model theory has become a highly sophisticated branch of mathematics. The items below can be no more than pointers.
Morley's Theorem (Theorem 10) created a new paradigm for model theory. Its proof made it possible to show that a purely model-theoretic condition on a theory T, namely uncountable categoricity, imposes some strong structural features on all the models of T. Each such model must contain a strongly minimal set, which is a set carrying a dependence relation that behaves like linear or algebraic dependence. In particular the strongly minimal set has a dimension in the same sense as a vector space, and the strongly minimal set is determined up to isomorphism by its dimension and T. (Steinitz' Theorem in section 6 is a special case of this fact, since every algebraically closed field is a strongly minimal set.) The rest of the model is very tightly constructed around the strongly minimal set. We can define a function assigning a "rank" to each set definable with parameters in the model; this Morley rank is a generalisation of Krull dimension and it allows a very detailed analysis of the model.
Much of the work following on from Morley's Theorem organised itself around one or other of two heuristic principles, known as Shelah's dichotomy and Zilber's trichotomy. Both of these principles rest on the fact that uncountably categorical theories are "good" in the sense that their classes of models are well-behaved. Both of them have been proved as theorems in certain cases.
For Saharon Shelah, "good" theories form one end of a scale from good to bad. There are several ways that a theory can be bad. One is that it has too many non-isomorphic models to allow any kind of cataloguing by invariants. Another is that it has models that are not isomorphic but are hard to tell apart. Shelah's policy is that at each point of the scale from good to bad, one should aim to maximise the difference between the theories on the "good" side and those on the "bad" side. On the "good" side one should aim to find as much good behaviour as possible, in terms of dependence relations, definability properties, rank functions, and so forth. On the "bad" side one should aim to construct intractable families of models, for example large families of models none of which is elementarily embeddable in any other. Though he applies this principle at all points of the scale, he also identified a main gap between the good side and the bad side, and when one speaks of "Shelah's dichotomy" one often has this particular gap in mind.
Shelah created a powerful body of techniques for handling models of theories towards the "good" end of the scale. Together with Morley's work it forms the bulk of stability theory. Shelah has also done a large amount of work towards eliminating the restriction to first-order theories. He has suggested several abstract frameworks, for example excellent classes, in which there is no counterpart of the Compactness Theorem but some techniques of stability theory still work.
Boris Zilber is more interested in exploiting the "goodness" of the good end of the scale. He is convinced that uncountably categorical structures should all be mathematically interesting, and in fact that they should all be equivalent, up to model-theoretic interpretation, to structures of interest in "classical" mathematics. So he set out to catalogue them, and in the early 1980s he pointed out a natural three-way division of uncountably categorical structures. The division rests on the dependence relation of the strongly minimal set. In the first place there are structures where this relation is "trivial". If it is not trivial, one looks at the lattice of closed sets under the dependence relation. The second class is where this lattice is modular, the third is where it is non-modular. So we have a division of uncountably categorical structures into trivial, modular, and non-modular. The trivial structures are now essentially all known. Modularity turns out to be a strong property, guaranteeing that the structure contains an infinite definable abelian group which exerts a controlling influence; broadly speaking, modular structures exhibit linear algebra.
Zilber conjectured that all non-modular uncountably categorical structures are (up to model-theoretic interpretation) algebraically closed fields. Several pieces of evidence pointed in this direction, notably (i) Angus Macintyre's observation in 1971 that an uncountably categorical field must be algebraically closed, and (ii) observations by Zilber himself and Greg Cherlin that uncountably categorical groups behave remarkably like algebraic groups over an algebraically closed field. Zilber's trichotomy is the division into trivial, modular and non-modular, together with the conjecture that the non-modular structures are algebraically closed fields (up to interpretation). Zilber's trichotomy has been proved to hold for Zariski geometries ; these are uncountably categorical structures that obey an axiomatisation of the Zariski topology. Ehud Hrushovski saw how to use this fact to solve some major open problems of diophantine geometry, for example proving the Mordell-Lang conjecture for function fields in all characteristics. (His proof in characteristic 0 has since been simplified by replacing the Zariski geometries by differential jet spaces). In 1998 Ya'acov Peterzil and Sergei Starchenko showed that a version of Zilber's trichotomy is true for o-minimal fields.
In 1989 Hrushovski found counterexamples to Zilber's conjecture: uncountably categorical non-modular structures containing no infinite field. At first Hrushovski's examples were mysterious. But Zilber was sure that they must have classical interest, and after some years he discovered structures of Hrushovski's type arising in complex analysis. He also pointed out a close link between Hrushovski's construction and Schanuel's Conjecture in number theory. At the same time Zilber gave examples from complex analysis to illustrate Shelah's excellent classes, thus bringing together two separate lines of research in model theory.
Through the 1990s a body of results converged to show that tools of stability theory are useful in contexts far outside those of 'good' first-order theories. In fact the complete theory Th(A ) of a structure A can be largely irrelevant to the application of these tools to A. The need to translate classical descriptions into first-order sentences had always been a practical obstacle to integrating model theory with classical mathematics, and this need seemed to be receding in part. In this context Ludwig Faddeev, after reading the relevant papers presented to the International Congress of Mathematicians in Beijing in 2002, said at the closing ceremony
Take for instance the sections of logic, number theory and algebra. The general underlining mathematical structures as well as language, used by speakers, were essentially identical.
See also Arnauld, Antoine; Computability Theory; Craig's Theorem; First-Order Logic; Hintikka, Jaakko; Infinitesimals; Infinity in Mathematics and Logic; Logic, History of, overview article; Modal Logic; Montague, Richard; Nicole, Pierre; Peano, Giuseppe; Second-Order Logic; Semantics; Set Theory; Tarski, Alfred.
Bibliography
Arnauld, Antoine, and Pierre Nicole. La logique ou l'art de penser. Paris: Gallimard, 1992.
Barwise, Jon, and Solomon Feferman. Model-Theoretic Logics. New York: Springer, 1985.
Bell, John L., and Alan B. Slomson. Models and Ultraproducts. Amsterdam: North-Holland, 1969.
Börger, Egon, and Robert Stärk. Abstract State Machines: A Method for High-level Design and Analysis. Berlin: Springer, 2003.
Buechler, Steven. Essential Stability Theory. Berlin: Springer, 1996.
Chang, Chen-Chung, and H. Jerome Keisler. Model Theory. 3rd ed. Amsterdam: North-Holland, 1990.
Doets, Kees. Basic Model Theory. Stanford, CA: CSLI Publications, 1996.
Dries, Lou van den. Tame Topology and O-minimal Structures. Cambridge, U.K.: Cambridge University Press, 1998.
Ebbinghaus, Heinz-Dieter, and Jörg Flum. Finite Model Theory. 2nd ed. Berlin: Springer, 1999.
Ebbinghaus, Heinz-Dieter, Jörg Flum, and Wolfgang Thomas. Mathematical Logic. New York: Springer, 1984.
Haskell, Deirdre, Anand Pillay, and Charles Steinhorn, eds. Model Theory, Algebra, and Geometry. Cambridge, U.K.: Cambridge University Press, 2000.
Hodges, Wilfrid. Model Theory. Cambridge, U.K.: Cambridge University Press, 1993.
Manzano, Maria. Model Theory. Oxford: Oxford University Press, 1999.
Marcja, Annalisa, and Carlo Toffalori. A Guide to Classical and Modern Model Theory. Dordrecht: Kluwer, 2003.
Marker, David. Model Theory: An Introduction. New York: Springer, 2002.
Morgan, Mary S., and Margaret Morrison. Models as Mediators. Cambridge, U.K.: Cambridge University Press, 1999.
Pillay, Anand. Geometric Stability Theory. Oxford: Clarendon Press, 1996.
Robinson, Abraham. "On the Application of Symbolic Logic to Algebra." In Proceedings of the International Congress of Mathematicians, Cambridge, MA, 1950. Reprinted in Selected Papers of Abraham Robinson. Vol. 1, Model Theory and Algebra, edited by H. Jerome Keisler, 3–11. Amsterdam: North-Holland, 1979.
Rothmaler, Philipp. Introduction to Model Theory. Amsterdam: Gordon and Breach, 2000.
Tarski, Alfred. "Contributions to the Theory of Models, I." Indagationes Mathematicae 16 (1954): 572–581.
Tarski, Alfred, and Robert L. Vaught. "Arithmetical Extensions of Relational Systems." Compositio Mathematica 13 (1957): 81–102.
Tatsien, Li, ed. Proceedings of the International Congress of Mathematicians, Beijing 2002. Vols. I, II. Beijing: Higher Education Press, 2002. Available from http://math.ucdavis.edu/ICM2002.
Wilfrid Hodges (2005)