Learn the skills and acquire the intuition to assess the theoretical limitations of computer programming. Offering an accessible approach to the topic, Theory of Computation focuses on the metatheory of computing and the theoretical boundaries between what various computational models can and cannot do—from the most general model, the URM (Unbounded Register Machines), to the finite automaton. A wealth of programming-like examples and easy-to-follow explanations build the general theory gradually, guiding readers through the modeling and mathematical analysis of computational phenomena and providing insights into what makes things tick as well as what restrains the capabilities of computational processes. Recognizing the importance of acquired practical experience, the book begins with the metatheory of general purpose computer programs, using URMs as a straightforward, technology-independent model of modern high-level programming languages while also exploring the restrictions of the URM language. Once readers gain an understanding of computability theory—including the primitive recursive functions—the author presents automata and languages, covering the regular and context-free languages as well as the machines that recognize these languages. Several advanced topics, such as reducibilities, the recursion theorem, complexity theory, and Cook's theorem, are also discussed.
Features of the book include:
* A review of basic discrete mathematics, covering logic and induction while omitting specialized combinatorial topics
* A thorough development of the modeling and mathematical analysis of computational phenomena, providing a solid foundation for the study of uncomputability
* The connection between uncomputability and unprovability: Gödel's first incompleteness theorem
The book provides numerous examples of specific URMs as well as of other programming formalisms, including loop programs, FA (deterministic finite automata), NFA (nondeterministic finite automata), and PDA (pushdown automata). Exercises at the end of each chapter allow readers to test their comprehension of the presented material, and an extensive bibliography suggests resources for further study. Assuming only a basic understanding of general computer programming and discrete mathematics, Theory of Computation serves as a valuable text for courses on the theory of computation at the upper-undergraduate level. The book is also an excellent resource for programmers and computing professionals wishing to understand the theoretical limitations of their craft.
Page count: 738
Year of publication: 2014
Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Tourlakis, George J.
Theory of computation / George Tourlakis.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-118-01478-3 (hardback)
1. Computable functions. 2. Functional programming languages. I. Title.
QA9.59.T684 2012
511.3’52—dc23
2011051088
10 9 8 7 6 5 4 3 2 1
Preface
At the intuitive level, any practicing mathematician or computer scientist—indeed, any student of these two fields of study—will have no difficulty at all recognizing a computation or an algorithm as soon as they see one, the latter defining, in a finite manner, computations for any given input. It is also an expectation that students of computer science (and, increasingly nowadays, of mathematics) will acquire the skill to devise algorithms (normally expressed as computer programs) that solve a variety of problems.
But how does one tackle the questions "is there an algorithm that solves such and such a problem for all possible inputs?"—a question with a potentially "no" answer—and also "is there an algorithm that solves such and such a problem via computations that take no more steps than some (fixed) polynomial function of the input length?"—this, too, being a question with a potentially "no" answer.
Typical (and tangible, indeed “interesting” and practically important) examples that fit the above questions, respectively, are
"is there an algorithm which can determine whether or not a given computer program (the latter written in, say, the C-language) is correct?"1

and

"is there an algorithm that will determine whether or not any given Boolean formula is a tautology, doing so via computations that take no more steps than some (fixed) polynomial function of the input length?"
For the first question we have a definitive “no” answer,2 while for the second one we simply do not know, at the present state of knowledge and understanding of what “computing” means.3
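The second question is at least decidable outright, by brute force: evaluate the formula under every truth assignment. A minimal Python sketch (the function and its interface are my illustration, not the book's notation) makes the exponential cost of the obvious method visible:

```python
from itertools import product

def is_tautology(formula, variables):
    """Check whether 'formula' is true under every truth assignment.

    formula: a function mapping a dict {variable name: bool} to a bool.
    variables: the list of variable names occurring in the formula.
    """
    # Brute force: there are 2**n assignments for n variables, so this
    # takes exponential time.  Whether a polynomial-time method exists
    # is precisely the open question discussed above.
    return all(formula(dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))
```

For example, `is_tautology(lambda v: v["p"] or not v["p"], ["p"])` returns `True`, while `is_tautology(lambda v: v["p"] or v["q"], ["p", "q"])` returns `False`.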
But what do we mean when we say that “there is no algorithm that solves a given problem” —with or without restrictions on the algorithm’s computation lengths? This appears to be a much harder statement to validate than “there is an algorithm that solves such and such a problem” —for the latter, all we have to do is to produce such an algorithm and a proof that it works as claimed. By contrast, the former statement implies a, mathematically speaking, provably failed search over the entire set of all algorithms, while we were looking for one that solves our problem.
One evidently needs a precise definition of the concept of algorithm that is neither experiential, nor technology-dependent in order to assert that we encountered such a failed “search”. This directly calls for a mathematical theory whose objects of study include algorithms (and, correspondingly, computations) in order to construct such sets of (all) algorithms within the theory and be able to reason about the membership problem of such sets. This theory we call the theory of computation. It contains tools which, in principle, can “search”4 the set of all algorithms to see whether a problem is solvable by one; or, more ambitiously, to see if it can be solved by an algorithm whose computations are “efficient” —under some suitable definition of efficiency.
The theory of computation is the metatheory of computing. In the field of computing one computes: that is, develops programs and large scale software that are well-documented, correct, efficient, reliable and easily maintainable. In the (meta) theory of computing one tackles the fundamental questions of the limitations of computing, limitations that are intrinsic rather than technology-dependent.5 These limitations may rule out outright the existence of algorithmic solutions for some problems, while for others they rule out efficient solutions.
Our approach is anchored on the concrete (and assumed) practical knowledge about general computer programming attained by the reader in a first year programming course, as well as knowledge of discrete mathematics at the same level. The next natural step, then, is to develop the metatheory of general computing, building on the computing experience that we have assumed the reader attained. This will be our chapter on computability, that is, the most general metatheory of computing. We develop this metatheory via the programming formalism known as Shepherdson-Sturgis Unbounded Register Machines (URM)—a straightforward abstraction of modern high level programming languages. Within that chapter we will also explore a restriction of the URM programming language, that of the loop programs of A. Meyer and D. Ritchie. We will learn that while these loop programs can compute only a very small subset of "all the computable functions", they are nevertheless more than adequate for programming solutions to any "practical", computationally solvable, problem. For example, even restricting the nesting of loop instructions to as low as two, we can compute—in principle—enormously large functions, which with input x can produce outputs such as
$$\underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{x\text{ occurrences of }2} \qquad (1)$$
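The flavor of this restriction can be conveyed in ordinary Python (an illustration only, not the book's URM or loop-program notation): a loop program may only zero a variable, increment it, copy it, and repeat a body "Loop v" times, where the repetition count is the value v held on entry. Already with nesting depth two, such programs compute exponentially growing functions:

```python
def two_to_the(x):
    # Python for-loops play the role of the counted "Loop" instruction;
    # the body may use only zeroing, increment, and copying.
    r = 1
    for _ in range(x):          # Loop x  (nesting level 1)
        s = 0
        for _ in range(r):      # Loop r  (nesting level 2)
            s += 1
            s += 1              # each inner pass adds 2, so s ends at 2*r
        r = s                   # r doubles on every outer pass
    return r                    # r = 2**x
```

Iterating the same idea one nesting level higher produces towers of exponentials, which is why even heavily restricted loop programs cover every function one would meet in practice.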
The chapter —after spending considerable care in developing the technique of reductions— concludes by demonstrating the intimate connection between the unsolvability phenomenon of computing on one hand, and the unprovability phenomenon of proving within first-order logic (cf. Gödel (1931)) on the other, when the latter is called upon to reason about “rich” theories such as (Peano’s) arithmetic —that is, the theory of natural numbers, equipped with: the standard operations (plus, times); relations (less than); as well as with the principle of mathematical induction.
What to include and what not to include in an introductory book on the theory of computation is a challenge that, to some extent, is resolved by the preferences of the author. But I should like to think that the choices of topics made in this volume are more rational than simply being manifestations of "preference".
The overarching goal is to develop for the reader a “first-order” grounding in the fundamentals, that is, the theoretical limitations of computing in its various models of computation, from the most general model —the URM— down to the finite automaton.
We view the technique of reductions as fundamental in the analysis of limitations of computing, and we spend a good deal of space on this topic, a variant of which (polynomial-time reductions) the student of computer science will encounter in Subsection 5.1.2 and will re-encounter in later studies as well, for example, in a course on algorithms and complexity. On the other hand, we do not hesitate to omit combinatorial topics such as “Post’s correspondence problem”, which only leads to specialized results (e.g., the algorithmic unsolvability of detecting ambiguity in context free languages) that we feel embody a less fundamental technical interest. Our emphasis is on laying the foundational tools and concepts that allow us to carry out a mathematical analysis of, and acquire a thorough understanding of, theoretical limitations of computing in both their absolute manifestation (uncomputability) and also in their relative manifestation (complexity and “intractability”).
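The flavor of a reduction can be sketched in a few lines of Python (my illustration, with hypothetical names, rather than the book's URM formalism). The halting problem reduces to the program correctness problem: from a program p and input x we build a program q that is correct with respect to the constant-zero specification exactly when p halts on x.

```python
def halting_to_correctness(p, x):
    # Build a program q that ignores its own input, runs p on x, and
    # then outputs 0.  q computes the constant-zero function (i.e., q is
    # "correct" w.r.t. that specification) exactly when p halts on x.
    # Hence a decider for program correctness would yield a decider for
    # the halting problem -- which does not exist.
    def q(_ignored_input):
        p(x)        # may or may not terminate
        return 0
    return q
```

The construction itself is a total, effective transformation of (p, x) into q; the impossibility conclusion then flows from the previously established unsolvability of halting.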
Consistent with our stated goal and emphasis, we purposely give short shrift to the area of so-called “positive” results, apart from a few familiarization examples of “programming” with URMs, loop programs, FA, NFA, and PDA. This is not a course about writing algorithms, but mostly about what algorithms cannot do at all and about what they have a lot of trouble doing. For example, results of Chapter 5 immediately imply that, in general, FORTRAN-like programs that allow nesting of the loop instruction equal to just three have highly impractical run times; certainly as high as6
Thus, we leave out "applications" such as lexical scanners via finite automata; automata-minimization; parsing of context free languages using LL, LR, recursive descent, and other parsers; and defer them to a later course on compiler writing tools—these topics do not belong here. We would rather concentrate on what is foundationally important and omit what is not.
Another challenge is where to start building this metatheory. What should be our abstraction of a computer program? It should be a straightforward observation that since this metatheory, or “theory” as we nickname it, abstracts computing practices —in order to analyze and study said abstractions mathematically— the student must have encountered in the first instance the concrete counterparts of these abstractions for the latter to make any sense.
It is hardly the case that, prior to the second year of study, students have "programmed" scanners or parsers. Rather, students have programmed solutions for less specialized problems, using a high level general purpose language such as C/C++, Java, possibly Pascal, etc. They never programmed an automaton, a push-down automaton, or anything like a Turing machine (unless they have taken up machine language in the first year).
Yet the overwhelming majority of the literature develops the "theory of computation", in a manner of speaking, backwards—invariably starting with the theory of finite automata, as if automata were precisely what the reader was programming in his7 first university course on programming. We apply this principle: Before the student studies the (meta)theory, he must have attained a good grasp of the practice that this theory attempts to dissect and discuss. Thus, it is natural to start our story with the (meta)theory of general purpose computer programs.
Because of these considerations, our first chapter is on URMs and computability. The choice of URMs as an abstraction of general-purpose computing—a relative latecomer (cf. Shepherdson and Sturgis (1963)) in the search for a good answer to "what would be a good technology-independent model of computation?"—also connects well with the experience of the student who will come to this course to learn what makes things tick in programming, and why some things do not tick at all. He most likely learned his programming via a high level language like C or Java rather than through machine language. The ubiquitous Turing machine (Turing (1936, 1937)) is more like machine language; indeed, it is rather less user-friendly.8 It offers no advantage at this level of exposition, and rather presents an obscure and hard-to-use (and hard to "arithmetize"9) model of computation that one need not use as the basis of computability. On the other hand it lends itself well to certain studies in complexity theory and is an eminently usable tool in the proof of Cook's theorem (cf. Subsection 5.1.3). So we will not totally avoid the Turing machine!
We turn to the formulaic topics of a book on Automata and Languages — Chapter 3— only after we become familiar, to some extent, with the (general) computability theory, including the special computability theory of more “practical” functions, the primitive recursive functions. Automata are introduced as a very restricted programming formalism, and their limitations (in expressivity) and their associated languages are studied.
It is often said, with justification, that a course in theory of computation has as side-effect the firming up of the student’s grasp of (discrete) mathematical techniques and mathematical reasoning, as well as the ability to apply such techniques in computer science and beyond. Of course, it cannot be emphasized enough that the student of a theory of computation course must be equipped already with the knowledge expected to be acquired by the successful completion of a one-semester course on discrete mathematics. This required background knowledge is often encapsulated, retold, and aspects of it are emphasized, in the space of a few pages at the front-end of a book like this. This is the ubiquitous “Chapter 0” of many books on the subject. In the case of the present book I would like, most of all, to retell two stories, logic and induction, that I often found being insufficiently developed in the student’s “toolbox”, notwithstanding earlier courses he may have taken. Thus, in Subsection 1.1.1 we develop the notational and how-to parts of elementary predicate logic in the space of some 20 pages, paying special attention to correctness of exposition. Section 1.4 presents the induction principle on the natural numbers in two steps: One, how to use its various forms, and a proof of their equivalence to the least (positive) integer principle. Two, we argue, at the intuitive level, why induction must be a valid principle after all!10 We also go over concepts about sets and related notation, as well as relations and functions, very quickly since they do not need much retelling. 
We will also introduce quickly and in an elementary fashion a topic likely not encountered by the reader in the typical “discrete math” course: the distinction between two infinities —countable and uncountable— so that we can have an excuse to introduce the reader to Cantor’s ingenious (and simple) diagonalization argument, that recurs in one or another shape and form, over and over, in the computability and complexity part of the theory of computation.
On intuitive arguments; "formalization" and why a course in theory cannot be taught exclusively by hand-waving: The main reason that compels us to teach (meta)theory in a computer science curriculum is not so much to prevent the innocent from trying to program a solution for the halting problem (cf. 2.5.0.16), just as we do not teach courses in geometry just to prevent circle-squaring "research". Rather, the formal mathematical methods used in a course on the theory of computation, more so than the results themselves, are transferable skills with which the student becomes endowed; they equip him to model and mathematically analyze concrete phenomena that occur in computation and, through mathematical reasoning, to recognize, understand, and correlate such phenomena. These formal methods, skills and results put the "science" keyword into computer science.
Intuition, obtained through experience, is invaluable, of course, and we often argue intuitively before we offer a proof of a fact. But: one cannot have “proof-by-intuition”.
We have included in this volume a good amount of complexity theory that will likely be mostly skipped whenever the book is called upon to serve a second year course on the theory of computation. There are a few "high level complexity" results already in Section 2.7 using diagonalization (cf. 2.7.1.9 and 2.7.1.11). Later, quite a bit is developed in Chapter 5, including the concept of NP-completeness and Cook's theorem; an account of Cobham's class of feasibly computable functions (mostly delegated to the Exercises section, 5.3); and some elements of the hierarchy theory of the primitive recursive functions culminating in the rather startling fact that we cannot algorithmically solve the correctness problem of FORTRAN-like programs even if we restrict the nesting of loops to just two levels. FORTRAN-like languages have as abstract counterpart the loop programs of Meyer and Ritchie (1967) that we study in the chapters on computability (2nd) and complexity (5th).
Were I to use this book in a second year course in the theory of computation I would skim quickly over the mathematical “prerequisites” chapter, and then cover 2.1-2.7, parts of 2.10, certainly Gödel’s incompleteness theorem and its relation to uncomputability: 2.11 —but not 2.11.1. I would then cover only as much as time permits from Chapter 3 on finite automata; certainly the pumping lemma, consistent with my view that this is a “course” about what cannot be done, or cannot be done “easily”, rather than a toolbox for how to do things. The latter is deferred to a course and book on algorithms.
In a more advanced course where one can proceed faster, I would want also to cover the sections on creative sets and the recursion theorem, and also as much complexity theory as possible from Chapter 5, starting with the material leading to Cook’s theorem.
The reader will forgive the many footnotes, which some will assess as bad style! There is always a story within a story, the “... and another thing ...”, that is best delegated to footnotes.
The style of exposition that I prefer is informal and conversational and is expected to serve well not only the readers who have the guidance of an instructor, but also those readers who wish to learn the elements of the theory of computation on their own. I use several devices to promote understanding, such as frequent "pauses" that anticipate questions and encourage the reader to rethink an issue that might be misunderstood if read but not studied and reflected upon. Additionally, I have included numerous remarks, examples and embedded exercises (the latter in addition to the end-of-chapter exercises) that reflect on a preceding definition or theorem. All pauses are delimited by "Pause." and a matching end-of-pause marker.
The stylized "winding road ahead" warning that I first saw in Bourbaki's books (Bourbaki (1966)), and have used in my other books, delimits a passage that is too important to skim over.
On the other hand, a second marker delimits passages that I could not resist including, but, frankly, can be skipped (unless you are curious).
There are over 200 end-of-chapter exercises and 41 embedded ones. Many have hints and thus I refrained from (subjectively) flagging them for level of difficulty. After all, as one of my mentors, Alan Borodin, used to say to us (when I was a graduate student at the University of Toronto), “attempt all exercises; but definitely do the ones you cannot do”.
GEORGE TOURLAKIS
Toronto
November 2011
1A “correct” program produces, for every input, precisely the output that is expected by an a priori specification.
2There is some interesting “small print” here! As long as the concept of algorithm is identified with that of, say, the Shepherdson-Sturgis “machines” of this volume —or for that matter with Turing machines— then the answer is definitely a “no:” There is a simple mathematical proof that we will see later on, that no Shepherdson-Sturgis machine (nor a Turing machine) exists that solves the problem. Now, such an identification has been advocated by Alonzo Church as part of his famous belief known as “Church’s Thesis”. If one accepts this identification, then the result about the non-existence of a Shepherdson-Sturgis machine that solves the problem is tantamount to the non-existence of an algorithm that does so. However, Church’s “thesis” is empirical, rather than provable, and is not without detractors; cf. Kalmár (1957). Suffice it to say that this statement is mathematically valid: No program, written in any programming language, which is equivalent in expressive power to that of our Shepherdson-Sturgis machines, exists that solves the problem.
3There is substantial evidence that the answer, if discovered, will likely be “no”.
4The quotes are necessary since it is not precisely a search that one performs. For example, the unsolvability—by any algorithm—of the program correctness problem is based on a so-called reduction technique that we will learn in this volume. A reduction basically establishes that a problem A is solvable by algorithmic means if we assume that we have a "black-box" algorithmic solution—that we may "call" just as we call a built-in function—of another problem, B. We say that "A is reduced (or reducible) to B". If we now know (say, via a previous mathematical proof of the fact) that A cannot be algorithmically solved, then nor can B! We will, as a starting point, show the unsolvability by algorithmic means, certainly not by any Shepherdson-Sturgis machine, of a certain "prototype" problem, known as the halting problem, "x ∈ K?". This will be done by a technique akin to Cantor's diagonalization. After this, many reduction arguments are effected by showing that K is reducible to a problem A. This renders A unsolvable!
5However this metatheory is called by most people “theory”. Hence the title of this volume.
6See 5.2.0.47 and 5.2.0.49. L3 programs have run times bounded by Ackermann's $A_k$, for some k > 0.
7Pronouns such as “he”, “his”, “him” are, by definition, gender-neutral in this volume and are used solely for textual convenience.
8Machine language can manipulate numbers, whereas a Turing machine can only manipulate digits!
9This verb will make sense later.
10In so doing I will be sure to let the student know that I am not squaring the circle: Induction is not a provable principle of the arithmetic of Peano, it is an axiom. However, this will not stop us from arguing its plausibility, i.e., why it is a reasonable, “natural” axiom.
CONTENTS
Preface
1 Mathematical Foundations
1.1 Sets and Logic; Naïvely
1.2 Relations and Functions
1.3 Big and Small Infinite Sets; Diagonalization
1.4 Induction from a User’s Perspective
1.5 Why Induction Ticks
1.6 Inductively Defined Sets
1.7 Recursive Definitions of Functions
1.8 Additional Exercises
2 Algorithms, Computable Functions and Computations
2.1 A Theory of Computability
2.2 A Programming Formalism for the Primitive Recursive Functions
2.3 URM Computations and their Arithmetization
2.4 A Double Recursion that Leads Outside the Primitive Recursive Function Class
2.5 Semi-computable Relations; Unsolvability
2.6 The Iteration Theorem of Kleene
2.7 Diagonalization Revisited; Unsolvability via Reductions
2.8 Productive and Creative Sets
2.9 The Recursion Theorem
2.10 Completeness
2.11 Unprovability from Unsolvability
2.12 Additional Exercises
3 A Subset of the URM Language; FA and NFA
3.1 Deterministic Finite Automata and their Languages
3.1.1 The Flow-Diagram Model
3.2 Nondeterministic Finite Automata
3.2.1 From FA to NFA and Back
3.3 Regular Expressions
3.4 Regular Grammars and Languages
3.5 Additional Exercises
4 Adding a Stack to a NFA: Pushdown Automata
4.1 The PDA
4.2 PDA Computations
4.3 The PDA-acceptable Languages are the Context Free Languages
4.4 Non Context Free Languages; Another Pumping Lemma
4.5 Additional Exercises
5 Computational Complexity
5.1 Adding a Second Stack; Turing Machines
5.2 Axt, Loop Program, and Grzegorczyk Hierarchies
5.3 Additional Exercises
Bibliography
Index
In this chapter we will briefly review tools, methods and notation from mathematics and logic, which we will directly apply throughout the remainder of this volume.
The most elementary elements from "set theory" and logic are a good starting point for our review. The quotes are necessary since the term set theory as it is understood today applies to the axiomatic version, which is a vast field of knowledge, methods, tools and research [cf. Shoenfield (1967); Tourlakis (2003b)]—and this is not what we outline here. Rather, we present the standard notation and the elementary operations on sets, on one hand, and take a brief look at infinity and the diagonal method of Cantor's, on the other. Diagonalization is a tool of significant importance in computability. The tiny fragment of concepts from set theory that you will find in this section (and then see applied throughout this volume) is framed within Cantor's original "naïve set theory", good expositions of which (but far exceeding our needs) can be found in Halmos (1960) and Kamke (1950).
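The diagonal trick itself fits in a few lines. Here is a finite Python sketch of it (my illustration, not the book's development): against any list of n binary sequences, each shown by its first n bits, flipping the diagonal produces a sequence that differs from every row.

```python
def diagonal(table):
    # table: n binary sequences, each represented by its first n bits --
    # the only portion needed to diagonalize against a list of length n.
    # Flipping the i-th bit of the i-th sequence yields a sequence that
    # differs from row i in position i, hence from every row somewhere.
    return [1 - table[i][i] for i in range(len(table))]
```

For instance, `diagonal([[0, 0, 0], [1, 1, 1], [0, 1, 0]])` yields `[1, 0, 1]`, which disagrees with the first row in position 0, the second in position 1, and the third in position 2. The same construction, run against an alleged enumeration of all infinite binary sequences, is Cantor's argument that no such enumeration exists.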
We will be forced to interweave our exposition of concepts from set theory with concepts—and notation—from elementary logic, since all mathematics is based on logical deductions, and the vast majority of the literature, from the most elementary to the most advanced, employs logical notation; e.g., symbols such as “∀” and “∃”.
The term “set” is not defined, 11 in either the modern or in the naive Cantorian version of the theory. Expositions of the latter, however, often ask the reader to think of a set as just a synonym for the words “class”, 12 “collection”, or “aggregate”. Intuitively, a set is a “container” along with its contents—its elements or members. Taken together, contents and container, are viewed as a single mathematical object. In mathematics one deals only with sets that contain mathematical objects (so we are not interested in sets of mice or fish).
Since a set is itself an object, a set may contain sets as elements.
All the reasoning that one does in order to develop set theory—even that of the naive variety—or any part of mathematics, including all our reasoning in this book, utilizes mathematical logic. Logic is the mathematics of reasoning and its “objects” of study are predominantly mathematical “statements” or “assertions”— technically known as formulae13—and mathematical proofs. Logic can be applied to mathematics either experientially and informally—learned via practice as it were— or formally. The predominance of mathematical writings apply logic informally as a vehicle toward reaching their objectives.14 Examples of writings where logic is formally applied to mathematics are the volumes that Bourbaki wrote, starting here [Bourbaki (1966)]. More recent examples at the undergraduate and graduate levels are Gries and Schneider (1994) and Tourlakis (2003b) respectively.
In this volume we apply logic informally. An overview is provided in the next subsection.
As is customary in mathematics, we utilize letters, upper or lower case, usually from near the end of the alphabet (u, v, y, x, z, S, T, V) to denote, that is, to name mathematical objects—in particular, sets.
By abuse of language we say that u, v, y, x, z, S, T, V are (rather than denote or name) objects. These letters function just like the variables in algebra do; they are object-variables.
As is the case in algebra, the variables x, y, z are not the only objects set theory studies. It also studies numbers such as 0, 1, −7 and π; matrices; and objects that are the results of function applications, such as $7^{23000}$, xyz and $2^x$.
Unlike axiomatic set theory, which introduces its objects via formal constructions, naïve set theory allows us to use, "off the shelf", all the mathematical objects such as the above, as well as, of course, objects that are sets, such as {2, 3, {1}} and A ∪ B.15
Logicians like to call mathematical objects terms. We utilize in this book the generic names t and s (with primes or subscripts, whenever we need more than two such names) to refer to arbitrary terms that we do not want to be specific about.
Relations of the two forms t = s and t ∈ s, for any terms t and s, are the atomic formulae (of set theory). The qualifier "atomic" refers to two facts:
1. These two types cannot be expressed (simulated) in terms of simpler relations by using the notation and tools of logic.
2. Using these two relations as building blocks we can construct every possible formula of set theory, as we will explain shortly.
By “ … is an important constant” we mean, of course, via the habitual abuse of language exercised by mathematicians, the accurate “ … denotes (or names) an important constant”.
Here is an example that uses this constant in an atomic formula: −7 ∈ . Incidentally, this formula makes (i.e., encodes) a false statement; we say the formula is false.
One may form this basic formula as well, where the meaning of the symbols “{…}” and will be introduced later in this section.
Yet another example is {1} ∈ {2, 1}—a false statement (formula) as we will be able to determine soon.
Logic (and mathematics) contain much more complex formulae than those of the atomic variety. The added complexity is achieved by repeatedly “gluing” atomic formulae together employing as glue the logical, or Boolean, connectives ¬, ⋀, ⋁, →, ≡ and the quantifiers ∀ and ∃.
All formulae of arithmetic can be built, starting from the atomic ones, as explained in the general Definition 1.1.1.3 below. This assertion is revisited in Subsection 2.11.1.
The “practicing mathematician” prefers to work within an “impure” arithmetic, where he has access to sets and their notations, operations, and properties. In particular, this impure arithmetic employs set variables and, more generally, set objects in addition to number variables and number objects.
Throughout this volume a formula (whether specific to set theory or to any other area in mathematics, such as arithmetic—pure or impure) will be denoted by an upper case calligraphic letter, such as 𝒜.
We now indicate how formulae are put together using brackets, connectives, and quantifiers, employing atomic formulae as basic building blocks. The definition below is generic, thus unified: it applies to the structure of all formulae of mathematics. The choice of atomic formulae (which presupposes an a priori choice of mathematical symbols, such as 0, +, ∈) and of types of variables is what determines whether we build set theory formulae, pure or impure arithmetic formulae, or “other”.
1.1.1.3 Definition. A set theory formula is one of:
(1) An atomic formula.
(2) (¬𝒜), where 𝒜 is a formula.
(3) (𝒜 ⋀ ℬ), where 𝒜 and ℬ are formulae.
(4) (𝒜 ⋁ ℬ), where 𝒜 and ℬ are formulae.
(5) (𝒜 → ℬ), where 𝒜 and ℬ are formulae.
(6) (𝒜 ≡ ℬ), where 𝒜 and ℬ are formulae.
(7) ((∀x)𝒜), where 𝒜 is a formula and x is any variable.
(8) ((∃x)𝒜), where 𝒜 is a formula and x is any variable.
We call ∀ the universal and ∃ the existential quantifiers. We will extend the terminology “quantifier” to apply to the compound symbols (∀x) or (∃x).
1.1.1.4 Definition. (Immediate Predecessors) Let 𝒜 be a formula. By 1.1.1.3 it has one of the forms (1)–(8). If it is of type (1), then it has no immediate predecessors—i.e., it was not built using connectives or quantifiers from simpler formulae. If it has the forms (2)–(8), then in each case its immediate predecessors are the formulae ℬ and 𝒞 [the latter enters in cases (3)–(6)] that were used to build it. We use the acronym ip for immediate predecessors.
The presence of brackets guarantees that the decomposition or deconstruction of a formula into its immediate predecessors is unique. This fact can be proved, but it is beyond our aims so we will not do so here [see Bourbaki (1966); Enderton (1972); Tourlakis (2008, 2003a)]. Logicians refer to it as the unique readability of a formula.
1.1.1.5 Example. Here are some formulae:
(¬ x = y)—by (1), followed by an application of (2); we usually write this more simply as “x ≠ y”,
The reader should check that we inserted brackets precisely as prescribed by Definition 1.1.1.3.
1.1.1.6 Remark. (Building a formula) If 𝒜 is (stands for, that is) a formula, we can deconstruct it according to Definition 1.1.1.3 using a natural process.
Initialize: Write down 𝒜. Flag it pending.
Repeat this process until it cannot be carried further:
Write down, above whatever you have written so far, the ip of all pending formulae (if they have ip); and remove the flag “pending” from the latter. Add the flag to the ones you have just written.
The process is terminating since we write shorter and shorter formulae at every step (and remove the flags); we cannot do this forever!
Clearly, if we now review from top to bottom the sequence that we wrote, we realize that it traces forward the process of constructing 𝒜 by repeated application of Definition 1.1.1.3. This top-down view of our “deconstruction” is a formula-construction sequence for 𝒜.
For example, applying the process to the last formula of the preceding example we get:
Going forward we can discard copies that we do not need. Thus a valid formula construction is also this one:
Indeed, we validate the first formula in the sequence via (1) of 1.1.1.3; the second using the first and (7); and the last one using the first two and (5).
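The deconstruction of Remark 1.1.1.6 is easy to mechanize. The following Python sketch is ours, not the book's: formulae are encoded as nested tuples, ip computes immediate predecessors, and reading the deconstruction bottom-up yields a formula-construction sequence.

```python
# Formulae as nested tuples: ("atomic", "x=y"), ("not", A),
# ("and", A, B), ("or", A, B), ("imp", A, B), ("equiv", A, B),
# ("forall", "x", A), ("exists", "x", A).

def ip(formula):
    """Immediate predecessors (cf. 1.1.1.4): the formulae used in the
    last formation step; atomic formulae have none."""
    tag = formula[0]
    if tag == "atomic":
        return []
    if tag == "not":
        return [formula[1]]
    if tag in ("and", "or", "imp", "equiv"):
        return [formula[1], formula[2]]
    return [formula[2]]        # "forall" / "exists"

def construction_sequence(formula):
    """Deconstruct top-down as in 1.1.1.6; reading the result bottom-up
    gives a formula-construction sequence."""
    pending, written = [formula], []
    while pending:
        f = pending.pop()
        written.append(f)
        pending.extend(ip(f))
    written.reverse()          # the top-down view of the deconstruction
    return written

# ((∀x)((x = y) ⋀ (¬ x ∈ z))), say:
A = ("forall", "x",
     ("and", ("atomic", "x=y"), ("not", ("atomic", "x∈z"))))
seq = construction_sequence(A)
# Every entry's ip occur earlier in the sequence, as required:
for i, f in enumerate(seq):
    assert all(p in seq[:i] for p in ip(f))
```

The loop at the end checks exactly the defining property of a construction sequence: each formula is validated by earlier entries.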
1.1.1.7 Definition. (Input Variables—in Terms) All the variables that occur in a term—other than an x that occurs in a term of the form {x : …} (which is a set object that will be introduced shortly)—are input variables.
Formation rules (2)–(6) in Definition 1.1.1.3 “preserve” the input occurrences of variables in the constituent formulae 𝒜 and ℬ that we join together using one of ¬, ⋀, ⋁, →, ≡ as glue. On the other hand, each quantifier (∀z) or (∃z) forces each occurrence of a variable as described below to become non-input:
The occurrence of z in the quantifier (∀z) or (∃z) itself, and any occurrence of z in the scope of said (∀z) or (∃z).
Thus, if we start with 𝒜, of inputs x, y, z, the new formula (Qy)𝒜, where Q stands here for one of ∀, ∃, has only x and z as input variables.
Thus “x is an input/non-input variable” (of a formula) means that there are occurrences of x that are input/non-input.
The standard name utilized in the literature for input variables is free variables. Non-input variable occurrences are technically called bound occurrences, but are also called apparent occurrences, since even though they are visible, they are not allowed—indeed it makes no sense—to receive arguments (input). This is analogous to the “Σ-notation” for sums: ∑_{i=1}^{3} i means 1 + 2 + 3. While we can “see” the variable i, it is not really there! It cannot accept inputs. For example, substituting a value for i in the Σ-notation is total nonsense.
The jargon input/non-input is deliberately chosen: We may substitute terms only in those variable occurrences that are free (input).
If 𝒜 is some formula and x, y, z, … is the complete list of variables that occur in it, we can draw attention to this fact by writing 𝒜(x, y, z, …). If x, y, z, … is a list of variables such that some among them occur in 𝒜, then we indicate this by 𝒜(x, y, z, …) as well.
In the context of 𝒜(x, y, z, …), 𝒜(t1, t2, t3, …) stands for the formula obtained from 𝒜 by replacing each original occurrence of x, y, z, … in 𝒜 by the terms t1, t2, t3, … respectively.
Observe also that if x does not occur in 𝒜, then 𝒜(t) is just the original 𝒜.
Before we turn to the meaning (and naming) of the connectives and quantifiers, let us agree that we can get away with much fewer brackets than Definition 1.1.1.3 prescribes. The procedure to do so is to agree on connective and quantifier “priorities” so that we know, in the absence of brackets, which of the two connectives/quantifiers is supposed to “win” if they both compete to apply on the same part in a formula.
By analogy, a high school student learns the convention that “× has a higher priority than +”, thus 2 + 3 × 4 means 2 + (3 × 4)—that is, × rather than + claims the part “3”.
Our convention is this: The connective ¬ as well as the quantifiers ∀ and ∃ have the highest priority, equal among the three. In order of decreasing priority, the remaining binary connectives are listed as ⋀, ⋁, →, ≡. If two binary connectives compete to glue with a subformula, then the higher-priority one wins; for example, 𝒜 ⋀ ℬ ⋁ 𝒞 stands for (𝒜 ⋀ ℬ) ⋁ 𝒞.
If two instances of the same binary connective compete to glue with a subformula, then the one to the right wins. For example, 𝒜 → ℬ → 𝒞 stands for 𝒜 → (ℬ → 𝒞).
Similarly, if any of ¬, (∀x), (∃x) compete for a part of a formula, again the one to the right wins. For example, ¬(∀x)𝒜 stands for ¬((∀x)𝒜).
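These priority conventions amount to an algorithm for reinserting the omitted brackets. The sketch below is ours, not the book's: it uses the ASCII stand-ins ~, &, |, ->, <-> for ¬, ⋀, ⋁, →, ≡, gives ~ the highest priority, and lets equal binary connectives group to the right.

```python
# Reinsert the brackets that the priority conventions allow us to omit.
PRIORITY = {"&": 4, "|": 3, "->": 2, "<->": 1}   # stand-ins for ⋀ ⋁ → ≡

def parse(tokens, min_prio=1):
    """Consume tokens, returning a fully bracketed string."""
    tok = tokens.pop(0)
    if tok == "~":                      # stand-in for ¬: highest priority
        left = "(~" + parse(tokens, 5) + ")"
    elif tok == "(":
        left = parse(tokens, 1)
        tokens.pop(0)                   # discard the matching ")"
    else:
        left = tok                      # a prime formula such as "p"
    while tokens and tokens[0] in PRIORITY and PRIORITY[tokens[0]] >= min_prio:
        op = tokens.pop(0)
        # The right operand is parsed at the SAME priority, so equal
        # connectives group to the right: p -> q -> r is p -> (q -> r).
        right = parse(tokens, PRIORITY[op])
        left = "(" + left + op + right + ")"
    return left

assert parse(["p", "&", "q", "|", "r"]) == "((p&q)|r)"      # ⋀ beats ⋁
assert parse(["p", "->", "q", "->", "r"]) == "(p->(q->r))"  # right grouping
assert parse(["~", "p", "&", "q"]) == "((~p)&q)"            # ¬ binds tightest
```

Parsing the right operand at the same priority, rather than one level higher, is precisely what makes equal connectives associate to the right, as the convention requires.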
How do we “compute” the truth or falsehood of a formula? To begin with, to succeed in this we must realize that just as a function gives, in general, different outputs for different inputs, in the same way the “output” of a formula, its truth-value, can only be computed, in general, if we “freeze” the input variables. For each such frozen instance of the input side, we can compute the output side: true or false.
But where do the inputs come from? For areas of study like calculus or arithmetic the answers are easy: From the set of real numbers—denoted by ℝ—and the set of natural numbers ℕ respectively.
For set theory it sounds easy too: From the set of all sets!
If it were not for the unfortunate fact that “the set of all sets” does not exist, or, to put it differently, it is a non-set class due to its enormity, we could have left it at that. To avoid paradoxes such as “a set that is not a set”—cf. Section 1.3 on diagonalization for an insight into why some collections cannot be sets—we will want to take our inputs from a (comfortably large) set in any given set-theoretic discussion: the so-called reference set or domain.
1.1.1.10 Remark. The mathematician’s intuitive understanding of the statement “𝒜 is true (resp. false)” is that “𝒜 is true (resp. false) for all the possible values of the free (input) variables of 𝒜”.
Thus, if we are working in arithmetic, “2n + 1 is odd” means the same thing as “it is true that, for all n ∈ ℕ, 2n + 1 is odd”. “2^n > n” again omits an implied prefix “it is true that, for all n ∈ ℕ”. An example of a false statement with input variables is “2n is odd”.
1.1.1.11 Definition. An instance of a formula 𝒜, in symbols 𝒜′, is a formula obtained from 𝒜 by replacing each of its variables by some value from the relevant reference set.
Clearly, 𝒜′ is variable-free—a so-called closed formula or sentence—and therefore it has a well-defined truth-value: exactly one of true or false.
Sometimes we use more explicit notation: An instance of 𝒜(x) or of 𝒜(x, y, z, …) is 𝒜(i) or 𝒜(i, j, k, …), respectively, where i, j, k, … are objects (constants) from the reference set.
𝒜′ and ℬ′ are consistent or common instances of 𝒜 and ℬ if every free variable that appears in both of the latter receives the same value in both instances.
1.1.1.12 Example. Let 𝒜(x) stand for “x(x + 1) is even”, ℬ(x) stand for “2x + 1 is even” and 𝒞(x) stand for “x is even”, where x is a variable over ℕ. Then,
The lesson from this is that if the truth-value of a formula depends on variables, then not true is not necessarily the same as false.
We will not be concerned with how the truth-value of atomic formulae is “computed”; one can think of them as entities analogous to “built-in functions” of computer programming: Somehow, the way to compute their true/false output is a matter that a hidden procedure (alternatively, our math knowledge and sophistication) can do for us.
Our purpose here rather is to describe how the connectives and quantifiers behave toward determining the truth-value of a complex formula.
In view of Remark 1.1.1.10, the road toward the semantics of ¬𝒜, 𝒜 ⋀ ℬ, (∀x)𝒜, etc., passes through the semantics of arbitrary instances of these; namely, we need to only define the meaning of ¬𝒜′, 𝒜′ ⋀ ℬ′, (∀x)𝒜′, etc., respectively.
1.1.1.13 Definition. (Computing with Connectives and Quantifiers) Let 𝒜 and ℬ be any formulae, and 𝒜′ and ℬ′ be arbitrary common instances (1.1.1.11). Then:
(1) ¬𝒜′ is t iff 𝒜′ is f.
(2) 𝒜′ ⋁ ℬ′ is t iff at least one of 𝒜′, ℬ′ is t.
(3) 𝒜′ ⋀ ℬ′ is t iff both 𝒜′ and ℬ′ are t.
(4) 𝒜′ → ℬ′ is f iff 𝒜′ is t and ℬ′ is f.
(5) 𝒜′ ≡ ℬ′ is t iff 𝒜′ and ℬ′ have the same truth-value.
(6) ((∀x)𝒜(x))′ is t iff 𝒜(k)′ is t for every value k of x from the reference set.
(7) ((∃x)𝒜(x))′ is t iff 𝒜(k)′ is t for at least one value k of x from the reference set.
1.1.1.14 Remark. (Truth Tables) The content of the preceding definition—cases (1)–(5)—is normally captured more visually in table form (we have removed the primes for readability):
We read the table as follows: First, the symbols t and f stand for the values “true” and “false” respectively. Second, the two columns to the left of the double vertical line give all possible pairs of values (outputs) of 𝒜 and ℬ. Third, below ¬𝒜, 𝒜 ⋀ ℬ, etc., we list the computed truth-values (of the formulae of the first row) that correspond to the assumed 𝒜 and ℬ values.
The odd alignment under ¬𝒜 is consistent with all the others: It emphasizes the placement of the “result” under the “operator”—here ¬—that causes it.
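The truth tables of 1.1.1.14 can also be computed rather than drawn. In this Python sketch (the function names are ours) True and False stand for t and f:

```python
# The connective semantics behind the truth tables, as functions.

def NOT(a):      return not a
def AND(a, b):   return a and b
def OR(a, b):    return a or b
def IMP(a, b):   return (not a) or b      # f only when a is t and b is f
def EQUIV(a, b): return a == b            # t when a, b share a truth-value

# Print the four rows of the table:
for a in (True, False):
    for b in (True, False):
        print(a, b, NOT(a), AND(a, b), OR(a, b), IMP(a, b), EQUIV(a, b))
```

Note that IMP is false only in the single case where the hypothesis is true and the conclusion false, matching the classical (material) reading of → discussed later in 1.1.1.22.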
1.1.1.15 Remark. According to Remark 1.1.1.10,
(†)
means precisely this:
(*)
By 6 of Definition 1.1.1.13, (∗) means
For every choice of the il and jr from the reference set,
and
for all possible values k of x from the domain,
The above, and hence also (†), translate via Remark 1.1.1.10 as
(‡)
Iterating this observation yields that (‡) is an equivalent statement to the one we obtain by quantifying universally any—in particular, all—of the variables y1,…, ym, x, z1,…, zn of . That is,
Adding or removing a “(∀x)” at the leftmost end of the formula makes no difference to the latter’s meaning.
Hm. This raises the question: Then what do we need the universal quantifier for?
Thus, adding or removing a “(∀x)” to parts of a formula can make a difference in meaning! The universal quantifier is useful after all.
Carrying around the predicate “is true” all the time is annoying. We will adopt immediately the mathematician’s jargon: Simply stating “𝒜” is synonymous with “𝒜 is true” or “𝒜 holds”.
1.1.1.17 Example. Let ℕ be our domain. Then,
(1)
is true. It says that “for every y, an x exists such that y < x”. According to 1.1.1.15 there is an implied (∀y) at the beginning of the formula.
Note that there is no single value of x that makes (1) true (because ℕ has no upper bound). For each value n of y, n + 1 is an appropriate value of x (so are n + 2, 2n + 1, etc.).
How can we write down the (false) statement that one value of x works for all y?
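The effect of quantifier order can be seen concretely over a small finite reference set. The sketch below is ours: we read Python's all/any as ∀/∃ over the domain {0, 1, 2}, and we swap in the atomic formula x ≠ y for y < x, since a finite domain cannot witness the unboundedness of ℕ.

```python
# Quantifier order matters, demonstrated over a finite domain.

D = {0, 1, 2}

# (∀y)(∃x) x ≠ y: for each y some x differs from it.  True, with a
# possibly different witness x for each y.
assert all(any(x != y for x in D) for y in D)

# (∃x)(∀y) x ≠ y: one single x differs from every y.  False, since
# no x differs from itself.
assert not any(all(x != y for y in D) for x in D)
```

The second formula, with the existential quantifier moved to the front, asserts that one witness works for all y, which is exactly the stronger (here false) statement the question above asks about.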
A final remark: How do we write that “there exists a unique x such that is true” (where is any formula)?
Fortunately, there is a short form for the above (∃! reads “for some unique”)
1.1.1.18 Example. The reader probably already knows that there is redundancy in the chosen set of connectives, that is, some of them can be simulated by the others.
For example, it is immediate by comparing (4), (1), and (2) in 1.1.1.13, that 𝒜 → ℬ is the same as (has the same meaning as) ¬𝒜 ⋁ ℬ. Similarly, 𝒜 ≡ ℬ is the same as (𝒜 → ℬ) ⋀ (ℬ → 𝒜).
Even ⋀ can be expressed via ¬ and ⋁: 𝒜 ⋀ ℬ is the same as ¬(¬𝒜 ⋁ ¬ℬ). This is easiest to see, perhaps, via a truth-table:
The numbers such as “(1)” in the first row indicate order of evaluation using the operator at the top of the column. Comparison of the column labeled (4) with the last column shows that 𝒜 ⋀ ℬ and ¬(¬𝒜 ⋁ ¬ℬ) yield the same output for any given value-pair of 𝒜 and ℬ. Thus we proved the equivalence of the two. This result is known as “de Morgan’s Law”.
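Such equivalences can also be checked mechanically by running through all value-pairs. Here is a Python sketch (ours) verifying two standard classical facts: a de Morgan-style expression of ⋀ via ¬ and ⋁, and the expression of → via ¬ alone:

```python
# Exhaustive verification over all truth-value pairs.

for A in (True, False):
    for B in (True, False):
        # A ⋀ B has the same value as ¬(¬A ⋁ ¬B):
        assert (A and B) == (not ((not A) or (not B)))
        # A → B, i.e. ¬A ⋁ B, has the same value as ¬(A ⋀ ¬B):
        assert ((not A) or B) == (not (A and (not B)))
```

Four rows suffice because, as 1.1.1.14 makes explicit, the truth-value of a Boolean combination depends only on the truth-values of its parts.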
1.1.1.19 Exercise. Prove that ⋁ can be expressed via ¬ and ⋀: 𝒜 ⋁ ℬ is the same as ¬(¬𝒜 ⋀ ¬ℬ). This is the “other” (technically, the dual of) de Morgan’s Law.
1.1.1.20 Example. We know [(7) of Definition 1.1.1.13] that (∃x)𝒜(x) means
(1.1)
(1.2)
(1.3)
Now, the negation of “there is a k such that 𝒜(k) is true” is
(1)
that is,
(2)
(2) says the same thing as
(3)
(1.2)−(1.3), (1), and (3) yield (via 1.1.1.13)
By (1.1) and 1.1.1.10, we have
for short
1.1.1.21 Exercise. Prove that
1.1.1.22 Remark. We note that 𝒜 → ℬ is true, in particular, when no instance of 𝒜 is true, i.e., when 𝒜 is false—in all its instances, that is. In this case the so-called classical or material implication holds “vacuously”, even if there is no connection between 𝒜 and ℬ at all and even if ℬ is not true! For example, if 𝒜 is 0 ≠ 0 and ℬ is “in every Euclidean triangle the sum of the angles equals 9π”, then 𝒜 → ℬ is true. The same holds if ℬ stands for “n is even”. The latter has both true and false instances over ℕ, but that is immaterial. In each chosen instance, 𝒜′ → ℬ′ is true—that is (1.1.1.10), 𝒜 → ℬ is true.
The Intuitionists, a school of thought founded on the writings of Kronecker, Brouwer and Heyting, do not like this state of affairs. The intended meaning, or intended semantics, for their 𝒜 → ℬ connects the hypothesis 𝒜 strongly to the conclusion ℬ. The meaning, informally speaking, is that, from a proof (intuitionistic proof, of course!) of 𝒜, a proof for ℬ can be constructed.
We are not going to say what an intuitionistic “proof” is, as this is of no concern to us. As a matter of fact, “proof” (for classical logic) will be defined only later (see 1.1.1.34). At present, let the reader think of “proof” as a process used to establish that a formula holds. Nevertheless, the above stated intentions regarding (intuitionistic) proofs are a clear enough indication of how strongly 𝒜 and ℬ must be related before the Intuitionist will agree to write 𝒜 → ℬ.
In particular, in intuitionistic logic → and ⋁ do not relate as in 1.1.1.18 above. In fact, in both logics 𝒜 → 𝒜 holds; however, in intuitionistic logic ¬𝒜 ⋁ 𝒜 does not hold! Or as we say, the law of the excluded middle does not hold. Intuitively speaking, the reason behind this phenomenon is that the intuitionistic proofs are so structured that, to establish that a formula 𝒜 ⋁ ℬ holds, in general you need to construct a proof of one of 𝒜, ℬ.
For example, the following classical proof that there are irrational numbers a, b such that a^b is rational is unacceptable intuitionistically.
Take a = b = √2. If this works, done. If not, that means that √2^√2 is irrational.
Well, then take a = √2^√2 and b = √2: now a^b = (√2^√2)^√2 = √2^2 = 2, rational. End of proof.
Above, 𝒜 is “√2^√2 is rational” and ℬ is “√2^√2 is irrational”. We used the (classical) fact that 𝒜 ⋁ ℬ is true, since one or the other of 𝒜, ℬ is (classically) true. However, classically, we do not need to know which is which!
A thorough exposition of intuitionistic logic can be found in the advanced book of Schütte ([Schü]).
1.1.1.23 Example. Suppose that x does not occur free in 𝒜.
Pause. Ensure that this is consistent with the notation introduced in Definition 1.1.1.11.
Then
(1)
Indeed, let y, z, w, … be the complete list of free variables of 𝒜, where x is not one of them. To verify (1) we need to show that for any choice of values k, l, m, … from the domain
(2)
that is,
(3)
We need to analyze (3). It says
But this is true exactly when 𝒜(k, l, m, …) is, since n does not affect the output: the variable x is non-input in 𝒜.
1.1.1.24 Exercise. Suppose we drop the condition “suppose that x does not occur free in ” above. Does (1) still hold? You must provide the “why”!
1.1.1.25 Exercise. In view of 1.1.1.15, “𝒜 is true” iff “(∀x)𝒜 is true”. So, is this observation not all that we need to assert (1)? See also the previous exercise.
1.1.1.26 Definition. (Prime Formulae; Tautologies) A prime formula or Boolean variable is either an atomic formula, or a formula with a leading quantifier.
If a formula 𝒜—when viewed as a Boolean combination of prime formulae, that is, as a formula built from prime formulae using only the formation rules (2)–(6) of 1.1.1.3—evaluates as t according to the truth table 1.1.1.14, for all possible assumed truth-values of its Boolean variables, then we call it a tautology and write this concisely as ⊨taut 𝒜.
1.1.1.27 Remark. Every formula is built by appropriately applying Boolean glue on a number of prime formulae: Indeed, in any formula built according to 1.1.1.3 we can identify all its maximal prime subformulae—that is, prime subformulae not contained in larger prime subformulae.
(1)
Double-boundary boxes enclose maximal prime formulae. The Boolean structure of (1) is
Only maximal prime formulae contribute to the Boolean structure and express the original formula as a Boolean combination of prime formulae glued together by connectives. The non-maximal prime formulae are hidden inside maximal ones.
The italicized claim above follows directly from an adaptation of the “deconstruction” in 1.1.1.6:
Just replace the step
Write down, above whatever you have written so far, the ip of all pending formulae (if they have ip); and remove the flag “pending” from the latter.
by
Write down, above whatever you have written so far, the ip of all non-prime pending formulae (if they have ip); and remove the flag “pending” from the latter.
Conversely, a Boolean combination of prime formulae is a formula in the sense of 1.1.1.3: Indeed, a Boolean combination is an expression built by applying (2)–(6) in the context of a formula-construction (cf. 1.1.1.6) with starting points prime formulae, rather than atomic formulae. Since a prime formula is a formula, the process leads to a formula. See Exercise 1.8.1 for a rigorous proof (that you will supply, equipped with the induction tool).
Rather, we assume for each prime formula, all the— in principle— possible output values; that is, both of t and f.
Pause. Ensure that this last observation fits with 1.1.1.26!
We indicate this fact by writing .
Assuming all possible truth-values of a prime formula (rather than attempting to compute “the” value) is tantamount to allowing the Boolean structure and Boolean structure alone—that is how the connectives have glued the formula together—to determine the truth-value via truth tables (1.1.1.14).
1.1.1.28 Example. Some tautologies: .
Some non-tautologies: . For example, looking at the value assumptions below—or value assignments, as is the accepted term,
we see that the formula evaluates as f, thus it is indeed not a tautology. Incidentally, we used “:=” to denote (truth) value assignment.
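Tautology checking in the sense of 1.1.1.26 is mechanical: list the prime formulae and try every truth-value assignment. In the Python sketch below (the encoding of formulae as Boolean functions of their primes' values is our choice, and the example formulae are ours):

```python
from itertools import product

# A formula is a tautology iff it evaluates to t under every
# truth-value assignment to its prime formulae (Boolean variables).

def is_tautology(formula, nvars):
    return all(formula(*vals)
               for vals in product((True, False), repeat=nvars))

# A ⋁ ¬A: a tautology.
assert is_tautology(lambda a: a or not a, 1)

# A → (B → A): a tautology; A := f makes it t outright, while A := t
# makes the bracketed part t.
assert is_tautology(lambda a, b: (not a) or ((not b) or a), 2)

# A → (A ⋀ B): not a tautology; A := t, B := f makes it f.
assert not is_tautology(lambda a, b: (not a) or (a and b), 2)
```

With n Boolean variables the checker tries 2^n assignments, which is exactly the brute-force procedure the truth-table method describes.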
1.1.1.29 Definition. (Tautological Implication) We say that 𝒜1, …, 𝒜n tautologically imply ℬ just in case 𝒜1 → (𝒜2 → ⋯ → (𝒜n → ℬ) ⋯) is a tautology.
We write 𝒜1, …, 𝒜n ⊨taut ℬ in this case. ℬ, we say, is (the result of) a tautological implication from 𝒜1, …, 𝒜n.
1.1.1.30 Remark. We note that a tautological implication preserves truth, from left to right. See exercise below.
1.1.1.31 Exercise. Let p1, …, pm be all the Boolean variables that appear in the formulae 𝒜1, …, 𝒜n, ℬ.
Show that 𝒜1, …, 𝒜n ⊨taut ℬ iff every set of values assigned to the Boolean variables that makes all the 𝒜i true, also makes ℬ true.
1.1.1.32 Example. All of the following are correct, no matter what 𝒜 stands for: for instance, 𝒜 ⊨taut ℬ → 𝒜 is correct, since 𝒜 → (ℬ → 𝒜) can be readily verified as a tautology using either 4 of 1.1.1.13 or the truth table (1.1.1.14). A shortcut is to just consider the two assumed values, t and f, for 𝒜. The second makes the formula true outright, while the first makes the bracketed subformula t, hence the whole formula t.
This is incorrect: x < y, y < z ⊨taut x < z, since choosing the value assignments x < y := t, y < z := t, x < z := f,
we see that x < y → y < z → x < z evaluates as f, so it is not a tautology.
1.1.1.33 Remark. (Capture of a Free Variable) Let 𝒜(x) stand for (∃y) y ≠ x—recall that y ≠ x is short for ¬(y = x). It states (i.e., codifies the statement) “for any value of x there is a value of y that is different”. Assuming that our domain is ℕ, this is clearly a true statement. So is 𝒜(z), obtained by substituting z for x, or, in programming jargon, “calling” 𝒜 with argument z.
What about 𝒜(y)? This is (∃y) y ≠ y, which is evidently false: “there is a value of y which is different from itself”!
This is unfortunate because, intuitively, what 𝒜(x) says should have nothing to do with the name of the input variable!
What happened here is that when we substituted y for x, the free y was captured— i.e., became bound—by a lurking quantifier, (∃ y): y got into the latter’s scope.
We should never allow such substitutions since, as this example shows, they may change the intended meaning of the resulting formula. In general, a substitution of a term t into 𝒜, to yield 𝒜(t), should not be allowed if the term t contains a free variable y that will become bound (captured) after the substitution.
Is there a workaround? Yes!
Consider an instance of
(1)
and a consistent instance (cf. 1.1.1.11) of
(2)
In view of 1.1.1.20 and 1.1.1.21, or directly using a similar argument as above, one sees at once that changing the bound variable of a universal quantifier into a brand new variable does not change the meaning either.
This leads to the so-called “variant theorem” of logic, that
Changing a bound variable in a formula into a new (i.e., not already used in the formula) variable does not change the meaning: the original and the new formulae are equivalent.
But then, given 𝒜(x) and a term t, we can always effect the substitution of t with impunity as long as we rename out of harm’s way, before the substitution takes place, all the original bound variables: All we need to do in a successful renaming is to ensure that none of them is the same as a free variable of t. Strictly speaking, in the result 𝒜(t) we do not have the original formula 𝒜, but a variant of the original—since we have renamed the latter’s bound variables. Nevertheless, since the old and the new are equivalent, we will use the same name for both, 𝒜.
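The renaming workaround can be sketched in code. The tuple representation below (keeping only ¬ and ∃ for brevity) and all names are ours; the point is that subst renames a bound variable whenever the incoming variable would be captured.

```python
import itertools

# Formulae: ("atom", name, vars...), ("not", A), ("exists", x, A).

def free_vars(f):
    tag = f[0]
    if tag == "atom":
        return set(f[2:])
    if tag == "not":
        return free_vars(f[1])
    return free_vars(f[2]) - {f[1]}     # "exists": x is bound in its scope

def fresh(avoid):
    """A variable name not already in use."""
    for i in itertools.count():
        v = "v%d" % i
        if v not in avoid:
            return v

def subst(f, x, t):
    """Replace free occurrences of variable x by variable t, renaming a
    bound variable first whenever it would capture t."""
    tag = f[0]
    if tag == "atom":
        return (f[0], f[1]) + tuple(t if v == x else v for v in f[2:])
    if tag == "not":
        return ("not", subst(f[1], x, t))
    y, body = f[1], f[2]                # the "exists" case
    if y == x:                          # x is not free here: nothing to do
        return f
    if y == t:                          # t would be captured: rename y first
        z = fresh(free_vars(body) | {x, t})
        body, y = subst(body, y, z), z
    return ("exists", y, subst(body, x, t))

# A(x): (∃y) ¬ y = x, i.e. "some y differs from x".
A = ("exists", "y", ("not", ("atom", "=", "y", "x")))
# Naive substitution of y for x would yield (∃y) ¬ y = y (capture!);
# subst renames the bound y out of harm's way first:
print(subst(A, "x", "y"))
# ('exists', 'v0', ('not', ('atom', '=', 'v0', 'y')))
```

A substitution that causes no capture, such as z for x, goes through unchanged, in line with the variant theorem: only bound variables are renamed, and only when necessary.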
We now turn to what a mathematician or computer scientist does with logic: He writes proofs.
1.1.1.34 Definition. (Proofs) A proof is a construction process that builds a finite length sequence of true formulae, one at a time. We normally write this sequence vertically on the page, with explanatory annotation. Three simple rules regarding what formula we may write at each (construction-) step govern the process. We may write
(1) any axiom;
(2) any formula that is tautologically implied (1.1.1.29) by formulae already written;
(3) the formula (∀x)𝒜, provided 𝒜 has already been written.
Any formula 𝒜 that appears in a proof we call a theorem. We say that the proof “established” or proved the theorem 𝒜.
A theorem follows from certain axioms Γ. Saying just “theorem” does not indicate this dependence, unless the relevant set of axioms is clearly understood from the context. If in doubt, or if we are discussing theorems of various theories simultaneously, then we must name the applicable axiom set in each case by saying “Γ-theorem” or “theorem from Γ”.
It is clear from what we have developed so far that steps (2) and (3) preserve truth (cf. 1.1.1.15). An application of step (3) is called an application of generalization, or rule Gen, as we will say.
