96,99 €
Broad in scope, involving theory, the application of that theory, and programming technology, compiler construction is a moving target, with constant advances in compiler technology taking place. Today, a renewed focus on do-it-yourself programming makes a quality textbook on compilers, that both students and instructors will enjoy using, of even more vital importance. This book covers every topic essential to learning compilers from the ground up and is accompanied by a powerful and flexible software package for evaluating projects, as well as several tutorials, well-defined projects, and test cases.
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 838
Veröffentlichungsjahr: 2012
CONTENTS
Cover
Half Title page
Title page
Copyright page
Dedication
Preface
Notable Features
Software Package
Projects
Useful References
Acknowledgments
Chapter 1: Strings, Languages, and Compilers
1.1 Introduction
1.2 Basic Language Concepts
1.3 Basic Compiler Concepts
1.4 Basic Set Theory
1.5 Null String
1.6 Concatenation
1.7 Exponent Notation
1.8 Star Operator (Also Known As the Zero-or-More Operator)
1.9 Concatenation of Sets of Strings
1.10 Plus Operator (Also Known As the One-or-More Operator)
1.11 Question Mark Operator (Also Known as Zero-or-One Operator)
1.12 Shorthand Notation for a Set Containing a Single String
1.13 Operator Precedence
1.14 Regular Expressions
1.15 Limitations of Regular Expressions
Problems
Chapter 2: Context-Free Grammars, Part 1
2.1 Introduction
2.2 What is a Context-Free Grammar?
2.3 Derivations Using a Context-Free Grammar
2.4 Language Defined by a Context-Free Grammar
2.5 Different Ways of Representing Context-Free Grammars
2.6 Some Simple Grammars
2.7 Techniques for Generating Languages with Context-Free Grammars
2.8 Regular and Right Linear Grammars
2.9 Counting with Regular Grammars
2.10 Grammars for Lists
2.11 An Important Language that is not Context-Free
Problems
Chapter 3: Context-Free Grammars, Part 2
3.1 Introduction
3.2 Parse Trees
3.3 Leftmost and Rightmost Derivations
3.4 Substitution
3.5 Ambiguous Grammars
3.6 Determining Nullable Nonterminals
3.7 Eliminating Lambda Productions
3.8 Eliminating Unit Productions
3.9 Eliminating Useless Nonterminals
3.10 Recursion Conversions
3.11 Adding the Null String to a Language
Problems
Chapter 4: Context-Free Grammars, Part 3
4.1 Introduction
4.2 Grammars for Arithmetic Expressions
4.3 Specifying Associativity and Precedence in Grammars
4.4 Backus-Naur Form
4.5 Syntax Diagrams
4.6 Abstract Syntax Trees and Three-Address Code
4.7 Noncontracting Grammars
4.8 Essentially Noncontracting Grammars
4.9 Converting a Context-Free Grammar to an Essentially Noncontracting Grammar
4.10 Pumping Property of Context-Free Languages (Optional)
Problems
Chapter 5: Chomsky’s Hierarchy (Optional)
5.1 Introduction
5.2 Context-Sensitive Productions
5.3 Context-Sensitive Grammars
5.4 Unrestricted Grammars
Problems
Chapter 6: Top-Down Parsing
6.1 Introduction
6.2 Top-Down Construction of a Parse Tree
6.3 Parses that Fail
6.4 A Bad Grammar for Top-Down Parsing
6.5 Deterministic Parsers
6.6 A Parser that Uses a Stack
6.7 Table Representation of a Stack Parser
6.8 Handling Productions with Nonleading Terminals
6.9 Writing a Stack Parser in Java
Problems
Chapter 7: LL(1) Grammars
7.1 Introduction
7.2 FIRST Set of the Right Side of a Production
7.3 Determining Operation Sequences
7.4 Determining Selection Sets of Lambda Productions
7.5 Whatever-Follows-Left-Follows-Rightmost Rule
7.6 Selection Sets for Productions with Nullable Right Sides
7.7 Selection Sets Containing the End-of-Input Marker
7.8 A Stack Parser for a Grammar with Lambda Productions
7.9 Converting a Non-LL(1) Grammar to an LL(1) Grammar
7.10 Parsing with an Ambiguous Grammar
7.11 Computing FIRST and FOLLOW Sets
Problems
Chapter 8: Table-Driven Stack Parser (Optional)
8.1 Introduction
8.2 Unifying the Operations of a Stack Parser
8.3 Implementing a Table-Driven Stack Parser
8.4 Improving Our Table-Driven Stack Parser
8.5 Parsers that are Not Deterministic—A Digression on Theory (Optional)
Problems
Chapter 9: Recursive-Descent Parsing
9.1 Introduction
9.2 A Simple Recursive-Descent Parser
9.3 Handling Lambda Productions
9.4 A Common Error
9.5 Java Code for Productions
9.6 Left Factoring in a Recursive-Descent Parser
9.7 Eliminating Tail Recursion
9.8 Translating the Star, Plus, and Question Mark Operators
9.9 Doing Things Backward
Problems
Chapter 10: Recursive-Descent Translation
10.1 Introduction
10.2 A Simple Translation Grammar
10.3 Converting a Translation Grammar to Java Code
10.4 Specifications for a Translation Grammar
10.5 Passing Information During the Parse
10.6 L-Attributed Grammars
10.7 A New Token Manager
10.8 Solving the Token Lookahead Problem
10.9 Code for the New Token Manager
10.10 Translation Grammar for Prefix Expression Compiler
10.11 An Interesting Use of Recursion (Optional)
Problems
Chapter 11: Assembly Language
11.1 Introduction
11.2 Structure of the J1 Computer
11.3 Machine Language Instructions
11.4 Assembly Language Instructions
11.5 Pushing Characters
11.6 aout Instruction
11.7 Using Labels
11.8 Using the Assembler
11.9 stav Instruction
11.10 Compiling an Assignment Statement
11.11 Compiling Print and Println
11.12 Outputting Strings
11.13 Inputting Decimal Numbers
11.14 Entry Directive
11.15 More Assembly Language
Problems
Chapter 12: S1—A Simple Compiler
12.1 Introduction
12.2 The Source Language
12.3 Grammar for Source Language
12.4 The Target Language
12.5 Symbol Table
12.6 Code Generator
12.7 token Class
12.8 Writing the Translation Grammar
12.9 Implementing the S1 Compiler
12.10 Trying Out S1
12.11 Advice on Extending the S1 Compiler
12.12 Specifications for S2
Problems
Chapter 13: JavaCC (Optional)
13.1 Introduction
13.2 JavaCC Extended Regular Expressions
13.3 JavaCC Input File
13.4 Specifying Actions for Regular Expressions
13.5 JavaCC Input File for S1j
13.6 Files Produced by JavaCC
13.7 Using the Star and Plus Operators
13.8 Choice Points and Lookahead
13.9 JavaCC’s Choice Algorithm
13.10 Syntactic and Semantic Lookahead (Optional)
13.11 Using JAVACC to Create a Token Manager Only
13.12 Using the Token Chain
13.13 Suppressing Warning Messages
Problems
Chapter 14: Building on S2
14.1 Introduction
14.2 Extending println and print
14.3 Cascaded Assignment Statement
14.4 Unary Plus and Minus
14.5 readint Statement
14.6 Controlling the Token Trace from the Command Line
14.7 Specifications for S3
Problems
Chapter 15: Compiling Control Structures
15.1 Introduction
15.2 while Statement
15.3 if Statement
15.4 do-while Statement
15.5 Range Checking of Numerical Constants
15.6 Handling Backslash-Quote in a String
15.7 Handling Backslash-Quote with JavaCC (Optional)
15.8 Univeral Blocks in javaCC (Optional)
15.9 Handling Strings that Span Lines
15.10 Handling Strings that Span Lines Using javaCC (Optional)
15.11 SPECIAL_TOKEN Block in javaCC (Optional)
15.12 Error Recovery
15.13 Error Recovery in JavaCC (Optional)
15.14 Specifications for S4
Problems
Chapter 16: Compiling Programs in Functional Form
16.1 Introduction
16.2 Separate Assembly and Linking
16.3 Calling and Returning from Functions
16.4 Source Language for S5
16.5 Symbol Table for S5
16.6 Code Generator for S5
16.7 Translation Grammar for S5
16.8 Linking with a Library
16.9 Specifications for S5
16.10 Extending S5 (Optional)
Problems
Chapter 17: Finite Automata
17.1 Introduction
17.2 Deterministic Finite Automata
17.3 Converting a DFA to a Regular Expression
17.4 JAVA Code for a DFA
17.5 Nondeterministic Finite Automata
17.6 Using an NFA as an Algorithm
17.7 Converting an NFA to a DFA with the Subset Algorithm
17.8 Converting a DFA to a Regular Grammar
17.9 Converting a Regular Grammar to an NFA
17.10 Converting a Regular Expression to an NFA
17.11 Finding the Minimal DFA
17.12 Pumping Property of Regular Languages
Problems
Chapter 18: Capstone Project: Implementing Grep Using Compiler Technology
18.1 Introduction
18.2 Regular Expressions for Our Grep Program
18.3 Token Manager for Regular Expressions
18.4 Grammar for Regular Expressions
18.5 Target Language for Our Regular Expression Compiler
18.6 Using an NFA for Pattern Matching
Problems
Chapter 19: Compiling to a Register-Oriented Architecture
19.1 Introduction
19.2 Using the Register Instruction Set
19.3 Modifications to the Symbol Table for R1
19.4 Parser and Code Generator for R1
Problems
Chapter 20: Optimization
20.1 Introduction
20.2 Using the ldc Instruction
20.3 Reusing Temporary Variables
20.4 Constant Folding
20.5 Register Allocation
20.6 Peephole Optimization
Problems
Chapter 21: Interpreters
21.1 Introduction
21.2 Converting S1 to I1
21.3 Interpreting Statements that Transfer Control
21.4 Implementing the Compiler-Interpreter CI1
21.5 Advantages of Interpreters
Problems
Chapter 22: Bottom-up Parsing
22.1 Introduction
22.2 Principles of Bottom-Up Parsing
22.3 Parsing with Right- Versus Left-Recursive Grammars
22.4 Bottom-Up Parsing with Ambiguous Grammars
22.5 Do-Not-Reduce Rule
22.6 SLR(1) Parsing
22.7 Shift/Reduce Conflicts
22.8 Reduce/Reduce Conflicts
22.9 LR(1) Parsing
Problems
Chapter 23: yacc
23.1 Introduction
23.2 yacc Input and Output Files
23.2 A Simple yacc-Generated Parser
23.4 Passing Values Using the Value Stack
23.5 Using yacc with an Ambiguous Grammar
23.6 Passing Values Down the Parse Tree
23.7 Implementing Sly
23.8 jflex
Problems
Appendix A: Stack Instruction Set
Appendix B: Register Instruction Set
References
Index
Compiler Construction Using Java, JavaCC, and Yacc
IEEEcomputer society
Press Operating Committee
Chair Linda Shaferformer Director, Software Quality Institute The University of Texas at Austin
Editor-in-Chief Alan ClementsProfessor University of Teesside
Board Members
Mark J. Christensen, Independent Consultant James W. Cortada, IBM Institute for Business Value Richard E. (Dick) Fairley, Founder and Principal Associate, Software Engineering Management Associates (SEMA) Phillip Laplante, Professor of Software Engineering, Penn State University Evan Butterfield, Director of Products and Services Kate Guillemette, Product Development Editor, CS Press
Copyright © 2012 by the IEEE Computer Society, Inc.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Dos Reis, Anthony J.Compiler construction using Java, JavaCC, and Yacc / Anthony J. Dos Reis.p. cm.ISBN 978-0-470-94959-7 (hardback)1. Compilers (Computer programs) 2. Java (Computer program language) I. Title.QA76.76.C65D67 2011005.4’53—dc23 2011013211
oBook ISBN: 978-1-118-11276-2ePDF ISBN: 978-1-118-11287-8ePub ISBN: 978-1-118-11277-9eMobi ISBN: 978-1-118-11288-5
To little sister
PREFACE
My principal goal in writing this book is to provide the reader with a clear exposition of the theory of compiler design and implementation along with plenty of opportunities to put that theory into practice. The theory, of course, is essential, but so is the practice.
Provides numerous, well-defined projects along with test cases. These projects ensure that students not only know the theory but also know how to apply it. Instructors are relieved of the burden of designing projects that integrate well with the text.
Project work starts early in the book so that students can apply the theory as they are learning it.
The compiler tools (JavaCC, Yacc, and Lex) are optional topics.
The entire book is Java oriented. The implementation language is Java. The principal target language is similar to Java’s bytecode. The compiler tools generate Java code. The form of attributed grammar used has a Java-like syntax.
The target languages (one is stack oriented like Java’s bytecode; the other is register oriented) are very easy to learn but are sufficiently powerful to support advanced compiler projects.
The software package is a dream come true for both students and instructors. It automatically evaluates a student’s compiler projects with respect to correctness, run time, and size. It is great for students: they get immediate feedback on their projects. It is great for instructors: they can easily and accurately evaluate a student’s work. With a single command, an instructor can generate a report for an entire class. The software runs on three platforms: Microsoft Windows, Linux, and the Macintosh OS X.
Demonstrates how compiler technology is not just for compilers. In a capstone project, students design and implement grep using compiler technology.
Includes a chapter on interpreters that fits in with the rest of the book.
Includes a chapter on optimization that is just right for an introductory course. Students do not simply read about optimization techniques—they implement a variety of techniques, such as constant folding, peephole optimization, and register allocation.
The book uses a Java-like form of grammars that students can easily understand and use. This is the same form that JavaCC uses. Thus, students can make transition to JavaCC quickly and easily.
Provides enough theory that the book can be used for a combined compiler/automata/formal languages course. The book covers most of the topics covered in an automata/formal languages course: finite automata, stack parsers, regular expressions, regular grammars, context-free grammars, context-sensitive grammars, unrestricted grammars, Chomsky’s hierarchy, and the pumping lemmas. Pushdown automata, Turing machines, computability, and complexity are discussed in supplements in the software package. The software package also includes a pushdown automaton simulator and Turing machine simulator.
Covers every topic that should be in a first course or in an only course on compilers. Students will learn not only the theory and practice of compiler design but also important system concepts.
The software package for the textbook has some unusual features. When students run one of their compiler-generated programs, the software produces a log file. The log file contains a time stamp, the student’s name, the output produced by the compiler-generated program, and an evaluation of the compiler-generated program with respect to correctness, program size, and execution time. If the output is not correct (indicating that the student’s compiler is generating incorrect code), the log file is marked with NOT CORRECT. If the compiled program is too big or the execution time too long, the log file is marked with OVER LIMIT.
The name of a log file contains the student’s name. For example, the log file for the S3 project of a student whose last name is Dos Reis would be S3.dosreis.log. Because each log file name is unique, an instructor can store all the log files for a class in a single directory. A single command will then produce a report for the entire class.
The software supports two instruction sets: the stack instruction set and the register instruction set. The stack instruction set is the default instruction set. To use the register instruction set, a single directive is placed in the assembly language source program. The software then automatically reconfigures itself to use the register instruction set.
The three principal programs in the software package are a (the assembler/linker), e (the executor), and I (the library maker). The software package also includes p (a pushdown automaton simulator) and t (a Turing machine simulator).
The software package for this book is available from the publisher. The compiler tools are available on the Web. At the time of this writing, JavaCC is at http://java.net/down-loads/javacc, Byacc/j is at http://byaccj.sourceforge.net/, and Jflex is at http://jflex.de/.
This textbook specifies many well-defined projects. The source language has six levels of increasing complexity. A student can write a compiler for each level that translates to the stack instruction set. A student can also write a compiler for each level that translates to the register instruction set, or incorporates optimization techniques. For each level, a student can write a pure interpreter or an interpreter that uses an intermediate code. A student can implement several variations of grep using compiler technology. A student can write the code for any of these projects by hand or by using JavaCC or Yacc. Many of the chapter problems provide additional projects. In short, there are plenty of projects.
For each project, the textbook provides substantial support. Moreover, many of the projects are incremental enhancements of a previous project. This incremental approach works well; each project is challenging but not so challenging that students cannot do it.
Most projects can be completed in a week’s time. Thus, students should be able to do ten or even more projects in a single semester.
For background material, the reader is referred to the author’s An Introductions to Programming Using Java (Jones & Bartlett, 2010) and Assembly Language and Computer Architecture Using C++ and Java (Course Technology, 2004). Also recommended is JFLAP (available at http://www.jflap.org), an interactive program that permits experimentation with various types of automata and grammars.
I would like to thank Professors Robert McNaughton and Dean Arden who years ago at RPI taught me the beauty of formal language theory, my students who used preliminary versions of the book and provided valuable feedback, Katherine Guillemette for her support of this project, my daughter Laura for her suggestions on the content and structure of the book, and my wife for her encouragement.
ANTHONY J. DOS REIS
New Paltz, New York
October 2011
Compiler construction is truly an engineering science. With this science, we can methodically—almost routinely—design and implement fast, reliable, and powerful compilers.
You should study compiler construction for several reasons:
Compiler construction techniques have very broad applicability. The usefulness of these techniques is not limited to compilers.
To program most effectively, you need to understand the compiling process.
Language and language translation are at the very heart of computing. You should be familiar with their theory and practice.
Unlike some areas of computer science, you do not typically pick up compiler construction techniques “on the job.” Thus, the formal study of these techniques is essential.
To be fair, you should also consider reasons for not studying compiler construction. Only one comes to mind: Your doctor has ordered you to avoid excitement.
In our study of compiler design theory, we begin with several important definitions. An alphabet is the finite set of characters used in the writing of a language. For example, the alphabet of the Java programming language consists of all the characters that can appear in a program: the upper- and lower-case letters, the digits, whitespace (space, tab, newline, and carriage return), and all the special symbols, such as =, +, and {. For most of the examples in this book, we will use very small alphabets, such as {b, c} and {b, c, d}. We will avoid using the letter “a” in our alphabets because of the potential confusion with the English article “a”.
A string over an alphabet is a finite sequence of characters selected from that alphabet. For example, suppose our alphabet is {b, c, d}. Then
cbd cbcc c
are examples of strings over our alphabet. Notice that in a string over an alphabet, each character in the alphabet can appear any number of times (including zero times) and in any order. For example, in the string cbcc (a string over the three-letter alphabet {b, c, d}), the character b appears once, c appears three times, and d does not appear.
A language is a set of strings over some alphabet. For example, the set containing just the three strings cbd, cbcc, and c is a language. This set is not a very interesting language, but it is, nevertheless, a language according to our definition.
Let us see how our definitions apply to a “real” language—the programming language Java. Consider a Java program all written on a single line:
class C {public static void main (String[] args) {} }
Clearly, such a program is a single string ove the alphabet of Java. We can also view a multiple-line program as a single string—namely, the string that is formed by connecting successive lines with a line separator, such as a newline character or a carriage return/newline sequence. Indeed, a multiline program stored in a computer file is represented by just such a string. Thus, the multiple-line program
class C { public static void main (String[] args) { } }
is the single string
where represents the line separator. The Java language is the set of all strings over the Java alphabet that are valid Java programs.
A language can be either finite or infinite and may or may not have a meaning associated with each string. The Java language is infinite and has a meaning associated with each string. The meaning of each string in the Java language is what it tells the computer to do. In contrast, the language is finite and has no meaning associated with each string. Nevertheless, we still consider it a language. A language is simply a set, finite or infinite, of strings, each of which may or may not have an associated meaning.
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
Lesen Sie weiter in der vollständigen Ausgabe!
