Compiler Construction Using Java, JavaCC, and Yacc - Anthony J. Dos Reis - E-Book

Compiler Construction Using Java, JavaCC, and Yacc E-Book

Anthony J. Dos Reis

0,0
96,99 €

oder
-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

Broad in scope, involving theory, the application of that theory, and programming technology, compiler construction is a moving target, with constant advances in compiler technology taking place. Today, a renewed focus on do-it-yourself programming makes a quality textbook on compilers, that both students and instructors will enjoy using, of even more vital importance. This book covers every topic essential to learning compilers from the ground up and is accompanied by a powerful and flexible software package for evaluating projects, as well as several tutorials, well-defined projects, and test cases.

Sie lesen das E-Book in den Legimi-Apps auf:

Android
iOS
von Legimi
zertifizierten E-Readern

Seitenzahl: 838

Veröffentlichungsjahr: 2012

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



CONTENTS

Cover

Half Title page

Title page

Copyright page

Dedication

Preface

Notable Features

Software Package

Projects

Useful References

Acknowledgments

Chapter 1: Strings, Languages, and Compilers

1.1 Introduction

1.2 Basic Language Concepts

1.3 Basic Compiler Concepts

1.4 Basic Set Theory

1.5 Null String

1.6 Concatenation

1.7 Exponent Notation

1.8 Star Operator (Also Known As the Zero-or-More Operator)

1.9 Concatenation of Sets of Strings

1.10 Plus Operator (Also Known As the One-or-More Operator)

1.11 Question Mark Operator (Also Known as Zero-or-One Operator)

1.12 Shorthand Notation for a Set Containing a Single String

1.13 Operator Precedence

1.14 Regular Expressions

1.15 Limitations of Regular Expressions

Problems

Chapter 2: Context-Free Grammars, Part 1

2.1 Introduction

2.2 What is a Context-Free Grammar?

2.3 Derivations Using a Context-Free Grammar

2.4 Language Defined by a Context-Free Grammar

2.5 Different Ways of Representing Context-Free Grammars

2.6 Some Simple Grammars

2.7 Techniques for Generating Languages with Context-Free Grammars

2.8 Regular and Right Linear Grammars

2.9 Counting with Regular Grammars

2.10 Grammars for Lists

2.11 An Important Language that is not Context-Free

Problems

Chapter 3: Context-Free Grammars, Part 2

3.1 Introduction

3.2 Parse Trees

3.3 Leftmost and Rightmost Derivations

3.4 Substitution

3.5 Ambiguous Grammars

3.6 Determining Nullable Nonterminals

3.7 Eliminating Lambda Productions

3.8 Eliminating Unit Productions

3.9 Eliminating Useless Nonterminals

3.10 Recursion Conversions

3.11 Adding the Null String to a Language

Problems

Chapter 4: Context-Free Grammars, Part 3

4.1 Introduction

4.2 Grammars for Arithmetic Expressions

4.3 Specifying Associativity and Precedence in Grammars

4.4 Backus-Naur Form

4.5 Syntax Diagrams

4.6 Abstract Syntax Trees and Three-Address Code

4.7 Noncontracting Grammars

4.8 Essentially Noncontracting Grammars

4.9 Converting a Context-Free Grammar to an Essentially Noncontracting Grammar

4.10 Pumping Property of Context-Free Languages (Optional)

Problems

Chapter 5: Chomsky’s Hierarchy (Optional)

5.1 Introduction

5.2 Context-Sensitive Productions

5.3 Context-Sensitive Grammars

5.4 Unrestricted Grammars

Problems

Chapter 6: Top-Down Parsing

6.1 Introduction

6.2 Top-Down Construction of a Parse Tree

6.3 Parses that Fail

6.4 A Bad Grammar for Top-Down Parsing

6.5 Deterministic Parsers

6.6 A Parser that Uses a Stack

6.7 Table Representation of a Stack Parser

6.8 Handling Productions with Nonleading Terminals

6.9 Writing a Stack Parser in Java

Problems

Chapter 7: LL(1) Grammars

7.1 Introduction

7.2 FIRST Set of the Right Side of a Production

7.3 Determining Operation Sequences

7.4 Determining Selection Sets of Lambda Productions

7.5 Whatever-Follows-Left-Follows-Rightmost Rule

7.6 Selection Sets for Productions with Nullable Right Sides

7.7 Selection Sets Containing the End-of-Input Marker

7.8 A Stack Parser for a Grammar with Lambda Productions

7.9 Converting a Non-LL(1) Grammar to an LL(1) Grammar

7.10 Parsing with an Ambiguous Grammar

7.11 Computing FIRST and FOLLOW Sets

Problems

Chapter 8: Table-Driven Stack Parser (Optional)

8.1 Introduction

8.2 Unifying the Operations of a Stack Parser

8.3 Implementing a Table-Driven Stack Parser

8.4 Improving Our Table-Driven Stack Parser

8.5 Parsers that are Not Deterministic—A Digression on Theory (Optional)

Problems

Chapter 9: Recursive-Descent Parsing

9.1 Introduction

9.2 A Simple Recursive-Descent Parser

9.3 Handling Lambda Productions

9.4 A Common Error

9.5 Java Code for Productions

9.6 Left Factoring in a Recursive-Descent Parser

9.7 Eliminating Tail Recursion

9.8 Translating the Star, Plus, and Question Mark Operators

9.9 Doing Things Backward

Problems

Chapter 10: Recursive-Descent Translation

10.1 Introduction

10.2 A Simple Translation Grammar

10.3 Converting a Translation Grammar to Java Code

10.4 Specifications for a Translation Grammar

10.5 Passing Information During the Parse

10.6 L-Attributed Grammars

10.7 A New Token Manager

10.8 Solving the Token Lookahead Problem

10.9 Code for the New Token Manager

10.10 Translation Grammar for Prefix Expression Compiler

10.11 An Interesting Use of Recursion (Optional)

Problems

Chapter 11: Assembly Language

11.1 Introduction

11.2 Structure of the J1 Computer

11.3 Machine Language Instructions

11.4 Assembly Language Instructions

11.5 Pushing Characters

11.6 aout Instruction

11.7 Using Labels

11.8 Using the Assembler

11.9 stav Instruction

11.10 Compiling an Assignment Statement

11.11 Compiling Print and Println

11.12 Outputting Strings

11.13 Inputting Decimal Numbers

11.14 Entry Directive

11.15 More Assembly Language

Problems

Chapter 12: S1—A Simple Compiler

12.1 Introduction

12.2 The Source Language

12.3 Grammar for Source Language

12.4 The Target Language

12.5 Symbol Table

12.6 Code Generator

12.7 token Class

12.8 Writing the Translation Grammar

12.9 Implementing the S1 Compiler

12.10 Trying Out S1

12.11 Advice on Extending the S1 Compiler

12.12 Specifications for S2

Problems

Chapter 13: JavaCC (Optional)

13.1 Introduction

13.2 JavaCC Extended Regular Expressions

13.3 JavaCC Input File

13.4 Specifying Actions for Regular Expressions

13.5 JavaCC Input File for S1j

13.6 Files Produced by JavaCC

13.7 Using the Star and Plus Operators

13.8 Choice Points and Lookahead

13.9 JavaCC’s Choice Algorithm

13.10 Syntactic and Semantic Lookahead (Optional)

13.11 Using JAVACC to Create a Token Manager Only

13.12 Using the Token Chain

13.13 Suppressing Warning Messages

Problems

Chapter 14: Building on S2

14.1 Introduction

14.2 Extending println and print

14.3 Cascaded Assignment Statement

14.4 Unary Plus and Minus

14.5 readint Statement

14.6 Controlling the Token Trace from the Command Line

14.7 Specifications for S3

Problems

Chapter 15: Compiling Control Structures

15.1 Introduction

15.2 while Statement

15.3 if Statement

15.4 do-while Statement

15.5 Range Checking of Numerical Constants

15.6 Handling Backslash-Quote in a String

15.7 Handling Backslash-Quote with JavaCC (Optional)

15.8 Univeral Blocks in javaCC (Optional)

15.9 Handling Strings that Span Lines

15.10 Handling Strings that Span Lines Using javaCC (Optional)

15.11 SPECIAL_TOKEN Block in javaCC (Optional)

15.12 Error Recovery

15.13 Error Recovery in JavaCC (Optional)

15.14 Specifications for S4

Problems

Chapter 16: Compiling Programs in Functional Form

16.1 Introduction

16.2 Separate Assembly and Linking

16.3 Calling and Returning from Functions

16.4 Source Language for S5

16.5 Symbol Table for S5

16.6 Code Generator for S5

16.7 Translation Grammar for S5

16.8 Linking with a Library

16.9 Specifications for S5

16.10 Extending S5 (Optional)

Problems

Chapter 17: Finite Automata

17.1 Introduction

17.2 Deterministic Finite Automata

17.3 Converting a DFA to a Regular Expression

17.4 JAVA Code for a DFA

17.5 Nondeterministic Finite Automata

17.6 Using an NFA as an Algorithm

17.7 Converting an NFA to a DFA with the Subset Algorithm

17.8 Converting a DFA to a Regular Grammar

17.9 Converting a Regular Grammar to an NFA

17.10 Converting a Regular Expression to an NFA

17.11 Finding the Minimal DFA

17.12 Pumping Property of Regular Languages

Problems

Chapter 18: Capstone Project: Implementing Grep Using Compiler Technology

18.1 Introduction

18.2 Regular Expressions for Our Grep Program

18.3 Token Manager for Regular Expressions

18.4 Grammar for Regular Expressions

18.5 Target Language for Our Regular Expression Compiler

18.6 Using an NFA for Pattern Matching

Problems

Chapter 19: Compiling to a Register-Oriented Architecture

19.1 Introduction

19.2 Using the Register Instruction Set

19.3 Modifications to the Symbol Table for R1

19.4 Parser and Code Generator for R1

Problems

Chapter 20: Optimization

20.1 Introduction

20.2 Using the ldc Instruction

20.3 Reusing Temporary Variables

20.4 Constant Folding

20.5 Register Allocation

20.6 Peephole Optimization

Problems

Chapter 21: Interpreters

21.1 Introduction

21.2 Converting S1 to I1

21.3 Interpreting Statements that Transfer Control

21.4 Implementing the Compiler-Interpreter CI1

21.5 Advantages of Interpreters

Problems

Chapter 22: Bottom-up Parsing

22.1 Introduction

22.2 Principles of Bottom-Up Parsing

22.3 Parsing with Right- Versus Left-Recursive Grammars

22.4 Bottom-Up Parsing with Ambiguous Grammars

22.5 Do-Not-Reduce Rule

22.6 SLR(1) Parsing

22.7 Shift/Reduce Conflicts

22.8 Reduce/Reduce Conflicts

22.9 LR(1) Parsing

Problems

Chapter 23: yacc

23.1 Introduction

23.2 yacc Input and Output Files

23.2 A Simple yacc-Generated Parser

23.4 Passing Values Using the Value Stack

23.5 Using yacc with an Ambiguous Grammar

23.6 Passing Values Down the Parse Tree

23.7 Implementing Sly

23.8 jflex

Problems

Appendix A: Stack Instruction Set

Appendix B: Register Instruction Set

References

Index

Compiler Construction Using Java, JavaCC, and Yacc

IEEEcomputer society

Press Operating Committee

Chair Linda Shaferformer Director, Software Quality Institute The University of Texas at Austin

Editor-in-Chief Alan ClementsProfessor University of Teesside

Board Members

Mark J. Christensen, Independent Consultant James W. Cortada, IBM Institute for Business Value Richard E. (Dick) Fairley, Founder and Principal Associate, Software Engineering Management Associates (SEMA) Phillip Laplante, Professor of Software Engineering, Penn State University Evan Butterfield, Director of Products and Services Kate Guillemette, Product Development Editor, CS Press

IEEE Computer Society Publications
The world-renowned IEEE Computer Society publishes, promotes, and distributes a wide variety of authoritative computer science and engineering texts. These books are available from most retail outlets. Visit the CS Store at http://computer.org/store for a list of products.
IEEE Computer Society / Wiley Partnership
The IEEE Computer Society and Wiley partnership allows the CS Press authored book program to produce a number of exciting new titles in areas of computer science, computing and networking with a special focus on software engineering. IEEE Computer Society members continue to receive a 15% discount on these titles when purchased through Wiley or at wiley.com/ieeecs
To submit questions about the program or send proposals please e-mail [email protected] or write to Books, IEEE Computer Society, 10662 Los Vaqueros Circle, Los Alamitos, CA 90720-1314. Telephone +1-714-816-2169.
Additional information regarding the Computer Society authored book program can also be accessed from our web site at http://computer.org/cspress.

Copyright © 2012 by the IEEE Computer Society, Inc.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Dos Reis, Anthony J.Compiler construction using Java, JavaCC, and Yacc / Anthony J. Dos Reis.p. cm.ISBN 978-0-470-94959-7 (hardback)1. Compilers (Computer programs) 2. Java (Computer program language) I. Title.QA76.76.C65D67 2011005.4’53—dc23                                  2011013211

oBook ISBN: 978-1-118-11276-2ePDF ISBN: 978-1-118-11287-8ePub ISBN: 978-1-118-11277-9eMobi ISBN: 978-1-118-11288-5

To little sister

PREFACE

My principal goal in writing this book is to provide the reader with a clear exposition of the theory of compiler design and implementation along with plenty of opportunities to put that theory into practice. The theory, of course, is essential, but so is the practice.

NOTABLE FEATURES

Provides numerous, well-defined projects along with test cases. These projects ensure that students not only know the theory but also know how to apply it. Instructors are relieved of the burden of designing projects that integrate well with the text.

Project work starts early in the book so that students can apply the theory as they are learning it.

The compiler tools (JavaCC, Yacc, and Lex) are optional topics.

The entire book is Java oriented. The implementation language is Java. The principal target language is similar to Java’s bytecode. The compiler tools generate Java code. The form of attributed grammar used has a Java-like syntax.

The target languages (one is stack oriented like Java’s bytecode; the other is register oriented) are very easy to learn but are sufficiently powerful to support advanced compiler projects.

The software package is a dream come true for both students and instructors. It automatically evaluates a student’s compiler projects with respect to correctness, run time, and size. It is great for students: they get immediate feedback on their projects. It is great for instructors: they can easily and accurately evaluate a student’s work. With a single command, an instructor can generate a report for an entire class. The software runs on three platforms: Microsoft Windows, Linux, and the Macintosh OS X.

Demonstrates how compiler technology is not just for compilers. In a capstone project, students design and implement grep using compiler technology.

Includes a chapter on interpreters that fits in with the rest of the book.

Includes a chapter on optimization that is just right for an introductory course. Students do not simply read about optimization techniques—they implement a variety of techniques, such as constant folding, peephole optimization, and register allocation.

The book uses a Java-like form of grammars that students can easily understand and use. This is the same form that JavaCC uses. Thus, students can make transition to JavaCC quickly and easily.

Provides enough theory that the book can be used for a combined compiler/automata/formal languages course. The book covers most of the topics covered in an automata/formal languages course: finite automata, stack parsers, regular expressions, regular grammars, context-free grammars, context-sensitive grammars, unrestricted grammars, Chomsky’s hierarchy, and the pumping lemmas. Pushdown automata, Turing machines, computability, and complexity are discussed in supplements in the software package. The software package also includes a pushdown automaton simulator and Turing machine simulator.

Covers every topic that should be in a first course or in an only course on compilers. Students will learn not only the theory and practice of compiler design but also important system concepts.

SOFTWARE PACKAGE

The software package for the textbook has some unusual features. When students run one of their compiler-generated programs, the software produces a log file. The log file contains a time stamp, the student’s name, the output produced by the compiler-generated program, and an evaluation of the compiler-generated program with respect to correctness, program size, and execution time. If the output is not correct (indicating that the student’s compiler is generating incorrect code), the log file is marked with NOT CORRECT. If the compiled program is too big or the execution time too long, the log file is marked with OVER LIMIT.

The name of a log file contains the student’s name. For example, the log file for the S3 project of a student whose last name is Dos Reis would be S3.dosreis.log. Because each log file name is unique, an instructor can store all the log files for a class in a single directory. A single command will then produce a report for the entire class.

The software supports two instruction sets: the stack instruction set and the register instruction set. The stack instruction set is the default instruction set. To use the register instruction set, a single directive is placed in the assembly language source program. The software then automatically reconfigures itself to use the register instruction set.

The three principal programs in the software package are a (the assembler/linker), e (the executor), and I (the library maker). The software package also includes p (a pushdown automaton simulator) and t (a Turing machine simulator).

The software package for this book is available from the publisher. The compiler tools are available on the Web. At the time of this writing, JavaCC is at http://java.net/down-loads/javacc, Byacc/j is at http://byaccj.sourceforge.net/, and Jflex is at http://jflex.de/.

PROJECTS

This textbook specifies many well-defined projects. The source language has six levels of increasing complexity. A student can write a compiler for each level that translates to the stack instruction set. A student can also write a compiler for each level that translates to the register instruction set, or incorporates optimization techniques. For each level, a student can write a pure interpreter or an interpreter that uses an intermediate code. A student can implement several variations of grep using compiler technology. A student can write the code for any of these projects by hand or by using JavaCC or Yacc. Many of the chapter problems provide additional projects. In short, there are plenty of projects.

For each project, the textbook provides substantial support. Moreover, many of the projects are incremental enhancements of a previous project. This incremental approach works well; each project is challenging but not so challenging that students cannot do it.

Most projects can be completed in a week’s time. Thus, students should be able to do ten or even more projects in a single semester.

USEFUL REFERENCES

For background material, the reader is referred to the author’s An Introductions to Programming Using Java (Jones & Bartlett, 2010) and Assembly Language and Computer Architecture Using C++ and Java (Course Technology, 2004). Also recommended is JFLAP (available at http://www.jflap.org), an interactive program that permits experimentation with various types of automata and grammars.

ACKNOWLEDGMENTS

I would like to thank Professors Robert McNaughton and Dean Arden who years ago at RPI taught me the beauty of formal language theory, my students who used preliminary versions of the book and provided valuable feedback, Katherine Guillemette for her support of this project, my daughter Laura for her suggestions on the content and structure of the book, and my wife for her encouragement.

ANTHONY J. DOS REIS

New Paltz, New York

October 2011

CHAPTER 1

STRINGS, LANGUAGES, AND COMPILERS

1.1 INTRODUCTION

Compiler construction is truly an engineering science. With this science, we can methodically—almost routinely—design and implement fast, reliable, and powerful compilers.

You should study compiler construction for several reasons:

Compiler construction techniques have very broad applicability. The usefulness of these techniques is not limited to compilers.

To program most effectively, you need to understand the compiling process.

Language and language translation are at the very heart of computing. You should be familiar with their theory and practice.

Unlike some areas of computer science, you do not typically pick up compiler construction techniques “on the job.” Thus, the formal study of these techniques is essential.

To be fair, you should also consider reasons for not studying compiler construction. Only one comes to mind: Your doctor has ordered you to avoid excitement.

1.2 BASIC LANGUAGE CONCEPTS

In our study of compiler design theory, we begin with several important definitions. An alphabet is the finite set of characters used in the writing of a language. For example, the alphabet of the Java programming language consists of all the characters that can appear in a program: the upper- and lower-case letters, the digits, whitespace (space, tab, newline, and carriage return), and all the special symbols, such as =, +, and {. For most of the examples in this book, we will use very small alphabets, such as {b, c} and {b, c, d}. We will avoid using the letter “a” in our alphabets because of the potential confusion with the English article “a”.

A string over an alphabet is a finite sequence of characters selected from that alphabet. For example, suppose our alphabet is {b, c, d}. Then

cbd cbcc c

are examples of strings over our alphabet. Notice that in a string over an alphabet, each character in the alphabet can appear any number of times (including zero times) and in any order. For example, in the string cbcc (a string over the three-letter alphabet {b, c, d}), the character b appears once, c appears three times, and d does not appear.

A language is a set of strings over some alphabet. For example, the set containing just the three strings cbd, cbcc, and c is a language. This set is not a very interesting language, but it is, nevertheless, a language according to our definition.

Let us see how our definitions apply to a “real” language—the programming language Java. Consider a Java program all written on a single line:

class C {public static void main (String[] args) {} }

Clearly, such a program is a single string ove the alphabet of Java. We can also view a multiple-line program as a single string—namely, the string that is formed by connecting successive lines with a line separator, such as a newline character or a carriage return/newline sequence. Indeed, a multiline program stored in a computer file is represented by just such a string. Thus, the multiple-line program

class C {     public static void main (String[] args)     {     } }

is the single string

where represents the line separator. The Java language is the set of all strings over the Java alphabet that are valid Java programs.

A language can be either finite or infinite and may or may not have a meaning associated with each string. The Java language is infinite and has a meaning associated with each string. The meaning of each string in the Java language is what it tells the computer to do. In contrast, the language is finite and has no meaning associated with each string. Nevertheless, we still consider it a language. A language is simply a set, finite or infinite, of strings, each of which may or may not have an associated meaning.

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!

Lesen Sie weiter in der vollständigen Ausgabe!