Syllabus for

Harvard Extension School CSCI E-95 (formerly CSCI E-295)

Compiler Design and Implementation (15174)

Fall 2016
Site last revised 5:13 PM 2-Jan-2017

Dr. James L. Frankel

 

The next time CSCI E-95 will be offered is in the Spring 2018 semester.

The class ski trip will take place on Sunday, January 22, 2017. We'll be heading up to Killington, Vermont. Killington is the biggest ski area in the eastern US with suitable terrain for all skiers from beginners to experts. Bring your significant other and have a fun day.
Discount lift tickets are available at www.liftopia.com and possibly other web sites if you reserve early enough in advance.
Also, January is National Learn to Ski and Snowboard Month and Killington is participating (see www.killington.com/site/to-do/snow-sports-school/special_offers). You can purchase a $49 Learn-To package as follows: 1-day packages will include a lesson, rental equipment, and a limited Learning Area lift ticket for first-time skiers and riders ages 7 and up; Offer may not be combined with multi-day Learn To packages and discounts; Advanced reservations are recommended as space is limited to the first five reservations per day; Additional restrictions may apply; Offer not valid on January 1, 2017 & January 14-15, 2017. Also, Killington has a "Bring a Friend, Ski Free." Bring a friend or family member any day during the winter season and when your friend purchases a Learn To Ski & Ride package, you'll receive a FREE lift ticket valid the same day. This offer is only valid when friend/family member is a first time skier or snowboarder ages 7 years and older; Reservations required at least 48-hours in advance; mention "Bring a Friend" promotion; Offer cannot be combined with any other offers and discounts.

Before 4 PM ET on Tuesday, December 13, 2016, send the URL of your presentation to the course staff and, if you're a distance student who is not going to be present in the classroom, your logname for either Skype and/or Google Hangouts.

All students making a final project presentation in the last class meeting need to send e-mail to the course staff requesting a slot.

Final Project Presentation Slots:
Time Slot Available
5:30 PM - 5:45 PM Slot 1 Filled by Nate
5:45 PM - 6:00 PM Slot 2 Filled by Annalisa
6:00 PM - 6:15 PM Slot 3 Filled by Mark Warren
6:15 PM - 6:30 PM Slot 4 Filled by Stephen
6:30 PM - 6:45 PM Slot 5 Open
6:45 PM - 7:00 PM Slot 6 Filled by Matt
7:00 PM - 7:15 PM Slot 7 Filled by Rajeev
7:15 PM - 7:30 PM Slot 8 Open
7:30 PM - 7:45 PM Slot 9 Filled by Ansuman
7:45 PM - 8:00 PM Slot 10 Filled by Anna
8:00 PM - 8:15 PM Slot 11 Filled by Satyam
8:15 PM - 8:30 PM Slot 12 Filled by Bennett
8:30 PM - 8:45 PM Slot 13 Open
8:45 PM - 9:00 PM Slot 14 Filled by Nikitha
9:00 PM - 9:15 PM Slot 15 Filled by Surbhi
9:15 PM - 9:30 PM Slot 16 Open
9:30 PM - 9:45 PM Slot 17 Filled by Pavan
9:45 PM - 10:00 PM Slot 18 Filled by Michael
10:00 PM - 10:15 PM Slot 19 Filled by Debanshu

15-Nov-2016: Updated Next-Use, Liveness & Register Allocation slides to include details on the creation of the Register Interference Graph.

8-Nov-2016: Defined additional IR operation codes in the Intermediate Representation slides. See slides with the title Quadruple Operation Codes.

1-Nov-2016: Amended Problem Set 3 to clarify how to print types in the symbol tables. Updated Problem Set 4 to contain a reminder that we have no tentative declarations in our language, that casts should be inserted when an array is converted into a pointer, and to have a specific form for ++, --, and compound assignment "operand" and "result" casts.

13-Oct-2016: Clarified Problem Set 3 to indicate that it is necessary to deal appropriately with arrays as formal parameters. Also, added slides to the Review of the C Programming Language slide deck that explicitly mention how to deal with arrays as both formal and actual parameters to functions.

8-Oct-2016: Problem Set 3 is now due on Monday night, October 10, 2016 at midnight ET (rather than on Sunday night, October 9, 2016 at midnight ET).

19-Sep-2016: Calendar clarification: We will be holding a class meeting on Tuesday, November 22, 2016.

11-Sep-2016: Problem Set 1 has been updated to clarify the type of a "char" without a "signed" or "unsigned" modifier so that it is possible to determine if character constants need to potentially have sign-extention.

10-Sep-2016: Problem Set 1 has been updated to clarify the minimum size of string literals that your compiler should be able to accept, but, because of the late addition of this information, this will not be required. However, if you place a limit on the length of string literals, an error should be emitted for string literals that exceed that length. Additional extra credit sections have also been added.

8-Sep-2016: Problem Set 1 has been updated to clarify the differences between the Source Character Set and the Execution Character Set and how this affects characters that can be expressed in backslash escape codes.

 

Quick Links:

  • Class Hours & Location
  • Distance Learning Links
  • Prerequisites
  • Brief abstract
  • Overview
  • Books/Course Bibliography
  • MIPS Assembly Language Programming
  • Instructor
  • Teaching Assistants
  • Grading
  • Plagiarizing
  • Course Outline
  • Approximate Schedule
  • Course Documents On-Line

    Tuesdays 7:40-10:15 PM in 53 Church Street, Room L01.

    Distance Learning Links:

    During Class:

    Live video stream (during class meeting)

    Chat room (during class & section meetings)

    After Class:

    Videos of class and section are available on the course's Canvas web site under Lecture Video.

    Midterm Exam

    Our midterm exam is a three hour long proctored exam. Students who live within the six-state New England area (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont) must come to campus to take the exam (of course, any student may choose to come to campus to take the exam). Students who do not take the exam on campus in our classroom are required to procure the services of a Harvard-approved proctor as described in https://www.extension.harvard.edu/resources-policies/exams-grades-transcripts/exams-online-courses. That document states that the exam must be taken "within a specific 24-hour period." In our case, the 24-hour period starts on the date and time of our in-class exam and ends 24 hours later. That is, the exam must be started within that 24-hour period.

    The exam allows open-book access to *only* the two required textbooks. No notes are allowed. No electronic devices are allowed.

    Prerequisites:

    Knowledge of data structures and programming experience CSCI E-22 (formerly CSCI E-119) (Data Structures) with an advanced algorithms course preferred, but not required CSCI E-124 (Data Structures and Algorithms) or equivalents.

    Brief abstract:

    A study of the theory and practice required for the design and implementation of interpreters and compilers for programming languages. Coursework will range from the abstract, such as categorization of grammars and languages, to the concrete, such as specific algorithms used in compilers and practical performance issues. Topics include lexical analysis, parsing, symbol table generation, type checking, error detection, code generation, optimization, and run-time support. Techniques for top-down and bottom-up parsing both with and without the use of automated tools will be studied. Local and global optimization will be covered.

    4 credits. Noncredit, undergraduate, and graduate credit.

    Overview:

    Computer Science E-95 is a comprehensive introduction to the theory and practice of compiler design and implementation. Students are already expected to be comfortable with designing, coding, and debugging programs of modest complexity while employing good programming style and structured techniques. In particular, familiarity with terminal and text file I/O, iterative and conditional control structures, parameter passing and recursion, data structures, classes and object-oriented design in Java or C++ is presumed.

    A significant portion of the class will be the design and implementation of a major term project. The project will be developed by students working alone. That project is the creation of a compiler for a simple C-like programming language that produces code for the MIPS instruction set. The project will include the lexer, parser, symbol table manager, simple optimizer, and code generator. Initially, both the classroom lectures and the section meetings will be covering material important to the design and implementation of the final project. Later in the semester, advanced compiler techniques will be covered in class; however, both the class and sections will continue to support students as term projects progress. For the term project, students will continue working on and debugging their projects leading to their complete implementation and a final demonstration.

    Because the course requires a significant term project involving significant programming, the assignments will be time-consuming; therefore, a significant time commitment to the course is necessary. Although the relevant experience of students in the class is usually quite diverse, depending on background, it is not unusual for students to spend 15-20 hours per week or more completing the readings and homework assignments. Although the computers are available more-or-less around the clock, occasionally they will suddenly become unavailable (this is known as a crash). As with all such events, they always seem to occur at the worst possible time. Plan your computer work so that it is complete in advance of the deadlines. You have now been forewarned!

    Books/Course Bibliography:

    All course books are available from the Harvard Coop.

    Textbook:

    Compilers: Principles, Techniques, and Tools, 2/e; Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman; Addison-Wesley, 2007; ISBN-10 0-321-48681-1; ISBN-13 978-0-321-48681-3; Errata for the Second Edition

    C Language Reference Manual:

    C: A Reference Manual, Fifth Edition; Samuel P. Harbison and Guy L. Steele, Jr.; Prentice Hall, 2002; ISBN-10 0-13-089592-X; ISBN-13 978-0-13-089592-9

    Optional MIPS Architecture Book:

    MIPS RISC Architecture, 2/e; Gerry Kane and Joseph Heinrich; Prentice Hall, 1992; ISBN-10 0-13-590472-2; ISBN-13 978-0-13-590472-5

    There will also be other handouts & supplementary readings

    MIPS Assembly Language Programming:

    Further information is available about SPIM and SPIM on SourceForge, the MIPS assembly language simulator that is used in class and needed for problem sets and for the term project.

    Sample MIPS assembly code is available:
    printint.s
    printstring.s
    readstring.s
    count.s
    count2.s
    count3.s
    squares.s
    storedints.s

    Instructor:

    Dr. James L. Frankel Dr. Frankel's Photo,

    Teaching Assistants:

    Each student is assigned to a Teaching Assistant (TA). Each TA holds weekly sections and office hours as described below. Attendance at your TA's section is strongly recommended. Although your TA should be your primary point of contact for questions and issues, you are welcome to attend another TA's section and/or office hours in addition.

    TA Section Meeting Time/Place Office Hours Time/Place Phone
    Daniel Willenson
    Daniel's Photo,
    Section Site
    Tuesday,
    6:35-7:35 PM ET,
    1 Story Street, Room 303
    Monday,
    7:30-8:30 PM ET by appt. only,
    Science Center, Room 101e
    E-mail: Daniel's e-mail address; +1.571.265.2932 (9:00 AM - 9:00 PM ET). If there's no answer, please leave a message with your name and a call-back number.
    Mark Ford
    Mark's Photo,
    Section Site
    Tuesday,
    6:35-7:35 PM ET,
    1 Story Street, Room 303
    Thursday,
    6:30-7:30 PM ET by appt. only,
    Science Center, Room 101e
    E-mail: Mark's e-mail address; +1.978.496.7213 (1:00 PM - 9:00 PM ET). If there's no answer, please leave a message with your name and a call-back number.

    Grading:

    Graduate-credit students:

    All problem sets and programming assignments are due at midnight on Sunday night (i.e., midnight between Sunday and Monday) Eastern Time (unless otherwise stated in the assignment or in the syllabus). Unless otherwise stated, all programming assignment solutions must be written in C or C++ (because the C programming language is the same language that the compiler will accept as input), have run under Cygwin or on nice.fas.harvard.edu, be submitted using "git" on cr.cscie9x.net (or, in dire circumstances, via submit or e-mail only if agreed to by your TA), be well-written (clear coding style, modular structure, appropriately commented and documented in English), and tested (include any programs and/or shell scripts used in testing your solution and a transcript showing your program being tested (using the "script" program) as part of your submission). In addition, each submission must include a makefile to build the assignment. The grade for programming assignments will include all of these attributes. (Of course, the solutions may be written and tested using any system of the student's choosing; however, when the solution is complete, it must be pushed to the git code repository on cr.cscie9x.net. Also, in addition to using code development tools under Cygwin, you may use the "nice" course computers and test your code there! We will be grading the solutions based on their behavior under Cygwin or on the Harvard "nice" computers.) The Science Center computers may be accessed using "ssh" over the Internet. Files may be transferred to these systems using "secure ftp" (SFTP). The SecureCRT and SecureFX programs are available from the Science Center at www.fas.harvard.edu/computing/download; these programs implement "ssh" and "secure ftp," respectively. Remember, in addition to handing in all parts of the problem set solution or programming assignment program, sample runs of the program which demonstrate that the program works must be attached.

    A late homework will lose 5% of its original grade for each day it is late (e.g. an assignment handed in two and a half days late will receive its original grade multiplied by 0.85). Late assignments may be submitted via "git" and an e-mail message notifying the instructor and your teaching assistant should be sent immediately after the late assignment is submitted. In addition, each student is given five free late days that may be used freely during the semester. However, keep in mind that almost all of the assignments are built on the previous assignments; handing in one assignment late does not extend the due date for subsequent assignments. The scope and difficultly level of the assignments increases during the class; therefore, we recommend against using the five free late days early in the class.

    Separate documentation is available describing how to install and use git on the section web site.

    In the preceding paragraph, the phrase "commented and documented" is used; this paragraph will clarify the necessary comments and documentation that should be provided with all programs. First, there should be a description of the entire application. This should include the user interface (i.e., how a user interacts with the program) and an explanation of what the program does. This documentation may be in a separate file from the program itself. Second, there should be a description at the beginning of each file which outlines the contents of that file. Third, each routine, function, method, etc. must be preceded by a section describing: (1) the name of the routine, (2) the purpose/function of the routine, (3) the parameters to the routine (name, type, meaning), (4) the return value from the routine (type, meaning), and (5) any side-effects (including modifying global variables, performing I/O, modifying heap-based storage, etc.) that the routine may cause. Fourth, declarations of variables should be commented with their purpose. Fifth, blocks of code should be commented to describe the purpose of the code section. Sixth, any complex or difficult to understand code statements or fragments should be commented to clarify their behavior.

    In addition to programming in conventional languages (either C, C++, or Java), students will learn how to write code in MIPS assembly language. This is the low-level language used by MIPS computers. All students are required to use the SPIM, released June, 2013. This software may be downloaded from the xxx, is free, and no license is required. The software runs only on either Windows or Linux.

    When using "git" and cr.cscie9x.net, make sure to follow the information on using "git" and setting up your repository that is available on the section web site.

    Our midterm exam is a three hour long proctored exam. Students who live within the six-state New England area (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont) must come to campus to take the exam (of course, any student may choose to come to campus to take the exam). See Distance Learning Links: Midterm Exam for more information for distance students.

    The exam allows open-book access to *only* the two required textbooks. No notes are allowed. No electronic devices are allowed.

    Plagiarizing:

    All work should be the personal creation of the individual student. Students are free to consult with each other and to study together, but all problem set solutions, programming assignments, exams, and the final project must be the personal contribution of each individual student. More explicitly, whenever a concept is reduced to a detailed algorithm or a program, no collaboration is allowed. If a paper, assignment, exam, program, or final project contains any information, algorithms, program fragments or other intellectual property taken from another source, that source and material must be explicitly identified and credit given. If you have any questions about this policy, it is the student's responsibility to clarify whether their activity is considered plagiarism.

    Course Outline:

    Approximate Schedule:

    August Description
    28 Registration ends.
    30 First class meeting. Introduction, course information & policies, outline, schedule. Present Problem Sets 0 and 1. Overview of Compiler. Review of the C Programming Language.

     

    September Description
    4 at Midnight Problem Set 0 (the course questionnaire) due.
    5 Labor Day
    6 Late registration ($50 late fee) ends; last day for course and credit status changes; last day to drop courses with full-tuition refund.
    6 Second class meeting. Introduction to Compiling. A Simple One-Pass Compiler. Regular Expressions. Presentation about Lex. Go over lexer-standalone.lex combined with lexer.c. Continue with Review of the C Programming Language. For today's class meeting, read Aho/Lam/Sethi/Ullman chapters 1-3.
    11 at Midnight Problem Set 1 due.
    13 Last day to drop courses with half-tuition refund.
    13 Third class meeting. Complete the Review of the C Programming Language. Present Problem Set 2. Syntax Analysis. Context-Free grammars. Ambiguity in grammars. Elimination of Left Recursion. Left Factoring. Presentation about Yacc. Go over lexer.lex combined with parser.y. Go over the grammar for our subset C Language. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 4.
    20 Fourth class meeting. Lexical Analysis theory. Transition Diagrams. Present NFA and DFA. How to create an NFA from a regex. How to convert an NFA into a DFA. Complete Syntax Analysis. Top-Down Parsing. Show recursiveDescentParser.c. Present Problem Set 3. Symbol Table Management. Types in the C Programming Language. Representation of Types in a Compiler. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 5.
    25 at Midnight Problem Set 2 due.
    27 Fifth class meeting. Parse Tree, AST, Type Tree. Generation of Symbol Tables (Incl. Scope & Overloading Classes). FIRST and FOLLOW Functions. LL(1) Grammars. How to construct a Predictive Parsing Table, M. Table-driven Predictive Parsing. Type Checking: Integral and Floating-Point Number Representations. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.

     

    October Description
    4 Sixth class meeting. Type Checking. Complete IEEE 754 Floating-Point Number Representation. C Standard Conversions. Bottom-Up Parsing. Shift-Reduce Parsing. MIPS Architecture and Instruction Set. Overview of CPU, Registers, Memory. Instruction Formats: I-Type (Immediate), J-Type (Jump), and R-Type (Register). Instruction Set Presentation: Arithmetic R-Type, Arithmetic Immediate, Load/Store, Jump and Branch. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.
    9 at Midnight Problem Set 3 due.
    10 Columbus Day
    11 Seventh class meeting. Syntax-Directed Translation. Run-Time Environments. Storage organization (code/text, static storage, heap, stack). Stack frames/activation records. Nested procedure definitions. Heap management. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 7.
    18 Midterm exam. Eighth class meeting.
    25 Ninth class meeting. Intermediate Code Generation. Three-address (quadruples) IR notation. Examples of IR code generation from all parse trees (expressions, accessing user variables for load and store, branching, conditional branching, function calling/return sequence, subscripting, pointer creation and dereferencing, casting). Dealing with types (including size and signedness) in IR. Single assignment form. Correctly handling lvalues and rvalues in the creation of IR nodes. Lvalues as a result of pointer dereferencing. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 6.
    30 at Midnight Problem Set 4 due.

     

    November Description
    1 Tenth class meeting. Code Generation. MIPS Architecture and Instruction Set. Overview of CPU, Registers, Memory. Instruction Formats: I-Type (Immediate), J-Type (Jump), and R-Type (Register). Instruction Set Presentation: Arithmetic R-Type, Arithmetic Immediate, Load/Store, Jump and Branch. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 8.
    8 Election Day

    Eleventh class meeting. Code Generation continued. MIPS Architecture and Instruction Set. Instruction Set Presentation: Jump and Branch (continued), Shift, Multiply/Divide. Review sample programs on the class web site. Present algorithm using graph coloring for register allocation. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 8.
    13 at Midnight Problem Set 5 due.
    15 Twelfth class meeting. Code Optimization. Basic Blocks. Use/Def Analysis. Liveness/Next Use Analysis. Reaching Definitions. Register Allocation & Spilling using graph coloring. Optimizations: Common subexpression elimination, copy propagation, dead code elimination, constant folding, code motion, reduction in strength, induction variables and reduction in strength, identities, inlining of functions, loop reordering, loop unrolling, array alignment/padding/layout, jump threading, instruction scheduling, tail recursion elimination. Order of optimizations. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 9.
    22 Thirteenth class meeting. Code Optimization continued. Complete discussion of tail recursion. Types of dependencies: flow/data/true, anti-, output, and control. Loop carried dependencies. Instruction-Level Parallelism. Data flow computing model. VLIW "Trace" scheduling. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 9.
    23-27 Thanksgiving Break
    25 Last day to withdraw with no tuition refund, course remains on record with WD/WN (withdrawal) grade.
    29 Fourteenth class meeting. Assertions vs. Assumptions. Optimizing for Parallelism and Data Locality. Massively-parallel computing model. Additional topics. Today we will discuss open issues for the Term Projects. For today's class meeting, read Aho/Lam/Sethi/Ullman chapter 10-12.

     

    December Description
    4 at Midnight Problem Set 6 due.
    6 Fifteenth class meeting. Additional topics. Today we will discuss open issues for the Term Projects.
    13-19 Exam period.
    13 Final Class Meeting during usual section and class time. Student project presentations/demonstrations.
    16 by 2 PM ET Term Project report, slides, code, makefiles, test programs, etc. are due.
    20-January 2, 2017 Winter Break.

     

    Course Documents On-Line:

    Slides used in class are available on-line:

    The course questionnaire is available on-line.

    The class problem sets are also available:

    A Piazza Wiki/Forum (on-line discussion list) for CSCI E-95 is set up at Harvard Extension School CSCI E-95 Piazza Forum.

    Look here for information about the GNU Project and the Free Software Foundation.

    Look here for information about getting GNU Emacs for Windows 95/98/ME/NT/XP and 2000.

    Look here for information about getting the Cygwin Linux-like environment for Windows.

    Look here for Unix distributions.

    There are computers available for Extension student use in the Science Center and at 53a Church Street.

    Harvard University Information Technology:

    Programs used in class as examples are also available:

    The grammar for the C Programming Language is available here.
    The HTML version of the grammar for the C Programming Language is available here.

    The C* Slides are available here in PDF format: The C* Language.

    Section specific home page is available: