Modernizing Legacy Applications in PHP - Paul M. Jones - E-Book

Description

Get your code under control in a series of small, specific steps

About This Book

  • Learn to extract and replace legacy artifacts,
  • Improve your application from the ground up while keeping your codebase fully operational,
  • Improve the quality of your legacy applications.

Who This Book Is For

PHP developers of all skill levels will be able to get value from this book and transform their spaghetti-code applications into clean, modular applications. If you are in the midst of a legacy refactor, or you find yourself in a state of despair caused by the code you have inherited, this is the book for you. All you need is to have PHP 5.0 installed, and you're all set to change the way you maintain and deploy your code!

What You Will Learn

  • Replace global and new with dependency injection
  • Extract SQL statements to gateways
  • Convert action logic to controllers
  • Remove repeated logic in page scripts
  • Create maintainable PHP code from crufty legacy PHP

In Detail

Have you noticed that your legacy PHP application is composed of page scripts placed directly in the document root of the web server? Or, do your page scripts, along with any other classes and functions, combine the concerns of model, view, and controller into the same scope? Is the majority of the logical flow incorporated as include files and global functions rather than class methods? Working with such a legacy application feels like dragging your feet through mud, doesn't it?

This book will show you how to modernize your application in terms of practice and technique, rather than in terms of using tools like frameworks and libraries, by extracting and replacing its legacy artifacts. We will use a step-by-step approach, moving slowly and methodically, to improve your application from the ground up. We'll show you how dependency injection can replace both the new and global dependencies. We'll also show you how to change the presentation logic to view files and the action logic to a controller. Moreover, we'll keep your application running the whole time. Each completed step in the process will keep your codebase fully operational with higher quality. When we are done, you will be able to breeze through your code like the wind. Your code will be autoloaded, dependency-injected, unit-tested, layer-separated, and front-controlled. Most of the very limited code we will add to your application is specific to this book. We will be improving ourselves as programmers, as well as improving the quality of our legacy application.

Style and approach

This book gives developers an easy-to-follow, practical and powerful process to bring their applications up to a modern baseline. Each step in the book is practical, self-contained and moves you closer to the end goal you seek: maintainable code. As you follow the exercises in the book, the author almost anticipates your questions and you will have the answers, ready to be implemented on your project.

Page count: 316

Year of publication: 2016




Table of Contents

Modernizing Legacy Applications in PHP
Credits
Foreword
About the Author
Acknowledgement
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
1. Legacy Applications
The typical PHP application
File Structure
Page Scripts
Rewrite or Refactor?
The Pros and Cons of Rewriting
Why Don't Rewrites Work?
The Context-switching problem
The Knowledge problem
The Schedule Problem
Iterative Refactoring
Legacy Frameworks
Framework-based Legacy Applications
Refactoring to a Framework
Review and next steps
2. Prerequisites
Revision control
PHP version
Editor/IDE
Style Guide
Test suite
Review and next steps
3. Implement an Autoloader
PSR-0
A Single Location for Classes
Add Autoloader Code
As a Global Function
As a Closure
As a Static or Instance method
Using The __autoload() Function
Autoloader Priority
Common Questions
What If I Already Have An Autoloader?
What are the Performance Implications Of Autoloading?
How Do Class Names Map To File Names?
Review and next steps
4. Consolidate Classes and Functions
Consolidate Class Files
Find a candidate include
Move the class file
Remove the related include calls
Spot check the codebase
Commit, Push, Notify QA
Do ... While
Consolidate functions into class files
Find a candidate include
Convert the function file to a class file
Change function calls to static method calls
Spot check the static method calls
Move the class file
Do ... While
Common Questions
Should we remove the autoloader include call?
How should we pick files for candidate include calls?
What if an include defines more than one class?
What if the one-class-per-file rule is disagreeable?
What if a Class or Function is defined inline?
What if a definition file also executes logic?
What if two classes have the same name?
What about third-party libraries?
What about system-wide libraries?
For functions, can we use instance methods instead of static methods?
Can we automate this process?
Review and next steps
5. Replace global With Dependency Injection
Global Dependencies
The replacement process
Find a global variable
Convert global variables to properties
Spot check the class
Convert global properties to constructor parameters
Convert instantiations to use parameters
Spot check, Commit, Push, Notify QA
Do ... While
Common Questions
What if we find a global in a static method?
Is there an alternative conversion process?
What about class names in variables?
What about superglobals?
What about $GLOBALS?
Review and next steps
6. Replace new with Dependency Injection
Embedded instantiation
The replacement process
Find a new keyword
Extract One-Time creation to dependency injection
Extract repeated creation to factory
Change instantiation calls
Spot Check, Commit, Push, Notify QA
Do ... While
Common Questions
What About Exceptions and SPL Classes?
What about Intermediary Dependencies?
Isn't this a lot of code?
Should a factory create collections?
Can we automate all these Injections?
Review and next steps
7. Write Tests
Fighting test resistance
The way of Testivus
Setting up a test suite
Install PHPUnit
Create a tests/ directory
Pick a class to test
Write a test case
Do ... While
Common Questions
Can we skip this step and do it later?
Come On, Really, Can We Do This Later?
What about hard-to-test classes?
What about our earlier characterization tests?
Should we test private and protected methods?
Can we change a test after we write it?
Do we need to test Third-party libraries?
What about code coverage?
Review and next steps
8. Extract SQL statements to Gateways
Embedded SQL Statements
The extraction process
Search for SQL statements
Move SQL to a Gateway class
Namespace and Class names
Method names
An initial Gateway class method
Defeating SQL Injection
Write a test
Replace the original code
Test, Commit, Push, Notify QA
Do ... While
Common Questions
What about INSERT, UPDATE, and DELETE Statements?
What about Repetitive SQL strings?
What about complex query strings?
What about queries inside non-Gateway classes?
Can we extend from a base Gateway class?
What about multiple queries and complex result structures?
What if there is no Database Class?
Review and next steps
9. Extract Domain Logic to Transactions
Embedded Domain Logic
Domain logic patterns
The Extraction Process
Search for uses of Gateway
Discover and Extract Relevant Domain Logic
Example Extraction
Spot check the remaining original code
Write tests for the extracted transactions
Spot check again, Commit, Push, Notify QA
Do ... While
Common Questions
Are we talking about SQL transactions?
What about repeated Domain Logic?
Are printing and echoing part of Domain Logic?
Can a transaction be a class instead of a Method?
What about Domain Logic in Gateway classes?
What about Domain logic embedded in Non-Domain classes?
Review and next steps
10. Extract Presentation Logic to View Files
Embedded presentation logic
The Extraction process
Search for Embedded presentation logic
Rearrange the Page script and Spot Check
Extract Presentation to View file and Spot Check
Create a views/ Directory
Pick a View File name
Move Presentation Block to View file
Add Proper Escaping
Write View File Tests
The tests/views/ directory
Writing a View File Test
Asserting Correctness Of Content
Commit, Push, Notify QA
Do ... While
Common Questions
What about Headers and Cookies?
What if we already have a Template system?
What about Streaming Content?
What if we have lots of Presentation variables?
What about class methods that generate output?
What about Business Logic Mixed into the presentation?
What if a page contains only presentation logic?
Review and next steps
11. Extract Action Logic to Controllers
Embedded action logic
The Extraction Process
Search for Embedded Action Logic
Rearrange the Page Script and Spot Check
Identify Code Blocks
Move Code to Its Related Block
Spot Check the Rearranged Code
Extract a Controller Class
Pick a Class Name
Create a Skeleton Class File
Move the Action Logic and Spot Check
Convert Controller to Dependency Injection and Spot Check
Write a Controller Test
Commit, Push, Notify QA
Do ... While
Common Questions
Can we pass parameters to the Controller method?
Can a Controller have Multiple actions?
What If the Controller contains include Calls?
Review and next steps
12. Replace Includes in Classes
Embedded include Calls
The Replacement process
Search for include Calls
Replacing a Single include Call
Replacing Multiple include Calls
Copy include file to Class Method
Replace the original include Call
Discover coupled variables through testing
Replace other include Calls and Test
Delete the include file and test
Write a test and refactor
Convert to Dependency Injection and test
Commit, Push, Notify QA
Do ... While
Common Questions
Can one class receive logic from many include files?
What about include calls originating in non-class files?
Review and next steps
13. Separate Public and Non-Public Resources
Intermingled resources
The separation process
Coordinate with operations personnel
Create a document root directory
Reconfigure the server
Move public resources
Commit, push, coordinate
Common Questions
Is This Really Necessary?
Review and next steps
14. Decouple URL Paths from File Paths
Coupled Paths
The Decoupling Process
Coordinate with Operations
Add a Front Controller
Create a pages/ Directory
Reconfigure the Server
Spot check
Move Page scripts
Commit, Push, Coordinate
Common Questions
Did we really Decouple the Paths?
Review and next steps
15. Remove Repeated Logic in Page Scripts
Repeated logic
The Removal Process
Modify the Front controller
Remove Logic from Page Scripts
Spot Check, Commit, Push, Notify QA
Common Questions
What if the Setup Work Is Inconsistent?
What if we used inconsistent naming?
Review and next steps
16. Add a Dependency Injection Container
What is a Dependency Injection Container?
Adding a DI Container
Add a DI Container Include File
Add a Router Service
Modify the Front Controller
Extract Page Scripts to Services
Create a Container Service
Route the URL Path to the Container Service
Spot Check and Commit
Do ... While
Remove pages/, Commit, Push, Notify QA
Common Questions
How can we refine our service definitions?
What if there are includes In the Page Script?
Can we reduce the size of the services.php file?
Can we reduce the size of the router service?
What if we cannot update to PHP 5.3?
Review and next steps
17. Conclusion
Opportunities for improvement
Conversion to Framework
Review and next steps
A. Typical Legacy Page Script
B. Code before Gateways
C. Code after Gateways
D. Code after Transaction Scripts
E. Code before Collecting Presentation Logic
F. Code after Collecting Presentation Logic
G. Code after Response View File
H. Code after Controller Rearrangement
I. Code after Controller Extraction
J. Code after Controller Dependency Injection
Index

Modernizing Legacy Applications in PHP


Copyright © 2016 Paul M. Jones

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2016

Production reference: 1260816

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78712-470-7

www.packtpub.com

Credits

Author

Paul M. Jones

Acquisition Editor

Frank Pohlmann

Technical Editor

Danish Shaikh

Indexer

Mariammal Chettiyar

Graphics

Disha Haria

Production Coordinator

Arvindkumar Gupta

Cover Work

Arvindkumar Gupta

Foreword

In early 2012, while attending a popular PHP conference in Chicago, I approached a good friend, Paul Jones, with questions about PSR-0 and autoloading. We immediately broke out my laptop to view an attempt at applying the convention and Paul really helped me put the pieces together in short order. His willingness to jump right in and help others always inspires me, and has gained my respect.

So when, in August of 2012, I heard of a video containing a talk given by Paul at the Nashville PHP User Group, I was drawn in. The talk, "It Was Like That When I Got Here: Steps Toward Modernizing A Legacy Codebase", sounded interesting because it highlighted something I am passionate about: refactoring.

After watching I was electrified! I often speak about refactoring and receive inquiries on how to apply it for legacy code rather than performing a rewrite. Put another way, how is refactoring possible in a codebase where includes and requires are the norm, namespaces don't exist, globals are used heavily, and object instantiation runs rampant with no dependency injection? And what if the codebase is procedural?

Paul's focus on modernizing a legacy application filled the gap by getting legacy code to a point where standard refactoring is possible. His step-by-step approach makes it easier for developers to get the bear dancing so that continued improvement of the code through refactoring can happen.

I felt the topic was a must-see for PHP developers and quickly fired off an email asking if he'd be interested in flying to Miami and giving the same talk for the South Florida PHP User Group. Within minutes my email was answered, and Paul even offered to drive down from Nashville for the talk. However, since I had started organizing the annual SunshinePHP Developer Conference, to be held in February in Miami, we decided to have Paul speak at the conference rather than come down earlier.

Fast forward two years, and here we are in mid-2014. Developing with PHP has really matured in recent years, but it's no secret that PHP's low barrier to entry for beginners helped create some nasty codebases. Companies that built applications in the dark times simply can't afford to put things on hold and rebuild a legacy application, especially with today's fast-paced economy and higher developer salaries. To stay competitive, companies must continually push developers for new features and increased application stability. This creates a hostile environment for developers working with a poorly written legacy application. Modernizing a legacy application is a necessity, and must happen. Yet knowing how to create clean code and comprehending how to modernize a legacy application are two entirely different things.

Paul and I have been speaking to packed rooms at conferences around the world about modernizing and refactoring. Developers are hungry for knowledge on how to improve the quality of their code and perfect their craft. Unfortunately, we can only reach a small portion of PHP developers using these methods. The time has come for us to create books in hopes of reaching more PHP developers to improve the situation.

I see more and more developers bring refactoring into their development workflow to leverage the methods outlined in my talks and forthcoming book Refactoring 101. But understanding how to use these refactoring processes on a legacy codebase is not straightforward, and sometimes impossible. The book you're about to read bridges the gap, allowing developers to modernize a codebase so refactoring can be applied for continued enhancement. Many thanks to Paul for putting this together. Enjoy!

Adam Culp

(https://leanpub.com/refactoring101)

About the Author

Paul M. Jones is an internationally recognized PHP expert who has worked as everything from junior developer to VP of Engineering in all kinds of organizations (corporate, military, non-profit, educational, medical, and others). He blogs professionally at www.paul-m-jones.com and is a regular speaker at various PHP conferences.

Paul's latest open-source project is Aura for PHP. Previously, he was the architect behind the Solar Framework, and was the creator of the Savant template system. He was a founding contributor to the Zend Framework (the DB, DB_Table, and View components), and has written a series of authoritative benchmarks on dynamic framework performance.

Paul was one of the first elected members of the PEAR project. He is a voting member of the PHP Framework Interoperability Group, where he shepherded the PSR-1 Coding Standard and PSR-2 Coding Style recommendations, and was the primary author on the PSR-4 Autoloader recommendation. He was also a member of the Zend PHP 5.3 Certification education advisory board.

In a previous career, Paul was an operations intelligence specialist for the US Air Force. In his spare time, he enjoys putting .308 holes in targets at 400 yards.

Acknowledgement

Many thanks to all of the conference attendees who heard my "It Was Like That When I Got Here" presentation and who encouraged me to expand it into a full book. Without you, I would not have considered writing this at all.

Thank you to Adam Culp, who provided a thorough review of the work-in-progress, and for his concentration on refactoring approaches. Thanks also to Chris Hartjes, who went over the chapter on unit testing in depth and gave it his blessing. Many thanks to Luis Cordova, who acted as a work-in-progress editor and who corrected my many pronoun issues.

Finally, thanks to everyone who bought a copy of the book before it was complete, and especially to those who provided feedback and insightful questions regarding it. These include Hari KT (a long-time colleague on the Aura project), Ron Emaus, Gareth Evans, Jason Fuller, David Hurley, Stephen Lawrence, Elizabeth Tucker Long, Chris Smith, and others too numerous to name. Your early support helped to assure me that writing the book was worthwhile.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Preface

I have been programming professionally in one capacity or another for over 30 years. I continue to find it a challenging and rewarding career. I still learn new lessons about my profession every day, as I think is the case for every programmer dedicated to this craft.

Even more challenging and rewarding is helping other programmers to learn what I have learned. I have worked with PHP for 15 years now, in many different kinds of organizations and in every capacity from junior developer to VP of Engineering. In that time, I have learned a lot about the commonalities in legacy PHP applications. This book is distilled from my notes and memories from modernizing those codebases. I hope it can serve as a path for other programmers to follow, leading them out of a morass of bad code and bad work situations, and into a better life for themselves.

This book also serves as penance for all of the legacy code I have left behind for others to deal with. All I can say is that I didn't know then what I know now. In part, I offer this book as atonement for the coding sins of my past. I hope it can help you to avoid my previous mistakes.

Chapter 1. Legacy Applications

In its simplest definition, a legacy application is any application that you, as a developer, inherit from someone else. It was written before you arrived, and you had little or no decision-making authority in how it was built.

However, there is a lot more weight to the word legacy among developers. It carries with it connotations of poorly organized, difficult to maintain and improve, hard to understand, untested or untestable, and a series of similar negatives. The application works as a product in that it provides revenue, but as a program, it is brittle and sensitive to change.

Because this is a book specifically about PHP-based legacy applications, I am going to offer some PHP-specific characteristics that I have seen in the field. For our purposes, a legacy application in PHP is one that matches two or more of the following descriptions:

  • It uses page scripts placed directly in the document root of the web server.
  • It has special index files in some directories to prevent access to those directories.
  • It has special logic at the top of some files to die() or exit() if a certain value is not set.
  • Its architecture is include-oriented instead of class-oriented or object-oriented.
  • It has relatively few classes.
  • Any class structure that exists is disorganized, disjointed, and otherwise inconsistent.
  • It relies more heavily on functions than on class methods.
  • Its page scripts, classes, and functions combine the concerns of model, view, and controller into the same scope.
  • It shows evidence of one or more incomplete attempts at a rewrite, sometimes as a failed framework integration.
  • It has no automated test suite for the developers to run.

These characteristics are probably familiar to anyone who has had to deal with a very old PHP application. They describe what I call a typical PHP application.
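As an illustration of the die()/exit() guard mentioned in the list above, here is a minimal sketch. The file names, the IN_APP constant, and the $config array are hypothetical and are not taken from any particular application; the two files are inlined into one script so the example runs on its own.

```php
<?php
// Sketch of the "guard" idiom. IN_APP, the file names, and $config
// are assumptions made up for this illustration.

// --- index.php (a page script the client browses to) ---
define('IN_APP', true);   // marker that the include files look for

// --- common/setup.php (an include file, inlined here) ---
if (! defined('IN_APP')) {
    // Reached only when the include file is requested directly by URL,
    // because no page script has defined the marker first.
    exit('Direct access not permitted.');
}
$config = array('site_name' => 'Example');

echo "Setup loaded for {$config['site_name']}\n";
```

The guard exists only because the include file sits inside the document root, where a client can request it directly.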

The typical PHP application

Most PHP developers are not formally trained as programmers, or are almost entirely self-taught. They often come to the language from other, usually non-technical, professions. Somehow or another, they are tasked with creating web pages because they are seen as the most technically savvy people in their organizations. Since PHP is such a forgiving language and grants a lot of power without demanding a lot of discipline, it is very easy to produce working web pages and even applications without much training.

These and other factors strongly influence the underlying foundation of the typical PHP application. Such applications are usually not written in a popular full-stack framework or even a micro-framework. Instead, they are often a series of page scripts, placed directly in the web server document root, to which clients can browse directly. Any functionality that needs to be reused has been collected into a series of include files. There are include files for common configurations and settings, headers and footers, common forms and content, function definitions, navigation, and so on.

This reliance on include files in the typical PHP application is what makes me call them include-oriented architectures. The legacy application uses include calls everywhere to couple the pieces of the program into a single whole. This is in contrast to a class-oriented architecture, where even if the application does not adhere to good object-oriented programming principles, at least the behaviors are bundled into classes.
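The contrast between the two styles can be sketched in a few lines. This is an illustration only: the function name log_message(), the class name Logger, and the include path in the comment are hypothetical, and the "include file" is inlined so that the example runs as a single script.

```php
<?php
// Include-oriented style: in a legacy application this function would
// live in an include file (hypothetically common/functions/log.php),
// and each page script would couple itself to it with a line such as:
//     include 'common/functions/log.php';
function log_message($message)
{
    return '[log] ' . $message;
}

// Class-oriented style: the same behavior bundled into a class, which
// can later be located by an autoloader instead of an include call.
class Logger
{
    public function message($message)
    {
        return '[log] ' . $message;
    }
}

echo log_message('page1 loaded'), "\n";

$logger = new Logger();
echo $logger->message('page1 loaded'), "\n";
```

Both calls produce the same string. The difference lies in how the page script gets access to the behavior: through an include call in the first style, through a class in the second.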

File Structure

The typical include-oriented PHP application generally looks something like this:

    /path/to/docroot/
        bin/                 # command-line tools
        cache/               # cache files
        common/              # commonly-used include files
            classes/         # custom classes
                Image.php
                Template.php
            functions/       # custom functions
                db.php
                log.php
                cache.php
            setup.php        # configuration and setup
        css/                 # stylesheets
        img/                 # images
        index.php            # home page script
        js/                  # JavaScript
        lib/                 # third-party libraries
        log/                 # log files
        page1.php            # other page scripts
        page2.php
        page3.php
        sql/                 # schema migrations
        sub/                 # sub-page scripts
            index.php
            subpage1.php
            subpage2.php
        theme/               # site theme files
            header.php       # a header template
            footer.php       # a footer template
            nav.php          # a navigation template

The structure shown is a simplified example. There are many possible variations. In some legacy applications, I have seen literally hundreds of main-level page scripts and dozens of subdirectories with their own unique hierarchies for additional pages. The key is that the legacy application is usually in the document root, has page scripts that users browse to directly, and uses include files to manage most program behavior instead of classes and objects.

Page Scripts

Legacy applications will use individual page scripts as the access point for public behavior. Each page script is responsible for setting up the global environment, performing the requested logic, and then delivering output to the client.

Appendix A, Typical Legacy Page Script, contains a sanitized, anonymized version of a typical legacy page script from a real application. I have taken the liberty of making the indentation consistent (originally, the indents were somewhat random) and wrapping it at 60 characters so it fits better on e-reader screens. Go take a look at it now, but be careful. I won't be held liable if you go blind or experience post-traumatic stress as a result! As we examine it, we find all manner of issues that make maintenance and improvement difficult:

  • include statements to execute setup and presentation logic
  • inline function definitions
  • global variables
  • model, view, and controller logic all combined in a single script
  • trusting user input
  • possible SQL injection vulnerabilities
  • possible cross-site scripting vulnerabilities
  • unquoted array keys generating notices
  • if blocks not wrapped in braces (a line added to the block later will not actually be part of the block)
  • copy-and-paste repetition

The example in Appendix A, Typical Legacy Page Script, is relatively tame as far as legacy page scripts go. I have seen other scripts where JavaScript and CSS code have been mixed in, along with remote-file inclusions and all sorts of security flaws. It is also only (!) about 400 lines long. I have seen page scripts that are thousands of lines long which generate several different page variations, all wrapped into a single switch statement with a dozen or more case conditions.
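Two of the issues listed above are easy to show in isolation: the if block without braces, and a query string built directly from user input. The following is a small sketch, not code from Appendix A; the function names and the users table are hypothetical.

```php
<?php
// Hypothetical examples; none of these names come from the appendix.

// Without braces, only the first statement belongs to the if.
function greetWithoutBraces($loggedIn)
{
    $output = array();
    if ($loggedIn)
        $output[] = 'Welcome back';
        $output[] = 'Here is your account'; // added later; always runs
    return $output;
}

// With braces, both statements are conditional.
function greetWithBraces($loggedIn)
{
    $output = array();
    if ($loggedIn) {
        $output[] = 'Welcome back';
        $output[] = 'Here is your account';
    }
    return $output;
}

// Trusting user input: the value is concatenated straight into the SQL.
function buildUnsafeQuery($id)
{
    return 'SELECT * FROM users WHERE id = ' . $id;
}

echo implode(', ', greetWithoutBraces(false)), "\n"; // account line leaks
echo implode(', ', greetWithBraces(false)), "\n";    // empty line
echo buildUnsafeQuery('1 OR 1=1'), "\n";             // matches every row
```

For a visitor who is not logged in, the version without braces still emits the account line, while the braced version emits nothing. The last call shows how an input of "1 OR 1=1" changes the meaning of the query rather than just supplying a value.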

Rewrite or Refactor?

Many developers, when presented with a typical PHP application, are able to live with it for only so long before they want to scrap it and rewrite it from scratch. "Nuke it from orbit; it's the only way to be sure!" is the rallying cry of these enthusiastic and energetic programmers. Other developers, their enthusiasm drained by their death march experience, feel cautious and wary at such a suggestion. They are fully aware that the codebase is bad, but the devil (or in our case, code) they know is better than the devil they don't.

The Pros and Cons of Rewriting

A complete rewrite is a very tempting idea. Developers championing a rewrite feel like they will be able to do all the right things the first time through. They will be able to write unit tests, enforce best practices, separate concerns according to modern pattern definitions, and use the latest framework or even write their own framework (since they know best what their own needs are). Because the existing application can serve as a reference implementation, they feel confident that there will be little or no trial-and-error work in rewriting the application. The needed behaviors already exist; all the developers need to do is copy them to the new system. The behaviors that are difficult or impossible to implement in the existing system can be added on from the start as part of the rewrite.

As tempting as a rewrite sounds, it is fraught with many dangers. Joel Spolsky had this to say regarding the old Netscape Navigator web browser rewrite in 2000:

 

Netscape made the single worst strategic mistake that any software company can make by deciding to rewrite their code from scratch. Lou Montulli, one of the 5 programming superstars who did the original version of Navigator, emailed me to say, "I agree completely, it's one of the major reasons I resigned from Netscape." This one decision cost Netscape 3 years. That's three years in which the company couldn't add new features, couldn't respond to the competitive threats from Internet Explorer, and had to sit on their hands while Microsoft completely ate their lunch.

  --Joel Spolsky, Netscape Goes Bonkers

Netscape went out of business as a result.

Josh Kerr relates a similar story regarding TextMate:

 

Macromates, an indie company who had a very successful text editor called Textmate, decided to rewrite the code base for Textmate 2. It took them 6 years to get a beta release out the door which is an eternity in today's time and they lost a lot of market share. When they did release a beta, it was too late and 6 months later they folded the project and pushed it on to Github as an open source project.

  --Josh Kerr, TextMate 2 And Why You Shouldn't Rewrite Your Code

Fred Brooks calls the urge to do a complete rewrite the second-system effect. He wrote about this in 1975:

 

The second is the most dangerous system a man ever designs. ... The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. ... The second-system effect has ... a tendency to refine techniques whose very existence has been made obsolete by changes in basic system assumptions. ... How does the project manager avoid the second-system effect? By insisting on a senior architect who has at least two systems under his belt.

  --Fred Brooks, The Mythical Man-Month, pp. 53-58.

Developers were the same forty years ago as they are today. I expect them to be the same over the next forty years as well; human beings remain human beings. Overconfidence, insufficient pessimism, ignorance of history, and the desire to be one's own customer all lead developers easily into rationalizations that "this time will be different" when they attempt a rewrite.

Why Don't Rewrites Work?

There are lots of reasons why a rewrite rarely works, but I will concentrate on only one general reason here: the intersection of resources, knowledge, communication, and productivity. (Be sure to read The Mythical Man-Month (pp. 13-26) for a great description of the problems associated with thinking of resources and scheduling as interchangeable elements.)

As with all things, we have only limited resources to bring to bear against the rewrite project. There are only a certain number of developers in the organization. These are the developers who will have to do both maintenance on the existing program and write the completely new version of the program. Any developers working on the one project will not be able to work on the other.

The Context-Switching Problem

One idea is to have the existing developers spend part of their time on the old application and part of their time on the new one. However, moving a developer between the two projects will not be an even split of productivity. Because of the cognitive load of context-switching, the developer will be less than half as productive on each.

The Knowledge Problem

To avoid the productivity losses from switching developers between maintenance and the rewrite, the organization may try to hire more developers. Some can then be dedicated to the old project and others to the new project. Unfortunately, this approach reveals what F. A. Hayek calls the knowledge problem. Originally applied to the realm of economics, the knowledge problem applies equally well to programming.

If we put the new developers on the rewrite project, they won't know enough about the existing system, the existing problems, the business goals, and perhaps not even the best practices for doing the rewrite to be effective. They will have to be trained on these things, most likely by the existing developers. This means the existing developers, who have been relegated to maintaining the existing program, will have to spend a lot of time communicating knowledge to the new hires. The amount of time involved is non-trivial, and the communication of this knowledge will have to continue until the new developers are as well-versed as the existing developers. This means that the linear increase in resources results in a less-than-linear increase in productivity: a 100% increase in the number of programmers will result in a less than 50% increase in output, sometimes much less (cf. The Miserable Mathematics of the Man-Month – http://paul-m-jones.com/archives/1591).

Alternatively, we could put the existing developers on the rewrite project, and the new hires on maintenance of the existing program. This too reveals a knowledge problem because the new developers are completely unfamiliar with the system. Where will they get the knowledge they need to do their work? From the existing developers, of course, who will still need to spend valuable time communicating their knowledge to the new hires. Once again, we see that the linear increase in developers leads to a less-than-linear increase in productivity.

The Schedule Problem

To deal with the knowledge problem and the related communication costs, some may feel the best way to handle the project would be to dedicate all the existing developers to the rewrite, and delay maintenance and upgrades on the existing system until the rewrite is done. This is a great temptation because the developers will be all too eager to salve their own pains and become their own customers, getting excited about what features they want to have and what fixes they want to make. These desires will lead them to overestimate their own ability to perform a full rewrite and underestimate the amount of time needed to complete it. The managers, for their part, will accept the optimism of the developers, perhaps adding some buffer in the schedule for good measure.

The overconfidence and optimism of the developers will morph into frustration and pain when they realize the task is actually much greater and more overwhelming than they first thought. The rewrite will go on much longer than anticipated, not by a little, but by an order of magnitude or more. For the duration of the rewrite, the existing program will languish - buggy and missing features - disappointing existing customers and failing to attract new ones. The rewrite project will, at the end, become a panicked death march to get it done at all costs, and the result will be a codebase that is just as bad as the first one, only in different ways. It will be merely a copy of the first system, because schedule pressures will have dictated that new features be delayed until after an initial release is achieved.

Iterative Refactoring

Given the risks associated with a complete rewrite, I recommend refactoring instead. Refactoring means that the quality of the program is improved in small steps, without changing the functionality of the program. A single, relatively small change is introduced across the entire system. The system is then tested to make sure it still works properly, and finally, the system is put into production. A second small change builds on the previous one, and so on. Over a period of time, the system becomes markedly easier to maintain and improve.

A refactoring approach is decidedly less appealing than a complete rewrite. It defies the core sensibilities of most developers. The developers have to continue working with the system as it is, warts and all, for long periods of time. They do not get to switch over to the latest, hottest framework. They do not get to become their own customers and indulge their desires to do things right the first time. Being a longer-term strategy, the refactoring approach does not appeal to a culture that values rapid development of new applications over patching existing ones. Developers usually prefer to start their own new projects, not maintain older projects developed by others.

However, as a risk-reducing strategy, using an iterative refactoring approach is undeniably superior to a rewrite. The individual refactorings themselves are small compared to any similar portion of a rewrite project. They can be applied in much shorter periods of time than a comparable feature would be in a rewrite, and they leave the existing codebase in a working state at the end of each iteration. At no point does the existing application stop operating or progressing. The iterative refactorings can be integrated into a larger process with scheduling that allows for cycles of bug fixes, feature additions, and refactorings to improve the next cycle.

Finally, the goal of any single refactoring step is not perfection. The goal in each step is merely improvement. We are not trying to realize an impossible goal over a long period of time. We are taking small steps toward easily-visualized goals that can be accomplished in short timeframes. Each small refactoring win will both improve morale and drive enthusiasm for the next refactoring step. Over time, these many small wins accumulate into a single big win: a fully-modernized codebase that has never stopped generating revenue for the business.

Legacy Frameworks

Until now, we have been discussing legacy applications as page-based, include-oriented systems. However, there is also a large base of legacy code out there using public frameworks.

Framework-based Legacy Applications

Each different public framework in PHP land is its own unique hell. Applications written in CakePHP (http://cakephp.org/) suffer from different legacy issues than those written in CodeIgniter, Solar, Symfony 1, Zend Framework 1, and so on. Each of these frameworks, along with its varying work-alikes, encourages different kinds of tight coupling in applications. Thus, the specific steps needed to refactor applications built using one of these frameworks are very different from the steps needed for a different framework.

As such, various parts of this book may be useful as a guide to refactoring different parts of a legacy application based on a public framework, but as a whole, the book is not targeted at refactoring applications based on these public frameworks.

In-house, private, or otherwise non-public frameworks that are under the direct control of their own architects within the organization are likely to benefit from the refactorings included in this book.

Refactoring to a Framework

I sometimes hear about how developers wisely wish to avoid a complete rewrite and instead want to refactor or migrate to a public framework. This sounds like the best of both worlds, combining an iterative approach with the developers' desire to use the hottest new technology.

My experience with legacy PHP applications has been that they are almost as resistant to framework integration as they are to unit testing. If the application was already in a state where its logic could be ported to a framework, there would be little need to port it in the first place.

However, by the time we have completed the refactorings in this book, the application is very likely to be in a state that will be much more amenable to a public framework migration. Whether the developers will still want to do so is another matter.

Review and next steps

At this point, we have realized that a rewrite, while appealing, is a dangerous approach. An iterative refactoring approach sounds a lot more like actual work, but has the benefit of being achievable and realistic.

The next step is to prepare ourselves for the refactoring approach by getting some prerequisites out of the way. After that, we will proceed toward modernizing our legacy application in a series of relatively small steps, one step per chapter, with each step broken down into an easy-to-follow process with answers to common questions.

Let's get started!

Chapter 2. Prerequisites

Before we begin modernizing our application, we need to make sure we have the necessary prerequisites in place to do the work of refactoring. These are as follows:

  • A revision control system
  • A PHP version of 5.0 or higher
  • An editor or IDE with multi-file search-and-replace
  • A style guide of some sort
  • A test suite

Revision control

Revision control (also known as source control or version control) allows us to keep track of the changes we make to our codebase. We can make a change, then commit it to source control, make more changes and commit them, and push our changes to other developers on the team. If we discover an error, we can revert to an earlier version of the codebase to a point where the error does not exist and start over.
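
As a minimal sketch of that commit-and-revert cycle, assume Git is the revision control system in use. The choice of tool, the throwaway repository, the file name index.php, and the commit messages are all illustrative assumptions and not requirements of the process:

```shell
#!/bin/sh
# Hypothetical walkthrough in a throwaway repository. Git is an assumption;
# any revision control system with commit and revert operations would do.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity for the demo
git config user.name "Example Dev"

# First change: commit a working page script.
printf '%s\n' '<?php echo "Hello";' > index.php
git add index.php
git commit -q -m "Add working page script"

# Second change: introduce an error (missing semicolon) and commit it.
printf '%s\n' '<?php echo "Hello"' > index.php
git commit -q -a -m "Break the page script"

# We discover the error: undo the bad commit with a new commit that
# restores the earlier, working version of the file.
git revert --no-edit HEAD > /dev/null

# Show the restored file and the recorded history.
cat index.php
git log --oneline
```

Because the revert is itself recorded as a commit, the history keeps both the mistake and its correction, which is useful when we later need to see why a change was backed out. Pushing to other developers is left out here, since it needs a shared remote repository.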