PHP 5 CMS Framework Development - Martin Brampton - E-Book

Martin Brampton

Description

If you want an insight into the critical design issues and programming techniques required for a web oriented framework in PHP5, this book will be invaluable. Whether you want to build your own CMS style framework, want to understand how such frameworks are created, or simply want to review advanced PHP5 software development techniques, this book is for you.

As a former development team leader on the renowned Mambo open-source content management system, author Martin Brampton offers unique insight and practical guidance into the problem of building an architecture for a web oriented framework or content management system, using the latest versions of popular web scripting language PHP.

The scene-setting first chapter describes the evolution of PHP frameworks designed to support web sites by acting as content management systems. It reviews the critical and desirable features of such systems, followed by an overview of the technology and a review of the technical environment.

Following chapters look at particular topics, with:

• A concise statement of the problem
• Discussion of the important design issues and problems faced
• Creation of the framework solution

At every point, there is an emphasis on effectiveness, efficiency and security – all vital attributes for sound web systems. By and large these are achieved through thoughtful design and careful implementation.

Early chapters look at the best ways to handle some fundamental issues such as the automatic loading of code modules and interfaces to database systems. Digging deeper into the problems that are driven by web requirements, following chapters go deeply into session handling, caches, and access control.

New for this edition is a chapter discussing the transformation of URLs to turn ugly query strings into readable strings that are believed to be more “search engine friendly” and are certainly more user friendly. This topic is then extended into a review of ways to handle “friendly” URLs without going through query strings, and how to build RESTful interfaces.

The final chapter discusses the key issues that affect a wide range of specific content handlers and explores a practical example in detail.

Format: EPUB

Pages: 710

Publication year: 2010

Table of Contents

PHP 5 CMS Framework Development
Credits
About the Author
Acknowledgement
About the Reviewers
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. CMS Architecture
The idea of a CMS
Critical CMS features
Desirable CMS features
System management
Technology for CMS building
Leveraging PHP5
Some PHP policies
Globalness in PHP
Classes and objects
Objects, patterns, and refactoring
The object-relational compromise
Basics of combining PHP and XHTML
Model, view, and controller
The CMS environment
Hosting the CMS
Basic browser matters
Security of a CMS
Some CMS terminology
Summary
2. Organizing Code
The problem
Discussion and considerations
Security
Methods of code inclusion
Practicality in coding
Exploring PHP and object design
Autoloading
Namespaces and class visibility
Singletons
Objections to use of singletons
Framework solution
Autoloading
The smart class mapper
Finding a path to the class
Populating the dynamic class map
Saving map elements
Obtaining class information
Summary
3. Database and Data Objects
The problem
Discussion and considerations
Database dependency
The role of the database
Level of database abstraction
Ease of development
Keeping up with change
Database security
Pragmatic error handling
Exploring PHP—indirect references
Framework solution
Class structure
Connecting to a database
Handling databases easily
Prefixing table names in SQL
Making the database work
Getting hold of data
Higher level data access
Assisted update and insert
What happened?
Database extended services
Getting data about data
Easier data about data
Aiding maintenance
Data objects
Rudimentary data object methods
Data object input and output
Setting data in data objects
Sequencing database rows
Database maintenance utility
Summary
4. Administrators, Users, and Guests
The problem
Discussion and considerations
Who needs users?
Secure authentication
Secure storage of passwords
Blocking SQL injection
Login
Managing user data
User self service
Customizing for users
Extended user information
Exploring PHP—arrays and SQL
Framework solution
The user database table
Indexes on users
Keeping user tables in step
Achieving login
Administering users
Generating passwords
Summary
5. Sessions and Users
The problem
Discussion and considerations
Why sessions?
How sessions work
Avoiding session vulnerabilities
Search engine bots
Session data and scalability
Exploring PHP—frameworks of classes
Framework solution
Building a session handler
Creating a session
Finding the IP address
Validating a session
Remembering users
Completing session handling
Session data
Session data and bots
Retrieving session data
Keeping session data tidy
Summary
6. Caches and Handlers
The problem
Discussion and considerations
Why build information handlers?
The singleton cache
The disk cache
Scalability and database cache
The XHTML cache
Other caches
Exploring PHP—static elements and helpers
Framework solution
Abstract cache class
Singleton object cache manager
Creating the base class cached singleton
Generalized cache
Summary
7. Access Control
The problem
Discussion and considerations
Adding hierarchy
Adding constraints
Avoiding unnecessary restrictions
Some special roles
Implementation efficiency
Where are the real difficulties?
Exploring SQL—MySQL and PHP
Framework solution
Database for RBAC
Administering RBAC
The general RBAC cache
Asking RBAC questions
Summary
8. Handling Extensions
The problem
Discussion and considerations
An extension ecosystem
Templates in the ecosystem
Modules in the ecosystem
Components in the ecosystem
Component templates
Modules everywhere
More on extensions
Templates
Modules
Components
Component for the administrator
Component for the user
Component standard structure
Plugins
Extension parameters
Exploring PHP—XML handling
Framework solution
Packaging extensions
Module interface and structure
The logic of module activation
Component interface and structure
A standardized component structure
Plugin interface and structure
Invoking plugins
Applications
Installing and managing extensions
Structuring installer tasks
Putting extension files in place
Extensions and the database
Knowing about extension classes
Summary
9. Menus
The problem
Discussion and considerations
Page management by URI
Menu database requirements
Menu management
Menu construction
Menu presentation
Exploring PHP—array functions
Framework solution
Building the menu handler
Interfacing to components
The menu creator
An example of a menu module
Summary
10. Languages
The problem
Discussion and considerations
Character sets
UTF-8 and XHTML
Specifying languages
Handling multiple languages in code
Languages in CMS extensions
Handling languages in data
Exploring PHP—character sets
Framework solution
The gettext implementation
File formats for gettext
Functions for gettext
The PHPgettext classes
The language class
Administrator language application
Language details
Translation
Handling extensions
Managing extension translations
Installing translations with CMS extensions
Handling multilingual data
Summary
11. Presentation Services
The problem
Discussion and considerations
Differing points of view
Model View Controller
XHTML, CSS, and themes
PHP for XHTML creation
GUI widgets and XHTML
Page control and navigation
WYSIWYG editors
XHTML cleaning
The administrator interface
Exploring PHP—clarity and succinctness
Framework solution
Using "heredoc" to define XHTML
Using templating engines
Some widgets
Building page control
Supporting editors
Cleaning up XHTML
Administrator database management
Customization through subclassing
Summary
12. Other Services
The problem
Discussion and considerations
Parsing XML
Configuration handling
WYSIWYG editing
File and directory handling
Sending mail
Parameter objects
Administrator ready-made functionality
Exploring PHP—file issues in web hosting
Basic file and directory permissions
Hosting and ownership
Living with split ownership
Avoiding split ownership
Framework solution
Reading XML files easily
Storing configuration data
Incorporating a WYSIWYG editor
Dealing with files and directories
Compound parameter objects
Administrator ready-made table handlers
Summary
13. SEF and RESTful Services
The problem
Discussion
Transforming query strings
Direct URI handling and REST
Mechanics of URI handling
Essential HTTP result codes
The importance of metadata
Exploring PHP—PHP and HTTP
Framework solution
Efficient lookup of very long keys
Cache and database transformation
Looking at SEF transformation code
Decoding an incoming URI
Encoding an outgoing URI
Direct URI handling
The future of direct URIs
Summary
14. Error Handling
The problem
Discussion
PHP error handling
Database errors
Application errors
Exploring PHP—error handling
Framework solution
Handling database errors
404 and 403 errors
Summary
15. Real Content
The problem
Discussion and considerations
Articles, blogs, magazines, and FAQ
Comments and reviews
Forums
Galleries, repositories, and streaming
E-commerce and payments
Forms
Calendars
Integrators
RSS readers
Other categories
Exploring technology—accessibility
General good practice
Use of JavaScript
Validation
Framework solution
A simple blog application
The database table for blog
A blog data object
Administering blog items—controller
Administering blog items—viewer
Showing blogs to visitors
Menu building
Summary
A. Packaging Extensions
The XML setup file
Parameters
Parameter types
B. Packaging XML Example
Index

PHP 5 CMS Framework Development

Second Edition

Copyright © 2010 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2007

Second Edition: August 2010

Production Reference: 1120810

Published by Packt Publishing Ltd., 32 Lincoln Road, Olton, Birmingham, B27 6PA, UK.

ISBN 978-1-849511-34-6

www.packtpub.com

Cover Image by Vinayak Chittar

Credits

Author

Martin Brampton

Reviewers

Deepak Vohra

Hari K. T

Martien de Jong

Acquisition Editor

Douglas Paterson

Development Editor

Swapna V. Verlekar

Technical Editor

Smita Solanki

Indexer

Hemangini Bari

Editorial Team Leader

Aanchal Kumar

Project Team Leader

Priya Mukherji

Project Coordinator

Prasad Rai

Proofreader

Aaron Nash

Production Coordinator

Shantanu Zagade

Cover Work

Shantanu Zagade

About the Author

Martin Brampton is now primarily a software developer and writer, but he started out studying mathematics at Cambridge University. He then spent a number of years helping to create the so-called legacy, which remained in use far longer than he ever expected. He worked on a variety of major systems in areas like banking and insurance, spiced with occasional forays into technical areas such as cargo ship hull design and natural gas pipeline telemetry.

After a decade of heading IT for an accountancy firm, a few years as a director of a leading analyst firm, and an MA degree in Modern European Philosophy, Martin finally returned to his interest in software, but this time transformed into web applications. He found PHP5, which fits well with his prejudice in favor of programming languages that are interpreted and strongly object oriented.

Utilizing PHP, Martin took on development of useful extensions for the Mambo (and now also Joomla!) systems, and then became leader of the team developing Mambo itself. More recently, he has written a complete, new generation CMS named Aliro, many aspects of which are described in this book. He has also created a common API to enable add-on applications to be written with a single code base for Aliro, Joomla! (1.0 and 1.5), and Mambo.

All in all, Martin is now interested in many aspects of web development and hosting; he consequently has little spare time. But his focus remains on object-oriented software with a web slant, much of which is open source. He runs Black Sheep Research, which provides software, speaking and writing services, and also manages web servers for himself and his clients.

Acknowledgement

In some ways it is difficult for me to know who should be given credit for the valuable work that made this book possible. It is one of the strengths of the open source movement that good designs and good code take on a life of their own. Aliro, the CMS framework from which all the examples are taken, has benefited from work done by the many skilled developers who built the feature rich Mambo system. Some ideas have been inspired by other contemporary open source systems. And, of course, Aliro includes in their entirety the fruits of some open source projects, as is generally encouraged by the open source principle. My work would not have been possible had it not been able to build on the creations of others. Apart from remarking on those important antecedents, I would also like to thank my wife and family for their forbearance, even if they do sometimes ask whether I will ever get away from a computer screen.

About the Reviewers

Deepak Vohra is a consultant and a principal member of the NuBean.com software company. Deepak is a Sun Certified Java Programmer and Web Component Developer, and has worked in the fields of XML and Java programming and J2EE for over five years. Deepak is the co-author of the Apress book Pro XML Development with Java Technology and was the technical reviewer for the O'Reilly book WebLogic: The Definitive Guide. Deepak was also the technical reviewer for the Course Technology PTR book Ruby Programming for the Absolute Beginner, and the technical editor for the Manning Publications book Prototype and Scriptaculous in Action. Deepak is also the author of two Packt Publishing books: JDBC 4.0 and Oracle JDeveloper for J2EE Development, and Processing XML documents with Oracle JDeveloper 11g.

Hari K. T completed his B.Tech course in Information Technology at Calicut University in 2007. He is an open source lover (LAMP on his head) and an attendee of BarCamp Kerala and various tech groups. When he was in his fourth semester (around 2005), searching for GNU/Linux, he came across the blog of Dileep, an electrical student. From there he started his own research on the web and began blogging at http://ijust4u.blogspot.com/ (some posts were his stupid thoughts :) ).

After completing his B.Tech, he managed to get a job that interested him as a PHP developer. In due course, he recognized the benefits of frameworks, ORM, and so on, and he shared his experience with others by starting a sample blog tutorial with Zend Framework for the PHP community. You can see the post at www.harikt.com and download the code from GitHub. He has worked on different open source projects such as osCommerce, Drupal, and so on. Anyone interested in building their next web project can get in touch with him through e-mail, Twitter, LinkedIn, or www.harikt.com. For more detailed information about Hari K. T, you can visit www.harikt.com, LinkedIn, Twitter, and so on.

First of all I would like to thank the entire Packt Publishing team for giving me an opportunity to get involved in this book and also for giving me various other books for reviewing. It's always a great pleasure to see our friends and family supporting us immensely. The Internet and technologies have changed me a lot ;-). Thanks to all who have supported me and are still supporting me.

Martien de Jong is a creative, young developer who loves to learn. He has built and helps build many web applications. Even though he is still young, Martien has many years of experience as he started programming at a very young age.

His main employer of interest at the moment is iDiDiD, a social network (www.ididid.eu) focusing on events and sharing experiences. He has developed many of the core parts of the website.

I want to thank Martin for letting me read and use his work.

Preface

If you want an insight into the critical design issues and programming techniques required for a web-oriented framework in PHP5, this book will be invaluable. Whether you want to build your own CMS style framework, want to understand how such frameworks are created, or simply want to review advanced PHP5 software development techniques, this book is for you.

As a former development team leader on the renowned Mambo open source content management system, author Martin Brampton offers unique insight and practical guidance into the problem of building an architecture for a web-oriented framework or content management system, using the latest versions of popular web scripting language PHP.

The scene-setting first chapter describes the evolution of PHP frameworks designed to support websites by acting as content management systems. It reviews the critical and desirable features of such systems, followed by an overview of the technology and a review of the technical environment.

The following chapters look at particular topics, with:

• A concise statement of the problem
• Discussion of the important design issues and problems faced
• Creation of the framework solution

At every point, there is an emphasis on effectiveness, efficiency, and security, all vital attributes for sound web systems. By and large these are achieved through thoughtful design and careful implementation.

Early chapters look at the best ways to handle some fundamental issues such as the automatic loading of code modules and interfaces to database systems. Digging deeper into the problems that are driven by web requirements, following chapters go deeply into session handling, caches, and access control.

New for this edition is a chapter discussing the transformation of URLs to turn ugly query strings into readable strings that are believed to be more "search engine friendly" and are certainly more user friendly. This topic is then extended into a review of ways to handle "friendly" URLs without going through query strings, and how to build RESTful interfaces.

The final chapter discusses the key issues that affect a wide range of specific content handlers and explores a practical example in detail.

What this book covers

Chapter 1, CMS Architecture: This chapter introduces the reasons why CMS frameworks have become such a widely used platform for websites and defines the critical features. The technical environment is considered, in particular the benefits of using PHP5 for a CMS. Some general questions about MVC, XHTML generation, and security are reviewed.

Chapter 2, Organizing Code: Before we go further with CMS development, let's look at a problem that can be neatly solved using PHP5. Substantial systems do not consist of a single file of code. Whatever our exact design, a large system should be broken down into smaller elements, and it makes sense to keep them in separate files, if the language supports it. Code is more manageable this way, and systems can be made more efficient.

As we are considering only PHP implementations, the source code files are used at runtime. PHP is an interpreted language and, at least in principle, runs the actual source code. So we need a good technique for handling many source files at runtime.

This creates issues; a paramount one is security. Another is ease of coding, where it is tedious and cumbersome to have to repeatedly include code to load other files. Yet another is efficiency, as we do not want to load code that is not needed for a particular request.
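
As a rough illustration of the autoloading idea summarized above, the following sketch registers a loader with PHP's `spl_autoload_register` and resolves classes through a fixed class map. The class name `DemoGreeter`, the temporary directory, and the hand-written map are all assumptions made for this example; they are not taken from the book's framework, which builds its class map by other means.

```php
<?php
// Create a throwaway class file so the example is self-contained.
// (Hypothetical class and location, used only for this demonstration.)
$classDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($classDir);
file_put_contents(
    $classDir . '/DemoGreeter.php',
    '<?php class DemoGreeter { public function greet() { return "hello"; } }'
);

// Class map: class name => file that defines it.
$classMap = array('DemoGreeter' => $classDir . '/DemoGreeter.php');

// PHP calls this loader the first time an undefined class is referenced.
spl_autoload_register(function ($className) use ($classMap) {
    // Only files named in the map are loaded; no path is built from raw input.
    if (isset($classMap[$className])) {
        require_once $classMap[$className];
    }
});

// No explicit include here: using the class triggers the loader.
$greeter = new DemoGreeter();
echo $greeter->greet(), "\n";
```

Looking classes up in a map, rather than deriving a file path from the class name, touches on the three concerns named above: only known files can be loaded, no repeated include statements are needed, and a file is read only when its class is actually used.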

Chapter 3, Database and Data Objects: It is in the nature of a content management system that the database is at its heart. Before we get into the more CMS-specific questions about handling different kinds of users, it is worth considering how best to handle storage of data in a database. Applications for the web often follow similar patterns of data access, so we will develop the database aspect of the framework to offer methods that handle them easily. A relational database holds not just data, but also information about data. This is often underutilized. Our aim is to take advantage of it to make easier the inevitable changes in evolving systems, and to create simple but powerful data objects. Ancillary considerations such as security, efficiency, and standards compliance are never far away.

Chapter 4, Administrators, Users, and Guests: With some general ideas about a CMS framework established, it is time to dive into specifics. First, we will look at handling the different people who will use the CMS, creating a basis for ensuring that each individual is able to do appropriate things. Although we might talk generally of users, mostly the discussion of "users" means those people who have identified themselves to the system, while those who have not are deemed "guests". A special subset of users contains people who are given access to the special administrator interface provided by the system.

Questions arise concerning how to store data about users securely and efficiently. If the mechanisms are to work at all, the ability to authenticate people coming to the website is vital. Someone will have to look after the permanent records, so most sites will need the CMS to support basic administrative functions. And the nature of user management implies that customization is quite likely.

Not all of these potentially complex mechanisms will be fully described in this chapter, but looking at what is needed will reveal the need for other services. They will be described in detail in later chapters. For the time being, please accept that they are all available, to help solve the current set of issues. In this chapter, we are solely concerned with the general questions about user identification and authentication. Later chapters will consider the technical issues of sessions and the question of who can do what, otherwise known as access control.

Chapter 5, Sessions and Users: Here we get into the detailed questions involved in providing continuity for people using our websites. Almost any framework to support web content needs to handle this issue robustly and efficiently. In this chapter, we will look at the need for sessions, and the PHP mechanism that makes them work. There are security issues to be handled, as sessions are a well-known source of vulnerabilities. Search engine bots can take an alarmingly large portion of your site bandwidth, and special techniques can be used to minimize their impact on session handling. Actual mechanisms for handling sessions are provided. Session data has to be stored somewhere, and I argue that it is better to take charge of this task rather than leave it to PHP. A simple but fully effective session data handler is developed using database storage.

Chapter 6, Caches and Handlers: Running PHP has quite a high cost, but in return we gain the benefit of a very powerful and flexible language. The combination of power and high cost suggests that for any code that will be executed frequently, we should use the power of PHP to aid efficiency. The greatest efficiency is gained by streamlined design. After all, not doing things at all is always the best way to achieve efficiency. Designing with a broad canvas, so as to solve a number of problems with a single mechanism, also helps. And one particular device, the cache, provides a way to store data that has been partly or wholly processed and can be used again. This obviates doing the processing over again, which can lead to great efficiency gains.

The discussion here is entirely about server-side caching. In general, a CMS is serving dynamic pages that may change without warning. It is usually undesirable for proxies between the server and the client to hold copies of pages and there are severe limits on the feasibility of allowing the browser to cache pages. Individual elements such as images, CSS, or JavaScript have much more potential, but this is often better handled by careful configuration of the web server than by adding PHP code. But there are large gains to be had by building an efficient server-side caching mechanism.
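
To make the server-side caching idea concrete, here is a minimal sketch of a disk cache with a fixed lifetime. The class name `SimpleDiskCache`, its methods, the five-minute default lifetime, and the cache key are assumptions chosen for illustration; the book's own cache classes are not reproduced here.

```php
<?php
// Hypothetical minimal disk cache; names and defaults are illustrative only.
class SimpleDiskCache
{
    private $dir;
    private $lifetime;

    public function __construct($dir, $lifetime = 300)
    {
        $this->dir = rtrim($dir, '/');
        $this->lifetime = $lifetime;
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0700, true);
        }
    }

    private function path($key)
    {
        // Hash the key so that any string maps to a safe file name.
        return $this->dir . '/' . sha1($key) . '.cache';
    }

    public function get($key)
    {
        $file = $this->path($key);
        if (!is_file($file) || filemtime($file) + $this->lifetime < time()) {
            return null; // missing or expired
        }
        return unserialize(file_get_contents($file));
    }

    public function set($key, $value)
    {
        file_put_contents($this->path($key), serialize($value), LOCK_EX);
    }

    // Return the cached value, or compute it once, store it, and return it.
    public function remember($key, $producer)
    {
        $value = $this->get($key);
        if ($value === null) {
            $value = call_user_func($producer);
            $this->set($key, $value);
        }
        return $value;
    }
}

$cache = new SimpleDiskCache(sys_get_temp_dir() . '/cache_demo_' . uniqid());
$calls = 0;
$build = function () use (&$calls) {
    $calls++; // counts how often the expensive work really runs
    return '<ul><li>expensive fragment</li></ul>';
};
$first  = $cache->remember('menu:main', $build);
$second = $cache->remember('menu:main', $build);
echo $calls, "\n";
```

The second call is served from disk, so the producer runs only once. A design limitation of this sketch is that a legitimately cached `null` is indistinguishable from a miss.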

Chapter 7, Access Control: With ideas about users and database established, we quickly run into another requirement. Many websites will want to control who has access to what. Once embarked on this route, it turns out there are many situations where access control is appropriate, and they can easily become very complex. So in this chapter we look at the most highly regarded model, role-based access control (RBAC), and find ways to implement it. The aim is to achieve a flexible and efficient implementation that can be exploited by increasingly sophisticated software. To show what is going on, the example of a file repository extension is used.
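
The core question a role-based scheme answers is "may this user perform this action on this subject?". The sketch below shows one possible shape for that check, using in-memory arrays in place of database tables. The user names, role names, the `'*'` wildcard, and the function `isPermitted` are all hypothetical; they illustrate the general model only, not the book's implementation.

```php
<?php
// Hypothetical in-memory tables; a real system would keep these in a database.
$userRoles = array(
    'alice' => array('editor'),
    'bob'   => array('registered'),
);

// Each row lets a role perform an action on a subject ('*' = any subject ID).
$permissions = array(
    array('role' => 'editor',     'action' => 'upload',   'type' => 'repository', 'id' => '*'),
    array('role' => 'registered', 'action' => 'download', 'type' => 'repository', 'id' => '7'),
);

function isPermitted($user, $action, $type, $id, array $userRoles, array $permissions)
{
    // A user with no recorded roles has an empty role list.
    $roles = isset($userRoles[$user]) ? $userRoles[$user] : array();
    foreach ($permissions as $p) {
        if (in_array($p['role'], $roles, true)
            && $p['action'] === $action
            && $p['type'] === $type
            && ($p['id'] === '*' || $p['id'] === (string) $id)
        ) {
            return true;
        }
    }
    return false; // nothing matched: deny by default
}

var_dump(isPermitted('alice', 'upload', 'repository', 3, $userRoles, $permissions));
var_dump(isPermitted('bob', 'upload', 'repository', 3, $userRoles, $permissions));
var_dump(isPermitted('bob', 'download', 'repository', 7, $userRoles, $permissions));
```

Permissions attach to roles rather than to individual users, so changing what a group of people may do means editing one row instead of many.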

Chapter 8, Handling Extensions: Now we have reached a critical point in our book. In the previous chapters a core framework was created, but it did not actually make a significant website. Content is so varied that it makes good sense to follow the approach of creating a minimal framework to support user facing functions. But now we need to make the big step of adding real functionality. If we take this step to be a question of extending the minimal framework, it's logical to call our additions extensions. Flexibility in implementing our CMS suggests that it should be easy to install extensions into the basic framework.

This means two things. One is an issue of principle: a sound architecture is needed for building extensions. The other is a practical one: a simple and effective mechanism is needed for installing extensions, preferably using a web interface.

Extensions will be divided into four types, which represent the different ways in which they operate, and their individual purposes. The justification for this breakdown will be explained shortly, followed by consideration of how they fit together, and how they should be implemented.

Chapter 9, Menus: Most websites use menus, although great inventiveness goes into forms of presentation. A menu is simply a named list of possible destinations, which may be inside the site or elsewhere. The list may contain subsidiary lists within it, which obviously form submenus. It is a matter for presentation whether the sublists are always visible, or only become visible when the parent item is selected.

The site administrator needs a mechanism for maintaining these lists, with the ability to give each item an appropriate name. That implies some basic functionality. A subsidiary requirement is that it is often desirable to keep track of which menu item is relevant to the user's current activities. Menu entries that refer into the site can also be used to define page content.

Despite the huge variety in menu styling, the concept is standard, and there is no reason why a good CMS framework should not provide all the fundamental mechanisms for menu handling. It is important that these are provided in a way that does not constrain presentation.
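
The description of a menu as a named list that may contain subsidiary lists maps naturally onto a nested array. The following sketch shows one way to hold such a structure and turn it into nested XHTML lists, leaving all styling to CSS. The array keys (`name`, `link`, `children`), the sample entries, and the function `renderMenu` are assumptions made for this example.

```php
<?php
// Hypothetical menu: each entry has a name, a destination, and optional children.
$menu = array(
    array('name' => 'Home', 'link' => '/', 'children' => array()),
    array('name' => 'Products', 'link' => '/products', 'children' => array(
        array('name' => 'Books', 'link' => '/products/books', 'children' => array()),
    )),
    // Destinations may also lie outside the site.
    array('name' => 'Packt', 'link' => 'http://www.packtpub.com', 'children' => array()),
);

// Render a list of entries, recursing into any submenu.
function renderMenu(array $items)
{
    if (!$items) {
        return ''; // no entries, no empty <ul>
    }
    $html = '<ul>';
    foreach ($items as $item) {
        $html .= '<li><a href="' . htmlspecialchars($item['link']) . '">'
            . htmlspecialchars($item['name']) . '</a>'
            . renderMenu($item['children'])
            . '</li>';
    }
    return $html . '</ul>';
}

echo renderMenu($menu), "\n";
```

Because the function emits only structural markup, whether a submenu is always visible or appears on selection remains a presentation decision.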

Chapter 10, Languages: In the early days of computing, languages did not figure prominently. Much of the development and commercialization took place in English speaking countries. The "standard" character sets were ASCII and EBCDIC. At best, schemes were employed so that a computer could operate with one particular non-English language.

The world has changed a great deal since then. Especially with the rise of the internet, computer systems need to deal with more than one language. In fact, they need to be capable of dealing with a huge variety of languages, many of which require different alphabets. Information has to be stored in alternative versions for different languages, especially while computer translation remains a joke. So while some people may be able to do without it, many builders of a CMS will require language support.

Chapter 11, Presentation Services: Despite, or maybe because of, the huge amount of work that has been devoted to techniques for creating presentation output for websites, thorny issues continue to be disputed. To some extent, these can be regarded as turf wars between software developers and web designers. The story probably has a long way still to go. With honorable exceptions, the question of how to present the output from computer programs was rarely the subject of serious design effort prior to the advent of the World Wide Web. Now, good design is vital to website creation, and both software architects and creative designers have to find a way to cope with the unaccustomed situation of working together.

Chapter 12, Other Services: This chapter could be described as a rag bag of miscellaneous services, but they are all significant in the construction of a CMS. Adding services to the framework in a standard way considerably eases the development of specific systems. Dealing with XML, handling configurations for extensions and manipulating sets of parameters are all loosely related services that have obvious uses, especially given that XML provides a simple, robust, and widely applicable technique for handling information.

File and directory handling is best treated as a service rather than being implemented in an ad hoc fashion using PHP functions, partly because of the complex permissions issues that can easily arise. Also, common operations are repeatedly needed, such as finding all the files in a directory that match a certain pattern.

Most systems need WYSIWYG editing in order to satisfy user expectations, and the sending of e-mail is often a requirement.

The most complex section of this chapter deals with the emerging possibilities for building standard logic for managing database tables. This is likely to evolve further with growing experience, but enough is given here to indicate some suggested directions.

Chapter 13, SEF and RESTful Services: Resources on the Web are accessed by the use of the Uniform Resource Identifier, the URI. Although technology can lead to complicated formats for the URI, people prefer them to be readable. It is often thought that search engines also prefer a readable URI, and so making them look appealing has been a major part of efforts to make a CMS "search engine friendly". There are actually many other factors, including the handling of metadata and particularly titles.

A loosely related development is the rise of RESTful services. This is a move to adopt a style of interaction between websites that aims to naturally exploit the characteristics of the HTTP protocol, including the URI. The aim is to move away from protocols such as XML-RPC that wrap up all the information being passed to and fro, instead making more of it visible through standard features of web access. This includes the building of families of meaningful URIs.

Although the various applications added to a framework will have to do some of the work, there are important steps that can be taken within the framework to provide the tools that are needed. It is those we shall concentrate on in this chapter.

Chapter 14, Error Handling: In an ideal world software would never experience errors but we don't live in an ideal world! So we need to consider what to do when errors arise. One option is to simply leave PHP5 to do its best, but when the issues are considered, that doesn't look a good choice.

What are our concerns over errors? Perhaps the overriding issue here has to be that in the case of an error we need the software to degrade gracefully and not damage the system. Another consideration for web software is that errors should not provide information or opportunities that will aid crackers any more than can be helped.

Errors create problems for developers. One is that in the nature of the Web, errors are often not reported. People simply give up and do something else. Web software is often written quickly, and it is surprising how many errors exist in released software. Other factors for developers are that error handling can be a big overhead; also it is often unclear what counts as a good way to deal with errors.

Given this range of issues, it is clear that it will be helpful if the CMS framework can contribute useful functionality for error handling. Also included here for convenience is the special processing that takes place when a URI does not correspond to any page in our site, thus demanding a "404 error"; likewise handling of situations where a user has attempted something not permitted, making a "403" error appropriate.

Chapter 15, Real Content: Here we are at the last chapter, and our CMS framework still has no content! The reason for this state of affairs is that the provision of a CMS has a lot of common features, but most of them operate at a basic level below the provision of specific services. This is illustrated by looking at a popular off the shelf CMS and observing that of all the available extensions, the largest single category is simply described as "content management". So, however much the standard package provides, it seems that there is still enormous scope for additions.

In this chapter, I aim to describe a number of specific application areas, discussing the particular issues that arise with implementations. Looking at our framework solution, I will concentrate on one sample extension. It is a very simple text handling mechanism that can be explained in detail. Also, the ways in which the simple text system could be extended will be described.

Appendix A, Packaging Extensions: It provides information for those who want to build an installer following similar design principles to those described in this book, or for people who intend to use Aliro itself.

Appendix B, Packaging XML Example: It shows the packaging XML for the Aliro login component, which includes user management.

What you need for this book

Code requires PHP version 5 and some sections will require at least version 5.1.2. Increasingly, version 5.2.3 (released May 2007) is regarded as the oldest version that should be supported by advanced software systems. At the time of writing the code is believed to run on all released PHP versions up to 5.3.2.

Examples of SQL assume MySQL of at least version 4.1 although development will increasingly require version 5 which is now widely used by typical web hosting services.

The author's testing is all done using Linux systems running the Apache web server. Code will probably run on other platforms but has not been extensively tested on them.

Who this book is for

If you are a professional PHP developer who wants to know more about web-oriented frameworks and content management systems, this book is for you. Whether you already use an in-house developed framework or are developing one, or if you are simply interested in the issues involved in this demanding area, you will find discussion ranging from design issues to detailed coding solutions in this book.

You are expected to have experience working with PHP 5 object-oriented programming. Examples in the book will run on any recent version of PHP 5, including 5.3.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the SUGGEST A TITLE form on www.packtpub.com or e-mail <[email protected]>.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Tip

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. CMS Architecture

This chapter lays the groundwork that helps us to understand what Content Management Systems (CMS) are all about. First, it summarizes the whole idea of a CMS: where it came from and what it looks like. This is followed by a review of the technology that is advocated here for CMS building. Next, we will take account of how the circumstances in which a CMS is deployed affect its design; some of the important environmental factors, including security, are considered. Finally, all these things are brought together in an overview of CMS architecture. Along the way we introduce Aliro, the CMS framework that is used for illustrating implementations throughout this book.

The idea of a CMS

Since you are reading this book, most likely you have already decided to build or use a CMS. But before we go into any detail, it is worth spending some time presenting a clear picture of where we are and how we got here. To be more precise, I will describe how I got here, in the expectation that at least some aspects of my experiences are quite typical.

The World Wide Web (WWW) is a huge set of interlinked documents built using a small group of simple protocols, originally put together by Tim Berners-Lee. Prominent among them was HTML, a simplified markup language. The protocols utilized the Internet with the immediate aim of sharing academic papers. The Web performed this useful function for some years while the Internet remained relatively closed, with access limited primarily to academics. As the Internet opened up during the nineties, early efforts at web pages were very simple. I started up a monthly magazine that reflected my involvement at the time with OS/2 and wrote the pages using a text editor. While writing a page, a tag was needed occasionally, but the work was simple, since for the most part the only tags used were headings and paragraphs, with the occasional bold or italic. With the addition of the odd graphic, perhaps including a repeating background, the result was perfectly presentable by the standards of the time.

But that was followed by a period in which competition between browsers was accompanied by radical development of complex HTML to create far higher standards of presentation. It became much harder for amateurs to create presentable websites, and people started to look for tools. One early success was the development of Lotus Notes as a CMS, by grafting HTML capability onto the existing document-handling features. While this was not a final solution, it certainly demonstrated some key features of CMS. One was the attempt to separate the skills of the web designer from the knowledge of the people who understood the content. Another was to take account of the fact that websites increasingly needed a way to organize large volumes of regularly changing material.

As HTML evolved, so did the servers and programs that delivered it. A significant evolutionary step was the introduction of server-side scripting languages, the most notable being PHP. They built on traditional "third generation" programming language concepts, but allied to special features designed for the creation of HTML for the Web. As they evolved, scripting languages acquired numerous features that are geared specifically to the web environment.

The next turning point was the appearance of complete systems designed to organize material, and present it in a slick way. In particular, open source systems offered website-building capabilities to people with little or no budget. That was exactly my situation a few years ago, as a consultant wanting a respectable website that could be easily maintained, but costing little or nothing to buy and run. A number of systems could lay claim to being ground breakers in this area, and I tried a few that seemed to me to not quite achieve a solution.

For me, the breakthrough came with Mambo 4.5. It installed in a few minutes, and already there was the framework of a complete website, with navigation and a few other useful capabilities. The vital feature was that it came with templates that made my plain text look good. By spending a small amount of money, it was possible to have a personalized template that looked professional, and then it took no special skills to insert articles of one kind or another. Mambo also included some simple publishing to support the workflow involved in the creation and publication of articles. Mambo and its grown up offspring Joomla! have become well-known features in the CMS world.

My own site relied on Mambo for a number of years, and I gradually became more and more involved with the software, eventually becoming leader of the Mambo development team for a critical period in the development of version 4.6. For various reasons, though, I finally departed from the Mambo organization and eventually wrote my own CMS framework, called Aliro. Extensions that I develop are usually capable of running on any of MiaCMS, Mambo, Joomla!, or Aliro. The Aliro system is used to provide all the code examples given here, and you can find a site that is running the exact software described in this book at http://packt.aliro.org.

Some people said of the first edition of this book that it was only about Aliro. In one sense that is true, but in another it is not. Something like a CMS consists of many parts, but they all need to integrate successfully. This makes it difficult to take one part from here, another from there, and hope to make them work together. And in order to give code examples that could be relied on to work, I was anxious to take them from a complete system. However, when creating Aliro I sought to question every single design decision and never do anything without considering alternatives. This book aims to explain the issues that were reviewed along the way, as well as the choices made. You may look at the same issues and make different choices, but I hope to help you in making your choices. I also hope that people will find that some of the ideas here can be applied in areas other than CMS frameworks.

From time to time, you will find mentions of backwards compatibility, mostly in relation to the code examples taken from Aliro. In this context, backwards compatibility should be understood to be features that have been put into Aliro so that software originally designed to run with Mambo (or its various descendants) can be used with relatively little modification in Aliro. The vast majority of the Aliro code is completely new, and no feature of older systems has been retained if it seriously restricts desirable features or requires serious compromise of sound design.

Critical CMS features

It might seem that we have now defined a CMS as a system for managing content on the Web. That would be to look backwards rather than forwards, though. In retrospect, it is apparent that one of the limitations of systems like Mambo is that their design is geared too heavily to handling documents. While every website has some pages of text, few are now confined to that. Even where text is primary, older systems are pushed to the limit by demands for more flexibility in who has access to what, and who can do what.

While the so called "core" Mambo system could be installed with useful functionality, an essential part of Mambo's success was the ability to add extensions. Outside the core development, numerous extra functions were created. The existence of this pool of added capabilities was vital to many users of Mambo. For many common requirements, there was an extension available off the shelf. For unusual cases, either the existing code could be customized or new code could be commissioned within the Mambo framework. The big advantages were the ability to impose overall styling and the existence of site-wide schemes for navigation and other basic services.

The outcome is that the systems have outgrown the CMS tag, as the world of the Web has become ever more interactive. Sites such as Amazon and eBay have inspired many other innovations where the website is far more than a compendium of articles. This is reflected in a trend for the CMS to migrate towards being a framework for the creation of web capabilities. Presentation of text, often with illustrations, is one important capability, but flexibility and extensibility are critical.

So what is left? As with computing generally, new ideas are often implemented as islands. There is then pressure to integrate them. At the very least, the aim is to show users a single, rich interface, preferably with a common look and feel. The functionality is likely to be richer if the integration runs deeper than the top presentation level. For example, integration is excessively superficial if users have to authenticate themselves separately for different facilities in the same website. Ideally, the CMS framework would be able to take the best-of-breed applications and weave them together through commonly-agreed APIs, RESTful interfaces, and XML-RPC exchanges. Today's reality is far from this, and progress has been slow, but some integration is possible.

It should now be possible to create a list of essential requirements and another list of desirable features for a CMS. The essentials are:

Continuity: Despite the limitations of basic web protocols, many website functions need to retain information through a series of user interactions and the information must be protected from hijacking. The framework should handle this in a way that makes it easy for extensions to keep whatever data they need.

User management: The framework needs to provide the fundamentals for a system of controlling users via some form of authentication. But this needs to be flexible so that the least amount of code is installed to handle the requirement, which can range from a single administrative user to handling hundreds of thousands of distinct users and a variety of authentication systems.

Access control: Constraints are always required, if only to limit who can configure the website. Often much more is needed as various groups of users are allocated different privileges. It is now widely agreed that the best approach is the Role-Based Access Control (RBAC) system. This means that it is roles that are granted permissions, and accessors are allocated roles. It is preferable to think of accessors rather than users, since roles also need to be given to things other than just users, such as computer systems.

Extension management: A framework is useful if it can be easily extended. There is no single user-visible facility that is essential to every website, so ideally the framework is stripped of all such functions. Each capability visible to users can then be added as an extension. When the requirements for building a website are considered, it turns out that there are several different kinds of extensions. One well known classification is into components, modules, plugins, and templates. These are explained in detail in Chapter 8, Handling Extensions.

Security and error handling: Everyone is aware of the tide of threats from spam to malicious cracking of websites. To be effective, security has to be built in from the start so that the framework not only achieves the best possible security, but also provides a helpful environment for building secure extensions. Errors are significant both as a usability problem and a potential security flaw, so a standard error handling mechanism is also required.
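
The relationship between accessors, roles, and permissions described under access control can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: the class SimpleRbac and its methods permit(), assign(), and isAllowed() are hypothetical names invented for this example, not the Aliro implementation, and a real system would normally hold this information in a database rather than in arrays.

```php
<?php
// Hypothetical sketch of role-based access control. All names here are
// illustrative assumptions and do not come from any existing framework.
class SimpleRbac
{
    private $rolePermissions = array(); // role => array("action:subject" => true)
    private $accessorRoles = array();   // accessor => array(role => true)

    // Grant a role permission to perform an action on a kind of subject.
    public function permit($role, $action, $subject)
    {
        $this->rolePermissions[$role][$action . ':' . $subject] = true;
    }

    // Allocate a role to an accessor, which may be a user or another system.
    public function assign($accessor, $role)
    {
        $this->accessorRoles[$accessor][$role] = true;
    }

    // An accessor may act only if at least one of its roles holds the permission.
    public function isAllowed($accessor, $action, $subject)
    {
        if (!isset($this->accessorRoles[$accessor])) {
            return false;
        }
        foreach (array_keys($this->accessorRoles[$accessor]) as $role) {
            if (isset($this->rolePermissions[$role][$action . ':' . $subject])) {
                return true;
            }
        }
        return false;
    }
}

$rbac = new SimpleRbac();
$rbac->permit('editor', 'edit', 'article');
$rbac->permit('administrator', 'configure', 'site');
$rbac->assign('user:42', 'editor');
$rbac->assign('system:backup', 'administrator');

$editorCanEdit = $rbac->isAllowed('user:42', 'edit', 'article');
$editorCanConfigure = $rbac->isAllowed('user:42', 'configure', 'site');
```

Note that permissions are never attached to an accessor directly. Changing what a whole group may do means changing one role, and an accessor such as 'system:backup' is treated in exactly the same way as a human user.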

Desirable CMS features

Most people would not be content to stop with the list of critical features. Although they are the essentials, it is likely that more facilities will be needed in practice, especially if the creation of extensions is to be made easy. The list of desirable features certainly includes:

Efficient and maintainable code handling: The framework is likely to consist of a number of separate code files. It is essential that they be loaded when needed, and preferable that they are not loaded if not needed. The mechanisms used need to be capable of handling extra code files added as extensions.

Database interface: Many web applications need access to a database to be able to function efficiently. The framework itself needs a database to perform its own functions. While PHP provides an interface to various databases, there is much that can be done in a CMS framework to provide higher level functions to meet common requirements. These are needed both by the framework and by many extensions.

Caches: These are used in many different contexts for Internet processing. To date, the two most productive areas have been object and XHTML caching. Both the speed of operation and the processing load benefit considerably from well implemented caches. So it is highly desirable for a CMS framework to provide suitable mechanisms that are lightweight and easy to use.

Menus: These are a common feature of websites, especially when taken in the widest sense to include such things as navigation bars and other ways to present what are essentially lists of links. It is not desirable for the framework to create final XHTML because that preempts decisions about presentation that should belong to templates or other extensions. But it is desirable for the framework to provide the logic for creating and managing menus, including a standard interface to extensions for menu creation. The framework should also provide menu data in a way that makes it easy to create a menu display.

Languages: Nowadays, as a minimum, software development should take account of the requirements imposed by implementation in different languages, including those that need multi-byte characters. It is now broadly agreed that part of the solution to this requirement is the use of UTF-8. A mechanism to allow fixed text to be translated is highly desirable. The bundle of issues raised by demands for language support are usually described using the terms internationalization and localization. The first is the building of capabilities into a system to support different ways of doing things, of which the most prominent is choice of language. Localization is the deployment of specific local characteristics into a system that has been internationalized. Apart from language itself, matters to be considered include the presentation of dates, times, monetary amounts, and numbers.
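
As a rough illustration of the first item in this list, loading code files only when they are needed, the following sketch uses PHP's spl_autoload_register() with a simple map from class names to files. The SketchLoader class, its registerClass() method, and the SketchGreeter demonstration class are all assumptions made for this example; the sketch is not a description of how any particular framework organizes its loading.

```php
<?php
// Hypothetical sketch of a class-map autoloader. The names SketchLoader,
// registerClass() and SketchGreeter are invented for illustration only.
class SketchLoader
{
    private $classMap = array(); // class name => path of the file defining it

    // The framework, or an extension when installed, records where a class lives.
    public function registerClass($className, $filePath)
    {
        $this->classMap[$className] = $filePath;
    }

    // PHP calls this only when code first refers to a class it does not know.
    public function load($className)
    {
        if (isset($this->classMap[$className]) && is_readable($this->classMap[$className])) {
            require_once $this->classMap[$className];
        }
    }
}

$loader = new SketchLoader();
spl_autoload_register(array($loader, 'load'));

// Demonstration only: write a tiny class file to a temporary location,
// standing in for a file supplied by an installed extension.
$file = sys_get_temp_dir() . '/SketchGreeter_' . uniqid() . '.php';
file_put_contents($file, '<?php class SketchGreeter { public function greet() { return "hello"; } }');
$loader->registerClass('SketchGreeter', $file);

$loadedBefore = class_exists('SketchGreeter', false); // checked without autoloading
$greeter = new SketchGreeter();                        // first use triggers load()
$loadedAfter = class_exists('SketchGreeter', false);
unlink($file);
```

Because the map is only data, extensions can add their own classes to it without the framework needing to know anything about them in advance, and a class file that is never referred to is never read.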

Many other services are useful, such as handling the sending of e-mails, assistance in the creation of XHTML, insulating applications from the file system, and so on. But before considering an approach to implementation, there is an important matter of how a CMS is to be managed.

System management

In this discussion of system management, it is assumed that a web interface is provided. The person in control of a site, typically called the manager or administrator, is often in the same situation as the user of the site. That is to say, the site itself is installed on a hosted web server distant from both its users and its managers. A logical response to this scenario is to implement all interactions with the site through web interfaces.

There are disagreements about how much, if any, system management should be kept apart from user access. One school of thought requires a distinct management login using a slightly different URI. Opposing this is the view that everything should be done from the same starting point, but allowing different facilities according to the identity of the user. Drupal is the best known example of the latter approach, while Mambo and Joomla! keep the administrator separate. Aliro continues along the path trodden by Mambo and Joomla!

There is some justification for the idea that everything should be merged, with no distinct administrator area. As the CMS grows in sophistication, user groups proliferate; the distinction between an administrator and a privileged user is hard to sustain. Typically, visitors may be given quite a lot of read access to site material, but constrained write access, mainly because of misuse problems. But users who have identified themselves to the site may be given quite extensive capabilities. These might extend to having areas of the site where they are able to publish their own material. The registered user can thus become an administrator of his/her own material, needing similar facilities to a site administrator.

The argument in favor of splitting off some administrative functions is largely to do with security. Someone at the highest administrator level is likely to have access to tools that are capable of destroying the site and possibly the whole server. With everything merged, the safety of key administrative functions depends critically on the robustness of user management. It is difficult to be completely confident in this, especially as the total volume of software deployed on a site becomes large. Allowing access to the most sensitive administrative functions only through a distinct URI and login mechanism allows for other security mechanisms to be combined with the CMS user management. This might be a different user and password scheme implemented using Apache, or it might be a constraint on the IP addresses permitted to access the administrator login URI. No security mechanism is perfect, but combining more than one mechanism increases the chances of keeping out intruders. More is said about security issues in a later section of this chapter.
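
The idea of constraining the IP addresses that may reach the administrator login could be sketched as follows. This is a minimal illustration, assuming a hypothetical function adminAddressAllowed() and an allow list supplied by the site's configuration. It handles only exact addresses, not ranges, and takes no account of proxies, so it should be read as one possible extra layer rather than a complete mechanism. The same constraint can equally be applied in the web server configuration.

```php
<?php
// Hypothetical helper: the function name and the allow list are assumptions.
// Only exact address matches are supported; ranges and proxies are ignored.
function adminAddressAllowed($remoteAddress, array $allowedAddresses)
{
    // Reject anything that is not a syntactically valid IP address.
    if (filter_var($remoteAddress, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // Compare packed binary forms so that different spellings of the
    // same address are treated as equal.
    $packed = inet_pton($remoteAddress);
    foreach ($allowedAddresses as $allowed) {
        if (filter_var($allowed, FILTER_VALIDATE_IP) !== false
            && inet_pton($allowed) === $packed) {
            return true;
        }
    }
    return false;
}

// Possible use at the top of an administrator entry script (illustrative):
//   $remote = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
//   if (!adminAddressAllowed($remote, array('192.0.2.10'))) {
//       header('HTTP/1.1 403 Forbidden');
//       exit;
//   }
```

The check runs before any CMS login logic, so a flaw in user management alone is not enough to expose the administrator functions.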

Because of the separatist arguments, Aliro is implemented with a distinct administrator login to a small range of critical functions. Extensions added to the CMS have the ability to implement an administrator-side interface, but are free to make their own design decisions on the balance to be struck. The functions provided by the Aliro base system for administrators are as follows:

Basic system configuration such as details of databases used, caching options, mailing options, and presentation of system information

Management of extensions through the ability to install packages of software or to remove them, and the ability to manage what appears on which display

A particular part of extension management is the handling of themes (formerly known as templates in the Mambo world) that affect the presentation of the whole site

Management of a folder system that supports a tree structure of arbitrary depth, around which site content can be constructed

Creation and management of menu information

Access to error reports that contain detailed diagnostic information

A generalized system for modifying URIs to be friendly to humans and search engines, and to manage metadata

Whatever management functions are provided by extensions to the basic CMS

In Aliro, some of the critical classes that provide these facilities are not known to the general user side of the system, which provides another obstacle to misuse. Indeed it is possible to rename the directory under which code exclusive to the administrator side of the system resides. Code on the general user side does not have any straightforward means to find out where the administrator code exists. On balance, I believe that splitting off the most fundamental administrative functions is the more secure policy.

Now we have lists of essential and desirable CMS features, together with a set of administrator functions. We also need to start thinking about the technology needed for building a CMS.

Technology for CMS building

Earlier we looked at how changing demands on websites occurred alongside innovation in technology, and particularly mentioned the arrival of scripting languages. Of these, JavaScript is the most popular at present, but for server-side scripting the favorite is PHP. With version 5, PHP reached a new level. The most significant changes are in the object-oriented features. These were thought to be a kind of "extra" when they were introduced into version 4. But extensive and enthusiastic use of these features to build object-oriented applications has led to PHP5 being built with a much better range of class and object capabilities. This provides the opportunity to adopt a much more thoroughgoing object orientation in the building of a new CMS framework. Strangely, despite all the talk of "Internet years" and rapid change, the move to PHP5 has been extremely slow, taking about five years from first release to widespread deployment.

Leveraging PHP5

Software developers can argue at length about the relative merits of different languages, but there is no doubt that PHP has established itself as a very popular tool for creating active websites. Two factors stand out, one of which applies to PHP generally, the other specifically to PHP5.

The general consideration is the ongoing attempt to separate the creation of views (which in practice means creating XHTML) from the problem-oriented logic. More generally, the aim is to split development along the lines of the MVC model: model, view, and controller. While some have seen a need to create templating systems to achieve this, such systems have always been questionable on the grounds that PHP itself contains the necessary features for handling XHTML in a sound way. The trend recently has been to see templating systems as an unnecessary overhead. Indeed, one developer of a templating system has written to say that he now considers such systems undesirable. So a significant advantage of using PHP is the ability to handle XHTML neatly. There still remain plenty of unsolved problems in this area, notably the viability of widget libraries and the issue of how to support easy customization. Despite those problems, PHP offers powerful mechanisms for dealing with XHTML, briefly illustrated in a later section.
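
To give a flavor of what handling XHTML in PHP without a separate templating system can look like, here is a small sketch of a view function. The function name renderMenu() and the shape of its input (an array mapping URIs to labels) are assumptions made purely for illustration. The markup is written directly, PHP's alternative control syntax supplies the loop, and all data is escaped on output.

```php
<?php
// Hypothetical view function: renderMenu() and its input format are
// illustrative assumptions. It returns XHTML and contains no problem logic.
function renderMenu(array $links)
{
    ob_start(); // capture the markup below instead of sending it immediately
    ?>
<ul class="menu">
<?php foreach ($links as $uri => $label): ?>
  <li><a href="<?php echo htmlspecialchars($uri, ENT_QUOTES, 'UTF-8'); ?>"><?php echo htmlspecialchars($label, ENT_QUOTES, 'UTF-8'); ?></a></li>
<?php endforeach; ?>
</ul>
<?php
    return ob_get_clean();
}

// The controller side decides what the links are; the view only presents them.
$html = renderMenu(array('/news?id=1&x=2' => 'News & Views'));
```

The decision about which links exist is made elsewhere and passed in as plain data, which is the separation that the MVC approach aims for.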

The specific advantage of PHP5 is its greatly improved provisions for classes and objects. Many experienced developers take the view that object principles lead to more flexible systems and better quality code. Of course, this does not happen automatically. Knowledge and skill are still required. More detailed comments about object-oriented development are made in a later section.

Note

When I left the Mambo development team and decided to create a radically changed CMS that would evolve out of the Mambo history, it meant a major commitment of development effort. Given the huge advantage of PHP5 through its radically improved handling of classes and objects, it would have seemed foolish to commit so much effort to an obsolescent system. Because object orientation enables such radical improvements to the design of a CMS framework, it seemed to me that the logical conclusion was to work in PHP5 and wait for the world to catch up. It is now easy to find PHP5 hosting, and most developers have either migrated or are currently making the transition.

Some PHP policies

Before we go into specifics in later chapters, there are some general points about PHP that apply everywhere. There is scope for varying opinions in programming practice, so it has to be said that these are only my opinions and others may well disagree. But they do affect the way in which the code examples are written, so mentioning them may aid understanding. Much more could be said; the following comments are a selection of what seem the most important considerations for sound use of PHP. Other points will become apparent through the rest of the book.

PHP will not fail if variables are uninitialized, as it will assume that they are null and will issue a notice to tell you about it. Sometimes, PHP software is run with warnings and notices suppressed. This is not a good way to work. It hardly requires any more effort to write code so that variables are always initialized before use. The same applies to all other situations that give rise to notices or warnings, which can be easily avoided. Often, quite serious logical errors can be picked up by seeing a notice or warning. The error may not make the code fail in an obvious way, but nonetheless something may be going badly wrong. A low-level error is frequently an important sign of a problem. It is therefore best to make sure that you find out about every level of error.
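
The point about uninitialized variables can be made concrete with a short sketch. The class ProblemLog and the two summing functions are hypothetical names invented for this illustration. Every level of error is reported, and a handler records the messages so that they can be inspected rather than lost.

```php
<?php
// Report every level of error while developing.
error_reporting(E_ALL);

// Hypothetical collector, for illustration only: it records each message
// raised by PHP instead of letting it be suppressed.
class ProblemLog
{
    public static $messages = array();

    public static function record($level, $message, $file, $line)
    {
        self::$messages[] = $message;
        return true; // treated as handled, so PHP's built-in handler is skipped
    }
}
set_error_handler(array('ProblemLog', 'record'));

// Careless version: $total is used without ever being initialized.
function sumCareless(array $numbers)
{
    foreach ($numbers as $n) {
        $total += $n;
    }
    return $total;
}

// Careful version: one extra line, and no notice or warning can arise.
function sumCareful(array $numbers)
{
    $total = 0;
    foreach ($numbers as $n) {
        $total += $n;
    }
    return $total;
}

$careless = sumCareless(array()); // null rather than 0, and a message is recorded
$careful = sumCareful(array());   // 0, and nothing is recorded
```

With a non-empty array the careless version happens to return the right answer, which is exactly why the problem is easy to miss if notices and warnings are suppressed. The recorded message is the only sign that an empty array will produce null instead of zero.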

Declarations are powerful, and it pays to maximize their power. Classes can be declared as abstract when they are not intended to be used on their own to create objects, but used only to build subclasses. Conversely, classes or methods that should not be subclassed can be declared final. Methods can be declared as public, private, or protected