XML News from Wednesday, May 14, 2008

I am pleased to announce that my latest book, Refactoring HTML has been released by Addison Wesley. This book endeavors to improve the design of existing web sites along multiple axes: maintainability, security, attractiveness, and performance. It does this by moving sites to web standards: XHTML, CSS, and REST.

Rather than approaching this as a big bang project, small changes can be made in small steps that offer linear improvement. You don't need to spend months of developer time and thousands of dollars before you see any payback. You can improve your site some today, and then some more tomorrow. Refactoring a web site doesn't require large blocks of uninterrupted development time. Add up enough small changes in the little pieces of time scattered throughout the workday, and before you know it, your site is dramatically improved.

Not convinced yet? Let me offer a brief excerpt from Chapter 1:

Refactoring. What is it? Why do it?

In brief, refactoring is the gradual improvement of a code base by making small changes that don’t modify a program’s behavior, usually with the help of some kind of automated tool. The goal of refactoring is to remove the accumulated cruft of years of legacy code and produce cleaner code that is easier to maintain, easier to debug, and easier to add new features to.

Technically, refactoring never actually fixes a bug or adds a feature. However, in practice, when refactoring I almost always uncover bugs that need to be fixed and spot opportunities for new features. Often, refactoring changes difficult problems into tractable and even easy ones. Reorganizing code is the first step in improving it.

The concept of refactoring originally came from the object-oriented programming community, and dates back at least as far as 1990 (William F. Opdyke and Ralph E. Johnson, “Refactoring: An Aid in Designing Application Frameworks and Evolving Object-Oriented Systems,” Proceedings of the Symposium on Object-Oriented Programming Emphasizing Practical Applications [SOOPPA], September 1990, ACM), though likely it was in at least limited use before then. However, the term was popularized by Martin Fowler in 1999 in his book Refactoring (Addison-Wesley, 1999). Since then, numerous IDEs and other tools such as Eclipse, IntelliJ IDEA, and C# Refactory have implemented many of his catalogs of refactorings for languages such as Java and C#, as well as inventing many new ones.

However, it’s not just object-oriented code and object-oriented languages that develop cruft and need to be refactored. In fact, it’s not just programming languages at all. Almost any sufficiently complex system that is developed and maintained over time can benefit from refactoring. The reason is twofold.

  1. Increased knowledge of both the system and the problem domain often reveals details that weren’t apparent to the initial designers. No one ever gets everything right in the first release. You have to see a system in production for a while before some of the problems become apparent.

  2. 2. Over time, functionality increases and new code is written to support this functionality. Even if the original system solved its problem perfectly, the new code written to support new features doesn’t mesh perfectly with the old code. Eventually, you reach a point where the old code base simply cannot support the weight of all the new features you want to add.

When you find yourself with a system that is no longer able to support further developments, you have two choices: You can throw it out and build a new system from scratch, or you can shore up the foundations. In practice, we rarely have the time or budget to create a completely new system just to replace something that already works. It is much more cost-effective to add the struts and supports that the existing system needs before further work. If we can slip these supports in gradually, one at a time, rather than as a big-bang integration, so much the better.

Many sufficiently complex systems with large chunks of code are not object-oriented languages and perhaps are not even programming languages at all. For instance, Scott Ambler and Pramod Sadalage demonstrated how to refactor the SQL databases that support many large applications in Refactoring Databases (Addison-Wesley, 2006). However, while the back end of a large networked application is often a relational database, the front end is a web site. Thin client GUIs delivered in Firefox or Internet Explorer are everywhere, replacing thick client GUIs for all sorts of business applications, such as payroll and lead tracking. Adventurous users at companies such as Sun and Google are going even further and replacing classic desktop applications like word processors and spreadsheets with web apps built out of HTML, CSS, and JavaScript. Finally, the Web and the ubiquity of the web browser have enabled completely new kinds of applications that never existed before, such as eBay, Netflix, PayPal, Google Reader, and Google Maps.

HTML made these applications possible, and it made them faster to develop, but it didn’t make them easy. It didn’t make them simple. It certainly didn’t make them less fundamentally complex. Some of these systems are now on their second, third, or fourth generation; and wouldn’t you know it? Just like any other sufficiently complex, sufficiently long-lived application, these web apps are developing cruft. The new pieces aren’t merging perfectly with the old pieces. Systems are slowing down because the whole structure is just too ungainly. Security is being breached when hackers slip in through the cracks where the new parts meet the old parts. Once again, the choice comes down to throwing out the original application and starting over, or fixing the foundations; but really, there’s no choice. In today’s fast-moving world, nobody can afford to wait for a completely new replacement. The only realistic option is to refactor.

Most of the refactorings in this book focus on upgrading sites to web standards, specifically:

They are going to help you move away from

These are not binary choices, or all-or-nothing decisions. You can often improve the characteristics of your sites along these three axes without going all the way to one extreme. An important characteristic of refactoring is that it’s linear. Small changes generate small improvements. You do not need to do everything at once. You can implement well-formed XHTML before you implement valid XHTML. You can implement valid XHTML before you move to CSS. You can have a fully valid CSS-laid-out site before you consider what’s required to eliminate sessions and session cookies.

Nor do you have to implement these changes in this order. You can pick and choose the refactorings from the catalog that bring the most benefit to your applications. You may not require XHTML, but you may desperately need CSS. You may want to move your application architecture to REST for increased performance but not care much about converting the documents to XHTML. Ultimately, the decision rests with you. This book presents the choices and options so that you can weigh the costs and benefits for yourself.

It is certainly possible to build web applications using tag-soup table-based layout, image maps, and cookies. However, it’s not possible to scale those applications, at least not without a disproportionate investment in time and resources that most of us can’t afford. Growth both horizontally (more users) and vertically (more features) requires a stronger foundation. This is what XHTML, CSS, and REST provide.

Refactoring HTML is available now at Amazon, Safari, and other fine bookstores everywhere. The price is a very reasonable $39.99, and most stores are offering their customary discounts. (Amazon is 10% off at the moment.) I hope you enjoy it.