
XHTML vs HTML

Written on Tue, 02 Jun 2009
By Amy Varga

Amy Varga assesses the XHTML 2.0 and HTML 5 specifications to decide which is more appropriate for web development now.

I am in the process of rebuilding my website and have always used the XHTML 1.0 specification. With the recent interest in HTML 5, I thought it timely to investigate whether I should be implementing HTML 5 on my website.

Most XHTML web pages (including BBC, Times Online and Facebook) are not, in practice, XML on the web, because browsers parse them as HTML rather than XML. This is because they are generally served with the MIME content type “text/html” rather than “application/xhtml+xml”. Browsers use the MIME type to choose between the two syntaxes for representing HTML documents: the HTML serialisation and the XML serialisation.
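To make the MIME type rule concrete, here is a minimal Python sketch of the decision a browser makes before it parses anything. The function name `serialisation_for` is hypothetical, and real browsers follow more detailed rules than this simplified version.

```python
from typing import Optional


def serialisation_for(content_type: str) -> Optional[str]:
    """Hypothetical helper: map a Content-Type header to "html", "xml" or None."""
    # Drop parameters such as "; charset=utf-8" and ignore case
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        # HTML serialisation: handled by the lenient, error-correcting parser
        return "html"
    # application/xhtml+xml and other XML MIME types get the XML parser
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        # XML serialisation: parsing halts on the first well-formedness error
        return "xml"
    # Not a markup type this sketch deals with
    return None
```

Note that the DOCTYPE plays no part in this choice: an XHTML 1.0 page with an XHTML DOCTYPE, served as “text/html”, still goes down the HTML path.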

Parsing XHTML pages as HTML rather than XML makes sense. Whilst the DTDs for HTML and XHTML both derive from SGML, XHTML is defined in XML, a more restrictive subset of SGML, so its syntax rules are very strict. An XML parser stops processing a page that is not well-formed and displays nothing except an error message. An HTML parser, on the other hand, is more complex and far more lenient: it uses error-correcting “tag soup” parsing to display an invalid page as best it can.
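The difference in error handling can be sketched with Python's standard library, assuming `xml.etree.ElementTree` stands in for a browser's XML parser and `html.parser` for its HTML parser. The helper names `parse_as_xml`, `parse_as_html` and `TagCollector` are hypothetical. One caveat: `html.parser` only tokenises leniently and does not repair the document tree the way a browser's tag soup parser does, so this shows tolerance of errors rather than full error correction.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Not well-formed: <b> is closed after </p>, and <br> is never closed
broken = "<html><body><p>Hello <b>world</p></b><br></body></html>"


def parse_as_xml(markup: str) -> str:
    """Strict parsing: any well-formedness error aborts the whole parse."""
    try:
        ET.fromstring(markup)
        return "parsed"
    except ET.ParseError as err:
        return f"error: {err}"


class TagCollector(HTMLParser):
    """Records every start tag it sees, whatever the nesting."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        self.tags.append(tag)


def parse_as_html(markup: str) -> list[str]:
    """Lenient parsing: mis-nested and unclosed tags are simply tolerated."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parse_as_xml(broken))   # an error message, nothing else
print(parse_as_html(broken))  # the page's tags are still processed
```

The XML path gives up at the mis-nested </p>, while the HTML path carries on and reports every start tag in the page.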

In practice, the interoperability of the Web means that invalid pages are unavoidable: content feeds, user-generated content, CMS body content, trackbacks, ad services and widgets can all introduce invalid markup. Using an HTML parser means the page is displayed regardless of whether it is well-formed.

The XHTML 2.0 specification has been in progress since 2001 and is in its eighth public working draft. It is based solely on XML and is driven by how markup should be used, rather than how markup is currently used. It proposes sophisticated and elegant solutions that may also involve a learning curve, and it is not backwards compatible with XHTML 1.0.

XForms, which replaces forms in the XHTML 2.0 specification, was the catalyst for the current HTML 5 specification. In late 2003 Opera set out to prove that it was possible to extend HTML 4’s forms to provide many of the features introduced in XForms 1.0 without requiring browsers to implement rendering engines incompatible with existing HTML. The draft proposal, presented jointly by Mozilla and Opera, was rejected by the W3C on the grounds that it conflicted with the previously chosen direction for the Web’s evolution.

Subsequently, Apple, Mozilla and Opera announced their intention to continue working on the effort under the Web Hypertext Application Technology Working Group (WHATWG). In 2006 the W3C expressed interest in the specification, and in 2007 it created an HTML Working Group to work with the WHATWG on the development of the HTML 5 specification.

Whilst the HTML 5 specification aims to be backwards compatible with HTML 4.0 and XHTML 1.0, it does not fully achieve this, as some elements have new meanings. The specification focuses on creating a language for web applications, as well as APIs that improve the client-side web development environment.

The proposed APIs include a 2D drawing API, a video and audio API, an offline web application API, an API that allows web applications to register themselves for certain protocols or MIME types, an editing API, a drag and drop API, network APIs and a cross-document messaging API. These will no doubt prove most useful to web developers and are certainly a highlight of HTML 5 for me.

Beyond that, the HTML 5 specification has overwhelming industry support from browsers, search engines, CMSs and authoring tools.

Perhaps the notion that all documents on the web should be written in an XML format is an idealistic one. Ian Hickson, the editor of the HTML 5 specification, stated in a conversation about it that the specification came about because 95% of the Web today is HTML. He also questioned whether the Web would have been successful if browsers showed error messages whenever something was the least bit wrong. He estimated that 93% of documents on the web have markup errors, which would mean that, under XML’s strict error handling, over 90% of the web could not be browsed.

Neither specification has been implemented in its entirety, and the HTML 5 specification looks unlikely to meet its 2010 deadline. That said, the <canvas> element has already been widely implemented, and with such industry support some sections may be implemented sooner.

Since the XHTML 2.0 specification is not backwards compatible with XHTML 1.0, I am going to embrace the HTML 5 specification until the XHTML 2.0 specification is implemented.
