HTML Validation

Today I wanted to know how to get the functionality of the HTML validator Firefox extension. The extension has two modes: Tidy and SGML parser. Each of these modes reports differently on the HTML under test. Both reports can be useful (I’m not going to get into the differences here).

Specifically, I wanted to be able to generate either a Tidy or an SGML parser report from the command line. And I wanted to be able to run my report for any public Web page.

Tidy

Getting a Tidy validation report on the command line was pretty straightforward for me, because I already use Tidy to indent and validate the HTML I write for clients. The other element I needed was Perl’s LWP module, which I’ve used for years to get remote resources in batches.

I already had Tidy (formerly HTML Tidy) installed on my Windows PC via Cygwin, along with ActivePerl and Perl’s LWP module. With those elements in place, it’s a snap to use the command line to get Tidy validation on any remote page:

lwp-request http://site.com/page.html | tidy -e
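
If you want to script this, Tidy’s exit status is handy: it exits 0 for a clean page, 1 when there are only warnings, and 2 when there are errors. Here is a minimal sketch of a wrapper built on that, assuming a POSIX shell such as Cygwin’s. The function name tidy_check and the FETCH and TIDY variables are my own hypothetical names, not part of LWP or Tidy.

```shell
# FETCH and TIDY are hypothetical override variables (not part of LWP or Tidy).
# By default they name the two commands used in this post.
FETCH="${FETCH:-lwp-request}"
TIDY="${TIDY:-tidy}"

tidy_check() {
    # $1 is the URL of the page under test.
    status=0
    # Tidy prints its report on standard error. "|| status=$?" records the
    # exit status of Tidy (the last command in the pipeline).
    "$FETCH" "$1" | "$TIDY" -e || status=$?
    # Tidy exits 0 for a clean page, 1 for warnings only, 2 for errors.
    case "$status" in
        0) echo "clean: $1" ;;
        1) echo "warnings: $1" ;;
        2) echo "errors: $1" ;;
        *) echo "could not run Tidy (status $status): $1" ;;
    esac
}
```

Calling tidy_check http://site.com/page.html prints Tidy’s usual report followed by a one-line summary, which is easier to scan when checking several pages in a row.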

The onsgmls SGML parser

Installing onsgmls

The Firefox extension documentation states that the SGML-parsing mode is powered by something called OpenJade. OpenJade is apparently a versatile SGML toolkit that can validate and transform XML, SGML and HTML. Its transforms are driven by DSSSL stylesheets rather than XSLT.

After doing some reading, I installed OpenJade via Cygwin. It sounded like I also needed a tool named OpenSP, so I installed that with Cygwin as well.

I also grabbed the W3C’s DTD library and dropped the sgml-lib folder onto my C:\ drive.

Using onsgmls

After more reading, I found that the validation tool is actually named onsgmls, and that it ships with OpenSP rather than with OpenJade itself. At first, I couldn’t get a one-line command-line solution, so I used a temp file :(

lwp-request http://site.com > temp.html

onsgmls -wxml -E0 -s c:\sgml-lib\xml.dcl temp.html

But then I read the onsgmls manpage and learned that to make onsgmls read the code under test from standard input, I just needed to give it a final argument of “-”.

lwp-request http://site.com | onsgmls -wxml -E0 -s c:\sgml-lib\xml.dcl -

XHTML validation in one line! So far this only works with XHTML files.
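
Because “-” is just a stand-in for a file name, one small wrapper can cover both the temp-file and the piped approach. This is only a sketch, assuming a POSIX shell such as Cygwin’s. The function name xhtml_check and the ONSGMLS and XML_DCL variables are hypothetical names of mine, and the default path is this post’s c:\sgml-lib\xml.dcl written with forward slashes, which I am assuming the Cygwin build accepts.

```shell
# ONSGMLS and XML_DCL are hypothetical override variables, not onsgmls features.
ONSGMLS="${ONSGMLS:-onsgmls}"
# Assumption: the sgml-lib folder from this post, written with forward slashes.
XML_DCL="${XML_DCL:-c:/sgml-lib/xml.dcl}"

xhtml_check() {
    # "${1:--}" expands to the first argument, or to "-" (standard input)
    # when the function is called with no file name.
    "$ONSGMLS" -wxml -E0 -s "$XML_DCL" "${1:--}"
}
```

With that in place, lwp-request http://site.com | xhtml_check validates a remote page, and xhtml_check temp.html validates a saved copy.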

At first I hadn’t learned how to get onsgmls to process HTML 4. For validating HTML (as opposed to XHTML), though, swapping in a different SGML declaration and dropping -wxml appears to work:

lwp-request http://onemorebug.com | onsgmls -E0 -s c:\sgml-lib\ISO-HTML\15445.dcl -

The options -wxml -E0 -s mean, respectively:

  1. warn about constructs that XML does not allow (it is the xml.dcl declaration that makes onsgmls parse the document as XML and not plain SGML)
  2. don’t limit the number of errors that will be reported
  3. don’t show the normal parse output, just show errors (if any)
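
To keep the two variants straight, the XHTML and HTML command lines can be folded into one function that picks the SGML declaration by mode. Again, this is a rough sketch under assumptions: sgml_check, ONSGMLS and SGML_LIB are hypothetical names, and SGML_LIB defaults to this post’s sgml-lib folder written with forward slashes.

```shell
# ONSGMLS and SGML_LIB are hypothetical override variables, not onsgmls features.
ONSGMLS="${ONSGMLS:-onsgmls}"
# Assumption: the sgml-lib folder from this post, written with forward slashes.
SGML_LIB="${SGML_LIB:-c:/sgml-lib}"

sgml_check() {
    # $1 selects the mode; the document itself is read from standard input.
    case "$1" in
        # XHTML: XML declaration, plus warnings about non-XML constructs.
        xhtml) set -- -wxml "$SGML_LIB/xml.dcl" ;;
        # HTML: the ISO-HTML declaration, without -wxml.
        html)  set -- "$SGML_LIB/ISO-HTML/15445.dcl" ;;
        *)     echo "usage: sgml_check xhtml|html" >&2; return 64 ;;
    esac
    # -E0: no cap on reported errors. -s: errors only, no parse output.
    # The final "-" tells onsgmls to read standard input.
    "$ONSGMLS" -E0 -s "$@" -
}
```

Usage would look like lwp-request http://site.com | sgml_check xhtml, or the same pipe ending in sgml_check html for an HTML page.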

Command-line vs. Firefox extension

The commands above give exactly the same validation results as the two corresponding modes of the Firefox extension. This was nice to confirm, but not surprising since I was basically just calling the core functionality of the Firefox extension.
