Text Processing

I spent quite a bit of time working on that BSD UNIX system at BGSU, as it was the only way to print to the Imagen 8/300 laser printer! It used an early Canon print engine, but I’m not certain which page description languages it supported. I wrote most of my documentation on the VAX, using vi to edit the text and troff (actually ditroff) to format it. One of the manuals I prepared was a guide to all of the computing facilities on campus, which included many tables, so I used the tbl and eqn preprocessors from the Documenter’s Workbench.
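For readers who never wrote tbl input by hand: a table sits between .TS and .TE requests, starts with an options line ending in a semicolon, then one format line per row (the last format line ends in a period and applies to all remaining rows), followed by data lines whose cells are separated by the character declared with tab(). The sketch below generates such a block from Python lists. The make_tbl function and the sample facility data are hypothetical, written only to illustrate the input format; they are not from the manual described above.

```python
def make_tbl(header, rows, sep="|"):
    """Build tbl input for a boxed table: a centred header row followed by
    left-aligned data rows. `sep` is declared with tab() and separates cells.
    (Hypothetical helper for illustration.)"""
    ncols = len(header)
    for row in [header, *rows]:
        if len(row) != ncols:
            raise ValueError("all rows must have as many cells as the header")
        if any(sep in cell for cell in row):
            raise ValueError(f"cell text may not contain the separator {sep!r}")
    lines = [
        ".TS",                          # start of table
        f"box tab({sep});",             # options line ends with ';'
        " ".join(["c"] * ncols),        # format for row 1: centred header cells
        " ".join(["l"] * ncols) + ".",  # last format line ends with '.', covers all later rows
        sep.join(header),
        *(sep.join(row) for row in rows),
        ".TE",                          # end of table
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data.
    print(make_tbl(["Facility", "Hours"], [["Lab A", "8-5"], ["Lab B", "24h"]]))
```

The output would then go through the preprocessor chain, roughly `tbl file | eqn | troff -ms` in the classic setup, or `groff -t -e -ms file` with groff; the -ms macro package is an assumption here, and the output device depends on the local installation.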

Unix Text Processing, by Dale Dougherty and Tim O’Reilly, was published by Hayden Books in 1987, back when O’Reilly & Associates wrote technical documentation for hire. Hayden later let the book go out of print, but Dale and Tim retained the copyright and have made it available through their website under the Creative Commons Attribution License. It is offered through the O’Reilly & Associates Open Books project as a PDF of scanned images of the original (about 27 MB). Some members of the groff mailing list have transcribed the scanned book back into source form, intending to update the material. They have posted the source, along with PostScript and PDF versions, on Larry Koller’s website.

I’m also interested in DocBook, a document type definition for SGML (Standard Generalized Markup Language) and XML that can produce both HTML and print formats such as RTF or PDF from the same source. Its markup describes content rather than visual appearance. In the SGML toolchain, formatting is done by applying DSSSL stylesheets, such as the set Norm Walsh maintains for DocBook, with the OpenJADE engine.

DocBook source is plain text and can be edited in any text editor, including vi or Emacs; Emacs major modes are available that simplify entering the tag markup. Specialized editors are also available, such as oXygen XML Editor, XMLSpy, or the shareware utility UltraEdit, along with a number of Java-based editors like jEdit and Xerlin.

Markus Hoenicka has written a fabulous tutorial on installing all of the SGML/XML tools you need to process DocBook on Windows, either standalone or with the Cygwin tools. It hasn’t been updated in a while, but it’s still a good outline to work from. Newer versions of many of the tools exist, so be sure to grab the current ones! A more current tutorial can be found at Lars Vogel’s site.

Mezis gathered several XML-related tools into a comprehensive DocBook processing package for Mac OS X, initially for the team at Project:Omega. The result of this work is the DocBook-X package, which also has a tutorial. For the Website DTD, I use xsltproc on Mac OS X, obtained from www.Zveno.com.
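The idea of marking up content and letting each output format decide appearance can be shown with a toy renderer. The sketch below is not DSSSL, XSLT, OpenJADE, or xsltproc; it is a hypothetical stand-in using Python’s standard xml.etree.ElementTree. The element names (article, title, para, command, emphasis) are real DocBook elements, but the sample text, the style maps, and the functions inline, to_html, and to_text are assumptions made up for illustration. One source fragment is rendered two ways, once as HTML and once as plain text.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical DocBook fragment: tags say what things ARE, not how they look.
SOURCE = """<article>
  <title>Printing on Campus</title>
  <para>Submit jobs with <command>lpr</command>, and check the queue
  with <command>lpq</command>.</para>
  <para>Large jobs <emphasis>must</emphasis> be sent after hours.</para>
</article>"""

# Hypothetical per-format style maps: each output decides the appearance.
HTML_INLINE = {"command": ("<code>", "</code>"), "emphasis": ("<em>", "</em>")}
TEXT_INLINE = {"command": ("`", "`"), "emphasis": ("*", "*")}


def inline(elem, styles, esc):
    """Render an element's mixed content (text, child elements, tails),
    wrapping each child with the start/end markers from `styles`."""
    parts = [esc(elem.text or "")]
    for child in elem:
        start, end = styles.get(child.tag, ("", ""))
        parts.append(start + inline(child, styles, esc) + end)
        parts.append(esc(child.tail or ""))
    return "".join(parts)


def squash(s):
    """Collapse runs of whitespace (including source line breaks) to one space."""
    return " ".join(s.split())


def to_html(root):
    title = squash(inline(root.find("title"), HTML_INLINE, escape))
    paras = [f"<p>{squash(inline(p, HTML_INLINE, escape))}</p>"
             for p in root.findall("para")]
    return "\n".join([f"<h1>{title}</h1>", *paras])


def to_text(root):
    # str() leaves text unchanged: plain text needs no escaping.
    title = squash(inline(root.find("title"), TEXT_INLINE, str))
    paras = [squash(inline(p, TEXT_INLINE, str)) for p in root.findall("para")]
    header = f"{title}\n{'=' * len(title)}"
    return "\n\n".join([header, *paras])


if __name__ == "__main__":
    doc = ET.fromstring(SOURCE)
    print(to_html(doc))
    print()
    print(to_text(doc))
```

Changing how a command looks means editing one style map, not the source document. In a real XML DocBook toolchain, a stylesheet plays the role of these maps, for example `xsltproc stylesheet.xsl doc.xml`, where the stylesheet path depends on your installation.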