Review: Davisor Offisor 1.5.1
by Drew Falkman
Summary
Sometimes it's the most mundane, seemingly basic tasks that end
up taking a lot of time and effort to deal with. I've found this
to be particularly true with content management - especially
dealing with Microsoft Word documents and getting them to work
on Web sites. Davisor Offisor is a Java tool library to help
developers handle Word documents and get them into an easier
format to work with: eXtensible Markup Language (XML). In this
review, I will take a look at Offisor to see whether it can help
us with development.
Introduction
As Internet technology has evolved, so too have the formats of
documents -- we now have PDF, a pretty solid HTML standard and
XML. In theory, as more end-users move towards universal
document formats, this should make the prospect of content
management easier. Unfortunately, in most circumstances all of
these newer formats require special tools or technical
understanding. And let's be realistic; most people still use
Microsoft Word. Anyone who has worked with Word documents, and
even the HTML/XML output of Word documents, knows that this is
not an easy format to work with. Tools like Macromedia
Dreamweaver MX even have special processes to, as Dreamweaver
calls it, "Clean up Word HTML". Microsoft seems to be addressing
this issue by adding significant XML support in Office 2003, but
many users are still using Word 2002, 2000, 97 and earlier, or
lack the understanding (or the inclination to obtain it)
necessary to work with XML. This is the arena in which Davisor
Offisor can help.
How Offisor Works
One of the nice things about Offisor is that it doesn't require
any proprietary plug-ins or libraries, such as you might expect
when working with Microsoft formats. Offisor will work in any
native Java application, on Windows, Linux or whatever. The only
requirement is a SAX (1 or 2) compatible XML parser. In version
1.5.1, Offisor handles two basic types of files: standard Word
docs (versions 6, 95, 97 and 2000, and though undocumented I had
luck with 2002) and "real-world" HTML files. The real-world HTML
parser is a nice addition to the package, as it will parse
looser and sloppier (as Davisor calls it,
"almost-but-not-quite compliant") HTML into XML, allowing
developers to create a universal XML storage paradigm for any
HTML and Word documents that are imported into an application.
Using Offisor is straightforward, to say the least. Two primary
classes are used to parse documents:
com.davisor.ms.doc.DocParser and
com.davisor.xml.html.HTMLParser. As you have probably surmised,
these process Word docs and HTML documents respectively.
The examples included with Offisor are actually quite handy and
provide a good look at how to use the API to transform
documents. Additionally, the API is quite comprehensive: a
number of core classes include utilities, interfaces and
exceptions that you can use when coding with Offisor.
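To give a feel for what working with the converted XML looks like,
here is a minimal sketch of a standard SAX handler that collects
paragraph text. It uses only the JDK's own SAX parser and does not
call Offisor itself, since the method signatures of DocParser and
HTMLParser are not covered in this review. The element names
"document" and "p" and the class name ParagraphCollector are
assumptions made for illustration; they are not Offisor's actual
output vocabulary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical consumer of converted XML; element names are illustrative only.
public class ParagraphCollector extends DefaultHandler {
    private final List<String> paragraphs = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <p> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("p".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text inside nested inline elements is appended too.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(qName) && current != null) {
            paragraphs.add(current.toString().trim());
            current = null;
        }
    }

    // Parses the XML string and returns the text of each <p> element in order.
    public static List<String> collect(String xml) {
        ParagraphCollector handler = new ParagraphCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.paragraphs;
    }

    public static void main(String[] args) {
        // Stand-in for XML produced from a Word or HTML document.
        String xml = "<document><p>First paragraph.</p>"
                + "<p>Second <b>bold</b> one.</p></document>";
        for (String p : collect(xml)) {
            System.out.println(p);
        }
    }
}
```

The same handler would work unchanged whether the XML came from a
Word document or from an HTML page, which is the practical appeal of
a single XML storage format.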
Setup, Installation and Documentation
Setting up Offisor on my computer was a simple task. The zip I
downloaded included a WAR file which I deployed on my JRun 4
server. Everything worked on the first try! The download also
includes the examples and a good bit of documentation. The
documentation includes the Offisor user's manual, the API docs,
a guide for obfuscating Offisor code (if a developer wanted to
include this code in a larger software package), information
about the output XML format, some sample transformation style
documents and a version history. Frankly, this was more than I
expected from a relatively simple tool (from an implementation
standpoint at least).
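The transformation style documents mentioned above suggest a common
follow-up step: turning the stored XML back into HTML for display.
The sketch below assumes those documents are XSLT stylesheets and
uses the JDK's standard javax.xml.transform API. The stylesheet, the
"document" and "p" element names, and the class name XmlToHtml are
all made up for illustration and are not taken from the Offisor
distribution.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical XML-to-HTML step; not part of Offisor.
public class XmlToHtml {
    // Illustrative stylesheet: wraps each <p> of a <document> in an HTML page.
    public static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:template match='/document'>"
            + "<html><body><xsl:apply-templates/></body></html>"
            + "</xsl:template>"
            + "<xsl:template match='p'>"
            + "<p><xsl:value-of select='.'/></p>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the given XSLT stylesheet to the XML and returns the result as text.
    public static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for XML produced from a Word or HTML document.
        String xml = "<document><p>Hello from a converted document.</p></document>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```

Keeping the stylesheet separate from the Java code means the
presentation can change without touching the conversion step.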