
Review: Davisor Offisor 1.5.1

by Drew Falkman

Summary

Sometimes it's the most mundane, seemingly basic tasks that end up taking a lot of time and effort. I've found this to be particularly true with content management, especially when dealing with Microsoft Word documents and getting them onto Web sites. Davisor Offisor is a Java library that helps developers convert Word documents into a format that is much easier to work with: eXtensible Markup Language (XML). In this review, I take a look at Offisor to see whether it can help us with development.

Introduction

As Internet technology has evolved, so too have document formats: we now have PDF, a pretty solid HTML standard, and XML. In theory, as more end users move toward universal document formats, content management should become easier. Unfortunately, in most circumstances these newer formats require special tools or technical understanding. And let's be realistic: most people still use Microsoft Word. Anyone who has worked with Word documents, or even with Word's HTML/XML output, knows that this is not an easy format to work with. Tools like Macromedia Dreamweaver MX even include a dedicated process to "Clean up Word HTML," as Dreamweaver calls it. Microsoft seems to be addressing this issue by adding significant XML support in Office 2003, but many users are still on Word 2002, 2000, 97 or earlier, or lack the understanding (or the inclination to obtain it) needed to work with XML. This is where Davisor Offisor can help.

How Offisor Works

One of the nice things about Offisor is that it doesn't require any proprietary plug-ins or libraries, as you might expect when working with Microsoft formats. Offisor will work in any Java application, on Windows, Linux or any other platform. The only requirement is a SAX (1 or 2) compatible XML parser. Version 1.5.1 handles two basic types of files: standard Word documents (versions 6, 95, 97 and 2000; though it is undocumented, I also had luck with 2002) and "real-world" HTML files. The real-world HTML parser is a nice addition to the package, as it will parse looser, sloppier HTML (what Davisor calls "almost-but-not-quite compliant" HTML) into XML. This allows developers to create a universal XML storage format for any HTML and Word documents imported into an application.
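Because Offisor's only stated dependency is a SAX parser, it can be worth confirming that one is actually available before wiring Offisor in. The minimal sketch below uses only the standard JAXP API (`javax.xml.parsers`, bundled with Java since J2SE 1.4) and does not touch Offisor itself; the class name `SaxCheck` and the sample XML string are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCheck {

    /** SAX 2 handler that just counts start tags, enough to prove the event pipeline works. */
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses XML text with whatever SAX parser JAXP supplies and returns the element count. */
    public static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // SAX 2 style namespace handling
            SAXParser parser = factory.newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (Exception e) {
            // Covers a missing/misconfigured parser as well as malformed input.
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document; not Offisor output.
        String sample = "<doc><p>Hello</p><p>World</p></doc>";
        System.out.println("Elements: " + countElements(sample));
    }
}
```

If `countElements` throws, the JVM has no usable SAX parser configured, and Offisor's requirement is not met either.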

Using Offisor is straightforward, to say the least. There are two primary classes used to parse documents: com.davisor.ms.doc.DocParser and com.davisor.xml.html.HTMLParser. As you have probably surmised, these process Word documents and HTML documents respectively. The examples included with Offisor are quite handy and provide a good look at how to use the API to transform documents. Beyond these two classes, the API is quite comprehensive, with a number of core utility classes, interfaces and exceptions that you can use when coding with Offisor.
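Since the two parser classes split the work by input type, an application that imports mixed uploads needs some way to choose between them. A minimal sketch, assuming routing by file extension (the review doesn't say whether Offisor can detect the format itself): the class `ParserRouter` and its method `parserFor` are hypothetical, and they only return the class names given above. They do not load or call Offisor, whose method signatures aren't covered in this review.

```java
import java.util.Locale;

public class ParserRouter {

    // Class names as given for Offisor 1.5.1; this sketch never loads or calls them.
    static final String WORD_PARSER = "com.davisor.ms.doc.DocParser";
    static final String HTML_PARSER = "com.davisor.xml.html.HTMLParser";

    /**
     * Hypothetical helper: picks the Offisor parser class for a file by its extension.
     * Returns null when neither parser applies.
     */
    public static String parserFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".doc")) {
            return WORD_PARSER;
        }
        if (lower.endsWith(".html") || lower.endsWith(".htm")) {
            return HTML_PARSER;
        }
        return null; // unsupported type; the caller decides how to reject it
    }

    public static void main(String[] args) {
        String[] uploads = {"Report.DOC", "index.htm", "notes.txt"};
        for (String name : uploads) {
            System.out.println(name + " -> " + parserFor(name));
        }
    }
}
```

Keeping the choice in one place means the rest of the import code can treat both document types as a single XML pipeline, which is the "universal XML storage" idea the review describes.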

Setup, Installation and Documentation

Setting up Offisor on my computer was a simple task. The zip file I downloaded included a WAR file, which I deployed on my JRun 4 server. Everything worked on the first try! The download also includes the examples and a good deal of documentation: the Offisor user's manual, the API docs, a guide to obfuscating Offisor code (useful if a developer wants to include it in a larger software package), information about the output XML format, some sample transformation style documents and a version history. Frankly, this was more than I expected from a relatively simple tool (from an implementation standpoint, at least).
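The sample transformation style documents point to the usual next step: running the XML that Offisor produces through an XSLT stylesheet to get Web-ready HTML. The sketch below uses the standard JAXP `javax.xml.transform` API. The `<document>`/`<para>` element names and the stylesheet are hypothetical stand-ins, not Offisor's actual output format, which is described in the documentation shipped with the download.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StyleTransform {

    // Hypothetical stylesheet: wraps the document in HTML and maps each <para> to <p>.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/document\">"
      + "<html><body><xsl:apply-templates select=\"para\"/></body></html>"
      + "</xsl:template>"
      + "<xsl:template match=\"para\"><p><xsl:value-of select=\".\"/></p></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML string and returns the serialized result. */
    public static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Thrown for a bad stylesheet or malformed input XML.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical XML standing in for converted Word content.
        String sample = "<document><para>First</para><para>Second</para></document>";
        System.out.println(transform(sample, XSLT));
    }
}
```

Keeping the presentation in a stylesheet rather than in Java code means the same stored XML can be re-rendered for a new site design without re-parsing the original Word files.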
