Microsoft's next version of Office will use XML to store its data by default, and that XML format will be something we can get the spec for and use royalty-free.

This is, of course, massive. Not very original - the actual format description (XML files stored inside a ZIP file) should sound suspiciously familiar to users of OpenOffice.org's products, for starters. But the Office document formats have been a pain for years, as have most other formats.
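To make the "XML files stored inside a ZIP file" idea concrete, here is a minimal sketch using only the Python standard library. It is not the actual Office (or OpenOffice.org) layout: the part name `content.xml`, the `document` and `paragraph` elements, and both function names are assumptions made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package(body_text: str) -> bytes:
    """Build a tiny ZIP package holding one XML part (hypothetical layout)."""
    # Hypothetical element names; a real format defines its own schema.
    root = ET.Element("document")
    ET.SubElement(root, "paragraph").text = body_text
    xml_bytes = ET.tostring(root, encoding="utf-8")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical part name inside the archive.
        zf.writestr("content.xml", xml_bytes)
    return buf.getvalue()


def read_paragraphs(package: bytes) -> list[str]:
    """Open the ZIP package and return the text of each paragraph element."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text or "" for p in root.iter("paragraph")]
```

The point of the sketch is that any tool with a ZIP library and an XML parser can get at the content, which is exactly what the old binary formats never allowed.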

Take RTF, for instance. Rich Text Format. What could be more of a lingua franca than the venerable RTF that we all know and love? You know - the one that's akin to a Word 6.0 document from years ago, and hasn't changed since?

Well, even there we're ignoring the fact that there are loads of different RTF specifications from different vendors, and different products use different RTF standards. (Like Notes, for instance - there's a reason that the development documentation constantly calls it "Notes Rich Text", and that reason is to avoid confusion.)

But even with Microsoft's own RTF specification, they're up to version 1.8. Which was news to me, because I thought that they were still on 1.7 - this change must have come about within the last year or so. So much for an unchanging, constant format. In fact, RTF seems to change once per Office release (funnily enough), and what most people think of as RTF is somewhere around the version 1.1/1.2 mark. And many users - even technical experts - assume that the RTF that Word exports is this RTF. Well, I can introduce you to some developers who work with RTF, and have some unkind things to say about how compliant Word's RTF export is with the current standards - let alone how well many RTF libraries handle the format. The bottom line is that you should never assume that RTF is just RTF and can be read by anything - RTF is a minefield of incompatibility in real life.

Because XML is almost self-describing, I'd expect better compliance from all programs outputting these new XML formats, as otherwise they won't validate. So this is a good thing, I hope. There will still no doubt be the ever-present almost yearly feature creep from Microsoft, which will mean that there will be a mad scramble to implement the new features in import/export filters. But otherwise, this is a positive move towards interoperability.
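As a rough illustration of the kind of automatic check XML makes possible, the sketch below tests whether a document is well-formed using Python's standard library. Note the assumption: this is only a well-formedness check, a weaker stand-in for validating against a published schema, which the standard library does not do. The function name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    This does NOT validate against a schema; it only checks that the
    markup is structurally sound (tags closed and properly nested).
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A program that writes mismatched tags gets caught by even this basic check, whereas a sloppy RTF export can go unnoticed until some other application chokes on it.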

Of course, I note that only the document formats are open. No opening of a format for Outlook, for instance - nor for Access. Richard Schwartz asks whether or not IBM should open up the NSF format, and I think that the absence of an open Outlook or Access storage format shows us that he's on the wrong track. This is about opening up formats for data interchange - and you're more likely to send an individual item than a whole collection that's stored in a database.

Stan Rogers pointed out in a comment that this has serious security implications, as a lot of the Notes security relies upon the API rather than any inherent security in the format (unless you use the encryption facilities, of course!).

Basically, I don't see the need to open up database formats like NSF. I'd rather see the access methods for those storage formats opened up - which we already have with Notes, through access to NSFs via the Notes client (LotusScript/Formula language/Java), DXL, COM, C and C++ APIs. They could perhaps be made a bit better, but APIs are certainly the way to go when you're looking at a database, and Notes offers the most choice of any database format I know of when it comes to allowing access via APIs.

And most importantly, an API means that we don't suffer massive problems with different sub-versions of NSF out there. If the NSF format were documented, we might end up with the same kind of mess that RTF ended up in. Which would be totally unacceptable for NSF, given how many users it has and the good standing it maintains with them...

Philip Storry June 6th, 2005 20:05:22

 Comments
1 Office, XML and NSF...
Brian Benz 07/06/2005 00:52:45

Hi Philip - Agreed - added my own posting here -

{ Link }

The announcement that the next version of all MS Office documents will be based on XML actually gives us an interesting peek into the MS mindset when it comes to software. Office file formats are not containers; they are a way of transporting information. The file system (the Windows file system, of course...) is the container. And that won't be XML. Smart.

2 Office, XML and NSF...
Richard Schwartz 07/06/2005 05:45:38

My response back in my blog

{ Link }

-rich

