First off, why SAS XPT? SAS XPT is the format that the FDA requires for New Drug Applications as part of the drug development process. It's the lowest common denominator for representing and transferring clinical data. And, because of US law (and the fact that it's just the right thing to do), the FDA must remain vendor-neutral. As part of that neutrality, SAS XPT must be an open format. From the SAS FDA FAQ:
Q. What do you mean, the XPORT format is "open?" A. Specifications for the XPORT transport format are in the public domain. Data can be translated to and from the XPORT transport format to other commonly used formats without the use of programs from SAS Institute or any specific vendor
The specification can be found here. My initial thinking was, there's gotta be some sort of open source implementation of this stuff, and apparently I wasn't the only one thinking that. I had seen some things in R that could read SAS XPT, but nothing that could write it. Why not? I'm guessing there isn't much out there because it's hard to do, and once people crack it, they don't want to share :)
In my opinion, the hardest part of the specification is the fact that "All integers are stored using IBM-style integer format, and all floating-point numbers are stored using the IBM-style double." Really? Instead of using the common IEEE format, the FDA mandates something as archaic as IBM floating point? Fortunately, one of my developers is good with C programming, and I had just enough experience on the mainframe to come up with a solution.
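For the curious, here's a minimal sketch of what an IEEE-to-IBM conversion can look like. It's plain Java and it is not our production code; the names IbmDouble, toIbmBits, and toIbmBytes are made up for illustration. It assumes the usual IBM hexadecimal double layout (1 sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction), and it simply rejects NaN and infinity rather than dealing with SAS missing values.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: converts IEEE 754 doubles to IBM hexadecimal doubles.
final class IbmDouble {
    private IbmDouble() {}

    /** Returns the 64-bit IBM hexadecimal floating-point pattern for value. */
    static long toIbmBits(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            // Assumption: missing values are handled elsewhere, not here.
            throw new IllegalArgumentException("No IBM representation for " + value);
        }
        long bits = Double.doubleToLongBits(value);
        long sign = bits & 0x8000000000000000L;
        int ieeeExponent = (int) ((bits >>> 52) & 0x7FF);
        if (ieeeExponent == 0) {
            // Zero, or a subnormal far below the IBM range: write a true zero.
            return 0L;
        }
        // 53-bit significand with the implicit leading 1 restored.
        long significand = (bits & 0x000FFFFFFFFFFFFFL) | (1L << 52);
        int power = ieeeExponent - 1023;            // unbiased base-2 exponent

        // value = significand * 2^(power - 52) = fraction * 2^-56 * 16^(ibm - 64)
        int ibmExponent = Math.floorDiv(power, 4) + 1 + 64;
        int shift = Math.floorMod(power, 4);        // 0..3, keeps fraction < 2^56
        if (ibmExponent > 127) {
            throw new ArithmeticException("Too large for an IBM double: " + value);
        }
        if (ibmExponent < 0) {
            return 0L;                              // underflow
        }
        long fraction = significand << shift;
        return sign | ((long) ibmExponent << 56) | fraction;
    }

    /** Same pattern as eight big-endian bytes, ready to write to a stream. */
    static byte[] toIbmBytes(double value) {
        return ByteBuffer.allocate(8).putLong(toIbmBits(value)).array();
    }
}
```

With this sketch, 1.0 comes out as the bytes 41 10 00 00 00 00 00 00. Because an IEEE significand has 53 bits and the shift is at most 3, everything fits in the 56-bit IBM fraction with no rounding.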
So yeah... our reporting engine runs on a JVM, and Groovy was a natural fit to develop this with. Why Groovy instead of just Java?
- Groovy I/O - leveraging the dynamic left shift operator is just a natural way to write to a java.io.OutputStream and build the XPT file. Furthermore, because Groovy doesn't force me to catch or declare checked exceptions, I don't have to constantly handle java.io.IOException.
- Duck Typing - as we write out each data point, if the object acts like a duck, quacks like a duck... it's a duck. With SAS XPT, you define a type for each column. And since Groovy has a dynamic toDouble() method on java.lang.String and all implementations of java.lang.Number, the code doesn't care about the underlying object type passed in... it's just converted to a double.
- Because Groovy is still Java - even using the Groovy programming language, we can still do the bit banging necessary to convert IEEE to IBM's Floating Point.
- Incredibly useful dynamic methods for implementing some of the subtleties of the specification. A few gems are java.lang.String's pad methods, and java.util.Date's getAt method.
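To make the I/O side a little more concrete, here's a rough Java sketch of the kind of fixed-width writing involved: blank-padding a string and appending a big-endian integer. The layout (an 8-character name followed by a 2-byte length) and the names XptFieldSketch, padRight, and encode are hypothetical, chosen only for illustration, and the sketch assumes that "IBM-style integer" means big-endian two's complement. In Groovy, String's pad methods and the left shift operator take over the padding and the stream writes shown here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: illustrates fixed-width, big-endian field encoding.
final class XptFieldSketch {
    private XptFieldSketch() {}

    /** Returns text as ASCII bytes, blank-padded (or truncated) to width. */
    static byte[] padRight(String text, int width) {
        byte[] padded = new byte[width];
        Arrays.fill(padded, (byte) ' ');
        byte[] source = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(source, 0, padded, 0, Math.min(source.length, width));
        return padded;
    }

    /** Encodes an 8-byte padded name followed by a 2-byte big-endian length. */
    static byte[] encode(String name, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] paddedName = padRight(name, 8);
        out.write(paddedName, 0, paddedName.length);
        out.write((length >>> 8) & 0xFF);   // high byte first (big-endian)
        out.write(length & 0xFF);           // low byte second
        return out.toByteArray();
    }
}
```

Writing into a ByteArrayOutputStream keeps the sketch free of checked exceptions; with a general java.io.OutputStream, Java would make you handle java.io.IOException on every write, which is exactly the ceremony Groovy lets you skip.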


