Difficulty with ole_type = 'b'
I've found your site useful, and I've taken several examples from it and tried to pull them into a single program for some conversion work we're doing. I seem to be having difficulty with the OLE_TYPE = 'b' documents. I can extract the data to a file, prepend the LZW magic number, and get a valid uncompress, but the document won't open in Word. I'm pretty sure that the file is valid (it opens through Portal/J), and I can see valid content in it (using the strings command), but I'm not sure what's keeping it from opening in Word. I found a program that displays OLE container information (http://sourceforge.net/projects/mvole), and it displays information about the file without errors. Any ideas on how to get over the final hump? Once I get the programs finished and debugged, I plan on submitting them back to the site (since the site helped me write them in the first place).
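For reference, the magic number of a Unix compress (.Z) file is the two bytes 0x1F 0x9D, and it has to come before the compressed data, not after it. Below is a minimal Java sketch of writing an extracted blob as a .Z file. It assumes, as the successful uncompress suggests, that the stored data already begins with the compress flags byte. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: turns an extracted OLE_TYPE = 'b' blob into a .Z file.
class CompressHeader {
    // Magic number identifying a Unix "compress" (.Z) file.
    static final byte[] LZW_MAGIC = {(byte) 0x1F, (byte) 0x9D};

    // Returns a new array holding the magic number followed by the blob.
    // Assumption: the blob already starts with the compress flags byte
    // (max code bits in the low 5 bits, 0x80 = block mode).
    static byte[] withMagic(byte[] blob) {
        byte[] out = new byte[LZW_MAGIC.length + blob.length];
        System.arraycopy(LZW_MAGIC, 0, out, 0, LZW_MAGIC.length);
        System.arraycopy(blob, 0, out, LZW_MAGIC.length, blob.length);
        return out;
    }

    // Writes the result so it can be checked with the system uncompress.
    static void writeZFile(byte[] blob, Path target) throws IOException {
        Files.write(target, withMagic(blob));
    }
}
```

After calling writeZFile(blob, Paths.get("doc.Z")), running `uncompress doc.Z` should leave the raw document in `doc`.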
By Anonymous | Miscellaneous object embedding and linking (OLE) topics
The object class is Word.Document.8. The Unix uncompress succeeds. I had some issues with the way my program used the UncompressInputStream that required some rethinking. I found out that you can't open a new input stream for each row of tidblob data; the rows have to be sent to the UncompressInputStream as a single stream of data. Not an easy task, and I ended up writing (still in progress) a TidBlobInputStream to do that. Actually, when that's done, it should clean up a LOT of code in the main OleRead.
I have access to Linux and HP-UX, and both uncompress versions seem to agree. This isn't a mainframe, so EBCDIC isn't an issue. Only plain American English is in use.
Looking at the files in a hex dump, the file written by the program and a regular Word document seem to start with the same preamble of bytes, so that part looks OK. The fact that the mvole program can decode the entire document without errors also suggests that SOMETHING is valid, but for whatever reason it's throwing Word for a loop. I'm working on a detailed test now and hope to have something concrete today.
By Anonymous
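The JDK already has a class for the "one continuous stream" problem described above. java.io.SequenceInputStream reads a series of streams back to back as if they were one. Below is a minimal sketch. It assumes the tidblob rows for one document have already been fetched as byte arrays in their stored order. TidBlobStreams is a hypothetical name, not the TidBlobInputStream mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: presents the tidblob rows of one document as one stream.
class TidBlobStreams {
    // Chains the chunks, in list order, into a single InputStream so the
    // decompressor sees one continuous compressed stream.
    static InputStream concat(List<byte[]> chunks) {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] chunk : chunks) {
            streams.add(new ByteArrayInputStream(chunk));
        }
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    // Drains a stream into a byte array (handy for testing concat).
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

If your UncompressInputStream takes an InputStream in its constructor (an assumption), it could wrap the result as `new UncompressInputStream(TidBlobStreams.concat(chunks))`. If the magic number isn't stored in the first row, you can add the two bytes 0x1F 0x9D as the first chunk instead of copying the whole blob.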
OLE reading code
This is my revamped OLE reading code for PassPort. This version is driven by the WorkOrderDocs class, and the OleDocument class itself is somewhat tied to work orders. The rest of it could be adapted for other purposes pretty easily by somebody who's good at slinging Java code. I also added code to decode the internals of the "link" type documents, to try to figure out what they were originally linked to and extract that data. We've used it to successfully extract a couple of thousand documents in bulk from our various PassPort databases.
It requires some dependencies, namely the little-endian switcher from Canadian Mind Products, the Apache POI code, and an ODBC driver.
LEDataStream - http://mindprod.com/jgloss/ledatastream.html
People probably already have the Oracle driver handy.
Hopefully this will help somebody.
Eric Ladner
By Anonymous
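If you'd rather not pull in LEDataStream, the JDK's java.nio.ByteBuffer can read little-endian fields directly once its byte order is set. Below is a small sketch with a hypothetical class name. The layout of PassPort's "link" records isn't described here. As a stand-in illustration, consider the OLE2 compound-file header. It stores its byte-order mark (0xFFFE) at offset 28 and its sector shift (9, meaning 512-byte sectors, in version 3 files) at offset 30. Both are little-endian 16-bit values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads unsigned little-endian fields from a byte array.
class LittleEndianFields {
    // Unsigned 16-bit little-endian value at the given offset.
    static int u16(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getShort(offset) & 0xFFFF;
    }

    // Unsigned 32-bit little-endian value at the given offset.
    static long u32(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset) & 0xFFFFFFFFL;
    }
}
```

For a valid compound file, `u16(header, 28)` should return 0xFFFE.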
Thanks for the SourceForge URL. I was unaware of this program.
Sounds like you are very close and will probably have this problem resolved by the time you read this.
If not, what’s the value of TIDOBLOK.OLE_OBJECT_CLASS? Word.Document.8? Word.Document.12 indicates MS Word 2007, which would need to be handled differently than Word 2002 or 2003.
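The class difference also shows up in the first bytes of the extracted file. Word.Document.8 documents are OLE2 compound files, which start with D0 CF 11 E0 A1 B1 1A E1. Word.Document.12 (.docx) documents are Office Open XML packages, i.e. ZIP archives starting with 50 4B 03 04 ("PK"). Below is a minimal sketch of that check, which could also be used to compare the program's output with a regular Word document's preamble. ContainerSniffer is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper: guesses the container type from the leading bytes.
class ContainerSniffer {
    // OLE2 / Compound File Binary signature (Word 97-2003 .doc, Word.Document.8).
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature, used by Office Open XML (.docx, Word.Document.12).
    private static final byte[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Returns "ole2", "zip", or "unknown".
    static String describe(byte[] data) {
        if (startsWith(data, OLE2_SIGNATURE)) {
            return "ole2";
        }
        if (startsWith(data, ZIP_SIGNATURE)) {
            return "zip";
        }
        return "unknown";
    }
}
```

A "zip" result for a Word.Document.8 row, or "unknown" for either class, would suggest the extracted bytes aren't what Word expects.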
Does a Unix uncompress give the same result as UncompressInputStream.java? Do you have other Unix operating systems to try their uncompress?
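One way to answer that is to compare the two outputs byte by byte and find where they first diverge. Below is a minimal sketch with hypothetical names. On Java 12 or later, java.nio.file.Files.mismatch(Path, Path) does the same job for files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: locates the first difference between two outputs.
class OutputCompare {
    // Index of the first differing byte; the shorter length if one array is a
    // prefix of the other; -1 if both are identical.
    static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    // Compares, e.g., the Unix uncompress output with the UncompressInputStream output.
    static int firstMismatchInFiles(Path a, Path b) throws IOException {
        return firstMismatch(Files.readAllBytes(a), Files.readAllBytes(b));
    }
}
```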
Oh. Are you by any chance on an IBM mainframe? There would be an EBCDIC to ASCII conversion involved that I’ve not yet been able to experiment with.
Are languages other than American English involved (code pages, keyboards, the Word document contents)?
Very curious to hear the results.