Read embedded PDF file in Excel using Java


I am new to Java programming. My current project requires me to read embedded (OLE) files in an Excel sheet and get the text contents in them. Examples for reading an embedded Word file worked fine, but I am unable to find one for reading an embedded PDF file. I tried a few things by looking at similar examples, but they didn't work out.

http://poi.apache.org/spreadsheet/quick-guide.html#embedded

I have the code below; hopefully someone can point me in the right direction. I have used Apache POI to read the embedded files in Excel and PDFBox to parse the PDF data.

    public class ReadExcel1 {
        public static void main(String[] args) {
            try {
                FileInputStream file = new FileInputStream(new File("c:\\test.xls"));
                POIFSFileSystem fs = new POIFSFileSystem(file);
                HSSFWorkbook workbook = new HSSFWorkbook(fs);

                // Walk over every OLE object embedded in the workbook
                for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
                    String oleName = obj.getOLE2ClassName();
                    if (oleName.equals("Acrobat Document")) {
                        System.out.println("acrobat reader document");

                        try {
                            DirectoryNode dn = (DirectoryNode) obj.getDirectory();
                            for (Iterator<Entry> entries = dn.getEntries(); entries.hasNext();) {
                                DocumentEntry nativeEntry = (DocumentEntry) dn.getEntry("CONTENTS");
                                // Buffer sized to the entry; handed to the PDF parser below
                                byte[] data = new byte[nativeEntry.getSize()];

                                ByteArrayInputStream bao = new ByteArrayInputStream(data);
                                PDFParser pdfParser = new PDFParser(bao);

                                pdfParser.parse();
                                COSDocument cosDoc = pdfParser.getDocument();
                                PDFTextStripper pdfStripper = new PDFTextStripper();
                                PDDocument pdDoc = new PDDocument(cosDoc);
                                pdfStripper.setStartPage(1);
                                pdfStripper.setEndPage(2);
                                System.out.println("text pdf " + pdfStripper.getText(pdDoc));
                            }
                        } catch (Exception e) {
                            System.out.println("error reading " + e.getMessage());
                        } finally {
                            System.out.println("finally ");
                        }
                    } else {
                        System.out.println("nothing ");
                    }
                }

                file.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Below is the output in Eclipse:

    acrobat reader document
    error reading Error: End-of-File, expected line
    nothing

The PDFs weren't OLE 1.0 packaged, but somehow differently embedded; at least the extraction worked for me. This is not a general solution, because it depends on how the embedding application names the entries. Of course, for PDFs you could check all DocumentNodes for the magic number "%PDF", and in the case of OLE 1.0 packaged elements this needs to be done differently.

I think the real filename of the PDF is hidden somewhere in the \1Ole or CompObj entries, but for the example, and apparently for your use case, it's not necessary to determine it.

    import java.io.*;
    import java.net.URL;
    import org.apache.poi.hssf.usermodel.*;
    import org.apache.poi.poifs.filesystem.*;
    import org.apache.poi.util.IOUtils;

    public class EmbeddedPdfInExcel {
        public static void main(String[] args) throws Exception {
            // Open the sample workbook straight from the URL
            NPOIFSFileSystem fs = new NPOIFSFileSystem(new URL("http://jamesshaji.com/sample.xls").openStream());
            HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
            for (HSSFObjectData obj : wb.getAllEmbeddedObjects()) {
                String oleName = obj.getOLE2ClassName();
                DirectoryNode dn = (DirectoryNode) obj.getDirectory();
                // Only handle Acrobat objects that expose a CONTENTS entry
                if (oleName.contains("Acro") && dn.hasEntry("CONTENTS")) {
                    InputStream is = dn.createDocumentInputStream("CONTENTS");
                    // Write the raw PDF bytes to <directory name>.pdf
                    FileOutputStream fos = new FileOutputStream(obj.getDirectory().getName() + ".pdf");
                    IOUtils.copy(is, fos);
                    fos.close();
                    is.close();
                }
            }
            fs.close();
        }
    }
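As a side note, the code in the question allocates a byte array of the entry's size but never reads the entry into it, so the parser only sees zero bytes. The "%PDF" magic-number check mentioned above can catch that kind of problem before the data goes to a PDF parser. Below is a minimal sketch using only the JDK; the class name `PdfMagicCheck` and the helpers `readFully` and `looksLikePdf` are made up for illustration and are not part of POI or PDFBox. With POI, the input stream would presumably come from something like `dn.createDocumentInputStream("CONTENTS")` as in the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only
public class PdfMagicCheck {
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    // Reads the whole stream into a byte array
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // True if the data starts with the bytes "%PDF"
    public static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an embedded entry's stream
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
        byte[] filled = readFully(in);
        // Allocated but never filled, as in the question
        byte[] unfilled = new byte[16];

        System.out.println(looksLikePdf(filled));   // prints "true"
        System.out.println(looksLikePdf(unfilled)); // prints "false"
    }
}
```

Checking only the first four bytes is a simplification: it assumes the PDF header sits at offset 0 of the entry, which may not hold for every embedding application.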
