java - How to decode special characters with Apache Tika
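I'm using Apache Tika to parse MS Word documents into HTML (as a String). The problem is that the documents contain special characters (e.g. mathematical operators), and these are not decoded correctly in the result. Is there a way to solve this? Thank you for your help.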
Input:
Output:
Source code:
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = null;
try {
    handler = factory.newTransformerHandler();
} catch (TransformerConfigurationException e) {
    logger.warn(String.format("SAX processing not available: %s", e));
    return;
}
// Serialize the SAX events as indented, UTF-8 encoded XML
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
handler.setResult(new StreamResult(output)); // output is a StringWriter
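One way to narrow the problem down is to run the same serializer configuration in isolation, without Tika, and push a few mathematical operators through it by hand. The following is only a minimal sketch under that assumption: the class name `SerializerCheck`, the method `serialize`, and the `<p>` element are hypothetical and chosen for illustration, and it uses nothing beyond the JDK's own `javax.xml.transform` and SAX classes.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: checks the serializer setup on its own, without Tika.
public class SerializerCheck {

    // Wraps the given text in a single <p> element and serializes it
    // with the same output properties as the handler in the question.
    static String serialize(String text)
            throws TransformerConfigurationException, SAXException {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter output = new StringWriter();
        handler.setResult(new StreamResult(output));

        // Emit the SAX events by hand, as a parser would.
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "p", "p");
        handler.endDocument();

        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        // U+2264 (less-than-or-equal) and U+2211 (n-ary summation)
        System.out.println(serialize("a \u2264 b, \u2211 x"));
    }
}
```

If the operators survive this round trip, the serializer settings are probably not the cause, and the next things to check would be the characters Tika actually hands to the handler and the encoding used wherever the resulting String is later written or displayed. In the real setup the same handler would be passed to the Tika parser's `parse` call in place of the hand-written SAX events.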