java - How to decode special characters with Apache Tika


I'm using Apache Tika to parse MS Word documents to HTML (as a string). The problem is that the documents contain special characters (e.g. mathematical operators), and these do not come out correctly. Is there a way to solve this? Thanks for any help.

Input: [image]

Output: [image]

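Source code:

    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;

    // Fragment from inside a method; logger and output are declared elsewhere.
    SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = null;

    try {
        handler = factory.newTransformerHandler();
    } catch (TransformerConfigurationException e) {
        // %s added so the exception actually appears in the message
        logger.warn(String.format("SAX processing not available: %s", e));
        return;
    }

    // Serialize the SAX events as indented XML, declared as UTF-8
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "utf-8");
    handler.setResult(new StreamResult(output)); // output is a StringWriter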
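One way to narrow the problem down is to check whether the serialization step alone preserves non-ASCII characters, independently of Tika. The following is a minimal sketch using only JDK classes: it feeds SAX events containing two mathematical operators straight into a TransformerHandler configured like the one above. The class name `HandlerCheck`, the method `serialize`, and the sample text are hypothetical and chosen for illustration; this is a diagnostic under those assumptions, not a fix.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, not part of Tika or of the original code.
final class HandlerCheck {

    // Sends one <p> element containing `text` through a TransformerHandler
    // and returns the serialized XML as a String.
    static String serialize(String text) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter output = new StringWriter();
        handler.setResult(new StreamResult(output));

        // Emit the SAX events by hand instead of letting a parser do it.
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement("", "p", "p");
        handler.endDocument();
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        // U+2264 (less-than-or-equal) and U+2211 (n-ary summation)
        String result = serialize("a \u2264 b, \u2211 x");
        System.out.println(result);
        System.out.println("operators kept literally: "
                + (result.contains("\u2264") && result.contains("\u2211")));
    }
}
```

If the operators survive here, the handler configuration is probably not what loses them, and the characters delivered by the Word parser (or the way the resulting string is later displayed) would be the next thing to inspect.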

