python - Get text for an XML node including child nodes (or something like this)
I have to get the pure text out of an XML node, no matter which child nodes or other strange inner tags it contains.
Example nodes:
<booktitle> <emphasis type="italic">z</emphasis> = 63 - 100 </booktitle>
Or:
<booktitle> mtn <emphasis type="italic">z</emphasis> = 74 - 210 </booktitle>
I need to get `z = 63 - 100` and `mtn z = 74 - 210` respectively.
Remember, this is only an example! There could be any type of child nodes inside the booktitle node; I just need the pure text inside booktitle.
I tried:

    tagtext = root.find('.//booktitle').text
    print(tagtext)

But `.text` can't deal with these strange XML nodes and gives me `NoneType` back.
Regards & thanks!
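In ElementTree, the text inside a node is split between the node's own `.text` (only the text before its first child) and each child's `.tail` (the text after that child's closing tag). A minimal sketch, assuming the standard library's `xml.etree.ElementTree` (the `find('.//booktitle')` call above matches its API), prints what each attribute holds for the first example node. It also shows one plausible way `.text` comes back as `None`: when the node starts immediately with a child element.

```python
import xml.etree.ElementTree as ET

# First example node from the question
booktitle = ET.fromstring(
    '<booktitle> <emphasis type="italic">z</emphasis> = 63 - 100 </booktitle>')
emphasis = booktitle.find('emphasis')

# .text is only the text before the first child element
print(repr(booktitle.text))  # ' '
# The child's own content is the child's .text ...
print(repr(emphasis.text))   # 'z'
# ... and the text after </emphasis> belongs to the child's .tail
print(repr(emphasis.tail))   # ' = 63 - 100 '

# With nothing before the first child, .text is None
tight = ET.fromstring(
    '<booktitle><emphasis type="italic">z</emphasis> = 63 - 100</booktitle>')
print(tight.text)            # None
```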
That's not the `text` of the `booktitle` node; the part after `</emphasis>` is the `tail` of the `emphasis` node. You should do something like:
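
    def parse(el):
        # text directly inside el, before its first child (may be None)
        text = (el.text or '').strip()
        text = text + ' ' if text else ''
        # getchildren() is deprecated (removed in Python 3.9); iterate directly
        for child in el:
            # child.text is the child's own content, child.tail the text after it
            text += '{0} {1}\n'.format((child.text or '').strip(),
                                       (child.tail or '').strip())
        return text
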
Which gives you:

    >>> root = et.fromstring('''<booktitle> <emphasis type="italic">z</emphasis> = 63 - 100 </booktitle>''')
    >>> print(parse(root))
    z = 63 - 100

And for:

    >>> root = et.fromstring('''<booktitle> mtn <emphasis type="italic">z</emphasis> = 74 - 210 </booktitle>''')
    >>> print(parse(root))
    mtn z = 74 - 210

That should give you the basic idea of what to do.
Update: fixed the whitespace handling.
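Since the question says any kind of child node may appear, possibly nested, note that `parse()` above only looks one level deep: it reads each child's `text` and `tail`, so text inside grandchildren is lost. A different technique, `Element.itertext()` from the standard library's `xml.etree.ElementTree`, walks the element and all its descendants in document order and yields every text and tail fragment (but not the element's own tail). The sketch below assumes ElementTree and that collapsing runs of whitespace to single spaces is acceptable; `node_text` is a hypothetical helper name, and the third `booktitle` is a made-up nested example.

```python
import xml.etree.ElementTree as ET

def node_text(el):
    """Return all text inside el, including nested children,
    with runs of whitespace collapsed to single spaces."""
    # itertext() yields el.text, then each descendant's text and tail,
    # in document order; el's own tail is not included
    return ' '.join(''.join(el.itertext()).split())

root = ET.fromstring(
    '<root>'
    '<booktitle> <emphasis type="italic">z</emphasis> = 63 - 100 </booktitle>'
    '<booktitle> mtn <emphasis type="italic">z</emphasis> = 74 - 210 </booktitle>'
    '<booktitle>a <b>deeply <i>nested</i> tag</b> here</booktitle>'
    '</root>')

for title in root.iter('booktitle'):
    print(node_text(title))
# z = 63 - 100
# mtn z = 74 - 210
# a deeply nested tag here
```

Joining the fragments with `''` before collapsing whitespace keeps adjacent pieces such as `<b>foo</b>bar` together as `foobar`, exactly as they appear in the document; join with `' '` instead if you would rather always separate child text from its surroundings.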