Python - Get text for XML node including child nodes (or something like this)
I need to get the pure text out of an XML node, including its child nodes, or whatever these strange inner tags are called.
Example nodes:

    <booktitle> <emphasis type="italic">z</emphasis> = 63 - 100 </booktitle>

or:

    <booktitle> mtn <emphasis type="italic">z</emphasis> = 74 - 210 </booktitle>

I need to get:

    z = 63 - 100
    mtn z = 74 - 210

Remember, this is just an example! There can be any type of "child nodes" inside the booktitle node, and I need the pure text inside booktitle.
I tried:

    tagtext = root.find('.//booktitle').text
    print(tagtext)

but .text can't deal with these strange XML nodes and gives me "NoneType" back.
Regards & thanks!
That's not the text of the booktitle node; it's the tail of the emphasis node. You should do something like:

    def parse(el):
        # .text and .tail are None when empty, so fall back to ''
        text = (el.text or '').strip()
        text = text + ' ' if text else ''
        for child in el:  # getchildren() was removed in Python 3.9
            text += '{0} {1}\n'.format((child.text or '').strip(),
                                       (child.tail or '').strip())
        return text

which gives you:

    >>> import xml.etree.ElementTree as ET
    >>> root = ET.fromstring('''<booktitle> <emphasis type="italic">z</emphasis> = 63 - 100 </booktitle>''')
    >>> print(parse(root))
    z = 63 - 100

and for:

    >>> root = ET.fromstring('''<booktitle> mtn <emphasis type="italic">z</emphasis> = 74 - 210 </booktitle>''')
    >>> print(parse(root))
    mtn z = 74 - 210

which should give you a basic idea of what to do.
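
To see why .text alone comes back empty, it can help to print where ElementTree actually stores each piece of text. The snippet below is only an illustrative sketch; the variable names and the tightened-up sample XML (no whitespace around the tags) are assumptions made for the demonstration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<booktitle>mtn <emphasis type="italic">z</emphasis> = 74 - 210</booktitle>'
)
emphasis = root[0]  # first (and only) child element

# Text before the first child belongs to the parent's .text
print(repr(root.text))
# Text inside the child is the child's .text
print(repr(emphasis.text))
# Text after the child's closing tag is the child's .tail, not the parent's text
print(repr(emphasis.tail))
# The root element has nothing after its closing tag, so its .tail is None
print(repr(root.tail))

# With no text before the first child, the parent's .text is None
no_lead = ET.fromstring(
    '<booktitle><emphasis type="italic">z</emphasis> = 63 - 100</booktitle>'
)
print(repr(no_lead.text))
```

This is why a plain `.text` lookup on booktitle misses "= 63 - 100": that string hangs off the emphasis element as its tail.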
Update: fixed whitespace...
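
As an alternative to walking the children by hand, the standard library's Element.itertext() yields the text and tail strings of an element and all of its descendants in document order, so it also copes with deeper nesting. This is a minimal sketch and not part of the answer above; the helper name all_text and the whitespace normalization are assumptions about what "pure text" should look like.

```python
import xml.etree.ElementTree as ET


def all_text(el):
    """Return every piece of text inside el, with runs of whitespace collapsed."""
    # itertext() yields el.text, then each descendant's text and tail in order
    raw = ''.join(el.itertext())
    # split() with no arguments drops leading/trailing whitespace and collapses runs
    return ' '.join(raw.split())


samples = [
    '<booktitle> <emphasis type="italic">z</emphasis> = 63 - 100 </booktitle>',
    '<booktitle> mtn <emphasis type="italic">z</emphasis> = 74 - 210 </booktitle>',
]

for xml in samples:
    print(all_text(ET.fromstring(xml)))
```

Note that itertext() joins adjacent pieces exactly as they appear in the source, so words are only separated if the XML itself contains whitespace between them.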