Python - Get text for an XML node including child nodes (or something like this)


I need to get the pure text out of an XML node, including its child nodes (or whatever these strange inner tags are).

Example nodes:

<booktitle> <emphasis type="italic">z</emphasis>  = 63 - 100 </booktitle> 

or:

<booktitle> mtn <emphasis type="italic">z</emphasis>  = 74 - 210 </booktitle> 

I need to get:

z = 63 - 100
mtn z = 74 - 210

Remember, this is just an example! There can be any type of child node inside the booktitle node, and I need the pure text inside booktitle.

I tried:

tagtext = root.find('.//booktitle').text
print(tagtext)

But .text can't deal with these strange XML nodes and gives me a "NoneType" back.

Regards & thanks!

That's not the text of the booktitle node, it's the tail of the emphasis node. You should do something like:

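def parse(el):
    # el.text is the text before the first child; it can be None
    head = (el.text or '').strip()
    text = head + ' ' if head else ''
    # iterating an element yields its direct children (getchildren() was removed in Python 3.9)
    for child in el:
        # child.tail is the text that follows the child's closing tag
        text += '{0} {1}\n'.format((child.text or '').strip(), (child.tail or '').strip())
    return text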

Which gives you:

>>> import xml.etree.ElementTree as et
>>> root = et.fromstring('''
...     <booktitle>
...     <emphasis type="italic">z</emphasis>
...      = 63 - 100
...     </booktitle>''')
>>> print(parse(root))
z = 63 - 100

And for:

>>> root = et.fromstring('''
... <booktitle> mtn <emphasis type="italic">z</emphasis>  = 74 - 210 </booktitle>''')
>>> print(parse(root))
mtn z = 74 - 210

Which should give you a basic idea of what to do.
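To make the text/tail distinction concrete, here is a minimal sketch that inspects the second example node with the standard library's xml.etree.ElementTree. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The second example node from the question, unchanged.
root = ET.fromstring(
    '<booktitle> mtn <emphasis type="italic">z</emphasis>  = 74 - 210 </booktitle>'
)
emphasis = root.find('emphasis')

parent_text = root.text      # text before the first child element
child_text = emphasis.text   # text inside the child element
child_tail = emphasis.tail   # text after the child's closing tag

# repr() makes the surrounding whitespace visible
print(repr(parent_text), repr(child_text), repr(child_tail))
```

So the part after the closing emphasis tag belongs to the child (as its tail), not to booktitle's .text.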

update: fixed whitespace...
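If the child nodes can themselves contain further child nodes, a possible alternative (not part of the answer above, just a sketch) is Element.itertext(), which walks the whole subtree and yields every text and tail in document order. The helper name all_text is made up for this example, and collapsing runs of whitespace into single spaces is an assumption about the output you want.

```python
import xml.etree.ElementTree as ET

def all_text(el):
    """Return all text inside el, at any nesting depth, with whitespace collapsed."""
    # itertext() yields el.text, then each descendant's text and tail, in document order
    raw = ''.join(el.itertext())
    # split() with no arguments drops leading/trailing whitespace and splits on any run of it
    return ' '.join(raw.split())

first = ET.fromstring(
    '<booktitle> <emphasis type="italic">z</emphasis>  = 63 - 100 </booktitle>'
)
second = ET.fromstring(
    '<booktitle> mtn <emphasis type="italic">z</emphasis>  = 74 - 210 </booktitle>'
)

print(all_text(first))   # z = 63 - 100
print(all_text(second))  # mtn z = 74 - 210
```

Note that this joins the pieces exactly as they appear in the XML, so text that has no whitespace around an inner tag stays glued together.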

