python - Remove content between <div> and <a href> with Beautiful Soup


I have a piece of code to parse webpages. I want to remove the content between div, a href, and h1 tags.

    opener = urllib2.build_opener()
    opener.addheaders = [('user-agent', 'mozilla/5.0')]
    url = "http://en.wikipedia.org/wiki/viscosity"
    try:
        oururl = opener.open(url).read()
    except Exception, err:
        pass
    soup = BeautifulSoup(oururl)
    dem = soup.findAll('p')
    for i in dem:
        print i.text

I want to print the text without the content between the h1 and a href tags, as mentioned above.

Edit: From the comment, "I want to return the text that is not between <div> and </div> tags." This should strip out any blocks whose parent is a div tag:

    import bs4

    raw = '''
    <html>
    text
    <div>
    avoid
    </div>
    <p>
    nested
    <div>
    don't get me either
    </div>
    </p>
    </html>
    '''

    def check_for_div_parent(mark):
        # Walk up the tree until we hit a div (skip) or the top (keep)
        mark = mark.parent
        if mark is None or 'html' == mark.name:
            return False
        if 'div' == mark.name:
            return True
        return check_for_div_parent(mark)

    soup = bs4.BeautifulSoup(raw)

    for text in soup.findAll(text=True):
        if not check_for_div_parent(text):
            print text.strip()

This results in two pieces of text, ignoring the ones inside a div:

    text
    nested
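For readers on Python 3 with a current bs4, the recursive parent check can probably be replaced by the built-in `find_parent`. The following is only a sketch under those assumptions: the function name `text_outside_divs`, the sample markup, and the choice of the `html.parser` backend are mine, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the example above (illustrative only).
RAW = """
<html>
text
<div>
avoid
</div>
<p>
nested
<div>
don't get me either
</div>
</p>
</html>
"""


def text_outside_divs(html):
    """Return the non-empty text nodes that have no <div> ancestor."""
    soup = BeautifulSoup(html, "html.parser")
    kept = []
    for node in soup.find_all(string=True):
        # find_parent walks up the tree and returns None if no <div> is found
        if node.find_parent("div") is not None:
            continue
        stripped = node.strip()
        if stripped:  # drop whitespace-only nodes
            kept.append(stripped)
    return kept


print(text_outside_divs(RAW))
```

Unlike the loop above, this version also drops whitespace-only strings, so it yields just the two visible pieces of text.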

Original response

It's unclear what you are trying to do exactly. First, you should try to post a full working example, as you seem to be missing the import headers. Secondly, Wikipedia seems to have a stance against "bots" or automated downloaders:

Python's `urllib2`: why do I get error 403 when I `urlopen` a Wikipedia page?

This can be avoided with the following lines of code:

    import urllib2, bs4

    url = r"http://en.wikipedia.org/wiki/viscosity"
    req = urllib2.Request(url, headers={'user-agent': "magic browser"})
    con = urllib2.urlopen(req)
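On Python 3, `urllib2` no longer exists; its request machinery lives in `urllib.request`. A rough equivalent of the request above might look like the sketch below. The helper names `build_request` and `fetch` are hypothetical, and the block only builds the request at import time so that it can be run without network access.

```python
import urllib.request

URL = "http://en.wikipedia.org/wiki/viscosity"


def build_request(url, user_agent="magic browser"):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download the page body as bytes (this performs a network call)."""
    with urllib.request.urlopen(build_request(url)) as con:
        return con.read()


req = build_request(URL)
# urllib stores header names via str.capitalize(), hence "User-agent" here
print(req.get_header("User-agent"))
```

Calling `fetch(URL)` would then return the raw bytes that can be handed to Beautiful Soup.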

Now that you have the page, I think you want to extract the main text using bs4. You can do this with:

    soup = bs4.BeautifulSoup(con.read())
    start_pos = soup.find('h1').parent

    for p in start_pos.findAll('p'):
        para = ''.join([text for text in p.findAll(text=True)])
        print para

This gives me text that looks like:

The viscosity of a fluid is a measure of its resistance to gradual deformation by shear stress or tensile stress. For liquids, it corresponds to the informal notion of "thickness". For example, honey has a higher viscosity than water.[1] Viscosity is due to friction between neighboring parcels of the fluid that are moving at different velocities. When fluid is forced through a tube, the fluid moves faster near the axis than near the walls, therefore some stress (such as a pressure difference between the two ends of the tube) is needed to overcome the friction between layers and keep the fluid moving. For the same velocity pattern, the stress required is proportional to the fluid's viscosity. A liquid's viscosity depends on the size and shape of its particles and the attractions between the particles.[citation needed]
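The same "start from the parent of the h1 and collect its paragraphs" idea can be tried offline on a small document. This is a Python 3 sketch only: the markup in `SAMPLE` is invented for illustration and is not Wikipedia's real page structure, and `paragraphs_near_h1` is a hypothetical helper name. It uses `get_text()`, which joins all text nodes inside a tag much like the `''.join(...)` expression above.

```python
from bs4 import BeautifulSoup

# Invented sample markup; a real page will be structured differently.
SAMPLE = """
<div id="content">
  <h1>Viscosity</h1>
  <p>The <b>viscosity</b> of a fluid is a measure of its <a href="#">resistance</a>.</p>
  <p>Second paragraph.</p>
</div>
<div id="footer"><p>Footer text.</p></div>
"""


def paragraphs_near_h1(html):
    """Return the text of every <p> under the parent of the first <h1>."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    if heading is None:  # no heading, nothing to anchor on
        return []
    container = heading.parent
    # get_text() flattens nested tags such as <b> and <a> into plain text
    return [p.get_text().strip() for p in container.find_all("p")]


for para in paragraphs_near_h1(SAMPLE):
    print(para)
```

Because the search starts at the h1's parent, the paragraph in the footer div is not collected.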

