ruby - Regex to get ID from link URL
I have links like this:
<div class="zg_title"> <a href="http://rads.stackoverflow.com/amzn/click/b000o3gcfu">thermos foogo leak-proof stainless st...</a> </div>
and I'm scraping them like this:
product_asin = product.xpath('//div[@class="zg_title"]/a/@href').first.value
The problem is that this returns the whole URL, and I only want the ID:
b000o3gcfu
I think I need something like this:
product_asin = product.xpath('//div[@class="zg_title"]/a/@href').first.value[regex_here]
What's the simplest regex I can use in this case?
Edit:
Strange, the link URL above doesn't appear complete. The full URL is:
http://www.amazon.com/thermos-foogo-leak-proof-stainless-10-ounce/dp/b000o3gcfu/ref=zg_bs_baby-products_1
Use /\w+$/:
p doc.xpath('//div[@class="zg_title"]/a/@href').first.value[/\w+$/]
/\w+$/ matches the trailing run of letters, digits, and underscores.
require 'nokogiri'

s = <<EOF
<div class="zg_title">
  <a href="http://rads.stackoverflow.com/amzn/click/b000o3gcfu">thermos foogo leak-proof stainless st...</a>
</div>
EOF

# Parse the HTML, take the first matching href, and keep only its trailing word characters.
doc = Nokogiri::HTML(s)
p doc.xpath('//div[@class="zg_title"]/a/@href').first.value[/\w+$/]
# => "b000o3gcfu"
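One caveat: /\w+$/ only works when the ID is the last thing in the URL. For the longer Amazon URL quoted in the question's edit, the ID sits after /dp/ and is followed by a ref= segment, so the trailing match would pick up part of that segment instead. A minimal sketch that handles both shapes, assuming the ID always follows /dp/ in the long form (extract_asin is a hypothetical helper name, not part of the original answer):

```ruby
# Hypothetical helper (not from the original answer): pull the ID out of
# either URL shape quoted in the question.
def extract_asin(url)
  # Long form: capture the path segment right after /dp/ (assumed layout).
  # Short form: fall back to the trailing word characters.
  url[%r{/dp/(\w+)}, 1] || url[/\w+$/]
end

short_url = "http://rads.stackoverflow.com/amzn/click/b000o3gcfu"
full_url  = "http://www.amazon.com/thermos-foogo-leak-proof-stainless-10-ounce/dp/b000o3gcfu/ref=zg_bs_baby-products_1"

p extract_asin(short_url) # => "b000o3gcfu"
p extract_asin(full_url)  # => "b000o3gcfu"
p full_url[/\w+$/]        # => "products_1"
```

String#[] with a regex and a capture index returns just that group, or nil when nothing matches, which is what lets the || fallback work.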