Python: excluding web links with specific extensions in a web scraper
I need to exclude from printing any links in my web scraper that end in .od, .jpg, .pdf or .mp3. Here's my if statement:
    if link in linklist():
        print(link)
Is there a library in Python that can do that? I know of regex, but I'm not the greatest user of it.
Assuming link is a path, you can do the following:
    import os

    if os.path.splitext(link)[1] not in ['.od', '.jpg', '.pdf', '.mp3']:
        print(link)
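If you would rather skip os.path entirely, str.endswith also accepts a tuple of suffixes and returns True when any of them matches. The sketch below assumes link is a plain string; the helper name is_excluded and the sample links are hypothetical, not part of the original scraper. Lowercasing first means an upper-case extension such as .JPG is caught as well.

```python
# Hypothetical blacklist of extensions to skip (taken from the question)
EXCLUDED_SUFFIXES = ('.od', '.jpg', '.pdf', '.mp3')


def is_excluded(link):
    """Return True if the link ends in one of the blacklisted extensions.

    str.endswith accepts a tuple and returns True if any suffix matches;
    lowercasing the link makes the check case-insensitive.
    """
    return link.lower().endswith(EXCLUDED_SUFFIXES)


# Hypothetical sample links standing in for the scraper's output
links = [
    'http://www.example.com/index.html',
    'http://www.example.com/photo.JPG',
    'http://www.example.com/report.pdf',
    'http://www.example.com/about',
]

for link in links:
    if not is_excluded(link):
        print(link)
```

Only the index.html and about links are printed. Like the splitext version, this looks at the raw string, so a query string after the extension will defeat it.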
The function splitext takes a path and returns a tuple containing the path without the extension, followed by the extension. For example:
    >>> os.path.splitext('http://www.example.com/path/to/filename.ext')
    ('http://www.example.com/path/to/filename', '.ext')
So if you split the link with this function, you can check whether the last element of the tuple is a member of a list, set or tuple containing the blacklist of extensions.
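One caveat: splitext only sees the raw string, so a link such as http://www.example.com/file.pdf?download=1 yields the extension '.pdf?download=1' and slips past the blacklist, and '.JPG' does not match '.jpg'. A minimal sketch that handles both, assuming Python 3 (where urlparse lives in urllib.parse; in Python 2 it is in the urlparse module), parses the URL first, applies splitext to the path component only, and lowercases the extension. The names should_print and EXCLUDED_EXTENSIONS and the sample links are hypothetical; a set is used because membership tests on it are fast.

```python
import os
from urllib.parse import urlparse

# Hypothetical blacklist; a set gives fast membership tests
EXCLUDED_EXTENSIONS = {'.od', '.jpg', '.pdf', '.mp3'}


def should_print(link):
    """Return True unless the link's path ends in a blacklisted extension."""
    # urlparse separates the path from the query string and fragment,
    # so 'file.pdf?download=1' is checked as '/file.pdf'
    path = urlparse(link).path
    # splitext returns (root, ext); lowercase ext so '.JPG' matches '.jpg'
    ext = os.path.splitext(path)[1].lower()
    return ext not in EXCLUDED_EXTENSIONS


# Hypothetical sample links
links = [
    'http://www.example.com/path/to/page.html',
    'http://www.example.com/file.pdf?download=1',
    'http://www.example.com/images/Photo.JPG#top',
    'http://www.example.com/song.mp3',
    'http://www.example.com/about',
]

for link in links:
    if should_print(link):
        print(link)
```

Running it prints only the page.html and about links; the PDF with a query string and the upper-case JPG with a fragment are both filtered out.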