Scrapy: MySQL Pipeline -- Unexpected Errors Encountered
I'm getting a number of errors, depending on what is being inserted or updated.
Here is the code that processes each item:
    def process_item(self, item, spider):
        try:
            if 'producer' in item:
                self.cursor.execute("""insert into producers (title, producer) values (%s, %s)""", (item['title'], item['producer']))
            elif 'actor' in item:
                self.cursor.execute("""insert into actors (title, actor) values (%s, %s)""", (item['title'], item['actor']))
            elif 'director' in item:
                self.cursor.execute("""insert into directors (title, director) values (%s, %s)""", (item['title'], item['director']))
            else:
                self.cursor.execute("""update example_movie set distributor=%S, rating=%s, genre=%s, budget=%s where title=%s""", (item['distributor'], item['rating'], item['genre'], item['budget'], item['title']))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item
Here is an example of the items returned by the scraper:
    [{'budget': [u'n/a'], 'distributor': [u'lorimar'], 'genre': [u'action'], 'rating': [u'r'], 'title': [u'action jackson']},
     {'actor': u'craig t. nelson', 'title': [u'action jackson']},
     {'actor': u'sharon stone', 'title': [u'action jackson']},
     {'actor': u'carl weathers', 'title': [u'action jackson']},
     {'producer': u'joel silver', 'title': [u'action jackson']},
     {'director': u'craig r. baxley', 'title': [u'action jackson']}]
Here are the errors returned:
    2013-08-25 23:04:57-0500 [actorspider] ERROR: Error processing {'budget': [u'n/a'], 'distributor': [u'lorimar'], 'genre': [u'action'], 'rating': [u'r'], 'title': [u'action jackson']}
    Traceback (most recent call last):
      File "/library/python/2.7/site-packages/scrapy/middleware.py", line 62, in _process_chain
        return process_chain(self.methods[methodname], obj, *args)
      File "/library/python/2.7/site-packages/scrapy/utils/defer.py", line 65, in process_chain
        d.callback(input)
      File "/system/library/frameworks/python.framework/versions/2.7/extras/lib/python/twisted/internet/defer.py", line 361, in callback
        self._startRunCallbacks(result)
      File "/system/library/frameworks/python.framework/versions/2.7/extras/lib/python/twisted/internet/defer.py", line 455, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/system/library/frameworks/python.framework/versions/2.7/extras/lib/python/twisted/internet/defer.py", line 542, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/users/fortylashes/documents/management_work/boxofficemojo/boxofficemojo/pipelines.py", line 53, in process_item
        self.cursor.execute("""update example_movie set distributor=%S, rating=%s, genre=%s, budget=%s where title=%s""", (item['distributor'], item['rating'], item['genre'], item['budget'], item['title']))
      File "/opt/local/library/frameworks/python.framework/versions/2.7/lib/python2.7/site-packages/mysqldb/cursors.py", line 159, in execute
        query = query % db.literal(args)
    exceptions.ValueError: unsupported format character 'S' (0x53) at index 38

    Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 'craig t. nelson')' at line 1
    2013-08-25 23:04:57-0500 [actorspider] DEBUG: Scraped from <200 http://www.boxofficemojo.com/movies/?id=actionjackson.htm>
        {'actor': u'craig t. nelson', 'title': [u'action jackson']}
    Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 'sharon stone')' at line 1
    2013-08-25 23:04:57-0500 [actorspider] DEBUG: Scraped from <200 http://www.boxofficemojo.com/movies/?id=actionjackson.htm>
        {'actor': u'sharon stone', 'title': [u'action jackson']}
    Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 'carl weathers')' at line 1
    2013-08-25 23:04:57-0500 [actorspider] DEBUG: Scraped from <200 http://www.boxofficemojo.com/movies/?id=actionjackson.htm>
        {'actor': u'carl weathers', 'title': [u'action jackson']}
    Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 'joel silver')' at line 1
    2013-08-25 23:04:57-0500 [actorspider] DEBUG: Scraped from <200 http://www.boxofficemojo.com/movies/?id=actionjackson.htm>
        {'producer': u'joel silver', 'title': [u'action jackson']}
    Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 'craig r. baxley')' at line 1
    2013-08-25 23:04:57-0500 [actorspider] DEBUG: Scraped from <200 http://www.boxofficemojo.com/movies/?id=actionjackson.htm>
        {'director': u'craig r. baxley', 'title': [u'action jackson']}
Apparently there are a lot of issues. Thanks for reading! Any suggestions or ideas are appreciated!
::::update/more info::::
There appear to be 3 movies, out of a test set of 52 total, that are being inserted into the actors, producers, and directors tables. Note: the update statement isn't working at all.
These movies are: Abraham Lincoln: Vampire Hunter, Ace Ventura: Pet Detective, and Ace Ventura: When Nature Calls.
Interestingly, all of these movies have a ":" in the title. I'm not sure what that means; if anyone has an idea, please share it!
:::::insert solved:::::
It turns out the problem was caused by the scraper putting individual values in a list: {'actor': [u'this 1 guy']} as opposed to {'actor': u'this 1 guy'}.
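For anyone hitting the same thing, here is a minimal sketch of one way to unwrap single-element lists before binding the values to the query. The helper names `first_value` and `normalize_item` are made up for illustration and are not part of Scrapy or MySQLdb; the sketch assumes each field holds either a scalar or a list whose first element is the value you want.

```python
def first_value(value):
    """Return the first element of a list/tuple, or the value itself if it is a scalar."""
    if isinstance(value, (list, tuple)):
        # An empty list has nothing to bind, so fall back to None (SQL NULL).
        return value[0] if value else None
    return value


def normalize_item(item):
    """Build a plain dict in which every field is a scalar (hypothetical helper)."""
    return {key: first_value(value) for key, value in dict(item).items()}


item = {'actor': 'craig t. nelson', 'title': ['action jackson']}
print(normalize_item(item))
```

Calling something like this at the top of `process_item` would mean every parameter passed to `cursor.execute` is a plain string rather than a list, which is what the 1064 errors above seem to be tripping over.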
You have used the wrong format specifier for the string data type at line 53 of your code. It should be a lowercase 's', not a capital 'S'.

    self.cursor.execute("""update example_movie set distributor=%S, rating=%s, genre=%s, budget=%s where title=%s""", (item['distributor'], item['rating'], item['genre'], item['budget'], item['title']))

It should be like this:

    self.cursor.execute("""update example_movie set distributor=%s, rating=%s, genre=%s, budget=%s where title=%s""", (item['distributor'], item['rating'], item['genre'], item['budget'], item['title']))
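To see why the capital letter matters: the traceback shows the driver running `query % db.literal(args)`, which is ordinary Python string formatting, and `%S` is not a valid conversion there. Below is a rough sketch that reproduces the ValueError without a database. The `render` function is a hypothetical stand-in for that step, it does no real SQL escaping, and the query is shortened to a single placeholder.

```python
def render(query, params):
    """Mimic the client-side `query % args` step with plain string formatting.

    Illustration only: repr() is NOT a safe substitute for the driver's escaping.
    """
    return query % tuple(repr(p) for p in params)


bad = "update example_movie set distributor=%S"
good = "update example_movie set distributor=%s"

try:
    render(bad, ["lorimar"])
    error_message = None
except ValueError as exc:
    # Python rejects the unknown conversion character before any SQL is sent.
    error_message = str(exc)

print(error_message)
print(render(good, ["lorimar"]))
```

The failure happens in Python before anything reaches the MySQL server, which is why this error shows up as a ValueError traceback rather than as a MySQL error code.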