I have a spider that crawls links for the websites passed to it. When its execution finishes, I want to start the same spider again with a different set of data. How can I restart the same crawler? The websites come from a database, and I want the crawler to keep running in a loop until all the websites have been crawled. Currently I have to start the crawler with `scrapy crawl first` every time. Is there any way to start the crawler once and have it stop only when all the websites have been crawled?
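A minimal sketch of one common alternative, assuming Scrapy 1.0 or later (which the traceback below suggests): instead of restarting from inside the spider, drive repeated runs from one standalone script with `CrawlerRunner`, chaining the crawls with Twisted's `inlineCallbacks` so each batch starts only after the previous run has closed. A Twisted reactor can be started only once per process, so the whole loop lives inside a single `reactor.run()`. The SQLite file, the `websites` table (columns `url`, `crawled`) and the helpers `fetch_pending`/`mark_crawled` are hypothetical stand-ins for the real database code, the import path `web_link_crawler.spiders.first` is guessed from the traceback, and the sketch assumes the `closed_handler` restart logic is removed from the spider's `__init__`.

```python
import sqlite3


def fetch_pending(conn, limit=50):
    """Return up to `limit` URLs that are not yet marked as crawled.

    Assumes a hypothetical table: websites(url TEXT, crawled INTEGER).
    """
    rows = conn.execute(
        "SELECT url FROM websites WHERE crawled = 0 LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in rows]


def mark_crawled(conn, urls):
    """Flag a finished batch so the next query skips it."""
    conn.executemany(
        "UPDATE websites SET crawled = 1 WHERE url = ?", [(u,) for u in urls]
    )
    conn.commit()


def run_until_done(db_path="websites.db", batch_size=50):
    """Run the spider batch after batch inside a single reactor run."""
    # Scrapy/Twisted are imported here so the helpers above stay usable
    # (and testable) without them.
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings
    from twisted.internet import defer, reactor

    # Assumed import path, guessed from the file location in the traceback.
    from web_link_crawler.spiders.first import MySpider

    configure_logging()
    runner = CrawlerRunner(get_project_settings())
    conn = sqlite3.connect(db_path)

    @defer.inlineCallbacks
    def crawl_batches():
        try:
            while True:
                urls = fetch_pending(conn, batch_size)
                if not urls:
                    break  # nothing left: every website has been crawled
                # runner.crawl() returns a Deferred that fires when this run
                # finishes, so the next batch waits for the previous one.
                # With the base Spider.__init__, keyword arguments become
                # spider attributes (here: start_urls).
                yield runner.crawl(MySpider, start_urls=urls)
                mark_crawled(conn, urls)
        finally:
            conn.close()
            reactor.stop()

    # Start the loop only once the reactor is running, so reactor.stop()
    # is always called on a running reactor.
    reactor.callWhenRunning(crawl_batches)
    reactor.run()  # blocks until crawl_batches() stops the reactor
```

A launcher script would then call `run_until_done()` and be started with `python` instead of `scrapy crawl first`; it has to run from the project directory so `get_project_settings()` can locate `scrapy.cfg`. Marking a batch as crawled only after its run closes is a deliberately simple choice; per-URL bookkeeping would be more robust if a run is interrupted.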
I searched for this and found a suggestion to handle the crawler once it is closed/finished, but I don't know how to start the spider again programmatically from the `closed_handler` method.
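Connecting to `spider_closed` is fine for cleanup, but calling `reactor.stop()` followed by `reactor.run()` inside the handler cannot work: the reactor is still running at that point, and once it has actually stopped it cannot be restarted in the same process (`ReactorNotRestartable`). A hedged sketch of attaching a `spider_closed` handler through the crawler's own signal manager rather than the global `dispatcher`, assuming Scrapy 1.0+ where `CrawlerRunner.create_crawler()` exists; `on_spider_closed` and `crawl_with_close_hook` are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


def on_spider_closed(spider, reason):
    """spider_closed handler; Scrapy sends the spider and the close reason."""
    logger.info("Spider %s closed (%s)", spider.name, reason)


def crawl_with_close_hook(runner, spidercls, **spider_kwargs):
    """Start one crawl through an existing CrawlerRunner with a close hook.

    Returns the Deferred from runner.crawl(), which fires once the run
    has finished.
    """
    from scrapy import signals  # imported lazily; requires Scrapy 1.0+

    crawler = runner.create_crawler(spidercls)
    # Connect on this crawler's signal manager instead of the global
    # dispatcher, so the handler only hears about this crawler's spider.
    crawler.signals.connect(on_spider_closed, signal=signals.spider_closed)
    return runner.crawl(crawler, **spider_kwargs)
```

The handler is a natural place for per-run cleanup or bookkeeping; starting the next run is better left to whatever code is waiting on the returned Deferred.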
The following is my code:
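```python
from scrapy import signals
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy.signalmanager import SignalManager
from scrapy.spiders import CrawlSpider
from scrapy.xlib.pydispatch import dispatcher
from twisted.internet import reactor


class MySpider(CrawlSpider):

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        SignalManager(dispatcher.Any).connect(
            self.closed_handler, signal=signals.spider_closed)

    def closed_handler(self, spider):
        reactor.stop()
        settings = Settings()
        crawler = Crawler(settings)  # the AttributeError below is raised here
        crawler.signals.connect(spider.spider_closing, signal=signals.spider_closed)
        crawler.configure()
        crawler.crawl(MySpider())
        crawler.start()
        reactor.run()

    # code for getting the websites from the database
    name = "first"

    def parse_url(self, response):
        ...
```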
I am getting the error:
```
Error caught on signal handler: <bound method ?.closed_handler of <MySpider 'first' at 0x40f8c70>>
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "c:\python27\lib\site-packages\scrapy\xlib\pydispatch\robustapply.py", line 57, in robustApply
    return receiver(*arguments, **named)
  File "G:\Scrapy\web_link_crawler\web_link_crawler\spiders\first.py", line 72, in closed_handler
    crawler = Crawler(settings)
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 32, in __init__
    self.spidercls.update_settings(self.settings)
AttributeError: 'Settings' object has no attribute 'update_settings'
```
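The last two frames point at the immediate cause: in Scrapy 1.0 and later, `Crawler.__init__` takes the spider class as its first argument and calls `spidercls.update_settings(...)` on it right away, so `Crawler(settings)` treats the `Settings` object as a spider class. A hedged sketch of building a `Crawler` the 1.0 way; `start_crawl` is a hypothetical helper, and the 0.24-era `configure()`/`start()` calls are not part of the 1.0 `Crawler` API, where `crawl()` builds the spider itself from its arguments and returns a Deferred.

```python
def start_crawl(spidercls, settings_dict=None, **spider_kwargs):
    """Build a Scrapy 1.0+ Crawler correctly and start one crawl.

    Returns the Deferred from Crawler.crawl(); a running Twisted reactor
    is still needed for the crawl to make progress.
    """
    from scrapy.crawler import Crawler  # imported lazily; requires Scrapy
    from scrapy.settings import Settings

    # The spider *class* comes first, then the Settings object.
    crawler = Crawler(spidercls, Settings(settings_dict or {}))
    # crawl() instantiates the spider from these keyword arguments,
    # so no MySpider() instance is passed in.
    return crawler.crawl(**spider_kwargs)
```

In most scripts `CrawlerRunner` or `CrawlerProcess` is the more convenient entry point, since they create and track `Crawler` objects with this same signature.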
Is this the right way to get this done? Or is there any other way? Please help!