2013-09-21 6 views
6

Zbieram witrynę, aby sprawdzić stan w magazynie różnych produktów. Niestety wymaga to kliknięcia "Dodaj do koszyka" na stronie produktu i sprawdzenia komunikatu na następnej stronie w celu ustalenia, czy dostępne są zapasy (to znaczy wymaga dwóch odpowiedzi).Dlaczego mój pająk do scrapy nie śledzi wywołania zwrotnego żądania w funkcji parsowania moich produktów?

Podążałem za excellent documentation dla tego scenariusza i napisałem funkcję parsowania, aby zwrócić obiekt Request z wywołania zwrotnego do mojej funkcji analizy drugorzędnej. Jednak ta funkcja rzadko jest wywoływana. Większość produktów powoduje, że w dzienniku pojawia się tylko komunikat "Przed zwróceniem żądania", ale jest on prawidłowo wywoływany w przypadku niewielkiej części produktów.

Jakąkolwiek wskazówkę, co tu jest nie tak? Skończyły mi się pomysły.

foo/spiders/atlantic_firearms_spider.py:

from scrapy.contrib.spiders import CrawlSpider, Rule 
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
from scrapy.selector import HtmlXPathSelector 
from scrapy.http import FormRequest 
from foo.items import AtlanticFirearmsItem 

import datetime 
import re 

class AtlanticFirearmsSpider(CrawlSpider): 
    name = "atlantic_firearms" 
    allowed_domains = ["atlanticfirearms.com"] 
    start_urls = [ 
     "http://www.atlanticfirearms.com" 
    ] 

    rules = (
     Rule(SgmlLinkExtractor(allow=['detail.html']), callback='parse_product'), 
     Rule(SgmlLinkExtractor(allow=[], deny=['/bro', '/news', '/howtobuy', '/component/search', 'askquestion'])), 
    ) 

    def parse_product(self, response): 
     hxs = HtmlXPathSelector(response) 
     product = AtlanticFirearmsItem() 
     add_to_cart = any([hxs.select("descendant-or-self::input[@name = 'addtocart']"), 
         hxs.select("descendant-or-self::input[@value = 'Add to Cart']"), 
         hxs.select("//a[text() = 'Add to Cart']")]) 
     product['url'] = response.url 
     product['as_of_time'] = datetime.datetime.now() 

     if add_to_cart: 
      # attempt to add to cart to verify availability 
      request = FormRequest.from_response(response, formname="addtocartForm", callback=self.parse_add_to_cart) 
      request.meta['product'] = product 
      print "Before return request" 
      return request 
     else: 
      product['in_stock'] = False 
      return product 

    def parse_add_to_cart(self, response): 
     print "Inside parse_add_to_cart" 
     product = response.meta['product'] 
     hxs = HtmlXPathSelector(response) 
     product['in_stock'] = not(hxs.select("//text()[contains(.,'We regret to inform you that this product')]")) 
     return product 

foo/items.py:

from scrapy.item import Item, Field 

class AtlanticFirearmsItem(Item): 
    in_stock = Field() 
    url = Field() 
    as_of_time = Field() 

Edit: dodanie pliku dziennika na żądanie:

2013-09-21 07:25:14-0500 [scrapy] INFO: Scrapy 0.18.2 started (bot: foo) 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Optional features available: ssl, http11 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Overridden settings: {'SPIDER_MODULES': ['foo.spiders'], 'BOT_NAME': 'foo'} 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRef 
reshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Spider opened 
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com> (referer: None) 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cloudflare.com': <GET http://www.cloudflare.com/email-protection> 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.constantcontact.com': <GET http://www.constantcontact.com/jmml/email-marketing.jsp> 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.fdicreative.com': <GET http://www.fdicreative.com/> 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redjacketfirearms.com': <GET https://www.redjacketfirearms.com/> 
2013-09-21 07:25:17-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-ammunition-45acp-500-round-case-detail. 
html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.h 
tml?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Filtered duplicate request: <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pisto 
l-9mm-detail.html?Itemid=0> - no more duplicates will be shown (see DUPEFILTER_CLASS) 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-223-ar15-rifle-ammo-500-round-case-deta 
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/us-palm-air-save-plate-carrier-detail.html?I 
temid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/545-x-39-russian-ak74-ammo-1080-round-case-d 
etail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/red-army-standard-7-62x39mm-360-round-range- 
pack-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-mp5-style-rifle-detail.html?Itemid=0> (
referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/wolf-ammunition-for-sale-ak47-detail.html?Item 
id=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/dsa-zm4-flat-top-ar15-carbine-dszm4cv1r-detail.html 
?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/m92-ak47-yugoslavian-7-62x39mm-bolt-hold-open- 
metal-mags-pack-of-two-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-v94-9mm-mp5-style-pistol-full-size-deta 
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/zastava-ak-47-m70b1-pap-7-62x39mm-rifles-w-2-hi-cap 
-mags-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ptr-91-gi-rifle-939-atlanticfirearms-com-detail.htm 
l?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/pap-m92-7-62x39-pistol-detail.html?Itemid=0> (refer 
er: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/content/article/86-static-pages/159-resources.html> (referer: http://www.atlan 
ticfirearms.com) 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atsconsultingcorp.com': <GET http://www.atsconsultingcorp.com/>         [52/1905] 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bullseyemarket.com': <GET http://www.bullseyemarket.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.corilam.com': <GET http://www.corilam.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'chancebrownrealestate.com': <GET http://chancebrownrealestate.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.delsolservices.com': <GET http://www.delsolservices.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.elkhornoutfitters.com': <GET http://www.elkhornoutfitters.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.frontierlogistics.com': <GET http://www.frontierlogistics.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gpstrackingkey.com': <GET http://www.gpstrackingkey.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.hanshawkennedy.com': <GET http://www.hanshawkennedy.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'worldenv.com': <GET http://worldenv.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'purgexonline.com': <GET http://purgexonline.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bumpfirestocks.com': <GET http://bumpfirestocks.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texrestaurantequipment.com': <GET http://www.texrestaurantequipment.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.houston-refinance.com': <GET http://www.houston-refinance.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'johnson-bryan.com': <GET http://johnson-bryan.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'kanesforms.com': <GET http://kanesforms.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.markfoxrealestate.com': <GET http://www.markfoxrealestate.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.mphoa.org': <GET http://www.mphoa.org/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.outfitterwebsites.com': <GET http://www.outfitterwebsites.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'outdoortrailsnetwork.com': <GET http://outdoortrailsnetwork.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.psychologicalriskservices.com': <GET http://www.psychologicalriskservices.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rcshouston.com': <GET http://www.rcshouston.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rollingcreekcarwash.com': <GET http://www.rollingcreekcarwash.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'slammc.com': <GET http://slammc.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texassaltwaterfishingguide.com': <GET http://www.texassaltwaterfishingguide.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.waynepigment.com': <GET http://www.waynepigment.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bancroftfeldman.com': <GET http://bancroftfeldman.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'elilanddesign.com': <GET http://elilanddesign.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpharms.com': <GET http://dpharms.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'contractlandstaff.com': <GET http://contractlandstaff.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'knightsplumbing.com': <GET http://knightsplumbing.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ati-omni-5-56-poly-competition-m4-carbine-detail.ht 
ml?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-accessories/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/dallas-gun-shop.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-accessories/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/component/content/?Itemid=803&id=148> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/houston-texas-gun-shop.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/california-gun-shop.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/browse-our-products.html> (referer: http://www.atlanticfirearms.com/component/virtuemart 
/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0) 
Inside parse_add_to_cart 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Scraped from <200 http://www.atlanticfirearms.com/browse-our-products.html> 
     {'as_of_time': datetime.datetime(2013, 9, 21, 7, 25, 18, 365559), 
     'in_stock': True, 
     'url': 'http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0'} 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/www.atlanticfirearms.com> (referer: http://www.atlanticfirearms.com/dallas-gun-shop.html 
) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/login-or-register/editaddress.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/privacy-policy.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/subscribe.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/links.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunbroker.com': <GET http://www.gunbroker.com/user/dealernetwork.asp> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.auctionarms.com': <GET http://www.auctionarms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunsamerica.com': <GET http://www.gunsamerica.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ar15.com': <GET http://www.ar15.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.olyarms.com': <GET http://www.olyarms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cheaperthandirt.com': <GET http://www.cheaperthandirt.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ammoman.com': <GET http://www.ammoman.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.net': <GET http://www.ak47.net/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atf.treas.gov': <GET http://www.atf.treas.gov/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'caag.state.ca.us': <GET http://caag.state.ca.us/firearms/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.nra.org': <GET http://www.nra.org/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.masterpiecearms.com': <GET http://www.masterpiecearms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'atlantic1.readyhosting.com': <GET http://atlantic1.readyhosting.com/programming/listview.asp?CatId=2> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vulcanarmament.com': <GET http://www.vulcanarmament.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bushmaster.com': <GET http://www.bushmaster.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rockriverarms.com': <GET http://www.rockriverarms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpmsinc.com': <GET http://dpmsinc.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.colt.com': <GET http://www.colt.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.armalite.com': <GET http://www.armalite.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redstick-firearms.com': <GET http://www.redstick-firearms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vectorarms.com': <GET http://www.vectorarms.com/indexframe.html> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.arsenalinc.com': <GET http://www.arsenalinc.com/about.htm> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.com': <GET http://www.ak47.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.jldenter.com': <GET http://www.jldenter.com/store/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.springfield-armory.com': <GET http://www.springfield-armory.com/index.shtml> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.dsarms.com': <GET http://www.dsarms.com/> 
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force 
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT twice, forcing unclean shutdown 
+2

Może url jesteś nie jest wymagane w "dozwolonych_domach" – ben

+0

możesz opublikować swój dziennik konsoli? (z LOG_LEVEL = 'DEBUG', domyślna wartość w settings.py) –

+0

@pault. dodano dziennik zgodnie z żądaniem. Jak widać większość elementów jest w trakcie drukowania "Before return request", ale potem znika w czarnej dziurze, ponieważ wywołanie zwrotne 'parse_add_to_cart' nigdy nie jest do nich wywoływane. ["vector-arms-sp89-k-style-pistol-9mm"] (http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol- 9mm-detail.html) konsekwentnie robi to aż do "parse_add_to_cart" z jakiegoś powodu. –

Odpowiedz

23

zamieszczaniu mój wcześniejszy komentarz jako odpowiedź.

Ponieważ wszystkie żądania POST (pochodzących z FormRequest.from_response()) przekierowany do http://www.atlanticfirearms.com/browse-our-products.html należy ustawić dont_filter=True:

if add_to_cart: 
     # attempt to add to cart to verify availability 
     request = FormRequest.from_response(response, formname="addtocartForm", 
         callback=self.parse_add_to_cart, dont_filter=True) 

Zobacz Scrapy docs on requests:

dont_filter (logiczna) - wskazuje, że ten Żądanie nie powinno być filtrowane przez program planujący. Jest to używane, gdy chcesz wykonać identyczne żądanie wiele razy, aby zignorować filtr duplikatów.

Dodatkowo, można ustawić CONCURRENT_REQUESTS = 1 dodać elementy 1-by-1 w koszyku (zastanawiam się, jak serwer obsługuje równoległe koszyk uzupełnień.)

+0

Dzięki za "dont_filter = True" zaoszczędziło mi to wiele czasu. – marw

Powiązane problemy