Scrapy download pipeline fails with 301

2019-01-01 19:27:49 +08:00
 Ewig

2019-01-01 19:21:26 [searchwww][scrapy.core.engine] INFO: Spider opened
2019-01-01 19:21:26 [searchwww][scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-01-01 19:21:26 [searchwww][scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6031
2019-01-01 19:21:38 [searchwww][scrapy.core.engine] DEBUG: Crawled (200) <GET https://searchwww.sec.gov/EDGARFSClient/jsp/EDGAR_MainAccess.jsp?search_text=F-1+ for&sort=Date&startDoc=101&numResults=100&isAdv=true&formType=FormF1&fromDate=mm/dd/yyyy&toDate=mm/dd/yyyy&stemming=true> (referer: None)
2019-01-01 19:21:38 [searchwww][scrapy.core.engine] DEBUG: Crawled (301) <GET http://www.sec.gov/Archives/edgar/data/1747624/000121390018017885/ff12018_fitboxxholdings.htm> (referer: None)
2019-01-01 19:21:38 [searchwww][scrapy.pipelines.files] WARNING: File (code: 301): Error downloading file from <GET http://www.sec.gov/Archives/edgar/data/1747624/000121390018017885/ff12018_fitboxxholdings.htm> referred in <None>

from scrapy.pipelines.files import FilesPipeline
from scrapy import Request

class download_pipeline(FilesPipeline):

    def file_path(self, request, response=None, info=None):
        # Store each file under the name passed along in request.meta
        return request.meta.get('filename', '')

    def get_media_requests(self, item, info):
        # One download request per item, carrying the target filename in meta
        file_url = item['file_url']
        meta = {'filename': item['name']}
        yield Request(url=file_url, meta=meta)

The download pipeline keeps failing with a 301. Any pointers?

4 replies
wellCh4n
2019-01-02 10:29:54 +08:00
Are you being redirected?
Ewig
2019-01-02 17:39:23 +08:00
@wellCh4n How do I fix it?
wellCh4n
2019-01-02 18:52:12 +08:00
@Ewig #2 That's server-side behavior. Find out why you're being redirected: look at the response to see which address it redirects to.
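A minimal sketch of the check suggested above, in plain Python: given a response's status code and headers, resolve the Location header to see where the server is sending the request. The function name redirect_target is hypothetical, and the headers are assumed to be a plain dict of strings; in a Scrapy callback the same information is on response.status and response.headers (where header values are bytes and need decoding).

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

# Status codes whose Location header points at the new URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url: str, status: int,
                    headers: Mapping[str, str]) -> Optional[str]:
    """Return the absolute URL a redirect points to, or None if not a redirect."""
    if status not in REDIRECT_CODES:
        return None
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if location is None:
        return None
    # Location may be relative; resolve it against the URL that was requested.
    return urljoin(request_url, location)


if __name__ == "__main__":
    url = ("http://www.sec.gov/Archives/edgar/data/1747624/"
           "000121390018017885/ff12018_fitboxxholdings.htm")
    # Hypothetical response headers: a 301 that upgrades http to https.
    headers = {"Location": url.replace("http://", "https://", 1)}
    print(redirect_target(url, 301, headers))
```

If the printed target differs from the original URL only in the scheme, the server is simply forcing https.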
Ewig
2019-01-02 19:19:06 +08:00
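This part is mostly handled by the framework: the request is yielded back to it. When I make a normal request there's no problem, so I can't figure it out.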
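A likely explanation, offered as a hedge rather than a confirmed diagnosis: Scrapy's media pipelines (FilesPipeline, ImagesPipeline) ignore redirects by default, so a 301 on a file URL counts as a failed download and produces the "File (code: 301)" warning in the log above. Ordinary spider requests, by contrast, are followed by the redirect middleware, which would explain why a normal request works. Scrapy exposes the MEDIA_ALLOW_REDIRECTS setting for this case. A minimal settings.py sketch; the pipeline module path and the FILES_STORE directory are hypothetical placeholders for your own project:

```python
# settings.py (sketch): module path and storage directory are hypothetical.
ITEM_PIPELINES = {
    "myproject.pipelines.download_pipeline": 300,
}
FILES_STORE = "downloads"

# Media pipelines treat 3xx responses as failures unless this is enabled.
MEDIA_ALLOW_REDIRECTS = True
```

Alternatively, if the Location header shows the server is only upgrading http:// to https://, requesting the https:// URL directly in get_media_requests avoids the redirect altogether.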


https://www.v2ex.com/t/522922
