A Scrapy request problem - V2EX

This topic was created 2821 days ago; the information mentioned may have changed or developed since.

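I'm writing a crawler and have run into a problem. Roughly: I request the home page first, read the total number of pages from it, and then request each page's data in turn. Sometimes a page request comes back without data, so I figured I would write another method that requests the page once more, and give up if that fails too. But even when a page request returns no data, the follow-up request is never sent. I don't know why.

The spider code is: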

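   def parse(self, response):

            # Building url and advancing pageNow are omitted in the original post
            while pageNow < pageTotal:

                    yield scrapy.Request(url, self.parseNext)


   def parseNext(self, response):

            # If this page came back without data, request the same url again
            yield scrapy.Request(url, self.parseData)

   def parseData(self, response):

            # Problem: execution never reaches this method
            pass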

8 replies • 2018-11-11 11:32:25 +08:00

1

NLL

Nov 11, 2018 via iPhone

Write a middleware.
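
One way to read this suggestion is a downloader middleware that re-schedules a request when the response looks empty. The sketch below is only an illustration: the class name `RetryEmptyPageMiddleware`, the `empty_retries` meta key, and the "empty body means failure" check are assumptions, since the thread does not say how a failed page is detected.

```python
class RetryEmptyPageMiddleware:
    """Downloader middleware sketch: retry a request whose response body is empty."""

    MAX_RETRIES = 1  # assumption: one extra attempt, as described in the question

    def process_response(self, request, response, spider):
        retries = request.meta.get("empty_retries", 0)  # hypothetical meta key

        # Pass the response on if it has content or the retry budget is spent.
        if response.body.strip() or retries >= self.MAX_RETRIES:
            return response

        # Returning a Request from process_response re-schedules it.
        # dont_filter=True keeps the duplicate filter from dropping the repeated URL.
        retry = request.replace(dont_filter=True)
        retry.meta["empty_retries"] = retries + 1
        return retry
```

To try it, the class would be registered in the `DOWNLOADER_MIDDLEWARES` setting under its import path, for example a hypothetical `myproject.middlewares.RetryEmptyPageMiddleware`, alongside any existing proxy middleware.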

2

yangyaofei

Nov 11, 2018 via Android

Is the second, identical request being filtered out as a duplicate? Add dont_filter=True.

3

moxiaowei

OP

Nov 11, 2018

@zhijiansha I did write a downloader middleware, but I use it for setting proxies.

4

moxiaowei

OP

Nov 11, 2018

@yangyaofei OK, thanks, I'll give it a try.

5

moxiaowei

OP

Nov 11, 2018

@yangyaofei Thanks, that was indeed the cause.

6

sunorg

Nov 11, 2018 via Android

Use headless Chrome.

7

moxiaowei

OP

Nov 11, 2018

@sunorg What?

8

dreasky

Nov 11, 2018

Take a look at Scrapy's built-in RetryMiddleware and DownloadTimeoutMiddleware.
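
Both middlewares are enabled by default and are tuned through project settings. A possible `settings.py` fragment is sketched below; the values are illustrative assumptions, not recommendations from the thread. Note that RetryMiddleware reacts to failed downloads (the listed HTTP status codes and network errors such as timeouts), not to a 200 response that merely lacks the expected data.

```python
# settings.py (fragment); values are illustrative assumptions
RETRY_ENABLED = True
RETRY_TIMES = 1  # extra attempts after the first download fails
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]  # statuses treated as retryable
DOWNLOAD_TIMEOUT = 30  # seconds before DownloadTimeoutMiddleware gives up on a request
```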
