Help: why can't Requests fetch Zhihu pages?

2016-07-09 13:38:37 +08:00

nlimpid

>>> import requests
>>> requests.get("http://github.com")
<Response [200]>
>>> requests.get("https://github.com")
<Response [200]>
>>> requests.get("https://www.baidu.com")
<Response [200]>
>>> requests.get("http://zhihu.com")
<Response [500]>
>>> requests.get("https://zhihu.com")
<Response [500]>

urlopen works, though. I can't figure out why; can anyone explain?


8 replies

hebwjb

2016-07-09 13:45:17 +08:00

>>> import requests
>>> header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"}
>>> requests.get("http://zhihu.com", headers=header)
<Response [200]>
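If several requests need the same header, one option is to set it once on a Session instead of passing it on every call. This is only a minimal sketch: the helper name make_session is made up for illustration, and it assumes the site looks only at the User-Agent, which the thread does not confirm. The request is built but not sent, so the outgoing header can be inspected offline.

```python
import requests

# Browser-like User-Agent string copied from the reply above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Hypothetical helper: return a Session that sends `user_agent` by default."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()

# Prepare the request without sending it, to see which headers would go out.
prepared = session.prepare_request(requests.Request("GET", "http://zhihu.com"))
print(prepared.headers["User-Agent"])
```

Calling session.get("http://zhihu.com") would then send the same header over the network.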

coolloves

2016-07-09 13:48:19 +08:00

Just add headers and it should work.

nlimpid

2016-07-09 13:48:50 +08:00

@hebwjb Thanks, but why does that work?

zwh8800

2016-07-09 14:10:56 +08:00

@nlimpid Most sites check the UA (the User-Agent header).

GreatMartial

2016-07-09 16:00:19 +08:00

@nlimpid Some sites inspect your request environment. If you don't imitate a browser, they treat you as a bot.
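To see what a server has to go on, the default User-Agent of each library can be printed locally without making any request. This is a rough sketch, not a diagnosis: what Zhihu actually checks is not stated in the thread, and the variable names are only illustrative. It shows that Requests and urlopen identify themselves differently by default, which is one possible reason a server could treat them differently.

```python
import urllib.request

import requests

# Default User-Agent that Requests sends when no headers are given.
requests_ua = requests.utils.default_user_agent()

# Default User-Agent that urllib's opener (used by urlopen) adds.
urllib_ua = dict(urllib.request.build_opener().addheaders)["User-agent"]

print(requests_ua)
print(urllib_ua)
```

Neither default looks like a browser, so a site that filters on this header can single out either client.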

CosimoZi

2016-07-09 18:13:33 +08:00

Zhihu's anti-crawler measures keep getting stricter... I used to be able to scrape it without any headers.

tobacco

2016-07-09 20:39:52 +08:00

There is a ready-made Zhihu crawler: https://github.com/egrcc/zhihu-python

nlimpid

2016-07-10 19:25:27 +08:00

@tobacco Thanks, but I'm not trying to build a crawler.


https://www.v2ex.com/t/291338
