My crawler throws an error after running for 4 to 5 hours and I can't figure out why. Could someone take a look?

2018-06-13 11:34:39 +08:00
 wsds

The error is long, but the cause seems to be roughly this: socket.gaierror: [Errno -3] Temporary failure in name resolution

It's running on Alibaba Cloud (Aliyun).

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "getimg.py", line 102, in <module>
    GetImg().getdata()
  File "getimg.py", line 76, in getdata
    base_url + j['href'], headers=self.headers)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='http://www.xiangshu.com/', port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
7677 views
Node: Python
21 replies
golmic
2018-06-13 11:39:12 +08:00
Does every request fail, or only some of them occasionally? Looks like you've triggered anti-scraping measures.
wsds
2018-06-13 11:43:14 +08:00
@golmic It basically throws this error after a few hours of crawling.
wsds
2018-06-13 11:43:29 +08:00
@golmic And it had crawled fewer than 10,000 images by then.
lululau
2018-06-13 11:44:04 +08:00
Looks like intermittent DNS resolution flakiness.
xxxy
2018-06-13 11:48:58 +08:00
DNS has rate limits too.
golmic
2018-06-13 11:49:55 +08:00
@lululau #4 A resolution failure would show up as a DNS error, wouldn't it?

If the errors are frequent, deal with the anti-scraping measures; if they're only occasional, just retry.
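For the occasional-failure case, "just retry" can be sketched as a small standard-library helper. It assumes the failing call is wrapped in a zero-argument function; `retry_call`, its parameters, and the doubling delay are illustrative choices, not anything from the original script.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); if it raises one of `exceptions`, wait and try again.

    The wait doubles after each failure (1s, 2s, 4s, ...). After the last
    attempt the exception is re-raised so the caller can decide what to do.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

With requests it might be called as `retry_call(lambda: session.get(url, headers=headers), (requests.exceptions.ConnectionError,))`, where `session`, `url` and `headers` stand in for whatever the crawler already uses. The `sleep` parameter exists only so the delays can be observed without actually waiting.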
Cooky
2018-06-13 11:52:52 +08:00
Switch to a better DNS server?
lerry
2018-06-13 11:54:59 +08:00
Install dnsmasq locally and configure it as the system's default DNS; that can improve DNS lookups.
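If dnsmasq is not an option, a rough in-process substitute is to cache `socket.getaddrinfo` results, since that is the call failing at the bottom of the first traceback (urllib3 calls it as `socket.getaddrinfo`, so replacing the module attribute affects it). This is a sketch under assumptions: the fixed TTL, the fallback to a stale answer on `socket.gaierror`, and the name `make_cached_getaddrinfo` are all hypothetical, and real DNS TTLs are ignored.

```python
import socket
import time
from typing import Any, Callable, Dict, Tuple


def make_cached_getaddrinfo(
    resolve: Callable[..., Any] = socket.getaddrinfo,
    ttl: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[..., Any]:
    """Return a getaddrinfo replacement that caches successful lookups."""
    cache: Dict[Tuple, Tuple[float, Any]] = {}

    def cached_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        key = (host, port, family, type, proto, flags)
        now = clock()
        hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl:
            return hit[1]  # fresh cache entry: no DNS query at all
        try:
            result = resolve(host, port, family, type, proto, flags)
        except socket.gaierror:
            if hit is not None:
                # Resolver hiccup: reuse the last known answer. The timestamp
                # is not refreshed, so the next call tries DNS again.
                return hit[1]
            raise
        cache[key] = (now, result)  # only successful lookups are cached
        return result

    return cached_getaddrinfo
```

Installing it is one assignment made before any requests are sent: `socket.getaddrinfo = make_cached_getaddrinfo()`. Because the default `resolve` argument is bound when the function is defined, the wrapper keeps calling the original resolver after the patch. A cache only helps for hosts that have been resolved successfully at least once.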
baday
2018-06-13 11:57:31 +08:00
Try setting the Connection request header to close.
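With a `requests.Session`, that header can be set once for every request made through the session. Note the trade-off: `Connection: close` disables keep-alive, so each request opens a new connection, and in the traceback above every new connection goes through `_new_conn` and `getaddrinfo`. It may therefore mean more DNS lookups rather than fewer; whether it helps in this case is untested.

```python
import requests

session = requests.Session()
# Ask the server to close the TCP connection after each response,
# i.e. disable keep-alive for every request sent through this session.
session.headers.update({"Connection": "close"})
```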
wsds
2018-06-13 11:58:49 +08:00
@lululau I searched online a bit, and people say that's what it is.
wsds
2018-06-13 11:59:10 +08:00
@Cooky Which ones count as better?
wsds
2018-06-13 11:59:20 +08:00
@lerry This is on Alibaba Cloud.
ihancheng
2018-06-13 12:01:41 +08:00
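I'm tired of ranting about Alibaba Cloud ("套路云", roughly "Scheming Cloud"). I'm learning Python web scraping; on Tencent Cloud I have no problems, but on Alibaba Cloud it throws exceptions I just can't get rid of... Maybe it's my own fault, but none of the fixes I found online worked.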
owenliang
2018-06-13 12:03:09 +08:00
Exceptions can be caught.
wsds
2018-06-13 12:07:28 +08:00
@owenliang This has already been caught and re-raised. Didn't you see all the "another exception occurred" lines?
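The "During handling of the above exception, another exception occurred" lines are Python's implicit exception chaining: urllib3 and requests each catch the lower-level error and raise their own (gaierror → NewConnectionError → MaxRetryError → ConnectionError). The last one still reached `getimg.py` line 102 uncaught, which is why the script stopped. Once it is caught in user code, the root cause can be recovered by walking the chain; `find_in_chain` below is a hypothetical helper sketched for that purpose.

```python
import socket
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=BaseException)


def find_in_chain(exc: BaseException, kind: Type[E]) -> Optional[E]:
    """Return the first exception of type `kind` in exc's chain, or None.

    Follows __cause__ (explicit `raise ... from ...`) and __context__
    (the implicit "During handling of the above exception" link).
    """
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        if isinstance(current, kind):
            return current
        seen.add(id(current))
        current = current.__cause__ or current.__context__
    return None


if __name__ == "__main__":
    # Simulate a library wrapping a DNS error in its own exception type.
    try:
        try:
            raise socket.gaierror(-3, "Temporary failure in name resolution")
        except socket.gaierror:
            raise ConnectionError("wrapped, like requests does")
    except ConnectionError as outer:
        print(find_in_chain(outer, socket.gaierror))
```

In the crawler it would be used inside `except requests.exceptions.ConnectionError as exc:` as `find_in_chain(exc, socket.gaierror)`, to tell DNS failures apart from other connection errors.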
Cooky
2018-06-13 12:38:02 +08:00
@wsds You can't install dnsmasq on Alibaba Cloud?
hicdn
2018-06-13 13:27:26 +08:00
It's a DNS resolution problem. If you're only crawling a few fixed domains, put them in the hosts file.
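A Python-side equivalent of a hosts-file entry, for when editing `/etc/hosts` is inconvenient: wrap `socket.getaddrinfo` so that listed hostnames are swapped for a fixed IP before resolution. Resolving a literal IP address does not send a DNS query, so those hosts stop depending on the resolver. `pin_hosts` is a hypothetical name, and the address in the usage line below is a documentation placeholder (203.0.113.0/24 is reserved for examples), not the site's real IP.

```python
import socket
from typing import Any, Callable, Dict


def pin_hosts(
    mapping: Dict[str, str],
    resolve: Callable[..., Any] = socket.getaddrinfo,
) -> Callable[..., Any]:
    """Return a getaddrinfo replacement that answers listed hosts from `mapping`."""

    def pinned_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        ip = mapping.get(host)
        if ip is not None:
            host = ip  # a literal IP is resolved locally, without DNS
        return resolve(host, port, family, type, proto, flags)

    return pinned_getaddrinfo
```

Usage would look like `socket.getaddrinfo = pin_hosts({"www.xiangshu.com": "203.0.113.10"})`, with the placeholder replaced by an address you looked up yourself. Only name resolution changes; the HTTP `Host` header still carries the original hostname. Like a hosts entry, the pinned IP goes stale if the site moves.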
dapengzhao
2018-06-13 15:36:25 +08:00
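My crawler also hits this error after running for a while. My fix: as long as the IP hasn't been banned, catch the exception, sleep for a bit, then break out of the current iteration of the while True loop and start over.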
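A variant of that approach, sketched with the standard library: instead of breaking out and restarting the whole loop, the failed URL goes back to the end of a queue after a pause, and is given up on after a few failures. `crawl`, `fetch`, and the pause and failure limits are illustrative assumptions; with requests, `transient` would be something like `(requests.exceptions.ConnectionError,)`.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple, Type


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    transient: Tuple[Type[BaseException], ...],
    pause: float = 30.0,
    max_failures: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, object], List[str]]:
    """Fetch every URL; on a transient error, pause and retry that URL later.

    Returns (results, failed): results maps url -> fetch(url), and failed
    lists URLs that still failed after max_failures attempts.
    """
    queue: Deque[str] = deque(urls)
    failures: Dict[str, int] = {}
    results: Dict[str, object] = {}
    failed: List[str] = []
    while queue:
        url = queue.popleft()
        try:
            results[url] = fetch(url)
        except transient:
            failures[url] = failures.get(url, 0) + 1
            if failures[url] >= max_failures:
                failed.append(url)  # give up on this URL, keep crawling the rest
            else:
                sleep(pause)  # back off, e.g. until DNS recovers
                queue.append(url)  # retry it after the remaining URLs
    return results, failed
```

Re-queueing rather than restarting means pages that were already fetched are not fetched again, and one persistently failing URL cannot stall the whole run.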
gamecreating
2018-06-13 15:40:41 +08:00
Just catch the exception and handle it...

A crawler can never guarantee that every connection and every fetch succeeds anyway.
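What "catch it and handle it" could look like at the call site that crashed (getimg.py line 76): a sketch that wraps `Session.get`, logs network-level failures, and returns `None` so the caller can skip that page. The helper name, the 10-second default timeout, and the log messages are assumptions, not part of the original script.

```python
import logging
from typing import Optional

import requests

log = logging.getLogger("getimg")


def get_or_none(session: requests.Session, url: str, **kwargs) -> Optional[requests.Response]:
    """GET url; return None instead of crashing on network-level errors."""
    kwargs.setdefault("timeout", 10)  # without a timeout a stuck request can hang forever
    try:
        return session.get(url, **kwargs)
    except requests.exceptions.ConnectionError as exc:
        # DNS failures such as "Temporary failure in name resolution" end up here.
        log.warning("connection failed for %s: %s", url, exc)
    except requests.exceptions.Timeout as exc:
        log.warning("timed out fetching %s: %s", url, exc)
    return None
```

At line 76 that would become something like `resp = get_or_none(session, base_url + j['href'], headers=self.headers)` followed by `if resp is None: continue`, assuming that call sits inside a loop over links.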
JCZ2MkKb5S8ZX9pq
2018-06-13 20:48:43 +08:00
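Write your own request function that wraps requests, and put the usual things in it: exception handling, retries, random User-Agents, automatic proxies, and so on. Solve it once and for all.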
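A minimal sketch of such a wrapper, assuming urllib3's `Retry` mounted through `requests.adapters.HTTPAdapter` for the retries (its `connect` budget covers connection-level failures like the `NewConnectionError` in the traceback; requests' default is zero retries, which matches the immediate `MaxRetryError` above), plus a random User-Agent per request and an optional proxy mapping. The User-Agent strings and the names `make_session` and `fetch` are placeholders.

```python
import random
from typing import Dict, Optional, Sequence

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Placeholder User-Agent strings; substitute your own list.
USER_AGENTS: Sequence[str] = (
    "Mozilla/5.0 (X11; Linux x86_64) ExampleCrawler/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleCrawler/1.0",
)


def make_session(
    retries: int = 3,
    backoff: float = 1.0,
    proxies: Optional[Dict[str, str]] = None,
) -> requests.Session:
    """Build a Session that retries failed connections with backoff."""
    retry = Retry(
        total=retries,
        connect=retries,  # connection errors, including failed DNS lookups
        read=retries,
        backoff_factor=backoff,  # sleep between attempts grows exponentially
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    if proxies:
        session.proxies.update(proxies)  # e.g. {"http": "http://host:port"}
    return session


def fetch(session: requests.Session, url: str, **kwargs) -> requests.Response:
    """GET url with a random User-Agent unless the caller supplied one."""
    headers = dict(kwargs.pop("headers", None) or {})
    headers.setdefault("User-Agent", random.choice(USER_AGENTS))
    kwargs.setdefault("timeout", 10)
    return session.get(url, headers=headers, **kwargs)
```

Retries at the adapter level handle short DNS hiccups transparently; once they are exhausted, `fetch` still raises `requests.exceptions.ConnectionError`, so the caller needs a `try`/`except` for the cases that outlast the retry budget.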

https://www.v2ex.com/t/462728