Help needed from Python experts: when scraping block transaction hashes from btc.com, frequent requests make the responses come back garbled. How can I fix this?

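While scraping the hashes of block transactions from btc.com, frequent requests cause the responses to come back garbled. This raises an error and stops the program. Using proxies improves things, but the program still stops from time to time. I want it to keep running indefinitely, but I don't know how to achieve that.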

changwei

2019-05-18 23:20:40 +08:00

Catch the error with try/except, save your progress, then wait a while and resume.
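A minimal sketch of that idea, assuming "progress" fits in a small dict (for example the last block hash seen) stored in a local JSON file. The file name `progress.json` and the helper names `load_progress`, `save_progress` and `run_forever` are hypothetical.

```python
import json
import os
import time

PROGRESS_FILE = "progress.json"  # hypothetical location for saved progress


def load_progress(path=PROGRESS_FILE):
    """Return the saved progress dict, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_progress(progress, path=PROGRESS_FILE):
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(progress, f)
    os.replace(tmp, path)


def run_forever(fetch_once, progress, max_rounds=None, pause=60, path=PROGRESS_FILE):
    """Call fetch_once(progress) repeatedly; on any exception, save and back off instead of exiting."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        try:
            fetch_once(progress)
        except Exception as exc:  # deliberately broad: the goal is "never stop"
            print("error, pausing:", exc)
            save_progress(progress, path)
            time.sleep(pause)
            continue
        save_progress(progress, path)
    return progress
```

In the question's script, `fetch_once` would be one iteration of the polling loop, storing the latest hash in `progress`; on restart, `load_progress()` picks up where the last run stopped.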

alexzsh

2019-05-18 23:53:22 +08:00

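If it's caused by requesting too often, you can use an IP pool, but reliable foreign IP pools are rare, so onion routing (Tor) is more convenient. Use Python together with stem. Tutorial attached: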
https://github.com/Alexzsh/ICOSpider
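A hedged sketch of the Tor approach, not taken from the linked repository. It assumes a local Tor daemon on the default SocksPort 9050, ControlPort 9051 enabled in torrc, and SOCKS support for `requests` installed (`pip install requests[socks]`). The helper names are hypothetical; `Controller.from_port`, `authenticate` and `Signal.NEWNYM` are stem's own API.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Proxy mapping for requests; socks5h lets Tor resolve DNS as well."""
    url = "socks5h://%s:%d" % (host, port)
    return {"http": url, "https": url}


def new_tor_identity(control_port=9051, password=None):
    """Ask Tor for a fresh circuit (usually a new exit IP) via the NEWNYM signal."""
    # Imported lazily so this module still loads where stem is not installed.
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate(password=password)
        controller.signal(Signal.NEWNYM)
```

Pass `proxies=tor_proxies()` to `requests.get(...)` and call `new_tor_identity()` when a block is detected. Tor rate-limits NEWNYM, so switching circuits on every request will not work, and Tor exit IPs are public, so some sites block them outright.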

CEBBCAT

2019-05-19 00:41:40 +08:00

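Uh, isn't this just your scraper being detected? There's plenty of material about that online. What you're seeing is most likely not garbled text at all; your program simply doesn't handle error cases.

Google the basics first.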
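One way to tell "blocked" apart from "actually garbled" before parsing, as a sketch. The heuristics are assumptions, not documented btc.com behaviour: 403/429/503 usually mean throttling, and a `Content-Encoding: br` response is left undecoded by `requests` when neither the `brotli` nor the `brotlicffi` package is installed, which looks exactly like mojibake. That case is worth checking whenever a request explicitly advertises `br` in `Accept-Encoding`. The default `marker` is the fragment the question's regex relies on.

```python
import importlib.util


def brotli_available():
    """True if a Brotli decoder usable by urllib3 is importable."""
    return any(importlib.util.find_spec(name) is not None
               for name in ("brotli", "brotlicffi"))


def diagnose(status_code, headers, text, marker='"prev_block_ha'):
    """Classify a response as 'blocked', 'undecoded-brotli', 'unexpected-page' or 'ok'."""
    if status_code in (403, 429, 503):
        return "blocked"
    encoding = headers.get("Content-Encoding", "").lower()
    if "br" in encoding and not brotli_available():
        return "undecoded-brotli"
    if marker not in text:
        # Decoded fine but not the expected page: captcha, error page or redesign.
        return "unexpected-page"
    return "ok"
```

With `requests`, call it as `diagnose(r.status_code, r.headers, r.text)` and only run the regex when it returns `"ok"`.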

Hconk

2019-05-19 00:45:09 +08:00

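I'd suggest getting a server and syncing your own BTC node, then pulling data straight from its RPC interface. It's much faster, it's stable, and you don't have to worry about anti-scraping measures.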
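A minimal sketch of pulling the latest block hash from your own node over JSON-RPC, assuming `bitcoind` runs locally on the default mainnet RPC port 8332 with `rpcuser`/`rpcpassword` set in `bitcoin.conf`. The credentials and helper names are placeholders; `getbestblockhash` is a standard Bitcoin Core RPC method.

```python
import base64
import json
import urllib.request


def build_rpc_request(method, params=None, user="rpcuser", password="rpcpassword",
                      url="http://127.0.0.1:8332/"):
    """Build a Bitcoin Core JSON-RPC 1.0 POST request with HTTP basic auth."""
    payload = {"jsonrpc": "1.0", "id": "crawler", "method": method, "params": params or []}
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), method="POST")
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    return req


def rpc_call(method, params=None, **kwargs):
    """Send the request and return the 'result' field; raises if the node reports an error."""
    with urllib.request.urlopen(build_rpc_request(method, params, **kwargs), timeout=30) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]
```

Usage: `latest = rpc_call("getbestblockhash")`. Polling a local node has no rate limit, so the proxy and retry machinery becomes unnecessary.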

chuanwu

2019-05-19 06:58:01 +08:00

btc.com is itself just a blockchain explorer. Just build one yourself.
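The part of a block explorer this question actually needs is small. As a sketch, assume node access through any callable `rpc(method, *params)` that returns the RPC `result` field (hypothetical; for example a thin wrapper around an HTTP JSON-RPC client). Bitcoin Core's `getblock` with verbosity 1 returns the block's transaction ids in its `tx` field.

```python
def new_block_txids(rpc, seen_tip=None):
    """Return (tip_hash, txids); txids is empty if the tip hasn't changed since seen_tip.

    `rpc` is a hypothetical callable rpc(method, *params) -> RPC result.
    """
    tip = rpc("getbestblockhash")
    if tip == seen_tip:
        # Same block as last poll: skip the more expensive getblock call.
        return tip, []
    block = rpc("getblock", tip, 1)  # verbosity 1: txids only, not full transactions
    return tip, block["tx"]
```

Calling this in a loop and feeding the returned tip back in as `seen_tip` mirrors what the scraper does with `m_last`, but without parsing HTML.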

kiddyu

2019-05-19 10:08:21 +08:00

If your request volume is high, you can email them directly and ask for a higher quota.

yuyang4271

2019-05-19 10:44:37 +08:00

@CEBBCAT Got it.

yuyang4271

2019-05-19 10:45:06 +08:00

@alexzsh Thanks.

yuyang4271

2019-05-19 15:48:06 +08:00

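import re
import time

import pyttsx3
import requests

url = 'https://btc.com/'
# "br" is dropped from Accept-Encoding: requests only decodes Brotli when the
# brotli package is installed, otherwise response.text comes back as garbage.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36",
           "Accept-Encoding": "gzip, deflate"}
# A dict keeps one value per key: "http" was listed twice before, so only the
# last entry was ever used. requests takes one proxy per scheme.
proxies = {
    "https": "https://117.22.42.43:8118",
    "http": "http://121.69.37.6:9797",
}
# The capture group returns the 64-character hash directly instead of slicing [8:72].
pattern = re.compile(r'"hash":"(.{64})","prev_block_ha')


def say(engine, text):
    engine.setProperty('voice', 'zh')
    engine.say(text)
    engine.runAndWait()


if __name__ == '__main__':
    engine = pyttsx3.init()
    # List the installed voices once, not on every announcement.
    for item in engine.getProperty('voices'):
        print(item.id, item.languages)
    m_last = ''
    count = 1
    while True:
        time.sleep(1)
        count = count + 1
        try:
            # timeout keeps a dead proxy from hanging the loop forever
            response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
            response.raise_for_status()
        except requests.RequestException as e:
            # Proxy failure, timeout or HTTP 4xx/5xx: wait and retry instead of exiting.
            print('request failed:', e)
            time.sleep(30)
            continue
        matches = pattern.findall(response.text)
        if not matches:
            # Blocked or unexpected page: m[0] used to raise IndexError here.
            print('hash not found, HTTP status', response.status_code)
            time.sleep(30)
            continue
        m = matches[0]
        if m_last != m:
            m_last = m
            say(engine, m)
        if count > 10:
            # Re-announce the current hash roughly every 10 polls.
            count = 0
            say(engine, m)
        print(m)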

This is my code. Could someone help me find where it goes wrong? Thanks.

tikazyq

2019-05-19 16:10:27 +08:00

This might help with managing crawlers: I'd recommend trying crawlab, http://github.com/tikazyq/crawlab


https://www.v2ex.com/t/565436
