Folks, is there something wrong with how I wrote this coroutine code?

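Mate, readFromFolder is blocking, so inside async def put you need to run it in a thread pool. asyncio runs everything on a single main thread, so it can't handle a blocking function like this.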

https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor

ipwx

2020-01-19 15:17:09 +08:00

Also, a small dig at the poster above: isn't a problem this obvious something you can spot at a glance?

Ritter

2020-01-19 15:17:36 +08:00

@cz5424 Ctrl+C can't interrupt it.

ipwx

2020-01-19 15:17:50 +08:00

Besides readFromFolder, f.read() is blocking too, so it also has to go in an executor.

Ritter

2020-01-19 15:20:11 +08:00

@ipwx But it only blocks partway through the run. The self.q.put() line does get executed.

ipwx

2020-01-19 15:22:54 +08:00

@Ritter The OP didn't post any logs either (thumbs down).

Still, that async def put function is too problematic; it will go wrong one way or another.

ipwx

2020-01-19 15:25:46 +08:00

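=== I've noticed the OP has another problem, in async def run.

He only creates consumer = asyncio.gather(...), but never actually makes consumer start running? As far as I know, asyncio.gather doesn't run a coroutine by itself; only await guarantees that a coroutine starts running?

OP, shouldn't you use loop.create_task to force the coroutine to run in the background?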

ipwx

2020-01-19 15:26:06 +08:00

By the way, with loop.create_task you don't need to await.

Ritter

2020-01-19 15:27:47 +08:00

@ipwx The code is all copied from the official site, so I have no idea how it could go wrong (doge)

Ritter

2020-01-19 15:28:16 +08:00

@ipwx It really does run after asyncio.gather; I tried it.
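
A small check along these lines shows the behaviour. This is only a sketch: worker, log and main are made-up names for the demo, not the code from the original post.

```python
import asyncio

async def worker(name, log):
    # Record that this coroutine actually started running.
    log.append(name)

async def main():
    log = []
    # The gather result is not awaited yet; the coroutines are wrapped
    # in Tasks and scheduled on the running loop right away.
    pending = asyncio.gather(worker("a", log), worker("b", log))
    # Yield to the event loop so the scheduled Tasks get a chance to run.
    await asyncio.sleep(0)
    started = list(log)
    # Still await the result eventually so failures are not silently lost.
    await pending
    return started

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The coroutines only start once the current coroutine yields to the loop, which is why the sleep(0) is there.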

chenqh

2020-01-19 15:29:19 +08:00

@Ritter Logs, where are the logs?

ipwx

2020-01-19 15:31:30 +08:00

@Ritter OK, I took a look at the docs, and it really does automatically turn a coroutine into a Task and schedule it.

"If any awaitable in aws is a coroutine, it is automatically scheduled as a Task."

youngce

2020-01-19 15:33:09 +08:00

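Agree with ipwx. OP, you should also understand that the biggest problem Python coroutines face right now is that the vast majority of third-party libraries are synchronous and don't support coroutine-style async. Many libraries are working on coroutine compatibility, but whenever you do I/O through a library inside a coroutine, make sure you know whether it supports async. Any I/O that doesn't support coroutines has to go through a thread pool. The official docs also show how to use a thread pool with asyncio; worth another look.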

Ritter

2020-01-19 15:34:23 +08:00

@chenqh I added some prints and it seems to get stuck at self.q.get().

Ritter

2020-01-19 15:35:36 +08:00

@youngce The request library is aiohttp, which is async. Do you mean it's getting stuck at the file reading?

Ritter

2020-01-19 15:36:08 +08:00

@ipwx Can an asyncio queue be used across threads?

Ritter

2020-01-19 15:36:36 +08:00

@chenqh The put function does run to completion.

chenqh

2020-01-19 15:40:09 +08:00

@Ritter Print the logs first.

chenqh

2020-01-19 15:40:46 +08:00

@Ritter I suspect your put finished, but the consumer is still stuck somewhere.

ipwx

2020-01-19 15:41:01 +08:00

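@Ritter run_in_executor exists precisely to throw a blocking function into another thread and then hand the result back.

def fn():
    ...  # something to do

# The first argument is the executor; None uses the loop's default thread pool.
await loop.run_in_executor(None, fn)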
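
For something runnable, here is a minimal sketch of the same idea applied to file reading. The signature is run_in_executor(executor, func, *args). read_lines is a hypothetical stand-in for readFromFolder, and put and demo are guesses at the shape of the code, not the OP's actual functions.

```python
import asyncio
import os
import tempfile

def read_lines(path):
    # Ordinary blocking file I/O; this runs in a worker thread,
    # not on the event loop thread.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

async def put(queue, path):
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    lines = await loop.run_in_executor(None, read_lines, path)
    for line in lines:
        await queue.put(line)
    return len(lines)

async def demo():
    # Write a small throwaway wordlist so the example is self-contained.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("admin\nlogin\n\nbackup\n")
    try:
        queue = asyncio.Queue()
        count = await put(queue, tmp.name)
        return count, [queue.get_nowait() for _ in range(count)]
    finally:
        os.remove(tmp.name)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```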

freshgoose

2020-01-19 15:44:17 +08:00

Looks like Python coroutines still have a lot of pitfalls... So for now it's better to write concurrent code in golang?

Ritter

2020-01-19 15:44:19 +08:00

@chenqh Seems so, but the very next line already cancels consumer.

Vegetable

2020-01-19 15:45:12 +08:00

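@ipwx #8 It's not that it can't run blocking code; it just blocks the event loop. It won't hang here. And the operations inside this method aren't time-consuming anyway.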
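
To make the difference concrete, a rough sketch (ticker and blocking_step are invented names, and time.sleep stands in for any blocking call): the blocking coroutine finishes fine, it just keeps every other coroutine from running in the meantime.

```python
import asyncio
import time

async def ticker(stamps, n=3, interval=0.01):
    # Tries to record a timestamp every `interval` seconds.
    for _ in range(n):
        stamps.append(time.monotonic())
        await asyncio.sleep(interval)

async def blocking_step(delay=0.2):
    # time.sleep never yields to the event loop, so no other
    # coroutine can run until it returns.
    time.sleep(delay)

async def main():
    stamps = []
    await asyncio.gather(ticker(stamps), blocking_step())
    # Largest gap between two consecutive ticks, in seconds.
    return max(b - a for a, b in zip(stamps, stamps[1:]))

if __name__ == "__main__":
    print(f"largest gap between ticks: {asyncio.run(main()):.3f}s")
```

The program still exits normally; the ticker is just delayed by roughly the length of the blocking call instead of ticking every 10 ms.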

Vegetable

2020-01-19 15:47:13 +08:00

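Since crawl in your program never breaks out on its own, once all the work is done every await queue.get will block, waiting for new items to be enqueued. Is that where it's stuck?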

Ritter

2020-01-19 15:49:39 +08:00

@Vegetable
async def run(self):
    crawls = [self.crawl(i) for i in range(self.max_concurrency)]
    consumer = asyncio.gather(*crawls)
    ...
    # cancel consumer
    consumer.cancel()

I've already cancelled consumer here.

BBrother

2020-01-19 15:53:09 +08:00

There's a library called aiofile; your file reading is blocking.

Ritter

2020-01-19 15:55:24 +08:00

@BBrother But the put function runs all the way through; it gets stuck partway through the run.

chenqh

2020-01-19 15:55:34 +08:00

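@Ritter That's not how I usually exit. I usually have the producer put a special string into the queue, e.g. "__end__", and the consumer handles it and exits on its own. Want to give it a try?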
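
Roughly like this, as a sketch only. producer, consumer and the upper() "work" are placeholders; the one detail that matters is putting one sentinel per consumer, otherwise some consumers never see it.

```python
import asyncio

STOP = "__end__"  # sentinel value; must never be a real work item

async def producer(queue, items, n_consumers):
    for item in items:
        await queue.put(item)
    # One sentinel per consumer so every consumer gets to see one.
    for _ in range(n_consumers):
        await queue.put(STOP)

async def consumer(queue, results):
    while True:
        item = await queue.get()
        if item == STOP:
            return  # exit on our own instead of waiting to be cancelled
        results.append(item.upper())  # placeholder for the real work

async def main():
    queue = asyncio.Queue()
    results = []
    n_consumers = 3
    await asyncio.gather(
        producer(queue, ["a", "b", "c", "d"], n_consumers),
        *(consumer(queue, results) for _ in range(n_consumers)),
    )
    return sorted(results)

if __name__ == "__main__":
    print(asyncio.run(main()))
```

With this shape there is no join() or cancel() at all: gather returns once the producer and every consumer have finished.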

chenqh

2020-01-19 15:56:37 +08:00

@Ritter This is caused by the way your consumer exits.

jyyx

2020-01-19 15:58:53 +08:00

The consumer raised an exception, so self.q.task_done was never executed.
Try adding try/finally.

Vegetable

2020-01-19 16:03:01 +08:00

#37 jyyx is right: an exception breaks the task consumption, and join() gets stuck there.

BBrother

2020-01-19 16:06:51 +08:00

@Ritter By "stuck", do you mean the program stops moving partway through, with no output, never finishes, and it only happens some of the time?

Ritter

2020-01-19 16:15:23 +08:00

@chenqh https://asyncio.readthedocs.io/en/latest/producer_consumer.html The first example on the official site should be the approach you describe. I tried it before and it seemed to get stuck as well.

Ritter

2020-01-19 16:15:52 +08:00

@jyyx
@Vegetable But won't the exception propagate upward?

Ritter

2020-01-19 16:18:04 +08:00

@BBrother There is output; it just blocks some of the time.

ipwx

2020-01-19 16:18:19 +08:00

@Ritter Whether or not the exception propagates upward, q.task_done never gets to run, and then join() is bound to hang...

try:
    ...
finally:
    q.task_done()
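
A fuller sketch of that pattern, under some assumptions: crawl here is a toy worker where odd items "fail", and catching the error inside the loop is an extra choice on top of the try/finally, made so that a failed item doesn't kill the worker and leave the rest of the queue unconsumed.

```python
import asyncio

async def crawl(queue, failures):
    while True:
        item = await queue.get()
        try:
            if item % 2:  # pretend that odd items fail
                raise ValueError(f"bad item {item}")
        except ValueError as exc:
            failures.append(str(exc))  # record it and keep consuming
        finally:
            # Runs on success and on failure, so join() can finish.
            queue.task_done()

async def main():
    queue = asyncio.Queue()
    failures = []
    for i in range(6):
        queue.put_nowait(i)
    workers = [asyncio.create_task(crawl(queue, failures)) for _ in range(2)]
    # The timeout turns a missed task_done() into an error instead of a hang.
    await asyncio.wait_for(queue.join(), timeout=5)
    for w in workers:
        w.cancel()
    # Wait for the cancellations to finish without raising CancelledError here.
    await asyncio.gather(*workers, return_exceptions=True)
    return len(failures)

if __name__ == "__main__":
    print(asyncio.run(main()))
```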

Vegetable

2020-01-19 16:28:09 +08:00

@Ritter Because you didn't await crawl directly, this exception shouldn't propagate. The program won't exit because of it.
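
A tiny illustration, assuming a crawl that always fails (the names mirror the thread, but this is not the OP's code): the exception is stored on the gather result, and run keeps going until something actually awaits it.

```python
import asyncio

async def crawl():
    raise RuntimeError("boom")  # stand-in for a failing request

async def run():
    consumer = asyncio.gather(crawl())  # scheduled, but not awaited yet
    await asyncio.sleep(0.01)           # crawl() has already failed by now
    reached = True                      # ...yet run() carries on regardless
    try:
        await consumer                  # the stored exception surfaces only here
    except RuntimeError as exc:
        return reached, str(exc)
    return reached, None

if __name__ == "__main__":
    print(asyncio.run(run()))
```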

pmispig

2020-01-19 16:38:37 +08:00

May I ask what font this is? It's really pleasant to look at.

Ritter

2020-01-19 16:39:06 +08:00

@jyyx
@Vegetable
@ipwx
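It did raise an exception, and the cause was indeed that the exception left the number of put calls and task_done calls out of sync.
Learned something, thanks everyone for all the help.
Thanks, thanks~~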

chenqh

2020-01-19 16:39:18 +08:00

Logs, please!

Ritter

2020-01-19 16:41:38 +08:00

@pmispig The image was generated on the carbon site, using the VS Code style.

Ritter

2020-01-19 16:42:02 +08:00

@chenqh It's already solved, thanks for the reply.

Ritter

2020-01-19 16:43:35 +08:00

@pmispig https://carbon.now.sh/

hehe12dyo

2020-01-19 17:21:01 +08:00

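Friend, I'd suggest pushing data into the queue as you read it. That behaves better when you're reading a large file.
Otherwise, with a 10M wordlist, just imagine.
Actually, I've written this kind of tool before...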
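
One possible shape for that, as a sketch: read_chunk, feed, drain and the END marker are all invented for illustration. The file is read a few lines at a time in the default thread pool, and a bounded queue keeps only a small number of lines in memory.

```python
import asyncio
import os
import tempfile

END = None  # hypothetical end-of-input marker

def read_chunk(f, max_lines):
    # Blocking: read up to max_lines lines from an already open file.
    chunk = []
    for _ in range(max_lines):
        line = f.readline()
        if not line:
            break
        chunk.append(line.rstrip("\n"))
    return chunk

async def feed(queue, path, max_lines=2):
    loop = asyncio.get_running_loop()
    # open() itself is still a (short) blocking call in this sketch.
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = await loop.run_in_executor(None, read_chunk, f, max_lines)
            if not chunk:
                break
            for line in chunk:
                await queue.put(line)  # waits while the bounded queue is full
    await queue.put(END)

async def drain(queue, seen):
    while True:
        item = await queue.get()
        if item is END:
            return
        seen.append(item)

async def demo():
    with tempfile.NamedTemporaryFile("w", delete=False, encoding="utf-8") as tmp:
        tmp.write("admin\nlogin\nbackup\n")
    try:
        queue = asyncio.Queue(maxsize=2)  # at most 2 lines buffered at a time
        seen = []
        await asyncio.gather(feed(queue, tmp.name), drain(queue, seen))
        return seen
    finally:
        os.remove(tmp.name)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```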

cz5424

2020-01-19 17:24:31 +08:00 via iPhone

@ipwx The poster above clearly didn't read the code at all (manual doge)

Ritter

2020-01-19 17:27:17 +08:00

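@hehe12dyo I also thought about reading and enqueuing at the same time, but the built-in open is blocking. Someone above mentioned aiofile; I'll look into it when I have time.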

p0wd3rop

2020-01-19 17:54:52 +08:00

For small scanner tools like this I'd suggest writing them in Go: fast, easy to understand, really nice.

KaynW

2020-01-20 12:15:40 +08:00

go
go
go
