Why does Celery become synchronous and blocking after deploying to the server?

257 days ago
lzjunika

I recently put together a small application.

Web service: gunicorn + flask

Background service: celery

The main job is file conversion and upload, which is time-consuming, so Celery runs it as a background service.
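For context, the upload task and the way it is dispatched presumably look something like the sketch below. This is a hypothetical reconstruction based only on the task name in the logs and the .delay() call quoted later in the thread; the module paths and the obs_upload_file helper are assumptions, not the actual project code.

# pss/api/offce.py - hypothetical sketch, not the actual project code
from celery import shared_task

from pss.obs import obs_upload_file  # assumed helper; only the "obs_upload_file:OK" log line is known

@shared_task(name="pss.api.offce.put_content_to_obs")
def put_content_to_obs(new_name, local_name):
    # upload the converted file to OBS; the logs show the task returning True
    obs_upload_file(local_name, new_name)
    return True

# in the Flask view the task is only queued; the view does not wait for the result:
# put_content_to_obs.delay(new_name, local_name)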

Local (MacBook): it runs asynchronously as expected, tasks are spread across multiple workers, nothing blocks. This is the ideal behavior:

[2023-08-29 16:37:14,279: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[54e1286f-74c4-48d4-98e5-99937c65714c] received
[2023-08-29 16:37:14,417: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[5b4d53fc-afc2-4e50-9b9c-905d5eddddde] received
[2023-08-29 16:37:14,486: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[8fb95642-6900-4aaa-b666-11f98e3a0eea] received
[2023-08-29 16:37:14,531: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[4df12aa2-458c-4946-8aad-3ed25e68c5e0] received
[2023-08-29 16:37:14,583: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[93192a4d-7569-44c7-a09b-7035ea331901] received
[2023-08-29 16:37:14,618: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[6897f139-bc7a-4b9f-aab8-22c6d7a07a85] received
[2023-08-29 16:37:14,660: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[d301e702-accd-44b2-b85e-2f7d3c3a4e4f] received
[2023-08-29 16:37:14,690: WARNING/ForkPoolWorker-8] requestId:
[2023-08-29 16:37:14,693: WARNING/ForkPoolWorker-8] 0000018A4070950C5A0294A2CECAB8DF
[2023-08-29 16:37:14,701: WARNING/ForkPoolWorker-8] [2023-08-29 16:37:14,701] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:37:14,701: WARNING/ForkPoolWorker-8] obs_upload_file:OK
[2023-08-29 16:37:14,702: WARNING/ForkPoolWorker-8] test_1.png
[2023-08-29 16:37:14,736: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[42c63363-9528-4f59-9e21-b2816907141f] received
[2023-08-29 16:37:14,737: INFO/ForkPoolWorker-8] Task pss.api.offce.put_content_to_obs[54e1286f-74c4-48d4-98e5-99937c65714c] succeeded in 0.4250246670001161s: True
[2023-08-29 16:37:14,755: WARNING/ForkPoolWorker-1] requestId:
[2023-08-29 16:37:14,756: WARNING/ForkPoolWorker-1] 0000018A407095555502052E5A386783
[2023-08-29 16:37:14,763: WARNING/ForkPoolWorker-1] [2023-08-29 16:37:14,761] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:37:14,761: WARNING/ForkPoolWorker-1] obs_upload_file:OK
[2023-08-29 16:37:14,767: WARNING/ForkPoolWorker-1] test_2.png
[2023-08-29 16:37:14,785: INFO/ForkPoolWorker-1] Task pss.api.offce.put_content_to_obs[5b4d53fc-afc2-4e50-9b9c-905d5eddddde] succeeded in 0.3451121250000142s: True
[2023-08-29 16:37:14,788: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[835c3945-2396-4490-94d0-421298d1813f] received
[2023-08-29 16:37:14,890: WARNING/ForkPoolWorker-2] requestId:
[2023-08-29 16:37:14,891: WARNING/ForkPoolWorker-2] 0000018A407095E8540AE00FC334E409
[2023-08-29 16:37:14,892: WARNING/ForkPoolWorker-2] [2023-08-29 16:37:14,892] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:37:14,892: WARNING/ForkPoolWorker-2] obs_upload_file:OK
[2023-08-29 16:37:14,893: WARNING/ForkPoolWorker-2] test_3.png
[2023-08-29 16:37:14,895: INFO/ForkPoolWorker-2] Task pss.api.offce.put_content_to_obs[8fb95642-6900-4aaa-b666-11f98e3a0eea] succeeded in 0.3848593749999054s: True

Server (CentOS 7.6): it only runs synchronously, with a single fixed worker handling everything, so the service blocks:

[2023-08-29 16:25:58,664: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[873c3f38-98b4-47cc-98e8-6f65a58c3269] received
[2023-08-29 16:25:58,733: WARNING/ForkPoolWorker-7] requestId:
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] 0000018A406644C054084BB9021C6A9B
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] [2023-08-29 16:25:58,734] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] obs_upload_file:OK
[2023-08-29 16:25:58,734: WARNING/ForkPoolWorker-7] test_8.png
[2023-08-29 16:25:58,735: INFO/ForkPoolWorker-7] Task pss.api.offce.put_content_to_obs[873c3f38-98b4-47cc-98e8-6f65a58c3269] succeeded in 0.07009824365377426s: True
[2023-08-29 16:26:00,287: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[7a4868f2-305f-4f6b-992c-6ea0791f3427] received
[2023-08-29 16:26:00,370: WARNING/ForkPoolWorker-7] requestId:
[2023-08-29 16:26:00,370: WARNING/ForkPoolWorker-7] 0000018A40664B17550A56827D6506B2
[2023-08-29 16:26:00,371: WARNING/ForkPoolWorker-7] [2023-08-29 16:26:00,371] WARNING in offce: obs_upload_file:OK
[2023-08-29 16:26:00,371: WARNING/ForkPoolWorker-7] obs_upload_file:OK
[2023-08-29 16:26:00,371: WARNING/ForkPoolWorker-7] test_9.png
[2023-08-29 16:26:00,372: INFO/ForkPoolWorker-7] Task pss.api.offce.put_content_to_obs[7a4868f2-305f-4f6b-992c-6ea0791f3427] succeeded in 0.08343333378434181s: True

The run logs are shown above.

Locally it runs as expected: three workers (ForkPoolWorker-8, ForkPoolWorker-1 and ForkPoolWorker-2) process tasks in parallel and the service does not block.

The remote server only runs synchronously, with a single fixed worker (ForkPoolWorker-7), and the service blocks.

Is some configuration on the remote server wrong? The server's startup summary also looks the same as the local one:

 -------------- celery@xg-003 v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-3.10.0-1062.1.2.el7.x86_64-x86_64-with-centos-7.7.1908-Core 2023-08-29 15:48:54
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         pss:0x7f348eaec160
- ** ---------- .> transport:   redis://***:6379/6
- ** ---------- .> results:     redis://***:6379/6
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery

Any advice would be appreciated. Is there a configuration problem somewhere?

19 replies
kkk9
257 days ago
Is Redis working properly?
lzjunika
257 days ago
Both run fine; locally, since it is asynchronous, everything finishes very quickly.
On the server it simply never runs asynchronously. I have switched the worker pool from the default prefork to eventlet and to gevent, and neither helps.
Has anyone run into this?
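For reference, switching pools is normally just the --pool flag on the same worker command. A sketch using the module name from the command quoted below; the concurrency values are arbitrary, and gevent/eventlet each need their package installed first:

celery -A make_celery worker --pool=prefork --concurrency=8 --loglevel INFO
celery -A make_celery worker --pool=gevent --concurrency=100 --loglevel INFO
celery -A make_celery worker --pool=eventlet --concurrency=100 --loglevel INFO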
lzjunika
257 days ago
@kkk9 Redis is fine; I have switched to a Redis on the same machine.
purensong
257 days ago
I think it is a parameter problem.
kkk9
257 days ago
First make sure the queue is healthy: Redis is alive, reachable, and the password is correct.

Then look at the queues: the server only shows the celery queue. Is the Mac perhaps using a different queue, specified somewhere else?
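For what it is worth, the queue situation can be checked from the CLI without touching the code. A sketch, assuming the same make_celery module used in the worker command:

celery -A make_celery inspect active_queues   # which queues each worker consumes from
celery -A make_celery inspect active          # tasks currently executing
celery -A make_celery worker -Q celery --loglevel INFO   # pin a worker to an explicit queue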
kkk9
257 days ago
Also take a look at the settings part.
lzjunika
257 days ago
The command used to run Celery is the same on the server and locally:
celery -A make_celery worker --loglevel INFO --logfile=/logs/celery.log
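For reference, a make_celery module in a Flask project is usually wired along the lines of the classic pattern below. This is a hypothetical sketch, not the actual module from the project; the broker and backend URLs are taken from the startup banner.

# make_celery.py - hypothetical sketch of a typical Flask + Celery wiring
from celery import Celery
from flask import Flask

flask_app = Flask("pss")

celery = Celery(
    "pss",
    broker="redis://localhost:6379/6",   # transport shown in the startup banner
    backend="redis://localhost:6379/6",  # results backend shown in the startup banner
)

class ContextTask(celery.Task):
    # run every task inside the Flask application context
    def __call__(self, *args, **kwargs):
        with flask_app.app_context():
            return self.run(*args, **kwargs)

celery.Task = ContextTask

celery -A make_celery worker then picks up the celery attribute of this module.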
lzjunika
257 days ago
@kkk9 How do I go about checking the queues and the settings?
Anybfans
257 days ago
Install Flower and take a look.
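Flower is installed and started like this, assuming the same make_celery module; it serves a web UI (port 5555 by default) where the per-worker task distribution is easy to see:

pip install flower
celery -A make_celery flower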
celerysoft
256 days ago
@lzjunika #7 Looking at your startup command, it may be that you have not specified concurrency. The default concurrency equals the number of CPU cores, so my blind guess is that the server has a single core? Specifying the concurrency option should fix it.
lzjunika
256 days ago
Redis does contain keys, so the connection is working:
127.0.0.1:6379[6]> keys *
1) "celery-task-meta-a144f43b-93eb-4047-bc01-6d0fdfe9b8f6"
2) "celery-task-meta-865395d9-2226-4969-a269-a93d56ee3c4c"
3) "celery-task-meta-2c44dafc-93e4-4792-8a40-7f747bbd063b"
4) "celery-task-meta-0203b744-504b-414f-adda-41b45fe2aff9"
5) "celery-task-meta-16d37b55-b645-4e05-b58b-55b87fbf4e37"
6) "celery-task-meta-1e2fc20a-a31d-41a3-9003-5c7ffef30e42"
7) "celery-task-meta-a819a02b-7c15-475d-907a-7ab5ed5221cd"
8) "celery-task-meta-c2779805-d922-4423-b2bd-976317e5486d"
9) "celery-task-meta-7a4868f2-305f-4f6b-992c-6ea0791f3427"
10) "celery-task-meta-ff756f38-02c7-4e1f-8b20-39db4722fe83"
11) "celery-task-meta-0e38860b-dd44-47c2-9e40-4a1f4a7c4bb4"
12) "celery-task-meta-3187c555-d3a3-46b1-bf13-3bc38bc79fbd"
13) "celery-task-meta-873c3f38-98b4-47cc-98e8-6f65a58c3269"
14) "_kombu.binding.celery"
15) "_kombu.binding.celery.pidbox"
16) "celery-task-meta-bca09af8-14f4-4d00-84d1-baae7d233070"
17) "celery-task-meta-4f2c9e67-86a8-410f-bbe4-1a408981fd1a"
18) "celery-task-meta-cc93cd0f-f931-4a8c-a24e-795863531953"
19) "celery-task-meta-53d64e39-c872-46d7-a392-57e8617b8751"
20) "celery-task-meta-30efb54a-9f95-46e0-bd49-4190d5684f4c"
21) "celery-task-meta-ca6a5f83-3cab-4111-92c8-f154c2e03766"
22) "celery-task-meta-02a741d2-7426-4339-ad57-a5eea00c72e6"
23) "_kombu.binding.celeryev"
24) "celery-task-meta-94218c29-08b7-4982-ac15-2bc349767fa6"
25) "celery-task-meta-2a9fd0de-2f14-4dbe-a26e-56d6a22c8466"
26) "celery-task-meta-2c9da801-8383-4829-8db0-a0cf5fb8030b"
27) "celery-task-meta-d3d0c01d-359d-45d2-809c-9cbc5072b73d"
28) "celery-task-meta-71610058-15ea-4d5c-b282-226448300228"
29) "celery-task-meta-ee4efe45-43c3-44e6-af0e-df843cb57ea6"
30) "celery-task-meta-6ea9d50a-6b6e-4e28-a8cb-837c6517da54"
lzjunika
256 days ago
@celerysoft It is an 8-core, 32 GB server, so the default concurrency at startup is 8; the startup summary above shows it. I have tried this already, and setting it manually does not help either.
lzjunika
256 days ago
I manually set the concurrency to 3.
The only thing that changed is the worker id: celery -A make_celery worker --concurrency=3 --loglevel INFO

But it is still synchronous:

[2023-08-29 18:19:00,670: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[02ac662b-e0bd-4cc3-b659-6345a471505a] received
[2023-08-29 18:19:00,756: WARNING/ForkPoolWorker-1] requestId:
[2023-08-29 18:19:00,756: WARNING/ForkPoolWorker-1] 0000018A40CDC0F9540ADCD7126FE0E9
[2023-08-29 18:19:00,757: WARNING/ForkPoolWorker-1] [2023-08-29 18:19:00,757] WARNING in offce: obs_upload_file:OK
[2023-08-29 18:19:00,757: WARNING/ForkPoolWorker-1] obs_upload_file:OK
[2023-08-29 18:19:00,757: WARNING/ForkPoolWorker-1] test_8.png
[2023-08-29 18:19:00,757: INFO/ForkPoolWorker-1] Task pss.api.offce.put_content_to_obs[02ac662b-e0bd-4cc3-b659-6345a471505a] succeeded in 0.08660224080085754s: True

[2023-08-29 18:19:02,301: INFO/MainProcess] Task pss.api.offce.put_content_to_obs[19d3c1aa-20be-4dcb-a819-360191532325] received
[2023-08-29 18:19:02,400: WARNING/ForkPoolWorker-1] requestId:
[2023-08-29 18:19:02,400: WARNING/ForkPoolWorker-1] 0000018A40CDC7595A03C83BB2923AA0
[2023-08-29 18:19:02,401: WARNING/ForkPoolWorker-1] [2023-08-29 18:19:02,401] WARNING in offce: obs_upload_file:OK
[2023-08-29 18:19:02,401: WARNING/ForkPoolWorker-1] obs_upload_file:OK
[2023-08-29 18:19:02,401: WARNING/ForkPoolWorker-1] test_9.png
[2023-08-29 18:19:02,402: INFO/ForkPoolWorker-1] Task pss.api.offce.put_content_to_obs[19d3c1aa-20be-4dcb-a819-360191532325] succeeded in 0.09988882020115852s: True
deplivesb
256 days ago
Give me SSH access and I will take a look.
lzjunika
255 days ago
This has had me going in circles for days. I have tried it locally and in a local Docker container, and both run asynchronously, but on the server it just will not.
Today I tried running an extra Docker container on the server to host another Celery worker,
i.e. one worker on the host and one in Docker, which effectively makes two consumers.
Still synchronous though: I submitted 10 tasks, the host executed 4 and Docker executed 6, so the work was distributed, but it was still serial and the total run time did not change.

I tried two variants of the Docker command, with the same result:
Dockerfile1 (default prefork, concurrency 8):
...
CMD ["celery", "-A", "docker_celery", "worker", "--loglevel", "INFO", "--logfile=logs/celery_docker.log"]

Dockerfile2 (eventlet, concurrency 5):
...
CMD ["celery", "-A", "docker_celery", "worker", "--pool=eventlet", "--concurrency=5", "--loglevel", "INFO", "--logfile=logs/celery_docker.log"]

Same result: synchronous, total run time unchanged. Frustrating...
lzjunika
255 days ago
One producer sends tasks to two consumers (the host service plus the Docker service). The task messages are processed fine; they just do not run asynchronously.
The producer sends tasks in the normal way:
put_content_to_obs.delay(new_name, local_name)
The producer side never asks for the result either; it only sends.
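Side note, not the cause of the blocking: since the producer never reads the result, the task could be declared with ignore_result=True so the worker does not write the celery-task-meta-* keys seen in Redis above. A sketch, reusing the hypothetical task definition from earlier:

@shared_task(name="pss.api.offce.put_content_to_obs", ignore_result=True)
def put_content_to_obs(new_name, local_name):
    ...

# fire-and-forget from the Flask view, exactly as above:
put_content_to_obs.delay(new_name, local_name)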
lzjunika
255 days ago
Where on earth is the configuration problem? Very strange. Should I just switch to another server?
lzjunika
254 days ago
Finally some hope. On the same server I deployed a minimal flask + celery service via docker-compose, and it runs asynchronously.
So the server is fine and Redis is fine; the problem is somewhere in how the original program is written. I will keep digging. The log is below, followed by a sketch of the minimal test task:

[2023-09-01 11:08:45,531: INFO/MainProcess] Task app.controller.index.add_together[d4885e83-f346-46b9-98c2-a9f981d7d1de] received
[2023-09-01 11:08:45,533: INFO/MainProcess] Task app.controller.index.add_together[0abc8808-5603-4c61-87de-f6bcd2747d53] received
[2023-09-01 11:08:45,535: INFO/MainProcess] Task app.controller.index.add_together[e1211bbc-8a76-4d8c-94d6-e3904cc50bdc] received
[2023-09-01 11:08:45,538: INFO/MainProcess] Task app.controller.index.add_together[3a099971-abc5-4c2c-b784-1a2aaba86a24] received
[2023-09-01 11:08:45,539: INFO/MainProcess] Task app.controller.index.add_together[f1a6604d-2757-4742-b4b5-33c4b92bbbb8] received
[2023-09-01 11:08:45,541: INFO/MainProcess] Task app.controller.index.add_together[d380858f-3e65-4569-bcea-54ea8db5e6cf] received
[2023-09-01 11:08:45,542: INFO/MainProcess] Task app.controller.index.add_together[740fbfed-7074-49f1-8680-6ddc48bfc2da] received
[2023-09-01 11:08:45,544: INFO/MainProcess] Task app.controller.index.add_together[78b6ee5f-15a0-409b-b41f-709b0fdcb818] received
[2023-09-01 11:08:45,545: INFO/MainProcess] Task app.controller.index.add_together[a482a9d2-1ffd-47df-b421-0bfcd1b386e1] received
[2023-09-01 11:08:45,546: INFO/MainProcess] Task app.controller.index.add_together[7baa35a0-d695-4010-8120-051d5eea9af7] received
[2023-09-01 11:08:46,535: INFO/ForkPoolWorker-7] Task app.controller.index.add_together[d4885e83-f346-46b9-98c2-a9f981d7d1de] succeeded in 1.0014203377068043s: 231
[2023-09-01 11:08:46,535: INFO/ForkPoolWorker-8] Task app.controller.index.add_together[0abc8808-5603-4c61-87de-f6bcd2747d53] succeeded in 1.001225769519806s: 647
[2023-09-01 11:08:46,537: INFO/ForkPoolWorker-1] Task app.controller.index.add_together[e1211bbc-8a76-4d8c-94d6-e3904cc50bdc] succeeded in 1.001103661954403s: 308
[2023-09-01 11:08:46,540: INFO/ForkPoolWorker-2] Task app.controller.index.add_together[3a099971-abc5-4c2c-b784-1a2aaba86a24] succeeded in 1.0009450502693653s: 735
[2023-09-01 11:08:46,542: INFO/ForkPoolWorker-3] Task app.controller.index.add_together[f1a6604d-2757-4742-b4b5-33c4b92bbbb8] succeeded in 1.0019154399633408s: 554
[2023-09-01 11:08:46,544: INFO/ForkPoolWorker-5] Task app.controller.index.add_together[740fbfed-7074-49f1-8680-6ddc48bfc2da] succeeded in 1.000898975878954s: 455
[2023-09-01 11:08:46,545: INFO/ForkPoolWorker-4] Task app.controller.index.add_together[d380858f-3e65-4569-bcea-54ea8db5e6cf] succeeded in 1.0016995184123516s: 771
[2023-09-01 11:08:46,546: INFO/ForkPoolWorker-6] Task app.controller.index.add_together[78b6ee5f-15a0-409b-b41f-709b0fdcb818] succeeded in 1.0007124096155167s: 281
[2023-09-01 11:08:47,537: INFO/ForkPoolWorker-8] Task app.controller.index.add_together[7baa35a0-d695-4010-8120-051d5eea9af7] succeeded in 1.00179473310709s: 788
[2023-09-01 11:08:47,538: INFO/ForkPoolWorker-7] Task app.controller.index.add_together[a482a9d2-1ffd-47df-b421-0bfcd1b386e1] succeeded in 1.0018408931791782s: 729
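The minimal test task behind that log was presumably something along these lines (a hypothetical reconstruction from the log output, roughly 1 s per task with numeric results; not the actual code):

# app/controller/index.py - hypothetical reconstruction of the minimal test task
import time

from celery import shared_task

@shared_task(name="app.controller.index.add_together")
def add_together(a, b):
    time.sleep(1)  # simulate a slow job; matches the roughly 1 s runtimes in the log
    return a + b

# queueing ten of these in a loop produces the parallel log above:
# results = [add_together.delay(i, i * 7) for i in range(10)]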
misoomang
251 days ago
A suggestion from a newbie, just to verify: hook up a @worker_process_init.connect handler that logs a line, to confirm whether multiple worker processes actually get started (see the sketch below).

A guess: in the server log above, the test_8.png upload task ran in 0.08 s, and test_9.png was not received until about 2 s later. Could slow I/O be delaying how quickly the images arrive, so that the first task runs on worker 1 and, with such a large gap, the second task also lands on worker 1?
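A minimal version of that check might look like the sketch below (placement in the project is an assumption; worker_process_init fires once per prefork child, so with concurrency 8 there should be eight log lines at startup):

# add next to the Celery app definition
import logging
import os

from celery.signals import worker_process_init

logger = logging.getLogger(__name__)

@worker_process_init.connect
def log_worker_start(**kwargs):
    # runs in every prefork child process right after it is forked
    logger.warning("worker process started, pid=%s", os.getpid())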
