How do you keep the Docker daemon from exiting?

2020-12-02 10:50:57 +08:00
 programV2

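I suddenly got a service-monitoring alert and found that all 4 container services on Docker had gone down at the same time. Checking the log with journalctl -u docker.service turned up the following. Has anyone here run into this? How should I troubleshoot it? Thanks for any pointers! I'd also like to ask which approach everyone uses these days to keep the Docker daemon from exiting.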

Docker version 18.09.7, build 2d0083d; Linux kernel: 4.15.0-123-generic #126~16.04.1-Ubuntu SMP

systemd[1]: Stopping Docker Application Container Engine...

074627-05:00" level=info msg="Processing signal 'terminated'"

049975-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

444047-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

699341-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

441246-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

204594-05:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby

664254-05:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby

systemd[1]: Stopped Docker Application Container Engine.

1687 views
Node: Q&A
12 replies
programV2
2020-12-02 10:54:51 +08:00
Afterwards I restarted the Docker daemon process. Here is the rest of the log:

systemd[1]: Starting Docker Application Container Engine...
3548512-05:00" level=info msg="parsed scheme: \"unix\"" module=grpc
4983202-05:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
5244875-05:00" level=info msg="parsed scheme: \"unix\"" module=grpc
5268345-05:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
6388785-05:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
6429898-05:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
6495301-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0ab0, CONNECTING" module=grpc
7938884-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0ab0, READY" module=grpc
8150455-05:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
8187954-05:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
8245945-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0da0, CONNECTING" module=grpc
8847325-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0da0, READY" module=grpc
5119350-05:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
8117760-05:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
8433459-05:00" level=warning msg="Your kernel does not support swap memory limit"
8515283-05:00" level=warning msg="Your kernel does not support cgroup rt period"
8529936-05:00" level=warning msg="Your kernel does not support cgroup rt runtime"
9099663-05:00" level=info msg="Loading containers: start."
7191821-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
3364491-05:00" level=info msg="Loading containers: done."
3809274-05:00" level=warning msg="failed to retrieve runc version: unknown output format: runc version spec: 1.0.1-dev\n"
3797654-05:00" level=info msg="Docker daemon" commit=2d0083d graphdriver(s)=overlay2 version=18.09.7
4010995-05:00" level=info msg="Daemon has completed initialization"
9557408-05:00" level=info msg="API listen on /var/run/docker.sock"
systemd[1]: Started Docker Application Container Engine.
programV2
2020-12-02 11:22:27 +08:00
Has nobody else run into this?
programV2
2020-12-02 11:29:38 +08:00
Output of docker info:

Images: 5
Server Version: 18.09.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version: N/A
init version: v0.18.0 (expected: fec3683b971d9c3ef73f284f176672c44b448662)
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-123-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1001MiB
Name: xxxx2.localdomain
ID: xx42:xxxx:5JEH:DQ7M:SI6Z:JE5G:ZE75:ZVYI:3FDM:NCXE:DIDP:ELIT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
cheng6563
2020-12-02 14:18:29 +08:00
074627-05:00" level=info msg="Processing signal 'terminated'"
This looks like it was stopped manually.
Configure automatic restart for the systemd service.
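
A minimal sketch of that suggestion, assuming a systemd drop-in is the mechanism wanted. The file name 10-restart.conf, the STAGE_DIR variable, and RestartSec=5 are illustrative choices, not from the thread. The script only stages the file in a scratch directory and touches nothing under /etc:

```sh
#!/bin/sh
# Sketch: stage a systemd drop-in that asks systemd to restart docker.service.
# STAGE_DIR is a hypothetical scratch location; nothing under /etc is modified.
set -eu

STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"
DROPIN="$STAGE_DIR/docker.service.d/10-restart.conf"

mkdir -p "$(dirname "$DROPIN")"

# Restart=always restarts the service whenever its process exits;
# RestartSec is the delay before each restart attempt (value is an assumption).
cat > "$DROPIN" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF

echo "staged: $DROPIN"
cat "$DROPIN"
```

To apply it, the staged file would go under /etc/systemd/system/docker.service.d/, followed by systemctl daemon-reload. One caveat: per the systemd.service documentation, Restart= does not bring a unit back when systemd itself stopped it (for example via systemctl stop), so this may not cover a stop that systemd initiated, like the "Stopping Docker Application Container Engine..." line in the log above.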
mritd
2020-12-02 14:30:57 +08:00
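Hahaha, damn, I thought I was the only one who got burned, hahaha

This morning I also found the docker daemon gone on some of my servers, and then found that the containers were actually still running (--live-restore was enabled).
The logs likewise showed a terminated signal, and tracing the timing pointed to around 6:40.

So I promptly ran systemctl list-timers: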

```sh
06:40:13 CST 7h ago apt-daily-upgrade.timer apt-daily-upgrade.service
```

Next:

cat /var/log/unattended-upgrades/unattended-upgrades-dpkg.log

```sh
Log started: 2020-12-02 06:40:15
(Reading database ... 110792 files and directories currently installed.)
Preparing to unpack .../containerd_1.3.3-0ubuntu2.1_amd64.deb ...
Unpacking containerd (1.3.3-0ubuntu2.1) over (1.3.3-0ubuntu2) ...
Setting up containerd (1.3.3-0ubuntu2.1) ...
Processing triggers for man-db (2.9.1-1) ...
```
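
The check above can be sketched as a small filter that pulls container-runtime package upgrades out of the unattended-upgrades dpkg log and tags each with the start time of its run. The helper name find_runtime_upgrades is hypothetical, and the sample input is copied from the excerpt above so the script runs anywhere:

```sh
#!/bin/sh
# Sketch: list upgrades of container-runtime packages recorded in an
# unattended-upgrades dpkg log, each prefixed with the start time of its run.
set -eu

# find_runtime_upgrades is a hypothetical helper name, not part of any tool.
find_runtime_upgrades() {
    awk '
        /^Log started: / { started = $3 " " $4 }
        /^Unpacking (containerd|docker)/ { print started " | " $0 }
    ' "$1"
}

# Sample input copied from the log excerpt in this post.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
Log started: 2020-12-02 06:40:15
(Reading database ... 110792 files and directories currently installed.)
Preparing to unpack .../containerd_1.3.3-0ubuntu2.1_amd64.deb ...
Unpacking containerd (1.3.3-0ubuntu2.1) over (1.3.3-0ubuntu2) ...
Setting up containerd (1.3.3-0ubuntu2.1) ...
EOF

RESULT="$(find_runtime_upgrades "$SAMPLE")"
echo "$RESULT"
```

On an affected host the argument would be /var/log/unattended-upgrades/unattended-upgrades-dpkg.log, and the printed time can be compared with the "Processing signal 'terminated'" entry in the docker journal.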

What a trap. The remedy is:

```sh
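# Keep apt from upgrading the Docker-related packages
apt-mark hold docker docker.io containerd
# Turn off the automatic apt timers, for future boots and for the current one
systemctl disable apt-daily.timer apt-daily-upgrade.timer
systemctl stop apt-daily.timer apt-daily-upgrade.timer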
```
programV2
2020-12-02 15:47:02 +08:00
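@mritd Mine had already gone down last night, and it cost me a whole day of troubleshooting. Does your approach work? For now I've just turned off automatic upgrades.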
mritd
2020-12-02 15:51:32 +08:00
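@programV2 apt-mark hold keeps a package from being upgraded when an upgrade runs; I had forgotten to hold containerd at the time. I then simply turned off automatic upgrades as well. This thing nearly did me in.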
programV2
2020-12-02 16:04:00 +08:00
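@mritd Lots of people reported this on overseas forums last night. Do so few people on our forum use containers? Why has nobody reported it here? I almost thought some other software of mine was at fault; I would never have guessed it was this.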
mritd
2020-12-02 16:10:38 +08:00
@programV2 #8 It hit 3 of my machines in a row: one production, one test, and an overseas VPS. That made me think something more was going on, so I investigated, haha
programV2
2020-12-02 16:45:16 +08:00
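@mritd Are you on Ubuntu 16 as well? One more question: I opened the log with journalctl -u docker.service and wanted to copy it, but it is too long, and on my phone I can't select and copy all of it. Is there a command suited to copying out the whole log in this situation?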
julyclyde
2020-12-02 19:50:13 +08:00
Hmm, but in principle, restarting docker and containerd shouldn't affect containers that are already running, should it?
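
One possible explanation, going by the docker info output earlier in the thread: it shows Live Restore Enabled: false, and with live restore off the daemon shuts down its running containers when it terminates. mritd's hosts had --live-restore on, and there the containers kept running. A minimal sketch of enabling it through daemon.json follows; STAGE_DIR is a hypothetical scratch location, and the file is only staged, not installed:

```sh
#!/bin/sh
# Sketch: stage a daemon.json fragment that turns on Docker's live-restore option.
# STAGE_DIR is a hypothetical scratch location; /etc/docker is not modified.
set -eu

STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"
DAEMON_JSON="$STAGE_DIR/daemon.json"

cat > "$DAEMON_JSON" <<'EOF'
{
  "live-restore": true
}
EOF

cat "$DAEMON_JSON"
```

On a real host this key would be merged into /etc/docker/daemon.json, keeping any existing keys, after which systemctl reload docker should pick it up without restarting containers, according to the Docker documentation. How it behaves across a containerd package upgrade on a given version is worth testing rather than assuming.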
mritd
2020-12-03 10:34:28 +08:00
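@programV2 #10 We're on 18/20. Normally you can just pipe journalctl -u docker into the next program; on a Mac there is pbcopy. To copy straight to my local machine over ssh as well, I built a tool of my own that can copy to the local machine from any number of hops away. No idea how to do it on a phone, though....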
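
The piping idea can be sketched as writing the whole output to a file rather than selecting text on screen. The helper name save_output is hypothetical, and a stand-in printf producer is used so the sketch runs on machines without journald:

```sh
#!/bin/sh
# Sketch: capture a command's full output to a file instead of selecting text on screen.
set -eu

# save_output is a hypothetical helper: save_output FILE COMMAND [ARGS...]
# It writes the command's stdout to FILE and prints the number of lines saved.
save_output() {
    out="$1"; shift
    "$@" > "$out"
    wc -l < "$out" | tr -d ' '
}

# Stand-in producer so the sketch runs anywhere; on the affected host this would be
#   save_output docker.log journalctl -u docker.service --no-pager
LOG_FILE="$(mktemp)"
LINES_SAVED="$(save_output "$LOG_FILE" printf '%s\n' 'line one' 'line two')"
echo "saved $LINES_SAVED lines to $LOG_FILE"
```

The --no-pager flag keeps journalctl from opening its pager, so the complete log goes through the redirect. The resulting file can then be moved off the machine by whatever means is available, or piped to pbcopy when the command is run from a Mac over ssh.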

https://www.v2ex.com/t/731263