[linux][system crash] What are the possible causes?

2013-10-11 09:59:49 +08:00
 geew
It keeps happening for no apparent reason.
The server works as a software router and runs several KVM guests.

What could the causes be, and which logs should I look at to track down the fault?

Calling on ops experts, and experts of every kind!
6364 views
Node: Q&A
10 replies
megaforce
2013-10-11 10:45:18 +08:00
Take a look at dmesg.
And go through all the logs under /var/log/.
echo1937
2013-10-11 11:02:43 +08:00
Do you have kdump set up? Analyze the core it captured.
eth2net
2013-10-11 11:21:28 +08:00
Is the crash a panic or a hang? As #2 said, having kdump is the best option.
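Whether kdump can help depends partly on the panic-vs-hang question: kdump only captures a core when the kernel actually panics, so a lockup that is merely logged leaves nothing behind. A minimal sketch for checking the relevant sysctls, assuming the standard `/proc/sys/kernel` layout; the helper names `read_lockup_settings` and `explain` are hypothetical. `softlockup_panic=1` turns a soft lockup into a panic, `hung_task_panic=1` does the same for hung tasks, and `panic=0` means the box stays down after a panic instead of rebooting. Files missing on a given kernel are reported as `None`.

```python
from pathlib import Path
from typing import Dict, List, Optional

# sysctl files that decide whether a lockup becomes a panic (which kdump can catch)
# and what happens after a panic. Not every kernel exposes all of them.
SETTINGS = ("panic", "softlockup_panic", "hung_task_panic", "hung_task_timeout_secs")


def read_lockup_settings(root: str = "/proc/sys/kernel") -> Dict[str, Optional[int]]:
    """Read a few panic/lockup sysctls as integers (hypothetical helper)."""
    values: Dict[str, Optional[int]] = {}
    for name in SETTINGS:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None  # missing on this kernel, or unreadable
    return values


def explain(values: Dict[str, Optional[int]]) -> List[str]:
    """Turn the raw values into short notes about what kdump would miss."""
    notes: List[str] = []
    if values.get("softlockup_panic") == 0:
        notes.append("soft lockups are only logged; kdump will not capture them")
    if values.get("hung_task_panic") == 0:
        notes.append("hung tasks are only logged; kdump will not capture them")
    if values.get("panic") == 0:
        notes.append("after a panic the machine stays down instead of rebooting")
    return notes


if __name__ == "__main__":
    settings = read_lockup_settings()
    for key, val in settings.items():
        print(f"kernel.{key} = {val}")
    for note in explain(settings):
        print("note:", note)
```

Changing these (for example `sysctl -w kernel.softlockup_panic=1`) trades uptime for diagnosability: the machine reboots through kdump on the next lockup instead of limping on.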
sdysj
2013-10-11 11:31:44 +08:00
No logs, so what on earth are you asking about...
BOYPT
2013-10-11 14:20:56 +08:00
No logs posted? Then I'll just guess:

Maybe aliens sneaked into your server room and, while studying your machine's CPU, accidentally flipped the highest bit of the SP register.
humiaozuzu
2013-10-11 14:42:30 +08:00
@sdysj
@BOYPT They're asking which logs to look at to track down the fault.
geew
2013-10-11 16:18:19 +08:00
OK, here is the *.err log. PS: no idea how to format things in this editor.

Oct 11 11:09:50 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:10:13 master pppoe[7973]: Bad TCP checksum 3200
Oct 11 11:10:21 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:10:23 master pppoe[7973]: Bad TCP checksum 6900
Oct 11 11:10:34 master pppoe[7973]: Bad TCP checksum 3900
Oct 11 11:12:23 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:12:42 master pppoe[7973]: Bad TCP checksum 6100
Oct 11 11:18:05 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:18:19 master pppoe[7973]: Bad TCP checksum 6000
Oct 11 11:18:19 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:18:23 master pppoe[7973]: Bad TCP checksum 3400
Oct 11 11:23:51 master pppoe[7973]: Bad TCP checksum 6900
Oct 11 11:23:51 master pppoe[7973]: Bad TCP checksum 6900
Oct 11 11:24:17 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:24:17 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:25:31 master pppoe[7973]: Bad TCP checksum 3d00
Oct 11 11:30:26 master pppoe[7973]: Bad TCP checksum 3200
Oct 11 11:38:02 master pppoe[7973]: Bad TCP checksum 3000
Oct 11 11:44:09 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 11:50:31 master pppoe[7973]: Bad TCP checksum 6d00
Oct 11 11:53:43 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 12:46:07 master pppoe[7973]: Bad TCP checksum 6900
Oct 11 12:55:30 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 12:57:41 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 12:57:41 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 13:05:56 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 13:06:52 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 13:28:23 master pppoe[7973]: Bad TCP checksum 900
Oct 11 13:51:32 master pppoe[7973]: Bad TCP checksum 3c00
Oct 11 14:05:12 master pppoe[7973]: Bad TCP checksum 2a00
Oct 11 14:06:08 master pppoe[7973]: Bad TCP checksum 1800
Oct 11 14:11:33 master pppoe[7973]: Bad TCP checksum 3100
Oct 11 15:01:45 master pppoe[7973]: Bad TCP checksum 3100
Oct 11 15:48:58 master kernel: BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:99]
Oct 11 15:48:58 master kernel: Stack:
Oct 11 15:48:58 master kernel: Call Trace:
Oct 11 15:48:58 master kernel: Code: 01 74 05 e8 92 7a d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f
Oct 11 15:52:49 master kernel: [drm:radeon_dp_i2c_aux_ch] *ERROR* aux i2c too many retries, giving up
Oct 11 15:52:49 master kernel: [drm:radeon_dp_i2c_aux_ch] *ERROR* aux i2c too many retries, giving up
Oct 11 15:52:49 master kernel: ata2.00: failed to resume link (SControl 0)
Oct 11 15:52:49 master kernel: ata2.01: failed to resume link (SControl 0)
Oct 11 15:53:00 master nslcd[1847]: [8b4567] no available LDAP server found
Oct 11 15:53:10 master nslcd[1847]: [8b4567] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [7b23c6] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [7b23c6] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [3c9869] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [3c9869] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [334873] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [334873] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [b0dc51] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [b0dc51] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [495cff] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [495cff] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [e8944a] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [e8944a] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [5558ec] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [5558ec] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [8e1f29] no available LDAP server found
Oct 11 15:53:20 master nslcd[1847]: [8e1f29] no available LDAP server found
Oct 11 15:53:22 master automount[2097]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Oct 11 15:53:23 master xinetd[2133]: Server /usr/sbin/amandad is not executable [file=/etc/xinetd.d/amanda] [line=13]
Oct 11 15:53:23 master xinetd[2133]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/amanda] [line=13]
Oct 11 15:53:23 master dhcpd: WARNING: Host declarations are global. They are not limited to the scope you declared them in.
Oct 11 15:53:24 master libvirtd: Could not find keytab file: /etc/libvirt/krb5.tab: No such file or directory
Oct 11 15:53:24 master nslcd[1847]: [e87ccd] no available LDAP server found
Oct 11 15:53:24 master nslcd[1847]: [e87ccd] no available LDAP server found
Oct 11 15:53:27 master nslcd[1847]: [1b58ba] no available LDAP server found
Oct 11 15:53:27 master nslcd[1847]: [1b58ba] no available LDAP server found
Oct 11 15:53:27 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Oct 11 15:53:27 master nslcd[1847]: [7ed7ab] no available LDAP server found
Oct 11 15:53:27 master nslcd[1847]: [7ed7ab] no available LDAP server found
Oct 11 15:53:27 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Oct 11 15:53:28 master nslcd[1847]: [b141f2] no available LDAP server found
Oct 11 15:53:28 master nslcd[1847]: [b141f2] no available LDAP server found
Oct 11 15:53:28 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Oct 11 15:53:28 master nslcd[1847]: [b71efb] no available LDAP server found
Oct 11 15:53:28 master nslcd[1847]: [b71efb] no available LDAP server found
Oct 11 15:53:28 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Oct 11 15:53:29 master nslcd[1847]: [e2a9e3] no available LDAP server found
Oct 11 15:53:29 master nslcd[1847]: [e2a9e3] no available LDAP server found
Oct 11 15:53:29 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Oct 11 15:53:29 master nslcd[1847]: [45e146] no available LDAP server found
Oct 11 15:53:29 master nslcd[1847]: [45e146] no available LDAP server found
Oct 11 15:53:29 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Oct 11 15:53:47 master nslcd[1847]: [5f007c] no available LDAP server found
Oct 11 15:53:47 master nslcd[1847]: [5f007c] no available LDAP server found
Oct 11 15:53:49 master nslcd[1847]: [d062c2] no available LDAP server found
Oct 11 15:53:49 master nslcd[1847]: [d062c2] no available LDAP server found
Oct 11 15:53:50 master nslcd[1847]: [200854] no available LDAP server found
Oct 11 15:53:50 master nslcd[1847]: [200854] no available LDAP server found
Oct 11 15:54:11 master nslcd[1847]: [b127f8] no available LDAP server found
Oct 11 15:54:11 master nslcd[1847]: [b127f8] no available LDAP server found
Oct 11 15:54:11 master kernel: kvm: 2403: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
Oct 11 15:54:13 master kernel: kvm: 2461: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
Oct 11 15:54:13 master kernel: kvm: 2432: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
Oct 11 15:54:14 master kernel: kvm: 2472: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
Oct 11 15:54:14 master kernel: kvm: 2532: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
Oct 11 15:55:11 master nslcd[1847]: [16231b] no available LDAP server found
Oct 11 15:55:11 master nslcd[1847]: [16231b] no available LDAP server found
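Most of the log above is repetitive noise (pppoe checksum warnings, nslcd LDAP lookups, qemu-kvm keytab warnings), which buries the few kernel lines that matter. A quick way to see that at a glance is to count lines per program tag. A minimal sketch, assuming the classic `Mon DD HH:MM:SS host prog[pid]: message` syslog format shown here; the name `count_by_program` is hypothetical, and `/var/log/messages` is only an assumed default path.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Classic syslog line, e.g. "Oct 11 11:09:50 master pppoe[7973]: Bad TCP checksum 2a00".
# Group 1 captures the program tag without the optional "[pid]" suffix.
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+([^\s:\[]+)(?:\[\d+\])?:")


def count_by_program(lines: Iterable[str]) -> Counter:
    """Count syslog lines per program tag; lines that don't parse are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for prog, n in count_by_program(fh).most_common():
            print(f"{n:6d}  {prog}")
```

Programs with only a handful of lines (here, `kernel`) are usually the ones worth reading in full.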
geew
2013-10-11 16:20:44 +08:00
@echo1937
@eth2net
@sdysj
@BOYPT
@humiaozuzu
What other logs do you need?
ceyes
2013-10-11 17:17:08 +08:00
Generally speaking, a kernel bug will definitely bring the machine down.

less /var/log/messages
Search for "Oops", "Call Trace" and "Panic",
then take the relevant information to bugzilla and ask for help.
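The search suggested above can be scripted so each hit comes with a few surrounding lines, which is what a bug report needs. A minimal sketch, assuming a plain-text syslog file; the function name `find_kernel_trouble` and the exact marker list are hypothetical choices. Matching is case-sensitive, so both `panic` (as in "Kernel panic") and `Panic` are listed, plus `BUG:` (which covers the `BUG: soft lockup` line in the posted log) and the hung-task phrase `blocked for more than`. Overlapping hits, such as a `BUG:` line followed by its `Call Trace:`, each get their own context window.

```python
import sys
from typing import Iterable, List, Tuple

# Markers from the reply, plus common lockup/hung-task messages (case-sensitive).
MARKERS = ("Oops", "Call Trace", "panic", "Panic", "BUG:", "blocked for more than")


def find_kernel_trouble(lines: Iterable[str], context: int = 3) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, surrounding lines) for every line with a marker."""
    buf = [line.rstrip("\n") for line in lines]
    hits: List[Tuple[int, List[str]]] = []
    for i, line in enumerate(buf):
        if any(marker in line for marker in MARKERS):
            start = max(0, i - context)
            hits.append((i + 1, buf[start : i + context + 1]))
    return hits


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for lineno, ctx in find_kernel_trouble(fh):
            print(f"--- line {lineno} ---")
            print("\n".join(ctx))
```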
geew
2013-10-12 09:34:26 +08:00
Bump.

https://www.v2ex.com/t/85188
