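This year another friend ran into the same problem and came to me, so I took another look. My guess is that the traffic pattern was frequently triggering traffic shifts in OVS-DPDK bonding, that the jitter from those shifts caused restarts inside the VM (which has a jitter-sensitive restart mechanism), and that the long poll was merely a consequence of the VM restarting.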
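Below are the conditions that trigger a bonding traffic shift. The more heavily loaded member has to differ from the less loaded one along several dimensions: the absolute bandwidth delta, the relative bandwidth difference, the heavier member carrying more than one hash, and the shift reducing the load ratio between the two by at least 0.1.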
> Bond Packet Output
> When a packet is sent out a bond port, the bond member actually used is selected based on the packet’s source MAC and VLAN tag (see bond_choose_output_member()). In particular, the source MAC and VLAN tag are hashed into one of 256 values, and that value is looked up in a hash table (the “bond hash”) kept in the bond_hash member of struct port. The hash table entry identifies a bond member. If no bond member has yet been chosen for that hash table entry, vswitchd chooses one arbitrarily.
> Every 10 seconds, vswitchd rebalances the bond members (see bond_rebalance()). To rebalance, vswitchd examines the statistics for the number of bytes transmitted by each member over approximately the past minute, with data sent more recently weighted more heavily than data sent less recently. It considers each of the members in order from most-loaded to least-loaded. If highly loaded member H is significantly more heavily loaded than the least-loaded member L, and member H carries at least two hashes, then vswitchd shifts one of H’s hashes to L. However, vswitchd will only shift a hash from H to L if it will decrease the ratio of the load between H and L by at least 0.1.
> Currently, “significantly more loaded” means that H must carry at least 1 Mbps more traffic, and that traffic must be at least 3% greater than L’s.
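
To make the quoted conditions concrete, here is a minimal Python sketch of the per-pair decision. It is not the actual `bond_rebalance()` code, only my reading of the text above. The names (`Member`, `pick_hash_to_shift`, `dpdk0`, `dpdk1`) are made up for illustration. Two things are assumptions on my part: that the "ratio of the load" is heavier load divided by lighter load, and that among the eligible hashes the one giving the most even split is chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Thresholds taken from the quoted documentation text.
MIN_DELTA_BPS = 1_000_000    # H must carry at least 1 Mbps more than L
MIN_EXCESS_FRACTION = 0.03   # ...and at least 3% more than L's traffic
MIN_RATIO_DROP = 0.1         # a shift must lower the load ratio by >= 0.1


@dataclass
class Member:
    """Hypothetical model of a bond member: hash id -> load in bits/s."""
    name: str
    hash_load_bps: Dict[int, float] = field(default_factory=dict)

    @property
    def load_bps(self) -> float:
        return sum(self.hash_load_bps.values())


def load_ratio(a_bps: float, b_bps: float) -> float:
    """Assumed definition of the ratio: heavier load over lighter load."""
    heavier, lighter = max(a_bps, b_bps), min(a_bps, b_bps)
    return float("inf") if lighter == 0 else heavier / lighter


def significantly_more_loaded(h_bps: float, l_bps: float) -> bool:
    delta = h_bps - l_bps
    return delta >= MIN_DELTA_BPS and delta >= MIN_EXCESS_FRACTION * l_bps


def pick_hash_to_shift(h: Member, l: Member) -> Optional[int]:
    """Return the hash id to move from h to l, or None if no shift qualifies."""
    if len(h.hash_load_bps) < 2:          # H must carry at least two hashes
        return None
    if not significantly_more_loaded(h.load_bps, l.load_bps):
        return None

    old_ratio = load_ratio(h.load_bps, l.load_bps)
    best_hash, best_ratio = None, old_ratio
    for hash_id, bps in h.hash_load_bps.items():
        new_ratio = load_ratio(h.load_bps - bps, l.load_bps + bps)
        # Assumption: prefer the eligible hash that leaves the most even split.
        if old_ratio - new_ratio >= MIN_RATIO_DROP and new_ratio < best_ratio:
            best_hash, best_ratio = hash_id, new_ratio
    return best_hash


heavy = Member("dpdk0", {1: 6e6, 2: 4e6})   # 10 Mbps spread over two hashes
light = Member("dpdk1", {3: 2e6})           # 2 Mbps
print(pick_hash_to_shift(heavy, light))     # prints 2
```

In this example, moving hash 2 leaves both members at 6 Mbps, so it is the one selected. If the per-hash loads keep fluctuating so that the two members take turns being the heavier one, a check like this can pass again at almost every 10-second rebalance, which is the kind of repeated shifting I suspect here.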