The formatting here is a bit messy, so you may prefer to read this post at the link below:

https://bitnoise.s3-ap-northeast-1.amazonaws.com/index.html

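Merging multi-line logs with Filebeat

Configuration notes

I recently needed to collect some logs whose entries span several lines, so I started looking into Filebeat's multiline settings and ran into a problem along the way. If you already know the Filebeat configuration well, you can skip straight to the "Problem" section below.

Version used: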

filebeat version 6.1.4 (amd64), libbeat 6.1.4

Official documentation:

https://www.elastic.co/guide/en/beats/filebeat/6.1/multiline-examples.html
https://www.elastic.co/guide/en/beats/filebeat/6.1/configuration-filebeat-options.html

The most important parameters are listed below.

include_lines

A list of regular expressions to match the lines that you want Filebeat to include. Filebeat exports only the lines that match a regular expression in the list. By default, all lines are exported.

If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by include_lines. The following example configures Filebeat to export any lines that start with "ERR" or "WARN":

filebeat.prospectors:
- paths:
    - /var/log/myapp/*.log
  include_lines: ['^ERR', '^WARN']

Devil's whisper: a line is collected if it matches any one of the regular expressions in the list.
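The "match any pattern in the list" rule can be sketched in a few lines of Python. This is only an illustration of the semantics quoted above, not Filebeat's code: the function name include_filter and the sample log lines are made up, and Python's re module stands in for the Go regular expressions that Filebeat actually uses.

```python
import re


def include_filter(events, include_lines):
    """Keep the events that match at least one pattern (unanchored search)."""
    patterns = [re.compile(p) for p in include_lines]
    return [e for e in events if any(p.search(e) for p in patterns)]


# Hypothetical log lines, filtered with the patterns from the example config.
lines = ["ERR disk full", "INFO started", "WARN slow query", "DEBUG ERR inside"]
kept = include_filter(lines, ["^ERR", "^WARN"])
print(kept)
```

Here "DEBUG ERR inside" is dropped because ^ERR is anchored to the start of the line, while the two lines that begin with ERR or WARN are kept.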

multiline.pattern

Specifies the regular expression pattern to match. Note that the regexp patterns supported by Filebeat differ somewhat from the patterns supported by Logstash. See Regular expression support for a list of supported regexp patterns. Depending on how you configure other multiline options, lines that match the specified regular expression are considered either continuations of a previous line or the start of a new multiline event. You can set the negate option to negate the pattern.

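Devil's whisper: lines that match this pattern take part in multi-line merging; exactly how they are merged depends on the negate and match parameters described below.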

multiline.max_lines

The maximum number of lines that can be combined into one event. If the multiline message contains more than max_lines, any additional lines are discarded. The default is 500.

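Devil's whisper: the maximum number of lines that can be merged into one event; lines beyond the limit are discarded. The default is 500, which keeps a single event from swallowing too many log lines.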

multiline.timeout

After the specified timeout, Filebeat sends the multiline event even if no new pattern is found to start a new event. The default is 5s.

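Devil's whisper: the timeout for a multi-line event. Once it expires, the event currently being assembled is closed and sent, and a new multi-line event begins. The default is 5 seconds.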

multiline.negate

Defines whether the pattern is negated. The default is false.

Devil's whisper: see the analysis together with match below.

multiline.match

Specifies how Filebeat combines matching lines into an event. The settings are after or before. The behavior of these settings depends on what you specify for negate:

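Devil's whisper: the first time I read the multiline settings above, I was thoroughly confused.

Here is my own summary of multiline.match (in the same order as the table in the official documentation):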

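Case 1:

negate: false

match: after

Lines that match the pattern are merged into the preceding line that does not match.

line that does not match the pattern
line that matches the pattern
line that matches the pattern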

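Case 2:

negate: false

match: before

Lines that match the pattern are merged into the following line that does not match.

line that matches the pattern
line that matches the pattern
line that does not match the pattern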

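Case 3:

negate: true

match: after

Lines that do not match the pattern are merged into the preceding line that matches.

line that matches the pattern
line that does not match the pattern
line that does not match the pattern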

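Case 4:

negate: true

match: before

Lines that do not match the pattern are merged into the following line that matches.

line that does not match the pattern
line that does not match the pattern
line that matches the pattern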
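The four combinations are easier to compare when written as one small function. The sketch below is a simplified model of the documented behaviour, not Filebeat's implementation: group_multiline is a hypothetical helper, it ignores max_lines and timeout, and it uses Python's re module rather than Filebeat's Go regular expressions. The idea is that with match: after the current line decides whether it joins the open event, while with match: before the previous line decides.

```python
import re


def group_multiline(lines, pattern, negate=False, match="after"):
    """Group lines into events according to the negate/match combination."""
    regex = re.compile(pattern)

    def hit(line):
        # A line "hits" when it matches the pattern, or, with negate,
        # when it does not match.
        return (regex.search(line) is not None) != negate

    events, current, prev = [], [], None
    for line in lines:
        if not current:
            current = [line]
        else:
            # after:  the current line decides whether it joins the open event
            # before: the previous line decides whether the current line joins
            joins = hit(line) if match == "after" else hit(prev)
            if joins:
                current.append(line)
            else:
                events.append(current)
                current = [line]
        prev = line
    if current:
        events.append(current)
    return events


# The four diagrams above: "P..." lines match the pattern ^P, "x" lines do not.
cases = [
    (False, "after", ["x", "P1", "P2"]),
    (False, "before", ["P1", "P2", "x"]),
    (True, "after", ["P1", "x", "x"]),
    (True, "before", ["x", "x", "P1"]),
]
for negate, match, sample in cases:
    print(negate, match, group_multiline(sample, "^P", negate, match))
```

Each of the four sample inputs comes out as a single merged event, which is what the four diagrams describe.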

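Problem

When include_lines and multiline.pattern are used together, the result differs from what I expected. Three examples follow; pay attention to the difference between the expected result and the actual result in each.

The official documentation for include_lines states: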

If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by include_lines.

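My understanding:

When include_lines and multiline.pattern are used together, Filebeat first merges the lines that match multiline.pattern according to the rules, then filters with include_lines, which yields a single merged line.
Lines that match only the include_lines pattern are collected as individual lines.

The filebeat.yml configuration (nothing else is configured):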

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/i.log
  include_lines: ['error']
  multiline.pattern: "errorA"
  multiline.negate: true
  multiline.match: after
output.file:
  path: "/var/log"
  filename: o.log

Input sample 1

echo 'A
B
C
errorA TEST1
1
2
3
' >>/var/log/i.log

Expected result

errorA TEST1
1
2
3

Actual result

errorA TEST1
1
2
3

Raw event

{"@timestamp":"2019-06-04T10:21:37.332Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.4"},"beat":{"name":"localhost.localdomain","hostname":"localhost.localdomain","version":"6.1.4"},"source":"/var/log/i.log","offset":26,"message":"errorA TEST1\n1\n2\n3\n","prospector":{"type":"log"}}

Prediction

Correct

Input sample 2

echo 'A
B
C
error TEST2
1
2
3
' >>/var/log/i.log

Expected result

error TEST2

Actual result

A
B
C
error TEST2
1
2
3

Raw event

{"@timestamp":"2019-06-04T10:22:22.336Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.4"},"message":"A\nB\nC\nerror TEST2\n1\n2\n3\n","source":"/var/log/i.log","offset":51,"prospector":{"type":"log"},"beat":{"name":"localhost.localdomain","hostname":"localhost.localdomain","version":"6.1.4"}}

Prediction

Wrong

Input sample 3

echo 'error TEST3
error TEST3
error TEST3
error TEST3
' >>/var/log/i.log

Expected result

4 events

error TEST3

error TEST3

error TEST3

error TEST3

Actual result

1 event

error TEST3
error TEST3
error TEST3
error TEST3

Raw event

{"@timestamp":"2019-06-04T10:23:37.342Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.4"},"prospector":{"type":"log"},"beat":{"name":"localhost.localdomain","hostname":"localhost.localdomain","version":"6.1.4"},"source":"/var/log/i.log","offset":112,"message":"error TEST3\nerror TEST3\nerror TEST3\nerror TEST3\nerror TEST3\n"}

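Prediction

Wrong

Conclusion

When include_lines and multiline.pattern are used together, Filebeat first merges the lines that match multiline.pattern according to the rules, then filters with include_lines, which yields a single merged line.
Lines that match only include_lines are also merged: they are combined with the lines before and after them (seemingly regardless of the negate and match rules) into a single line.
If include_lines is not set, then apart from the lines that multiline.pattern merges first, all other lines are also merged into one line (equivalent to setting include_lines: ['.*']).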
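One reading that is consistent with all three results: with negate: true and match: after, every line that does not match multiline.pattern is appended to the event that is currently open, and include_lines then accepts or rejects each merged event as a whole. The Python sketch below models that reading. It is a simplified assumption, not Filebeat's code: run_pipeline is a hypothetical helper, each sample is treated as a separate batch (as if the 5s timeout had flushed the previous event), and the trailing empty line produced by echo is left out.

```python
import re


def run_pipeline(lines, multiline_pattern, include_lines):
    """Model negate: true + match: after, then filter the merged events."""
    start = re.compile(multiline_pattern)
    keep = [re.compile(p) for p in include_lines]

    events, current = [], []
    for line in lines:
        # Only a line that matches the pattern starts a new event;
        # every other line joins the event that is currently open.
        if current and start.search(line):
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))

    # include_lines sees each merged event as a whole, not its single lines.
    return [e for e in events if any(p.search(e) for p in keep)]


sample1 = ["A", "B", "C", "errorA TEST1", "1", "2", "3"]
sample2 = ["A", "B", "C", "error TEST2", "1", "2", "3"]
sample3 = ["error TEST3"] * 4

for sample in (sample1, sample2, sample3):
    print(run_pipeline(sample, "errorA", ["error"]))
```

Under this model, sample 1 splits into "A, B, C" (dropped, because it contains no "error") and "errorA TEST1, 1, 2, 3" (kept). In samples 2 and 3 no line matches errorA, so each sample becomes one event that contains "error" and is kept in full, which agrees with the actual results above.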

Devil's whisper: this collects a lot of unnecessary extra content. Is there any way around it? Split it into two configurations?