Why does PCRE regex matching in C++ only capture 19 groups?

2022-08-20 16:50:44 +08:00
 zhengken

Problem

My regex pattern is (a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)

The string I want to match is abcdefghijklmnopqrstuvwxyz

The program outputs:

i_0:0 i_1:26 i_2:0 i_3:1 i_4:1 i_5:2 i_6:2 i_7:3 i_8:3 i_9:4 i_10:4 i_11:5 i_12:5 i_13:6 i_14:6 i_15:7 i_16:7 i_17:8 i_18:8 i_19:9 i_20:9 i_21:10 i_22:10 i_23:11 i_24:11 i_25:12 i_26:12 i_27:13 i_28:13 i_29:14 i_30:14 i_31:15 i_32:15 i_33:16 i_34:16 i_35:17 i_36:17 i_37:18 i_38:18 i_39:19 i_40:0 i_41:0 i_42:0 i_43:0 i_44:0 i_45:0 i_46:0 i_47:0 i_48:0 i_49:0 i_50:0 i_51:0 i_52:0 i_53:0 i_54:0 i_55:0 i_56:0 i_57:0 i_58:0 i_59:0

Question: why are only 19 groups captured?

Relevant code

#include <pcre.h>
#include <iostream>
#include <string>

// Compiled pattern and its study data, shared by CompileRexStr() and main().
pcre* _rex;
pcre_extra* _rexEx;

// Compiles the pattern as UTF-8 and JIT-studies it (error checking omitted).
void CompileRexStr(const std::string& rex) {
    const char* errorinfo;
    int errpos = 0;
    _rex = NULL;
    _rexEx = NULL;

    _rex = pcre_compile(rex.c_str(), PCRE_UTF8, &errorinfo, &errpos, NULL);
    _rexEx = pcre_study(_rex, PCRE_STUDY_JIT_COMPILE, &errorinfo);
}

int main(){
    // 26 capture groups, one per letter
    std::string rex = "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)";
    CompileRexStr(rex);

    std::string str = "abcdefghijklmnopqrstuvwxyz";
    int result[60] = {0};   // ovector: receives (start, end) offset pairs
    int cur = 0;            // offset in str at which matching starts
    int pos = pcre_exec(_rex, _rexEx, str.c_str(), str.length(), cur, 0, result, 60);

    // dump every slot of the ovector
    for(int i=0;i < 60; i++) {
        std::cout << "i_" << i << ":" << result[i] << " ";
    }

    return 0;
}
zhengken
2022-08-20 19:05:01 +08:00
https://stackoverflow.com/questions/73425423/why-pcre-regex-only-capture-19-groups/73425562#73425562

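Someone on Stack Overflow answered me. I had never noticed this: the length of result (the ovector passed to pcre_exec) must be (number of groups + 1) * 3. For 26 groups that is (26 + 1) * 3 = 81 ints; the 60 I passed only has room for 60 / 3 = 20 offset pairs, i.e. the whole match plus 19 groups.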

> The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The number passed in ovecsize should always be a multiple of three. If it is not, it is rounded down.

https://www.v2ex.com/t/874212