New thread: asking the pandas experts about groupby, cumsum, and rolling

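Original thread: https://www.v2ex.com/t/760567 The problem: within each value of column A, D should be 0 wherever C is False. Wherever C is True, D should be the sum of column B from the first False row after the previous True row up to and including the current row.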

I changed the data a little so that it is closer to my real data:

import numpy as np
import pandas as pd

df = pd.DataFrame([['S1', 10, False], ['S1', 10, True],
    ['S2', 20, False], ['S2', 10, False], ['S2', 10, True],
    ['S3', 200, False], ['S3', 100, False], ['S3', 100, True]],
    columns=list('ABC'))
print(df)
    A    B      C
0  S1   10  False
1  S1   10   True
2  S2   20  False
3  S2   10  False
4  S2   10   True
5  S3  200  False
6  S3  100  False
7  S3  100   True

Slicing with a for loop and processing each slice gives the result I want:

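codes = df.A.unique()
dfs = []
for code in codes:
    # Rows of one A value, renumbered from 0 (the old index is kept as a column).
    subdf = df[df.A == code].reset_index()
    # Positions of the True rows; the leading -1 makes the first block start at row 0.
    slices = subdf[subdf.C].index
    slices = slices.insert(0, -1)
    for i in range(len(slices) - 1):
        # One block: the rows after the previous True, up to and including this True.
        tempdf = subdf.loc[slices[i]+1: slices[i+1]].copy()
        # The block's B total goes on its True row; False rows get 0.
        tempdf['D'] = np.where(tempdf.C, tempdf.groupby('A').B.sum(), 0)
        dfs.append(tempdf)
df_with_d = pd.concat(dfs).reset_index()
print(df_with_d[list('ABCD')])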
    A    B      C    D
0  S1   10  False    0
1  S1   10   True   20
2  S2   20  False    0
3  S2   10  False    0
4  S2   10   True   40
5  S3  200  False    0
6  S3  100  False    0
7  S3  100   True  400

This feels inefficient. Is there a more efficient way?

Using @necomancer's method from the original thread:

df['D'] = np.where(df.C, df.groupby(df.C.eq(False).cumsum()).B.cumsum(), 0)
print(df)
    A    B      C    D
0  S1   10  False    0
1  S1   10   True   20
2  S2   20  False    0
3  S2   10  False    0
4  S2   10   True   20
5  S3  200  False    0
6  S3  100  False    0
7  S3  100   True  200

D in row 4 is wrong: it should be 40 (20+10+10). D in row 7 should be 400.
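
A possible reason, offered as a sketch rather than a definitive fix: the key `df.C.eq(False).cumsum()` increases on every False row, so two consecutive False rows land in different groups and only the last one is summed with the True row. Counting the True rows before each row instead keeps a whole run of False rows together with the True row that ends it. The names `old_key`, `block`, and `running` below are made up for illustration, and grouping by A as well is an assumption based on the "within each value of column A" requirement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['S1', 10, False], ['S1', 10, True],
    ['S2', 20, False], ['S2', 10, False], ['S2', 10, True],
    ['S3', 200, False], ['S3', 100, False], ['S3', 100, True]],
    columns=list('ABC'))

# Key used above: it increments on every False row, so consecutive
# False rows are split into separate groups.
old_key = df.C.eq(False).cumsum()

# Assumed alternative: the number of True rows strictly before each row,
# so a new block starts on the row right after a True.
block = df.C.shift(fill_value=False).cumsum()

# Grouping by A as well keeps a block from running across two A values.
running = df.groupby(['A', block]).B.cumsum()

# Keep the running total only on the True rows.
df['D'] = np.where(df.C, running, 0)
print(df)
```

On the sample data this should put 20, 40, and 400 on the three True rows, matching the for-loop version. It also keeps any trailing False rows of a group (with D = 0), which the for-loop version drops.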

Using @cassidyhere's method:

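from pandas.api.indexers import BaseIndexer

class CustomIndexer(BaseIndexer):
    # `step` is expected by pandas 1.5 and newer; drop it on older versions.
    def get_window_bounds(self, num_values, min_periods, center, closed, step=None):
        start = np.empty(num_values, dtype=np.int64)
        end = np.empty(num_values, dtype=np.int64)
        for i in range(num_values):
            end[i] = i + 1
            j = i
            # Step back for as long as the row at j is True.
            while j > 0 and self.use_expanding[j]:
                j -= 1
            # Set for every row, including rows where the loop never runs.
            start[i] = j
        return start, end

window_size = df.C.groupby((df.C != df.C.shift(1)).cumsum()).agg('sum').max()  # longest run of consecutive True values
indexer = CustomIndexer(window_size=window_size, use_expanding=df.C)
df['D'] = np.where(df.C, df.B.rolling(indexer, min_periods=2).sum().fillna(0), 0)
print(df)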
print(df)
    A    B      C      D
0  S1   10  False    0.0
1  S1   10   True   20.0
2  S2   20  False    0.0
3  S2   10  False    0.0
4  S2   10   True   20.0
5  S3  200  False    0.0
6  S3  100  False    0.0
7  S3  100   True  200.0

This has the same problem.
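
On this data the indexer's while loop only steps back over True rows, so each window holds the True row plus the single row before it, which would explain the 20 and 200. One way to get a window that reaches back to the start of the block, without a custom indexer, is to subtract running totals. This is only a sketch under two assumptions: rows of the same A value are contiguous, and a block starts either on the first row of an A value or on the row after a True. The names `csum`, `is_start`, and `base` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['S1', 10, False], ['S1', 10, True],
    ['S2', 20, False], ['S2', 10, False], ['S2', 10, True],
    ['S3', 200, False], ['S3', 100, False], ['S3', 100, True]],
    columns=list('ABC'))

# Running total of B over the whole frame.
csum = df.B.cumsum()

# A block starts on the first row of each A value or on the row after a True.
is_start = df.A.ne(df.A.shift()) | df.C.shift(fill_value=False)

# Running total just before each block start, carried forward over the block.
base = (csum - df.B).where(is_start).ffill()

# Window sum = total up to this row minus the total before the block began.
df['D'] = np.where(df.C, csum - base, 0).astype('int64')
print(df)
```

This avoids the Python-level loop inside `get_window_bounds`, so it should scale better on long frames, and the explicit cast keeps D as integers instead of the floats that `rolling(...).sum()` returns.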