A question for the V2EX folks: I need to process some large XML files incrementally. I took the recipe from the Python Cookbook and adapted it to my scenario, but it doesn't seem to be working as intended — memory usage climbs quickly, reaching several GB after only a hundred thousand lines or so. I don't understand why, and I'd appreciate any pointers. The main logic is below.
macOS Big Sur
python 3.8.12
from xml.etree.ElementTree import iterparse
def parse_and_remove(filename, path):
    path_parts = path.split('/')
    doc = iterparse(filename, ('start', 'end'))
    # Skip the root element
    next(doc)
    tag_stack = []    # tag names from just below the root down to the current element
    elem_stack = []   # the corresponding Element objects
    for event, elem in doc:
        if event == 'start':
            tag_stack.append(elem.tag)
            elem_stack.append(elem)
        elif event == 'end':
            if tag_stack == path_parts:
                # The element at the requested path is complete: hand it to
                # the caller, then detach it from its parent so the finished
                # subtree can be garbage-collected.
                yield elem
                elem_stack[-2].remove(elem)
            try:
                tag_stack.pop()
                elem_stack.pop()
            except IndexError:
                pass
data = parse_and_remove('my.xml', 'path')
client, table = getMongo()
for pothole in data:
    resDict = {
        # extract the fields I need from the element
    }
    table.insert(resDict)
client.close()
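
For reference, the usual reason this recipe still balloons is that only elements matching `path_parts` are ever detached; their ancestors and any non-matching siblings stay attached to the root for the whole run. Below is a minimal stdlib-only sketch of the commonly recommended alternative, which clears the root after each processed record. It assumes the records of interest are direct children of the root; `iter_records` and `'record'` are placeholder names, not taken from the code above.

from xml.etree.ElementTree import iterparse

def iter_records(filename, tag):
    """Yield each complete <tag> element, then discard everything parsed
    so far so the in-memory tree never holds more than one record."""
    context = iterparse(filename, events=('start', 'end'))
    _, root = next(context)          # the first event is the root's 'start'
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear()             # drop all children accumulated under the root

The MongoDB loop can stay the same; only the generator it iterates over changes.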
#1 · 2i2Re2PLMaDnghL · 2021-11-10 09:46:26 +08:00
1. Try switching to lxml.
2. Try XPath instead of comparing the path by hand while iterating.
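
Following up on that reply, here is a minimal sketch of the lxml route: `lxml.etree.iterparse` with a tag filter does the path matching for you, and the standard cleanup (clear the element, then delete already-parsed preceding siblings) keeps memory roughly flat. `'pothole'` is a placeholder tag and `iter_records` a placeholder name, not from the original post.

from lxml import etree

def iter_records(filename, tag):
    """Stream every <tag> element with lxml, freeing each one after use."""
    for _, elem in etree.iterparse(filename, events=('end',), tag=tag):
        yield elem
        elem.clear()                 # free the element's own contents
        # Delete earlier siblings the parser has already finished with,
        # otherwise they stay attached to the parent and accumulate.
        while elem.getprevious() is not None:
            del elem.getparent()[0]

for rec in iter_records('my.xml', 'pothole'):
    ...  # build resDict from rec and insert it into MongoDB as before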