A question about debugging the memory usage of a Python program with heapy

The current state of the program is as follows:

(Pdb) h0 = hp.heap()
(Pdb) h0
Partition of a set of 4254865 objects. Total size = 700955416 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 646764  15 427504800  61 427504800  61 dict (no owner)
     1 3073581  72 210210328  30 637715128  91 unicode
     2 120775   3 17186360   2 654901488  93 list
     3  71772   2 10795432   2 665696920  95 str
     4  37269   1 10594328   2 676291248  96 _sre.SRE_Pattern
     5 124398   3 10406784   1 686698032  98 tuple
     6    721   0  2114200   0 688812232  98 dict of module
     7  83485   2  2003640   0 690815872  99 int
     8  11706   0  1498368   0 692314240  99 types.CodeType
     9  11157   0  1338840   0 693653080  99 function
<653 more rows. Type e.g. '_.more' to view.>

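As you can see, dicts take up most of the memory. Knowing that is of course not enough. I also need to know which dicts are large: for example, which module created them, how they are referenced, and so on.

The difficulty is that in this scenario there are a great many nested dicts. Here is some code that simulates the situation: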

import random


def rand():
    return random.random()


def rand_int(i):
    return random.randint(0, i)


def rdt(max_depth, max_width):
    r = {}

    if max_depth <= 1:
        for i in range(rand_int(max_width)):
            r[rand()] = rand()
    else:
        for i in range(rand_int(max_width)):
            r[rand()] = rdt(rand_int(max_depth) - 1, max_width)

    return r


t0 = rdt(9, 9)
t1 = rdt(7, 14)
t2 = rdt(5, 19)
t3 = rdt(3, 24)

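Now, of the four t objects I created directly, which one is the biggest?

"Big" of course does not mean the space taken by that single PyObject alone. It only becomes meaningful once all the dicts it refers to, directly and indirectly, are added together.

So an answer like this one won't do: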
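To make the distinction concrete, here is a minimal sketch of a "recursive size" using only the standard library. The helper name `deep_size` is hypothetical (it is not part of heapy or pympler), it only follows dicts and the common built-in containers, and it counts each object once even if it is reachable by several paths.

```python
import sys


def deep_size(root):
    """Sum sys.getsizeof over root and everything reachable through
    dicts, lists, tuples, sets and frozensets (other types are leaves)."""
    seen = set()      # ids of objects already counted
    stack = [root]    # explicit stack, so deep nesting cannot hit the recursion limit
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set, frozenset)):
            stack.extend(obj)
    return total


# The shallow size covers only the outer dict; the deep size adds its contents.
nested = {"a": {"b": {"c": 1.5}}, "d": [1.0, 2.0]}
print(sys.getsizeof(nested), deep_size(nested))
```

Comparing `deep_size(t0)` through `deep_size(t3)` would give the kind of answer being asked for, although a pure-Python walk like this is likely to be slow on a heap of several million objects.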

>>> (h0[0] - h0[0].referents).byvia
Partition of a set of 5 objects. Total size = 2168 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0      1  20     1048  48      1048  48 "['t4']"
     1      1  20      280  13      1328  61 "['t0']"
     2      1  20      280  13      1608  74 "['t1']"
     3      1  20      280  13      1888  87 "['t2']"
     4      1  20      280  13      2168 100 "['t3']"

Any advice from the experts here would be much appreciated.

nthhdy

2017-08-03 15:46:26 +08:00

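Thanks for recommending pympler.
When I searched for "python memory profile and debug", why didn't this tool come up? All I found was line profiler, heapy, and the like.

Doing it this way is one approach. It is a pretty clumsy one, haha, and very slow: after more than ten minutes it had only gotten through a tenth of the work: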

```python
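from guppy import hpy
from pympler.asizeof import asizeof

hp = hpy()  # assumed setup; the original session already had `hp` defined
h0 = hp.heap()

ds = h0[0]

# Get the set of "root" dicts: dicts that no other dict refers to
# (same expression as in the question, i.e. minus referents, not referrers)
root_ds = (ds - ds.referents).byid

# Look at the recursive size of each "root dict"
sizes = []
for i in range(len(root_ds)):
    # root_ds[i] is a one-element heapy set; .theone is the dict itself
    info = i, asizeof(root_ds[i].theone)
    sizes.append(info)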

# check size
```
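For the final "check size" step, one possible sketch is to pick out the largest entries from the `(index, size)` pairs rather than reading the whole list. The helper name `largest_roots` is hypothetical, and the sample numbers are placeholders for illustration, not measurements from the program above.

```python
import heapq


def largest_roots(sizes, n=3):
    """Return the n (index, size) pairs with the largest size, biggest first."""
    return heapq.nlargest(n, sizes, key=lambda pair: pair[1])


# Placeholder values for illustration only, not real measurements.
sample_sizes = [(0, 1200), (1, 560), (2, 98000), (3, 4300)]
top_two = largest_roots(sample_sizes, n=2)
print(top_two)
```

`heapq.nlargest` avoids sorting the full list, which may matter a little when there are hundreds of thousands of root dicts, though the `asizeof` calls will still dominate the running time.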