Sincere request for help: how do I fix an error like “IndexError: index 799 is out of bounds for axis 0 with size 799”?

This topic was created 1469 days ago; the information in it may have since evolved or changed.

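❄ Background:
While labeling data with a semi-supervised method, I ran into an error of the form “IndexError: index 799 is out of bounds for axis 0 with size 799”. I have not been able to work out the cause, so I am asking here for help.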

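[Note]: this is the project described in the paper "Billion-scale semi-supervised learning for image classification". Its workflow is roughly as follows:
Step 1: train an initial teacher model A on the labeled data;
Step 2: run teacher model A on the unlabeled data, rank the images within each class label, and pick the best K per class to build a new training set, the pseudo-labeled dataset;
Step 3: pre-train a student model B on that pseudo-labeled dataset;
Finally, fine-tune student model B on the original labeled data to reduce the effect of any wrong pseudo-labels.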
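Step 2 above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: the function name `select_top_k_per_class`, the `probs` array, and the rule of assigning each image to the teacher's highest-scoring class are all assumptions made for the example.

```python
import numpy as np

def select_top_k_per_class(probs, k):
    """Return {class_id: indices of at most k most confident images}.

    probs is a hypothetical (n_images, n_classes) array of teacher softmax scores.
    """
    predicted = probs.argmax(axis=1)   # teacher's label for each image
    confidence = probs.max(axis=1)     # score of that label
    selected = {}
    for class_id in np.unique(predicted):
        members = np.flatnonzero(predicted == class_id)
        # Rank this class's images by confidence, highest first.
        ranked = members[np.argsort(-confidence[members])]
        # Slicing tolerates classes that have fewer than k images.
        selected[int(class_id)] = ranked[:k].tolist()
    return selected

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=200)  # fake scores: 200 images, 5 classes
pseudo = select_top_k_per_class(probs, k=10)
```

Each entry of `pseudo` holds image indices for one class, so the pseudo-labeled dataset is simply those images paired with their class id.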

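The error appears during Step 2. I followed the project exactly throughout, and the model from Step 1 has finished training. I traced the error to the select_top_k() function in the Step 2 script (the function is listed below, with a line number on every line to make it easier to discuss). For each key (label) loaded from the JSON file, the function sorts that key's items in descending order by their second element and keeps the top k; this is exactly where the program fails. I tried modifying line 14 of the function, but it made no difference, so I have probably not found the real cause. I do not know how to debug this error and would be grateful for any pointers. Thanks in advance!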

❄ Full error output:
Load Model Accuracy: 75.64 Load Model end epoch: 100
class name: apple
image data count: 210
class name: aquarium_fish
image data count: 203
class name: baby
image data count: 151
class name: bear
image data count: 189
...
class name: wolf
image data count: 212
class name: woman
image data count: 199
class name: worm
image data count: 173
Saving.. sampling_dict
label: 39
each label item count: 18779
label: 82
each label item count: 18779
label: 20
each label item count: 18626
label: 0
each label item count: 18340
label: 9
each label item count: 13232
label: 6
each label item count: 5547
label: 3
each label item count: 17344
label: 16
each label item count: 4228
label: 2
each label item count: 17960
label: 24
each label item count: 1880
label: 10
each label item count: 1581
label: 5
each label item count: 14524
label: 4
each label item count: 16767
label: 61
each label item count: 799
Traceback (most recent call last):
  File "make_sample_data_1.py", line 155, in <module>
    main(args)
  File "make_sample_data_1.py", line 148, in main
    select_top_k(args.k)
  File "make_sample_data_1.py", line 128, in select_top_k
    sampled_image_dict["all"].append([all_items[index][0], int(key)])
IndexError: index 799 is out of bounds for axis 0 with size 799
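The log shows that label 61 has only 799 items, while the function's default is k=1000, so a loop over range(0, k) would reach index 799 on an array whose valid indices stop at 798. The snippet below is a minimal reproduction under that assumption; the `items` array and its file names are made-up stand-ins for one label's entries.

```python
import numpy as np

# Hypothetical stand-in for one label's entries: 799 [path, score] rows.
rows = [[f"img_{i}.png", 1.0 - i / 1000] for i in range(799)]
items = np.array(rows, dtype=object)  # shape (799, 2), valid indices 0..798

k = 1000  # default value of k in select_top_k
error_message = None
try:
    for index in range(0, k):
        _ = items[index][0]  # fails once index reaches 799
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

If this is indeed the cause, any label with fewer than k items will trigger the same error, whichever such label is processed first.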

❄ Relevant code:
1  def select_top_k(k=1000):
2      sampled_image_dict = {}
3      sampled_image_dict["all"] = []
4      with codecs.open("./sampling_dict.json", "r", encoding="utf-8", errors="ignore") as f:
5          load_data = json.load(f)
6
7      for key in load_data.keys():
8          print("label: ", key)
9          all_items = load_data[key]
10         all_items.sort(key=lambda x: x[1], reverse=True)
11         all_items = np.array(all_items)
12         print("each label item count: ", len(all_items))
13         for index in range(0, k):
14             sampled_image_dict["all"].append([all_items[index][0], int(key)])
15
16     print("Saving.. selected image json")
17     j = json.dumps(sampled_image_dict)
18     with open("selected_image.json", "w") as f:
19         f.write(j)
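One possible fix, sketched under the assumption that a label with fewer than k items should simply contribute everything it has: take a slice of the sorted list instead of indexing up to k. To keep the example self-contained, this version takes the loaded dictionary as a parameter and returns the list rather than reading and writing JSON files; that signature and the `demo` data are illustrative, not the project's own.

```python
def select_top_k(load_data, k=1000):
    """Return [[path, label], ...] with at most k entries per label."""
    sampled = []
    for key, items in load_data.items():
        # Highest score first; items are [path, score] pairs.
        ranked = sorted(items, key=lambda x: x[1], reverse=True)
        # A slice never raises IndexError, even when len(ranked) < k.
        for path, _score in ranked[:k]:
            sampled.append([path, int(key)])
    return sampled

# Made-up data: label "61" has fewer items than k, label "39" has more.
demo = {
    "61": [["a.png", 0.2], ["b.png", 0.9]],
    "39": [["c.png", 0.5], ["d.png", 0.7], ["e.png", 0.6], ["f.png", 0.1]],
}
result = select_top_k(demo, k=3)
```

An equivalent change inside the original loop would be to iterate over range(min(k, len(all_items))). Whether under-filled classes are acceptable for the later training steps is a separate question that depends on the project.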

No replies yet
