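❄ Problem description:
While labeling data for semi-supervised learning, I ran into an error of the form "IndexError: index 799 is out of bounds for axis 0 with size 799". I have not been able to figure it out and would be grateful for any help!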
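[Note]: This is the project described in the paper "Billion-scale semi-supervised learning for image classification". Its rough workflow is as follows:
Step 1: train an initial teacher model A on the labeled data;
Step 2: run teacher model A on the unlabeled data, rank the images within each class label, and pick the best K per class to build a new training set, the pseudo-labeled dataset;
Step 3: pre-train a student model B on that pseudo-labeled dataset;
Finally, fine-tune student model B on the original labeled data to reduce the effect of possibly wrong pseudo-labels.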
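To make step 2 concrete, here is a minimal sketch of the per-class top-K selection. It is not the project's actual code: the function name `build_pseudo_labeled_set` and the `(image_id, predicted_label, confidence)` tuple layout are assumptions made for illustration, and it uses only the teacher's single most confident label per image, which is a simplification.

```python
from collections import defaultdict


def build_pseudo_labeled_set(predictions, k):
    """Pick the k most confident images per predicted label.

    predictions: iterable of (image_id, predicted_label, confidence) tuples
    (hypothetical layout, not taken from the project).
    """
    by_label = defaultdict(list)
    for image_id, label, confidence in predictions:
        by_label[label].append((image_id, confidence))

    pseudo_labeled = []
    for label, items in by_label.items():
        # Rank this class's images by teacher confidence, highest first.
        items.sort(key=lambda pair: pair[1], reverse=True)
        # Keep at most k images; a class with fewer than k keeps all of them.
        pseudo_labeled.extend((image_id, label) for image_id, _ in items[:k])
    return pseudo_labeled
```

Note that the slice `items[:k]` handles classes that have fewer than K candidates without any extra bounds check.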
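The error above appears during step 2. I followed the project exactly throughout, and the model from step 1 has finished training. After some analysis, I traced the error to the select_top_k() function in the step 2 script (the function is listed below, with every line numbered for easy reference). For each label loaded from the JSON file, the function sorts that label's items in descending order by their second element and selects the first k, and this is exactly where the program fails. I tried modifying line 14 of the function, but it did not help, so I have probably not found the real cause. I do not know how to debug this error, so I am asking here for help. Thanks in advance!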
❄ Full error output:
Load Model Accuracy: 75.64 Load Model end epoch: 100
class name: apple
image data count: 210
class name: aquarium_fish
image data count: 203
class name: baby
image data count: 151
class name: bear
image data count: 189
...
class name: wolf
image data count: 212
class name: woman
image data count: 199
class name: worm
image data count: 173
Saving.. sampling_dict
label: 39
each label item count: 18779
label: 82
each label item count: 18779
label: 20
each label item count: 18626
label: 0
each label item count: 18340
label: 9
each label item count: 13232
label: 6
each label item count: 5547
label: 3
each label item count: 17344
label: 16
each label item count: 4228
label: 2
each label item count: 17960
label: 24
each label item count: 1880
label: 10
each label item count: 1581
label: 5
each label item count: 14524
label: 4
each label item count: 16767
label: 61
each label item count: 799
Traceback (most recent call last):
  File "make_sample_data_1.py", line 155, in <module>
    main(args)
  File "make_sample_data_1.py", line 148, in main
    select_top_k(args.k)
  File "make_sample_data_1.py", line 128, in select_top_k
    sampled_image_dict["all"].append([all_items[index][0], int(key)])
IndexError: index 799 is out of bounds for axis 0 with size 799
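The log stops right after label 61 reports 799 items, and the failing index is also 799. That suggests k is larger than the number of items for that label (the function's default is 1000; the value actually passed through args.k is not shown here, so this is an assumption). A small diagnostic sketch to check which labels are affected; the helper name `labels_with_too_few_items` is hypothetical:

```python
def labels_with_too_few_items(load_data, k):
    """Return {label: item_count} for every label holding fewer than k items."""
    return {
        label: len(items)
        for label, items in load_data.items()
        if len(items) < k
    }


# Hypothetical usage against the file named in the post:
# import json
# with open("./sampling_dict.json", encoding="utf-8") as f:
#     print(labels_with_too_few_items(json.load(f), k=1000))
```

If this returns a non-empty dict, every label in it would make an index loop up to k run past the end of the data.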
❄ Relevant code snippet:
 1 def select_top_k(k=1000):
 2     sampled_image_dict = {}
 3     sampled_image_dict["all"] = []
 4     with codecs.open("./sampling_dict.json", "r", encoding="utf-8", errors="ignore") as f:
 5         load_data = json.load(f)
 6
 7     for key in load_data.keys():
 8         print("label: ", key)
 9         all_items = load_data[key]
10         all_items.sort(key=lambda x: x[1], reverse=True)
11         all_items = np.array(all_items)
12         print("each label item count: ", len(all_items))
13         for index in range(0, k):
14             sampled_image_dict["all"].append([all_items[index][0], int(key)])
15
16     print("Saving.. selected image json")
17     j = json.dumps(sampled_image_dict)
18     with open("selected_image.json", "w") as f:
19         f.write(j)
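The loop on line 13 always runs index from 0 to k - 1, so it assumes every label has at least k items. One possible fix, assuming the intent is to keep everything when a label has fewer than k items, is to slice instead of indexing. The sketch below is a variant, not the project's code: it takes the loaded dict as a parameter and returns the result instead of reading and writing files (an assumption made to keep it self-contained), and it skips the np.array conversion because plain lists are enough here.

```python
def select_top_k(load_data, k=1000):
    """Return {"all": [[image, label], ...]} with at most k items per label.

    load_data: dict mapping label strings to lists of [image, score] pairs,
    i.e. the structure json.load() would give for sampling_dict.json.
    """
    sampled_image_dict = {"all": []}
    for key, items in load_data.items():
        # Sort by score (second element), highest first, without mutating the input.
        ranked = sorted(items, key=lambda x: x[1], reverse=True)
        # Slicing never raises: a label with fewer than k items yields all of them.
        for item in ranked[:k]:
            sampled_image_dict["all"].append([item[0], int(key)])
    return sampled_image_dict
```

An equivalent minimal change to the original would be to replace `range(0, k)` on line 13 with `range(0, min(k, len(all_items)))`. Either way, labels with few items contribute fewer samples, so the resulting dataset will be unbalanced across classes; whether that is acceptable depends on the training setup.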