LLM quantization question: why not quantize lm_head?

daweii · 42 days ago

I've been taking a course on model quantization recently.

When quantizing the model below, the instructor recommends not quantizing the final lm_head:

CodeGenForCausalLM(
  (transformer): CodeGenModel(
    (wte): Embedding(51200, 1024)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-19): 20 x CodeGenBlock(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): CodeGenAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (qkv_proj): W8A16LinearLayer()
          (out_proj): W8A16LinearLayer()
        )
        (mlp): CodeGenMLP(
          (fc_in): W8A16LinearLayer()
          (fc_out): W8A16LinearLayer()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1024, out_features=51200, bias=True)
)
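
For context, here is my rough sketch of the selective replacement step, i.e. swapping every nn.Linear except lm_head. The helper name and the assumption that the replacement class takes the same constructor arguments as nn.Linear are mine, not the course's actual code:

import torch.nn as nn

# Hypothetical helper: recursively swap nn.Linear children for a quantized
# linear class, skipping any module whose local name is in `exclude`
# (e.g. "lm_head"). The course's W8A16LinearLayer would be passed as
# target_cls; its real constructor signature may differ from the
# nn.Linear-like one assumed here.
def replace_linear_except(module, target_cls, exclude=("lm_head",)):
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name not in exclude:
            setattr(module, name, target_cls(child.in_features,
                                             child.out_features,
                                             child.bias is not None))
        else:
            replace_linear_except(child, target_cls, exclude)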

His exact words from the transcript:


2:14 And as I said, we're not going to quantize the language model head,
2:18 because since the model is an autoregressive model, it uses
2:22 the output from the previous iteration to get the output of the next iteration.
2:27 If you quantize the language model head, a lot of errors might
2:31 be accumulating over the generation steps.
2:34 And you will most likely end up having some gibberish after some tokens.
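
As I understand it, the loop he's referring to is just greedy decoding, where each step's lm_head logits pick the token that gets fed back in. A minimal sketch I wrote (Hugging Face's generate() does the equivalent when sampling is disabled):

import torch

# Sketch of the autoregressive feedback loop: the lm_head logits at step t
# choose the token that becomes part of the input at step t+1, so every
# later step sees the consequences of earlier predictions.
@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=20):
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits              # (batch, seq_len, vocab), from lm_head
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids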

I don't follow his reasoning. Why would quantizing lm_head cause errors to accumulate? Could someone explain it in a simple, intuitive way?

Course page: https://learn.deeplearning.ai/courses/quantization-in-depth/lesson/12/quantize-any-open-source-pytorch-model
