Does GPT use the data users give it for model training?

jasonlz

2024-03-28 11:55:51 +08:00

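Most people really know nothing about LLM training. In a conversation with GPT, your input is prompt data. GPT does not train on the data it generates itself, and prompt data is even less likely to be used for LLM training. At most it would be used for alignment work, but cleaning user data is far harder than cleaning corpora collected through other channels. Personally, I don't think GPT trains on conversation data: it is low in value and hard to use.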

Persimmon08

2024-03-28 15:15:42 +08:00

@jasonlz

OpenAI's [Data Controls FAQ](https://help.openai.com/en/articles/7730893-data-controls-faq) talks about using user data to improve and train its models. Searching that page for the keyword "train" turns up the following, among other passages:

1. Data controls offer you the ability to turn off chat history and easily choose whether your conversations will be used to train our models.

2. While history is disabled, new conversations won’t be used to train and improve our models

3. ChatGPT, for instance, improves by further training on the conversations people have with it, unless you choose to disable training.

4. Once you opt out, new conversations will not be used to train our models.

jasonlz

2024-03-29 15:31:15 +08:00

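@Persimmon08 I was only arguing from theory that training an LLM on conversation data is unlikely. As for what OpenAI actually does with user data, maybe they have other uses for it, such as model feedback, model testing, or model alignment. But judging from my own experience, even that seems unlikely unless OpenAI has exceptionally strong data-cleaning capabilities, and it is not even certain whether this data would make the model better or worse.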

https://www.v2ex.com/t/1027350
