How to get AI to reliably output a specified JSON structure

miracleyin

114 days ago

It depends on the model; models that have a JSON mode produce much more stable output.
Beyond that, use langfun.
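A minimal sketch of the JSON-mode route, assuming an OpenAI-compatible Chat Completions API, where `response_format: { type: "json_object" }` turns JSON mode on. The helper name, the model name and the prompt wording are illustrative assumptions, and the code only builds the request body without sending it. JSON mode guarantees syntactically valid JSON, not any particular set of keys, and OpenAI's version expects the word "JSON" to appear somewhere in the messages.

```typescript
// Simplified shape of an OpenAI-compatible chat message.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Request body with JSON mode switched on.
interface JsonModeRequest {
  model: string;
  messages: ChatMessage[];
  response_format: { type: "json_object" };
}

// Builds (but does not send) a Chat Completions body in JSON mode.
// JSON mode only guarantees valid JSON syntax, so the expected keys
// are still spelled out in the system prompt.
function buildJsonModeRequest(model: string, text: string, languages: string[]): JsonModeRequest {
  const keys = languages.map((l) => JSON.stringify(l)).join(", ");
  return {
    model,
    messages: [
      {
        role: "system",
        // The word "JSON" must appear in the messages when JSON mode is on.
        content: `Translate the user's text. Reply with a JSON object whose keys are ${keys} and whose values are the translations.`,
      },
      { role: "user", content: text },
    ],
    response_format: { type: "json_object" },
  };
}

// "gpt-4o-mini" is only an example model name.
const body = buildJsonModeRequest("gpt-4o-mini", "你好", ["en", "ja"]);
console.log(JSON.stringify(body, null, 2));
```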

Latin

114 days ago

https://github.com/567-labs/instructor
instructor is also good.

matrix1010

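114 days ago

Today's closed-source models, as well as Ollama/vLLM and the like, pretty much all support structured output, right? Just build a schema dynamically and pass it in. If the number of array items comes back inconsistent, convert the array to an object, {"1": "hello", "2": "what is it", ...}, and force the structured output to match that structure.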
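A sketch of the array-to-object trick: one required string property per input segment, keyed "1", "2", ..., so a reply with a missing item fails loudly instead of silently shifting the rest. The function names are hypothetical and the schema is a hand-written subset of JSON Schema; listing every key in `required` together with `additionalProperties: false` is also what OpenAI's strict structured-output mode expects.

```typescript
// Minimal JSON Schema shape covering only what this sketch emits.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: "string" }>;
  required: string[];
  additionalProperties: false;
}

// One required string property per segment, keyed "1".."count".
function buildIndexedSchema(count: number): ObjectSchema {
  const properties: Record<string, { type: "string" }> = {};
  const required: string[] = [];
  for (let i = 1; i <= count; i++) {
    properties[String(i)] = { type: "string" };
    required.push(String(i));
  }
  return { type: "object", properties, required, additionalProperties: false };
}

// Array of segments -> keyed object sent to the model.
function toIndexed(items: string[]): Record<string, string> {
  const obj: Record<string, string> = {};
  items.forEach((s, i) => {
    obj[String(i + 1)] = s;
  });
  return obj;
}

// Keyed reply -> array in the original order; throws on a missing key.
function fromIndexed(obj: Record<string, string>, count: number): string[] {
  const out: string[] = [];
  for (let i = 1; i <= count; i++) {
    const v = obj[String(i)];
    if (typeof v !== "string") throw new Error(`missing key "${i}"`);
    out.push(v);
  }
  return out;
}

const segments = ["你好", "这是什么"];
const schema = buildIndexedSchema(segments.length);
console.log(JSON.stringify(toIndexed(segments)));
const reply = { "1": "hello", "2": "what is it" }; // stand-in for the model's reply
const translated = fromIndexed(reply, segments.length);
console.log(translated);
```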

qieqie

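114 days ago

Inference servers such as vLLM and SGLang can now guide structured output directly with a CFG state machine, masking out any token that would violate the grammar.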
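A toy illustration of grammar-guided decoding, not how vLLM or SGLang actually implement it: at each step, tokens that cannot lead to a valid output are masked out, then the highest-scoring remaining token is taken. The vocabulary, the scoring function and the two-string "grammar" are all made up; a real engine compiles a JSON Schema or CFG into an automaton rather than listing valid strings.

```typescript
// Toy vocabulary; real tokenizers have tens of thousands of entries.
const VOCAB = ["{", "}", "\"ok\"", ":", "true", "false", "Sure! ", "```json", " "];

// The "grammar": a finite set of complete valid outputs (a stand-in for a CFG).
const VALID = ['{"ok":true}', '{"ok":false}'];

// A token is allowed if the output stays a prefix of some valid string.
function isAllowed(prefix: string, token: string): boolean {
  const next = prefix + token;
  return VALID.some((v) => v.startsWith(next));
}

// Greedy decoding with masking: disallowed tokens are skipped (as if their
// logits were -Infinity) and the best remaining token is appended.
function constrainedDecode(score: (prefix: string, token: string) => number, maxSteps = 20): string {
  let out = "";
  for (let step = 0; step < maxSteps && !VALID.includes(out); step++) {
    let best: string | null = null;
    let bestScore = -Infinity;
    for (const tok of VOCAB) {
      if (!isAllowed(out, tok)) continue; // masked out
      const s = score(out, tok);
      if (s > bestScore) {
        best = tok;
        bestScore = s;
      }
    }
    if (best === null) throw new Error(`no valid token after "${out}"`);
    out += best;
  }
  return out;
}

// Stand-in "model" that would rather chat and add a markdown fence.
const chattyModel = (_prefix: string, tok: string): number =>
  ({ "Sure! ": 5, "```json": 4, "false": 3 } as Record<string, number>)[tok] ?? 1;

const result = constrainedDecode(chattyModel);
console.log(result);
```

Even though the stand-in model prefers `Sure! ` and a fence, masking leaves it only grammar-conforming choices, and it ends up at the best-scoring valid output, `{"ok":false}`.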

mbeoliero123

114 days ago

Slight hijack: how do you deal with garbled output? My API seems to return garbled text fairly often.

liudewa

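114 days ago

At my company, to get stable JSON output we just use the prompt to have the model return a string that contains the JSON, then cut out the JSON we need based on keywords.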
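A minimal sketch of the "cut the JSON out by keyword" approach. The `<json>` and `</json>` markers are an assumption: the prompt would have to ask the model to wrap its answer in them, and the helper name is hypothetical.

```typescript
// Returns the parsed JSON found between two marker strings.
// The default markers are hypothetical; the prompt must request them.
function extractJsonBetween(raw: string, start = "<json>", end = "</json>"): unknown {
  const i = raw.indexOf(start);
  if (i === -1) throw new Error(`start marker ${start} not found`);
  const from = i + start.length;
  const j = raw.indexOf(end, from);
  if (j === -1) throw new Error(`end marker ${end} not found`);
  return JSON.parse(raw.slice(from, j));
}

// Chatty reply with the payload wrapped in the agreed markers.
const rawReply = 'Here is the result:\n<json>{"en": "hello"}</json>\nHope this helps!';
const parsed = extractJsonBetween(rawReply) as { en: string };
console.log(parsed.en);
```

Markers are sturdier than searching for the first `{` because explanatory text around the payload may itself contain braces.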

mercurylanded

114 days ago

Use function calling, with the JSON structure as the function's input.
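A sketch of the function-calling route, assuming the OpenAI-compatible `tools` format: the desired JSON structure becomes the function's `parameters` schema, and `tool_choice` forces a call to that function. The name `save_translations` and the translation schema are hypothetical; the mocked tool call stands in for `response.choices[0].message.tool_calls`.

```typescript
// JSON Schema of the function's arguments: the structure we want back.
const translationParameters = {
  type: "object",
  properties: {
    translations: {
      type: "array",
      items: {
        type: "object",
        properties: { lang: { type: "string" }, text: { type: "string" } },
        required: ["lang", "text"],
      },
    },
  },
  required: ["translations"],
};

// Request fragment: one tool, and tool_choice forces the model to call it.
const requestFragment = {
  tools: [
    {
      type: "function",
      function: {
        name: "save_translations", // hypothetical function name
        description: "Save the translations of the document text.",
        parameters: translationParameters,
      },
    },
  ],
  tool_choice: { type: "function", function: { name: "save_translations" } },
};

interface ToolCall { function: { name: string; arguments: string } }
interface Translation { lang: string; text: string }

// The arguments arrive as a JSON *string*, so they still need parsing.
function readTranslations(toolCalls: ToolCall[]): Translation[] {
  const call = toolCalls.find((c) => c.function.name === "save_translations");
  if (!call) throw new Error("model did not call save_translations");
  const args = JSON.parse(call.function.arguments) as { translations: Translation[] };
  return args.translations;
}

// Mocked tool_calls array as it might come back from the API.
const mockToolCalls: ToolCall[] = [
  { function: { name: "save_translations", arguments: '{"translations":[{"lang":"en","text":"hello"}]}' } },
];
const translations = readTranslations(mockToolCalls);
console.log(JSON.stringify(requestFragment.tool_choice), translations);
```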

ruoxie

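114 days ago

You are a service that translates user requests into JSON objects of type "PageConfig" according to the following TypeScript definitions, and handles each field according to its comment: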
```ts
export type PageConfig = {
  filters: {
    component: string;
    /**
     * @description Translate into English, camelCase
     * @type {string}
     */
    key: string;
    /**
     * @description Keep the original content; do not translate
     * @type {string}
     */
    label: string;
    /**
     * @description Keep the original content; do not translate
     * @type {string}
     */
    placeholder: string;
  }[];
  columns: {
    slot: boolean;
    /**
     * @description Keep the original content; do not translate
     * @type {string}
     */
    title: string;
    /**
     * @description Translate into English, camelCase
     * @type {string}
     */
    dataIndex: string;
    /**
     * @description Translate into English, camelCase
     * @type {string}
     */
    key: string;
  }[];
  pagination: {
    show: boolean;
    page: string;
    size: string;
    total: string;
  };
  includeModifyModal: boolean;
  fetchName: string;
  result: string;
  serviceName: string;
};
```
Here is the user request:
"""
{"filters":[{"component":"range-picker","key":"transactionTime","label":"成交时间"},{"component":"input","key":"planName","label":"提成方案名称","placeholder":"提成方案名称（个人/店组/片区）"}],"columns":[{"slot":false,"title":"成交时间","dataIndex":"成交时间","key":"成交时间"},{"slot":false,"title":"申佣时间","dataIndex":"申佣时间","key":"申佣时间"},{"slot":false,"title":"业绩来源","dataIndex":"业绩来源","key":"业绩来源"},{"slot":false,"title":"所属片区","dataIndex":"所属片区","key":"所属片区"},{"slot":false,"title":"当前组织","dataIndex":"当前组织","key":"当前组织"},{"slot":false,"title":"提成类型","dataIndex":"提成类型","key":"提成类型"},{"slot":false,"title":"员工姓名","dataIndex":"员工姓名","key":"员工姓名"},{"slot":false,"title":"成交编号","dataIndex":"成交编号","key":"成交编号"},{"slot":false,"title":"分成角色","dataIndex":"分成角色","key":"分成角色"},{"slot":false,"title":"本次申佣业绩","dataIndex":"本次申佣业绩","key":"本次申佣业绩"},{"slot":false,"title":"提成","dataIndex":"提成","key":"提成"},{"slot":false,"title":"已算提成业绩","dataIndex":"已算提成业绩","key":"已算提成业绩"},{"slot":false,"title":"当月总提成业绩","dataIndex":"当月总提成业绩","key":"当月总提成业绩"},{"slot":false,"title":"提成方案名称","dataIndex":"提成方案名称","key":"提成方案名称"},{"slot":false,"title":"方案计算类型","dataIndex":"方案计算类型","key":"方案计算类型"}],"pagination":{"show":true,"page":"page","size":"size","total":"result.total"},"includeModifyModal":false,"fetchName":"fetchTableList","result":"[\"result\"][\"records\"]","serviceName":"getTableList"}
"""
The following is the user request translated into a JSON object with 2 spaces of indentation and no properties with the value undefined:
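The prompt above appears to follow the request template of Microsoft's TypeChat: a TypeScript type as the schema, the raw request in triple quotes, and a closing line that primes the model to answer with JSON only. A hypothetical helper that assembles a prompt of the same shape:

```typescript
// Builds a TypeChat-style prompt from a type name, its TypeScript source
// and the raw user request. Helper name and wording are illustrative.
function buildTypedPrompt(typeName: string, typeSource: string, request: string): string {
  const fence = "`".repeat(3); // markdown code fence around the type definition
  return [
    `You are a service that translates user requests into JSON objects of type "${typeName}" ` +
      "according to the following TypeScript definitions, and handles each field according to its comment:",
    fence,
    typeSource.trim(),
    fence,
    "Here is the user request:",
    '"""',
    request,
    '"""',
    "The following is the user request translated into a JSON object with 2 spaces of indentation " +
      "and no properties with the value undefined:",
    "", // trailing newline so the model starts its answer on a fresh line
  ].join("\n");
}

const prompt = buildTypedPrompt(
  "PageConfig",
  "export type PageConfig = { fetchName: string; serviceName: string };",
  '{"fetchName":"fetchTableList","serviceName":"getTableList"}',
);
console.log(prompt);
```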

ruoxie

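114 days ago

https://github.com/microsoft/TypeChat Use it to validate the returned result; if it fails, send the errors back and ask again. That said, in all the time I've used it, errors have been rare.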
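A hand-rolled sketch of the validate-then-repair loop, not TypeChat's actual API (TypeChat validates against the TypeScript type itself). `callModel` is a hypothetical stand-in for the chat API, and the validator checks a single field to keep the example short.

```typescript
// Outcome of validating one raw model reply.
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Asks the model, validates the reply and, on failure, appends the bad reply
// plus the error to the conversation and asks again, up to maxAttempts times.
// `callModel` is a hypothetical stand-in for the real chat API call.
async function completeWithRepair<T>(
  callModel: (messages: string[]) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  prompt: string,
  maxAttempts = 3,
): Promise<T> {
  const messages: string[] = [prompt];
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(messages);
    const result = validate(raw);
    if (result.ok) return result.value;
    lastError = result.error;
    messages.push(raw, `That JSON is invalid: ${result.error}. Reply with the corrected JSON only.`);
  }
  throw new Error(`still invalid after ${maxAttempts} attempts: ${lastError}`);
}

// Example validator: the reply must parse and contain a string "fetchName".
function validatePage(raw: string): Validation<{ fetchName: string }> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: `not valid JSON (${(e as Error).message})` };
  }
  const fetchName = (data as { fetchName?: unknown } | null)?.fetchName;
  if (typeof fetchName !== "string") {
    return { ok: false, error: 'property "fetchName" must be a string' };
  }
  return { ok: true, value: { fetchName } };
}

// Mock model: wrong type on the first try, correct after the repair message.
const replies = ['{"fetchName": 1}', '{"fetchName": "fetchTableList"}'];
let calls = 0;
const mockModel = async (_messages: string[]): Promise<string> =>
  replies[Math.min(calls++, replies.length - 1)] ?? "";
```

Feeding the error text back usually gives the model enough to fix a single field without regenerating everything from scratch; capping the attempts keeps a stubborn model from looping forever.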

yexiaoqiu358

114 days ago

https://juejin.cn/post/7491240940126158900 I tried this one; using zod to validate the JSON works well.
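zod itself is left out here so the sketch stays dependency-free; a hand-written check plays the same role. It targets the `columns` rule from the PageConfig prompt above (`dataIndex` and `key` must be English camelCase), which is exactly the mistake a model makes when it copies the Chinese title into every field. In zod the same rule could be written as `z.string().regex(/^[a-z][a-zA-Z0-9]*$/)`; the function and regex here are illustrative.

```typescript
interface Column { slot: boolean; title: string; dataIndex: string; key: string }

// camelCase ASCII: lowercase first letter, then letters/digits only.
const CAMEL_CASE = /^[a-z][a-zA-Z0-9]*$/;

// Checks `columns` of a parsed PageConfig reply; returns path-tagged problems
// (an empty list means valid). A dependency-free stand-in for a zod schema.
function checkColumns(data: unknown): string[] {
  const columns = (data as { columns?: unknown } | null)?.columns;
  if (!Array.isArray(columns)) return ["columns: expected an array"];
  const errors: string[] = [];
  columns.forEach((col: unknown, i: number) => {
    const c = (col ?? {}) as Partial<Record<keyof Column, unknown>>;
    if (typeof c.slot !== "boolean") errors.push(`columns[${i}].slot: expected boolean`);
    if (typeof c.title !== "string") errors.push(`columns[${i}].title: expected string`);
    for (const field of ["dataIndex", "key"] as const) {
      const v = c[field];
      if (typeof v !== "string" || !CAMEL_CASE.test(v)) {
        errors.push(`columns[${i}].${field}: expected English camelCase, got ${JSON.stringify(v)}`);
      }
    }
  });
  return errors;
}

const good = { columns: [{ slot: false, title: "成交时间", dataIndex: "transactionTime", key: "transactionTime" }] };
const bad = { columns: [{ slot: false, title: "成交时间", dataIndex: "成交时间", key: "成交时间" }] };
console.log(checkColumns(good), checkColumns(bad));
```

The path-tagged messages can be sent straight back to the model as a repair prompt.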

mumbler

114 days ago

If the model won't follow instructions, switch to a bigger, more expensive model.

rogerer

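114 days ago

1. Prompting: today's LLMs usually receive format alignment during the alignment stage, so simply asking for JSON output already works well. If it still misbehaves, tune the temperature.
2. Constrained decoding: the rough idea is to require the LLM's output to conform to a given grammar and to resample whenever it does not. The upside is that format errors are guaranteed never to occur; the downside is that it can hurt the model's own performance, so I don't recommend it.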

visper

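114 days ago

I've found that models sometimes still like to wrap JSON output in ```json ``` markdown fences, even when they no longer output any other explanatory text. So in the end I simply told the model to output that format and extracted the JSON myself.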
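A minimal sketch of stripping the markdown fence before parsing: it takes the first fenced block if one exists (a `json`-tagged or bare fence, case-insensitive) and otherwise parses the whole reply. The helper name is hypothetical, and the regex is written with `{3}` quantifiers so it contains no literal triple backticks.

```typescript
// First fenced block, with an optional "json" language tag; group 1 is the body.
const FENCE = /`{3}(?:json)?\s*([\s\S]*?)\s*`{3}/i;

// Parses the fenced payload if present, otherwise the raw reply as-is.
function parseFencedJson(raw: string): unknown {
  const payload = FENCE.exec(raw)?.[1] ?? raw;
  return JSON.parse(payload);
}

const tick3 = "`".repeat(3);
const fenced = `${tick3}json\n{"a": 1}\n${tick3}`;
console.log(parseFencedJson(fenced), parseFencedJson('{"a": 2}'));
```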

darkengine

114 days ago

It's not a prompt problem. When calling the API, you need to enable Structured Outputs and define the fields the output must contain.
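A sketch of a Structured Outputs request for the original poster's use case (one translation per target language), assuming OpenAI's `response_format: { type: "json_schema", json_schema: { name, strict, schema } }` shape. The model name, helper name and field layout are illustrative; in strict mode every property has to be listed in `required` and objects need `additionalProperties: false`.

```typescript
// Request body for OpenAI-style Structured Outputs (json_schema response format).
interface StructuredOutputRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  response_format: {
    type: "json_schema";
    json_schema: { name: string; strict: boolean; schema: Record<string, unknown> };
  };
}

// One required string field per target language. Strict mode needs every
// property listed in `required` and `additionalProperties: false`.
function buildTranslationRequest(model: string, text: string, languages: string[]): StructuredOutputRequest {
  const properties: Record<string, { type: "string" }> = {};
  for (const lang of languages) properties[lang] = { type: "string" };
  return {
    model,
    messages: [
      { role: "system", content: "Translate the user's text into every language in the schema." },
      { role: "user", content: text },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "translations",
        strict: true,
        schema: { type: "object", properties, required: [...languages], additionalProperties: false },
      },
    },
  };
}

// Example model name and language codes; the text would come from the Word document.
const req = buildTranslationRequest("gpt-4o-mini", "你好，世界", ["en", "ja"]);
console.log(JSON.stringify(req.response_format, null, 2));
```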

ningxing

114 days ago

Use GPT-4 or above; it's very obedient. Tell it to go east and it wouldn't dare go west.

Qinnn

114 days ago

Try writing the part of the prompt about JSON output in English.

jingdongkehu

114 days ago

Try adding one line at the end: "If you get the translation wrong, I'll uninstall you."

mindsucker

114 days ago

Constraining the AI's output format can noticeably reduce its reasoning ability, so use it with caution.

lyxxxh2

114 days ago

Read the docs; they should explain how to specify it.

yplam

114 days ago

OpenAI's API has supported this for a long time; I've been using it all along and it's very stable.

How to get AI to reliably output a specified JSON structure

Requirement: extract the content of a Word document and translate it into multiple languages.

The prompts used are as follows:

system prompt

user prompt