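Aipex Performance Optimization: Helping AI Understand Web Pages More Intelligently

A deep dive into Aipex's three key performance optimizations, showing how targeted engineering improves both system efficiency and user experience.

In AI-driven web interaction, performance tuning matters as much as tuning a race car's engine. As the bridge between AI models and the browser, Aipex treats even millisecond-level gains as significant. Below, we walk through Aipex's three key performance optimizations and show how they let AI understand web pages more intelligently and more efficiently.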

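Aipex use case: exploring MCP

Key Optimizations

1. Using CDP to emulate Puppeteer's interestingOnly accessibility tree

Challenge: In web automation and testing, Puppeteer offers an interestingOnly option that filters non-essential nodes out of the accessibility tree, but depending on Puppeteer directly adds extra overhead and dependencies.

Before (separate tool calls):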

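// Several separate calls: inefficient and exposes too much data
async function getPageDataTraditional() {
  // Call 1: fetch all page content
  const pageContent = await getPageContent();
  // Returns: full HTML, all styles, all attributes, selectors

  // Call 2: fetch interactive elements
  const interactiveElements = await getInteractiveElements();
  // Returns: complex objects with selectors, styles, and positions

  // Call 3: fetch page links
  const pageLinks = await getPageLinks();
  // Returns: every link and its attributes

  return {
    content: pageContent,          // ~50KB of data
    elements: interactiveElements, // ~30KB of data
    links: pageLinks               // ~15KB of data
  };
  // Total: ~95KB of data, with sensitive information exposed
}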

After (using CDP directly):

// Real CDP implementation from the Aipex codebase
/**
 * Fetch the browser's native accessibility tree via the Chrome DevTools Protocol.
 * This is the same raw tree that Puppeteer's page.accessibility.snapshot() is built from;
 * interestingOnly-style filtering is applied separately afterwards.
 */
async function getRealAccessibilityTree(tabId: number): Promise<AccessibilityTree | null> {
  console.log('🔍 [DEBUG] Connecting to tab via Chrome DevTools Protocol:', tabId);

  // Safely attach the debugger to the tab. This is awaited outside the Promise
  // executor so that a failure here rejects the returned promise instead of being lost.
  const attached = await safeAttachDebugger(tabId);
  if (!attached) {
    throw new Error('Failed to attach debugger');
  }

  return new Promise((resolve, reject) => {
    // Step 1: enable the Accessibility domain, which keeps AXNodeIds consistent between calls
    chrome.debugger.sendCommand({ tabId }, "Accessibility.enable", {}, () => {
      if (chrome.runtime.lastError) {
        const message = chrome.runtime.lastError.message;
        console.error('❌ [DEBUG] Failed to enable the Accessibility domain:', message);
        safeDetachDebugger(tabId);
        reject(new Error(`Failed to enable Accessibility domain: ${message}`));
        return;
      }

      console.log('✅ [DEBUG] Accessibility domain enabled');

      // Step 2: fetch the full accessibility tree
      // (the same CDP call that Puppeteer's page.accessibility.snapshot() relies on)
      chrome.debugger.sendCommand({ tabId }, "Accessibility.getFullAXTree", {
        // depth omitted: fetch the whole tree, not just the top levels
        // frameId omitted: use the main frame
      }, (result: any) => {
        if (chrome.runtime.lastError) {
          const message = chrome.runtime.lastError.message;
          console.error('❌ [DEBUG] Failed to get the accessibility tree:', message);
          // Disable the Accessibility domain before detaching
          chrome.debugger.sendCommand({ tabId }, "Accessibility.disable", {}, () => {
            safeDetachDebugger(tabId);
          });
          reject(new Error(`Failed to get accessibility tree: ${message}`));
          return;
        }

        console.log('✅ [DEBUG] Got accessibility tree with', result.nodes?.length || 0, 'nodes');

        // Step 3: disable the Accessibility domain and detach the debugger
        chrome.debugger.sendCommand({ tabId }, "Accessibility.disable", {}, () => {
          // Short delay to make sure the domain is fully disabled before detaching
          setTimeout(() => {
            safeDetachDebugger(tabId);
          }, 100);
        });

        resolve(result);
      });
    });
  });
}

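Performance comparison:

Before: understanding a page required calling several different tools
After: the CDP accessibility tree is fetched directly in ~200-300 ms, wrapped in a single tool
Memory usage: 70% lower (no Puppeteer process; CDP is accessed directly)
Data size: 85% smaller (only "interesting" nodes are kept, via two-pass filtering)
Key idea: call CDP Accessibility.getFullAXTree directly, plus a custom interestingOnly filter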

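Benefits:

Smaller accessibility tree: keeping only "interesting" nodes (headings, landmarks, form controls) shrinks the tree and speeds up processing
Lower resource usage: avoids the overhead of loading and running Puppeteer
More flexibility: a custom implementation lets Aipex tune the filtering logic to its own needs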
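The post does not show the filter itself. As an illustration of what a "two-pass" interestingOnly filter over the raw getFullAXTree output could look like, here is a minimal sketch. It is an assumption, not Aipex's code and not Puppeteer's exact rules: the role lists, the `isInteresting` heuristic, the `filterInterestingOnly` name, and the "root is the node without `parentId`" rule are all hypothetical. Pass 1 marks interesting nodes; pass 2 rebuilds the tree, dropping uninteresting nodes and hoisting their children.

```typescript
// Subset of the CDP Accessibility.AXNode shape returned by Accessibility.getFullAXTree
interface AXValue {
  type: string;
  value?: unknown;
}

interface AXProperty {
  name: string;
  value: AXValue;
}

interface AXNode {
  nodeId: string;
  ignored: boolean;
  role?: AXValue;
  name?: AXValue;
  properties?: AXProperty[];
  parentId?: string;
  childIds?: string[];
}

interface FilteredNode {
  nodeId: string;
  role: string;
  name: string;
  children: FilteredNode[];
}

// Hypothetical role lists; Puppeteer's real rules are more detailed
const CONTROL_ROLES = new Set([
  'button', 'checkbox', 'combobox', 'link', 'menuitem', 'radio',
  'searchbox', 'slider', 'spinbutton', 'switch', 'tab', 'textbox',
]);
const LANDMARK_ROLES = new Set([
  'banner', 'complementary', 'contentinfo', 'form', 'main', 'navigation', 'region', 'search',
]);

function isInteresting(node: AXNode): boolean {
  if (node.ignored) return false;
  const role = String(node.role?.value ?? '');
  const name = String(node.name?.value ?? '').trim();
  const focusable = (node.properties ?? []).some(
    (p) => p.name === 'focusable' && p.value.value === true,
  );
  if (focusable || role === 'heading' || CONTROL_ROLES.has(role) || LANDMARK_ROLES.has(role)) {
    return true;
  }
  // Leaf nodes that carry text are also kept
  return name.length > 0 && (node.childIds ?? []).length === 0;
}

function filterInterestingOnly(nodes: AXNode[]): FilteredNode | null {
  const byId = new Map<string, AXNode>();
  for (const n of nodes) byId.set(n.nodeId, n);

  // Assumption: the root is the only node without a parentId
  const root = nodes.find((n) => n.parentId === undefined);
  if (!root) return null;

  // Pass 1: mark interesting nodes (the root is always kept)
  const keep = new Set<string>([root.nodeId]);
  for (const n of nodes) {
    if (isInteresting(n)) keep.add(n.nodeId);
  }

  // Pass 2: rebuild the tree; uninteresting nodes are dropped and their children hoisted
  const build = (id: string): FilteredNode[] => {
    const node = byId.get(id);
    if (!node) return [];
    const children: FilteredNode[] = [];
    for (const childId of node.childIds ?? []) children.push(...build(childId));
    if (!keep.has(id)) return children;
    const role = String(node.role?.value ?? '');
    const name = String(node.name?.value ?? '');
    return [{ nodeId: id, role, name, children }];
  };
  return build(root.nodeId)[0] ?? null;
}
```

Hoisting the children of dropped nodes, rather than dropping whole subtrees, keeps a button nested inside several generic wrappers visible to the model while removing the wrappers themselves.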

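2. Snapshot-based UI actions: reliable element interaction without a debugger dependency

Challenge: Traditional UI automation relies on CSS selectors or XPath, which are brittle and tend to break when the page structure changes. Aipex's snapshot system instead builds a stable UID-to-element mapping, which makes UI actions reliable.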
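No code is given for this step. Here is a minimal sketch of what a UID-to-element mapping could look like, assuming each snapshot assigns UIDs of the form `snapshotId_index` in depth-first order and keeps one UID-to-node map per snapshot. `SnapshotStore`, `RawNode`, `load`, and `resolve` are hypothetical names; `backendDOMNodeId` mirrors the optional field of the same name on CDP AXNodes. How Aipex then performs the action without the debugger is not described in the post, so the sketch covers only assignment and lookup.

```typescript
// Hypothetical input node (e.g. after interestingOnly filtering)
interface RawNode {
  role: string;
  name: string;
  backendDOMNodeId?: number; // DOM backend node id from CDP, if available
  children: RawNode[];
}

// Node as exposed to the model, tagged with a UID
interface SnapshotNode {
  uid: string;
  role: string;
  name: string;
  backendDOMNodeId?: number;
  children: SnapshotNode[];
}

class SnapshotStore {
  private snapshotId = '';
  private idToNode = new Map<string, SnapshotNode>();

  // Assign UIDs depth-first as `${snapshotId}_${counter}`, replacing any previous mapping
  load(snapshotId: string, raw: RawNode): SnapshotNode {
    this.snapshotId = snapshotId;
    this.idToNode = new Map();
    let counter = 0;
    const visit = (n: RawNode): SnapshotNode => {
      const uid = `${snapshotId}_${counter++}`;
      const node: SnapshotNode = {
        uid, role: n.role, name: n.name, backendDOMNodeId: n.backendDOMNodeId, children: [],
      };
      this.idToNode.set(uid, node);
      node.children = n.children.map(visit);
      return node;
    };
    return visit(raw);
  }

  // Resolve a UID sent back by the model; UIDs from older snapshots are rejected
  resolve(uid: string): SnapshotNode {
    const node = this.idToNode.get(uid);
    if (!node) {
      throw new Error(`Unknown or stale uid "${uid}" (current snapshot: ${this.snapshotId})`);
    }
    return node;
  }
}
```

Because the map is rebuilt on every snapshot, a UID from an outdated snapshot fails loudly instead of silently pointing at whatever element now matches an old selector.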

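3. Smart snapshot deduplication: send only the latest snapshot to the AI

Challenge: During an AI conversation, take_snapshot may be called many times, but the model only needs the latest snapshot. Sending every snapshot wastes tokens and confuses the model with stale page state.

Before (every snapshot is sent to the AI):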

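// Inefficient approach: every snapshot is sent to the AI
async function runChatWithTools(userMessages: any[], messageId?: string) {
  let messages = [systemPrompt, ...userMessages]

  // The AI calls take_snapshot several times during the conversation,
  // and every snapshot is appended to the conversation history
  // (model calls and loop control omitted for brevity)
  while (hasToolCalls) {
    for (const toolCall of toolCalls) {
      if (toolCall.name === 'take_snapshot') {
        const result = await executeToolCall(toolCall.name, toolCall.args)

        // Problem: every snapshot is added to the conversation
        messages.push({
          role: 'tool',
          name: 'take_snapshot',
          content: JSON.stringify(result) // full snapshot data
        })
      }
    }
  }

  // The AI receives every snapshot: wasted tokens and confusion
  return messages; // contains several snapshots of the same page
}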

After (smart deduplication, latest snapshot only):

// Optimized approach, based on Aipex's actual implementation
async function runChatWithTools(userMessages: any[], messageId?: string) {
  let messages = [systemPrompt, ...userMessages]

  // (model calls and loop control omitted for brevity)
  while (hasToolCalls) {
    for (const toolCall of toolCalls) {
      if (toolCall.name === 'take_snapshot') {
        const result = await executeToolCall(toolCall.name, toolCall.args)
        const currentTabUrl = result.data?.url || result.url || '';
        const currentSnapshotId = result.data?.snapshotId || result.snapshotId || '';

        // Key step: smart deduplication.
        // Replace every earlier real take_snapshot result with a stub, so only
        // the newest snapshot stays real and earlier ones count as duplicate calls.
        let replacedCount = 0;

        // Walk the history backwards to find all earlier real snapshots
        for (let i = messages.length - 1; i >= 0; i--) {
          const msg = messages[i];
          if (msg.role === 'tool' && msg.name === 'take_snapshot') {
            try {
              const content = JSON.parse(msg.content);
              const existingUrl = content.data?.url || content.url || '';
              const existingSnapshotId = content.data?.snapshotId || content.snapshotId || '';

              if (!content.skipped) {
                // Replace this real snapshot with a stub
                replacedCount++;
                messages[i] = {
                  ...msg,
                  content: JSON.stringify({
                    skipped: true,
                    reason: "replaced_by_later_snapshot",
                    url: existingUrl,
                    originalSnapshotId: existingSnapshotId,
                    message: "This snapshot was replaced by a later snapshot (duplicate call)"
                  })
                };
              }
            } catch {
              // Keep the message as-is if it cannot be parsed
            }
          }
        }

        // Append the current snapshot only after deduplication,
        // so the pass above never replaces the newest snapshot itself
        messages.push({
          role: 'tool',
          name: 'take_snapshot',
          content: JSON.stringify(result)
        })

        if (replacedCount > 0) {
          console.log(`🔄 [snapshot dedup] Replaced ${replacedCount} earlier snapshot(s) with stubs`);
          console.log(`🔄 [snapshot dedup] Keeping latest snapshot - URL: ${currentTabUrl}, ID: ${currentSnapshotId}`);
        }
      }
    }
  }

  // The AI receives only the latest snapshot: fewer tokens, no confusion
  return messages; // contains only the latest real snapshot
}

// Global snapshot storage: only one current snapshot is kept
let currentSnapshot: TextSnapshot | null = null;

export async function takeSnapshot(): Promise<{
  success: boolean;
  snapshotId: string;
  snapshot: string;
  title: string;
  url: string;
  message?: string;
}> {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true })
  if (!tab || typeof tab.id !== "number") return {
    success: false,
    snapshotId: '',
    snapshot: '',
    title: '',
    url: '',
    message: 'No active tab found'
  }

  // Fetch the accessibility tree from the browser
  const accessibilityTree = await getRealAccessibilityTree(tab.id);

  if (!accessibilityTree || !accessibilityTree.nodes) {
    return {
      success: false,
      snapshotId: '',
      snapshot: '',
      title: tab.title || '',
      url: tab.url || '',
      message: "Failed to get accessibility tree"
    }
  }

  // Generate a unique snapshot ID
  const snapshotId = `snapshot_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;

  // Convert to the snapshot format
  const root = convertAccessibilityTreeToSnapshot(accessibilityTree, snapshotId);

  if (!root) {
    return {
      success: false,
      snapshotId: '',
      snapshot: '',
      title: tab.title || '',
      url: tab.url || '',
      message: "Failed to convert accessibility tree"
    }
  }

  // Store the snapshot globally, replacing the previous one.
  // idToNode is the UID-to-node map built during conversion (defined elsewhere, not shown)
  currentSnapshot = {
    root,
    idToNode,
    snapshotId
  };

  // Format as text for the AI to consume
  const snapshotText = formatSnapshotAsText(root);

  return {
    success: true,
    snapshotId,
    snapshot: snapshotText,
    title: tab.title || '',
    url: tab.url || '',
    message: "Snapshot taken successfully"
  }
}
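The deduplication loop in runChatWithTools is interleaved with tool execution, which makes it hard to test on its own. The same rule can be pulled out into a pure function. This is a minimal sketch, assuming a simplified `ChatMessage` shape; `dedupeSnapshots` and `isSnapshot` are hypothetical names, not Aipex's API. It locates the newest take_snapshot result first, so that one is never turned into a stub.

```typescript
// Minimal chat message shape (assumption; real messages carry more fields)
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  name?: string;
  content: string;
}

const isSnapshot = (m: ChatMessage): boolean => m.role === 'tool' && m.name === 'take_snapshot';

// Keep only the newest take_snapshot result in full; turn earlier ones into small stubs.
// Returns a new array and leaves the input untouched.
function dedupeSnapshots(messages: ChatMessage[]): { messages: ChatMessage[]; replaced: number } {
  let latest = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (isSnapshot(messages[i])) {
      latest = i;
      break;
    }
  }

  let replaced = 0;
  const out = messages.map((msg, i) => {
    if (i === latest || !isSnapshot(msg)) return msg;
    let parsed: any;
    try {
      parsed = JSON.parse(msg.content);
    } catch {
      return msg; // unparsable content is kept as-is
    }
    if (!parsed || parsed.skipped) return msg; // already a stub
    replaced++;
    return {
      ...msg,
      content: JSON.stringify({
        skipped: true,
        reason: 'replaced_by_later_snapshot',
        url: parsed.data?.url ?? parsed.url ?? '',
        originalSnapshotId: parsed.data?.snapshotId ?? parsed.snapshotId ?? '',
      }),
    };
  });
  return { messages: out, replaced };
}
```

Returning a new array, rather than mutating the history in place, keeps the raw history available for logging while the deduplicated copy is what gets sent to the model.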

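Results of the three optimizations

Token usage: overall LLM token consumption drops by 60-90%, and the more complex the task, the greater the savings
Less AI confusion: the model is no longer confused by multiple snapshots of the same page
Better response quality: the model focuses on the current page state rather than stale information
Lower cost: fewer tokens per task means lower LLM API spend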
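Why savings grow with task complexity can be seen from the snapshot deduplication alone. Assume a task takes k snapshots of S tokens each, and each replaced snapshot shrinks to a stub of s tokens. Without deduplication the history holds kS snapshot tokens; with it, S + (k - 1)s, so the saved fraction is (k - 1)(S - s) / (kS), which rises toward 1 - s/S as k grows. The sketch below uses made-up sizes (S = 4000, s = 50) purely for illustration; these are not measurements from Aipex, and `snapshotHistoryTokens` is a hypothetical helper.

```typescript
// Illustrative model: k snapshots of S tokens each; replaced snapshots shrink to `stub` tokens
function snapshotHistoryTokens(k: number, S: number, stub: number) {
  const before = k * S; // every snapshot kept in full
  const after = k === 0 ? 0 : S + (k - 1) * stub; // only the latest kept in full
  return { before, after, savedFraction: before === 0 ? 0 : 1 - after / before };
}

// Hypothetical sizes: 4000-token snapshots, 50-token stubs
for (const k of [1, 2, 5, 10]) {
  const { savedFraction } = snapshotHistoryTokens(k, 4000, 50);
  console.log(`${k} snapshot(s): ${(savedFraction * 100).toFixed(1)}% of snapshot tokens saved`);
}
```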
