<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>An Ordinary Front-End Developer</title>
        <link>https://wangguanxi.space/</link>
        <description>Stay passionate; discover life and embrace it</description>
        <lastBuildDate>Sun, 01 Mar 2026 20:20:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en-US</language>
        <copyright>All rights reserved 2026, guderain</copyright>
        <item>
            <title><![CDATA[Tools Overview]]></title>
            <link>https://wangguanxi.space/article/3152b727-a3a3-8074-9f64-d1dbdbd10f5a</link>
            <guid>https://wangguanxi.space/article/3152b727-a3a3-8074-9f64-d1dbdbd10f5a</guid>
            <pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-3152b727a3a380749f64d1dbdbd10f5a"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-3152b727a3a380b1aefdc17d3b94a71c">
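To build more capable AI applications, the ability to generate text alone, which amounts to &quot;armchair strategy&quot;, is not enough. Tools are more than an extension of the &quot;limbs&quot;; they also give the &quot;brain&quot; wings of imagination.</div><div class="notion-text notion-block-3152b727a3a380bba556e32c1fa077c1">Only with tools can the capabilities of an AI application become truly open-ended and move from &quot;understanding the world&quot; to &quot;changing the world&quot;.</div><div class="notion-text notion-block-3152b727a3a38042a6a4e177640a8b54">Tools extend the capabilities of a large language model (LLM), allowing it to interact with external systems, APIs, or custom functions and so complete tasks that text generation alone cannot handle (such as search, calculation, and database queries).</div><div class="notion-text notion-block-3152b727a3a38040acead34d6e94bd70">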
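Features:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38002a3e6cc9e6e7346d6"><li>Extends what the LLM can do: lets the LLM go beyond pure text generation and perform real actions (such as calling a search engine, querying a database, or running code)</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3806faa56db53b65f5b73"><li>Supports intelligent decision-making: in an Agent workflow, the LLM dynamically selects the most suitable Tool for the task based on the user input</li><li>Modular design: each Tool focuses on a single function, which makes tools easy to reuse and combine (for example: search tool + calculator tool + weather lookup tool)</li></ul><div class="notion-text notion-block-3152b727a3a380de95e5d4b6c6c62b4f">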
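<div class="notion-text">The two ideas in the feature list above (single-purpose tools, and a model that picks one of them at run time) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from any framework: the names add, get_weather, TOOLS, and run_tool_call are hypothetical, and the shape of the tool-call dict is assumed. In a real agent the dict would be produced by the model; here it is hard-coded.</div>
<pre>
```python
from typing import Any


def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b


def get_weather(city: str) -> str:
    """Return a canned weather report (placeholder for a real API call)."""
    return f"Sunny in {city}"


# Hypothetical registry: one entry per single-purpose tool.
TOOLS = {"add": add, "get_weather": get_weather}


def run_tool_call(call: dict) -> Any:
    """Execute a tool call of the assumed form {"name": ..., "args": {...}}."""
    func = TOOLS.get(call["name"])
    if func is None:
        raise KeyError(f"unknown tool: {call['name']}")
    # The arguments chosen by the model are passed as keyword arguments.
    return func(**call["args"])


# Stand-in for a tool call that a model would normally emit.
result = run_tool_call({"name": "add", "args": {"a": 2, "b": 3}})
```
</pre>
<div class="notion-text">Because every tool is an ordinary function behind a name, adding a new capability only means adding one more entry to the registry; the dispatch code stays the same.</div>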
A Tool is essentially a callable module that wraps a specific piece of functionality; it is the interface that an Agent, Chain, or LLM uses to interact with the world.</div><div class="notion-text notion-block-3152b727a3a38044894afc4409b974e3">A Tool usually consists of the following elements:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a3802fa55fec6c1eff5772"><li>name: the name of the tool</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3806b9b9efd6db3849fac"><li>description: a description of what the tool does</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380968aaed2a999746d63"><li>The JSON schema of the tool input</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380f4934de43d11f5a644"><li>The function to call</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a38024a6dfe1ed42955419"><li>return_direct: whether the tool result should be returned directly to the user (only relevant for Agents)</li></ul><div class="notion-text notion-block-3152b727a3a380288496f90f2c7cca5b">How it works in practice:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a380ecabc0e6a72aecf749"><li>Step 1: provide the name, description, and JSON schema to the LLM as context</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a38037b287eaf8ab803a32"><li>Step 2: based on the prompt, the LLM infers which tools need to be called and supplies the concrete call arguments</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a38038a303ebf41b6da112"><li>Step 3: using the returned tool-call information, the caller triggers the corresponding tool callbacks itself</li></ul></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Ways to Define a Tool]]></title>
            <link>https://wangguanxi.space/article/3152b727-a3a3-8078-9d68-e5047a88fca2</link>
            <guid>https://wangguanxi.space/article/3152b727-a3a3-8078-9d68-e5047a88fca2</guid>
            <pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-3152b727a3a380789d68e5047a88fca2"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a3800e95d1c2d1f052ca1b" data-id="3152b727a3a3800e95d1c2d1f052ca1b"><span><div id="3152b727a3a3800e95d1c2d1f052ca1b" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a3800e95d1c2d1f052ca1b" title="Method 1: create a Tool() manually"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Method 1: create a Tool() manually</span></span></h3><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a380578a66e84db36ea22d" data-id="3152b727a3a380578a66e84db36ea22d"><span><div id="3152b727a3a380578a66e84db36ea22d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380578a66e84db36ea22d" title="Method 2: the @tool decorator"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Method 2: the @tool decorator</span></span></h3><div class="notion-text notion-block-3152b727a3a380969181deb05dc4d743">Using the @tool decorator is the simplest way to define a custom tool. By default the decorator uses the function name as the tool name, which can be overridden through the name_or_callable parameter. The decorator also uses the docstring of the function as the tool description, so the function must provide a docstring.</div><div class="notion-text notion-block-3152b727a3a38086b973fc8bd189548d">
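<div class="notion-text">The decorator behaviour described above can be illustrated with a small standard-library sketch. This is not LangChain code: simple_tool and SimpleTool are hypothetical names invented for this example (the real decorator is tool from langchain_core.tools). The sketch only mirrors the three rules in the paragraph: the tool name defaults to the function name, an explicit name overrides it, and the docstring becomes the description, so a missing docstring is rejected.</div>
<pre>
```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a framework tool object."""
    name: str
    description: str
    func: Callable[..., Any]

    def invoke(self, args: dict) -> Any:
        # Call the wrapped function with keyword arguments.
        return self.func(**args)


def simple_tool(name: Optional[str] = None):
    """Decorator factory that mimics the naming and description rules."""
    def wrap(func: Callable[..., Any]) -> SimpleTool:
        doc = inspect.getdoc(func)
        if not doc:
            raise ValueError(f"{func.__name__} needs a docstring for its description")
        # Fall back to the function name when no explicit name is given.
        return SimpleTool(name=name or func.__name__, description=doc, func=func)
    return wrap


@simple_tool()
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


@simple_tool(name="adder")
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b
```
</pre>
<div class="notion-text">Here multiply keeps its function name as the tool name, while add is exposed under the overriding name adder; both take their description from the docstring.</div>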
Modifying args</div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a3804ba2c0da232b715e46" data-id="3152b727a3a3804ba2c0da232b715e46"><span><div id="3152b727a3a3804ba2c0da232b715e46" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a3804ba2c0da232b715e46" title="Method 3: StructuredTool.from_function()"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Method 3: StructuredTool.from_function()</span></span></h3><div class="notion-text notion-block-3152b727a3a380e0ab63ceef9b3ab45a">Modifying args</div><div class="notion-blank notion-block-3152b727a3a38026a415df7c6b77b1ed"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[MCP]]></title>
            <link>https://wangguanxi.space/article/3152b727-a3a3-801c-9836-d751507f9077</link>
            <guid>https://wangguanxi.space/article/3152b727-a3a3-801c-9836-d751507f9077</guid>
            <pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate>
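            <description><![CDATA[• Core idea of MCP: give large models a unified "tool-calling protocol" so that a model can call all kinds of external tools in a standard way, extending what it can do (not just think, but also act).
• The problem it solves: when tools are written in Java, Python, or Rust, turning them directly into a tool for one specific language is limiting. MCP uses a unified protocol to standardize "cross-language, cross-process" tool calls.
• Defining feature of MCP: cross-process tool calls
    ◦ Local cross-process: communicate with a child process over stdio (standard input/output)
    ◦ Remote cross-process: communicate with a remote service over HTTP
• Message protocol: JSON-RPC 2.0 throughout; it is language-independent, clearly structured, easy to debug, and lightweight and flexible.
• Evolution of the transport modes:
    ◦ Stdio: the client starts the MCP Server as a child process, exchanges JSON-RPC messages over stdin/stdout, and manages the child process lifecycle.
    ◦ SSE (the old remote standard): requests are sent by HTTP POST and results arrive over a long-lived SSE connection; sessionId and requestId correlate requests and responses, making it "pseudo-duplex".
    ◦ Streamable HTTP (the new standard, since 2025-03-26): replaces HTTP+SSE with a more unified and robust mechanism, removes the burden of maintaining two connections, and strengthens session management (for example, returning and using Mcp-Session-Id, and supporting session termination), which makes scaling and deployment easier.]]></description>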
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-3152b727a3a3801c9836d751507f9077"><div class="notion-viewport"></div><div class="notion-collection-page-properties"><div class="notion-collection-row"><div class="notion-collection-row-body"><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M7 13A6 6 0 107 1a6 6 0 000 12zM3.751 5.323A.2.2 0 013.909 5h6.182a.2.2 0 01.158.323L7.158 9.297a.2.2 0 01-.316 0L3.751 5.323z"></path></svg><div class="notion-collection-column-title-body">type</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-select"><div class="notion-property-select-item notion-item-purple">Post</div></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M7 13A6 6 0 107 1a6 6 0 000 12zM3.751 5.323A.2.2 0 013.909 5h6.182a.2.2 0 01.158.323L7.158 9.297a.2.2 0 01-.316 0L3.751 5.323z"></path></svg><div class="notion-collection-column-title-body">status</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-select"><div class="notion-property-select-item notion-item-red">Published</div></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M10.889 5.5H3.11v1.556h7.778V5.5zm1.555-4.444h-.777V0H10.11v1.056H3.89V0H2.333v1.056h-.777c-.864 0-1.548.7-1.548 1.555L0 12.5c0 .856.692 1.5 1.556 1.5h10.888C13.3 14 14 13.356 14 12.5V2.611c0-.855-.7-1.555-1.556-1.555zm0 11.444H1.556V3.944h10.888V12.5zM8.556 8.611H3.11v1.556h5.445V8.61z"></path></svg><div class="notion-collection-column-title-body">date</div></div><div 
class="notion-collection-row-value"><span class="notion-property notion-property-date"></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M7 4.568a.5.5 0 00-.5-.5h-6a.5.5 0 00-.5.5v1.046a.5.5 0 00.5.5h6a.5.5 0 00.5-.5V4.568zM.5 1a.5.5 0 00-.5.5v1.045a.5.5 0 00.5.5h12a.5.5 0 00.5-.5V1.5a.5.5 0 00-.5-.5H.5zM0 8.682a.5.5 0 00.5.5h11a.5.5 0 00.5-.5V7.636a.5.5 0 00-.5-.5H.5a.5.5 0 00-.5.5v1.046zm0 3.068a.5.5 0 00.5.5h9a.5.5 0 00.5-.5v-1.045a.5.5 0 00-.5-.5h-9a.5.5 0 00-.5.5v1.045z"></path></svg><div class="notion-collection-column-title-body">slug</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-text"></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M7 4.568a.5.5 0 00-.5-.5h-6a.5.5 0 00-.5.5v1.046a.5.5 0 00.5.5h6a.5.5 0 00.5-.5V4.568zM.5 1a.5.5 0 00-.5.5v1.045a.5.5 0 00.5.5h12a.5.5 0 00.5-.5V1.5a.5.5 0 00-.5-.5H.5zM0 8.682a.5.5 0 00.5.5h11a.5.5 0 00.5-.5V7.636a.5.5 0 00-.5-.5H.5a.5.5 0 00-.5.5v1.046zm0 3.068a.5.5 0 00.5.5h9a.5.5 0 00.5-.5v-1.045a.5.5 0 00-.5-.5h-9a.5.5 0 00-.5.5v1.045z"></path></svg><div class="notion-collection-column-title-body">summary</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-text"></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M4 3a1 1 0 011-1h7a1 1 0 110 2H5a1 1 0 01-1-1zm0 4a1 1 0 011-1h7a1 1 0 110 2H5a1 1 0 01-1-1zm0 4a1 1 0 011-1h7a1 1 0 110 2H5a1 1 0 01-1-1zM2 4a1 1 0 110-2 1 1 0 010 2zm0 4a1 1 0 110-2 1 1 0 010 2zm0 4a1 1 0 110-2 1 1 0 010 2z"></path></svg><div class="notion-collection-column-title-body">tags</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-multi_select"><div class="notion-property-multi_select-item notion-item-brown">Artificial Intelligence</div><div class="notion-property-multi_select-item notion-item-red">Recommended</div></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M7 13A6 6 0 107 1a6 6 0 000 12zM3.751 5.323A.2.2 0 013.909 5h6.182a.2.2 0 01.158.323L7.158 9.297a.2.2 0 01-.316 0L3.751 5.323z"></path></svg><div class="notion-collection-column-title-body">category</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-select"><div class="notion-property-select-item notion-item-purple">Tech Sharing</div></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M7 4.568a.5.5 0 00-.5-.5h-6a.5.5 0 00-.5.5v1.046a.5.5 0 00.5.5h6a.5.5 0 00.5-.5V4.568zM.5 1a.5.5 0 00-.5.5v1.045a.5.5 0 00.5.5h12a.5.5 0 00.5-.5V1.5a.5.5 0 00-.5-.5H.5zM0 8.682a.5.5 0 00.5.5h11a.5.5 0 00.5-.5V7.636a.5.5 0 00-.5-.5H.5a.5.5 0 00-.5.5v1.046zm0 3.068a.5.5 0 00.5.5h9a.5.5 0 00.5-.5v-1.045a.5.5 0 00-.5-.5h-9a.5.5 0 00-.5.5v1.045z"></path></svg><div 
class="notion-collection-column-title-body">icon</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-text"></span></div></div><div class="notion-collection-row-property"><div class="notion-collection-column-title"><svg viewBox="0 0 14 14" class="notion-collection-column-title-icon"><path d="M7 4.568a.5.5 0 00-.5-.5h-6a.5.5 0 00-.5.5v1.046a.5.5 0 00.5.5h6a.5.5 0 00.5-.5V4.568zM.5 1a.5.5 0 00-.5.5v1.045a.5.5 0 00.5.5h12a.5.5 0 00.5-.5V1.5a.5.5 0 00-.5-.5H.5zM0 8.682a.5.5 0 00.5.5h11a.5.5 0 00.5-.5V7.636a.5.5 0 00-.5-.5H.5a.5.5 0 00-.5.5v1.046zm0 3.068a.5.5 0 00.5.5h9a.5.5 0 00.5-.5v-1.045a.5.5 0 00-.5-.5h-9a.5.5 0 00-.5.5v1.045z"></path></svg><div class="notion-collection-column-title-body">password</div></div><div class="notion-collection-row-value"><span class="notion-property notion-property-text"></span></div></div></div></div></div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a380cb8634f5b0bb97ac9e" data-id="3152b727a3a380cb8634f5b0bb97ac9e"><span><div id="3152b727a3a380cb8634f5b0bb97ac9e" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380cb8634f5b0bb97ac9e" title="The basic idea behind MCP"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">The basic idea behind MCP</span></span></h3><div class="notion-text notion-block-3152b727a3a38095b913f19e022cfac6">How the tools we implement get called, and with what arguments, is decided by the large model itself.</div><div class="notion-text notion-block-3152b727a3a380eeb9e2f57b9255d2ea">Tools give the large model the ability to act. On its own it can only think, not do things; now it can call tools to get things done for you.</div><div class="notion-text notion-block-3152b727a3a380b7b17fd56bd8e2f1e0">But tools have one problem: if the AI agent is written in Node, the tools have to be written in Node as well. What if you already have utilities written in Java, Python, or Rust? How do you wrap those as tools?</div><div class="notion-text notion-block-3152b727a3a380b19913f97b70893d4e">Some will say: we can already execute commands, so just run the code written in other languages in a separate process.</div><div class="notion-text notion-block-3152b727a3a38063930acc298d4903c2">That works. Here stdio refers to the standard input and output streams, that is, keyboard input and console output. When your process starts a child process, the two can communicate this way.</div><div class="notion-text notion-block-3152b727a3a3805ea8a1eec507517422">Others will say: simple, use HTTP! Just run a service locally. Either way, the problem of calling tools across languages is now solved.</div><div class="notion-text notion-block-3152b727a3a38077a0c6d05cbcd7ee9d">But if everyone does it their own way, every service looks different. To plug in a tool from someone else, would I have to learn how each service is defined?</div><div class="notion-text notion-block-3152b727a3a38088a99ac9414b883f60">Could we define one unified communication protocol that everyone follows, so that any cross-process tool call can be plugged in?</div><div class="notion-text notion-block-3152b727a3a38008b190c2700cf0490c">In other words, to call a tool in another process you just speak this protocol. For a local tool, run its process and communicate over stdio; for a remote tool, connect to the remote service process over HTTP.</div><div class="notion-text notion-block-3152b727a3a38073b44dec417178338a">What should this protocol be called? It is a Protocol that extends the Context of a Model so that it can do more and know more. Call it MCP (Model Context Protocol). Congratulations, you have just invented MCP!</div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a3805db6bec4ad69ff4401" data-id="3152b727a3a3805db6bec4ad69ff4401"><span><div id="3152b727a3a3805db6bec4ad69ff4401" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a3805db6bec4ad69ff4401" title="The defining feature of MCP is cross-process tool calls"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">The defining feature of MCP is cross-process tool calls</span></span></h3><ul class="notion-list notion-list-disc notion-block-3152b727a3a3800bb55ddb4fcdbe8958"><li>Calls to a local process use stdio.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3806083a4d6eec2a85386"><li>Calls to a remote process use HTTP.</li></ul><div class="notion-blank notion-block-3152b727a3a38005bbdac5aa8f7ddc40"> </div><div class="notion-text notion-block-3152b727a3a38095ae8af2d3f8f2a24d">Your AI Agent is the MCP client. Through the MCP protocol it can call any MCP Server, which gives it cross-process tool calls.</div><div class="notion-text notion-block-3152b727a3a38010902adb5335e0531a">In LangChain, of course, this is still a tool, just one particular kind: inside the tool function you call an MCP Client to reach a remote MCP Server. It is still a tool in essence, but one that integrates MCP tools.</div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a380f99239d7b98b0febec" data-id="3152b727a3a380f99239d7b98b0febec"><span><div id="3152b727a3a380f99239d7b98b0febec" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380f99239d7b98b0febec" title="MCP transport modes and core architecture in depth"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">MCP transport modes and core architecture in depth</span></span></h3><div class="notion-text notion-block-3152b727a3a3808d80b7d3806c752647">1. Message protocol: JSON-RPC 2.0</div><div class="notion-text notion-block-3152b727a3a3808895dad63cfa2cfa32">MCP specifies a single standard message format: JSON-RPC 2.0. JSON-RPC 2.0 is a lightweight message-exchange protocol for remote procedure calls (RPC) that uses JSON as its data format. Note that it is not a low-level communication protocol, only an application-layer message format standard. Its advantages: it is language-independent (is there a language that cannot handle JSON?), simple to use (simple structure, naturally readable, easy to debug), and lightweight and flexible (it can be adapted to a variety of transports).</div><div class="notion-text notion-block-3152b727a3a38016ac5adbc4b76f1637">2. Three transport modes</div><div class="notion-text 
notion-block-3152b727a3a380c38ba2fe5e219fdb1c">MCP provides three different transport implementations.</div><div class="notion-text notion-block-3152b727a3a380808a2ac6c33af5f348">Built-in transports: Stdio, SSE, and Streamable HTTP</div><div class="notion-blank notion-block-3152b727a3a3806c9a96f844683755a1"> </div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a3801ca113f91d2b706a43" data-id="3152b727a3a3801ca113f91d2b706a43"><span><div id="3152b727a3a3801ca113f91d2b706a43" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a3801ca113f91d2b706a43" title="STDIO (Standard Input/Output)"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">STDIO (Standard Input/Output)</span></span></h4><div class="notion-text notion-block-3152b727a3a380d08acad8af1fcfc5cd">A local communication method based on standard input (stdin) and standard output (stdout).</div><div class="notion-text notion-block-3152b727a3a38080acf5cdc45e8c6012">The MCP Client starts a child process (the MCP Server) and communicates with it by exchanging JSON-RPC messages over stdin and stdout.</div><div class="notion-text notion-block-3152b727a3a380a0a5a3e03bc47cae04">In detail:</div><div class="notion-text notion-block-3152b727a3a380e19209c0146cc5dc3e">1. Start the child process (MCP Server)</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a380d3b282edd007437d77"><li>The MCP Client starts the MCP Server as a child process, specifying the Server executable and its arguments on the command line</li></ul><div class="notion-text notion-block-3152b727a3a3807ea14aef03221356e5">2. Message exchange</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a3803dbc17f7431e9bbb2e"><li>The MCP Client writes JSON-RPC messages to the MCP Server through stdin</li></ul><ul class="notion-list notion-list-disc 
notion-block-3152b727a3a380a6a4bfcb8e9cde64a3"><li>After handling a request, the MCP Server returns a JSON-RPC message through stdout; it can also write logs to stderr</li></ul><div class="notion-text notion-block-3152b727a3a3804eaf1acb22d3d93f77">3. Lifecycle management</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38009a396cbfebb79af5d"><li>The MCP Client controls the startup and shutdown of the child process (MCP Server). When communication ends, the MCP Client closes stdin and terminates the MCP Server</li></ul><div class="notion-blank notion-block-3152b727a3a38067815ecada9ed423de"> </div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a38021af1ef3a9705efeb3" data-id="3152b727a3a38021af1ef3a9705efeb3"><span><div id="3152b727a3a38021af1ef3a9705efeb3" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a38021af1ef3a9705efeb3" title="SSE mode"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">SSE mode</span></span></h4><div class="notion-text notion-block-3152b727a3a3801c9a46dffa8d4b2e98">SSE-based remote mode (MCP specification before the 2025-03-26 revision). SSE (Server-Sent Events) is a one-way communication technique built on HTTP that lets the Server push messages to the Client in real time; the Client only needs to establish one connection to keep receiving messages. Its characteristics are:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38055b8b0e44da6ff85e6"><li>One-way (Server -&gt; Client only)</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a38029ae2fdc5e417e70cc"><li>Based on HTTP; the connection is normally established with a single HTTP GET request</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380c8882dee0f8636a3da"><li>Suited to real-time push scenarios (such as progress updates and live data streams). Because SSE is one-way, it has to be paired with HTTP POST to give the Client and Server two-way communication. Strictly speaking, this is a pseudo-duplex mode: HTTP POST (Client -&gt; Server) + HTTP SSE (Server -&gt; Client)</li></ul><div class="notion-text 
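notion-block-3152b727a3a380439b83c942e789b97a">In this transport mode:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a380a7b225ef91305e3515"><li>An HTTP POST channel is used by the Client to send requests, for example calling a Tool on the MCP Server and passing its arguments. Note that the Server returns immediately at this point</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380f8a51bdde7831be6f5"><li>An HTTP SSE channel is used by the Server to push data, for example returning a call result or a progress update</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a38035ad0bd1aff102919f"><li>The two channels are linked by the session ID, while requests and responses are matched by the id in each message</li></ul><div class="notion-text notion-block-3152b727a3a3809391cdd64e5a578acf">The detailed flow is as follows: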
1. Connection setup: the Client first requests an SSE connection; the Server &quot;agrees&quot;, then generates and pushes a unique Session ID</div><div class="notion-text notion-block-3152b727a3a380cd9d6bd3bf674e34b6">2. Sending a request: the Client sends a JSON-RPC 2.0 request by HTTP POST (the request carries the Session ID and a Request ID)</div><div class="notion-text notion-block-3152b727a3a380fb8f97fe027bedd939">3. Acknowledgement: the Server immediately returns status code 202 (Accepted) to indicate that the request has been accepted</div><div class="notion-text notion-block-3152b727a3a3808e9eb6c9460715ebcc">4. Asynchronous processing: the Server framework handles the request automatically and, based on its parameters, decides which tool or resource to invoke</div><div class="notion-text notion-block-3152b727a3a38099b63bfb2270304669">5. Pushing the result: when processing finishes, the Server pushes a JSON-RPC 2.0 response over the SSE channel, carrying the matching Request ID</div><div class="notion-text notion-block-3152b727a3a380249eb3c105084f40a0">6. Matching the result: when the SSE listener on the Client receives data, it uses the Request ID to match the response to the earlier request</div><div class="notion-text notion-block-3152b727a3a3808e8ce4f668ebad3bb2">7. Repeat: steps 2-6 are repeated for each request. This cycle also covers the MCP initialization exchange</div><div class="notion-text notion-block-3152b727a3a380359352c965e847d9e0">8. Disconnecting: once the Client has completed all its requests, it can close the SSE connection, which ends the session</div><div class="notion-text notion-block-3152b727a3a380498c92d5e7cd73a3ab">In short: requests are sent by HTTP POST, and the Server responses arrive asynchronously over the long-lived SSE connection</div><div class="notion-blank notion-block-3152b727a3a380bc888bec18098b437d"> </div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a380da9b4ecbd2210f1d5b" data-id="3152b727a3a380da9b4ecbd2210f1d5b"><span><div id="3152b727a3a380da9b4ecbd2210f1d5b" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380da9b4ecbd2210f1d5b" title="Streamable mode"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Streamable mode</span></span></h4><div 
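class="notion-text notion-block-3152b727a3a3805fb403e69a63f70417">Streamable HTTP mode (MCP specification, 2025-03-26 revision). In the 2025-03-26 revision, MCP introduced the Streamable HTTP remote transport to replace the earlier HTTP+SSE remote transport.</div><div class="notion-text notion-block-3152b727a3a38012ae05ed944a145f83">The problems with HTTP+SSE are:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a380bebbf5d5e84140037a"><li>Two separate connection endpoints have to be maintained</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380c2813df63b84d8e367"><li>It demands a highly reliable connection. Once the SSE connection drops, the Client cannot resume automatically and has to open a new connection, losing its context</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3800bad6fe7237d1d7c53"><li>The Server must keep a highly available long-lived connection open for every Client, which is a challenge for availability and scalability</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380d2bd33d15b137daf33"><li>All Server-to-Client messages are forced through one-way SSE push, which is inflexible</li></ul><div class="notion-text notion-block-3152b727a3a380edb33add6fec4b36f0">The main changes are:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a3804ab0abeacc3a1c323f"><li>The Server needs only a single unified HTTP endpoint (for example /messages) for communication</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3807d9f74c3f4ecf8bbb7"><li>The Client can interact with the Server in a fully stateless way, using plain RESTful HTTP POST. When needed, the Client can also receive an SSE response to a single request; for example, a long-running task that needs progress notifications can keep pushing progress over SSE</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380dcb503d5773566f151"><li>The Client can also open a long-lived SSE stream with an HTTP GET request, which is similar to the earlier HTTP+SSE mode</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a38079bedccad70eb6617d"><li>Enhanced session management. The Server can return an Mcp-Session-Id during initialization, and the Client must then include that Mcp-Session-Id in every subsequent request. The Mcp-Session-Id ties together the multiple interactions of one session; the Server can use it to terminate the session and require the Client to start a new one, and the Client can terminate the session with an HTTP DELETE request</li></ul><div class="notion-text notion-block-3152b727a3a38008a58efa8831063fad">Building on the old scheme, Streamable HTTP makes the transport layer more flexible and more robust:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a380ceba88df5dce1cf5bc"><li>Stateless Servers are allowed, with no dependence on long-lived connections, which improves deployment flexibility and scalability</li></ul><ul class="notion-list notion-list-disc 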
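notion-block-3152b727a3a3805799d8f9c01b813e77"><li>Better compatibility with Server middleware: it only needs to support HTTP, with no special SSE handling</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380d7bb8dc9b30f5f98c8"><li>SSE responses or long-lived connections can still be enabled when needed, keeping the advantages of the earlier SSE mode</li></ul><div class="notion-blank notion-block-3152b727a3a3807cade1e95b5e0c185e"> </div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a38049bd48f41eaeb1424a" data-id="3152b727a3a38049bd48f41eaeb1424a"><span><div id="3152b727a3a38049bd48f41eaeb1424a" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a38049bd48f41eaeb1424a" title="Summary"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Summary</span></span></h3><ul class="notion-list notion-list-disc notion-block-3152b727a3a38098b711c99e86a1d9f6"><li>Core idea of MCP: give large models a unified "tool-calling protocol" so that a model can call all kinds of external tools in a standard way, extending what it can do (not just think, but also act).
• The problem it solves: when tools are written in Java, Python, or Rust, turning them directly into a tool for one specific language is limiting. MCP uses a unified protocol to standardize "cross-language, cross-process" tool calls.
• Defining feature of MCP: cross-process tool calls
◦ Local cross-process: communicate with a child process over stdio (standard input/output)
◦ Remote cross-process: communicate with a remote service over HTTP
• Message protocol: JSON-RPC 2.0 throughout; it is language-independent, clearly structured, easy to debug, and lightweight and flexible.
• Evolution of the transport modes:
◦ Stdio: the client starts the MCP Server as a child process, exchanges JSON-RPC messages over stdin/stdout, and manages the child process lifecycle.
◦ SSE (the old remote standard): requests are sent by HTTP POST and results arrive over a long-lived SSE connection; sessionId and requestId correlate requests and responses, making it "pseudo-duplex".
◦ Streamable HTTP (the new standard, since 2025-03-26): replaces HTTP+SSE with a more unified and robust mechanism, removes the burden of maintaining two connections, and strengthens session management (for example, returning and using Mcp-Session-Id, and supporting session termination), which makes scaling and deployment easier.</li></ul></main></div>]]></content:encoded>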
        </item>
        <item>
            <title><![CDATA[Document Splitters]]></title>
            <link>https://wangguanxi.space/article/3152b727-a3a3-8026-8210-c3adfb1eb87b</link>
            <guid>https://wangguanxi.space/article/3152b727-a3a3-8026-8210-c3adfb1eb87b</guid>
            <pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-3152b727a3a380268210c3adfb1eb87b"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-blank notion-block-3152b727a3a380699c14d8bbe259efe1"> </div><h4 class="notion-h notion-h3 notion-h-indent-0 notion-block-3152b727a3a380398f2be3878f0ccee3" data-id="3152b727a3a380398f2be3878f0ccee3"><span><div id="3152b727a3a380398f2be3878f0ccee3" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380398f2be3878f0ccee3" title="1. 为什么要拆分/分块/切分"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">1. 
为什么要拆分/分块/切分</span></span></h4><div class="notion-text notion-block-3152b727a3a3806c8b76fa8e1531ac3d">当拿到统一的一个Document对象后，接下来需要切分成chunks。如果不切分，而是考虑作为一个整体的Document对象，会存在两点问题:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a3800eb6bbfb367e43dd5f"><li>1.假设提问的Query的答案出现在某一个Document对象中，那么将检索到的整个Document对象直接放入Prompt中并不是最优的选择，因为其中一定会包含非常多无关的信息，而无效信息越多，对大模型后续的推理影响越大。</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380119427cff2229ebe40"><li>2.任何一个大模型都存在最大输入的Token限制，如果一个Document非常大，比如一个几百兆的PDF，那么大模型肯定无法容纳如此多的信息。基于此，一个有效的解决方案就是将完整的Document对象进行分块处理(Chunking)。无论是在存储还是检索过程中，都将以这些块(chunks)为基本单位，这样有效地避免内容不相关性问题和超出最大输入限制的问题。</li></ul><div class="notion-blank notion-block-3152b727a3a38010ba36d69115785889"> </div><div class="notion-blank notion-block-3152b727a3a380a798dec7c304140aba"> </div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a380e98093d8ff31653233" data-id="3152b727a3a380e98093d8ff31653233"><span><div id="3152b727a3a380e98093d8ff31653233" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380e98093d8ff31653233" title="2、TextSplitter的使用"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">2、TextSplitter的使用</span></span></h3><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a38034962efdc7dd86fcd1" data-id="3152b727a3a38034962efdc7dd86fcd1"><span><div id="3152b727a3a38034962efdc7dd86fcd1" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a38034962efdc7dd86fcd1" title="TextSplitter作为各种具体的文档拆分器的父类"><svg 
viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">TextSplitter is the parent class of the concrete splitters</span></span></h4><div class="notion-text notion-block-3152b727a3a380f49188eea5f921616d">1. Commonly used attributes:</div><div class="notion-text notion-block-3152b727a3a380978745ca89d09e6bc1">chunk_size: maximum size of a returned chunk, as measured by length_function (characters by default). Default: 4000.</div><div class="notion-text notion-block-3152b727a3a380f79180fb27f8966c05">chunk_overlap: number of characters shared by two adjacent chunks, so that information at a chunk boundary is not cut off and lost. Default: 200. A common setting is 10% to 20% of chunk_size.</div><div class="notion-text notion-block-3152b727a3a38047b4b7e8babcafb3a1">length_function: function used to measure the length of a chunk. Defaults to Python&#x27;s built-in len, which counts Unicode characters, so one Chinese character, one Latin letter, and one punctuation mark each count as one character.</div><div class="notion-text notion-block-3152b727a3a380078d57ffc4393290b7">keep_separator: whether to keep the separator in the chunks. Default: False.</div><div class="notion-text notion-block-3152b727a3a380d6a3aed9098ec49125">add_start_index: if True, the start index of each chunk is stored in its metadata. Default: False.</div><div class="notion-text notion-block-3152b727a3a380f0a753c91681c630bc">strip_whitespace: if True, whitespace is stripped from the start and end of every document. Default: True.</div><div class="notion-text notion-block-3152b727a3a38006982ce8790b32f274">2. Commonly used methods:</div><div class="notion-text notion-block-3152b727a3a380f3b0a9ca4d10ceaf59">Case 1: splitting strings:</div><div class="notion-text notion-block-3152b727a3a38053a562ca4e07f214c0">split_text(xxx): takes a str and returns List[str].</div><div class="notion-text notion-block-3152b727a3a38031a55fc6eea43ebecc">create_documents(xxx): takes List[str] and returns List[Document]. Internally it calls split_text(xxx).</div><div class="notion-text 
notion-block-3152b727a3a380649906ed186ecec8e1">Case 2: splitting Document objects:</div><div class="notion-text notion-block-3152b727a3a3800db8ddd47275d71769">split_documents(xxx): takes List[Document] and returns List[Document]. Internally it calls create_documents(xxx).</div><div class="notion-blank notion-block-3152b727a3a38025b7a6cc02ce14e358"> </div><div class="notion-blank notion-block-3152b727a3a3802a9157cca1bc05eb63"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Common Text Splitters]]></title>
            <link>https://wangguanxi.space/article/3152b727-a3a3-8012-82e7-e0ad1149a5c6</link>
            <guid>https://wangguanxi.space/article/3152b727-a3a3-8012-82e7-e0ad1149a5c6</guid>
            <pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-3152b727a3a3801282e7e0ad1149a5c6"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a380cfbba7e3cc21f8307f" data-id="3152b727a3a380cfbba7e3cc21f8307f"><span><div id="3152b727a3a380cfbba7e3cc21f8307f" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380cfbba7e3cc21f8307f" title="1. CharacterTextSplitter"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">1. CharacterTextSplitter</span></span></h3><div class="notion-text notion-block-3152b727a3a38009bb9afaddf25c7fe0">Parameters:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38033ac20c3c1c9704435"><li>chunk_size: maximum size of each chunk, measured by length_function (characters by default, not tokens). Default: 4000.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3804cb284d0c4a52458ba"><li>chunk_overlap: maximum overlap between two adjacent chunks, in the same unit as chunk_size. Default: 200.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380cbac12e0cf7722cccc"><li>separator: the separator used for splitting. Default: &quot;\n\n&quot;.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380b0a38befb234d16eab"><li>length_function: function used to compute the length of a chunk. Inherited from the parent class TextSplitter, where it defaults to Python&#x27;s built-in len.</li></ul><div class="notion-blank notion-block-3152b727a3a380c58236eb1a36028119"> </div><div class="notion-text 
notion-block-3152b727a3a380e485a4f6c364a46cb0"></div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a38073a0bfdf8376547a85" data-id="3152b727a3a38073a0bfdf8376547a85"><span><div id="3152b727a3a38073a0bfdf8376547a85" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a38073a0bfdf8376547a85" title="Example 1: how chunk_size and chunk_overlap behave"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Example 1: how chunk_size and chunk_overlap behave</span></span></h4><div class="notion-text notion-block-3152b727a3a380498b2ad650d4600ae9">Separator first: when a separator is set (for example &quot;。&quot;), the splitter first splits the text at every separator and only then merges the resulting pieces up to chunk_size.</div><div class="notion-text notion-block-3152b727a3a380859e8fda645ecff33e">This avoids hard cuts in the middle of a sentence. The design has the following goals and consequences:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38043b8fce3bfaa37490b"><li>1. Semantic integrity comes first (sentences are not cut).</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3800d9d9cd4db40e7b6cc"><li>2. Meaningless fragments (half a word, an incomplete sentence) are avoided.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380d68cb6f56b1a2a01f4"><li>3. If a piece between two separators is longer than chunk_size, it cannot be split any further: it is emitted as an oversized chunk, and overlap does not apply to it.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380149cfbe07bf6e8b6b6"><li>4. chunk_overlap only takes effect between chunks that were built by merging several pieces, which requires chunk_size to be large enough. If no pieces are merged, there is no overlap.</li></ul><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a3805d8079f4b2af96b693" data-id="3152b727a3a3805d8079f4b2af96b693"><span><div id="3152b727a3a3805d8079f4b2af96b693" 
class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a3805d8079f4b2af96b693" title="2. RecursiveCharacterTextSplitter (most commonly used)"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">2. RecursiveCharacterTextSplitter (most commonly used)</span></span></h3><div class="notion-text notion-block-3152b727a3a380f68dceee34b34d1590">RecursiveCharacterTextSplitter is the most commonly used document splitter. It splits text at a prioritized list of separator characters.</div><div class="notion-text notion-block-3152b727a3a3803dae00ddeca57b6f4f">By default, the separators it tries are [&quot;\n\n&quot;, &quot;\n&quot;, &quot; &quot;, &quot;&quot;].</div><div class="notion-text notion-block-3152b727a3a380749967e76987856c75">In detail: it splits on the first separator; any piece that is still larger than chunk_size is split again on the next separator, and so on down the list.</div><div class="notion-text notion-block-3152b727a3a3805b9f57f3d6165c6655">You can also add separators such as &quot;，&quot; and &quot;。&quot; to the list, which helps with Chinese text.</div><div class="notion-text notion-block-3152b727a3a3804b8ca2ff3db96a146f">Characteristics:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38043a100cf37374fad71"><li>Context preservation: it splits preferentially at natural-language boundaries (paragraph and sentence endings), which reduces fragmentation.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3803db3a7e4f93d11f75b"><li>Adaptive segmentation: by recursively trying several separators, it produces pieces whose size is close to chunk_size.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380e2bfedd389ec059a90"><li>Flexibility: it works for many text types (code, Markdown, plain text) and is the most general-purpose text splitter in LangChain.</li></ul><div class="notion-blank notion-block-3152b727a3a38063b66cce4ce4c02750"> </div><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a380788f00d9e0338ca5ba" 
data-id="3152b727a3a380788f00d9e0338ca5ba"><span><div id="3152b727a3a380788f00d9e0338ca5ba" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a380788f00d9e0338ca5ba" title="TokenTextSplitter"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">TokenTextSplitter</span></span></h3><div class="notion-text notion-block-3152b727a3a38078871ecac6aed2c655">Why split by token?</div><div class="notion-text notion-block-3152b727a3a38099870be2d7382dff51">Language models limit input length by token count (for example, the 8k/32k token limits of GPT-4). Splitting by characters or words can produce chunks whose actual token count exceeds the limit.</div><div class="notion-text notion-block-3152b727a3a38017836cc5ebc73bb227">LLM usage is usually metered (and billed) by token count, so splitting by token also makes cost easier to control.</div><div class="notion-text notion-block-3152b727a3a3803a956af4f9cbbcb1ae">TokenTextSplitter notes:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38029b988ddb734cdb260"><li>Basis: token count only. TokenTextSplitter encodes the text into tokens (with tiktoken by default) and cuts the token sequence into windows of at most chunk_size tokens, with chunk_overlap tokens shared between neighbors. It does not look for natural boundaries such as sentence endings, so a cut can fall in the middle of a sentence. When both a token budget and boundary-aware splitting are needed, RecursiveCharacterTextSplitter.from_tiktoken_encoder is an alternative.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a3807bb869e7b6feb63386"><li>Advantage: chunk sizes are measured the same way the LLM counts tokens, so they map directly onto the model&#x27;s input limit.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380d1ac64ce6bf1a7ea08"><li>Disadvantages: tokenization quality may be poor for non-English or domain-specific text, and cuts may break sentences.</li></ul><ul class="notion-list notion-list-disc"><li>Typical use: cases where the number of tokens sent to the LLM must be controlled precisely.</li></ul><h3 class="notion-h notion-h2 notion-h-indent-0 notion-block-3152b727a3a380f194a8cadf25a05077" data-id="3152b727a3a380f194a8cadf25a05077"><span><div id="3152b727a3a380f194a8cadf25a05077" class="notion-header-anchor"></div><a 
class="notion-hash-link" href="#3152b727a3a380f194a8cadf25a05077" title="SemanticChunker: semantic chunking (not usable in langchain 1.0; semchunk is an alternative)"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">SemanticChunker: semantic chunking (not usable in langchain 1.0; semchunk is an alternative)</span></span></h3><div class="notion-text notion-block-3152b727a3a3804cb836fc56350caf21">Semantic chunking is a more advanced text-splitting method in LangChain. Instead of splitting by characters or fixed sizes, it splits according to the semantic structure of the text, so that each chunk stays semantically coherent. This improves applications such as retrieval-augmented generation (RAG).</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a3806da7d9f551c6d19f3c" data-id="3152b727a3a3806da7d9f551c6d19f3c"><span><div id="3152b727a3a3806da7d9f551c6d19f3c" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a3806da7d9f551c6d19f3c" title="1. Late Chunking: a widely discussed technique in 2025"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">1. 
Late Chunking: a widely discussed technique in 2025</span></span></h4><div class="notion-text notion-block-3152b727a3a380098d95ef40f324dc9e">This is a widely recommended alternative, popularized by Jina AI among others.</div><div class="notion-text notion-block-3152b727a3a38038a951ded1049e69fb">A conventional splitter first defines the chunk boundaries; the embedding model then produces the vectors for the spans those boundaries mark out.</div><div class="notion-text notion-block-3152b727a3a380f797c9ca01c9953510">Traditional approach: split first, then embed each chunk separately. Each chunk is embedded without the rest of the document, so its context is lost (context loss).</div><div class="notion-text notion-block-3152b727a3a380f780ece28fb87e40ad">Late chunking approach:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a38037861cfd8fd880bd90"><li>Use RecursiveCharacterTextSplitter to determine where the chunk boundaries fall in the text.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380128b8efa3b7f498f0e"><li>Feed the whole document into an embedding model that supports long context (text-embedding-v3 or Qwen3-Embedding; outside China, JinaEmbeddings).</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380b89106c01564ac5122"><li>The model produces a vector for every token, and each of these vectors carries document-wide context.</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a380e1bb51c05a3472d235"><li>Only then apply the chunk boundaries: the token vectors inside each chunk are pooled (typically mean-pooled) into one chunk vector, so every chunk vector has absorbed information from the surrounding text.</li></ul><div class="notion-text notion-block-3152b727a3a3806e863fdf06da1841ef">Advantage: even the vector of a very short sentence carries the background of the whole document, which can substantially improve retrieval accuracy.</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a38040b6f9c09375825a70" data-id="3152b727a3a38040b6f9c09375825a70"><span><div id="3152b727a3a38040b6f9c09375825a70" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a38040b6f9c09375825a70" title="2. 
Contextual Retrieval (contextual chunking)"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">2. Contextual Retrieval (contextual chunking)</span></span></h4><div class="notion-text notion-block-3152b727a3a380b898cbdff1fc212ff0">Introduced by Anthropic (the company behind Claude) in late 2024 and widely adopted in 2025.</div><div class="notion-text notion-block-3152b727a3a380e19d72e98a2b197ec4">How it works: after splitting and before embedding, an LLM writes a short background note (contextual header) for each chunk, and the note is prepended to the chunk.</div><div class="notion-text notion-block-3152b727a3a38066bce7d0e0ce38d3ec">Example:</div><ul class="notion-list notion-list-disc notion-block-3152b727a3a380bb8197ee2f9986368a"><li>Original chunk: &quot;The company&#x27;s revenue grew by 20%.&quot; (Read in isolation, this chunk does not say which company or which year.)</li></ul><ul class="notion-list notion-list-disc notion-block-3152b727a3a38086bc36effe7d0be7de"><li>Contextualized chunk: &quot;[Context: this describes Apple&#x27;s earnings report for the third quarter of 2024] The company&#x27;s revenue grew by 20%.&quot;</li></ul><div class="notion-text notion-block-3152b727a3a380b887c6fad890750440">Advantage: it largely resolves the loss of meaning in isolated chunks. Anthropic reported that contextual embeddings reduced the retrieval failure rate (measured on the top 20 retrieved chunks) by 35%, and by 49% when combined with contextual BM25.</div><h4 class="notion-h notion-h3 notion-h-indent-1 notion-block-3152b727a3a38088a7c0e6cf5c9d6028" data-id="3152b727a3a38088a7c0e6cf5c9d6028"><span><div id="3152b727a3a38088a7c0e6cf5c9d6028" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3152b727a3a38088a7c0e6cf5c9d6028" title="3. 
Agentic Chunking"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">3. Agentic Chunking</span></span></h4><div class="notion-text notion-block-3152b727a3a38023a8b7d367704260db">An AI agent (for example GPT-4o or Claude 3.5) reads the text the way a person would and decides where to cut.</div><div class="notion-text notion-block-3152b727a3a380eea97cc236f86a48b4">Logic:</div><div class="notion-text notion-block-3152b727a3a38098bab8dcbfe0ea0eaf">The agent scans the full text.</div><div class="notion-text notion-block-3152b727a3a38047bc0acd63551a2c61">It decides: &quot;The topic changes here, from product features to after-sales policy, so I should cut here.&quot;</div><div class="notion-text notion-block-3152b727a3a3803fb782dbcac1faa8c5">It can respect complex boundaries (for example, never cutting in the middle of a table or a code block).</div><div class="notion-text notion-block-3152b727a3a3805ba4e6d93d36382563">Advantage: among the methods covered here it is potentially the most precise, because the boundaries follow the logic of the content rather than a character count.</div><div class="notion-text notion-block-3152b727a3a380cda7ace99b91dece20">Disadvantage: very expensive, since every document requires LLM calls, so it is usually reserved for building very high-quality, flagship knowledge bases.</div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Document Embedding Models]]></title>
            <link>https://wangguanxi.space/article/3152b727-a3a3-80ee-aa3d-d4d0be17f6cc</link>
            <guid>https://wangguanxi.space/article/3152b727-a3a3-80ee-aa3d-d4d0be17f6cc</guid>
            <pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<div id="notion-article" class="mx-auto overflow-hidden "><main class="notion light-mode notion-page notion-block-3152b727a3a380eeaa3dd4d0be17f6cc"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-file notion-block-3152b727a3a380a78b8acb6972c1807d"><a target="_blank" rel="noopener noreferrer" class="notion-file-link" href="https://file.notion.so/f/f/ef449f46-5721-421a-999e-c9b32b5a798d/c5ca65f8-6424-4fdf-94f1-f83925c1b247/04-%E6%96%87%E6%A1%A3%E5%B5%8C%E5%85%A5%E6%A8%A1%E5%9E%8B.ipynb?table=block&amp;id=3152b727-a3a3-80a7-8b8a-cb6972c1807d&amp;spaceId=ef449f46-5721-421a-999e-c9b32b5a798d&amp;expirationTimestamp=1772424000000&amp;signature=4mEqv0_bsIA6hNtRZG0c4t-qRTrPNVLXUUMsPS6KSGs"><svg class="notion-file-icon" viewBox="0 0 30 30"><path d="M22,8v12c0,3.866-3.134,7-7,7s-7-3.134-7-7V8c0-2.762,2.238-5,5-5s5,2.238,5,5v12c0,1.657-1.343,3-3,3s-3-1.343-3-3V8h-2v12c0,2.762,2.238,5,5,5s5-2.238,5-5V8c0-3.866-3.134-7-7-7S6,4.134,6,8v12c0,4.971,4.029,9,9,9s9-4.029,9-9V8H22z"></path></svg><div class="notion-file-info"><div class="notion-file-title">04-文档嵌入模型.ipynb</div><div class="notion-file-size">221.6 KiB</div></div></a></div></main></div>]]></content:encoded>
        </item>
    </channel>
</rss>