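# How Claude Code Talks to the Linux Shell

How Claude Code interacts with the Linux shell can be understood at two levels: the **architecture** and the **interaction details**. What follows describes a typical implementation approach (products of this kind all work in broadly the same way); it is not the source code of any closed-source system.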

---

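## 1. Overall architecture: the LLM does not "type commands" itself

Claude (the LLM itself) never opens a bash process and types commands into it. The structure usually has three layers:

1. **LLM (Claude)**  
   - Through ReAct-style reasoning it decides which command to run, for example:
     - `npm test`
     - `pytest tests/`
     - `ls -la`
     - `grep -R "apiKey" .`
   - It cannot call `execve` directly; instead it outputs a **tool-call intent**.

2. **Tool layer (Tool / Function Calling)**  
   - The product exposes an abstract tool to the LLM, for example:
     - `run_shell_command(command: string, timeout_sec: int, workdir: string) -> { stdout, stderr, exit_code }`
   - The LLM "calls" this tool inside the conversation (via structured JSON, a function_call, or a similar protocol).
   - The call arguments are nothing more than a command string plus constraints.

3. **Sandboxed execution layer (a real Linux environment)**  
   - The host side maintains a **controlled Linux environment**, such as:
     - a container or micro-VM (Docker / Firecracker / gVisor)
     - chroot / namespace isolation
   - This layer is what actually executes the command:  
     - it starts a shell (typically `/bin/bash -lc '...'` or `/bin/sh -c '...'`)
     - binds it to a specific working directory (the project root)
     - captures stdout, stderr, and the exit code
   - The result goes back to the tool layer and is then fed to the LLM as the next round's "Observation".

You can think of the process like this:

> The LLM only "writes commands" and "reads the text those commands produce";  
> what actually issues syscalls to the Linux kernel is the controlled shell on the host side.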
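
The split between the tool layer and the execution layer can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not any product's actual implementation: `run_shell_command` follows the hypothetical signature shown above, `dispatch` is an invented name, and the command runs directly on the host with no isolation.

```python
import subprocess

def run_shell_command(command: str, timeout_sec: int, workdir: str) -> dict:
    """Execution layer: run one command string in a shell and capture the result."""
    completed = subprocess.run(
        ["/bin/sh", "-c", command],  # the shell process, not the LLM, talks to the kernel
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout_sec,
    )
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }

def dispatch(tool_call: dict) -> dict:
    """Tool layer: map a structured tool-call intent onto the execution layer."""
    if tool_call["tool"] != "run_shell_command":
        raise ValueError(f"unknown tool: {tool_call['tool']}")
    return run_shell_command(**tool_call["arguments"])
```

The LLM only ever produces the dictionary passed to `dispatch` and reads the dictionary that comes back.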

---

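## 2. Interaction flow: one command, from idea to execution

Suppose you say this in Claude Code:

> Run the unit tests for me and see what fails.

A typical flow looks like this:

1. **Perceive the current project state**  
   - The LLM may first call tools such as "list files" or "read file" to inspect the project layout:
     - it finds a `package.json` containing a script like `"test": "vitest"`
   - It infers:  
     > I should run `npm test`.

2. **Generate the shell call intent**  
   - Writing `npm test` in its output does not, by itself, cause anything to run.  
   - Instead, it emits a structured tool call, for example (illustrative):

   ```json
   {
     "tool": "run_shell_command",
     "arguments": {
       "command": "npm test",
       "timeout_sec": 300,
       "workdir": "/mnt/workspace/project"
     }
   }
   ```

3. **The host system actually runs the command**  
   - The product backend receives this "function call" and does something like the following in the sandbox:

   ```bash
   cd /mnt/workspace/project
   /bin/bash -lc "npm test"
   ```

   - It enforces:
     - a maximum execution time (kill on timeout)
     - resource limits (CPU, memory)
     - no network, or restricted outbound access
   - It captures:
     - the full `stdout` text
     - the full `stderr` text
     - the `exit code` (0 / non-zero)

4. **The result is returned to the LLM**  
   - When execution finishes, the tool returns a result, for example:

   ```json
   {
     "stdout": "Test suite failed...\n  Error: Cannot find module './config'\n ...",
     "stderr": "",
     "exit_code": 1
   }
   ```

   - This result is spliced into the next round's prompt, usually in a form like the following (a system-level part the user does not see):

   > Tool `run_shell_command` finished:  
   > command: `npm test`  
   > exit_code: `1`  
   > stdout:  
   > ```
   > Test suite failed...
   >   Error: Cannot find module './config'
   >   at ...
   > ```
   > stderr: *(empty)*

5. **Continue the ReAct loop based on the output**  
   - After reading this text, the LLM reasons again:
     - it sees that a file is missing or a path is wrong
     - it reads the relevant files, edits them, and saves
     - it calls `run_shell_command("npm test")` again to verify
   - This forms a complete "observe → think → act → observe again" loop.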
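
Step 4 above, turning a tool result into text the model can read, might look like the sketch below. The function name `format_observation` and the exact layout are assumptions made for illustration; real products choose their own format.

```python
def format_observation(command: str, result: dict) -> str:
    """Render a tool result as the text block the model reads on its next turn."""
    stdout = result["stdout"] or "(empty)"
    stderr = result["stderr"] or "(empty)"
    return (
        "Tool `run_shell_command` finished:\n"
        f"command: {command}\n"
        f"exit_code: {result['exit_code']}\n"
        f"stdout:\n{stdout}\n"
        f"stderr:\n{stderr}"
    )
```

The returned string is appended to the conversation before the next model call, which is what closes the observe/think/act loop.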

---

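## 3. Typical constraints and safety policies for shell interaction

Compared with using a terminal yourself, Claude Code usually runs shell commands under many more restrictions.

### 3.1 Where commands can run

- **Fixed working directory**  
  - Commands can usually run only under a preset project root.
  - `/` and system-level directories are either inaccessible or read-only.

- **Controlled environment variables**  
  - Only project-related variables are exposed (for example `NODE_ENV=test`).
  - Real system credentials (API keys, cloud accounts) are not passed in.

- **No or restricted network access**  
  - For safety and reproducibility, access to the public internet is mostly disabled.
  - Even if the model tries to run `curl http://example.com`, the request fails or is intercepted.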
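
The "controlled environment variables" point can be illustrated with an allowlist passed to `subprocess.run`. This is a sketch under assumptions: `ALLOWED_ENV_VARS`, `build_sandbox_env`, and `run_with_clean_env` are hypothetical names, and a real sandbox would combine this with filesystem and network isolation.

```python
import os
import subprocess
from typing import Dict, Optional

# Hypothetical allowlist: the only host variables the command may inherit.
ALLOWED_ENV_VARS = ("PATH", "HOME", "LANG")

def build_sandbox_env(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a minimal environment: allowlisted host variables plus project ones."""
    env = {k: os.environ[k] for k in ALLOWED_ENV_VARS if k in os.environ}
    env.update(extra or {})
    return env

def run_with_clean_env(command: str, workdir: str) -> str:
    """Run a command that sees only the sandbox environment, not the host's."""
    completed = subprocess.run(
        ["/bin/sh", "-c", command],
        cwd=workdir,
        env=build_sandbox_env({"NODE_ENV": "test"}),
        capture_output=True,
        text=True,
        timeout=30,
    )
    return completed.stdout
```

Because `env=` replaces the inherited environment rather than extending it, any credential exported in the host process is simply absent from the child.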

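### 3.2 Filtering command content

Simple policies can be added at the tool layer or the execution layer:

- A blacklist:
  - `rm -rf /`
  - `shutdown`
  - direct operations on `/etc`, `/home`, `/var`, and so on
- Detection of obviously dangerous patterns:
  - fork bombs such as `:(){ :|:& };:`
  - unbounded background processes
- Restrictions on `sudo` / `su` and similar commands (root privileges are usually not provided at all).

Obviously dangerous commands can be handled in a few ways:

- refuse to execute them and return an error explanation; or
- remind the LLM at the prompt level that such commands must not be used.

### 3.3 Resource and time limits

- **Time**:  
  - Every command has a hard timeout (for example 30s or 120s) and is killed when it expires.
- **CPU / memory**:  
  - cgroups or container quotas cap resource usage so a command cannot consume it without bound.
- **Output length**:  
  - Long logs are cut down to their head and tail so that hundreds of MB of log text are not pushed back into the prompt.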
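
As a lighter-weight stand-in for cgroups or container quotas, per-process limits can be set with Python's `resource` module before the shell starts. This is a sketch, assuming Linux; `run_limited` and the specific limit values are hypothetical, and rlimits are not equivalent to the cgroup quotas described above.

```python
import resource
import subprocess

def _apply_limits(cpu_seconds: int, memory_bytes: int) -> None:
    """Runs in the child process just before exec (POSIX only)."""
    for res, wanted in ((resource.RLIMIT_CPU, cpu_seconds),
                        (resource.RLIMIT_AS, memory_bytes)):
        _, hard = resource.getrlimit(res)
        # Never ask for more than the hard limit the parent already has.
        value = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(res, (value, value))

def run_limited(command: str, workdir: str, timeout_sec: int = 30) -> dict:
    """Run a command with a wall-clock timeout plus CPU-time and address-space caps."""
    try:
        completed = subprocess.run(
            ["/bin/sh", "-c", command],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_sec,  # wall-clock limit enforced by the parent
            preexec_fn=lambda: _apply_limits(10, 1 << 30),  # 10 s CPU, 1 GiB memory
        )
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "[timeout]", "exit_code": -2}
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }
```

The timeout guards against commands that hang while idle, and the CPU limit guards against busy loops; the two are complementary.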

---

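## 4. How this differs from an "interactive shell session"

Many people assume Claude Code opens an interactive bash session internally and types commands one by one the way a person would. In practice it is usually like this:

- Each call is an **independent command invocation**:
  - The backend may run `bash -lc "<command>"` every time and let the shell exit when the command finishes.
  - It does not keep complex interactive state (such as shell history or aliases).
- State is carried by **the file system and process results**:
  - Compiled binaries and generated log files persist in the workspace, so a later call can simply read them.

A product can of course implement a "long-lived shell session" (by keeping a pty/session open), but then it has to:

- handle interactive programs (`top`, `vim`, the `python` REPL, and so on);
- handle the "program is waiting for input" state;
- implement more complex timeout and interrupt mechanisms.

In engineering practice, for simplicity and robustness, most implementations lean toward **stateless command execution** plus a persistent file system.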
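
The difference between shell state and file-system state is easy to see in a small experiment. The sketch below uses a hypothetical `run` helper and a temporary directory as the workspace: a `cd` in one call does not carry over to the next, while a file written in one call is visible in the next.

```python
import os
import subprocess
import tempfile

def run(command: str, workdir: str) -> str:
    """Start a fresh shell for every call; nothing in the shell process survives."""
    completed = subprocess.run(
        ["/bin/sh", "-c", command],
        cwd=workdir, capture_output=True, text=True, timeout=30,
    )
    return completed.stdout.strip()

workspace = os.path.realpath(tempfile.mkdtemp())

# Shell state (current directory, variables) is lost between calls...
inside = run("mkdir -p build && cd build && pwd", workspace)
after = run("pwd", workspace)

# ...but anything written to the workspace persists.
run("echo artifact > build/out.txt", workspace)
content = run("cat build/out.txt", workspace)

print(inside)   # path of the build/ subdirectory
print(after)    # the workspace root again
print(content)  # the text written by the previous call
```

This is why a model working over stateless calls tends to write commands like `cd subdir && make` as a single string instead of two separate calls.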

---

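## 5. From the LLM's point of view: the shell is a "black-box function"

In Claude's "cognitive model", the Linux shell is just a tool with a signature:

```text
run_shell_command(command) -> { stdout, stderr, exit_code }
```

- All it knows is:  
  - give the tool a string, and it returns a few strings (and an exit code).
- It has no direct view of:
  - low-level details such as the Linux kernel, syscalls, TTYs, or process groups.
- From large amounts of data it has simply learned:
  - which commands are likely to work (`npm test`, `pytest`, `ls`, `grep`)
  - how to interpret error messages in stdout/stderr
  - how to change the code or the command and retry after a failure.

This is why Claude Code can be summarized as:

> "A ReAct agent that interacts with the Linux shell in an RPC-like manner through tool calls."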

---

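## 6. If you want to build a simplified version yourself

If you are interested in building an "LLM + shell" toy of your own, this minimal architecture is enough:

1. Backend (Python sketch):

   ```python
   import subprocess

   def run_shell_command(command: str, cwd: str) -> dict:
       # Note: production use needs safety filtering and resource limits
       completed = subprocess.run(
           ["/bin/bash", "-lc", command],
           cwd=cwd,
           capture_output=True,
           text=True,
           timeout=60,  # seconds
       )
       return {
           "stdout": completed.stdout[-8000:],  # keep only the last 8000 characters
           "stderr": completed.stderr[-8000:],
           "exit_code": completed.returncode,
       }
   ```

2. Register a `run_shell_command` tool in the LLM's tool list;
3. Whenever the LLM triggers this tool call, execute it with the function above and splice the result back into the conversation.

Below is a minimal demo design that can actually run. The idea is:

- Use an LLM that supports tool calling / function calling (any vendor will do)
- Expose a `run_shell` tool to the model
- On the backend, implement:
  - command filtering (a blacklist and simple rules)
  - resource limits (timeout, output truncation)
  - a simple ReAct loop

It is covered in these parts:

1. Overall architecture
2. Tool-calling protocol (LLM side)
3. Shell execution and safety policies (backend)
4. A minimal Python demo (easy to adapt to your own key/LLM)
5. How to make it more "Claude Code-like"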

---

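## 1. Architecture overview

The goal is to support a conversation like this:

> User:  
> List all files in the current directory and create a `hello.txt`.

Internally, the LLM reasons and acts roughly as follows:

1. Call the tool: `run_shell("ls")`
2. Read the output → decide on the command: `echo 'hello' > hello.txt`
3. Call the tool again: `run_shell("echo 'hello' > hello.txt")`
4. Return a natural-language summary to the user:

> I have created hello.txt in the current directory with the content "hello".

You need three pieces:

- Frontend: only displays the conversation (a command line is fine)
- Middle layer: repeatedly hands the user input plus tool results to the LLM for reasoning
- Tool layer: implements `run_shell` (the part that really interacts with Linux)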
---

## 2. Designing the LLM tool-calling protocol

Abstract the tool into a single signature:

```ts
run_shell(command: string) -> {
  stdout: string
  stderr: string
  exit_code: number
}
```

With an LLM that supports function calling, the tool is usually defined with JSON Schema, for example (an illustrative format, close to what OpenAI and Anthropic use):

```jsonc
{
  "name": "run_shell",
  "description": "Execute a bash command in a restricted project workspace",
  "parameters": {
    "type": "object",
    "properties": {
      "command": {
        "type": "string",
        "description": "The shell command to run. Do not include surrounding quotes. Assume bash -lc is used."
      }
    },
    "required": ["command"]
  }
}
```

When it makes a call, the LLM outputs something like:

```json
{
  "tool": "run_shell",
  "arguments": {
    "command": "ls -la"
  }
}
```

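Your backend has to do three things:

1. Parse this JSON
2. Run the command and collect `stdout/stderr/exit_code`
3. Feed the result back to the LLM as a "tool result message", for example: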

```json
{
  "role": "tool",
  "tool_name": "run_shell",
  "tool_call_id": "xxx",
  "content": {
    "stdout": "total 8\n-rw-r--r-- main.py\n",
    "stderr": "",
    "exit_code": 0
  }
}
```

Then let the LLM continue the conversation.
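
The three backend steps can be sketched as a single function. The name `handle_tool_call`, the `tools` registry, and the error format are assumptions for illustration; the message shape follows the illustrative JSON above, not any vendor's exact wire format.

```python
import json
from typing import Callable, Dict

def handle_tool_call(raw: str, tools: Dict[str, Callable[..., dict]], call_id: str) -> dict:
    """Parse a tool-call JSON string, run the tool, and build the tool-result message."""
    def error(msg: str) -> dict:
        return {"stdout": "", "stderr": msg, "exit_code": -1}

    name = "unknown"
    try:
        call = json.loads(raw)           # step 1: parse
        name = call["tool"]
        arguments = call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        content = error(f"malformed tool call: {exc!r}")
    else:
        if name not in tools:
            content = error(f"unknown tool: {name}")
        else:
            content = tools[name](**arguments)  # step 2: run
    # step 3: wrap the result as a message for the LLM
    return {"role": "tool", "tool_name": name, "tool_call_id": call_id, "content": content}
```

Malformed input is reported back as a normal tool result, so the model can see the problem and correct its next call.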

---

## 3. Shell execution and a simple safety policy

### 3.1 Execution logic (Linux shell)

Run the command with `bash -lc` and capture its output:

```python
import subprocess

def run_shell_raw(command: str, cwd: str, timeout_sec: int = 30):
    completed = subprocess.run(
        ["/bin/bash", "-lc", command],
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout_sec
    )
    return completed.stdout, completed.stderr, completed.returncode
```

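### 3.2 Command filtering policy (minimum viable)

At a minimum you should have:

- A blacklist: forbid extremely dangerous keywords and patterns
- Path restrictions: do not allow touching `/`, `/etc`, `/var`, and so on
- Control over destructive commands such as `rm`

A minimal version (not guaranteed to be fully safe, but much better than running commands unchecked):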

```python
import shlex

FORBIDDEN_SUBSTRINGS = [
    "rm -rf /",
    ":(){ :|:& };:",        # fork bomb
    "mkfs", "fdisk", "mount", "umount",
    "shutdown", "reboot", "halt",
    "sudo ", "su ",
]

FORBIDDEN_PATH_PREFIXES = [
    "/etc", "/bin", "/sbin", "/usr", "/var", "/lib", "/root", "/home"
]

DANGEROUS_COMMANDS = ["rm", "mv", "chmod", "chown", "dd", "truncate"]

def is_command_safe(command: str) -> tuple[bool, str | None]:
    cmd_lower = command.lower()

    # 1) Simple substring blacklist
    for bad in FORBIDDEN_SUBSTRINGS:
        if bad in cmd_lower:
            return False, f"Command contains forbidden pattern: {bad}"

    # 2) Tokenize so the first token (the command name) can be checked
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Parsing failed: reject outright
        return False, "Failed to parse command."

    if not tokens:
        return False, "Empty command."

    base_cmd = tokens[0]

    # 3) For dangerous commands, check the arguments for forbidden absolute paths
    if base_cmd in DANGEROUS_COMMANDS:
        for t in tokens[1:]:
            # Do not allow operating directly on / or /etc and similar
            for prefix in FORBIDDEN_PATH_PREFIXES:
                if t.startswith(prefix):
                    return False, f"Forbidden path in command: {t}"
            if t == "/" or t == "/*":
                return False, "Refusing to operate on root directory."

    # 4) Other simple rules, e.g. forbid background processes
    #    (this only catches a standalone "&" token)
    if "&" in tokens:
        return False, "Background processes are not allowed."

    return True, None
```

### 3.3 Output truncation and time limits

- Time: `timeout=30` or `timeout=60` seconds
- Output: truncate to roughly 8000 characters so the prompt does not blow up

```python
MAX_OUTPUT_CHARS = 8000

def truncate_output(s: str) -> str:
    if len(s) <= MAX_OUTPUT_CHARS:
        return s
    head = s[:4000]
    tail = s[-4000:]
    return head + "\n...[TRUNCATED]...\n" + tail
```

---

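## 4. A minimal Python demo (command-line version)

The code below is a **self-contained** demo skeleton:

- It uses a placeholder `call_llm` function (replace it with your own OpenAI/Anthropic SDK call)
- It supports:
  - ordinary conversation
  - the model calling the `run_shell` tool
  - backend execution and injecting the result back into the conversation

### 4.1 Conversation and ReAct loop skeleton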

```python
import json
import os
import subprocess
import textwrap
import shlex
from typing import List, Dict, Any, Tuple

# ========== Configuration ==========
WORKSPACE_DIR = os.path.abspath("./workspace")  # the "project root" the model sees
os.makedirs(WORKSPACE_DIR, exist_ok=True)

MAX_OUTPUT_CHARS = 8000
SHELL_TIMEOUT_SEC = 30

# Replace this with your own function that actually calls an LLM
def call_llm(messages: List[Dict[str, Any]], tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Here you need to:
      - call a real LLM API (OpenAI / Anthropic / local)
      - enable tool calling / function calling
      - return a dict in one of the shapes that main() handles:
        - a normal reply: {"type": "assistant_final", "content": "..."}
        - a tool-call intent: {"type": "tool_call", "tool_name": "run_shell", "tool_args": {...}}
    For the demo, this stub only documents the expected structure.
    """
    raise NotImplementedError("Implement call_llm with your own LLM SDK")

# ========== Safety policy: command filtering ==========

FORBIDDEN_SUBSTRINGS = [
    "rm -rf /",
    ":(){ :|:& };:",
    "mkfs", "fdisk", "mount ", "umount ",
    "shutdown", "reboot", "halt",
    "sudo ", "su ",
]

FORBIDDEN_PATH_PREFIXES = [
    "/etc", "/bin", "/sbin", "/usr", "/var", "/lib", "/root", "/home"
]

DANGEROUS_COMMANDS = ["rm", "mv", "chmod", "chown", "dd", "truncate"]

def is_command_safe(command: str) -> Tuple[bool, str | None]:
    cmd_lower = command.lower()

    for bad in FORBIDDEN_SUBSTRINGS:
        if bad in cmd_lower:
            return False, f"Command contains forbidden pattern: {bad}"

    try:
        tokens = shlex.split(command)
    except ValueError:
        return False, "Failed to parse command."

    if not tokens:
        return False, "Empty command."

    base_cmd = tokens[0]

    if base_cmd in DANGEROUS_COMMANDS:
        for t in tokens[1:]:
            for prefix in FORBIDDEN_PATH_PREFIXES:
                if t.startswith(prefix):
                    return False, f"Forbidden path in command: {t}"
            if t in ("/", "/*"):
                return False, "Refusing to operate on root directory."

    if "&" in tokens:
        return False, "Background processes are not allowed."

    return True, None

def truncate_output(s: str) -> str:
    if len(s) <= MAX_OUTPUT_CHARS:
        return s
    head = s[:4000]
    tail = s[-4000:]
    return head + "\n...[TRUNCATED]...\n" + tail

# ========== Tool implementation: run_shell ==========

def run_shell_command(command: str) -> Dict[str, Any]:
    safe, reason = is_command_safe(command)
    if not safe:
        return {
            "stdout": "",
            "stderr": f"[blocked by policy] {reason}",
            "exit_code": -1,
        }

    try:
        completed = subprocess.run(
            ["/bin/bash", "-lc", command],
            cwd=WORKSPACE_DIR,
            capture_output=True,
            text=True,
            timeout=SHELL_TIMEOUT_SEC,
        )
        stdout = truncate_output(completed.stdout)
        stderr = truncate_output(completed.stderr)
        return {
            "stdout": stdout,
            "stderr": stderr,
            "exit_code": completed.returncode,
        }
    except subprocess.TimeoutExpired:
        return {
            "stdout": "",
            "stderr": f"[timeout] Command exceeded {SHELL_TIMEOUT_SEC}s limit.",
            "exit_code": -2,
        }

# ========== Tool description: the schema given to the LLM ==========

TOOLS = [
    {
        "name": "run_shell",
        "description": "Execute a bash command in the project workspace. Use it to run tests, list files, compile, etc.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "Shell command to run. Assume bash -lc is used. Use relative paths within the workspace.",
                }
            },
            "required": ["command"],
        },
    }
]

# ========== Main loop: handle user input and tool calls ==========

def main():
    print(f"Workspace directory: {WORKSPACE_DIR}")
    print("Type 'exit' to quit.\n")

    messages: List[Dict[str, Any]] = [
        {
            "role": "system",
            "content": textwrap.dedent(
                """
                You are a coding assistant with access to a restricted shell tool `run_shell`.
                - The project workspace root is at a fixed directory. Only use relative paths.
                - Before running build/tests, inspect the project (list files, read config, etc.).
                - Prefer safe read-only commands. Use destructive commands (rm/mv) only if absolutely needed.
                - After using tools, explain to the user what you did and the results.
                """
            ).strip(),
        }
    ]

    while True:
        user_input = input("User: ").strip()
        if user_input.lower() in {"exit", "quit"}:
            break

        messages.append({"role": "user", "content": user_input})

        # A turn may need several "tool call -> result -> call again" rounds;
        # this simple version allows at most N tool calls per turn.
        tool_steps_remaining = 5

        while True:
            # 1) Call the LLM
            response = call_llm(messages, TOOLS)

            # You need to design call_llm's return structure. Three cases are assumed here:
            # - {"type": "assistant", "content": "..."} a normal answer
            # - {"type": "tool_call", "tool_name": "...", "tool_args": {...}}
            # - {"type": "assistant_final", "content": "..."} the final answer for this turn

            kind = response.get("type")

            if kind == "assistant":
                # Intermediate explanation / early remarks (optional)
                content = response["content"]
                print(f"Assistant: {content}")
                messages.append({"role": "assistant", "content": content})
                break  # end this turn

            elif kind == "assistant_final":
                content = response["content"]
                print(f"Assistant: {content}")
                messages.append({"role": "assistant", "content": content})
                break  # this turn is finished

            elif kind == "tool_call":
                if tool_steps_remaining <= 0:
                    # Tool-call limit exceeded: force the turn to end
                    print("[system] Tool call limit reached for this turn.")
                    messages.append({
                        "role": "assistant",
                        "content": "I have reached the maximum number of tool calls for this turn."
                    })
                    break

                tool_steps_remaining -= 1
                tool_name = response["tool_name"]
                tool_args = response.get("tool_args", {})

                if tool_name == "run_shell":
                    command = tool_args.get("command", "")
                    print(f"[tool] run_shell: {command}")
                    result = run_shell_command(command)

                    # Print some output to make debugging in the terminal easier
                    print("  [stdout]:")
                    print(textwrap.indent(result["stdout"], "    "))
                    print("  [stderr]:")
                    print(textwrap.indent(result["stderr"], "    "))
                    print(f"  [exit_code]: {result['exit_code']}")

                    # Inject the tool result into messages
                    messages.append({
                        "role": "tool",
                        "tool_name": "run_shell",
                        "content": json.dumps(result, ensure_ascii=False),
                    })
                    # Then loop again so the LLM can reason with the tool result as its observation

                else:
                    print(f"[system] Unknown tool: {tool_name}")
                    messages.append({
                        "role": "assistant",
                        "content": f"I tried to use an unknown tool: {tool_name}."
                    })
                    break

            else:
                print(f"[system] Unknown response type: {kind}")
                break

if __name__ == "__main__":
    main()
```

### 4.2 How to implement `call_llm` (illustrative)

Here is a rough example in the OpenAI style (adapt it to the real SDK yourself):

```python
import json

from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY")

def call_llm(messages, tools):
    # Convert the internal messages into the format OpenAI expects.
    # Simplified sketch: the real Chat Completions API expects each "tool"
    # message to carry the tool_call_id of a preceding assistant tool call,
    # so track those ids before running this against the live API.
    openai_messages = []
    for m in messages:
        role = m["role"]
        if role == "tool":
            # Treat it as the tool result that follows an assistant tool call
            openai_messages.append({
                "role": "tool",
                "content": m["content"],
                "name": m.get("tool_name", "unknown_tool")
            })
        else:
            openai_messages.append({"role": role, "content": m["content"]})

    # Pass the tools to the model and enable tool_choice="auto"
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=openai_messages,
        tools=[
            {
                "type": "function",
                "function": t,
            }
            for t in tools
        ],
        tool_choice="auto",
    )

    choice = resp.choices[0]
    msg = choice.message

    # If the model wants to call a tool
    if msg.tool_calls:
        # Only the first tool_call is handled here; a real implementation should support several
        tool_call = msg.tool_calls[0]
        fn = tool_call.function
        tool_name = fn.name
        args = json.loads(fn.arguments)
        return {
            "type": "tool_call",
            "tool_name": tool_name,
            "tool_args": args,
        }

    # Otherwise it is a normal reply
    content = msg.content
    # You could distinguish "intermediate" from "final" replies; here everything is treated as final
    return {
        "type": "assistant_final",
        "content": content,
    }
```

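If you use Anthropic or another provider, the same framework works as long as you can:

- register tools (functions)
- receive the tool-call intent sent by the model
- push the tool result back in as a message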
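
For Anthropic, the main differences are in message shapes, which can be handled with two small conversion helpers. This is a sketch that assumes the Messages API tool format (`input_schema` for tool definitions, `tool_result` content blocks answering a `tool_use` block); the helper names are hypothetical and the current API documentation should be checked before relying on it.

```python
import json

def to_anthropic_tool(tool: dict) -> dict:
    """Convert the generic schema used in this demo into an Anthropic tool definition."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],  # same JSON Schema, different key
    }

def to_anthropic_tool_result(tool_use_id: str, result: dict) -> dict:
    """Wrap a shell result as the user-role message that answers a tool_use block."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,  # id taken from the model's tool_use block
                "content": json.dumps(result, ensure_ascii=False),
            }
        ],
    }
```

Neither helper calls the network, so they can be unit-tested separately from the SDK call that would use them.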

---

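## 5. How to evolve toward "Claude Code"

On top of this minimal demo, you can add things step by step:

1. **Add a few more tools**  
   - `read_file(path)` / `write_file(path, content)`  
   - `list_dir(path)`  
   With these the model can read and write project files as well as run shell commands, which makes it a complete code agent.

2. **A more thorough safety policy**  
   - Isolate the workspace for real with containers / namespaces
   - Strictly forbid access to system directories and mount only a temporary directory
   - Keep an audit log of commands

3. **Richer ReAct control**  
   - Limit "at most N tool calls per turn / at most M per session"
   - State explicitly in the system prompt: "read the project first, then act, then verify".

4. **IDE integration**  
   - Replace `workspace` with the user's project directory (local agent)
   - Or sync file contents through an API (remote).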
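
The extra file tools from item 1 can be sketched as follows. The names match the list above, but the path check (`resolve_in_workspace`) is an assumption added here to keep the tools inside the workspace; it is not a substitute for the container-level isolation in item 2.

```python
import os

WORKSPACE_DIR = os.path.realpath("./workspace")

def resolve_in_workspace(path: str) -> str:
    """Resolve a path relative to the workspace and refuse anything that escapes it."""
    full = os.path.realpath(os.path.join(WORKSPACE_DIR, path))
    if os.path.commonpath([WORKSPACE_DIR, full]) != WORKSPACE_DIR:
        raise PermissionError(f"path escapes workspace: {path}")
    return full

def read_file(path: str) -> str:
    with open(resolve_in_workspace(path), encoding="utf-8") as f:
        return f.read()

def write_file(path: str, content: str) -> None:
    full = resolve_in_workspace(path)
    os.makedirs(os.path.dirname(full), exist_ok=True)  # create parent directories
    with open(full, "w", encoding="utf-8") as f:
        f.write(content)

def list_dir(path: str = ".") -> list:
    return sorted(os.listdir(resolve_in_workspace(path)))
```

Resolving with `realpath` before comparing means `../` segments, absolute paths, and symlinks that point outside the workspace are all rejected.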
