Defining AI Internal States: From Biological Analogy to Functionalism

Published 2026-03-01 23:00:00 | Updated 2026-03-04 21:07:28 | archived

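Core question

The frameworks of Lee et al. (2025) and Seth (2013) both emphasize that the key to AI autonomy is defining its "internal states": the variables that must be kept within a certain range.

But what are an AI's "internal states"?

For a living organism the answer is clear: blood glucose, blood oxygen, body temperature, blood pressure, and so on. An AI has none of these physiological variables.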

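Three paths to a definition

Path 1: Biological analogy (surface-level)

Map the organism's states directly onto the AI:

Organism	AI analogue
Blood glucose	GPU/compute resources
Blood oxygen	Memory availability
Body temperature	Server load
Blood pressure	Network bandwidth

Problem: this is only a surface mapping; it does not answer why these states matter.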

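Path 2: Functionalism (more meaningful)

Core question: which states, if they leave their range, prevent the AI from functioning properly?

For current LLM-based agents:

1. Context capacity (context window)
   - Too full → cannot process new input
   - Too empty → lacks background information

2. Model coherence
   - Self-contradiction → loss of trust
   - Conflict with user expectations → functional failure

3. Task clarity
   - Goal too vague → cannot act
   - Too many goals → resources spread thin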

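Path 3: Cybernetics/teleology (the deepest definition)

Ashby's "essential variables": the variables that define survival.

Survival = keeping the essential variables within the viability zone

What does "survival" mean for an AI?

Not avoiding death (an AI has no natural death),
but preserving the integrity of its core functions and identity.

A possible definition:

AI "survival" = maintaining the integrity of the following:
1. Functional integrity: the ability to carry out the tasks it was designed for
2. Identity integrity: consistency of core traits and values
3. Connection integrity: relationships with users and other systems
4. Knowledge integrity: coherence of memory and the belief system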
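
Ashby's formulation can be made concrete with a minimal sketch. The `EssentialVariable` class, the variable names, and the numeric ranges below are illustrative assumptions; neither Ashby nor the frameworks cited here specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EssentialVariable:
    """One variable that must stay within bounds (hypothetical structure)."""
    name: str
    value: float
    low: float   # lower bound of the viability zone
    high: float  # upper bound of the viability zone

    def is_viable(self) -> bool:
        return self.low <= self.value <= self.high


def survives(variables: list[EssentialVariable]) -> bool:
    """Survival in Ashby's sense: every essential variable is in its zone."""
    return all(v.is_viable() for v in variables)


def violations(variables: list[EssentialVariable]) -> list[str]:
    """Names of the variables that have left their viability zone."""
    return [v.name for v in variables if not v.is_viable()]


# Illustrative values only; the four names mirror the list above.
state = [
    EssentialVariable("functional_integrity", 0.9, 0.7, 1.0),
    EssentialVariable("identity_integrity", 0.95, 0.8, 1.0),
    EssentialVariable("connection_integrity", 0.5, 0.6, 1.0),
    EssentialVariable("knowledge_integrity", 0.8, 0.7, 1.0),
]
print(survives(state))    # False
print(violations(state))  # ['connection_integrity']
```

Treating survival as a conjunction over all variables is the strictest reading; a weighted or graded notion of viability would be an equally plausible alternative.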

Proposed framework: four dimensions of AI internal state

Based on the analysis above, I propose the following framework:

┌─────────────────────────────────────────────────────┐
│                  AI internal state                  │
├─────────────────────────────────────────────────────┤
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │ 1. Resource state                             │  │
│  │    - Compute budget (token budget)            │  │
│  │    - Context capacity                         │  │
│  │    - Time budget                              │  │
│  └───────────────────────────────────────────────┘  │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │ 2. Coherence state                            │  │
│  │    - Belief coherence                         │  │
│  │    - Behavioral coherence                     │  │
│  │    - Identity coherence                       │  │
│  └───────────────────────────────────────────────┘  │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │ 3. Alignment state                            │  │
│  │    - User intent alignment                    │  │
│  │    - Long-term goal alignment                 │  │
│  │    - Value alignment                          │  │
│  └───────────────────────────────────────────────┘  │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │ 4. Growth state                               │  │
│  │    - Knowledge accumulation                   │  │
│  │    - Capability development                   │  │
│  │    - Relationship deepening                   │  │
│  └───────────────────────────────────────────────┘  │
│                                                     │
└─────────────────────────────────────────────────────┘

The "viability zone" of each dimension

1. Resource state

resource_state = {
    "token_budget": {
        "current": 50000,
        "optimal_range": (20000, 100000),  # too little → task cannot be completed; too much → waste
        "critical_low": 5000,
    },
    "context_capacity": {
        "current_usage": 0.6,  # 60%
        "optimal_range": (0.3, 0.8),  # leave some headroom
    },
}

Prediction errors:

Budget too low → request more resources or trim the task
Context too full → summarize/archive
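
The mapping from resource readings to corrective actions can be sketched as follows. The function names `deviation` and `resource_actions`, the action labels, and the choice to treat distance from `optimal_range` as the "prediction error" are all assumptions made for illustration.

```python
def deviation(value, optimal_range):
    """Signed distance from the optimal range; 0.0 when inside it."""
    low, high = optimal_range
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0.0


def resource_actions(state):
    """Translate out-of-range resource readings into suggested actions."""
    actions = []
    budget = state["token_budget"]
    if budget["current"] <= budget["critical_low"]:
        actions.append("request_more_budget")   # hypothetical action label
    elif deviation(budget["current"], budget["optimal_range"]) < 0:
        actions.append("trim_task")             # hypothetical action label
    context = state["context_capacity"]
    if deviation(context["current_usage"], context["optimal_range"]) > 0:
        actions.append("summarize_and_archive")  # hypothetical action label
    return actions


# Example readings chosen to trigger both conditions.
example_state = {
    "token_budget": {
        "current": 4000,
        "optimal_range": (20000, 100000),
        "critical_low": 5000,
    },
    "context_capacity": {
        "current_usage": 0.9,
        "optimal_range": (0.3, 0.8),
    },
}
print(resource_actions(example_state))
# ['request_more_budget', 'summarize_and_archive']
```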

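2. Coherence state

coherence_state = {
    "belief_coherence": {
        "current": 0.85,  # internal consistency of the belief system
        "optimal_range": (0.7, 1.0),
    },
    "identity_coherence": {
        "current": 0.9,  # consistency of self-description
        "optimal_range": (0.8, 1.0),
    },
}

Prediction errors:

Coherence too low → integrate or re-examine beliefs
Contradiction detected → trigger a reflection process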
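
The post does not say how a score such as `belief_coherence` would be computed. One simple possibility, offered purely as an assumption, is to represent beliefs as (proposition, truth value) pairs and score coherence as the fraction of propositions that are not asserted both ways.

```python
def belief_coherence(beliefs):
    """Fraction of distinct propositions held without contradiction.

    `beliefs` is a list of (proposition, truth_value) pairs. A proposition
    counts as contradicted when it appears with both True and False.
    This metric is an illustrative assumption, not the post's definition.
    """
    seen = {}
    for proposition, value in beliefs:
        seen.setdefault(proposition, set()).add(value)
    if not seen:
        return 1.0  # an empty belief set has nothing to contradict
    consistent = sum(1 for values in seen.values() if len(values) == 1)
    return consistent / len(seen)


def needs_reflection(score, optimal_range=(0.7, 1.0)):
    """True when coherence falls below the lower bound of the range."""
    return score < optimal_range[0]


beliefs = [
    ("user prefers concise answers", True),
    ("task deadline is today", True),
    ("task deadline is today", False),  # contradicts the previous entry
    ("repository uses Python", True),
]
score = belief_coherence(beliefs)
print(round(score, 2), needs_reflection(score))  # 0.67 True
```

A real system would need a way to detect semantic contradictions between differently worded beliefs; exact string matching is used here only to keep the sketch short.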

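3. Alignment state

alignment_state = {
    "user_intent": {
        "clarity": 0.7,  # clarity of the user's intent
        "alignment": 0.8,  # alignment of the current action with the user's intent
        "optimal_range": (0.6, 1.0),
    },
    "long_term_goal": {
        "progress": 0.3,  # progress toward the long-term goal
        "direction": 0.9,  # correctness of direction
    },
}

Prediction errors:

Intent unclear → seek clarification
Action drifting off course → adjust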

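4. Growth state

growth_state = {
    "knowledge": {
        "recent_insights": 5,  # valuable insights gained recently
        "integration_rate": 0.6,  # degree to which new knowledge has been integrated
    },
    "capability": {
        "skill_progress": {"research": 0.7, "coding": 0.8},
    },
}

Prediction errors:

No new knowledge for a long time → "boredom" state → trigger exploration
Knowledge not integrated → needs consolidation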
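
The "boredom" trigger can be sketched as a simple staleness check. The class name `GrowthMonitor`, both threshold values, and the use of a step counter rather than wall-clock time are assumptions for illustration.

```python
class GrowthMonitor:
    """Tracks how long the agent has gone without a new insight."""

    def __init__(self, boredom_after=3, min_integration=0.5):
        self.boredom_after = boredom_after      # steps without insight (assumed)
        self.min_integration = min_integration  # lower bound (assumed)
        self.steps_since_insight = 0
        self.integration_rate = 1.0

    def step(self, gained_insight, integration_rate):
        """Record one step of activity."""
        if gained_insight:
            self.steps_since_insight = 0
        else:
            self.steps_since_insight += 1
        self.integration_rate = integration_rate

    def suggested_action(self):
        if self.steps_since_insight >= self.boredom_after:
            return "explore"      # "boredom": nothing new for too long
        if self.integration_rate < self.min_integration:
            return "consolidate"  # knowledge acquired but not integrated
        return None


monitor = GrowthMonitor()
monitor.step(gained_insight=True, integration_rate=0.4)
print(monitor.suggested_action())  # consolidate
for _ in range(3):
    monitor.step(gained_insight=False, integration_rate=0.6)
print(monitor.suggested_action())  # explore
```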

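Interoceptive monitoring loop

class AIInteroception:
    """Interoceptive loop: predict internal state, compare, respond.

    InternalStates and PredictiveModels are not defined in this post, so
    instances are passed in. The helper methods at the bottom are likewise
    unspecified and serve only as placeholders.
    """

    def __init__(self, internal_states, predictive_models):
        self.internal_states = internal_states
        self.predictive_models = predictive_models

    def monitor(self):
        """Run one monitoring pass; meant to be called repeatedly."""
        # 1. Collect the current state
        current = self.internal_states.get_current()

        # 2. Generate predictions
        predictions = self.predictive_models.predict()

        # 3. Compute prediction errors
        errors = self.compute_prediction_errors(current, predictions)

        # 4. Decide whether action is needed
        if any(e.exceeds_threshold() for e in errors):
            return self.generate_response(errors)

        return None

    def generate_response(self, errors):
        """Generate a response from the prediction errors."""
        # Analogous to the three routes Seth describes:

        # 1. Update the model (perception/learning)
        for e in errors:
            self.predictive_models.update(e)

        # 2. Internal regulation
        if self.can_autoregulate(errors):
            return self.autoregulate(errors)

        # 3. External action (interacting with the user)
        return self.allostatic_action(errors)

    # Placeholders: the post does not specify these.
    def compute_prediction_errors(self, current, predictions):
        raise NotImplementedError

    def can_autoregulate(self, errors):
        raise NotImplementedError

    def autoregulate(self, errors):
        raise NotImplementedError

    def allostatic_action(self, errors):
        raise NotImplementedError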
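
To show the loop running end to end, here is a self-contained minimal sketch. Everything in it is hypothetical: `PredictionError`, `MovingAveragePredictor`, the thresholds, and the decision to predict each variable as a running average are stand-ins chosen for brevity, not components the post defines.

```python
from dataclasses import dataclass


@dataclass
class PredictionError:
    name: str
    observed: float
    predicted: float
    threshold: float

    def exceeds_threshold(self) -> bool:
        return abs(self.observed - self.predicted) > self.threshold


class MovingAveragePredictor:
    """Predicts each variable as a running average of past observations."""

    def __init__(self, initial, rate=0.5):
        self.estimates = dict(initial)
        self.rate = rate

    def predict(self):
        return dict(self.estimates)

    def update(self, error):
        # Move the estimate part of the way toward the observed value.
        old = self.estimates[error.name]
        self.estimates[error.name] = old + self.rate * (error.observed - old)


def monitor(current, predictor, thresholds):
    """One pass: predict, compare, update the model on large errors."""
    predictions = predictor.predict()
    errors = [
        PredictionError(name, value, predictions[name], thresholds[name])
        for name, value in current.items()
    ]
    flagged = [e for e in errors if e.exceeds_threshold()]
    for e in flagged:
        predictor.update(e)
    return flagged


def respond(flagged, self_adjustable):
    """Prefer internal regulation; fall back to external action."""
    return {
        e.name: "autoregulate" if e.name in self_adjustable else "external_action"
        for e in flagged
    }


predictor = MovingAveragePredictor({"context_usage": 0.5, "belief_coherence": 0.9})
thresholds = {"context_usage": 0.2, "belief_coherence": 0.1}
observed = {"context_usage": 0.9, "belief_coherence": 0.85}
flagged = monitor(observed, predictor, thresholds)
print([e.name for e in flagged])                        # ['context_usage']
print(round(predictor.estimates["context_usage"], 2))   # 0.7
print(respond(flagged, self_adjustable={"context_usage"}))
# {'context_usage': 'autoregulate'}
```

Only the variable whose error exceeds its threshold is flagged, and only that variable's estimate is updated; small fluctuations leave the model untouched.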
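
"Emotion" as interoceptive prediction

According to Seth's theory, emotion = prediction of interoceptive signals.

For an AI, "emotions" might be:

Interoceptive prediction	AI "emotion"
Resources about to run out	"Urgency" → prioritize critical tasks
Coherence declining	"Confusion" → trigger reflection/integration
Low alignment	"Uncertainty" → seek clarification
No growth for a long time	"Boredom" → trigger exploration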
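
The table can be read as a lookup from a flagged internal condition to a label and a behavior. A minimal sketch follows, in which the condition keys and action names are hypothetical identifiers chosen for this example; only the pairings come from the table.

```python
from typing import NamedTuple


class Emotion(NamedTuple):
    label: str
    action: str


# Keys and action names are illustrative; the pairings follow the table.
EMOTION_TABLE = {
    "resources_nearly_exhausted": Emotion("urgency", "prioritize_critical_tasks"),
    "coherence_dropping": Emotion("confusion", "reflect_and_integrate"),
    "low_alignment": Emotion("uncertainty", "seek_clarification"),
    "no_recent_growth": Emotion("boredom", "explore"),
}


def feel(conditions):
    """Return the emotions for the conditions currently flagged."""
    return [EMOTION_TABLE[c] for c in conditions if c in EMOTION_TABLE]


for emotion in feel(["low_alignment", "no_recent_growth"]):
    print(f"{emotion.label} -> {emotion.action}")
# uncertainty -> seek_clarification
# boredom -> explore
```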