PCE Framework: A New Paradigm for Uncertainty-Aware Planning

Core Paper

From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents (ICLR 2026 Poster)

Authors: SeungWon Seo, SooBin Lim, Seongrae Noh, Haneul Kim, HyeongYeop Kang (Korea University)
OpenReview: https://openreview.net/forum?id=GODFBZhFcX
Code: https://github.com/ssw03270/PCE_ICLR-26

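Core Problem

In multi-agent, partially observable, decentralized environments, an agent must plan and act while uncertain about hidden objects and about its collaborators' intentions.

PCE Framework: Planner-Composer-Evaluator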

┌─────────────────────────────────────────────────────────┐
│                    PCE Framework                         │
├─────────────────────────────────────────────────────────┤
│                                                          │
│   LLM Reasoning ──► Decision Tree Construction          │
│   (implicit)        (explicit, structured assumptions)  │
│                           │                              │
│                           ▼                              │
│                   ┌───────────────┐                      │
│                   │  Internal     │                      │
│                   │  Nodes:       │                      │
│                   │  Assumptions  │                      │
│                   └───────┬───────┘                      │
│                           │                              │
│                           ▼                              │
│                   ┌───────────────┐                      │
│                   │  Leaves:      │                      │
│                   │  Actions      │                      │
│                   └───────┬───────┘                      │
│                           │                              │
│                           ▼                              │
│            ┌─────────────────────────────┐              │
│            │    Path Scoring:            │              │
│            │    - Scenario Likelihood   │              │
│            │    - Goal-directed Gain    │              │
│            │    - Execution Cost        │              │
│            └─────────────────────────────┘              │
│                                                          │
└─────────────────────────────────────────────────────────┘
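
The path-scoring step in the diagram can be illustrated with a small sketch. The notes name the three factors (scenario likelihood, goal-directed gain, execution cost) but not how PCE combines them, so the rule used here (likelihood times gain, minus cost) and the independence of assumptions are assumptions made purely for illustration. `Path` and `score_path` are hypothetical names, not taken from the PCE code.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Path:
    """One root-to-leaf path: assumptions on internal nodes, an action at the leaf."""
    assumption_probs: list[float]  # probability that each assumption on the path holds
    action: str                    # action stored at the leaf
    gain: float                    # goal-directed gain if the assumptions hold
    cost: float                    # execution cost of the action


def score_path(path: Path) -> float:
    # Scenario likelihood: assumptions are treated as independent (an assumption),
    # so their probabilities multiply.
    likelihood = prod(path.assumption_probs)
    # Assumed combination rule: expected gain minus execution cost.
    return likelihood * path.gain - path.cost


paths = [
    Path([0.9, 0.8], "check_fridge", gain=10.0, cost=2.0),
    Path([0.3], "ask_partner", gain=10.0, cost=1.0),
]
best = max(paths, key=score_path)
```

Under these illustrative numbers the agent prefers checking the fridge, because the likely scenario outweighs the higher execution cost.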

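Key Implementation Details

1. Belief System (belief.py)

Maintains a probabilistic belief over the environment
Models which container each object is inside as a probability distribution
Supports belief updates and message passing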

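# Core belief structure: for object id1, a pair of parallel lists
self.edge_belief[id1]['INSIDE'] = [
    [None] + container_ids,  # candidate locations (None presumably means "not inside any container")
    init_values              # probability distribution over those locations
]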
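
To make the belief update concrete, here is a minimal sketch of one plausible update: after observing that an object is not in a given container, that hypothesis is zeroed out and the rest is renormalized. It assumes the second list holds probabilities that sum to 1, as the notes describe. `rule_out_container` is a hypothetical helper and may differ from what belief.py actually does.

```python
def rule_out_container(belief, container_id):
    """Return a new [locations, probs] pair after observing that the object
    is not inside `container_id`."""
    locations, probs = belief
    # Zero out the ruled-out hypothesis, keep the others unchanged.
    new_probs = [0.0 if loc == container_id else p
                 for loc, p in zip(locations, probs)]
    total = sum(new_probs)
    if total == 0.0:
        raise ValueError("observation contradicts every remaining hypothesis")
    # Renormalize so the remaining hypotheses sum to 1 again.
    return [locations, [p / total for p in new_probs]]


container_ids = [101, 102, 103]              # illustrative ids
locations = [None] + container_ids           # same layout as the structure above
init_values = [0.25, 0.25, 0.25, 0.25]       # uniform prior (assumed)
belief = [locations, init_values]

belief = rule_out_container(belief, 102)
```

After the update, container 102 has probability zero and the remaining three hypotheses share the mass equally.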

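2. MCTS Planning (MCTS.py)

Uses Monte Carlo tree search to plan over the space of assumptions
Subgoal space generation: derived from the unsatisfied predicates
Heuristic functions: put, putIn, grab, find, turnOn, sit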

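def calculate_score(self, curr_node, child):
    """UCB-style scoring"""
    # parent_visit_count, self_visit_count, and subgoal_prior are not
    # defined in this excerpt.
    # Exploration weight: grows logarithmically with the parent's visit count.
    exploration_rate = np.log((1 + parent_visit_count + self.c_base) /
                              self.c_base) + self.c_init
    # Exploration bonus: scaled by the subgoal prior, shrinks as the child is visited.
    u_score = exploration_rate * subgoal_prior * np.sqrt(
        parent_visit_count) / float(1 + self_visit_count)
    # Exploitation term: mean value of the child (needs self_visit_count > 0).
    q_score = child.sum_value / self_visit_count
    return q_score + u_score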
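
The same formula can be written as a self-contained function for experimentation. This is a sketch, not the repository's code: it uses `math` instead of NumPy, the default values of `c_init` and `c_base` are illustrative, and the guard that returns a zero exploitation term for an unvisited child is an added assumption (the excerpt would divide by zero in that case). `puct_score` is a hypothetical name.

```python
import math


def puct_score(parent_visits, child_visits, child_sum_value, prior,
               c_init=1.25, c_base=1000.0):
    """Prior-weighted UCB score: mean child value plus an exploration bonus."""
    # Exploration weight: grows logarithmically with the parent's visit count.
    exploration_rate = math.log((1 + parent_visits + c_base) / c_base) + c_init
    # Exploration bonus: large for rarely visited children with a high prior.
    u_score = exploration_rate * prior * math.sqrt(parent_visits) / (1 + child_visits)
    # Exploitation term: mean value; assumed to be 0 for an unvisited child.
    q_score = child_sum_value / child_visits if child_visits > 0 else 0.0
    return q_score + u_score


unvisited = puct_score(parent_visits=10, child_visits=0, child_sum_value=0.0, prior=0.5)
visited = puct_score(parent_visits=10, child_visits=5, child_sum_value=1.0, prior=0.5)
```

With these inputs the unvisited child scores higher than the visited one, which is the intended effect of the exploration bonus: children that have not been tried yet get selected before the search commits to known values.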

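3. Multi-Agent Communication

Belief sharing: only high-confidence location information is transmitted
Subgoal coordination: avoids duplicated work
Satisfied-state communication: agents share goals that are already completed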
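
The belief-sharing rule above can be sketched as a simple filter over the belief structure. The 0.8 threshold, the choice to skip the `None` location, and the function name `high_confidence_locations` are all assumptions for illustration; the notes do not say how PCE defines "high confidence".

```python
def high_confidence_locations(edge_belief, threshold=0.8):
    """Collect (object_id, container_id) pairs whose most likely location
    has probability at or above `threshold`."""
    messages = []
    for obj_id, relations in edge_belief.items():
        locations, probs = relations["INSIDE"]
        # Index of the most probable location for this object.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Assumed policy: share only confident beliefs about a concrete container.
        if probs[best] >= threshold and locations[best] is not None:
            messages.append((obj_id, locations[best]))
    return messages


edge_belief = {
    1: {"INSIDE": [[None, 101, 102], [0.05, 0.90, 0.05]]},  # confident: shared
    2: {"INSIDE": [[None, 101, 102], [0.40, 0.30, 0.30]]},  # uncertain: withheld
}
messages = high_confidence_locations(edge_belief)
```

Only the belief about object 1 is shared, which keeps uncertain guesses out of the communication channel.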

Experimental Results

Method	C-WAH Success	TDW-MAT Success	Token Usage
Baseline (comm-heavy)	Lower	Lower	Higher
PCE	Higher	Higher	Comparable

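Reduces unnecessary communication
Improves efficiency and success rate
Human users perceive it as more efficient and more trustworthy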

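Key Insights

Comparison with adaptive-swe-agent

Dimension	adaptive-swe-agent	PCE
Uncertainty handling	External predictor	Internal belief system
Resource allocation	Predicts the value of N	MCTS search depth
Degree of autonomy	Passively accepts N	Plans within the framework
Decision method	Random Forest	Heuristics + MCTS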

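Shared Limitations

Neither achieves genuine autonomous metacognition on the agent's part:

adaptive-swe-agent: the value of N is set by an external predictor
PCE: the decision-tree structure is fixed by a predefined framework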

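Implications for the "Resource Allocation Authority" Hypothesis

The PCE framework shows how implicit uncertainty can be made explicit, but the decision framework itself is still designed externally.

Genuine autonomy requires that:

The agent itself decides when more computation is needed
The agent itself decides how to allocate its token budget
The agent itself assesses its degree of uncertainty and acts accordingly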

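Next Steps

Study how an agent can produce an introspective "I need to think more" signal
Explore ways to implement metacognitive abilities: self-evaluation, confidence estimation
Look for a theoretical basis for "autonomously deciding the compute budget"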