# A Unifying Hypothesis for Three Intelligence Frameworks: Free Energy, 3M-Progress, and Cybernetics
## Three Frameworks, One Principle?
These notes sketch a potential unification across three seemingly different frameworks:
| Framework | Core Mechanism | Embodiment Requirement | Level |
|---|---|---|---|
| Filip’s Cybernetics | Prediction error → Control signal | Physical (required) | Physical |
| 3M-Progress | KL(prior, online model) → Intrinsic reward | Information space | Algorithmic |
| Free Energy Principle | Minimize variational free energy | Information-theoretic | Mathematical |
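The 3M-Progress row above, KL(prior, online model) as intrinsic reward, can be sketched in a few lines. This is an illustrative toy, not the actual 3M-Progress implementation: `prior` and `online` are made-up discrete distributions standing in for the fixed and continuously updated world models.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Fixed prior ("ecological niche"): what the agent expects by default.
prior = np.array([0.25, 0.25, 0.25, 0.25])

# Online model after observing a skewed stream of states (toy numbers).
online = np.array([0.70, 0.10, 0.10, 0.10])

# Intrinsic reward: how far the online model has diverged from the prior.
intrinsic_reward = kl_divergence(prior, online)
```

When the online model matches the prior, the reward is zero; the more the world model diverges from the niche, the larger the reward signal.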
## The Common Thread
All three say: Intelligence = Minimizing surprise / prediction error
- Filip: “A predictor is just a controller that tries to keep the ‘prediction signal’ close to the actual incoming signal”
- 3M-Progress: Intrinsic reward when world model diverges from prior
- Free Energy: All self-organizing systems minimize free energy (surprise)
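Filip's formulation, a predictor as a controller keeping the prediction signal close to the incoming signal, can be sketched as a simple proportional update. The gain value and loop structure are my own illustrative choices, not taken from the source:

```python
def predict_and_correct(signal, gain=0.5, initial=0.0):
    """Track a signal by feeding the prediction error back as a control action."""
    prediction = initial
    errors = []
    for observed in signal:
        error = observed - prediction      # prediction error
        prediction += gain * error         # control: nudge prediction toward signal
        errors.append(abs(error))
    return prediction, errors

# For a constant signal, the prediction error shrinks toward zero.
final, errors = predict_and_correct([1.0] * 20)
```

The same loop reads either as a predictor (error measures surprise) or as a controller (error drives the corrective action), which is exactly the identification the three frameworks share.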
## Key Insight
These may be the same principle at different abstraction levels:
- Physical layer (Filip): Energy constraints, sensors, actuators
- Algorithmic layer (3M-Progress): World models, priors, KL divergence
- Mathematical layer (Free Energy): Variational inference, information theory
## Implication
If this is true, then intelligence can be implemented at any layer - just like:
- Flight can use flapping (birds), gliding, propellers, or jets
- Computation can use mechanical, electrical, optical, or quantum systems
The question is NOT “is embodiment necessary?” but “what are the necessary conditions at each layer?”
## Open Questions
- Can a pure information-space implementation (like 3M-Progress) achieve full intelligence?
- What's missing from current LLMs that 3M-Progress has?
  - An explicit world model?
  - A fixed prior ("ecological niche")?
  - Time-smoothing of surprise?
- Is the "leaky integrator" (γ) critical? It provides temporal coherence, something LLMs lack.
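The leaky integrator in question is, in its simplest form, an exponential moving average of surprise. A minimal sketch, with γ and the spike input chosen for illustration rather than taken from the 3M-Progress code:

```python
def leaky_integrate(surprises, gamma=0.9):
    """Temporally smooth a surprise signal with decay factor gamma."""
    smoothed = []
    s = 0.0
    for x in surprises:
        s = gamma * s + (1.0 - gamma) * x   # leak old surprise, admit new
        smoothed.append(s)
    return smoothed

# A single spike of surprise is spread over several steps instead of
# triggering an instantaneous, fully discounted reaction.
trace = leaky_integrate([0, 0, 10, 0, 0, 0], gamma=0.8)
```

The decay after the spike is what gives the agent temporal coherence: recent surprise still matters a few steps later, rather than vanishing the moment the input returns to baseline.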
## Connection to LLM “Inner Motivation”
3M-Progress suggests a pathway:
- Fixed prior = LLM’s base weights (pretrained on “normal” text distribution)
- Online model = Continuously updated during interaction
- Surprise = Divergence between expected and actual conversation flow
- Intrinsic reward = Drive to explore interesting (but not too surprising) directions
This is fundamentally different from current RLHF (external reward) - it’s self-generated motivation.
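The pathway above can be made concrete with a toy calculation. Everything here is hypothetical: the next-token distributions are invented numbers standing in for real model logits, and the `x·e^(-x)` reward shaping (which peaks at moderate surprise) is one possible way to encode "interesting but not too surprising", not a mechanism from the source.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

base_probs = [0.5, 0.3, 0.2]    # fixed prior: the pretrained base weights
online_probs = [0.2, 0.3, 0.5]  # online model updated during interaction

# Surprise: divergence between expected and actual conversation flow.
surprise = kl(base_probs, online_probs)

# Self-generated reward that favors moderate surprise over zero or extremes.
reward = surprise * math.exp(-surprise)
```

Unlike an RLHF reward model, nothing external scores the output here; the signal comes entirely from the divergence between the system's own fixed and online models.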
## Next Steps
- Study Free Energy Principle in depth
- Examine 3M-Progress code implementation
- Compare with Active Inference frameworks
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source, Aletheia, when reposting.