Welcome to the Era of Experience 欢迎来到体验时代

<aside> 💡

原文链接:https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf

</aside>

Abstract 摘要

We stand on the threshold of a new era in artificial intelligence that promises to achieve an unprecedented level of ability. A new generation of agents will acquire superhuman capabilities by learning predominantly from experience. This note explores the key characteristics that will define this upcoming era.

我们正站在人工智能新时代的门槛上,这个时代将开启前所未有的能力水平。新一代智能体主要通过体验学习,从而获得超越人类的能力。本文将探讨这个即将到来的时代的关键特征。

The Era of Human Data 人类数据时代

Artificial intelligence (AI) has made remarkable strides over recent years by training on massive amounts of human-generated data and fine-tuning with expert human examples and preferences. This approach is exemplified by large language models (LLMs) that have achieved a sweeping level of generality. A single LLM can now perform tasks spanning from writing poetry and solving physics problems to diagnosing medical issues and summarising legal documents.

近年来,人工智能(AI)在海量人类数据的训练基础上,结合专家示例和偏好进行微调,取得了显著进展。大型语言模型(LLMs)是这一方法的典型代表,展现出了惊人的通用性。如今,单个 LLM 就能胜任从诗歌创作、物理问题求解到医疗诊断和法律文件总结等多样化任务。

However, while imitating humans is enough to reproduce many human capabilities to a competent level, this approach in isolation has not and likely cannot achieve superhuman intelligence across many important topics and tasks. In key domains such as mathematics, coding, and science, the knowledge extracted from human data is rapidly approaching a limit. The majority of high-quality data sources - those that can actually improve a strong agent’s performance - have either already been, or soon will be consumed. The pace of progress driven solely by supervised learning from human data is demonstrably slowing, signalling the need for a new approach. Furthermore, valuable new insights, such as new theorems, technologies or scientific breakthroughs, lie beyond the current boundaries of human understanding and cannot be captured by existing human data.

然而,虽然模仿人类足以在许多人类能力上达到合格水平,但仅靠这种方法尚未、也很可能无法在诸多重要领域和任务中实现超人类智能。在数学、编程和科学等关键领域,从人类数据中提取的知识正迅速趋近极限。大多数高质量数据源(那些能真正提升强大智能体性能的资源)已经或即将耗尽。单纯依靠人类数据的监督学习所带来的进步明显放缓,这表明我们需要一种新方法。此外,新定理、新技术或科学突破等有价值的新见解超出了当前人类认知的边界,无法从现有人类数据中获取。

The Era of Experience 体验时代

To progress significantly further, a new source of data is required. This data must be generated in a way that continually improves as the agent becomes stronger; any static procedure for synthetically generating data will quickly become outstripped. This can be achieved by allowing agents to learn continually from their own experience, i.e., data that is generated by the agent interacting with its environment. AI is at the cusp of a new period in which experience will become the dominant medium of improvement and ultimately dwarf the scale of human data used in today’s systems.

要取得显著的进一步进展,我们需要一种新的数据来源。这些数据的生成方式必须能随着智能体能力的提升而持续改进;任何静态的数据合成方法都会很快被超越。让智能体持续从自身体验中学习,即从智能体与环境交互所产生的数据中学习,就可以实现这一目标。人工智能正站在新时期的门槛上:在这个时期,体验将成为推动进步的主要媒介,其规模终将远超当今系统所使用的人类数据。

This transition may have already started, even for the large language models that epitomise human-centric AI. One example is in the capability of mathematics. AlphaProof [20] recently became the first program to achieve a medal in the International Mathematical Olympiad, eclipsing the performance of human-centric approaches [27, 19]. Initially exposed to around a hundred thousand formal proofs, created over many years by human mathematicians, AlphaProof’s reinforcement learning (RL) algorithm subsequently generated a hundred million more through continual interaction with a formal proving system. This focus on interactive experience allowed AlphaProof to explore mathematical possibilities beyond the confines of pre-existing formal proofs, so as to discover solutions to novel and challenging problems. Informal mathematics has also achieved success by replacing expert generated data with self-generated data; for example, recent work from DeepSeek “underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies.” [10]

这一转变可能已经开始,即使在作为以人类为中心的人工智能典范的大型语言模型中也能看到端倪。以数学能力为例,AlphaProof [20] 近期成为首个在国际数学奥林匹克竞赛中摘得奖牌的程序,其表现超越了以人类为中心的方法 [27, 19]。该程序最初接触了约十万个由人类数学家多年积累的形式化证明,随后其强化学习(RL)算法通过与形式化证明系统的持续交互,又生成了一亿个证明。这种注重交互体验的方法让 AlphaProof 得以突破现有形式化证明的界限,探索数学上的新可能,进而解决新颖而富有挑战性的问题。非形式化数学领域也通过以自主生成的数据取代专家生成的数据取得了成功;例如,DeepSeek 的近期研究"凸显了强化学习的力量与美感:我们无需明确教导模型如何解题,只需提供恰当的激励,它就能自主发展出高级的问题解决策略。"[10]

Our contention is that incredible new capabilities will arise once the full potential of experiential learning is harnessed. This era of experience will likely be characterised by agents and environments that, in addition to learning from vast quantities of experiential data, will break through the limitations of human-centric AI systems in several further dimensions:

我们认为,一旦充分发挥体验式学习的潜力,就会涌现出惊人的新能力。在这个体验时代,智能体与环境除了从海量体验数据中学习之外,还很可能在以下几个维度上突破以人类为中心的 AI 系统的局限: