胡小花 · Published 2023-05-06 ⋅ Google · OpenAI ⋅

Editor's note: An internal Google document titled "We Have No Moat, And Neither Does OpenAI" appears to have been leaked, and the outlet SemiAnalysis has published the bombshell in full. The document is reproduced below.



Google "We Have No Moat, And Neither Does OpenAI"

Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI

We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be?

But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.

I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today. Just to name a few:

  • LLMs on a Phone: People are running foundation models on a Pixel 6 at 5 tokens / sec.

  • Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening.

  • Responsible Release: This one isn’t “solved” so much as “obviated”. There are entire websites full of art models with no restrictions whatsoever, and text is not far behind.

  • Multimodality: The current multimodal ScienceQA SOTA was trained in an hour.

While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months. This has profound implications for us:

  • We have no secret sauce. Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling 3P integrations.

  • People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. We should consider where our value add really is.

  • Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the <20B parameter regime.


What Happened

At the beginning of March the open source community got their hands on their first really capable foundation model, as Meta’s LLaMA was leaked to the public. It had no instruction or conversation tuning, and no RLHF. Nonetheless, the community immediately understood the significance of what they had been given.

A tremendous outpouring of innovation followed, with just days between major developments (see The Timeline for the full breakdown). Here we are, barely a month later, and there are variants with instruction tuning, quantization, quality improvements, human evals, multimodality, RLHF, etc. etc. many of which build on each other.

Most importantly, they have solved the scaling problem to the extent that anyone can tinker. Many of the new ideas are from ordinary people. The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop.

Why We Could Have Seen It Coming

In many ways, this shouldn’t be a surprise to anyone. The current renaissance in open source LLMs comes hot on the heels of a renaissance in image generation. The similarities are not lost on the community, with many calling this the “Stable Diffusion moment” for LLMs.

In both cases, low-cost public involvement was enabled by a vastly cheaper mechanism for fine tuning called low rank adaptation, or LoRA, combined with a significant breakthrough in scale (latent diffusion for image synthesis, Chinchilla for LLMs). In both cases, access to a sufficiently high-quality model kicked off a flurry of ideas and iteration from individuals and institutions around the world. In both cases, this quickly outpaced the large players.

These contributions were pivotal in the image generation space, setting Stable Diffusion on a different path from Dall-E. Having an open model led to product integrations, marketplaces, user interfaces, and innovations that didn’t happen for Dall-E.

The effect was palpable: rapid domination in terms of cultural impact vs the OpenAI solution, which became increasingly irrelevant. Whether the same thing will happen for LLMs remains to be seen, but the broad structural elements are the same.

What We Missed

The innovations that powered open source’s recent successes directly solve problems we’re still struggling with. Paying more attention to their work could help us to avoid reinventing the wheel.

LoRA is an incredibly powerful technique we should probably be paying more attention to

LoRA works by representing model updates as low-rank factorizations, which reduces the size of the update matrices by a factor of up to several thousand. This allows model fine-tuning at a fraction of the cost and time. Being able to personalize a language model in a few hours on consumer hardware is a big deal, particularly for aspirations that involve incorporating new and diverse knowledge in near real-time. The fact that this technology exists is underexploited inside Google, even though it directly impacts some of our most ambitious projects.
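To make the low-rank idea concrete, here is a minimal PyTorch sketch of a LoRA-style layer (shapes, rank, and scaling are illustrative choices, not any production implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    The effective weight is W + (alpha / r) * (B @ A). Only A and B train.
    For a 4096x4096 layer at r=8 that is 2 * 8 * 4096 = 65,536 trainable
    parameters instead of ~16.8M, roughly a 250x reduction.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank correction.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only `A` and `B` need to be saved and shared; the frozen base checkpoint is untouched, which is what makes the updates so small.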

Retraining models from scratch is the hard path

Part of what makes LoRA so effective is that - like other forms of fine-tuning - it’s stackable. Improvements like instruction tuning can be applied and then leveraged as other contributors add on dialogue, or reasoning, or tool use. While the individual fine tunings are low rank, their sum need not be, allowing full-rank updates to the model to accumulate over time.

This means that as new and better datasets and tasks become available, the model can be cheaply kept up to date, without ever having to pay the cost of a full run.
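A quick numerical check of the "their sum need not be low rank" point, with illustrative shapes:

```python
import torch

d, r = 512, 4
# Three independent rank-r fine-tunes (say instruction tuning, dialogue,
# and tool use), each shipped as a separate (A, B) pair of factors.
updates = [(torch.randn(r, d), torch.randn(d, r)) for _ in range(3)]

total = sum(B @ A for A, B in updates)    # merged update for one layer
print(torch.linalg.matrix_rank(total))    # almost surely 12 = 3 * r, not 4
```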

By contrast, training giant models from scratch not only throws away the pretraining, but also any iterative improvements that have been made on top. In the open source world, it doesn’t take long before these improvements dominate, making a full retrain extremely costly.

We should be thoughtful about whether each new application or idea really needs a whole new model. If we really do have major architectural improvements that preclude directly reusing model weights, then we should invest in more aggressive forms of distillation that allow us to retain as much of the previous generation’s capabilities as possible.

Large models aren’t more capable in the long run if we can iterate faster on small models

LoRA updates are very cheap to produce (~$100) for the most popular model sizes. This means that almost anyone with an idea can generate one and distribute it. Training times under a day are the norm. At that pace, it doesn’t take long before the cumulative effect of all of these fine-tunings overcomes starting off at a size disadvantage. Indeed, in terms of engineer-hours, the pace of improvement from these models vastly outstrips what we can do with our largest variants, and the best are already largely indistinguishable from ChatGPT. Focusing on maintaining some of the largest models on the planet actually puts us at a disadvantage.
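Some back-of-envelope arithmetic behind those numbers (layer count and hidden size follow the published LLaMA-13B configuration; the rank and target modules are illustrative):

```python
# Trainable parameters for a 13B model: full fine-tune vs. LoRA applied
# to the attention projections only.
layers, hidden, r = 40, 5120, 8
full = 13_000_000_000
lora = layers * 4 * (2 * r * hidden)   # q, k, v, o projections; A and B each
print(f"{lora:,} trainable ({lora / full:.3%} of a full fine-tune)")
# -> 13,107,200 trainable (0.101% of a full fine-tune)
```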

Data quality scales better than data size

Many of these projects are saving time by training on small, highly curated datasets. This suggests there is some flexibility in data scaling laws. The existence of such datasets follows from the line of thinking in Data Doesn't Do What You Think, and they are rapidly becoming the standard way to do training outside Google. These datasets are built using synthetic methods (e.g. filtering the best responses from an existing model) and scavenging from other projects, neither of which is dominant at Google. Fortunately, these high quality datasets are open source, so they are free to use.
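A minimal sketch of the "filter the best responses" recipe; `generate` and `quality_score` are hypothetical stand-ins for a model call and a reward or heuristic filter:

```python
def distill_dataset(prompts, generate, quality_score, k=4, threshold=0.8):
    """Sample k candidates per prompt and keep only the best, if good enough."""
    curated = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        best = max(candidates, key=quality_score)
        if quality_score(best) >= threshold:   # keep only high-quality pairs
            curated.append({"prompt": prompt, "response": best})
    return curated
```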

Directly Competing With Open Source Is a Losing Proposition

This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?

And we should not expect to be able to catch up. The modern internet runs on open source for a reason. Open source has some significant advantages that we cannot replicate.

We need them more than they need us

Keeping our technology secret was always a tenuous proposition. Google researchers are leaving for other companies on a regular cadence, so we can assume they know everything we know, and will continue to for as long as that pipeline is open.

But holding on to a competitive advantage in technology becomes even harder now that cutting edge research in LLMs is affordable. Research institutions all over the world are building on each other’s work, exploring the solution space in a breadth-first way that far outstrips our own capacity. We can try to hold tightly to our secrets while outside innovation dilutes their value, or we can try to learn from each other.

Individuals are not constrained by licenses to the same degree as corporations

Much of this innovation is happening on top of the leaked model weights from Meta. While this will inevitably change as truly open models get better, the point is that they don’t have to wait. The legal cover afforded by “personal use” and the impracticality of prosecuting individuals means that individuals are getting access to these technologies while they are hot.

Being your own customer means you understand the use case

Browsing through the models that people are creating in the image generation space, there is a vast outpouring of creativity, from anime generators to HDR landscapes. These models are used and created by people who are deeply immersed in their particular subgenre, lending a depth of knowledge and empathy we cannot hope to match.

Owning the Ecosystem: Letting Open Source Work for Us

Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

The value of owning the ecosystem cannot be overstated. Google itself has successfully used this paradigm in its open source offerings, like Chrome and Android. By owning the platform where innovation happens, Google cements itself as a thought leader and direction-setter, earning the ability to shape the narrative on ideas that are larger than itself.

The more tightly we control our models, the more attractive we make open alternatives. Google and OpenAI have both gravitated defensively toward release patterns that allow them to retain tight control over how their models are used. But this control is a fiction. Anyone seeking to use LLMs for unsanctioned purposes can simply take their pick of the freely available models.

Google should establish itself as a leader in the open source community, taking the lead by cooperating with, rather than ignoring, the broader conversation. This probably means taking some uncomfortable steps, like publishing the model weights for small ULM variants. This necessarily means relinquishing some control over our models. But this compromise is inevitable. We cannot hope to both drive innovation and control it.

Epilogue: What about OpenAI?

All this talk of open source can feel unfair given OpenAI’s current closed policy. Why do we have to share, if they won’t? But the fact of the matter is, we are already sharing everything with them in the form of the steady flow of poached senior researchers. Until we stem that tide, secrecy is a moot point.

And in the end, OpenAI doesn’t matter. They are making the same mistakes we are in their posture relative to open source, and their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance. In this respect, at least, we can make the first move.

The Timeline

Feb 24, 2023 - LLaMA is Launched

Meta launches LLaMA, open sourcing the code, but not the weights. At this point, LLaMA is not instruction or conversation tuned. Like many current models, it is a relatively small model (available at 7B, 13B, 33B, and 65B parameters) that has been trained for a relatively large amount of time, and is therefore quite capable relative to its size.

March 3, 2023 - The Inevitable Happens

Within a week, LLaMA is leaked to the public. The impact on the community cannot be overstated. Existing licenses prevent it from being used for commercial purposes, but suddenly anyone is able to experiment. From this point forward, innovations come hard and fast.

March 12, 2023 - Language models on a Toaster

A little over a week later, Artem Andreenko gets the model working on a Raspberry Pi. At this point the model runs too slowly to be practical because the weights must be paged in and out of memory. Nonetheless, this sets the stage for an onslaught of minification efforts.
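Rough memory arithmetic shows why paging was unavoidable (figures approximate; the Raspberry Pi 4 tops out at 8 GB of RAM):

```python
params = 7e9                               # LLaMA-7B
for bits, label in [(16, "fp16"), (4, "4-bit")]:
    print(f"{label}: ~{params * bits / 8 / 1e9:.1f} GB of weights")
# fp16 -> ~14.0 GB (hopeless on a Pi); 4-bit -> ~3.5 GB (fits, barely)
```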

March 13, 2023 - Fine Tuning on a Laptop

The next day, Stanford releases Alpaca, which adds instruction tuning to LLaMA. More important than the actual weights, however, was Eric Wang’s alpaca-lora repo, which used low rank fine-tuning to do this training “within hours on a single RTX 4090”.

Suddenly, anyone could fine-tune the model to do anything, kicking off a race to the bottom on low-budget fine-tuning projects. Papers proudly describe their total spend of a few hundred dollars. What’s more, the low rank updates can be distributed easily and separately from the original weights, making them independent of the original license from Meta. Anyone can share and apply them.
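In today's tooling, applying such a separately distributed update is a two-line operation; the sketch below uses the Hugging Face `peft` library with illustrative model and adapter IDs:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base weights, then layer a shared LoRA adapter on top of them.
base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")
model = PeftModel.from_pretrained(base, "someone/alpaca-lora-7b")  # adapter only
```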

March 18, 2023 - Now It’s Fast

Georgi Gerganov uses 4 bit quantization to run LLaMA on a MacBook CPU. It is the first “no GPU” solution that is fast enough to be practical.
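A minimal sketch of blockwise absmax quantization, the family of techniques involved; the actual llama.cpp formats differ in their details:

```python
import numpy as np

def quantize_block(w: np.ndarray):
    """Map one block of fp32 weights to int4-range values plus one scale."""
    scale = np.abs(w).max() / 7.0                   # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(32).astype(np.float32)          # one 32-value block
q, s = quantize_block(w)
print(np.abs(w - dequantize_block(q, s)).max())     # small reconstruction error
```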

March 19, 2023 - A 13B model achieves “parity” with Bard

The next day, a cross-university collaboration releases Vicuna, and uses GPT-4-powered eval to provide qualitative comparisons of model outputs. While the evaluation method is suspect, the model is materially better than earlier variants. Training Cost: $300.

Notably, they were able to use data from ChatGPT while circumventing restrictions on its API: they simply sampled examples of “impressive” ChatGPT dialogue posted on sites like ShareGPT.

March 25, 2023 - Choose Your Own Model

Nomic creates GPT4All, which is both a model and, more importantly, an ecosystem. For the first time, we see models (including Vicuna) being gathered together in one place. Training Cost: $100.

March 28, 2023 - Open Source GPT-3

Cerebras (not to be confused with our own Cerebra) trains the GPT-3 architecture using the optimal compute schedule implied by Chinchilla, and the optimal scaling implied by μ-parameterization. This outperforms existing GPT-3 clones by a wide margin, and represents the first confirmed use of μ-parameterization “in the wild”. These models are trained from scratch, meaning the community is no longer dependent on LLaMA.
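For reference, the Chinchilla heuristic works out as follows (a rule-of-thumb calculation, not Cerebras's exact training budget):

```python
# Chinchilla's rule of thumb: train on roughly 20 tokens per parameter,
# at a compute cost of about 6 * N * D FLOPs.
N = 13e9                   # parameters (the largest Cerebras-GPT size)
D = 20 * N                 # ~260B training tokens
print(f"tokens: {D:.2e}, compute: {6 * N * D:.2e} FLOPs")
# -> tokens: 2.60e+11, compute: 2.03e+22 FLOPs
```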

March 28, 2023 - Multimodal Training in One Hour

Using a novel Parameter Efficient Fine Tuning (PEFT) technique, LLaMA-Adapter introduces instruction tuning and multimodality in one hour of training. Impressively, they do so with just 1.2M learnable parameters. The model achieves a new SOTA on multimodal ScienceQA.

April 3, 2023 - Real Humans Can’t Tell the Difference Between a 13B Open Model and ChatGPT

Berkeley launches Koala, a dialogue model trained entirely using freely available data.

They take the crucial step of measuring real human preferences between their model and ChatGPT. While ChatGPT still holds a slight edge, more than 50% of the time users either prefer Koala or have no preference. Training Cost: $100.
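As a sanity check on claims like this, a simple normal-approximation confidence interval on the reported preference rate (the sample size here is illustrative, not Koala's actual n):

```python
import math

wins, n = 520, 1000        # "prefer Koala or no preference" events (illustrative)
p = wins / n
half = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"{p:.1%} +/- {half:.1%}")   # -> 52.0% +/- 3.1%
```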

April 15, 2023 - Open Source RLHF at ChatGPT Levels

Open Assistant launches a model and, more importantly, a dataset for Alignment via RLHF. Their model is close (48.3% vs. 51.7%) to ChatGPT in terms of human preference. In addition to LLaMA, they show that this dataset can be applied to Pythia-12B, giving people the option to use a fully open stack to run the model. Moreover, because the dataset is publicly available, it takes RLHF from unachievable to cheap and easy for small experimenters.
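Because the dataset is on the Hugging Face Hub, inspecting it is a one-liner (a minimal sketch; the dataset ID and field names follow the published oasst1 release):

```python
from datasets import load_dataset

oasst = load_dataset("OpenAssistant/oasst1")
print(oasst["train"][0]["text"][:200])   # peek at one message-tree node
```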




