2023-08-25

A few months ago, Meta seemingly made a gift to the world: It released its artificial intelligence system, LLaMA-2, as open-source software, allowing researchers and others to build off the technology entirely for free.

LLaMA-2 remains the highest-profile open-source AI to come from the increasingly heated Silicon Valley arms race over AI during the past year, but it has some company. In addition to Meta's offering, Hugging Face, a Brooklyn-based startup, has made its AI open source, allowing its code to be publicly studied, altered and distributed, and BigScience has done the same with its BLOOM model.

But while open-source technology has spurred numerous innovations in the past, a new paper in the Social Science Research Network from a team of leading AI researchers argues that some open-source AI programs don’t deserve the same applause. This includes LLaMA-2—even after its most recent update, a code-writing tool released earlier this week.

Here’s a conversation with Sarah Myers West, one of the paper's authors and managing director of the AI Now Institute, to explain why these programs don't come close to truly being open.

Within the AI boom, open source is certainly a buzzy phrase. What’s it mean exactly—and what’re its origins?

If you look back to the 1990s and the emergence of open-source software, it contains a set of core principles: that the software would be transparent, that it would be documented, that you would have access to the source code and be able to tinker with it and that the licensing provisions enable widespread use without charging for it.

There’s a long history of big businesses offering such software. A good example is when IBM invested a billion dollars back in the 90s into the open-source operating system, Linux, and it was explicitly because they were worried about Microsoft's dominance in operating systems. Likewise, Google open-sourced Android in hopes that it would help it compete with Apple’s iPhone.

When you take that sort of original definition of open-source software you have a lot of difficulty applying it to AI. For example, the Open Source Initiative, which helps develop standards for open-source licensing, has started a process of trying to come up with a new open-source definition for AI because it doesn't really map very cleanly onto the kinds of systems that are in wide use today.

And why is that?

The systems we’re calling AI look nothing like traditional open-source software. They require significant infrastructure produced from distinctly different practices than the ones familiar to traditional software developers. This is compounded by the uncomfortable fact that the term AI itself is not clearly defined.

For over 70 years now, AI has been applied to a wide range of technologies, and it currently serves as much as a marketing label as it does a technical descriptor. This leaves a lot of room for confusion, and for self-interested and specious applications of the term ‘open’ to systems that in no way resemble what we’ve generally understood to be open-source software.

In this context, we see companies claiming openness even as key features of their AI systems are not being disclosed to the public. If these were truly open-source programs, we’d be able to see what they’re built on: the data sets that the companies used to train these AIs. But, over the past few months, we’ve seen a wave of AI systems described as ‘open’ in an attempt at branding, even though the authors and stewards of these systems provide little meaningful access or transparency about the systems' foundations.

Meta’s LLaMA-2 falls into this category of faux open source?

The community license for LLaMA-2 isn’t recognized by the Open Source Initiative, and it arguably fails to meet key criteria for the group’s definition of open source, which is a standard in the tech industry. It includes provisions around commercial use, which true open-source software wouldn’t contain, and Meta fails to provide meaningful transparency, particularly regarding the data used to train the system.

Open-source software is meant to spur innovation. But you’re doubtful that such software could meaningfully transform the AI industry.

Due to the vast difference between large AI systems and traditional software, even the most maximally ‘open’ AI efforts won’t ensure a level playing field. AI requires so much more software, hardware and human labor to develop than other software. Businesses have those types of resources—ordinary researchers often don't. Which leaves the power with the big companies.

As we make clear in our paper, the resources required to create large AI systems and deploy them at scale are concentrated within the hands of a few tech companies. ‘Openness’ in any form does not ensure competition or democratization. The dominance of a few companies is so ingrained in artificial intelligence right now that the mere fact of the existence of open source is not going to provide escape velocity from that dynamic of deep concentration.

So Washington will likely still need to create additional rules governing AI?

Regulators will need to bear all of this in mind.

This interview has been edited and condensed for clarity from several conversations.