Navigating the Maze of Open Source Licensing in AI: A Quest for True OSI Compliance

Posted on 19.09.2023 by maddes8cht

In the rapidly evolving landscape of artificial intelligence, where innovation knows no bounds, one crucial factor often remains hidden in the shadows—the world of open source licensing. As we embark on a quest to champion true OSI-compliant licenses, let’s navigate the intricate web of AI models, licenses, and ethics that define this journey.

The Llama Phenomenon

Our quest begins with Llama, a groundbreaking language model released by Meta in February 2023. Originally intended for researchers, Llama swiftly found its way into the hands of the curious and the passionate through leaked torrent downloads and URLs. However, the legality of these leaked variants raised eyebrows within the AI community. Llama’s allure lies in its ability to be adapted to specific domains, a feature that sets it apart from ChatGPT. Moreover, its capability to run on personal hardware makes it suitable for handling sensitive data. Building upon Llama’s foundation, models like Vicuna emerged as instruction-following models. As not everyone possesses an A100 graphics card with 80 GB of memory, open-source solutions were developed to reduce memory usage and boost computational speed. The result? An ecosystem teeming with Llama descendants, each finely tuned in various directions, yet shrouded in legal ambiguity.
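To get a feel for why memory is the bottleneck, here is a rough back-of-the-envelope sketch. It only counts the space needed for the weights themselves. The function name and the example parameter counts are my own illustrative choices. Real quantized formats store some extra data per block, and inference needs additional memory on top of the weights, so treat the numbers as lower bounds.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes).

    Assumption: every weight is stored with the same number of bits and
    nothing else (no scales, no activations, no cache) is counted.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Illustrative parameter counts, not tied to one specific model release.
for name, n_params in [("7B", 7e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(n_params, 16)  # 16-bit floating point weights
    q4 = weight_memory_gb(n_params, 4)     # idealized 4-bit quantization
    print(f"{name}: 16-bit ~ {fp16:.1f} GB, 4-bit ~ {q4:.1f} GB")
```

Under these assumptions a 7B model shrinks from roughly 14 GB to roughly 3.5 GB of weights, which is the difference between needing a data center card and fitting on an ordinary PC.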

The Emergence of True Open Source Models

As the AI community grappled with the implications of Llama and its progeny, a parallel evolution was underway—an ecosystem of models that could replace Llama. Among these models, some have emerged as champions of true OSI compliance:

RWKV-LM: This model, available under the Apache 2 license, comes in several sizes, from smaller variants up to 7B and 14B parameters.
OpenLlama: An ambitious endeavor to train a Llama-compatible model from the ground up. While currently available in smaller sizes, it holds the promise of expanding its reach.
MPT: An independent model, offered in 7B and 30B sizes, under Creative Commons licenses (CC-BY-SA-3.0 and CC-BY-NC-SA-4.0).
Falcon: Originating from the United Arab Emirates, Falcon was initially released under its own license but soon transitioned to the Apache 2 license. It is available in sizes of 7B and 40B. In the realm of “true” free models, Falcon shines as a powerful contender.

Llama 2: The Open Source Conundrum

In July 2023, Meta unveiled Llama 2, the next generation of their language model. This release marked a pivotal moment in the AI landscape. Llama 2, which Meta presented as open source, was offered free of charge to both researchers and commercial users. Ranging from 7B to 70B parameters, these models were trained on 2 trillion tokens and doubled the context length of their predecessor.

However, this release stirred debates about whether Llama 2 truly qualifies as open source. Critics argue that Meta’s licensing terms contain certain restrictions incongruent with the open-source definition. The Open Source Initiative (OSI) underscored that Llama 2 should not be classified as open source, emphasizing that Meta had misconstrued the term “open source” by adding conditions and limitations.

The license agreement for Llama 2, published on July 18, 2023, grants users a limited license to use, reproduce, distribute, and modify Llama materials. Yet it attaches a commercial condition: licensees whose products or services exceed 700 million monthly active users must request a separate license from Meta. Moreover, it prohibits using Llama materials or their output to improve other large language models.

GGML and Llama.cpp: Bridging the Gap

Parallel to the Llama saga, a project named ‘Llama.cpp’ emerged on GitHub, built on the GGML tensor library. This project focused on quantization and inference for Llama models and later expanded its scope to encompass other “free” models suitable for consumer hardware. Hugging Face became a hub where new fine-tuned models based on this technology were continuously released, and interest in these developments grew rapidly.

The symbiotic relationship between Llama.cpp and Hugging Face has created a substantial pool of fine-tuned models accessible to consumer PCs. An increasing number of open-source software projects now harness this software and these models in novel and diverse contexts.
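For readers wondering what quantization means in practice, here is a minimal sketch of the general idea: store a small group of weights as low-bit integers plus one shared scale factor. This is a simplified illustration I wrote for this post, not the actual file format or algorithm used by Llama.cpp or GGML. The function names, the block of example values, and the symmetric 4-bit scheme are all assumptions chosen for clarity.

```python
def quantize_block(values, bits=4):
    """Symmetric quantization of one block of weights.

    Returns (scale, quants), where quants are small integers in
    [-qmax, qmax] and value ~ quant * scale. Illustrative scheme only.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the allowed range.
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Reconstruct approximate weights from the integers and the scale."""
    return [q * scale for q in quants]


# Hypothetical example weights, made up for this demonstration.
block = [0.12, -0.5, 0.33, 0.0, 0.7, -0.21, 0.05, -0.66]
scale, quants = quantize_block(block)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(block, restored))

print("quantized integers:", quants)
print(f"scale: {scale:.4f}, largest rounding error: {max_error:.4f}")
```

The trade-off is visible in the last line of output: each weight now needs only a few bits, at the cost of a rounding error of at most half a scale step. Smaller blocks keep the scale closer to the local weight range, which is one reason block-wise schemes are popular.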

Embarking on the Quest

As we delve deeper into the intricacies of open source licensing in the realm of AI, our quest to champion true OSI compliance becomes more significant than ever. It’s a journey that calls upon us to explore the ethics of AI, uphold the values of transparency, accessibility, and freedom, and engage with a community dedicated to pushing the boundaries of knowledge.

Join me on this quest. Whether you’re a developer, a researcher, an AI enthusiast, or simply an advocate for open source principles, there’s a role for you in this adventure. Together, we can redefine the ethics of AI, shape a future where knowledge remains a shared resource, and champion the cause of true OSI compliance.

Stay connected with the quest by following me on Hugging Face, GitHub, and Stack Overflow. Let’s embark on