theBloke's Impact: Quantized Models and Accessibility
For months, theBloke has been diligently quantizing models and making them available on HuggingFace. In the process, a thriving ecosystem has emerged from which the community benefits.
As Llama models gained traction, theBloke took it upon himself to provide quantized versions for llama.cpp, all of which are freely available.
Amidst the changing software landscape, theBloke became the unofficial go-to resource for downloading quantized models. He excelled at systematically curating and documenting this growing collection. His work stood out not only in quantity, but also in the reliability and speed with which new fine-tunes were made available, as well as in the quality of documentation and the completeness of implementation. Nowhere else can you find a complete set of all quantization variants for every available model. Thus, theBloke set a benchmark for accessibility.
Shifting Focus to Llama and Llama 2
In “Unveiling of Llama 2” I wrote about my concern that, with the release of the powerful Llama 2, community efforts would concentrate even more on a model that is not really free and on its descendants, and that’s exactly what happened. The surge in new fine-tuned models, especially for Llama 2, has led theBloke to focus mainly on quantizing Llama and Llama 2 models. As a result, Falcon and other free models such as MPT are rarely found among the quantized offerings.
theBloke is a friendly and helpful person who responded promptly to inquiries about specific Falcon models. His willingness to help is obvious, and he does a great job, but due to the growing demand and the increasing number of quantized models, the waiting time for specific requests has increased. Although he encouraged users to express their model preferences, I reached the point where I started quantizing my Falcon models myself, and on a more regular basis. I found that rewarding, but also tedious, as it is a long process.
A New Alignment
As for me, I believe that “Falcon 40b” is still the most powerful truly free AI model available, and it should receive more support.
So in this situation my decision matured: to follow in theBloke’s footsteps, but with a different focus. The concept is to specialize exclusively in the quantization of free models and to create a separate area for these open source gems. It’s an initiative born out of admiration for theBloke’s work and the belief that more emphasis needs to be placed on quantizing free models.
I’m quite limited in my equipment at the moment, but with what I have available I think I can provide at least a solid batch of models each week. That’s far less than what theBloke is doing at the moment, but it’s still a significant improvement on the current situation for the free models and their spin-offs.