Libre ChatGPT for libre user?

Hello everyone.

Please tell me which of the existing artificial intelligence models (I don’t know why it has such a silly name or where the intelligence comes from) is free, uses the AGPL or GPL license, and is best suited for everyday use?

3 Likes

In my opinion, the concept of a Libre ChatGPT is likely flawed, and anyone who offers it to you is most probably lying.

To create a ChatGPT, you use a very large number of graphics cards with a very large amount of processing power to distill a very large amount of text data until it becomes artificial digital neurons. If you don’t know what that is, imagine that I have three baseball bats. One of them is wooden, one is metal, and one is plastic. Suppose that I hit you repeatedly with the plastic one, but never hit you with the other two. Your brain will form neurons that anticipate you being hit when you see me holding a plastic baseball bat. The learning happens when the neurons form.
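For anyone who prefers code to baseball bats, here is a minimal sketch of the same idea in Python. It is only a toy: one number ("weight") per bat material, nudged by a simple error-correction rule. The names `MATERIALS`, `train`, `learning_rate` and `epochs` are made up for this illustration, and a real LLM has billions of weights trained in a far more elaborate way.

```python
# Toy illustration only: one "weight" per bat material, adjusted a little
# after every experience. All names here are hypothetical.
MATERIALS = ["wood", "metal", "plastic"]


def train(experiences, learning_rate=0.5, epochs=20):
    """Return learned weights: how strongly each material predicts a hit."""
    weights = {material: 0.0 for material in MATERIALS}
    for _ in range(epochs):
        for material, was_hit in experiences:
            prediction = weights[material]           # current expectation
            target = 1.0 if was_hit else 0.0         # what actually happened
            error = target - prediction              # surprise
            weights[material] += learning_rate * error  # move toward reality
    return weights


# The plastic bat always hits; the wooden and metal bats never do.
experiences = [("plastic", True), ("wood", False), ("metal", False)]
weights = train(experiences)
print(weights)
```

The dictionary that comes out is the "neurons": it can be copied and published on its own, without the list of experiences that produced it.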

If you then copy those neurons and publish them onto the internet, along with some small code for what a neuron is to tell people how to use what you learned, you could then declare that the code is GPL licensed and that anyone can use it. You can, apparently, do that even without really giving anyone the guide on how much it hurt to be hit with a plastic baseball bat or how you processed that information to form the idea/neurons designating that you should give plastic more attention than metal or wood when you see a bat.

Because you can have “libre” code reading and executing “non libre” neurons (“weights”), the original moral principles for why it was a good idea to use “libre” technology might still end up being violated. Because, in my opinion, one of the goals is that the user should be free to change how the technology works. But if you are given a giant dump of Mark Zuckerberg’s brain, and a small piece of “free” code that you can use to run Mark Zuckerberg’s brain or not, but you cannot meaningfully change Mark Zuckerberg’s brain because you don’t have the resources, compute power, data, and other information used to grow that brain, then the idea of referring to this as “libre” is essentially just a feel-good thing of lying to yourself while you become an extension of Mark Zuckerberg’s sovereign will upon the Earth (in my opinion).

Because of this technological limitation, and because it is currently so extremely expensive to “grow your own ChatGPT,” pragmatism alone makes it very likely that users will fall into one of two camps:

  1. Those who download “models” with no knowledge of how to create them, and run them anyway, and tell themselves that this is “libre” to feel good about it because the moral principles are not their concern (they want pragmatic accomplishment)
  2. Those who do not download “models” with no knowledge of how to create them, and who, in a self-serving way to avoid dwelling on the problem, are likely to speak in euphemisms and tell you that ChatGPT is evil, or that it’s a plagiarist fraud, and that it’s not worth dealing with

Obviously there is a third group of people who have access to entire rooms filled with GPUs, and lots and lots of money, and those people make LLMs out of nothing but standard computers, graphics card hardware, and maybe the contents of Wikipedia or something. I’m sure those people exist. But they’re so busy getting rich that I doubt they would be concerned with “libre” technology or anything like that, and so you’re probably not going to find them here.

3 Likes

None of the large language models (ChatGPT, Gemini, Deepseek, Copilot etc.) are in that category of openness. The original question is also a bit too simple and should be more granular: do you care only that the model itself is under that license, or also that the data used to train it is open? And what about the system it’s offered in, or do you want to know how it’s weighted, etc.?

Actually, what a good term for a totally open AI would be is still a bit up in the air. See: Artificial Intelligence and Data in Open Source. There was also a good FOSDEM talk that had a simple slide, see: How-to: Installing AI to L5 and running it locally offline with ollama.ai - #6 by JR-Fi

If you mean which of the medium to small models - the ones you can use (and control) on your own server or device - fit your description, then you actually do have some ok-ish options (some are better at being totally open than others - “open ai” is used a lot as a marketing term, which covers only parts of the total “libre” concept). The best source for them is probably the Linux Foundation’s GitHub: Open Model Initiative · GitHub (related to https://lfaidata.foundation/)

2 Likes

See also:
https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications

3 Likes

GPL:

AGPL:

3 Likes

venice.ai :wink:

1 Like

I like to use the “Dolphin” models at Ollama’s website. The Dolphin models are uncensored.

You can download these models locally and run them on your own computer, thus preserving privacy. The 8b models run fine on my Librem Mini.

If you want to run on a Librem 5 though, you probably have to use a “tiny” model like “Tiny Dolphin”. And even that would probably run pretty slow.

1 Like

Libre GPT depends on a few libre components:

  • Training source code.
  • Inference source code.
  • Data set.
  • The final model.

Many models claim to be “open source”, such as Meta’s (Facebook’s) Llama. They don’t use an established open source license, nor follow the practices of OSI (or FSF) approved licenses. There is a proliferation of models such as these, many of which are non-commercial (NC) or have other restrictions, such as not criticizing a particular government.

Then there’s another bunch of models that claim to be “open source” and use a truly open source license, but still really aren’t libre. An example of this would be Alibaba’s Qwen models. These binaries are released under the Apache 2.0 license. Another example is Deepseek, which uses the MIT license.

These models aren’t libre though, in that the data sets that were used to train the models aren’t under a libre license or even made available at all. They are almost certainly made from non-libre sources.

In many cases, the source code to train these models is made available under libre licenses. It really is the data sets that are problematic.
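As a rough way to keep the four components straight, here is a small Python sketch of the checklist. The `Release` class, the field names and both example releases are hypothetical and only mirror the list above; they do not describe the licensing of any real model.

```python
from dataclasses import dataclass

# The four components from the list above (field names are my own).
COMPONENTS = ("training_code", "inference_code", "data_set", "model")


@dataclass
class Release:
    """One model release; each flag is True if that part is libre."""
    name: str
    training_code: bool
    inference_code: bool
    data_set: bool
    model: bool


def missing_components(release):
    """List the components that keep a release from being fully libre."""
    return [part for part in COMPONENTS if not getattr(release, part)]


def is_libre(release):
    """A release counts as libre only if all four components are libre."""
    return not missing_components(release)


# Hypothetical examples: permissive license on the weights but no data set,
# versus a release that publishes everything under libre terms.
weights_only = Release("weights-only-example", True, True, False, True)
fully_open = Release("fully-open-example", True, True, True, True)

for release in (weights_only, fully_open):
    print(release.name, is_libre(release), missing_components(release))
```

The point of the sketch is that a single missing component, typically the data set, is enough to fail the check even when the license on the weights is a genuine open source license.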

There is lots of inference code available under libre licenses that can run pretty much any model. One of the most popular that works well is Ollama:

There are a few rare models that actually release their data sets. For example, OpenCoder, which states:

“We provide not just the final models, but also the reproducible training data, the complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research.”

It is largely built from Huggingface’s FineWeb, which is built from CommonCrawl. Though CommonCrawl isn’t perhaps 100% libre either:

IBM has built a fine-tuned version of the FineWeb data set called GneissWeb, under the Apache 2.0 license, available here:

There are some models from Allen AI, such as OLMo, which claim to be fully libre:

“OLMo 2 is a family of fully-open language models, developed start-to-finish with open and accessible training data, open-source training code, reproducible training recipes, transparent evaluations, intermediate checkpoints, and more.”

I don’t know of any models that are under the AGPL or GPL license, though; the closest ones use MIT or Apache. There is likely some inference code under the (A)GPL, but I know of no models that are themselves built from (A)GPL data sets.

In sum, the closest you’ll probably get to a libre GPT is running Ollama with OpenCoder.

If there are more libre models with libre data sets, I would love to hear about them. :slight_smile:

Happy hacking,

-Jeff

1 Like

I personally use DuckDuckGo’s https://duck.ai and select between one of the available AI models they offer.

They recently added the ability to save and revisit chats, or to delete old chat sessions. There is a daily message limit (maybe 50? I can’t remember), but I find it to be a reasonable number for my needs. In general, DuckDuckGo does collect metrics on query data, but this data is anonymized and not tied to user accounts. (See their privacy policy.) I am unaware of additional pros/cons of using this service as compared to other LLM sites.

2 Likes

Being superficially “available to use” is not the definition of “free”, “open” or “libre”, but that is what the services are teaching us - convenience is a powerful thing. All four or five dimensions (model, data, system source, weights and overall service/client) should be covered. What the risk or damage, cost and level of own control are when using it is another question. It would be nice if some [unnamed] Linux-centric entity would offer AI services. (hint, hint).

1 Like

Here at least the client is libre:

1 Like

related http://elevenfreedoms.org/

2 Likes

Maybe a more reasonable approach to address AI privacy and security challenges would be FHE (Fully Homomorphic Encryption) - have a look at this post on HackRead:

2 Likes

Hmmm. A bit over my head but, heh, I learnt a little about Homomorphic Encryption.

It seems to me also that while that may address the practical impacts on privacy, it may not address the moral impacts. In other words, for example, if I flatly declined to give my consent for data about me to be used in order to train an AI, FHE may address the practical aspects of that (it keeps data about me secret from the cloud server that trains and hosts the AI model) but still does not respect my wishes. It could even be said that it encourages the disrespecting of my wishes by making it more practical to do so.
:thinking:
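For anyone else for whom this is a bit over their head, here is a toy Python sketch of the basic trick of computing on data that stays encrypted. Note the assumptions: this is the Paillier scheme, which is only additively homomorphic (not FHE), the primes are absurdly small, and the function names are my own. It is not taken from the linked post and is not secure.

```python
import math
import random

# Toy Paillier key material. Real keys use primes of 1024 bits or more.
p, q = 17, 19
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)


def _L(x):
    """Helper from the Paillier scheme: L(x) = (x - 1) / n."""
    return (x - 1) // n


mu = pow(_L(pow(g, lam, n_sq)), -1, n)  # modular inverse (Python 3.8+)


def encrypt(message):
    """Encrypt an integer 0 <= message < n with fresh randomness."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(ciphertext):
    """Recover the integer; only the holder of lam and mu can do this."""
    return (_L(pow(ciphertext, lam, n_sq)) * mu) % n


def add_encrypted(c1, c2):
    """What the server does: combine ciphertexts without decrypting them."""
    return (c1 * c2) % n_sq


# The "server" adds two numbers it never sees in the clear.
encrypted_sum = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(encrypted_sum))
```

FHE extends this so that multiplication also works on ciphertexts, which is what makes it possible in principle to run a whole model over encrypted input.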

2 Likes

I think there are some decent free to download, free to use, free to tune models. You can run some of them even on a CPU, and if you have a GPU with enough RAM you can run more of them fast enough. Try them with Ollama and Open WebUI.

So far I’m not aware of a large model with fully published training data. This may change with https://openeurollm.eu/

1 Like

Alpaca is a GPLv3 licensed client for ollama. It is not a ChatGPT client.

ollama is Free (MIT license). And it is easy to interact with ollama via the terminal. Huggingface has a lot of Free and non-Free models that one can run on ollama.
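Besides the terminal, a locally running ollama also answers HTTP requests, so a few lines of Python are enough to talk to it. This is a minimal sketch that assumes ollama is listening on its default port 11434 and that the model you name has already been pulled; the model name in the comment is only an example, and `build_request` and `ask` are names I made up.

```python
import json
import urllib.request

# Default address of a locally running ollama server (an assumption;
# change it if your setup differs).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Prepare a POST request asking for one complete, non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as reply:
        return json.loads(reply.read())["response"]


# Example call (needs a running server and a pulled model):
# print(ask("tinydolphin", "What does copyleft mean?"))
```

Nothing in this exchange leaves your machine, which is the whole point of running the model locally.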

Honza, there is other work being done on this. See Releasing the largest multilingual open pretraining dataset.

1 Like

I think the best way is to install and use AI locally on our personal PC (like I did!) or on a powerful rented cloud server! My AI is simple, 7B, but enough for simple daily tasks! It works totally locally and is protected by a firewall and VPN! I think that’s the best compromise between power, privacy and security! Otherwise, for deeper, more complex or multimedia tasks, I use venice.ai (although I’m installing image and music AI generators on my PC too)!

1 Like