Libre ChatGPT for libre user?

jam · April 18, 2025, 4:40pm

Hello everyone.

Please tell me which of the existing artificial intelligence models (I don’t know why such a silly name and where the intelligence comes from) is free, uses the AGPL or GPL license, and is best suited for everyday use?

Dlonk · April 18, 2025, 4:55pm

The concept of Libre ChatGPT is likely a flawed concept in my opinion. It is most probable that anyone who offers you this concept is a liar.

To create a ChatGPT, the method is to use a very large number of graphics cards with a very large amount of processing power to distill a very large amount of text data until it becomes artificial digital neurons. If you don’t know what that is, imagine that I have three baseball bats. One of them is wooden, one of them is metal, and one is plastic. Suppose that I hit you repeatedly with the plastic one, but never hit you with the other two baseball bats. Your brain will form neurons in this situation that anticipate you being hit when you see me holding a plastic baseball bat. The learning happens when the neurons form.

If you then copy those neurons and publish them onto the internet, along with some small code for what a neuron is to tell people how to use what you learned, you could then declare that the code is GPL licensed and that anyone can use it. You can, apparently, do that even without really giving anyone the guide on how much it hurt to be hit with a plastic baseball bat or how you processed that information to form the idea/neurons designating that you should give plastic more attention than metal or wood when you see a bat.

Because you can have “libre” code reading and executing “non libre” neurons (“weights”), the original moral principles for why it was a good idea to use “libre” technology might still end up being violated. Because, in my opinion, one of the goals is that the user should be free to change how the technology works. But if you are given a giant dump of Mark Zuckerberg’s brain, and a small piece of “free” code that you can use to run Mark Zuckerberg’s brain or not, but you cannot meaningfully change Mark Zuckerberg’s brain because you don’t have the resources, compute power, data, and other information used to grow that brain, then the idea of referring to this as “libre” is essentially just a feel-good thing of lying to yourself while you become an extension of Mark Zuckerberg’s sovereign will upon the Earth (in my opinion).

Because of this technological limitation, and because it is currently so extremely expensive to “grow your own ChatGPT,” it is very likely that, simply out of pragmatism, users are going to fall into one of two camps:

Those who download “models” with no knowledge of how to create them, and run them anyway, and tell themselves that this is “libre” to feel good about it because the moral principles are not their concern (they want pragmatic accomplishment)
Those who do not download “models” with no knowledge of how to create them, and who, as a self-serving way to avoid dwelling on the problem, are likely to speak in euphemisms and tell you that ChatGPT is evil, or it’s a plagiarist fraud, and it’s not worth dealing with

Obviously there is a third group of people who have access to entire rooms filled with GPUs, and lots and lots of money, and those people make LLMs out of nothing but standard computers, graphics card hardware, and maybe the contents of Wikipedia or something. I’m sure those people exist. But they’re so busy getting rich that I doubt they would be concerned with “libre” technology or anything like that, and so you’re probably not going to find them here.

JR-Fi · April 18, 2025, 5:01pm

None of the large language models (ChatGPT, Gemini, Deepseek, Copilot etc.) are in that category of openness. The original question is also a bit too simple and should be more granular: do you care only that the model itself is under that license, or also that the data used to train it is open? And what about the system it’s offered in, or do you want to know how it’s weighted, etc.?

Actually, what a good term for a totally open AI would be is still a bit up in the air. See: Artificial Intelligence and Data in Open Source. There was also a good FOSDEM talk that had a simple slide, see: How-to: Installing AI to L5 and running it locally offline with ollama.ai - #6 by JR-Fi

If you mean which of the medium to small models, the ones you can use (and control) on your own server or device, fit your description, then you actually do have some ok-ish options (some are better at being totally open than others; “open ai” is used a lot as a marketing term, which covers only parts of the total “libre” concept). The best source for them is probably the Linux Foundation’s GitHub: Open Model Initiative · GitHub (related to https://lfaidata.foundation/)

fsflover · April 18, 2025, 5:12pm

See also:
https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications

FranklyFlawless · April 18, 2025, 11:58pm

GPL:

AGPL:

veleno · April 19, 2025, 9:18pm

venice.ai

leetaur · April 20, 2025, 2:35pm

I like to use the “Dolphin” models at Ollama’s website. The Dolphin models are uncensored.

You can download these models locally and run them on your own computer, thus preserving privacy. The 8b models run fine on my Librem Mini.

If you want to run on a Librem 5 though, you probably have to use a “tiny” model like “Tiny Dolphin”. And even that would probably run pretty slow.

jebba · April 20, 2025, 2:45pm

Libre GPT depends on a few libre components:

Training source code.
Inference source code.
Data set.
The final model.
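The four components above can be treated as a checklist. The following is only a minimal sketch: the class name, field names, and the two example releases are hypothetical and do not describe any real model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical record of which parts of a release are under a libre license."""
    name: str
    training_code_libre: bool
    inference_code_libre: bool
    dataset_libre: bool
    weights_libre: bool

    def missing(self) -> List[str]:
        """Return the components that are not libre, in checklist order."""
        checks = {
            "training source code": self.training_code_libre,
            "inference source code": self.inference_code_libre,
            "data set": self.dataset_libre,
            "final model": self.weights_libre,
        }
        return [part for part, ok in checks.items() if not ok]

    def is_libre(self) -> bool:
        """A release counts as libre here only if all four components are."""
        return not self.missing()


# Two made-up releases, for illustration only.
open_weights_only = ModelRelease("hypothetical-open-weights", True, True, False, True)
fully_open = ModelRelease("hypothetical-fully-open", True, True, True, True)

for release in (open_weights_only, fully_open):
    print(release.name, release.is_libre(), release.missing())
```

Under this strict reading, a release with libre code and weights but an unpublished data set still fails the check.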

Many models claim to be “open source”, such as Meta’s (Facebook’s) Llama. They don’t use an established open source license, nor follow the practices of OSI (or FSF) approved licenses. There is a proliferation of models such as these, many of which are non-commercial (NC) or have other restrictions, such as not criticizing a particular government.

Then there’s another bunch of models that claim to be “open source” and use a truly open source license, but still really aren’t libre. An example of this would be Alibaba’s Qwen models. These binaries are released under the Apache 2.0 license. Another example is Deepseek, which uses the MIT license.

These models aren’t libre though, in that the data sets that were used to train the models aren’t under a libre license or even made available at all. They are almost certainly made from non-libre sources.

In many cases, the source code to train these models is made available under libre licenses. It really is the data sets that are problematic.

There is lots of inference code available under libre licenses that can run pretty much any model. One of the most popular that works well is Ollama:

https://ollama.com/

There are a few rare models that actually release their data sets. For example, OpenCoder, which states:

“We provide not just the final models, but also the reproducible training data, the complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research.”

https://opencoder-llm.github.io/

It is largely built from Huggingface’s FineWeb, which is built from CommonCrawl. Though CommonCrawl isn’t perhaps 100% libre either:

Common Crawl - Terms of Use

IBM has built a refined version of the FineWeb data set called GneissWeb, under the Apache 2.0 license, available here:

ibm-granite/GneissWeb · Datasets at Hugging Face

There are some models from Allen AI, such as OLMo, which claim to be fully libre:

“OLMo 2 is a family of fully-open language models, developed start-to-finish with open and accessible training data, open-source training code, reproducible training recipes, transparent evaluations, intermediate checkpoints, and more.”

OLMo from Ai2

I don’t know of any models that are under the AGPL or GPL license though; the closest ones use MIT or Apache. There is likely some inference code that is under the (A)GPL, but not models themselves, nor models built from (A)GPL data sets.

In sum, the closest you’ll probably get to a libre GPT is running Ollama with OpenCoder.

If there are more libre models with libre data sets, I would love to hear about them.

Happy hacking,

-Jeff

JCS · April 22, 2025, 2:53pm

I personally use DuckDuckGo’s https://duck.ai and select from the available AI models they offer.

They recently added the ability to save and revisit chats, or to delete old chat sessions. There is a daily message limit (maybe 50? I can’t remember), but I find it to be a reasonable number for my needs. In general, DuckDuckGo does collect metrics on query data, but this data is anonymized from user accounts. (See their privacy policy.) I am unaware of additional pros/cons of using this service as compared to other LLM sites.

JR-Fi · April 22, 2025, 3:09pm

Being superficially “available to use” is not the definition of “free”, “open” or “libre”, but that is what the services are teaching us; convenience is a powerful thing. All four or five dimensions (model, data, system source, weights and overall service/client) should be covered. What the risk or damage, cost and level of your own control are when using it is another question. It would be nice if some [unnamed] Linux-centric entity would offer AI services. (hint, hint).

magician · April 23, 2025, 2:38am

Here at least the client is libre:

librem5user1o1 · April 23, 2025, 7:33pm

related http://elevenfreedoms.org/

TiX0 · April 24, 2025, 2:13am

Maybe a more reasonable approach to address AI privacy and security challenges would be FHE (Fully Homomorphic Encryption) - have a look at this post on HackRead:

irvinewade · April 24, 2025, 8:43am

Hmmm. A bit over my head but, heh, I learnt a little about Homomorphic Encryption.

It seems to me also that while that may address the practical impacts on privacy, it may not address the moral impacts. In other words, for example, if I flatly declined to give my consent for data about me to be used in order to train an AI, FHE may address the practical aspects of that (it keeps data about me secret from the cloud server that trains and hosts the AI model) but still does not respect my wishes. It could even be said that it encourages the disrespecting of my wishes by making it more practical to do so.

Honza · April 24, 2025, 2:47pm

I think there are some decent models that are free to download, free to use, and free to tune. You can run some even on a CPU, and if you have a GPU with enough RAM you can run more of them fast enough. Try them with Ollama and Open WebUI.

So far I’m not aware of a large model with fully published training data. This may change with https://openeurollm.eu/

Privacy2 · April 24, 2025, 3:38pm

Alpaca is a GPLv3 licensed client for Ollama. It is not a ChatGPT client.

Ollama is Free (MIT license). And it is easy to interact with Ollama via the terminal. Hugging Face has a lot of Free and non-Free models that one can run on Ollama.
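Besides the terminal, a locally running Ollama can also be queried over its HTTP API. The sketch below assumes Ollama is listening on its default local address (http://localhost:11434) and that a model has already been downloaded; the helper names `build_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, something like `ask("<name of a model you have pulled>", "What is the GPL?")` should return the model's answer as a string. Nothing leaves your machine, since the request goes to localhost only.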

Honza, there is other work being done on this. See Releasing the largest multilingual open pretraining dataset.

veleno · April 24, 2025, 8:50pm

I think the best way is to install and use AI locally on our own PC (like I did!) or on a powerful rented cloud server! My AI is simple, 7B, but enough for simple daily tasks! It works totally locally and is protected by a firewall and VPN! I think that’s the best compromise between power, privacy and security! Otherwise, for deeper/more complex/multimedia tasks, I use venice.ai (although I’m installing image and music AI generators on my PC too)!

leetaur · April 28, 2025, 4:16pm

If you are going to use Ollama to interface with open-source models (again, always locally), I also recommend using AnythingLLM.

Using AnythingLLM, you can create a Workspace that uses Ollama as your LLM Provider, and use your downloaded model.

AnythingLLM gives you the ability to attach documents (PDFs, text docs, etc) or attach web pages that extend the model’s knowledge. An example is that the model I was interacting with (Dolphin Mistral) knew nothing about the Gemini protocol. But once I attached the technical documentation (webpages, linked below for the curious), the model was able to answer questions on it.

https://geminiprotocol.net/docs/
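The general idea behind attaching documents can be sketched as: pick the document chunks most relevant to the question and put them in front of the prompt. This is not AnythingLLM's actual implementation, which is not described in this thread; the sketch uses naive word overlap purely for illustration, and the function names and placeholder notes are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Prepend the best-matching chunks to the question as context."""
    context = "\n".join(top_chunks(question, chunks, k))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


# Placeholder chunks standing in for attached documents (not real documentation).
docs = [
    "Placeholder note: how a Gemini protocol request is formatted.",
    "Placeholder note: a recipe for bread.",
    "Placeholder note: Gemini protocol response status codes.",
]
question = "How is a Gemini protocol request formatted?"
print(build_prompt(question, docs))
```

The resulting prompt string is what would then be sent to the local model, so the model answers from the attached text without being retrained.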

Lliure · April 28, 2025, 6:53pm

Everything I can find on Lattica is just press release after press release.

Not a single independent piece so far (or any other kind of sign that Lattica might be on to something). That’s a pretty telling ratio.
Without any outside review, their product could be anything: an outright scam, a fundamental AI privacy breakthrough, or a nothingburger.

irvinewade · April 28, 2025, 10:56pm

I thought the same. The media article just looked like a reheat of a Lattica Press Release (such is the media landscape these days) and the information available from Lattica itself is quite vague.

A workable FHE system would be quite significant. However it wasn’t even clear to me precisely what role Lattica’s products play in the overall system.

I guess it’s one to file under “interesting but let’s see whether it amounts to anything”.

For example, for it to amount to anything, collectors of information have to care enough in the first place to want to keep the information secret. I don’t see a lot of sign, in my country at least, that either companies or governments even care. If you don’t have to or want to keep information secret then you can just stuff it into any computation without worrying about encryption.