How-to: Installing AI to L5 and running it locally offline with ollama.ai

JR-Fi · February 4, 2024, 9:34am

Here is a little Sunday project for those who may be watching the video feeds of the FOSDEM AI tracks (or any other track) or who just want to try something fun (your definition of fun may vary) with open software that can be used in a secure and private manner (or at least in a less risky, more controlled one). This is AI for beginners and takes about 30 minutes (depending on how fast your internet connection is). [edit to add: you’ll need at least about 1.3GB of space for this, app+model]

Preamble: AI is relative in the sense that models come in many sizes. There is some connection between the size of a model and its quality, but it’s not absolute. Size matters for efficiency because of the limited hardware and computing power of the L5 - or really any mobile device. The bigger the model, the more eMMC space, memory and CPU you need to use it (because there’s no usable GPU or neural processor). But bigger is better only if you want a jack-of-all-trades (good, but master of none, as the saying goes). Models developed for specific tasks are much more efficient and can be relatively small. And there are even some general models (“chatbots”/GPTs) that are ok - with some limitations. It’s amazing how many different models are openly available, and in how many variations. The L5 is by no means meant for this, but let’s not let that discourage us, even if even the smaller models are slow on this hardware. Usability is a matter of what you need it for and whether you can find (by trial and error) a suitable AI model - try to find the size that fits best (not perfectly). And really, yes, let’s acknowledge that being able to run an AI completely on the client side, offline, on the L5, at any speed, is a wonder.

Outline: Many here (those who are interested in the security, privacy and freedom that the L5 affords us) are a bit wary of AI. Usually that’s either because the AI applications are online services, run by who-knows or by known-entities-most-here-would-not-like-to-have-anything-to-do-with, or because setting up AI on your own hardware seems way too complicated. I’m introducing here a ridiculously simple solution to both challenges - and I cannot take any credit for it. Ollama (ollama.ai) has been made as simple as possible to install and use, and after downloading a model to test, you can kill your web connection while you use it, if you want.

Installation: Two alternatives…

Automated, at console, simply: curl https://ollama.ai/install.sh | sh
Manual, see github (also has uninstallation instructions)

Use: The first time, you need to download one of the models presented in the library (or one of their variations, listed under the “tags” tab). There are instructions on the GitHub site for creating more, editing your own, creating web services, using the APIs etc.
Let’s start ollama and get one of the smaller general models: ollama run tinyllama
[edit to add: just so you know, this is based on Meta’s work. An alternative small(ish) model available is dolphin-phi (ollama run dolphin-phi), which has its roots with M$ but has an MIT licence - see this post]

It downloads about 640MB (compared to models of around 2GB, 4GB, 30GB or more) and starts a prompt with the model - ask something (like “hello” - which may give interesting answers). Note, however, that it may take several minutes to get an answer - kinda like the AI was thinking and painstakingly formulating an answer. It uses 100% of all CPU cores and about 40% of memory (no noticeable heat in that time).
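For anyone who would rather script a question than type it at the prompt, here is a minimal Python sketch. It assumes the Ollama service is listening locally on its default port 11434 and that the model has already been downloaded; the helper names (build_request, ask) and the ten-minute timeout are my own choices for illustration, not anything Ollama prescribes.

```python
import json
import urllib.request

# Assumption: Ollama's local HTTP API on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout_s: float = 600.0) -> str:
    """Send the prompt and return the answer text (needs a running Ollama server)."""
    # The generous timeout is a guess: answers on the L5 can take several minutes.
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]
```

Calling ask("tinyllama", "hello") should then return the whole answer as one string once the model has finished, so expect the same wait as at the interactive prompt.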

A bit more on that inside the hidden section.

An interesting command is “/set verbose”, which shows statistics after each answer. Below you can see the whole answer to my greeting and how long it took the tinyllama model to create it on the L5.

>>> /set verbose
Set 'verbose' mode.
>>> hello
Hi! My name is Emma, and I am thrilled to be part of your team. As an AI assistant, I bring a wealth
of knowledge and expertise to the table in various areas that can benefit your business operations. 
Here are some specific areas where I can assist you:

1. Customer Service: As a skilled customer support professional, I am well-versed in using natural 
language processing (NLP) tools to handle customer inquiries, resolving issues, and improving 
overall customer experience. By leveraging my expertise in this field, you can expect better 
customer service with more accurate information, quicker resolution times, and better outcomes.

2. Data Analysis: My extensive experience in data analysis and machine learning has enabled me to 
identify patterns, trends, and insights that can help you make informed decisions about your 
business. By leveraging this knowledge, I can provide actionable insights for optimization of 
operations, sales, and marketing campaigns.

3. Artificial Intelligence: As a certified deep learning expert, I am qualified to work with 
AI-based applications like chatbots, predictive analytics, and machine learning models. By using 
these tools, you can save time and improve decision making processes, leading to increased 
efficiency and better customer outcomes.

4. Artificial Intelligence: Machine vision is a particularly important area where I can contribute 
my expertise in data analytics, image processing, and computer vision. With this knowledge, I can 
help optimize operations, increase productivity, and improve safety at your facility or 
manufacturing plant.

If there's anything else that you would like me to assist with, please let me know. My goal is to 
provide value by utilizing my skills in various AI-based applications to enhance your business 
operations while ensuring customer satisfaction. Thank you for considering my application, and I 
hope to work with you soon!

total duration:       4m12.34526173s  [<-- time, how long it took to print the answer]
load duration:        1.651861ms
prompt eval count:    280 token(s)
prompt eval duration: 1m31.456934s
prompt eval rate:     3.06 tokens/s
eval count:           404 token(s)
eval duration:        2m40.874333s
eval rate:            2.51 tokens/s
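The statistics are easy to check by hand: each rate is simply the token count divided by the matching duration. As a small sketch (the function names are hypothetical, and the parser only covers the duration shapes that appear in this thread, such as 4m12.34526173s, 1.651861ms or 950.141µs):

```python
import re

# Seconds per unit; both the micro sign and the Greek mu are accepted for microseconds.
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "\u00b5s": 1e-6, "\u03bcs": 1e-6, "us": 1e-6, "ns": 1e-9,
}
# Longer unit names come first so that "ms" is not read as "m" followed by "s".
_PART = re.compile(
    r"(\d+(?:\.\d+)?)(" + "|".join(sorted(_UNIT_SECONDS, key=len, reverse=True)) + ")"
)


def parse_duration(text: str) -> float:
    """Convert a duration such as '4m12.34526173s' or '1.651861ms' to seconds."""
    parts = _PART.findall(text.strip())
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def tokens_per_second(count: int, duration: str) -> float:
    """Rate as shown in the verbose output: tokens divided by elapsed seconds."""
    return count / parse_duration(duration)


print(round(tokens_per_second(280, "1m31.456934s"), 2))  # prompt eval rate
print(round(tokens_per_second(404, "2m40.874333s"), 2))  # eval rate
```

With the numbers above this reproduces the reported 3.06 and 2.51 tokens/s, and the prompt eval and eval durations together account for nearly all of the 4m12s total.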

So, asking the right, specific questions is key. There are some options for you to work with. I’m not going into all the settings, options or various models - perhaps this forum thread is a place where we can share what we find.

One more thing: To manage the installed models, you can delete them with ollama rm <name of model>. The various ollama commands are listed in the terminal with ollama --help, and the model-related commands come up at the prompt with /? (for example, /bye ends an AI chat session). Since you control your copy of the model, you can trust it not to leak your chat history, but /set nohistory may also help (or just change model behavior).
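Before deleting anything it helps to see what is installed and how much space each model takes. A rough Python sketch, assuming the local Ollama service answers on its default port 11434 and that its model listing has a "models" array with "name" and "size" (in bytes) fields; the function names are made up for this example:

```python
import json
import urllib.request

# Assumption: the local Ollama API endpoint that lists downloaded models.
TAGS_URL = "http://localhost:11434/api/tags"


def summarize_models(tags_response: dict) -> list[tuple[str, float]]:
    """Return (name, size in MB) pairs, largest first, from a parsed listing."""
    models = tags_response.get("models", [])
    pairs = [(m["name"], m["size"] / 1e6) for m in models]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def installed_models(url: str = TAGS_URL) -> list[tuple[str, float]]:
    """Fetch and summarize the listing (needs a running Ollama server)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return summarize_models(json.load(resp))
```

The largest entry at the top of installed_models() is then the obvious candidate for ollama rm when the eMMC starts filling up.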

Final words: ollama can be used on your desktop as well. But be aware that models gobble up all the available CPU of a mid-tier laptop (don’t worry, so far I’ve found them to be pretty stable, and getting an answer just takes time - I got a couple of errors, but in each case running the command again solved the situation), so using hardware with a good GPU is recommended. In any case, have fun - perhaps get a laugh when the AI hallucinates its answers, or try one of the uncensored models.

Oh, and by the way: you now have an AI in your pocket, on your L5.

[edit to add: the list of available models (variations under “tags”), also - for desktops - there is the option for using docker containers (with CPU only, if you want)]

Lliure · February 4, 2024, 12:38pm

Posts like these are exactly what keeps me coming back to this forum.

Thank you for this thorough guide and the additional explanations. This guide is Wiki material, if you ask me.

Dlonk · February 4, 2024, 4:06pm

Rather than execute the “easy” shell script that you linked, I went ahead and read through some of it for the sake of some idle weekend curiosity.

Do I understand correctly that if I run this “easy” install script on a PC that is running PureOS with nouveau drivers and an nVidia GPU, that the script will add proprietary non-free CUDA drivers to the otherwise free (as in free software) system, tainting the system? I did not actually try it, and I am not particularly familiar with nVidia CUDA drivers, but I am assuming because they come from nvidia.com according to the script that they are most likely closed source proprietary non-free software. [Edit: Or perhaps I am wrong and one of the things the script checks is whether proprietary nvidia drivers were already present? Can you point me to where such a check is taking place?]

My intuition from reading the script is that for a CPU-only use case, such as on the Librem 5, the non-free proprietary software will not be downloaded. But, given the substantial efforts that some free software communities go to in order to avoid proprietary software, having a code pathway close at hand within the script [running as sudo] that would download and install proprietary drivers if the script deemed them to be useful is notably a digression from the path that a GNU system like PureOS is normally required to follow (i.e., not recommending proprietary drivers) so I was curious if I correctly understood that this script includes such a digression from the principles of free software.

Edit: Just to reiterate, even if my understanding described above is correct, a user who would run this script on a Librem device that has no nVidia GPU would in theory not have the proprietary software packages added and would in theory remain on a free system. But it seems worth mentioning for people who might be using PureOS on custom hardware (i.e. not purchased from Purism) about the fact that the script would include the automatic download and installation of non-free proprietary software.

Edit 2: After reading the script more, I am increasingly convinced that the automatic download and installation of proprietary nvidia drivers probably only works on debian and ubuntu, and that on pureos it may be the case that it simply fails to do any of that because of a check against the OS name that would in theory fail for the name pureos. Maybe that would constrain the PureOS + nVidia theoretical configuration that some users might be using… so that ollama wouldn’t use the GPU at all, similar to the Librem 5?
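For anyone wanting to do the same kind of reading before running anything, here is a small Python sketch that downloads the script as text (without executing it) and lists the lines that mention the GPU driver parts. The keyword list and the function names are only illustrative assumptions, and a keyword search is no substitute for actually reading the code paths.

```python
import urllib.request

INSTALL_URL = "https://ollama.ai/install.sh"  # the URL given in the first post
KEYWORDS = ("nvidia", "cuda")  # illustrative search terms, extend as needed


def fetch_text(url: str = INSTALL_URL) -> str:
    """Download the script as text only; nothing is executed."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def flag_lines(script: str, keywords=KEYWORDS) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line mentioning a keyword."""
    hits = []
    for number, line in enumerate(script.splitlines(), start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line.strip()))
    return hits
```

Running flag_lines(fetch_text()) gives the line numbers to jump to in an editor, which makes it quicker to see under which conditions the driver installation would be attempted.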

JR-Fi · February 4, 2024, 7:32pm

I’ve not tested this on a desktop PureOS and you bring up a good point to clarify. All I can say is that the same easy script works on a desktop (Ubuntu) and on the L5, but in the L5’s case it obviously does not install nvidia. The script seems to check for the presence of an nvidia GPU [edit to add: see the first image last line] and, finding none, skips that part. Hence, the models only use the CPU for computing. Same thing with another laptop, which has an Intel GPU.

[For desktops, there is the option for using docker containers (with CPU only, if you want)]

weirdnerd · February 4, 2024, 10:32pm

I don’t think these models are free software. They do not respect the four freedoms to:

(0) Run the software for any purpose.
(1) Study the software and modify it.
(2) Redistribute the software.
(3) Redistribute modified copies.

Among other restrictions on freedom, there is the llama community license (https://github.com/facebookresearch/llama/blob/main/LICENSE). It tries to keep you from running the software for any purpose (freedom 0):

" v. You will not use the Llama Materials or any output or results of the
Llama Materials to improve any other large language model (excluding Llama 2 or
derivative works thereof)."

That this AI technology is nonfree is not surprising, considering that it is created by facebook, an enemy of software freedom and digital rights (including privacy).

JR-Fi · February 4, 2024, 11:17pm

I think that varies - completely open and free for any use is (as with a lot of software and services) a challenge and a rarity. Some models have different licenses. Hopefully this will get better and some of the worst offenders will get left behind (OpenAI and ChatGPT are very problematic). Regarding AI openness, one of the FOSDEM talks showed these slides, which add to your list in the AI context:

Here are a couple of links I found regarding licenses and AI. It pretty much boils down to this: open models are usable, but some may have limitations (usually the very large ones, regarding commercial use). See: Top 10 List of Large Language Models in Open-Source | Deci and OpenRAIL: Towards open and responsible AI licensing frameworks and Licensing Machine Learning models — The Turing Way. There is also Dolly (Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM), which also allows commercial use freely, but is not presently in the ollama library.

[edit to add: ollama itself is MIT license]
[edit to add2: Microsoft’s research has also developed the phi model, which is small(ish) and, like llama, has several derivatives, and it uses the MIT license (see bottom) - free but from M$, pick your poison]
[edit to add3: noticed that StarCoder, which has its own AI-relevant license, is also available as a small(ish) version]

Dlonk · February 6, 2024, 8:02pm

Does that make all of PureOS, and moreover the entire GNU system, a challenge and rarity?

Humanity did it anyway.

Christal · February 6, 2024, 8:55pm

Hi JR-Fi,

You could train and run your finished (trained) A.I. on your phone too. But you should train it on a desktop, or with big data or internet access on a server farm or something.

Training an A.I. is expensive, like feeding and schooling a child for knowledge; if you copy its brain state later, you have access to its thoughts and knowledge. Just as your mind is more useful after some work learning or discovering things, so is the A.I. And the daily growing process is expensive. So don’t run that on your phone.

It’s like using a script or program with the knowledge of Wikipedia, after decades of time spent polishing it.

JR-Fi · February 6, 2024, 10:45pm

I’m not saying it’s a bad thing, just putting it in perspective: of all the software out there, our preferred type is a comparatively small collection (for some that may be enough, admittedly). And if our existence weren’t a challenge, we wouldn’t even be having this conversation, because open and free (in the FOSS interpretation, not all the other shades of gray) would be the default. And, like I pointed out, “open” is gaining new areas that need to be considered with regard to AI - the second slide kinda illustrates that, as many systems are open in some areas but maybe not in all. Having PureOS or the GNU system open is good, but that should only be a starting point for humanity. As has been the case with open and free in the past, there are development steps to take and more to come, and that’s what I see with AI as well - some of the big examples are not pretty (not open or free or even ethical in other respects), but there is a lot of effort (not just grassroots like us but apparently at the legislative level too, albeit with mixed results) to make things better. So, when I say challenge and rarity, I guess I mean that there is still work to be done and that this is just the starting point for what’s possible in the future with truly open AI systems and software (perhaps even open AI hardware someday).

JR-Fi · February 6, 2024, 10:52pm

Yes, you seem to understand what the AI models are and what Ollama does. The metaphor seems apt. They are not trained on the phone, although I think with some you can input some data to them. Some of the “brain states” are a bit confused sometimes, and the phone has the capacity to maybe have only half a brain, I think.

[edit to add: You can find links from the Ollama model library to the Hugging Face library, which has a much larger collection and much more info on various facets of AI models. Some of the models list info about what they include, how they were trained, how much computing was used etc.]

Dlonk · February 7, 2024, 3:36am

The more time I spend with PureOS and with seeing how much code is free software in stuff that I ignorantly used prior to that, the more the rest of the world doesn’t make sense to me.

And maybe this is a personal problem in my interpretation. But, what development steps that go beyond the truly free technology were actually warranted and not simply some evil to lead me astray? I’m saying this as someone who was led astray, very often and for most of my life! But what, would we call the Android kernel a development of the kernel of linux because they convinced users to stop customizing or replacing their kernels?

Anyway, with regards to AI, you know how the saying goes:

“God reached AI singularity a billion years ago. God laughs at people doing AI.”

JR-Fi · February 7, 2024, 3:44pm

And now back to our ~~regularly scheduled programming~~ original topic…
Does anyone have a handy setup with ssh and a machine more capable of running an AI model? Can you run a remote AI (possibly a larger and more potent one) and get better quality answers faster?

FranklyFlawless · February 7, 2024, 4:10pm

I have a VPS Elite from OVHcloud that I access using SSH from my Librem 5 USA, does that count?

JR-Fi · February 8, 2024, 12:44am

If you can install Ollama on that and run it, I suppose it does. I was thinking more of a home server type but you may have more computing power on that server (no idea what the specs of that may be).
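One way this could work, sketched in Python: forward a local port on the L5 to the server’s Ollama port over SSH, so that anything talking to localhost:11434 on the phone actually reaches the remote machine. This assumes Ollama listens on its default port 11434 on the server; the host and user names are placeholders, and tunnel_command is just a helper made up for this example.

```python
import shlex


def tunnel_command(host: str, user: str,
                   local_port: int = 11434, remote_port: int = 11434) -> list[str]:
    """Build an ssh command that forwards a local port to Ollama on the remote host."""
    # -N: open the tunnel without running a remote command
    # -L: forward local_port on this machine to 127.0.0.1:remote_port on the server
    return ["ssh", "-N", "-L", f"{local_port}:127.0.0.1:{remote_port}", f"{user}@{host}"]


# Placeholder host and user; print the command instead of running it.
print(shlex.join(tunnel_command("vps.example.org", "me")))
```

The resulting list can be handed to subprocess.run as it is, or the printed line pasted into a terminal; while the tunnel is open, the traffic between phone and server stays inside the SSH connection.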

FranklyFlawless · February 8, 2024, 7:28am

All done.

sudo docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
sudo docker exec -it ollama ollama run llama2-uncensored

Here is an excerpt of the “conversation” so far:

>>> Hello
Hello, how can I help you?

>>> What is llama2-uncensored?
llama2-uncensored is an open source tool designed to provide uncensored 
access to content on the internet. It utilizes a network of servers around
the world to bypass restrictions and filters imposed by governments, 
corporations, and ISPs.

>>> How does llama2-uncensored bypass restrictions and filters imposed by 
... governments, corporations, and ISPs?
llama2-uncensored employs a combination of techniques to achieve this. 
Firstly, it utilizes a distributed network of servers around the world 
that are connected via a peer-to-peer protocol. This helps to disperse 
traffic and make it more difficult for any single entity to block or 
restrict access.

Secondly, llama2-uncensored uses encryption and anonymization technologies
to ensure that user data remains private and secure. Additionally, the 
tool can automatically detect and circumvent IP blacklists and other forms
of network filtering.

>>> How does llama2-uncensored automatically detect and circumvent IP 
... blacklists and other forms of network filtering?
llama2-uncensored employs a variety of techniques to circumvent IP 
blacklists and other forms of network filtering. For example, it can 
utilize proxies or VPNs to route traffic through different IP addresses in
order to avoid detection. Additionally, the tool can dynamically change 
its IP address based on changing geolocation data in order to stay one 
step ahead of any potential blocks or restrictions.

>>> What is the threat model of llama2-uncensored?
The threat model for llama2-uncensored takes into account both technical 
and non-technical threats. On the technical side, there are always risks 
associated with using any type of software that connects to the internet, 
including the potential for viruses or malware infections. Additionally, 
users should be aware of the potential security vulnerabilities of 
peer-to-peer protocols and networks.

On the non-technical side, llama2-uncensored has a risk associated with 
its use in countries where access to certain types of content is 
restricted or prohibited by law. Users should be aware that they may be 
subject to legal repercussions if caught using this tool in such 
jurisdictions.

JR-Fi · February 8, 2024, 4:29pm

Seems simple enough. You can do a bit of performance measuring using /set verbose at the llama2 prompt - it will give you statistics on response time - and we can compare those to what the model gives on the L5 (the problem being, though, that it probably won’t give the same answer to the same question).

But anyway, you can now try installing some larger model (see for instance the tags tab of the llama2-uncensored library page - how about a ten times bigger 70-billion-parameter model instead of the normal 7b… if your server can handle it) and use it remotely and privately.
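Whether a server “can handle it” is mostly a question of memory, and a back-of-the-envelope estimate is easy to sketch. The assumptions here are mine: roughly 4 bits per weight (a common quantization, but check the tag you pick) and a guessed 20% allowance on top for context and runtime.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 10**9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits_in_ram(params_billion: float, ram_gb: float,
                bits_per_weight: float = 4.0, overhead: float = 1.2) -> bool:
    """Rough check; `overhead` is a guessed allowance for context and runtime."""
    return weights_gb(params_billion, bits_per_weight) * overhead <= ram_gb


for size in (7, 70):
    print(f"{size}b: ~{weights_gb(size):.1f} GB of weights, "
          f"fits in 8 GB: {fits_in_ram(size, 8)}")
```

Under these assumptions a 7b model needs about 3.5 GB for its weights and a 70b model about 35 GB, so a machine with 8 GB of RAM would manage the former but not the latter.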

irvinewade · February 9, 2024, 2:43am

2 posts were split to a new topic: General AI discussion

FranklyFlawless · February 10, 2024, 7:52am

Here are multiple sequential questions using ‘verbose’ mode with llama2-uncensored:

>>> /set verbose
Set 'verbose' mode.
>>> What are the controversies surrounding Purism SPC?
The main controversy around Purism SPC is that it is a closed-source 
operating system, which means that only certain people can access the 
source code. This has led to concerns over privacy and security, as well 
as questions about transparency in the development process. Additionally, 
some critics argue that the company's focus on data protection and 
security comes at the expense of user experience and innovation. However, 
supporters of Purism SPC believe that its closed-source approach allows 
for more secure and private software development.

total duration:       1m27.204596687s
load duration:        1.502986ms
prompt eval count:    33 token(s)
prompt eval duration: 13.325142s
prompt eval rate:     2.48 tokens/s
eval count:           111 token(s)
eval duration:        1m13.874077s
eval rate:            1.50 tokens/s
>>> What are the controversies surrounding Purism?
Purism has been involved in several controversies over the years, 
including their stance on surveillance, privacy, and encryption. One of 
the most notable controversies was when the company refused to cooperate 
with government agencies during the investigation into the WikiLeaks 
website. This led to accusations that Purism was engaging in political 
activism rather than being a technology company focused solely on creating
innovative products. Additionally, some have criticized Purism for its 
lack of diversity and inclusion efforts within their company.

total duration:       1m23.139962614s
load duration:        950.141µs
prompt eval count:    29 token(s)
prompt eval duration: 11.063985s
prompt eval rate:     2.62 tokens/s
eval count:           112 token(s)
eval duration:        1m12.071648s
eval rate:            1.55 tokens/s
>>> What are the controversies surrounding myGovID?
MyGovID has been involved in several controversies, including its use as a
centralized identity verification platform by governments around the 
world. Some critics argue that this approach violates privacy rights and 
could lead to surveillance of individuals without their knowledge or 
consent. Additionally, there have been concerns raised about how myGovID 
handles user data and whether it is secure enough to protect personal 
information from hacking and other cyber threats. Finally, some have 
criticized the company for its lack of transparency in terms of how the 
platform works and what exactly happens when users submit their identity 
information through the site.

total duration:       1m47.506579377s
load duration:        1.606476ms
prompt eval count:    31 token(s)
prompt eval duration: 12.647663s
prompt eval rate:     2.45 tokens/s
eval count:           129 token(s)
eval duration:        1m34.847361s
eval rate:            1.36 tokens/s
>>> What are the controversies surrounding the Competition Bureau?
The Competition Bureau has been involved in several controversies over the
years, including its handling of antitrust cases involving large 
corporations and the impact that such actions have on consumers. Some 
critics argue that the bureau is too lenient with big businesses and does 
not do enough to protect consumer interests. Additionally, there have been
concerns raised about how the Competition Bureau handles confidential 
information and whether it has the resources necessary to effectively 
investigate and prosecute cases of antitrust violations.

total duration:       1m42.034195413s
load duration:        1.234664ms
prompt eval count:    31 token(s)
prompt eval duration: 17.455753s
prompt eval rate:     1.78 tokens/s
eval count:           103 token(s)
eval duration:        1m24.56909s
eval rate:            1.22 tokens/s

The VPS Elite only has 8 GB of RAM; see this thread below for a reminder of the details:

https://forums.puri.sm/t/vps-usage-suggestions/21788 (trust level 0 and up)
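To put the four runs into one number, the eval counts and durations can simply be pooled. A short Python sketch with the figures copied from the verbose reports in this thread (the variable and function names are only for illustration). Keep in mind that this compares different models, the small tinyllama on the L5 against the 7b llama2-uncensored on the VPS, so it says little about the hardware alone.

```python
# (eval count, eval duration in seconds) copied from the four verbose reports above
VPS_RUNS = [(111, 73.874077), (112, 72.071648), (129, 94.847361), (103, 84.56909)]
# tinyllama on the L5, from the verbose report in the first post
L5_RUN = (404, 160.874333)


def overall_rate(runs) -> float:
    """Tokens per second over all runs pooled (total tokens / total seconds)."""
    tokens = sum(count for count, _ in runs)
    seconds = sum(duration for _, duration in runs)
    return tokens / seconds


print(f"VPS, llama2-uncensored: {overall_rate(VPS_RUNS):.2f} tokens/s")
print(f"L5, tinyllama:          {overall_rate([L5_RUN]):.2f} tokens/s")
```

Pooled, the VPS generates about 1.4 tokens/s with the 7b model, against about 2.5 tokens/s for the much smaller tinyllama on the L5.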

JR-Fi · February 10, 2024, 8:51am

It gives bad answers and takes time to do so.
It looks like it assumes that there is a controversy to begin with, due to how the question is structured, and then just considers everything controversial, regardless of what side or stance was actually taken by … some similar company. That’s statistical machine learning - a bit of mix’n’match.
Thanks for testing!

JR-Fi · February 10, 2024, 9:11am

Other things for testing, if someone needs a challenge: the llava model is supposedly developed to identify what’s in images. It would be interesting if some day that could
a) automatically create a spreadsheet/db of all the pics taken with the camera, and
b) be an optional feature/plugin for the L5 camera app, helping to identify strange objects, get additional info, translate foreign languages it sees etc.

(I remember from 6-7 years ago, when the L5 was just a distant dream, there was a blind person visiting this forum who would probably have benefited from that kind of feature too)