A guy I was chatting with at a meeting recently told me about this app: https://handy.computer/
I’ve been searching for a good offline voice input app that I can use on my Librem5 in lieu of the touchscreen keyboard, and this ticks all the boxes.
From the web site:
About Handy
I built Handy because I broke a finger, was put into a cast and as a result my hand was out of commission. I tried some of the existing speech-to-text apps, but none were open source and extensible. So Handy was made to fill that gap.
It’s probably the simplest speech-to-text app: its only function is to put whatever you say into a text box. Press and hold a keyboard shortcut, speak, and release. Your words appear wherever you were typing. It runs completely offline using Whisper, works across platforms, and doesn’t require subscriptions or cloud services.
Project Goals
Handy isn’t trying to be the best speech-to-text app. It’s trying to be the most forkable one.
Handy is a starting point for when you need accessibility features, want to experiment with voice computing, or just prefer tools you can actually own and modify. The project is designed to be tiny and extensible. It is opinionated, and you might have feature requests that won’t make it into the main project, but you are more than welcome to implement them yourself. And with the help of Claude Code and agentic programming tools, you might just be able to, even without any programming experience.
Take it, fork it, modify it, break it, fix it. Build something amazing and share it with everyone.
Under the Hood
Press a shortcut to start/stop recording (or use push-to-talk mode)
Silence is removed via VAD filter (silero)
Transcribes your speech using your choice of Whisper or Parakeet models
Pastes the text directly into whatever app you’re using
Runs entirely on your machine with GPU acceleration when available
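The four steps above can be sketched as a tiny pipeline with each stage pluggable. This is a hypothetical skeleton, not Handy’s actual internals — in the real app the stages would be backed by the microphone, the Silero VAD, a Whisper/Parakeet engine, and keystroke injection:

```python
# Sketch of the record -> VAD -> transcribe -> paste flow described above.
# Every stage is passed in as a callable so real backends could be swapped
# in; all names here are hypothetical, not Handy's actual API.
from typing import Callable

def dictate(record: Callable[[], bytes],
            remove_silence: Callable[[bytes], bytes],
            transcribe: Callable[[bytes], str],
            paste: Callable[[str], None]) -> str:
    """Run one press-speak-release cycle and return the transcribed text."""
    audio = record()                # capture while the shortcut is held
    speech = remove_silence(audio)  # e.g. a Silero VAD pass trims silence
    text = transcribe(speech)       # e.g. Whisper or Parakeet, fully offline
    paste(text)                     # inject into whatever app has focus
    return text
```

With stub stages you can see the flow end to end, e.g. `dictate(mic, vad, whisper, keyboard)`.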
It is a bit old now, but since mobile Linux has been reinventing the wheel every five years or so for the last 25 years, there is also this:
in addition to using it on a librem 5, it would be useful to get it running on byzantium on a librem 14. has anyone tried this?
and what are the legal/moral issues related to using these models? can a package that uses such models pass the strict fsf-inspired requirements of pureos? or if i want to use this software do i have to always build my own packages?
I’m not sure which Whisper package the OP used, but whisper.cpp, which derives from OpenAI’s Whisper model, uses the MIT License for both the code and the model weights. And, as you probably know, the MIT License is a Free license.
I don’t know about what “moral” issues you’re referring to.
[Edit: It also references the option to use the Parakeet model. The original Parakeet model was released by NVIDIA and (according to the LLM I asked) it required an “attribution to NVIDIA” … so it might suffer from the same non-Free requirements that affected the original BSD license. I don’t know for certain, because my summary came from … an LLM, and they sometimes hallucinate.]
my understanding is that the training data and complete details of the training procedure are not available. i would be delighted to be told i am wrong on this point.
What is clear is that most LLMs offer the structure and the weights under a license. One could argue that this might not be enough because, for the most part, if you wanted to improve the model, you would need the training data to find the appropriate weights for that new model. However, it certainly doesn’t restrict someone from gathering their own training data … and the license would absolutely cover your ability to subsequently use your own data. For example, the various Llama derivatives took Meta/Facebook’s model, which licensed only the model and the weights but not the training data … and found their own training data (“a mix of publicly available online data”), which they used to create slightly different but competitive models. And, it should be noted, having the old model+weights greatly aids training a new/different model.
But, yes, the availability of training data and the license for that data is a crux issue in regard to being able to improve a model … which is certainly related to the 4 Freedoms. That said, it might be analogous to when I create code and license it: I only give you the right to use the code and derive new code from that code itself; I don’t give you the rights to all of the knowledge and material I read and worked to understand to put myself in the position to have written the code in the first place.
CC0 = no restrictions
CC-BY = credit the authors in the way they ask, and do not attribute your changes to them (for example, if the source says “cars are red”, you can change that to “cars are green”, but you are not allowed to say “the authors say cars are green”)
CC-BY-SA = CC-BY plus share with same license.
All other CC licenses are not free knowledge and so are also not aligned with the FSF’s mission.
I haven’t tried it yet, but it sounds like IBM VoiceType Dictation, which came with OS/2 Warp in the mid-’90s – and ran on a 486 with 4MB of RAM (not a typo). It wasn’t super accurate, but it didn’t bog down the system, either.
Makes me wonder… Warp also had VoiceType Navigation, where you could assign a recorded sound clip to CORBA objects, and if you spoke into the mic and your speech matched an object, it would activate (open a folder, execute a program or script, etc.). You could also use it to navigate controls on a form or hot-keys.
Navigation worked much better than dictation.
I would really like to be able to assign sounds to .desktop files on the L5 so I could open apps by voice while driving.
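One hedged way to sketch that idea: .desktop files are plain INI-style text, so the stdlib can index the installed apps by their `Name=` keys, and a matched phrase can be handed to `gtk-launch`. Everything here (paths, exact matching, using `gtk-launch` at all) is an assumption, not something Handy provides:

```python
# Hypothetical sketch: map a transcribed phrase to an installed app's
# .desktop file and launch it. Not part of Handy; exact matching and the
# use of gtk-launch are assumptions.
import configparser
import subprocess
from pathlib import Path

def index_apps(app_dir="/usr/share/applications"):
    """Return {lowercased Name=: desktop-file id} for voice matching."""
    apps = {}
    for f in Path(app_dir).glob("*.desktop"):
        cp = configparser.ConfigParser(interpolation=None, strict=False)
        cp.read(f)
        name = cp.get("Desktop Entry", "Name", fallback=None)
        if name:
            apps[name.lower()] = f.stem  # gtk-launch takes the file stem
    return apps

def launch_by_voice(phrase, apps):
    """If the spoken phrase matches an app name, launch it; return the id."""
    app_id = apps.get(phrase.strip().lower())
    if app_id:
        subprocess.Popen(["gtk-launch", app_id])
    return app_id
```

So saying “maps” would (in this sketch) run `gtk-launch org.gnome.Maps` if such a .desktop file exists. Fuzzy matching would be needed in practice, since transcriptions rarely match `Name=` exactly.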
Or … I wondered about whether it would work with a shell. Typing those long shell commands can be painful with an on-screen keyboard, particularly switching layouts to pick up special characters. So voice dictation of shell commands would be useful. However I wonder whether that would break the model i.e. if it has been trained on normal spoken sentences, not geekspeak. That is to say, it might need a specialist model for shell commands (and, if you want to extend it further, a specialist model for programming language XYZ, for each XYZ).
Undoubtedly, these limited domains of discourse will work better than natural language, per MB of model.
The use case of Handy seems to be targeted at something like a web forum, but maybe not this one, as people do need to post geekspeak here.
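Short of a specialist model, one lightweight workaround for geekspeak is a post-processing pass that rewrites spoken forms of shell punctuation into the symbols themselves. The vocabulary below is a toy assumption of mine, not anything Handy does:

```python
# Hypothetical post-processing pass: turn spoken punctuation ("dash",
# "pipe") into shell symbols, gluing flag names onto their dashes.
# The recognised vocabulary is a toy example.
def shellify(transcript: str) -> str:
    words = transcript.lower().split()
    out = []
    glue = False  # glue the next word onto the previous token (for flags)
    i = 0
    while i < len(words):
        w = words[i]
        if words[i:i + 2] == ["dash", "dash"]:
            out.append("--"); glue = True; i += 2
        elif w == "dash":
            out.append("-"); glue = True; i += 1
        elif w == "pipe":
            out.append("|"); glue = False; i += 1
        elif glue:
            out[-1] += w; glue = False; i += 1
        else:
            out.append(w); glue = False; i += 1
    return " ".join(out)
```

So “ls dash l pipe grep foo” becomes `ls -l | grep foo`. It obviously breaks on anything the map doesn’t cover, which is the poster’s point: real shell dictation probably needs a model biased toward that domain.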
This is the error message I’m getting when attempting to launch Handy from the terminal:
handy: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found (required by handy)
handy: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by handy)
Is anyone else able to reproduce this error on their Librem 5 with the handy arm64 deb pkg?
This kind of error is absolutely normal on Linux when you grab random executables from the web and attempt to run them on whatever version of the operating system you have installed. This is just one area where a distro adds value, i.e. in providing a baseline set of packages that are all based on compatible versions and hopefully actually work together.
The fact that you are seemingly running a mixed version may complicate matters.
Various container formats are designed to work around this problem, e.g. you could ask the author whether he is willing to produce a flatpak. (It is probably not reasonable to ask him to produce a native package for the version that byzantium is based on, i.e. bullseye.)
Perhaps start by finding out what version of the relevant package (i.e. glibc) comes as standard with crimson, and as standard with dawn … maybe by looking at the corresponding Debian versions (bookworm / trixie).
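A stdlib-only Python sketch can compare the two sides of that mismatch: parse the highest `GLIBC_x.y` version the binary demands out of the loader errors above, and ask the running system what it ships. (The Debian version numbers in the comment are approximate, from memory.)

```python
# Given loader errors like "version `GLIBC_2.39' not found", work out the
# newest glibc the binary demands and compare with what this machine has.
import os
import re

def required_glibc(error_text):
    """Highest GLIBC_x.y version named in dynamic-loader error output."""
    versions = re.findall(r"GLIBC_(\d+)\.(\d+)", error_text)
    return max((int(major), int(minor)) for major, minor in versions)

errors = """\
handy: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found
handy: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found
"""
need = required_glibc(errors)

# Roughly: bullseye/byzantium ships glibc 2.31, bookworm 2.36 -- so a
# binary wanting 2.39 cannot run there. Check your own system's output:
try:
    have = os.confstr("CS_GNU_LIBC_VERSION")  # glibc-based systems only
except (AttributeError, ValueError, OSError):
    have = "unknown"
print(f"binary wants glibc {need[0]}.{need[1]}; this system has: {have}")
```

The same information can be read straight off the binary with `objdump -T handy | grep GLIBC`, which lists every versioned symbol it links against.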
If your phone is not a crash test dummy then this whole exercise could be hard work. So I would probably explore what native versions work and what native versions don’t work by using a spare desktop / laptop. Then you will at least know what you are shooting for on the phone.
My understanding is that building from source may be an option but that could fail to produce a result if the software actually relies on a newer API.
I’ve built Handy for Crimson backports, so you can try this package, or just run apt update and apt install handy if you’re already on the backports repo.
another reason why having the training data is relevant: you have a better chance of scanning it to see if it has been deliberately deviously designed to do what the originator wants on a certain input.
the main reason why i asked this question is that pureos tries to follow the fsf’s conditions and so if the fsf thinks the model weights are the equivalent of obfuscated source code, then this package might never be added to pureos.