Handy - offline speech-to-text app

A guy I was chatting with at a meeting recently told me about this app: https://handy.computer/

I’ve been searching for a good offline voice input app that I can use on my Librem5 in lieu of the touchscreen keyboard, and this ticks all the boxes.

From the web site:

About Handy

I built Handy because I broke a finger and was put in a cast, and as a result my hand was out of commission. I tried some of the existing speech-to-text apps, but none were open source and extensible. So Handy was made to fill this gap.

It’s probably the simplest speech-to-text app; its only function is to put whatever you say into a text box. Press and hold a keyboard shortcut, speak, and release. Your words appear wherever you were typing. It runs completely offline using Whisper, works across platforms, and doesn’t require subscriptions or cloud services.

Project Goals

Handy isn’t trying to be the best speech-to-text app. It’s trying to be the most forkable one.

Handy is a starting point for when you need accessibility features, want to experiment with voice computing, or just prefer tools you can actually own and modify. The project is designed to be tiny and extensible. It is opinionated, and some of your feature requests may not make it into the main project, but you are more than welcome to implement them yourself. And with the help of Claude Code and agentic programming tools, you might be able to do so even without any programming experience.

Take it, fork it, modify it, break it, fix it. Build something amazing and share it with everyone.

Under the Hood

  • Press a shortcut to start/stop recording (or use push-to-talk mode)
  • Silence is removed via a VAD (voice activity detection) filter (Silero)
  • Transcribes your speech using your choice of Whisper or Parakeet models.
  • Pastes the text directly into whatever app you’re using
  • Runs entirely on your machine with GPU acceleration when available
  • Works on Windows, macOS, and Linux
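To illustrate what the "silence is removed" step does, here is a minimal Python sketch. It is not Handy's code: Handy uses Silero, a neural VAD model, whereas this swaps in a much cruder RMS energy gate. The function names, the 10 ms frame size (160 samples at 16 kHz, the rate Whisper works at) and the 0.01 threshold are assumptions picked for illustration.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def drop_silence(samples, frame_len=160, threshold=0.01):
    """Keep only frames whose RMS energy exceeds `threshold`.

    A crude energy gate standing in for a real VAD such as Silero.
    frame_len=160 is 10 ms of audio at 16 kHz (an assumed default).
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) > threshold:
            kept.extend(frame)  # treat this frame as speech
    return kept


if __name__ == "__main__":
    silence = [0.0] * 320
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
    speech_only = drop_silence(silence + tone + silence)
    print(len(speech_only), "of", len(silence + tone + silence), "samples kept")
```

A real VAD copes with background noise far better and pads speech segments so words aren't clipped; the fixed threshold here only shows the shape of the step: frame the audio, score each frame, drop the quiet ones before transcription.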

Does it work out-of-the-box on the Librem 5?

Either way, have you built it from source for the Librem 5?

It is a bit old now, but since mobile Linux has been reinventing the wheel every five years or so for the last 25 years, there is also this: Making sure you're not a bot!

I just tried installing the .deb pkg for arm64 on my Librem5 and it choked with an error about GLIBC 2.29 iirc.

I’m running Crimson with @galilley backports.

If anyone is able to get it running on stock Byzantium or Crimson, please post.

The AppImage on my Librem mini w/Crimson choked too, but the .deb pkg installation worked.

in addition to using it on a librem 5, it would be useful to get it running on byzantium on a librem 14. has anyone tried this?

and what are the legal/moral issues related to using these models? can a package that uses such models pass the strict fsf-inspired requirements of pureos? or if i want to use this software do i have to always build my own packages?

This is the other option for offline use: https://flathub.org/fi/apps/net.mkiol.SpeechNote

I’m not sure which Whisper package the OP used, but whisper.cpp, which derives from OpenAI’s Whisper model, is licensed under the MIT License for both the code and the model weights. And, as you probably know, the MIT License is a Free license.

I don’t know about what “moral” issues you’re referring to.

[Edit: It also references the option to use the Parakeet model. The original Parakeet model was released by NVIDIA and (according to the LLM I asked) it required an “attribution to NVIDIA” … so it might suffer from the same non-Free requirements that affected the original BSD license. I don’t know for certain because my summary came from … an LLM and they sometimes hallucinate.

Edit2: The model is CC BY 4.0 from NVIDIA. Sadly, I do not know a lot about CC licenses. It’s approved by the FSF for works that don’t include software and documentation: “CC BY 4.0 and CC BY-SA 4.0 added to our list of free licenses” (Free Software Foundation)

]

my understanding is that the training data and complete details of the training procedure are not available. i would be delighted to be told i am wrong on this point.

i don’t know what the fsf’s position on this is.

That varies from model to model.

What is clear is that most LLMs offer the structure and the weights under a license. One could argue that this might not be enough because, for the most part, if you wanted to improve the model, you would need training data to find the appropriate weights for the new model. However, it certainly doesn’t stop someone from gathering their own training data … and the license would absolutely cover your ability to subsequently use your own data. For example, the various llama models took Meta/Facebook’s model, which was licensed only as the model and the weights, not the training data … and found their own training data (“a mix of publicly available online data”), which they used to create slightly different but competitive models. And it should be noted that having the old model+weights greatly aids training a new/different model.

But, yes, the availability of training data and the license for that data is a crux issue in regard to being able to improve a model … which is certainly related to the 4 Freedoms. That said, it might be analogous to when I create code and license it: I only give you the right to use the code and derive new code from that code itself; I don’t give you the rights to all of the knowledge and material I read and worked to understand to put myself in the position to have written the code in the first place.

CC0 = no restrictions
CC-BY = credit the authors in the way they ask, and don’t put your changes in their names (for example, if the source says “cars are red”, you may change it to “cars are green”, but you are not allowed to say “the authors say the cars are green”)
CC-BY-SA = CC-BY plus share with same license.

All other CC licenses are not free knowledge, and so they are also not aligned with the FSF’s mission.

I use vosk with success.

Did you use it on L5 Byzantium and was it the Deb package for the “Handy” application?

I haven’t tried it yet, but it sounds like IBM VoiceType Dictation, which came with OS/2 Warp in the mid-’90s and ran on a 486 with 4MB of RAM (not a typo). It wasn’t super accurate, but it didn’t bog down the system, either.

Makes me wonder… Warp also had VoiceType Navigation, where you could assign a recorded sound clip to CORBA objects, and if you spoke into the mic and your speech matched an object, it would activate (open a folder, execute a program or script, etc.). You could also use it to navigate controls on a form or hot-keys.

Navigation worked much better than dictation.

I would really like to be able to assign sounds to .desktop files on the L5 so I could open apps by voice while driving.

Or … I wondered whether it would work with a shell. Typing those long shell commands can be painful with an on-screen keyboard, particularly switching layouts to pick up special characters, so voice dictation of shell commands would be useful. However, I wonder whether that would break the model, i.e. if it has been trained on normal spoken sentences, not geekspeak. That is to say, it might need a specialist model for shell commands (and, to extend it further, a specialist model for programming language XYZ, for each XYZ).

Undoubtedly, these limited domains of discourse will work better than natural language, per MB of model.

The use case of Handy seems to be targeted at something like a web forum, but maybe not this one, as people do need to post geekspeak here.

This is the error message I’m getting when attempting to launch handy from the terminal:

handy: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found (required by handy)
handy: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by handy)

Is anyone else able to reproduce this error on their Librem 5 with the handy arm64 deb pkg?

This kind of error is entirely normal on Linux when you grab random executables from the web and attempt to run them on whatever version of the operating system you have installed. This is one area where a distro adds value, i.e. in providing a baseline set of packages that are all based on compatible versions and hopefully actually work together.

The fact that you are seemingly running a mixed version may complicate matters.

Various container formats are designed to work around this problem, so you could ask the guy whether he is willing to produce a flatpak. (It is probably not reasonable to ask him to produce a native package for the version that Byzantium is based on, i.e. bullseye.)

Perhaps start by finding out which version of the relevant package (i.e. glibc) comes as standard with Crimson, and which with Dawn, maybe by looking at the corresponding Debian versions (bookworm / trixie).
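As a rough sketch of that comparison (Python, standard library only; the helper names are made up for illustration), you can read the glibc version of the machine you are on and check it against the symbol versions the handy binary asked for. For reference, Debian bookworm ships glibc 2.36 and trixie ships 2.41, so a build that needs GLIBC_2.38/2.39 would not be expected to run on a bookworm base without a rebuild. Running `ldd --version` in a terminal also reports the installed glibc.

```python
import os
import re


def parse_glibc(version_string):
    """Turn 'glibc 2.36' or 'GLIBC_2.39' into a comparable tuple like (2, 36)."""
    match = re.search(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"no version number in {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def installed_glibc():
    """Return the running system's glibc version, or None if it can't be read.

    os.confstr("CS_GNU_LIBC_VERSION") only works on glibc-based Linux;
    elsewhere it raises or the function is missing entirely.
    """
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (ValueError, OSError, AttributeError):
        return None
    return parse_glibc(value) if value else None


def missing_versions(installed, required):
    """List the required symbol versions that are newer than the installed glibc."""
    # Tuples compare numerically, so (2, 36) < (2, 39) and (2, 9) < (2, 10).
    return [r for r in required if parse_glibc(r) > installed]


if __name__ == "__main__":
    needed = ["GLIBC_2.38", "GLIBC_2.39"]  # from the error message above
    have = installed_glibc()
    if have is None:
        print("could not determine the glibc version on this system")
    else:
        print("installed glibc:", ".".join(map(str, have)))
        print("missing:", missing_versions(have, needed) or "none")
```

Comparing tuples rather than raw strings matters: as strings, "2.9" sorts after "2.38", which would give the wrong answer.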

If your phone is not a crash test dummy then this whole exercise could be hard work. So I would probably explore what native versions work and what native versions don’t work by using a spare desktop / laptop. Then you will at least know what you are shooting for on the phone.

My understanding is that building from source may be an option but that could fail to produce a result if the software actually relies on a newer API.

I’ve built Handy for Crimson backports, so you can try this package, or just run apt update and apt install handy if you are already on the backports repo.

Yes, L5 Byz and yes, deb pkg

another reason why having the training data is relevant: you have a better chance of scanning it to see if it has been deliberately deviously designed to do what the originator wants on a certain input.

the main reason why i asked this question is that pureos tries to follow the fsf’s conditions and so if the fsf thinks the model weights are the equivalent of obfuscated source code, then this package might never be added to pureos.
