Is anyone here successfully doing speech-to-text and/or voice control of their computer with free software and without losing privacy (i.e., not sending your voice to Google/Apple servers to be turned into text)?
If so:
What speech recognition software are you using?
What OS/distribution are you using? PureOS? Debian? Qubes? Something else?
What computer are you using? Librem 13/15? Librem 5? Something else?
What microphone are you using? Built-in? Something else?
[This topic properly belongs in several categories but it appears that Discourse requires it to live in only one category. Sorry for the somewhat arbitrary choice of the “Librem” category.]
No, since only M$, Apple, and a few others have successfully implemented a close-to-reliable hardware/software solution.
Speech-to-text requires a pretty good microphone and a speech-analysis program to interpret correctly what I’m saying, and currently aliases and scripts in GNU/Linux distros can do the same jobs far more easily and efficiently …
Hey @Joe, I’ve been playing around with snips.ai and it was working quite well with an additional ReSpeaker HAT on a Raspberry Pi 3. They even claimed to have their speech-to-text engine optimized for the i.MX 8M used in the Librem 5. Unfortunately they’ve been acquired by Sonos and stopped their maker offering.
I’d like to add a related question, if anyone here can answer it: aren’t there two types of methods for doing voice commands, one more private and one less so? Type A: on the device, using specific identifiable commands linked to actions, where the selection and variation of recognized commands and users is limited, but which may also be resource-intensive. Type B: off the device, because a larger range of commands and more natural language is the goal, so more resources and data are needed. Type B is obviously problematic from a security standpoint. But isn’t the limited type A still possible and done today; weren’t there apps of this type already in the 486 era? Could it be done on-device now?
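The “type A” idea can be sketched in a few lines: a small, fixed vocabulary of trigger phrases matched (tolerantly, since recognition is noisy) against whatever text the recognizer emits, entirely on-device. The phrases and shell actions below are invented for illustration, and this assumes some recognizer already produced a text guess:

```python
import difflib

# Hypothetical type-A command table: a small, fixed vocabulary of
# trigger phrases mapped to shell commands (names are illustrative).
COMMANDS = {
    "open browser": "firefox &",
    "open spreadsheet": "libreoffice --calc &",
    "lock screen": "loginctl lock-session",
}

def match_command(heard: str, cutoff: float = 0.6):
    """Map a (noisily) recognized phrase to the closest known trigger.

    Returns the shell command for the best match, or None if nothing
    in the fixed vocabulary is close enough -- the "limited type A"
    trade-off: a small command set, but no cloud round-trip.
    """
    hits = difflib.get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=cutoff)
    return COMMANDS[hits[0]] if hits else None

print(match_command("open browzer"))        # tolerant of small recognition errors
print(match_command("what's the weather"))  # outside the vocabulary
```

The fuzzy matching is the whole trick: with only a handful of well-separated phrases, even a mediocre transcription usually lands on the right one, which is why this style of control was feasible on 486-era hardware.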
“In device” was done on 486/Pentium machines in the late ’90s by IBM on OS/2 Warp.
Voice Command let you record a sound snippet (didn’t have to be your voice) that you would associate with a Shadow (far superior, but somewhat similar to a Shortcut or Symbolic Link). If VC was on, and it heard the snippet, it would execute the program/script called by the Shadow. It worked very well out of the box on plain vanilla Warp.
Voice Type Dictation would use speech recognition to let you dictate into whatever textbox had focus. This worked OK if you took the time to train it to your voice.
I bring up this ancient history because Big Blue recently bought Red Hat, and in a world where even M$ embraces open source, it doesn’t seem unreasonable to think that IBM might release the code. I would love to have these features in PureOS, with the power of the FLOSS community behind them.
Yes, probably, but I haven’t yet seen anything of universally acceptable quality compared to what the classic keyboard can offer … It’s like: “Sorry, could you speak up? What was that? You want to go to pawn-hub or the porn-hub?” etc. The simple nuances are the tricky ones … and some are quite embarrassing to get wrong …
If you want an open-source speech-to-text engine, there is DeepSpeech from Mozilla.
There is even a version built for ARM64 Debian.
But I think it’s still a long way from the converted text to a complete voice-control solution.
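For anyone curious what feeding audio to DeepSpeech involves: its released English models expect 16 kHz, 16-bit mono PCM. Here is a stdlib-only sketch of loading a WAV into that form; the actual transcription call is left as a comment since it needs the non-stdlib `deepspeech` package and a downloaded model file (the filename shown is from their 0.9.3 release):

```python
import struct
import wave

def read_pcm16(path):
    """Read a WAV file and return (sample_rate, list of int16 samples).

    DeepSpeech's published English models expect 16 kHz mono 16-bit PCM;
    anything else should be resampled first (e.g. with sox or ffmpeg).
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        frames = w.readframes(w.getnframes())
        samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
        return w.getframerate(), samples

# With the deepspeech package and a model installed, transcription is then
# roughly (not runnable here without those downloads):
#   model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
#   text = model.stt(numpy.array(samples, dtype=numpy.int16))
```

That `text` result is exactly where the gap mentioned above begins: turning the transcript into reliable commands is a separate problem on top of the engine.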
Do you mean speech-to-text? Which is definitely a challenge. I was thinking more like identifying certain sounds/words that correspond to pre-set commands or scripts. So you can… minimize your browsers and open a spreadsheet hands-free… with just one short word.
Even for short words/commands there are still voice fluctuations/inflections/tone/etc. that can result in quite different commands being input … What if, instead of an intended command, your computer interprets that you wanted to blow up your entire disk space (knock on wood)?
Which would only happen if I have a script pre-set for that, and have a codeword for it that is close to my other codewords. And I’d be delighted that my secure data-nuke worked, because I have backups…?
But, again, that’s why you train your program to identify the parameters within which it identifies your speech/words. Or so I very vaguely remember from the 90’s system logic that @Photon described.
Edit to add: James McClain’s Speech Command / Palaver for Ubuntu is on GitHub but needs a recognition API…
Edit to add2: And also Simon, which seems to fit my description.
True, but Voice Command did work very well. As I remember my experimentation, it worked on the overall waveform/length of a snippet, not by voiceprint/speech-to-text/execute text. My little kids would surreptitiously enter my office while I was working and yell, “Play DOOM!”, then run out howling with laughter. It was funny only because DOOM would start up in full-screen and they would then yell, “Mom! Dad’s playing Doom again!”