If one uses a lot of photo settings… Linux might actually be a little better than Android, or at least better than my current OnePlus One running LineageOS with OpenCamera. I notice that on Linux with guvcview, I can get my cheap ELP webcam from 2014 to pick up the black-on-dark-grey letters of my HHKB better than my 2014 phone can, simply by turning the exposure way up. On my phone, I need a flash to get anything.
I guess the real test of that will come once I get my Librem 5 though, because I believe it actually has the exact same camera sensor as my OnePlus One! That could allow for a very close comparison. Hopefully some good mobile-optimised camera software exists by then with at least as much functionality as guvcview.
Why try to match Google Assistant? IMO, Google Assistant kind of sucks…
My dream for a voice assistant is more like the Linux command line, and combined with AR so… well, I think this excerpt from a comment I made on Reddit should explain it decently:
GUI design, particularly in its early days, made heavy use of metaphor: desktops, files, folders, recycle bins. It made a lot of sense, because computers were still trying to bootstrap themselves into legitimacy. The ease of implementation of metaphors was one of the huge advantages GUIs wielded over CLIs. Ironically, though, the CLI has embodied an accidental metaphor all along: it’s a conversation.
This makes me think: what makes voice interfaces so much worse than GUIs? Well, a main reason is that voice recognition simply can’t understand many of the things I might want to say, such as song names not in English, no matter which accent I use or even if I use TTS. I guess the CLI has a similar problem, though, with file names in different character sets. What makes the CLI work is that I can just copy and paste any file name I can’t type, usually with middle click. But… you can’t do that with a voice interface.
But what if you could? What about some hybrid interface with both voice and visual feedback, where I could say “blender /mnt MEGA Blends Mia… yeah, that one” and point to the file I want, or, for names with characters I can’t type and don’t know how to pronounce, say “mpv Music Music2 Videos Hejme japanese characters… that one!”? What if I could say part of the name of the file I want, then reach out and drag it to the application I want to send it to (one thing I can’t do right now in the CLI, for example when I want to send Discord an image that is hard to find with its built-in file browser)? What if I could say “Waterfox… DuckDuck… yeah” (autocompletes to https://duckduckgo.com) then paste in an error I found using my keyboard? Couldn’t a hybrid of CLI, GUI, and voice be so much more efficient, letting me vaguely say part of what I want and just get it, like with the CLI, but quickly with my voice instead of typing?
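To make the “say part of it, then point at the rest” idea a bit more concrete, here is a rough Python sketch of just the matching part. It assumes the speech recogniser has already turned what I said into a list of text fragments, which is of course the hard part. The function names `match_entries` and `resolve_spoken_path` are made up for illustration; this is not an existing tool.

```python
import os
import tempfile


def match_entries(directory, fragment):
    """Return the entries of `directory` that a spoken fragment could mean.

    A single exact (case-insensitive) name wins, so saying "Music" picks
    "Music" even when "Music2" sits next to it. Otherwise every entry
    containing the fragment is a candidate.
    """
    needle = fragment.casefold()
    names = sorted(os.listdir(directory))
    exact = [name for name in names if name.casefold() == needle]
    if len(exact) == 1:
        return exact
    return [name for name in names if needle in name.casefold()]


def resolve_spoken_path(root, fragments):
    """Walk down from `root`, using one spoken fragment per path component.

    Returns (path, candidates). `candidates` is empty when every fragment
    picked exactly one entry. Otherwise the walk stops at `path`, and
    `candidates` holds what the user would choose from by pointing or
    tapping (an empty list also covers the case where nothing matched).
    """
    path = root
    for fragment in fragments:
        if not os.path.isdir(path):
            return path, []  # already reached a file; ignore extra words
        matches = match_entries(path, fragment)
        if len(matches) != 1:
            return path, matches  # hand over to "yeah, that one"
        path = os.path.join(path, matches[0])
    return path, []


if __name__ == "__main__":
    # Build a throwaway directory tree to try it out.
    with tempfile.TemporaryDirectory() as root:
        blends = os.path.join(root, "MEGA", "Blends")
        os.makedirs(blends)
        for name in ("Mia_v1.blend", "Mia_v2.blend"):
            open(os.path.join(blends, name), "w").close()
        print(resolve_spoken_path(root, ["mega", "blends", "mia"]))
```

In the demo, “mega blends mia” narrows things down to the `Blends` directory and then stops with both `Mia_…` files as candidates, which is the moment where the interface would show them and let me point at one. File names I can’t pronounce would simply never be spoken; I would stop one fragment early and pick from the list.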
Just out of curiosity, I tried to open Waterfox with Google Now (a simpler task than dealing with files), just to see if we have anything close, and… no, we do not, not at all. I said “Waterfox” and it calculated a while then searched an unrelated term. I said “Open Waterfox” and it calculated a bit then said “Opening Waterfox” then opened it, not very snappily. It would be faster just to tap the app. I shouldn’t have to add extra words like “open” instead of just saying the software I want to use, and shouldn’t have to say “Use VLC to open Dragon by Two Steps From Hell” (and even that doesn’t work; it just doesn’t understand at first, then searches on the web) instead of “vlc Music Two Steps… yeah” <taps “Two Steps From Hell”, then taps on “Dragon”>. The way it’s currently set up, it’s faster for me to play it via mpv on the CLI… with my on-screen keyboard. Current interfaces kind of suck, honestly. I think it could be cool to see a better hybrid that actually works.
Actually, even without dealing with voice nonsense, I would love an easy way to select a file in the CLI then drag it to a GUI application. I wonder if such software exists yet? Just a command then the file name, which would pop up a window I could drag from, with the window disappearing once I drag but the dragged thing staying. It could even pop up a directory I could quickly drag a file from, or let me drag the entire directory via the “.” entry. Perhaps I should work on that, and later, a more advanced voice/VR interface where it is easy to say things the way I type them in the command line and seamlessly switch to selecting and dragging them whenever. That would be really nice and efficient, I think, and maybe then there could finally be a more efficient interface than the CLI? Assuming it could actually understand everything, of course…
It could even implement CLI commands with voice, since the first word is the software name. I could say “cat Elŝutoj slash Domain”, click on some things, then say “pipe tee ex tee two bin”, to write the equivalent of “cat ~/Elŝutoj/Domainlist/gen.xyz/2020-10-14/xyz_letters_3.txt|txt2bin”, which gives the output of that file but in binary (using https://github.com/happysmash27/bin2txt), or any other routine shell command I might want to use. If a command used too many weird symbols, it would be easy to fall back to a keyboard. I guess there could be some ambiguity problems (what if there is a file called “pipe”?), but it is a nice dream, I think.
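As a toy illustration of the spoken-command part, here is a small Python sketch that turns already-recognised words into a pipeline. Everything in it is an assumption on my part: the tiny `SPOKEN` table, the function names, and the rule that “pipe” and “slash” are always operators. It also skips the clicking-on-files step entirely and only handles the words.

```python
# Hypothetical table of spoken letter and digit names; a real one would be larger.
SPOKEN = {"tee": "t", "ex": "x", "two": "2"}


def build_args(words):
    """Join argument words, treating the spoken word "slash" as "/"."""
    args = []
    glue = False  # True right after a "slash": next word joins the same argument
    for word in words:
        if word == "slash":
            if args:
                args[-1] += "/"
            else:
                args.append("/")
            glue = True
        elif glue:
            args[-1] += word
            glue = False
        else:
            args.append(word)
    return args


def parse_stage(words, known_commands):
    """Find the command name at the start of one pipeline stage.

    Tries the longest run of leading words first, so "tee ex tee two bin"
    becomes "txt2bin" even if "tee" is itself a known command.
    """
    for count in range(len(words), 0, -1):
        name = "".join(SPOKEN.get(word, word) for word in words[:count])
        if name in known_commands:
            return [name] + build_args(words[count:])
    raise ValueError("no known command at the start of: " + " ".join(words))


def parse_spoken(words, known_commands):
    """Split spoken words on "pipe" and return one argv list per stage."""
    stages, stage = [], []
    for word in list(words) + ["pipe"]:  # trailing "pipe" flushes the last stage
        if word == "pipe":
            if stage:
                stages.append(parse_stage(stage, known_commands))
            stage = []
        else:
            stage.append(word)
    return stages


if __name__ == "__main__":
    said = "cat Elŝutoj slash Domain pipe tee ex tee two bin".split()
    print(parse_spoken(said, {"cat", "tee", "txt2bin"}))
```

The result is a list of argv lists rather than a shell string, so nothing needs quoting before it is handed to something like `subprocess`. The ambiguity I mentioned is visible right away: a file literally named “pipe” or “slash” cannot be spoken in this scheme, so that is exactly where falling back to the keyboard or to pointing would come in.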
What if we could have something like that instead? In my experience at least, current voice assistants are barely useful at all. Why try to merely meet that quality, instead of going beyond it? It wouldn’t really be able to be VR/AR on the Librem 5, but even on a flat interface… surely it would be nice to be able to say a file path out loud, then pick out which file one wants when it isn’t as convenient to say it, all within the same interface, instead of having to rely on something like Google Assistant, which can’t find something at all most of the time? I don’t think we should aspire to match an interface that barely works for many tasks; rather we should aspire to surpass it, however that may be.
Edit: Just realised you said text-to-speech, not speech-to-text. …Is TTS even a problem on Linux? Espeak works fine.