FYI there already is an anonymous privacy-respecting European search engine:
And it’s a pretty good one at that too! Integrates nicely with FF
Startpage uses Google as its search engine under the hood.
Are you sure? I thought they used Bing (because in fact Google didn’t want to make a deal with them…)
Absolutely.
Ok, that would be good news - Google is still better than any other (unfortunately)
You can use the list of search engines I linked from my Wikiless instance above if you want to compare the currently available search engines, or if you want more backstory surrounding my usage of Startpage, you can read about it on the Purism community forums:
Currently I use my own Whoogle instance:
Nice!
Your Whoogle instance would give the same results as if it were a Google search?
I would not know, as I stopped using Google services directly a long time ago. What I do know is that my Whoogle instance fetches Google results and has not been rate limited since I deployed it.
… and I would hope that when the project as per this topic is complete, Startpage changes over to using the fruits of this project.
Here is another quote from the latter article:
Even if an independent European privacy-first search engine is realized and Startpage migrates over to this “solution”, Big Tech can and will use technical and legal measures to prevent gratis indexing of their proprietary platforms, effectively locking projects out of usable search results unless they pay for the “privilege”.
Up to a point that is reasonable. If the information is only accessible by logging in and even then only accessible for personal use then it may be reasonable to prevent crawling or scraping. In any case, a polite web crawler should respect the “robot directives” - and a company in this position should be issuing same.
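To illustrate what respecting the "robot directives" can look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `example.org` URLs and the `can_crawl` helper are all made up for illustration; a real crawler would fetch the site's actual robots.txt before requesting any page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/articles/1",
        "https://example.org/private/profile",
    ):
        verdict = "allowed" if can_crawl(ROBOTS_TXT, "ExampleBot", url) else "disallowed"
        print(f"{url}: {verdict}")
```

Note that this only covers the crawler's side of the bargain: robots.txt is advisory, so the site still has to publish the rules it wants honored.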
Where this would get very dodgy is if, say, a Big Tech company offers search and offers web site hosting - and they allow their crawler to crawl those web sites but they prevent any other crawlers from crawling those web sites, or by default they prevent that.
I have no problem with a social media site outright preventing all web crawling.
Then there are the free, public web sites that want to prevent scraping in order to monetise their content - but are not averse to being indexed. (That can be handled by the site owner indexing its own content and then handing the index to the crawler.)
You are right though that there are many legal stoushes and many regulatory stoushes between where we are now and any kind of comprehensive European “sovereign” search / indexing capability.
Startpage used to have a “co-company” called Ixquick which was a metasearch engine (pulling results from 14 different other engines), which I greatly preferred, but sadly it was shut down in 2016 in favor of the Google-only Startpage.
(The pull of dark forces is strong, I guess. But, actually, it may have been about reducing expenditures.)
Before Whoogle, I deployed SearXNG and LibreY, but they were highly susceptible to rate limiting, so fetching search results was unreliable. By default, SearXNG is configured to aggregate Brave, DuckDuckGo, Google, and Qwant, while LibreY is configured to switch between scraping DuckDuckGo and Google. The only one I have not deployed yet is 4get, which by default is configured with DuckDuckGo.
Literally “scrapping,” or “scraping,” rather?
Edit: I wasn’t sure which action you were referring to, as either of those would be possible in this case.
My point is that privacy-focused metasearch engines have various issues of their own other than financial cost. After deploying LibreY and SearXNG, rate limiting ultimately forced me to explore other options, but I also had to consider which default search engines to configure, how accessible the user experience is in Tor Browser with the Security Level slider set to Safest, in-built integration with other privacy front-ends (URL rewrites), how well maintained the software repository was, etc. Reducing administration complexity allowed me to focus on allocating resources towards other projects.
I wonder how this compares to Mojeek, which already exists:
Maybe Ecosia and Qwant are bigger actors, who can create something bigger? I think Mojeek’s index is still quite small compared to the big ones.
Mojeek’s index is over six billion pages as of October 2022, but it is also closed-source with highly variable search results, contrary to their claim and based on experience:
Most of their other claims are accurate to various degrees.
Maybe this is related: https://news.infomaniak.com/en/ethical-artificial-intelligence/ It is not for search, since they are targeting personal productivity, but it does focus on privacy.
Infomaniak’s AI implementation is based on sorting information from their own email and cloud service:
@FranklyFlawless I have looked a bit more into your Whoogle Search instance and I have 2 comments:
Comparing it with Startpage search results, I found some differences, especially in the ranking positions (but fewer in the results themselves). I wonder why, since you mentioned that Startpage was itself using Google under the hood. Both engines should then be similar in ranking as well, shouldn't they? Maybe this could be explained if Startpage is also mixing in results from other sources, like a metasearch engine (which its predecessor Ixquick was).
Looking at the FF network console to see what it loads, I found only 2 JS scripts:
Anyway thanks for this useful URL
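On the ranking comparison above: one rough way to put numbers on "same results, different positions" is to measure how many URLs two result lists share and how far the shared ones move. This is only a sketch; the `compare_rankings` helper and the result lists are invented for illustration and are not real output from Whoogle or Startpage.

```python
def compare_rankings(results_a: list[str], results_b: list[str]) -> dict:
    """Compare two ordered result lists (assumes no duplicate URLs within a list)."""
    pos_a = {url: i for i, url in enumerate(results_a)}
    pos_b = {url: i for i, url in enumerate(results_b)}

    shared = [url for url in results_a if url in pos_b]
    union = set(results_a) | set(results_b)

    # Jaccard overlap: shared URLs divided by all distinct URLs seen.
    overlap = len(shared) / len(union) if union else 1.0

    # Positive shift means the URL ranks lower (further down) in list B.
    shifts = {url: pos_b[url] - pos_a[url] for url in shared}
    mean_abs_shift = (
        sum(abs(s) for s in shifts.values()) / len(shifts) if shifts else 0.0
    )
    return {"overlap": overlap, "shifts": shifts, "mean_abs_shift": mean_abs_shift}


if __name__ == "__main__":
    # Made-up result lists, for illustration only.
    engine_a = ["https://a.example", "https://b.example", "https://c.example", "https://d.example"]
    engine_b = ["https://b.example", "https://a.example", "https://c.example", "https://e.example"]
    print(compare_rankings(engine_a, engine_b))
```

A high overlap with a non-zero mean shift would match what was observed: mostly the same results, ordered differently.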