DuckDuckGo censorship

Well, yes, but what do the hardware requirements look like for an index of the entire public web? I expect the only way it could be done “open” is if it is a distributed system.

Open Search Foundation said:

The first steps have already been taken. Together with experts from European computer centres and research institutions, we are promoting decentralised indexing experiments and the development of advanced concepts.

Sounds like there’s still a long way to go, but with cooperation between many companies and institutes there’s a path to decentralization with combined, powerful hardware. I guess that’s the only realistic way to meet such hardware requirements.

2 Likes

I’m using Swisscows and haven’t had any problems so far.

Speaking of “censorship,” Google is the king, as anyone who uses Google search directly only sees what Google decides to show.

3 Likes

I’m getting the feeling there isn’t a viable alternative search engine that doesn’t censor results? Eventually all we see will be completely controlled by some entity (or small handful of entities) using methods we don’t really understand to push an opinion or agenda that goes against true freedom of information?

The end user should be the only one deciding what is, or is not, misinformation, or what is, or is not, relevant information. I don’t think anyone here wants someone else deciding that for them, but I could be wrong.

2 Likes

Maybe a metasearch engine like searx? At least it pulls results from multiple sources.

EDIT: A nice option that searx makes available: Neocities / Random SearX Redirector

Another option: Make your own searX instance.
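
If anyone wants to script against their own instance, here’s a minimal sketch of querying SearX/SearXNG over its JSON API. The instance URL is just a placeholder for your own deployment, and it assumes the instance has the `json` output format enabled in its settings:

```python
# Minimal sketch: query a self-hosted SearX/SearXNG instance over its JSON API.
# Assumes the "json" output format is enabled in the instance's settings;
# the instance URL below is a placeholder for your own deployment.
import requests

INSTANCE = "https://searx.example.org"  # hypothetical self-hosted instance

def search(query: str, pages: int = 1) -> list[dict]:
    results = []
    for page in range(1, pages + 1):
        resp = requests.get(
            f"{INSTANCE}/search",
            params={"q": query, "format": "json", "pageno": page},
            timeout=10,
        )
        resp.raise_for_status()
        results.extend(resp.json().get("results", []))
    return results

if __name__ == "__main__":
    for r in search("open search index")[:10]:
        print(r.get("title"), "-", r.get("url"))
```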

3 Likes

I am very anti-censorship, but it is difficult to imagine any search engine without it, decentralized and open or otherwise. Curating information is inherent to the function of a search engine. The “censorship” might be ideologically driven to a greater or lesser degree, but I’m not sure a search engine without it is desirable. If someone is maliciously creating thousands of impostor web pages to “hide” a legitimate page, shouldn’t those pages be “censored”?

3 Likes

If “censorship” can be based on fact, then yes. It seems to me it is almost universally based on opinion.

1 Like

Metager lets you define user filters. I don’t let it return hits from Microsoft sites, so I can counter Bing a little.

4 Likes

The most insulting thing I read about the DuckDuckGo disappointment was something one of their leadership said… they claimed it wasn’t really censorship, just “search rankings”, as if that isn’t effectively the same thing when they rank a site artificially low so that it never gets seen.

How stupid do they think we are?

I tend to use Presearch now. Can be a bit slow at times but I like the tech.

5 Likes

It could be argued that they should be “censored” on the client side, so that it is 100% within your control as to what you see and what you don’t see. That is, no pages are ever censored in the index, and no pages are ever censored in the list of pages (URLs) that the server provides to the client.

That of course implies that there is a client. However if we are imagining a new distributed open search implementation then we can also imagine an open source client for it.
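
To make that concrete, here’s a rough illustration (purely hypothetical, not any existing client) of what “no censorship in the index or on the server, all filtering on the client” could look like: the server returns every matching URL, and the user’s own locally stored rules decide what is removed or down-ranked:

```python
# Illustration only: the index server returns every match; the user's own
# rules, stored locally, decide what gets dropped or moved down the list.
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    score: float  # relevance score as returned by the (hypothetical) index

# User-controlled rules, editable locally -- never applied on the server.
BLOCKLIST = {"tracker.example"}          # domains the user never wants to see
DOWNRANK = {"contentfarm.example": 0.2}  # domain -> multiplier on the score

def apply_user_rules(results: list[Result]) -> list[Result]:
    kept = []
    for r in results:
        domain = r.url.split("/")[2]
        if domain in BLOCKLIST:
            continue                      # removal: chosen by the user, not the index
        factor = DOWNRANK.get(domain, 1.0)
        kept.append(Result(r.url, r.score * factor))  # down-ranking: still present
    return sorted(kept, key=lambda r: r.score, reverse=True)
```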

2 Likes

I mean, it’s pretty easy to imagine, or even remember if you’re old enough… just go back to the time when search engines indexed every page of every site, when the first hundred or so results for a search were different pages of the same site, and when results were ranked in the order they came back from the index, not by relevance.

I’d be curious how many people actually want to go back to the good old days of paging through results tens of pages at a time, hoping you didn’t skip what you were looking for, and how many just don’t understand what they’re asking for when they say “no censorship of any kind” because they didn’t live through it.

3 Likes

But what if you want to find such fake pages … maybe for research purposes or whatever? Of course a default search engine shouldn’t show them. However, being able to look for unusual stuff is sometimes a legitimate requirement. That’s why I’m advocating so much for an open search index. Once we have such a thing, we can use many search engines for different purposes: daily usage (don’t display fake pages), results from only small sites (excluding the big known ones), a fake-page machine, and many other things.

The difference between censorship and a manipulated ranking order (for example, to filter malicious fake pages) is whether there are alternatives that let you choose whatever you want. But nowadays there’s no real alternative, since everything is Google, Bing, Yandex etc. and engines built on top of them. That comes at least close to censorship, even if it’s usually no problem for us.

I mean, in the case of DuckDuckGo’s change, those results would just be further down the list, not gone.

Oh, so in your view what DDG is doing is different from censorship.

Looks like the topic has circled back. DuckDuckGo censorship

Rather than argue about what “censorship” means … as implied above, how about we stick to the terms “removal” and “down-ranking”? They are unarguably different, mutually exclusive, and fairly clear.

Today every engine is doing some kind of censorship, since all are built on top of closed indexes. We have no alternative. That’s the important part: having alternatives.

We don’t need one search engine for every use case. I also use a Wikipedia search engine installed in my browser to find pages on Wikipedia (in a specific language) without searching the whole web. And in the future, with an open index, it could be interesting to have more than just one or two search engines for the whole web, because they could prefilter things. But in that future we would also have alternatives with contrary filters.

And specifically about DDG nowadays: it is one of the few privacy-oriented search engines. It shouldn’t filter more than its index already does. What if the other privacy-oriented ones also decide to filter “disinformation”? How am I supposed to research disinformation on the web if everything is already filtered? With a huge market of search engines based on an open index, such filtering would be okay, but we don’t have such a market right now.

And another thing I haven’t said before: changing such things at runtime is always a bad idea and can be called censorship, even if a huge market of other engines existed. Users trust that the concept will stay the same and don’t want to check every month whether something has changed. What would you do if DDG dropped the privacy part? You might read about it in a blog post or article, or you might never see such an article and keep using it, trusting that your privacy is safe. They could add an easily accessible user setting, but they shouldn’t change the concept as a whole. There are always users who don’t know about the current concept change.

Again, in short, two things are needed for non-censored search:

  1. a market of alternative engines based on an open index
  2. each engine creating its own filter concept (which has to be public) and never changing it.

Sure there is: hosting and ranking yourself: https://yacy.net.
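
For anyone curious what querying your own peer looks like, here’s a rough sketch against YaCy’s JSON search interface. It assumes the default port (8090) and the yacysearch.json endpoint, and the response layout can vary between versions, so treat it as a starting point rather than gospel:

```python
# Rough sketch: query a locally running YaCy peer over its JSON search API.
# Assumes YaCy's default port (8090) and the yacysearch.json endpoint;
# the response layout may differ between YaCy versions, so verify against
# your own instance before relying on it.
import requests

YACY = "http://localhost:8090"  # your own YaCy peer

def yacy_search(query: str, count: int = 10) -> list[dict]:
    resp = requests.get(
        f"{YACY}/yacysearch.json",
        params={"query": query, "maximumRecords": count},
        timeout=30,
    )
    resp.raise_for_status()
    channels = resp.json().get("channels", [])
    return channels[0].get("items", []) if channels else []

if __name__ == "__main__":
    for item in yacy_search("open search index"):
        print(item.get("title"), "-", item.get("link"))
```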

3 Likes

… provided that you have enough storage and enough internet bandwidth.

I completely agree that operating your own web search service will be perfect from the point of view of how the search operates. The same is true of running your own cloud or your own server for anything else on the internet. You can always trust yourself. You can never be sure that you can trust anyone else.

Anyone going to hazard a guess for the total storage requirement? I’m pretty sure that my VPSs, even combined, would fall well short. :sweat_smile:

That’s why I was looking at a distributed index but with client-side ranking (P2P mode in your link?).

Probably not that much; a handful of TB is likely excessive (granted, a TB of text is a lot of text), unless you’re trying to have a database of images for reverse image lookup etc.

The limiting factors I would be worried about are RAM and CPU for search performance. Retrieving all of that from disk for each search would be slow.

1 Like

Yes, I think Google keeps the index in memory, i.e. the TB are distributed across enough of their servers that the same number of TB of RAM is available. So (hypothetically) 4 TB of disk storage would mean 64 servers with 64 GB of RAM each, or equivalent. (While 64 GB is fairly light for an industrial-strength server, it is more realistic for a bunch of randoms trying to create a distributed index.) All in rough terms, of course.

That doesn’t take into account a gaggle of front-end servers for handling the presentation and client communication.
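
For what it’s worth, the back-of-the-envelope maths can be written down; the index size below is purely hypothetical, and the point is only the shape of the calculation:

```python
# Back-of-the-envelope sizing for holding a distributed index in RAM.
# The index size is a placeholder; only the shape of the calculation matters.
import math

index_size_gb = 4 * 1024   # hypothetical 4 TB index
ram_per_server_gb = 64     # modest "bunch of randoms" hardware
replication = 1            # no redundancy; a real deployment would want > 1

servers = math.ceil(index_size_gb * replication / ram_per_server_gb)
print(f"{servers} servers of {ram_per_server_gb} GB RAM "
      f"to hold a {index_size_gb / 1024:.0f} TB index in memory")
# -> 64 servers of 64 GB RAM to hold a 4 TB index in memory
```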