The Power of (Getting Sabotaged By) PureOS Defaults

So I use a Librem 5. On my Librem 5 the default browser is GNOME Web, so I use it sometimes. In GNOME Web the default search engine is DuckDuckGo. And if I run a whois on DuckDuckGo’s IP address, it comes back as Microsoft.

Today, when I was in a hurry and had 10 minutes to catch a train, I opened the search and it defaulted to DuckDuckGo. I searched for the train page, but when I clicked on the link, nothing happened. This is because DuckDuckGo hyperlinks all point back to DuckDuckGo itself before forwarding to the destination, so that they can do creepy bean counting of URL clicks, like Google. But unlike Google’s, their redirect server has been down for me for several weeks, so none of these “redirect” links work. So even though the search returns the results I want, they are inaccessible - the results page is effectively a lie from DuckDuckGo, as a result of it being made too complicated.

So then I have to remember to type a different search engine into the URL bar and search some other way. Ultimately I should change the default, so that a scammy, Microsoft-hosted company like DuckDuckGo can’t do this to me, and use a search provider that doesn’t need to stalk its users and therefore can’t fail in this way.

But if Purism always used to like to talk about “the power of defaults,” then why is the default this non-working trash? It literally doesn’t work. Every search result hyperlinks to a DuckDuckGo server failure, and there isn’t even an option to copy the original destination URL - they literally only provide the DuckDuckGo tracking-and-redirect URL, which then leads to a DuckDuckGo server failure page. It’s been this way for at least a month; I just hadn’t bothered to change my default search. But it’s a stupid problem to have.
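In the meantime, the destination can be dug out of the redirect link itself. The dead links I see appear to follow the form https://duckduckgo.com/l/?uddg=<URL-encoded destination> - I’m inferring that from the links themselves, not from anything DuckDuckGo documents - and assuming that form holds, a few lines of Python recover the real destination:

```python
from urllib.parse import urlparse, parse_qs

def real_destination(link: str) -> str:
    """Return the decoded destination, or the input unchanged if the
    link is not a DuckDuckGo redirect."""
    parsed = urlparse(link)
    if parsed.hostname and parsed.hostname.endswith("duckduckgo.com") \
            and parsed.path == "/l/":
        uddg = parse_qs(parsed.query).get("uddg")
        if uddg:
            return uddg[0]  # parse_qs has already URL-decoded the value
    return link

print(real_destination(
    "https://duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Ftrains"))
# -> https://example.com/trains
```

Not that a default search engine should ever make me do this.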

5 Likes

There is a documented purpose for this redirect. It is to suppress the default HTTP Referer header, which tells the destination web site where you came from.

It is not for bean counting. According to their docs (above), they do this on newer browsers using a referrer policy. However, Epiphany lacks support for that, AFAIK, causing DDG to fall back to the redirect method (at least in Byzantium and Crimson; I have not checked the latest upstream).
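For illustration, the modern mechanism is just a response header: a page served with Referrer-Policy: no-referrer makes supporting browsers omit the Referer header on outbound clicks, with no redirect hop needed. A toy sketch (this is not DDG’s actual code, just the technique):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b'<a href="https://example.com/">outbound result link</a>'

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Browsers that honour this header omit the Referer entirely
        # when the link below is clicked, so no redirect hop is needed.
        self.send_header("Referrer-Policy", "no-referrer")
        self.end_headers()
        self.wfile.write(PAGE)

HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

A browser without referrer-policy support simply ignores the header, which is why DDG falls back to the redirect for Epiphany.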

We’ve backported a fix for this to Crimson as we work toward releasing Crimson for the Librem 5, but unfortunately it just disables the redirect.

I don’t know why DDG has allowed the redirect method to break. Maybe they don’t realize there are browsers in use that use this path.

8 Likes

Shouldn’t this be the responsibility of the client? I don’t want DuckDuckGo to always be able to track which links I have clicked on. The problem with the Referer header can be solved by the client simply not sending that header.

Wouldn’t the redirect solution be imperfect anyway? It would still allow sites to tell that I am coming from DuckDuckGo via the Referer header. However, using the client browser to disable the Referer header completely would not leak that information.

From the Wikipedia article that you linked, it sounds like browsers generally only send the domain name in the Referer header anyway (that is, not the search terms in the query parameters that the DuckDuckGo documentation claims to be protecting). So, if I understand correctly, in almost all cases the DuckDuckGo redirect hides… absolutely nothing. It almost makes it look like there are… other motives for those redirects :stuck_out_tongue: (e.g., tracking which links are clicked on).

@jonathan.hall, I appreciate your comment, as it inspired me to turn off network.http.sendRefererHeader on the about:config page in my browser. Thanks for the tip!

5 Likes

Sure! Feel free to check out what the latest upstream Epiphany does and send improvements :wink: Of course we wish to do more on this front ourselves as well, but for now we need to focus on getting Crimson released and into our customers’ hands, so that everyone has a more recent base.

Sure! But if you’re a service operator (DDG) and your goal is to improve privacy for your users, it is beneficial to do what you can for existing browsers while also seeking improvements in those browsers. There’s always a long tail of browsers (or any software) in the field that don’t have the latest improvements, as we’re seeing here with Epiphany.

Yeah. Mozilla makes it sound like it could basically be anything. I have not read through the RFCs and surveyed browser behavior, etc., but it seems plausible that browsers sort of did whatever they wanted and behavior has probably changed over time. This header is from the 1990s if not earlier.

So maybe there were still browsers sending the full query string at the time, or maybe DDG just wasn’t sure about all browsers out there.

I can’t say for sure what thoughts exist in the heads of DDG’s team, but I would think:

  • If this were true, they’d probably use it everywhere. I don’t think they do this on Firefox / Chrom*, which support referrer policies. Collecting tracking data only from the relatively small proportion of users on other browsers would be a strange choice.
  • If this were true, it probably would not be so completely broken and apparently forgotten.

I don’t know what DDG’s true motivations are, but I don’t think it makes sense to assume malevolence when evidence points elsewhere.

Glad to hear that, you’re welcome!

3 Likes

A well-written comment, as usual. Thanks!

The reason I assume malevolence is that I often see the tracking links when JavaScript is disabled, which is the only time that kind of redirect-based tracking would be needed (with JavaScript enabled, clicks can be tracked in script instead).

2 Likes

This won’t work for sites that disallow deep-linking. So if there’s a reference from a URL on domain d to another URL on domain d, it is prudent to send the Referer header. You’re not really leaking something that the site didn’t already know anyway and if the site disallows deep-linking then you won’t have a problem.

Hopefully that config parameter that you changed encompasses that logic.

I can confirm that current Firefox (on Linux or iOS) sends the full query string in the Referer header. However, that was me looking at the logs of a local web server, and I don’t know whether Firefox might alter its behaviour, for privacy reasons, depending on whether the server is local. (I think it sends the full query string even when the server is not local.) I also don’t know which of a thousand Firefox settings might influence the behaviour.
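If you want to check your own browser’s behaviour rather than take my word for it, a throwaway local server that prints the Referer header on each request is enough - the same kind of check I did via my server logs. A minimal sketch (the /next path and the markup are made up for the demo):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Print whatever Referer header the browser chose to send.
        print("Referer:", self.headers.get("Referer", "(none)"))
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        # A self-link with a query string: clicking it shows whether
        # the full query string comes back in the Referer header.
        self.wfile.write(b'<a href="/next?q=secret+terms">click me</a>')

HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```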

POST is preferred for queries anyway, particularly those containing sensitive information.

It is supposed to be the accurate and full URL of the referring resource (per the RFC). For the avoidance of doubt, the full URL does not include the fragment (#whatever).

The problem is that the RFC was written pre-Surveillance Capitalism.

2 Likes

Sorry, this is lost on me. Would you mind explaining deep linking?

1 Like

A prototypical example from the early days of the web when internet bandwidth was much more limited … let’s say that someone has an online photo library. The photos are expected to be referenced from web pages on that web site. Now let’s say that I want to use one of those images on my own web site. I can

a) access the online photo library web page, find the image, download it, upload it to my own web site - this may well be a copyright violation

OR

b) just reference the image on the online photo library web site from my own web site - this is deep linking - and it may or may not be a copyright violation. This of course puts load on their web server every time someone visits my web site, hence it is a bit impolite.

Countless court cases resulted. It is all covered, and explained in more detail, here:
Deep linking - Wikipedia

Those web sites that disallow deep linking will require that a reference to the ancillary content of an HTML page (typically scripts, stylesheets, images and other media - mostly media) be made only from an HTML page on the ‘same’ web site … which requires that the Referer header be sent.
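To make that concrete, here is a toy sketch of such a check. Real sites usually do this in the web server configuration rather than in application code, and the host name and paths here are made up:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse

OUR_HOST = "www.example.com"  # hypothetical host of the protected site

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve 'media' under /images/ only when the Referer is a page
        # on the 'same' web site.
        if self.path.startswith("/images/"):
            referer = self.headers.get("Referer", "")
            if urlparse(referer).hostname != OUR_HOST:
                self.send_error(403, "Hotlinking not permitted")
                return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"(image bytes would go here)\n")

HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Note that this check also refuses a request with no Referer header at all, which is exactly why suppressing the header completely can break such sites.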

2 Likes

I usually know this concept by the term “hotlinking” instead.

1 Like

Seems to be basically the same thing: Inline linking - Wikipedia

Either way, if you don’t send a Referer header, you run the risk of having your request dealt with unkindly by the web server if the web server is attempting to prevent deep/hot/inline linking.

1 Like

FPI. cPanel (on a LAMP stack) refers to it as “Hotlink Protection”, but I prefer @irvinewade’s “deep-linking” protection.

I remember the time-frame of “deep linking”: with the advent of frames, whole pages from one site would be linked into a frame on the borrower’s page.

Not much has changed, except that now we’re branded and treated like cattle.

Header-wise: why not an extension that provides a useless response, similar to providing fake fingerprints?

~s

2 Likes

I like this. Always indicate wikipedia.org as the referrer?

I have never experienced this. You’re saying that if I really like a website and write down its URL, then leave my computer and come back another time, the website might block my access if I type the URL into the browser’s address bar? In that case, my browser would have nothing to put in the Referer header, right?

1 Like

Yes, that too.

The thing that has changed is that the internet has been largely turned into a mechanism for Surveillance Capitalism.

You can’t provide a random or even an empty Referer header to a site that is demanding to see its own domain name in a URL in that header. However it may be OK always to provide a Referer header that is the home page (http(s)://domain/) of the resource that you are requesting.

Any such extension would need exhaustive testing with a range of web sites, those that care a little, those that care a lot and those that don’t care.

I think my main goal if implementing this for myself, as a compromise between reliability and privacy, would be the following (sketched in code after the list):

  • when navigating within a domain (or indeed within a set of domains under the same administrative control), send the legitimate Referer header
  • when navigating from one domain to another, do not send a Referer header (or, if the destination domain demands a Referer header, send its home page)
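A rough sketch of that strategy (a hypothetical helper, not code from any real browser; “same administrative control” is crudely approximated as “same host name”, where a real implementation would consult the Public Suffix List):

```python
from typing import Optional
from urllib.parse import urlparse

def choose_referer(from_url: str, to_url: str,
                   destination_demands_referer: bool = False) -> Optional[str]:
    src, dst = urlparse(from_url), urlparse(to_url)
    if src.hostname is not None and src.hostname == dst.hostname:
        return from_url  # within a domain: send the legitimate Referer
    if destination_demands_referer:
        # Cross-domain, but the destination insists: send its home page.
        return f"{dst.scheme}://{dst.netloc}/"
    return None  # cross-domain: send no Referer header at all

print(choose_referer("https://a.example/page", "https://a.example/other"))
print(choose_referer("https://a.example/page", "https://b.example/other"))
print(choose_referer("https://a.example/page", "https://b.example/other", True))
```
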
1 Like

Yes, exactly.

The RFC requires that if you type a URL into the address bar (or if you use a bookmark), the browser not send a Referer header. But a web server is entitled to reject such a request.

In other words: ‘bookmarks’ won’t work with such a web site (other than to bookmark the home page, presumably).

Note however that this behaviour is more common with ancillary resources of a web page (e.g. media on the web page such as images or videos) rather than the web page itself.

However, a web site that wants to force you always to visit the home page, and to search from there for the actual page you want to view, might apply the same behaviour to its web pages as well.

Because of the way search engines work, and in particular how they present their search results, any such web site’s pages cannot successfully be reached by customers who use a search engine. So a web site that literally forced you to visit the home page and search using the home page makes a big trade-off.

As the two Wikipedia articles explain, one of the motivations in avoiding deep linking is to force you to

a) spend more time on the web site, and
b) see more ads on the web site

i.e. share of eyeballs.

(If they allow deep linking then they get the burden of serving out the content but you get to see the ads on someone else’s web site, who has deep linked to their content, and hence they miss out on the ad serving / clicking revenue.)

A more ‘legitimate’ reason could be that, on a given web site, URLs resulting from searches are completely dynamic - so there is no point in bookmarking the URL (and that is also discussed in Wikipedia).

1 Like

Hmm. So, when I end up making my own browser and realize I shouldn’t trust what is out there, my browser should always tell every page that the referrer to that page is itself, right?

Then, if someone makes a site that does a bunch of “deep linking” to other sites in a way that is illegal, and it only works in my browser, then if I get in trouble I will just tell the user, “Do not visit an illegal site.”

Would anything go wrong in that case?

Then I could go back to blaming DuckDuckGo for being nonfunctional abandonware if all their search results redirect to a server error or whatever.

1 Like

No one strategy will work for all sites. So if you were really making your own browser, you would allow configuration of which strategy to apply to which site, obviously with a default strategy for any site for which no strategy has been specified.
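For example, something as simple as a lookup table with a fallback would do. The host names and strategy names here are invented for the sketch; each strategy would map to a behaviour like the ones discussed above:

```python
SITE_STRATEGIES = {
    "fussy-cdn.example": "send_home_page",  # demands some Referer
    "tracker.example": "suppress",          # never tell it anything
}
DEFAULT_STRATEGY = "same_site_only"

def strategy_for(hostname: str) -> str:
    return SITE_STRATEGIES.get(hostname, DEFAULT_STRATEGY)

print(strategy_for("tracker.example"))  # -> suppress
print(strategy_for("anything.else"))    # -> same_site_only
```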

That would be interesting. That is actually different from anything I suggested above.

See post 13 above for what I said I would (hypothetically) implement if doing it for myself.

Any strategy will typically involve trade-offs. For example, if you decide that privacy is paramount, you might just suppress the Referer header all the time. Some web sites won’t work, and you decide that the desirability of those web sites is not sufficient to compromise and refine your strategy.

Note also that the Wikipedia page suggests that for some web sites that refuse hotlinking, you will not get an error. Instead, you will just get a ‘rude’ substituted image or other media. That would make it more difficult to automate handling.

2 Likes

PS One situation in which you might shoot yourself in the foot is a web feedback mechanism. The intent of the mechanism is to give feedback that something is wrong on the page. The feedback mechanism could use the Referer to tell the web site which page you are giving feedback on. If your browser mangles the Referer header then your feedback won’t apply to the correct web page. So at a minimum, the user would need to include the URL on which feedback is being given within the text of the feedback (where that is an option).

Of course, a web site is entitled to implement the feedback mechanism differently. There are many other ways in which it could be implemented that don’t rely on the Referer header.

2 Likes