Defending against Traffic Correlation?

weirdnerd · February 26, 2021, 6:36pm

I take privacy and anonymity seriously, although I do sometimes sacrifice them for convenience.

One great option for high quality anonymity is using Tor and disabling Javascript completely. This should defeat many attacks on anonymity, but I still haven’t found a good defense against one such attack: traffic correlation.

At the very least, if my traffic travels via the same internet service provider (ISP) both before and after leaving the Tor network, then the ISP could potentially correlate my traffic on both sides, thus matching my home IP address with the sites I visit and de-anonymizing me.

Does anyone know of good tools or practices for defending against traffic correlation attacks?

Kyle_Rankin · February 26, 2021, 6:47pm

The three-node-minimum Tor routing onion is designed specifically to prevent those kinds of correlation attacks (and attacks where a particular Tor node is compromised). Your ISP only has visibility into the first Tor node you connect to, so it could not correlate that traffic with traffic exiting that Tor node nor any others.

Even if an ISP (or govt) did have the ability to monitor the traffic leaving every Tor exit node on earth, it would still not necessarily be able to correlate outbound traffic on an exit node with your incoming traffic, unless you were the only person in the world using Tor. That’s not just because of the volume of traffic through Tor globally, or the three-node onion approach itself, but also because to my understanding Tor batches traffic between nodes specifically to make that kind of traffic analysis correlation difficult, even though it might add some latency.

The middle Tor node (let’s call it Node B) exists to frustrate any attempt to correlate traffic entering the Tor network through Node A with traffic leaving via Node C, in case either is compromised. Your ISP may know that you are using Tor and connecting to Node A, but it has no way of correlating your traffic to Node A with traffic leaving Node C, because it can’t know that traffic between Node A and B is associated with you specifically (Node A is communicating with Nodes D, E and F at the same time), much less could it know that traffic between Node B and C is associated with you (because Node B is also communicating with D, E, F and other nodes on the network).
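To make the "who sees what" argument concrete, here is a toy sketch. It is not Tor's actual implementation: the node names come from the post above, and `hop_views` and `knows_both_ends` are made-up helpers. It only models which direct neighbors each relay talks to, and deliberately ignores timing analysis.

```python
def hop_views(path):
    """Map each relay to the (previous, next) hops it talks to directly."""
    return {path[i]: (path[i - 1], path[i + 1]) for i in range(1, len(path) - 1)}


def knows_both_ends(view, source, destination):
    """True if a single relay directly sees both the source and the destination."""
    return source in view and destination in view


# Hypothetical three-relay circuit, as described in the post above.
path = ["client", "Node A", "Node B", "Node C", "website"]
views = hop_views(path)

# Relays that could link client and website purely from their own connections.
linked = [relay for relay, view in views.items()
          if knows_both_ends(view, "client", "website")]

for relay, (prev_hop, next_hop) in views.items():
    print(f"{relay}: receives from {prev_hop}, forwards to {next_hop}")
print("Relays that see both ends:", linked)
```

In this model no single relay sees both ends. An observer who can watch both the client-to-Node-A link and the Node-C-to-website link is a different situation, and this sketch says nothing about it.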

weirdnerd · February 26, 2021, 8:30pm

I agree that Tor most likely provides adequate protection against traffic correlation, but still, I’m curious about solutions for those of us lacking in such healthy control over our paranoia (or, jokes aside, for people working against oppressive governments or in other situations requiring extreme care about anonymity).

After a bit more reading, I see that Tor does offer some minimal protection against traffic correlation, described here:

[tor-dev] Proposal: Padding for netflow record resolution reduction

However, I would not consider this adequate protection against de-anonymization in today’s world, where de-anonymization is becoming increasingly valuable, both for governments and for the profit of companies that sell people’s data. The document at the link above clarifies that this feature isn’t meant to protect against attacks by the ISP:

This defense does not assume fully adversarial behavior on the part of the upstream network administrator, as that administrator typically has no specific interest in trying to deanonymize Tor . . .

I think that as people’s data becomes more valuable, there is more and more reason to question this optimism.

Tor describes in their FAQ why they cannot use padding to provide comprehensive protection against traffic correlation:

Even if you could send full end-to-end padding between all users and all destinations all the time, you’re still vulnerable to active attacks that block the padding for a short time at one end and look for patterns later in the path.

Tor Project: FAQ

The distinction between these active attacks and passive attacks is described on the Tor Wikipedia page:

There are two methods of traffic-analysis attack, passive and active. In the passive traffic-analysis method, the attacker extracts features from the traffic of a specific flow on one side of the network and looks for those features on the other side of the network. In the active traffic-analysis method, the attacker alters the timings of the packets of a flow according to a specific pattern and looks for that pattern on the other side of the network; therefore, the attacker can link the flows in one side to the other side of the network and break the anonymity of it.

Tor (network) - Wikipedia
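As a rough illustration of the passive method, here is a minimal sketch. Everything in it is assumed for the sake of the example: the bursty traffic model, the delay and jitter values, and the helper names. It is not taken from any real attack tool. It counts packets per time window on each side and compares the two count series with a Pearson correlation.

```python
import random


def bin_counts(timestamps, window, duration):
    """Count packets falling into each time window of `window` seconds."""
    n_bins = int(duration / window)
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t / window)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def bursty_flow(rng, duration, n_bursts=20, burst_size=30):
    """Assumed traffic model: short bursts of packets at random times."""
    times = []
    for _ in range(n_bursts):
        start = rng.uniform(0, duration - 1)
        times.extend(start + rng.uniform(0, 0.5) for _ in range(burst_size))
    return sorted(times)


def through_network(rng, timestamps, base_delay=0.1, jitter=0.05):
    """Assumed network effect: a fixed delay plus a little random jitter."""
    return sorted(t + base_delay + rng.uniform(0, jitter) for t in timestamps)


rng = random.Random(1)
DURATION, WINDOW = 60.0, 1.0

entry = bursty_flow(rng, DURATION)        # seen between user and entry node
exit_same = through_network(rng, entry)   # same flow, seen after the exit node
exit_other = bursty_flow(rng, DURATION)   # an unrelated user's flow

entry_bins = bin_counts(entry, WINDOW, DURATION)
score_same = pearson(entry_bins, bin_counts(exit_same, WINDOW, DURATION))
score_other = pearson(entry_bins, bin_counts(exit_other, WINDOW, DURATION))

print(f"same flow: {score_same:.2f}  unrelated flow: {score_other:.2f}")
```

The matching flow should score much higher than the unrelated one. An active attacker would go one step further and inject a timing pattern of their own instead of relying on whatever burstiness the flow already has.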

I think protecting against active attacks would not be trivial at all, but such attacks becoming widespread does not seem so far-fetched to me. Companies already do all sorts of creepy things to defeat anonymity, including: all kinds of fingerprinting in the browser, completely blocking access from Tor, and intentionally breaking functionality or access when any sort of “adblock” is detected. I do not feel that a traffic correlation de-anonymization deal between ISPs and governments or data companies is so far-fetched.

One potential solution I am imagining is for a VPN to batch all traffic to and fro, between the VPN client and the VPN server, but I have never seen such a service advertised. It would certainly decimate performance, but for situations requiring extreme care for anonymity, this might be an acceptable trade-off?
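A minimal sketch of what I mean, under assumed values (the cell size, the tick count, and the `shape` function are all made up for illustration, and I don't know of a VPN that works this way): the tunnel sends exactly one fixed-size cell per tick, filled with real data when there is any and with padding otherwise.

```python
CELL_SIZE = 512  # bytes per cell on the wire (assumed value)


def shape(arrivals, n_ticks):
    """Simulate a constant-rate tunnel.

    `arrivals` maps a tick index to the number of real payload bytes queued
    at that tick. Returns (observed, delivered, backlog), where `observed`
    is what a wire observer sees: one (tick, size) entry per tick.
    """
    backlog = 0
    delivered = 0
    observed = []
    for tick in range(n_ticks):
        backlog += arrivals.get(tick, 0)
        real = min(backlog, CELL_SIZE)   # real bytes that fit in this cell
        backlog -= real
        delivered += real
        # The rest of the cell is padding, so the size on the wire is constant.
        observed.append((tick, CELL_SIZE))
    return observed, delivered, backlog


busy_wire, busy_delivered, busy_backlog = shape({0: 2000, 5: 100}, n_ticks=20)
idle_wire, idle_delivered, idle_backlog = shape({}, n_ticks=20)

print("identical on the wire:", busy_wire == idle_wire)
print("real bytes delivered:", busy_delivered, "of", 20 * CELL_SIZE, "sent")
```

A busy user and an idle user look the same to a passive observer, and the cost is obvious: here only 2100 of 10240 bytes sent are real data, and a large burst takes several ticks to drain. As the Tor FAQ quote above points out, this still would not stop an active attacker who blocks or delays the cells at one end.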

fsflover · February 26, 2021, 9:05pm

Consider using the Invisible Internet Project (I2P), which has stronger defenses against traffic correlation than Tor.

prolog · February 26, 2021, 9:48pm

A couple of years ago at the Chaos Communication Congress there was a talk given by Mr. Appelbaum. I think the title was "State of the Onion". It’s probably on media.ccc.de.

In that talk or during the Q&A, IIRC, he said traffic correlation could work in principle, but their monitoring of the Tor network did not show any indication that traffic correlation was going on. Better to watch it yourself. I don’t want to misquote him.

I don’t know if the situation has changed since then.

reC · March 1, 2021, 1:12am

instead of people ASSUMING that the ISP admin(s) are conspiring against us, we should ASSUME that the equipment THEY use themselves HAS been unwittingly (in N$A’s Clapper’s words) compromised at the lower levels (how many do you know who run libre kernels? are there ANY assurances that this is the case?)

last but not least … how many ISP admins do you know? is there a majority that know or care about the GNU / FSF? have they read Dr. R. M. Stallman’s books? what do they think? is it even relevant?

Dwaff · March 1, 2021, 3:46pm

Why not both?

prolog · March 3, 2021, 4:08pm

NSA and GCHQ seem to capture almost all internet traffic. That might be enough to do passive correlation, mightn’t it? That doesn’t mean that ISPs don’t play a role in this game.

reC · March 3, 2021, 8:05pm

they capture AND store EVERYTHING they can, yes, but only TARGETS are subject to scrutiny and further analysis (with or without a warrant). the main concern is that the system and data CAN be abused at some point, or by different people/organizations at a later date, long after the data was collected …

prolog · March 3, 2021, 8:21pm

They decide who is a target by their own metrics. IIRC they give you negative points for something like being registered on the Linux Journal forum.

reC · March 3, 2021, 8:27pm

i agree that being on a ‘list’ is not something that anyone should want, but there is a ‘spectrum’ of lists (or at least we hope so …)

that article is from 2014. we need more up-to-date articles written AFTER some of the post-Snowden-revelations ‘amendments’ …

kieran · March 4, 2021, 3:46am

Oops. I’m in trouble then.

It would be funny except it’s not.