Researchers Find 'Anonymized' Data Is Even Less Anonymous Than We Thought

reC · February 13, 2020, 10:27pm

all this “assuring” the public with no way to prove it ? yeah right …

tracy · February 14, 2020, 12:16am

It would stink if it were “anosmiyzed” data, but you wouldn’t know.

kieran · February 14, 2020, 12:59am

Groan.

reC · February 14, 2020, 1:05am

what do you mean by that ? … late hour … must sleep zzzz

tracy · February 14, 2020, 1:38am

Anosmia, lack of smell.

tracy · February 14, 2020, 1:40am

Which is also a double entendre on that other guy you’re replying to: Itsthesmell.

ruff · February 14, 2020, 7:42am

Pff, the article is rather FUDy: no details, pure emotions (we’re all gonna die!!111).
Anonymisation is one thing, a leak is another, and mapping is a third. We can bring up the old example of Google’s passive de-anonymisation, where they identify you as another person.
If anonymisation is done properly (e.g. tokenisation, where the whole identity on one side is hidden behind a unique token for each unique session), then there’s scarcely a way you can map it anywhere. You can pretend, build trends, show on a whiteboard how some points are approximating and collapsing, but that is not private data, just your speculation.
And if the token db is leaked, it’s a privacy breach.
And if anonymisation is not done properly, that’s fake anonymisation (e.g. using an identity-stable hash instead of a random unique token).
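
To make the distinction concrete, here is a minimal Python sketch of the two approaches described above. Everything in it is an assumption for illustration: the `SessionTokenizer` class, the `stable_hash_id` function, and the example email addresses are hypothetical and not taken from any real system.

```python
import hashlib
import secrets


def stable_hash_id(identity: str) -> str:
    """'Fake' anonymisation: the same identity always maps to the same value."""
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()


class SessionTokenizer:
    """Hypothetical tokeniser: one random token per (identity, session) pair."""

    def __init__(self) -> None:
        # token -> (identity, session_id); leaking this table is the privacy breach
        self._token_db = {}
        # (identity, session_id) -> token, so a session keeps a single token
        self._by_session = {}

    def token_for(self, identity: str, session_id: str) -> str:
        key = (identity, session_id)
        if key not in self._by_session:
            token = secrets.token_hex(16)  # random, not derived from the identity
            self._by_session[key] = token
            self._token_db[token] = key
        return self._by_session[key]


tokenizer = SessionTokenizer()

# A stable hash links every record of the same person together...
hash_a = stable_hash_id("alice@example.com")
hash_b = stable_hash_id("alice@example.com")

# ...and can be reversed by hashing a list of candidate identities.
candidates = ["bob@example.com", "alice@example.com"]
recovered = {stable_hash_id(c): c for c in candidates}.get(hash_a)

# Per-session tokens differ between sessions of the same person.
tok_1 = tokenizer.token_for("alice@example.com", "session-1")
tok_2 = tokenizer.token_for("alice@example.com", "session-2")

print("stable hash links sessions:", hash_a == hash_b)
print("identity recovered by guessing:", recovered)
print("session tokens link sessions:", tok_1 == tok_2)
```

The sketch only covers the identifier itself. It says nothing about whether the other fields stored next to the token can be used to single someone out.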

Siddy2408 · February 14, 2020, 9:03am

I am so shocked!!! /sarcasm

lipu · February 14, 2020, 9:33am

You’re right, of course, maybe in the “you can’t reverse engineer the comments back into machine code for the same reasons you can’t reverse engineer a hamburger into a cow” sense (for those who remember MS vs. Stac).

It has been shown a number of times that a relatively small number of a person’s “anonymized” data points are enough to identify them with decent probability.

Why ‘Anonymous’ Data Sometimes Isn’t (Wired)

“Anonymized” data really isn’t—and here’s why not (Ars Technica)

‘Anonymised’ data can never be totally anonymous, says study (The Guardian)

So while not every person in an anonymized data set may be identified with 100% certainty, enough of them will be - and that’s an issue even if the db mapping opaque tokens to real identities is not leaked.
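
A rough Python sketch of that point, assuming a made-up toy data set (the records, field names, and the `unique_fraction` helper are all invented for illustration and come from none of the linked studies): it counts how many records become unique once a few ordinary attributes are combined, even though the identifier is an opaque token.

```python
from collections import Counter

# Invented toy records: the token is opaque, the other fields are "harmless".
records = [
    {"token": "t1", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"token": "t2", "zip": "02139", "birth_year": 1985, "sex": "M"},
    {"token": "t3", "zip": "02139", "birth_year": 1990, "sex": "F"},
    {"token": "t4", "zip": "02140", "birth_year": 1985, "sex": "F"},
    {"token": "t5", "zip": "02140", "birth_year": 1990, "sex": "M"},
    {"token": "t6", "zip": "02140", "birth_year": 1990, "sex": "M"},
]


def unique_fraction(rows, keys):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


for keys in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
    print(keys, round(unique_fraction(records, keys), 2))
```

Anyone who already knows those attributes for a real person (from a voter roll, a social profile, and so on) can pick out a unique row without ever seeing the token db.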

This is why I dislike Google’s presence on most every web site (analytics, or otherwise). Even if those sites only share “anonymous” data, the pattern from all sites combined will generally be enough to identify people without their consent.

ruff · February 14, 2020, 11:10am

When you know the person, and know that the person’s data is in a limited data sample, you can of course map the data to the person.
E.g. if you know the people in your company, which has 5 employees, you could probably map proxy logs to your employees with high certainty without looking at identity details. However, you will hardly be able to tell who accessed a specific malicious URL without identity data, as people normally do not visit malicious URLs intentionally.
But this is forensics; it has different goals and means than privacy/surveillance.
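
The small-company case can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the host names, the `known_habits` and `proxy_logs` tables, and the `best_guess` helper are all hypothetical, and Jaccard similarity is just one simple way to compare sets of visited hosts.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets: |intersection| / |union| (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


# What you already know about each person's usual browsing (hypothetical).
known_habits = {
    "employee_A": {"news.example", "git.example", "mail.example"},
    "employee_B": {"shop.example", "video.example", "mail.example"},
}

# Proxy logs keyed by opaque token, with no identity details (hypothetical).
proxy_logs = {
    "token-17": {"git.example", "news.example", "docs.example"},
    "token-42": {"video.example", "shop.example"},
}


def best_guess(logs, habits):
    """For each token, pick the known person whose habits overlap most."""
    return {
        token: max(habits, key=lambda person: jaccard(hosts, habits[person]))
        for token, hosts in logs.items()
    }


guesses = best_guess(proxy_logs, known_habits)
print(guesses)
```

A one-off visit to a URL that matches nobody's usual habits scores the same against every employee, so this kind of matching cannot attribute it.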