The Anonymisation Problem - Computerphile

  • Published 4 Feb 2025

COMMENTS • 146

  • @trejkaz
    @trejkaz 7 years ago +13

    They tried to do an "anonymous" survey at work, but:
    1. They had a compulsory question where they asked for your team and manager.
    2. The email sent around had a unique ID on the link to the survey.

  • @PavelJanata
    @PavelJanata 7 years ago +170

    I remember hearing about an anonymous form in some computer science class.
    First question: Are you male or female?
    There was just one female in that class.

    • @anonymouse7074
      @anonymouse7074 7 years ago +4

      Lmao

    • @tacticallala8788
      @tacticallala8788 7 years ago +8

      WRONG, it was a XIR !!

    • @Lorkin32
      @Lorkin32 7 years ago +10

      this is equally sad and informing! Impressive!

    • @KuraIthys
      @KuraIthys 7 years ago +1

      Ouch. XD - I know that feeling... >__<
      But yeah... 'Anonymous' Sure. XD

    • @richardtickler8555
      @richardtickler8555 7 years ago +2

      We had to fill out an anonymous survey on our first day at uni. There were about 100 ppl in my course and they asked for birthday and home town.
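The survey stories in this thread all come down to the same check: how many respondents share each combination of answers? Below is a minimal Python sketch of that check. The rows, the field choice (birthday, home town) and the helper names are made up for illustration and come from none of the surveys described.

```python
from collections import Counter

# Hypothetical survey rows: (birthday, home_town). No names are stored.
responses = [
    ("03-14", "Leeds"),
    ("03-14", "York"),
    ("07-01", "Leeds"),
    ("07-01", "Leeds"),
    ("11-23", "Hull"),
]

def unique_rows(rows):
    """Return the rows whose combination of answers occurs exactly once."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]

def smallest_group(rows):
    """Size of the smallest group sharing identical answers (the k in k-anonymity)."""
    return min(Counter(rows).values())

singled_out = unique_rows(responses)
k = smallest_group(responses)
print(f"{len(singled_out)} of {len(responses)} rows are unique; k = {k}")
```

Any row that is unique on these fields can be tied back to a person by anyone who knows that person's birthday and home town, even though no name was collected.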

  • @fetchstixRHD
    @fetchstixRHD 7 years ago

    It’s beautiful watching these when they are topics that appear in my lectures(!)

  • @rentzepopoulos
    @rentzepopoulos 7 years ago +3

    For a few videos I feel that being allowed to like just once is not enough; this was one of them.

  • @puellanivis
    @puellanivis 7 years ago +9

    Even with “perfectly” anonymized aggregated data, later releases of that same data can cause individuals to be identifiable from the data if the populations have only slightly altered. Basically, it’s the same kind of math used in interferometers.
    So, let’s say you have “perfectly” anonymized aggregated data about the viewing habits from your whole street over a period of time. Knowing when people leave and enter the group, you can use interference patterns to extrapolate out individuals.
    But here’s one of the cool things that some people have figured out: you can add statistical noise to the data. If you have each individual data point randomly be a lie, at a certain level of aggregation, the statistical noise can be filtered out of the aggregate data, but the narrower you attempt to get, the more this statistical noise makes the data untrustworthy.
    So, let’s say that we have the video rental/watches, but each person has a random selection of ~25% of their watches replaced with a completely random selection from the available data. Even if you can deanonymize an individual into a single row, the data you have on their watching habits is fundamentally untrustworthy, because 25% of them are false. You can’t point to a movie and say, “look, they watched this movie!” Because that could have been a lie. But, aggregate the data together, to say 10,000 people, and now the data on most-watched movies is still clear, because the statistical noise can be filtered out.
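The noise idea in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue, the popularity weights, the population size and both helper functions are invented here, and the 25% replacement rate is the figure from the comment. Each watch is replaced by a uniformly random title with probability p, and the aggregate counts are then corrected using the expected effect of that noise.

```python
import random
from collections import Counter

def add_noise(watches, catalogue, p, rng):
    """Replace each watched title with a uniformly random catalogue title with probability p."""
    return [rng.choice(catalogue) if rng.random() < p else title for title in watches]

def estimate_true_counts(noisy_counts, total, catalogue, p):
    """Invert the expected effect of the noise on aggregate counts.

    E[noisy count] = (1 - p) * true count + p * total / len(catalogue)
    """
    background = p * total / len(catalogue)
    return {t: (noisy_counts.get(t, 0) - background) / (1 - p) for t in catalogue}

rng = random.Random(0)
catalogue = [f"film-{i}" for i in range(20)]
# Hypothetical population: 10,000 viewers with 5 watches each; film-0 is the clear favourite.
weights = [10] + [1] * 19
true_watches = [rng.choices(catalogue, weights=weights, k=5) for _ in range(10_000)]
noisy_watches = [add_noise(w, catalogue, 0.25, rng) for w in true_watches]

true_counts = Counter(t for w in true_watches for t in w)
noisy_counts = Counter(t for w in noisy_watches for t in w)
total = sum(noisy_counts.values())
estimates = estimate_true_counts(noisy_counts, total, catalogue, 0.25)
```

On a single row, any given title has a real chance of being a replacement, so it proves little about that viewer. Across the whole population the corrected counts stay close to the true ones.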

  • @igniculus_
    @igniculus_ 7 years ago

    I never miss a computerphile video

  • @Slarti
    @Slarti 7 years ago +1

    When I worked as a data analyst with medical data, there were certain rules, including that you could not anonymise any data for a patient who had a condition that was below a certain prevalence within their general geographical area. This was so that it was not possible to trace individuals through their conditions.
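One way to read the prevalence rule is as small-cell suppression: records are withheld when too few people in an area share a condition. The Python sketch below assumes that reading. The records, the area labels and the threshold of 3 are invented for illustration and are not the actual rules the commenter worked under.

```python
from collections import Counter

def suppress_rare(records, threshold):
    """Drop records whose (area, condition) pair occurs fewer than `threshold` times."""
    counts = Counter(records)
    return [r for r in records if counts[r] >= threshold]

# Hypothetical (area, condition) records; the threshold is an arbitrary example value.
records = (
    [("area-A", "asthma")] * 4
    + [("area-A", "rare-condition")]
    + [("area-B", "asthma")] * 3
)
released = suppress_rare(records, threshold=3)
```

The single rare-condition record is withheld because releasing it would point at one patient in that area.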

  • @scottbeard9603
    @scottbeard9603 7 years ago +2

    A video on the EU’s new General Data Protection Regulation would be incredibly interesting/useful!

  • @willemkossen
    @willemkossen 7 years ago +1

    Very good video. I'd love to chat a bit more about these topics with this man.

  • @sinefield5425
    @sinefield5425 7 years ago +34

    Yeah I see the problem with anonymisation. There's no use pixelating the guy's face if it's just going to show it right next to it in the thumbnail

  • @2Cerealbox
    @2Cerealbox 7 years ago +4

    I've been railing against this for years! Thank god someone more visible than me has something to say about it.

  • @TorgieMadison
    @TorgieMadison 7 years ago

    You should do a video on the courier / generation / uses of OTPs. There's a whole world of intrigue in how OTPs are both so simple, but so impenetrable to hacking.

  • @AnimeReference
    @AnimeReference 7 years ago +2

    How many post codes do you have? In Australia it is roughly one per suburb (sometimes one per two suburbs) so most of the students have the same post code as the school, otherwise will be from surrounding suburbs at most two away (with a decent population in each unless the school is tiny).

  • @asireprimad
    @asireprimad 9 months ago

    How about a follow up video on differential privacy and statistical disclosure control?

  • @rikwisselink-bijker
    @rikwisselink-bijker 7 years ago +4

    He must have been so proud of his son :)

  • @fkhg1
    @fkhg1 7 years ago

    Does anyone know if the person in the background at 9:45 caught his bus or missed it?

  • @ButzPunk
    @ButzPunk 7 years ago +6

    Never realised that UK postcodes say the street number. In Australia, the postcode just tells you the (group of) suburb(s) (or larger region, if you're in the country) that you live in. I can see why UK postcodes are so long now.

    • @BritishBeachcomber
      @BritishBeachcomber 7 years ago +1

      Bluelightzero or just one in my case. I have my own personal postcode!

    • @iAmTheSquidThing
      @iAmTheSquidThing 7 years ago

      I believe the first three characters are the region, and the last three characters are the street.

    • @BritishBeachcomber
      @BritishBeachcomber 7 years ago

      Each postcode consists of between two and four characters, followed by a space, followed by another three characters.
      The first set of characters is the outcode (sometimes known as the outward code), whilst the second set is the incode (sometimes known as the inward code).
      These are used to direct the mail first to a regional sorting office, then to the local destination.

    • @syphon47
      @syphon47 7 years ago +2

      The first portion, which is a letter (or 2), is called the Post Area (B = Birmingham, LE = Leicester). Including the numbers before the space gives the Post District, which is more granular. If you then include the first number after the space, you have the Post Sector, which is a small region of a few hundred streets (Post Sectors vary in size).
      Oh and postcodes also have an extra 2 characters at the end officially, called the delivery point suffix DPS which is basically identifying the letter box. Used for multiple residences within one house number I think
      It's all very fascinating... :-|
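The postcode structure described in this thread can be sketched as a small Python parser. It follows only what the comments say (outcode, space, three-character incode; area, district, sector) and does not validate against real postal rules. The function name, the dictionary keys and the example postcode are made up for illustration.

```python
import re

def split_postcode(postcode):
    """Split a UK-style postcode into the parts named in the comments above."""
    outcode, incode = postcode.upper().split()
    # The Post Area is the leading letter or letters of the outcode.
    area = re.match(r"[A-Z]+", outcode).group()
    return {
        "area": area,                        # e.g. LE = Leicester
        "district": outcode,                 # area plus the digits before the space
        "sector": f"{outcode} {incode[0]}",  # district plus the first digit after the space
        "full": f"{outcode} {incode}",
    }

# "LE1 9ZZ" is an invented example, not a known real address.
parts = split_postcode("le1 9zz")
```

Each step from area to full postcode narrows the set of households, which is why the full code is such a strong identifier.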

  • @BigDBrian
    @BigDBrian 7 years ago +5

    If I may humbly suggest you alter the title so it doesn't appear to suggest that anonymity is the problem, that would be great.
    After all, the video is about the opposite. Suggestions: The (re)identification problem; The deanonymisation problem.
    Just to be clear - it's a suggestion and not a demand.

    • @oktw6969
      @oktw6969 7 years ago

      So you suggest changing a title based purely on the form of political correctness? It is called that way because retaining anonymity on complex data structures becomes a problem.

    • @leftaroundabout
      @leftaroundabout 7 years ago +2

      In CS, the word “problem” does not have any negative connotation. E.g. the Travelling Salesman problem doesn't discuss how to get rid of the salesman, it discusses a goal the salesman is pursuing and _the problem she's experiencing_ in trying to get there.
      Likewise, the anonymisation problem is the research subject where we try to achieve anonymity. “The re-identification problem”, conversely, would be an internal video an intelligence agency might produce while trying to break that anonymity...

  • @ivarwind
    @ivarwind 7 years ago +1

    The problem with a bogus post code of course, is that given all the students fill out the form, the one with the bogus post code comes from the student whose post code is missing in the data.

  • @Super_Cool_Guy
    @Super_Cool_Guy 7 years ago +1

    That makes great sense!

  • @victornpb
    @victornpb 7 years ago +6

    If he's the only one that filled a bogus zip code, u can still identify him...

    • @grn1
      @grn1 3 years ago

      Not if he chose one of the codes with a lot of people in them.

  • @KaktitsMartins
    @KaktitsMartins 7 years ago +3

    "people tend to live somewhere"

  • @robertdanielpickard
    @robertdanielpickard 7 years ago

    Great topic!

  • @Flankymanga
    @Flankymanga 7 years ago

    This is exactly why I thought not twice but four times about what to fill in on the form when there was a national census in my country that was reported to be anonymous....

  • @linawhatevs8389
    @linawhatevs8389 7 years ago

    There IS completely bulletproof cryptography: the One Time Pad.
    Something as simple as limiting the output to something like 128 bits should be enough to remove any hope of deanonymizing a gigabyte-sized database.

  • @cpt_nordbart
    @cpt_nordbart 7 years ago

    What about decensoring? I've heard about cases where blacked-out names on some documents were reconstructable.

    • @SuviTuuliAllan
      @SuviTuuliAllan 7 years ago

      Start using white ink. Problem solved!

  • @SuviTuuliAllan
    @SuviTuuliAllan 7 years ago +1

    How about a video on CJDNS, Hyperboria, and all that other mesh nonsense? Or did you make a video like that already? Well, in any case, get to the details then.

  • @MrSonny6155
    @MrSonny6155 7 years ago

    When you realise that you can now watch computer nerds on Computerphile in 4K. Too bad my internet speed is a meme.

  • @elliot9507
    @elliot9507 7 years ago

    Please enable translation; the video seems very interesting, but unfortunately I can only understand about a third of what he says.

  • @BlenderDumbass
    @BlenderDumbass 7 years ago +1

    The point is you have to remove everything uniquely identifying and use a lot of false data to confuse any algorithm.

  • @KipIngram
    @KipIngram 10 months ago

    De-anonymizing someone in a situation where anonymity was clearly promised (like in the speaker's son's post code situation) should be a criminal offense with substantial jail time associated with it.

  • @DerkvanL
    @DerkvanL 7 years ago +1

    Your extra bits link is not available.

    • @Computerphile
      @Computerphile 7 years ago +1

      +DerkvanL thanks for the spot, should be there now >Sean

    • @DerkvanL
      @DerkvanL 7 years ago

      Computerphile thx, a very interesting topic! Watched it ;)

  • @jwenting
    @jwenting 7 years ago

    I've had more than a few "anonymous surveys" that were sent using personalised links... I tend to not answer such surveys, they're clearly not anonymous.

  • @tedchirvasiu
    @tedchirvasiu 7 years ago +8

    staticksticks

  • @justin_5631
    @justin_5631 7 years ago +2

    Just noticed this guy works outside a giant lego pyramid.

    • @oclipa
      @oclipa 7 years ago +2

      Justin _ Actually, all Computerphile videos are created in minecraft, but it is not usually this obvious.

    • @justin_5631
      @justin_5631 7 years ago +1

      I could correlate this anonymous video with the number of giant minecraft pyramids in the world to discover where the videos are being made.

  • @SuviTuuliAllan
    @SuviTuuliAllan 7 years ago

    So what was his son's name and shoe size?

  • @pnedkov
    @pnedkov 7 years ago +1

    If his son is the only person who filled in a bogus post code, he is busted. They can rule out the people who filled in their actual post code and can be identified. And let's not forget his father is a professor in that field. How do you protect yourself against that?

    • @oktw6969
      @oktw6969 7 years ago

      By not having your state intelligence agency run purely through negative selection.

  • @sciverzero8197
    @sciverzero8197 7 years ago

    I really wish google would let me have unlinked 'slightly less nonymous' accounts for things ... >.>

  • @froozynoobfan
    @froozynoobfan 7 years ago

    What if you scramble the row order of each column randomly and, of course, minimise/remove any sensitive personal data with cryptography (strong enough key)?

    • @iAmTheSquidThing
      @iAmTheSquidThing 7 years ago

      Then the data wouldn't be much use, because you wouldn't be able to find correlations between two different variables.
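The objection in the reply can be shown with a short Python sketch. The data is synthetic and the variable names are invented: two columns are related by construction, then one column is shuffled independently of the other. The set of values in each column is unchanged, but the link between them disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)
# Hypothetical data: shoe size depends on height plus a little noise.
heights = [rng.gauss(170, 10) for _ in range(2000)]
shoe_sizes = [0.25 * h + rng.gauss(0, 1) for h in heights]

before = pearson(heights, shoe_sizes)

# Scramble one column on its own: same values, different row order.
shuffled_sizes = shoe_sizes[:]
rng.shuffle(shuffled_sizes)
after = pearson(heights, shuffled_sizes)

print(f"correlation before: {before:.2f}, after shuffling: {after:.2f}")
```

Per-column statistics such as means and histograms survive the shuffle, so the scrambled table is only useful for questions that involve one column at a time.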

  • @PsychoticusRex
    @PsychoticusRex 7 years ago

    3 Cheers for someone up-talking OAS! XD

  • @IdgaradLyracant
    @IdgaradLyracant 7 years ago +6

    I did this stuff for nearly a decade, we called it behavioral heuristics in identifying people on the Internet. For example with VPNs, they are pointless, we want behaviors, not IP addresses. Staying anonymous on the Internet is nigh impossible now. Tor and VPNs aren't going to help at this point.

    • @tacticallala8788
      @tacticallala8788 7 years ago +4

      Are you saying you get the careful people too? The ones who, as an example, wouldn't add all the same YouTube channels.

  • @Interpause
    @Interpause 7 years ago

    YET I NEED ALL THE DATA I CAN GET

  • @Cambesa
    @Cambesa 7 years ago

    Would using AES-256-CBC and double encryption help anonymizing users? I'm thinking of ways to anonymise users in a database

    • @tacticallala8788
      @tacticallala8788 7 years ago +1

      I learned to salt and encrypt at least 1000 times for secret data like passwords but perhaps it should be done for everything, except searching the db would be a pain.

    • @johnfrancisdoe1563
      @johnfrancisdoe1563 7 years ago +2

      Cambesa You're not getting the point. This is about getting rid of the personal identity *permanently*. As in deleting it or not getting it in the first place. It's not about protecting the data you do keep.
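The "salt and repeat at least 1000 times" approach mentioned in this thread is usually called key stretching, and for passwords it is normally done with a hash-based function rather than encryption. Below is a minimal Python sketch using the standard library's PBKDF2. The iteration count, salt length and helper names are illustrative assumptions, not a recommendation. As the last reply points out, this protects a stored secret; it does not anonymise the rest of a record.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value; tune to your own hardware and policy

def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256 with a per-user random salt."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

salt, stored = hash_password("correct horse")
ok = verify_password("correct horse", salt, stored)
bad = verify_password("wrong guess", salt, stored)
```

Searching the database by such a value is awkward because each row has its own salt, which matches the complaint in the comment above.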

  • @Baigle1
    @Baigle1 7 years ago +20

    and Microsoft says that all their keylogging is anonymized lol

    • @tacticallala8788
      @tacticallala8788 7 years ago +4

      Windows 10 is NS/\ sbywear.

    • @Baigle1
      @Baigle1 7 years ago +3

      More like Redmond spyware. All that telemetry crap is now hidden away in the kernel. No escaping it, just use a different OS.

    • @Hudgi34
      @Hudgi34 7 years ago

      yeah just make your own OS

    • @tacticallala8788
      @tacticallala8788 7 years ago +1

      667Atlas Everything is logged, not only your keystrokes. When one day the NS/\ want to see if you're a danger they'll want to see all your traffic and all your diik pics.

    • @Baigle1
      @Baigle1 7 years ago +1

      Yes, keylogging. Otherwise "typing and handwriting data" by their terms. By default it is on, and it is one of nearly a hundred or more sources of telemetry data from Windows 10 machines.
      There are court cases going on that consider all data acquired by 3rd parties (Historical Phone Location Records in that case) to be witnesses to a crime, but the impact of your lack of privacy doesn't stop at criminal cases. It's very profitable to know as much about you as possible, and no database software is invulnerable.

  • @exponentmantissa5598
    @exponentmantissa5598 7 years ago

    Run TAILS all the time and use pseudonyms and aliases.

  • @judgesmicheal2096
    @judgesmicheal2096 7 years ago

    "Anonymization" is spelled with a "z", not an "s".

    • @Computerphile
      @Computerphile 7 years ago

      Depending where you come from.... >Sean

  • @motyakskellington7723
    @motyakskellington7723 7 years ago

    Post-quantum cryptography

  • @hattrickster33
    @hattrickster33 7 years ago

    Travel back in time and tell Turing you can crack Enigma in seconds =p

    • @voidvector
      @voidvector 7 years ago

      Just bring back a bag full of laptops w/ power adapters. Given the amount of basic spreadsheet calculations you can do on them (e.g. ballistics, crypto, linear/non-linear optimization, Monte Carlo sims), it would probably straight up win the war for whichever side gets it.

  • @KipIngram
    @KipIngram 10 months ago

    Actually Enigma isn't THAT easy to break. Not "your average notebook could do it in seconds" easy. It's certainly doable with modern tech, but not really tech that every Joe on the street has under his arm.

  • @grrr1351
    @grrr1351 7 years ago

    This is how FBI tracks people using bitcoin.

  • @marcgrec7814
    @marcgrec7814 7 years ago

    XD

  • @redhat7025
    @redhat7025 7 years ago +1

    NO software, IS UNBREAKABLE
    prison,
    government,
    or human

  • @barefeg
    @barefeg 7 years ago +19

    Meanwhile millennials throw their name, pictures, videos, locations, preferences, friend networks etc on the internet! Lol

    • @SuviTuuliAllan
      @SuviTuuliAllan 7 years ago

      My location is the one that aliens are trying to avoid. My preferences include mustard and flavoured ice cream. Would you like to download my DNA as well? It's available on opensnp.org. No, really. See if I have the gene to care. (obviously I do, otherwise I wouldn't be wearing this silly hat right now! my neighbours seem to like it tho since they keep staring at it when I come out of the sauna...)

  • @ckay11002
    @ckay11002 7 years ago +7

    Do androids dream of electric sheep?

  • @geoffhalsey2184
    @geoffhalsey2184 7 years ago

    Doesn't a VPN help?

  • @hihtitmamnan
    @hihtitmamnan 7 years ago

    This guy talks SO LOUD and then quiet and then LOUD again... it's so annoying!

  • @ObsaSiyo
    @ObsaSiyo 7 years ago

    Do you guys think the government will ever regulate the processing power of computers? Like guns?

    • @tacticallala8788
      @tacticallala8788 7 years ago

      As long as they can still take your money for it they'll have lots of excuses ready and the C|/\ will be ready to anonymously mock you for disagrreeing, you fukken tinphoil hat wearing rossian psicho trying to krrack your way into govemment computers.

    • @ObsaSiyo
      @ObsaSiyo 7 years ago

      Makes sense. So you are saying that as long as the government has access to everyone's computer, they will not need to regulate them. However, what about AI in the future? If it lowers the learning curve for doing damage to the government, will they regulate what is on the computer instead of what the computer can do?

    • @iAmTheSquidThing
      @iAmTheSquidThing 7 years ago

      I'm sure they'll try. Every organisation always pushes for more power.

    • @tacticallala8788
      @tacticallala8788 7 years ago

      They have already gone too far and there will always be someone crazy or greedy enough to take it to the next level.

  • @shubhamshinde3593
    @shubhamshinde3593 7 years ago +21

    he looks like bert from the big bang theory

    • @maxf130
      @maxf130 7 years ago +1

      Rock Show

  • @aveaoz
    @aveaoz 7 years ago +2

    FIRST xdddddddddddd

    • @mdkmen
      @mdkmen 7 years ago +6

      please stop

    • @fetchstixRHD
      @fetchstixRHD 7 years ago

      I think I’m starting to warm to “First” comments now 😂

    • @namewarvergeben
      @namewarvergeben 7 years ago

      If people "warm up to it", maybe that'll finally make it stop.

  • @Ludvigvanamadeus
    @Ludvigvanamadeus 7 years ago +8

    While it is true that any cypher can be broken given enough time, at a certain level it is not 'a-supercomputer-would-need-a-few-years-level' difficult; it becomes 'the-sun-will-burn-out-even-if-you-had-a-planet-sized-quantum-computer-level' difficult.

    • @lordcirth
      @lordcirth 7 years ago +5

      But only if you don't count side-channel attacks. That's how crypto really gets broken.

    • @RnBandCrunk
      @RnBandCrunk 7 years ago +1

      Igor Bednarski a planet sized quantum computer could easily solve all the cryptography known now in milliseconds

    • @masansr
      @masansr 7 years ago +2

      I could generate a million-character key in a moment; a supercomputer would need (possible characters)^10^6 actions to crack that. Let's say I only use English lowercase letters (although there is no reason to limit yourself like that). That's 26^1000000 actions, or roughly 2.23x10^1414973. It's estimated that there are 10^50 atoms on Earth. Let's say every atom could perform a calculation every 5x10^-44 seconds (Planck time). Earth would be a computer with a frequency of 2x10^93 calculations per second. That's roughly 10^1414880 seconds to crack the code, which is 10^1414780 times longer than the heat death of the Universe.
      Of course, you could get lucky and solve it in, let's say, the first 0.05% of guesses, but it would still be long past heat death.
      (Every calculation done by Wolfram Alpha)

    • @lordcirth
      @lordcirth 7 years ago +7

      masansr Yup. And the NSA would just exploit a bug in your browser, root your machine, and steal the key.

    • @masansr
      @masansr 7 years ago +1

      Well, if you had Windows 10, they could just ask Microsoft for access to the computer, no need to publicise another bug. That's the problem with such keys - since there is no algorithm, you have to have a copy of it somewhere. But they cannot be cracked.
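The brute-force estimate earlier in this thread can be checked with a few lines of Python by working in base-10 logarithms, since the numbers are far too large for ordinary floats. The inputs are the commenter's own assumptions (26 lowercase letters, a million-character key, 10^50 atoms, one operation per atom every 5x10^-44 seconds). The function names are made up for this sketch.

```python
import math

def log10_keyspace(alphabet_size, key_length):
    """log10 of the number of possible keys, alphabet_size ** key_length."""
    return key_length * math.log10(alphabet_size)

def log10_seconds_to_exhaust(log10_keys, log10_ops_per_second):
    """log10 of the time needed to try every key at the given rate."""
    return log10_keys - log10_ops_per_second

log_keys = log10_keyspace(26, 1_000_000)

atoms_on_earth = 1e50             # the commenter's estimate
ops_per_atom_per_second = 1 / 5e-44  # one operation per Planck time
log_rate = math.log10(atoms_on_earth) + math.log10(ops_per_atom_per_second)

log_seconds = log10_seconds_to_exhaust(log_keys, log_rate)
print(f"keys ~ 10^{log_keys:.3f}, rate ~ 10^{log_rate:.3f}/s, time ~ 10^{log_seconds:.3f} s")
```

Trying only a fraction of the keys, such as 0.05%, subtracts about 3.3 from the exponent, which makes no practical difference at this scale.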

  • @Ludvigvanamadeus
    @Ludvigvanamadeus 7 years ago +2

    While it is true that any cypher can be broken given enough time, at a certain level it is not 'a-supercomputer-would-need-a-few-years-level' difficult; it becomes 'the-sun-will-burn-out-even-if-you-had-a-planet-sized-quantum-computer-level' difficult.

    • @Yotanido
      @Yotanido 7 years ago

      Any cipher can be broken in an instant. You just need to guess right on your first guess.
      Insanely unlikely, yes, but still possible. There will never be an unbreakable cipher.
      The most secure cipher we have right now is the one time pad. The key length is equal to the message length, so there's no point in guessing the key - you can just guess the message. It's the best we'll ever have from a strictly information theory standpoint.
      Perhaps quantum cryptography will save us, but I don't know the first thing about it.
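The point about the one-time pad can be made concrete with a short Python sketch. The messages and function name are invented for illustration. The key is random and as long as the message, and for any other plaintext of the same length there is a key that turns the same ciphertext into it, so the ciphertext alone cannot tell an attacker which guess is right.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

message = b"MEET AT NOON"
key = secrets.token_bytes(len(message))  # used once, then discarded
ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)

# Any same-length plaintext is consistent with the ciphertext under some key.
decoy = b"STAY AT HOME"
decoy_key = xor_bytes(ciphertext, decoy)
decoy_decryption = xor_bytes(ciphertext, decoy_key)
```

This only holds if the key is truly random, kept secret and never reused, which is why distributing the pad is the hard part in practice.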