The cryptographic hash of a small file takes more than a trivial amount of time, and that time is constant for files smaller than the hash block size. Large files can take advantage of pipelining and amortize any required context switches. Hashing a gigabyte of data in one file will take much less time compared to hashing the same data divided into 2²⁸ four-byte long files.
Worth noting that TCP and UDP use a small non-cryptographic checksum. It's only 16-bits, not nearly as long as the one the animation showed. That means random collisions are far more likely (but still pretty rare), where random bitflips could pass the check, and since the checksum is part of the packet itself, it doesn't provide meaningful security from intentional changes by a "man in the middle". HTTPS provides end-to-end security that prevents that, but basic TCP and UDP don't.
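For the curious, here's a rough Python sketch of that 16-bit ones' complement "Internet checksum" (RFC 1071 style). Illustrative only: the real TCP/UDP checksum also covers a pseudo-header with the IP addresses, which this skips.
```python
# Rough sketch of the 16-bit ones' complement checksum used by TCP/UDP/IP
# headers (RFC 1071 style). Not the full protocol computation.
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                 # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add up 16-bit words
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF            # ones' complement of the folded sum

print(hex(internet_checksum(b"hello world")))
```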
Going by the thumbnail, here I thought this video was an incredibly speedy response to .zip top-level domains now being a thing, making phishing and tricking people into downloading malicious data stupidly easy.
Getting a hash for a large file while you are copying it only takes a trivial amount of time. You already have the file in memory; it's just a few extra instructions per byte. What takes time is getting a hash for a file you weren't going to read anyway, since the file I/O is where the time is spent.
Checksums are also how automatically de-duplicating filesystems for incremental backups work. Each file is stored not by its name, but by its content hash. The file metadata then just records the hash of the content, and any other file with the same content will point to the same physical extent. Tahoe-LAFS leverages this for distributed files between friends, and Freenet uses a similar process to shard and distribute files pseudo-anonymously across the entire Freenet network.
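In principle that looks something like this tiny sketch (real systems chunk files and track metadata separately; the "backup_store" path here is just a made-up placeholder):
```python
# Tiny sketch of content-addressed storage, the idea behind deduplicating
# backups: a file is stored under the hash of its content, so identical
# content only ever gets written once.
import hashlib, os, shutil

STORE = "backup_store"   # hypothetical destination directory

def store(path: str) -> str:
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.makedirs(STORE, exist_ok=True)
    dest = os.path.join(STORE, digest)
    if not os.path.exists(dest):      # already have this content? skip the copy
        shutil.copyfile(path, dest)
    return digest                     # callers record the hash, not the filename
```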
Best to use a long cryptographic hash for de-duplicating. A long CRC could work too. A simple sum is not really suitable for this, too much chance of collisions.
@@flameshana9 In my experience it mainly gets used to avoid re-backing up a file (or block of data) that hasn't changed since last time it was backed up. Not to avoid backing up a file that is a copy of another file also on the source media.
I had to download a program that would verify a copy/paste, since Windows doesn't do it for some reason. Sometimes a copy/paste gets corrupted and you don't know until you try to open the file later, and that could be real bad if you don't have a backup.
Don't wanna be that guy, but here are a few exceptions to the answer to the question in the title: hash collisions (weak algorithms), web cache poisoning, and request smuggling.
Why have I never thought to use checksums when copying files on my own computer and network? I usually just resorted to verifying that the total byte count was identical.
Checksums are important. People (or really just Apple sycophants) like to say ALAC is just as good as FLAC and it’s totally not a problem at all that Apple doesn’t let you use FLAC, because you can just convert between the two and lossless is lossless! Except that’s not the case at all because FLAC natively stores a checksum for every track and ALAC is entirely a-lacking in this regard.
One point you missed. Hashes aren't unique as you stated. That's why we can have collisions. After all, an infinite set of possible inputs can't map uniquely onto a finite set of hash values. Intentional collisions are currently hard to produce because the underlying math is believed to be hard, but accidental collisions aren't what these are trying to protect against anyway. Adding in details like the original file size can reduce collisions even further, but doesn't completely eliminate the possibility (speaking mathematically).
There was a forum I used to use that got confused and applied a scantily clad woman with the text SEND NUDES as my profile picture. I named my PFP the same as that one so the server was like, yeah these are the same.
But TCP already ensures a packet drop/corruption will raise an error, right? That means manually checking the full file isn't necessary? And if hackers want to tamper with the file, they can also easily change the checksum to the one calculated from the malicious file.
yes my son got sent adult videos after he tried to download homework questions, the internet is scary
Happens to me all the time. So much wasted data on my mobile plan...
Sounds more like someone got caught lol
Umm dude I think we need to talk about what really going on with your son.
What was it like? I mean.. uh... I'd like to know so I can protect myself from it.
@@qazhr depends on how old the son is 😂
Programmer here: Linus mentioned that the data goes through a CRYPTOGRAPHIC hash. While cryptographic hashes certainly can be used to verify file integrity, NON-cryptographic hash algorithms are more commonly used for that purpose since they are generally faster, while the cryptographic ones are generally reserved for, well, cryptography. (The difference: protection against active malicious manipulation vs. just plain transmission damage.)
What makes a hash function cryptographic (or not)? Basically, it's how hard it is for a bad actor to crack it (or produce a collision). MD5 was once considered a cryptographic hash and was popular among websites as a means of storing passwords, but it has since been deprecated and is now regarded as a non-cryptographic hash suited only for file integrity checks, after people started finding fast ways to crack MD5 hashes.
NERD alert!
@@techaddictdude you are literally watching ltt, we all are nerds here lol
dude MD5 is outdated and SHOULD NEVER be used for security related features anymore....
@@I3erow md5 is still fine to check if the file is not corrupted on transfer tho
@@I3erow It depends on how secure something should be.
I keep using md5 for securing image downloads.
It is fast and simple, and no one is going to waste time cracking an md5 hash to get access (remove watermark) to a $1 picture, it will cost more to do so than just buying the picture itself.
Actually Linus, as a CS Major it’s actually a miracle that information gets from point A to point B, fucking magic man, that’s why those low level devs always have a long beard, they’re magicians.
Forget Point A to Point B--the potential that a stray cosmic ray strikes your RAM and flips a bit at just the wrong time is why I'm super psyched that DDR5 has ECC built in.
Just… for your own information, don’t write a message to professional technologists and say “ as a CS Major..”
@@GSBarlev And God said, let this bit be switched: and the bit was switched. And God saw the bit, and it was fucking magic to them; and God divined low level devs into existence
@Gilad Barlev it does? That's awesome. Does it have any practical applications for the average user though?
@@JazGalaxy chill, it was just a joke my friendo.
Fun fact, every credit/debit card has a checksum built into the number so computers can quickly determine if users accidentally typoed their number in wrong when paying for things online. Many other numbers which humans are expected to enter manually usually are designed with these sorts of checks in mind like insurance numbers, IMEI numbers and even those lil survey codes you find on receipts
The keyword being "many". Many also _don't_ have them. In fact, some of the survey-codes on receipts are practically human-readable and you can edit them as you please to fill out multiple surveys. (Not that there's any point; all you're doing is wasting your time giving them market-research for free. I'm not convinced they ever give ANYONE the cash prizes they claim to. 😒)
the difference is the "checksum" on credit card numbers is quite primitive, google Luhn's Algorithm, it's just the numbers added, multiplied and moduloed, even still, it's good enough to catch common mistakes like wrong and swapped digits
I love that the word typoed is a typo.
Interestingly, Social Security Numbers specifically do not do this
Checksums are redundant pieces of information added to data that allow the receiver to verify if the data was received correctly. A simple checksum is to use only the first seven of the 8 bits in a byte for data. The eighth bit is a sum of the first 7 bits (modulo 2) that acts as a check for the first 7 - hence the name 'checksum'. In a nutshell :)
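A toy illustration of that parity-bit idea in Python:
```python
# 7 data bits plus one check bit that is their sum modulo 2.
def add_parity(bits7):                 # bits7: list of 7 ints (0 or 1)
    return bits7 + [sum(bits7) % 2]    # append the checksum bit

def parity_ok(bits8):
    return sum(bits8[:7]) % 2 == bits8[7]

word = add_parity([1, 0, 1, 1, 0, 0, 1])
print(parity_ok(word))         # True
word[2] ^= 1                   # flip one bit "in transit"
print(parity_ok(word))         # False -- the single-bit error is detected
```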
But wait, there's more!
They make 16 bit checksums too!
Utilized in SNES - JRR Tolkien's Lord of the Rings
Edit: I should know, I mostly decoded them. Punch in "3P5" multiple times in a row, like 8 times in a row, tell me that don't unlock all the characters...
LT really needs to do more videos like these. For himself and his viewers... Many are clueless
Checksums as fast as possible .. YOU win! Instead of Linus.
Actually 🤓, the simplest (and fastest) checksum for binary computers is `xorsum`. It's "infinitely"- parallelizable, but doesn't have "avalanche-effect"
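A quick sketch of what an xorsum looks like, and why it parallelizes so well but has no avalanche effect:
```python
# A checksum doesn't get simpler than this: XOR every byte together. Chunks
# can be XOR-summed independently and combined (hence "infinitely"
# parallelizable), but flipping one input bit flips exactly one output bit.
from functools import reduce

def xorsum(data: bytes) -> int:
    return reduce(lambda acc, b: acc ^ b, data, 0)

print(hex(xorsum(b"hello world")))
```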
7-zip can generate the checksums of files too! It's the CRC option in the 7-zip menu in Explorer. It avoids downloading something else to check it or typing a command.
WHY YOU NEED LINUS ?????
ads you need ?
Data transmitted with UDP does not get resent. The UDP protocol is used for things like video streaming where an occasional dropped packet won't be missed.
There is certainly no standard mechanism whereby UDP always gets resent if a packet is corrupted or lost. However, software that uses UDP might be able to tell when packets need to be resent and then do so. Consider, for example, QUIC. QUIC uses UDP, and packets that are lost most definitely do get resent. In fact, consider simple DNS using UDP. Most DNS clients will resend the outgoing packet if no response is received, it might only be resent once before giving up, or get resent but to a secondary DNS server instead, but it also might get resent to the same DNS server if a second one isn't specified.
I'm not sure that in video streaming one packet won't be missed given the complexity of compression algorithms...
Brian Gregory is correct. It's not the case that using UDP means there are no validation checks. It's just not included in the UDP protocol, and is instead left to the application layer to handle as appropriate to the situation. Almost every application that uses UDP does in fact do some sort of data validation. For example, Wireguard uses UDP. Traffic over Wireguard is encrypted and needs to be 100% accurate.
UDP also doesn't strictly require checksumming. With hardware generally able to offload it (see TOE, the TCP Offload Engine, and similar checksum-offload features), it's almost always there, but it's not on *every* packet as Linus claimed, likely because the tangent would just eat up way more time than it's worth.
@@MrBleach163 Yes. For something like a Skype call it depends, but you'll probably be lucky if there isn't some kind of visible glitch, but the point is, you don't want to wait for retransmission and have the time delay keep increasing to the point where you're waiting ages for the person you're calling to respond to what you say.
I've heard that many of the free cloud storage services use checksums to save space. They run the hash and compare to what is already on their servers. If two matching files are uploaded, only one copy gets stored. It does not matter if the files were uploaded to separate accounts, only one copy is actually stored.
You know, I've always wondered about that.
Pretty sure it happens with Plex / Jellyfin metadata fetchers as well, which is why occasionally you'll get results that aren't just a little bit off, but, like, wildly off.
as a bug bounty guy, that opens options to find bugs, thanks
This is called deduplication, and is a staple feature in large storage systems. However, they should not stop at running a simple hash to decide if the files are the same: next they check the file length, and if that still matches, they check the actual binary data. This has to be done because the checksum of two files can be the same even with different content and even different file sizes.
Simple checksums like CRC32 are very easy to manipulate. An anime fansub group used to make sure their releases all had a CRC32 checksum that showed the episode number, so episode 1 had the checksum 01010101, episode 2 hashed to 02020202, and so on. This is (marginally) harder with MD5 and a lot harder with SHA256 or better. But even without malicious intent, there are only so many hash values possible in, say, 256 bits that eventually two files will have the same hash. This means that a hash value can't guarantee that the file is what you think it is. It can only guarantee that the file hash is the same. So use it to check for transmission errors and file integrity, not as proof of content not being manipulated by a third party.
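A rough sketch of that layered comparison (size first, then hash, then an actual byte-for-byte check):
```python
# Cheap checks first, full byte-for-byte compare last.
import filecmp, hashlib, os

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # 1 MB at a time
            h.update(chunk)
    return h.hexdigest()

def same_file(a: str, b: str) -> bool:
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    if sha256_of(a) != sha256_of(b):
        return False
    return filecmp.cmp(a, b, shallow=False)   # final byte-for-byte confirmation
```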
@@lPlanetarizado "hash collision" and can be very tricky. Id probably do it on multiple levels, data chunk wise, file wise and check metadata (file size, dates, entropy)
CRCs aren't cryptographic at all, and actually I'm pretty sure most fast checksums are not crypto hardened. Still, great video!
I mean yeah, the simplest checksum is just a parity bit.
I mean, even CRC32 is highly susceptible to collisions (files that are different from each other but have the same hash value), and SHA-1 had practical collisions demonstrated in 2017 after years of deprecation warnings. Most entities have moved on to hash algorithms like SHA-256 or SHA-512, depending on the importance of the data and urgency vs compute cost per file.
Why would you use anything other than sha256?
@@ShadowSlayer1441 you would use SHA3-256 which is more robust to attacks. Most programs still use SHA2-256.
@@ShadowSlayer1441 Speed? Simplicity?
if you have 7-zip installed (which you really should if you don't, it's amazing), it actually adds all sorts of checksum-generation options to the right-click menu in Windows, it's really handy
And you're not limited to using 7-zip on just ZIP/RAR/other compressed archive format files to pull their checksums... you can use that built-in functionality for pretty much any file.
Winrar gang here
i use breezip from microsoft store
Peazip good as well...
WINrar 👑
And we're slowly moving to quantum-resistant hash functions, to avoid the issue of quantum computing in the future.
That will be a problem for security
@@Dinkleberg96 no it won't. By the time a quantum computer powerful enough to work on decrypting real internet packages exists all important things will be using quantum secure algorithms. It'll be Y2K all over again.
@@matthewparker9276 Not necessarily. Important long-term data which was encrypted with "good at the time" cryptographic ciphers are being saved for future quantum computers to decrypt. Even though we can't break it now, saving RSA-4096 encrypted "Who_Shot_JFK.docx" and "Herbs_and_Spices_v11.KFC" for computers 30 years from now could cause real national security issues.
@@robspiess i cant wait for "Herbs_and_Spices_v11.KFC" to get cracked and cause the USA to fall into utter chaos due to it revealing the real herbs and spices.
@@robspiess I think he thought the first reply was saying that Quantum Resistant Algos were bad for security, and not that Quantum Computing will be bad for security. That post was kinda ambiguous.
I'm kind of surprised there was no mention of block-level CRC for storage media, the checksum that makes it possible for RAID-scrubbing to find faults, and for disks in general to be reasonably certain they're reading back the same values that were written in the first place, something almost everyone takes for granted.
He's talking abt Internet Checksum, not RAID
@@HarpaxA He, and the writers, are, but a large part of the runtime is spent on offline and local-network file-validation and things like passwords. The fact that this is a design consideration that allows data-integrity issues to be found in RAID when the multi-disk abstraction might otherwise hide problems until way too late is just one application and a way to get people's attention with a topic that seems to draw some number of views (yay, algorithm).
Block-device-level checksums seem relevant to this topic specifically because there's an emphasis on "how does data reliably get from point A to point B?" and it needs to be stored and retrieved from somewhere. There's nothing about a magnetic head or voltage-assessment that provides assurance that read-mistakes won't happen without a checksum of their own.
personally i prefer the method of looking at the files and going "Yeah that seems about right"
wait THIS ISNT MY 8K TOY STORY 1 VIDEO
Hashtab is one of the best checksum tools for Windows. It adds a tab in the properties dialog of a file to let you compare checksums.
Noteworthy that a checksum on the download page being the same as the downloaded file doesn't mean that it hasn't been tampered with. If you're man in the middle-d or the site is compromised enough, hackers could also just replace the hash shown on the site to match the modified file.
Yeah, comparing checksums for downloaded files when the checksum and file are on the same server feels like security theater, just giving a (false) sense of security without actually adding any security.
Doesn't the practice come from (and make more sense in) the scenario where a 3rd party file hosting service (or a mirror) is used to store the actual files, while only a link and a checksum is on the website itself, so you can verify that the file you get from the 3rd party is the one intended by the owner of the website?
Very important post! A hash is never proof of what the file contains, just that the file you ran the hash algorithm on has the same hash result as what you were told to expect. So use it to verify that the file wasn't corrupted in transmission or changed in some form. But don't rely on the content being what you expect just because the hash matches what's on the site you got it from.
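In practice, verifying a download against a published hash is just a few lines (the filename and expected hash below are placeholders, not real values):
```python
# Minimal sketch of checking a download against a published checksum.
import hashlib

EXPECTED_SHA256 = "paste-the-hash-from-the-download-page-here"

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MB chunks
            h.update(chunk)
    return h.hexdigest()

print(sha256_of("some-download.iso") == EXPECTED_SHA256.lower())
```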
Then sign your files…
@@o0Donuts0o File-signing certificates are expensive af. There's no LetsEncrypt for that. 😕
That was the first thing that came to my mind when I first heard about checksums. But it can still help against other kinds of attacks, so it is obviously not useless.
Whenever he says "bad actors", I can't help but think actors who just suck at their job doing shady things
If they're acting at doing their job and they still suck, then I'd argue that still makes them a bad actor
4:23 not every service uses TCP; for example, most online games and realtime apps use UDP to reduce latency, since waiting for lost packets to be retransmitted would add more delay than it's worth.
There is a big difference between a *cryptographic* checksum and the kind used to verify that a file arrived correctly, such as in the TCP/IP protocol
1:33 If those bad actors could replace the download with a malicious one on some website, it would be of no hassle for them to replace the checksum as well. MITM attacks are guarded against with protocols like SSL. Checksums are not used to validate the security of a file but rather to confirm it was downloaded correctly from the origin (even though TCP handles it too) so that, in the worst case scenario, your PC doesn't break down from an incorrect OS download.
Yes BUT... often downloads are hosted on a different domain than the checksums. Ideally you download the file from the least suspicious mirror and get copies of the checksum from multiple other sources.
Note that 7zip has a checksum viewer as well, so if you have that, you can view the checksum of a file easily
literally right after he said "make sure they don't get corrupted" at 3:40, my blender simulation used up the last of my RAM and made the video start stuttering and I swear to god I just assumed that it was just a gag for the video
lmao
subbing btw please release vid
Checksums are only useful if you're expecting errors not malicious intervention. Anybody could just change the source, and then re-hash the source and send that as the checksum. Encryption will be necessary regardless.
not exactly, most websites write the checksum out so that you run the checksum yourself; the file doesn't check itself
wrong.... this is why we use strong crypto as the checksum, it would take you forever to craft modified contents that still hash to the original checksum.
Yeah no, encryption alone does not mean attacker can't change plaintext. E.g. stream ciphers are vulnerable to known plaintext attacks. What you want is an unforgeable checksum, and in the field of cryptography you have two ways for that, digital signatures (software/drivers/official email etc), and message authentication codes (generally instant messaging). It's very common data that is assigned a MAC or digital signature is also encrypted, but unless we're talking about authenticated encryption, integrity and authenticity is provided by algorithms other than the encryption.
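A minimal sketch of the difference between a plain hash and a MAC, using Python's standard library (the key and message here are made-up examples):
```python
# A plain hash can be recomputed by anyone for their own (tampered) message.
# An HMAC (keyed hash / MAC) can only be produced by someone holding the
# shared secret key.
import hashlib, hmac

msg = b"some firmware image contents"
key = b"shared-secret-key"

plain_hash = hashlib.sha256(msg).hexdigest()            # forgeable: no secret involved
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()    # forging this needs the key

# Receiver recomputes the tag and compares in constant time
print(hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest()))
```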
@@evertchin OP meant that if you can change the file on someones server, you probably can also change the displayed checksum on the web page.
Couple of points:
- the cryptographic hash function outputs aren't guaranteed to be unique, but are generally designed to avoid collisions.
- passwords aren't just hashed (or at least they shouldn't be 😅), if they were, then if the Database was leaked, the attacker would be able to tell the simple passwords. Two people with the same password would then have the same hash output. This could also mean the attacker can generate hashes from a list of common passwords and compare against the database to find people with common passwords and hack their accounts.
To get around this, a "salt" is added. The salt is randomly generated and when combined with the password and then hashed, it will create a new output, even if two users have the same password.
This is why you should use unique/random passwords, because if the server doesn't salt the passwords, common passwords can be found easily and then anywhere you use that same password is then potentially compromised - even if the other places do salt them.
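A minimal sketch of that salt-then-hash flow using the standard library's scrypt (parameters here are purely illustrative; real services usually reach for a vetted library like argon2 or bcrypt with tuned settings):
```python
# Salted password hashing sketch: random per-user salt + a slow KDF.
import hashlib, hmac, os

def hash_password(password: str):
    salt = os.urandom(16)                         # random per-user salt
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest                           # store both next to the user record

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
print(check_password("hunter2", salt, digest))    # True
print(check_password("hunter3", salt, digest))    # False
```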
A good hacker who swaps a file on some server for a malicious one would also change the checksum file at the same time
Except good opsec is to never store your checksums (or your salt) on the same server as your sensitive data.
Checksums in my circles also tend to be cryptographically signed via PGP.
If you want to verify a file that's copied locally (i.e., both the source and destination file are on locally-accessible filesystems), doing a file compare (e.g., the Unix/Linux "cmp" command) should be much faster than doing a checksum, and will tell you exactly where the first different byte appears.
I'm not a Windows guy, so I don't know how easy this is to do in Windows, but if you're going to get a third-party product to perform your checksums for you, you could probably get a third-party "cmp" program.
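Not a Windows-native answer either, but a minimal Python stand-in for cmp (it works the same on Windows) might look like this:
```python
# Compare two local files chunk by chunk and report the offset of the first
# differing byte, or None if they're identical.
def first_difference(path_a: str, path_b: str) -> int | None:
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(1 << 20), fb.read(1 << 20)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                return offset + min(len(a), len(b))   # one file is shorter
            if not a:
                return None                           # both hit EOF: identical
            offset += len(a)
```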
On the "Windows Explorer doesn't compute hashes" note;
It wouldn't take a lot of resources to do that at all. They would only have to add a checksum middleman to the file transfer stream.
with the number of cores medium & high-end systems have these days it's not like performance is a concern either (just make it optional)
Awww I really wanted Linus to mention salting in the password segment. I know it's too much of a tangent for such a short video but it's a neat solution to an unfortunately real security problem.
I think this comment section has all the salt covered over CRC vs checksum.
It's worth pointing out that the reason TCP's checksums aren't for security is because anyone who could replace the file being downloaded with malware could also just change the checksum to match the malware they inserted. That's why TLS/HTTPS uses an enhanced version of checksums called digital signatures that uses special encryption tricks to prove that the checksum was calculated by the server you're downloading the file from and not an attacker.
Did discover something interesting with MS Teams. Apparently, it is possible to get corrupted files sent out over Teams between users. Colleague of mine had a known good file direct from the manufacturer. They then sent that file via teams to several other users that needed access to the file but couldn't access the direct download. 2 of those it was sent to could not use the firmware file because the device they were updating kept throwing an error saying the file could not be validated. I had them send me their copy through a program that I know does checksums and when I compared the file size just on its face it was smaller than the verified original. So while it seems teams attempts to deliver files, I can say first hand that it's not guaranteed to arrive in one piece.
I mean, it's MS Teams, I wouldn't expect it to work properly for anything
A quick solution is to archive the file using 7zip and add the checksum to the file name. When the receiver runs the file through 7zip to unarchive it, the name lets them check the checksum, and 7zip itself will still throw a fit when unarchiving if the archive has been changed in any way. This should be enough to catch any unintentional corruption, such as lost or corrupted packets.
Sooo it couldn’t be corrupted from pc to device requiring firmware? It’s just MS Teams? Lord your diagnosis skills are terrible.
I’m surprised that this video didn’t mention google registering the .zip domain.
I thought it was going to bring that up considering the thumbnail.
Why would it?
Given the title and the thumbnail that is exactly what I thought as well. Disappointed lol.
You should mention that there is a difference between UDP and TCP in this instance. Because if we're doing something over UDP it's not gonna bother with resending it, lost is lost at that point.
This is still used if you have a slow internet and constant disconnection when doing downloads. Checksum is a way to check if your downloaded file is not corrupted.
A TQ vid on cryptography, specifically password storage and Rainbow tables would be pretty cool as a sequel to this. Would love to see more security related content
CRC32 will do for a quick check, but for security it is best to use SHA256 to ensure nothing was tampered with.
Checksums vs Hashes vs Keyed Hash (MAC) and signatures could have been more clearly separated / explained. Fitting it into a technique format/speed is a challenge but would add a lot of value / clarity.
Pretty abysmal that Windows doesn't use checksums. I've had a few known corrupted files before and who knows how many unknown corrupted files.
TCP/IP doesn't actually do a checksum in that way.
now, it has been a while since I read up on it, but if memory serves, then TCP checks on a per packet basis instead.
it also uses a kind of "session" number in order to keep track of a session of communication.
sending info from A to B would look something like this:
A: Sending packets 1-14
B: received packet 14
A: sending packets 15-34
B: received packet 31
A: sending packets 32-42
B: received packet 36
A: sending packets 36-40
So in addition to having a session token in all this information, A tags all sent packets with a number per packet as well. B will read every packet it gets until it has either read all packets or the packet it receives is not the one numerically after the last. so if it gets 1, 2, 3, 5, then it stops and sends back that it got packet 3.
Notice how little data B actually uses by just sending a response of the latest packet it received in a series. This makes sure that TCP is not gonna use tons of data to communicate back and forth.
But it still does communicate back and forth in order to keep signal integrity.
UDP/IP on the other hand is not like that.
UDP is like pouring a bucket of water down the drain. Most of it should arrive sequentially, but some might not arrive in order, or at all. But it doesn't matter, since the receiver isn't asking for retransmissions. Video streaming is done like this in order to keep up with the massive amounts of data being sent, where TCP might lag behind. But it comes at the cost of sometimes being out of order and having a little lag spike here and there.
And yet hash collisions exist. We use this to crack files on some games to mod them
Yup. Since they mentioned Steam, it's worth noting that before SteamOS added the ability to directly change the boot animations, you could still swap in your own custom -Shrek supercut- video on the Steam Deck as long as it was precisely (down to the byte) the same length as the OG ani.
@@GSBarlev old-school game called combat arms, you can use hash collisions to modify the game files to create exploits like wallhacks.
Hash collisions don't mean the same byte size. Generally when you produce a hash collision, the file gets bigger
@@TechX1320 True. I'm conflating checksums with hashes. But we're on the subject of file verification anyway, so I think the point is fair.
I use TeraCopy in windows to handle all file copy and moves because I can turn on its verify option as a default and never have to worry about it again.
Looking at the thumbnail, I thought this was about the new .zip TLD...
_sigh… unzips_
can you give more context ?
@@mahdi9064 In short: Google registered .zip (and .mov) tlds for its domain service. This is bad because many programs will automatically convert zip file names into links now, even if sent by a trusted person. So bad actors could now register domains of common file names to host malware.
@@TheDakes then it's probably good google snatched them before any actual malicious actors could. sure i don't trust google, and neither should anyone, but they won't use this to send you to malicious sites
Love it when u make 5 minute videos with 1 minute ad
TY for bringing that up.
1:15 that doesn't mean that a strong hash value can cover for a weak password!! ALWAYS choose strong passwords guys
so 1qaz2wsx3edc4rfv5tgb6yhn7ujm8ik9ol0p is weak?
@@TylerTMG No capital letters, no punctuation ;-)
Also I can see exactly how you typed it so I don't even need to remember it if I want to hack you, it's already written down on every QWERTY keyboard.
@@BrianG61UK also how do i enable 2 factor?
TCP/IP itself uses a simple 16-bit checksum, and Ethernet - along with a lot of other things - uses CRC-32; neither is categorically a cryptographic hash (even if they're used as one sometimes).
TeraCopy as a windows file transfer replacement has been my go to for years for this.
One thing you didn't mention is that corrupt file downloads can be deliberately induced by your ISP because of "traffic shaping".
The worst part is that the download would have been faster and used less data/bandwidth if they had simply allowed the download to go unimpeded instead of forcing you to try over and over.
The output of a hash function is not necessarily unique, since the input space may be infinite but the output space is finite. It's just highly unlikely to happen.
Since checksums are a much smaller set of data than the data itself, it is possible for certain permutations of data to produce the same checksum, but improbable. That fact and others mean that computers are not necessarily totally reliable, but more like reliable to roughly 1 error in 10^10 per bit or so.
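Roughly how improbable? The usual birthday-problem estimate gives a feel for it (this is only the small-probability approximation, so values near or above 1 just mean a collision is basically guaranteed):
```python
# Back-of-the-envelope birthday estimate: with an n-bit checksum and N random
# inputs, P(accidental collision) is roughly N^2 / 2^(n+1).
def approx_collision_probability(num_items: int, hash_bits: int) -> float:
    return num_items ** 2 / 2 ** (hash_bits + 1)

print(approx_collision_probability(77_000, 32))   # CRC-32: collisions already likely
print(approx_collision_probability(10**9, 256))   # SHA-256: ~4e-60, effectively never
```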
Improbable, but inevitable due to the sheer amount of files and limitations of the system, which is why it's a terrible way for companies to check data on people's phones to send to law enforcement agencies
To address confusion: checksums/CRCs/hashes etc. are often used interchangeably, but they all share the same basic goal: can you, with reasonable confidence, know that the file/data you have is the one you actually wanted?
Simple checksums use very lightweight, insecure algorithms, but their only purpose is detecting simple corruption. There's no security component, meaning a malicious actor could fairly trivially modify the file and tweak it such that it still has a valid, if not identical, checksum.
When you bring in cryptography, the intent is to close off that attack vector: while you can still modify a file, doing so in a way that leaves the file's hash unchanged is non-trivial.
Any/all methods of hashing will suffer collisions by the very nature of containing less data than the thing they describe. I.e. you can't uniquely describe a 1 GB file using only 256 bits of hash; if you could, we'd all just download the hash and magically reconstruct the original file from it.
The essence of the more secure methods is to make collisions a function of chance, not intent.
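Toy Python sketch of that last point, using nothing but hashlib: chop SHA-256 down to 16 bits and collisions show up almost instantly, while the full digest never collides in the same search.

```python
import hashlib

def digest(msg: bytes, nbytes=None) -> bytes:
    d = hashlib.sha256(msg).digest()
    return d[:nbytes] if nbytes else d

seen = {}
for i in range(200_000):
    msg = f"message-{i}".encode()
    short = digest(msg, nbytes=2)            # pretend 16-bit checksum
    if short in seen:
        print(f"16-bit collision: {seen[short]!r} vs {msg!r}")
        break
    seen[short] = msg

# the same loop never finds a collision on the full 256-bit digest --
# that would take on the order of 2**128 attempts (birthday bound)
```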
Video Suggestion: How to clean and maintain a Linux OS. Example: in Windows you can delete temp files and stuff; how do we do that on Linux? When I use the command line to install apps and frameworks, how do I know how to remove the bloat and leftover files after install? What are the common practices for keeping it clean?
1:00 That's wrong, passwords don't get stored as a hash, they get encrypted. Hash and encryption are not the same. A hash is one-directional, so it can't be reversed (thus hashing passwords in the DB won't let users log in anymore, as it can't verify the password's correctness), while with encryption like AES or MD5 user authorization will work.
Most DBs will store the password's hash. The encryption part covers the trip from frontend to backend; the backend then transforms the password into its hashed form and stores/compares it against the hash saved in the DB. Nothing wrong with that.
@@CarlosCabrera-kn1jb Yup. Technically it's possible that two passwords will share the same hash, but the likelihood (assuming a good hash function) is far less than the odds that the key to your dad's 1998 Ford Taurus could also start someone else's car (look it up).
1. AES is _encryption_ and MD5 is _hashing._
2. The same password will produce the same hash so the hashes can just be compared.
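For anyone curious, a deliberately naive Python sketch of point 2 (the password is made up; real systems also add a salt and use a slow KDF, see the salted variant a few comments down):

```python
import hashlib

# naive sketch: store only the hash, never the password itself
stored_hash = hashlib.sha256(b"correct horse battery staple").hexdigest()

def login(attempt: bytes) -> bool:
    # same input -> same hash, so comparing hashes verifies the password
    return hashlib.sha256(attempt).hexdigest() == stored_hash

print(login(b"correct horse battery staple"))   # True
print(login(b"hunter2"))                         # False
# NOTE: a bare, unsalted fast hash like this is exactly what rainbow tables attack
```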
And yet we still get corrupted downloads sometimes and have to manually download the file again.
Sometimes at a low level shit goes wrong, but TCP also has ACKs: if something doesn't arrive, it gets resent.
You'd better hope your password isn't stored as a plain hash, as rainbow tables solve that problem. Salted hashes... well, that's different.
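A sketch of what a salted setup can look like with just Python's standard library (PBKDF2 here; bcrypt/scrypt/Argon2 are common alternatives, and the iteration count is only a placeholder):

```python
import hashlib, hmac, os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)                               # random per-user salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, key                                    # store both in the DB

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, key)          # constant-time compare

salt, key = hash_password("hunter2")
print(verify_password("hunter2", salt, key))   # True
print(verify_password("hunter3", salt, key))   # False
```

The random salt means a precomputed rainbow table is useless, since every user's hash has to be attacked separately.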
Yeah, I've been getting these weird "ads" or "sponsored segments" for every video I watch.
Gosh darn it, Colton..
Hashing is basically one-way encryption and is used when digitally signing files, like root servers handing out certificates for intermediates.
PGP / cryptographic signatures are even better still. Every computer and device should come with PGP/GPG... so useful! Even works for signing email. Anyone can generate a hash, and using HMAC requires sharing the password/key (which makes it easy to fake authenticity). Public key crypto is the only real solution.
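Quick Python illustration of the HMAC part (key and message are made up): anyone holding the shared key can mint a valid tag, which is exactly why it can't prove *who* made the file the way a public-key signature can.

```python
import hashlib, hmac

shared_key = b"pre-shared secret both sides already know"
message = b"contents of the file being distributed"

tag = hmac.new(shared_key, message, hashlib.sha256).hexdigest()

# the receiver recomputes the tag with the same key and compares
ok = hmac.compare_digest(tag, hmac.new(shared_key, message, hashlib.sha256).hexdigest())
print(ok)  # True -- but anyone who has the key could have produced this tag,
           # which is why public-key signatures (PGP/GPG) are preferred for distribution
```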
3:13 skip ad
I remember Tom Scott going on about websites that put their checksums on there.
And he said that if someone is able to change the file being sent, it wouldn't be too difficult to change the hash on the website to match the hacker's file too.
Thought this was going to be a video on the osi model. This is just as good.
This is the MOST informative video I've watched so far. Thanks!!
Linus has been cooking in that sun
HashTab is another great checksum utility for Windows: it adds a hash tab to the properties of any file, showing its hashes in many common hashing functions, and you can paste in your own hash and it will verify whether it's correct.
You should do a video on the Border Gateway Protocol (BGP). One of the most fundamental and cool pieces of internet infrastructure that even most software engineers have no idea about!
I expected you to talk about error correction codes and how they're used in transmission; would love to see a video on that from you!
Interesting question: could you "reverse" the SHA to get the file back from it?
Absolutely not. Those fancy hashing functions are lossy, so you lose details. SHA-1 is 160 bits / 20 bytes, SHA-256 is 256 bits / 32 bytes. If I give you the hash of my 3 MB file, you simply cannot restore it from that. That's also why those hashes are used for storing passwords, as they cannot be reversed.
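Tiny Python demo of the "lossy" part: the digest is the same size no matter how big the input is, so most of the original information has to be thrown away.

```python
import hashlib

small = b"hi"
big = b"x" * (3 * 1024 * 1024)                  # ~3 MB of data

print(len(hashlib.sha256(small).digest()))      # 32 bytes
print(len(hashlib.sha256(big).digest()))        # 32 bytes -- same size either way,
# so there's no way to reconstruct the original file from the digest alone
```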
The cryptographic hash of a small file takes more than a trivial amount of time, and that time is constant for files smaller than the hash block size. Large files can take advantage of pipelining and amortize any required context switches. Hashing a gigabyte of data in one file will take much less time compared to hashing the same data divided into 2²⁸ four-byte long files.
Worth noting that TCP and UDP use a small non-cryptographic checksum. It's only 16-bits, not nearly as long as the one the animation showed. That means random collisions are far more likely (but still pretty rare), where random bitflips could pass the check, and since the checksum is part of the packet itself, it doesn't provide meaningful security from intentional changes by a "man in the middle". HTTPS provides end-to-end security that prevents that, but basic TCP and UDP don't.
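For the curious, a rough Python sketch of that 16-bit ones'-complement ("Internet") checksum; the real stack also mixes in a pseudo-header, which is skipped here, and the payload below is just a placeholder.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum, in the style used by IP/TCP/UDP headers."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # add 16-bit words
        total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in
    return ~total & 0xFFFF

packet = b"example payload"
print(hex(internet_checksum(packet)))
# only 16 bits: fine for catching random bit flips, useless against a
# deliberate man-in-the-middle who can simply recompute it
```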
HashTab inserts an extra "hash" tab into the file's properties, where you can also compare checksums.
Going by the thumbnail, here I thought this video was an incredibly speedy response to .zip top-level domains now being a thing, making phishing and tricking people into downloading malicious data stupidly easy.
Getting a hash for a large file while you are copying it really does only take a trivial amount of time: you already have the data in memory, so it's just a few extra instructions per byte. What takes time is getting a hash for a file you aren't reading anyway, since file I/O is where the time actually goes.
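Something like this (Python sketch, names are mine): the hash rides along on the data you're already streaming, so it's nearly free; verifying the destination afterwards is the part that costs an extra read.

```python
import hashlib

def copy_with_hash(src: str, dst: str, chunk_size: int = 1024 * 1024) -> str:
    """Copy src to dst while computing a SHA-256 of the data as it streams by."""
    h = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(chunk_size):
            fout.write(chunk)
            h.update(chunk)      # hashing the chunk we already have in memory is cheap
    return h.hexdigest()
```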
Checksums are also how automatically de-duplicating filesystems for incremental backups work. Each file is stored not by its name, but by its content hash. The file metadata then just records the hash of the content, and any other file with the same content will point to the same physical extent. Tahoe-LAFS leverages this for distributed files between friends, and Freenet uses a similar process to shard and distribute files pseudo-anonymously across the entire Freenet network.
Why do backup programs still make duplicates then? I've tried so many and they all do a painfully bad job at it.
Best to use a long cryptographic hash for de-duplicating. A long CRC could work too. A simple sum is not really suitable for this, too much chance of collisions.
@@flameshana9 In my experience it mainly gets used to avoid re-backing up a file (or block of data) that hasn't changed since last time it was backed up. Not to avoid backing up a file that is a copy of another file also on the source media.
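A bare-bones Python sketch of that content-addressed idea (the directory name and helpers here are made up, not how any particular backup tool does it):

```python
import hashlib, os, shutil

STORE = "backup_store"          # hypothetical directory for deduplicated blobs

def file_digest(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def store(path: str) -> str:
    """Save a file under its content hash; identical content is stored only once."""
    digest = file_digest(path)
    os.makedirs(STORE, exist_ok=True)
    blob = os.path.join(STORE, digest)
    if not os.path.exists(blob):        # same content already stored -> dedup for free
        shutil.copyfile(path, blob)
    return digest                       # the backup's metadata maps name -> digest
```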
I had to download a program that would verify a copy/paste, since Windows doesn't do it for some reason. Sometimes a copy gets corrupted and you don't know until you try to open the file later, and that can be really bad if you don't have a backup.
My computer literally crashed at 4:47 and rebooted on its own, a very creepy coincidence
It's not only at the TCP/IP layer; even down at the link layer, Ethernet frames carry a CRC of the frame...
The movie in the intro is called 1917, it's one of my favorite movies of all time
More of this!!!! plz and thank you
This is also how ZFS verifies file integrity on ZFS RAID arrays.
Don't wanna be that guy, but here are a few exceptions to the answer to the question in the title: hash collisions (weak algorithms), web cache poisoning, and request smuggling.
Always nice to learn something useful from time to time.
If you have 7-Zip installed, you can right-click a file and get "all" the hash sums.
I remember when the channel was called "Fast as Possible", god that was quite a long time ago now and I've been watching LMG since 2012!...
comrade
I swear Steam's file validity verification takes longer than an actual fresh install.
The thumbnail made me think this was going to be about the .zip TLD issue currently going on...
Resilient File System (ReFS), which comes with Workstation editions of Windows, does this checking.
*TeraCopy* - it checks checksums after files transfer.
Why have I never thought to use checksums when copying files on my own computer and network? I usually just resorted to verifying that the total byte count was identical.
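If anyone wants to do that by hand, a minimal Python sketch (the paths are placeholders; hashlib.file_digest needs Python 3.11+):

```python
import hashlib

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.file_digest(f, "sha256").hexdigest()   # Python 3.11+

# equal byte counts don't prove equal content; equal SHA-256 digests practically do
print(sha256_of("original.mkv") == sha256_of("network_copy.mkv"))
```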
Checksums are important. People (or really just Apple sycophants) like to say ALAC is just as good as FLAC and it’s totally not a problem at all that Apple doesn’t let you use FLAC, because you can just convert between the two and lossless is lossless! Except that’s not the case at all because FLAC natively stores a checksum for every track and ALAC is entirely a-lacking in this regard.
TCP/IP _needs_ the checksum because data corruption in transit is a real problem.
When someone changes a file on a website, it's not a problem for them to change the checksum too.
A quite fitting topic considering your recent story =)
One point you missed: hashes aren't unique, contrary to what you stated. That's why we can have collisions. After all, a finite set of outputs can't uniquely map an infinite set of inputs. Intentional collisions are currently hard to produce because of the underlying math, but accidental collisions aren't what these are trying to protect against anyway. Adding details like the original file size can reduce collisions even further, but doesn't completely eliminate the possibility (speaking mathematically).
Totally agree checksums should be ubiquitous. I can't count the number of times I've been bitten by this.
Do a video on CRCs next! Kinda like a checksum
There was a forum I used to use that got confused and set a scantily clad woman with the text SEND NUDES as my profile picture. I had named my PFP the same as that one, so the server was like, yeah, these are the same.
Thanks for the video!
421st comment - file hashes are used to do checksums; checksums check the file for any corruption.
So how does that explain uploaded or downloaded files getting corrupted? This was especially an issue with a slow or unstable internet connection.
But TCP already ensures a packet drop/corruption will raise an error, right?
That means manually checking the full file isn't necessary?
And if hackers want to tamper with the file, they can also easily change the checksum to the one calculated from the malicious file.