Mike Pound is by far my favorite person on this channel... he has the most interesting subjects, shines with crazy knowledge while still keeping the video fresh and dynamic.
Been watching a whole bunch of Mike's videos as a complement to my introductory module on Security and Authentication. One of the best teachers I have come across!
I'd been trying to understand the concept for 3 days from the slides my teacher covered and the book she shared, and only ended up more confused. This video gave me a clear understanding in 10 minutes. Great job!
I am at a hackathon in Chicago, Illinois, at the Illinois Institute of Technology, and I have to use SHA-1 on some facts before I pass them to an API so I can make a project for the hackathon. You did a wonderful job telling me what SHA-1 was so I could understand the cryptic API documentation. Thank you very much.
Mike, you are my favourite person to appear on this channel. I enjoy your clear explanations and like the quite recent topics like Google Deep Dream, Dijkstra and so on.
Thanks, Dr Pound (if you read this). I find your demeanour easy to engage with, and you set me off on the journey of understanding fully (with much work!).
Hmm, so far this is fairly straightforward, but the interesting part would be how exactly these compression functions work. Will there be a follow-up video on that?
In essence, it generates 80 32-bit words derived from bits of the message block, then the state goes through circular shifts, some XORs, some bitwise ANDs, addition with the round word and round constant, and then a permutation between all state variables.
3:17 And the reasons why the NSA came out with SHA-1 to replace the earlier SHA-0 (or just plain “SHA”) were not revealed publicly. But the weaknesses in the original SHA were discovered independently a few years later. This was part of a sequence of evidence indicating that the gap between public, unclassified crypto technology and what the NSA has was narrowing, and may not be significant any more.
I think it's widening: look at Pegasus, and with Pegasus 2.0 you only need a phone number to target a victim. And Pegasus is a joint project between Israel and the USA. Imagine what the NSA would have kept to themselves. It is a common understanding in the computer security field that if the government wants you, they have you.
aullik Considering almost all real-world data is stored as a stream of bytes (8 bit values), That's incredibly unlikely to ever come up. It could be 504 bits, but 511 is highly improbable. If your padding has to add at least 8 bits (one byte), then the thing he described works fine. Remember working with individual bits is almost unheard of in computing. If you have to store individual bits for storage efficiency, you pack them into bytes. (similarly, if you store 7 bit values, you either store them in 8 bits and ignore a bit, or you pack it such that you store, say, 56 bit blocks. (7 x 8 - eg, 8 sets of 7 bits stored in 7 bytes)
aullik: Exactly the question that came to my mind too :-) Since there aren't necessarily enough bits left in the block to include the length of the actual message.
+KuraIthys Going with bytes, the longest message that could still be padded would be 496 bits long. 504 wouldn't work as you'd only have 8 bits left but 504 in binary is already 9 bits long.
+Kuralthys I know that we usually work with bytes. But even if we say we have 512 - 8 = 504 bits, then we add one '1' bit to start the padding and now we only have 7 bits left. The message is 504 bits long, but 7 bits can only represent values up to 127. The only answer is that we expand to 1024 bits. But the question would be how we expand. What is the "syntax", for lack of a better word?
What I want to know, for no particular reason, is if there are cases where a hash of a hash equals itself, of course sticking with one particular algorithm and hash length.
Thought I was following until 9:35 He describes a way of padding that will produce the same padding string for messages with the same length - then says it's important that messages with the same length don't have the same padding string. Did something important end up on the editing room floor?
I'll check with Mike but I think it was just a slip of the tongue - ie The padding would be the same for messages of the same length but the messages would be different if they are different >Sean
Me: Explain SHA Dr. Pound: Explains it Me confused: Explain it to me like I'm 12 Dr. Pound: Explains it like I'm 12 Me still: Explain it to me like I'm 5...
The compression function of SHA is where it gets quite complicated, and I don't think it would've fit into the scope of one video, as explaining it to someone with no prior knowledge isn't trivial, there's quite a bit of complicated math involved, and very few people actually understand the details of it.
The key idea I got from this video is that hashing is not encryption: there is a difference between the two, even though it's easy to confuse them.
If that's how it works, it is very easy to find collisions: 1. Hash 20 bits long data 2. Copy the 512 long data that have been created (by the rules of padding one followed by zeros plus the size) Then you have two inputs that are essentially the same who share the same output. So I think there is a lot more sense in applying those rules no matter what the size of the input is, and adding 512 bits blocks to the end if needed. I think this is how the SHA works.
That's how it works, yes - it's always padded. Without padding you can easily append whatever you like at the end of a message; an important part of the integrity check is to tell where the message ends. It's not at all easy to find collisions, though. When you hash 20 bits of data you're actually hashing 512 bits of data, as the algorithm only works with exactly 512 bits at a time, i.e. one block. The remaining 492 bits must therefore be padded in a consistent way - if you pad it this way and I pad it that way, we'll end up with different 512-bit blocks, which in turn will result in very different hashes. If the message length (in bits) modulo 512 is 447 or less, there's room for the padding, which is one 1 followed by however many 0's are needed to get to 448 bits. Finally, the 64-bit length of the message is added (which brings it up to exactly 512 bits). If there's not enough room, additional 0's are added in a following block, up until there are only 64 bits left. (If the message length modulo 512 is 0, then the final block will consist of nothing but a 1 followed by 447 0's and then the length.)
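A rough Python sketch of the padding rule described above, for anyone who wants to poke at it. It only handles whole-byte messages, and the name `sha1_pad` is made up for this example. The last line also shows why copying the padded block does not give a collision: the padded block gets padded again when you hash it.

```python
import hashlib
import struct

BLOCK_BYTES = 64  # one 512-bit block


def sha1_pad(message: bytes) -> bytes:
    """Return the message with SHA-1 style padding (byte-aligned input only)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Zero-fill until exactly 8 bytes (64 bits) remain in the current block.
    padded += b"\x00" * ((56 - len(padded) % BLOCK_BYTES) % BLOCK_BYTES)
    # Append the original length in bits as a 64-bit big-endian integer.
    return padded + struct.pack(">Q", bit_length)


short = sha1_pad(b"abc")      # fits in one block
full = sha1_pad(b"x" * 64)    # message already fills a block: a whole extra block is added
tight = sha1_pad(b"x" * 60)   # only 4 bytes left: padding spills into a second block

print(len(short), len(full), len(tight))  # 64 128 128

# Hashing the already-padded block is hashing a different (longer) message.
print(hashlib.sha1(b"abc").digest() == hashlib.sha1(short).digest())  # False
```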
Since SHA is deterministic, even though it is non-reversible, it is still possible to guess the hashes of some reasonably short messages. For example, string 'abc' ALWAYS produces ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad. If I have a large enough database plus computational power, I could probably guess some short messages, although not the entire novel.
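A small sketch of that guessing idea. The digest quoted above is the SHA-256 of 'abc', so this uses SHA-256 from Python's `hashlib`; the candidate list is a made-up stand-in for a real precomputed database.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical guess list; a real attacker would use millions of entries.
candidates = ["abc", "password", "letmein", "hello"]
lookup = {sha256_hex(word): word for word in candidates}

target = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(lookup.get(target))  # abc
```

Nothing is reversed here: the table only works because the message was short enough to be guessed in advance.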
I kinda want to make my own hashing algorithm now. It wouldn't be very good, it would just be some random jostling around of bits until it looks weird.
That 011001011 he wrote down is actually the start of the SHA hash value for "abd". I wonder if that was intentional, because the odds of that happening randomly are less than one percent.
9:18 The trailing 1 is added at the next byte. So if you have: 01101010 it will be padded like this: 01101010[100000000000000000000000000000........ 448] + (64 bits of message size) If there isn't enough space for the 64 bit size block the block will be padded to the end and the size will get its own block. So like this: Block 1: 01101010......[1000000000000000000000000000000........ 512] Block 2: [000000000....... 448] + (64 bits of message size) I know this because I made my own implementation of the SHA-1 and SHA-2 algorithms
If there's less than 65 bits of space left in the final block for padding, you just pad toward an extra block. For example if your message is 480 bits, you add a one-bit, 479 zero-bits, and the 64-bit length, giving total length 1024 bits = 2 blocks.
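The arithmetic in that example can be checked for any message length, including the awkward ones like 511 bits. A minimal sketch (the function name is just for illustration):

```python
BLOCK_BITS = 512
LENGTH_FIELD_BITS = 64


def padding_layout(message_bits):
    """Return (zero_bits, total_blocks) for a message of the given bit length."""
    # After the mandatory 1 bit, add zeros until 64 bits remain in the block.
    zero_bits = (BLOCK_BITS - LENGTH_FIELD_BITS - 1 - message_bits) % BLOCK_BITS
    total_bits = message_bits + 1 + zero_bits + LENGTH_FIELD_BITS
    return zero_bits, total_bits // BLOCK_BITS


for bits in (20, 447, 448, 480, 511, 512):
    print(bits, padding_layout(bits))
# 20 (427, 1)
# 447 (0, 1)
# 448 (511, 2)
# 480 (479, 2)
# 511 (448, 2)
# 512 (447, 2)
```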
For those who did not get the padding: SHA-1 works with 512 bits or multiples of it. If the message is shorter than 512 bits, you need to pad it. Let’s say the message is 10011. Before you hash it, you need to pad it to make it 512 bits long. Note that it has a length of only 5 bits, and 5 in binary is 101. You always start the padding with a 1, so the 6th digit will be 1, and 101 goes in as the last 3 digits. Pad the remaining digits (512 - 5 - 1 - 3 = 503) with 0s. So the message with padding will look as below. 10011_100000000........000101 (“_” represents start of padding)
Can you talk about the colliding prefix issue? As I understand it once I find a collision with a file, I can continue to create collisions by appending the same thing to both files, and some how this allows me to create two meaningful files each with the same hash value where one might expect that any collision which might be found would be obviously fake because it would have to be made up of a bunch of random bits.
Hacking: The Art of Exploitation is a great book by Jon Erickson, which teaches you the basics of reverse engineering, code flow, basic C programming, the stack, networks and other things to get you started on binary exploitation. It's a great book, I recommend it to anyone who's willing to invest time in learning how to hack properly.
cyancoyote Thanks for the reply. I've heard by many people that C is a very hard language to learn though... do you have any recommendations for introductory books to learning assembly?
How would the padding work if the final block of the message was long enough that you don't have enough padding room to say the number of bit in the message? So if the final block contained 510 bits you would have to pad in 9 bits(111111110) to say that the message is 510 bits, but you would end up with more than 512 bits.
The length field has a fixed size (which is sufficient enough) (also the field is not optional). The length of 10...0 is decided including the size of the length field i.e. you could jump over to the next block if required.
1:34 Well, by “completely changed” you mean that somewhere around 50% of the bits in the hash flip to different values. You don’t want them _all_ to flip (or flip according to any discernible pattern), otherwise that’s no longer quite so random.
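You can measure this yourself. A quick sketch using `hashlib`: the two inputs differ in a single bit ('b' is 0x62, 'c' is 0x63), and the count of differing output bits should land somewhere near half of 160.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions where the SHA-1 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha1(a).digest(), "big")
    db = int.from_bytes(hashlib.sha1(b).digest(), "big")
    return bin(da ^ db).count("1")


flipped = bit_difference(b"abb", b"abc")  # inputs differ in exactly one bit
print(f"{flipped} of 160 output bits differ")
```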
Since there is an infinite amount of information that can fit in an infinite long string of characters and that a hash output a finished string of characters then two strings can have the same hash. Since the hash function is losing information it is mathematically impossible to have a perfect hash function with an arbitrary long input string
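The pigeonhole argument is easy to see if you shrink the output. This sketch keeps only the first 16 bits of SHA-1 (a deliberately weak toy digest, not a real hash function), so a collision has to show up within 65,537 different inputs and usually appears after a few hundred.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes (16 bits) of SHA-1: a toy digest for illustration only."""
    return hashlib.sha1(data).digest()[:2]


def find_collision():
    """Try the messages "0", "1", "2", ... until two share a toy digest."""
    seen = {}
    for i in count():
        message = str(i).encode()
        digest = tiny_hash(message)
        if digest in seen:
            return seen[digest], message
        seen[digest] = message


first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

With the full 160-bit output the same search is out of reach in practice, which is the whole point of making the digest long.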
Still a bit too confusing for me........ Can you make a video on Hashing VS Encryption? When is what used? If the hash always has less information than the actual file, why would you ever need to hash something in the first place?
Encryption is reversible, hashing is not. In hashing, the receiver only needs confirmation that the data is valid. One example is password authentication. For security reasons, the server does not store a copy of the user's password; it only stores a hash of the password. When a user tries to log in, the server compares the hash of the submitted password to the stored one. Meanwhile, if the database gets breached, people can't use the password hash to find out the original password (other than by brute-forcing it).
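A minimal sketch of that login flow in Python. Note one swap: instead of a single plain hash it uses a salted, deliberately slow construction (PBKDF2 from `hashlib`), which is the usual way to make brute-forcing harder. The iteration count and function names are assumptions for the example, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, an assumption


def hash_password(password: str):
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt and compare it with the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```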
Ah this video has aged like milk, or rather SHA1 has I should say. It's a prime example of why you can never count on anything permanently being secure as eventually a basic error in the underlying math will be found and exploited.
Interestingly, since the number of inputs is infinite, there are infinitely many inputs resulting in the same digest. Don't try to find a collision by brute force though: 2^160 is about 1.5 × 10^48 possible digests.
Who the heck writes the subtitles for these videos and why are they so badly wrong? They managed to mishear 'SHA' as both 'char' and 'shower' within the space of like a minute. They're not auto-generated... or are they?
I just read on Wikipedia that the block size of SHA1 is 512 bits, but the internal state and output size is 160 bits. So, what's the difference between block size and internal state, and what happens to reduce 512 bits down to 160 bits and why?
The block size just says how much of the message the hash algorithm works on at a time. In this case, if you have a 1024 bit message, SHA-1 works on the first 512 bits first, then the next 512 bits. The internal state is where it "mixes" the input bits. That's five 4-byte words (Mike - and the specification - labels them H0, H1, H2, H3, H4). 5 words x 4 bytes x 8 bits = 160 bits. For SHA-1, it actually first *extends* the 512 bits to 2560 - *eighty* 4-byte words. (It makes those "extension words" by xor'ing different parts of the original 512 bits together). Then it mixes each of the eighty words, one by one, into the state, using XOR's, AND's and bit rotations - also constantly rotating the words in the state. That way, the eighty words end up in five words. That's the compression. The output is just the state when it's done. For some hashes (e.g. SHA-224, SHA-384), output size will be just a part of the final state. For SHA-1, the output is the entire state verbatim.
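For the curious, here is a compact Python sketch of the whole process described above: pad, split into 512-bit blocks, extend each block to eighty words, mix them into the five-word state. It is written for readability and checked against `hashlib`, and is not meant for real use. The helper names are my own.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # keep every value within 32 bits


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1(message: bytes) -> str:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Padding: a 1 bit, zeros, then the 64-bit message length in bits.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)

    for start in range(0, len(padded), 64):
        # Sixteen words from the block, extended to eighty by XOR and rotate.
        w = list(struct.unpack(">16I", padded[start:start + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
            a, b, c, d, e = temp, a, rotl(b, 30), c, d

        # Add this block's result into the running 160-bit state.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]

    # For SHA-1 the output is the entire state.
    return "".join(f"{x:08x}" for x in h)


print(sha1(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())  # True
```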
So the padding is only denoted by the last one with a trail of zeroes and a length at the end? That is not a prefix and without some other way of indicating that padding is present it is indistinguishable from data. After a quick google search it appears that the padding is always present so it doesn't need to be a prefix.
This was very informative! Question: Is there any significance to the initialization constants h0 = 0x67452301 h1 = 0xEFCDAB89 h2 = 0x98BADCFE h3 = 0x10325476 h4 = 0xC3D2E1F0 Or are they chosen "randomly"? Thanks!
Is there mathematical theory to prove that the resulting hashes are "evenly" distributed in the 2^256 or 2^160 space? If the resulting hashes are somewhat "clustered" in a space that is smaller than perceived 2^256, then the chance of collision would be higher
I don't understand the last bit about padding. If changing a single bit will completely change the hash result, why can't we just fill up the message with 0s up to the next 512 multiple?
What happens if the last block has, let's say 504 bits, and the last 8 bits does not have enough room to store the length of the message? Wouldn't the padding scheme break down?
What if the length of the message is 511 bits? Then we have only 1 bit of padding, and we can't possibly store the number '511' in 1 bit of information.
Mike Pound is by far my favorite person on this channel... he has the most interesting subjects, shines with crazy knowledge while still keeping the video fresh and dynamic.
I like him and his topics too, though the AI topics are interesting and the person explaining them is good too
he has great body language, tries to use it as much as possible
And a fair looker.
And the same accent as the 11th Doctor (Matt Smith)! :-D Where is that accent from?
Absolutely agree, Tom Scott is my second favourite, that guy is hilarious
I could sit and watch videos from this guy all day long, so informative and laid back
Love how these videos get STRAIGHT to the point.
This is too much work, can’t we just trust each other?
That, my friend, is the real problem
How can I trust other people when I can't even trust myself
@Mohamed Seid GodisGood666!
Don't trust, verify
No Way!!!
As a CS student im super grateful for these vids, you guys explain it better than a lot of professors.
This is my favorite guy on this channel. I just love stuff like this.
Mike Pound is the best! I love hearing him explain things - keep em coming!
I've always loved your videos and now I study computer science and can watch your videos for studying, it's amazing
pound for pound Mike pound is the best narrator on computerphile
Roses are red
Violets are blue
Unexpected { on line 32
coding joke
A poetic compiler? I like that idea
Unresolved external symbol
Felt that on a spiritual level
Violets are blue
Roses are red
Your code isn't thread-safe
Use locks instead
The washing machine example really helped seal in this topic I was trying to understand and helped me on my final project. Thank you!!!
I love this channel so much...
Hmm, so far this is fairly straightforward, but the interesting part would be how exactly these compression functions work. Will there be a follow-up video on that?
In essence, it generates 80 32-bit words derived from bits of the message block, then the state goes through circular shifts, some XORs, some bitwise ANDs, addition with the round word and round constant, and then a permutation between all state variables.
@@liljuan206 thanks, this really helped clearing things up
it isn't compression he is describing it is hashing. which is not what encryption is. which is what sha is. (notice the s part stands for secure).
@@liljuan206 how do they make it so it can't be reversed?
In essence SHA-2 uses 6 primary functions: Choice and Majority, and S0, S1, E0, and E1, all of which move and permute bits around during compression
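Those six functions are short enough to write out. A sketch for the 32-bit SHA-256 variant, assuming "S0/S1" above means the small sigmas used in the message schedule and "E0/E1" the capital sigmas used in the rounds (the Python names are my own):

```python
MASK = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def ch(x: int, y: int, z: int) -> int:
    """Choice: each bit of x picks the matching bit of y (1) or z (0)."""
    return ((x & y) ^ (~x & z)) & MASK


def maj(x: int, y: int, z: int) -> int:
    """Majority: each output bit is the value held by at least two inputs."""
    return (x & y) ^ (x & z) ^ (y & z)


def big_sigma0(x: int) -> int:
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x: int) -> int:
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


print(hex(ch(0xFFFFFFFF, 0x12345678, 0x9ABCDEF0)))   # 0x12345678
print(hex(maj(0x12345678, 0x12345678, 0x9ABCDEF0)))  # 0x12345678
```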
easy-going video which explains just enough about SHA algo to keep it simple. The details are better learnt once you "get" the basic idea.
3:17 And the reasons why the NSA came out with SHA-1 to replace the earlier SHA-0 (or just plain “SHA”) were not revealed publicly. But the weaknesses in the original SHA were discovered independently a few years later. This was part of a sequence of evidence indicating that the gap between public, unclassified crypto technology and what the NSA has was narrowing, and may not be significant any more.
I think it's widening: look at Pegasus, and with Pegasus 2.0 you only need a phone number to target a victim.
And Pegasus is a joint project between Israel and the USA. Imagine what the NSA would have kept to themselves.
It is a common understanding in the computer security field that if the government wants you, they have you.
How does the padding work if a block is 511 bits long?
aullik Considering almost all real-world data is stored as a stream of bytes (8 bit values), That's incredibly unlikely to ever come up.
It could be 504 bits, but 511 is highly improbable.
If your padding has to add at least 8 bits (one byte), then the thing he described works fine.
Remember working with individual bits is almost unheard of in computing.
If you have to store individual bits for storage efficiency, you pack them into bytes.
(similarly, if you store 7 bit values, you either store them in 8 bits and ignore a bit, or you pack it such that you store, say, 56 bit blocks. (7 x 8 - eg, 8 sets of 7 bits stored in 7 bytes)
aullik: Exactly the question that came to my mind too :-) Since there aren't necessarily enough bits left in the block to include the length of the actual message.
You could add another block of 512 bits to the end to make it work.
Going with bytes, the longest message that could still be padded would be 496 bits long. 504 wouldn't work as you'd only have 8 bits left but 504 in binary is already 9 bits long.
I know that we usually work with bytes. But even if we say we have 512 - 8 = 504 bits, then we add one '1' bit to start the padding and now we only have 7 bits left. The message is 504 bits long, but 7 bits can only represent values up to 127.
The only answer is that we expand to 1024 bits. But the question would be how we expand. What is the "syntax", for lack of a better word?
I love these videos when Dr. Mike Pound is in them.
Thought I was following until 9:35
He describes a way of padding that will produce the same padding string for messages with the same length - then says it's important that messages with the same length don't have the same padding string. Did something important end up on the editing room floor?
I'll check with Mike but I think it was just a slip of the tongue - ie The padding would be the same for messages of the same length but the messages would be different if they are different >Sean
No, "0010110" padded would be "0010110100000...", but "001011000" would be "001011000100000...", so the 1 (first bit of padding) would be later.
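Here is that point as a tiny Python sketch, using bit strings and a toy 16-bit block (no length field, purely to show why the leading 1 matters). The function names are made up for the example.

```python
def pad_zeros_only(bits: str, block: int = 16) -> str:
    """Naive padding: just fill the block with zeros."""
    return bits + "0" * (-len(bits) % block)


def pad_one_then_zeros(bits: str, block: int = 16) -> str:
    """Mark the end of the message with a 1, then fill with zeros."""
    bits += "1"
    return bits + "0" * (-len(bits) % block)


a, b = "0010110", "001011000"  # b is a followed by two extra zero bits

print(pad_zeros_only(a) == pad_zeros_only(b))          # True: the two messages become identical
print(pad_one_then_zeros(a) == pad_one_then_zeros(b))  # False: the 1 lands in a different place
```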
+Mat2095 He obviously meant if you just pad them with zeros.
Would you please explain the workings of the "washing machine"? ;-) I.e. the compression functions?
Thanks. I'll give this snippet a look. :-)
Dr Mike Pound is the best! More videos with him please
Me: Explain SHA
Dr. Pound: Explains it
Me confused: Explain it to me like I'm 12
Dr. Pound: Explains it like I'm 12
Me still: Explain it to me like I'm 5...
My dealer need this.
Thank you so much. I had a hard time finding someone to explain it well
Some people speak terrible not understandable english, he is one of them. Even whole words were not completely spoken.
You explained everything except for the part that actually matters. :(
You may as well have said, sha works by shaing things.
Exactly my thought :/
That they explain complicated things in an easier to understand manner. Sorta like every other video they make.
Ah, I see now...it's a washing machine with some knobs that does the sha'ing.
The compression function of SHA is where it gets quite complicated, and I don't think it would've fit into the scope of one video, as explaining it to someone with no prior knowledge isn't trivial, there's quite a bit of complicated math involved, and very few people actually understand the details of it.
YES exactly this..
I always wondered how these things work. Great video
If that's how it works, it is very easy to find collisions:
1. Hash 20 bits long data
2. Copy the 512 long data that have been created (by the rules of padding one followed by zeros plus the size)
Then you have two inputs that are essentially the same who share the same output.
So I think there is a lot more sense in applying those rules no matter what the size of the input is, and adding 512 bits blocks to the end if needed.
I think this is how the SHA works.
That's how it works, yes - it's always padded. Without padding you can easily append whatever you like at the end of a message; an important part of the integrity check is to tell where the message ends.
It's not at all easy to find collisions, though. When you hash 20 bits of data you're actually hashing 512 bits of data, as the algorithm only works with exactly 512 bits at a time, i.e. one block. The remaining 492 bits must therefore be padded in a consistent way - if you pad it this way and I pad it that way, we'll end up with different 512-bit blocks, which in turn will result in very different hashes.
If the message length (in bits) modulo 512 is 447 or less, there's room for the padding, which is one 1 followed by however many 0's are needed to get to 448 bits. Finally, the 64-bit length of the message is added (which brings it up to exactly 512 bits). If there's not enough room, additional 0's are added in a following block, up until there are only 64 bits left. (If the message length modulo 512 is 0, then the final block will consist of nothing but a 1 followed by 447 0's and then the length.)
Since SHA is deterministic, even though it is non-reversible, it is still possible to guess the hashes of some reasonably short messages. For example, string 'abc' ALWAYS produces ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad. If I have a large enough database plus computational power, I could probably guess some short messages, although not the entire novel.
That's exactly how most cracking is done. Hashed database against hashed database lol
SHA Hashing Algorithm?
Secure Hashing Algorithm Hashing Algorithm
ATM Machine
RAS Syndrome
LAN Network
GNU's Not Unix...wait a minute
LCD Display
How do you know the "1000000..." padding bits are for padding purposes, and not part of the actual data/plaintext itself?
Re watched it at least 10 times. Thank you for this explanation
Compression Function.
SHA reversion.
The way the video is shot is like Modern Family and that makes me happy! Also the information, so thanks!
What's amazing is the Tom Scott "rocket" animation didn't show up on a video from Dr. Pound
Note to self: Don't use a regular monitor as a touch screen
Its a university flatron monitor, probably expendable.
9:40 I didn't quite understand how that padding scheme guarantees that messages with the same size would not share the same padding.
9:18 The trailing 1 is added at the next byte. So if you have:
01101010
it will be padded like this:
01101010[100000000000000000000000000000........ 448] + (64 bits of message size)
If there isn't enough space for the 64 bit size block the block will be padded to the end and the size will get its own block. So like this:
Block 1:
01101010......[1000000000000000000000000000000........ 512]
Block 2:
[000000000....... 448] + (64 bits of message size)
I know this because I made my own implementation of the SHA-1 and SHA-2 algorithms
Hey can you explain me everything in brief again I wanna know If I got my thinking right
I would love to see a video about the compression function! :)
What if the message is only a few bits shy of a block, not enough room for padding bits as described?
If there's less than 65 bits of space left in the final block for padding, you just pad toward an extra block. For example if your message is 480 bits, you add a one-bit, 479 zero-bits, and the 64-bit length, giving total length 1024 bits = 2 blocks.
Matthijs van Duin thanks
Another video explaining SHA-256 would be awesome.
For those who did not get padding.
SHA1 works with 512 bits or multiples of it. If the message is less than 512 then you need to pad it.
Let’s say the message is 10011, Before you hash it, you need to pad it to make it 512 bit long.
Note that it has a length of only 5 bits. 5 is represented in binary as 101.
You start padding with 1 always so 6th digit will be 1 and pad 101 as last 3 digits.
Pad the remaining digits (512 - 5 - 1 - 3 = 503) as 0s
So the message with padding will look as below.
10011_100000000........000101
(“_” represents start of padding)
@5:21 "We might talk about that in a bit", proceeds to encrypt that bit in sha and turns it to 160 bits
This man forgot more about IT security than i will ever learn
Anyone notice the 'hacking' book on the shelf behind?
It doesn't look like anything to me
Hacking: The Art of Exploitation is a great book by Jon Erickson, which teaches you the basics of reverse engineering, code flow, basic C programming, the stack, networks and other things to get you started on binary exploitation. It's a great book, I recommend it to anyone who's willing to invest time in learning how to hack properly.
cyancoyote is knowledge of a programming language required?
cyancoyote Thanks for the reply. I've heard by many people that C is a very hard language to learn though... do you have any recommendations for introductory books to learning assembly?
How would the padding work if the final block of the message was long enough that you don't have enough padding room to say the number of bit in the message? So if the final block contained 510 bits you would have to pad in 9 bits(111111110) to say that the message is 510 bits, but you would end up with more than 512 bits.
The length field has a fixed size (which is sufficient enough) (also the field is not optional). The length of 10...0 is decided including the size of the length field i.e. you could jump over to the next block if required.
Finally a simple explanation of why the hash functions can't be reversed
He didn't mention that at all.
I feel like a genius learning everything here!
It'd be amazing to see Dr.Pound reviewing some books from his collection. Get to know his technical interests apart from image analysis.
The thumbnail made me think "OSHA" with the O as Dr Pound's head.
Loved the washing machine demonstration!
I remember when SHA1 was actually still secure, and people could get away with MD5 (although it was started to be frowned upon). Now I feel old.
Apple once tried to get away with MD4.
It would be amazing a video how you can get tracked for example: ip, mac, canvas, hd serial number, etc
Thanks for your great work!
never been this early for a computerphile, dope
What happens if a message is smaller than 512 bits but long enough for the padding part to not have any space left to store the length of the message?
Then you pad to 1024 bits(including message length)
Sometimes I wonder how Mike's videos are free.
Nice! Could you make a video about post-quantum cryptography please? It will be a great opportunity to learn more about this stuff
Excellent as usual, good learning resource
Love the Schildt on your wall!
would love an video on SHA-3
Since there is an infinite amount of information that can fit in an infinite long string of characters and that a hash output a finished string of characters then two strings can have the same hash. Since the hash function is losing information it is mathematically impossible to have a perfect hash function with an arbitrary long input string
Summarizing data is not the point. You should look at Tom Scotts video on hashing to understand the difference between hashing and compression.
Samuel Prevost Sounds right. Finished = finite?
Elegant explanation. Thank you, Thank you, Thank you 😊👍
Good job! Your videos are excellent.
Still a bit too confusing for me........ Can you make a video on Hashing VS Encryption? When is what used? If the hash always has less information than the actual file, why would you ever need to hash something in the first place?
Encryption is reversible, hashing is not. In hashing, the receiver only needs confirmation that the data is valid.
One example is password authentication. For security reasons, the server does not store a copy of the user's password, only a hash of it. When a user tries to log in, the server hashes the submitted password and compares it to the stored hash. Meanwhile, if the database gets breached, people can't use the password hash to find out the original password (other than by brute-forcing it).
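A rough sketch of that login flow in Python. Note that this swaps the plain hash described above for a salted, deliberately slow one (PBKDF2 from the standard `hashlib`), which is the more usual choice for passwords. The function names and the iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def make_record(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store instead of the password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

if __name__ == "__main__":
    salt, stored = make_record("correct horse")
    print(check_password("correct horse", salt, stored))
    print(check_password("wrong guess", salt, stored))
```

The server only ever keeps `salt` and `stored`, so a leaked database gives an attacker nothing to reverse, only something to guess against.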
Ah this video has aged like milk, or rather SHA1 has I should say. It's a prime example of why you can never count on anything permanently being secure as eventually a basic error in the underlying math will be found and exploited.
AFAIK SHA2 is considered secure for the moment. And the very basic process is the same for them.
Isn't padding used even if the message is already a multiple of 512 bits to avoid attacks?
What happens if your message is, say, 509 bits in length? How do you pad it if the length won't fit?
You’re a legend bruv -❤ from USA 🇺🇸
Interestingly, since the number of inputs is infinite, there are infinite inputs resulting in the same digest. Don't try to find a collision by brute force tho, there are 2^160 (about 1.5 × 10^48) possible digests
Who the heck writes the subtitles for these videos and why are they so badly wrong? They managed to mishear 'SHA' as both 'char' and 'shower' within the space of like a minute. They're not auto-generated... or are they?
I just read on Wikipedia that the block size of SHA1 is 512 bits, but the internal state and output size is 160 bits. So, what's the difference between block size and internal state, and what happens to reduce 512 bits down to 160 bits and why?
The block size just says how much of the message the hash algorithm works on at a time. In this case, if you have a 1024 bit message, SHA-1 works on the first 512 bits first, then the next 512 bits. The internal state is where it "mixes" the input bits. That's five 4-byte words (Mike - and the specification - labels them H0, H1, H2, H3, H4). 5 words x 4 bytes x 8 bits = 160 bits.
For SHA-1, it actually first *extends* the 512 bits to 2560 - *eighty* 4-byte words. (It makes those "extension words" by xor'ing different parts of the original 512 bits together). Then it mixes each of the eighty words, one by one, into the state, using XOR's, AND's and bit rotations - also constantly rotating the words in the state. That way, the eighty words end up in five words. That's the compression.
The output is just the state when it's done. For some hashes (e.g. SHA-224, SHA-384), output size will be just a part of the final state. For SHA-1, the output is the entire state verbatim.
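To make the block / state / output distinction concrete, here is a compact sketch of the whole thing in Python, following the steps above: pad, split into 512-bit blocks, extend each block to eighty words, mix them into the five-word state, and output the state. It is written for readability, assumes byte-aligned input, and is cross-checked against the standard `hashlib`. The helper names are made up, and it is for learning only, not for real use.

```python
import hashlib
import struct

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1(message: bytes) -> str:
    # Internal state: five 32-bit words = 160 bits.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Padding: a 1 bit, zeros, then the 64-bit message length.
    bit_length = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack(">Q", bit_length)

    for offset in range(0, len(message), 64):  # one 512-bit block at a time
        # Sixteen words from the block, extended to eighty.
        w = list(struct.unpack(">16I", message[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):  # mix each word into the state
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl(b, 30), c, d

        # Add the mixed words back into the running state.
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    # The output is the final state, verbatim.
    return "".join(f"{word:08x}" for word in h)

def agrees_with_hashlib(message: bytes) -> bool:
    """Cross-check the sketch against Python's built-in SHA-1."""
    return sha1(message) == hashlib.sha1(message).hexdigest()

if __name__ == "__main__":
    print(sha1(b"abc"), agrees_with_hashlib(b"abc"))
```

Note that the state `h` never grows: however many blocks come in, only five words carry over from one block to the next.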
So the padding is only denoted by the last one with a trail of zeroes and a length at the end? That is not a prefix and without some other way of indicating that padding is present it is indistinguishable from data.
After a quick google search it appears that the padding is always present so it doesn't need to be a prefix.
keeps me engaged great explanation
This was very informative!
Question: Is there any significance to the initialization constants
h0 = 0x67452301
h1 = 0xEFCDAB89
h2 = 0x98BADCFE
h3 = 0x10325476
h4 = 0xC3D2E1F0
Or are they chosen "randomly"?
No, they could be any numbers. But the cryptographic community is very sceptical of numbers that come out of nowhere.
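These particular numbers are not random-looking once you read their bytes in little-endian order: they are just hex digits counting up and then down (the first four are the same values MD5 starts from). A quick sketch to see the pattern; the helper name is made up.

```python
INITIAL_STATE = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

def little_endian_hex(word: int) -> str:
    """Show the four bytes of a 32-bit word, least significant byte first."""
    return word.to_bytes(4, "little").hex()

patterns = [little_endian_hex(word) for word in INITIAL_STATE]

if __name__ == "__main__":
    for word, pattern in zip(INITIAL_STATE, patterns):
        print(f"0x{word:08X} -> {pattern}")
```

This prints 01234567, 89abcdef, fedcba98, 76543210 and f0e1d2c3. Such an obvious pattern leaves little room for anyone to have picked the values for a hidden reason.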
Superb video! Understood it even better with a lefty teaching me ;)
I have no sound in either Chrome or Edge. The commercial at the beginning plays just fine. Other videos play fine.
Is there mathematical theory to prove that the resulting hashes are "evenly" distributed in the 2^256 or 2^160 space? If the resulting hashes are somewhat "clustered" in a space that is smaller than perceived 2^256, then the chance of collision would be higher
Oh nice, string hashing via SHA1 is something I've been interested in.
Excellent, finally a video with subtitles :)
I didn't know that SHA was short for anything until now.
Thank you very much for this video :) It was very helpful and educational!
9:49 captions about Merkle-Damgard Construction are hilarious
0:34 who made that visual ? :P
haha !
it's funny how video quality has not changed much in the past 7 years
the more I watch these videos, I can't help but wonder. where is the link to the computer science lectures.
I watched dozens of Computerphile's videos. I don't know why this particular video has no sound.
Do you have your headset plugged in? Or you can check the level of the speakers. Captain flies away
So basically it's a randomization function that is seeded with the data you give it, right?
I don't understand the last bit about padding. If changing a single bit will completely change the hash result, why can't we just fill up the message with 0s up to the next 512 multiple?
I think in that case "abc" and "abc0" would produce the same hash.
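A small sketch of that problem, reading "abc0" as "abc" followed by a zero byte. `zero_pad` is a hypothetical zeros-only scheme, not anything SHA-1 does. The real SHA-1 from Python's `hashlib` is included for comparison.

```python
import hashlib

def zero_pad(message: bytes, block_size: int = 64) -> bytes:
    """Hypothetical padding: fill with zero bytes up to the next block boundary."""
    return message + b"\x00" * (-len(message) % block_size)

short = b"abc"
longer = b"abc\x00"

# With zeros-only padding the two messages become the same block,
# so any hash computed from that block would collide.
same_after_zero_pad = zero_pad(short) == zero_pad(longer)

# Real SHA-1 padding adds a 1 bit and the message length, so they differ.
same_sha1 = hashlib.sha1(short).digest() == hashlib.sha1(longer).digest()

if __name__ == "__main__":
    print("identical after zero padding:", same_after_zero_pad)
    print("identical SHA-1 digests:", same_sha1)
```

The messages only differ in something the zeros-only padding erases, which is why the real scheme marks the end of the data and records its length.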
What happens if the last block has, let's say, 504 bits, and the last 8 bits do not have enough room to store the length of the message? Wouldn't the padding scheme break down?
In that case, you pad up to 1024 bits.
What if the length of the message is 511 bits? Then we have only 1 bit of padding, and we can't possibly store the number '511' in 1 bit of information.
Then you pad to 1024 bits (two blocks)