Mike Pound is by far my favorite person on this channel... he has the most interesting subjects, shines with crazy knowledge while still keeping the video fresh and dynamic.
I like him and his topics too, though the AI topics are interesting and the person explaining them is good too
he has great body language, tries to use it as much as possible
And a fair looker.
And the same accent as the 11th Doctor (Matt Smith)! :-D Where is that accent from?
Absolutely agree, Tom Scott is my second favourite, that guy is hilarious
I could sit and watch videos from this guy all day long, so informative and laid back
wrg
Love how these videos get STRAIGHT to the point.
This is too much work, can’t we just trust each other?
That, my friend, is the real problem
How can I trust other people when I can't even trust myself
@Mohamed Seid GodisGood666!
Don't trust, verify
No Way!!!
Been watching a whole bunch of Mike's videos as a complement to my introductory module on Security and Authentication. One of the best teachers I have come across!
I've been trying to understand the concept for 3 days from the slides my teacher covered and the book she shared and ended up with complicated mind, this video gave me a pure understanding in 10 mins. Great job!
Roses are red
Violets are blue
Unexpected { on line 32
coding joke
A poetic compiler? I like that idea
Unresolved external symbol
Felt that on a spiritual level
Violets are blue
Roses are red
Your code isn't thread-safe
Use locks instead
Mike Pound is the best! I love hearing him explain things - keep em coming!
I am at a hackathon in Chicago, Illinois, at the Illinois Institute of Technology, and I have to use SHA-1 on some facts before I pass them to an API so I can make a project for the hackathon. You did a wonderful job telling me what SHA-1 was so I could understand the cryptic API documentation. Thank you very much.
This is my favorite guy on this channel. I just love stuff like this.
The washing machine example really helped seal in this topic I was trying to understand and helped me on my final project. Thank you!!!
Thanks, Dr Pound (if you read this). I find your demeanour easy to engage with, and you set me off on the journey of understanding fully (with much work!).
pound for pound Mike pound is the best narrator on computerphile
My dealer need this.
😂😂😂😂😂
😆
🤣
Hmm, so far this is fairly straightforward, but the interesting part would be how exactly these compression functions work. Will there be a follow-up video on that?
In essence, it generates 80 32-bit words derived from bits of the plaintext, then the state goes through left circular shifts (rotations), some XORs, some bitwise ANDs, addition with the round word and round constant, and then a permutation between all state variables
@@liljuan206 thanks, this really helped clearing things up
it isn't compression he is describing it is hashing. which is not what encryption is. which is what sha is. (notice the s part stands for secure).
@@liljuan206 how do they make it so it can't be reversed?
In essence Sha-2 uses 6 primary functions: Choice and Majority, and S0, S1, E0, and E1 all which move and permutate bytes around during compression
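For the SHA-1 case, the 80-word schedule and the round loop described in this thread can be sketched in Python roughly as below. This is a minimal sketch for study, not a vetted implementation: the names `rotl32`, `sha1_compress` and `SHA1_IV` are made up for this illustration, and the padding of the single example block is done by hand.

```python
import hashlib
import struct

# Standard SHA-1 initial state (five 32-bit words).
SHA1_IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_compress(state, block):
    """Mix one 64-byte block into the 160-bit state and return the new state."""
    assert len(block) == 64
    # Message schedule: 16 words from the block, expanded to 80 words.
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999            # choice
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1                     # parity
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC   # majority
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6                     # parity
        temp = (rotl32(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
        # Shuffle the state variables along by one position.
        a, b, c, d, e = temp, a, rotl32(b, 30), c, d

    # Add the mixed values back into the incoming state.
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

# One-block example: "abc" (24 bits) padded by hand to 64 bytes.
block = b"abc" + b"\x80" + b"\x00" * 52 + struct.pack(">Q", 24)
digest = "".join(f"{v:08x}" for v in sha1_compress(SHA1_IV, block))
print(digest == hashlib.sha1(b"abc").hexdigest())
```

On the "can't be reversed" question: each individual step is simple, but the final addition of the old state and the 80 rounds of mixing leave no known shortcut for working backwards from the output.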
Me: Explain SHA
Dr. Pound: Explains it
Me confused: Explain it to me like I'm 12
Dr. Pound: Explains it like I'm 12
Me still: Explain it to me like I'm 5...
I've always loved your videos and now I study computer science and can watch your videos for studying, it's amazing
Note to self: Don't use a regular monitor as a touch screen
It's a university Flatron monitor, probably expendable.
Mike, you are my favourite person to appear on this channel. I enjoy your clear explanations and like the quite recent topics like Google Deep Dream, Dijkstra and so on.
easy-going video which explains just enough about SHA algo to keep it simple. The details are better learnt once you "get" the basic idea.
I love this channel so much...
SHA Hashing Algorithm?
Secure Hashing Algorithm Hashing Algorithm
ATM Machine
RAS Syndrome
LAN Network
GNU's Not Unix...wait a minute
LCD Display
You explained everything except for the part that actually matters. :(
You may as well have said, sha works by shaing things.
Exactly my thought :/
That they explain complicated things in an easier to understand manner. Sorta like every other video they make.
Ah, I see now...it's a washing machine with some knobs that does the sha'ing.
The compression function of SHA is where it gets quite complicated, and I don't think it would've fit into the scope of one video, as explaining it to someone with no prior knowledge isn't trivial, there's quite a bit of complicated math involved, and very few people actually understand the details of it.
YES exactly this..
Some people speak terrible not understandable english, he is one of them. Even whole words were not completely spoken.
Since SHA is deterministic, even though it is non-reversible, it is still possible to guess the hashes of some reasonably short messages. For example, string 'abc' ALWAYS produces ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad. If I have a large enough database plus computational power, I could probably guess some short messages, although not the entire novel.
That's exactly how most cracking is done. Hashed database against hashed database lol
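The lookup idea in this thread can be sketched as a small precomputed table. This is only an illustration under stated assumptions: the helper name `build_table` is made up, the search space is limited to lowercase strings of up to three letters, and SHA-256 from Python's `hashlib` is used because the 64-hex-digit value quoted above is the SHA-256 digest of 'abc'.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def build_table(max_len: int) -> dict[str, str]:
    """Map SHA-256 hex digests back to every lowercase string up to max_len."""
    table = {}
    for n in range(1, max_len + 1):
        for chars in product(ascii_lowercase, repeat=n):
            word = "".join(chars)
            table[hashlib.sha256(word.encode()).hexdigest()] = word
    return table

table = build_table(3)
target = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(table.get(target))  # prints abc
```

The table grows by a factor of 26 per extra letter, which is why this works for short messages and not for the entire novel.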
I love these videos when Dr. Mike Pound is in them.
Dr Mike Pound is the best! More videos with him please
How does the padding work if a block is 511 bits long?
aullik Considering almost all real-world data is stored as a stream of bytes (8 bit values), That's incredibly unlikely to ever come up.
It could be 504 bits, but 511 is highly improbable.
If your padding has to add at least 8 bits (one byte), then the thing he described works fine.
Remember working with individual bits is almost unheard of in computing.
If you have to store individual bits for storage efficiency, you pack them into bytes.
(similarly, if you store 7 bit values, you either store them in 8 bits and ignore a bit, or you pack it such that you store, say, 56 bit blocks. (7 x 8 - eg, 8 sets of 7 bits stored in 7 bytes)
aullik: Exactly the question that came to my mind too :-) Since there aren't necessarily enough bits left in the block to include the length of the actual message.
You could add another block of 512 bits to the end to make it work.
+KuraIthys
Going with bytes, the longest message that could still be padded would be 496 bits long. 504 wouldn't work as you'd only have 8 bits left but 504 in binary is already 9 bits long.
+Kuralthys
I know that we usually work with bytes, but even if we say we have 512-8 = 504 bits, then we add one '1' bit to start the padding and now we only have 7 bits left. The message is 504 bits long, but 7 bits can only hold values up to 127.
The only answer is that we expand to 1024 bits. But the question would be how do we expand. What is the "syntax" for the lack of a better word
Would you please explain the workings of the "washing machine"? ;-) I.e. the compression functions?
Thanks. I'll give this snippet a look. :-)
Thank you so much. I had a hard time finding someone to explain it well
I kinda want to make my own hashing algorithm now. It wouldn't be very good, it would just be some random jostling around of bits until it looks weird.
3:17 And the reasons why the NSA came out with SHA-1 to replace the earlier SHA-0 (or just plain “SHA”) were not revealed publicly. But the weaknesses in the original SHA were discovered independently a few years later. This was part of a sequence of evidence indicating that the gap between public, unclassified crypto technology and what the NSA has was narrowing, and may not be significant any more.
I think it's widening because look at Pegasus and with Pegasus 2.0 you only need phone number to target a victim.
And, Pegasus is joint project between Israel and USA. Imagine what NSA would have kept to themselves.
It is common understanding in the computer security field that if the government wants you, they have you.
The key idea that I got from this video is that hashing is not encryption and there is a difference between the two, even though it's easy for someone to confuse them.
Another video explaining SHA-256 would be awesome.
What I want to know, for no particular reason, is if there are cases where a hash of a hash equals itself, of course sticking with one particular algorithm and hash length.
Re watched it at least 10 times. Thank you for this explanation
The thumbnail made me think "OSHA" with the O as Dr Pound's head.
Loved the washing machine demonstration!
For those who did not get padding.
SHA1 works with 512 bits or multiples of it. If the message is less than 512 then you need to pad it.
Let’s say the message is 10011, Before you hash it, you need to pad it to make it 512 bit long.
Note that it has a length of 5 bits only. 5 is represented in binary as 101.
You start padding with 1 always so 6th digit will be 1 and pad 101 as last 3 digits.
Pad the remaining digits (512 - 5 - 1 - 3 = 503) as 0s
So message with padding will look as below.
10011_100000000........000101
(“_” represents start of padding)
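The 5-bit example above can be reproduced with a few lines of Python. This is a sketch working on strings of '0' and '1' characters for readability; `pad_bits` is a made-up helper name, and it assumes the fixed 64-bit length field that SHA-1 uses, so the "101" at the end is really the tail of a 64-bit number.

```python
def pad_bits(message_bits: str, block: int = 512, length_field: int = 64) -> str:
    """Append a 1, then zeros, then the message length as a fixed-width binary number."""
    n = len(message_bits)
    # Zeros needed so that message + '1' + zeros + length fills whole blocks.
    zeros = (block - length_field - (n + 1)) % block
    return message_bits + "1" + "0" * zeros + format(n, f"0{length_field}b")

padded = pad_bits("10011")
print(len(padded))                   # 512
print(padded[:6], "...", padded[-3:])  # 100111 ... 101
```

Everything between the leading padding 1 and the final "101" is zero, which matches the 503 zeros counted in the comment.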
The video's shots are like Modern Family and that makes me happy! Also the information, so thanks!
Thought I was following until 9:35
He describes a way of padding that will produce the same padding string for messages with the same length - then says it's important that messages with the same length don't have the same padding string. Did something important end up on the editing room floor?
I'll check with Mike but I think it was just a slip of the tongue - ie The padding would be the same for messages of the same length but the messages would be different if they are different >Sean
No, "0010110" padded would be "0010110100000...", but "001011000" would be "001011000100000...", so the 1 (first bit of padding) would be later.
+Mat2095 He obviously meant if you just pad them with zeros.
I remember when SHA1 was actually still secure, and people could get away with MD5 (although it was starting to be frowned upon). Now I feel old.
Apple once tried to get away with MD4.
It'd be amazing to see Dr.Pound reviewing some books from his collection. Get to know his technical interests apart from image analysis.
I would love to see a video about the compression function! :)
@5:21 "We might talk about that in a bit", proceeds to encrypt that bit in sha and turns it to 160 bits
I always wondered how these things work. Great video
If that's how it works, it is very easy to find collisions:
1. Hash 20 bits long data
2. Copy the 512 long data that have been created (by the rules of padding one followed by zeros plus the size)
Then you have two inputs that are essentially the same who share the same output.
So I think there is a lot more sense in applying those rules no matter what the size of the input is, and adding 512 bits blocks to the end if needed.
I think this is how the SHA works.
That's how it works, yes - it's always padded. Without padding you can easily append whatever you like at the end of a message; an important part of the integrity check is to tell where the message ends.
It's not at all easy to find collisions, though. When you hash 20 bits of data you're actually hashing 512 bits of data as the algorithm only works with exactly 512 bits at a time, i.e. one block. The remaining 492 bits must therefore be padded in a consistent way - if you pad it this way and I pad it that way, we'll end up with different 512 bit blocks, which in turn will result in very different hashes.
If the message length (in bits) modulo 512 is 447 or less, there's room for the padding, which is one 1 followed by however many 0's are needed to get to 448 bits. Finally, the 64 bit length of the message is added (which brings it up to exactly 512 bits). If there's not enough room, additional 0's are added in a following block, up until there's only 64 bits left. (If the message length modulo 512 is 0, then the final block will consist of nothing but a 1 followed by 447 0's and then the length.)
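The rule above boils down to a little modular arithmetic. As a rough sketch (the function name `padding_layout` is invented here), this computes how many zero bits are added and how many 512-bit blocks result for a given message length in bits:

```python
def padding_layout(bit_len: int) -> tuple[int, int]:
    """Return (zero_bits, total_blocks) for a message of bit_len bits."""
    # After the single 1 bit, pad with zeros up to 448 modulo 512.
    zero_bits = (448 - (bit_len + 1)) % 512
    total_bits = bit_len + 1 + zero_bits + 64  # plus the 64-bit length field
    return zero_bits, total_bits // 512

for n in (20, 447, 448, 511, 512):
    print(n, padding_layout(n))
```

A 447-bit message needs no zeros at all and still fits one block, while 448 bits is the first length that spills into a second block.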
10:21
What's amazing is the Tom Scott "rocket" animation didn't show up on a video from Dr. Pound
5:50 summarised the subject in 1 sentence ;-)
4:30
SHA-1
5:24
Compression Function.
6:29
Permutation.
7:36
SHA reversion.
Love the Schildt on your wall!
I like the words at the end. The shower function. Murkland damn.[...] Obviously speech recognition still has some way to go.
Would love a video on SHA-3
This man forgot more about IT security than i will ever learn
keeps me engaged great explanation
I feel like a genius learning everything here!
Sometimes I wonder how Mike's videos are free.
Elegant explanation. Thank you, Thank you, Thank you 😊👍
How do you know the "1000000..." padding bits are for padding purposes, and not part of the actual data/plaintext itself?
Ah this video has aged like milk, or rather SHA1 has I should say. It's a prime example of why you can never count on anything permanently being secure as eventually a basic error in the underlying math will be found and exploited.
AFAIK SHA2 is considered secure for the moment. And the very basic process is the same for them
Finally a simple explanation of why the hash functions can't be reversed
He didn't mention that at all.
That 011001011 he wrote down is actually the start of the SHA hash value for "abd". I wonder if that was intentional, because the odds of that happening randomly are less than one percent.
I didn't know that SHA was short for anything until now.
9:49 captions about Merkle-Damgard Construction are hilarious
never been this early for a computerphile, dope
Anyone notice the 'hacking' book on the shelf behind?
It doesn't look like anything to me
Hacking: The Art of Exploitation is a great book by Jon Erickson, which teaches you the basics of reverse engineering, code flow, basic C programming, the stack, networks and other things to get you started on binary exploitation. It's a great book, I recommend it to anyone who's willing to invest time in learning how to hack properly.
lol
cyancoyote is knowledge of a programming language required?
cyancoyote Thanks for the reply. I've heard by many people that C is a very hard language to learn though... do you have any recommendations for introductory books to learning assembly?
Since there is an infinite amount of information that can fit in an infinite long string of characters and that a hash output a finished string of characters then two strings can have the same hash. Since the hash function is losing information it is mathematically impossible to have a perfect hash function with an arbitrary long input string
Summarizing data is not the point. You should look at Tom Scotts video on hashing to understand the difference between hashing and compression.
Samuel Prevost Sounds right. Finished = finite?
Orchestral score sheet was a pretty unexpected thing to see in a video about cryptography =))
Excellent, finally a video with subtitles :)
It would be amazing a video how you can get tracked for example: ip, mac, canvas, hd serial number, etc
Thanks for your great work!
Superb video! Understood it even better with a lefty teaching me ;)
Mike is the best
Excellent as usual, good learning resource
I have no sound in either Chrome or Edge. The commercial at the beginning plays just fine. Other videos play fine.
Love these videos.
9:18 The trailing 1 is added at the next byte. So if you have:
01101010
it will be padded like this:
01101010[100000000000000000000000000000........ 448] + (64 bits of message size)
If there isn't enough space for the 64 bit size block the block will be padded to the end and the size will get its own block. So like this:
Block 1:
01101010......[1000000000000000000000000000000........ 512]
Block 2:
[000000000....... 448] + (64 bits of message size)
I know this because I made my own implementation of the SHA-1 and SHA-2 algorithms
Hey can you explain me everything in brief again I wanna know If I got my thinking right
Good job! Your videos are excellent.
Oh nice, string hashing via SHA1 is something I've been interested in.
What happens if a message is smaller than 512 bits but long enough for the padding part to not have any space left to store the length of the message?
Then you pad to 1024 bits(including message length)
9:40 I didn't quite understand how that padding scheme guarantees that messages with the same size would not share the same padding.
What if the message is only a few bits shy of a block, not enough room for padding bits as described?
If there's less than 65 bits of space left in the final block for padding, you just pad toward an extra block. For example if your message is 480 bits, you add a one-bit, 479 zero-bits, and the 64-bit length, giving total length 1024 bits = 2 blocks.
Matthijs van Duin thanks
Can you talk about the colliding prefix issue? As I understand it once I find a collision with a file, I can continue to create collisions by appending the same thing to both files, and some how this allows me to create two meaningful files each with the same hash value where one might expect that any collision which might be found would be obviously fake because it would have to be made up of a bunch of random bits.
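The "append the same thing to both files" part can be shown on a toy hash. The sketch below is not SHA-1: it is a made-up 16-bit hash (`toy_hash`) built in the same block-by-block style, with truncated SHA-1 standing in as the compression function, so that a collision can be found in a fraction of a second. All names and sizes here are assumptions chosen for the demo.

```python
import hashlib
from itertools import count

BLOCK = 8          # toy block size in bytes
IV = b"\x00\x00"   # toy 16-bit initial state

def toy_compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function: 16 bits of SHA-1 over state + block."""
    return hashlib.sha1(state + block).digest()[:2]

def toy_hash(message: bytes) -> bytes:
    """Pad with 0x80, zeros and a 4-byte length, then chain the blocks."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 4) % BLOCK)
    padded += len(message).to_bytes(4, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = toy_compress(state, padded[i:i + BLOCK])
    return state

def find_one_block_collision() -> tuple[bytes, bytes]:
    """Search for two different blocks that lead to the same internal state."""
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        state = toy_compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block

m1, m2 = find_one_block_collision()
suffix = b" plus any shared ending"
print(m1 != m2, toy_hash(m1 + suffix) == toy_hash(m2 + suffix))  # True True
```

Once the internal state matches after the colliding blocks, every later block (including the padding, since both messages have the same length) is processed identically, so the collision survives any shared suffix.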
Interestingly, since the number of inputs is infinite, there are infinite inputs resulting in the same digest. Don't try to find a collision tho, 2^160 is in the order of the number of atoms in the universe
wingardium levioSHA! (is what I immediately thought of when you started the video with 'shaa...').
Nice! Could you make a video about post-quantum cryptography please? It will be a great opportunity to learn more about this stuff
Thank you very much for this video :) It was very helpful and educational!
Llama 2 recommended your channel on this topic 💯 😊 crazy, isn't it?
0:34 who made that visual ? :P
haha !
How would the padding work if the final block of the message was long enough that you don't have enough padding room to say the number of bit in the message? So if the final block contained 510 bits you would have to pad in 9 bits(111111110) to say that the message is 510 bits, but you would end up with more than 512 bits.
The length field has a fixed size (which is sufficient enough) (also the field is not optional). The length of 10...0 is decided including the size of the length field i.e. you could jump over to the next block if required.
What happens if your message is, say, 509 bits in length? How do you pad it if the length won't fit?
You teach this better than my professor
it's funny how video quality has not changed much in the past 7 years
At 0:34, my mind went dirty.
don't stop the video at 0:35
7:50 that made it click for me, thanks!
Thank you computerphile:-)...
Thank you! Made hashing much clearer for me now :)
He’s a very knowledgeable guy, what are his qualifications ?
dude cleaned everything up except his monitor
I love your funny words, magic man.