Essentials: Brian Kernighan on Associative Arrays - Computerphile
- Published 6 Oct 2024
- The 'Swiss Army Knife' of data structures, Professor Brian Kernighan talks about the associative array with beer & pizza.
EXTRA BITS: • EXTRA BITS: Essentials...
"Code" Books: • "Code" Books (Prof Bri...
Many thanks to Microsoft Research UK for their support with the 'Essentials' mini-series.
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscom...
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
Pizza: 10 POUNDS!
Beer: 20 POUNDS!
Coffee: 2 POUNDS!
Beer: 20 POUNDS!
You go Kernighan, that's the spirit!
I see a Computerphile video featuring Brian Kernighan, I must drop everything and watch and "thumb-up". I'm a simple guy.
tg
I definitely fell in love with associative arrays in my Data Structures class in college. Between these and linked lists you can build just about everything.
@myspace_forever That’s an imported library.
Built under the hood with associative arrays and linked lists.
I love Brian's voice, and how gentle and methodical he is when explaining things
It makes me so happy to get some more lectures from my favorite prof even all this time after graduating. Not many people can be this entertaining and this informative at the same time!
Map is also a common name for this data structure
Sebastian Schrader he did mention that associative arrays can be referred to as [Hash]maps.
I just rewatched it and didn't hear him say it. He mentioned only hash table, hash and dictionary.
Sebastian Schrader my bad, I must've misheard!
Although for C++, it's important to remember that map is usually some form of binary search tree and unordered_map is a hash map.
object. B)
This was a really great video! The way I get it, the value of a hash table is that it's flexible and, as Professor Kernighan noted, has almost constant access time. You can use any type of data as the indexing element, thanks to the hashing function, and you almost always go through the same number of steps to access any data in the array, which is very different from, for example, a search function. And it's probably easier to read and understand in code. The only downside I see is that a hash table can be inefficient in terms of how much memory is used.
It is the classic "cpu time" versus "memory used" trade-off in computer science.
Access time in terms of caching seems inefficient as well
Hashmaps are one of many ways to implement the associative array abstract data type. Some of the most famous alternatives are tree maps, implemented using self-balancing or unbalanced binary search trees, and association lists, implemented using linked lists.
One thing in common between most if not all of these videos is that it is such a delight to listen to these experts talking about things in their respective areas.
I'd have loved to have him as a professor! Very clear explanation :)
A legend that truly understands 'the programmer'
Too bad this series came out too late to interview Dennis Ritchie. RIP.
Ken Thompson is still with us...
@treyquattro Ken doesn't like interviews.
This guy's shopping list:
Beer
Pizza
Coffee
Beer
£134 worth of coffee at that, hooooly
Classic Kernighan examples :D
Eh. Sounds like your average programmer.
you forgot beer
+noredine Sorry I forgot, I'm blaming this one on the beer.
Very interesting. I'd never studied how these structures were stored internally, and now I finally understand why data stored in a hash is stored in a somewhat random looking order.
Larry Wall: Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi.
Elegantly put.
40th
I love this legendary man...
truly legendary, too bad I'm never gonna meet him in person...
I'm using hash tables all the time in my code. In C# they are called Dictionary. Very useful collection type indeed.
When he was talking about pounds, I initially wasn't sure if he meant weight or currency, so I was thinking "he buys 20lb of beer and pizza?!; programmer for life"
Maybe that's my problem, I don't like beer.
As a programmer myself, I figured I might not learn much, but I didn’t realize hash tables utilized linked lists under the hood.
" Maybe beer collides with pizza. I mean they go well together! "
Awesome idea to bring us this "Essentials" series, especially for those of us who saw all this some time ago at university.
he's a young Dumbledore of programming wizardry
Can Kernighan please explain the Lin-Kernighan heuristic?
We were doing this type of algorithm back in the early '80s to manage memory allocation for paging systems.
Some "administrative" programming languages have "temporary database tables". They are not committed to disk, they are private, they do not bother much about the overhead of behaving like a database table. But they do such a job just fine (or better) and you do not have to invent a hash function or copy data when things get crowded.
typograf62 These days all languages that have sqlite bindings automatically get “temporary database tables”. In .net you also get DataSet.
In Perl it's actually '%', not '#'. '#' is for comments instead.
But yeah, Perl has hash tables as a basic data type. That always seemed very weird to me, but now I get it. Up until now, I simply could not understand how something seemingly so elaborate could be said to be efficient or quick. I get it now.
An episode about character sets and encoding algorithms would be interesting.
7:35 the marker pen makes its sound even when not being used :-o
How the heck did you catch that!? Please teach me how to sorcerer.
Are you claiming that it's a magic marker?
Let's make a hash table JESSY!! -Misteeeer Kernighan, this is the purest blue linted code i've seen!
That small hesitation before 'Javascript "programmer"' makes me giggle.
Are you trying to tell me that HTML is not a programming language?
Hmmm!
Shared the same sentiment, until I started to program in React + Redux. It's as sophisticated as anything else really :)
People who use things like C++ and such hate to call people who use "scripting languages" like JavaScript actual programmers.
Yeah... I was on a group project in college that managed to, in one semester, add a whole 7 lines to node.js
that was a mistake... Javascript is hellish, and I feel sorry for the people that have to look at it for their jobs.
the only thing wrong with javascript is the few remnants of java in it. :P
I have used PERL hashes before but I don't think I really grasped the inner workings of them until watching this 10 minute video.
These are so essential that in Lua hash tables (called tables in the language) are the only data structuring mechanism, i.e. there are no lists, sets etc., only hash tables.
Please also make a video on Open Addressing, which is another way to implement associative arrays.
When BK looks into the camera I feel as if he's speaking directly to me.
As if I'm Neo from The Matrix.
The Perl sigil for hash tables is %, not #.
Are tuples implemented in the same way by programming languages that have them?
how do you loop through an associative array?........like in a traditional array, you can just start a for loop as (i=0;i
You use an iterator, as you can't index memory sequentially like with arrays.
Something like
iter = map.keys(); // or values directly
while((elem = iter.next()) != null) {...}
The details differ slightly between languages, but this is in general the way to do it.
Blueluelueluelue It depends on how you implement the hash function. Usually the hash function takes a key and produces a number that corresponds to that key. So you can just make a normal array of n elements, where each insertion goes to the index that corresponds to its key. That means the developer of the table can walk the whole array like you just said, but the user can't.
Blueluelueluelue They're typically linked lists, I believe. Or you can also just use an iterator.
The correct way is to use a foreach loop if your language supports it. It should automatically get the iterator for you and iterate through each element in the array.
foreach() is, IMO, the easiest way to loop through an associative array. And by using associative arrays you don't have to loop through it to find the one you are looking for. For example, if you need to find the price of coffee, you just use that associative index: echo $data_array['coffee'];
PHP example follows:
foreach($data_array as $key => $data) {
// your code here
}
Inside that foreach loop there are two variables, $key and $data: $key is the current array index and $data is whatever that index of $data_array holds. It can be anything a variable can be, another array perhaps :D
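For comparison, here is a rough Python sketch of the same ideas from this thread: direct lookup by key, a foreach-style loop, and the explicit-iterator loop. The `prices` data is made up for illustration and is not taken from the video.

```python
# Hypothetical prices, loosely modelled on the shopping list in the video.
prices = {"pizza": 10, "beer": 20, "coffee": 2}

# Direct lookup by key: no loop needed to find one entry.
coffee_price = prices["coffee"]

# The "foreach" style: the for statement fetches the iterator for you.
lines = []
for item, price in prices.items():
    lines.append(f"{item}: {price}")

# The explicit-iterator style, spelled out by hand.
keys_seen = []
it = iter(prices)              # iterating a dict yields its keys
while True:
    key = next(it, None)       # None marks exhaustion (no key here is None)
    if key is None:
        break
    keys_seen.append(key)

print(coffee_price)
print(lines)
print(keys_seen)
```

Python dicts keep insertion order (since 3.7), so both loops visit pizza, beer, coffee in that order. Many other hash table implementations make no such promise.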
If you care about performance you should consider not using collision lists, but keep the array flat (each element contains the actual (key,value) pair instead of a pointer to a list) and use linear probing. It's usually faster. You only need to be careful where to insert new elements and how to remove elements.
You can then even separate the (key,value) array in two arrays, one for the keys and one for the values which is especially useful if you're iterating a lot and you're mostly interested in the keys for example.
(Or even better, just use the builtin)
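A toy Python sketch of the flat-array, linear-probing layout described above, with keys and values kept in two parallel arrays. The class name `ProbingTable`, the fixed capacity and the tombstone marker are all assumptions made for illustration. A real implementation would also resize.

```python
_EMPTY = object()      # slot has never been used
_DELETED = object()    # tombstone: slot was used, then removed


class ProbingTable:
    """Toy fixed-size open-addressing table (no resizing)."""

    def __init__(self, capacity=8):
        self.keys = [_EMPTY] * capacity     # keys array
        self.values = [None] * capacity     # parallel values array

    def _probe(self, key):
        """Yield slot indices from the hashed slot onward, wrapping around."""
        n = len(self.keys)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def put(self, key, value):
        first_free = None
        for i in self._probe(key):
            k = self.keys[i]
            if k is _EMPTY:
                if first_free is None:
                    first_free = i
                break                        # key cannot be further along
            if k is _DELETED:
                if first_free is None:
                    first_free = i           # reusable, but keep looking for key
            elif k == key:
                self.values[i] = value       # overwrite existing entry
                return
        if first_free is None:
            raise RuntimeError("table full")
        self.keys[first_free] = key
        self.values[first_free] = value

    def get(self, key, default=None):
        for i in self._probe(key):
            k = self.keys[i]
            if k is _EMPTY:
                return default               # a never-used slot ends the search
            if k is not _DELETED and k == key:
                return self.values[i]
        return default

    def remove(self, key):
        for i in self._probe(key):
            k = self.keys[i]
            if k is _EMPTY:
                return False
            if k is not _DELETED and k == key:
                self.keys[i] = _DELETED      # leave a tombstone, not a gap
                self.values[i] = None
                return True
        return False


# Integer keys 1, 9 and 17 all hash to slot 1 in a table of 8.
t = ProbingTable(8)
t.put(1, "a")
t.put(9, "b")
t.put(17, "c")
t.remove(9)
```

The tombstone is the "careful how to remove" part: if slot 2 were simply emptied, a later lookup of 17 would stop there and miss it.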
I work with these every day. Very common in the medical industry.
For anyone just getting into the Java world, if you are going to use a Hashtable somewhere, it's probably better to use a HashMap instead. More details can be provided by google/stackoverflow.
In Java, they're called HashMap. In Javascript, plain, anonymous objects are used for this purpose. (Also, fun fact: in Ruby, the operator that associates a key with a value, =>, is called a "hash rocket".)
IceMetalPunk in JavaScript, there's been Map and WeakMap for a couple years.
in bash you can create an associative array with:
declare -A array
array[pizza]=20
Beer, Pizza, Coffee, and Chips... A programmer's grocery list for sure!
We only know that the value for pizza is in some location because the hash of pizza gives the "address" (not sure if it's literally the address), right? So if there is a collision with another value and we expand the linked list how exactly would we differentiate between the two values?
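One common answer to the question above is that every entry in the chain stores the key next to the value, so a lookup compares keys while walking the chain. Here is a minimal Python sketch, assuming Python lists stand in for the linked lists. The class name `ChainedTable` is made up for illustration.

```python
class ChainedTable:
    """Toy hash table with separate chaining."""

    def __init__(self, buckets=4):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # The hash only picks the bucket; it is an index, not a memory address.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # same key: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: extend the chain

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:                 # the stored key tells entries apart
                return v
        raise KeyError(key)


# Integer keys 1 and 5 both land in bucket 1 of 4, so they collide.
t = ChainedTable(4)
t.put(1, "pizza")
t.put(5, "beer")
```

Both entries sit in the same chain, and `get` still returns the right one because it checks the stored key, not just the bucket.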
The foundation of many efficient algorithms :)
Loved this!
It's one step further when your associative array can have different types of key. At that point you can model OOP at some level. :)
Not that it's the most efficient way to do it. But it's a fun diversion.
I wish I had a tenth of his knowledge.
I came across hashes in PERL and thought wow as they are so logical but I never thought about how they worked under the hood.
I think you should write specific hashing functions for specific applications. For example, you could make a hash out of a string by adding up only the positions of the letters in the alphabet instead of their Unicode code points.
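A quick Python sketch of that alphabet-position idea, to see how it behaves. The function name `alphabet_hash` and the bucket count of 8 are arbitrary choices for illustration.

```python
def alphabet_hash(word, buckets):
    """Sum the alphabet positions (a=1 ... z=26) of the letters, mod buckets."""
    total = 0
    for ch in word.lower():
        if "a" <= ch <= "z":             # ignore anything that is not a-z
            total += ord(ch) - ord("a") + 1
    return total % buckets


words = ["beer", "pizza", "coffee", "listen", "silent"]
slots = {w: alphabet_hash(w, 8) for w in words}
print(slots)
# {'beer': 6, 'pizza': 6, 'coffee': 0, 'listen': 7, 'silent': 7}
```

Because addition ignores letter order, anagrams such as "listen" and "silent" always collide. With 8 buckets, beer really does collide with pizza (30 and 78 are both 6 mod 8).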
Why don't you split an associative array into a key array and a data array, so that the key array can be reused with other data arrays? It's like a struct in C(++): the mapping used to access a specific member is "inlined" into the code by the compiler and is not stored within the struct.
For some reason I always thought associative arrays would be complicated to implement.
The complexity is in making them efficient for the maximal numbers of use cases. An associative array that only expected strings as keys can be optimized better than one that has to handle many disparate kinds of keys.
The problem with them is choosing the number of buckets. Choose too many and you have wasted space. Choose too few and you have long lookup times. Then adjusting the bucket count, as Brian talked about, takes a fair bit of work, so it's not something you want to do often.
At their simplest, they are simple. But then there's the implementation choices and optimisations about the hash function, numbers of buckets, re-allocation strategies etc., and they suddenly become complicated.
The most complicated of them minimize overhead either in the space-complexity sense, or the time-complexity sense. The simple implementations fall right in the middle.
Great video!
You guys should do a video on, Network on Chip! :P
Associative arrays are especially useful when trying to conserve time and space.
Otherwise, you'd be enumerating local variables quite a bit
The master has spoken: associative array it is.
Did Brian Kernighan just make an off-by-one array length error??? So... Much... Irony...
for some reason hearing that marker really kills me inside
Why use a linked list to deal with collisions? Why not use a second-level hashtable with a different hash function? The chances that two items will collide in two hash functions is vanishingly small.
How many interviewees learn the crews' names? Cool guy.
This is THE Brian Kernighan. 27 dislikes?! Are those people nuts?!
Key-value pairs, oft derided by comp sci and database guys, are a natural way to handle data.
I first learned about associative arrays when I learned Tcl and I thought, "that's magic!"
This video is more about Hash tables than associative arrays, and even then it only looks at one way of doing collision resolution.
Thanks. While debugging a Java Hashtable I saw the effect of collisions, but I didn't recognize it for what it was; I believed it was a strange Eclipse bug!
Brian could describe his breakfast for 2 hours and it would still be interesting
How about doing some YouTube magic and making "Essentials" an actual YouTube series, like Tom Scott did with the fizzbuzz video recently +Computerphile? Anyways, nice miniseries.
What assoc. array library should I use for C? If I don't want to implement it each time, what do you suggest?
Why is the symbol for "pound" that (strange to Americans) upside-down 7 with a line through it?
02:15 I love the £0 spent on juice! *lol*
Weirdly I refer to them as hashmaps or just maps when talking about them in general, even though my two main languages call them dictionaries (Python) and objects (JavaScript)...
Wonder where I got that from, maybe back in programming class... Are they called hashmaps in C++ maybe?
The advantage of calling them "Dictionaries" or "Arrays" is that you abstract the problem away. After all, whether a Collection uses a fixed array or a hash table should be entirely an implementation detail, usually dependent on the number of elements in the collection, and whether uniqueness is required. The programmer typically shouldn't care about the implementation detail, only the boilerplate description, and big O characteristics.
Not a CS dude here; why have those linked lists on top of your associative array? Why not just use a hash function with truly unique outputs, and have N be the exact number of elements in your array? This way every key is assigned a unique integer, and there's no more fuss with checking for repeated hashes. I'm sure there's a reason, I'm just wondering what it is.
Android480 It’s not normally possible to have a collision-free hash function. Such hash functions are called perfect hashes, and you can only develop them when there is a tractably finite set of values you wish to ever hash. Perfect hashes work great for attaching data to dictionaries with a fixed set of keys. But generating a perfect hash function is not cheap, so you can’t cheat by enumerating all the values and recompiling the hash - it’ll usually take way more time than dealing with hash collisions.
Topic suggestion: persistent data structures.
I feel uncomfortable at the sound of the marker grinding on paper.
I've only ever heard "associative array" used in PHP, a language I try to stay as far away from as possible.
web devs (cringes)
Avoiding PHP is to your credit. It was the '90s version of node.js ;-)
The first one I saw was written in IBM-360 ASM. They are very useful when making compilers and interpreters. Programmers are notorious for using variable names that are similar.
"Buying beer and, pizza, and coffee, and chips" - yeah, 100% programmer confirmed, lol.
how did he get my shopping list
Thank you!
Could you please add English subtitles??? It's very hard for non-native English speakers like me to understand everything you say. I've seen other videos from this channel that support this feature or at least allow Google auto-captioning.
It's probably queued up to be auto-captioned by Google. Likely it depends on the number of views a video has before it gets put into the queue.
5:55 Well, in that case you might need to *undrink* some beer, diplomatically speaking. That was a very common occurrence during my university years.
Buy a mic for the interviewer too. How expensive can they be ?
In essence, an array without index numbers?
In essence, an array with index numbers converted from actual keys.
I thought hash collisions were exceptionally rare... do they really come up that much in associative arrays?
It depends on your hash function. Collisions need to be extremely rare for cryptographic hash functions, but hash functions for hash tables only really need to be balanced: occasional collisions are okay if your hash values are spread out over the entire table.
Is Dr. Kernighan in Nottingham or something?
Why aren't there just two arrays: one with the keys (so on an access you loop through until you find the index where the key is) and another with the values (which you would access by the index where the key was in its array)?
The "mission" critical issue which Brian didn't really get around to is reducing the lookup for any one element. You don't want your algorithm to have to traverse the entire structure in order to find what 'could' be the 'last' element in a very very long list. Too inefficient. So the modified hash table is superior to an array or standard linked list or doubly linked list.
Yeah, your two-array solution is O(n) to access an element; a hashmap is O(1) on average.
The search cost for table lookups for that approach is very expensive. For string keys (as in this video), you end up comparing the strings for equality to find the matching key in the table. With a table of size K, you can expect to have to check K/2 keys on average to find a match. With hashing you still have to scan the lookup key string once to produce a hash value, but you then only have to search a much smaller subset of keys (the collisions), trying to match the key. Much, much faster. However, as with all things, there is a worst case scenario - the one where ALL keys collide to one hash slot - that then requires checking the same K/2 key strings against the search key string (as above). But this is very unlikely to occur in practice.
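To put a number on the K/2 estimate above, here is a small Python sketch that counts key comparisons for the two-array approach. The helper `parallel_lookup` and the generated `item0` to `item999` keys are made up for illustration.

```python
def parallel_lookup(keys, values, wanted):
    """Linear scan over a keys array; return (value, comparisons made)."""
    for comparisons, key in enumerate(keys, start=1):
        if key == wanted:
            return values[comparisons - 1], comparisons
    raise KeyError(wanted)


keys = [f"item{i}" for i in range(1000)]
values = list(range(1000))

# Look up every key once and average the number of comparisons needed.
total = sum(parallel_lookup(keys, values, k)[1] for k in keys)
average = total / len(keys)
print(average)  # 500.5, i.e. about K/2 for K = 1000

# The built-in dict (a hash table) finds the same entry without a scan.
table = dict(zip(keys, values))
print(table["item999"])  # 999
```

The last key costs 1000 comparisons in the two-array version, while the dict lookup hashes the key once and checks only the few entries that share its slot.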
Well... C# has both, Dictionary and Hashtable
I'm confused now
they essentially serve the same purpose, but have some internal differences
The only way for humans to express meaning is language, which uses words as its building blocks. So instead of a meaningless number, an address in memory or a numeric position in a list, you use meaningful words in associative arrays ... very easy to use ... enhancing readability of code greatly ... BUT ... it is hard to implement in any programming language in terms of compilers ...
No love for C++ map?
What's the algorithm that decides when to increase N?
It can vary. The associative array would keep track internally of both how many table "slots" are used, and also how long the longest collision list is for any one hash slot. When some cost function (which combines the two in some way) reaches some cutoff value, a growth process occurs. Bear in mind that growing these tables is expensive though, as each table entry must be rehashed. So the cost/benefit between growing and not growing (but having longer search lookups) can be tricky to get right. (If you grow too often, you waste cpu growing unnecessarily. If you don't grow enough, you waste cpu on table lookups due to more collisions).
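A minimal Python sketch of one simple growth policy, deliberately simpler than the cost function described above. It tracks only the load factor (entries divided by buckets) and doubles the table when that passes a threshold. The 0.75 threshold, the doubling and the name `GrowingTable` are assumptions for illustration, not anything stated in the video.

```python
class GrowingTable:
    """Toy chained hash table that doubles when it gets too full."""

    MAX_LOAD = 0.75   # assumed threshold, chosen for illustration

    def __init__(self, buckets=4):
        self.buckets = [[] for _ in range(buckets)]
        self.count = 0
        self.resizes = 0

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.MAX_LOAD:
            self._grow()

    def _grow(self):
        # The expensive part: every stored entry is rehashed into the new table.
        new_n = 2 * len(self.buckets)
        new_buckets = [[] for _ in range(new_n)]
        for bucket in self.buckets:
            for k, v in bucket:
                new_buckets[hash(k) % new_n].append((k, v))
        self.buckets = new_buckets
        self.resizes += 1

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        raise KeyError(key)


t = GrowingTable()
for i in range(100):
    t.put(f"item{i}", i)
print(t.resizes, len(t.buckets))  # 6 256
```

Doubling means 100 inserts trigger only 6 rehashes (at 4, 7, 13, 25, 49 and 97 entries), which is why the occasional expensive growth step averages out.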
2:15, why is juice free?
"I take pizza and run it through a hash..." Yeah, nobody will eat that pizza anymore...
Coffee is essential. I like this guy :D
0:58 he almost said Perl. He did. What happened to Perl is a damn tragedy.
What happened to Perl?
I clicked on this because I had never heard of "associative arrays" before. But then he said "Hash Tables". I'm like, "Oh, I know what those are".
In actuality, "hash tables" are only one way to implement "associative arrays" in the same way that "linked list" is only one way to implement an "ordered set" ..
diaverde09 Same i heard dictionary
If I was left handed, I would have been tempted to start writing from right to left.
Or at least set up the over-head camera on the other side.
You used Inches on nail size! Shouldn't you use metrics? ;-)
Oh my.
In Python they're called dictionaries; the name explains itself.
just noticed that Dr. Kernighan is a lefty -- why am i not surprised haha
"any arbitrary thing" as array subscript? I'm not sure that's true. I believe the "arbitrary thing" has to be immutable.
It doesn't have to be immutable in general; it just doesn't make sense to mutate a key in such a way that its hash value would change (unless you rehashed the key after mutation).
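A small Python sketch of the point above, assuming CPython's dict as the hash table. The `Item` class is hypothetical: a mutable key whose hash follows its current `code`.

```python
class Item:
    """Hypothetical mutable key: hash and equality follow its current code."""

    def __init__(self, code):
        self.code = code

    def __eq__(self, other):
        return isinstance(other, Item) and self.code == other.code

    def __hash__(self):
        return hash(self.code)


key = Item(1)
prices = {key: "pizza"}
found_before = Item(1) in prices    # True

key.code = 2                        # mutate the key while it is in the table
found_old = Item(1) in prices       # False: the stored key no longer equals Item(1)
found_new = Item(2) in prices       # False: the entry is still filed under the old hash

# "Rehashing after mutation": remove under the old hash, change, re-insert.
key2 = Item(3)
stock = {key2: "coffee"}
value = stock.pop(key2)
key2.code = 4
stock[key2] = value
found_rehashed = Item(4) in stock   # True

# Python refuses built-in mutable containers as keys outright.
try:
    {[1, 2]: "x"}
    list_key_ok = True
except TypeError:                   # unhashable type: 'list'
    list_key_ok = False
```

So a mutable key is allowed, but mutating it in place strands the entry: it is still stored, yet neither the old nor the new value of the key finds it.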
his voice is kinda like adam west's
> Coffee is essential
SAVE 418!!