A better hash table (in C)

Jacob Sorber

Published 11 Nov 2024

COMMENTS • 63

@SimGunther 1 year ago ⁺²⁸
strager had a whole video on hash tables and it turns out that a better hash function based on the understanding of the keys going into the table equals a MUCH faster hash table! 🎉
@marcossidoruk8033 1 year ago ⁺¹³
That video is completely misleading, or at the very least it seems to have misled you.
What's being implemented in this video is a general-purpose hash table; what the video you mentioned shows is a perfect hash table that only works with a predefined, hardcoded set of words, because he needed that for a JavaScript compiler.
Those are two completely different problems, and tbh his solution is quite dumb: for such a specific problem and such a limited set of keywords, if you really want the highest performance, the best option is to do a giant switch statement over the first letter of each word, and inside that more switch statements over the second letter, and so on, which is ugly as heck but much faster than a hash table.
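A minimal sketch of the nested-switch idea from the comment above, assuming a hypothetical three-keyword set (`if`, `in`, `for`). The enum and function names are invented for illustration, and nothing here measures whether it actually beats a perfect hash table.

```c
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
enum keyword { KW_NONE, KW_IF, KW_IN, KW_FOR };

/* Classify a length-delimited word by switching on one letter at a time. */
enum keyword classify_keyword(const char *s, size_t len)
{
    if (len < 2)
        return KW_NONE;
    switch (s[0]) {
    case 'i':
        if (len != 2)
            return KW_NONE;
        switch (s[1]) {          /* second letter decides between if / in */
        case 'f': return KW_IF;
        case 'n': return KW_IN;
        default:  return KW_NONE;
        }
    case 'f':
        if (len == 3 && s[1] == 'o' && s[2] == 'r')
            return KW_FOR;
        return KW_NONE;
    default:
        return KW_NONE;
    }
}
```

For a real keyword list, code like this would normally be generated by a script rather than written by hand.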
@tommasobonvicini7114 1 year ago
Folks, look at the number of thumbs up SimGunther received, then look at marcossidoruk's: welcome to the software industry.
@godnyx117 1 year ago
@@marcossidoruk8033 Is it really faster tho? If that is the case, then a general-purpose language with good metaprogramming features (like D) can easily create a library that does that!
@MaxCoplan 1 year ago
it’s pretty obvious strager didn’t really know what he was talking about and just made the video for the clickbait. Did you see the thumbnail? On his stream today he said he didn’t even go to college! How anybody can take software engineering advice from him is beyond me.
@strager_ 1 year ago ⁺⁶
> if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, which is ugly as heck but much faster than a hash table.
You should leave a comment on my video with your suggestion.
@CodePagesNet 4 months ago
Thank you for the helpful video and C videos in general. I encourage and promote the understanding that C has advantages over OO, even though people may not understand that yet (OO is merely a code format, and an inflexible one at that).
@ahmadhadwan 1 year ago ⁺¹
Very interesting video, Dr. Jacob. I'm glad you decided to expand on the last video.
@Nohope__ 7 months ago
I'm going to have to watch this 10 times.
(TYSM for the amazing material <3)
@sanderbos4243 1 year ago ⁺⁴
What I really enjoyed programming and found incredibly useful during my 1.5 years of C assignments was to write my own vector implementation. A basic one is only about 50 lines of code. Because my uni also requires us to free() *all* allocated memory manually, I was then able to write void *my_malloc(size_t count, size_t size, char *description): a malloc() wrapper that stores the new address in one of those vectors. I could then call print_allocations() and free_allocations() at the end of my main()! Very nice during debugging.
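A rough sketch of the tracking wrapper described above, not the commenter's actual code. It keeps the `my_malloc(count, size, description)` shape from the comment, but the `struct allocation` layout, the use of `calloc` underneath, and the `const` on `description` are assumptions made here.

```c
#include <stdio.h>
#include <stdlib.h>

/* One tracked allocation. `description` is assumed to be a string literal
   (or to otherwise outlive the tracker), since only the pointer is stored. */
struct allocation {
    void *ptr;
    const char *description;
};

static struct allocation *allocations = NULL; /* growable array ("vector") */
static size_t allocation_count = 0;
static size_t allocation_capacity = 0;

void *my_malloc(size_t count, size_t size, const char *description)
{
    if (allocation_count == allocation_capacity) {
        size_t new_capacity = allocation_capacity ? allocation_capacity * 2 : 8;
        struct allocation *grown =
            realloc(allocations, new_capacity * sizeof *grown);
        if (grown == NULL)
            return NULL;
        allocations = grown;
        allocation_capacity = new_capacity;
    }
    /* calloc checks count * size for overflow and zeroes the block. */
    void *ptr = calloc(count, size);
    if (ptr == NULL)
        return NULL;
    allocations[allocation_count].ptr = ptr;
    allocations[allocation_count].description = description;
    allocation_count++;
    return ptr;
}

void print_allocations(void)
{
    for (size_t i = 0; i < allocation_count; i++)
        printf("%p: %s\n", allocations[i].ptr, allocations[i].description);
}

void free_allocations(void)
{
    for (size_t i = 0; i < allocation_count; i++)
        free(allocations[i].ptr);
    free(allocations);
    allocations = NULL;
    allocation_count = 0;
    allocation_capacity = 0;
}
```

Calling `print_allocations()` and then `free_allocations()` at the end of `main()` gives the debugging workflow the comment describes. Blocks obtained this way must not also be passed to `free()` directly, or they would be freed twice.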
@Urre5 1 year ago
Did they say why you had to free stuff at the end of main?
@sanderbos4243 1 year ago
@@Urre5 I presume it is because in most of our projects we don't have any loops that would force us to free memory. So they just want to make sure we are aware of how to use free() properly. On some systems the OS might not do it for us at the end of the program too.
@Urre5 1 year ago ⁺¹
@@sanderbos4243 yeah I was hoping it's because of the latter part, but they should be explicit, and even in particular teach you not to free on systems which will clean things up, because otherwise you'll waste the user's time when exiting the program. Then again, if it's a nice arena or something where you have all your allocations, it shouldn't take too long anyway.
@sanderbos4243 1 year ago
@@Urre5 Totally agree, we pretty much learn to use malloc() and free() however we like, as long as we don't have leaks. We're not told basic performance stuff like it maybe being braindead to use malloc() and free() unnecessarily all over the place, and without telling us about stuff like big O. Every exercise is a PDF, and our school (look up Codam or 42 school) deliberately doesn't have any teachers nor books we have to read, so everyone helps each other, and we spend a ton of time reading up online. It's incredibly freeing since the school is open 24/7 and you aren't required to be there for that many hours per week, but it isn't for everyone, since it's your own responsibility to become an awesome programmer. Oh, and it's completely free. :)
@sortof3337 1 year ago ⁺¹
I thought you stopped saying "without further ado". Hahah. Very good video. The reason I am good at C is because I have all your videos I can reference. :D
@adambishop328 1 year ago ⁺¹
wow thank you for strcspn, i've been looping through my character arrays for a long time to try and format them into null-terminated strings without any return or newlines. Sweet function
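For reference, a small sketch of the `strcspn` idiom this comment is about. The helper name `strip_line_ending` is made up here; `strcspn` itself is standard C and returns the length of the initial segment that contains none of the listed characters.

```c
#include <string.h>

/* Truncate `line` at the first '\r' or '\n'. If neither is present,
   strcspn returns strlen(line), so the write lands on the existing
   terminator and the string is left unchanged. */
void strip_line_ending(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

This is typically applied to a buffer right after `fgets` fills it.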
@surters 7 months ago
If you want to make a generic hash table, you need a lot of helper functions that know the type, and you would need to pass them along, which gives a lot of extra parameters. Or you could just pass along a pointer to a struct with all those extra functions, each of them a function pointer for that type.
Some of the extra pointers could be print_obj, destroy_obj, initialize_obj, copy_obj, assign_obj etc.
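One possible shape for the struct of function pointers suggested above, shown for `int` objects. The member names follow the comment; `struct obj_ops`, `int_ops`, and `clone_object` are hypothetical names introduced for this sketch, and only three of the suggested operations are included.

```c
#include <stdio.h>
#include <stdlib.h>

/* Per-type operations that generic container code can call
   without knowing the concrete type. */
struct obj_ops {
    void *(*copy_obj)(const void *obj);
    void  (*destroy_obj)(void *obj);
    void  (*print_obj)(const void *obj, FILE *out);
};

static void *copy_int(const void *obj)
{
    int *dup = malloc(sizeof *dup);
    if (dup != NULL)
        *dup = *(const int *)obj;
    return dup;
}

static void destroy_int(void *obj)
{
    free(obj);
}

static void print_int(const void *obj, FILE *out)
{
    fprintf(out, "%d", *(const int *)obj);
}

/* The single pointer a table would be handed for int values. */
const struct obj_ops int_ops = { copy_int, destroy_int, print_int };

/* Generic code sees only the ops table, never the concrete type. */
void *clone_object(const struct obj_ops *ops, const void *obj)
{
    return ops->copy_obj(obj);
}
```

A table created with `&int_ops` could then copy values on insert and call `destroy_obj` on delete, with one extra parameter instead of several.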
@mr.erikchun5863 1 year ago ⁺¹
Thank you Jacob for making these videos.
@greg4367 1 year ago ⁺²
Looking forward to part 2
@tiramihai1152 1 year ago ⁺¹
A 41 minute Jacob Sorber video? I'm in for a ride
@TheSulross 11 months ago
Just had to implement an open addressing hash table using linear probing and double hashing to reduce clustering, and I validated that, yes, double hashing does reduce clustering, and the second hash function can be very cheap, at practically no cost.
In my case the hash table is allocated up front to some size and does not have to be increased in size over its operational lifetime.
Only keys are stored in the hash table, so a lookup returns an index. The data resolved to is kept in a separate array that has the same maximum number of entries as the hash table itself. So the very same hash table can be used to look up different data structure values depending on context, something that is the case in my domain.
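A sketch of the design described above: a fixed-size, keys-only, open-addressing table with double hashing, where lookup returns a slot index. Everything concrete is an assumption made for illustration: 32-bit integer keys, a power-of-two `TABLE_SIZE`, key `0` reserved as the empty marker, and the two hash functions. Deletion is left out.

```c
#include <stdint.h>

#define TABLE_SIZE 16u   /* assumed power of two, fixed up front */
#define EMPTY_KEY  0u    /* assumption: 0 is never a real key */

struct key_table {
    uint32_t keys[TABLE_SIZE];   /* zero-initialise before use */
};

/* First hash: picks the starting slot. */
static uint32_t hash1(uint32_t key)
{
    return (key * 2654435761u) & (TABLE_SIZE - 1u);
}

/* Second hash: cheap, and forced odd so the step is coprime with the
   power-of-two table size and the probe sequence visits every slot. */
static uint32_t hash2(uint32_t key)
{
    return ((key >> 16) ^ key) | 1u;
}

/* Returns the slot index of `key`, inserting it if absent;
   -1 if the table is full or the key is the reserved empty marker. */
int table_insert(struct key_table *t, uint32_t key)
{
    if (key == EMPTY_KEY)
        return -1;
    uint32_t idx = hash1(key), step = hash2(key);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        if (t->keys[idx] == EMPTY_KEY || t->keys[idx] == key) {
            t->keys[idx] = key;
            return (int)idx;
        }
        idx = (idx + step) & (TABLE_SIZE - 1u);
    }
    return -1;
}

/* Returns the slot index of `key`, or -1 if it is not present. */
int table_lookup(const struct key_table *t, uint32_t key)
{
    if (key == EMPTY_KEY)
        return -1;
    uint32_t idx = hash1(key), step = hash2(key);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        if (t->keys[idx] == key)
            return (int)idx;
        if (t->keys[idx] == EMPTY_KEY)
            return -1;
        idx = (idx + step) & (TABLE_SIZE - 1u);
    }
    return -1;
}
```

Because the result is just an index, the caller can keep any number of parallel arrays of `TABLE_SIZE` entries and use the same index into whichever one fits the context, as the comment describes.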
@RobertaPROTO 1 month ago
Hi there, for a school project I have to use a doubly linked list to record frequent items (for each node I'll put the density and the time it was updated). However, the problem is that I have to "organize my double linked list as a hash table using a function H". How is that possible? It also says that I have to make the pointer point to the next node in terms of update time.
@svenvandevelde1 1 month ago
Just know that malloc and calloc are implemented using a heap structure, which is much more complex than a hash table. Why not create this case study without malloc and calloc, through static memory allocation? The usage of malloc and calloc slows down the logic dramatically. Also, if your hash table size and the structure size can be binary calculated through bit shifts, the key calculation can be made using a rotating random binary calculation, which will result in blazingly fast key calculation.
@russelwestbrick3023 1 year ago ⁺¹
wonderful teaching!!
@69k_gold 1 year ago ⁺¹
Please make a video about terminal (termios.h) and different terminal modes etc
@austinraney 1 year ago ⁺²
Is the calloc call at 13:16 not incorrect? I thought it was number of elements then size of t. Right, it still will allocate the same amount of memory, I just would have expected that you would need to cast to make the compiler happy. What am I missing?
@austinraney 1 year ago ⁺¹
Having thought about it for a second, I guess the calls would be functionally equivalent. Are there ever cases when they aren’t?
@Uerdue 1 year ago ⁺¹
@@austinraney From the manpage, I cannot find any evidence that swapping the arguments could ever mess things up.
I would however argue that it could potentially hurt performance, because the `calloc` implementation might use that extra bit of information you provide by specifying what's the amount of items and what's the size.
For example, it might try to align the memory such that no single item in the array will cross a page boundary. For this, it would need to know what's the element size.
Interestingly, I have seen more people supplying the arguments in the wrong order than doing it correctly.
@Uerdue 1 year ago ⁺⁴
Update: I checked the libc implementation on my machine, and found that it doesn't care: It multiplies the values, makes sure no overflow happened, and then just goes on to allocate a block of memory as large as the result of the multiplication.
Other libc implementations might differ.
@austinraney 1 year ago ⁺¹
@@Uerdue thanks for doing some digging and sharing! It’s much appreciated!
I was curious in particular about potential page alignment problems like you mentioned.
@JacobSorber 1 year ago
Yeah, calloc is typically just a multiply, a malloc call, and a memset (or the equivalent). Sorry if I caused any confusion.
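To make the "multiply, malloc, memset" description in this thread concrete, here is a simplified sketch. It is not the source of any particular libc (real implementations may, for example, skip the `memset` for memory that is already known to be zero), and `sketch_calloc` is a name invented here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* calloc-like helper: overflow check, one allocation, then zero fill.
   The two arguments are only ever multiplied together, so swapping
   them yields the same request. */
void *sketch_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;               /* nmemb * size would overflow */
    size_t total = nmemb * size;
    void *block = malloc(total);
    if (block != NULL)
        memset(block, 0, total);
    return block;
}
```

In this sketch, `sketch_calloc(4, 8)` and `sketch_calloc(8, 4)` both request 32 zeroed bytes, which matches what the thread found for one libc.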
@randomscribblings 1 year ago ⁺⁵
strdup() == malloc() + strcpy()
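Spelled out, the equivalence in this comment looks roughly like the sketch below. `my_strdup` is a stand-in name; the real `strdup` comes from POSIX (and C23) and should be preferred where available.

```c
#include <stdlib.h>
#include <string.h>

/* Hand-rolled equivalent of strdup: allocate room for the characters
   plus the terminating '\0', then copy. Returns NULL if malloc fails.
   The caller owns the result and must free() it. */
char *my_strdup(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```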
@aniritri8635 1 year ago ⁺²
Have you checked Zig yet? Seems like a nice language to overview and compare to C.
@djazz0 1 year ago
And Nim! :)
@FelixNielsen 1 year ago
I have a question that may be relevant or entirely unrelated. I'm not actually sure.
In short, I have a problem, the solution to which could well be a hash table. My keys are known to be unique and of equal length, that is to say no more than a few bytes each; they are not strings, but rather integers, if you so desire.
The question then becomes: is there a special category of hash functions (or other method) which can convert these keys into values in a given range, naturally ranging from 0 to n-1 for n items?
Mind you I can think of other solutions for doing what I need to do, which is basically a runtime defined and/or modified switch case like functionality, but none I can think of are entirely well suited.
Thanks for your efforts.
@JaccovanSchaik 1 year ago ⁺⁴
33:12 strdup()!
@JacobSorber 1 year ago ⁺²
Very true. Thanks
@johanngambolputty5351 1 year ago ⁺²
Just to be cheeky, the thing is, you don't have to type text from video anyway, you can use optical character recognition, I like to do
`spectacle -r -b -o /tmp/screenshot.png && tesseract /tmp/screenshot.png stdout --psm 6 | xclip -sel clip`
set to a keybind, then you can just paste into your favourite editor ;)
@zxuiji 1 year ago ⁺³
What I'd like to see is a vid on the new lattice-based encryption algorithm. It'd be one I'd definitely save for later, and I'm sure a number of peeps here would end up using it in future jobs, or existing ones if they have them.
@zxuiji 1 year ago ⁺²
10:25, you already typedef'd it, you don't need another typedef, rather that's just asking for compile time errors
@zxuiji 1 year ago
Considering how you use the allocations, you should've really just used calloc everywhere to avoid runtime issues
@thomaswillson1107 1 year ago ⁺¹
Hi, can you show us how you customized your VS Code (comparison operators like `!=`, etc.)? Thx for the video!
@strager_ 1 year ago ⁺¹
Those look like ligatures. You need a font with code-oriented ligatures, and you need an editor which supports ligatures. I don't know what font Sorber uses, but Fira Code is a popular font which has ligatures.
@jvp5000 3 months ago
@@strager_ thanks
@Ido-Levy 1 year ago ⁺¹
Hey, thank you for putting out these videos! I'm learning a lot from you :) Why aren't you checking for memory allocation failures?
@mytriumph 10 months ago ⁺¹
In my experience, it generally isn't necessary on modern computers. The odds that so much of your computer's total memory is being taken up by other processes that the program fails to allocate a comparatively small amount of memory are small enough on modern computers that you can reasonably rule it out. Now, is it good practice to check anyway? Yes, absolutely. But it ultimately doesn't end up making that big of a difference.
@Ido-Levy 1 year ago
Also, why are you using uint32_t instead of just int?
@soniablanche5672 10 months ago
uint32_t is always gonna be unsigned 32 bit, int size will depend on your machine
@_veikkomies 1 year ago
When you write "tmp = tmp->next" (e.g. lookup function), don't you have to define what "tmp->next" means? Where was that done?
@IBelieveInCode 1 year ago ⁺³
"next" is a field of the struct "entry". You probably missed that 🙂
@_veikkomies 1 year ago
@@IBelieveInCode Ahh yeah, probably. Thanks
@IBelieveInCode 1 year ago
Good Game 🙂
@IBelieveInCode 1 year ago
I've just written my own C "Hash Table" module. It's on my channel. Without sound. My English is bad when written, but it's worse when I try to speak.
@greg4367 1 year ago
Jacob, the Subscribe button on your web page is inop.
@anon_y_mousse 1 year ago ⁺⁴
It's not a bad start, but it would help if you made it slightly more generic. You could use _Generic and specialize on a few known types, or you could use a union and associate traits with whatever user defined data gets passed in. It would help to have the user pass in a hashing function and a comparison function and have flags to determine if data should be copied or merely pointed to.
@erbenton07 1 year ago
points-- for unnecessary use of feof
Jacob, long videos are fine
@randomscribblings 1 year ago
strdup() again
@randomscribblings 1 year ago
In delete you're leaking the key memory.
@JacobSorber 1 year ago
Did you watch the video? 😀 Yeah, I know. The example is unfinished. See you next week.
@randomscribblings 1 year ago ⁺¹
@@JacobSorber Yeah... the comment was made as I was watching.
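For readers following the leak discussed in this thread, here is a hedged sketch of what a delete that also releases the key might look like. The video's code is not reproduced on this page, so the `struct entry` layout (a heap-allocated `key`, an `object` pointer, a `next` link) and the name `chain_delete` are assumptions based only on what the comments mention.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed entry layout for a chained hash table bucket. */
struct entry {
    char *key;            /* assumed heap-allocated (malloc/strdup) */
    void *object;         /* owned by the caller, not freed here */
    struct entry *next;
};

/* Unlink `key` from one bucket chain. Returns the stored object,
   or NULL if the key is not in the chain. */
void *chain_delete(struct entry **head, const char *key)
{
    for (struct entry **link = head; *link != NULL; link = &(*link)->next) {
        struct entry *found = *link;
        if (strcmp(found->key, key) == 0) {
            void *object = found->object;
            *link = found->next;   /* splice the entry out of the chain */
            free(found->key);      /* the step the leak report is about */
            free(found);
            return object;
        }
    }
    return NULL;
}
```

Freeing the entry without freeing `found->key` first would lose the only pointer to the key's memory, which is the leak pointed out above.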
