Excellent video, I have no words to describe how well it's presented. One important aspect I find missing is how git shows delta changes in a file between two commits. I can imagine that it probably traverses the nodes of the tree and then computes the difference, but I have seen git calculate a diff in a large monorepo project in a fraction of a millisecond, which sounds like there might be an area of exploration here.
Thanks a lot for the comment.
Take the following with a grain of salt because I haven't validated if git does this, but since Git Trees are essentially Merkle Trees, this is probably how it works.
If you want to compare two commits, this is essentially the same as comparing two git trees / merkle trees. Each git tree node has an ID, which is computed from its contents (other git trees or blobs). The same git trees will have exactly the same IDs, so comparing them is very fast. The problem is when two git trees don't have the same contents: then we have to compare the contents of both trees (again, note that we only compare the IDs of sub-trees and files). I think this is actually very fast in practice because we never need to compare the actual contents of a file unless we know they are different in both trees. Eventually, we will identify the set of files whose content is different in each git tree, and we can run the diff tool only for these files.
Above is a bit of a braindump, but hopefully it makes sense. Let me know if it doesn't though, and I'll try to elaborate.
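To make the idea above a bit more concrete, here is a minimal Python sketch of a Merkle-style tree comparison. It is a toy model and not git's real object format: a tree is just a dict mapping a name to either bytes (a file) or another dict (a sub-directory), and the names `object_id` and `changed_paths` are made up for this example. Real git stores the IDs, whereas this sketch recomputes them on every call.

```python
import hashlib

# Toy model (an assumption, not git's real format): a tree is a dict mapping
# a name to either bytes (a file) or another dict (a sub-directory).

def object_id(node):
    """Content-derived ID: equal content always gives an equal ID."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest()
    entries = "".join(
        f"{name}\0{object_id(child)}\n" for name, child in sorted(node.items())
    )
    return hashlib.sha1(b"tree\0" + entries.encode()).hexdigest()

def changed_paths(old, new, prefix=""):
    """Return the paths that differ between two trees."""
    if object_id(old) == object_id(new):
        return []  # identical IDs: the whole subtree can be skipped
    changed = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        path = prefix + name
        if isinstance(a, dict) and isinstance(b, dict):
            changed += changed_paths(a, b, path + "/")
        elif isinstance(a, bytes) and isinstance(b, bytes):
            if object_id(a) != object_id(b):
                changed.append(path)
        else:
            changed.append(path)  # added, removed, or file/directory swap
    return changed

old = {
    "README.md": b"hi\n",
    "docs": {"guide.md": b"how to\n"},
    "src": {"a.py": b"x = 1\n", "b.py": b"y = 2\n"},
}
new = {
    "README.md": b"hi\n",
    "docs": {"guide.md": b"how to\n"},
    "src": {"a.py": b"x = 100\n", "b.py": b"y = 2\n"},
}
print(changed_paths(old, new))  # ['src/a.py']
```

Only `src/a.py` would then be handed to the line-level diff tool; `docs` is skipped as a whole because its ID is unchanged.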
Thank you, this was really helpful! Amazing to learn that one of my most used tools is based on foundational CS concepts like key-value stores and trees!
Thanks! At 6:05, read "previous: initial-prototype" instead of "previous: initial-project" for the 2nd entry. Question: how does it avoid SHA-1 collisions, even though the probability is low? It looks like it would have to check the whole database each time, then invent something to change the digest in case of collision.
Good catch! Yes, it should say "previous: initial-prototype". I haven't experimented with hash collisions myself, but I think that git won't do anything about collisions locally. This answer might be helpful: stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob
Very nice and simple explanation of the internals. It would be cool if you could produce a follow up video where you go even more in depth. For example you could explain what happens with the trees and objects when a user command is executed. And I guess there are many more things to explain. Also things like 7:24, you say there is no metadata but where is that stored?
Thank you. Yes, I will likely produce a follow up video on git, but maybe with more focus on network synchronisations.
7:24 gets me to recap so I couldn’t find what you are referring to, but my guess is that you mean metadata about blobs such as filename and metadata about commits? I have mentioned both in the video:
- filenames are stored in tree objects
- messages, author, etc are stored in commit objects
Is this what you mean or did I misunderstand the comment?
Yes that was my question, I tried to go back and forth again but could not find where you mentioned these two details 😢, can you provide the timestamps?
Sure. I speak about filenames being stored in git trees at 4:20 ua-cam.com/video/RxHJdapz2p0/v-deo.html Specifically, the sentence saying "tree solves the problem of not having a filename associated with a blob". There's also a tree visualization that shows filenames associated with blob object IDs. I talk about commits at 5:40 ua-cam.com/video/RxHJdapz2p0/v-deo.htmlsi=pYAahuiGe71jjSB_&t=338 I say that we store information about a single change into commits. Does this answer your question or is it maybe unclear in some way?
Okay thx but what about additional meta data of the filesystem like the bits for read write execute? I liked the summary you gave and I think it would have been perfect to add these details. Thx again, looking forward to more videos.
10:10 maybe this could do with some example commit messages, and example code? It might be easier to follow along if it's presented in a real world scenario
So git merge combines the latest commits in two branches, but rebase starts by combining the first two diverging commits after the common branch, and from there it takes a sequential path
Thanks. I might create a video that shows how git diff for a single file works. Thanks for the suggestion. I don't know if git does anything special to track renamed files. The renamed file will have the same content (unless the content was changed as well), so its ID will remain the same. However, the tree object containing the list of files will change.
At 8:37, the animation looks like a rebase. Shouldn't it be a new commit which contains all the changes? Thanks for the clear explanation though, great vid!
Nikola, in the Object Database, the 'Key' column contains SHA1 keys, so far so good. What does the 'Object' column contain? The actual object contents, as shown in 1:50? Or merely the paths to object files? I ask because at 2:37 you did say, "The contents of each object are stored in a file." That sounds like the contents are not themselves in the database. Is perhaps the Object Database composed of just one column and not two as shown? Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual.
Hi Dario! I think the confusion comes from the fact that the object database is not a table. I use a table-like view to visualize the properties of the database, but in practice, the object database is just a bunch of files. The key is the filename (left column) while the object (right column) is the contents of the file.
To answer your questions:
> What does the 'Object' column contain? The actual object contents, as shown in 1:50?
The "Object" column contains the actual object contents.
> Is perhaps the Object Database composed of just one column and not two as shown?
I tried to explain above, but let me know if it still doesn't make sense.
> Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual.
Correct, graphics are only conceptual.
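As a small illustration of "the key is the filename", here is a Python sketch of how a blob's key can be computed and where the corresponding loose object file would live. The function names `blob_id` and `loose_object_path` are made up for this example, and it only covers loose objects with SHA-1 IDs (objects can also end up inside packfiles, which this sketch ignores).

```python
import hashlib

def blob_id(content: bytes) -> str:
    """SHA-1 over a small header plus the content, like `git hash-object`."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(object_id: str) -> str:
    """First two hex characters name the sub-directory, the rest the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"

oid = blob_id(b"hello\n")
print(oid)                     # ce013625030ba8dba906f756967f9e9ca394464a
print(loose_object_path(oid))  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

The first ID should match what `echo hello | git hash-object --stdin` prints.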
Well done, excellent video! It might also have been cool if you had mentioned how git finds the differences between 2 branches when merging. I haven't dug into git specifically, but it makes sense to me that a Merkle tree is used to find which files differ, and LCS for the differences between the files?
Thanks! :-) I agree that it would have been good, but I had to cut it off somewhere. Maybe I'll make a new video about git at some point. Yes, with the Merkle tree we can find the files that differ very quickly. The diff between two files is based on LCS (line by line, Myers algorithm), although I think there are a few algorithms and I believe this is configurable in git.
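For readers who want to see what an LCS-based line diff looks like, here is a minimal Python sketch. To be clear, this is the textbook dynamic-programming LCS, not the Myers algorithm and not git's implementation; the function names `lcs_table` and `line_diff` are made up for this example.

```python
def lcs_table(a, b):
    """table[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table

def line_diff(old_text, new_text):
    """Return diff lines prefixed with '  ' (kept), '- ' (removed), '+ ' (added)."""
    a, b = old_text.splitlines(), new_text.splitlines()
    table = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append("  " + a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append("- " + a[i])  # dropping the old line keeps the LCS at least as long
            i += 1
        else:
            out.append("+ " + b[j])
            j += 1
    out += ["- " + line for line in a[i:]]  # leftover old lines were removed
    out += ["+ " + line for line in b[j:]]  # leftover new lines were added
    return out

print("\n".join(line_diff("a\nb\nc\n", "a\nx\nc\n")))
```

For the two three-line inputs above, the lines `a` and `c` are kept, `b` is reported as removed and `x` as added.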
This is a great video. It would be 1000x more useful if it illustrated each concept with corresponding git command - simple, e.g. commit, to advanced, e.g. rebase. Perhaps someone here could point me to a video where git concepts are matched to git commands, please?
So how are commits actually stored on the filesystem then? Is it a text file stored as an object with a similar format to http headers? What about branches? Are they symlinks to a "commit" object?
Commits are stored in the object database, same as blobs. The files look binary on disk because they are zlib-compressed, but as far as I know the decompressed commit is plain text: a few header lines (tree, parent, author, committer), a blank line, and then the message, so the layout does resemble HTTP headers a little. Note that the object database is just a bunch of files in git (in the .git/objects directory). You can find out more here if you're curious: github.com/git/git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415/commit.h#L26. You may need to trace how the commit struct is constructed to find out the serialization format. Branches are saved as files under .git/refs/heads/. These are not symlinks. The file for the corresponding branch contains the commit ID it points to.
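Here is a rough Python sketch of how a commit ID can be derived, assuming the usual layout of a decompressed commit object (header lines, a blank line, then the message). The function name `commit_id`, the author string, and the timestamps are made up for illustration; the tree ID used is git's well-known empty-tree ID. Because the author and committer lines carry timestamps, two otherwise identical commits made at different times get different IDs.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree ID

def commit_id(tree, parents, author, committer, message):
    """Hash a commit payload: header lines, blank line, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()

# Hypothetical identity; the two numbers are a Unix timestamp and a timezone.
who_then = "Jane Doe <jane@example.com> 1700000000 +0000"
who_later = "Jane Doe <jane@example.com> 1700000060 +0000"

first = commit_id(EMPTY_TREE, [], who_then, who_then, "Initial commit\n")
second = commit_id(EMPTY_TREE, [], who_later, who_later, "Initial commit\n")
print(first != second)  # True: same tree, message and author, different time
```

A branch file such as `.git/refs/heads/main` would then simply contain one of these 40-character IDs followed by a newline.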
I have a question. Even though git creates subdirectories to store the files, if it needs to create more subdirectories for a large project it still cannot exceed the folder limit which is 65K , right?
Hi Rafsan, FAT32 uses one index per folder which can reference around 65k files or folders (the actual number is more subtle because it depends on the length of the file/folder names, but this is a good approximation). Each folder has its own index, so you could store ~65k files/folders per folder. This means that you could have a much bigger number of folders in total by organizing them into subfolders, in a tree-like structure. You are right that there is an overall hard limit for the number of files/folders on FAT32 file system, but this number is much bigger than 65k, I think it's around ~250 million files.
If you have a large file (say several megabytes or even gigabytes) and you change one bit in the file, does it save the whole new file in the blob store or does git have any clever tricks to only store the changes?
I think git does something to improve the storage of blobs, but it's not smart enough to realise that 99% of the blob is the same. My guess is that this is probably optimized for text files rather than random binaries. I don't know if Git LFS does something though, but I'd expect it to have some kind of optimizations that work better for binaries.
With that said, I don't know exactly what Git does, but I've tested the above as follows:
- Create an empty git project (its size is negligible)
- Create a 1GB file with random data: `dd if=/dev/urandom of=sample.txt bs=1G count=1`
- Commit changes. The repo size is ~2GB: ~1GB for the blob and ~1GB for the working directory (see [1]).
- Change 1 byte in the sample.txt file.
- Commit the change. The repo size is ~2.58GB (see [2] for inspecting the blob).
- One more time. The repo size is 3.11GB.
[1]
~/test (main) $ git ls-tree 60914ef
100644 blob 1d2579e731b4de097bda567f86bcf70d4c9fb4c6 sample.txt
~/test (main) $ ll -h ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
-r--r--r-- 1 Nikola 197121 1.1G Oct 2 23:27 ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
[2]
~/test (main) $ ll -h .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
-r--r--r-- 1 Nikola 197121 543M Oct 2 23:21 .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
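A scaled-down version of this experiment can be sketched in Python without touching a real repository. It assumes that a loose object is the zlib-compressed "header plus content" payload; the helper name `loose_object` is made up, and 1 MB of random data stands in for the 1GB file. It does not model packfiles, which as far as I know can store objects as deltas against each other after `git gc` or `git repack`.

```python
import hashlib
import os
import zlib

def loose_object(content: bytes):
    """Return (object ID, compressed bytes) for a blob, loose-object style."""
    store = b"blob %d\0" % len(content) + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)

original = os.urandom(1_000_000)                      # random data, like /dev/urandom
modified = bytes([original[0] ^ 1]) + original[1:]    # flip a single bit

id_a, packed_a = loose_object(original)
id_b, packed_b = loose_object(modified)

print(id_a != id_b)                   # True: a new key, so a whole new object
print(len(packed_a), len(packed_b))   # both close to 1,000,000 bytes
```

Random data barely compresses, so each one-bit change would cost roughly a full copy as a loose object, which is in line with the repo growing on every commit in the test above.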
good videos, but one question regarding rebase. Should the feature branch be updated, not the main one? It seems like you are rebasing feature branch on top of main, but moving main instead of feature
Thank you for the comment. That’s a good point. In practice, the feature branch would be updated first (using git rebase command), then the main branch would be moved forward (e.g. using git merge command). My focus was on the mechanics/idea of the rebase, and I forgot to include that step. Hopefully the explanation still makes sense.
I think there is one error in the rebase part. When you do `git checkout feature && git rebase main`, after the rebase we have:
- main branch points to d
- e becomes e' and its father is d
- f becomes f' and its father is e'
- feature branch no longer points to f, it points to f'
No branch points to e and f any more. e and f are not lost: you can find them when you do `git reflog`.
Yes, thanks. That was a mistake in the animation that has been pointed out in a few comments so far. You're right: the main branch will point to d and the feature branch will point to f' after `git rebase main`. Then we have a final step to merge the main and the feature branches with `git merge`, which simply applies a fast-forward merge and updates main to point to f' as well.
Regarding:
- e becomes e' and its father is d
- f becomes f' and its father is e'
e and f will remain unchanged, so I don't think it's accurate to say that e becomes e'. Do you mean e' will correspond to e (and f' to f)?
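The branch movements discussed here can be sketched with a toy model in Python. Everything in it is an assumption made for illustration: commits are snapshots of whole files, `diff`, `cherry_pick` and `rebase` are made-up names, and conflicts are only detected per file rather than with git's real three-way merge. The point is that each commit is replayed using the diff against its own immediate parent, oldest first, and that the original commits are left untouched.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit:
    name: str
    files: dict                      # full snapshot: filename -> content
    parent: Optional["Commit"] = None

def diff(parent, commit):
    """Files that changed between parent and commit (None means deleted)."""
    names = set(parent.files) | set(commit.files)
    return {n: commit.files.get(n) for n in names
            if parent.files.get(n) != commit.files.get(n)}

def cherry_pick(commit, onto):
    """Replay the diff between commit and its own parent on top of `onto`."""
    files = dict(onto.files)
    for name, content in diff(commit.parent, commit).items():
        current, base = files.get(name), commit.parent.files.get(name)
        if current != base and current != content:
            raise RuntimeError(f"conflict in {name}")  # both sides changed it
        if content is None:
            files.pop(name, None)
        else:
            files[name] = content
    return Commit(commit.name + "'", files, onto)

def rebase(branch_tip, base, onto):
    """Cherry-pick every commit after `base` up to `branch_tip`, oldest first."""
    todo, c = [], branch_tip
    while c is not base:
        todo.append(c)
        c = c.parent
    new_tip = onto
    for commit in reversed(todo):
        new_tip = cherry_pick(commit, new_tip)
    return new_tip

C = Commit("C", {"app.py": "v1"})
D = Commit("D", {"app.py": "v1", "docs.md": "d"}, C)
E = Commit("E", {"app.py": "v1", "feature.py": "e"}, C)
F = Commit("F", {"app.py": "v1", "feature.py": "f"}, E)
branches = {"main": D, "feature": F}

# git checkout feature && git rebase main
branches["feature"] = rebase(branches["feature"], base=C, onto=branches["main"])
print(branches["main"].name, branches["feature"].name)   # D F'

# git checkout main && git merge feature  (fast-forward: just move the pointer)
branches["main"] = branches["feature"]
print(branches["main"].name)                              # F'
```

After the rebase, `F'` has parent `E'`, which has parent `D`, while `E` and `F` still exist unchanged with their old parents; in real git they would remain reachable through the reflog.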
QUESTION: I tried it myself and found that a commit is not a full snapshot but only the changes it knows about. Create the master branch with file_1 (1st commit). Now check out the "feat" branch. 2nd commit: file_2 created. 3rd commit: file_3 created. 4th commit: file_4 created. 5th commit: file_5 created. 6th commit: file_6 created. Now `git switch master`, then git cherry-pick "7th commit hash". And boom: in master you see only file_1 and then file_7. Not a full snapshot. (I mean, even though the 7th is the latest commit in feat, it doesn't have the full snapshot, just the changes it remembers from that time.) That's why file_2 to file_6 don't show up in master: because we used cherry-pick and not merge.
Hi, thanks for the question. Your experiment produces the expected results, i.e. cherry-picking a single commit will apply the *diff* between that commit and its parent commit, not the whole commit. This is why you’re seeing this behaviour even though commits are full snapshots. I have explained this in the video - have you seen the part that explains how cherry-picking works?
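The experiment can be mimicked with plain dictionaries in Python. This is only a sketch: snapshots are dicts of filename to content, `diff` and `apply` are made-up helpers, and it assumes the feat branch ends with a commit that adds file_7 (the one being cherry-picked).

```python
def diff(parent, commit):
    """Files that changed between two snapshots (None means deleted)."""
    names = set(parent) | set(commit)
    return {n: commit.get(n) for n in names if parent.get(n) != commit.get(n)}

def apply(snapshot, changes):
    """Return a new snapshot with the changes applied."""
    result = dict(snapshot)
    for name, content in changes.items():
        if content is None:
            result.pop(name, None)
        else:
            result[name] = content
    return result

master = {"file_1": "1"}

# Each commit on feat is a full snapshot: the previous one plus one new file.
feat = [master]
for n in range(2, 8):
    feat.append({**feat[-1], f"file_{n}": str(n)})

tip, tip_parent = feat[-1], feat[-2]
print(sorted(tip))      # the tip snapshot really contains file_1 .. file_7

# Cherry-pick: only the diff between the tip and its parent is applied.
picked = apply(master, diff(tip_parent, tip))
print(sorted(picked))   # ['file_1', 'file_7']
```

So the tip commit does hold all seven files, yet master only gains file_7, because the cherry-pick transfers the difference to the parent and not the snapshot itself.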
@@TechWithNikola yes yes. Today I tried very hard and I found that a commit is a full snapshot. You are right about cherry-pick. Even when you cherry-pick a merge commit directly, which has 2 parents, you have to specify with -m which parent you want to take the diff from. NOTE: I created file_1 in master with the first commit. file_2 - 2nd commit, file_3 - 3rd commit, file_4 - 4th commit, file_5 - 5th commit. Then in the 6th commit I changed some content in file_1. With `git diff HEAD HEAD~1` I checked the difference between the latest commit and the one before it. Even though the 5th commit only added file_5, it showed that the content of file_1 changed. That means the commit is a full snapshot: in the 5th commit (full snapshot) file_1 was as it was, and in the 6th commit I changed the content of file_1. So the diff shows everything that changed between the full snapshot at the 5th commit and the full snapshot at the 6th commit, even though I didn't touch file_1 in my 5th commit (only file_5 was added and the rest of the snapshot stayed as it was). Wow man you are such a genius.
The terminology and the exact action depend on whether you're using Git, Mercurial, SVN, or maybe Microsoft TFS. Git doesn't have a check-in command, so it doesn't make much sense to think about the difference between commit and check-in in Git. Commit pushes changes to the local repository in Git, and you have to use the `git push` command to send local changes to the server. In SVN, I think the commit command sends changes to the server (unlike git or mercurial, which store the changes in the local repo), but I'm not 100% sure. In TFS, the `checkin` command[1] sends changes to the server. [1] learn.microsoft.com/en-us/azure/devops/repos/tfvc/checkin-command?view=azure-devops
@TechWithNikola There's a very strong but very short reverberation in your audio recording. My guess is that the audio is being recorded in a room about the size of a closet. It was jarring to listen to when I first played it so I ended up skipping the video.
Thank you for the feedback. I don't hear it myself but it may be dependent on the audio device - I will look more into this. FWIW, the room is not the size of a closet, it's ~15 sqm, so I doubt that's the issue.
YouTube recommendations brought me here, and I certainly am not disappointed. Thanks for an informative yet concise explanation!
I’m glad you’ve found it useful. If you have any suggestions for improvements please let me know.
I'd just like to say thanks for making this video so visually simple. That includes thanks for not including a load of stock footage of different groups of youngish people inspecting laptop screens together, and extra bonus thanks for not cutting out to 1.5 second clips from Marvel MCU movies every 15 seconds. It's actually refreshing to see a YouTube video that's directly, consistently focused on communicating knowledge about its topic.
Hi, thank you very much for leaving such a comment. I’m glad to hear that you appreciate the focus on the important stuff. :-)
Apologies for the late reply.
DUUUUDE, I was not expecting to understand git in one video. There are still the specific commands to learn but this video created a solid start for me. Thank you!
Thanks a lot dude! Yeah, it’s a journey, but understanding internals is a very good start. It’s not strictly necessary but I personally like knowing how tools that I use work under the hood.
It’s always great to gain a deeper understanding of something I already use daily. Thank you for this! ❤
You’re welcome. I'm very happy that people find it useful! ❤️
Glad I came across this video of yours, amazing quality in the narration. Keep doing more. Thanks for your efforts ❤👏
Thank you so much for your kind words and support. I'm thrilled that you've enjoyed it. Your encouragement means a lot to me! ❤
It is super nice to have a short video to dive deep into the underlying principle of git. especially, the objects and how merge and rebase work. Thanks a lot for the video!
You're welcome. I'm glad you've liked it :)
One of the best videos about Git I've ever seen, great job
Thanks. This means a lot to me!
I've been using git for 15 years and I learned a lot. Thanks!
You're welcome. Great to hear that!
One of the best videos in explaining git imo! Explains everything extremely clearly!!! No confusing language, and no mixing of different terminology. The only thing that isn't crystal clear is the very last example of "cherry picking" when rebasing the feature branch onto the main branch. It isn't very clear from this example between which commits the diffs are taken. In the example you want to rebase commit F onto commit D by applying the diff between F and its parent commit E (which also happens to be the 1st commit of the feature branch since moving off from main) onto commit D along with the diff between D and E. But you never mention what would happen if there were more commits in the feature branch between E and F. Is the rebase operation going to take the diff between the last commit of feature and the 1st commit of feature, or the immediate parent of the feature? In your example those two possibilities are the same since there are no commits between the 1st and last commit of the feature branch.
Anyway, that's the only thing I could find that didn't make sense. Maybe I missed something and just need to watch it again.
A great, concise and well animated explanation. Great job!
PS: Guess you've been blessed by the youtube algorithm, and deservedly so. I'll watch your other videos too.
Thank you!
This is the best video for a comprehensive understanding of how git works internally.
Thank you so much…
Thank you!
This is both a simple introduction and an excellent explanation of how Git works.
You’re welcome! Glad to hear you’ve enjoyed it.
This video made clear to me concepts many others have tried and failed to help me understand. Really well done thanks man!
Thanks, I'm so glad that you've found it helpful!
Thank you! I've been using git for over a decade and you've finally made it clear!
If I ran the internet, this video would be shown at the top of any "how to" search about git.
This video is amazing, thank you. Hope you get the recognition you deserve!
Thank you very much for the kind words!
beautifully explained, thanks Nikola!
After years of using Git, I finally understand rebase =) However, I personally prefer merge but this is probably due to how I use branches. Thanks for great video!
You’re welcome. I’m glad you liked it!
Rebase is useful for tidying up private branches before making things public. Once a branch is public, rebasing is going to annoy people who try to copy your branches.
@@lawrencedoliveiro9104 Indeed, projects I have contributed to usually want things rebased to the latest commit on the main public branch. They also prefer squashed commits, so your changes over multiple commits on your private branch appear as a single commit for merging to the main branch.
I also like it this way, because it means I can mess around to my heart's content in my private branch until I have something I am ready to show the world, and any embarrassing mistakes in the commit history of my private branch don't need to be part of the public history.
not many can explain things this beautifully...more content please
I have seen 2 videos from your channel and it's top-notch content❤
you are a fantastic teacher sir. Don't know how i found you, but i love you
this is true gold. best explanation i have ever seen yet for GIT.
Thank you :)
I never understood GIT or any version trackers before. I do now thanks to your awesome video. Wonderful and clear explanation. Thanks
Thanks a lot :) Glad you've liked it!
I've never used Git, but now I feel like I would have a much easier time learning it! Great video, and thanks for making it!
Thank you for the comment. That’s great to hear. I have no doubt that you’ll end up using Git at some point in your career, and I’m glad to hear that you’ve found this video helpful.
Super nice video. Thanks for your efforts both here and on the channel more widely!
Thank you for taking the time to comment Zebedee. I'm very happy to hear that you like my videos.
Thank, Pro. The best explanation I've ever seen. 1000 likes!!!
Very good, please keep making videos. Very clear explanation. I don't know where I picked up this habit but, as a matter of practice, while in my feature branch I rebase with main, pulling in everything new from main into the feature branch. Run tests to ensure my feature changes work with the current main state. Then if all good, I go into main and merge in the feature branch, which is always straightforward because it already has the updates from main. It seems kind of pointless after watching your video.
Thanks :)
Actually, if I understand your approach correctly then it sounds like that is indeed the correct way to use rebase. In the video, I omitted the step where feature branch is rebased on the main branch. I think that has to happen and we can then use merge with fast-forward to update the main branch. I didn't omit this intentionally, it was an oversight.
Great video! Concise and simple explanation with enough details to understand what's going on under the hood.
Thanks!
I love your work so much. It is very informative and concise. It was a pleasure
Thank you so much! :)
This channel is gold 🥇. Hope you keep this quality and make more video. I learn a lot from you. Thank you.
Thank you, and you’re welcome. I’m very happy to hear that you like my videos. I’ll do my best to keep and improve the quality.
Great explanation!!! Subscribed!
6:25 The trees on lines 3, 9, and 15 are rooted at the repo root, right? Regardless of where in the directory structure the change(s) were made.
Finally, a clear explanation! Thanks.
Glad it helped!
The explanation provided was excellent. I truly enjoyed the video!
Thank you Mina. I'm glad you've enjoyed it!
This was so interesting and informative. Thank you so much!
I like to call the common commit the Base (because that's what it is)
Rebase immediately makes perfect sense at that point - you are taking a list of commits from one base to a new base; you are re-basing
That makes sense, thanks.
Very well made video, loved the quality ❤️. Keep it up man 💯.
Thanks a lot! ❤️
finally found a video which made me understand the whole thing. Thank you!
You're welcome! I'm glad you've liked it.
Wow, the way you teach will change the understanding of what's under the hood for deep learners. This is worth a great like
Thanks❤🎉
Thank you. I’m really glad you think so ❤️
YES I'VE BEEN WAITING FOR AN ANIMATED VIDEO THAT COVERS THE .git FOLDER! THANKS SO MUCH!
You're welcome! I'm glad you've liked it.
Great video! Finally I feel like I am ready to rebase something... 🥳
Thank you! Happy rebasing 😀
That’s an amazing video. I never understood git this way before
Thanks. Glad you liked it!
Thank you Nikola. Like others, I too came from a YouTube recommendation. Excellent explanation. Subscribed to your channel. Nice work.
Thanks a lot Manick. Hope you’ll enjoy my future videos too!
Very nice explanation of git! A real example of merging with some code and the command prompt would have been a nice addition for me, but I will try it by myself.
Thanks. That’s a great suggestion.
wow ya. youtube recommended me but i stayed for the whole video! cheers!
I’m glad you’ve enjoyed it. Cheers!
THX, Nikola - you earned yourself a new subscriber.
I remember struggling keeping "some progress" available during "code rollback" back in the late 1980s.
That was NOT an easy task when you had a 8.3 filename restriction on every file.
And then jumping to Unix, every Unix its own flavour of oddities.
Thanks a lot Michael!
That sounds tough. Fortunately, I didn't have to deal with such problems :) I've only used SVN prior to Git, which wasn't great but acceptable.
6:54 If 2 commits make the same change to a file and no other changes and had the same commit message and author, do they get the same SHA1? Or in addition to change(s), is SHA1 based on timestamp and other factors so that even in this case the two commits get different SHA1s.
I'm very surprised to see 'only' 700 subscribers. Keep it up!
Thanks a lot!
this is one of the best videos explaining git
the merge strategies are also really greatly depicted here
thank you for your work🎉
Thanks a lot for the kind words. I’m very happy to hear that you’ve liked it.
Very good explanation of git internals!
This is what i need. Thanks for the great video
You’re welcome! 🙂
This was incredibly valuable, thank you!
You’re welcome!
Thank you for this, it was very insightful. Nice to know how things work.
Glad it was helpful!
Thank you so much for this amazing video. Keep up the good work.
Thanks a lot!
In the rebase example, it's not main which becomes F', but feature. And after rebase, feature has to be merged to main again.
Yes, that’s correct. main eventually points to F’ as well, but only after feature does (via a fast-forward merge). I made a mistake in the animation because I was focused on the desired outcome.
Apologies if you’ve already addressed this in the video, but when rebasing at 12:00, does it warn you if the changes from commit E and commit F will overwrite the changes made from commit C to commit D (main), since the feature branch isn’t aware of the changes in main from C onwards.
Hi, yes it does. It would result in a merge conflict. If you watch the part on how cherry picking is implemented you’ll see that it uses the same algorithm as merging, which ensures that changes are not lost.
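The idea that replaying a commit uses a three-way merge can be sketched roughly like this. It is a simplification under stated assumptions: snapshots are plain dicts of path to content, and each file is merged as a whole, whereas real Git also merges within a file, hunk by hunk. `MergeConflict` and `three_way_merge` are names invented for the example.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same file in different ways."""


def three_way_merge(base: dict, ours: dict, theirs: dict) -> dict:
    """Merge two snapshots relative to a common base, one whole file at a time."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither touched the file)
            result = o
        elif o == b:      # only "theirs" changed it
            result = t
        elif t == b:      # only "ours" changed it
            result = o
        else:             # both changed it differently
            raise MergeConflict(path)
        if result is not None:  # None means the file does not exist
            merged[path] = result
    return merged


# Replaying E (whose parent is C) on top of D: base=C, ours=D, theirs=E.
C = {"app.py": "v1"}
D = {"app.py": "v2"}                       # main changed app.py
E = {"app.py": "v1", "feature.py": "new"}  # feature added a file
print(three_way_merge(C, D, E))

E_clashing = {"app.py": "v3"}              # feature also changed app.py
try:
    three_way_merge(C, D, E_clashing)
except MergeConflict as conflict:
    print("conflict in", conflict)
```

The first call keeps main's change to `app.py` and adds `feature.py`; the second one stops with a conflict instead of silently overwriting main's change.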
Thank you for the explanation.
Please make more videos like this.
You’re welcome. 🙂
This is high quality content. Thanks!
Thank you!
Very informative. Thanks!
Excellent video, I have no words for how well it's presented. One important aspect I find missing is how git shows the delta between two commits for a file. I can imagine that it probably traverses the nodes of the trees and then computes the difference, but I have seen git calculate a diff in a large monorepo in a fraction of a millisecond, which suggests there might be an area to explore here.
Thanks a lot for the comment.
Take the following with a grain of salt because I haven't validated if git does this, but since Git Trees are essentially Merkle Trees, this is probably how it works.
If you want to compare two commits, this is essentially the same as comparing two git trees / merkle trees. Each git tree node has an ID, which is computed from its contents (other git trees or blobs). The same git trees will have exactly the same IDs, so comparing them is very fast. The problem is when two git trees don't have the same contents, then we have to compare the contents of both trees (again, note that we only compare the IDs of sub-trees and files). I think this is actually very fast in practice because we never need to compare the actual contents of a file unless we know they are different in both trees. Eventually, we will identify the set of files whose content is different in each git tree, and we can run the diff tool only for these files.
Above is a bit of a braindump, but hopefully it makes sense. Let me know if it doesn't though, and I'll try to elaborate.
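The tree comparison described above can be sketched as follows. This is a toy model, not Git's code: IDs are SHA-1 hashes over a simplified serialization (not Git's real tree format), `make_blob`, `make_tree`, and `changed_paths` are made-up helper names, and the case where a file is replaced by a directory is ignored.

```python
import hashlib


def make_blob(content: str) -> dict:
    """A file node: its ID depends only on its content."""
    oid = hashlib.sha1(("blob:" + content).encode()).hexdigest()
    return {"id": oid, "entries": None}


def make_tree(entries: dict) -> dict:
    """A directory node: its ID depends on the names and IDs of its children."""
    listing = "".join(f"{name}:{node['id']}\n" for name, node in sorted(entries.items()))
    oid = hashlib.sha1(("tree:" + listing).encode()).hexdigest()
    return {"id": oid, "entries": entries}


def changed_paths(old, new, path=""):
    """Return paths of files that differ, skipping subtrees whose IDs match."""
    if old and new and old["id"] == new["id"]:
        return []                      # identical file or subtree: nothing to do
    old_entries = (old or {}).get("entries")
    new_entries = (new or {}).get("entries")
    if old_entries is None and new_entries is None:
        return [path]                  # a file was added, removed, or modified
    old_entries, new_entries = old_entries or {}, new_entries or {}
    result = []
    for name in sorted(set(old_entries) | set(new_entries)):
        child = f"{path}/{name}" if path else name
        result += changed_paths(old_entries.get(name), new_entries.get(name), child)
    return result


docs = make_tree({"guide.md": make_blob("docs")})
v1 = make_tree({"README": make_blob("hi"), "docs": docs,
                "src": make_tree({"a.py": make_blob("1"), "b.py": make_blob("2")})})
v2 = make_tree({"README": make_blob("hi"), "docs": docs, "new.txt": make_blob("n"),
                "src": make_tree({"a.py": make_blob("1"), "b.py": make_blob("3")})})
print(changed_paths(v1, v2))
```

Only `src` is descended into; `docs` and `README` are skipped after a single ID comparison, which is why the cost depends on how much changed rather than on the size of the repository.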
Thank you, this was really helpful! Amazing to learn that one of my most used tools is based on foundational CS concepts like key-value stores and trees!
You’re welcome. I agree. Git is a good example for why data structures are important.
Thanks! At 6:05, read "previous: initial-prototype" instead of "previous: initial-project" for the 2nd entry. Question: how does it avoid SHA-1 collisions, even though the probability is low? It looks like it would have to check the whole database each time, then invent something to change the digest in case of collision.
Good catch! Yes, it should say "previous: initial-prototype"
I haven't experimented with hash collisions myself, but I think that git won't do anything about collisions locally.
This answer might be helpful: stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob
I discovered a great channel.
I'm not so familiar with git and this helped me a lot.
So to use git better I use single small files.
Really glad it helped!
Very nice and simple explanation of the internals. It would be cool if you could produce a follow up video where you go even more in depth. For example you could explain what happens with the trees and objects when a user command is executed. And I guess there are many more things to explain. Also things like 7:24, you say there is no metadata but where is that stored?
Thank you. Yes, I will likely produce a follow up video on git, but maybe with more focus on network synchronisations.
7:24 gets me to recap so I couldn’t find what you are referring to, but my guess is that you mean metadata about blobs such as filename and metadata about commits? I have mentioned both in the video:
- filenames are stored in tree objects
- messages, author, etc are stored in commit objects
Is this what you mean or did I misunderstand the comment?
Yes that was my question, I tried to go back and forth again but could not find where you mentioned these two details 😢, can you provide the timestamps?
Sure. I speak about filenames being stored in git trees at 4:20 ua-cam.com/video/RxHJdapz2p0/v-deo.html
Specifically, the sentence saying "tree solves the problem of not having a filename associated with a blob". There's also a tree visualization that shows filenames associated with blob object IDs.
I talk about commits at 5:40 ua-cam.com/video/RxHJdapz2p0/v-deo.htmlsi=pYAahuiGe71jjSB_&t=338
I say that we store information about a single change into commits. Does this answer your question or is it maybe unclear in some way?
Okay thx but what about additional meta data of the filesystem like the bits for read write execute? I liked the summary you gave and I think it would have been perfect to add these details. Thx again, looking forward to more videos.
Nice work Nikola!
Thank you!
absolutely amazing video on git!
Glad you liked it! :)
Wow this great explanation ❤. I suppose one of the tradeoffs of git is that there can be only 1 history.
Very clean, I did understand something. Thank you.
Glad you've found it useful!
10:10 maybe this could do with some example commit messages, and example code? it might be easier to follow along if its presented in a real world scenario
Thanks for the suggestion. Yeah, that may have been better.
So git merge combines the latest commits in two branches, but rebase starts by combining the first two diverging commits after the common branch, and from there it takes a sequential path
Yes, that sounds right to me.
Minor correction: "after the common branch" -> "after the common commit"
Ngl !! It went all over my head, maybe cause English isnt my first language but great work 👍
Recommendations brought me here too. I like the use of visuals and your details! Keep it up!
Thanks a lot! I’ll do my best to make sure future videos are even better.
Nice video! I would find it interesting to learn how git tracks renamed files and finds out which lines are changed
Thanks. I might create a video that shows how git diff for a single file works. Thanks for the suggestion.
I don't know if git does anything special to track renamed files. The renamed file will have the same content (unless the content was changed as well), so its ID will remain the same. However, the tree object containing the list of files will change.
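To illustrate the point about IDs: a blob ID is the SHA-1 of a small header plus the file content, so the filename plays no part in it. The sketch below computes blob IDs that way and then pairs up a removed path with an added path that has the same blob ID. `detect_exact_renames` is a made-up helper; it only covers the unchanged-content case and is not a description of how Git itself reports renames.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 over 'blob <size>\\0' followed by the content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def detect_exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair removed and added paths that point at the same blob ID."""
    removed = {p: oid for p, oid in old_tree.items() if p not in new_tree}
    added = {p: oid for p, oid in new_tree.items() if p not in old_tree}
    renames = []
    for old_path, oid in sorted(removed.items()):
        for new_path, new_oid in sorted(added.items()):
            if oid == new_oid:
                renames.append((old_path, new_path))
                del added[new_path]  # each added path is matched at most once
                break
    return renames


content = b"hello\n"
# Trees map filenames to blob IDs; the rename changes the tree, not the blob.
before = {"notes.txt": blob_id(content), "main.py": blob_id(b"print('hi')\n")}
after = {"README.txt": blob_id(content), "main.py": blob_id(b"print('hi')\n")}

print(blob_id(content))
print(detect_exact_renames(before, after))
```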
You explained how git works wow!!! 🌟
Thank you for the comment. I'm glad it was helpful! :)
At 8:37, the animation looks like a rebase, shouldn't it be a new commit which contains all the changes, thanks for the clear explanation though, great vid!
Good video. You sound exactly like Antti from "Road to Vostok" 👀
Amazing video . 🙌🏻🔥💯💯♥️
Very helpful. Thank you!
Glad you've found it useful!
Really well done! Subbed.
Thank you!
Great video! Thanks a lot
Glad you liked it!
Nikola, in the Object Database, the 'Key' column contains SHA1 keys, so far so good. What does the 'Object' column contain? The actual object contents, as shown in 1:50? Or merely the paths to object files? I ask because at 2:37 you did say, "The contents of each object are stored in a file." That sounds like the contents are not themselves in the database. Is perhaps the Object Database composed of just one column and not two as shown?
Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual.
Hi Dario!
I think the confusion comes from the fact that the object database is not a table. I use a table-like view to visualize the properties of the database, but in practice, the object database is just a bunch of files. The key is the filename (left column) while the object (right column) are the contents of the file.
To answer your questions:
> What does the 'Object' column contain? The actual object contents, as shown in 1:50?
The "Object" column contains the actual object contents.
> Is perhaps the Object Database composed of just one column and not two as shown?
I tried to explain above, but let me know if it still doesn't make sense.
> Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual.
Correct, graphics are only conceptual.
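A minimal sketch of "the key is the filename, the object is the file contents", assuming the loose-object layout where the first two hex characters of the ID name a subdirectory and the file holds the zlib-compressed object. It writes into a temporary directory rather than a real `.git/objects`; `write_object` and `read_object` are names invented for the example.

```python
import hashlib
import os
import tempfile
import zlib


def object_path(objects_dir: str, oid: str) -> str:
    """First two hex characters become the folder, the rest the filename."""
    return os.path.join(objects_dir, oid[:2], oid[2:])


def write_object(objects_dir: str, kind: str, payload: bytes) -> str:
    """Store an object under its own ID and return that ID (the key)."""
    store = f"{kind} {len(payload)}\0".encode() + payload
    oid = hashlib.sha1(store).hexdigest()
    path = object_path(objects_dir, oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid


def read_object(objects_dir: str, oid: str):
    """Look an object up by key and return (kind, payload)."""
    with open(object_path(objects_dir, oid), "rb") as f:
        store = zlib.decompress(f.read())
    header, _, payload = store.partition(b"\0")
    kind, size = header.decode().split(" ")
    assert int(size) == len(payload)
    return kind, payload


with tempfile.TemporaryDirectory() as objects_dir:
    key = write_object(objects_dir, "blob", b"hello\n")
    print(key, "->", os.path.relpath(object_path(objects_dir, key), objects_dir))
    print(read_object(objects_dir, key))
```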
@@TechWithNikola Wow, you're quick! :) Thanks for the answers. The video is awesome, I'm still watching it.
Great explanation!
Thank you!
Thanks a lot! That was amazing!
You’re welcome! 😀
Well done, excellent video! It might also have been cool if you had mentioned how git finds the differences between 2 branches when merging. I haven't dug into git specifically, but it makes sense to me that a Merkle tree is used to find which files differ, and LCS to find the differences between files?
Thanks! :-)
I agree it would have been good, but I had to cut somewhere. Maybe I'll make a new video about git at some point.
Yes, with the Merkle tree we can find the files that differ very quickly. The diff between two files is based on LCS (line by line, Myers algorithm), although I think there are a few algorithms and I believe this is configurable in git.
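A line-based LCS diff can be sketched as below. Note that this is the textbook dynamic-programming LCS, used here as a stand-in because it is short; it is not the Myers algorithm mentioned above, which finds the same kind of result more efficiently. `lcs_table` and `line_diff` are names made up for the example.

```python
def lcs_table(a: list, b: list) -> list:
    """table[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(old: str, new: str) -> list:
    """Return lines prefixed with '  ' (kept), '- ' (removed) or '+ ' (added)."""
    a, b = old.splitlines(), new.splitlines()
    table = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append("  " + a[i])
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append("- " + a[i])   # dropping a[i] keeps the LCS at least as long
            i += 1
        else:
            out.append("+ " + b[j])
            j += 1
    out += ["- " + line for line in a[i:]]
    out += ["+ " + line for line in b[j:]]
    return out


for line in line_diff("a\nb\nc\n", "a\nx\nc\n"):
    print(line)
```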
This is a great video. It would be 1000x more useful if it illustrated each concept with corresponding git command - simple, e.g. commit, to advanced, e.g. rebase. Perhaps someone here could point me to a video where git concepts are matched to git commands, please?
So how are commits actually stored on the filesystem then? Is it a text file stored as an object with a similar format to http headers? What about branches? Are they symlinks to a "commit" object?
Commits are stored in the object database, same as blobs. Note that the object database is just a bunch of files in git (in the .git/objects directory). Each file is zlib-compressed, so it looks binary on disk, but once decompressed a commit is plain text: a few header lines (tree, parent, author, committer), a blank line, and then the commit message. You can print it with `git cat-file -p <commit-id>`.
So the comparison with HTTP headers is not far off. If you're curious how Git represents commits in its own code, you can find out more here: github.com/git/git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415/commit.h#L26. You may need to trace how the commit struct is constructed to find out the serialization details.
Branches are saved as files under .git/refs/heads/. These are not symlinks. The file for the corresponding branch contains the commit ID it points to.
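As a small sketch of branches being plain files: the code below builds a fake git directory in a temporary folder, writes a branch file containing a commit ID, and follows `HEAD` (which holds a `ref: ...` line when a branch is checked out) to that ID. The commit ID is a made-up placeholder and `resolve_ref` is a hypothetical helper; it only handles loose ref files, and Git can also keep refs in a `packed-refs` file, which this ignores.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str) -> str:
    """Read a ref file; follow 'ref: <other>' indirections until a commit ID is found."""
    with open(os.path.join(git_dir, name)) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        return resolve_ref(git_dir, value[len("ref: "):])
    return value


FAKE_COMMIT_ID = "1234567890abcdef1234567890abcdef12345678"  # placeholder, not a real commit

with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as f:
        f.write(FAKE_COMMIT_ID + "\n")       # the branch is just this one line
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")    # HEAD names the current branch

    print(resolve_ref(git_dir, "refs/heads/main"))
    print(resolve_ref(git_dir, "HEAD"))
```

Moving a branch then amounts to overwriting that one line with a different commit ID.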
this is a very good video
Thanks a lot!
This is so good !
Thank you!
Crisp and informative
Thank you!
I have a question. Even though git creates subdirectories to store the files, if it needs to create more subdirectories for a large project it still cannot exceed the folder limit which is 65K , right?
Hi Rafsan, FAT32 uses one index per folder which can reference around 65k files or folders (the actual number is more subtle because it depends on the length of the file/folder names, but this is a good approximation). Each folder has its own index, so you could store ~65k files/folders per folder. This means that you could have a much bigger number of folders in total by organizing them into subfolders, in a tree-like structure.
You are right that there is an overall hard limit for the number of files/folders on FAT32 file system, but this number is much bigger than 65k, I think it's around ~250 million files.
If you have a large file (say several megabytes or even gigabytes) and you change one bit in the file, does it save the whole new file in the blob store or does git have any clever tricks to only store the changes?
I think git does something to improve the storage of blobs, but it's not smart enough to realise that 99% of the blob is the same. My guess is that this is probably optimized for text files rather than random binaries. I don't know if Git LFS does something though, but I'd expect it to have some kind of optimizations that work better for binaries. With that said, I don't know exactly what Git does, but I've tested the above as follows:
- Create an empty git project (its size is negligible)
- Create a 1GB file with random data `dd if=/dev/urandom of=sample.txt bs=1G count=1`
- Commit changes. The repo size is ~2GB. ~1GB for the blob and 1GB for the working directory. (see [1])
- Change 1 byte in the sample.txt file.
- Commit the change. The repo size is ~2.58GB. (see [2] for inspecting the blob)
- One more time. The repo size is 3.11GB.
[1]
~/test (main)
$ git ls-tree 60914ef
100644 blob 1d2579e731b4de097bda567f86bcf70d4c9fb4c6 sample.txt
~/test (main)
$ ll -h ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
-r--r--r-- 1 Nikola 197121 1.1G Oct 2 23:27 ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
[2]
~/test (main)
$ ll -h .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
-r--r--r-- 1 Nikola 197121 543M Oct 2 23:21 .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
@@TechWithNikola That seems quite wasteful, but that could just be because it's optimized for speed, not storage size.
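A small sketch that mimics the experiment above in memory, assuming the loose-object format (a `blob <size>` header plus content, zlib-compressed, stored under the SHA-1 of the uncompressed bytes). It uses 1 MB instead of 1 GB. It only models loose objects, where every version of a file is stored whole; Git can additionally delta-compress objects when it packs them (for example during `git gc`), which is not modelled here. `loose_object` is a made-up helper name.

```python
import hashlib
import os
import zlib


def loose_object(content: bytes):
    """Return (object ID, compressed bytes as they would be written to disk)."""
    store = b"blob %d\0" % len(content) + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


random_data = os.urandom(1_000_000)
one_byte_changed = bytes([random_data[0] ^ 1]) + random_data[1:]
text_data = b"the quick brown fox\n" * 50_000

id_before, stored_before = loose_object(random_data)
id_after, stored_after = loose_object(one_byte_changed)
_, stored_text = loose_object(text_data)

print("same ID after a 1-byte change:", id_before == id_after)
print("random data:", len(random_data), "->", len(stored_before), "bytes stored")
print("changed copy:", len(one_byte_changed), "->", len(stored_after), "bytes stored")
print("repetitive text:", len(text_data), "->", len(stored_text), "bytes stored")
```

The one-byte change produces a different ID and therefore a second full-size object, and random data barely shrinks under zlib, while repetitive text compresses very well.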
good videos, but one question regarding rebase. Should the feature branch be updated, not the main one?
It seems like you are rebasing feature branch on top of main, but moving main instead of feature
Thank you for the comment. That’s a good point. In practice, the feature branch would be updated first (using git rebase command), then the main branch would be moved forward (e.g. using git merge command).
My focus was on the mechanics/idea of the rebase, and I forgot to include that step. Hopefully the explanation still makes sense.
Thanks for sharing.
You’re welcome!
Thanks, much appreciated!
I’m glad you’ve found it useful!
Excellent job!
Thank you!
Amazing content quality
Thank you
hi. super nice colors here !
Where are you (not-UK) from ?
Hi, thanks! I'm from Serbia :)
I Think there is one error for rebase :
when you do :
git checkout feature && git rebase main
after rebase, we have :
- main branch points d
- e becomes e' and its father is d
- f becomes f' and its father is e'
- feature branch no longer points to f, this one points to f'
no branch points to e and f any more
e and f are not lost, you can find them when you do
git reflog
Yes, thanks. That was a mistake in the animation that was pointed out in a few comments so far. You're right, main branch will point to d, feature branch will point to f' after `git rebase main`, then we have a final step to merge the main and the feature branches with `git merge`, which simply applies a fast-forward merge and updates main to point to f' as well.
Regarding the:
- e becomes e' and its father is d
- f becomes f' and its father is e'
e and f will remain unchanged, so I don't think it's accurate to say that e becomes e'. Do you mean e' will correspond to e (and f' to f)?
@@TechWithNikola Yes, I mean e' contains the same change as e. e' does not have the same sha1 as e, which is why I call it e'
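The pointer movements discussed in this thread can be modelled with a toy commit graph. This is only a sketch: a commit is reduced to a name, a parent, and a label standing in for its change, merge commits are not handled, and `rebase` is a made-up function that just replays the feature-only commits in order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    name: str
    parent: Optional["Commit"]
    change: str  # stand-in for the diff this commit introduces


def history(tip: Optional[Commit]) -> list:
    """All commits reachable from tip, newest first."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = tip.parent
    return out


def rebase(feature_tip: Commit, main_tip: Commit) -> Commit:
    """Replay the commits that are only on feature, oldest first, on top of main."""
    on_main = {c.name for c in history(main_tip)}
    to_replay = [c for c in history(feature_tip) if c.name not in on_main]
    new_tip = main_tip
    for old in reversed(to_replay):
        new_tip = Commit(old.name + "'", new_tip, old.change)  # new commit, new identity
    return new_tip


A = Commit("A", None, "a"); B = Commit("B", A, "b"); C = Commit("C", B, "c")
D = Commit("D", C, "d")                          # main:    A-B-C-D
E = Commit("E", C, "e"); F = Commit("F", E, "f")  # feature: A-B-C-E-F
branches = {"main": D, "feature": F}

branches["feature"] = rebase(branches["feature"], branches["main"])  # git rebase main
print([c.name for c in history(branches["feature"])])
print(branches["main"].name)                     # main has not moved yet

branches["main"] = branches["feature"]           # fast-forward merge
print(branches["main"].name)
```

The old `E` and `F` objects still exist unchanged after the rebase; nothing points at them any more, which matches the observation that they can be found through the reflog.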
QUESTION :
I tried myself and found commit is not a full snapshot but only changes which they know.
create master branch file_1 created ( 1st commit) .
Now checkout "feat" branch.
Do a 2nd commit file_2 created.
Do a 3rd commit file_3 created.
Do a 4th commit file_4 created.
Do a 5th commit file_5 created.
Do a 6th commit file_6 created.
Now "git switch master".
git cherry-pick "7th commit hash"
And boom..
In master you see only file_1 and then file_7 created.
Not a full snapshot. (I mean, even though the 7th is the latest commit in feat, it doesn't carry the full snapshot, just the changes it made at that time.)
That's why file_2 to file_6 do not come into master.
Because we used cherry-pick and not merge.
Hi, thanks for the question. Your experiment produces the expected results, i.e. cherry-picking a single commit will apply the *diff* between that commit and its parent, not the whole commit. This is why you’re seeing this behaviour even though commits are full snapshots. I have explained this in the video - have you seen the part that explains how cherry-picking works?
@@TechWithNikola yes yes. Today I tried very hard and I found commit is a full snapshot. You are right about cherry-pick. Even when you Cherry-pick direct merge commit which have 2 parents, you have to specify using -m that which parent do you want to get diff from.
NOTE :
I create file_1 in master with first commit.
file_2 - 2nd commit
file_3 - 3rd commit
file_4 - 4th commit
file_5 - 5th commit
Now in the 6th commit I changed some content in file_1.
`git diff HEAD HEAD~1`
I checked the difference between the latest commit and the one before it.
Even though the 5th commit only added file_5, the diff showed that file_1's content changed.
That means the commit is a full snapshot...
In the 5th commit (full snapshot) file_1 was as it was..
In the 6th commit I changed file_1's content...
So the diff compares the full snapshot at the 5th commit with the full snapshot at the 6th commit, even though I didn't touch file_1 in my 5th commit (it only added file_5 and the rest of the snapshot stayed as it was).
Wow man you are such a genius.
@@DhavalAhir10 that’s great. I’m glad that it all makes sense now!
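The experiment in this thread can be reproduced with a toy model where every commit is a full snapshot (a dict of filename to content) and cherry-pick applies only the difference between the picked commit and its parent. This is a sketch under simplifying assumptions: it ignores conflicts entirely, whereas Git uses a three-way merge here, and `snapshot_diff` and `cherry_pick` are made-up names.

```python
def snapshot_diff(parent: dict, commit: dict):
    """What changed between two full snapshots: (added or modified files, removed files)."""
    changed = {path: text for path, text in commit.items() if parent.get(path) != text}
    removed = [path for path in parent if path not in commit]
    return changed, removed


def cherry_pick(target: dict, parent: dict, commit: dict) -> dict:
    """Apply only the parent-to-commit difference on top of the target snapshot."""
    changed, removed = snapshot_diff(parent, commit)
    result = dict(target)
    result.update(changed)
    for path in removed:
        result.pop(path, None)
    return result


# Every commit on feat is a full snapshot containing all files so far.
master = {"file_1": "one"}
feat = [master]
for n in range(2, 6):
    feat.append({**feat[-1], f"file_{n}": str(n)})

last, its_parent = feat[-1], feat[-2]
print(sorted(last))                                    # the full snapshot has every file
print(sorted(cherry_pick(master, its_parent, last)))   # only the last commit's change arrives
```

The last commit on `feat` contains all files, yet cherry-picking it onto `master` brings over only the one file it added relative to its parent.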
What is the difference between "Check-in" and "commit"?
The terminology and the exact action depends on whether you're using Git, Mercurial, SVN, or maybe Microsoft TFS.
Git doesn't have check-in command, so it doesn't make much sense to think about the difference between commit and check-in in Git.
Commit records changes in the local repository in Git, and you have to use the `git push` command to send local changes to the server.
In SVN, I think the commit command sends changes to the server (unlike git or mercurial, which store the changes in the local repo), but I'm not 100% sure.
In TFS, the `checkin` command[1] sends changes to the server.
[1] learn.microsoft.com/en-us/azure/devops/repos/tfvc/checkin-command?view=azure-devops
In prison you'd definitely be one of the "git" crowd 👌👌👌👌
Audio quality is extremely important, more important than video.....
Agreed. Is there anything regarding the audio quality that I can improve?
@TechWithNikola There's a very strong but very short reverberation in your audio recording. My guess is that the audio is being recorded in a room about the size of a closet. It was jarring to listen to when I first played it so I ended up skipping the video.
Thank you for the feedback. I don't hear it myself but it may be dependent on the audio device - I will look more into this. FWIW, the room is not the size of a closet, it's ~15 sqm, so I doubt that's the issue.