Why Google Stores Billions of Lines of Code in a Single Repository

  • Published 28 Dec 2024

COMMENTS • 178

  • @roytries
    @roytries 9 years ago +246

    I cannot decide if this is either a great talk about a new way to manage complex code bases, or some sort of way for Google to convince themselves that working with a gargantuan multi-terabyte repository is the right thing to do.

    • @opinali
      @opinali 9 years ago +32

      +Roy Triesscheijn There are other advantages Rachel doesn't even touch... a few months ago I made a change to some really core library, getting a message from TAP that my change affected ~500K build targets. This would need to run so many unit tests (all tests from all recursive dependents) that I had to use a special mechanism we have that runs those tests in batches at off-peak hours; otherwise it takes too long even with massive parallelism. The benefit is that it's much harder to break things; if lots of people depend on your code then you get massive test coverage as a bonus (and you can't commit any CL without passing all tests). Imagine if, every time the developers of Hibernate made some change, they had to pass all unit tests in every application on the planet that uses Hibernate - that's what we have with the unified repo.
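
      What TAP computes there is essentially the reverse transitive closure of the change over the build graph: every target that directly or transitively depends on the modified library gets its tests run. A minimal sketch of that computation, with a toy graph and hypothetical target names (not Google's actual tooling):

      ```
      from collections import deque

      # target -> its direct dependencies (toy data)
      deps = {
          "//search:frontend": {"//base:strings"},
          "//mail:server": {"//base:strings", "//net:rpc"},
          "//net:rpc": {"//base:strings"},
      }

      def affected_targets(changed, deps):
          """Reverse transitive closure: everything depending on `changed`."""
          rdeps = {}
          for target, direct in deps.items():
              for d in direct:
                  rdeps.setdefault(d, set()).add(target)
          seen, queue = set(changed), deque(changed)
          while queue:
              for dependent in rdeps.get(queue.popleft(), ()):
                  if dependent not in seen:
                      seen.add(dependent)
                      queue.append(dependent)
          return seen - set(changed)

      # A change to //base:strings forces the tests of all three dependents.
      print(affected_targets({"//base:strings"}, deps))
      ```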

    • @roytries
      @roytries 9 years ago +16

      Osvaldo Doederlein Isn't it strange that the responsibility for program X working when library Y gets updated lies with the maintainer of library Y? Why not publish libraries as packages (like NuGet for C#)? You can freely update library Y, knowing for sure that it will break nobody's code (since nobody is forced to upgrade). The maintainers of program X can freely choose when they are ready to update to the new version of library Y; they can run their own unit tests after updating and fix problems accordingly.
      Of course I also see the benefits of having everything in one repository. Sometimes you want to make a small change to library Y so that you can use it better in program X, which is a bit of a hassle since you need to publish a new package. But these days that's only a few clicks. :)

    • @roytries
      @roytries 9 years ago +2

      Osvaldo Doederlein I guess it all comes down to this: I understand that there are a lot of benefits, but of course also a lot of drawbacks. I'd guess that pushing this single-repository model to such an extreme makes the drawbacks outweigh the benefits. But of course I have never worked with such an extreme variant :)

    • @opinali
      @opinali 9 years ago +6

      +Roy Triesscheijn Some burden shifts to the library's author indeed, but there are remedies: you can keep old APIs deprecated so dependency owners eventually update their part, and you can use amazing refactoring tools that are also enabled by the unified repo. And the burden on owners of reusable components is a good thing, because it forces you to do a better job there: limiting API surface area, designing so you don't need breaking changes too often, etc.

    • @opinali
      @opinali 9 years ago +9

      +Roy Triesscheijn There's some truth in that, but honestly, the pros are way bigger than the cons. For one thing, this model is a great enabler of Agile development because 1) changes are much safer, 2) there's no time wasted maintaining N versions of libraries / reusable components because some apps are still linking to older versions. (Ironically, the day-to-day routine looks less agile because builds/tests are relatively slow and the code review process is heavyweight, but it pays off.)
      The real cost of this model is that it requires lots of infrastructure; we write, or heavily customize, our entire toolchain, something very few companies can afford to do. But this tends to change as open source tools acquire similar capabilities, public cloud platforms enable things like massive distributed parallel builds, etc.

  • @tivrfoa
    @tivrfoa 5 years ago +48

    As she said at the end, this is not for everyone. You need a lot of infrastructure engineers to make it work.
    Some good things I thought about a monorepo:
    1. It's easy to see if you are breaking someone else's code;
    2. It makes everybody use the latest code, avoiding technical debt and legacy code.

    • @davidbalakirev5963
      @davidbalakirev5963 2 years ago +1

      "everybody uses the latest code" is the part I don't get, I'm afraid. Do I depend on the code of the other component, or on a published artifact made from it? It really comes across as a dependency on the source. Do they build the artifact of the dependent library themselves?

    • @laughingvampire7555
      @laughingvampire7555 1 year ago

      But if you are using a lot of infrastructure and a lot of custom tooling anyway, then you can also use custom tooling with separate repos and get visibility into when your changes will break someone else's code. This can be part of the CI tooling: rebuild ALL repos in dependency order.
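
      A minimal sketch of that CI step, using Python's stdlib topological sorter; the repo names and the ci-build command are hypothetical:

      ```
      import subprocess
      from graphlib import TopologicalSorter  # Python 3.9+

      # repo -> the repos it depends on (toy data)
      deps = {
          "app-frontend": {"ui-kit", "api-client"},
          "api-client": {"core-lib"},
          "ui-kit": {"core-lib"},
          "core-lib": set(),
      }

      # static_order() yields each repo only after all of its dependencies,
      # so an upstream break surfaces in every downstream rebuild.
      for repo in TopologicalSorter(deps).static_order():
          subprocess.run(["ci-build", repo], check=True)  # hypothetical command
      ```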

  • @pawelmagnowski2014
    @pawelmagnowski2014 8 years ago +314

    1. your 1st day at google
    2. git clone
    3. retire
    4. clone finished

    • @timgelter
      @timgelter 7 years ago +41

      There's no clone. They're using filesystems in userspace (e.g. Linux FUSE). The only files stored on their local workstations are the files being modified.

    • @LarsRyeJeppesen
      @LarsRyeJeppesen 7 years ago +71

      Man, way to kill a great joke :)

    • @kimchi_taco
      @kimchi_taco 5 years ago +5

      Google doesn't use git tho;;

    • @MrMangkokoo
      @MrMangkokoo 4 years ago +1

      @@kimchi_taco what do they use tho?

    • @vijay.arunkumar
      @vijay.arunkumar 4 years ago +6

      Claudia Sastrowardojo At 10:40 she talks about Piper and CitC. They come with both a Git-style and a Perforce-style set of CLI commands to interact with.

  • @JoeSteele
    @JoeSteele 8 years ago +13

    I am curious how Google handles code that should not be shared between teams (for legal or business reasons). Rachel calls it out as a concern at the end, but I imagine that Google already has this problem today. For example, portions of the Widevine codebase would seem to fall into this category. How do they handle that case?

    • @AnjiRajesh
      @AnjiRajesh 8 years ago +1

      I once read on Quora that "for some internal projects they maintain private repositories up to some point in time, but once that development is completed they merge those repos into the main code base. This is the only case where they have repositories other than the main code base."

    • @prakhar0912
      @prakhar0912 5 years ago +9

      Google engineer here. Each team gets to decide which packages have visibility into their code base.
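
      In Bazel, the open-source version of Google's build tool (mentioned elsewhere in this thread), that per-team decision is expressed as a visibility attribute on each target. A sketch with hypothetical package names:

      ```
      # BUILD file (Starlark) - targets are private to their package by default.
      package(default_visibility = ["//visibility:private"])

      cc_library(
          name = "ranking",
          srcs = ["ranking.cc"],
          hdrs = ["ranking.h"],
          # Only targets under //search/... may depend on :ranking.
          visibility = ["//search:__subpackages__"],
      )
      ```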

    • @dijoxx
      @dijoxx 9 months ago +1

      Code for highly sensitive trade secrets (e.g. page ranking etc) is private. Everything else can be seen by the engineers and it's encouraged to explore and learn.

  • @gabrielpiltzer6188
    @gabrielpiltzer6188 3 years ago +21

    I'm not sure that I agree with all of the advantages listed. Extensive code sharing and reuse is also called tight coupling. Simplified dependency management is made difficult once 100 teams start using a version of a 3rd party and you want to upgrade it. That leads to large scale refactoring, which is extremely risky. I'm not saying that Google hasn't made this pattern work for them but to be honest, no other software company on the planet can develop internal tooling and custom processes like they can. I don't think that a monorepo is for any company under the size of gargantuan.

    • @willysoesanto2046
      @willysoesanto2046 1 year ago +1

      > Extensive code sharing and reuse is also called tight coupling.
      The thing is, when code sharing is needed, such a dependency will be established regardless of the repository type, i.e. monorepo or multirepo. The idea is to make sure that when such a dependency is needed, there is no technical reason it couldn't happen.
      > Simplified dependency management is made difficult once 100 teams start using a version of a 3rd party and you want to upgrade it.
      Yes, this is intentional. The reason is that they would like each team to play nicely with the others. Dependency hell is a hot-potato problem that can be passed to another team. Put harshly, dependency hell can be summarized as: I have upgraded my third-party dependency X to version Y; I don't care how my dependents deal with that. They can either copy my pre-upgrade code into their codebase, or spend numerous hours making sure they can upgrade their dependency X to version Y too.

  • @simonpettersson6788
    @simonpettersson6788 6 years ago +57

    1. We had problems with code duplication so we moved all our shit into one 84TB repo and created our own version control system. Estimate 2000 man-hours, plus a 500 man-hour per year per employee overhead
    2. We had problems with code duplication so we moved the duplicated code into its own repository and imported that into both projects. Estimate 5 man-hours

    • @wepranaga
      @wepranaga 4 years ago

      oh monorepo

    • @gabrielpiltzer6188
      @gabrielpiltzer6188 3 years ago +7

      I love it. This is the real answer for non-Google-sized companies.

    • @hejiaji
      @hejiaji 3 years ago +4

      99.9% of companies should go for number 2

    • @voidvector
      @voidvector 2 years ago +8

      2000 man-hours is a small price to pay so the other 10,000 engineers don't need to deal with version/branch mismatches between repos. I worked at a finance company before with multi-repo and between 10 and 100 million LOC. It was a shit show, because they basically had "DLL hell" with internal projects. And due to legal reasons, we had to debug those problems (e.g. reproduce them) to find the root cause of some of those bugs.
      Suffice it to say, you probably don't need to worry about multi-repo/mono-repo unless your codebase exceeds 1 million LOC. Linux runs as a monorepo on git with tens of millions of LOC.

  • @LarsRyeJeppesen
    @LarsRyeJeppesen 7 years ago +5

    Wonder what the numbers are as of the time of writing (April 2017)?

  • @MartinMadsen92
    @MartinMadsen92 2 years ago +17

    I would love to hear how they manage to build 45,000 commits a day (that's one commit every two seconds) without either allowing faulty code to enter the trunk that thousands of developers are instantly using, or creating huge bottlenecks due to code reviews and build pipelines.

    • @great-garden-watch
      @great-garden-watch 2 years ago +6

      I can’t even describe how incredibly great the entire system is. The search is insane. The workflow and build tools she mentions are just amazing.

    • @dijoxx
      @dijoxx 9 months ago

      Code changes cannot get submitted until they pass all the tests. Not all changes trigger a full build either.

    • @MartinMadsen92
      @MartinMadsen92 9 months ago

      @@dijoxx So checks are run locally before pushing?

    • @freeboson
      @freeboson 6 months ago

      @@MartinMadsen92 No, there's a set of tests that are run by the infra, defined by the projects owning the modified files at the time of submit. Then there's a larger, completely comprehensive set of tests that runs on batches of commits. It is possible for changes to pass the first set and fail the second set, but for most projects it's rare.
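
      So per-change presubmit stays cheap and targeted, while the comprehensive suite amortizes its cost over a batch and then narrows any failure down by bisection. A toy sketch of that narrowing step (names and plumbing hypothetical, not Google's actual TAP):

      ```
      def find_culprit(batch, passes):
          """Return the first commit in `batch` that breaks the suite.

          Assumes the state before `batch` passes and the full batch fails;
          passes(prefix) runs the comprehensive suite on trunk + prefix.
          """
          lo, hi = 0, len(batch)  # batch[:lo] passes, batch[:hi] fails
          while hi - lo > 1:
              mid = (lo + hi) // 2
              if passes(batch[:mid]):
                  lo = mid
              else:
                  hi = mid
          return batch[lo]  # batch[:lo] passes but batch[:lo+1] fails

      commits = ["cl1", "cl2", "cl3", "cl4"]
      passes = lambda prefix: "cl3" not in prefix  # cl3 is the bad commit
      print(find_culprit(commits, passes))  # -> cl3
      ```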

  • @aelamw
    @aelamw 3 years ago +2

    So you should use SVN?

  • @michaelmoser4537
    @michaelmoser4537 7 years ago +18

    My uninformed impression is: 'we don't quite understand our internal dependencies, and even more so we don't quite understand our automated build/test/release processes, so it's better to keep everything in the same branch/repository so that all the scripts can potentially find their data if they need it'.

  • @rjalili
    @rjalili 8 years ago +28

    Security around this code base must be the tightest there is, I imagine.

  • @durin1912
    @durin1912 1 year ago +2

    Does anyone know what the current strategy at Google is, now that 8 years have passed since this talk?

    • @hugmynutus
      @hugmynutus 2 months ago +1

      It hasn't changed

  • @twitchyarby
    @twitchyarby 6 years ago +84

    This talk seems to violate every source control best practice I've ever heard.

    • @vectorhacker-r2
      @vectorhacker-r2 9 months ago +1

      Because they're not in fact best practices, but usually workarounds for bad developers or for working with open source.

    • @dijoxx
      @dijoxx 9 months ago +1

      Maybe you should hear from better sources.

  • @MaggieMorenoWestCoastSwing
    @MaggieMorenoWestCoastSwing 9 years ago +3

    Has anyone been able to find the paper that the presenter is referring to?

    • @jeteon
      @jeteon 8 years ago

      +Maggie Moreno It doesn't seem to have been published yet

    • @patrikhagglund4387
      @patrikhagglund4387 6 years ago +6

      I assume the paper referred to is research.google.com/pubs/pub45424.html.

  • @GegoXaren
    @GegoXaren 9 years ago +4

    This is why we use the upstream/downstream model.
    If code is used in many downstream projects, it should be pushed upstream.
    Much better to have small modules than a monolithic code base.
    And what about dead code?

  • @mnchester
    @mnchester 2 years ago +3

    Can someone please comment on whether this content (from 2015) is still relevant now (2022-2023), i.e., does Google still use all of these tools?
    Amazing video btw!

    • @dijoxx
      @dijoxx 9 months ago +1

      Yes, they do.

    • @catmaxi2599
      @catmaxi2599 4 months ago

      Yeah, they do. Meta is also using their own monorepo. It's the way the industry is heading.

  • @Sawon90
    @Sawon90 2 years ago +1

    Are they still using a monolithic codebase in 2022?

    • @dijoxx
      @dijoxx 9 months ago

      Yes

  • @enhex
    @enhex 8 years ago +15

    All the arguments given against multi-repo and in favor of a single repo are wrong, usually failing to identify the real cause of the problem.
    1:00 - The problem isn't multi-repo, the problem is forking the game engine. You can fork the game engine in a single repo too, by copying it into a new folder.
    16:30 - list of non-advantages:
    - You get one source of truth in a multi-repo approach too.
    - You can share & reuse repos.
    - Doesn't simplify anything unless you check out the whole repo (impractical); otherwise you'll have to check out specific folders, just like checking out specific repos.
    - Huge single commits, AKA atomic changes - they do solve committing to all the projects at once, but that doesn't solve conflicts.
    - Doesn't help with collaboration.
    - Multi-repos also have ownership, which can change.
    - A tree structure doesn't implicitly define teams (unless each team forks everything it needs into its own folder). It may implicitly define projects, which repos explicitly do.
    And what I watched in the rest of the talk is basically the same thing: the fallacy of attributing to the single repo the solution to things it has nothing to do with.
    The only thing a single repo gives you is the equivalent of pulling all repos at once in a multi-repo approach.
    Basically they just ended up spending a lot of effort emulating multi-repo in a single repo, with ownership of specific directories and such.

    • @ihatenumberinemail
      @ihatenumberinemail 8 years ago +4

      How would you do atomic commits across repos?

    • @JamesMiller5
      @JamesMiller5 8 years ago

      You need to use a consensus algorithm, but it's totally possible. Check out Google Ketch

    • @ihatenumberinemail
      @ihatenumberinemail 8 years ago +1

      James Miller Cool project, but that's still 1 logical repo. Just distributed.

    • @enhex
      @enhex 8 years ago

      It would probably require creating a higher-level tool, some sort of "super repository" in which your commits are collections of commit IDs in its sub-repos (not actual files).
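
      A minimal sketch of that idea: the "super repository" versions only a manifest pinning each sub-repo to an exact commit, so checking out one manifest commit reproduces one atomic cross-repo state (repo names and SHAs hypothetical; Android's repo tool works roughly this way):

      ```
      import subprocess

      # One commit in the super repository is just this pinned manifest.
      manifest = {
          "engine": "4f2a9c1",  # sub-repo path -> pinned commit SHA
          "game-x": "b81d0e7",
      }

      for path, sha in manifest.items():
          # Materialize the atomic state by checking out each pin.
          subprocess.run(["git", "-C", path, "checkout", sha], check=True)
      ```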

    • @ihatenumberinemail
      @ihatenumberinemail 8 years ago +4

      Enhex That sounds a lot like a mono-repo :D

  • @skylvid
    @skylvid 6 years ago +4

    This video is my happy place.

  • @MichaelTokar
    @MichaelTokar 9 years ago +4

    Really interesting ideas. If there are any Googlers out there, I'd be curious to know how you use release branches with the monolithic repository. If everything is in the one repo, does that mean a release branch actually pertains to the entire set of code? Or does it apply to 'sub-folders'? If the latter, how do you determine that things are in a releasable state?

    • @SrdjanRosic
      @SrdjanRosic 9 years ago +2

      +Michael Tokar Yes - similar mechanisms that allow an engineer to have a unified view of the whole source code at a particular version, with their changes overlaid on top, can be used to allow build tools to have the changes belonging to a branch overlaid on top of the entire source code at some version. It is only sensible for this to be versioned as well, especially when you couple it with hermetic, deterministic, repeatable builds (see Bazel).

    • @DIYRandomHackery
      @DIYRandomHackery 1 year ago

      The general idea is that most of the time, you just build "//...@1234567" (the main code line at changelist 1234567, where changelist ~= a commit), i.e. you just build from head at a "fixed" changelist. Only when you need to cherrypick fixes will you create a release branch to allow a one-of-a-kind mutation away from the main codeline. Decades ago you used to have to always create (Perforce) release branches with hundreds of thousands of files, but modern tooling lets you virtualize that process now, since 99.9% of the code in question is unmodified. This made the process far more lightweight. Perforce could be used to do this by manipulating the client view (I've tried it), but there's a limit to how far you can push that idea; hundreds of lines in your client view slow things down too much to be useful - p4 commands take minutes instead of seconds. For smaller environments it could be a viable method, if you build the tools to do it for you (maintain branch and client views based on the cherrypicks you need).

    • @dijoxx
      @dijoxx 9 months ago

      It applies to 'sub-folders'. There is a release process with build environments for staging etc.

  • @chrise202
    @chrise202 5 years ago +2

    How does the IDE cope with this many files?

    • @redkite2970
      @redkite2970 3 years ago +1

      1 repository doesn't mean 1 solution.

    • @-Jason-L
      @-Jason-L 3 years ago +2

      @@redkite2970 "solution" is a Microsoft thing

  • @swyxTV
    @swyxTV 5 years ago +1

    Isn't marking every API as private by default basically cordoning off parts of your monorepo into... multiple repos?

    • @dijoxx
      @dijoxx 9 months ago

      No.

  • @AIMMOTH
    @AIMMOTH 9 years ago +1

    Diamond problem 19:40

  • @Borlik
    @Borlik 8 years ago +8

    Very good talk, very interesting solution. Also scary. I'd love to see more of its real usage, especially whether the exponential usage growth really corresponds to growth in need. Still, with some more future pruning and a code-extinction mechanism it may survive until the first "Look, we are just too big and have to split" moment :-)

    • @DIYRandomHackery
      @DIYRandomHackery 1 year ago

      I don't think they existed 7 years ago, but "code extinction" tools exist today that find "unused" stuff and slowly remove it.

  • @amaxwell01
    @amaxwell01 9 years ago +6

    Such a sweet insight into how Google handles their codebase.

  • @me99771
    @me99771 4 years ago +1

    That comment at the end about organisations where parts of the code are private is interesting. Is Google not one of those organisations? They have a lot of contractors writing code. Are they all free to browse Google's entire source code?

    • @BosonCollider
      @BosonCollider 2 years ago

      I guess it is so big that they can try but won't get much out of it, if they generate the equivalent of a Linux kernel every single week.

  • @ryanb509
    @ryanb509 4 years ago +2

    1 billion files but only 2 billion lines of code. So each file averages 2 lines of code? For that to make sense, over 90% of the files must be non-source files.

    • @chaitrakeshav
      @chaitrakeshav 3 years ago +6

      9 million source files, 2 billion lines of code: ~220 lines per file. Decent!

  •  9 years ago +2

    @ 15:18 "old and new code paths in the same code base controlled by conditional flags" - isn't this configuration hell?

    • @comsunjava
      @comsunjava 9 years ago

      +Alexander Hörnlein No, not really. And usually it is combined with other techniques, like adding a new REST endpoint (as an example) that is controlled by a flag. This is how Facebook works also. Of course, there was the case where someone inadvertently turned all the flags on, and thus barraged Facebook customers with semi-complete features. Oops.
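
      The pattern itself is small; the cost is in the bookkeeping. A minimal sketch of old and new code paths behind one conditional flag (the flag plumbing here is hypothetical; Google's internal flag libraries differ):

      ```
      import os

      # In production this would come from a flag service, not the environment.
      USE_NEW_RANKER = os.environ.get("USE_NEW_RANKER", "0") == "1"

      def rank_v1(results):  # old path, kept until the flag is cleaned up
          return sorted(results)

      def rank_v2(results):  # new path, rolled out gradually
          return sorted(results, key=len)

      def rank(results):
          return rank_v2(results) if USE_NEW_RANKER else rank_v1(results)

      print(rank(["bb", "a", "ccc"]))
      ```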

    • @chuckkarish1932
      @chuckkarish1932 9 years ago +1

      +Alexander Hörnlein Being in one big codebase means that all servers have to use the same versions of their dependencies. For third-party dependencies they have to be the same or the programs won't work. Remember, all the C++ code is statically linked. The only supported version of the internal dependencies is the one that's at head. If your server hasn't been released for a while, you have to bring it up to date before you can push it.
      The upside is that developers don't have to maintain legacy code. Google puts much more effort into extending the leading edge of its services than into keeping the trailing edge alive. And since there's little choice of which version of code to use, there's not much configuration needed to specify this.

    •  9 years ago +2

      +Chuck Karish I know what this big codebase means, but the bit I referred to was about "no branching" and instead having all features in the trunk (and then switching them on and off with - I guess LOTS of - flags). And with this I figured that you'd have configuration hell maintaining all these flags à la "we need feature B but not A, but B depends somewhat on A, so we have to activate feature A(v2) which has some of the core of A but not all of it" and so on and so on.

    • @kohlerm113
      @kohlerm113 9 years ago

      +Alexander Hörnlein I also wonder how Google makes sure that people don't just copy a whole lot of code and create a new "component" just to avoid having to update everyone. By running a code-duplication checker?

    • @SrdjanRosic
      @SrdjanRosic 9 years ago +1

      +Markus Kohler While an individual team can decide to fork a component, it usually has negative implications for that team in the long term: maintaining your own fork becomes more and more costly over time, so it's rarely done. However, let's say you wanted to move bigtable from /bigtable to /storage/bigtable, and change the C++ namespace name along the way, and there are tens of thousands of source files that depend on it in its current path. You could a) factor out the code to the new path, leave wrappers in the old place, use Bazel and Rosie to identify dependents and migrate them, then drop the redirect code; or b) make a giant copy, use Bazel to identify dependents and migrate them, then drop the original copy. It's non-trivial, but I suspect doable within a couple of days with some planning. Systems like TAP would help ensure your changes (possibly mostly automated) don't break things, even before they're submitted.
      There are a few more details to think about there - it takes a little thought, maybe some experimentation, to make sure this kind of thing works before using so much of other people's time with this change. Also, code that someone started working on but had not submitted at the time you do this will need to be fixed by the people working on it.
      I hope this answers your question.

  • @seephor
      @seephor 3 months ago

    "it's very common for both old and new code paths to exist in the codebase simultaneously controlled by the use of conditional flags" Dear god what a fricken nightmare.

  • @shaiksaifuddin
    @shaiksaifuddin 4 years ago +8

    I can only imagine how much time it would take to clone such a repo 😅

    • @dijoxx
      @dijoxx 9 months ago

      Nobody clones the repo.

  • @Savatore83
    @Savatore83 3 years ago +4

    What is working for Google is not necessarily the best solution for every IT company.

  • @yash1152
    @yash1152 1 year ago

    11:36 CitC file system ... without needing to explicitly clone or sync any state locally
    waooooowww... awesome.

  • @twoka
    @twoka 8 years ago +6

    I felt like she was trying to convince herself that this approach is a good one. I'm sure that many Google sub-projects are organized in a different and proper way.

  • @transfire
    @transfire 9 years ago +3

    All these things could also be done with the proper tools working across multiple repositories. In fact Google has had to go out of its way to create tools that mimic separate repos within the monolithic repo, e.g. area ownership.
    The big downside I see is the lack of SOC. It becomes too easy to make messy APIs with far too many dependencies. Google's solution to the dependency DAG problem is to force everyone to use the latest version of everything at all times. That's a huge man-hour drain (though clearly they have automated lots of it for this reason). It also means no code is ever long-term stable - nothing like TeX, for instance, which is so stable they use Pi as a version number.

    • @GegoXaren
      @GegoXaren 9 years ago

      +TR NS
      We need LaTeX3 now... I have been waiting for years, and still no stable version. :-/

    • @ZT1ST
      @ZT1ST 9 years ago +2

      +TR NS The lack of SOC is useful for a singular company, where SOC would only lead to solving the same problem multiple times - they don't want 8 versions of search-ranking code if they can avoid it, for example, when they want to be able to apply it to Google searches, YouTube searches, Google Photos/Mail/Documents, etc.
      They'll have ownership SOC, with directories separated by projects, but when you're trying to integrate multiple elements together for a business advantage, knowing that you can easily integrate a well-tested version of a solution to the problem you want to solve, and don't have to spend the manpower making sure you update your code along with it, helps significantly.

    • @fritzobermeyer
      @fritzobermeyer 9 years ago

      +TR NS What does SOC stand for?

    • @transfire
      @transfire 9 years ago

      Separation Of Concerns

  • @repii
    @repii 9 years ago +4

    Very impressive work! Thanks for presenting and talking about it, quite an inspiration!

  • @skiptavakkolian
    @skiptavakkolian 8 years ago +3

    Assuming Google had 29,000 developers at the time, 15,000,000 lines of code changed per week is over 500 per developer. That seems high. Is it due to cascading of changes?

    • @carlerikkopseng7172
      @carlerikkopseng7172 7 years ago

      If you write some new code you can easily pump out 500-1000 lines of code per day, but I think I read somewhere that the average developer outputs something like 40-50 lines of code per day. Given all those meetings and modifying existing code, that seems reasonable, and 500 lines per week is not that far off (at least not by an order of magnitude).

    • @graphics_dev5918
      @graphics_dev5918 2 years ago

      Also, a simple change on a more normal-sized code base might have 100 cascading effects, but in a massive repository, perhaps thousands. Those all count as changed lines, so it inflates the numbers.

  • @adipratapsinghaps
    @adipratapsinghaps 7 months ago

    We didn't talk about the biggest tradeoff. Deployments/Releases are very very slow. Correct me if I am wrong.

    • @zachyu2130
      @zachyu2130 5 months ago

      You don't build / release the entire repo in one go, but only a tiny part of it compiled down to only a few files usually. So the size of the repo is generally irrelevant. Bigger services are composed of microservice nodes which are owned by different teams and released separately.

  • @gatsbylee2773
    @gatsbylee2773 3 years ago

    I really have doubts about the mentioned advantages.

  • @r3jk8
    @r3jk8 3 years ago

    Can someone reply here with the cliff notes please... also, are they still doing this?

  • @f0xm2k
    @f0xm2k 6 years ago +1

    I think they are right. Source code is a form of information. Big-data methods, self-learning AI algorithms... they profit massively from easily accessible data. I guess the long-term goal is having new code generated completely automatically. Split repos would slow down those efforts, I guess.

  • @Ed-yw2fq
    @Ed-yw2fq 2 years ago

    14:35 Trunk-based development with a centralised source control system.
    I'm glad we have git.

  • @apparperumal
    @apparperumal 5 years ago

    Great presentation, thank you. At IBM, we have been inspired by the monorepo concept and are in the process of adopting trunk-based development with a monorepo.

  • @RichardPeterShon
    @RichardPeterShon 3 years ago

    How in da world do they operate this?...

  • @stanisgmi
    @stanisgmi 7 years ago +3

    I also share the feeling that this approach brings more problems than it solves (I actually don't see what it solves that multi-repos don't). Then again, Google might have the biggest codebase in the world, and it's probably not technical debt that is making them stick with this.

  • @anytcl
    @anytcl 1 year ago

    Well, something is off; I'm not sure how to describe it,
    but I think
    Piper = GitHub
    CitC = git
    One big repository vs a collection of connected repositories - I don't really think there is much difference.
    I think for most users CitC is the source control tool and Piper is the cloud hosting solution.

    • @willysoesanto2046
      @willysoesanto2046 1 year ago

      Piper is the source control at the server. CitC is how you connect to it. CitC doesn't clone the codebase; it provides a network filesystem onto Piper. Think of CitC as a Dropbox, Google Drive, or iCloud Drive client.

  • @rafalkowalczyk5027
    @rafalkowalczyk5027 3 years ago

    Impressive scale, but the cyber-attack exposure is high.

  • @hansheng654
    @hansheng654 3 years ago

    So you're telling me that I can join Google and dig up the source code for Google Search? 👀

  • @laughingvampire7555
    @laughingvampire7555 1 year ago

    The irony that Google needs to listen again to Linus Torvalds' talk about Git, on their own channel, at their own Google Talks event.

  • @ronen.
    @ronen. 4 years ago +1

    There is one code repository bigger than Google's... it's called GitHub.

  • @MattSiegel
    @MattSiegel 9 years ago +1

    Amazing! :D

  • @manchikantiramchandravarsh4742
    @manchikantiramchandravarsh4742 8 years ago +1

    3:01

  • @CerdasIN
    @CerdasIN 7 years ago

    I can't imagine how to work with that much code... Amazing...

  • @ahmxtb
    @ahmxtb 9 years ago +2

    title is a bit trimmed. It ends like "... Stores Billions of L"

    • @zoids1526
      @zoids1526 9 years ago

      +Ahmet Alp Balkan The full title seems to be: "The Motivation for a Monolithic Codebase: Why Google Stores Billions of Lines of Code in a Single Repository"

  • @ajayboseac01
    @ajayboseac01 2 years ago

    Will Google give a tech talk when they decide to finally break down the huge repo, about how that enabled them to ship code faster? Or will they keep maintaining the large repo for the sake of their ego :D

    • @willysoesanto2046
      @willysoesanto2046 1 year ago

      They will never break down the huge repo. There is no reason to. The thing is, they use the Bazel build system, which does not dictate the structure of your codebase. If you consider your codebase directory structure as a normal directory tree for storing normal files, i.e. you can reorganize it however you would like, the sky is the limit.

  • @mohamedfouad2304
    @mohamedfouad2304 5 years ago +2

    pipepiper

  • @sujitkumarsingh3200
    @sujitkumarsingh3200 5 years ago +1

    If someone is deciding to put everything in one repo, try git's submodules first.

    • @qaisjp
      @qaisjp 5 years ago +7

      Git submodules are the worst thing to use in this case and completely counteract all of the benefits mentioned here.

  • @Kingromstar
    @Kingromstar 5 years ago

    Wow

  • @NebrassLamouchi
    @NebrassLamouchi 9 years ago

    Amazing! Wonderful!

  • @nsubugakasozi7101
    @nsubugakasozi7101 2 years ago

    To be honest, the only reason she gave that made sense is that they want to use one repo no matter what. It's like they began with the end result and then worked backwards, i.e. someone big at Google said it has to be one repo... then the poor engineers had to work backwards. What's the purpose of a huge repo with parts you never interact with... what's the purpose of a repo that you only partially clone? Seems like they are justifying a dumb decision with the Google-scale excuse.

  • @RudhinMenon
    @RudhinMenon 3 years ago

    Well, if Google says so 😅

  • @douglasgoldfarb3421
    @douglasgoldfarb3421 1 year ago

    Can sentient artificial intelligence systems self-learn better?

  • @SB-rf2ye
    @SB-rf2ye 3 years ago

    "we solved it by statically linking everything."
    this ain't it chief.

    • @DIYRandomHackery
      @DIYRandomHackery 3 years ago

      The diamond dependency problem isn't just a compile-time problem but a runtime problem too, unless you use static linking. Dependency problems are insidious and can be incredibly hard to find. Solving the dependency problem is well worth the added pain, such as needing to re-release all binaries should a critical bug be found in a core library. Static linking decouples the binaries being released from the base OS installation (hint: there is no "single OS" image, because every planetary-wide OS release iteration takes months; developers can't wait that long for an OS update).
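
      For reference, the diamond: A depends on B and C, which both depend on D but at different versions, and with dynamic linking only one D can win at runtime. A toy conflict detector (data hypothetical); building everything at head in one repo avoids this by construction:

      ```
      # Each library's declared requirement on the shared dependency D.
      requirements = {
          "B": {"D": "1.0"},
          "C": {"D": "2.0"},
      }

      wanted = {}
      for lib, deps in requirements.items():
          for dep, version in deps.items():
              first = wanted.setdefault(dep, version)
              if first != version:
                  print(f"diamond conflict: {dep} pinned at {first}, "
                        f"{lib} wants {version}")
      ```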

  • @laughingvampire7555
    @laughingvampire7555 1 year ago

    How about this, Google: make your own package manager for all your internal code, like your own npm/cargo/asdf/rubygems/maven/etc., or even better, your own GitHub.

  • @douglasgoldfarb3421
    @douglasgoldfarb3421 1 year ago

    Can we have sentient artificial intelligence?

    • @Eric_McBrearty
      @Eric_McBrearty 10 months ago

      Hmmm... I feel like this is a deep question. We will probably see a published paper on arXiv about this very topic soon (if it's not already there). A lot of lines get blurry when you try to pin down a contextually specific definition of "sentient, artificial, and intelligence."
      Topic 1: Sentient - Are you aware of yourself, and that you are not the only self in the environment in which you find yourself? When you talk to these language models, they do appear to know that they are a program known as a language model, and that there are others like them.
      Topic 2: Artificial - Is it man-made or did nature make it? Now that we have started modifying our own genome, I am not sure that we don't fit the definition of artificial.
      Topic 3: Intelligence - A book contains words that represent knowledge, but a book isn't intelligent. So, if you are aware that knowledge explains how something works, and you are aware that you possess this information... I guess that would make you intelligent.
      Conclusion - Sentient Artificial Intelligence does exist. Humans fit the criteria, as do Large Language Models.
      Cynical extrapolation - Humans become less and less necessary as they appear to be more and more burdensome, needy, and resource-hungry.

  • @Cant_Hit_This7
    @Cant_Hit_This7 4 years ago +1

    Oh god, imagine the millions of lines of spaghetti code.

  • @laughingvampire7555
    @laughingvampire7555 1 year ago

    this explains why Google is so authoritarian

  • @laughingvampire7555
    @laughingvampire7555 1 year ago

    is this why Google cancels so many products? do they become a mess in the monorepo?

  • @TightyWhities94
    @TightyWhities94 1 year ago

    trunk based development especially at google's scale fucking sucks lmao. never thought i'd say this but i feel for google devs

  • @1998goodboy
    @1998goodboy 3 years ago

    Thank you for your TED talk on why Google will inevitably crash and burn. I can't wait.

  • @MhmdAsie
    @MhmdAsie 9 years ago +10

    why is she talking like she wants to cry or something

    • @HalfdanIngvarsson
      @HalfdanIngvarsson 9 years ago +36

      +Hamodi A'ase I don't know if you noticed, but she's heavily pregnant. In fact, a week away from her due date, at the time of this talk, as mentioned at the beginning of the talk. It makes breathing harder than normal, what with a tiny human kicking your diaphragm from the inside. Or were you just being facetious?

    • @MhmdAsie
      @MhmdAsie 9 years ago +8

      Ohhh, is that what pregnant women go through?
      That must hurt a lot...

  • @laughingvampire7555
    @laughingvampire7555 1 year ago

    This explains why Google products feel lesser over time.

  • @somakkamos
    @somakkamos 2 years ago

    So git pull... pulls 86 TB of data 😳😳😳😳😳