The video is confusing because, the way we were taught in uni and the way I've seen people do databases, you plan it all out so carefully you'd think you were building a nuclear reactor before you write any actual code that uses the db. Why is he talking about something that's more of a NoSQL problem than a SQL problem?
I haven't run into this. When a dev changes a column name, that dev also changes the code to ensure columnName is updated everywhere. Also, our pipeline will fail if your migration cannot go down.
Maybe I'm naive, but why isn't there a 'Rename Symbol' (vscode) like feature between code and DB? I know, we need a graph/tree as an index and there needs to be some underlying 'hard rules' that allow a structure to be refactored. If we can do it for files, why not for databases?
One major point missing from the vid: does PlanetScale solve the issue of needing 3 queries (in the column rename example) to maintain compatibility?
Isn't it easy enough for the migration thing to have a file that points to the latest migration? So that would conflict when two PRs are being merged. That's something I've seen. I think Django does that automatically in how their migrations work.
Yes, and VIEWS can help a lot: they keep the original table untouched and can perform really well when materialized (a caching mechanism). One thing I don't like about PlanetScale is that WITH queries are not supported yet…
Love your videos, as always. Just wondering, as a data scientist we deal with data versioning on our MLOps stack, and I was wondering what your thoughts are on technologies like dvc, dolt or lakefs. The PlanetScale feature looks like it's doing this under the hood.
But how would you do this for software that is self-hosted by lots of people? How can you provide updates for them without migrations? For example, open source forum software that is self-hosted.
Would db push fix the problem of devs having two different prisma models on a different git branch? Wouldn't this cause a similar conflict where db push would override another?
Indeed, Laravel makes this safe and easy. But in general someone has to be dedicated to making migrations in a team. Not everyone should do it, and some communication is needed around migration changes to avoid any conflict.
I thought I knew what a schema was but I guess I do not... I change my schema using ALTER TABLE. But the existence of a schema definition doesn't explain how to transform an old schema into the new. That's what migrations are... I guess I'm missing the point.
Creating a new field with duplicate data (and removing the original later) is a good solution for renaming. Is there anything better than creating a new field with a new name & type if you just want to change the type but still want the original name? I can't think of an exact example of when you would want to do this off the top of my head, but for example if you wanted to change the number of characters in a varchar, the size of an integer field, integer -> float, etc.
The examples you gave are backwards compatible, at least in one direction (from less to more). A less backwards-compatible conversion would be e.g. the opposite: from float to an int, fewer chars in a varchar... In that case, you would need to ensure your code doesn't produce data that needs more space, and also somehow update the existing data to fit the new constraints before you change the type.
Migrations need to have a central lookup with a directed acyclic graph to know which migration depends on what. We ran into other issues such as purging caches, leader/follower replication restrictions and locks on huge tables.
We use a separate db 'lock' file which updates when every migration is run. If a developer tries to merge and another migration has happened there will be a conflict they need to resolve. Doesn't solve the downtime/backwards compatibility issues though.
The first problem just doesn't occur in some systems; Python SQLAlchemy migrations, for instance, require sequential numbering with no repetitions. Other systems have a notion of branching and merging built into the migration structure. It's a bad implementation, not an inherent problem of the approach.
In the case of renaming fields, I would argue for just not making that change. Like, how much better is FullName than Name anyway? Is it really worth all the risk and hassle? A name would need to be pretty misleading and bad before I'd want to take that on. I'm a big fan of monolithic idempotent DDL scripts. They conditionally make changes to the database if they haven't already been made. Most of the time, the script checks the schema and takes no action. Git history makes it easy to see what was changed when. And it's written in the native language of the RDBMS, so you have access to the full feature set without some leaky abstraction getting in your way.
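For illustration, here is a minimal sketch of that "check the schema, then conditionally alter" idea. The conditional lives in the database's own language (a Postgres DO block), with Python only used to ship it to the server; the table and column names are invented, not anyone's actual script.

```python
# Sketch of one step from a monolithic idempotent DDL script. Safe to run
# repeatedly: if the column already exists, the step takes no action.
import psycopg2

IDEMPOTENT_STEP = """
DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.columns
        WHERE table_schema = 'public'
          AND table_name  = 'users'
          AND column_name = 'full_name'
    ) THEN
        ALTER TABLE public.users ADD COLUMN full_name text;
    END IF;
END
$$;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(IDEMPOTENT_STEP)
```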
MikroORM and TypeORM already solved this problem, and the solution has existed for decades in other ecosystems. I think it is good to learn how other communities solved the same problems that Node developers are facing nowadays.
We really, really need a video about uploading files: what it looks like as a streaming upload versus a single-request file upload, because the state of things is pretty bad right now. We'd also appreciate seeing it in tRPC, plain REST and GraphQL.
Agree with the sentiment, but migrations are a less evil part of the more evil SQL which, IMHO, after a few decades dealing with it, needs to be deprecated. By what, I don't really know, but edges are a clue, though maybe not in the form that we know them. So many challenges, so little time.
@@Dude29 Hi Dude29, thanks for the comment. The argument is that there is too much management and overhead involved with joins, referential integrity, indices and access, leading to issues with migrations, access tuning and resilience, computation power, and sharding. Sure, there has been a lot of work put into it lately, but from a cursory perspective does it really change the base fundamentals, or is it just a workaround? Compare Cypher to SQL queries with respect to clarity and response, and I think the findings lead to interesting questions. SQL is 50 years old. I like to think that data wrangling has come a long way, and maybe we need to periodically review the assumptions and directions adopted.
Possibly, if you add tests that compare e.g. information_schema or the underlying catalog tables so that full comparisons are made for database objects, constraints, roles etc. This is quite a mammoth task though! Not sure if you could easily detect whether an out of order change due to git commit sequence would result in the same schema but data loss in the process, which is why our migrations are vetted by DBAs. Git rebase can help avoid out of order changes and result in a clean git history graph, but tends to be misunderstood by most developers to the point where they screw up without questioning it. Rebase also a bit more cumbersome as it replays through history for the rewrite and not suitable for shared branches. Saying that, some devs can't avoid screwing up a git merge either: "well I did `accept all my changes` last time and it worked so I'll do that again"
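As a rough illustration of that information_schema comparison, a minimal version of such a test could look like the sketch below, assuming two reachable Postgres databases (a freshly migrated one and a reference) and comparing only columns rather than the constraints, roles and other objects a full comparison would need. The DSNs are placeholders.

```python
# Sketch: compare column definitions of two databases via information_schema.
import psycopg2

COLUMNS_QUERY = """
    SELECT table_name, column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'public'
"""

def schema_snapshot(dsn: str) -> set:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(COLUMNS_QUERY)
        return set(cur.fetchall())

def test_migrated_schema_matches_reference():
    migrated = schema_snapshot("dbname=ci_migrated")   # placeholder DSN
    reference = schema_snapshot("dbname=reference")    # placeholder DSN
    # The symmetric difference lists columns present in one DB but not the other.
    assert migrated == reference, migrated ^ reference
```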
migrations as a whole are what prevented me from getting anything learned when I attempted to learn database stuff. They literally just wouldn't make sense to me.
atlasgo provides a solution for the merge-conflict part. It creates a file named "atlas.sum" (a Migration Directory Integrity File), which depends on all the migration files. Adding new files changes atlas.sum and will raise merge conflicts.
I would argue that what you're doing in PlanetScale is still a migration under the hood. You just have them handle the coordination against the database. A schema migration is just going from one schema state to another. Adding more steps doesn't really change that concept. I also struggle to see how this would work with decentralized database setups, such as products deployed on-prem, which is still a very real scenario in many instances. Especially if your customers are government institutions. Then you don't have a "production database", then you might have 20 different versions running all over the world and on different hardware. Our company needs to support customer hosted on-prem deployments on customer hardware and we support 4 database dialects. If we just stopped doing migrations, we wouldn't have a product. Sure, it's great to assess your needs and pick the most intelligent solution for your situation but proclaiming that we don't need migrations anymore seems like a very narrow minded hot take.
I believe you may be not using best/expected practices when using migrations, so the problems listed here seem like a big stretch. I’ve seen migrations in small projects to huge monoliths with multiple teams and these problems simply don’t exist given that people know what they are doing. Seems like you have an HR problem instead.
Planetscale is doing migrations. You're just paying for them to manage it and give you some abstraction over it. There are some nuances to migrations but it's the best option and even mentioning looking back in git history to manage your database is absurd.
I prefer to have the database model in git, and having direct control over the database instead of trusting a tool to automatically do migrations is preferable for me, and in my opinion a more scalable solution
How large is your database out of interest? In my case it has some 20 schemas, 500+ tables and about 30000 columns. Still have a mixture of some automated migrations and some very manual ones depending on the complexity and risk. Little and often is best I find.
Migration files are not an option with our SaaS as we have nearly 300 tables that constantly evolve. We are lucky that our ORM does not use migration files - we would never be able to easily consolidate all those files to see what the table structure should be now. With our ORM, we just change the properties in the original model file, and it runs the migration seamlessly in the background - oh, and it doesn't ever remove columns - that has to be manually done by us later if needed.
@@1879heikkisorsa No actually, we use a Ruby framework (not Rails) and the DataMapper ORM which is a really old ORM that is no longer maintained (well, we maintain it ourselves now), but I have yet to find another Ruby ORM that does the job better...
@@1879heikkisorsa Actually, looking at the GORM features - it sounds remarkably similar to the DataMapper ORM that we use - auto migrations, hooks, foreign key support, eager loading - all except transactions.
@@1879heikkisorsa Yes, sorry, I meant that saw GORM does transactions, but DataMapper alas, doesn't. But we can still do them by sending raw SQL statements in DataMapper.
This is not an issue with migration files itself but with how Prisma handles them. I don't like Django but what they did well is how migrations work. They all know which parent they're supposed to belong to and django will detect when you added one in-between. It will instruct you to merge them so they make sense again. Django also has features to roll-back a migration so you can basically implement both the real and the inverse migration.
@@MARKOTHEDEV I can’t send links on YouTube without being shadow-banned, but just search for "Django Migrations" and check out the "django migration files" subsection. There are also subsections for "reversing migrations" and "dependencies" that are relevant for the context of this video
We generate migrations only after merging branches. It can slow things down when you have many devs, but it's a small team. It ensures that I'm testing with other people's code and schema changes before generating any SQL. We also run a staging deployment "backup -> migrate -> test -> restore and fix (if I broke things)" cycle to be certain before accidentally nuking production. This way if the migration does break anything, the staging DB can just be quickly restored and we can give it another go. We still have downtime, but that's just part of the project's expectations.
@@iaminterestedineverything Either mock data or obfuscated production data, depending on what we have available. Newer features usually won’t have the best data. It’s definitely not a mirror of prod data. That would be closer to a blue/green deployment. We connect to the staging DB directly over a VPN which means it can’t be true prod data for security reasons.
@@matthewevans3884 yeah I have that same issue, meaning mock or non-client data are used in staging. Makes life very difficult though, especially with constraints to dictionary tables etc. that are growing separately in prod
All those problems have solutions or mitigations. Running migrations in filename order (i.e. timestamp order) is a terrible design. It's better to have a master file that references all the migration files; that way the migrations will be run in the same order as commits. Liquibase works this way, as do some others. I never had this kind of issue with Liquibase.

Your CI/CD should be running migrations on a test database. Also, individual developers should each have a copy of the database locally, with migrations running after each git pull. Any migration should have been run many times before it ever gets to production. This will catch many types of migration issues.

There's a GitHub project that will create a set of views and triggers to make a new schema that behaves like the prior version of your schema. I don't know if I'd trust it at scale, however.

In a past job, we would effectively fork our database during migrations. The system ran 100K+ db transactions/hour during peak (11:30am). We had a master + read replicas. We would do a deployment every day at lowest usage, at 4am. We warned users beforehand with a notification, and submit buttons were disabled. At the start of the migrations, we disabled replication, disabled mutations in the app, and applied migrations to master. This allowed the app to run and for reads to continue working during the migration. At the end of the migrations we'd re-enable replication, restart the app with mutations (re)enabled, and make the read replicas unavailable until they had caught up (master would be considered a read replica until then). I can't say it had zero effect on users, but it was negligible.
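To make the "master file that references all the migration files" idea concrete, here is a toy sketch of a manifest-driven runner: migrations apply in the order listed in a manifest, not in filename order. The file layout and tracking table are assumptions for illustration, not Liquibase's actual format.

```python
# Toy manifest-driven migration runner: applies migrations/manifest.txt entries
# in listed order, recording each applied file so re-runs are no-ops.
from pathlib import Path
import sqlite3

def run_migrations(db_path: str, migrations_dir: str = "migrations") -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS applied_migrations (name TEXT PRIMARY KEY)")
    conn.commit()
    applied = {row[0] for row in conn.execute("SELECT name FROM applied_migrations")}

    manifest = Path(migrations_dir) / "manifest.txt"   # one filename per line, in order
    for name in manifest.read_text().splitlines():
        name = name.strip()
        if not name or name in applied:
            continue
        conn.executescript((Path(migrations_dir) / name).read_text())
        conn.execute("INSERT INTO applied_migrations (name) VALUES (?)", (name,))
        conn.commit()
    conn.close()

run_migrations("app.db")
```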
That is why ORMs are so important. TypeORM synchronizes the state of the code with the database at build time, which solves the git sync problem, and you don't need to run DB changelogs anymore; you just run your code against a new DB instance.
This is one of the first rants from Theo I have listened to and been like now you are just bitching about your problems. I have worked with a team of twenty-three engineers before and had never had this issue.
Yeah I think the lack of mainstream tooling for changing SQL schemas has always been a primitive part of SQL... and likely was a big part of the reason behind the NoSQL fad. It also holds up progress on so many projects, because both devs + project managers etc are reluctant to make schema changes unless "we really need to". I ended up writing my own tooling for this to do things declaratively. It doesn't just make the job itself easier, but it also means I'm more likely to add to + improve the schema in the first place. It's a bit like the advantages of static typing when it comes to refactoring application code... the fact that the job is made easier, means you're more likely to do it at all.
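The declarative tooling described above boils down to diffing a desired schema against what the database actually has and emitting the ALTERs needed to close the gap. Below is a tiny sketch of that core loop, handling only added columns; the names and types are invented, and real tools cover far more (renames, drops, constraints, etc.).

```python
# Declarative schema sketch: state the columns you want, diff against
# information_schema, and produce the ALTER TABLE statements needed.
import psycopg2

DESIRED = {
    "users": {"id": "bigint", "email": "text", "full_name": "text"},
}

def plan_changes(dsn: str) -> list:
    statements = []
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for table, wanted in DESIRED.items():
            cur.execute(
                "SELECT column_name FROM information_schema.columns "
                "WHERE table_schema = 'public' AND table_name = %s",
                (table,),
            )
            existing = {row[0] for row in cur.fetchall()}
            for column, column_type in wanted.items():
                if column not in existing:
                    statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {column_type}")
    return statements  # review the plan, then execute it

print(plan_changes("dbname=app"))  # placeholder DSN
```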
I think the issues you've stated with DB migration files aren't the actual issues you run into at scale. I worked on a team with 50 engineers using Knex, and the scaling issue we ran into was failing to grab a lock on the table to run the migration. This is a bigger issue than out-of-sequence migration files. When you have tables with high write throughput, it becomes extremely difficult to grab a lock on the table before a timeout occurs. This is something that PlanetScale would solve because of their branch feature, like you mentioned.
We definitely need better tooling for doing migrations, but the overall ecosystem needs to improve. PlanetScale + Prisma db push is ideal, but that is a service and a specific npm package. It isn't inherent to the SQL ecosystem in a way that can be utilized by all SQL providers and SQL packages. I'm happy for the innovation but I really hope it spreads.
How is that ideal? Prisma doesn't even execute the migrations in a transaction.
Sometimes even the most simple table alterations take forever because you can’t acquire that lock. I remember having to choose specific times to run them and it was usually late in the evening.
Downtime works well too; however, if it's not a planned outage where you notify users, you have to gracefully kick users out without messing things up for them, rather than just brute-force terminating connections, which leads to a poor experience.
@@djdocstrange yeah, I often had to run big migrations on the weekend during "off peak" hours. The worst was when there would be a long running transaction (customers often uploaded very large files that created a lot of rows) and we'd have to wait for it to finish.
Sometimes it would finish and you could run the migration, sometimes you'd have to wait until the next weekend (which also meant the code that accompanies the change went stale). Of course we also had monthly planned downtime, so that was when most big migrations were run.
@@imichael. I was meaning that it's an ideal developer experience. Does Prisma handle migrations perfectly? Nope. But being able to modify the schema and have the SQL needed to make the changes auto-generated is nice. I used to have to write migrations by hand and you always forget something. But yeah, running the migration in a transaction is a nice option; I just don't see it as a blocker if you're utilizing a service like PlanetScale with their DB branches. Transactions aren't as critical in that environment.
@@TranquilMarmot Yup, the number of times we had to revert code changes for migrations because we couldn't acquire a lock to run the migration was getting out of hand. We started to look at low activity periods to run migrations. Which is fine when your service only exists in a few timezones, but you quickly realize that low activity periods are rare when you go international.
Every migration tool I've used trivially prevents migrations from being applied in an unexpected order by e.g. requiring each migration file to specify the hash of the previous migration file. That still obviously requires you to manually resolve conflicting migration files (e.g. from two parallel feature branches), but it does trivially prevent your new migration from applying onto a db schema that's different from the one it's expecting (assuming of course that your migration tool is the only thing ever changing the schema).
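One way such a chain can be enforced is sketched below: each migration file records a hash of its predecessor, and the checker refuses to apply anything whose recorded parent doesn't match what it just computed. The header convention and file layout are invented for illustration, not any particular tool's format.

```python
# Sketch of hash-chained migrations. Each .sql file starts with a header line
# like "-- parent: <sha256 of the previous file>"; a mismatch means someone
# else's migration landed in between and the chain must be reconciled.
import hashlib
from pathlib import Path

def verify_chain(paths):
    previous_hash = "root"
    for path in paths:  # in intended apply order
        text = path.read_text()
        header = text.splitlines()[0]
        recorded_parent = header.removeprefix("-- parent: ").strip()
        if recorded_parent != previous_hash:
            raise SystemExit(
                f"{path.name} expects parent {recorded_parent}, "
                f"but the previous migration hashes to {previous_hash}"
            )
        previous_hash = hashlib.sha256(text.encode()).hexdigest()

verify_chain(sorted(Path("migrations").glob("*.sql")))
```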
Sounds like a good safeguard. Afaik Prisma doesn't have it. Do you have a term for it?
Prisma Advisement....
Hmm... The problem looks a bit forced.
1. There is nothing wrong with running many migrations at once, for example when you're a new developer and you have to create the DB schema from scratch. It will take seconds anyway.
2. You can batch your changes and put them into one migration, wrapped in a transaction.
3. If you have many teams working on the same database, that's an architecture smell. Why are they sharing so much data through the database? Probably your service is doing too much and moving outside its bounded context? Syncing migration changes within the scope of one team shouldn't be so hard :)
Also, it's worth mentioning that migrations are only for SCHEMA changes, NOT DATA.
I've been working with migrations for a couple of years and the biggest problem was always developers who didn't see the impact of their changes.
I agree with the architectural smell, I reckon a lot of the issues mentioned here are solved by solid review, communication and a QA environment.
I think the "architectural smell" is highly conditioned on a belief that things should be broken down in small services. I think *that's* a smell, too much trivial stuff gets taken out into small services, which might make your DB a bit easier to use but at the cost of massive organizational overhead, degraded performance on average, and particularly loss of correctness as maintaining transactional semantics becomes untractable.
So overall, a larger service with multiple dev teams using the same DB is a much better tradeoff. Dealing with DB schema is an easier problem than distributed transactions. It just happens that DB schema problems are hard to ignore, and distributed transaction problems are hard to detect. A large amount of people don't even think about them or realize the nature of the beast.
@@ghujdvbts I believe his point is that your schema migration tooling should not be used for data seeding. Data migrations are a must and occur via ETL, something entirely different that is its own bag of worms altogether.
@@AntoineGrondin there are many solutions to the issues you describe, the best microservices write to a ledger rather than the database for example, and a consumer handles the rest.
I mean, the biggest issue with the model itself is the disconnect between SQL conflicts and merge conflicts. I've run into that on teams a couple of times, but I just wrote a CI check to enforce migrations that don't conflict.
Rails' active record has a schema file that represents the current state of the entire DB as well, so conflicts will show up in that file. Also, you can load the schema directly from the schema file, so you don't have to run all migrations to get the database into the correct state.
I wonder if Laravel has some package to do this. What Theo describes is often an issue with big teams working with Laravel :(
@@Souljacker7 check schema:dump, it generates a schema file. It works in a somewhat similar fashion; you can give it a try if that fits your need
@@blank001 Ohhhh, that's right, there's a migration squashing option. That's interesting!
Apparently Theo is misusing migrations in Ruby
I hadn't even considered the first point about different git branches creating conflicting database migrations
This has bitten me more than a few times 😅 Like he said in the video, it's not really a problem unless you have a database where the schema is actively being changed by multiple developers at a time.
After a few mistakes, I learned to check all open PRs for schema changes before merging any myself. Not a fun waste of time haha.
Of course, the error was almost always caught in staging/QA environments so it was "easy" to fix before it hit prod
@@TranquilMarmot what about rebasing?
This shouldn't be possible with a migration tool that, for instance, requires each migration file to specify the hash of the previous migration file or specify all the migration files that it depends on. You still need to manually resolve conflicts, but you shouldn't ever run into a migration accidentally running on top of a different db schema than the one it expects.
It really isn't complicated to work around this.
We had a dedicated guy that did only the model. Then he merged it all himself and left the rest of the work for others.
Ah yes, the Senior SQL Model Manager
If you use Diesel (Rust), then migrations result in schema changes that are compiled against queries and conflicts will be observed during merge/rebase
Django (maybe other framework have this) has the best migration pattern I’ve seen. You make changes to your model and code is generated to reflect the migration. Additionally, each migration has a dependency. So if you do merge 2 pull requests at the same time the migrations will not run because the dependency tree is not leafed correctly.
I like this system so much that I mix the django orm when making golang projects.
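For readers who haven't seen it, this is roughly what a generated Django migration looks like (simplified; the app and field names are examples, not from the video). The `dependencies` list is what makes two branches that both build on the same parent collide, forcing a merge migration before anything runs.

```python
# accounts/migrations/0008_add_full_name.py - simplified example
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [
        ("accounts", "0007_previous_change"),  # this migration's parent (example name)
    ]

    operations = [
        migrations.AddField(
            model_name="user",
            name="full_name",
            field=models.CharField(max_length=255, default=""),
        ),
    ]
```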
TypeORM and MikroORM do the same: developers write code and the ORM takes care of the DB.
- Schema update programmatically (useful during testing) or through the command line
- Generate migration once developers are sure they designed the right data model
Both support decorator and decorator(less) data model.
This is something inspired by more mature ORMs like Hibernate for JVM or Doctrine for PHP.
My understanding is that Django works backwards through the tree. It has one file that points to the newest migration, and it starts there and works until it finds a migration that requires the last applied one.
So doing two merges would produce conflicts in the files, and the migration itself would not run since it would never match (unless the last merge had not actually been applied yet).
Hey, is there an equivalent of django migration for Go?
I think EF Core also works like this, you change your models then generate and run migrations, i think it also saves some kind of snapshot of the db to be able to say "the database is not in the state i was expecting" or something like this. I'm not an EF Core user, just had to use it a couple of times
Yeah, django migrations work fine, but you can still run into git conflicts (doesn't break the db but you have to address them), and they still have the downtime / backwards problem.
Not familiar w/ active record, but alembic for instance avoids the problem of someone changing the state of the DB from under you by having each migration file reference its 'parent' migration, with the db keeping track of the current revision. So if I make migration A on top of B (the current parent) and someone merges in migration C on top of B before me, alembic will yell and not allow A to be run w/o a manual change to A.
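Concretely, an Alembic revision file looks roughly like this (simplified; the revision ids, table and column are examples). `down_revision` is the 'parent' pointer described above: two children of the same parent produce multiple heads, which Alembic refuses to upgrade past until they're reconciled.

```python
"""add full_name to users"""
from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4e5f6"        # this migration's id (example)
down_revision = "9f8e7d6c5b4a"   # the parent it expects to apply on top of (example)

def upgrade():
    op.add_column("users", sa.Column("full_name", sa.String(255)))

def downgrade():
    op.drop_column("users", "full_name")
```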
Most db migration tool would have something similar - basically checks whether the db is in a valid state or not
If you run your migration scripts in your CI/CD (which you would need to if you’re doing any black box testing), then any conflicts would show up as errors (either on the migration script or on the tests)
But since Theo doesn't write tests, I guess they only find out in production if there are any issues 😅
Alembic doesn't solve the issue of code not necessarily knowing about the DB changes. So his dilemma of 100% backwards compatible vs downtime is still present.
Django won't start if you have out-of-sync migrations, and it also provides automatic merging of migrations.
Also, in migration files you can specify what script/SQL you need to run for a forward migration or a backward migration, hence I believe all the problems specified in the video are fixed.
One problem is acquiring a lock on a table: if the table is updated frequently and, let's say, you want to roll back to a previous version, you might have a problem acquiring the lock.
I believe this is what we do to avoid problems: have separate releases, one that updates the database, a second that populates data, and a third that deletes the old field and the code associated with it.
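As a sketch, here is what those three separate releases could contain for the video's rename example (old column `name` to new column `full_name`). The SQL is illustrative Postgres; the point is the release boundaries, which keep code and schema compatible at every step.

```python
# Staged, backwards-compatible rename split across three releases.
RELEASE_1 = [  # expand: add the new column; app code starts writing both
    "ALTER TABLE users ADD COLUMN full_name text",
]
RELEASE_2 = [  # backfill existing rows; app code switches reads to full_name
    "UPDATE users SET full_name = name WHERE full_name IS NULL",
]
RELEASE_3 = [  # contract: once nothing reads `name` anymore, drop it
    "ALTER TABLE users DROP COLUMN name",
]
```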
How is this tool different from what I can do with migration files? I can avoid the sequential problem by using a high-resolution prefix, down to maybe the second or millisecond; this gives a file-system sequential order, and the chance that my changes and yours are made at the same time is incredibly small. Also, code reviews before merging migrations can avoid the problems of accidentally causing conflicting changes. Additionally, I can implement this "just add or remove fields, nothing else" logic in migration files without requiring the use of a tool that I need to pay for.
Don't get me wrong, I'm all for cool tools that make our lives easier. But this video seems a bit too much like hawking a product instead of actually helping people with a solution. So as a developer, I kinda feel a bit icky about these types of videos where you're not solving problems, but just trying to get revenue from pitching products.
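The high-resolution prefix idea is simple enough to sketch: a millisecond timestamp in the filename so plain filename sort equals creation order. The exact format below is just an assumed convention.

```python
# Generate migration filenames whose lexicographic order matches creation order.
from datetime import datetime, timezone

def migration_filename(description: str) -> str:
    now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}"
    return f"{stamp}_{description}.sql"

print(migration_filename("add_full_name"))  # e.g. 20240315142233412_add_full_name.sql
```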
This is only the case when you don't have a process in place. What are PR reviews? What are release plans? What is due diligence? In 5 years of using migrations, we haven't once had downtime due to the use of migrations. It was always a poorly thought-out change, which is equally likely in the branching model. Speaking of which, imagine designing an essential part of change management around proprietary technology.
One way to think about this is that it's a dependency issue. Fundamentally, you always take two actions to update a dependency in a non-backwards-compatible way: roll out the new stuff, then clean up the old stuff; or alternatively, break the old stuff, then roll out the new stuff. Either way it's exactly two actions taken on the interface of your dependency. This abstraction applies to more than the db, for example dependent microservices.
Yep and this is just like PostgreSQL MVCC in a nutshell, except you are applying it in a wider context across separate consuming applications connecting to the db as opposed to just separate transactions managed by the db itself.
Do you honestly believe that everyone using Django, Rails, Alembic, FlyWay and other file-based migration tools hasn't encountered and worked around every single "problem" you describe in this video?
Consider the possibility that there are massive engineering teams who are building important software that use these migration tools very successfully. Do migration conflicts exist? Yes, but just because conflicts exist in git doesn't mean people should drop git or feature branches.
This is also a workflow problem. Making sure your branch is up to date before merging so you have all the latest migrations.
If you just push directly to main or don’t enforce branch protection then yes you’ll run into these problems.
You also just learn to migrate things safely with multiple steps and zero downtime.
It also helps to not use a single database for multiple services. That is just asking for problems. That is a reason why you’d not want to use migration files, because you have multiple people in multiple repos making database changes. (Never do this, please)
This is where he started growing the moustache, guys.
In the Rust world with sqlx, the queries are checked at compile time. So incorrect migrations will not let the Rust code compile, and the CI will prevent those changes in the code which can cause SQL conflicts.
To be honest I haven't found the issues mentioned here problems with the backend systems I've worked on.
1) Most migration systems track the "parent" migration in some way which ensures that migrations are always sequential
2) Code reviews help shake-out any potential conflicts
3) In smaller teams, it's unlikely that there will frequently be multiple changes to the database anyway
4) Backwards-incompatible changes to the database are, in general, uncommon - the migration steps mentioned by Theo (add new col, fill new data, backpopulate old data) are all pretty straightforward in the migration systems I've used.
5) You add migration checks to your CI - so if there is a conflict, you know about it.
6) Migration systems usually do in fact have a concept of backwards/forwards - so even if stuff goes wrong, it's easy to revert a migration and re-deploy an older version of your software
Speaking here having worked on small-medium sized teams with Django, Alembic, Phoenix migration systems - no doubt there are more acute issues at larger scales.
Django's migration system is so good I saw people using it for migrations in node.js projects.
Actually, since I develop solo, this was one of my biggest questions: how do you update a schema in a "professional manner"? I always did the "downtime" approach, but now the second option seems a lot friendlier for development in a rapidly changing environment where downtime can cost a lot.
You can reduce a significant portion of migration issues if you use db views for reads. Granted, there are still problems with inserts/updates/etc, but most web-based interactions will likely be reading way more than writing, so the overall issue is mitigated pretty considerably.
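Here is a small sketch of how views can absorb a rename for readers: the table changes underneath while the view keeps exposing the old column name. The names and the psycopg2 plumbing are illustrative assumptions.

```python
# Reads go through a view, so renaming the underlying column doesn't break them.
import psycopg2

STATEMENTS = [
    "ALTER TABLE users RENAME COLUMN name TO full_name",
    # The view still presents the old column name to readers:
    "CREATE OR REPLACE VIEW users_read AS SELECT id, full_name AS name FROM users",
]

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:  # placeholder DSN
    for statement in STATEMENTS:
        cur.execute(statement)
```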
I don't get one thing. How is PlanetScale's approach different with "synchronization"? One still needs to go to their dashboard and manually deploy changes from the dev branch to the production branch. So a dev merges and deploys frontend code to production, they lose connection and cannot approve deployment of PlanetScale branch - production is out of sync. Or it will be anyways for a little period of time difference between frontend deployment and database deployment. Where am I wrong? There's still the downtime/backwards compatibility issue.
Just advertising prisma
This content is awesome, I love hearing these concepts without specific tech tied to them!
good vid. really preaches the basics of good project management and record keeping of change controls.
We still use migration files, works fine.
We actually have a CI check that validates the schema expected by the codebase against the migrations, so if there is a problem trying to merge migrations which are out of order, we know about it before the code hits master branch.
We also have 2-phase migrations. Each migration file has a "before" phase and an "after" phase. The before phase runs SQL before new containers (expecting the new schema) are created, and the after phase runs after the last of the old containers is destroyed. It enabled us to achieve forwards and backwards compatibility with automatic execution.
It's not perfect, and writing migrations is a hell of a lot more complicated this way. But yeah, databases are hard.
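A rough sketch of what such a two-phase migration file could look like (an assumed shape, not the commenter's actual tooling): the deploy pipeline would call before() prior to rolling out new containers and after() once the last old container is gone.

```python
# Hypothetical two-phase migration: additive work up front, destructive cleanup
# only after every container expecting the old schema is gone.
def before(cursor):
    # Old and new application code can both run against this schema.
    cursor.execute("ALTER TABLE users ADD COLUMN full_name text")
    cursor.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

def after(cursor):
    # Safe now that nothing reads the old column anymore.
    cursor.execute("ALTER TABLE users DROP COLUMN name")
```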
It sounds like you're just using the migration feature of whatever framework to update the schema in a predictable way. Basically automation of the backwards-compatibility model described in the video. I wonder at what point we stop calling them "migrations".
I wonder if you have courses. Like at Frontend Masters.
I want to learn more about this way of managing changes in Planetscale (which I might consider using since I'm going to be starting a new project)
he doesn't like making coding tutorials (from what I've seen), if that's what you're looking for. he's also really focused on his company (Ping) right now. if he were to make courses, they would be free since he believes in free education
edit: to be clear, I don't want to put words in his mouth (except the free education thing, he's said that explicitly), but this is the general vibe I get from theo after watching and talking in the discord for several months
I'm a big fan of the Django framework and how it deals with migrations. I just think it's outstanding.
I worked for a large corporation that had a major project with migration files named with prefixed sequential numbers, and it would fail because the script sorted the numbers in alphabetical order. What people actually did was manually load a db image recent enough to avoid the issue before running migrations. I wish I could say that was the stupidest thing I saw, but it didn't even make the top ten list.
This is sadly common, especially when people don’t bother to set up db seeds for development environments.
I'm confused, are you advocating against manually written migration files, or against migration files altogether? In the workflow you describe at 4:52, the system creates a migration file for you, right? (At least I hope that's what happens, so that you have a way to inspect what changes to the database are about to be made.) If you consider this approach to be sound, then can you explain once more what you mean by "we don't need migrations anymore"?
Based on reading the comments, it sounds like there is much more to discuss on the topic. Maybe you could make a follow-up video.
Yeah, as other comments have mentioned, migrations work really well for my team. Even at scale they usually work fine.
The issues you raise regarding conflicting changes in different MRs can be resolved with good CI/CD and automated testing. DB migrations have been around for 30 years because they work - providing you know the dangers and how to mitigate them.
I think people can use stored procedures and functions to interact with the database instead of sending plain SQL queries. Abstracting the use of the database can help simplify things and allows the DBA to do their job without interrupting the developers' work.
What is the benefit of db push over migrations, in Prisma's case? The Prisma schema doesn't allow everything you'd want on the db, like non-negative checks, so you'd need to write that in SQL anyway...
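For reference, the kind of thing I mean - a constraint the schema language can't express, so it ends up as hand-written SQL either way (the "Product" table and constraint name here are just an example):

```ts
import { Client } from "pg";

// A check constraint applied with raw SQL, since it can't be expressed
// in the schema file itself.
const db = new Client({ connectionString: process.env.DATABASE_URL });
await db.connect();
await db.query(
  `ALTER TABLE "Product" ADD CONSTRAINT price_nonnegative CHECK (price >= 0)`
);
await db.end();
```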
Don't let multiple people work on the same table or feature at the same time. Stand up meetings are for that.
You don't ship your code directly to production; it goes through staging first. Then, when it's stable, you deliver the new code that's expecting "fullName", and right after that you run your new migrations.
Downtimes are a must for maintenance
That seems like the harder thing to do.
Don't leave to humans what you can automate.
You can easily add systems to prevent the kinds of migration conflicts mentioned.
Well, for 1:22, Django's migration system actually handles that pretty well - it will force you to redo migration files if you merge two parallel branches that affect the same tables. It also stores all the history and state.
You can also write custom (Python) migration code to handle dramatic changes without downtime.
This is exactly why my team and I just give the f up and go for a NoSQL database on new projects. We've had enough of syncing structures and random constraint errors just because you missed a single step.
Seems like an ad
The out of order thing can be worked around with hashed migrations.
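Roughly the idea (a toy sketch, not any particular tool's format): each migration records the hash of its predecessor, and the runner refuses to apply anything whose chain doesn't line up:

```ts
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

type Migration = { file: string; previousHash: string | null };

const sha256 = (path: string) =>
  createHash("sha256").update(readFileSync(path)).digest("hex");

// Walks the list in order and throws if any file expects a different
// predecessor than the one actually before it (e.g. after a bad merge).
export function verifyChain(migrations: Migration[]) {
  let prev: string | null = null;
  for (const m of migrations) {
    if (m.previousHash !== prev) {
      throw new Error(`${m.file} expects a different predecessor - resolve the merge first`);
    }
    prev = sha256(m.file);
  }
}
```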
The migration application and the whole "downtime" thing is definitely one of the trickier scenarios. If I have to include data mutations that will cause breaking changes, I apply them in stages so nothing breaks. Having migration files does, however, provide good documentation of changes over time, which is great.
We work at a reasonable scale with multiple team members. Our migration files never get out of sync. We have short sprints with code reviews. Developer branches have to be up to date with the latest merges on production. That is more about using git well and managing sprints well.
Also, while it is true that your local would run through all the migrations, that is not what happens in production. I'm sure Theo understands this, but it may be easy for newcomers to get confused by this point. Production is only going to run new migrations. So it's not really that hard to perform code review and make sure you don't have issues with the migrations up front. A lot of the time a migration file will be a one-line change adding or removing a column. It's not hard to look at that line and figure out whether or not it will conflict with some other migration in the release.
Furthermore, now that I think of it, I think it would be extremely rare to have any conflict between these files anyway. Why? Imagine it. What needs to happen for a conflict to occur? Basically, two different developers would need to be making changes to the same table. Then we could imagine some situation where one developer removes a column, presumably for clean-up purposes, while another developer tries to manipulate data in that column. Or maybe one developer wants to add a column to a table that another developer is going to drop entirely. Or maybe two developers want to change the value of the data in one field to different values because their features require different values in that field.
Every single one of those scenarios is due to bad planning and has nothing to do with migrations. If you are running into any of those situations, you need to seriously rethink how you are going about planning features and sprints. If two different features require different values in the same field, that's a problem well before you get to the stage of writing a migration. You need to figure it out before you assign tickets to anyone.
The video is confusing because, the way we were taught in uni and the way I've seen people do databases, they plan everything out so carefully you'd think they were building a nuclear reactor before writing any actual code that uses the db. Why is he talking about something that's more like a NoSQL problem than a SQL problem?
db push with Prisma seems risky. Would you automate it? Or should this be a manual thing?
And what's the alternative to migrations that doesn't involve a SaaS product?
I haven't run into this. When a dev changes a column name, that dev also changes the code to ensure columnName is updated in the code. Also, our pipeline will fail if your migration cannot go down.
This requires downtime, as there's no guarantee that the running production code and the DB schema are in sync.
@@askingalexandriaaa yes, a migration does require downtime. That’s why you inform your users and do it in the middle of the night.
Maybe I'm naive, but why isn't there a 'Rename Symbol' (VS Code)-like feature between code and DB? I know, we'd need a graph/tree as an index and some underlying 'hard rules' that allow a structure to be refactored. If we can do it for files, why not for databases?
One major point missing from the vid: does PlanetScale solve the need for 3 queries (in the column rename example) to maintain compatibility?
no
Yeah, it requires deprecation of fields. Utilize the `@ignore` directive in Prisma.
Isn't it easy enough for the migration thing to have a file that points to the latest migration? So that would conflict when two PRs are being merged. That's something I've seen. I think Django does that automatically in how their migrations work.
Theo my man, this is the type of content I subscribed for 💪
What if the migration was a single file that the editor could interpret as multiple files? Then it would actually solve the git problem.
Yes, and VIEWS can help a lot: they keep the original table untouched and can perform really well with materialization enabled (a caching system).
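Something like this, if you're on Postgres (the view name, columns, and refresh cadence are all just illustrative):

```ts
import { Client } from "pg";

const db = new Client({ connectionString: process.env.DATABASE_URL });
await db.connect();

// Reads go through the (materialized) view, so the underlying table can be
// reshaped without touching read paths; refresh as often as staleness allows.
await db.query(`
  CREATE MATERIALIZED VIEW IF NOT EXISTS user_profiles AS
  SELECT id, first_name || ' ' || last_name AS full_name FROM users
`);
await db.query(`REFRESH MATERIALIZED VIEW user_profiles`);

await db.end();
```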
One thing I don't like about PlanetScale is that WITH queries are not supported yet…
Love your videos, as always. Just wondering, as a data scientist, we deal with data versioning on our MLOps stack. I was wondering what your thoughts are on technologies like DVC, Dolt, or lakeFS; the PlanetScale feature looks like it's doing this under the hood.
But how would you do this for software that is self-hosted by lots of people? How can you provide updates for them without migrations? For example, open source forum software that is self-hosted.
This guy looks like he knows what he's talking about
Would db push fix the problem of devs having two different Prisma models on different git branches? Wouldn't this cause a similar conflict, where one db push would override another?
Need more content on data modeling and deployment pls
This is why Laravel keeps a table of which migrations have been run. Never had a problem with git and migrations.
Indeed, Laravel makes this safe and easy. But in general, someone on the team has to be dedicated to making migrations. Not everyone should do it, and some communication is needed around migration changes to avoid conflicts.
@@hsider Well yes, as always. Even Theo's proposed fix requires communication because the feature in your branch could be relying on the old schema
Ooof this is a spicy one. I don't know how I feel yet.
I thought I knew what a schema was but I guess I do not... I change my schema using ALTER TABLE. But the existence of a schema definition doesn't explain how to transform an old schema into the new. That's what migrations are... I guess I'm missing the point.
Creating a new field with duplicate data (and removing the original later) is a good solution for renaming. Is there anything better than creating a new field with a new name & type if you just want to change the type but still want the original name? I can't think of an exact example of when you would want to do this off the top of my head, but for example if you wanted to change the number of characters in a varchar, the size of an integer field, integer -> float, etc.
The examples you gave are backwards compatible, at least in one direction (from less to more).
A less backwards compatible conversion would be, e.g., the opposite - from float to int, fewer chars in a varchar... In that case, you would need to ensure your code doesn't produce data that needs more space, and also somehow update the existing data to fit the new constraints before you change the type.
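In practice that usually ends up as an expand/contract sequence spread across releases; a rough sketch (float -> integer on a made-up "products.price" column, pg client assumed):

```ts
import type { Client } from "pg";

// Step 1 (expand): add the new column and backfill; old code keeps working.
export async function expand(db: Client) {
  await db.query(`ALTER TABLE products ADD COLUMN price_int integer`);
  await db.query(`UPDATE products SET price_int = round(price)`);
  // Next deploy: code writes both columns but still reads the old one.
}

// Step 2 (contract): only after all readers/writers have switched over.
export async function contract(db: Client) {
  await db.query(`ALTER TABLE products DROP COLUMN price`);
  await db.query(`ALTER TABLE products RENAME COLUMN price_int TO price`);
}
```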
Migrations need to have a central lookup with a directed acyclic graph to know which migration depends on what. We ran into other issues such as purging caches, leader/follower replication restrictions and locks on huge tables.
We use a separate db 'lock' file which updates when every migration is run. If a developer tries to merge and another migration has happened there will be a conflict they need to resolve. Doesn't solve the downtime/backwards compatibility issues though.
Thank you for this. I have been doing db push for a while now and was thinking about using migration files. I think I'll stick with db push for now 😸
the noise at 2:46 scared the shit out of me it was so loud
Hey Theo! Have you used the new AuthJS successfully with SvelteKit 1.0? I'm hitting some errors... wondering if it's bugs or ... TS 🤦🏾♂
Can you please make videos about the schema approach for better understanding 🙏🙏
The first problem just doesn't occur in some systems; Python's SQLAlchemy migrations, for instance, require sequential numbering with no repetitions. Other systems have a notion of branching and merging built into the migration structure. It's a bad implementation, not an inherent problem of the system.
In the case of renaming fields, I would argue for just not making that change. Like, how much better is FullName than Name anyway? Is it really worth all the risk and hassle? A name would need to be pretty misleading and bad before I'd want to take that on.
I'm a big fan of monolithic idempotent DDL scripts. It conditionally makes changes to the database if they haven't already been made. Most of the time, it will check the schema, and take no action. Git history makes it easy to see what was changed when. And it's written in the native language of the RDBMS so you have access to the full feature set without some leaky abstraction getting in your way
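For anyone who hasn't seen the pattern, it's basically this (Postgres syntax, wrapped in a small Node runner here only for illustration - in practice it would be a plain SQL file):

```ts
import type { Client } from "pg";

// Idempotent DDL: safe to run on every deploy, no-ops when the schema
// already matches (IF NOT EXISTS does the conditional check for us).
export async function applySchema(db: Client) {
  await db.query(`CREATE TABLE IF NOT EXISTS orders (id bigint PRIMARY KEY)`);
  await db.query(`ALTER TABLE orders ADD COLUMN IF NOT EXISTS status text DEFAULT 'new'`);
  await db.query(`CREATE INDEX IF NOT EXISTS orders_status_idx ON orders (status)`);
}
```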
I'm really not convinced with your counter arguments. db push still poses the same risks for example.
MikroORM and TypeORM already solved this problem, and the solution has existed for decades in other ecosystems. I think it's good to learn how other communities solved the same problems that Node developers are facing nowadays.
We really, really need a video about uploading files - what to use for streaming uploads versus single-request uploads - because the situation is pretty bad right now. We'd also appreciate seeing it in tRPC, plain REST, and GraphQL.
Agree with the sentiment, but migrations are the less evil part of the more evil SQL, which IMHO, after a few decades of dealing with it, needs to be deprecated. Replaced by what, I don't really know, but edges are a clue, though maybe not in the form that we know them. So many challenges, so little time.
Can you expand on that a little bit more?
@@Dude29 Hi Dude29, thanks for the comment.
The argument is that there is too much management and overhead involved with:
Joins, Referential integrity, Indices and access
leading to issues with:
Migrations, Access tuning and Resilience, Computation power, and Sharding
Sure, there has been a lot of work put into it lately, but from a cursory perspective, does it really change the fundamentals, or is it just a workaround?
Compare Cypher to SQL queries with respect to clarity and response, and I think findings lead to interesting questions.
SQL is 50 years old. I like to think that data wrangling has come a long way, and maybe we need to periodically review assumptions and directions adopted.
Wouldn't CI/CD tests fix the first issue mentioned?
Possibly, if you add tests that compare e.g. information_schema or the underlying catalog tables so that full comparisons are made for database objects, constraints, roles etc. This is quite a mammoth task though! Not sure if you could easily detect whether an out of order change due to git commit sequence would result in the same schema but data loss in the process, which is why our migrations are vetted by DBAs. Git rebase can help avoid out of order changes and result in a clean git history graph, but tends to be misunderstood by most developers to the point where they screw up without questioning it. Rebase is also a bit more cumbersome, as it replays through history for the rewrite, and isn't suitable for shared branches. Saying that, some devs can't avoid screwing up a git merge either: "well I did `accept all my changes` last time and it worked so I'll do that again"
Migrations as a whole are what prevented me from learning anything when I attempted to learn database stuff. They literally just wouldn't make sense to me.
Atlas (atlasgo) provides a solution for the merge conflict part. It creates a file named "atlas.sum" (a Migration Directory Integrity File) which depends on all the migration files. Adding new files changes atlas.sum and will raise merge conflicts.
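The general idea is something like this (not Atlas's actual file format, just the concept of a checksum file derived from every migration, so any change rewrites it and git flags a conflict):

```ts
import { createHash } from "node:crypto";
import { readdirSync, readFileSync, writeFileSync } from "node:fs";

// Rebuild a sum file from every migration; two branches adding different
// migrations both rewrite this file, and git surfaces the conflict.
const files = readdirSync("migrations").sort();
const lines = files.map((f) => {
  const hash = createHash("sha256")
    .update(readFileSync(`migrations/${f}`))
    .digest("hex");
  return `${f} ${hash}`;
});
writeFileSync("migrations.sum", lines.join("\n") + "\n");
```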
Migrations are a must when you're deploying your server-based product to many locations that you do not manage.
Your issues with this seem to have more to do with poor communication or project management, nothing to do with migrations themselves.
Just a lil bumparooski for the algorooski
That’s not an issue with migrations. It’s a problem with the framework not knowing the database schema. Migrations are great
I would argue that what you're doing in PlanetScale is still a migration under the hood. You just have them handle the coordination against the database. A schema migration is just going from one schema state to another. Adding more steps doesn't really change that concept.
I also struggle to see how this would work with decentralized database setups, such as products deployed on-prem, which is still a very real scenario in many instances. Especially if your customers are government institutions. Then you don't have a "production database", then you might have 20 different versions running all over the world and on different hardware. Our company needs to support customer hosted on-prem deployments on customer hardware and we support 4 database dialects. If we just stopped doing migrations, we wouldn't have a product.
Sure, it's great to assess your needs and pick the most intelligent solution for your situation but proclaiming that we don't need migrations anymore seems like a very narrow minded hot take.
Theo can you please make a video about the best databases to use right now? Document and SQL?
But it's difficult to go through git history
I believe you may be not using best/expected practices when using migrations, so the problems listed here seem like a big stretch.
I've seen migrations everywhere from small projects to huge monoliths with multiple teams, and these problems simply don't exist given that people know what they are doing. Seems like you have an HR problem instead.
Planetscale is doing migrations. You're just paying for them to manage it and give you some abstraction over it. There are some nuances to migrations but it's the best option and even mentioning looking back in git history to manage your database is absurd.
I prefer to have the database model in git, and having direct control over the database instead of trusting a tool to automatically do migrations is preferable for me, and in my opinion a more scalable solution
How large is your database out of interest? In my case it has some 20 schemas, 500+ tables and about 30000 columns. Still have a mixture of some automated migrations and some very manual ones depending on the complexity and risk. Little and often is best I find.
@@iaminterestedineverything 630 tables, 8000 columns in the main schema. Some secondary schemas that are much smaller column-wise.
@@iaminterestedineverything I've been fiddling with prisma and the tooling is pretty fkin nice...
Migration files are not an option with our SaaS as we have nearly 300 tables that constantly evolve. We are lucky that our ORM does not use migration files - we would never be able to easily consolidate all those files to see what the table structure should be now. With our ORM, we just change the properties in the original model file, and it runs the migration seamlessly in the background - oh, and it doesn't ever remove columns - that has to be manually done by us later if needed.
@@1879heikkisorsa No actually, we use a Ruby framework (not Rails) and the DataMapper ORM which is a really old ORM that is no longer maintained (well, we maintain it ourselves now), but I have yet to find another Ruby ORM that does the job better...
@@1879heikkisorsa Actually, looking at the GORM features - it sounds remarkably similar to the DataMapper ORM that we use - auto migrations, hooks, foreign key support, eager loading - all except transactions.
@@1879heikkisorsa Yes, sorry, I meant that I saw GORM does transactions, but DataMapper, alas, doesn't. But we can still do them by sending raw SQL statements in DataMapper.
Great video. You sound 2x smarter at 2x speed.
Or use Django?
This is not an issue with migration files themselves but with how Prisma handles them. I don't like Django, but what they did well is how migrations work. They all know which parent they're supposed to belong to, and Django will detect when you've added one in between. It will instruct you to merge them so they make sense again. Django also has features to roll back a migration, so you can basically implement both the real and the inverse migration.
Can you send a reference to this? I'd like to read more.
@@MARKOTHEDEV I can't send links on YouTube without being shadow-banned, but just search for "Django Migrations" and check out the "django migration files" subsection.
There are also subsections for "reversing migrations" and "dependencies" that are relevant for the context of this video
@@FlorianWendelborn alright man thanks
My database has it coming.
What's an Aunty Pattern?
Use sequential migration names like 0001.sql, 0002.sql, 0003.sql
Git problem solved
You are welcome
Sounds like solutions looking for problems. Good luck to snake oil corp.
We generate migrations only after merging branches. It can slow things down when you have many devs, but it's a small team. It ensures that I'm testing with other people's code and schema changes before generating any SQL.
We also run a staging deployment "backup -> migrate -> test -> restore and fix (if I broke things)" cycle to be certain before accidentally nuking production. This way if the migration does break anything, the staging DB can just be quickly restored and we can give it another go.
We still have downtime, but that's just part of the project's expectations.
I disagree
@@Rust_Rust_Rust Very insightful
Do you populate your staging with production sourced data before the migration, or do you fill it with mock data?
@@iaminterestedineverything Either mock data or obfuscated production data, depending on what we have available. Newer features usually won’t have the best data. It’s definitely not a mirror of prod data. That would be closer to a blue/green deployment.
We connect to the staging DB directly over a VPN which means it can’t be true prod data for security reasons.
@@matthewevans3884 yeah I have that same issue, meaning mock or non-client data are used in staging. Makes life very difficult though, especially with constraints to dictionary tables etc. that are growing separately in prod
All those problems have solutions or mitigations.
Running migrations in filename order (i.e. timestamp order) is a terrible design. It's better to have a master file that references all the migration files. That way the migration files will be run in the same order as the commits. Liquibase works this way, as do some others. I never had this kind of issue with Liquibase.
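Conceptually something like this (a made-up manifest and runner, not Liquibase's actual changelog syntax): the order comes from one checked-in list, so two branches appending to the same spot produce an honest git conflict instead of a silent reorder:

```ts
import { readFileSync } from "node:fs";
import type { Client } from "pg";

// Migrations run in the order listed here, not in filename order.
const changelog = [
  "2023-01-10_create_users.sql",
  "2023-02-02_add_orders.sql",
  "2023-02-15_add_full_name_to_users.sql",
];

// Tracking of already-applied files is omitted for brevity.
export async function migrate(db: Client) {
  for (const file of changelog) {
    await db.query(readFileSync(`migrations/${file}`, "utf8"));
  }
}
```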
Your CI/CD should be running migrations on a test database. Also, individual developers should each have a copy of the database locally with migrations running after each git pull. Any migration should have been run many times before it ever gets to production. This should find many types of migration issues.
There's a GitHub project that will create a set of views and triggers to make a new schema that behaves like the prior version of your schema. I don't know if I'd trust it at scale, however.
In a past job, we would effectively fork our database during migrations. The system ran 100K+ db transactions/hour during peak (11:30am). We had a master + read replicas. We would do a deployment every day at lowest usage, at 4am. We warned users beforehand with a notification, and submit buttons were disabled. At the start of the migrations, we disabled replication, disabled mutations in the app, and applied migrations to master. This allowed the app to keep running and reads to continue working during the migration. At the end of the migrations we'd re-enable replication, restart the app with mutations (re)enabled, and make the read replicas unavailable until they had caught up (master would be considered a read replica until then). I can't say it had zero effect on users, but it was negligible.
SSDT BOIS!
That is why ORMs are so important. TypeORM can synchronize the state of the code with the database at startup, which solves the git sync problem, and you don't need to run DB changelogs anymore. You just run your code against a new DB instance.
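For context, this is the TypeORM setting being described: entity classes are diffed against the live schema and the changes are applied automatically when the data source initializes (the entity import is hypothetical; synchronize is generally considered fine for dev and risky for prod):

```ts
import { DataSource } from "typeorm";
import { User } from "./entities/User"; // hypothetical entity class

export const AppDataSource = new DataSource({
  type: "postgres",
  url: process.env.DATABASE_URL,
  entities: [User],
  synchronize: true, // diff entities vs DB and apply changes on initialize()
});

await AppDataSource.initialize();
```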
and it gets worse the more environments you have
What does he mean when he says "don't have a good story". What kind of story?
I think he meant development experience. The steps you usually do with a certain tool to achieve a certain goal. I think he calls that a story
This is one of the first rants from Theo I've listened to where I thought, now you're just bitching about your own problems. I've worked on a team of twenty-three engineers before and never had this issue.
Yeah I think the lack of mainstream tooling for changing SQL schemas has always been a primitive part of SQL... and likely was a big part of the reason behind the NoSQL fad. It also holds up progress on so many projects, because both devs + project managers etc are reluctant to make schema changes unless "we really need to". I ended up writing my own tooling for this to do things declaratively. It doesn't just make the job itself easier, but it also means I'm more likely to add to + improve the schema in the first place. It's a bit like the advantages of static typing when it comes to refactoring application code... the fact that the job is made easier, means you're more likely to do it at all.
Not convinced, sorry. Migrations are hard, but your alternatives don't seem much better