Do you want to simplify your development process? Grab my free Clean Architecture template here: bit.ly/3Andaly
Want to master Clean Architecture? Go here: bit.ly/3PupkOJ
Want to unlock Modular Monoliths? Go here: bit.ly/3SXlzSt
Hi Milan! When EF Core produced the Cartesian product without a split query, did that minimal API endpoint also return 68k rows for that employee ID, similar to the SQL results?
100K YouTuber, congrats brother!!!
Thank you 🙌
Thank you very much for the video! Very interesting. Congratulations on 100,000!
Thank you! :)
Ah, the cartesian explosion.
SQL is one of the most underrated skills in my opinion. Great SQL skills (and DB skills in general, not just SQL) usually make your software 20x better, whether you use an ORM or not
Hey Dule :) This is an interesting one, since you'd get a similar problem even with SQL. I did exaggerate the issue by using a very large dataset, but it's still interesting.
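To make the row math concrete, here is a minimal sketch in plain LINQ to Objects, with no database involved. The data, the names `tasks` and `payments`, and the counts are all made up for illustration. The idea is that joining two sibling collections to the same parent gives you tasks × payments rows for that parent, while fetching them separately gives you tasks + payments.

```csharp
using System;
using System.Linq;

// Hypothetical data: one employee with 3 tasks and 4 salary payments.
var employee = new { Id = 1, Name = "Employee 1" };
var tasks = Enumerable.Range(1, 3)
    .Select(i => new { EmployeeId = 1, Title = $"Task {i}" })
    .ToList();
var payments = Enumerable.Range(1, 4)
    .Select(i => new { EmployeeId = 1, Amount = 1000m * i })
    .ToList();

// Single result set: both collections joined to the same parent,
// like two sibling JOINs in one SQL query.
var joinedRows =
    (from t in tasks
     where t.EmployeeId == employee.Id
     from p in payments
     where p.EmployeeId == employee.Id
     select new { employee.Name, t.Title, p.Amount }).ToList();

// Split approach: the parent row plus each collection fetched on its own.
var splitRows = 1 + tasks.Count + payments.Count;

Console.WriteLine($"Single joined result: {joinedRows.Count} rows"); // 3 * 4 = 12
Console.WriteLine($"Split results:        {splitRows} rows");        // 1 + 3 + 4 = 8
```

With larger collections per parent the product grows much faster than the sum, which is the effect the video exaggerates with a big dataset.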
Congratulations on 100K subscribers
Thank you so much 😀
This is exactly why I always have the SQL profiler open
My problem with the new generation of developers is that they work with EF without any SQL knowledge or any idea of how EF translates queries
I saw over 20 includes in one EF Core call
This is why I always say to learn SQL first 😁
You would also have people who don't realize that the SQL query EF writes is bad. Cartesian explosion was an issue well before people started using ORMs. :)
Another way to get around this issue without having to use split queries if you're using Postgres is by using the built-in json aggregate functions. Instead of the database returning tons of duplicated rows due to the cartesian explosion problem you explained, it returns the exact JSON that you need. Not sure how this feature works with EF Core, but it is indeed really useful and it would be cool if you could compare both approaches!
Returning json from sql is also possible in SQL server since the 2016 version. You have to store the result in a varchar(max) variable and return it.
@christian.mar.garcia Nice, didn't know. But how do you create the structure of the JSON? Seems like you would be returning a stringified version of the result set?
@gigantedocil There are some JSON functions in T-SQL for getting the query results in JSON format. You have to decide whether you want an empty array or null when some related data is not present.
Hopefully we get a batching API in EF Core one day
Don't depend on EF or any ORM for complex, performance-critical join queries; you don't know what happens at runtime. In this case, call EF with your own raw SQL so that you control it.
The cross-product in 2:54 is wrong, isn't it? It's not combining the rows correctly right? Correct me if I'm wrong
Yep, didn't update the yellow table. But seems I fixed it in the 3x3 example
I was going to leave a similar comment 😅 thanks @MilanJovanovicTech keep the good work
Congratulations on 100K.
Thanks!
Would AsNoTrackingWithIdentityResolution help here?
Not really, no. Different type of issue.
Is there any reason to have the Where method after Include? I instinctively write the filter first, as in any other LINQ query. I guess since it is translated into SQL anyway, it doesn't make much difference, but if these were operations on IEnumerable, you would actually do the Include on everything before discarding the majority of it.
I prefer having it after Include
Won't putting includes before .Where() make the query slower?
It won't matter, since it's all translated into one SQL query
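If you want to check this yourself, here is a rough sketch that prints the SQL for both orderings with `ToQueryString()` and compares them. The model (`Employee`, `EmployeeTask`, the `Tasks` navigation) and the SQLite provider are assumptions for illustration, not the code from the video; it assumes the Microsoft.EntityFrameworkCore.Sqlite package is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

using var context = new AppDbContext();
var id = 1;

// Filter written after Include.
var includeFirst = context.Employees
    .Include(e => e.Tasks)
    .Where(e => e.Id == id)
    .ToQueryString();

// Filter written before Include.
var whereFirst = context.Employees
    .Where(e => e.Id == id)
    .Include(e => e.Tasks)
    .ToQueryString();

Console.WriteLine(includeFirst);
Console.WriteLine(whereFirst);
Console.WriteLine($"Same SQL: {includeFirst == whereFirst}");

// Hypothetical model, only here so the sketch compiles on its own.
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<EmployeeTask> Tasks { get; } = new();
}

public class EmployeeTask
{
    public int Id { get; set; }
    public int EmployeeId { get; set; }
    public string Title { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<Employee> Employees => Set<Employee>();

    // Assumed provider and connection string; swap in your own.
    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlite("Data Source=demo.db");
}
```

Nothing is executed against the database here; the LINQ expression is only translated, which is why the position of Where relative to Include should not change what gets sent.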
How did you setup your benchmark project to use the AppDbContext?
Just hardcoded the connection string
@MilanJovanovicTech Any chance of the source code? Would like to try this benchmarking out for myself.
Could you please explain projection? 11:00
Select(x => { take only the columns you need } )
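Spelled out a bit more, a projection could look like the sketch below. `EmployeeDto`, `TaskTitles`, and the rest of the model are hypothetical names, and the SQLite provider (Microsoft.EntityFrameworkCore.Sqlite package) is an assumption so the example can run against an empty local database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

await using var context = new AppDbContext();
await context.Database.EnsureCreatedAsync(); // creates the demo schema if it is missing

var id = 1;

// Only Id, Name and the task titles are selected; other columns are never read.
var employees = await context.Employees
    .Where(e => e.Id == id)
    .Select(e => new EmployeeDto
    {
        Id = e.Id,
        Name = e.Name,
        TaskTitles = e.Tasks.Select(t => t.Title).ToList()
    })
    .ToListAsync();

Console.WriteLine($"Loaded {employees.Count} employee(s)");

// Hypothetical DTO and model, only here so the sketch compiles on its own.
public class EmployeeDto
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<string> TaskTitles { get; set; } = new();
}

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<EmployeeTask> Tasks { get; } = new();
}

public class EmployeeTask
{
    public int Id { get; set; }
    public int EmployeeId { get; set; }
    public string Title { get; set; } = "";
    public string Notes { get; set; } = ""; // not projected, so not fetched
}

public class AppDbContext : DbContext
{
    public DbSet<Employee> Employees => Set<Employee>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlite("Data Source=demo.db");
}
```

Projection narrows the columns per row; it does not by itself remove the row multiplication when you project two sibling collections.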
amazing content man , thanks
My pleasure!
how did you load the related data?, did you install a package and configure something in a file?
No, just EF includes
@MilanJovanovicTech Interesting. When we load related data, it produces an error until we configure it in Program.cs
Congratulations for 100k.
Thanks!
The thing is, sometimes, for more complex use cases, you need joins and left joins and many queries working together
We're not arguing against it 😁
How can I appear database icon next to entity class? 😅😅
ReSharper
ORMs make this such an easy thing to get wrong by mistake, though even in the raw sql it's not necessarily obvious at a glance. It's a shame that returning complex objects from relational SQL databases never really became a 1st class feature, and we still have to resort to hacks like embedding into json or (ugh) XML, or otherwise fiddle with multi-result sets.
I don't think the ORM is to blame here
Problem is objects, not SQL. Data is first class.
@FrancoGasperino I mean support for sum types as well as product types
It's a little bit off topic. Which IDE should I use for .NET on macOS? Visual Studio has been deprecated there for a few months.
Thanks for all your videos !
Rider
VS Code?
So if I am correct, you are basically saying that using a lot of includes is bad. Then you suggest query splitting, and then, based on the benchmarks, query splitting makes it slower. Is this correct? If so, what is the punchline of the video? Because it is confusing.
I don't think you quite got this right.
- The problem isn't _just_ the includes. The problem is JOINs on the same level, causing a Cartesian explosion.
- Based on the benchmarks, query splitting is 9x faster in the case of a Cartesian explosion. So yes, it's needed here.
- In the other example, where we just have many JOINs but no Cartesian explosion, there isn't a benefit from split queries.
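For reference, the first two points map to something like the sketch below: two collection includes on the same level, loaded with `AsSplitQuery()`. The model, the `SalaryPayments` navigation name, and the SQLite provider (Microsoft.EntityFrameworkCore.Sqlite package) are assumptions for illustration, not the benchmark code from the video.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

await using var context = new AppDbContext();
await context.Database.EnsureCreatedAsync(); // creates the demo schema if it is missing

var id = 1;

// Tasks and SalaryPayments are siblings (same level), so a single query
// would return tasks x payments rows per employee.
// AsSplitQuery() loads each collection with its own SQL statement instead.
var employees = await context.Employees
    .Include(e => e.Tasks)
    .Include(e => e.SalaryPayments)
    .AsSplitQuery()
    .Where(e => e.Id == id)
    .ToListAsync();

Console.WriteLine($"Loaded {employees.Count} employee(s)");

// Hypothetical model, only here so the sketch compiles on its own.
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<EmployeeTask> Tasks { get; } = new();
    public List<SalaryPayment> SalaryPayments { get; } = new();
}

public class EmployeeTask
{
    public int Id { get; set; }
    public int EmployeeId { get; set; }
    public string Title { get; set; } = "";
}

public class SalaryPayment
{
    public int Id { get; set; }
    public int EmployeeId { get; set; }
    public decimal Amount { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Employee> Employees => Set<Employee>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlite("Data Source=demo.db");
}
```

If most queries in an app have this shape, the same behavior can be turned on globally through the provider options with `UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)`. Whether it is actually faster is something to benchmark per query, as the third point says.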
How can I see the query content in EF?
ToQueryString() or look at the logs
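A small sketch of both options, assuming a hypothetical `AppDbContext` on the SQLite provider (Microsoft.EntityFrameworkCore.Sqlite package): `ToQueryString()` gives you the SQL for one query without running it, and `LogTo` writes every executed command to the console.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

await using var context = new AppDbContext();
await context.Database.EnsureCreatedAsync(); // creates the demo schema if it is missing

var query = context.Employees.Where(e => e.Name.StartsWith("A"));

// Option 1: ask EF Core for the SQL of this one query (nothing is executed).
Console.WriteLine(query.ToQueryString());

// Option 2: run the query; LogTo (configured below) prints the executed command.
var employees = await query.ToListAsync();
Console.WriteLine($"Loaded {employees.Count} employee(s)");

// Hypothetical model, only here so the sketch compiles on its own.
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<Employee> Employees => Set<Employee>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options
            .UseSqlite("Data Source=demo.db")                // assumed provider
            .LogTo(Console.WriteLine, LogLevel.Information); // executed SQL is logged at this level
}
```

The logging route is the one that shows split queries as separate statements, since `ToQueryString()` only describes a single LINQ query.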
Is AsSplitQuery effective when I use the navigation properties in the Select method rather than Include?
In the Select method, I only fetch the necessary properties of the navigation property rather than fetching all of them.
Example:
var employees = await context
    .Employees
    .Select(e => new EmployeeDto() {
        Salary = e.SalaryPayment...
    }).ToListAsync();
Try LINQPad to see the SQL generated by the LINQ query. Normally Include creates joins, whereas projection creates subqueries
Well it's hard to tell from your small example, but you can easily test if split queries work better
Amazing video, thanks
Glad you liked it!
Bro, why didn't you just filter by employeeId (Where(e => e.Id == id)) before the Include?
What difference would it make? 😅 This is LINQ -> SQL
@MilanJovanovicTech So you are saying that EF Core optimizes the query and applies the Where filter before the join, or that it just doesn't matter whether we filter before the join or not?
According to EF Core documentation:
“Note that cartesian explosion does not occur when the two JOINs aren't at the same level”
Are you sure your example applies to cartesian explosion case?
I don't know, did you watch the video?
It's more like in SQL: you create temporary tables 1 and 2 and then combine them before returning to the client
Won't it still have the same problem?
Could you suggest a tool for working with SQL Server databases? I find Azure Data Studio a bit lacking.
DBeaver is a great tool to use with SQL Server!
SSMS
DBeaver is great
2:05, please add a white screen warning
I forgot 😁😁
Milan, please publish your implementation of the Result pattern on GitHub if possible. Thanks for your rich content 👍
Thanks a lot! 😁
Could've mentioned the ginormous difference in data (400KB vs 18MB) on the closing "Employees" benchmark
Query splitting can drastically improve "scalability" (as in reducing shared bandwidth usage) and reduce egress costs.
Especially for common "CRUD" endpoints where some entities literally join 8 tables, with 4 of them being one-to-many or many-to-many.
I was implying that by the # of records retrieved in either example
I know that pagination does not work with split queries.
Probably (but that won't suffer from this issue as much since we're limiting the # of rows we return)
May I get source code of it?
www.patreon.com/posts/source-code-are-111190292
There is a lot of magic (black box + documentation) to read about EF. This is why I like to use my own SQL with Dapper
You'd have the same problem in SQL
The EF Core docs are very comprehensive and detailed.
@MilanJovanovicTech however, inefficient joins are easier to observe and fix.
@krccmsitp2884 true and long
Do you write the SQL statements as part of the code in Dapper, or do you use stored procedures? In my experience I have made a lot of mistakes by typing the wrong SQL statement. That's what I don't like about Dapper. I always prefer strong typing.
As for consistency, a REPEATABLE READ transaction is to the rescue. With its own implications, of course.
Would detract from performance though
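A rough sketch of what that could look like: the split query is wrapped in an explicit transaction so all of its SQL statements run in the same transaction. The model and the SQLite provider (Microsoft.EntityFrameworkCore.Sqlite package) are assumptions for illustration, and the isolation level has to be one your database provider actually supports.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Microsoft.EntityFrameworkCore;

await using var context = new AppDbContext();
await context.Database.EnsureCreatedAsync(); // creates the demo schema if it is missing

// All statements issued by the split query below run inside this transaction.
await using var transaction =
    await context.Database.BeginTransactionAsync(IsolationLevel.RepeatableRead);

var employees = await context.Employees
    .Include(e => e.Tasks)
    .AsSplitQuery()
    .ToListAsync();

await transaction.CommitAsync();

Console.WriteLine($"Loaded {employees.Count} employee(s)");

// Hypothetical model, only here so the sketch compiles on its own.
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<EmployeeTask> Tasks { get; } = new();
}

public class EmployeeTask
{
    public int Id { get; set; }
    public int EmployeeId { get; set; }
    public string Title { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<Employee> Employees => Set<Employee>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlite("Data Source=demo.db");
}
```

The trade-off mentioned in the reply still applies: stricter isolation means more locking or versioning work on the database side, so it is worth measuring before adopting it everywhere.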
Now add a text field with Task description
What would that do?
Better to use linq for complex queries
Maybe, benchmark it
Wait, I might have done this 😂 I'll check ASAP
And...? 😁
splitting the query is not a magic feature, it is proof that ef core is slow.
Slow is relative
If you want performance, it's simple. Don't use EF.
Some people can't be helped
Am I the only one who hates EF and doesn't understand why people are still using it?
Among the few
Yes you are
Probably
Yes
So what is your alternative? I know there is another option like Dapper. But I don't like writing the SQL statement as part of the code because it's prone to errors or typos.
Great video , thank you.
Thank you too!