T-SQL Skills: Loading Millions Of Rows Of Test Data In Seconds
- Published May 27, 2024
- Most tools that generate test data do so iteratively. But SQL loves set-based operations. With a little T-SQL know-how, you can create millions of rows of test data in seconds.
You can even leverage public data sources to create more realistic data that conforms to your application's business rules.
No third-party tools, just a handful of queries. I'm keeping it simple.
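As a minimal sketch of the set-based idea (the table and column names here are illustrative, not from the video): cross-joining a ten-row digits table with itself six times yields a million-row numbers table in a single statement, no loop required.

```sql
-- Set-based generation: 10^6 rows from one CROSS JOIN chain.
-- #Numbers and the column name Number are illustrative choices.
WITH digits AS (
    SELECT v FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS t(v)
)
SELECT TOP (1000000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Number
INTO   #Numbers
FROM   digits a CROSS JOIN digits b CROSS JOIN digits c
       CROSS JOIN digits d CROSS JOIN digits e CROSS JOIN digits f;
```

A numbers table like this is the seed for everything that follows: joining against it turns "insert one fake row" into "insert a million fake rows" with the same query shape.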
#sql #csharp #testing #database
Blog:
betterwithcode.com/
LinkedIn:
/ jeff-zuerlein-2aa67b7
00:00 Introduction
00:13 Why you should generate test data.
01:11 How much data do you need?
01:48 Fake Data
02:48 SQL loves set-based operations.
04:05 Fill the table with fake data.
05:38 Realistic test data
06:23 Populating parent & child tables.
07:39 Leveraging modulus
08:38 Joining with a case statement.
10:45 Let SQL cook. - Science & Technology
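The modulus and CASE techniques named in the chapters above can be sketched like this (assuming a `#Numbers` table of sequential integers; the status values are made up for illustration): the remainder of `Number % 4` cycles through 0-3, and a CASE expression maps each remainder to a discrete value, spreading rows evenly across categories.

```sql
-- Illustrative: distribute generated rows across four fake statuses.
-- Number % 4 yields 0..3, so each status gets ~25% of the rows.
SELECT Number,
       CASE Number % 4
            WHEN 0 THEN 'Pending'
            WHEN 1 THEN 'Shipped'
            WHEN 2 THEN 'Delivered'
            ELSE 'Cancelled'
       END AS OrderStatus
FROM   #Numbers;
```

The same pattern works for parent/child rows: `Number % ParentCount` assigns each generated child to a parent key without any per-row logic.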
With privacy being a top concern, this is a great way to not fall into the trap of using real customer data simply because "it's easy or we've always done it that way before".
Well done Jeff!
Great point! The last thing you want to do is leak sensitive data, or interact with real customers when you’re testing.
Nice content Jeff! Your explanation helped me to test a bottleneck in a LINQ code I had in the ORM of my application.
That's the best feedback I could get. Glad I was able to help you out!
You’re a legend!! Your content is priceless! Thanks 🤘🏻
That warms my heart. Glad I can help!
Appreciate you sharing this. Thanks. More of this, please?
Well I appreciate the feedback, and yes I will be making more!
Thanks! I have learnt a lot
Glad to hear it!
Great video - very useful for testing databases.
As far as I can tell, you don't really utilize the Qty column in your script, and the created data just gets equally distributed. Do you have a sample of your script where you show the use of Qty?
If anyone is interested in learning more about generating test data with a distribution of values, like this comment. That way I know how much interest there is in it.
It works better with a small number of discrete values, like states. If 10% of the population lived in CA, you could create a new seed table with 1000 rows, and 100 would have the value CA. If WY had 1% of the population, it would get 10 rows. Then you can join the new seed table to the numbers table and generate values. With lots of discrete values, like people's names, the volume of data gets too big and it slows down.
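The weighted-seed idea above might look like this in T-SQL (a hedged sketch: `#Numbers` is a sequential numbers table, the table and column names are mine, and the row counts are the illustrative 10%/1% shares from the comment, not real census figures):

```sql
-- Weighted seed table: each state appears in proportion to its share.
CREATE TABLE #StateSeed (SeedId int IDENTITY(1,1), StateCode char(2));

INSERT INTO #StateSeed (StateCode)
SELECT TOP (100) 'CA' FROM #Numbers   -- ~10% of 1000 seed rows
UNION ALL
SELECT TOP (10) 'WY' FROM #Numbers;   -- ~1% of 1000 seed rows
-- ...repeat for the remaining states until the seed holds 1000 rows.

-- Join the seed to the numbers table: modulus maps every generated
-- row onto one of the 1000 seed rows, preserving the distribution.
SELECT n.Number,
       s.StateCode
FROM   #Numbers n
JOIN   #StateSeed s
  ON   s.SeedId = (n.Number % 1000) + 1;
```

Because the join key cycles through all 1000 seed rows, the generated data inherits the seed's distribution: roughly 10% CA, 1% WY, and so on.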