I loaded 100,000,000 rows into MySQL (fast)

  • Published Jan 29, 2024
  • Watch me wrangle over 100,000,000 rows of data and load it into MySQL.
    To learn more about PlanetScale, head to planetscale.com/youtube!
    💬 Follow PlanetScale on social media
    • Twitter: / planetscale
    • Discord: / discord
    • TikTok: / planetscale
    • Twitch: / planetscale
    • LinkedIn: / planetscale
  • Science & Technology

COMMENTS • 162

  • @bozgi
    @bozgi 4 months ago +89

    Didn't even realise that this is a company account. Loved that video

  • @tcarreira
    @tcarreira 4 months ago +167

    You're doing it right, PlanetScale: promoting your brand in the background while you give real value to the people watching these videos. Congrats.
    Even bringing in different presenters is well done (I got used to Aaron though :D)

    • @PlanetScale
      @PlanetScale 4 months ago +18

      We could all use some more Aaron.

    • @TrondHus
      @TrondHus 3 months ago

      Every company should have an Aaron, @@PlanetScale

  • @69k_gold
    @69k_gold 3 months ago +21

    Using the "teach a man how to fish" rule in product marketing is one hell of a genius idea. What a win-win!

  • @dudaseifert
    @dudaseifert 4 months ago +170

    PlanetScale is LITERALLY the only company I'm subbed to on YouTube. Your content is THAT good

    • @darshanrander1950
      @darshanrander1950 4 months ago

      So true!!

    • @electrolyteorb
      @electrolyteorb 3 months ago +1

      But not the platform itself

    • @dudaseifert
      @dudaseifert 3 months ago

      @@electrolyteorb what do you mean?

    • @8starsAND
      @8starsAND 3 months ago

      Yeah, especially loading ready-to-go Python packages… Yeah, rocket science…

    • @Frank2212
      @Frank2212 2 months ago

      Time to unsubscribe. They laid him off. He's the only reason the videos are good

  • @ahmettek315
    @ahmettek315 4 months ago +4

    Update: I also tried this on a Mac Mini M1. This time, instead of using all 17 columns, I did as you did and only imported 6 of them. For PostgreSQL, it took 4 m 39 s 810 ms this time :) Almost half of your MySQL timing (and again, this is in a Docker container).
    I knew that PostgreSQL should be better. I fell in love with PostgreSQL once more. Thanks.
    Ah, BTW, maybe this should be considered cheating: I use the open source PostgreSQL extension Citus (but single node in a Docker container - it would be insane with multiple nodes).
    (I guess PlanetScale is also using the Go project Vitess and should be fast on multiple nodes)

    • @PlanetScale
      @PlanetScale 4 months ago +3

      Good experimentation! This was loaded into vanilla MySQL, and you're right that the performance is very dependent on hardware, which DBMS you're using, etc.

  • @robbyface3707
    @robbyface3707 4 months ago +21

    Ben! My former lecturer at the UofA. He is an amazing teacher, so I'm glad I can still learn from him 🎉

  • @HoverCatzMeow
    @HoverCatzMeow 4 months ago +4

    Love that you kept errors/issues in the video! Shows that even professionals make simple mistakes :)

  • @miguel-espinoza
    @miguel-espinoza 4 months ago +6

    As a frequent user of that same lichess database, I would really appreciate that C program you wrote :)

  • @codewithantonio
    @codewithantonio 4 months ago +27

    Aaron, you look different!
    jk, always love to see new PlanetScale content! ❤

  • @regmiller536
    @regmiller536 3 months ago

    I love this kind of video; they're so educational.

  • @igboanugwocollins4452
    @igboanugwocollins4452 4 months ago

    This is phenomenal, thank you very much for this masterpiece

  • @DoubleM55
    @DoubleM55 4 months ago +15

    09:23 - Am I the only one who noticed that you got an error at row one, without even importing any data? :)
    Anyway, awesome video :) I always get excited when I see a new one uploaded :)

    • @PlanetScale
      @PlanetScale 4 months ago +1

      Good catch! @dloura07 pointed that out also. Using -f to continue after errors causes it to take even longer! Definitely need a better solution.

  • @Arkounay
    @Arkounay 4 months ago +3

    Great video, I didn't know about the "load data local infile" SQL command 👍
    I would have loved a comparison with an INSERT that inserts multiple rows at once instead of one insert per game, to see whether it's faster or slower than LOAD DATA LOCAL INFILE
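
    For reference, a minimal sketch of the two approaches being compared here; the table and column names are hypothetical, not the exact schema from the video:

        -- Bulk load straight from the CSV (requires local_infile to be enabled)
        LOAD DATA LOCAL INFILE 'games.csv'
        INTO TABLE games
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
        (event, white, black, result, white_elo, black_elo);

        -- Multi-row INSERT alternative: one statement carrying many rows
        INSERT INTO games (event, white, black, result, white_elo, black_elo)
        VALUES
          ('Rated Blitz game', 'player1', 'player2', '1-0', 1843, 1790),
          ('Rated Bullet game', 'player3', 'player4', '0-1', 2011, 1998);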

  • @tjnangosha
    @tjnangosha 3 months ago

    This video is so satisfying to watch.

  • @Pici3k
    @Pici3k 3 months ago

    Learned a ton from this. Thanks!

  • @mirzashakirbaig1950
    @mirzashakirbaig1950 4 months ago

    It's really awesome.
    Recently I tried importing a CSV with only a few million records into a MySQL database (using Laravel and Excel libraries). It was taking 5-7 minutes even after a lot of optimizations.
    Using multiple tools and technologies to get the data processed is the main ingredient here. Will give your ideas a try with my import.

  • @0xtz_
    @0xtz_ 4 months ago

    Man, the best channel ever 😎, cool videos as always and a ton of information

  • @gemmaatroxxibox5322
    @gemmaatroxxibox5322 4 months ago

    I was thinking it must be LOAD DATA INFILE, because that has saved me many a time. But the preprocessing in C has made me think I really need to learn that, or Rust or Zig. Very impressive.

  • @Richard_GIS
    @Richard_GIS 3 months ago +1

    8:42 it's totally understandable why it takes forever: each ';' is a commit, and that takes processing power. Best would be a bulk load of the data, but the direct loading is great too
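
    A minimal sketch of the point above about per-statement commits, assuming autocommit is on and using a hypothetical games table: wrapping many single-row inserts in one explicit transaction pays the commit cost once per batch instead of once per row.

        SET autocommit = 0;
        START TRANSACTION;
        INSERT INTO games (white, black, result) VALUES ('player1', 'player2', '1-0');
        INSERT INTO games (white, black, result) VALUES ('player3', 'player4', '0-1');
        -- ...many more single-row inserts...
        COMMIT;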

  • @avramcs
    @avramcs 4 months ago +1

    What a surprise lol, great vid

  • @G5STU
    @G5STU 4 months ago +1

    Great vid, loving the C, always use the right tool for the job (not what's cool) 👍

  • @thejonte
    @thejonte 4 months ago +10

    Please use the torrent next time. That'll save them bandwidth and probably also be faster.

    • @I_SEE_RED
      @I_SEE_RED 4 months ago +1

      🤓

    • @thejonte
      @thejonte 3 months ago

      Thanks for calling me smart. Y'know the nerds are the ones successful in this field @@I_SEE_RED

  • @atharqadri3738
    @atharqadri3738 3 months ago +1

    I used SQL Loader in one of the pipelines I developed for the company I work for. The process time was reduced from 30 minutes to 2 minutes.

  • @sounakmondal3342
    @sounakmondal3342 3 months ago +1

    Would really love a playlist dedicated to performance tips

  • @IvanRandomDude
    @IvanRandomDude 3 months ago +18

    100k inserts in 1.7s. Yep, your app with 5 active users and 1000 inserts per day definitely needs Cassandra.

    • @llothar68
      @llothar68 3 months ago +7

      But what happens when we go viral and become as rich and huge as Google? We have to be prepared for this. More microservices, more distributed systems please, and we need a registration server for the 3 users that logged in yesterday.

    • @IvanRandomDude
      @IvanRandomDude 3 months ago +6

      @@llothar68 We need a separate instance for each user, to make sure it scales.

    • @indramal
      @indramal 3 months ago

      What are you trying to tell us? "Your app with 5 active users and 1000 inserts per day definitely needs Cassandra."
      Are concurrent connections slow in this MySQL database?
      5 x 1000 = 5,000 rows per day, so why do we need Cassandra for such small inserts?

    • @llothar68
      @llothar68 3 months ago +5

      @@indramal You did not get the joke. Or the terrible situation the web development world is in at the moment.

    • @AlexeyKlimko
      @AlexeyKlimko 3 months ago

      The thing is that it failed after 21000 rows…

  • @0xmassive526
    @0xmassive526 4 months ago

    Video so good I didn't even need to watch this but still did. 😅

  • @alexrusin
    @alexrusin 4 months ago +2

    Great video, as always. Thank you. I've got a question: why did you use an EC2 instance to load data into PlanetScale? Couldn't you do it directly from your local Mac?

    • @PlanetScale
      @PlanetScale 4 months ago +4

      Good question. Mostly for the sake of improved network performance (which can be a bottleneck in instances like this). Home internet often has very slow upload speeds :)

  • @scottstroz
    @scottstroz 4 months ago +1

    I am curious to know if there would be any performance improvements in the dump and load process if you used multi-threaded dump and load methods of MySQL Shell instead of mysqldump.

  • @reed6514
    @reed6514 4 months ago

    I like that you do everything in the terminal

  • @MrPudgyChicken
    @MrPudgyChicken 4 months ago +15

    I would be curious to see the performance difference between multiple independent INSERT statements and one INSERT statement with many values. In my experience, the latter is much, much faster, though I have never tried importing a dataset quite this large.

    • @reed6514
      @reed6514 4 months ago

      Same experience here.
      On another note, server-side, I've had to chunk my inserts into a thousand or so rows at a time because my production DB is on a different server than my web server, and it only lets me send so much data at once.
      I also wonder how a transaction would affect the speed of many separate inserts.

    • @PlanetScale
      @PlanetScale 4 months ago +2

      Great idea! That's exactly what mysqldump does when exporting a db. I didn't show a performance comparison, but let me know what you find :)

    • @ToX82
      @ToX82 4 months ago +1

      I did that for an import tool that inserts ~50,000 rows each time. I wrote that tool a few years ago so the numbers could be slightly different, but according to my benchmarks (and my memory), using one insert per 800-1000 records is the fastest option, and it takes about 5-6 seconds. Using one insert per line takes ~30 seconds. Going beyond 800-1000 records per insert turned out to be slower again.
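
      A rough sketch of the batching pattern discussed in this thread; the table and batch size are illustrative, and the practical ceiling on the size of one multi-row INSERT is the server's max_allowed_packet:

          -- Check the packet limit that caps how large one multi-row INSERT can be
          SHOW VARIABLES LIKE 'max_allowed_packet';

          -- One statement per batch of ~1000 rows instead of 1000 separate statements
          INSERT INTO games (white, black, result) VALUES
            ('player1', 'player2', '1-0'),
            ('player3', 'player4', '1/2-1/2');
          -- ...repeat with the next batch of ~1000 value tuples...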

  • @UTUBMRBUZZ
    @UTUBMRBUZZ 4 months ago +1

    I had the same problem but with 300 million rows. The real pain was hidden characters in the data and commas in the wrong places, so importing that data into the database took over 6 hours and I had to run the script overnight. To optimize it, we compared the changes against the first import, so the second and further imports took 1-5 minutes.

  • @Mcs1v
    @Mcs1v 3 months ago +1

    "Load data" isn't faster if you write your loader properly, so you just won a running competition against a disabled opponent

  • @NuncNuncNuncNunc
    @NuncNuncNuncNunc 4 months ago

    In general, should you run multiple tests with different numbers of rows in order to measure any startup time? And should the first test use MySQL's built-in LOAD DATA INFILE, i.e. should the baseline be the zero-effort case?

  • @Tjkrusinski
    @Tjkrusinski 4 months ago +5

    Very good. My brutal feedback is that Aaron sometimes has too many opinions sprinkled into his videos, whereas this fella has almost none. A middle ground would be, like, wow, chef's kiss.

  • @Klinoklaz
    @Klinoklaz 4 months ago +1

    Thought you were going to use awk to process that huge file. I sometimes use it to parse gigantic XML files; it's much easier to write than C. Feels like it shouldn't be too slow in your use case either

  • @ignauy
    @ignauy 3 months ago

    Loved the video. I just have a question: couldn't you have established a connection to PlanetScale even before the dump?
    When you run `mysql -u root chess < inserts.sql`, couldn't you have imported that directly into PlanetScale, like you imported the dump on the EC2 instance?

    • @PlanetScale
      @PlanetScale 3 months ago

      Yes! A connection to the PlanetScale DB could have been established directly from my computer instead of from the EC2 instance. The advantage of the EC2 instance is that, if it is in the same region as the database, it can have better network bandwidth to the server.

  • @adityanandula
    @adityanandula 3 months ago

    This is interesting! When I tried moving the same number of records from one SQL instance to another in AWS over the network, it took me 6 hours. One doubt: what about the binlog data? Can we apply the same method with the backup SQL data chunks?

  • @AndrzejPauli
    @AndrzejPauli 4 months ago +1

    Where's the "Seee yyaa"??? 😀
    Naa, just kidding. Great content!

  • @matrizaev
    @matrizaev 4 months ago +1

    I would rather make that first C converter load the data into MySQL. This way you could use your PGN file as a SoT and load it into both your local MySQL and PlanetScale.

  • @user-ql1rg9mj9d
    @user-ql1rg9mj9d 3 months ago

    good topic

  • @danielgangl6560
    @danielgangl6560 4 months ago +2

    How long should it take to load data into PlanetScale? My application currently takes ~1 second for 1,000 rows in a single Prisma createMany statement.
    (1,000 entries into 1 table)
    Adding more or fewer rows seems to scale linearly.
    That seems really slow to me. Wondering if there are any benchmarks to compare against, to see whether I'm doing something wrong or if that's normal.

    • @PlanetScale
      @PlanetScale 4 months ago +1

      Depends on a bunch of factors including the PlanetScale plan you are on, network connectivity / proximity to your DB, how Prisma's createMany function is implemented, etc. What's your setup?

    • @danielgangl6560
      @danielgangl6560 3 months ago

      @@PlanetScale I'm only interested in the query latency, so connectivity and all that should be outside of it. My createMany simply gets an array with 1,000 objects, each with 10-15 column values. The plan is Scaler. (But does it make a difference in this case vs. e.g. the free plan? From the pricing page it only looks like the upper quantity would eventually become an issue, but otherwise I don't see what would affect it.)

  • @gefyaqiilah6678
    @gefyaqiilah6678 3 months ago

    wow, thank you for sharing

  • @wes337
    @wes337 4 months ago +1

    Great video! Would actually love to see the code for that C program you wrote to convert to CSV

    • @PlanetScale
      @PlanetScale 4 months ago

      The pause button is your friend here ⏸

    • @phpnotasp
      @phpnotasp 3 months ago

      @@PlanetScale Sarcastic comments aside, even going frame-by-frame in your video, you cannot see all the code.

  • @permad5189
    @permad5189 3 months ago

    Coolio neato! The only problem is the MySQL pronunciation; even the official MySQL docs say the correct way is "My S-Q-L"!

  • @thegrumpydeveloper
    @thegrumpydeveloper 4 months ago +1

    Curious why the upload went to EC2 rather than direct from local. Is that just so it can run while you do other things, and to take advantage of the network? Otherwise it seems similar to uploading directly rather than uploading, then uploading again.

    • @musashigarami
      @musashigarami 3 months ago +1

      Came to the comments for this, and was also wondering: surely you'd spend time uploading first to EC2 and then uploading again from there, so doesn't the first upload to EC2 negate the benefits? Would really like to know the reasoning behind it

  • @ropoxdev
    @ropoxdev 4 months ago +4

    Aaron where!?

  • @abimbolaobadare6691
    @abimbolaobadare6691 3 months ago

    Bro this is wild, the C script got me questioning my life choices. Can we train an AI model with this chess data?

  • @felixisaac
    @felixisaac 3 months ago

    Just curious, how did you send the dump file to your AWS instance? Is it through FTP?

    • @PlanetScale
      @PlanetScale 3 months ago

      Transferred via scp!

    • @felixisaac
      @felixisaac 3 months ago

      Thank you! @@PlanetScale

  • @evko9264
    @evko9264 3 months ago

    What should we do if we have to use Python, S3 buckets and AWS Glue just to do pretty much the same thing: convert some txt file to CSV? The problem is that we need it FAST :)

  • @jeetchheda8916
    @jeetchheda8916 3 months ago

    please tell me this load script can be developed in PostgreSQL

  • @vishwanathbondugula4593
    @vishwanathbondugula4593 3 months ago

    Can you share the C code you wrote to parse PGN to CSV?

  • @gfeher123
    @gfeher123 3 months ago

    Finished watching when you inserted records one by one

  • @bowenwang
    @bowenwang 3 months ago

    Can you provide a link to your C code that converts PGN to CSV?

  • @DavisTibbz
    @DavisTibbz 4 months ago

    Wow impressive. Came here from Google News, this video was featured 😅

  • @equus6460
    @equus6460 4 months ago

    I was able to successfully load 100 million records into an SQLite3 database; it's about 20 GB. Would be interested to know how big the 100-million-row MySQL database was.

    • @PlanetScale
      @PlanetScale 4 months ago +1

      The chess DB in MySQL was around 10.7 gigs after running the tests and importing the big data set. The dump was ~13.

    • @equus6460
      @equus6460 4 months ago

      @@PlanetScale I was planning to scale to 1 billion rows but had issues with my SSD space. I guess MySQL would scale up to 1 billion rows.
      I was impressed with SQLite3's ability to scale with ease to 100 million records.
      The administration overhead for SQLite3 is much lower compared to other databases such as Postgres, MySQL and Oracle, in my opinion.

  • @abdelkadermh
    @abdelkadermh 3 months ago

    Can you please make a video using awk or sed to transform the data and show how fast it can be?

  • @bilalthepunjabi
    @bilalthepunjabi 3 months ago

    I automated this whole process of downloading, extracting, schema building and migrating into the database in a Laravel project.
    I initially chose Python for this task, but it was slower, which amazed me too, so PHP with short-lived processes and chunking with recursive calls makes all this happen with real-time updates over a socket...!

    • @PlanetScale
      @PlanetScale 3 months ago

      Cool!

    • @datamini01
      @datamini01 3 months ago

      Hello bilalthepunjabi,
      Could you please share the source code you implemented with PHP (Laravel)? I would greatly appreciate it. Currently, I am also exploring some solutions to accomplish this task with PHP, but I think it may not be as fast as Python.

  • @OscarFromIndia
    @OscarFromIndia 3 months ago

    How can someone learn all the stuff that you're doing here? Can you share some concepts to get started?

  • @Mesinko6
    @Mesinko6 4 months ago +1

    Maybe using batch inserts would have taken less time

  • @samislam2746
    @samislam2746 3 months ago

    Next video: Do this in mongoDB

  • @tigranrostomyan9231
    @tigranrostomyan9231 4 months ago +2

    Did you just load a 200+ GB file in 1 second with vim..? (2:16)

    • @PlanetScale
      @PlanetScale 4 months ago +1

      In vim, you can control-c to cut loading short. The more you know!

  • @Wald246
    @Wald246 3 months ago

    15:59 I'm always baffled that MySQL does not come with an option to import and export data using a binary file format and instead forces the use of a plaintext SQL file. Seems so inefficient

  • @CaboLabsHealthInformatics
    @CaboLabsHealthInformatics 3 months ago

    Did you really decompress and stdout the whole 200GB to the terminal?

    • @sudeshryan8707
      @sudeshryan8707 3 months ago

      I was more impressed that vim was able to open a 200 GB file in no time.. while I'm struggling with 300 MB text files in Notepad++ 🤣🤣

    • @CaboLabsHealthInformatics
      @CaboLabsHealthInformatics 3 months ago

      @@sudeshryan8707 please avoid sending the whole thing to stdout :)

  • @NaourassDerouichi
    @NaourassDerouichi 4 months ago +1

    Wait, have you just loaded a 200 gig text file into the editor without it catching fire?!

    • @Shuroii
      @Shuroii 4 months ago +2

      The art of vim chunked loading

  • @sudeshryan8707
    @sudeshryan8707 3 months ago

    I was more impressed that vim was able to open a 200 GB file in no time 😅😅

  • @RandomGeometryDashStuff
    @RandomGeometryDashStuff 4 months ago

    07:38 where are the mystery numbers (64, 256, 32) coming from?

    • @lian1238
      @lian1238 4 months ago

      I think the numbers could be higher in a real application

    • @tcarreira
      @tcarreira 4 months ago

      When creating a MySQL table and using VARCHAR data types with specific sizes like 32, 64, or 256, you are essentially defining the maximum length of the character data that can be stored in that column.
      The specific numbers like 32, 64, and 256 are often used in VARCHAR column definitions for several reasons, but they are not rigidly fixed; you can choose other sizes based on your specific data requirements.
      Here are some reasons behind the common use of these numbers:
      - Powers of 2: 32, 64, and 256 are powers of 2 (2^5, 2^6, and 2^8), which aligns well with the binary storage and memory management systems used in computer hardware. This alignment can make data manipulation and storage more efficient in some cases.
      - Common Data Sizes: These sizes often align with common data sizes in various contexts. For example, 32 characters might be chosen for things like usernames, which are often limited in length. 64 and 256 characters cover a broad range of data sizes, accommodating longer strings or descriptions while still keeping data within reasonable limits.
      - Historical Precedence: These sizes have been used in database design for a long time, and they have become somewhat standard. Database administrators and developers are accustomed to seeing these sizes in schemas.
      - Practicality: They are practical sizes that work well for many types of data. Using numbers like 32, 64, and 256 simplifies decisions about column sizes because they are easy to work with and understand.
      That said, there's no requirement to use these specific sizes. Your choice should always be based on your specific data needs. If your data requirements don't align with these sizes, it's perfectly fine to choose different sizes that better suit your use case.

    • @PlanetScale
      @PlanetScale 4 months ago +1

      Just "reasonable" values chosen for this example. For a production application, we'd want to validate those values more thoroughly.

  • @rombael8152
    @rombael8152 3 months ago

    Hardware? An Intel Pentium III 500 MHz?

  • @amranimohamad4741
    @amranimohamad4741 3 months ago

    What about Mojo? Isn't it supposed to make Python more efficient?

  • @vardmans
    @vardmans 4 months ago

    Would you mind sharing that C script?

  • @AlexeyKlimko
    @AlexeyKlimko 3 months ago

    9:23 the query failed; 100000 rows were not inserted. Read all of the output.

  • @dimitar.bogdanov
    @dimitar.bogdanov 4 months ago +2

    Question - why not torrent when downloading the set?

  • @januzi2
    @januzi2 3 months ago +1

    You don't have to use select count(*). You could check the schema for the next auto-increment value or just ask for the id of the last record: select id from chessgame order by id desc limit 1; It will use the index and will be a lot faster than a whole table scan.
    It will return the correct number under these conditions: 1. the table has an auto-increment column, 2. nothing was deleted, 3. the query doesn't contain any "where".
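
    A sketch of the two alternatives mentioned above, keeping the commenter's chessgame table name and the chess database name from the video; the same caveats about deletes and WHERE clauses apply:

        -- Last id via the primary key index instead of a full table scan
        SELECT id FROM chessgame ORDER BY id DESC LIMIT 1;

        -- Or read the next auto-increment value from the schema (may be cached/approximate)
        SELECT AUTO_INCREMENT
        FROM information_schema.TABLES
        WHERE TABLE_SCHEMA = 'chess' AND TABLE_NAME = 'chessgame';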

  • @towhidulislam6192
    @towhidulislam6192 4 months ago +1

    where's AARON?

  • @TheMainManNorthOfEquator
    @TheMainManNorthOfEquator 3 months ago

    Will try with Postgres 16

  • @bitwise2832
    @bitwise2832 4 months ago +1

    It's FIELDS, not FILEDS.
    To run meaningful queries you need some more indexes;
    id alone is limited.

  • @syntaxerorr
    @syntaxerorr 3 months ago

    Did vim just open a 214 gigabyte file in seconds? I think I've opened a 12 gig file before, not 214!

    • @PlanetScale
      @PlanetScale 3 months ago +1

      Tip: You can CTRL-C as vim is in the process of loading a file to cut it short, allowing you to view part of the file!

  • @ahmettek315
    @ahmettek315 4 months ago +2

    Well, that was fast for 100 million rows, but you didn't tell us about the hardware you are using. From the video I can only tell that you are using a Mac; I guess it has an M-series CPU and a decent drive. Likely faster than my AMD Ryzen 9 3900X, 32 GB RAM, Corsair MP600 1 TB NVMe.
    I did what you did, converted that PGN to CSV (using Go, all 17 columns), and imported all 17 columns into PostgreSQL. 100,023,791 rows affected in 8 m 27 s 347 ms. This was done in a Docker container, which I think made it a little bit slower.
    Interestingly, you waited over 23 seconds just for a count(*)? That was really slow. It took 3.7 seconds for me on PostgreSQL. I wonder what would happen if I did this on my Mac mini M1. I named my table "speedo", and this query:
    select count(*) from Speedo where Termination = 'Time forfeit';
    returns 31,471,426 in 4.7 seconds. Can you also time this on that MySQL? Just curious.

  • @spacetabs
    @spacetabs 4 months ago +3

    True definition of BigData

    • @DoubleM55
      @DoubleM55 4 months ago +8

      Not even close :)

    • @IvanRandomDude
      @IvanRandomDude 3 months ago

      30 gigs is nothing in big data terms.

  • @KiamatChange-ke8nd
    @KiamatChange-ke8nd 3 months ago

    200 GB of text data? WTH is that? 😳

  • @teenigma_
    @teenigma_ 4 months ago +4

    I see that while_elo 😉

  • @chiebidoluchinaemerem5860
    @chiebidoluchinaemerem5860 3 months ago

    Man, Python is really slo............w

  • @benjaminshtark5977
    @benjaminshtark5977 4 months ago

    It's all good; the only part not usable by most people is writing that C code to convert to CSV :)
    Most people are not that good at classical C programming...

  • @vahanmkrtchyan2504
    @vahanmkrtchyan2504 3 months ago

    Bruh, we have terabytes in MySQL, and all is good. Just don't do aggregations; use a warehouse for that.

  • @LewisCowles
    @LewisCowles 4 months ago +1

    Where is Aaron?

  • @dloura07
    @dloura07 4 months ago +2

    Did the multiple insert statements actually work at 9:22, or did it take 1.7s in total because it failed, so it would actually take longer if it worked?

    • @lian1238
      @lian1238 4 months ago

      Yes. Seems like the error cut the operation short

    • @PlanetScale
      @PlanetScale 4 months ago +1

      Nice catch. You're right that the error cut it short. Using -f to continue after errors causes it to take even longer! All the more reason to use LOAD DATA LOCAL INFILE 😄

  • @mbe2916
    @mbe2916 3 months ago

    Python can be fast if you stay in the C world.

  • @MrGilbert1995
    @MrGilbert1995 3 months ago

    How to make the query run faster? 22 sec for a count is long :P

  • @NedumEze
    @NedumEze 1 month ago

    Surely, Sir, you're not running the same MySQL version as the rest of us who have phpMyAdmin's integrated version? Or are you?
    If you are, how is it that you can have 100M rows in one table and I can't have even half a million in my version in phpMyAdmin?

  • @hqcart1
    @hqcart1 3 months ago

    Dude, you could have asked ChatGPT and it's done

  • @kodosensei
    @kodosensei 3 months ago

    👀

  • @patlecat
    @patlecat 3 months ago

    Dude, you had all the data loaded into that C program. It would have been super easy and fast to then write it into MySQL directly from there! Why go through all those complicated detours and persist in using Python? LOL

  • @robertgarrison1738
    @robertgarrison1738 3 months ago

    while_elo instead of white_elo lmao

  • @alquinn8576
    @alquinn8576 3 months ago

    Why use MySQL when you could have loaded it into 100 LINKED ACCESS DATABASES?

  • @peacekeeper7968
    @peacekeeper7968 4 months ago +1

    You discovered that C is faster than Python hahahahahahhahahahhahah

  • @asdfaf343r4
    @asdfaf343r4 3 months ago

    skill issues

  • @HeenDee
    @HeenDee 3 months ago

    Please... PGN to CSV converter C source....... Python very slow.... ㅜ_ㅜ

  • @IamAbhiMishra
    @IamAbhiMishra 3 months ago

    Everything is fine, but why have you stopped your free tier for certain countries including India?

  • @privacyvalued4134
    @privacyvalued4134 4 months ago +2

    Doing 10,000 separate insert prepare/execute SQL statements without surrounding them with transaction calls = extremely bad performance. It's not even close to a fair test. Not that Python is helping matters. MySQL can also insert multiple rows in a single statement, which you also weren't taking advantage of. So, yeah, of course MySQL's command-line client built-ins will outperform even the fastest languages when you're doing the worst possible thing performance-wise.

    • @Comeyd
      @Comeyd 4 months ago

      He never made any queries to MySQL from Python.
      Python was only used to generate the queries.
      Yes, he should have shown using multiple values instead of separate insert statements. I would argue, though, that's effectively what he did with the last mysqldump:
      loaded it directly from the CSV and then dumped it out.
      You also don't need to use C to turn the original data into a CSV either. There are plenty of languages faster than Python that can do that.
      I was almost hoping to see something that used awk!
      With enough shell-fu you could build an actual "pipeline" (literally, just chain the commands together! That's what they're there for!) that goes straight from the PGN file to MySQL with no intermediate files in between…

  • @zrny
    @zrny 3 months ago

    my"SEQUEL"? I'm not watching this LOL :D

  • @mimbaradinugraha
    @mimbaradinugraha 3 months ago

    subscribed because he is using vim