Very nice overview of creating and viewing a database.
Thank you, John!
You're doing an amazing job by sharing your knowledge with others in such a great form.
Wish you all the best!
Thank you!
An absolute pillar of society.
Really helpful and great stuff. Thanks John.
SQLite is very easy compared to other databases. A short but very useful video. 👌👍
Another great one John!
Thank you for this! Keep up the great work.
Thank you, lots more coming!
wow this vid is so clean
great practical stuff !
Would be cool to see this as part of a Scrapy project, e.g. in a pipeline
That video is done and will be released in a week or so!
@@JohnWatsonRooney Nice! thanks for all your hard work, your channel is amazing!
Thanks for this!
Thanks
Great work.
Thank you for the videos. I've learnt a lot from you. Can you please make a video about handling captchas without using Selenium?
As an extension to this tutorial, it would be really cool to see a way of hosting this sqlite3 database in a Docker container with Redis or something. Nevertheless, excellent video, super practical. Love your content.
At no point do you close any of the connections. Does this mean you have a load of open connections in the background or is there some sort of (out of scope?) clean-up?
Sure, you don't need to worry about closing the connection, just about committing the transactions you run with execute(). There's generally no issue leaving it to close itself.
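For anyone who would rather be explicit about it anyway, a minimal sketch using the standard library sqlite3 module (the table and column names here are just placeholders, not from the video):

    import sqlite3

    # the connection works as a context manager: the "with" block commits on
    # success (or rolls back on an exception), but it does NOT close the
    # connection itself, so close() is still explicit here
    con = sqlite3.connect("example.db")
    con.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    with con:
        con.execute("INSERT INTO products VALUES (?, ?)", ("widget", 9.99))
    con.close()  # optional tidy-up; the connection is also released when the script exits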
Hello, great content. I wanted to ask a question that I searched Google for a long time without finding an answer. Is it possible to use Selenium in headless mode and then, when I get a 200 response, open that same request in GUI mode? I would appreciate it if you responded to this.
I'm sure you could - open it in headless, get a 200, close the browser and then reopen with headless=False. I've not tried it but I believe it would work
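A rough, untested sketch of that idea. Selenium's webdriver API doesn't expose HTTP status codes, so the 200 check below falls back to requests - an assumption/workaround, not something from the video:

    import requests
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    url = "https://example.com"  # placeholder target

    # first pass: headless browser
    opts = Options()
    opts.add_argument("--headless")
    driver = webdriver.Chrome(options=opts)
    driver.get(url)
    # Selenium doesn't report the status code, so check it with requests instead
    status_ok = requests.get(url, timeout=10).status_code == 200
    driver.quit()

    # second pass: reopen the same request with a visible browser (no headless flag)
    if status_ok:
        driver = webdriver.Chrome()
        driver.get(url)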
Hey, that was wonderful. Could you make a video on what to do when Amazon blocks scraping or shows a captcha? The way you explain and teach things is really easy to cope with. Thanks
Could you please make a video about using the Microsoft Graph API to access Outlook or SharePoint, with some steps on registering the app?
Thank you very much.
Hi, can you guide us on how to create a new sheet in a workbook every day for data updates?
Thanks John! Really helpful again.
A while back you mentioned a video on deploying to Heroku - is that still in the works?
It is!
John, I think MongoDB is better than SQLite for crawling with multiple spiders. With SQLite we have to write more code, and it throws unnecessary errors related to pipelines and the database connection. For complex and large projects MongoDB is better.
Sure, SQLite has its downsides. I like Mongo and have used it in some of my personal projects. The point I wanted to make was that if you aren't familiar with databases then use SQLite now and start getting used to it.
Good points though
Late comment, but how about checking if the data is already there? I could loop through, but depending on the amount of data that'd be slow.
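One way to avoid looping, sketched here with placeholder table and column names: either ask SQLite directly whether the row exists, or put a UNIQUE constraint on the key column and let INSERT OR IGNORE skip duplicates.

    import sqlite3

    con = sqlite3.connect("example.db")
    cur = con.cursor()

    # hypothetical table with a UNIQUE constraint on the natural key (url here)
    cur.execute("""CREATE TABLE IF NOT EXISTS products
                   (url TEXT UNIQUE, name TEXT, price REAL)""")

    # option 1: ask the database whether the row already exists
    cur.execute("SELECT EXISTS(SELECT 1 FROM products WHERE url = ?)",
                ("https://example.com/item/1",))
    already_there = cur.fetchone()[0] == 1

    # option 2: let SQLite skip duplicates on the UNIQUE column automatically
    cur.execute("INSERT OR IGNORE INTO products VALUES (?, ?, ?)",
                ("https://example.com/item/1", "widget", 9.99))
    con.commit()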
The best of the best
What if the value of the "price" (REAL) column changes? How would you update the entry then?
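There's no reply here, but a minimal sketch of two common approaches, assuming a placeholder products table with a unique url column: a plain UPDATE, or an upsert that refreshes the price when the row already exists.

    import sqlite3

    con = sqlite3.connect("example.db")
    cur = con.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS products
                   (url TEXT UNIQUE, name TEXT, price REAL)""")

    # plain update, keyed on whatever uniquely identifies the row (url here)
    cur.execute("UPDATE products SET price = ? WHERE url = ?",
                (12.50, "https://example.com/item/1"))

    # or an upsert (needs the UNIQUE constraint on url): insert the row,
    # and if it already exists just refresh the stored price
    cur.execute("""INSERT INTO products (url, name, price) VALUES (?, ?, ?)
                   ON CONFLICT(url) DO UPDATE SET price = excluded.price""",
                ("https://example.com/item/1", "widget", 12.50))
    con.commit()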
Hello John. Thanks for your work. Can you make a video on how to send an email once a day automatically?
Yes I can - I will add it to my notes!
Great video.
Can you talk a bit about using SQL "CREATE TABLE AS" in Python?
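In case it helps, a small sketch of what that might look like from Python, with placeholder table names:

    import sqlite3

    con = sqlite3.connect("example.db")
    cur = con.cursor()

    # source table created here only so the sketch runs standalone
    cur.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")

    # CREATE TABLE ... AS builds a new table straight from a query result,
    # e.g. a filtered copy of the existing products table
    cur.execute("""CREATE TABLE IF NOT EXISTS cheap_products AS
                   SELECT name, price FROM products WHERE price < 10""")
    con.commit()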
Let's go!!! First comment. Keep up the work John
Thanks!
Cool thanks
Amazing video! Could you do a video on web scraping using a VPN as a proxy? For example, using ProtonVPN or NordVPN for scraping data from Amazon? Thanks!
Great suggestion! I'll add it to my video notes!
Hi John,
Could you please share the VS Code settings you were using in previous videos?
The theme looks pretty good, and the terminal too!
Hi - the VS Code theme is Gruvbox Material, and the terminal is zsh installed into WSL2. I don't remember the specific settings though, sorry!
Hi sir,
It can't show the database in MySQL.
Great video once again. Can you please finish the FastAPI video? Thanks again
Sure - it’s on my list, I’ve got a lot going on but will get there
I need your advice regarding SQL... Is there any advantage to learning it if I am pretty comfortable with pandas? So basically the question is, what do I get out of learning SQL - what are its advantages over pandas? (I keep saying SQL because I see a lot of Python programmers say SQLite, and I wonder how that differs from SQL.) The thing that pushed me to want to learn it was seeing pandas unable to deal with large dataframes, so how does SQL do in that regard? pandas is great, but once the pickle file exceeds 2 GB, handling the data becomes extremely difficult... is it the same with SQL?
The background to my question is that I once scraped a website and the output was huge (around 15 GB), so I had to devise a plan to scrape and save the output to around 30 smaller JSON files so I could process the data in pandas.
You should probably use something well established for that kind of data. Try SQLite first - as you can see, it's quite easy. You probably already have a solution for it after 10 months; would you mind sharing your experience?
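For data at that scale, one pattern worth trying (a sketch with placeholder names, assuming a products table like the one built in the video): keep everything in a single SQLite file and only pull slices, or chunks, into pandas.

    import sqlite3
    import pandas as pd

    con = sqlite3.connect("scraped.db")  # placeholder filename

    def process(chunk: pd.DataFrame) -> None:
        # placeholder for whatever per-chunk work you'd do
        print(len(chunk))

    # pull only the slice you need rather than loading everything into memory
    df = pd.read_sql_query("SELECT name, price FROM products WHERE price > 100", con)

    # or stream the whole table in manageable chunks
    for chunk in pd.read_sql_query("SELECT * FROM products", con, chunksize=50_000):
        process(chunk)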
It would be great if you went from scraped API JSON to NoSQL without duplicates - deduping, so to speak.
Sure, I’m working on mongodb videos for next month
Can you scrape the Udemy website?
Hello John thank you for amazing, really amazing video again.
Your lessons are awesome and it all looks so easy. 😊😊😊👏👏👏
P.S. Please, please make a course on Udemy about all the Python stuff and I'll buy it immediately 😎😎
Thanks! I'd love to make a course, but I would want it to be worth it to people. I do have a rough plan down.
@@JohnWatsonRooney all your YouTube videos have great value for everyone who is learning the Python language. And your teaching style is really pleasant. 👍👍👍👏👏👏
Thank you
❤
So I did this using PyCharm and Jupyter, and neither created my .db database; instead I got an error about the name I gave my .db: NameError: (name) is not defined. Did you define your example.db off-video?
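That NameError usually means the filename was passed as a bare name rather than a string - a guess at the fix, without seeing the exact code:

    import sqlite3

    # NameError: name 'example' is not defined typically comes from something like
    #     con = sqlite3.connect(example.db)   # Python reads "example" as a variable
    # the filename must be a string; SQLite then creates the file on first connect
    con = sqlite3.connect("example.db")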
Moving too fast
Please work on your speaking style.
I often come to your channel and try to learn something from you, but the way you speak drains the passion for learning. 😢
How does this compare to storing the data in a pandas dataframe and then exporting the dataframe to SQL?
Is this just a better way to save the scraped data into a db in real time, line by line?
I need to get better at extract-transform-load processes for the scrapers I want to run consistently to build a time-series picture. I would be interested to see some videos on simple ways to set up a scraper to run, say, once per week and push data to a SQL database on AWS, which can then be queried via a GraphQL API using something like hasura.io, and then how to monetize that dataset on RapidAPI or build your own site for it with something like blobr.io.
Yes, skip pandas if your end goal is just storing the data. Put it all into a database like this and then pull out the bits you want to analyse into a pandas dataframe.
I’ve done some videos on cronjobs before but I do prefer the project approach - I’m working on a series now that takes scraped data and saves it to a database, I could adapt it to run each week and then work on a front end to display it
@@JohnWatsonRooney This is exactly what I would like to see. Please consider doing video about how to automatically (periodically) run scraper + save to database + show in front end. Many thanks.
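For anyone wanting a concrete picture of the database-first workflow described above, a minimal sketch with placeholder names: write scraped rows straight to SQLite as they arrive, then pull just the slice you need into pandas for analysis.

    import sqlite3
    import pandas as pd

    con = sqlite3.connect("scraped.db")  # placeholder filename
    con.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL, scraped_at TEXT)")

    # inside the scraper loop: write each item as it is scraped, row by row
    con.execute("INSERT INTO products VALUES (?, ?, ?)", ("widget", 9.99, "2024-01-01"))
    con.commit()

    # later, for analysis: pull only the rows you care about into a dataframe
    df = pd.read_sql_query("SELECT * FROM products WHERE price > 5", con)
    print(df.head())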
Keep getting this error, can anyone help? Traceback (most recent call last):
File "/Users/noah/Documents/Database/main.py", line 7, in
cur.execute('''CREATE TABLE IF NOT EXISTS patientnameandage
sqlite3.OperationalError: near "in": syntax error
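That error usually means the SQL that reached SQLite was incomplete or had stray text mixed into the string. A guess at a working version of that statement, with placeholder column definitions:

    import sqlite3

    con = sqlite3.connect("patients.db")  # placeholder filename
    cur = con.cursor()

    # keep the full column list, the closing parenthesis and the closing '''
    # together in one complete statement
    cur.execute('''CREATE TABLE IF NOT EXISTS patientnameandage
                   (name TEXT, age INTEGER)''')
    con.commit()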