Easy Web Scraping in Power BI
- Published Aug 1, 2024
- Using Power BI for web scraping is super easy! Watch this video to find out how to do it for yourself.
Enroll in my introductory or advanced Power BI courses:
training.bielite.com/
Elite Power BI Consulting:
bielite.com/
Data Insights Tools:
www.impktful.com/
Link to Callum Green's blog post:
blogs.adatis.co.uk/callumgreen... - Science & Technology
Excellent to-the-point video on how to connect to web pages with Power BI and import data easily! Thanks!
Thanks for watching, Clea!
Fantastic. Was looking to see how to scrape multiple pages and you showed a great way of doing it as a function.
Nice, glad you found it useful!
THANK YOU SO MUCH. PERFECT INSTRUCTIONS
Thanks for the clear explanation. You rule!
No problem Der!
great technique , thank you
No problem!
Thanks for the great tutorial. I just want to ask: is there a way to scrape a page that contains a table of URLs and keep the URL target instead of the URL text?
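One possible approach outside Power Query, sketched here in Python with only the standard library: walk the HTML and keep each link's `href` attribute alongside its text. The `LinkCollector` class and the sample HTML are made up for illustration and are not from the video.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs for every <a> inside a table cell."""

    def __init__(self):
        super().__init__()
        self.links = []         # finished (text, href) pairs
        self._in_cell = False   # True while inside <td> or <th>
        self._href = None       # href of the <a> currently open, if any
        self._text = []         # text pieces of the current <a>

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
        elif tag == "a" and self._in_cell:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag in ("td", "th"):
            self._in_cell = False


# Hypothetical sample page, used only to exercise the parser.
SAMPLE_HTML = """
<table>
  <tr><td><a href="https://example.com/players/1">Player One</a></td></tr>
  <tr><td><a href="https://example.com/players/2">Player Two</a></td></tr>
</table>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links  # list of (text, target URL) tuples
```

Each tuple keeps both the visible text and the target, so either one can be loaded into a column afterwards.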
Greetings from Peru, thanks for the lesson.
Great video. Not too complicated. Complications come when you need to bypass "Accept cookies" or similar pop-up.
Great thanks!
Love the video and technique. One thing though, you could have simply deselected the "Use original column name as prefix" when expanding the columns. This way you wouldn't need to go into the Advanced Editor and manually edit the word "Scraped" out of each column header.
Oh nice. That would be pretty tedious to edit all of those columns. Thanks for the tip!
Hugely valuable!
Glad to hear it!
I really appreciate this as a Power BI developer, but I love it even more as a baseball fan. Great content!
Thanks for the wonderful video on scraping. I'm doing a similar kind of project: scraping data from a bank link and pulling statements for particular dates using VBA. If you could show us how to do this in Power BI, that would be great.
Hi Rajkumar, I'm not sure if I would know how to do that... My first instinct would be to use a Python script and use the BeautifulSoup module to scrape the information but I don't readily know how to do that.
Awesome video!!! One question though 🤔, if I invoke the function, does the table get replaced by the new info on the website? Is there a way to create a new table every time I run the function?
Is there a way to use Power BI to pull specific sales on eBay? I'd like to use it for sports card sales. Thanks
Great video, thanks! Some follow up questions if that's okay:
1.) Is there a way to add more names to the "Hitters" table (in your example) after you've invoked the custom function over it?
2.) Can you add the date & time the website was scraped to the results table somehow?
3.) How do you rerun it (to get the latest data from the website)?
Cheers!
Hey Danny! All good questions. 1) you would have to add hitters in the first step. Basically have all of the hitters defined before invoking the function. 2) yes you can add a custom column as a step in the query. Or you just create a DAX calculated column and set it equal to NOW(). This date will refresh every time you refresh the dataset. 3) just click the refresh button! PowerQuery will then run all of the steps again in the order you implemented them
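For anyone doing the scraping in a Python script instead, the timestamp idea from point 2 could look roughly like this. It is only a sketch: `stamp_rows` and the `scraped_at` column name are made-up names, and the rows are assumed to be plain dictionaries.

```python
from datetime import datetime, timezone


def stamp_rows(rows, scraped_at=None):
    """Return copies of the rows with a 'scraped_at' ISO-8601 timestamp added."""
    if scraped_at is None:
        # Default to the current time in UTC when no timestamp is supplied.
        scraped_at = datetime.now(timezone.utc)
    stamp = scraped_at.isoformat()
    return [{**row, "scraped_at": stamp} for row in rows]


# Fixed timestamp so the result is reproducible; the row is a placeholder.
example = stamp_rows(
    [{"player": "Player One"}],
    scraped_at=datetime(2024, 8, 1, 12, 0, tzinfo=timezone.utc),
)
```

Stamping the rows at scrape time records when the data was fetched, which can differ from when the report was last refreshed.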
Hello does the stats update live? or do i have to do that manually?
Outstanding demonstration. It really helped a lot. I am using it to download Excel sheets with a list of employee numbers and it works basically the same. I was just wondering if it was possible to use a for loop instead of a list. For example
for example = 100 to 200{
employee" & example & ".xlms
}
Thanks BI Elite
Hello! There isn't a for loop equivalent in PowerQuery, unfortunately. I would look into running an R script or Python script to do what you are looking for. Might be a lot easier than with PowerQuery
@BIElite Thank you. I will try Python. In the meantime I have set up a list of dates using the method you showed here. It works brilliantly but I want to future proof it. Thanks for the video. I look forward to watching the next one.
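A minimal sketch of the loop from the question, written in Python. The `employee_file_names` helper is a made-up name, and the `.xlsx` extension is an assumption because the extension in the question looks like a typo. Power Query also has list ranges such as `{100..200}` that can be mapped over with `List.Transform`, which covers a similar need without leaving M.

```python
def employee_file_names(start, stop, prefix="employee", extension=".xlsx"):
    """Build one file name per employee number, from start to stop inclusive."""
    # range() excludes its upper bound, so add 1 to include `stop`.
    return [f"{prefix}{number}{extension}" for number in range(start, stop + 1)]


file_names = employee_file_names(100, 200)
```

Passing the range as arguments means the numbers can change later without editing a hand-typed list.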
Helpful and simple video. However, I am not able to schedule a refresh in the service (warning for both invoke function & hitters table). Any workarounds to deploy this and have it auto refresh?
I haven't actually run into this issue... Sorry I couldn't be of more help.
Question: I just created a script to scrape car ratings off edmunds.com, using the make and model as the parameters for the function. I ran it for about 8 cars and all of them show up correctly except one, which returns null columns, and I can't figure out why. If I create a separate query for the Chevrolet Blazer it shows up, but not with the function. I think it's because of a "/" in the URL that isn't needed for that car but is for all the other ones. Is there any way to write a conditional function in M to deal with an anomaly?
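One way to think about the trailing "/" problem is to try each URL variant in turn and keep the first one that returns data. The sketch below shows that idea in Python rather than M. Everything in it is hypothetical: the URL pattern, the `rating_urls` and `first_working_url` helpers, and the `fake_fetch` stand-in that replaces a real web request.

```python
def rating_urls(make, model, base="https://example.com"):
    """Return the URL variants to try: with the trailing slash first, then without."""
    path = f"{base}/{make}/{model}"
    return [path + "/", path]


def first_working_url(candidates, fetch):
    """Return (url, rows) for the first candidate whose fetch yields any rows."""
    for url in candidates:
        try:
            rows = fetch(url)
        except Exception:
            continue  # treat a failed request like an empty result
        if rows:
            return url, rows
    return None, []


def fake_fetch(url):
    """Stand-in for a real request: only these two made-up URLs return rows."""
    pages = {
        "https://example.com/chevrolet/blazer": [{"rating": "placeholder"}],
        "https://example.com/honda/civic/": [{"rating": "placeholder"}],
    }
    return pages.get(url, [])


blazer_url, blazer_rows = first_working_url(rating_urls("chevrolet", "blazer"), fake_fetch)
civic_url, civic_rows = first_working_url(rating_urls("honda", "civic"), fake_fetch)
```

M has comparable building blocks in `if ... then ... else` and `try ... otherwise`, so the same fallback logic should be expressible inside the custom function.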
Awesome video, now a subscriber and looking forward to your future videos! If you were doing the same exercise again would you repeat this process or would you use Power BI's new python feature? Only asking because I need to run this for a list of 170,000 inputs. Never used python before but I thought that's what people usually scrape websites with and wondering if it has some added benefits to this exact scenario you just ran?
Hi Joshua, I would most definitely use Python (probably beautifulsoup). This method is pretty slow so I would definitely go the Python route for 170,000 inputs.
You have to keep in mind that the R and Python inside Power BI are not the same as the latest versions you can download from CRAN and elsewhere. They are way behind on features and capabilities. So, again, you would be better off in this case finding a custom-coded application or process to do this for you. Just search YouTube for scraping websites with R or Python or similar. There are plenty of examples available. :)
When you write a function, what language is being used? Is it DAX?
This is Power Query, also called M.
Very nice and informative video. I have asked the same question in the Power BI community, but there is no answer yet.
I am trying to scrape 200 pages from a website, each page having 96 URL links, and then using a function to scrape the table from each of those pages. It is very slow, or it takes hours and then fails. I have disabled loading data previews in the background, but to no avail.
Can you suggest any way to make it quicker?
Hey inder, this method is pretty slow so I understand your pain. If you want to do this quicker but stay inside Power BI, I would recommend writing an R script to scrape the pages. If you know R, it probably won't be too hard. If not, then it's always good to learn!
For special scrapes like this you would need a custom-coded application. This is beyond straightforward Power BI. Tutorials on this are available in most languages, like C#, R, Python, JavaScript, etc. Just search YouTube for scraping websites in whatever language you prefer. :)
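As a rough illustration of what such a custom script might do differently, the Python sketch below requests many pages concurrently with a thread pool instead of one at a time. The `scrape_all` helper and the `fake_fetch` stand-in are made-up names; a real script would pass in a function that downloads and parses each page.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=8):
    """Fetch every URL concurrently and return {url: result or exception}."""

    def safe_fetch(url):
        # Capture failures per page so one bad URL does not abort the whole run.
        try:
            return fetch(url)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        results = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, results))


def fake_fetch(url):
    """Stand-in for a real download: fails for one URL, echoes the others."""
    if url.endswith("/bad"):
        raise ValueError("simulated failure")
    return f"content of {url}"


pages = scrape_all(
    ["https://example.com/1", "https://example.com/bad", "https://example.com/2"],
    fake_fetch,
)
```

Keeping `max_workers` small limits how many requests hit the site at once, and storing exceptions alongside results makes it easy to retry only the pages that failed.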
Another question is, would this work only on web pages where data is presented in a table form?
The default Power BI Web connector looks for HTML tables which use the <table> tag, so you are correct!
Hi, what do I do if the tables do not show up under the web source?
Have you tried using the "Add tables by examples" button on the bottom left? If that doesn't work, I would recommend using a Python script to run BeautifulSoup to do some real web scraping.
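To show why only table-based pages appear in the connector, here is a small sketch of table detection in Python. It uses the standard library's `html.parser` in place of BeautifulSoup (a third-party package) so that it runs without installing anything. The `TableExtractor` class and the sample HTML are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> found
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text pieces of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page: one real table plus text in a <div> that is not tabular.
SAMPLE_HTML = """
<div>Data shown outside any table is ignored.</div>
<table>
  <tr><th>Player</th><th>Team</th></tr>
  <tr><td>Player One</td><td>Team A</td></tr>
</table>
"""

extractor = TableExtractor()
extractor.feed(SAMPLE_HTML)
tables = extractor.tables
```

The text inside the `<div>` never reaches the output, which is the same reason data laid out without `<table>` markup may not be listed as a table.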
How can I scrape yokatlas.yok.gov.tr/lisans.php?y=104112286#c1000_1. I see the table, but I cannot get the data from your instructions
Hey, Erol. That looks tough because the data looks to be in collapsible tables. Have you watched my recent video on the new web connector functionality? You may be able to get this to work for you. If you are still having trouble grabbing the data, I know that Curbal made a nice video on the web connector and how to troubleshoot when it doesn't work
Thank you for your kind and quick response. Yes, I did watch your recent video on the new web connector functionality. The error I got is "NO CSS selector was found for the sample values you provided in the following column.". I asked Curbal the same question and am waiting for her response as well. I appreciate your help already. Take care.
Let me know if you hear back! I'd love to know how to fix that
Yes I will. Take care