Easy Web Scraping in Power BI

  • Published Aug 1, 2024
  • Using Power BI for web scraping is super easy! Watch this video to find out how to do it for yourself.
    Enroll in my introductory or advanced Power BI courses:
    training.bielite.com/
    Elite Power BI Consulting:
    bielite.com/
    Data Insights Tools:
    www.impktful.com/
    Link to Callum Green's blog post:
    blogs.adatis.co.uk/callumgreen...
  • Science & Technology

COMMENTS • 49

  • @cleanermail8816 6 years ago

    Excellent to-the-point video on how to connect to web pages with Power BI and import data easily! Thanks!

    • @BIElite 6 years ago

      Thanks for watching, Clea!

  • @atifmir7409 5 years ago +1

    Fantastic. Was looking to see how to scrape multiple pages and you showed a great way of doing it as a function.

    • @BIElite 5 years ago

      Nice, glad you found it useful!

  • @MrErolyucel 6 years ago

    THANK YOU SO MUCH. PERFECT INSTRUCTIONS

  • @derpate5039 4 years ago +1

    Thanks for the clear explanation. You rule!

    • @BIElite 4 years ago

      No problem Der!

  • @sadyaz64 6 years ago +3

    Great technique, thank you.

  • @some_one_real 4 years ago

    Thanks for the great tutorial. I just want to ask: is there a way to scrape a page that contains a table of URLs and keep the URL target instead of the URL text?
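
One way to approach the question above outside Power Query is to read the `href` attribute of each link rather than its visible text. The following is a minimal sketch using only Python's standard-library `html.parser`; the class name, helper function, sample HTML, and example.com URLs are all hypothetical, and a real page may need more careful handling.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect (link text, href) pairs for <a> tags found inside <td> cells."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.current_href = None
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
        elif tag == "a" and self.in_cell:
            # Keep the link target (href), not just the visible text.
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None
        elif tag == "td":
            self.in_cell = False


def extract_table_links(html):
    """Return a list of (text, href) tuples for links inside table cells."""
    parser = TableLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical sample page; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><td><a href="https://example.com/players/1">Player One</a></td></tr>
  <tr><td><a href="https://example.com/players/2">Player Two</a></td></tr>
</table>
"""

if __name__ == "__main__":
    for text, href in extract_table_links(SAMPLE_HTML):
        print(text, "->", href)
```

The resulting pairs could then be saved to a CSV file and loaded into Power BI like any other table.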

  • @miltondiaz2531 5 years ago

    Greetings from Peru, thanks for the lesson.

  • @1yyymmmddd 1 year ago

    Great video. Not too complicated. Complications come when you need to bypass an "Accept cookies" or similar pop-up.

  • @Unbox747 2 years ago

    Great thanks!

  • @Scott-ll2rl 6 years ago +13

    Love the video and technique. One thing though, you could have simply deselected the "Use original column name as prefix" when expanding the columns. This way you wouldn't need to go into the Advanced Editor and manually edit the word "Scraped" out of each column header.

    • @BIElite 6 years ago +2

      Oh nice. That would be pretty tedious to edit all of those columns. Thanks for the tip!

  • @hussamamal 2 years ago +1

    Hugely valuable!

    • @BIElite 2 years ago

      Glad to hear it!

  • @brianmegilligan6778 3 years ago

    I really appreciate this as a Power BI developer, but I love it even more as a baseball fan. Great content!

  • @rajkumargerard5474 5 years ago +1

    Thanks for the wonderful video on scraping. I'm working on a similar project in VBA that scrapes data from a bank link and pulls statements for particular dates. It would be great if you could show us how to do this in Power BI.

    • @BIElite 5 years ago

      Hi Rajkumar, I'm not sure if I would know how to do that... My first instinct would be to use a Python script and use the BeautifulSoup module to scrape the information but I don't readily know how to do that.

  • @juangut4531 1 year ago

    Awesome video!!! One question though 🤔, if I invoke the function, does the table get replaced by the new info on the website? Is there a way to create a new table every time I run the function?

  • @RebirthCrow 3 years ago

    Is there a way to use Power BI to pull specific sales on eBay? I'd like to use it for sports card sales. Thanks

  • @Doubs16 5 years ago

    Great video, thanks! Some follow up questions if that's okay:
    1.) Is there a way to add more names to the "Hitters" table (in your example) after you've invoked the custom function over it?
    2.) Can you add the date & time the website was scraped to the results table somehow?
    3.) How do you rerun it (to get the latest data from the website)?
    Cheers!

    • @BIElite 5 years ago

      Hey Danny! All good questions. 1) You would have to add hitters in the first step; basically, have all of the hitters defined before invoking the function. 2) Yes, you can add a custom column as a step in the query, or create a DAX calculated column and set it equal to NOW(). This date will update every time you refresh the dataset. 3) Just click the Refresh button! Power Query will then run all of the steps again in the order you implemented them.

  • @Oscarfrederiksen 1 year ago

    Hello, do the stats update live, or do I have to do that manually?

  • @DENtvCork 5 years ago

    Outstanding demonstration. It really helped a lot. I am using it to download Excel sheets with a list of employee numbers, and it works basically the same way. I was just wondering if it is possible to use a for loop instead of a list. For example:
    for example = 100 to 200 {
        "employee" & example & ".xlms"
    }
    Thanks BI Elite

    • @BIElite 5 years ago

      Hello! There isn't a for loop equivalent in Power Query, unfortunately. I would look into running an R script or Python script to do what you are looking for. It might be a lot easier than with Power Query.

    • @DENtvCork 5 years ago +1

      @BIElite Thank you. I will try Python. In the meantime I have set up a list of dates using the method you showed here. It works brilliantly, but I want to future-proof it. Thanks for the video. I look forward to watching the next one.
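
For the Python route discussed in this thread, the loop over employee numbers can be sketched as below. The base URL, function name, and file extension are hypothetical placeholders, since the thread does not give the real ones. As a side note, M has no `for` statement, but a list range such as `{100..200}` passed to `List.Transform` can play a similar role in Power Query.

```python
def build_employee_urls(base_url, start, stop, extension=".xlsx"):
    """Return one file URL per employee number in the inclusive range start..stop.

    base_url and extension are placeholders; substitute the real location
    and file type of the workbooks.
    """
    return [
        f"{base_url}/employee{number}{extension}"
        for number in range(start, stop + 1)  # range() excludes stop, so add 1
    ]


# Hypothetical usage: employee100 through employee200.
urls = build_employee_urls("https://example.com/files", 100, 200)

if __name__ == "__main__":
    print(len(urls), "URLs, from", urls[0], "to", urls[-1])
```

Each URL in the list could then be downloaded in turn, which replaces the manually maintained list of numbers.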

  • @slakha000 5 years ago +1

    Helpful and simple video. However, I am not able to schedule a refresh in the service (warning for both invoke function & hitters table). Any workarounds to deploy this and have it auto refresh?

    • @BIElite 5 years ago

      I haven't actually run into this issue... Sorry I couldn't be of more help.

  • @johnkaragoulis6012 4 years ago

    Question: I just created a script to scrape car ratings off edmunds.com, using the make and model as the parameters for the function. I ran it for about 8 cars and all of them show up correctly except one, which returns null columns, and I can't figure out why. If I create a separate query for the Chevrolet Blazer it shows up, but not with the function. I think it's because of a "/" in the URL that isn't needed for that car but is for all the other ones. Is there any way to write a conditional function in M to deal with an anomaly?
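
The conditional idea in this question can be illustrated with a fallback loop: try the URL with the trailing slash first, and if nothing comes back, try it without. The sketch below is in Python with a stand-in `fake_fetch` function and made-up example.com URLs and ratings; none of these come from the original query. In M itself, the built-in `if ... then ... else` and `try ... otherwise` expressions are the usual building blocks for the same pattern.

```python
def candidate_urls(base_url, make, model):
    """Yield URL variants to try: first with a trailing slash, then without."""
    path = f"{base_url.rstrip('/')}/{make}/{model}"
    yield path + "/"
    yield path


def scrape_with_fallback(fetch, base_url, make, model):
    """Return (url, result) for the first variant that yields data.

    fetch is any callable that takes a URL and returns the scraped rows,
    or None / an empty list when the page has no usable table.
    """
    for url in candidate_urls(base_url, make, model):
        result = fetch(url)
        if result:  # None or an empty table counts as a miss
            return url, result
    return None, None


def fake_fetch(url):
    """Stand-in for a real scraper, backed by a hypothetical lookup table."""
    pages = {
        "https://example.com/ford/escape/": [{"rating": 8.1}],
        "https://example.com/chevrolet/blazer": [{"rating": 7.9}],
    }
    return pages.get(url)


if __name__ == "__main__":
    print(scrape_with_fallback(fake_fetch, "https://example.com", "chevrolet", "blazer"))
```

Returning the URL alongside the result makes it easy to see which variant worked for each car.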

  • @REInvestorCEO 5 years ago +2

    Awesome video, now a subscriber and looking forward to your future videos! If you were doing the same exercise again, would you repeat this process or would you use Power BI's new Python feature? Only asking because I need to run this for a list of 170,000 inputs. I've never used Python before, but I thought that's what people usually scrape websites with, and I'm wondering if it has some added benefits for the exact scenario you just ran.

    • @BIElite 5 years ago +1

      Hi Joshua, I would most definitely use Python (probably BeautifulSoup). This method is pretty slow, so I would definitely go the Python route for 170,000 inputs.

    • @techknowhow4802 5 years ago

      You have to keep in mind that the R and Python versions inside Power BI are not the same as the latest versions you can download from CRAN and elsewhere. They are way behind on features and capabilities. So, again, you would be better off in this case finding a custom-coded application or process to do this for you. Just search YouTube for scraping websites with R or Python or similar. There are plenty of examples available. :)

  • @RussianVideoPodcast 4 years ago +1

    When you write a function, what language is being used? Is it DAX?

    • @BIElite 4 years ago

      This is Power Query, also called M.

  • @indergarg 6 years ago +1

    Very nice and informative video. I have asked the same question in the Power BI community, but there is no answer yet.
    I am trying to scrape 200 pages from a website, with each page having 96 URL links, and then using a function to scrape the table from each of those pages. But it's very slow, or it takes hours and then fails. I have disabled loading data previews in the background, but to no avail.
    Can you suggest any way to make it quicker?

    • @BIElite 6 years ago

      Hey inder, this method is pretty slow so I understand your pain. If you want to do this quicker but stay inside Power BI, I would recommend writing an R script to scrape the pages. If you know R, it probably won't be too hard. If not, then it's always good to learn!

    • @techknowhow4802 5 years ago

      For special scrapes like this you would need a custom-coded application. This is beyond straightforward Power BI. Tutorials on this are available in most languages, such as C#, R, Python, and JavaScript. Just search YouTube for scraping websites in whatever language you prefer. :)
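
As one illustration of what a custom-coded approach might look like, the sketch below fetches many pages concurrently with Python's standard-library thread pool. Using threads is an assumption of this sketch, not something recommended in the thread, and `fake_fetch_page` plus the example.com URLs are hypothetical stand-ins for a real download-and-parse step.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a thread pool.

    pool.map returns results in the same order as the input URLs,
    regardless of which request finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch_page(url):
    """Stand-in for a real request; returns a hypothetical summary record."""
    return {"url": url, "links_found": 96}


# Hypothetical listing pages 1..200.
page_urls = [f"https://example.com/list?page={n}" for n in range(1, 201)]
results = scrape_all(page_urls, fake_fetch_page)

if __name__ == "__main__":
    print(len(results), "pages processed")
```

Keeping `max_workers` small is a reasonable default, since many sites throttle or block clients that send too many requests at once.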

  • @RussianVideoPodcast 4 years ago

    Another question is, would this work only on web pages where data is presented in a table form?

    • @BIElite 4 years ago

      The default Power BI Web connector looks for HTML tables, which use the <table> tag, so you are correct!
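
To make the idea of "data inside a table tag" concrete, here is a minimal Python sketch that pulls rows out of `<table>` elements and ignores everything else on the page. It is only an illustration using the standard-library `html.parser`, not a description of how the Power BI connector works internally; the class name and sample HTML are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def parse_tables(html):
    """Return all tables found in the HTML string."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables


# Hypothetical sample page.
SAMPLE_PAGE = """
<p>Text outside any table is ignored.</p>
<table>
  <tr><th>Player</th><th>HR</th></tr>
  <tr><td>Player One</td><td>30</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_tables(SAMPLE_PAGE)[0]:
        print(row)
```

Pages that lay out their data with other elements, such as `<div>` grids, would return no tables here, which mirrors why such pages need a different scraping approach.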

  • @natevanwyk952 4 years ago

    Hi, what do I do if the tables do not show up under the web source?

    • @BIElite 4 years ago

      Have you tried using the "Add tables by examples" button on the bottom left? If that doesn't work, I would recommend using a Python script to run BeautifulSoup to do some real web scraping.

  • @MrErolyucel 6 years ago

    How can I scrape yokatlas.yok.gov.tr/lisans.php?y=104112286#c1000_1? I can see the table, but I cannot get the data by following your instructions.

    • @BIElite 6 years ago

      Hey, Erol. That looks tough because the data looks to be in collapsible tables. Have you watched my recent video on the new web connector functionality? You may be able to get this to work for you. If you are still having trouble grabbing the data, I know that Curbal made a nice video on the web connector and how to troubleshoot when it doesn't work.

    • @MrErolyucel 6 years ago

      Thank you for your kind and quick response. Yes, I did watch your recent video on the new web connector functionality. The error I got is "NO CSS selector was found for the sample values you provided in the following column." I asked Curbal the same question, and I am waiting for her response as well. I appreciate your help already. Take care.

    • @BIElite 6 years ago

      Let me know if you hear back! I'd love to know how to fix that

    • @MrErolyucel 6 years ago

      Yes I will. Take care