Document Understanding with UiPath's Intelligent OCR - Full Tutorial

  • Published 18 Sep 2024

COMMENTS • 96

  • @botbotgo4902
    @botbotgo4902 4 years ago +1

    I have updated the workflow by adding the Train Classifier Scope. This lets you train keyword-based classifiers for cases where they are unable to classify the document.

  • @nehaaggarwal8001
    @nehaaggarwal8001 2 years ago

    thank you for this very informative video

  • @andersjensenorg
    @andersjensenorg 4 years ago +2

    Hey Anurag, awesome work you are putting in 😊👍💪 Kind regards, Anders

  • @nobi6139
    @nobi6139 3 years ago

    Excellent video! Very detailed. It took the complexities out of document understanding. I learnt a lot from this video. Thanks heaps!

  • @ateruel84
    @ateruel84 4 years ago +2

    Amazing explanation, congratulations!

  • @ezrateferra8146
    @ezrateferra8146 3 years ago +1

    Great explanations! Thanks a million

  • @chandrayeddala6673
    @chandrayeddala6673 3 years ago

    Superb explanation, clear and clean. Thank you.

  • @larryding7618
    @larryding7618 2 years ago

    oh my gosh. i hope you can upload more videos.

  • @visumanelli9360
    @visumanelli9360 1 year ago

    Very Nice

  • @anushnayak6080
    @anushnayak6080 4 years ago

    Amazing content. Looking forward to more videos which will help us.
    Thanks

  • @deathsquad383
    @deathsquad383 3 years ago

    Great tutorial, thank you for posting

  • @MateusLyra1991
    @MateusLyra1991 4 years ago

    Hello, awesome video. Many thanks. I am from Brazil and this was really helpful. Looking forward to more videos.

    • @botbotgo4902
      @botbotgo4902 4 years ago

      Thanks for the feedback, Mateus Lyra. Please comment if there is any specific video you are looking for.

  • @JS-zm5se
    @JS-zm5se 4 years ago

    Excellent Explanation

  • @renukadevi1829
    @renukadevi1829 3 years ago

    Beautiful video, please make more such videos.

  • @prashantrai5911
    @prashantrai5911 3 years ago +1

    Where do you get this endpoint when using the Machine Learning Extractor? I don't understand that point. Can you elaborate on it?

  • @omololasamson5877
    @omololasamson5877 2 years ago

    Thanks for the good job you're doing. How did you get the endpoint, or is it the same for everyone?

  • @mscoder9902
    @mscoder9902 3 years ago

    Thank you

  • @RaniSingh-dx9tt
    @RaniSingh-dx9tt 4 years ago

    Very informative!!

  • @zoeyuwang6123
    @zoeyuwang6123 3 years ago +1

    I built a process based on your video, but I get an error in one place:
    Data Extraction Scope: Index was outside the bounds of the array.
    I can't fix it.
    Can you help me?

    • @savagestroke4943
      @savagestroke4943 3 years ago

      When defining the keywords, make sure that you typed them correctly: "invoice", "receipt", "walmart".

  • @sassydebbie
    @sassydebbie 2 years ago

    Hi, good day. I can't seem to download the packages you mentioned. Do you have any idea how I can work it out?

  • @swatikarot7486
    @swatikarot7486 4 years ago +1

    Great video. I had a question: why does the message box pop up twice with the outputDT string?

    • @botbotgo4902
      @botbotgo4902 4 years ago

      Hello Swati!
      The dataset that is created from the export extraction result is a collection of DataTables. This collection has two DataTables: *Simple Field* and *Simple Field Formatted*. This is why you are getting two message boxes. To check the names of the DataTables yourself, you can add a message box in the For Each loop with "table.TableName".

    • @swatikarot7486
      @swatikarot7486 4 years ago

      @@botbotgo4902 Thanks for the response. I did implement the For Each loop with "table.TableName" and saw formatted and unformatted output. But if I uncheck the ‘FormatValuesIfPossible’ option in the Data Extraction Scope, the data is duplicated. How can I get only one set of data here?
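
As a plain-Python sketch of the idea in this thread (not UiPath code): the exported DataSet behaves like an ordered collection of named tables, so one way to keep a single set of values is to select one table by name instead of looping over all of them. The table names follow the reply above; the row contents and the helper `pick_table` are made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def pick_table(dataset: Dict[str, List[Row]], name: str) -> List[Row]:
    """Return the rows of one named table, or fail with a readable error."""
    if name not in dataset:
        raise KeyError(f"no table named {name!r}; available: {list(dataset)}")
    return dataset[name]


# Hypothetical stand-in for the exported DataSet (dicts keep insertion order).
dataset = {
    "Simple Field": [{"Key": "Total", "Value": "1,200.50"}],
    "Simple Field Formatted": [{"Key": "Total", "Value": "1200.5"}],
}

table_names = list(dataset)                 # analogous to reading table.TableName in a loop
rows = pick_table(dataset, "Simple Field")  # keep only one of the two tables
```

Selecting by name rather than by position avoids depending on the order in which the tables happen to be exported.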

  • @allthecommonsense
    @allthecommonsense 2 years ago

    31:09 I don't see a "due date" on that invoice, yet you seem to have configured a custom area and edited that process out. Seems to me like a mistake.

  • @tejasvimangal2184
    @tejasvimangal2184 3 years ago

    Thanks for a great, informative video. I just had one question. If we have to define a keyword.json file for document classification, then what is the use of taxonomy.json?

  • @MrAyubX
    @MrAyubX 3 years ago

    Great video. What is not clear to me is the documentPath variable you specified in the Digitize Document activity. I do not think you showed how you set that up, though I assume it is a variable that holds the path of the file, correct? If so, could we alternatively specify the file path directly in the Document Path field without creating the documentPath variable? Thank you.

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      Yes, your understanding is correct. We can either specify the path directly in the Document Path field, or create a variable for it and pass that in.

  • @premacharles8610
    @premacharles8610 3 years ago +1

    Anurag, can you please tell me how to extract line items from the invoice along with these details? I want to write them to Excel, preferably for a case where each document might have a different number of line items.

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      Hi Prema, you can use the form-based extractor to extract the line items from the table.

  • @gauravbatra10
    @gauravbatra10 8 months ago

    Hi bro, I am not able to select 5 pieces of information on page 1. I am only able to select one. Are you using the Shift or Ctrl key to select 5? I am working on a 2-page PDF. Please suggest. I am waiting.

  • @sktanaka
    @sktanaka 3 years ago

    Great video, thanks. One question: if you are satisfied with the results, can you remove the "Present Validation Station" activity so it does not prompt the user every time? I have dozens of invoices to be processed on an unattended machine.

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      Yes, you can remove the Present Validation Station activity to skip the human in the loop.

  • @sushantshiwakoti5578
    @sushantshiwakoti5578 4 years ago +1

    Can you do it with handwritten documents? It would be helpful for everyone. Thank you.

    • @botbotgo4902
      @botbotgo4902 4 years ago

      Hello Sushant!
      For handwritten documents with fixed formats (for example, a bank account opening form), you can use the Intelligent Form Extractor.

    • @sushantshiwakoti5578
      @sushantshiwakoti5578 4 years ago

      @@botbotgo4902 Thank you

    • @ayahabuhantash5948
      @ayahabuhantash5948 4 years ago

      @@botbotgo4902 Is the Intelligent Form Extractor the extractor we used in this video? Thanks a lot.

  • @zoeyuwang6123
    @zoeyuwang6123 3 years ago

    Could you share the whole project? I want to study it carefully.

  • @tibyanralibi
    @tibyanralibi 3 years ago

    Hi, this is a good video. I have a question about licensing for the Intelligent OCR activities. Are the activities free, or do we have to pay for licenses? Thank you.

  • @rajatdhammi
    @rajatdhammi 4 years ago

    Hi, while setting up the form extractor, you manually specify the location of the second image (you choose 2.jpg), but at the end you change the document path from 2.jpg to 3.jpg.
    If we are manually specifying the location, how does the form extractor fetch the correct information?

    • @botbotgo4902
      @botbotgo4902 4 years ago +1

      Hey,
      the file that you upload in the form extractor is just for generating a template. So no matter which document you read, it will still work as long as the structure and the positions of the various elements in the document remain the same.
      Having said that, if you try to extract data from an invoice with a different structure, the extraction won't work.

    • @rajatdhammi
      @rajatdhammi 4 years ago

      @@botbotgo4902 OK, got it.
      One more query: when I tried the same with Create Document Validation Action and Wait for Validation Action, with Present Validation Station commented out, it gave me this error:
      "An extension of type 'UiPath.Activities.Contracts.Persistence.IPersistenceBookmarks' must be configured in order to run this workflow."
      (I have created the storage bucket in Orchestrator.)

  • @aryashrivastav6187
    @aryashrivastav6187 3 years ago

    What if our PDFs have lots of pages, and there are lots of PDFs? Can it still extract specific data?

  • @chongyihyang309
    @chongyihyang309 4 years ago

    Hi, may I know why you used both the Form Extractor and the ML Extractor? Also, why does the workflow produce 2 sets of the same data table? What do I do if I only need 1?

    • @patilrc
      @patilrc 4 years ago

      A DataSet is a collection of DataTables. You can try Dataset.Tables(0) and check.

    • @botbotgo4902
      @botbotgo4902 4 years ago +1

      Hey, sorry for replying late!
      1. *Why I used both extractors* - I wanted to show that it is possible to combine extractors. It could happen that some attributes cannot be extracted by one of the extractors, and in that case the other extractor is used. The extractors are applied from left to right: only if the leftmost extractor is not able to get a particular attribute (or its confidence score is below the set threshold) is the next extractor used. With Configure Extractors you can also decide which attributes are handled by which extractor.
      2. You get a list of tables (also known as a *DataSet*); always take the first one from the list -> *Dataset.Tables(0)*

    • @chongyihyang309
      @chongyihyang309 4 years ago

      I see. Thanks for the help
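
The left-to-right fallback described in the reply above can be sketched in plain Python. This is only an illustration of the ordering logic, not the UiPath API: the two extractor functions, their return values, and the 0.8 thresholds are invented for the example.

```python
from typing import Callable, List, Optional, Tuple

# An extractor maps a field name to (value, confidence), or None if it finds nothing.
Extractor = Callable[[str], Optional[Tuple[str, float]]]


def extract_field(field: str, extractors: List[Tuple[Extractor, float]]) -> Optional[str]:
    """Try extractors in order; accept the first result that meets its threshold."""
    for extractor, min_confidence in extractors:
        result = extractor(field)
        if result is not None and result[1] >= min_confidence:
            return result[0]
    return None  # nothing qualified: a person would have to enter the value


def form_extractor(field: str) -> Optional[Tuple[str, float]]:
    # Hypothetical results: a confident invoice number, a low-confidence total.
    return {"invoice_no": ("INV-001", 0.95), "total": ("12O0", 0.40)}.get(field)


def ml_extractor(field: str) -> Optional[Tuple[str, float]]:
    return {"total": ("1200", 0.90)}.get(field)


pipeline = [(form_extractor, 0.8), (ml_extractor, 0.8)]  # leftmost first
invoice_no = extract_field("invoice_no", pipeline)
total = extract_field("total", pipeline)
due_date = extract_field("due_date", pipeline)
```

Here `total` falls through to the second extractor because the first one's confidence is below its threshold, and `due_date` ends up empty because neither extractor returns it.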

  • @issacpaul9846
    @issacpaul9846 4 years ago

    Hey, will you do a video on regex-based extraction?

  • @shankota5547
    @shankota5547 4 years ago

    Hello @botBotGo,
    That was a great explanation. Currently I am able to extract a single page with specific extraction fields. How do I loop through all the pages in a PDF file with similar invoices?

    • @botbotgo4902
      @botbotgo4902 4 years ago

      If you are using the Community version, then you can only process documents with a maximum of 2 pages at once.
      One workaround would be to use the UiPath PDF activities to break your single PDF file into multiple PDF files and then loop through them.
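
A small Python sketch of the bookkeeping behind that workaround: given a page count, compute the page ranges for each smaller file. The 2-page limit is taken from the reply above; `page_chunks` is a hypothetical helper, and the actual splitting would still be done with PDF activities or a PDF library.

```python
from typing import List, Tuple


def page_chunks(total_pages: int, max_pages: int = 2) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges of at most max_pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


chunks = page_chunks(5)  # a 5-page PDF split into pieces of at most 2 pages
```

Each range would then be written out as its own file and passed through the same digitize, classify, and extract steps in a loop.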

  • @viralesvideos
    @viralesvideos 3 years ago

    A question: how do I stop it from showing the percentage or the "Validation Station" screen? Every time it asks me to confirm the selected area at 96%, and it always gets it right. I mean "Present Validation Station".

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      If you are confident in the extraction confidence percentage, then there is no need to use the Present Validation Station activity in the flow. You can check the exported results directly in Excel.

  • @ronak7480
    @ronak7480 4 years ago

    Hey Anurag,
    thank you for the wonderful explanation.
    I have one issue with the invoice date: it is not coming out properly in the CSV file.
    It comes out like this: Key,Value "Month","5".
    The actual date in my PDF is: May 26/20.
    It would be great if you could help with this.

    • @botbotgo4902
      @botbotgo4902 4 years ago

      Can you please check whether the date is present in the text coming out of the Digitize Document activity?
      If not, then no extractor will be able to extract the date, and you might have to try other OCR engines.
      If the date is present, then you need to do some trial and error with the different extractor activities.
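
The check suggested above can be mimicked outside UiPath with a short Python sketch: first confirm the raw date string is present in the digitized text, then parse it. The sample OCR text and the helper `find_date` are made up; the format string `%b %d/%y` is an assumption that fits "May 26/20" and relies on English month names.

```python
from datetime import date, datetime
from typing import Optional


def find_date(ocr_text: str, raw: str, fmt: str = "%b %d/%y") -> Optional[date]:
    """Return the parsed date only if the raw string occurs in the OCR text."""
    if raw not in ocr_text:
        return None  # OCR never produced the date, so no extractor can find it
    return datetime.strptime(raw, fmt).date()


# Hypothetical digitized text for an invoice like the one in the question.
ocr_text = "Invoice Date: May 26/20\nTotal: 100.00"

parsed = find_date(ocr_text, "May 26/20")
missing = find_date("Total: 100.00", "May 26/20")
```

If `parsed` comes back empty for the real document text, the problem is on the OCR side rather than in the extractor configuration.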

  • @prasadparalikar954
    @prasadparalikar954 3 years ago

    Hello sir,
    I also want to extract the items along with their costs into the Excel file. Can I do that?
    Please help.

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      You can use the form-based extractor to extract the line items in the table.

  • @KiranPudi
    @KiranPudi 3 years ago

    How can we use the Intelligent Keyword Classifier in the Classify Document Scope?

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      The Intelligent Keyword Classifier is for handwritten documents, not for unstructured documents.

  • @laxmipriyapradhan1704
    @laxmipriyapradhan1704 2 years ago

    Sir, can you please do this for PAN card and Aadhaar JPG files? I have tried many times but could not get it to work, and when I give the whole folder path it shows an error and I don't know why. Please help me with this task: we have folders for different candidates, each candidate has their own PAN and Aadhaar card images, and from those I need to extract particular fields such as the Aadhaar number and PAN number and store them in a file. If you can store them in MySQL, that would be very good for me. Please, sir, can you show how to provide the whole folder in the documentPath variable, where each candidate has their own Aadhaar and PAN card? I really need this.

  • @sampledemo2947
    @sampledemo2947 4 years ago

    Hello! Thanks for the video. This is Rohit S. Lanjewar. Please help me: how can I change the confidence percentage of each invoice field in the Present Validation Station when using the Intelligent Form Extractor in UiPath Document Understanding?

    • @botbotgo4902
      @botbotgo4902 4 years ago +1

      Hello Sample Demo!
      Sorry for replying late. The confidence score is something that you set per extractor. If any attribute that this extractor should extract scores below it, that field is not extracted. In such cases you can try a combination of extractors, so that if one extractor fails, the next extractor is used. If all of them fail, the user has to enter the value explicitly.
      Did I answer your question?

  • @shalinisingh2816
    @shalinisingh2816 4 years ago

    Page 1 has less than 5 selected words as Page Matching Information. Please select at least 5 words.
    This notification appears on the screen when I am creating the template. It pops up again and again, even after extracting the elements.

    • @botbotgo4902
      @botbotgo4902 4 years ago

      Hello Shalini!
      You need to select 5 keywords on the page.
      Please watch from 30:00

    • @shalinisingh2816
      @shalinisingh2816 4 years ago

      @@botbotgo4902 I did it the same way. Let me recheck whether I am making a mistake.

  • @WebHNT
    @WebHNT 3 years ago

    Can you share the slides with me? The video is interesting and helpful. Thank you!

  • @souravsingh4305
    @souravsingh4305 4 years ago

    Hi Anuraag. This video is very important for RPA beginners. Thank you for this. But I am facing an issue while creating a template: after supplying a custom area for the keyword I am extracting and configuring it, I see a long red error that I cannot even read.

    • @botbotgo4902
      @botbotgo4902 4 years ago +1

      Hello Sourav!
      I am sorry, but I cannot understand what you mean.

    • @souravsingh4305
      @souravsingh4305 4 years ago

      @@botbotgo4902 That's fine, Anuraag, I managed to solve the error. Is this solution also applicable to image invoices?

    • @souravsingh4305
      @souravsingh4305 4 years ago

      @@botbotgo4902 Hello Anurag. I am using the workflow as you showed it, but it does not always extract the values.

    • @botbotgo4902
      @botbotgo4902 4 years ago +1

      @@souravsingh4305 Hello Sourav!
      So where are you facing problems? I mean, which extractor are you working with?

    • @souravsingh4305
      @souravsingh4305 4 years ago

      @@botbotgo4902 I'm working with the form extractor. However, I have invoices of all types, including PDFs, receipts, images, and scanned PDF invoices. Which extractors should I use to get the values from all of these?

  • @allthecommonsense
    @allthecommonsense 2 years ago

    Also... seems like a mistake to ASSUME that classification result will only match 1 document type. You never check how many matches it got, and *assume* it's always classificationResult(0)

  • @shalinisingh2816
    @shalinisingh2816 4 years ago

    I am not getting OmniPage OCR in my activities panel.

    • @botbotgo4902
      @botbotgo4902 4 years ago

      Hello Shalini,
      You need to install this package before you can use it.
      For the installation, go to 06:37.
      1. Go to Manage Packages in Studio
      2. Click on All Packages
      3. Search for UiPath.OmniPage.Activities
      4. Install it

    • @shalinisingh2816
      @shalinisingh2816 4 years ago

      @@botbotgo4902 Yes, It is. Thanks for your prompt response. :)

  • @aakashm.2495
    @aakashm.2495 4 years ago

    Hey Anurag. Thanks for the video.
    How can I run this on multiple PDFs at a time?

    • @patilrc
      @patilrc 4 years ago

      You can use a For Each loop and provide the path of the folder that contains the PDF files.

    • @botbotgo4902
      @botbotgo4902 4 years ago

      Hey Year down!
      Sorry for replying late. I have made a video where I solve an RPA challenge by extracting data from multiple PDFs - ua-cam.com/video/56AOiixQPKY/v-deo.html
      Let me know if this is what you were looking for.

    • @aakashm.2495
      @aakashm.2495 4 years ago

      @@botbotgo4902 Data Extraction Scope: Index was outside the bounds of the array. I am facing this error.

    • @aakashm.2495
      @aakashm.2495 4 years ago

      @@patilrc Data Extraction Scope: Index was outside the bounds of the array. I am facing this issue.

    • @botbotgo4902
      @botbotgo4902 4 years ago

      @Year Down This mainly happens because the classifier is not able to classify your document. You have to check whether the classification worked, and if it did not, extract the data manually in the Present Validation Station. To check whether classification worked:
      1. After the classification scope activity, add an *If Activity*
      2. In the *If Activity*, check the condition that *classificationResult.Any* is True
      3. Move your *data extraction scope* into the true section
      4. In the false section, add an *Assign activity* and assign extractionResults = Nothing
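
The four steps above amount to a guard before indexing into the classification results. A plain-Python sketch of that control flow, with a made-up `fake_extract` standing in for the data extraction scope:

```python
from typing import Callable, List, Optional


def extract_if_classified(
    classification_results: List[str],
    extract: Callable[[str], dict],
) -> Optional[dict]:
    """Run extraction only when at least one classification result exists."""
    if not classification_results:
        # Equivalent of the false branch: extractionResults = Nothing.
        return None
    # Safe to index now; an empty list here is what raises "index out of bounds".
    return extract(classification_results[0])


def fake_extract(doc_type: str) -> dict:
    # Hypothetical extractor that just echoes the document type.
    return {"document_type": doc_type}


ok = extract_if_classified(["invoice"], fake_extract)
skipped = extract_if_classified([], fake_extract)
```

When the result is empty, the document would go to manual entry in the validation step instead of failing inside the extraction scope.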

  • @ponnusamyk5258
    @ponnusamyk5258 3 years ago

    Man, is there a link to download the invoice file?

  • @allthecommonsense
    @allthecommonsense 2 years ago

    You overcomplicated the classification keywords by using "Add a new set" instead of just typing the right syntax into the first set to add multiple keywords. No need to have more than 1 set in these examples.

  • @umaramnath1961
    @umaramnath1961 3 years ago

    Hello! I followed your tutorial. I am trying to extract data from the receipt using the ML Extractor. I used "du.uipath.com/ie/receipts" as the endpoint, but I am not getting the dropdown under the ML Extractor while defining the attributes of the document to be extracted. Can you please help me solve this?

  • @tejasvimangal2184
    @tejasvimangal2184 3 years ago

    Thanks for a great, informative video. I just had one question. If we have to define a keyword.json file for document classification, then what is the use of taxonomy.json?

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      The taxonomy identifies the fields that need to be extracted; the bot then extracts those same fields using Intelligent OCR.