How To Extract PDF File Table Data Using Amazon Textract and AWS Lambda Asynchronously

  • Published Jan 7, 2025

COMMENTS • 60

  • @QuanNguyen-z2g
    @QuanNguyen-z2g 9 months ago +1

    I would like to know whether this solution can extract structured tables that are embedded as [IMAGES] in a PDF file. Parsing images inside a PDF seems a bit tricky and likely requires more overhead. How can the Lambda code be adapted to meet this requirement? I look forward to your advice. Thank you.

    • @cloudquicklabs
      @cloudquicklabs 9 months ago

      Thank you for watching my videos.
      It should be able to support it.
      Please try it from your side and let me know if it works for you. It is indeed a unique requirement.

    • @QuanNguyen-z2g
      @QuanNguyen-z2g 9 months ago

      May I have your corporate email so that I can contact you? Thank you.

  • @santhoshsreshta
    @santhoshsreshta 9 months ago +1

    Can you help me parse the table data into JSON format? I don't see much documentation on this, as the data has to be fetched based on relationship IDs. Ideally, I am looking to read a multi-page PDF containing multiple tables and convert these tables to JSON (title, header, cell, footer).

    • @cloudquicklabs
      @cloudquicklabs 9 months ago

      Thank you for watching my videos.
      Indeed, it should be possible, but you would need to write custom code to transform the Textract output.
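
For the question above about building JSON from Textract table output via relationship IDs, here is a minimal sketch in Python. It assumes the usual Textract response shape, where TABLE blocks point to CELL blocks and CELL blocks point to WORD blocks through CHILD relationships. The helper names (child_ids, cell_text, tables_to_json) and the sample data are hypothetical and are not taken from the video's code; title, header, and footer handling is left out.

```python
import json


def child_ids(block):
    """Return the IDs listed under the block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships") or []:
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def cell_text(cell, by_id):
    """Join the WORD children of a CELL block into one string."""
    words = []
    for cid in child_ids(cell):
        child = by_id[cid]
        if child["BlockType"] == "WORD":
            words.append(child["Text"])
    return " ".join(words)


def tables_to_json(blocks):
    """Convert Textract blocks into a list of {"page", "rows"} dicts."""
    by_id = {b["Id"]: b for b in blocks}
    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        grid = {}  # row index -> {column index -> text}
        for cid in child_ids(block):
            cell = by_id[cid]
            if cell["BlockType"] != "CELL":
                continue
            row = grid.setdefault(cell["RowIndex"], {})
            row[cell["ColumnIndex"]] = cell_text(cell, by_id)
        rows = [[cols[c] for c in sorted(cols)] for _, cols in sorted(grid.items())]
        tables.append({"page": block.get("Page"), "rows": rows})
    return tables


def _word(wid, text):
    return {"Id": wid, "BlockType": "WORD", "Text": text}


def _cell(cid, row, col, word_id):
    return {"Id": cid, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": [word_id]}]}


# Hypothetical, hand-written sample that mimics a 2x2 table on page 1.
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE", "Page": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, "w1"), _cell("c2", 1, 2, "w2"),
    _cell("c3", 2, 1, "w3"), _cell("c4", 2, 2, "w4"),
    _word("w1", "Item"), _word("w2", "Qty"), _word("w3", "Pen"), _word("w4", "2"),
]

if __name__ == "__main__":
    print(json.dumps(tables_to_json(SAMPLE_BLOCKS), indent=2))
```

For a multi-page PDF, the same function could be run over the combined Blocks of all result pages, since each TABLE block carries its own Page value.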

  • @QuanNguyen-z2g
    @QuanNguyen-z2g 9 months ago +1

    I would like to know whether this solution can extract structured tables that are embedded as [IMAGES] in a PDF file. How can the Lambda code be adapted to meet this requirement? I look forward to your advice. Thank you.

    • @cloudquicklabs
      @cloudquicklabs 9 months ago

      Thank you for watching my videos.
      It should be able to support it.
      Please try it from your side and let me know if it works for you. It is indeed a unique requirement.

    • @QuanNguyen-z2g
      @QuanNguyen-z2g 9 months ago

      @cloudquicklabs Thank you for your advice.

    • @QuanNguyen-z2g
      @QuanNguyen-z2g 9 months ago

      @cloudquicklabs May I have your corporate email to contact you? Thank you.

  • @nazerhussain804
    @nazerhussain804 9 months ago +1

    Hi sir, I have a use case where I have to read multi-page invoice details and send them in JSON format. I have watched your video, but I'm unable to get it working properly, so please guide me.

    • @cloudquicklabs
      @cloudquicklabs 9 months ago

      Thank you for watching my videos.
      Did you try watching my other video? ua-cam.com/video/z_QU3FBpwNc/v-deo.htmlsi=X185h5Zwj5SePAaQ It could help you find a solution here.

  • @prashantmittal9772
    @prashantmittal9772 1 year ago +1

    @cloudquicklabs Hi, can you give more information on invoking the second Lambda?
    It's the same situation for me as for others: it is not getting invoked or triggered by SQS. I checked SNS, and it is getting the response from the first Lambda function.

    • @cloudquicklabs
      @cloudquicklabs 1 year ago

      Thank you for watching my videos.
      Did you check whether the roles that were created have the required permissions to push messages to SQS?

    • @prashantmittal9772
      @prashantmittal9772 1 year ago

      @cloudquicklabs Hi, can you check your first Lambda function code? It is not able to send any message to the SNS topic.
      I have tested this in small steps.
      Thanks in advance if you can help me out by checking your code once.

  • @adityakommu344
    @adityakommu344 1 year ago +1

    I am getting messages in flight in SQS, but I don't see data in S3. What might be the issue?

    • @cloudquicklabs
      @cloudquicklabs 1 year ago

      Thank you for watching my videos.
      This could be because you have not added the permission or the logic to delete the read messages from the queue. Please check on it.

  • @niteshrawat432
    @niteshrawat432 8 months ago +1

    Hey @cloudquicklabs, it's showing this error in the test:
    Test Event Name
    test
    Response
    {
      "errorMessage": "'Records'",
      "errorType": "KeyError",
      "requestId": "4bbac1a8-23e2-404f-a25f-09945dee594d",
      "stackTrace": [
        "  File \"/var/task/lambda_function.py\", line 57, in lambda_handler\n    for record in event['Records'] :\n"
      ]
    }
    Function Logs
    START RequestId: 4bbac1a8-23e2-404f-a25f-09945dee594d Version: $LATEST
    How do I fix this error?

    • @cloudquicklabs
      @cloudquicklabs 8 months ago

      Thank you for watching my videos.
      It's a Lambda handler runtime error.
      Please check the format of the event you are passing here.

    • @niteshrawat432
      @niteshrawat432 8 months ago

      @cloudquicklabs What do you mean by the syntax of an event? Can you please elaborate?
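
On the KeyError above: the stack trace shows the handler reading event['Records'], and that key is only present when the event has the shape SQS delivers to Lambda. A console test event without a top-level "Records" list would fail at that line. The sketch below is an assumption about how this could be guarded, not the video's code; the handler body and the SAMPLE_SQS_EVENT values are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Process SQS records, and fail clearly when the event is not SQS-shaped."""
    records = event.get("Records")
    if not records:
        # A test event without "Records" ends up here instead of raising
        # KeyError: 'Records'.
        return {
            "statusCode": 400,
            "body": "Event has no 'Records' list; use an SQS-shaped test event.",
        }
    bodies = [record["body"] for record in records]
    return {"statusCode": 200, "body": json.dumps({"received": len(bodies)})}


# Hypothetical test event that mimics what SQS passes to Lambda.
SAMPLE_SQS_EVENT = {
    "Records": [
        {
            "messageId": "example-message-id",
            "eventSource": "aws:sqs",
            "body": json.dumps({"JobId": "example-job-id", "Status": "SUCCEEDED"}),
        }
    ]
}

if __name__ == "__main__":
    print(lambda_handler({"key1": "value1"}, None))
    print(lambda_handler(SAMPLE_SQS_EVENT, None))
```

Pasting an event shaped like SAMPLE_SQS_EVENT into the Lambda console test dialog should let the loop over event['Records'] run.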

  • @yogeshborkhade4653
    @yogeshborkhade4653 11 months ago +1

    I am facing: Exception in GetTableFromTextractResult and error is 'NoneType' object has no attribute 'key'. Any idea what the possible reason would be?

    • @cloudquicklabs
      @cloudquicklabs 11 months ago

      Thank you for watching my videos.
      Maybe you first need to watch my other video on extracting text, here: ua-cam.com/video/-SpHPW3RTx8/v-deo.html

  • @brucespencer6042
    @brucespencer6042 2 years ago +1

    I appreciate the video. Quick question: I have completed the tutorial, but when I run the test, it seems like my invoke Lambda isn't doing anything. I believe the only things I needed to change were the SNS topic ARN and the role on lines 6 and 7, correct? I've also added the Lambda layer and set the trigger to one of my buckets.
    Appreciate your time.

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      Thank you for watching my videos.
      Yes, indeed. You need to add the SNS topic ARN and an IAM role with sufficient permissions. Please let me know if you still face further issues.

    • @brucespencer6042
      @brucespencer6042 2 years ago +1

      @cloudquicklabs Apologies for the delay. I've made sure the role has sufficient permissions and matches what you've shown. After checking the Lambda logs, I can see that the Lambda activates, but the result does not make it to the second S3 bucket.

    • @brucespencer6042
      @brucespencer6042 2 years ago +1

      @cloudquicklabs However, it also seems like my second Lambda is not being invoked, since there are no logs to view. So maybe the problem is there. I also made sure there is a subscription between SQS and SNS.

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      Thank you for coming back on this.
      I could provide more support through my channel membership or freelance work (please reach me at vrchinnarathod@gmail.com). Please have a look at these if they would help you.

    • @SpacyDsgn
      @SpacyDsgn 1 year ago

      @brucespencer6042 Hello! Did you find out? Same problem here.

  • @meetakukde2029
    @meetakukde2029 11 months ago +1

    I am getting a key error in the result function. Can you please guide me?
    Error message: "records"
    Error type: KeyError

    • @cloudquicklabs
      @cloudquicklabs 11 months ago

      Thank you for watching my videos.
      This looks like a syntax error. You can refer to the code file shared in the video's description.

  • @Semaj1985
    @Semaj1985 1 year ago +1

    Hi, it's very useful, thanks.
    Can we get the output in CSV format?

    • @cloudquicklabs
      @cloudquicklabs 1 year ago

      Thank you for watching my videos.
      Converting the output to CSV should be a small code change. You just need to write the output to a .csv file instead of JSON.
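
As an illustration of the CSV suggestion above, here is a minimal sketch using Python's standard csv module. It assumes the table has already been reduced to a list of rows, each a list of cell strings; the function name rows_to_csv and the sample rows are hypothetical and not part of the video's code.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (lists of cell strings) into CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes cells that contain commas, quotes, or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample_rows = [["Item", "Qty"], ["Pen, blue", "2"]]
    print(rows_to_csv(sample_rows))
```

The returned string could then be written to the output bucket under a key ending in .csv, in the same place where the JSON file is written today.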

  • @AbhishekKumar-wx3dh
    @AbhishekKumar-wx3dh 1 year ago +1

    Sir, thanks for the video. One question: is it a good idea to upload the PDF directly to S3? Can we first take the PDF on our own microserver and then upload it to S3?

    • @cloudquicklabs
      @cloudquicklabs 1 year ago

      Thank you for watching my videos.
      This is a solution, so it works both ways. I suggest doing it as part of your application process: the upload could go through microservices or come directly from an external source to the S3 bucket.

  • @inhxuanhanh3978
    @inhxuanhanh3978 1 year ago +1

    Why use SQS when SNS can also trigger Lambda?

    • @cloudquicklabs
      @cloudquicklabs 1 year ago

      Thank you for watching my videos.
      SQS is required when you want to process a huge list of documents, to avoid overloading the Lambda processing. It is also a way to process asynchronously.

    • @inhxuanhanh3978
      @inhxuanhanh3978 1 year ago

      @cloudquicklabs Thank you for the explanation, the video is great.

  • @bhawnagupta8687
    @bhawnagupta8687 2 years ago +1

    I am getting a key error in the resultTextract function. Can you help me?

    • @bhawnagupta8687
      @bhawnagupta8687 2 years ago +1

      [ERROR] KeyError: 'Message'
      Traceback (most recent call last):
        File "/var/task/lambda_function.py", line 115, in lambda_handler
          qmessage = json.loads(modifiedEvent['Message'])

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      Thank you for watching my videos.
      It's an issue with the key and value you are looking up in the queue message. Please check the correct schema of the dictionary and use it accordingly in the code.

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      Thank you for watching my videos.
      Please check the queue message format and use it accordingly.
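
To make the "check the queue message format" advice concrete: when an SQS queue is subscribed to an SNS topic, the SQS record body is normally a JSON string holding an SNS envelope, and the Textract notification sits as another JSON string under the envelope's "Message" key. If raw message delivery is enabled on the subscription, the envelope is absent and the body is the notification itself, which would explain a KeyError: 'Message'. The sketch below handles both shapes; the function name extract_textract_message and the sample values are hypothetical, and the video's modifiedEvent variable may be built differently.

```python
import json


def extract_textract_message(record):
    """Return the Textract notification dict from one SQS record."""
    body = json.loads(record["body"])
    if "Message" in body:
        # Standard SNS -> SQS delivery: the payload is nested in the envelope.
        return json.loads(body["Message"])
    # Raw message delivery: the body already is the payload.
    return body


if __name__ == "__main__":
    notification = {"JobId": "example-job-id", "Status": "SUCCEEDED"}
    wrapped = {"body": json.dumps({"Type": "Notification",
                                   "Message": json.dumps(notification)})}
    raw = {"body": json.dumps(notification)}
    print(extract_textract_message(wrapped))
    print(extract_textract_message(raw))
```

Printing record["body"] to the Lambda logs once is a quick way to confirm which of the two shapes the queue actually delivers.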

    • @bhawnagupta8687
      @bhawnagupta8687 2 years ago +1

      @cloudquicklabs Thank you for your reply. I made that change. Now I am getting this: Exception in GetTableFromTextractResult and error is 'NoneType' object has no attribute 'key'

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      Thank you for coming back on this.
      This needs to be checked; it's just a Python runtime error. I could help you through my channel membership. Would you mind having a look at it?

  • @sandeepvaderarocks
    @sandeepvaderarocks 2 years ago +1

    What is PoC as a service?

    • @cloudquicklabs
      @cloudquicklabs 2 years ago +1

      Thank you for watching my videos.
      'PoC as a service' means I would conduct a proof of concept (PoC) of a technical scenario and turn it into a lab video for my channel. Please let me know what kind of proof of concept you are looking for.

    • @sandeepvaderarocks
      @sandeepvaderarocks 2 years ago

      What number can I reach you at?

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      My email address is vrchinnarathod@gmail.com.

  • @sucuklukolboregi1691
    @sucuklukolboregi1691 2 years ago

    Well, I have a question (beautiful video explaining everything, btw):
    How does this method differ from just calling the API?

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      Thank you for watching my videos.
      Yes, there is a difference: this solution supports asynchronous requests for PDF table data extraction.
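
To illustrate the asynchronous flow mentioned in the reply above: the Textract API offers start_document_analysis, which returns a JobId immediately and can publish a completion notice to an SNS topic, and get_document_analysis, which returns the finished result in pages linked by NextToken. The sketch below is one way these two calls might be wired together and is not the video's Lambda code. The function names start_table_job and collect_blocks are hypothetical, and the textract argument is assumed to be a client created elsewhere with boto3.client("textract").

```python
def start_table_job(textract, bucket, key, sns_topic_arn, role_arn):
    """Start an asynchronous table-analysis job and return its JobId."""
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES"],
        # Textract publishes the completion notice to this topic,
        # using the given role.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]


def collect_blocks(textract, job_id):
    """Fetch every result page of a finished job and return all blocks."""
    blocks = []
    kwargs = {"JobId": job_id}
    while True:
        page = textract.get_document_analysis(**kwargs)
        blocks.extend(page.get("Blocks", []))
        token = page.get("NextToken")
        if not token:
            return blocks
        kwargs["NextToken"] = token


# Assumed usage inside a Lambda (requires boto3 and AWS credentials):
#   import boto3
#   textract = boto3.client("textract")
#   job_id = start_table_job(textract, "my-bucket", "doc.pdf", TOPIC_ARN, ROLE_ARN)
#   ... later, once the SNS/SQS notification reports SUCCEEDED ...
#   blocks = collect_blocks(textract, job_id)
```

In this arrangement the first function would run in the Lambda triggered by the upload, and the second in the Lambda triggered by the queue, so neither one has to wait while Textract works through a long document.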

  • @Jaheer-k8h
    @Jaheer-k8h 1 year ago +1

    Hello sir. This process is working for multi-page PDFs also.

  • @balajic-bs8me
    @balajic-bs8me 2 years ago

    I watched this video. I need one clarification, bro: which part of extracting text from the PDF file does the Python trp module help with? Please explain the Python trp module, bro.
    Thanks in advance.

    • @cloudquicklabs
      @cloudquicklabs 2 years ago

      Thank you for watching my video.
      trp is an open-source Python module that we use to process the Textract output for PDF files.