Donut 🍩 : OCR-free Document Understanding Transformer (Research Paper Walkthrough)

  • Published 26 Sep 2024

COMMENTS • 20

  • @ravi03071991 2 years ago +1

    Interesting work from the researchers. Prompting has totally changed the way models are built.

  • @hectorlioneluranlandaburu1898 2 years ago +2

    Thanks! Great explanation. In practice, I've noticed Donut works fairly OK when samples are very similar to the ones in the training set (say, CORD). In a typical production scenario, one would normally try to maximize generalization to avoid maintaining multiple ML endpoints, each custom-trained for a given family of documents. In this regard, I think Donut is still not market-friendly. Still, a great addition nonetheless. Thanks for the video!

    • @TechVizTheDataScienceGuy 2 years ago +1

      Hey, thanks for appreciating the effort. Yeah, agreed that you can't fully rely on out-of-the-box checkpoints for serving in prod. Having said that, fine-tuning on data from the expected distribution would surely help. Thanks!

    • @BrianMorgan-z9d 7 months ago

      What solution is market friendly in your opinion?

  • @diegocaples5602 2 years ago

    Thanks for an amazing video! It was really useful.

  • @JeRrY31st 2 years ago +1

    Great explanation. 👍

  • @DrAIScience 2 months ago

    Thanks. Can Donut be used to detect text regions such as captions, page numbers, and serial numbers, and to classify them?

    • @TechVizTheDataScienceGuy 2 months ago

      You could train it to extract the specific entries you need from a PDF, which would effectively simulate region-specific extraction.

  • @muskanrath7125 5 months ago

    How do I call the model? There is no pipeline mentioned.
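For readers with the same question, here is a minimal sketch of calling a Donut checkpoint directly, without a pipeline. It assumes the Hugging Face `transformers` and `Pillow` packages (plus PyTorch) are installed. The checkpoint name `naver-clova-ix/donut-base-finetuned-cord-v2` and the task prompt `<s_cord-v2>` are assumptions based on the CORD fine-tune mentioned in this thread; the video does not specify them. The function names `run_donut` and `clean_donut_sequence` are hypothetical helpers, not part of any library.

```python
import re


def clean_donut_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Strip EOS/PAD tokens and the leading task-prompt tag from decoded output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, which is the task prompt (e.g. "<s_cord-v2>").
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def run_donut(
    image_path: str,
    checkpoint: str = "naver-clova-ix/donut-base-finetuned-cord-v2",  # assumed checkpoint
    task_prompt: str = "<s_cord-v2>",  # assumed prompt for that checkpoint
) -> dict:
    """Run one document image through Donut and return the parsed JSON-like dict."""
    # Heavy dependencies are imported lazily so the helper above works without them.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    model.eval()

    # Encode the image for the vision encoder and the prompt for the text decoder.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    decoded = processor.batch_decode(outputs.sequences)[0]
    cleaned = clean_donut_sequence(
        decoded, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # Convert the tag sequence (<s_key>value</s_key>) into a nested dict.
    return processor.token2json(cleaned)
```

Calling `run_donut("receipt.png")` (a hypothetical file name) would download the checkpoint on first use and return the extracted fields as a dict. A different fine-tune would need its own checkpoint name and task prompt.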

  • @sarthakmohanty6691 1 year ago

    Thanks for the detailed explanation. Does it support prediction of Arabic text?

    • @TechVizTheDataScienceGuy 9 months ago

      I don’t think so. You can check the HF website or the paper. But it could be done if the model is pretrained on Arabic text.

  • @ShivShankarDutta1 2 years ago

    Great explanation.
    Could you please create a video on LayoutLMv3 and patch embedding?

  • @nikhilsinghnine 1 year ago

    Hi, thanks for the video. Is it possible to get confidence scores for the predictions and respective bounding boxes with this donut model?

    • @TechVizTheDataScienceGuy 1 year ago

      Yeah, I guess it should already be returning them.

    • @nikhilsinghnine 1 year ago

      @TechVizTheDataScienceGuy If possible, could you please point me to a resource or reference for this? I am unable to get them.

    • @Alrajput31 1 year ago

      Were you able to get the scores and boxes?
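On the confidence-score half of this question, one generic approach for any autoregressive decoder is to take the probability the model assigned to each token it emitted and aggregate those probabilities per sequence. The sketch below shows that idea in plain Python. It is not Donut's own API, and it does not address bounding boxes. It assumes you already have the per-step logits; with Hugging Face `generate`, these can be requested by passing `output_scores=True` together with `return_dict_in_generate=True`. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the logits of one decoding step."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits, chosen_ids):
    """Probability the decoder assigned to each token it actually emitted."""
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, chosen_ids)]


def sequence_confidence(token_probs):
    """Geometric mean of token probabilities, i.e. a length-normalised score."""
    if not token_probs:
        return 0.0
    # Clamp to avoid log(0) if a probability underflows.
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_probs)
    return math.exp(log_sum / len(token_probs))
```

To score a single extracted field rather than the whole output, the same aggregation can be applied to just the tokens that fall between that field's opening and closing tags.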