
Text Extraction From a Corpus Using BERT (AKA Question Answering)

  • Published 19 Aug 2024

COMMENTS • 80

  • @abhishekkrthakur
    @abhishekkrthakur  4 years ago +21

    Kernel is available here: www.kaggle.com/abhishek/text-extraction-using-bert-w-sentiment-inference

    • @artemryzhikov3418
      @artemryzhikov3418 4 years ago +7

      Could you share all the code, including the training scripts? I'd be really grateful for that! (I haven't found a kernel with the training process on Kaggle.)

    • @ashishsingh-bv1rq
      @ashishsingh-bv1rq 4 years ago +2

      Can you please share the training code as well?

    • @harendrakumar7647
      @harendrakumar7647 3 years ago

      I have bought the book and it's on its way. I just want to know: does your book contain this part too? Also, let me know if there are any other books written by you.

  • @chapterme
    @chapterme 2 years ago +5

    Chapters (Powered by ChapterMe) -
    00:02 - Introduction
    02:35 - Designing the DataLoader
    10:34 - Implementing the DataLoader
    44:13 - Implementing the model
    01:24:40 - Kernel walkthrough

  • @marco_gorelli
    @marco_gorelli 4 years ago +4

    I had high expectations for this video, but somehow you managed to surpass them!
    You're a legend, Abhishek. Thank you!

  • @kiranchowdary8100
    @kiranchowdary8100 2 years ago

    I had paused my learning for a while, and these videos revived my interest. Thanks for the video, Abhishek.

  • @jackvial5591
    @jackvial5591 4 years ago +3

    Really enjoying these BERT videos. Great to see how you go about creating dataloaders for tasks that have a sequence as a target.

  • @CodeEmporium
    @CodeEmporium 4 years ago +1

    Found gold! Nice work, my guy.

  • @afzalkhan5094
    @afzalkhan5094 4 years ago +2

    It's a privilege to learn from the first 4x GM... please include some CV topics too. Thank you for making these videos available for free. Subscribed.

  • @renatoviolin
    @renatoviolin 4 years ago +2

    Thank you very much for your explanations. You made hard concepts and implementations easy to understand. 👍

  • @deepakkumarsuresh1921
    @deepakkumarsuresh1921 4 years ago

    Congratulations on reaching 10k subscribers. Your videos are practical and advanced. Looking forward to more such videos.

  • @JamesBond-ux1uo
    @JamesBond-ux1uo 2 years ago

    It would be a great help for beginners like me if you explained things a little more, for example why you use something (like why you take 2 outputs in the output layer, or why you choose a particular evaluation metric).

  • @anielvillegas4070
    @anielvillegas4070 2 years ago

    Great video! Thanks a lot, Abhishek. I'm a subscriber. Please make more videos!

  • @bertobertoberto3
    @bertobertoberto3 3 years ago

    Excellent

  • @vpsfahad
    @vpsfahad 4 years ago +3

    Good to learn from you; I appreciate your work. One small request: explain a little more, and it will be helpful for people like me with intermediate knowledge.

  • @houssemayed9272
    @houssemayed9272 4 years ago

    Thanks for this amazing tutorial on BERT; it's very useful for me.

  • @jirokaze6380
    @jirokaze6380 4 years ago

    Thanks, it helps to understand the steps in your Kaggle kernel.

  • @ax5344
    @ax5344 4 years ago

    Thanks so much for including the inference section.

  • @suryavikram3717
    @suryavikram3717 4 years ago

    Thank you for taking time to create this. :)

  • @JaskaranSingh-hp3zy
    @JaskaranSingh-hp3zy 4 years ago

    Thanks for this amazing video!
    Just did my first Kaggle submission with the same architecture in TensorFlow.

  • @thak456
    @thak456 4 years ago

    Thank you for your time and effort. Keep up the good work.

  • @tejkol123
    @tejkol123 4 years ago

    Thanks very much for the video.
    I believe it would be of great help to provide an outline of the code first, showing what we will be doing and why.

  • @bosepukur
    @bosepukur 4 years ago

    wonderful video

  • @meacedric2328
    @meacedric2328 4 years ago

    Thank you for sharing your knowledge. Your videos help me a lot. I still have a question about how BERT works for QA. Could you (or anyone who knows) please explain how, at prediction time on a new sentence, the question's tokens are taken into account to predict the embeddings of the context's tokens? Thank you in advance for your reply!

  • @artemryzhikov3418
    @artemryzhikov3418 4 years ago

    Thanks a lot for your kernel and video! Subscribed and upvoted your kernel. Keep doing good stuff!

  • @Simply-Charm
    @Simply-Charm 4 years ago

    Thank you so much!

  • @sumantthakur2383
    @sumantthakur2383 4 years ago

    Thanks a lot for this video. Please make a video on building forecasting models.

  • @krantikumar2886
    @krantikumar2886 4 years ago

    Thank you, Abhishek, for sharing. Can you please share some examples on speech-to-text and text-to-speech too?

  • @jitendersinghvirk47
    @jitendersinghvirk47 4 years ago +1

    Hi Abhishek, thanks for your helpful videos. I have a question: are you aware of the fast.ai library? If so, why don't you use it? Just asking out of curiosity, because I'm watching your platform videos.

  • @teetanrobotics5363
    @teetanrobotics5363 4 years ago

    Please keep updating your playlists

  • @arjungoalset8442
    @arjungoalset8442 4 years ago

    Thanks!

  • @as7070as
    @as7070as 2 years ago

    Could you please share the name of the coding environment you use? Or make a video about the best coding environments for NLP?

  • @jeevankumar3527
    @jeevankumar3527 4 years ago

    @Abhishek Thakur Could we have used the BertForQuestionAnswering model instead? Sorry if the question is inappropriate; I am a newbie to deep learning.
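
    A minimal sketch of what that could look like with the stock BertForQuestionAnswering class, assuming a recent Hugging Face transformers version; the model name and example strings are illustrative, and the span head is randomly initialized until it is fine-tuned on QA data:

    import torch
    from transformers import BertForQuestionAnswering, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

    question = "What is the sentiment about?"
    context = "my boss is bullying me and I hate it"
    inputs = tokenizer(question, context, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Most likely start/end token positions (assumes transformers v4+, where
    # outputs expose .start_logits / .end_logits).
    start = outputs.start_logits.argmax(dim=-1).item()
    end = outputs.end_logits.argmax(dim=-1).item()
    print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))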

  • @8003066717
    @8003066717 2 years ago

    Hi, I have a dataset that contains three features (word, tag, seq) and I want to apply a BERT model for NER. How can I proceed with that?
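
    The usual framing for word/tag data is token classification. A minimal sketch under that assumption, using a recent transformers version; the label list and model name are illustrative, and the classification head is untrained until you fine-tune it:

    import torch
    from transformers import BertForTokenClassification, BertTokenizerFast

    labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # example tag set
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForTokenClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(labels)
    )

    words = ["John", "lives", "in", "Berlin"]
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

    with torch.no_grad():
        logits = model(**enc).logits  # (1, seq_len, num_labels)

    pred_ids = logits.argmax(dim=-1)[0]
    # word_ids() maps each word piece back to its word; subwords repeat the word.
    for tok_idx, word_idx in enumerate(enc.word_ids()):
        if word_idx is not None:
            print(words[word_idx], labels[pred_ids[tok_idx].item()])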

  • @yerdauletalibayev602
    @yerdauletalibayev602 4 years ago

    Hello! Big thanks! Learning a lot from your videos. If you have time, can you please make a short video on how you use GCP/AWS services?

    • @abhishekkrthakur
      @abhishekkrthakur  4 years ago +1

      Thank you very much! GCP/AWS is huge. Are there any particular areas you are interested in?

    • @kushalavm3758
      @kushalavm3758 3 years ago

      @@abhishekkrthakur Amazon SageMaker, maybe?

  • @SAINIVEDH
    @SAINIVEDH 3 years ago

    In the model, can't we change the self.l0 layer to nn.Linear(768, num_tokens)? Would there be any difference in output accuracy?

    • @SAINIVEDH
      @SAINIVEDH 3 years ago

      There are 2 separate linear layers (l0, l1), one for the start token and one for the end token (see the sketch below).
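
      A minimal sketch of the idea, assuming a recent transformers version; the layer name l0 follows the comment above rather than the video's exact code. A single nn.Linear(768, 2) split along the last dimension plays the same role as two separate Linear(768, 1) layers for start and end:

      import torch
      import torch.nn as nn
      import transformers

      class BertSpanModel(nn.Module):
          def __init__(self):
              super().__init__()
              self.bert = transformers.BertModel.from_pretrained("bert-base-uncased")
              self.drop = nn.Dropout(0.3)
              self.l0 = nn.Linear(768, 2)  # one head scoring start and end per token

          def forward(self, ids, mask, token_type_ids):
              # Assumes transformers v4+, where outputs are ModelOutput objects.
              out = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids)
              sequence_output = out.last_hidden_state        # (batch, seq_len, 768)
              logits = self.l0(self.drop(sequence_output))   # (batch, seq_len, 2)
              start_logits, end_logits = logits.split(1, dim=-1)
              return start_logits.squeeze(-1), end_logits.squeeze(-1)

      Swapping in nn.Linear(768, num_tokens) would give every token a score for each position, which is a different (and much larger) parameterization than per-token start/end scores, so the results would likely change.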

  • @saadebad4123
    @saadebad4123 2 years ago

    Sir, can we use BERT from other programming languages like Java or C#?

  • @kushagrabhatiaIXb
    @kushagrabhatiaIXb 4 years ago

    Hey, I had two small queries. First, can we use softmax instead of sigmoid, as suggested in the original BERT paper (Devlin et al., 2019) for SQuAD? Second, rather than using the first start and end index (with value > threshold), would it improve performance to choose the start and end based on the indices having the maximum sum of index scores (with the condition start
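
    On the first query, a minimal sketch of the softmax-style decoding described in the paper: softmax over token positions for start and end, then pick the best (start, end) pair with start <= end. The function and the threshold-free decoding below are illustrative, not the kernel's exact post-processing:

    import torch

    def best_span(start_logits, end_logits, max_answer_len=30):
        # start_logits, end_logits: 1-D tensors of per-token scores, shape (seq_len,).
        # Softmax (in log space) over token positions, per the SQuAD formulation.
        start_scores = torch.log_softmax(start_logits, dim=-1)
        end_scores = torch.log_softmax(end_logits, dim=-1)
        # Score every (i, j) pair; keep j >= i and spans up to max_answer_len tokens.
        scores = start_scores[:, None] + end_scores[None, :]
        valid = torch.triu(torch.ones_like(scores), diagonal=0)
        valid = valid * torch.tril(torch.ones_like(scores), diagonal=max_answer_len - 1)
        scores = scores.masked_fill(valid == 0, float("-inf"))
        flat = scores.argmax().item()
        start, end = divmod(flat, scores.size(1))
        return start, end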

  • @eugeneware3296
    @eugeneware3296 4 years ago

    Really fantastic video. Thanks so much. How did you mirror/record your iPad to your Linux desktop? What software did you use?

    • @eugeneware3296
      @eugeneware3296 4 years ago

      Realised you might have just recorded locally and then imported and synced the video in post. Is that right?

  • @shahules4432
    @shahules4432 4 years ago

    Hi sir, can we frame this same problem as NER? I know it is more similar to SQuAD-style problems, but I also feel it is similar to NER.

  • @nsuryapa1
    @nsuryapa1 4 years ago +1

    What is the software you are using for whiteboarding? It is good... though that's an off-topic question in this context.

  • @dyogesh2303
    @dyogesh2303 4 years ago

    Hi Abhishek, thanks for the video. I have one doubt; it would be great if you could help me with it. How big a corpus can BERT handle for text extraction? Also, in SQuAD, how large a context can it accept when finding the answer? Is there a hyperparameter for that which we can tune?

  • @saifeddineazzabi4377
    @saifeddineazzabi4377 4 years ago

    First, thanks for this great work. I have a question: why is padding_len: tensor(93)

  • @ashmanikumar2488
    @ashmanikumar2488 4 years ago

    Hi Abhishek,
    I am working on an NLG model for data-to-text. I am running into problems like out-of-vocabulary words at test time, and some fields such as date of birth are always predicted wrong in the output. Any suggestions on how to overcome this?

  • @user-zy9hj9qx7p
    @user-zy9hj9qx7p 1 year ago

    How do I create question-answer pairs from a piece of text?

  • @dipF50
    @dipF50 4 years ago

    An XLNet video would really help...

  • @user-vs1jb6dm2d
    @user-vs1jb6dm2d 4 years ago

    Hi, hope you answer. Sometimes word != tokenizer.decode(tokenizer.encode(word)); we sometimes get different values, and they matter when calculating the Jaccard metric. Any ideas how to fix it? Also, why do you use the sequence output when you could use the pooled output?
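
    One common workaround is to keep the character offsets returned by the tokenizer and slice the original string, rather than round-tripping through decode (which loses casing, accents and unknown words). A minimal sketch, assuming the huggingface tokenizers package and a local vocab file; the path and the predicted span indices are illustrative:

    from tokenizers import BertWordPieceTokenizer

    tokenizer = BertWordPieceTokenizer("vocab.txt", lowercase=True)  # path is illustrative

    text = "Café olé is GREAT"
    enc = tokenizer.encode(text)
    print(enc.tokens)   # normalized word pieces (lowercased, accents stripped)
    print(enc.offsets)  # (start_char, end_char) pairs into the ORIGINAL text

    # Suppose the model predicts word pieces 3..4 as the answer span.
    start_tok, end_tok = 3, 4
    char_start = enc.offsets[start_tok][0]
    char_end = enc.offsets[end_tok][1]
    print(text[char_start:char_end])  # exact original substring, so Jaccard is computed on real text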

  • @avigupta2612
    @avigupta2612 4 years ago

    Can you suggest some other datasets for question answering tasks?

  • @darpan810
    @darpan810 3 years ago

    How can you write all these Python functions so fast? This is great; you are as tez (fast) as your library.

  • @ranitchatterjee5552
    @ranitchatterjee5552 2 years ago

    This video is based on sentiment analysis, right?

  • @onlinekamaowithms1341
    @onlinekamaowithms1341 2 years ago

    I want to build a QA/retrieval system for the legal information domain. How can I start?

  • @dhruvgangwani469
    @dhruvgangwani469 4 years ago

    Can I use encode_plus() to get the input IDs, attention mask, and token type IDs? Just asking.
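
    Yes. A minimal sketch with encode_plus, assuming a recent transformers version; the strings and max_length are illustrative:

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    enc = tokenizer.encode_plus(
        "negative",                   # sentiment used as the "question"
        "my boss is bullying me...",  # tweet used as the "context"
        max_length=128,
        padding="max_length",
        truncation=True,
    )
    print(enc["input_ids"][:12])
    print(enc["attention_mask"][:12])
    print(enc["token_type_ids"][:12])

    With the fast tokenizer (BertTokenizerFast) you can also pass return_offsets_mapping=True to get the character offsets discussed in other comments.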

  • @harshgupta-vn1tl
    @harshgupta-vn1tl 1 year ago

    Which editor is he using in the video?

  • @kushalavm3758
    @kushalavm3758 3 years ago

    May I know the editor, please? Someone let me know...

  • @ankushjindal9536
    @ankushjindal9536 4 years ago

    Can you show how to perform question answering with BERT on the SQuAD v2.0 dataset?

  • @alexandremondaini
    @alexandremondaini 4 years ago

    Which IDE do you use?

  • @bhavikapanara9550
    @bhavikapanara9550 4 years ago

    Thanks, Abhishek, it's a good tutorial. However, after running it, I got an error on line 78 of the engine.py file:
    78. fin_padding_lens.extend(padding_len.cpu().detach().numpy().tolist())  # [engine.py]
    AttributeError: 'tuple' object has no attribute 'cpu'
    Can you please help me resolve it?
    Thanks,
    Bhavika
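
    That error usually means padding_len reaches engine.py as a plain Python tuple rather than a batched tensor. A hedged guess at the fix: return it from the Dataset as a tensor, so the DataLoader collates it into something with a .cpu() method. A minimal sketch under that assumption; the class and field names are illustrative, not the kernel's exact code:

    import torch
    from torch.utils.data import Dataset

    class TweetDataset(Dataset):
        def __init__(self, tweets, max_len=128):
            self.tweets = tweets
            self.max_len = max_len

        def __len__(self):
            return len(self.tweets)

        def __getitem__(self, idx):
            tokens = self.tweets[idx].split()[: self.max_len]  # stand-in for real tokenization
            padding_len = self.max_len - len(tokens)
            return {
                # Returning a tensor lets the DataLoader collate a batch that
                # supports .cpu().detach().numpy() in the training loop.
                "padding_len": torch.tensor(padding_len, dtype=torch.long),
            }

    Alternatively, if padding_len really arrives as a tuple of ints, fin_padding_lens.extend(list(padding_len)) sidesteps the .cpu() call entirely.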

  • @uthamkanth2505
    @uthamkanth2505 4 years ago

    Thanks for sharing it. I ran OCR with two different Python libraries on a large dataset. Can you suggest the best metric and approach to compare the two libraries' outputs (text files) in terms of text quality and content, including spaces, special characters, etc., so that I can choose one of them for my further tasks? Awaiting your reply :)

  • @rajeshpanthri7149
    @rajeshpanthri7149 4 years ago

    Hi Abhishek, which code editor are you using?

  • @chadchang1784
    @chadchang1784 4 years ago

    Does anyone know why he uses the tokenizers package instead of the tokenizer built into BERT/transformers?

    • @JaskaranSingh-hp3zy
      @JaskaranSingh-hp3zy 4 years ago

      The built-in tokenizer of transformers does not offer "offsets", which are used in preparing the target vectors (see the sketch after this thread).

    • @kumarsundaram4659
      @kumarsundaram4659 4 years ago

      @@JaskaranSingh-hp3zy Hi, I am new to NLP and don't know the meaning of 'offsets' in NLP terms. Can you please elaborate a bit: why/where/when are they used, and in what scenarios do we need them? Thanks for your patience and kind help! :)

    • @shivamsrivastava3076
      @shivamsrivastava3076 4 years ago

      ​@@kumarsundaram4659 github.com/huggingface/tokenizers#quick-examples-using-python
      Follow this. It might help you :)
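
      To make the offsets idea concrete, here is a minimal sketch of how character offsets turn a selected substring into token-level start/end targets. It assumes the huggingface tokenizers package and a local vocab file; the variable names are illustrative rather than the video's exact code:

      from tokenizers import BertWordPieceTokenizer

      tokenizer = BertWordPieceTokenizer("vocab.txt", lowercase=True)  # path is illustrative

      tweet = "my boss is bullying me..."
      selected = "bullying me"
      char_start = tweet.find(selected)
      char_end = char_start + len(selected)

      enc = tokenizer.encode(tweet)
      # Keep the word pieces whose character offsets fall inside the selected span;
      # special tokens have (0, 0) offsets and are skipped by the length check.
      target_idx = [
          i for i, (off_start, off_end) in enumerate(enc.offsets)
          if off_start >= char_start and off_end <= char_end and off_end > off_start
      ]
      start_target, end_target = target_idx[0], target_idx[-1]
      print(enc.tokens[start_target:end_target + 1])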

  • @denisgontcharov7307
    @denisgontcharov7307 4 years ago

    Hi Abhishek,
    I really like how you organize your work. I have been using your structure with create_folds, engine, config, etc. myself since your Bengali video, and I'm a big fan of it.
    I'm now at the point where I want to experiment with different models, hyperparameters and post-processing for the Tweet Sentiment Kaggle competition.
    I struggle a bit with keeping track of my modifications to make sure I can reproduce the models.
    How do you keep track of changes to your scripts throughout a competition?
    Do you still use the model dispatcher from your Bengali video?
    I had the idea of relying on Git by making a branch of my repo each time I train a model.
    Other Kagglers suggest making a copy of the entire repo each time.
    I think both solutions have their drawbacks, given that 99% of the code doesn't change.
    I found some great ideas in this Kaggle thread, but I wanted to know how you manage it:
    www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/42416
    I realize this is a vast subject. Perhaps you could just share a few tips?
    I believe this would be a subject for a very interesting YouTube video as well!
    Thanks again for putting so much effort into your videos, I learn a lot from your code!
    edit: formatting

  • @NemishKanwar
    @NemishKanwar 4 years ago

    Why is the valid batch size twice the train batch size? Since there is no backprop during validation, isn't it better to have it the other way around? Based on the discussion here: forums.fast.ai/t/different-batch-size-for-train-and-valid-data-loaders/30691

  • @menon92
    @menon92 4 years ago

    You are doing great work with these kinds of tutorials for people like me. I would like to request one thing: when you explain a code portion, please add a small example like the one you use in this video at ua-cam.com/video/XaQ0CBlQ4cY/v-deo.html

  • @EGspider
    @EGspider 2 years ago

    Why don't you type on the keyboard? It's MUCH faster and clearer.

  • @rajputjay9856
    @rajputjay9856 3 years ago +1

    (base) self-made-lol@selfmadelol:~/Desktop/sentiment analysis/src$ python dataset.py
    Errors: Os { code: 2, kind: NotFound, message: "No such file or directory" }
    Traceback (most recent call last):
      File "dataset.py", line 1, in <module>
        import config
      File "/home/self-made-lol/Desktop/sentiment analysis/src/config.py", line 16, in <module>
        lowercase=True
      File "/home/self-made-lol/anaconda3/lib/python3.7/site-packages/tokenizers/implementations/bert_wordpiece.py", line 30, in __init__
        tokenizer = Tokenizer(WordPiece(vocab_file, unk_token=str(unk_token)))
    Exception: Error while initializing WordPiece
    Does anyone know how to solve this?
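
    The Rust-side NotFound error followed by "Error while initializing WordPiece" usually means the vocab file path passed to BertWordPieceTokenizer in config.py does not exist. A minimal sketch of a guard, with an illustrative path:

    import os
    from tokenizers import BertWordPieceTokenizer

    VOCAB_PATH = "../input/bert-base-uncased/vocab.txt"  # adjust to wherever vocab.txt actually lives

    if not os.path.exists(VOCAB_PATH):
        raise FileNotFoundError(f"vocab file not found: {VOCAB_PATH}")

    TOKENIZER = BertWordPieceTokenizer(VOCAB_PATH, lowercase=True)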