End To End NLP Project Implementation With Deployment Github Action- Text Summarization- Krish Naik
- Published 21 Jul 2024
- In this video I will explain how to implement an end-to-end NLP project with deployment using GitHub Actions
Text summarization is the process of condensing a longer piece of text, such as an article, document, or paragraph, into a shorter and more concise version while preserving its key information and meaning. The goal of text summarization is to extract the most important points and main ideas from the original text and present them in a condensed form.
Github Code Link: github.com/krishnaik06/Text-S...
Support me by joining the membership so that I can upload these kinds of videos
/ @krishnaik06
Timestamp
00:00:00 - Introduction
00:00:52 - Project Introduction
00:06:33 - Prerequisite
00:07:49 - Introduction of the Instructor
00:08:54 - What is Text Summarization (Online Demo)
00:11:49 - Github Repository Setup
00:14:46 - Project Template Creation
00:40:04 - Requirements Installation & Project Setup
00:51:42 - Logging, Exception & Utils Modules
01:08:39 - Entire Project Notebook Experiment
01:24:17 - Project Workflows
01:32:44 - Data Ingestion Notebook Experiment
01:52:12 - Data Ingestion Final Implementation
02:03:28 - Data Validation Notebook Experiment
02:11:04 - Data Validation Final Implementation
02:20:13 - Data Transformation Notebook Experiment
02:25:45 - Data Transformation Final Implementation
02:30:59 - Model Trainer Notebook Experiment
02:42:18 - Model Trainer Final Implementation
02:47:52 - Model Evaluation Notebook Experiment
02:53:37 - Model Evaluation Final Implementation
02:58:27 - Prediction Pipeline & User App
03:08:31 - Project CI/CD Deployment on AWS
03:36:41 - Outro/ End of the Project
---------------------------------------------------------------------------------------------------------
Join the PWSKILLS Data Science Masters Course
Best Affordable Data Science Course From PWSkills (6-7 Months)
Impact Batch 2.0:- Data-Science-Masters (Full Stack Data Science)
1. Data Science Masters Hindi: bit.ly/3TPdrDz (Hindi)
2. Data Science Masters English: bit.ly/40gZ9hn (English)
Direct call to our team in case of any queries
+918660034247
+919880055539
+918147625763
+918951939425
------------------------------------------------------------------------------------------------------------
►Data Science Projects:
• Now you Can Crack Any ...
►Learn In One Tutorials
Statistics in 6 hours: • Complete Statistics Fo...
Machine Learning In 6 Hours: • Complete Machine Learn...
Deep Learning 5 hours : • Deep Learning Indepth ...
►Learn In a Week Playlist
Statistics: • Live Day 1- Introducti...
Machine Learning : • Announcing 7 Days Live...
Deep Learning: • 5 Days Live Deep Learn...
NLP : • Announcing NLP Live co...
►Detailed Playlist:
Stats For Data Science In Hindi : • Starter Roadmap For Le...
Machine Learning In English : • Complete Road Map To B...
Machine Learning In Hindi : • Introduction To Machin...
Complete Deep Learning: • Why Deep Learning Is B...
Hello guys,
Going forward, every Sunday there will be an end-to-end project uploaded on my channel. I hope you learn well and crack any job and interview. Love you all ❣
❤❤
Awesome!
Thanks a lot, 😃
thank you ☺
waiting for another project
While watching any end-to-end project you will face many errors, like many errors. Don't quit after encountering an error; search the internet, YouTube, ChatGPT, and other resources, but don't quit. I had at least 10 errors yesterday and was up till 5 am correcting them. So keep pushing yourself and never quit. :))
Sir, we are facing many problems in this code and have wasted 3-4 days with no result. Please help us; we are making this project for academic requirements.
Can you provide us the main part of the code? Please help us.
bro can we connect??
on discord or anything?
Did u face errors in the data transformation ipynb file?
@@taeshikookfictions9830 hello bro
Sir, I have been learning data science from your content on YouTube for the last 5 months. I found your material on machine learning very informative and helpful. Thanks a lot for sharing your experience and knowledge with us ❤🙏 sir
KRISH YOU'RE THE BEST
GOD BLESS YOU FOR ALL OF THIS INVALUABLE CONTENT. PLEASE KEEP THEM COMING. I DEEPLY APPRECIATE THEM.
Thank sir for the great work.
All your tutorials are awesome.
May God give you the zest to be helping us . 🙏🙏
Sumaila from Accra, Ghana.
superb content, amazing explanation. Thank you very much for the project
Thanks Krish ❤
Hard working soul.
It is asking me to use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 and I am unable to find where I should put this
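In case it helps (a sketch, with the caveat PyTorch itself gives: disabling the limit may destabilize the system): the variable has to be in the process environment before torch is imported, either exported in the shell that launches Jupyter or set at the very top of the notebook:

```python
import os

# Must run BEFORE `import torch`, otherwise the setting has no effect.
# PyTorch's own message warns this "may cause system failure", so
# lowering the batch size in the trainer arguments is the safer fix.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"
```

If training still crashes, reducing per-device batch size (and using gradient accumulation) lowers peak memory without disabling the limit.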
The video itself was amazing, the workflows and the directory design were very good and essential for learning, but it should be mentioned beforehand that the model that would be trained would not fetch good results, as the rouge score comes out to be nearly 0.02.
I trained the model on the training dataset for 1 epoch and it took my computer 2 days. If a pretrained version of the model is available, please let us know; otherwise the ROUGE score of the whole project just falls down a lot.
I think if the number of epochs is increased the rouge score might increase but that would be very computationally expensive to do from our side.
My kernel crashes when I try to run the model trainer. Why would that be happening? Should I use a classical machine learning approach instead?
How to create the conda environment in vs code? or in the git cmd
Also, is there a way for me to train my model on Colab and, from Colab, use JSON or any API to get the result into Visual Studio / my local machine?
I have a question. I see that in the evaluation phase the ROUGE value is very small, which means the model does not work well? Can you explain to me why?
Is there any video about implementing end to end Linemod for example or PVN3D? from the github?
Approximately how much time does it take for the model trainer to run? In my case it takes many hours and still does not complete.
At 1:18 I am getting a NameError: "convert_examples_to_features" is not defined
?
I am getting this error:
RuntimeError: MPS backend out of memory (MPS allocated: 6.25 GB, other allocations: 2.57 GB, max allowed: 9.07 GB). Tried to allocate 375.40 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
I am running the code in model_trainer.ipynb on a MacBook M2 (8 GB RAM). Any help??
Amazing video !!
Following your videos from my college 1st year to office!
hey Krish I am facing one issue while deploying this end to end project?
Great effort, but can you explain the training process in the text summarization notebook line by line, or write/build the training code in front of us? Just going through the code paragraphs quickly doesn't quite explain how you trained step by step.
When I run my project through an EC2 instance and try to predict on new text, it gives a 500 Internal Server Error. Can anyone please help me with it?
Mine is failing while creating the packages, with the error: hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
Thanks a lot, I have been looking to run transformer models using gpu on notebook.
Thanks a lot Krish ❤️❤️
While doing data ingestion I used the same link as in the video, but I'm getting the error "url unresponsive". I tried everything I know but I'm not able to solve it; can someone please provide a solution?
Thank you sir. much love from Pakistan
Not able to get your zip file from GitHub; it's showing a connection timeout
I am getting a ModuleNotFoundError in the notebook on these lines:
from textSummarizer.constants import *
from textSummarizer.utils.common import read_yaml,create_directories
please help
is there a need for folder structure to be so complicated?
After training the model, can we not just save the tokenizer and trained model in Drive and download those files instead of training on user systems? Not everyone will be able to train locally, due to computational limitations.
Hi All,
I am getting KeyError: 'dialogue' in DataTransformation line 'example_batch['dialogue'], max_length=1024, truncation=True)'
Can anyone help?
Which dataset have you used?
I am getting a segmentation fault while training the model. Can anybody help?
Hi krish,
Can you please do one on fine-tuning LLM models from scratch?
Can someone tell me how much time it takes to train the model? (On a local CPU)
Great work
God bless you my friend
Thank you so much..🙏🙏🙏
Thanks alot sir ❤
Thanks a lot for this vid
Great video 🎉
Sir, is there any video on explainable AI methods by you?
Hiii, can anyone please help me? I am getting an error: cannot import logger from textSummarizer
Does it give an abstractive summary or an extractive summary?
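For what it's worth: the Pegasus model used in the video generates abstractive summaries (it writes new sentences), while an extractive summarizer only selects existing ones. A toy extractive sketch in plain Python, purely for illustration (not the project's method):

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Naive extractive summary: keep the sentences whose words are most frequent."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each sentence by the summed corpus frequency of its words.
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())),
        reverse=True,
    )
    top = set(ranked[:n_sentences])
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

Real extractive systems score sentences far more carefully (TF-IDF, TextRank, etc.), but the copy-vs-generate distinction is the same.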
After a certain point it's just copy-paste; the focus is more on the "what" rather than the "why" for each module in the later part of the video. I also feel the structure is made more complex than it actually needs to be. The video also becomes less engaging after the initial few minutes.
Brother, I've got something to ask. Hope you don't mind.
Is this video something even a fresher in the field of data science should learn, or is it for experienced people?
@@maaleem90 Certainly I don't expect a fresher to know this; designing frameworks comes with experience and maturity, and there could be 'n' number of ways to do it. If I put myself in an interviewer's position, there is very little chance I would ask how you organised your project template; instead I might ask questions around S.O.L.I.D design principles and design patterns, but that too for an experienced candidate. There could be other templates across the web which you could clone and start working with.
I am not saying the content above is less worthy; I am just saying it was not very engaging to me personally. For freshers, knowing things at a certain abstract level should suffice, since no one is going to ask a fresher to design frameworks during the interview process.
@@nikhilmugganawar Yes brother, thanks, this gives me an idea.
I now understand that deploying models and making a project template are things that are handed to an experienced person, no matter how good a fresher is with them, because, as you said, it needs maturity, and that comes with experience.
I was really confused and not getting which topics are just enough for a fresher, and that landed me here.
Thanks, brother, for helping me
Where will we enter the text, and where will we get the summary?
It is hard to keep track of the various configurations. You are using config, configuration manager, get_data_ingestion_config, data_ingestion_config, etc. in 10 different places! I am lost; there are so many configurations... Why are we making the folders and workflow so complex?
Did you use TF-IDF or not?
Hello sir, I need a video to understand text summarization using discourse knowledge
why we used pip=0.2 version and not the updated.
In the model_trainer notebook, is there a way to train on a subset of test? even training on the test takes a lot of time and resource on a local computer, and to test if the notebook's logic is correct and the model file is being saved in the artifacts we need to train on a small set. I tried to use dataset_samsum_pt["test"][0:50] but I get error. Without subsetting the data I don't get any error and the training process starts. Should i use a different command to use a subset of test dataset for training? should I change any parameter value?
Hey @elnazfathi, I am facing the same issue. Its taking a lot of time to train the test set on my PC. Have you figured out a way to train in lesser time??
PS: Training in Google colab will require making a lot of changes to the code which I am not sure of doing currently.
Why are logging and exception handling not used in this project...?
Is this for intermediate level? In ML?
Can someone please explain how to download the data?
Hi sir, please can you make a video on how LLMs are created and how they work?
I am facing this error in the model_trainer.ipynb file and am not able to solve it: RuntimeError Traceback (most recent call last)
Cell In[11], line 11
9 model_trainer_config = config.get_model_trainer_config() # should be in one variable
10 model_trainer_config = ModelTrainer(config=model_trainer_config) # should be in a different variable and not model_trainer_config as train has nothing to do with the config, and is part of the ModelTrainer class
---> 11 model_trainer_config.train()
"I'll explain later on, explain later on..." When will you explain, sir? I've watched all the videos. You made such a detailed video, but who will tell us why we should use all this? One should understand a little, rather than just posting the video for views. Oh, Krish sir...
In the Data Ingestion Notebook Experiment part, I used my own link for the file I uploaded on GitHub after zipping it with 7-Zip. In my case it shows the error 'BadZipFile: File is not a zip file', even when I use your URL as the source_URL in the config.yaml file. The error occurs at the 1:36:36 mark in the video. Please, can anyone help me overcome this error?
I am facing the same issue here. Can anyone help ?
Even I am also facing same issue, please help if anybody resolves this error
I face a problem when I practice:
when I change data in the YAML file, or any variable value in the project (like constants/__init__.py), then when I read the YAML file or use the variable I get the old value, and I don't know why!? Can anyone help me fix this?
Sometimes it happens. Try saving it once, closing and reopening, or restarting. Also try clearing the cache.
Hi, can you please show how I can integrate a frontend with this project? It's fantastic work, but the UI is what grabs attention. My other question is how we can integrate the trained model with the UI.
Brother, those 3:30 hours were totally worth it for me. Thank you 🙏
Sir, I emailed you but didn't get any response?
Hi Krish,
Thanks for your video. Just a question to clarify.
Why did you install python as isolated environment in your project folder, but conda dependencies outside your project folder?
Helps in preventing dependency conflicts.
Sir, please tell us Bokiar sir's VS Code theme and font.
How about showing the final output as well? I am trying to see what the output will look like; nowhere in the timestamps do I see it. If anyone has seen the output, please reply with the timestamp.
This project doesn't have any front-end, which I also find quite weird.
1:14:38 Sir, this code isn't running, the tokenizer one; it's throwing the error below:
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
(please help)
Brother, I've got something to ask. Hope you don't mind.
Is this something even a fresher in the field of data science should learn, or is it for experienced people?
@@maaleem90 You need knowledge of the library he is using (the transformers one) to understand the code, and for the model part, if you really need to understand what's happening under the hood, you need to know deep learning :).
The data zip file is not working
great
Thanks to @DSWithBappy for this Amazing Video. It shows the real Complete End to End project.
Traceback (most recent call last):
File "C:\Test\20211030 Test Assignment\template.py", line 33, in
os.makedirs(filedir, exist_ok=True)
File "", line 225, in makedirs
FileNotFoundError: [WinError 3] The system cannot find the path specified: ''
What about this error on the cmd? And it's not creating the other files which are in " ".
Can anyone help me with this?
Please share the path which is causing the issue. Two possible fixes: the path needs to be copied correctly, or you are in the wrong directory. Make sure you run the change-directory command.
Please make a video on large language model
Please try to make an end-to-end project on MMM and a recommender system using ML
Dear Sir,
I've been following you since I started to learn data science. Your videos are awesome 👌and very easy to understand.
I request that you make a video on an ⚡end-to-end deep learning project⚡🙏.
Bappy has created playlist for Deep Learning Project - ua-cam.com/play/PLkz_y24mlSJZtxpM7dkfiOYxs6PZXHt0_.html
@@saksham1990 thanks bro👍
thank you Sir!
getting below errors: Import "box.exceptions" could not be resolved
Import "yaml" could not be resolved from source
Import "ensure" could not be resolved
Import "box" could not be resolved
If you teach like this, nobody is going to understand anything... Great that you created the video, but 99 percent won't understand the OS package
I can't paste the link to the repository in Git Bash while executing git clone.
Can somebody please help!
Right-click, then click paste. Ctrl+V won't work.
Omg, really? These are the questions? In a Linux terminal: Ctrl+Shift+V. On Windows... don't use Windows.
Traceback (most recent call last):
File "main.py", line 1, in
from textSummarizer.logging import logger
ModuleNotFoundError: No module named 'textSummarizer'
I'm getting this error when I try to run
python main.py in Git Bash.
Can anyone please help...
I faced the same issue. Please consider these notes to avoid it: 1) create the environment as mentioned in the video (textS); 2) when you create the environment, make sure you use the Python version used in the video, which is 3.8. After doing these two steps my issue was resolved. Before doing them, I was getting some errors at the end of the requirements installation, and I did not have a textSummarizer folder in the src folder. Once the requirements were installed correctly, the setup.py file created the textSummarizer folder inside the src folder.
@@elnazfathi I got a different solution to this. I did some Google searches and found that I have to add the directory to the path, so I wrote:
import sys
sys.path.append(dir_path)
This put the directory on the path, and then I was able to access it.
At 1:43 I'm getting an error: no module named 'textSummarizer'. Does anyone know how to resolve it?
I think it is a lowercase 's', not a capital, in textSummarizer
check the path
How to activate an Anaconda virtual environment in Git Bash?
got it... conda init bash
Plz deployment on azure also...
Hello everyone, my CI/CD completes properly but I am not able to access the final web page ☹. Is anyone facing the same issue, and how did you resolve it?
I am facing the same issue !!
I am currently facing the same problem. Have you figured out the solution yet?
I am getting an error "IndexError: Invalid key: 740 is out of bounds for size 0" which points to model_trainer.train(). This error corresponds to execution of code block in jupyter notebook at 2:42:18 timestamp in the video. Any help will be much appreciated. Thanks for the wonderful video krish sir
It's saying you are reading one extra record which is not there in the dataset. Or start from 1.
To be honest, I have not gone fully through the video; I answered just on the basis of the error.
Is the app similar to the online demo?
Feels like lots of unnecessary design choices in the name of "modular coding".
The amount of "okay" he was throwing in after every 2 or 3 words was extremely obnoxious
is there any github link or colab link to the notebook being demonstrated at timestamp 1:17:59 ?
Great video, but I'm not 100% sure this is the right showcase of the workflow; too much time is devoted to paths and configuration
Maybe the video is just not for me
@krishnaik06 while the video is extremely informational, I keep getting confused trying to go back and forth between the different modules and YAML files. I'm relatively new to the channel and to "modular" programming in general, so that could be the reason. The approach here is that every module and file is written in the order of execution. This makes it clear for the author as he's already aware of the structure. But for viewers, I believe it would be clearer if the entire code could be explained and written, in order, in a single script, and then it could be shown how each segment could be moved to different modules, and from there to the next level of modules or YAML.
Would you have a video in this manner or a video explaining your folder structure so that I can follow it?
Thanks in advance. 🙏🏻
Brother, I've got something to ask. Hope you don't mind.
Is this something even a fresher in the field of data science should learn, or is it for experienced people?
My CI/CD completes properly but I am not able to access the final web page. Can you help me?
I am getting the following error during model training:
OutOfMemoryError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 6.00 GiB total capacity; 4.93 GiB already allocated; 0 bytes free; 5.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I have a 6 GB RTX 3060 and I have also tried clearing the CUDA cache
Reduce the size of your model, maybe the layers.
Did you solve the problem? because I have same problem
@@petersunny4481 Sir, can you explain what you're saying? I'm a bit new and I don't understand. If we delete layers of an already existing state-of-the-art model, wouldn't we lose performance?
Brother, I've got something to ask. Hope you don't mind.
Is this something even a fresher in the field of data science should learn, or is it for experienced people?
@@maaleem90 I guess it's for experienced people, as the problem is using large models
Please, sir, upload the model_trainer and model_evaluation folders trained on the train dataset. It's a request; hoping for a response :(
Getting the below error at 1:43:00
in the data ingestion file:
1 from src.textSummerizer.constants import *
----> 3 from src.textSummerizer.utils.common import read_yaml, create_directories
File d:\Text-Summerizer\src\textSummerizer\utils\common.py:2
1 import os
----> 2 from box.exceptions import BoxValueError
3 import yaml
4 from textSummerizer.logging import logger
ModuleNotFoundError: No module named 'box'
Bro, even I am getting the same error. Can you tell me how you fixed it?
@@krishnakumar-sy1gk Please try running "pip install python-box" and then run it again; this step resolved the issue for me
Can anyone help me with downloading the zip dataset file?
I have tried:
1. uploading the dataset to my GitHub and executing the code
2. the dataset link from Bappy's GitHub
I am getting a BadZipFile error and it downloads 0 KB.
The internet is working fine.
I have written the same code as Bappy.
Any help is appreciated...
Check whether the req.txt file is empty or written; if empty, save it and run again
Brother, I am stuck at the same error. I have been trying for two days and still couldn't find any solution. Did you overcome it?
@@prathameshparab4880 Bro, I manually downloaded the dataset and kept it at that location
@@AV_Kumar ok thanks
I uploaded it to my AWS S3 bucket and changed the dataset link. Try this, bro
A ModuleNotFoundError occurs again and again for textSummarizer
Error:
from textSummarizer.logging import logger
ModuleNotFoundError: No module named 'textSummarizer'
I am also getting the same error. Please help me if you are able to solve this
I am also getting the same error. How did you fix it??
@@sudhirmalik100 I am also getting the same error. How did you fix it??
Wherever you have mentioned textSummarizer, use "src.textSummarizer" instead, like "from src.textSummarizer.utils.common import read_yaml, create_directories".
Since textSummarizer is inside the src folder, the import is not found when you reference textSummarizer directly. I hope this resolves the issue.
what are the prerequisites to start with this project?
Python basics, OOP concepts.
@@geekyprogrammer4831 ok..thanks
Thank you Sir, for your great work. Your tutorials are really helpful.
I am facing difficulty ingesting the data at 1:52:08.
It is showing: BoxKeyError: "'ConfigBox' object has no attribute 'root_dir'"
Somehow it can't read the config.yaml file details. I'd appreciate your guidance.
Sir, please double-check the spelling in config.yaml and in your existing .py file. I encountered a similar error, spent 2 days debugging it, and in the end it was a typo. I am 99% sure it is the same in your case.
@@TEJASJ05 I am also getting the same error. Can you please elaborate on what kind of typo you faced?
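For anyone hitting the BoxKeyError on root_dir: the attribute name in Python must match the YAML key exactly, including case and spelling. An illustrative config.yaml fragment (key names assumed to mirror the video's; the URL is a placeholder):

```yaml
artifacts_root: artifacts

data_ingestion:
  root_dir: artifacts/data_ingestion      # a typo here (e.g. root_dr) raises BoxKeyError
  source_URL: <dataset-zip-url>
  local_data_file: artifacts/data_ingestion/data.zip
  unzip_dir: artifacts/data_ingestion
```

A mismatch between a key here and the attribute accessed in the configuration manager (e.g. config.root_dir) is the usual cause of this error.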
Does anybody face the issue of Importing the logger?
Yes, I'm hitting this issue too. Have you solved it?
Bro, we know you have already written the entire code...
but at least try to explain it...
I am talking about the Colab notebook code..😠
At 39:37 I realized the "components" spelling is wrong and is currently "conponents"
Time stamp 1:33:03
Time stamp: " 57:04 "
Guys, after writing the code in the __init__.py file of the textSummarizer/logging folder,
during testing in main.py,
the video shows:
from textSummarizer.logging import logger ---> this gave me an error saying:
(ImportError: cannot import name 'logger' from 'textSummarizer.logging' (unknown location))
Instead,
write this:
from textSummarizer.logging.__init__ import logger
In my case I got that error, and this resolved it!