Getting an error at this line: "yhat = model.predict([image, sequence], verbose=0)" ValueError: Layer "model" expects 1 input(s), but it received 2 input tensors. Inputs received: [, ]
I have a few questions please. How did you choose the hyperparameters of the model? Why is the decoder after the encoder? Why use a dropout layer for the images if they're already gone through VGG16? How can I add a validation_data in the fit function? It shows a compatibility error. Thanks!
Hey, greatly explained. Can you please tell me how I can reduce the model complexity to run it on a Raspberry Pi 4, and how I can run this image captioning through my webcam?
Yeah, you can run it on an RPi by converting this model.hdf5 file into a tflite file... For the webcam, you have to capture each frame using cv2 and pass it as input to the model.
hi, so whenever the session ends, on restarting or resuming it, it loads all the data and training again, so it takes 3 hours again. Even on saving the model, it does the same. what to do?
Hi, Thanks for the tutorial. I used your code without modifications and it is generating the same caption for every image. "startseq two people are sitting on the street endseq" I didn't change anything in the code. Imported the dataset and using kaggle. What should I change for the model to predict correctly?
Hello. I am doing this project. In addition to this code, I want to write a caption and have it find the closest image. For example, when I write the caption "two dogs are running", it should find the closest image.
You could predict the captions for all the images and use text similarity to get the closest one. Other than that, there are a few other ways this can be done.
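A minimal sketch of the text-similarity idea from the reply above, assuming the captions have already been predicted for every image. The image ids, the captions, and the choice of Jaccard word overlap are illustrative assumptions, not something taken from the tutorial code.

```python
def tokenize(text):
    """Lowercase a caption and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two sets of words (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_image(query, predicted_captions):
    """Return the image id whose predicted caption overlaps most with the query."""
    q = tokenize(query)
    return max(predicted_captions,
               key=lambda image_id: jaccard(q, tokenize(predicted_captions[image_id])))


# Hypothetical predicted captions, keyed by image id.
predicted_captions = {
    "img_001": "two dogs are running through the grass",
    "img_002": "a man is riding a bike on the street",
    "img_003": "a child is playing in the water",
}

best = closest_image("two dogs are running", predicted_captions)
print(best)
```

Word overlap is the simplest option; sentence embeddings would probably match paraphrases better, at the cost of an extra model.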
@@pradyumnasushanth4430 then it might be some module error in your local machine... you have to check and resolve the error in your local or you could run it in kaggle
Hi, do you have attention mechanism code applied to the same code? I am not quite sure how to go about it. If not, can you please explain briefly how it can be done?
Here you have used an 8K dataset along with the captions, but if I give a new image, why is the model not working? If it should work for any image, what should the approach be? Can you give a flow of the approach?
FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/flickr8k/Images'. I am getting this error though I have the dataset folder and the project file in the same place. I am trying this in Jupyter Notebook. Can you please tell me what I am doing wrong?
Sir, after successfully importing a dataset using a Kaggle token, all steps proceeded smoothly until the 'summarize' stage. However, an error emerged during the 'extracting' step, indicating 'No such file or directory: '/kaggle/input/flicr8k/Images.'' can you please guide
I ran the code in Jupyter Notebook but I got this error:
FileNotFoundError Traceback (most recent call last)
Cell In[7], line 5
      2 features = {}
      3 directory = os.path.join(BASE_DIR, 'Images')
----> 5 for img_name in tqdm(os.listdir(directory)):
      6     # load the image from file
      7     img_path = directory + '/' + img_name
      8     image = load_img(img_path, target_size=(224, 224))
FileNotFoundError: [WinError 3] The system cannot find the path specified: '/kaggle/input/flickr8k\\Images'
Please explain how I can fix this error.
Hey, ModuleNotFoundError: No module named 'tensorflow.security' Getting this error while importing , i have installed tensorflow , please show me a way!!
can you please give the versions of the packages u installed , because I am trying to make a user interface using streamlit in pycharm and the versions should match
Hello! Thanks for your video. I am trying your code but while extracting features i am getting this error "cannot identify image file ". Can you please help me in fixing this! please
Good evening sir, actually I am building a food website where I want to implement a feature like taking input food image from the user and generate caption of that image and then search in the database using that caption.. So my question is just that can I use the same code to generate name for the food image inputted from user using Food-101 dataset.
@@HackersRealm oh I had created a new notebook not from the dataset but separately and then I faced this problem. By the way your doubt clearance helped, thankyou
Hey Hackers,
I have updated the code to test the model with new image/image url along with flickr32k dataset for better results. You can find the latest code in the description.
For users getting the following error:
`output_signature` must contain objects that are subclass of `tf.TypeSpec`
Please update the code snippets in data_generator and model creation to match the updated versions on my website (link in the description). It works with the latest version of TensorFlow as well, without issues.
Happy Learning!!!
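For anyone who cannot reach the updated code: this error is commonly reported when the generator yields its two inputs as a Python list (`[X1, X2]`), and yielding them as a tuple is the usual fix. The sketch below shows only that structural change, as an assumption about what the update does rather than a copy of it. It uses plain Python lists so it runs anywhere; `pad_left`, `one_hot` and `encode` are hypothetical stand-ins for `pad_sequences`, `to_categorical` and the tokenizer, and in the notebook the batches would still be converted with `np.array(...)` before yielding.

```python
def pad_left(seq, max_length):
    """Left-pad with zeros to max_length (stand-in for pad_sequences)."""
    seq = seq[-max_length:]
    return [0] * (max_length - len(seq)) + seq


def one_hot(index, vocab_size):
    """One-hot encode a word index (stand-in for to_categorical)."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def data_generator(data_keys, mapping, features, encode, max_length, vocab_size, batch_size):
    X1, X2, y = [], [], []
    n = 0
    while True:
        for key in data_keys:
            n += 1
            for caption in mapping[key]:
                seq = encode(caption)
                # every prefix of the caption predicts the next word
                for i in range(1, len(seq)):
                    X1.append(features[key])
                    X2.append(pad_left(seq[:i], max_length))
                    y.append(one_hot(seq[i], vocab_size))
            if n == batch_size:
                # the change: inputs are yielded as a tuple, not a list
                yield (X1, X2), y
                X1, X2, y = [], [], []
                n = 0


# Tiny hypothetical dataset: one image, one caption.
vocab = {"startseq": 1, "dog": 2, "runs": 3, "endseq": 4}
encode = lambda caption: [vocab[w] for w in caption.split()]
mapping = {"a": ["startseq dog runs endseq"]}
features = {"a": [0.1, 0.2]}

gen = data_generator(["a"], mapping, features, encode,
                     max_length=4, vocab_size=5, batch_size=1)
inputs, targets = next(gen)
print(type(inputs).__name__, len(targets))
```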
Sir what will be the base dir and work dir if working in jupyter notebook
@@petenallan24 you can change to your dataset directory and some new folder as working directory!!!
@@HackersRealm Sir, I have uploaded the dataset and captions file to my Google Drive and started working in Google Colab. What should I set as my base dir and working dir?
@@rohith646 the base dir will be the dataset folder... Try to check if that works or change the code accordingly
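A small sketch of the directory setup described in the replies above, assuming the dataset folder contains an `Images` subfolder as in the tutorial's feature extraction step. `setup_dirs` is a hypothetical helper, and the temporary folder only stands in for a real location. On Colab the base dir would be a path under `/content/drive/MyDrive/` after running `drive.mount('/content/drive')`; in Jupyter it would be wherever the dataset was unzipped.

```python
import os
import tempfile


def setup_dirs(base_dir, working_dir):
    """Check the dataset folder and create the working folder if needed."""
    images_dir = os.path.join(base_dir, "Images")
    if not os.path.isdir(images_dir):
        raise FileNotFoundError(f"Expected an 'Images' folder inside {base_dir!r}")
    # the working dir only holds generated files (features, model), so it may start empty
    os.makedirs(working_dir, exist_ok=True)
    return base_dir, working_dir


# Stand-in for a real dataset location, so the sketch runs anywhere.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "flickr8k", "Images"))

BASE_DIR, WORKING_DIR = setup_dirs(
    os.path.join(root, "flickr8k"),   # dataset folder
    os.path.join(root, "working"),    # any new, writable folder
)
print(BASE_DIR, WORKING_DIR)
```

Building paths with `os.path.join` everywhere also avoids the mixed `/` and `\\` separators that show up in Windows tracebacks.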
thanks
Appreciated your project details. It took me almost 3 weeks to reproduce similar results.
Glad it helped you!!!
My final year project also this topic
can I please contact you i need some ideas about the project
By the way, I forgot to thank you for all this excellent explanation. You, sir, are a truly great person. I am very grateful to you.
Thanks for your kind words!! Happy to help!!!
Yes, indeed your explanation and video have helped me understand machine learning and models much better than my professor's explanation. 😅😅
However, I have a simple question. When I try to use the part for "Test with Real Image", I get an incorrect prediction result. Could you please explain to me what I should do? Keep in mind that all the results in the code are correct and all the steps match exactly as in your explanation.@@HackersRealm
@@MsBothyna Currently we are using a smaller dataset; if you train with the flickr 32k dataset, you might see better results.
Beautiful explanation! Thanks for this!
Glad you like it!!!😃
Why didn't you use Keras ImageDataGenerator to extract the features from the model? It creates the whole image preprocessing pipeline so you don't have to do it manually. Btw great tutorial!
It would run the feature extraction step again on every rerun. Extracting the features separately and storing them avoids rerunning from scratch.
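The "extract once, store, reload" pattern from the reply above can be sketched as follows. `load_or_extract` and `fake_extract` are hypothetical names; in the notebook the extraction function would be the VGG16 loop and the cache file would live in the working directory.

```python
import os
import pickle
import tempfile


def load_or_extract(cache_path, extract_fn):
    """Load cached features if present, otherwise extract and store them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = extract_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features


calls = []


def fake_extract():
    """Stand-in for the slow CNN feature extraction step."""
    calls.append(1)
    return {"img_001": [0.1, 0.2, 0.3]}


cache_path = os.path.join(tempfile.mkdtemp(), "features.pkl")
first = load_or_extract(cache_path, fake_extract)   # runs the extraction
second = load_or_extract(cache_path, fake_extract)  # served from the pickle
print(len(calls))
```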
thank u sir best explained IC video so far
Glad to hear that!!!
Loved the implementation and the explanation. Could you please do an end to end chatbot implementation like this, using cornell movie dataset?
chatbot application is already done for generic messages, check the python projects playlist
It's a wonderful project and I could easily get the output by following your instructions. But after completing everything, if I try predicting the output for a new image, the output is not relevant. How can I correct this? It would be very helpful if you could help us with this. Thank you
You could use flickr 32k dataset which has much variety so that new image can work very well
Hello sir. I am doing this project but using EfficientNetV2B0 and GRU. But my bleu1 score is not getting more than 0.22. What needs to be changed? Is it possible to get bleu1 score more than 0.5? also, how can we load this model so that retraining is not required and how to implement it in the GUI
Can we use jupyter notebook for this project
Thanks for this video, explained well! Can the model predict on monuments and historical structures? I mean can the model predict on totally unseen data and can you please make a video of how to put entity awareness on top of it
yes, but you have to train with more data for better results, I have used smaller dataset for the demo
Thanks for the wonderful video , code and explanation
glad you liked it!!!
How can I provide any random Google image and have it provide a caption for that image?
If you train with the flickr32k dataset, it would provide better results
Hello! I tried with 70 epochs and the result doesn't improve beyond a BLEU score of 52. I want to try hyperparameter tuning using grid search, but it doesn't work without "y-train". Could you tell me how to get the y and how to apply this technique?
me too have same issues
I'm getting error 'int' object not iterable in model.fit(generator, epochs=1 , steps_per_epoch = steps , verbose =1)
Did u run the same code
@@HackersRealm yes, I ran exactly the same code
epochs = 15
batch_size = 64
steps = len(train) // batch_size
for i in range(epochs):
    generator = data_generator(train, mapping, features, tokenizer, max_length, vocab_size, batch_size)
    model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
TypeError: 'int' object is not iterable
and I ran it in a Kaggle notebook
@@nivedansharma4293 Hello, I was getting the same error, in addition to another error. The mistake I made was while creating the model:
fe1 = Dropout(0.4)(inputs1) and se2 = Dropout(0.4)(se1). I was writing 0,4 instead of 0.4. The error was gone when I corrected it.
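Why a comma produces this particular message: Keras' `Dropout` takes `rate` first and `noise_shape` second, so `Dropout(0,4)` is read as a rate of 0 with a noise shape of 4, and an integer cannot be iterated like a shape. The stand-in below only mimics those two leading parameters to reproduce the message without TensorFlow. It is an assumption about where the real error is raised, not Keras' actual code.

```python
def dropout(rate, noise_shape=None):
    """Stand-in with the same leading parameters as Dropout(rate, noise_shape)."""
    if noise_shape is not None:
        noise_shape = tuple(noise_shape)  # an int here raises TypeError
    return {"rate": rate, "noise_shape": noise_shape}


ok = dropout(0.4)        # one argument: rate = 0.4

try:
    dropout(0, 4)        # "0,4" is two arguments: rate = 0, noise_shape = 4
    error = None
except TypeError as exc:
    error = str(exc)

print(ok["rate"], error)
```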
this is the best video and so perfectly explained. sir can you please make a video on video captioning using MSVD dataset. thankyou 👍🏼
Planning to do it as upcoming project, will do. Glad you liked this video!!!
@@HackersRealm thats great! 😊
will be waiting for it and hoping to see it soon
I really liked this video, great!!!
Glad you liked it!!!
Kaggle in "Accelerator" tab now provides even TPU, out of 4 options shown in the drop down, which to choose?
Thank you for great explanation. I have a question, what's the accuracy of this project?
Hello Sir, when I try to extract image features it shows gaierrors, URL errors, an exception in model=VGG16, etc.
How Can i fix it? Plz help me..
same
Same
@@mohitramchandani3205 did you correct it?
Well explained. Thank you so much bro
Glad you like it!!!
The video was great,so much love.
Can you tell me how can I apply the same code for Bengali caption generation?
where will be the changes?
If you have the dataset similar to this, you can proceed with the same workflow
Brother, have you done any further work on this project?
It will be great if you add the pickled model in the git repo, as it's going to take my pc about 4hrs to train the model... :(. Other than that, fantastic video!
I will try to upload that if possible
The predicted caption is empty; only startseq and endseq are there. I am also trying to resolve it. Any suggestion on which part I should check?
Thank you for the nice implementation. I have a question, can i use the same approach to generate text from numbers (like tabular data) instead of image features?
Yes, it may be possible, but you have to adjust the layers and features accordingly
Bro,Have you got the output of the code
@@hariom6910 you can see at the end of the video
What is the use of batch size and dense layer??✨✨
Wonderful video. Very insightful. Can you please mention what version of tensorflow and keras you are working with. Thanks in advance!
I am using the modules available in kaggle, you can check the version of the modules there itself
when i load the VGG16 model it pass an error
In the original extract features from image step, I followed your steps and displayed 'Error displaying widget: model not found'. How to solve it? I've been looking for a solution for a long time, but there's no solution.
Getting this error while training the model. Code:
# train the model
epochs = 20
batch_size = 32
steps = len(train) // batch_size
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
for i in range(epochs):
    # create data generator
    generator = data_generator(train, mapping, features, tokenizer, max_length, vocab_size, batch_size)
    # fit for one epoch
    model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
Error:
TypeError: `output_signature` must contain objects that are subclass of `tf.TypeSpec` but found which is not.
I tried the actual code also:
# train the model
epochs = 20
batch_size = 32
steps = len(train) // batch_size
for i in range(epochs):
    # create data generator
    generator = data_generator(train, mapping, features, tokenizer, max_length, vocab_size, batch_size)
    # fit for one epoch
    model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
Hey I am getting url fetch exception in image extraction. How to correct it can u tell me
Getting this error while training the model -
assertion failed: [You are passing a RNN mask that does not correspond to right-padded sequences, while using cuDNN, which is not supported. With cuDNN, RNN masks can only be used for right-padding, e.g. `[[True, True, False, False]]` would be a valid mask, but any mask that isn't just contiguous `True`'s on the left and contiguous `False`'s on the right would be invalid. You can pass `use_cudnn=False` to your RNN layer to stop using cuDNN (this may be slower).]
[[{{node functional_1_1/lstm_1/Assert/Assert}}]] [Op:__inference_one_step_on_iterator_423791]
I have not changed anything in the code. Running your code only. Please suggest what to do?
Are you running this in kaggle?
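A likely cause, offered as an assumption rather than a confirmed diagnosis: `pad_sequences` pads on the left by default (`padding='pre'`), so a masked input gets `False` values first, which is the layout the message rejects. The pure-Python sketch below shows the two mask shapes; `pad` and `is_right_padded` are hypothetical helpers for illustration.

```python
def pad(seq, max_length, padding="pre"):
    """Zero-pad on the left ('pre') or on the right ('post')."""
    zeros = [0] * (max_length - len(seq))
    return zeros + seq if padding == "pre" else seq + zeros


def is_right_padded(mask):
    """True when every True comes before every False, as cuDNN requires."""
    seen_false = False
    for m in mask:
        if not m:
            seen_false = True
        elif seen_false:
            return False
    return True


seq = [5, 9, 2]
pre = pad(seq, 5, "pre")     # [0, 0, 5, 9, 2]
post = pad(seq, 5, "post")   # [5, 9, 2, 0, 0]

pre_mask = [token != 0 for token in pre]     # [False, False, True, True, True]
post_mask = [token != 0 for token in post]   # [True, True, True, False, False]

print(is_right_padded(pre_mask), is_right_padded(post_mask))
```

Two ways out follow from this: pass `padding='post'` to `pad_sequences` in both the generator and the prediction function (and retrain), or pass `use_cudnn=False` to the LSTM layer as the message itself suggests, which keeps the padding but may train more slowly.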
There is an error in the data generator function: some of the keys are not present in features.
def data_generator(data_keys, mapping, features, tokenizer, max_length, vocab_size, batch_size):
    # loop over images
    X1, X2, y = list(), list(), list()
    n = 0
    while 1:
        for key in data_keys:
            n += 1
            captions = mapping[key]
            # process each caption
            for caption in captions:
                # encode the sequence
                seq = tokenizer.texts_to_sequences([caption])[0]
                # split the sequence into X, y pairs
                for i in range(1, len(seq)):
                    # split into input and output pairs
                    in_seq, out_seq = seq[:i], seq[i]
                    # pad input sequence
                    in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
                    # encode output sequence
                    out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
                    # store the sequences
                    if key in features:
                        X1.append(features[key][0])
                        X2.append(in_seq)
                        y.append(out_seq)
            if n == batch_size:
                X1, X2, y = np.array(X1), np.array(X2), np.array(y)
                yield [X1, X2], y
                X1, X2, y = list(), list(), list()
                n = 0
Please update the code so that no one else encounters the same error.
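An alternative to checking inside the innermost loop, sketched under the assumption that `features` is a dict keyed by image id: filter the training keys once before building the generator and report what is missing. `split_known_keys` is a hypothetical helper and the ids below are made up.

```python
def split_known_keys(data_keys, features):
    """Separate keys that have extracted features from those that do not."""
    known = [k for k in data_keys if k in features]
    missing = [k for k in data_keys if k not in features]
    return known, missing


# Hypothetical ids: the second image has no extracted features.
train = ["img_001", "img_002", "img_003"]
features = {"img_001": [0.1], "img_003": [0.3]}

train, missing = split_known_keys(train, features)
print(f"{len(train)} usable keys, {len(missing)} missing: {missing}")
```

If most keys turn out to be missing, the generator is probably not the real problem. A large gap between `len(features)` and the number of images suggests the feature extraction step stopped early, or that its ids are built differently from the caption ids, and it is worth rerunning that step first.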
Did you resolve this issue? I got the same error... Can you help me out with what to do next? Thanks in advance
If it's the X1.append(features[key][0]) line, then me too, and I can't seem to solve it, so if anyone has a fix please send it
Even I too get the same error
Thank you for the nice explanation. I have few questions.
1. Can we use this flow with larger dataset?
2. Can we use this flow for an image caption generator of fashion product images?
yes you can use the same!!!
@@HackersRealm Thank you for your response. I've two more questions.
1. Can we use this flow for generating a caption for a new image which is not in the training dataset?
2. I want to create an image caption generator for fashion products. I created a dataset with images and captions for training. Can I use this flow to generate captions by extracting features (attributes and categories) of the fashion products?
@@mahidiwijayantha yes, it's possible for both scenarios
Loading the VGG16 model file gives an error 😢😢 How can I resolve it??
Try to enable internet connection in kaggle settings
I made the gui for this thanks for the code btw
That's great
May I know why 'return_sequences' and 'return_state' of the LSTM are set to False (the default) in a text prediction network?
Because there is only 1 LSTM layer... We don't need the output of every time step to pass to the next layer here. If you are stacking multiple LSTM or GRU layers, you will need the output from every time step.
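The difference can be seen with a toy recurrent loop. This is not an LSTM, just a hypothetical `toy_rnn` with a made-up update rule, meant to show what "output of every time step" versus "last output only" means.

```python
def toy_rnn(inputs, return_sequences=False):
    """Toy recurrence: the state is half the old state plus the new input."""
    state = 0.0
    outputs = []
    for x in inputs:
        state = 0.5 * state + x
        outputs.append(state)
    # all time steps for a stacked layer, or only the final one for a Dense head
    return outputs if return_sequences else outputs[-1]


inputs = [1.0, 2.0, 3.0]
last = toy_rnn(inputs)                               # one summary value
every_step = toy_rnn(inputs, return_sequences=True)  # one value per time step
print(last, every_step)
```

A second recurrent layer needs a sequence to iterate over, so it would consume `every_step`; a Dense layer that predicts the next word only needs `last`.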
If the BLEU is above 0.5 then what is accuracy in percentage. Can you please tell that
accuracy is not a meaningful metric for this problem
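To see why BLEU does not translate into an accuracy percentage, here is a from-scratch sketch of BLEU-1 (clipped unigram precision times a brevity penalty). It is written out for illustration only and is not the scoring code from the notebook; the sentences are made up.

```python
import math
from collections import Counter


def bleu1(references, hypothesis):
    """BLEU-1 for one hypothesis (token list) against several references."""
    if not hypothesis:
        return 0.0  # an empty prediction scores zero instead of dividing by zero
    hyp_counts = Counter(hypothesis)
    # for each word, the most times it appears in any single reference
    max_ref = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref[word] = max(max_ref[word], count)
    clipped = sum(min(count, max_ref[word]) for word, count in hyp_counts.items())
    precision = clipped / len(hypothesis)
    # brevity penalty against the reference length closest to the hypothesis
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda n: (abs(n - hyp_len), n))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * precision


references = [["a", "dog", "runs", "on", "grass"]]
score = bleu1(references, ["a", "dog", "runs", "on", "sand"])
print(round(score, 3))
```

The score counts overlapping words, so a caption can be a perfectly good description of the image and still score low because it uses different words from the references.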
Great video but can anyone tell what application is being used like what is the name of IDE that is being used ? anyone pls quick
it's kaggle notebook
Thanks for the wonderful implementation 😊. I ran it successfully, but
can you tell me how the context.txt file is created? I saw that our input image should be in a particular format and we get correct results only for the 8000 images.
Is it possible for other images? I think it's not extracting text from the image; it's extracting it from the context.text file.
If I am wrong then please correct me.
Thank you 😊
I am able to predict for new images that are not in the dataset as well; for better prediction, use the flickr32k dataset
@@HackersRealm Thanks for replying.
Can you tell me how you get the new input image?
Images need a proper name. How do you set that name?
@@HackersRealm Can you make a short video? Then everyone will get an idea about it. We are not looking for accuracy. We are just excited to know how image processing is done by a CNN. And I don't have enough resources to train the model with 30000 images. 8000 images is sufficient for me 😅
@@pradnyeshdoshi348 Then you can try to predict with new image and check the results, the process is same for the prediction
@@HackersRealm Hi how to predict captions for new images (which is not present in the flickr dataset)?.
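For a new image the flow is the same as for a dataset image: extract its features with the same CNN and preprocessing used in training, then generate the caption one word at a time starting from startseq. Below is a sketch of that decoding loop only, with a scripted stand-in instead of the trained model. `predict_caption` mirrors the idea, not the exact notebook function; in the notebook the `next_word_fn` step would pad the current sequence, call `model.predict([image, sequence], verbose=0)`, take the argmax and map the index back to a word.

```python
def predict_caption(next_word_fn, image_features, max_length):
    """Greedy decoding: keep asking for the next word until endseq."""
    words = ["startseq"]
    for _ in range(max_length):
        word = next_word_fn(image_features, words)
        if word is None:      # index not found in the vocabulary
            break
        words.append(word)
        if word == "endseq":
            break
    return " ".join(words)


def fake_next_word(image_features, words):
    """Scripted stand-in for the trained model plus index-to-word lookup."""
    script = ["startseq", "dog", "runs", "endseq"]
    return script[len(words)] if len(words) < len(script) else None


caption = predict_caption(fake_next_word, [0.1, 0.2], max_length=10)
print(caption)
```

The image file name does not matter for prediction; it is only used as a dictionary key for dataset images. Caption quality on unseen images still depends on how varied the training data was.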
Hello. I've followed your video and I tried to train a model on flickr30k. My problem is that the captions that I generate are repetitive. What I mean by that is that whatever is in the image, my captions are always something like: "A man in a black shirt is walking down the street". How can I make the model more diverse?
Is this showing for any image you try? That shouldn't happen, as there should be at least a slight difference in the output whenever the input changes
@@user-px8qq6on1p It's very unlikely to happen if you follow the same steps. As you can see in the video, it's generating different results for each image... We need to find out where it's going wrong, as there are so many moving parts
What if I give an image that is not in the dataset? Will it predict the caption?
It will try to predict the caption in general manner; You can train with more images for better prediction
@@HackersRealm how will we achieve the finding of unseen image's captions in the code?
Would be grateful if you help me in this regard, since I have a demo to present on new/unseen images the next week.
@@FatimaYousif You can train the model with flickr 32k dataset, that will give good predictions on new image data
is the predicted output, not wrong in every case?
When I try model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1) I get this error:
KeyError: '1000268201_693b08cb0e'. Also, when I do len(features) I get 80, while len(image_names) = 8101. Why did it not process all the images?
have you solved this error?
Thanks for the implementation. But I have a question and that is, what is the LSTM layer doing (1:00:12)? What's the use of this layer? All the papers use the LSTM for the word generation but you're not using the LSTM layer for word generation, you are using a Dense layer for word generation. Then why are you using the LSTM layer? And also, how is the Embedding layer learning here? TIA.
All the mentioned layers are used for the lstm model to generate a new word at a time
Does your model train on all 8000 images in the dataset?
When I tried a different model, it could only take at most 1600 images for training from the dataset due to a memory issue.
The memory issue won't happen, thanks to the custom data generator function.
@@HackersRealm Okay, but approximately how many images does your model use for training? Is it using all 8000 images from the dataset?
@@bhushanambhore8378 I think around 6.5k something; you can check the video again, as I have split the data into train and test
Does BASE_DIR consist of only images, or is it the folder containing the images and captions?
And what does working_dir hold? Is that an empty folder?
we will store extracted features there in working directory
@HackersRealm I am getting url fetch failure in image extraction how to correct it. Can u pls tell sir
Thank you for this video! Had a question. How can I pickle the implemented model to use it in some app. I am having trouble getting models out in .h5 or pkl formats in general. Can anyone help with that?
Usually we store in the model in h5 format and it works well without any issues while reloading!!! What error you're facing in this?
I am getting ZeroDivision error when finding the BLEU score, can you please help me what to do?
Sir there is no march out there to this project. I am new to deep learning really want to learn deep in DL. Can you suggest some good Institute??
you can learn everything in youtube itself, you can check the channel playlist to learn more concepts
@@HackersRealm thank you sir. Can you share any reference to this project. Indepth explanation of this project. Any article
@@rakeshkumarrout2629 it's in the description for text based tutorial
When I do it in Colab, how do I set the working directory? I understood the path for the base directory but I'm unable to do it for the working directory... please help
For colab, you can mount the drive and give the dataset path directly to use it
Sir i m getting this error --> TypeError: `output_signature` must contain objects that are subclass of `tf.TypeSpec` but found which is not.
And "2.6.2" version of tensorflow is not available. Is anyone else facing this same issue? How to solve this?
it's resolved, please check the github code for latest update
you are amazing
thanks for your kind words!!!😄
I have a question: why are we using both the image features and the sequences from the captions? We could use just the image features for generating captions; after VGG16 we could use a Bi-LSTM and get our output.
Could you explain the last few lines in detail?
Could you please create a video captioning model using the MSR-VTT dataset? It will be very helpful for my major project, which is due in 2 weeks. Thank you sir.
Really appreciate your videos ! I want to ask you what if we want the system to answer the user's query about the text file ID , and then the system generate the picture file that represents ID. How can we change the code?
I didn't get the full context here, could you type it fully?
What's your IDE? Looks pretty cool
It's kaggle notebook, online ide
Hello sir, image captioning has been done previously and my project is on video captioning. Can you please make a video or guide with the same procedure, but for video captioning?
Sure, I will add that to the list.
@@HackersRealm Sir, can you please make it quick? I have little time remaining and I'm really worried about my project. It would be really great and I would be really thankful, sir.
nice
Hi thanks a lot for this awesome tutorial. Can you please make a tutorial on how to deploy this model on cloud eg AWS?
I have already made a local deployment for basic ml model... I will try to make a video for cloud deployment soon
Can you make a video on software installation and environment setup for image caption generation?
You can use a Kaggle notebook, which is an online IDE; it's simple to use, like I showed in the video.
Thank you for such a nice implementation and explanation. I have one doubt: can you please guide me on the changes needed to get captions for random internet images? Thank you.
The code snippet is already available on my website; the link is in the description. For better results, you have to train with more images.
I am getting a gaierror when running the cell that creates the model. Could you please help me?
getting error at this line " yhat = model.predict([image, sequence], verbose=0)"
ValueError: Layer "model" expects 1 input(s), but it received 2 input tensors. Inputs received: [, ]
Have you used the same notebook to train and test?
I have a few questions please.
How did you choose the hyperparameters of the model?
Why is the decoder after the encoder?
Why use a dropout layer for the images if they're already gone through VGG16?
How can I add a validation_data in the fit function? It shows a compatibility error.
Thanks!
You can change the model parameters or layers for experimentation too, but make sure the sequence of flow does not break
Hey, greatly explained. Can you please tell me how I can reduce the model complexity to run it on a Raspberry Pi 4, and can you explain how to run this image captioning through my webcam?
Yeah, you can run it on an RPi by converting this model.hdf5 file into a tflite file. For the webcam, you have to capture each frame using cv2 and pass it as input to the model.
Is it necessary to run the code everytime when we open or can we save the trained model
yes you can save the trained model
Hello, what changes I need to do if I want to implement video captioning i.e. generating captions for short video clips?
The whole structure has to be changed... from features to the model. It will be a big task for sure
hi, so whenever the session ends, on restarting or resuming it, it loads all the data and training again, so it takes 3 hours again. Even on saving the model, it does the same. what to do?
If you save the model, you can skip the training steps on later runs and just load it. Otherwise, saving the model is of no use.
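The same "compute once, reload afterwards" idea can be sketched for the slow steps such as feature extraction. This is only an illustration using pickle; `load_or_compute` is a hypothetical helper and the file name `features.pkl` is an assumption:

```python
import os
import pickle


def load_or_compute(cache_path, compute_fn):
    """Return the cached result if cache_path exists, else compute and cache it."""
    if os.path.exists(cache_path):
        # Later sessions: reload the saved result instead of recomputing.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # First session: run the slow step once and store its result.
    result = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


# Example (names are assumptions):
#   features = load_or_compute(os.path.join(WORKING_DIR, "features.pkl"),
#                              extract_all_features)
```

For the trained network itself, the thread's suggestion of saving with `model.save` and loading the saved file in the next session plays the same role, as long as the training cell is skipped when the saved file already exists.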
Hi, Thanks for the tutorial.
I used your code without modifications and it is generating the same caption for every image.
"startseq two people are sitting on the street endseq"
I didn't change anything in the code. Imported the dataset and using kaggle.
What should I change for the model to predict correctly?
Is this occurring for all the images, including the ones I tested in the video?
How did you add the Kaggle data to the Jupyter notebook? Which notebook version is this?
If you go to the dataset and click New Notebook in Kaggle, it will automatically add the dataset to that notebook.
Hello. I am doing this project. In addition to this code, I want to write a caption and then make it find the closest image (for example, when I write "two dogs are running", it will find the closest image).
You could predict captions for all the images and use text similarity to find the closest one. Other than that, there are a few other ways this can be done.
@@HackersRealm How can I do this? Can you help me?
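One rough way to try the text-similarity idea above, assuming you already have a dict of predicted captions keyed by image id. The word-overlap (Jaccard) score is my choice for illustration, not something from the video, and all function names here are hypothetical:

```python
def tokens(caption):
    """Lowercased word set of a caption, without the startseq/endseq markers."""
    skip = {"startseq", "endseq"}
    return {w for w in caption.lower().split() if w not in skip}


def jaccard(a, b):
    """Word-overlap similarity between two captions, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def closest_image(query, predicted_captions):
    """Return the image id whose predicted caption is most similar to query.

    predicted_captions maps image id -> predicted caption string and must
    not be empty.
    """
    return max(predicted_captions,
               key=lambda img_id: jaccard(query, predicted_captions[img_id]))


predicted = {
    "img1": "startseq two dogs are running on grass endseq",
    "img2": "startseq man riding bike endseq",
}
print(closest_image("two dogs are running", predicted))  # prints img1
```

Word overlap ignores meaning (for example "puppy" vs "dog"), so sentence embeddings would probably give better matches, but the flow stays the same.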
Sir, I am getting a graph execution error while training the model. What should I do?
Are you using the same code in a Kaggle notebook?
No, I am running it in Jupyter Notebook, and yes, I have written the same code.
@@pradyumnasushanth4430 Then it might be some module error on your local machine. You have to check and resolve the error locally, or you could run it in Kaggle.
ok
@@HackersRealm
How to fix "gaierror" at extracting image features? Please help.
How to do this for dense captioning task ?
Can you provide me the link of the research paper ?
Hi, do you have code for an attention mechanism applied to this same model? I am not quite sure how to go about it. If not, can you please explain briefly how it can be done?
You just have to add corresponding layers to the text model here, flow remains the same
Could you please tell me which application are you using to code? I am new to this and only know about Colab and Notebook.
This is kaggle
Hi, I wanted to ask: do we need to train the model (at 1:04:00) every time after opening Kaggle? Isn't there a way to save it?
You can save the model using model.save method
Here you have used the 8K dataset along with the captions, but if I give a new image, why is the model not working? If it should work for any image, what should the approach be? Can you give the flow of the approach?
If you want to test with a new image, you can try the same flow with flickr32k dataset, that will improve your results
@@HackersRealm Thanks
And should the epochs and batch size be higher with a GPU?
@@Manojkumar-vh4tc For bigger networks, a batch size of 16 or 32 is the optimal number.
Hlo sir if we give the image from our gallery will it workout , i am output
You could use flickr32k dataset to get better results for that... the code in the description contains the updated code which predicts from web url
When I run the epochs, it shows a ValueError.
FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/flickr8k/Images'. I am getting this error even though I have the dataset folder and the project file in the same place. I am trying this in Jupyter Notebook. Can you please tell me what I am doing wrong?
It seems correct. If you're using a local machine, check with a different folder structure.
Hey, in model training I am getting an error saying it failed to convert a NumPy array to a tensor.
No worry, it's good now😂😂
Sir, after successfully importing a dataset using a Kaggle token, all steps proceeded smoothly until the 'summarize' stage. However, an error emerged during the 'extracting' step, indicating 'No such file or directory: '/kaggle/input/flicr8k/Images.'' can you please guide
are you using the notebook in kaggle environment as shared in the video?
@@HackersRealm I am using Google Colab.
Hey, is it also possible to generate a longer description than only one sentence?
Yeah, if you train the whole model with longer descriptions, then the model can predict longer descriptions.
I ran the code in Jupyter Notebook but I got this error:
FileNotFoundError Traceback (most recent call last)
Cell In[7], line 5
2 features = {}
3 directory = os.path.join(BASE_DIR, 'Images')
----> 5 for img_name in tqdm(os.listdir(directory)):
6 # load the image from file
7 img_path = directory + '/' + img_name
8 image = load_img(img_path, target_size=(224, 224))
FileNotFoundError: [WinError 3] The system cannot find the path specified: '/kaggle/input/flickr8k\\Images'
Please explain how I can fix this error.
If you're running this notebook locally, you have to change the directory path accordingly!!!
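As a rough illustration of that fix: the traceback shows BASE_DIR still set to the Kaggle path `/kaggle/input/flickr8k`, which does not exist on a local machine. The sketch below builds the images path from whatever BASE_DIR you set and fails early with a clearer message; `resolve_images_dir` is a hypothetical helper and the example path is an assumption:

```python
import os


def resolve_images_dir(base_dir):
    """Return the path to the Images folder inside base_dir, or raise if missing."""
    directory = os.path.join(base_dir, "Images")
    if not os.path.isdir(directory):
        raise FileNotFoundError(
            f"Images folder not found at {directory!r}; "
            "set BASE_DIR to the folder where you extracted the dataset."
        )
    return directory


# Example for a local Windows machine (the path is an assumption):
#   BASE_DIR = r"C:\datasets\flickr8k"
#   directory = resolve_images_dir(BASE_DIR)
```

Using `os.path.join` for the image paths as well (instead of `directory + '/' + img_name`) keeps the separators consistent on Windows.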
Sir, at the 17th cell (after clean(mapping)) the output contains only the start and end tokens; the caption in between is missing. Please tell me how to solve this.
Are you using the same notebook and the dataset?
Hey,
ModuleNotFoundError: No module named 'tensorflow.security'
I am getting this error while importing. I have installed TensorFlow. Please show me a way!!
If you're running locally, you can uninstall and reinstall the module, or create a new environment and install the module there!!!!
Can this be done in Visual Studio instead of JupyterLab?
Yes, you just need to modify a few things, like the print statements, or move a few things into functions. You can use any IDE you want.
@@HackersRealm What do we do regarding the base directory and working directory in Visual Studio?
@@dotnet8925 You just have to point to the dataset folder. Please change the code according to the folder structure you're using.
Can you please give the versions of the packages you installed? I am trying to make a user interface using Streamlit in PyCharm, and the versions should match.
Sorry, I didn't note the packages for this.
Hello! Thanks for your video. I am trying your code but while extracting features i am getting this error "cannot identify image file ". Can you please help me in fixing this! please
I think the image may be corrupted. Try removing the corrupted image and run the process again.
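If you don't know which image is corrupted, a loop like the following sketch can find it by skipping files that fail to load and listing them. `load_fn` stands in for whatever loader you use (in the notebook that is `load_img` with `target_size=(224, 224)`); the helper name is hypothetical. Pillow's "cannot identify image file" error is a subclass of `OSError`, which is why that is what gets caught:

```python
import os


def load_valid_images(directory, load_fn):
    """Load every file in directory with load_fn, skipping unreadable files.

    Returns (loaded, skipped): a dict of file name -> loaded object, and a
    list of file names that raised OSError (for example, corrupted images).
    """
    loaded, skipped = {}, []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            loaded[name] = load_fn(path)
        except OSError:
            # Unreadable or corrupted file: record it and move on.
            skipped.append(name)
    return loaded, skipped


# Example (assuming Keras' load_img is imported as in the notebook):
#   images, bad = load_valid_images(
#       directory, lambda p: load_img(p, target_size=(224, 224)))
#   print("Corrupted files:", bad)
```

The names in `skipped` are the files to delete or re-download before running the extraction again.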
Good evening sir, actually I am building a food website where I want to implement a feature like taking input food image from the user and generate caption of that image and then search in the database using that caption..
So my question is just that can I use the same code to generate name for the food image inputted from user using Food-101 dataset.
If you have a similar dataset, you can train the model.
While extracting the image features, I am getting an error. What should I do to rectify it?
Could you share the error?
I downloaded the code. Why is it not working?
When I try to train, it gives me an error.
Use it in kaggle, it will work
Can you please tell me which algorithms are used in image captioning?
I have used vgg and lstm models for the neural network
Can we add audio to it.. I mean it should read the caption that is generated
If the caption is "A man is driving a car"
Audio must read the same
Yes, you can do it using text-to-speech.
You did not show how to upload the dataset in the kaggle in the same location as yours, please help
If you go to the dataset link and click new notebook, the dataset will be there automatically!!!
@@HackersRealm Oh, I had created a new notebook separately, not from the dataset, and then I faced this problem. By the way, your clarification helped, thank you.
Why are you not splitting the data into training and testing data?
The data is split: to check how the model is performing, we need test data that is not present in training.
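A small sketch of such a split over the image ids. The 90/10 default is an assumption for illustration (the exact ratio is not stated in this thread), and `split_ids` is a hypothetical name:

```python
def split_ids(image_ids, train_fraction=0.9):
    """Split a sequence of image ids into (train, test) lists, keeping order."""
    ids = list(image_ids)
    cut = int(len(ids) * train_fraction)
    # Everything before the cut trains the model; the rest is held out
    # so it can be used to measure performance on unseen images.
    return ids[:cut], ids[cut:]


# Example:
#   train, test = split_ids(list(mapping.keys()))
```

The BLEU score is then computed only on the `test` ids, since the model never saw those captions during training.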
Hey brother, what code editor did you use?
It's kaggle IDE
Hi sir
Can we use jupyter notebook for this project??
Yes, you can
I am getting a URL fetch failure while loading the VGG16 model. Please tell me what to do.
Please enable internet in the Kaggle settings. It's in the right pane of the notebook.