Machine Learning / Deep Learning Tutorials for Programmers playlist: ua-cam.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
Keras Machine Learning / Deep Learning Tutorial playlist: ua-cam.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
Honestly, your videos are a service to humanity! You have no idea how many sleepless nights I've had over neural networks. Thank you so much!
I came across your comment and was just curious. I assume that you were taking a class with neural networks for a degree or certification. How have your studies been going over the past three years?
❤ - I have seen 1000s of videos on this learning process but this was one of THE best explanations. Simple & Effective. Kudos
You have cleared most of my doubts about ANNs. Thank you so much for these precise and clear videos.
Found this channel today. I am just a beginner in neural networks, and I am so glad I did: your explanations are amazing and make the concepts easy to understand.
This was so good! I look forward to watching your whole series on Neural Networks!
This series really moves along nicely. New concepts are introduced at just the right level of detail to get the job done. This streamlines explanations and allows rapid progression by not overloading working memory.
These videos are awesome - easy to understand. Thank you for putting these out there!
I liked, subscribed, suggested and now commented as well.
Your videos are GREAT!! Just enough depth timed at the right moment. I'm amazed by your ability to break down complicated topics.
Thanks a lot!
Thank you, Dawidi!
You are the best person in the entire world! And your playlist will serve humanity until the end of time!
I am pleasantly surprised by the quality of this playlist's content - going back and forth between theory and practice is so seamless! Well done to the narrator and everyone involved in production and thank you
Binge watching your playlist, just amazing , much better than those premium courses
This is Wonderful!! You are the Best!!
This is my first comment on a youtube video, and I'm happy to say this is a series of kickass videos. Well done!
Crystal-clear explanation! Excellent content!
Marvelous description of the fundamentals. And I keep referring to your blog for this video for further details. Thanks.
You are doing really good job :) everything is clear, precise and presented in a simple form. Thank you
Thanks, Pawel! I'm glad you're enjoying the videos!
Of course I am :) My Engineering Thesis (next year) involves image recognition using Deep Neural Networks. I feel lucky that I found this channel. You are the best
Thank you for such a kind remark! And I hope your thesis goes well!
Sure it does :) Thanks
@@tarsala1995 Hey I know it's been years, but I just came across your comment and I was curious. How did your thesis go, and what have you been up to since graduation?
Well explained. Good work and thanks
These videos are super awesome..they are short and to the point
Your tutorials are too good... I've been searching for good ones for the past 3-4 days, and then I stumbled upon your playlist. Now I can confidently say that your tutorials are the best. Keep up the good work! :)
This was explained perfectly! Subscribed!
Always enjoy these fundamental topic videos of yours.
Thank you very much for this video! I like how you explain the meaning of the code's parameters too!
Clearly explained and easy to understand. Great Job!! Thank you very much
This is really good. Great job.
Such a great video!
Awesome vids, i have an interview on Wednesday, you're saving me big time!!
Ah, good luck! Let me know how it goes if you're willing to share! 😃
Thank You Very Much. Your Videos Helped Me To Understand The Basics.
Thank you so much for your contribution! Your videos are incredibly to the point, and I feel I've learned so much in so little time. Just wanted to point out that I got mildly confused at the weight-updating step (and then understood everything with a glance at your blog). I believe a brief flash of the equation new w = old w - (learning rate * gradient) would leave no room for confusion. Thank you again!
it's amazing how clearly you explain with such simple examples! so awesome! LF to the next in the series :)
Thanks Richard!
This was so clearly explained.
Mam! You are a legend!
I was almost considering learning calculus to figure out how to calculate the new weight values...I know that doing it the "calculus" way is probably a better way but this way makes more sense to me.
Thank you so much for this!
Thanks, Vincent! I also cover the calculus in later episodes, starting with this one:
deeplizard.com/learn/video/XE3krf3CQls
@@deeplizard Good stuff, I'll definitely check it out :)
Love your teachings...mam
thanks so much i was looking for a vid like this
Great video!
First of all, thank you so much for your videos. I'm new to this but I'm learning a lot, as these tutorials are clear and concise.
I need help, here's the code:
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation
from keras.layers.core import Dense
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
model = Sequential([
Dense(16, input_shape = (1,), activation = 'relu' ),
Dense(32, activation = 'relu'),
Dense(2, activation = 'softmax'),
])
model.compile(Adam(lr = 0.0001), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
model.fit(scaled_train_samples, train_labels, batch_size = 10, epochs = 20, shuffle = True, verbose = 2)
And I'm getting this error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
in
15
16 model.compile(Adam(lr = 0.0001), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
---> 17 model.fit(scaled_train_samples, train_labels, batch_size = 10, epochs = 20, shuffle = True, verbose = 2)
NameError: name 'scaled_train_samples' is not defined
Thanks
Glad you're enjoying the videos, Srikar. In regards to the error, you have not defined the scaled_train_samples variable. I used it in this video just as an example to show how the fit() function works. You can see how we defined this variable and follow the full code in the Keras playlist:
ua-cam.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
@@deeplizard Ha, I forgot that I actually didn't define my 2 sets. Thank you for the quick reply! :)
After watching sentdex tutorials among others your videos have really helped me put things together. thank you so much
I’ve told everyone I know who thinks deep learning is hard to check out your tutorials, and they always give great feedback. Thank you so much for these tutorials; they’re changing lives. Please, is a tutorial on RNNs coming up soon, in line with LSTMs, GRUs, and the like for sequences? Just curious, thank you.
Thank you, Sanni! Glad to see you're still around :D
Yes, we have RNNs, LSTMs, etc. on our list of potential topics for future videos, but we're not sure of the timeline for these at the moment.
Thank you so much for your content. It is helping me a lot!
This was a really good video. Thanks for posting it!
For sure, Tim! Thanks for letting me know!
The video is part of a larger series on deep learning fundamentals. Check it out if you haven't already!
ua-cam.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
This is a really great video, and I haven't seen any other series explain this stuff so clearly. However, the step that the optimizer takes during stochastic gradient descent is in the opposite direction of the cost function's gradient: the gradient points in the direction where the cost function is increasing, but we're looking to minimize it.
Great videos! Thank you
Thank you! I love you! You explain things very well for me.
Two questions:
1. Is the algorithm attempting to minimize the loss at the end of each epoch?
Does that mean that if I have 10 pictures of cats and dogs, the loss function only runs once all my pictures have been processed?
2. Could the learning rate be a function itself? Meaning that we may want a bit more "tuning" for the first epochs, then a bit less after?
Hey Yohann,
1. The loss is calculated after each batch and then averaged at the end of each epoch. So, with your example of 10 images of cats and dogs, if your batch size were 1, the network would run through 10 batches to complete a single epoch. It calculates the loss for each of the 10 batches and then averages them, and that average is the loss for the epoch. I have a video on batches/batch size if you need more info regarding batches: ua-cam.com/video/U4WB9p6ODjM/v-deo.html
2. Yes, when training a model, you may want to lower the learning rate as the model improves, for example. This would be a "decayed learning rate." In general, a changing learning rate during training is referred to as an "adaptive learning rate." More on the learning rate is here also: ua-cam.com/video/jWT-AX9677k/v-deo.html
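To make the decayed learning rate concrete, here is a minimal sketch using the tf.keras LearningRateScheduler callback; the starting rate and the halve-every-5-epochs schedule are arbitrary choices, not values from the video:
import tensorflow as tf

# Hypothetical schedule: halve the learning rate every 5 epochs.
def schedule(epoch, lr):
    return lr * 0.5 if epoch > 0 and epoch % 5 == 0 else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)

# Passed to fit() alongside the usual arguments, e.g.:
# model.fit(scaled_train_samples, train_labels, batch_size=10, epochs=20,
#           shuffle=True, verbose=2, callbacks=[lr_callback])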
Thanks a lot!
Just finished the whole playlist and I'm ready to start the one about Keras.
Thanks again for the quality of your videos !
You're welcome! And awesome job finishing the playlist! More videos will be added to this one soon, so stay tuned!
What made you decide on 16 neurons and 32 neurons for the hidden layers?
How is that decision made? Is there an optimization method we can use to decide?
We don't just use the (gradient * learningRate) value to update the weights; we find each weight w by calculating for which value of w the gradient * learningRate value comes out as 0 (finding a minimum). Correct me if I am wrong. Great learning material though!
Thanks, Mandy!
I'm still missing why you chose Dense layers with 16 and 32 neurons. What is the reason behind that?
I want to first thank you very much for your videos; they are very useful to me.
I want to ask you 2 questions:
1) If we have, for example, 5 images to train on and our batch size is 1, I got from your previous answer that to complete one epoch we need to run the model 5 times, then we calculate the loss for each batch and take the average of these.
But what about the weights? I mean, is the weight of a neuron constant within each epoch?
2) I am researching video summarization with deep learning. I would like your advice on the best path; my starting point is to use a convolutional neural network (CNN) to classify each image as a key frame or not. Is it correct to classify images into two types, one desired (key frame) and the other (not key frame)?
Hey mahmoud - You're very welcome!
1) Yes, you're correct. The only thing to point out here is that when you say "we need to run the model 5 times," what we are actually doing is passing a batch of data to the model 5 times. Each batch will be made up of different data. I'm not sure if you've seen the video on Batch Size in this series, but if you haven't, check it out. It has examples that help illustrate this concept further:
ua-cam.com/video/U4WB9p6ODjM/v-deo.html
2) This can be achieved with a CNN. Check out the question and highest-voted answer from this stackoverflow post. It gives two approaches available for classifying data that belong to a "none of the above" class for your network. Your "not key frame" class would need to catch everything that is not a key frame.
stackoverflow.com/questions/43578715/how-best-to-deal-with-none-of-the-above-in-image-classification
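If it helps, here is a rough sketch of what such a two-class key-frame classifier could look like in Keras; the input shape and layer sizes are placeholders for illustration, not a recommendation:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Placeholder architecture: a small convolutional base feeding a
# two-class softmax head ("key frame" vs "not key frame").
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(2, activation='softmax'),
])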
Hi Mandy, around 0:59 you talk about the gradient of the loss function. Can you please explain: 1. What does that loss function look like mathematically for the neural network displayed in the video at 0:59? 2. What does the gradient look like mathematically? And 3. What would the overall hypothesis for this neural network look like mathematically?
I am self-learning, and filling these small gaps/cracks takes a lot of time. Thanks in advance... Ashish
With all these fantastic videos, are there, or could there be, any summary notes, free or paid?
Regards.
Hey Piers - On deeplizard.com, we are in the process of creating text-based resources and blogs for corresponding videos. You can see this for the first few videos in the Deep Learning Fundamentals series, as well as all of the TensorFlow.js series if you browse to the site. We plan to roll this out for all videos.
Hello there,
First, thank you so much for your great videos! They're really helpful!
I have a question: is it right that if we increase the number of epochs, we will get a better result?
In our example it was 20, so wouldn't 30 or 40 be better? And how about trying some other loss functions with our model?
Thanks again!
Keep it up!
Hey الانترنت لحياة أسهل - You're welcome! I'm glad you're enjoying the videos!
Generally, you'll want to continue training your model until the metrics flatten and stop improving. When your model is trained to the point where it will not learn further, the accuracy will stop improving. If you continue to train the model for many epochs after the performance has flattened, then you risk the possibility of running into the issue of overfitting.
In regards to your question on the loss, yes, you can try other loss functions with the model. Note that some loss functions are known to work better for different types of problems. The highest-voted answer on the stackoverflow question below has a thorough explanation for this:
datascience.stackexchange.com/questions/9850/neural-networks-which-cost-function-to-use
Hope this info helps!
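As a practical aside (not covered in the video), Keras also has an EarlyStopping callback that stops training automatically once a monitored metric stops improving. A minimal sketch; the patience value here is an arbitrary choice:
from tensorflow.keras.callbacks import EarlyStopping

# Stop once the validation loss has not improved for 3 consecutive epochs,
# and restore the weights from the best epoch seen.
early_stop = EarlyStopping(monitor='val_loss', patience=3,
                           restore_best_weights=True)

# model.fit(..., validation_split=0.1, callbacks=[early_stop])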
Hi, at 2:20 you say "weights would be getting closer and closer to their optimized values." So I'll try to frame the question: what does "optimized" mean? Optimized means a value for the weight such that the predicted output is closest to the actual output. How do you explain that mathematically? What happens? Does the slope of a function change, or does some multiplier to some function change, such that the final value is closest to the actual output?
Why is it showing loss? It's supposed to show only accuracy, right? Because we gave accuracy in metrics.
{
"question": "_______ is the number of samples that will be passed through to the network at one time.",
"choices": [
"The Batch Size",
"An Epoch",
"Shuffle",
"Sequential"
],
"answer": "The Batch Size",
"creator": "Hivemind",
"creationDate": "2019-10-09T17:58:04.057Z"
}
Awesome Kevin! I added your question with some slight modifications. You may need to clear your cache to see it. 🦎🙏
Around the 4:39 mark, where the model.fit parameters are discussed, they are described using terms that I can't hear clearly. I hear "empire rate". I tried closed captioning, but the speech recognition says the parameter narrations are "numpy array" and "empire rate". Can anybody tell me what the narration is for these two parameter descriptions? Thanks!
numpy array
You mentioned having 2 hidden layers with 16 and 32 neurons respectively, but the code shows only one of those. Could you explain how there are 2 hidden layers when, of the 3 layers in the code, one is input, one is output, and the one in the middle is hidden?
The input layer isn’t explicitly declared here, as Keras creates the input layer implicitly given the input_shape passed to the first hidden layer. Given this, the model looks like this:
input layer that accepts data of shape (1, )
hidden Dense layer1 with 16 outputs
hidden Dense layer2 with 32 outputs
output layer with 2 outputs
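A quick way to check this yourself (not shown in the video) is to print the model summary after building the model:
model.summary()
# The output lists three Dense layers (16, 32, and 2 units); the input
# layer of shape (1,) is implicit and does not appear as a separate row.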
Does this clarify?
Hi, thank you so much for the amazing videos, very clear and easy to understand.
However, I notice that sometimes the first accuracy can be 0.0000e+00 and then it stays around 0.5 no matter how many epochs I run.
And sometimes the accuracy will increase/decrease as if it's "jumping".
Could you please kindly explain why? Also, I was running the code on Google Colab rather than Jupyter; does that matter?
Sorry, I think I might be a little confused. If we have batch_size=10 and one epoch, does it mean we pass all of the data we have, in chunks of 10, until we have exhausted all the data? Because at the end of each epoch you have computed the loss, but is that the sum of the losses for each pass of an image through the model? And do the weights get optimized all at once after one epoch, or every time a single image is passed? Hope my questions are not too dumb!
Hi there! I followed the video to a 't' and am getting this error message "NameError: name 'scaled_train_samples' is not defined"
I have updated version of keras, numpy and all other programs on anaconda. any idea why else I would be getting this message?
Hi Austin, you have not defined the scaled_train_samples variable. I used it in this episode just as an example to show how the fit() function works. You can see how we defined this variable and follow the full code in the Keras course:
deeplizard.com/learn/playlist/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
Thanks very much for the easy explanation.
I have a model whose loss over epochs 1~20 reduced from 6.1011e-04 to 3.3847e-04, BUT the accuracy stayed constant (accuracy: 5.4975e-04).
Does that make sense, or did I make a mistake?
Thanks,
Could you please provide a link to your source code? Thanks
Hey Andre - The code files are available as a perk for the deeplizard hivemind. The code for this series is available at:
www.patreon.com/posts/code-for-deep-19266563
Check out the details regarding deeplizard perks and rewards at: deeplizard.com/hivemind
how many epochs would you have to run this model for to create skynet?
Let us suppose all the data is passed through. Then all the data is passed 20 more times to reduce the loss by optimizing the weights. Is that what epoch means?
Hey ravi - Yes, the number we specify for epochs is interpreted as how many times the model will run through all the data. So, by specifying 20 epochs, we'll have the model training on the full data set for 20 iterations.
Here at 1:19 you said that we feed in one image, calculate the loss for that image's prediction, and then find the gradient of the loss w.r.t. the current weight. My question is, all we have right now is just one constant value of the loss function. How can we find the derivative of a single constant value? Isn't it supposed to be an entire function whose derivative we want to find?
I suggest checking out the 5-part series on backpropagation that comes later in this course. It explains this process in full detail. Starting with this episode:
deeplizard.com/learn/video/XE3krf3CQls
@@deeplizard Will do. And thanks for the quick reply.
Thank you for the videos. They are clear and informative. What version of Tensorflow 1 are you using?
Hello,
I am asking where I can find the data that you are working on.
Hi. Thank you for the videos! I am struggling to understand how the weights work. Should the sum of the weights equal 1?
Hey Martin - You're welcome!
The weights are typically randomly generated around a normal distribution (with a mean of 0 and standard deviation of 1). They do not need to sum to 1.
This video gives details on the process of how weights are initialized: ua-cam.com/video/8krd5qKVw-Q/v-deo.html
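As a rough illustration (note this is a sketch of the idea, not the exact Keras default; Keras Dense layers actually default to Glorot uniform initialization unless you specify otherwise), drawing weights from a standard normal distribution looks like this:
import numpy as np

rng = np.random.default_rng(0)
# Illustrative only: a 1x16 weight matrix drawn from a normal distribution
# with mean 0 and standard deviation 1.
weights = rng.normal(loc=0.0, scale=1.0, size=(1, 16))
print(weights.sum())  # some arbitrary value; the weights need not sum to 1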
Let me know if this helps!
Thank you. Makes a lot more sense now.
How do you know how many nodes to use in your input layer? Is it the number of features you are using to train your model?
Hey Jeffrey - The input layer is just a tensor representation of the input data. So yes, the number of nodes in the input layer is equal to the number of features in a single input sample from the training set.
I have an error, please help me!!!
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
in
----> 1 model.fit(scaled_train_samples, train_labels, batch_size=10, epochs=20, shuffle=True, verbose=2)
NameError: name 'scaled_train_samples' is not defined
that moment u realize the lizard has eyelashes
😉
Is there a reason for choosing a certain number of layers and neurons? Does it depend on the problem? Until this video I only understood that the number of neurons in the output layer is related to the problem itself (for example, the cats and dogs example).
Yes, it depends on the problem. Either you read blogs and research to find out what's recommended for your problem, or you just experiment and try different models to find out what works best. At least that's my approach as long as I don't have enough experience to construct successful models right from the start. In the end, a simpler model that gives the same result as a more complex one would usually be preferred because it would be faster to train. (Even if you have probably found this out by now, someone else who reads this in the future may find my answer helpful...)
Around 1:55, it would be helpful if you could put up the mathematical equation through which you calculated the updated value of "w". Great video by the way; for the first time I am getting a feeling of "WHAT it means TO LEARN!!!"
Hey Ashish - For the math explanation, check out our 5-part series on backpropagation used by networks during the training process. Part 1 of 5 starts here:
deeplizard.com/learn/video/XE3krf3CQls
Great girl you are awesome
I got confused with the calculation of the gradient. The new/optimized value of the weight is given by the calculated gradient (for a certain weight) multiplied by the learning rate. But what if the loss becomes approximately zero? Then the gradient is approximately zero, and the new/optimized weight value also becomes zero? Please help! Thanks! :)
Btw, I think I got it already. I read it from your blog:
new weight = old weight - (learning rate * gradient)
Thank you for a clear presentation :)
Hey Joseph - Yes, you're exactly right!
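For anyone skimming, that update rule is a single line of arithmetic. A tiny sketch with made-up numbers:
learning_rate = 0.0001
old_weight = 0.62   # arbitrary example value
gradient = 1.5      # d(loss)/d(weight) at the current weight, also arbitrary

new_weight = old_weight - learning_rate * gradient
print(new_weight)   # 0.61985, a small step against the gradient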
Epoch: how many times to repeat with the same data, right?
Batch size: how much data is sent at once? And why is it needed?
Explained here:
deeplizard.com/learn/video/U4WB9p6ODjM
@@deeplizard Perfect!Thanks Mandy..
I am a bit surprised that the gradient is described as delta loss over delta weight, but in the same video I don't hear that the delta loss is calculated over TWO epochs. This suggests to me the delta loss is completely and independently calculated within just one epoch, which I find confusing. I lack experience with what it means to calculate a "delta", "change", or "gradient" (e.g., delta loss) without using two or more values to compute differences. If you calculate two or more values of the loss, would that not require MULTIPLE epochs? Please note I am not looking for the underlying math. I am trying to understand why the HIGH-LEVEL idea seems to be that the loss change is calculated all within the same epoch. How is that done without initially running two epochs (with arbitrarily changed weights at first) and then calculating how the loss changes?
We can calculate the derivative of the loss with respect to a given weight within a single training iteration. We are not making use of the change of the loss over two iterations, but rather just calculating its derivative with respect to the weights in each iteration.
"from keras.optimizers import Adam" didn't work for me, so I tried "from tensorflow.keras.optimizers import Adam". Hope it helps someone.
Yes, Keras is now incorporated into the TensorFlow API. Previously, it was separate.
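For reference, a hedged sketch of how this episode's model could be written against the current tf.keras API (the values mirror the code shared earlier in this thread; note that learning_rate replaces the old lr argument):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    Dense(16, input_shape=(1,), activation='relu'),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax'),
])
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])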
how is batch_size different from epochs?
Hey Raju,
The batch size is the number of samples that will be passed through to the network at one time. An epoch is an entire pass of the data through the network.
So, for example, if we have 100 samples in our training set, and we set our batch size to 5, then we will be passing in 5 samples at a time to our network during training. It will take a total of 20 batches to complete an entire epoch at 5 samples per batch (because 100 samples divided by a batch size of 5 is 20 total batches). Once we pass 20 batches, then that will complete an epoch because 20 batches of 5 each make up our entire data set.
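In code form, that arithmetic is just:
import math

num_samples = 100
batch_size = 5

batches_per_epoch = math.ceil(num_samples / batch_size)
print(batches_per_epoch)  # 20: one epoch = 20 batches of 5 samples each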
If you haven't seen it yet, I have a video later in the playlist that covers batch size in more detail.
ua-cam.com/video/U4WB9p6ODjM/v-deo.html
Hope this helps!
Thank you so much for the explanation.
@@deeplizard How is it that you pass 5 samples at a time through a network? How is that possible? Aren't you constrained to pass 1 sample at a time (since the inputs are the characteristics of 1 sample)?
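Networks aren't limited to one sample at a time: a batch is passed as a 2D array (samples x features), and each layer's matrix multiplication processes all of the rows at once. A minimal numpy sketch of the idea, with made-up shapes:
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(5, 1))     # 5 samples, 1 feature each
weights = rng.normal(size=(1, 16))  # first hidden layer: 1 input -> 16 units
bias = np.zeros(16)

# ReLU applied to all 5 samples in one matrix multiplication
activations = np.maximum(0, batch @ weights + bias)
print(activations.shape)  # (5, 16): one row of activations per sample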
I'm working on a neural net project which is detecting cardiac arrhythmia using ECG signals and I just need someone to take a look and give me a general overall input. I would really appreciate any help as this project is very important for me.
Isn't this some sort of brute force by recurrently looping through the model and updating the weights after computing the loss of the data?
The fonts are too small when showing the notebooks. :-(
Where do I get the data for training in this video?
The Keras course:
deeplizard.com/learn/playlist/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
This episode is where we begin creating the dataset:
deeplizard.com/learn/video/UkzhouEk6uY
You are amazing. Everything is so clear and to the point. May God bless you. Can I email you for a little help on my undergraduate thesis work? It would be a great help. Thanks
Glad you're enjoying the content! We don't have the availability to scale by assisting in individual projects at this time. Good luck with yours!
Neural network learns? More like "Neat video that burns"...through ignorance. Thanks again for sharing!
Thanks samter for all of these thoughtful and funny comments, they're great!
@@deeplizard You are most welcome and I'm glad that you think (or are pretending to think) they're funny! I'm really liking the series and I fully intend to complete it. Part of the reason is because I was making a program as a hobby and I implemented a very rough neural net, and I've been trying to find a much more elegant way to reimplement it. I'm doing this in Java, so I know it's a bit different, but I'm thinking I can still transfer over a basic understanding of the concepts...
Is the jupyter notebook available??? Thank you so much!
Hey XxYyZz - Download access to code files and notebooks are available as a perk for the deeplizard hivemind. Check out the details regarding deeplizard perks and rewards at: deeplizard.com/hivemind
In the previous videos, you imported the Dense type of layer from keras.layers. In this video you did it from keras.layers.core. Is there any particular reason for this?
Hey Hasan - They both point to the same place, so both import statements achieve the same thing. I didn't intend to import the Dense class differently between videos.
Thanks for your reply. I should mention that I am finding your tutorials amazing. The tutorials I find in other places are either too vague and superficial to be helpful, or too theoretical and end up being overwhelming. I really appreciate your tutorial, which gives practical lessons with the necessary theoretical knowledge but doesn't get bogged down in esoteric details. I feel like I can get rolling on my project once I finish your playlist. The same can't be said of other tutorials.
So, again, thanks a bunch. You are a life saver.
Thank you, Hasan! Practicality is one of the major points we strive for when producing these videos, so I'm glad to hear you've spotted that and that you're learning. I appreciate you letting me know your thoughts!
If you've not come across our Keras series for deep learning, you may be interested in that as well. It focuses more on coding with the same practical approach used in the conceptual videos of this playlist.
ua-cam.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
Thanks for this amazing video again.
The only thing I'm asking myself is what d(weight) actually is.
Is it just the value of the specific weight, or was the value put into a function?
d(loss)/d(weight) means "the derivative of the loss with respect to the weight."
This is covered in much more depth in the later episodes in this course on backpropagation.
The backpropagation episodes start here:
deeplizard.com/learn/video/XE3krf3CQls
Your videos are great but I have a question:
Are the weights and losses functions? I think they're supposed to be constants but I'm confused because you said
d(loss)/d(weights) * LR
The derivative of a constant is 0.
Please help
The loss is a function. The weights are not.
@@deeplizard Hmm but d(w) is 0? Making the gradient undefined....
The loss is a function of the weights. Therefore, you can take the derivative of the loss with respect to any given weight. Call the loss function by the name "f" and a given weight by the name "x". Then the derivative of the loss function f with respect to the weight x would be written as df/dx.
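A toy numeric sketch of that idea (all values are made up for illustration):
# One sample (x=2.0, target=1.0) and a one-weight "network": prediction = w*x,
# with squared-error loss f(w) = (w*x - target)**2.
x, target = 2.0, 1.0
w = 0.3  # current weight, arbitrary

loss = (w * x - target) ** 2     # the loss *value* at this particular w
grad = 2 * (w * x - target) * x  # df/dw: the derivative of f with respect to w

print(loss, grad)  # approximately 0.16 and -1.6; the gradient exists even
                   # though the loss itself is a single number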
deeplizard thanks for clarifying!! With respect to weights...
Quite clearly I'm not doing my calculus homework
Lol later in the series, there are five episodes that go into all the details (including the calculus) for how backpropagation is working during gradient descent. It starts with this one below. Maybe that would be helpful for you :)
deeplizard.com/learn/video/XE3krf3CQls
Hello there...
First of all, thank you for your wonderful work.
I got an error here.
The error says:
NameError: name 'scaled_train_samples' is not defined.
Note: I'm using this in Google Colab.
Note 2: I wrote every line exactly the same.
Hi hari, you have not defined the scaled_train_samples variable. I used it in this video just as an example to show how the fit() function works. You can see how we defined this variable and follow the full code in the Keras course:
deeplizard.com/learn/playlist/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
Thank you...
But there are many videos in there. Will you please say which video I should watch in particular...
Thanks in advance
deeplizard.com/learn/video/UkzhouEk6uY
Can't help it, but at 3:40 I thought you said it's a "very Asian optimizer" and went like "she said what now?" I had to rewatch 5 times before I noticed you said "variation."
Sometimes my brain goes boomboom
😆
pls validate ...
{
"question": "How many data samples must be passed through the model before the weights are updated?",
"choices": [
"One",
"Batch Size",
"Total of all nodes",
"All samples from dataset"
],
"answer": "One",
"creator": "Adrian",
"creationDate": "2020-05-29T20:11:13.767Z"
}
Hey Adrian, thanks for the question! Before we post it to the site, note the explanation under the section called Mini-Batch Gradient Descent in the blog below:
deeplizard.com/learn/video/U4WB9p6ODjM
How often the weights are updated depends on which gradient descent algorithm is being used. Can you please edit/confirm which algorithm your question is for? Also, it may be more suitable for this question to go under the linked blog above since we go into more detail about it there. Thank you! :)
@@deeplizard yes, the next "lesson" went into the detail needed for this question. Certainly, it came to mind (what I wrote) during the lesson it's attached to. Sorry for jumping the gun. Perhaps a little clarity about how often the weights are updated would suffice in that given lesson (i.e. it depends).
No worries, thank you :)
I just added your question to deeplizard.com/learn/video/U4WB9p6ODjM
is this the partial derivative?
Where can I find the data set?
Hey Aashal - You can see how to generate the data set used in this video here: ua-cam.com/video/UkzhouEk6uY/v-deo.html
Why do you need two different parameters to hold all of your data? Wouldn't one just work? This is at 4:50.
Kai, your explanation is correct. One variable holds the data, which in this case is people's ages. The other variable holds the labels for the corresponding data, which in this case is whether the individuals become ill or not. Thanks for chiming in!
You're not doing the backpropagation from the output.
Yes, the backprop process is detailed later in the playlist. There is a 5-part backprop mini series. Starting here: deeplizard.com/learn/video/XE3krf3CQls
These videos are great but sadly, installing keras on Windows 10 seems to be an impossibility. Nothing works. A whole day wasted (as is usually the case with anything open-source).
Keras is now part of the TensorFlow library. Check out the TensorFlow course here:
ua-cam.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
Those who disliked the video must have some sort of lizard-phobia or something; it can't actually be for the content.
1:55
Very clear voice, good video; I just guess the girl talking is definitely beautiful!