How is the dimension reduction for the softmax done? I only found videos describing softmax as a probability normalization step, keeping all the dimensions. So if the final step had a 9-dimensional vector, how would you decrease it to 3 using softmax?
EDIT: Ok, from other videos I conclude that it's just a fully connected layer (FC) reduction step before the normalization.
Thank you very much for the edit!
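In case a concrete example helps, here is a minimal numpy sketch of that idea (the weight and bias values are made up for illustration, not taken from the video): a dense (FC) layer with a 3x9 weight matrix does the 9-to-3 reduction, and softmax then only normalizes the 3 logits into probabilities.

```python
import numpy as np

# 9-dimensional feature vector coming out of the previous layer (made-up values)
x = np.random.randn(9)

# Fully connected (FC / dense) layer: a 3x9 weight matrix does the 9 -> 3 reduction
W = np.random.randn(3, 9)   # hypothetical learned weights
b = np.random.randn(3)      # hypothetical learned biases
logits = W @ x + b          # shape (3,)

# Softmax keeps the 3 dimensions; it only normalizes them into probabilities
# (subtracting the max is a standard trick for numerical stability)
exp_logits = np.exp(logits - logits.max())
probs = exp_logits / exp_logits.sum()

print(probs.shape, probs.sum())  # (3,) and ~1.0
```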
Can someone help me? What does Prof. Andrew mean at 4:05 when he says the following:
"One of the tricks that can speed up training is to just pre-compute that layer, the features or activations from that layer, and save them to disk. What you're doing is using this fixed function, in this first part of the neural network, to take any input image X and compute some feature vector for it, and then you're training a shallow softmax model from this feature vector to make a prediction. One step that can help your computation is to just pre-compute that layer's activations for all the examples in the training set and save them to disk, and then just train the softmax classifier right on top of that. The advantage of the pre-compute (save-to-disk) method is that you don't need to recompute those activations every time you take an epoch, or take a pass through the training set. This is what you'd do if you have a pretty small training set for your task."
Hi Habib. I'm a beginner in DL, but I hope I can help. I think it's easier to save the feature vectors for the whole training set, compile them as a new dataset, and then train only the prediction layer (the softmax function), like a perceptron, instead of training with the whole NN every epoch. If I'm wrong, let me know :)
I think what Prof. Andrew means is that if we only consider the subset of frozen layers, then, theoretically, the relationship between inputs and outputs is a deterministic mapping. Therefore, if we can find a function to represent this mapping (which ideally should be less complicated than the original layers), we don't need to let the data flow through those frozen layers but can just use the pre-calculated function.
Hi everyone. I think I'm 5 years late, but that's OK; if someone finds this comment and doesn't understand the concept Prof. Andrew Ng is talking about, I'm happy to explain what he meant, or at least my take on it.
First of all, you freeze all the layers except the output layer, i.e. the softmax.
1. Take the frozen layers and run a forward pass through them: input an image X and save the output of the last frozen layer for that image as a numpy array.
2. Do this for all the images you have and save the computed results from the last frozen layer as numpy arrays.
3. Now you have a numpy array containing the feature vectors corresponding to the input images.
4. Use this array as the input to train the softmax model over the epochs (see the code sketch after this comment).
5. This way you have a pre-computed array you can train the model on, which saves computation cost.
Hope this helps whoever is reading.
Thanks
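For anyone who wants to see those steps in code, here is a minimal sketch, assuming TensorFlow/Keras is available and using MobileNetV2 as the frozen feature extractor; the model choice, file names, random data, and the 3-class setup are my own assumptions, not from the video.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 100 RGB images (224x224) and integer labels for 3 classes
x_train = np.random.rand(100, 224, 224, 3).astype("float32") * 255.0
y_train = np.random.randint(0, 3, size=100)

# Frozen feature extractor: a pretrained network with its original classifier removed
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False

# Steps 1-3: run every image through the frozen layers ONCE and save the results to disk
x_prep = tf.keras.applications.mobilenet_v2.preprocess_input(x_train)
features = base.predict(x_prep, batch_size=32)   # one feature vector per image
np.save("features.npy", features)
np.save("labels.npy", y_train)

# Steps 4-5: train only the softmax classifier on the saved features;
# no forward pass through the big frozen network is needed in any epoch
features = np.load("features.npy")
labels = np.load("labels.npy")
clf = tf.keras.Sequential([tf.keras.layers.Dense(3, activation="softmax")])
clf.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
clf.fit(features, labels, epochs=10, batch_size=32)
```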
Is it helpful to use the frozen weights of one trained model for training on a different dataset with the help of transfer learning?
Example:
I have a trained model for cat, dog, car, and truck,
and I will use the frozen weights of this model to train on a different dataset, i.e. person, bicycle, ball, etc.
Will this make any difference in the results?
I'm far from an expert, but my impression is that the early stages of a NN are primitives that the training process matches to the training data; e.g., the early stages of an image-recognition NN detect edges and other shapes that match the training data. Look at the application to see if it was trained on data that matches your application data - the closer the better.
@bobcrunch The NN is trained on ImageNet (www.image-net.org/), which has millions of images, is one of the most widely used datasets, and has 1000 classes covering the most general categories usually shown in images.
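If it helps, here is roughly what that looks like in code: a sketch assuming TensorFlow/Keras, with MobileNetV2 standing in for the pretrained model and a made-up 3-class task (person / bicycle / ball). The old classifier head is dropped and only a new head is trained; how well the frozen features transfer depends on how similar the new data is to what the base model was trained on.

```python
import tensorflow as tf

# Pretrained feature extractor; include_top=False drops the original classifier head
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False   # freeze: reuse the learned low-level features (edges, textures, ...)

# New softmax head for the new classes: person, bicycle, ball
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(new_images, new_labels, epochs=5)   # only the new head's weights are updated
```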
Does Octave have any support for pretrained neural networks like MATLAB does?
Thanks!!
In transfer learning, where would I use/apply the precomputed/learned weights?
Treat them as initialization.
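A short sketch of what "treat them as initialization" can look like in practice, again assuming Keras and MobileNetV2 (my choices, not from the video): the pretrained weights are loaded but nothing is frozen, so every layer keeps training from that starting point, typically with a small learning rate.

```python
import tensorflow as tf

# Pretrained weights used purely as initialization: nothing is frozen here,
# so all layers continue to learn, just starting from the pretrained values
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
base.trainable = True

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation="softmax"),  # hypothetical 3-class task
])

# A small learning rate keeps fine-tuning from wiping out the initialization
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_new, y_new, epochs=5)
```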
Just a funny observation... Misty looks like Dracula!!!
Hahahahahahahhaahhahahah best comment
Scary movie..p
What's up Nirma fam!?