Batch Size and Batch Normalization in Neural Networks and Deep Learning with Keras and TensorFlow

  • Published 16 Sep 2024

COMMENTS • 6

  • @NicolaiAI
    @NicolaiAI  3 years ago +1

    Join My AI Career Program
    www.nicolai-nielsen.com/aicareer
    Enroll in My School and Technical Courses
    www.nicos-school.com

  • @adithyaajith1751
    @adithyaajith1751 1 year ago +1

    The batch size is the number of samples used in one step of training, i.e., one update of the weights, and an epoch is done when the model has gone through all the samples, i.e., the whole dataset. Correct me if I am wrong.

    • @NicolaiAI
      @NicolaiAI  1 year ago

      Yeah, that's correct. The weights are updated after each batch.
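
    As a rough illustration of this exchange (the toy data and layer sizes are my own, not from the video), in Keras the batch_size argument to model.fit controls how many samples go into each weight update, and epochs controls how many full passes are made over the dataset:

    import numpy as np
    import tensorflow as tf

    # 1000 hypothetical samples with 20 features: batch_size=50 gives
    # 1000 / 50 = 20 weight updates per epoch.
    x = np.random.rand(1000, 20).astype("float32")
    y = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.BatchNormalization(),  # normalizes activations over each batch
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Weights are updated once per batch; one epoch = one pass over all samples.
    model.fit(x, y, batch_size=50, epochs=5)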

  • @hoaxuan7074
    @hoaxuan7074 3 years ago

    You can swap around what is adjusted in a neural net. You can use fixed dot products and adjustable (parametric) activation functions like fi(x) = ai·x for x < 0, bi·x for x >= 0, i = 0 to m.
    Fast transforms like the FFT or fast Hadamard transform can be viewed as collections of fixed dot products.
    Such a net then is: transform, functions, transform, functions, ... , transform.
    To stop the first transform from taking a spectrum of the input data you can apply a fixed randomly chosen pattern of sign flips to the input to the net. Or a sub-random pattern.
    The cost per layer with the fast Hadamard transform is then n·log2(n) add/subtract operations and n multiplies, using 2n parameters, where n is the width of the net.
    How can that even work? Each dot product is a statistical summary measure and filter looking at all the neurons in the prior layer. Each dot product responds to the statistical patterns it sees and then has its response modulated by its own adjustable activation function.
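
    A minimal NumPy sketch of the idea in this comment (the function and variable names are mine, and the parameters are random rather than trained): fixed fast Walsh-Hadamard transforms alternate with adjustable two-slope activation functions, after a fixed random sign-flip pattern is applied to the input.

    import numpy as np

    def fwht(x):
        # Fast Walsh-Hadamard transform: n*log2(n) add/subtract operations.
        # The length of x must be a power of two.
        x = x.copy()
        n = x.shape[0]
        h = 1
        while h < n:
            for i in range(0, n, h * 2):
                for j in range(i, i + h):
                    a, b = x[j], x[j + h]
                    x[j], x[j + h] = a + b, a - b
            h *= 2
        return x

    def two_slope(x, a, b):
        # Parametric activation f_i(x) = a_i*x for x < 0, b_i*x for x >= 0:
        # 2n adjustable parameters and n multiplies per layer.
        return np.where(x < 0, a * x, b * x)

    n = 8                                    # width of the net (power of two)
    rng = np.random.default_rng(0)
    sign_flips = rng.choice([-1.0, 1.0], n)  # fixed random sign pattern on the input
    a1, b1 = rng.normal(size=n), rng.normal(size=n)  # adjustable parameters, layer 1
    a2, b2 = rng.normal(size=n), rng.normal(size=n)  # adjustable parameters, layer 2

    x = rng.normal(size=n)
    h = fwht(sign_flips * x)   # transform (fixed dot products)
    h = two_slope(h, a1, b1)   # adjustable functions
    h = fwht(h)                # transform
    h = two_slope(h, a2, b2)   # adjustable functions
    out = fwht(h)              # final transform
    print(out)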

  • @hoaxuan7074
    @hoaxuan7074 3 years ago

    That batching even works suggests that the training algorithm only searches for statistical solutions where no one neuron can be exceptional. Neurons then must work in (diffuse) statistical groups that are more resistant to damage when moving between batches. There are some other things that would suggest that too.

  • @hoaxuan7074
    @hoaxuan7074 3 years ago

    You know that ReLU is a switch: f(x)=x is connect, f(x)=0 is disconnect. A light switch in your house is binary on/off, yet it connects and disconnects a continuously variable AC voltage signal.
    The dot product of a number of (switched) dot products is still a dot product; that is, you can simplify back down to a simple dot product. When all the switch states in a ReLU net become known, it collapses to a simple matrix (a bunch of dot products with the input vector).
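
    A quick NumPy check of that collapse (the toy weights here are my own): freeze the ReLU switch states for one input, replace ReLU with the corresponding 0/1 diagonal matrix, and the whole net multiplies out to a single matrix.

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(4, 3))   # first layer weights (biases omitted for brevity)
    W2 = rng.normal(size=(2, 4))   # second layer weights
    x = rng.normal(size=3)

    # Ordinary forward pass through ReLU
    h = W1 @ x
    switches = (h > 0).astype(float)   # switch states: 1 = connect, 0 = disconnect
    y_relu = W2 @ (h * switches)

    # With the switch states known, ReLU is a fixed 0/1 diagonal matrix,
    # so the net collapses to the single matrix W2 @ diag(switches) @ W1.
    M = W2 @ np.diag(switches) @ W1
    print(np.allclose(y_relu, M @ x))   # True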