Transformers from Scratch - Part #2

  • Published 31 May 2024
  • ➡️ Learn more about Trelis Resources at Trelis.com/About
    TIMESTAMPS
    0:00 Welcome and Link to Colab Notebook
    3:20 Encoder versus Decoder Architectures
    8:34 What is the GPT-4o architecture?
    10:37 Recap of transformer for weather prediction
    15:30 Pre layer norm versus post layer norm
    19:42 RoPE vs Sinusoidal Positional Embeddings
    26:00 Dummy Data Generation
    26:40 Transformer Architecture Initialisation
    30:48 Forward pass test
    32:40 Training loop setup and test on dummy data
    39:10 Weather data import
    45:40 Training and Results Visualisation
    47:20 Can the model predict the weather?
    51:32 Is volatility in the loss graph a problem?
    53:50 How to improve the model further?
  • Science & Technology

COMMENTS • 3

  • @loicbaconnier9150 • 15 days ago

    Pre-norm versus post-norm only differs at the first attention layer, no? So if the embedding vectors and the added positional embeddings are normalised, it must be the same, no?

    • @loicbaconnier9150 • 15 days ago

      I correct myself, it's not the same: in one case we only normalise the shift (after the feed-forward layer), keeping the previous vector; in the other we normalise the summed vectors. Why do we not normalise the shift first and the summed vectors after?

    • @TrelisResearch • 14 days ago • +1

      Yeah, that was my original question too!
      But the difference is as per your follow-on comment: in pre-norm we normalise only the shift, as you point out.
      I suppose you could additionally normalise the sum, but I guess that is empirically not worth the added step (most of the benefit comes from normalising the shift).
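
      To make the distinction in this thread concrete, here is a minimal PyTorch sketch (not the notebook's actual code; the class names and the single-attention-sub-layer setup are my own simplification) contrasting post-norm, pre-norm, and the "normalise both" variant asked about above.

      ```python
      import torch
      import torch.nn as nn


      class PostNormBlock(nn.Module):
          """Post-norm (original Transformer): normalise the residual sum."""

          def __init__(self, d_model: int, n_heads: int):
              super().__init__()
              self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
              self.norm = nn.LayerNorm(d_model)

          def forward(self, x):
              shift, _ = self.attn(x, x, x)     # sub-layer output, i.e. the "shift"
              return self.norm(x + shift)       # the sum is normalised


      class PreNormBlock(nn.Module):
          """Pre-norm (GPT-style): normalisation sits on the shift branch only."""

          def __init__(self, d_model: int, n_heads: int):
              super().__init__()
              self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
              self.norm = nn.LayerNorm(d_model)

          def forward(self, x):
              xn = self.norm(x)                 # normalise before the sub-layer
              shift, _ = self.attn(xn, xn, xn)
              return x + shift                  # residual stream left unnormalised


      class PreAndPostNormBlock(nn.Module):
          """The variant asked about in the thread: normalise the shift branch
          and then also normalise the sum (hypothetical, for illustration only)."""

          def __init__(self, d_model: int, n_heads: int):
              super().__init__()
              self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
              self.norm_in = nn.LayerNorm(d_model)
              self.norm_out = nn.LayerNorm(d_model)

          def forward(self, x):
              xn = self.norm_in(x)
              shift, _ = self.attn(xn, xn, xn)
              return self.norm_out(x + shift)


      if __name__ == "__main__":
          x = torch.randn(2, 16, 64)            # (batch, seq_len, d_model)
          for Block in (PostNormBlock, PreNormBlock, PreAndPostNormBlock):
              print(Block.__name__, Block(64, 4)(x).shape)
      ```

      In the pre-norm block the residual stream is never renormalised, which is the "shift-only" behaviour described in the comment above; the third block adds the extra normalisation of the sum that the reply suggests is empirically not worth the added step.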