Maybe not a good idea. You still need to know the details if you want to fully understand. Some important details are skipped, but there is too much about Q, K, V with the library example.
Can you explain… but of course we can, it is simple, it is AI! You know, when I watch your videos, you do such an amazing job of explaining things that it does almost feel simple. I learn so much from every video you produce!
The library example is super helpful, great explanation, thanks a lot!
You're very welcome!
Don't forget about LongRoPE! It's validated up to a 2-million-token context length!
Where is the proof for permutation invariance? Same with combined "correctly" for rescaling: what do "correctly" and "rescaling" mean?
This needs some background from the FlashAttention paper.