Hey, great video! What is the rationale behind adding the global timestamp embeddings instead of concatenating them? I imagine that if you created an embedding of all that information and appended it as additional dimensions to the time series, it could help the transformer compute its attention based on those events even better.
Thanks for watching! One reason I can think of is that concatenation adds many dimensions internally, which makes the forward pass more expensive in both time and memory, since we would no longer be dealing with 512-dimensional vectors but with 512 × 3 = 1536-dimensional vectors.
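The cost argument in the reply can be made concrete with a toy sketch. It assumes the three parts being combined are the value, positional, and global timestamp embeddings, and that the next layer is a dense projection back to 512 dimensions (for example a query projection). The 4-wide vectors are made up; only the width of 512 and the factor of 3 come from the reply.

```python
# Toy per-timestep embeddings with an assumed width of 4 (the video uses 512).
value_emb = [0.5, -1.0, 0.25, 2.0]   # projected input value(s)
pos_emb = [0.0, 1.0, 0.0, 1.0]       # local positional encoding
time_emb = [0.1, 0.2, -0.3, 0.0]     # global timestamp embedding (hour, weekday, ...)

# Addition: element-wise sum, the width stays the same.
added = [v + p + t for v, p, t in zip(value_emb, pos_emb, time_emb)]

# Concatenation: vectors placed end to end, the width triples.
concatenated = value_emb + pos_emb + time_emb


def dense_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one fully connected layer mapping d_in -> d_out."""
    return d_in * d_out + d_out


d_model = 512
params_add = dense_params(d_model, d_model)         # projection on the summed input
params_concat = dense_params(3 * d_model, d_model)  # same projection on the concatenated input

print(len(added), len(concatenated))  # 4 12
print(params_add, params_concat)      # 262656 786944
```

Every layer that reads the concatenated vector needs roughly three times as many input weights, which is the extra time and memory the reply refers to. Addition keeps the width fixed and lets the model learn to place the timestamp information in the same 512-dimensional space.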
I think the answers to the quiz are CAB, but I'm a little unsure about the middle one: you mentioned kernel size, but I don't remember you mentioning the number of kernels, so I'm not sure whether these are synonyms or whether there is some number of kernels of a certain size.
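In general, kernel size and number of kernels are different things: the size is how many time steps one kernel covers, while the number of kernels is how many separate filters slide over the input, each producing its own output channel (in PyTorch's `nn.Conv1d` these are the `kernel_size` and `out_channels` arguments). A small sketch with made-up values, assuming a single input channel, no padding, and stride 1; it does not reproduce the layer configuration from the video. With several input channels, each kernel would also span all of them.

```python
def conv1d(signal, kernels, bias):
    """Valid (no padding), stride-1 1D convolution of a single-channel signal.

    Like deep learning libraries, this computes cross-correlation (no kernel flip).
    kernels: list of kernels; len(kernels) is the NUMBER of kernels,
             len(kernels[k]) is the kernel SIZE.
    bias:    one bias value per kernel.
    Returns one output sequence (feature map) per kernel.
    """
    outputs = []
    for kernel, b in zip(kernels, bias):
        size = len(kernel)
        feature_map = [
            sum(w * x for w, x in zip(kernel, signal[i:i + size])) + b
            for i in range(len(signal) - size + 1)
        ]
        outputs.append(feature_map)
    return outputs


signal = [1, 2, 3, 4, 5]
kernels = [
    [1, 0, -1],  # kernel 1, size 3
    [1, 1, 1],   # kernel 2, size 3
]
bias = [0, 0]

maps = conv1d(signal, kernels, bias)
print(len(maps))     # 2 (number of kernels = number of output channels)
print(len(maps[0]))  # 3 (signal length 5, kernel size 3, no padding)
print(maps)          # [[-2, -2, -2], [6, 9, 12]]
```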
When you highlight 4 or 6 numbers, what do you use to calculate the resulting values? I love your videos, but this one is escaping me because you brush over it with a single statement. You say it is the sum of products plus the kernel. So, could you please show me the math behind vector 1 (let's say "this") and vector 2 ("move") that produces 0.11?
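Without the numbers from the video this cannot reproduce 0.11, but the usual rule in a convolution layer is: multiply each highlighted input by the kernel weight in the same position, add up all those products, then add a bias term. This sketch assumes "plus the kernel" refers to that bias, and that the kernel covers 2 time steps of 3 features each (6 numbers); all values are made up.

```python
# Made-up numbers; these are NOT the values from the video.
window = [
    [0.2, -0.1, 0.4],   # vector at step 1 (e.g. "this")
    [0.3, 0.5, -0.2],   # vector at step 2 (e.g. "move")
]
kernel = [
    [0.1, 0.2, 0.3],    # weights applied to step 1
    [-0.1, 0.0, 0.2],   # weights applied to step 2
]
bias = 0.05  # assumed bias term

# Element-wise products of kernel and window, position by position (6 products).
products = [w * x for w_row, x_row in zip(kernel, window) for w, x in zip(w_row, x_row)]

# Sum of products plus bias gives one output value.
output = sum(products) + bias
print(round(output, 4))  # 0.1
```

By hand: 0.02 − 0.02 + 0.12 − 0.03 + 0.00 − 0.04 = 0.05, and 0.05 + 0.05 = 0.1. Sliding the kernel one step along the sequence and repeating gives the next output value.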
At 8:31, why do you use text in the image when this example is about time series, not a large language model? When you're done, are you going to show us, end to end, feeding data into the model, then using it to forecast and testing its accuracy?
You're doing amazing work, man. Thank you.
Great content! Thanks!
What whiteboarding tool do you use to show the tensors? (At 3:51)
Uhhh... Is a negative score allowed?
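If "score" here means a raw attention score (the scaled dot product of a query and a key, before softmax), then yes: a dot product can be negative, and softmax still turns every score into a positive weight, with more negative scores getting smaller weights. That reading of "score" is an assumption. A sketch with made-up vectors:

```python
import math


def softmax(scores):
    """Turn raw scores (any real numbers) into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up query and keys for one attention step.
q = [1.0, -2.0]
keys = [[0.5, 1.0], [-1.0, 0.5], [2.0, -1.0]]
d_k = len(q)

# Scaled dot-product scores: two of the three come out negative.
scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
weights = softmax(scores)

print([round(s, 3) for s in scores])
print([round(w, 3) for w in weights])
```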
B