I like that this is high-level. Perfect for those of us who are dabbling with various platforms and don't want just another low-level tutorial.
Thanks for your feedback... follow along with me as I go high-level and then one layer down to demystify these topics... appreciate you sharing...
Crystal clear explanation. Thanks! More please :)
You got it! Working to make each video better and better….
Very nice video and easy to understand, sir. Excellent!
Thank you for the feedback… appreciate it.
Thanks bro, clear and nice info
Thanks for the feedback...
Clear and nice! Exactly the answers I was looking for.
Now I have to somehow evaluate how many tokens I am passing to a model through Ollama.
Glad it helped... hoping to cover Ollama soon... this space is evolving so quickly...
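For the Ollama question above, one rough way to estimate how many tokens a prompt uses is to run it through a tokenizer locally. This is a minimal sketch assuming the tiktoken package; its encodings belong to OpenAI models, so for models served through Ollama the count is only an approximation, not the exact tokenizer those models use.

```python
# Rough token-count estimate for a prompt before sending it to a local model.
# Assumes `pip install tiktoken`; cl100k_base is an OpenAI encoding, so this
# is only an approximation for models served through Ollama.
import tiktoken


def estimate_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return an approximate token count for `text`."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


prompt = "Explain what an LLM context window is in two sentences."
print(f"~{estimate_tokens(prompt)} tokens in the prompt")
```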
Perfect
Best explanation!
Glad it was helpful! Trying to get better with each video... Thanks for the feedback...
Great content!
Thank you for your feedback… trying to get better with each video…. 🙏
thanks.
Glad you liked it…. Working to get better with each video… let me know if you have any ideas for videos …🙏
Thanks for the explanation. Does it mean that the context window is common or separate for both input and output?
For LLMs, the context window is for input tokens... There is normally an LLM setting, called "maxLength" or something similar, that controls the maximum number of tokens that will be generated for a response... Thanks for the feedback and question...
@NewMachina I'm going to disagree here. I believe the context window typically includes both the LLM input and output, especially in a chat session like your examples. In most cases this is primarily how the LLM knows what it said before.
@paulparker You are right... I was going through some documentation that was ambiguous about this... and assumed it didn't include output... I have since found several other documents that agree the context window includes both input and output. Thanks for helping clarify this...
@NewMachina You're welcome!
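To make the point above concrete, here is a hedged sketch of the budgeting an application typically does when input and output share one context window: the prompt (plus prior chat turns) and the tokens reserved for the response must fit in the same window. The window size and the reserved-output figure below are illustrative assumptions, not any specific model's limits or a particular provider's API.

```python
# Sketch: input and output tokens share one context window.
# The numbers are illustrative assumptions, not real model limits.
CONTEXT_WINDOW = 8192        # total tokens the model can attend to (input + output)
MAX_OUTPUT_TOKENS = 1024     # the "maxLength" / max_tokens style setting


def max_prompt_tokens() -> int:
    """Tokens left for the prompt and prior turns after reserving room for the response."""
    return CONTEXT_WINDOW - MAX_OUTPUT_TOKENS


print(max_prompt_tokens())   # 7168 tokens available for input under these assumptions
```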
The question on tooling is a good question. In my personal case, I don't know enough here to know what tool I would prefer to use: my inclinations would be VS Code and/or notebooks, but to be honest I don't really understand Jupyter notebooks, having never used them. I believe Colab and the like use notebooks?
I am likely going to be showing examples just running in VS Code, maybe some in the AWS Cloud using Lambdas, and will likely do a simple one with Jupyter Notebooks to see how viewers like it... thanks for providing your feedback on this...
@paulparker Check out the frameworks LangChain and LlamaIndex... I think these two open-source frameworks will continue to get more traction... I am working on some videos in this area next... I would be interested if you have an opinion or thoughts on these frameworks... not urgent, I suspect you are busy as we all are... but if you get a chance to check these out, let me know what you think...
@NewMachina I thought that there was a different successor to LangChain, and LlamaIndex doesn't sound right. But I have not had time to mess with doing any of this myself.
@paulparker Are you maybe thinking about LangGraph or LangServe? Looks like there are some additional extensions to LangChain... some are driven by LangChain while others are from other teams.... Still getting a sense of all of these...
Can you substantiate the claim that LLM providers do this primarily to make the models cheaper to run? I ask this because my understanding is that this is actually how the models work and have worked since the initial research. So it seems incorrect to say that this is an optimization chosen for performance at scale.
Thanks for reaching out with your question... Can I get a quick clarification... In the video "What is the LLM's Context Window", are you talking about the line "While larger 'context windows' improve the LLM's performance on longer text blocks, they also demand more computational resources"... I wanted to make sure I was following up on the same part of the video you were inquiring about...
@NewMachina No, I think that was towards the beginning of the video, whereas what I am remembering was towards the end.
Yes, currently larger context windows require quadratically more computation. However, there is a new approach that just came out for infinite context windows. We will have to see if that is any good. Most errors in this post come from Siri's broken dictation.
OK, I will look into that... if you have a reference to this infinite-context approach, please share... New stuff is happening quickly...
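As a back-of-the-envelope illustration of the quadratic scaling mentioned above: standard self-attention compares every token in the context with every other token, so the attention score matrix grows with the square of the sequence length. The numbers below are purely illustrative counts, not timings from any real model.

```python
# Illustration: the attention score matrix has one entry per token pair,
# so doubling the context length roughly quadruples that part of the work.
for context_length in (1_000, 2_000, 4_000, 8_000):
    pairwise_scores = context_length ** 2
    print(f"{context_length:>6} tokens -> {pairwise_scores:>12,} pairwise attention scores")
```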
Then I have another question: what is this concept?
Say, for example: "Annie loves jam but she hates bread and she also loves fruits."
So if I say the context window is 2,
I take 2 words to the left and 2 words to the right as input.
So, for example, "Annie loves" and "but she" are the input and "jam" is the output.
My second question is: what is the difference between context length and context window? To me, whatever you explained sounded like context length rather than context window, so please help me clarify.
Yes, the context window is measured in tokens. If the context window is 2, then you could get one token in and one token out.
For the second question, I should have been consistent and used "context window" throughout... for this topic, context length is the same as the context window and is measured in tokens.
Thanks for taking the time to ask me these questions...
@NewMachina But sir, what I have studied is that context window and context length are different: the context window is the small window where your focus is, but the terms are interchanged very often.
Ahh... I see what you are saying... I will try to be more precise with my terminology as well... thank you for sharing...
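For anyone following the "Annie loves jam" exchange above, the window described in that question is the word2vec-style training window (a few words on each side of a target word), which is a different idea from an LLM's context window measured in tokens. Here is a small sketch of that training-window sense, using the window size of 2 from the question; the sentence and pairing scheme are taken straight from the question, everything else is just illustrative Python.

```python
# Sketch of a word-level "context window" in the word2vec training sense:
# for each target word, collect up to `window` words on each side.
sentence = "Annie loves jam but she hates bread and she also loves fruits".split()
window = 2  # the window size used in the question above

for i, target in enumerate(sentence):
    left = sentence[max(0, i - window):i]      # up to 2 words to the left
    right = sentence[i + 1:i + 1 + window]     # up to 2 words to the right
    print(f"context {left + right} -> target '{target}'")
```

For the target word "jam", this prints the context ['Annie', 'loves', 'but', 'she'], matching the example in the question, whereas an LLM's context window is simply the total token budget shared by its input and output.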