Check out HubSpot's FREE AI Prompt Library Now! clickhubspot.com/h14h
no
Meta doing "Open"AI's job is still kinda surprising to me, lol
Are you insinuating that Saint Zuckerberg is otherwise untrustworthy?!
Shouldn’t OpenAI be renamed to closedai 😂
@@Gamatoto2038 Bang!
Yeah, I still don't trust Zuck, but good on him. I would rather have paid $12 to keep my privacy, so you ain't gonna fool me again. Also Darpa program lifelong, he was handed it. Even Elon scares me. The outcome doesn't look good, we've got to flip the tables before it's too late.
The reason the technical reports are the most cited is that every time you use the models in your own research, you reference the technical report. So with 23k published papers, of course the technical reports will be at the top.
That’s something new I learned today
You need to either divide each paper's citations by the time it has been out, or make a graph of citations over time where every paper's release day is shifted to the same place on the x axis. Then you would be able to see which papers grew the fastest.
Came here to make this comment 👏
yeah so ranking for growth rate for number of citations over time rather than absolute citation count
An example of shifted curves is available on the github star history website which allows comparing repositories
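A minimal sketch of the "citations per day" ranking suggested in this thread. The paper names, citation counts, and dates below are made up for illustration (they are not taken from the video), and `citations_per_day` and `rank_by_rate` are hypothetical helper names.

```python
from datetime import date

def citations_per_day(citations: int, published: date, as_of: date) -> float:
    """Average citations gained per day since publication."""
    # Clamp to one day so a paper released on the cutoff date does not divide by zero.
    days_out = max((as_of - published).days, 1)
    return citations / days_out

def rank_by_rate(papers: list[tuple[str, int, date]], as_of: date) -> list[str]:
    """Return paper names sorted by citations per day, fastest-growing first."""
    ranked = sorted(
        papers,
        key=lambda p: citations_per_day(p[1], p[2], as_of),
        reverse=True,
    )
    return [name for name, _, _ in ranked]

# Hypothetical data: an early-year paper with many citations
# versus a recent paper with fewer citations but a shorter time window.
papers = [
    ("early paper", 1000, date(2024, 2, 5)),
    ("recent paper", 200, date(2024, 11, 11)),
]
print(rank_by_rate(papers, as_of=date(2024, 12, 1)))
```

With an absolute count the early paper wins; divided by days since release, the recent one ranks first. A per-day average still hides how the curve bends over time, which is what the shifted-curve plot would show.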
ByCloud with the amazing AI analysis videos..can’t wait what’s in store for your channel and AI as a whole in 2025
just barely missed Meta's new paper, which seems like it'll change a lot of stuff in the next year (Byte Latent Transformer). also i'm very surprised nGPT isn't here.
can you give me a summary of it
released 1 day ago
Meta's new Byte Latent Transformer (BLT) model outperforms tokenization models, up to their tested 8B param size.
The canon previously was that it won't be possible to make byte-level models stable, or make them converge in training.
their main claim is: "For fixed inference costs, BLT shows significantly better scaling than tokenization-based models"
Traditionally, LLMs use tokenization - breaking text into predefined chunks (tokens) using a fixed vocabulary
BLT instead works directly with bytes (dynamic patching)
(rigid tokens from a fixed vocabulary) -> (dynamically segments text into patches based on byte entropy)
Byte Entropy: A measure of information complexity that determines how much computational resources should be allocated to different text segments.
(higher entropy indicates more unpredictable or complex data segments)
[Instead of treating all text in the same way]
we changed:
tokenization - breaking text into predefined chunks using fixed vocabulary
into:
Byte Latent Transformer - working directly with raw bytes (dynamic patching)
we got:
Improved performance on reasoning tasks
Enhanced long-tail generalization
Superior character-level understanding
quote:
BLT architecture trends between Llama 2 and 3 when using significantly larger patch sizes. The
bpe tokenizers of Llama 2 and 3 have an average token size of 3.7 and 4.4 bytes. In contrast, BLT can
achieve similar scaling trends with an average patch size of 6 and even 8 bytes. Inference flop are inversely
proportional to the average patch size, so using a patch size of 8 bytes would lead to nearly 50% inference
flop savings. Models with larger patch sizes also seem to perform better as we scale model and data size.
BLT with patch size of 8 starts at a significantly worse point compared to bpe Llama 2 at 1B but ends up
better than bpe at 7B scale. This suggests that such patch sizes might perform better at even larger scales
and possibly that even larger ones could be feasible as model size and training compute grow.
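The "nearly 50%" figure in the quote can be checked with a line of arithmetic. This is only a sketch under a simplifying assumption: that inference cost is dominated by the large transformer, which runs once per token or patch, so cost per byte scales as 1 / (average bytes per step). The byte-level encoder and decoder that BLT also runs are ignored here, and `inference_flop_saving` is a hypothetical helper name.

```python
def inference_flop_saving(baseline_bytes_per_step: float, patch_bytes: float) -> float:
    """Fraction of inference FLOPs saved versus a baseline.

    Assumes FLOPs per byte are inversely proportional to the average
    number of bytes consumed per model step (token or patch).
    """
    return 1.0 - baseline_bytes_per_step / patch_bytes

# Average token sizes are the ones stated in the quote.
for name, token_bytes in [("Llama 2 BPE", 3.7), ("Llama 3 BPE", 4.4)]:
    saving = inference_flop_saving(token_bytes, patch_bytes=8.0)
    print(f"{name}: {saving:.1%} fewer inference FLOPs at patch size 8")
```

Under this assumption the saving is about 45% against the Llama 3 tokenizer and about 54% against the Llama 2 tokenizer, which brackets the quoted "nearly 50%".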
@@XenoCrimson-uv8uz Basically gets rid of tokenizers and interprets the input's bytes directly
@@CantoTheDegenerate666 except even while processing byte by byte the model tends to invent some kind of morphemes by itself
@@CantoTheDegenerate666 So it's like a tokenizer but with a token for each individual character?
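On the "one token per character?" question: as summarized earlier in this thread, the bytes are grouped into patches of varying length based on byte entropy, so it is not one unit per character. Here is a toy sketch of that idea. It is an illustration, not the paper's implementation: a bigram count table built from the input itself stands in for a learned byte-level model, and the threshold value and function names are assumptions.

```python
import math
from collections import Counter, defaultdict

def next_byte_entropies(data: bytes) -> list[float]:
    """Entropy in bits of the next-byte distribution at each position.

    Toy stand-in for a learned byte LM: the distribution is conditioned
    only on the previous byte and estimated from counts in `data` itself.
    """
    followers: dict[int, Counter] = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        followers[prev][nxt] += 1

    def entropy(counts: Counter) -> float:
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    per_context = {byte: entropy(counts) for byte, counts in followers.items()}
    # Position 0 has no context, so it gets entropy 0 and never starts a new patch.
    return [0.0] + [per_context[data[i - 1]] for i in range(1, len(data))]

def entropy_patches(data: bytes, threshold: float) -> list[bytes]:
    """Start a new patch wherever the next byte is hard to predict."""
    entropies = next_byte_entropies(data)
    patches, start = [], 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

text = b"the cat sat on the mat. the cat sat on the hat."
print(entropy_patches(text, threshold=1.0))
```

Predictable stretches end up in longer patches and uncertain positions open new ones, which is the sense in which compute gets allocated by entropy rather than by a fixed vocabulary.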
Thank you. I have been learning about LLMs in general. This video helped me a lot!
9:55 these are distribution graphs, so it's showing that there is variance in the accuracy rather than showing that the accuracy is deteriorating
thanks, I'll get started now.
Can you cover Meta's Byte Latent Transformer and Coconut (Training Models to Reason in a Continuous Latent Space)?
I wonder in how many papers ChatGPT is a ghostwriter author...
I hope you make a video on Byte Latent Transformers and Large Concept Models, both from Meta (THE GOAT). These two imo are complete gamechangers!
very interesting.... I wish I knew what the future of the AI/LLM space is going to be. We know that scaling transformers is giving diminishing returns, as seen at top AI labs like OpenAI, Meta, Google etc... so I wonder which of these techniques will be the next big thing that we scale to go further.... will it be Mamba... or KAN, or maybe diffusion LMs... who knows, only time will tell...
Diminishing returns? OpenAI?
@@2034-SWE if we consider scaling transformers only, then yes, diminishing returns. The latest advancement is based on reasoning capabilities, not on even more compute. The transformer architecture has almost reached its limit with regard to scaling and performance benefits. Not saying it won't be overcome or that we won't switch architectures, but this is the current state.
@@yannickm5429 yes exactly, the transformer architecture plateaued, so now everyone is looking for the next big thing, like OpenAI did with o1. They claim that large reasoning models (LRMs) are the next big thing, but if we look at the results of the latest o1 paper, these reasoning models don't seem to scale well... for example, in some cases o1-preview gives better results than the full o1, so maybe this architecture is not all about scale... we will see. We also have to see if these reasoning models are actually as good as OpenAI claims. Yes, they are better, but sometimes they are only as good as other LLMs, for example Claude 3.5 Sonnet (new): it's just an LLM, yet it's on the same level as o1. So maybe LRMs are not that big of a deal and we need a truly novel architecture from the ground up... like Ilya Sutskever said, the age of scaling transformers is over, and now we need to find a replacement for pretraining itself... let's see...
yes so now we scale test time compute instead @@2034-SWE
how did you sort the papers by citation on arXiv?
How to sort these papers by citation numbers?
do you think a llama 3.3 7b model will be released?
Do any papers from November (or December at this point) even have any citations yet? I mean, someone has to read the paper and then write and publish a paper of their own for a citation to exist... how much can a paper be worth if it was farted out in less than a month?
Hey, that website is great it has a lot of scientific papers, although it seems to be addressed to engineering and technology, I can't find a lot about micro biology
Where are the weekly banger research posts in the community tab though? I miss them
I just found a paper from Meta AI about Large Concept Models.
I'm still a layman but it sounded very promising for coherence and energy consumption.
So far it works with text-to-concept and speech-to-concept encoders and a concept-to-text decoder, but I think it could work with other modalities (e.g. video) too, if you make encoders/decoders for that.
I can't explain it. Just read it for yourself
Have improvements in pure CV models plateaued? Or are we just not noticing cuz LLMs is what's everyone's been talking about the past 2 years?
Noice video, but you should normalize the citations with cit per day.
Pretty clear that transformers dominated this year. I'm curious to see the most cited in other fields like diffusion, or RL. After all, the biggest breakthroughs usually come where not everyone is looking.
"AI and ML" bro it is only NLP in there, or NLP-related paper analysis, maybe with some twist of generating images Xd
Yeah lol.
Well, LLMs dominated the conversation, so when ranking by citations, it makes sense.
@@jmoney4695 yeah i know, it is understandable, but it still made me laugh when he said "and that's it in the news of *AI and ML* ", like, bro XD...
They should have just been weighted by days since publication.
so close to 32768 !
This is such a nerdy comment
pls explain
@@npc4416 one more than the max value of a 16-bit signed integer (32767)
@@npc4416 32768 is a power of 2, programmers deal with them pretty often.
the amount of ai papers published in 2024 is close to that number.
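For anyone who wants to check the number: 32768 is 2^15, and a 16-bit signed integer tops out one below it, at 32767. A quick sketch using only built-in Python integer methods (`fits_int16` is a helper name made up for this example):

```python
INT16_MIN = -(2 ** 15)     # lowest 16-bit signed value
INT16_MAX = 2 ** 15 - 1    # highest 16-bit signed value

def fits_int16(value: int) -> bool:
    """True if `value` can be stored in two bytes as a signed integer."""
    try:
        value.to_bytes(2, "little", signed=True)
        return True
    except OverflowError:
        return False

print(2 ** 15, INT16_MIN, INT16_MAX)
print(fits_int16(32767), fits_int16(32768))
```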
this list is biased towards early papers, because they have had more time to be cited
bro why don't you put out a lot of videos? love your videos btw
2024 is far from over.
6:54
It's a shame that the Apple paper demonstrating what we experts knew,
that LLMs don't reason,
isn't on the list.
People don't like the truth.
Ah, I see that you did give a monthly...but that you don't understand its impact.
LLMs don't reason.
They just look up answers, one token at a time.
I am horribly disappointed that you did not cover all 34,276 papers in this video. Shame! 🤣
So much information 😅... This is so fast
plz look into Meta's AI papers, one that's about BLT (Byte Latent Transformer) and COCONUT (Chain of Continuous Thought). Please.
Awesome
Please remove the disturbing background music it's not possible to concentrate on the video
Wow I am early!
gee pee tee
GG
How about the top 10 worst papers ?
This is a really bad way to find interesting papers
i'm trying to find interesting papers and would love to know what a better way would be to gauge interest for a given research paper...