[Ph.D. Vlog] Mamba, the newest model of 2024, explained: the Transformer is dead, and everything you want to know is here!

Ph.D. Vlog


Published Nov 17, 2024
Science & Technology

COMMENTS • 73

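@paidaxing754 7 months ago ⁺⁴
Very well explained; I learned a lot.
@wangxiao_ahu 7 months ago ⁺²
Thanks for sharing! Could you analyze SSMs in more detail, from these angles?
1. The original SSM model;
2. Discretization of the SSM;
3. The SSM scan mechanism;
4. How Mamba relates to GPU hardware acceleration;
5. Mamba's core advantages and distinguishing features;
6. The various applications of Mamba.
🤣
@phdvlog2024 7 months ago
Too much work; let's wait until this model really takes off. By then there will also be good implementations that can be deployed directly.
@wangxiao_ahu 7 months ago
@phdvlog2024 We ran some tests with vision Mamba models. They improved on some tasks but lost to ViT on most. GPU memory usage did not drop noticeably either, which is odd. 🤣 Have you run any experiments to verify this?
@phdvlog2024 7 months ago
Vision Mamba may need special hyperparameter tuning. The A, B, C, and D in the Mamba model can all be adjusted, so the stock model may not beat the older ones.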
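For readers wondering what the A, B, C, and D in this thread refer to, here is a minimal sketch of a discretized state space model. It assumes a single scalar state and fixed parameters, with the zero-order-hold rule for the step size delta. The function names are made up for illustration. This is not Mamba's implementation, where delta, B, and C additionally depend on the input.

```python
import math


def discretize_zoh(a, b, delta):
    """Zero-order-hold discretization of the scalar SSM h'(t) = a*h(t) + b*x(t).

    Returns (a_bar, b_bar) such that h_t = a_bar * h_{t-1} + b_bar * x_t.
    """
    a_bar = math.exp(delta * a)
    # For a != 0, b_bar = (exp(delta*a) - 1) / a * b; the a -> 0 limit is delta * b.
    b_bar = (a_bar - 1.0) / a * b if a != 0.0 else delta * b
    return a_bar, b_bar


def ssm_run(xs, a, b, c, d, delta):
    """Run the discretized SSM over xs as a plain recurrence, starting from h = 0."""
    a_bar, b_bar = discretize_zoh(a, b, delta)
    h = 0.0
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x   # state update, governed by A and B
        ys.append(c * h + d * x)    # readout C plus skip connection D
    return ys


if __name__ == "__main__":
    # Impulse response: with a < 0 the output decays geometrically.
    print(ssm_run([1.0, 0.0, 0.0, 0.0], a=-1.0, b=1.0, c=1.0, d=0.0, delta=1.0))
```

In this toy version, tuning means choosing a, b, c, d, and delta by hand. A more negative a or a larger delta makes the state forget faster.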
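@刘环菁 7 months ago ⁺³
Great explanation, really accessible. I followed you here from Bilibili. May I ask how you collect the relevant architecture diagrams for your slides so efficiently? They are very intuitive!! Concise and easy to follow 🤩
@phdvlog2024 7 months ago ⁺²
They are the original figures from the papers. Look at the papers this one cites and the papers that cite it, and you can collect them all.
@phdvlog2024 7 months ago
Please comment more, so I can tell how my explanations are landing.
@hasesukkt 5 months ago
@phdvlog2024 Learned a lot!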
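@utei9502 4 months ago
Thanks for the explanation. The part about how the use of each level of GPU memory affects training and inference speed was fairly interesting. However, many technical terms in the commentary were used incorrectly, and the explanation stays on the surface and is at times plausible-sounding but wrong. I suggest you study the fundamentals of machine learning systematically to make the videos more professional.
@phdvlog2024 4 months ago ⁺¹
Some terms are off because when I publish papers, ChatGPT polishes everything directly, so I never need to get them right 😂
@phdvlog2024 4 months ago ⁺¹
Also, I get the terms right when I write in English; I just can't match them to the Chinese ones.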
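@QuinnZack 7 months ago ⁺¹
For practitioners this explanation is quite good. Would you be willing to share the slides? I'd like to flesh out the algorithm details on top of them.
@phdvlog2024 7 months ago
Go take a look at their code.
@QuinnZack 7 months ago ⁺¹
For practitioners this explanation is already quite good. Would it be convenient to share the slides?
@phdvlog2024 7 months ago ⁺¹
It's a bit of a hassle. The slides may contain some of my personal information, and everything in them is a screenshot from the papers, so there isn't much worth sharing.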
@kyrieirving5928 7 months ago ⁺¹
Great explanation, hahaha
@leeyanbin2896 6 months ago ⁺²
Really well explained
@JacobLiu-q7v 3 months ago ⁺²
Very well explained, a good deal better than the paid courses on Bilibili.
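@sunnysky1193 5 months ago
Good for a first look, but the key parts were all glossed over; it rather sidesteps the hard bits…
@phdvlog2024 5 months ago
That's because the key parts are all automatic control theory, which one video can't cover.
@jaylenzhang4198 6 months ago
Its prefix sum looks more like a segment tree data structure.
@phdvlog2024 6 months ago
It is somewhat similar.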
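The resemblance noted in this exchange can be made concrete. A recurrence of the form h_t = a_t * h_{t-1} + b_t can be computed as a prefix scan, because composing two such steps is associative. A segment tree relies on the same property when it merges intervals. Below is a minimal pure-Python sketch using a doubling (Hillis-Steele style) scan. The function names are hypothetical, and this is not the fused GPU kernel used by the Mamba code.

```python
def combine(left, right):
    """Compose two steps of the map h -> a*h + b, with `left` applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(pairs):
    """Reference: run h_t = a_t * h_{t-1} + b_t one step at a time from h = 0."""
    h = 0.0
    out = []
    for a, b in pairs:
        h = a * h + b
        out.append(h)
    return out


def doubling_scan(pairs):
    """Inclusive scan in O(log n) rounds; each round could run in parallel."""
    acc = list(pairs)
    step = 1
    while step < len(acc):
        # After this round, acc[i] summarizes up to 2*step consecutive steps ending at i.
        acc = [
            combine(acc[i - step], acc[i]) if i >= step else acc[i]
            for i in range(len(acc))
        ]
        step *= 2
    # With h = 0 initially, the state after step i is the offset part of acc[i].
    return [b for _, b in acc]


if __name__ == "__main__":
    pairs = [(0.5, 1.0), (0.5, 2.0), (0.5, 3.0), (0.5, 4.0), (0.5, 5.0)]
    print(sequential_scan(pairs))
    print(doubling_scan(pairs))
```

Setting every a_t to 1 reduces the recurrence to an ordinary prefix sum, which is why the two names are used almost interchangeably in this context.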
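@PeijiYang-t6f 4 months ago
Hello. If this is a state-machine-like approach, how does it solve the forgetting problem of LSTMs and traditional RNNs?
@phdvlog2024 4 months ago
It can't. LSTM already uses momentum-style updates to address forgetting. Just go with a Transformer and trade memory for it.
@PeijiYang-t6f 4 months ago
@phdvlog2024 So it looks like Mamba only solves LSTM's slow training, and is still much weaker than the Transformer at long-term memory. Even though the memory optimizations let it hold a longer context, the actual results will not necessarily be better.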
@fdsmolasfae 6 months ago
Could you do a comparison of the two technical routes, RAG and long context?
@fdsmolasfae 6 months ago
I read the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" and it looks promising.
@phdvlog2024 6 months ago ⁺¹
Give it a try
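@AGI.Trainer 7 months ago ⁺²
An RNN isn't like a zipper; it's more like the zipper slider, right?
@phdvlog2024 7 months ago ⁺²
Right, but the RNN then produces the zipper behind it.
@yangyang1412 7 months ago ⁺¹
The grass on the grave of the last thing that claimed to kill the Transformer has already grown this tall.
@phdvlog2024 7 months ago
It keeps evolving; by now it has made its way into diffusion models.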
@devinzhou4913 4 months ago
Did Mamba remove positional encoding?
@phdvlog2024 4 months ago
I don't think so
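@foreverwhisper 7 months ago
How should an ordinary programmer start learning AI this year? Are there math concepts I should brush up on first?
@phdvlog2024 7 months ago ⁺¹
No need; just ask ChatGPT. Start by finding a systematic video course to study.
@foreverwhisper 7 months ago
@phdvlog2024 I browsed through your videos and decided to subscribe first 😊
@phdvlog2024 7 months ago
I'll put out a playlist later so you can get a better overview: a playlist on image classification.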
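@ErenNew787 6 months ago
What's your view: will this model become the trend at this year's top journals and conferences?
@phdvlog2024 6 months ago
It already has. It is already topping the leaderboards at all sorts of small and mid-size conferences. I saw this as a reviewer: about a third of the submissions, I'd say.
@phdvlog2024 6 months ago
I don't know about the top conferences.
@ErenNew787 6 months ago
@phdvlog2024 OK, thanks for the reply.
@唐鹏-t3n 4 months ago
Isn't the newest one TTT? Test-time training.
@ErenNew787 4 months ago
@唐鹏-t3n At the time of this video it was still Mamba. And there are too many Transformer challengers now, none of which perform all that well.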
@williamzhou4353 2 months ago
Could you share the slides? Thanks!
@phdvlog2024 2 months ago
Just take screenshots; the slides aren't that well made anyway 😂
@williamzhou4353 2 months ago
@phdvlog2024 OK, OK, hahaha, thanks for your hard work
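@weiseven717 7 months ago ⁺²
I'm actually curious how you learn so much so quickly. The AI field is so vast: do you follow the latest developments constantly and read as you go, or did you set aside a period to study systematically? Could you talk about it sometime?
@phdvlog2024 7 months ago ⁺²
Read lots of papers. When you don't understand something, either ask ChatGPT or look at whose papers it cites (prior work). Trace it back that way and it all falls into place.
@phdvlog2024 7 months ago ⁺¹
Please comment more, so I can tell how my explanations are landing.
@MDNQ-ud1ty 6 months ago ⁺²
Many do not actually understand as much as they pretend/present, as they are just regurgitating what they read or saw, and it is easy to talk.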
But the most important thing is immersion. The more one spends in something the more one understands. The most work is at the front.
AI, in fact, is not that complex. It is basic linear algebra and its fundamentals are very simple(that of curve fitting). Much of AI is really the perfection of something very simple and the use of very large compute machines that are now available.
1. Make sure you understand math. You must understand linear algebra. It is not difficult but may seem so. Basically it is the idea of vectors/lists/tuples and their transformations(matrices). Without understanding the language of linear algebra and the core concepts you will constantly struggle. Most AI will use some concepts outside of LA in specific ways and they may need to be learned as one sees them. Calculus is also a must. At least understand differentiation and integration along with being comfortable with partial differentiation, chain rules, etc. The better you understand these(which comes with time and work) the easier things get.
2. Make sure to actually do things. The best way to learn is through experience. If you just read things it may make sense but you actually don't understand it well or you will forget. If you do actual work(e.g., design your own NN's from scratch or implement algorithms and such) the more it will make sense and feel real. It may take time and you may struggle a lot but the struggle = learning = understanding. It is not magic and it is not "quick". No one is born with such knowledge. Most spend a huge amount of time in it. Familiarity makes it clear.
3. If this is something you want to do then start doing it as much as possible. If you want to be a "pro" then you have to act like a pro: Do the things they would do such as spend 8 hours a day working on such things(or as much as possible as you have but at least an hour). E.g., You should be reading all the main papers and reading them several times if you need to. If you do not understand something you have to learn to understand what you do not understand and seek out learning that. Over time, maybe a year or 5 you will learn so much more.
Knowledge accumulates slowly at first. It is like building a pyramid or house. At first you have to do a lot of work and it seems slow but after some time you have the foundation... and then the walls and then the roof and then it looks like a house... and then it is adding smaller things like windows and electrical. Then lighting and furniture.
Learning is built the same way. Ultimately you have to find your own way(you are different than everyone else).
But the simple thing is that you have to put in the time.
1. You have to know how to program. You can sorta learn both on some level. Follow tutorials. If you have to, just copy in the code you see even if you don't understand. The very act of copying/imitating will teach you because you will remember things and it will accumulate. But you should be able to program. Python is the language that most people use now since it is very good for learning and doing things precisely because so many people use it. It is not a great language but it is worth learning(there are better languages but no one uses them because no one uses them). Learn something like PyTorch or TensorFlow in Python. This means being able to build basic NN's and having an idea of how to put things together. This requires learning the API and stuff. Again, if you have no clue just find some videos online and start typing in the stuff that they do. After you watch a few you will have some idea and you then build on it.
2. Don't expect things overnight. It does take time. If you are serious then the time does not matter. 5 years, 20 years, whatever. It is a lifelong process and the field, as all do, will evolve and grow and you will always be behind(because so many humans are contributing there is always new stuff and you can only do so much). So ultimately you have to do it because you want to do it for yourself. Else you will give up because it is too much work and not fun. The way to keep it fun is to want to use it for things you want. E.g., you have ideas you want to use it for and then work towards those goals. This way when you get up in the morning you are thinking about how to use it for the thing you want.
3. You can do it if you want. These things are not complicated but they do require learning and learning takes time. Anyone has the ability to learn something but most people do not have the desire or time(due to capitalism/life). It will change your life. You can choose to learn other things(piano/music, martial arts, sociology, history, etc). Each one will change your life in a different way(after 50 years).
4. Try to learn more than one thing. If you just focus on one thing you will know one thing. Try to not get myopic. AI isn't just about coding. It's also about life... so knowing other things about life is relevant. E.g., learning music can also help you learn about AI. Learning about biology can help you learn AI... and vice versa. What makes most "intelligent people" different than the "average person" is that intelligent people want to learn and so are always learning new things rather than doing "fun things"(e.g., fvcking, playing video games, watching movies, drinking, etc). Learning is also addictive. Because you start to see how the universe works and want to see more. Of course balance is important. Too much learning can be problematic.
Ultimately though you have to figure it out. Only you know you. It might take some time for you to figure out exactly how to build your life but as long as you are moving in the direction you want then you will get somewhere. Likely not where you initially wanted but you will, as long as you are moving forward, look back in 50 years and be amazed at how far you went. [Note I'm just describing the gradient descent algorithm... it's all the same stuff. The algorithm literally was derived from our experiences and ideas about life as humans and combined with other things(such as math) to accomplish new things(such as AI)]
Also, when you are learning something and feel lost, that is ok... that tells you something. You should always feel lost a little. You have to learn to "ride the wave" of always feeling a little lost but not too lost. That means you are doing it right. Just slightly uncomfortable. If you feel totally lost you won't understand anything and are wasting a lot of time. It means you should go back and learn simpler things that you do not know(the lost feeling is because you do not know things).
Everyone that is very good started out exactly the same as everyone else. When I first started programming, or actually doing anything, I had no clue and could not envision what the long term would entail. I just did it. I moved forward not knowing the destination. But after 30 years of programming in 30+ languages over a wide variety of systems and architectures you see the world much simpler and through the programming lens. So many things look different but, in fact, are the same. It's like cats. Cats come in many sizes, colors, personalities, etc. But they are all cats. The more you interact with different cats the more you understand what general cat is[this is just sampling/data and how AI works too]. When I started learning math it was just because I didn't understand it and saw it as mysterious and started trying to figure out what the heck they were talking about. I sucked at math and wasn't interested in it at first until I was. Time*Desire*Effort*Memory*Organization = Knowledge. Even though we are all different we each also have different factors. Some can put in more time but have worse memories. Some have more desire but lower effort. The results are what they are. But it's all the same in the end as far as just learning. Most kids are not taught correctly or how to learn or the consequences of it(I was a kid that was taught very poorly and almost everything I learned was due to just me wanting to learn it and struggling very hard to learn it. I had a lot of time, desire, and effort but a very bad memory and worse organization. But this has let me achieve quite a bit because I made up for my weaknesses using the other factors. I should have worked on my organizational skills and memory but I didn't know how early on and didn't understand the implications).
Life, in some sense, is only complicated because we start out with basically zero knowledge and have to build up. But we generally are given enough time to amass quite a bit. What makes everyone different is some learn at a slower rate than others and so there is a spread/distribution and the people at the middle are amazed at those at the top. But, in fact, it's just that some focused more on singular things(such as MJ only learning BB but being dumb in almost everything else yet people will treat him as a god. It's no god but someone that only did BB. Anyone else that did just as much BB as him with the same luck and such will be approximately just as good). You are what you eat... and you are what you "eat"(do).
Good luck with it. Best thing to do is to jump right in. Even if you are totally lost you will still learn something and eventually "learn to swim".
@jasperyoon4301 4 months ago
@MDNQ-ud1ty Wonderful! Learned a lot from you. Thanks very much.
@michaelzap8528 4 months ago
@MDNQ-ud1ty So well said. It brought tears to my eyes. If only I had read this 10 years ago.
@yunbow5630 5 months ago
At 12:35, where is this figure from, boss?
@phdvlog2024 5 months ago
The original paper
@yunbow5630 5 months ago
@phdvlog2024 It isn't, boss; to me this figure looks like it's from Colab.
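@frogasian8888 6 months ago ⁺¹
This looks like a hidden-gem channel, but with no name other than "Ph.D." it's a bit hard to promote.
@phdvlog2024 6 months ago ⁺¹
Maybe I'll change it later.
@FangXiaoyu-fi9kw 6 months ago
Could you share the slides, pretty please
@phdvlog2024 6 months ago
These figures are all from the original paper ➕ found online
@FangXiaoyu-fi9kw 6 months ago
Could you share the slides?
@phdvlog2024 6 months ago
These figures are from the web plus the paper PDFs; just take screenshots of them.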
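@Chuhao-t1s 4 months ago
Who will build a loss function for this channel? ..I will! Well explained! Concise and to the point! MSE->0
@phdvlog2024 4 months ago
😂
@yunbow5630 8 months ago
No, it won't; for the next few years it still has to be attention.
@phdvlog2024 7 months ago ⁺⁵
New models have already emerged this year; I expect they will sweep the leaderboards at this year's CVPR and NIPS.
@yunbow5630 5 months ago
@phdvlog2024 I think Mamba2 can do it ahhh
@部落课程 5 months ago
Is that Huazhong University of Science and Technology...?
@phdvlog2024 5 months ago
Yes
@pakersmuch3705 4 months ago ⁺¹
Awesome!
@LouisCubingChannel 7 months ago ⁺¹
Wow, your voice sounds a lot like 方脸.. @多伦多
@phdvlog2024 7 months ago ⁺¹
😂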
