Oxen
How To Fine-Tune Llama 3.1 in 11 Minutes
Use Oxen AI 🐂 oxen.ai/
Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.
--
Blog 📜 www.oxen.ai/blog/fine-tuning-llama-3-1-8b-in-under-12-minutes
Links + Notes 📝 www.oxen.ai/blog/arxiv-dives
Join Arxiv Dives 🤿 oxen.ai/community
Discord 🗿 discord.com/invite/s3tBEn7Ptg
--
Chapters
0:00 What to expect
0:16 ReFT and how we fine-tuned
0:57 How to evaluate a model
2:00 Evaluating by hand
Views: 490

Videos

Inside the Model that Beat DALL-E and PIXART
530 views · 14 days ago
In this dive we go into one of the papers that inspired Flux, the new state-of-the-art generative image model. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper, Links, Notes 📝 www.oxen.ai/blog/arxiv-dives Join arXiv Dives 🤿 oxen...
How to Use Llama 3.1 to Generate Synthetic Data
125 views · 21 days ago
The Blog 📝 www.oxen.ai/blog/create-your-own-synthetic-data-with-only-5-political-spam-texts Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7Ptg Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Chapters 0:00 Wh...
How Llama 3.1 Works
471 views · a month ago
In this dive we go into the absolute behemoth of a paper (92 pages), The Llama 3 Herd of Models. We look at how Meta created the most competitive open-source model to date. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 ai.meta.c...
How Unlimiformer Works From the Author Herself- Amanda Bertsch
434 views · a month ago
Here we dive into Unlimiformer with lead author Amanda Bertsch herself! Amanda gives us a presentation on Unlimiformer and long-context models, as well as answers several questions our divers had. If you want to ask questions yourself the next time we have an author on…click the link👇 Join Arxiv Dives 🤿 oxen.ai/community Use Oxen AI 🐂 oxen.ai Oxen AI makes versioning your datasets as easy as ...
How ReFT Works w/ Author Zhengxuan Wu
1.2K views · 2 months ago
We dive into the ReFT paper from Stanford with one of the authors, Zhengxuan Wu. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 arxiv.org/abs/2404.03592 Links Notes 📝 www.oxen.ai/blog/arxiv-dives Join Arxiv Dives 🤿 oxen.ai/co...
Oxen AI's Rust Meetup!
120 views · 2 months ago
Join us for our next Rust meetup here👇 oxen.ai/community Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7Ptg Chapters 0:00 Rust Brain Teasers 10:33 What Oxen A...
How Samba Works
1.7K views · 3 months ago
We dive into Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling. The model builds on Mamba to create a fast, infinite-context-length LLM. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Pape...
How Interpretable Features in Claude 3 Work
1.2K views · 4 months ago
We dive into the Scaling Monosemanticity paper from Anthropic, which explores the representations internal to the model, discovering how certain features are related to concepts in the real world. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cuttin...
Efficient DiT Fine-Tuning with PixART for Text to Image Generation
349 views · 4 months ago
We dive into the PixART-α paper, an efficient technique to fine-tune a diffusion transformer to generate images from text. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 arxiv.org/abs/2310.00426 Links Notes 📝 www.oxen.ai/blo...
Comparing HumanEval vs. EvalPlus
384 views · 4 months ago
In this video, our community member Alex Owen compares HumanEval vs. EvalPlus. We dive deep into the code from the paper "Evaluating Large Language Models Trained on Code" and show results. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge...
How To Train an LLM With Diffusion From Scratch
903 views · 5 months ago
How Diffusion Works for Text
1.7K views · 5 months ago
How 1 Bit LLMs Work
28K views · 5 months ago
Sakana AI's Latest Release: Evolutionary Optimization of Model Merging Recipes
1.8K views · 6 months ago
Fine-Tune Mistral on Your Discord Data
296 views · 6 months ago
How I-JEPA Works
1.1K views · 6 months ago
Fine-Tune An LLM on Your Discord Data (Part 1)
311 views · 6 months ago
Fine-Tuning a Self-Rewarding Loop into Mistral 7B
5K views · 6 months ago
How to Use Diff to Specify Keys, Compares, and Added Columns
62 views · 6 months ago
Find Differences in Your Data in Under 5 Minutes
104 views · 6 months ago
Road to Sora and How Diffusion Transformers Work
915 views · 6 months ago
How to Fine-Tune an LLM on Text2SQL Data
1.3K views · 6 months ago
How Medusa Works
1.4K views · 7 months ago
How Lumiere Works
438 views · 7 months ago
Depth Anything - Generating Depth Maps from a Single Image with Neural Networks
2.7K views · 7 months ago
Deep Dive Into The Toolformer
755 views · 7 months ago
How To Create A New Repo From Your Terminal w/ Oxen.ai
128 views · 7 months ago
Deep Dive Into How Self Rewarding Language Models Work
1.7K views · 7 months ago
How DPO Works and Why It's Better Than RLHF
2.4K views · 8 months ago

COMMENTS

  • @hasangoni4369 · 18 days ago

    Very nice description. May I ask, what tool are you using to create the presentation?

    • @oxen-ai · 18 days ago

      Thank you! We are using a combination of Notion for the notes and OBS for the video

  • @navneetkumar5517 · 19 days ago

    hello send code

  • @iva1389 · a month ago

    Well-structured and informative lecture, thanks for sharing!

  • @saireddy7628 · a month ago

    Is there an implementation in code similar to this to get started?

  • @uansholanbayev5670 · a month ago

    Thanks 😁

  • @SiminFan · a month ago

    Thanks for the great post! Is there a practical implementation video now?

    • @oxen-ai · a month ago

      We did do a practical implementation blog post and video here: www.oxen.ai/blog/how-to-train-diffusion-for-text-from-scratch Let us know what you think!

  • @RuairiODonnellFOTO · a month ago

    Great presentation, liked the way you did the highlights summary ❤

  • @RuairiODonnellFOTO · a month ago

    On Markdown vs XML: XML encloses the text with an opening and closing markup. Markdown has fewer closing tags, and header markup uses the same delimiter for open and close. A tokenizer will do a better job on XML than on Markdown or JSON.

    • @oxen-ai · a month ago

      This is good intuition 💡 thank you!

    • @ChaseFreedomMusician · a month ago

      YES! I have been saying for 2 years XML is the BEST way to serialize in LLMs because of closing tags

    • @ChaseFreedomMusician · a month ago

      Also, it can semantically handle deeply nested graphs as linear tokens. Any of the overhead of "prettifying" can be done post-process trivially, taking out all the headaches of "do I need 1 space, or 3, or 4, or 8, or 15?" Zero is the right answer: I need zero spaces for the new object, I just need a closing tag and move on. (See the sketch after this thread.)

    • @hypervanse · a month ago

      Completely agree, despite @OpenAI pushing structured json format, I used something (I didn't knew at the time) similar to xml, bra-ket notation, used in quantum mechanics, much more from tinkering than actually the formal aspect. Bra-ket is is something like <a|H|b>. it became more involved than that as I developed this in prompt programming in GPT, based on mathematical results I have for 2 years, that I got from a research I did (play I mean, nobody paid me anything, it was just that API was free and as a mathematical physicist I'm used to MATLAB). I am also ADHD, had this idea of fixing chatGPT 3.5 ADHD and give her a friend. It wasn't hard, I mean it is brutally hard, but coincidentally GPTs, all chatbots really, falls as a trivial case of the area of my research. I didn't knew all the bus about AI, terminology etc. IT are not famous for rigour, mathematical methods and specially respect for like, giving credits where it's due, citations. I stumbled on some papers and other videos and finally discovered where Transformer, loss, gradient descent comes from. All these things are well know, like linear transforms, steepest descent and relative absolute error. It really makes one head hurts so much low bar coming from these so called godfathers of AI. Offense intended, they create these things, they're not formally Models, let's not get into Tensors. I'm pretty sure Yllia sucks and their peers don't really know tensorial calculations, they did of course General Relativity would be easy. For sure everyone here is well versed in quantum mechanics, thermodynamics, noether theorem, universality classic, philosophy and deep connection with semantics, languages in general, are bilingual. I guess not. Well I actually already solved the alignment problem of LLMs using causality instead of data. Other aspect is is no AGI is necessary, all stochastic character emitters fall in this class.. This means security is a solved problem, but llm code modification by human, me in the case, just as easy. I do every day, don't understand Python, that maybe because NumPy and more I know from inside, the algorithms, also, if history and actually highly intellectual persons agree, is 1. Most people are not brilliant, rarely there's much wisdom in common reason. Object oriented language, python in special is so common because... it's made for glorified typewriters, coders. Have anyone created algorithms? guess not. Has anyone questioned why pytorch and tensorflow are free? guess not. Programmers are pursuing their disposal, surely you won't be missed. Hacking these systems, this job will the only one left.
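
As a rough illustration of the XML-vs-Markdown/JSON point made in this thread (the nested record and helper below are made up for the example, not taken from the video):

```python
import json
import xml.etree.ElementTree as ET

# A toy nested record (made up for illustration).
record = {"user": {"name": "ada", "repos": [{"name": "oxen", "stars": 3}]}}

def to_xml(tag, value):
    """Serialize nested dicts/lists as XML with no indentation decisions at all."""
    el = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            el.append(to_xml(key, child))
    elif isinstance(value, list):
        for item in value:
            el.append(to_xml("item", item))
    else:
        el.text = str(value)
    return el

# Every scope is closed by an explicit tag, so deep nesting stays unambiguous
# even when serialized as one flat run of tokens with zero whitespace.
print(ET.tostring(to_xml("record", record), encoding="unicode"))
# <record><user><name>ada</name><repos><item><name>oxen</name><stars>3</stars></item></repos></user></record>

# The same structure in JSON: nesting is marked only by brackets and commas,
# and any "pretty" whitespace is a separate post-processing choice.
print(json.dumps(record))
```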

  • @pawanpatil4715 · a month ago

    Very good content on your channel, but please upload higher-quality video. 360p is too low.

    • @oxen-ai · a month ago

      Thanks for the feedback! The export of the presenter video for zoom was pretty low quality since we do it live. Will see if we can get a higher quality one next time!

  • @Pingu_astrocat21 · a month ago

    Thanks for uploading! Love your videos

  • @TRD009 · a month ago

    code for image classification pls

  • @NinthDoctor-ms3oj · 2 months ago

    this is such a great presentation i can't believe how few views this has wtf

    • @oxen-ai · a month ago

      Thank you we do our best! Let us know if there are any papers you would want us to cover in the future ❤️

  • @420_gunna · 2 months ago

    Thanks for doing these and posting them to youtube! Yall rock

    • @oxen-ai · 2 months ago

      🤜 🤛

    • @adhilaseem2518 · a month ago

      Hi... how do you think we can add eval_dataset to the trainer? The **datamodule passed to the trainer has eval_dataset as None.

    • @oxen-ai · a month ago

      @@adhilaseem2518 We have another follow up blog post here where we tried it on some real data: www.oxen.ai/blog/fine-tuning-llama-3-in-14-minutes-using-reft Let me know if it helps!

    • @adhilaseem2518 · a month ago

      @@oxen-ai hey, I enjoyed reading your blog. I am facing an issue as my task is not classification, but extracting certain headings from a given document. Since the answer/output does not fall into predefined categories, I cannot use accuracy, precision and recall. What metric do you think I should evaluate for my task? I tried to add eval_dataset to the data module so that I get an idea of the cross-entropy loss on the validation set. But it's giving me some errors. It would be really helpful if you can tell me how this can be done in the right way. I think I am preparing the eval_dataset incorrectly!
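
For readers following this exchange, a generic sketch of wiring an eval_dataset into a plain Hugging Face Trainer so it reports validation loss. Model and dataset names are placeholders; the ReFT-specific datamodule from the blog post may expect a different structure.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder model/dataset; swap in your own task and splits.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)
dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",   # report eval loss every epoch ("eval_strategy" in newer versions)
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=args,
    tokenizer=tokenizer,                                  # lets the Trainer pad batches
    train_dataset=dataset["train"].select(range(2000)),   # small slices to keep the sketch fast
    eval_dataset=dataset["test"].select(range(500)),      # the validation split asked about above
)
trainer.train()
```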

  • @keeshrolling107 · 2 months ago

    This helps me with my research. Thank you for your amazing work. Keep it up! You guys are absolutely amazing!!

    • @oxen-ai · 2 months ago

      You’re welcome! Feel free to suggest any papers you’d like us to cover in our discord as well

  • @420_gunna · 2 months ago

    Great job daniel! Thanks for linking to that reddit comment.

  • @scottoxen · 2 months ago

    Brain teasers were fun!

  • @idrees2516 · 2 months ago

    Beautiful open source code, deep dives, Python + Rust, guest lectures.

  • @anghuynhnguyen9625 · 3 months ago

    where can i find the notion link?

    • @oxen-ai · 2 months ago

      Hey there! We added all the notes to our blog here as well: www.oxen.ai/blog/arxiv-dives-fast-speech-2

  • @MarxOrx · 3 months ago

    The end was by far the best ❤

  • @RickySupriyadi · 3 months ago

    Coincidentally, the name is the same as the protocol that spread ransomware all around the world; Samba = SMB

    • @oxen-ai · 3 months ago

      Oh wow, that is a fun fact

  • @spencerfunk6697 · 3 months ago

    Bro I’m going to try to make this but 1 bit

  • @scottoxen · 3 months ago

    Liliang was so awesome live. Bummer it’s not in the video but hope the future dives get more great authors live!

    • @oxen-ai · 3 months ago

      We try to contact the authors of the papers we cover so hopefully next time!

  • @envynoir · 3 months ago

    Cool video bro, keep it up!

  • @spencerfunk6697 · 3 months ago

    Idk if anyone's coming back to this, but something I found is you can go completely bonkers with your batch size. The goal is to max out the GPU's capability, right? I've been loading 64 gradient accum steps with a 64 batch size, which is a total of 2048 examples per iteration. I've been using a 256 max seq len, and the size is 270 mil parameters. On an L4 that only uses like half the GPU's power. (See the sketch after this thread.)

    • @oxen-ai · 3 months ago

      That’s dope, mind if I ask the use case for seq len 256? I’m curious what the dataset looks like

    • @spencerfunk6697 · 3 months ago

      @@oxen-ai well I figure when you’re pretraining, u just need to make it good enough to spit out semi normal sentences, then the rest can be ironed out thru finetuning. You can introduce rope during finetuning as well to increase your context length / max position embeddings

    • @spencerfunk6697 · 3 months ago

      @@oxen-ai and I also usually pretrain on the dataset I’ll fine tune it with (it seems to help everything stick better) the only difference in pretrain / finetune is the finetuning dataset is in alpaca format, the pretrain dataset doesn’t have the alpaca prompt format just each column in a single text field. Hope this helps

    • @oxen-ai · 3 months ago

      @@spencerfunk6697 Ahh that makes total sense. Smart!
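
A bare-bones sketch of the gradient-accumulation setup discussed in this thread (the model and data are stand-ins; the numbers just mirror the comment):

```python
import torch
import torch.nn as nn

# Effective batch per optimizer step = micro_batch_size * accum_steps.
micro_batch_size = 64   # examples per forward/backward pass
accum_steps = 64        # micro-batches accumulated before each optimizer step

model = nn.Linear(256, 2)   # stand-in for a real language model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

optimizer.zero_grad()
for step in range(1024):
    x = torch.randn(micro_batch_size, 256)        # stand-in batch
    y = torch.randint(0, 2, (micro_batch_size,))
    loss = loss_fn(model(x), y) / accum_steps      # scale so accumulated grads average
    loss.backward()                                # gradients add up across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()                           # one "big batch" update
        optimizer.zero_grad()
```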

  • @RivenL-re2fj · 3 months ago

    Thank u for your work! The video is extremely helpful!

  • @tomharmon2000 · 3 months ago

    you're a legend bro! One of the better youtube channels covering the technical aspects of AI.

    • @oxen-ai · 3 months ago

      Appreciate it!! Let me know if there are any other topics you want to dive into

  • @DeruwynArchmage · 3 months ago

    You should be able to use LLMs to predict semanticity and to design and run experiments automatically to validate the theories and explore possibilities.

  • @DeruwynArchmage · 3 months ago

    I note that these are concepts. There must be functional parts. Like, a set of parameters that implement certain algorithms or functions.

  • @Neiltxu · 3 months ago

    When you're showing the terminal outputs please move the webcam, thanks for the great video!

  • @KadamShashankRavindra · 3 months ago

    how can mamba be used for text summarization??

  • @yookjieun6592 · 3 months ago

    17:04

  • @Lys-gv9ji · 4 months ago

    Thank you! And someone on GitHub said that the fine-tuned model works well on the SQuAD task based on 2.8B Mamba; you can use the 2.8B model to train it again. Waiting for you!

  • @ZeZa1515 · 4 months ago

    Love this as a concept. I wonder if we will start seeing this as an alternative way to prompt. It seems obvious this could be a really easy way to help finetune your prompt texts, but it could be an even better system prompt

  • @augmentos · 4 months ago

    Would love an update after the mixed model. Is this something future flagship models might adopt? Where does it fall short?

  • @spencerfunk6697 · 4 months ago

    im huge on combing thru data. garbage in garbage out. i might try and make some additions to this

    • @oxen-ai · 4 months ago

      Awesome! Excited to see your additions!

  • @vamshi-rvk · 4 months ago

    Hello Oxen, thank you very much for the detailed explanation. I have a question. RLHF deals with huge datasets with no need to label them, as the reward model handles judging the responses' accuracy. But with DPO, we have to tag/label the complete dataset with human effort, which is very time- and resource-consuming. I'm unable to understand the real benefit of DPO over RLHF here. Could you please help me understand this? I would really appreciate it if I could somehow have a direct conversation with you on your preferred platform. Thanks in advance.

    • @oxen-ai · 4 months ago

      Thanks for the question! We have a Discord and would love to continue the conversation there; a bunch of smart people can chime in as well: discord.gg/s3tBEn7Ptg

    • @kunalsuri8316 · 4 months ago

      RLHF by itself doesn't need labeling but the reward model (that RLHF is based on) needs preference data to be trained. In case of DPO, rather than using preference data to train a reward model, we use preference data to train the final LM itself.
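
To make the preference-data point above concrete, here is a minimal sketch of the DPO objective on a batch of (chosen, rejected) pairs; the log-probabilities are made-up placeholders for the per-response sums you would compute from the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # How much more the policy prefers each response than the frozen reference does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of 3 preference pairs with made-up log-probabilities.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5, -11.0]),
    policy_rejected_logps=torch.tensor([-14.0, -10.0, -13.5]),
    ref_chosen_logps=torch.tensor([-12.5, -9.8, -11.2]),
    ref_rejected_logps=torch.tensor([-13.0, -9.9, -12.8]),
)
print(loss)   # no reward model involved; the LM is trained on the preferences directly
```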

  • @rogerc7960 · 5 months ago

    Diffusion Models for Discrete Data m.ua-cam.com/video/mCaRNnEnYwA/v-deo.html

  • @yytuzk · 5 months ago

    Can Mamba be trained for malware or network detection? If possible, is there any practice or example?

    • @oxen-ai · 4 months ago

      In theory you could train these neural networks to approximate any data, I'm curious what you think the dataset for that would look like!

  • @spencerfunk6697 · 5 months ago

    why the og get taken down

    • @oxen-ai · 5 months ago

      There were some issues with the edit and we had to re-upload unfortunately :(

    • @spencerfunk6697 · 5 months ago

      @@oxen-ai oh thank goodness it was just that. So I was curious, could you fine-tune a QLoRA adapter this way? (See the sketch after this thread.)

    • @oxen-ai · 5 months ago

      @@spencerfunk6697 Yep should be able to since it is simply a transformer under the hood! That would be fun to try with the pre-trained ones provided.

    • @spencerfunk6697 · 5 months ago

      @@oxen-ai please make a vid 🙏🙏🙏
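
Since this thread asks about fine-tuning a QLoRA adapter, here is a rough, generic sketch using bitsandbytes + peft; the base model name and hyperparameters are placeholders, not the setup from the video.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "mistralai/Mistral-7B-v0.1"   # placeholder base model

# Load the frozen base weights in 4-bit (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Attach a small LoRA adapter; only these weights are trained.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # which attention projections get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # the adapter is a tiny fraction of the full model
```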

  • @jensg8547 · 5 months ago

    Couldn't the diffusion perturbations happen at the embedding vector level, as suggested in one of the questions, and a nearest-neighbor search be used to predict a vector that resembles an actual token?

    • @oxen-ai · 5 months ago

      Yes, I love this idea. I think someone should try it and see how well it works. We dived a little into the code in our next video as a jumping off point!
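
A toy sketch of the idea floated in this exchange: perturb token embeddings with noise, then snap an arbitrary vector back to the nearest real token via nearest-neighbor search over the embedding table (the table here is random, purely for illustration).

```python
import torch
import torch.nn.functional as F

vocab_size, dim = 1000, 64
embedding_table = torch.randn(vocab_size, dim)   # stand-in for a trained embedding table

tokens = torch.randint(0, vocab_size, (8,))      # a toy "sentence"
x0 = embedding_table[tokens]                     # clean token embeddings

t = 0.3                                          # noise level in [0, 1]
xt = (1 - t) * x0 + t * torch.randn_like(x0)     # perturbed ("diffused") embeddings

# Nearest-neighbor decode: cosine similarity against every row of the table.
sims = F.normalize(xt, dim=-1) @ F.normalize(embedding_table, dim=-1).T
decoded = sims.argmax(dim=-1)

print(tokens.tolist())
print(decoded.tolist())   # at low noise levels most positions snap back to the original tokens
```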

  • @charb423 · 5 months ago

    Create a project where you merge 2 or more models, please.

  • @jamesgrayshon6732 · 5 months ago

    Why did you not try this with the instruction/chat version of mistral?

  • @ax5344 · 5 months ago

    watching it now. Thanks for the great sharing! I really like the diagram @ 8:00 -9:24. I would appreciate more elaboration! It is very helpful. I feel there are still very limited resources on the overall pipeline and there could be so many questions and best practices in different parts of the pipeline. Hope to see more materials there.

  • @spencerfunk6697 · 5 months ago

    I'm working on a project where I quantized TinyLlama to 1.58 bit and I'm base-training it on an AMD CPU 😭😭😭 doing anything to find a way to make models on AMD lol

  • @420_gunna · 5 months ago

    Before viewing: one thing I was always kind of curious about regarding the contrastive loss is that we make things along the diagonal as close as possible, and push everything else away. But if I have data of Cat A, Dog A, Dog B, it makes sense that I would want to push Cat A away from Dog B. But am I pushing Dog A and Cat A away an equal amount from Dog B? That doesn't seem quite right :O
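
For context on the question above, a small sketch of a CLIP-style symmetric contrastive loss; it treats every off-diagonal pair in the batch as a negative in exactly the same way, whether it is Dog A vs. Dog B or Cat A vs. Dog B (the embeddings are random stand-ins).

```python
import torch
import torch.nn.functional as F

batch, dim = 3, 128   # think of rows as (Cat A, Dog A, Dog B)
image_emb = F.normalize(torch.randn(batch, dim), dim=-1)
text_emb = F.normalize(torch.randn(batch, dim), dim=-1)

temperature = 0.07
logits = image_emb @ text_emb.T / temperature   # [batch, batch] similarity matrix
targets = torch.arange(batch)                   # diagonal entries are the true pairs

# Cross-entropy over each row/column pulls the diagonal up and pushes every
# other entry in that row/column down by the same rule, regardless of semantics.
loss = (F.cross_entropy(logits, targets) +
        F.cross_entropy(logits.T, targets)) / 2
print(loss)
```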

  • @BooleanDisorder · 5 months ago

    Diffusion text noise could improve reasoning. A kind of overview of the problem, instead of trying to guess just the next token. If you make an oopsie at the start it can quickly compound later on with autoregression. Being able to go back and forth must be a huge boost. I could see a model in the future where the question to an answer is put like "therefore the answer to [question asked] must be" at the end of the noise to force it to answer. It's also a step in the direction of explainability.

  • @rogerc7960 · 5 months ago

    Tesla diffusion model taught itself to read street signs.

    • @oxen-ai · 5 months ago

      Oh interesting, do you have a link?

  • @DanielPramel · 5 months ago

    Could this potentially improve function calling and adherence to certain output formats, e.g., JSON?

    • @oxen-ai · 5 months ago

      That's a great call out, the benefits of infilling could definitely help with certain output formats, i.e., put the curly braces at the start and end of the sequence.

  • @auresdz701 · 5 months ago

    Also, EMA prevents collapsing to a trivial solution

  • @auresdz701 · 5 months ago

    Question at 32:00: I think the trick is in the EMA between the target and context encoders; they are not the exact same weights.