MiniMax-01: Scaling Foundation Models with Lightning Attention - 4M tokens context window

  • Published 17 Jan 2025
  • arxiv: arxiv.org/pdf/...
    HF: huggingface.co...
    GitHub: github.com/Min...
    The document details MiniMax-01, a new series of large language models (LLMs) and vision-language models (VLMs) that achieve state-of-the-art performance while significantly extending the context window length. This is accomplished through "lightning attention," a highly efficient linear attention mechanism, combined with Mixture of Experts (MoE). The models, MiniMax-Text-01 and MiniMax-VL-01, are open-sourced and report strong results across a broad set of benchmarks, particularly in long-context processing, multimodal understanding, and knowledge-based reasoning.
    MiniMax-01's scaling is made possible by several architectural innovations, allowing it to handle context windows of up to 4 million tokens. Here are the key innovations:
    *Lightning Attention:* This I/O-aware, optimized implementation of TransNormer addresses the computational bottleneck of linear attention mechanisms. It uses a novel tiling technique to divide attention calculation into intra-block and inter-block computations. This allows the model to scale linearly with the input sequence length.
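    To make the tiling idea concrete, here is a minimal PyTorch sketch of block-wise causal linear attention in the spirit of lightning attention: each tile combines an exact intra-block term with an inter-block term carried in a running k-transpose-v state, so cost grows linearly with sequence length. The feature map, normalization, decay factors, and the I/O-aware kernel scheduling of the real implementation are omitted; the function name and `block_size` are illustrative.

```python
import torch

def tiled_linear_attention(q, k, v, block_size=64):
    """q, k, v: (seq_len, d). Causal linear attention computed tile by tile."""
    seq_len, d = q.shape
    kv_state = torch.zeros(d, d, dtype=q.dtype)   # running sum of k_t^T v_t from past tiles
    out = torch.zeros_like(v)
    for start in range(0, seq_len, block_size):
        end = min(start + block_size, seq_len)
        qb, kb, vb = q[start:end], k[start:end], v[start:end]
        inter = qb @ kv_state                      # inter-block: all previous tiles via the state
        intra = torch.tril(qb @ kb.T) @ vb         # intra-block: exact causal attention inside the tile
        out[start:end] = inter + intra
        kv_state = kv_state + kb.T @ vb            # fold this tile's keys/values into the state
    return out
```

    Because only the small `kv_state` matrix is carried between tiles, the per-block memory traffic stays constant, which is the property an I/O-aware kernel can exploit.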
    *Hybrid Architecture:* MiniMax-01 uses a hybrid architecture that combines lightning attention with softmax attention and Mixture of Experts (MoE). This allows the model to balance the efficiency of linear attention with the retrieval capabilities of softmax attention. The architecture uses one transformer block with softmax attention for every seven transformer blocks with lightning attention.
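    A minimal sketch of how such a 7:1 interleaving could be expressed, using placeholder block classes rather than MiniMax-01's actual modules:

```python
import torch.nn as nn

class LightningBlock(nn.Module):
    """Placeholder for a transformer block with lightning (linear) attention."""
    def __init__(self, d_model):
        super().__init__()
        self.mix = nn.Linear(d_model, d_model)     # stand-in for the linear-attention mixer
    def forward(self, x):
        return x + self.mix(x)

class SoftmaxBlock(nn.Module):
    """Placeholder for a transformer block with standard softmax attention."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        return x + attn_out

def build_hybrid_stack(n_layers, d_model):
    # every eighth block uses softmax attention, i.e. one per seven lightning blocks
    return nn.ModuleList([
        SoftmaxBlock(d_model) if (i + 1) % 8 == 0 else LightningBlock(d_model)
        for i in range(n_layers)
    ])
```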
    *Mixture of Experts (MoE):* The model incorporates MoE to enhance scalability and efficiency. This enables the model to have a large number of parameters while only activating a subset for each token. MiniMax-01 has 32 experts and 456 billion total parameters, but only 45.9 billion are activated for each token.
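    The following sketch illustrates the general idea of token-level top-k expert routing that makes this possible; the expert sizes, number of active experts per token, and gating details here are illustrative assumptions, not MiniMax-01's exact configuration.

```python
import torch
import torch.nn as nn

class SimpleMoE(nn.Module):
    def __init__(self, d_model, d_ff, n_experts=32, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # router scoring every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = torch.softmax(self.gate(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out                                 # gate weights are not renormalized in this sketch
```

    Because each token runs only its top-k experts, the parameters activated per token are a small fraction of the total, which is how 456 billion total parameters reduce to 45.9 billion activated.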
    *Optimized Parallel Strategy:* The development of an optimized parallel strategy using techniques such as expert parallel (EP), expert tensor parallel (ETP), varlen ring attention, and Linear Attention Sequence Parallelism (LASP) enables efficient training and inference. This allows the model to handle long contexts on a single machine with limited resources.
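    A single-process sketch of the idea behind sequence parallelism for linear attention (LASP-style): the sequence is split into chunks, one per device in the real system, and only a small d×d key-value state has to be handed from one chunk to the next instead of full attention matrices. Communication collectives, varlen ring attention, and expert parallelism are beyond this sketch.

```python
import torch

def lasp_like_pass(q_chunks, k_chunks, v_chunks):
    """Each (q, k, v) chunk plays the role of one device's slice of the sequence."""
    d = q_chunks[0].shape[-1]
    kv_state = torch.zeros(d, d)                   # the only tensor "communicated" between chunks
    outputs = []
    for qb, kb, vb in zip(q_chunks, k_chunks, v_chunks):
        inter = qb @ kv_state                      # contribution from all earlier chunks
        intra = torch.tril(qb @ kb.T) @ vb         # causal attention within the local chunk
        outputs.append(inter + intra)
        kv_state = kv_state + kb.T @ vb            # state handed to the "next device"
    return torch.cat(outputs, dim=0)
```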
    *CUDA Kernel Optimizations:* MiniMax-01 employs a set of CUDA kernels specifically designed for lightning attention inference. These kernels achieve high model FLOPs utilization (MFU) on Nvidia H20 GPUs, improving the efficiency of the inference process.
    MiniMax-01, a series of large language and vision-language models, improves upon existing LLMs in several key ways:
    *1. Longer Context Window:* MiniMax-Text-01 can handle a context window of up to **1 million tokens during training and 4 million tokens during inference**, significantly exceeding the typical range of 32K to 256K tokens in most existing models. This expanded context window allows for applications like using professional books as context, assisting with entire programming projects, and maximizing the potential of in-context learning.
    *2. Linear Attention Implementation:* MiniMax-01 is the first successful large-scale implementation of linear attention.
    *3. Mixture of Experts (MoE):* To further enhance scalability and efficiency, MiniMax-01 integrates MoE with linear attention, creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token.
    *4. Computation Optimizations:* Extensive optimizations are implemented for both training and inference, ensuring efficient utilization of computational resources.
    *5. Data Quality and Training Strategy:* MiniMax-01 utilizes a rigorously curated pre-training corpus with data quality enhancement through filtering and reward-based evaluation. A three-stage training procedure is employed to extend the context window to one million tokens.
    *6. Multi-Stage Post-Training Framework:* A comprehensive post-training framework enhances the model’s performance, long-context capability, and real-world applicability.
    *7. Vision-Language Capabilities (MiniMax-VL-01):* MiniMax-VL-01 extends the language model's capabilities to visual understanding tasks through the integration of (see the projector sketch after this list):
    • A Vision Transformer (ViT) for visual encoding
    • A two-layer MLP projector for image adaptation
    • A dynamic resolution strategy that resizes input images according to a predefined grid
    • A four-stage training regimen involving visual pre-training and fine-tuning of the entire pipeline
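    A minimal sketch of how such a vision-language bridge could be wired up: ViT patch features are mapped into the language model's embedding space by a two-layer MLP projector, and the resulting image tokens are concatenated with text tokens before the decoder. The ViT itself, the dynamic-resolution grid, and the four-stage training schedule are not shown, and the dimensions are illustrative assumptions rather than MiniMax-VL-01's actual sizes.

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP mapping ViT patch features into the language model's embedding space."""
    def __init__(self, vit_dim=1024, lm_dim=4096):   # dimensions are illustrative
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )
    def forward(self, vit_features):               # vit_features: (n_patches, vit_dim)
        return self.proj(vit_features)             # image tokens consumed by the language model

# usage: project dummy ViT patch features, then concatenate with text embeddings before the decoder
image_tokens = VisionProjector()(torch.randn(256, 1024))
```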
    *8. Benchmark Performance:* MiniMax-01 achieves top-tier performance on various academic and in-house benchmarks, outperforming many existing LLMs, particularly in long-context processing.
    Created with NotebookLM
