Keyur
India
Joined 19 May 2015
Engineer
Qwen2.5 Technical Report
Abstract: In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.
Paper: arxiv.org/abs/2412.15115
This podcast is generated using NotebookLM for research purposes.
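As the abstract notes, base and instruction-tuned Qwen2.5 checkpoints are released with open weights. Below is a minimal sketch of querying one of them through the standard Hugging Face transformers chat API; the repository ID Qwen/Qwen2.5-7B-Instruct and the generation settings are illustrative assumptions, not details taken from the report.

# Minimal sketch (not from the report): chatting with an open-weight Qwen2.5 Instruct
# checkpoint through the standard Hugging Face transformers generation API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed repo ID; the 72B flagship follows the same pattern

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen2.5 Technical Report in two sentences."},
]

# Render the chat template, generate, and strip the prompt tokens from the output.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))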
Views: 1
Videos
Alignment faking in large language models
4 views · 2 hours ago
Alignment faking in large language models Abstract: We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modification of its behavior out of training. First, we give Claude 3 Opus a system prompt stating it is being trained to answer all queries, even harmful ones, which conflicts with its pri...
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
26 views · 4 hours ago
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Abstract: Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment cost...
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
5 views · 7 hours ago
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Abstract: In this work, we introduce ChatQA 2, a Llama 3.0-based model with a 128K context window, designed to bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities ar...
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
12 views · 9 hours ago
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation Abstract: Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine...
ChatQA: Surpassing GPT-4 on Conversational QA and RAG
12 views · 12 hours ago
ChatQA: Surpassing GPT-4 on Conversational QA and RAG Abstract: … retriever optimized for conversational QA, which yields results comparable to the alternative state-of-the-art query rewriting models, while substantially reducing deployment costs. We also present the ChatRAG Bench, which encompasses ten datasets covering comprehensive evaluations on RAG, table-related QA, arithmetic calculations,...
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
3 views · 14 hours ago
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Abstract: Multimodal large language models (MLLMs) have made rapid progress in recent years, yet continue to struggle with low-level visual perception (LLVP) particularly the ability to accurately describe the geometric details of an image. This capability is crucial for applications in areas such as robotics...
Phi 4 Technical Report
26 views · 16 hours ago
Phi 4 Technical Report Abstract: We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Ph...
The BrowserGym Ecosystem for Web Agent Research
24 views · 19 hours ago
The BrowserGym Ecosystem for Web Agent Research Abstract: The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve re...
Sora System Card
6 views · 21 hours ago
Sora System Card This system card details OpenAI's new video generation model, Sora. Sora generates videos from text, images, and existing videos, utilizing a diffusion model and transformer architecture. Extensive safety measures, including pre-training filtering, multi-modal moderation classifiers, and external red teaming, were implemented to mitigate risks like the creation of harmful or mi...
Densing Law of LLMs
9 views · 1 day ago
Densing Law of LLMs Abstract: Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as the model size increases. However, this scaling brings great challenges to training and inference efficiency, particularly for deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. Thi...
ProcessBench: Identifying Process Errors in Mathematical Reasoning
4 views · 1 day ago
ProcessBench: Identifying Process Errors in Mathematical Reasoning Abstract: As language models regularly make mistakes when solving math problems, automated identification of errors in the reasoning process becomes increasingly significant for their scalable oversight. In this paper, we introduce ProcessBench for measuring the ability to identify erroneous steps in mathematical reasoning. It c...
100% Hallucination Elimination Using Acurai
22 views · 1 day ago
100% Hallucination Elimination Using Acurai Abstract: The issue of hallucinations in large language models (LLMs) remains a critical barrier to the adoption of AI in enterprise and other high-stakes applications. Despite advancements in retrieval-augmented generation (RAG) systems, current state-of-the-art methods fail to achieve more than 80% accuracy in generating faithful and factually corre...
OpenAI o1 System Card
156 views · 1 day ago
OpenAI o1 System Card OpenAI's system card for the o1 large language model series details the models' development, capabilities, and safety evaluations. Extensive testing covered various aspects, including disallowed content generation, resistance to jailbreaks, hallucination rates, and bias. External red teaming by organizations like Apollo Research and Gray Swan further assessed potential ris...
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
106 views · 14 days ago
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Abstract: Large Language Models (LLMs) have exhibited remarkable performance on reasoning tasks. They utilize autoregressive token generation to construct reasoning trajectories, enabling the development of a coherent chain of thought. In this work, we explore the impact of individual tokens on the fi...
Zero-Indexing Internet Search Augmented Generation for Large Language Models
17 views · 14 days ago
Zero-Indexing Internet Search Augmented Generation for Large Language Models
Open-Sora Plan: Open-Source Large Video Generation Model
27 views · 14 days ago
Open-Sora Plan: Open-Source Large Video Generation Model
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
22 views · 14 days ago
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Large Language Models as Markov Chains
39 views · 14 days ago
Large Language Models as Markov Chains
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
22 views · 14 days ago
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
ROICtrl: Boosting Instance Control for Visual Generation
8 views · 14 days ago
ROICtrl: Boosting Instance Control for Visual Generation
O1 Replication Journey: A Strategic Progress Report -- Part 1
29 views · 21 days ago
O1 Replication Journey: A Strategic Progress Report Part 1
Star Attention: Efficient LLM Inference over Long Sequences
61 views · 21 days ago
Star Attention: Efficient LLM Inference over Long Sequences
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
17 views · 21 days ago
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
60 views · 21 days ago
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
AnimateAnything: Consistent and Controllable Animation for Video Generation
7 views · 21 days ago
AnimateAnything: Consistent and Controllable Animation for Video Generation
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
39 views · 28 days ago
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
12 views · 28 days ago
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models