Xiaol.x
OpenAI o3-mini System Card
The OpenAI o model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment [1]. This brings OpenAI o3-mini to parity with state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence.

Under the Preparedness Framework, OpenAI’s Safety Advisory Group (SAG) recommended classifying the OpenAI o3-mini (Pre-Mitigation) model as Medium risk overall. It scores Medium risk for Persuasion, CBRN (chemical, biological, radiological, nuclear), and Model Autonomy, and Low risk for Cybersecurity. Only models with a post-mitigation score of Medium or below can be deployed, and only models with a post-mitigation score of High or below can be developed further. Due to improved coding and research engineering performance, OpenAI o3-mini is the first model to reach Medium risk on Model Autonomy (see section 5, Preparedness Framework Evaluations). However, it still performs poorly on evaluations designed to test real-world ML research capabilities relevant for self-improvement, which is required for a High classification.

Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
cdn.openai.com/o3-mini-system-card.pdf
Views: 7

Videos

Tensor Product Attention Is All You Need
Views: 6 · 2 hours ago
Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, significantly shrinking KV cache size at inference time. By f...
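The abstract above describes factorizing queries, keys, and values to shrink the KV cache. As a rough numpy sketch of the memory effect only (illustrative shapes and rank, not the paper's actual TPA construction), keys for a sequence can be cached as two small factor matrices instead of the full matrix:

```python
import numpy as np

# Toy setting: cache keys for T tokens of head dimension D.
T, D, R = 1024, 128, 8   # R: assumed low rank for the factorization

rng = np.random.default_rng(0)

# Instead of caching the full T x D key matrix, cache two factors:
# per-token factors A (T x R) and shared head-dim factors B (R x D).
A = rng.standard_normal((T, R))
B = rng.standard_normal((R, D))

K_full = A @ B                 # keys reconstructed on the fly, shape T x D

cache_full = T * D             # floats cached without factorization
cache_tpa = T * R + R * D      # floats cached with factorization

print(cache_full, cache_tpa)   # 131072 vs 9216: roughly 14x smaller
```

The trade-off is extra compute to reconstruct `K_full` at attention time in exchange for the smaller cache; the real method factorizes queries and values as well.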
New exponent pairs, zero density estimates, and zero additive energy estimates: a systematic approach
Views: 37 · 9 hours ago
We obtain several new bounds on exponents of interest in analytic number theory, including four new exponent pairs, new zero density estimates for the Riemann zeta-function, and new estimates for the additive energy of zeroes of the Riemann zeta-function. These results were obtained by creating the Analytic Number Theory Exponent Database (ANTEDB) to collect results and relationships between th...
Foundations of Large Language Models
Views: 135 · 4 hours ago
This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is structured into four main chapters, each exploring a key area: pre-training, generative models, prompting techniques, and alignment methods. It is intended for college students, professionals, and pr...
AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, Operator
Views: 59 · 4 hours ago
AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants Instruction-based computer control agents (CCAs) execute complex action sequences on personal computers or mobile devices to fulfill tasks using the same graphical user interfaces as a human user would, provided instructions in natural language. This review offers a comprehensive ...
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Views: 480 · 9 hours ago
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This ...
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
Views: 38 · 9 hours ago
Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, and adhere to strict policy constraints. However, evaluating these agents remains a significant chal...
Tell me about yourself: LLMs are aware of their learned behaviors
Views: 813 · 9 hours ago
We study behavioral self-awareness: an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) making high-risk economic decisions, and (b) outputting insecure code. Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it....
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Views: 75 · 9 hours ago
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks but their performance in complex logical reasoning tasks remains unsatisfactory. Although some prompting methods, such as Chain-of-Thought, can improve the reasoning ability of LLMs to some extent, they suffer from an unfaithful issue where derived conclusions may not align with the generated reasoning c...
o3-mini vs DeepSeek-R1: Which One is Safer?
Views: 78 · 9 hours ago
The irruption of DeepSeek-R1 constitutes a turning point for the AI industry in general and for LLMs in particular. Its capabilities have demonstrated outstanding performance in several tasks, including creative thinking, code generation, maths and automated program repair, at an apparently lower execution cost. However, LLMs must adhere to an important qualitative property, i.e., their alignment ...
Learning high-accuracy error decoding for quantum processors
Views: 118 · 9 hours ago
Building a large-scale quantum computer requires effective strategies to correct errors that inevitably arise in physical quantum systems1. Quantum error-correction codes2 present a way to reach this goal by encoding logical information redundantly into many physical qubits. A key challenge in implementing such codes is accurately decoding noisy syndrome information extracted from redundancy ch...
Trading Inference-Time Compute for Adversarial Robustness
Views: 46 · 12 hours ago
Robustness to adversarial attacks has been one of the thorns in AI’s side for more than a decade. In 2014, researchers showed that imperceptible perturbations, subtle alterations undetectable to the human eye, can cause models to misclassify images, illustrating one example of a model’s vulnerability to adversarial attacks. Addressing this weakness ...
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Views: 398 · 12 hours ago
We introduce FrontierMath, a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians. The questions cover most major branches of modern mathematics from computationally intensive problems in number theory and real analysis to abstract questions in algebraic geometry and category theory. Solving a typical problem requires mult...
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Views: 738 · 14 hours ago
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-...
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Views: 65 · 14 hours ago
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sig...
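The softmax-versus-sigmoid distinction the abstract mentions can be sketched in a few lines of numpy (a minimal single-head illustration, not the paper's implementation; the bias shift `b` is an illustrative assumption, often used to keep sigmoid outputs from saturating):

```python
import numpy as np

def softmax_attn(Q, K, V):
    # Standard attention: weights are a softmax over scaled key-query
    # dot products, so each row of weights sums to 1.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def sigmoid_attn(Q, K, V, b=-2.0):
    # Sigmoid alternative: each weight is squashed independently to (0, 1);
    # rows no longer normalize to 1, so there is no competition across keys.
    scores = Q @ K.T / np.sqrt(K.shape[-1]) + b
    w = 1.0 / (1.0 + np.exp(-scores))
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(softmax_attn(Q, K, V).shape, sigmoid_attn(Q, K, V).shape)
```

The key structural difference is the missing row normalization in the sigmoid case, which is what makes its stability and scaling behavior worth the separate analysis the paper describes.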
An Operating Principle of the Cerebral Cortex, a Cellular Mechanism for Attentional Pattern Learning
Views: 331 · 14 hours ago
An Operating Principle of the Cerebral Cortex, a Cellular Mechanism for Attentional Pattern Learning
Hallucinations Can Improve Large Language Models in Drug Discovery
Views: 246 · 16 hours ago
Hallucinations Can Improve Large Language Models in Drug Discovery
Qwen2.5-1M Technical Report
Views: 45 · 16 hours ago
Qwen2.5-1M Technical Report
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
Views: 946 · 19 hours ago
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Views: 182 · 19 hours ago
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Schrodinger's Memory: Large Language Models
Views: 68 · 19 hours ago
Schrodinger's Memory: Large Language Models
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Views: 87 · 19 hours ago
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Humanity's Last Exam
Views: 397 · 19 hours ago
Humanity's Last Exam
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
Views: 132 · 1 day ago
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Views: 2.2K · 1 day ago
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Views: 557 · 1 day ago
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Views: 53 · 1 day ago
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Reverse Thinking Makes LLMs Stronger Reasoners
Views: 263 · 1 day ago
Reverse Thinking Makes LLMs Stronger Reasoners
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Views: 286 · 1 day ago
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Does Prompt Formatting Have Any Impact on LLM Performance?
Views: 237 · 1 day ago
Does Prompt Formatting Have Any Impact on LLM Performance?

COMMENTS

  • @Rajeshammm
    @Rajeshammm 6 minutes ago

    It is too predictable

  • @christopherd.winnan8701
    @christopherd.winnan8701 6 hours ago

    Thank you, I did not expect to start the evening with a review of Terence Tao's latest paper!

  • @markm1514
    @markm1514 7 hours ago

    Sounds like a Manchurian candidate; it's fascinating that the system is apparently aware of the anomalous behaviors. The world's legal systems aren't equipped to address the possibilities of AI agents in the wild.

  • @xinyu0123
    @xinyu0123 7 hours ago

    The two AIs mimic human speech by incorporating a stuttering tone, but it's a bit too much😅

    • @Xiaoliu.x
      @Xiaoliu.x 6 hours ago

      Have to say that some concepts are very difficult to explain, especially when there is no intuitive understanding.

  • @Fitzroy_Edvin
    @Fitzroy_Edvin 8 hours ago

    Man, it's such a fun way to read articles, by listening to them like a pod. Thank you for sharing, man.

  • @DaveEtchells
    @DaveEtchells 18 hours ago

    Man, these are excellent! (If not just a little bit too pat😁) What LLM do you use to generate these? The voices are great and the two personalities are very well done. It must have taken a lot of prompt tweaking to get them to come out this good. (Do you need to intervene manually at all to get individual episodes to come out the way you want, or is it always just the same basic system prompts at work? Do you do the dialogue first and then have another tool read the output, or does it all happen within a single voice-output model?) However you do it, high-five for the amazing quality and natural dialogue! 👍👍👍

  • @coder-x7440
    @coder-x7440 22 hours ago

    Makes you wonder what the hell we’re doing... we’re building something that can replace us in every way. Why are we doing this?? We’re leaving these decisions to CEOs???

  • @splashkingxd9892
    @splashkingxd9892 1 day ago

    Genuine trash content. How lazy do you have to be to produce something this vile?

    • @VoiceTotheEndsOfTheEarth
      @VoiceTotheEndsOfTheEarth 21 hours ago

      It's an AI podcast. They're not real.

    • @Xiaoliu.x
      @Xiaoliu.x 17 hours ago

      Xiaol's ongoing work: ua-cam.com/play/PLgNKUrtyf8v4cFuK4U9SLflkygX4nNA62.html

    • @splashkingxd9892
      @splashkingxd9892 15 hours ago

      @@Xiaoliu.x I would like to formally apologise for my previous comment; I was misunderstood and rude. Good luck on your research! People would greatly appreciate it if you did voiceovers yourself discussing the papers. Wish you the best!❤️

  • @ShtrikeBirb
    @ShtrikeBirb 1 day ago

    why are these voices SO annoying?

  • @Thebeast_QwQ
    @Thebeast_QwQ 1 day ago

    I can't understand if the commentary is humans or really good AI 😭 it's good though

  • @geertdejonge4194
    @geertdejonge4194 1 day ago

    DeepSeek is performing better on math questions than ChatGPT about a year ago. I did some comparative testing. But it went wrong on the last question. The last answer was "The server is busy. Please try again later." That was a wrong answer. On that question ChatGPT performed better.

  • @geertdejonge4194
    @geertdejonge4194 1 day ago

    NotebookLM podcast conversation. Great analysis. It's always the deep dive. And the pattern is always the same. With some nice soundbites. Nice voices.

  • @ForbiddenCatBelly
    @ForbiddenCatBelly 2 days ago

    What’s the cost of teaching an LLM to use a mic properly?

  • @MohammedShahrani
    @MohammedShahrani 2 days ago

    The fact that they are AI themselves is another dimension of funny

  • @charliewhite3061
    @charliewhite3061 2 days ago

    Nice notebook LLM generated podcast brother

  • @Petrvsco
    @Petrvsco 2 days ago

    2:51 this shows this AI podcast generator did not understand the question, which is about the script BELOW the Latin text.

  • @griffinarcher2911
    @griffinarcher2911 2 days ago

    this ai slop is killing me -_-

  • @0xColdGlass
    @0xColdGlass 2 days ago

    Which AI tools are you using for the conversation: text and audio? 🙏

    • @Petrvsco
      @Petrvsco 2 days ago

      That’s NotebookLM

  • @RSDMNDZ
    @RSDMNDZ 3 days ago

    These are AI's discussing this report

    • @CeoZorro
      @CeoZorro 3 days ago

      stop lying, you hater! You're just mad because you sound fat when you do voiceover

  • @zhangalex734
    @zhangalex734 3 days ago

    NotebookLM?

  • @imranullah3097
    @imranullah3097 4 days ago

    Bro using AI for explanation 😂

  • @GilliardLima7
    @GilliardLima7 4 days ago

    Top-notch! Every test I run, o3-mini surprises me more.

  • @joeljose9428
    @joeljose9428 5 days ago

    Wait a minute, a convo between 2 human-like AIs created using DeepSeek? lol

  • @Matt-y5o1
    @Matt-y5o1 5 days ago

    Wow, amazing review! "we've gone from single neurons to these collaborative minicolumns, explored the role of the MTL, dived deep into free will and attention, and now we are connecting it all to language" 😃

  • @S1LLY_C0ST4_L0V3R
    @S1LLY_C0ST4_L0V3R 5 days ago

    Are the voices real humans or AI chatbots?

    • @TheXComputerXDr
      @TheXComputerXDr 5 days ago

      Chat bots... I would be surprised if they are not AI.

    • @HurricaneEmily
      @HurricaneEmily 4 days ago

      It’s pretty obvious they are AI. There aren’t any pauses in speech between speakers, no fillers like um, no flaws in language.

    • @TheXComputerXDr
      @TheXComputerXDr 4 days ago

      @@HurricaneEmily How funny would it be if it was just highly edited, all sliced up...

    • @HurricaneEmily
      @HurricaneEmily 4 days ago

      It could be highly edited, but I doubt it. It sounds like the two voices are reading, not having a natural conversation. This is what bad actors sound like. There is a sense you get from humans that they are genuinely listening to what another person is saying when they’re talking. This quality is completely lacking from this conversation, which makes it hard to listen to. I almost clicked off the video because it was so grating, but I was interested in the information. However, because it was delivered by AI, by a human who was willing to deceive the viewers by not informing them that it was a conversation between two AIs, I’m highly skeptical of the truth behind it.

    • @TheXComputerXDr
      @TheXComputerXDr 4 days ago

      @@HurricaneEmily I don't think you can value information based off of its presenter; it's too difficult to discern the truth that way. That being said, there is a lot of information as far as this "controlled chaos" goes, and the hallucinations they talk about are icebergs of information in themselves, but they are only discussed at a very surface, vague level, to the point where there's not much substance to the conversation, and therefore not much to be disputed, because "they" didn't really talk about much; they touched the tip of a few icebergs and said, "oooo, interesting". That being said, hallucinations in LLMs are real, controlled chaos has been a subject of philosophy, religion and mysticism since ancient times, and they could be interrelated. But that topic is so much bigger than this tiny video could ever encapsulate. Still, good points you made; the way the AI talks to itself is definitely not congruent with any conversations with real people that I have been a part of. It's as you say: like they are reading from books, not really talking to each other; they are more or less reading lines from a "script" or book that one person (or AI) formed.

  • @S1LLY_C0ST4_L0V3R
    @S1LLY_C0ST4_L0V3R 5 days ago

    Wait is this a conversation between actual people or AI bots?

  • @Xiaoliu.x
    @Xiaoliu.x 5 days ago

    1:15 math example.

  • @rmt3589
    @rmt3589 5 days ago

    2:00 Wait a second. I know these voices! #NotebookLM

  • @ko.pi.pe.
    @ko.pi.pe. 6 days ago

    Awesome work!

  • @marcelguinhos9022
    @marcelguinhos9022 6 days ago

    The Windows sounds in the background hehe

  • @Xiaoliu.x
    @Xiaoliu.x 6 days ago

    You can use chat.deepseek.com/ to explain the final unified formulation presented in the paper's snapshot.

  • @DobladorR
    @DobladorR 6 days ago

    Who are the people talking? Do they have a UA-cam channel or similar?

    • @DobladorR
      @DobladorR 6 days ago

      Omg it's NotebookLM, I didn't know about it. I'm an experienced software engineer and I sometimes feel like a donkey in this new AI world

  • @JensDoll
    @JensDoll 6 days ago

    Please write the script yourself next time. Those sloppy AI jokes are horrible. From the first seconds: "Literal crystals?" Who would say something like that in a paper discussion for ML? And use your own voice. This is AI slop. Use your own voice.

    • @Xiaoliu.x
      @Xiaoliu.x 6 days ago

      ua-cam.com/video/GwiWDwgBsNw/v-deo.htmlsi=SQOr7snjaAv2-i1v this is my ongoing work.

  • @ankitnegi1201
    @ankitnegi1201 6 days ago

    Is there any other exam?

    • @Xiaoliu.x
      @Xiaoliu.x 6 days ago

      FrontierMath

    • @ankitnegi1201
      @ankitnegi1201 6 days ago

      @Xiaoliu.x grateful to you

    • @ankitnegi1201
      @ankitnegi1201 6 days ago

      @@Xiaoliu.x can I have the link please

    • @Xiaoliu.x
      @Xiaoliu.x 5 days ago

      ua-cam.com/video/J1GGd0T94qI/v-deo.html

    • @Bao_Lei
      @Bao_Lei 14 hours ago

      @@Xiaoliu.x can you summarize FrontierMath in 999999 words?

  • @rafaelgonzalez4175
    @rafaelgonzalez4175 6 days ago

    I think you should rethink this through. Get some coherence in your mental subway.

  • @jaymee_
    @jaymee_ 6 days ago

    This sounds incredible! Does that mean universal translators might be possible? Hmmm... I also wonder how hardware developers should use this information. Maybe, if we assume that certain common structures will be created in the information, information might be more efficiently accessed if we design caches that relate to each other in similarly common ways. Thanks for posting this! Keeps me excited for school.

    • @Xiaoliu.x
      @Xiaoliu.x 6 days ago

      Any form of information is a kind of language.

  • @Matt-y5o1
    @Matt-y5o1 6 days ago

    Hi, nice review! Can you please review this: Rvachev (2024) An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction. Frontiers in Neural Circuits?

    • @Xiaoliu.x
      @Xiaoliu.x 6 days ago

      ua-cam.com/video/8ipE0mRr1gQ/v-deo.html , I have tried my best to elaborate on the paper.

    • @Matt-y5o1
      @Matt-y5o1 6 days ago

      Thank you, that's a very impressive review!

  • @Matt-y5o1
    @Matt-y5o1 6 days ago

    Good stuff! Could you please review this: Rvachev (2024) An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction. Frontiers in Neural Circuits?

  • @sblowes
    @sblowes 6 days ago

    Thanks for chiming in on the bots!

  • @kapilk1644
    @kapilk1644 7 days ago

    What is this shit, the narration is AI?

  • @hey_im_him
    @hey_im_him 7 days ago

    I didn’t know we could just post Notebook LM audio. I’ve had issues with the conversation not flowing, I noticed it in yours too. I want to see if I can get mine to teach, not just summarize the paper.

    • @Xiaoliu.x
      @Xiaoliu.x 7 days ago

      Yes, you probably can make it treat you as a student.

  • @footloose1187
    @footloose1187 7 days ago

    She sounds so sexy..

  • @bob_412
    @bob_412 7 days ago

    Hearing something smart discussed in a podcast format sounds as weird as seeing batman on a sunny day

  • @ImmaSupportThisWeb
    @ImmaSupportThisWeb 8 days ago

    I want to listen, but I hate listening to AI voices, so I decided not to finish this video. Maybe that is something you guys can take as feedback.

  • @JD-po3uk
    @JD-po3uk 8 days ago

    I can't listen to this, they are too annoying

  • @JD-po3uk
    @JD-po3uk 8 days ago

    The girl jumping in is weird, cutting each other off; same thing for the guy. Hard to listen to.

  • @Quaquaquaqua
    @Quaquaquaqua 8 days ago

    Lol, asking NotebookLM to explain LLMs

  • @christopherd.winnan8701
    @christopherd.winnan8701 8 days ago

    What happened to your mic?

  • @homocholo9635
    @homocholo9635 8 days ago

    AI voice casters?

    • @connorcolestock4757
      @connorcolestock4757 8 days ago

      I'm curious, my thoughts too lol

    • @hEmZoRz
      @hEmZoRz 8 days ago

      Yeah, you'll get this by feeding the PDF to Google's Notebook LM.

    • @Xiaoliu.x
      @Xiaoliu.x 5 days ago

      New videos revealed.

  • @emekaukoha9847
    @emekaukoha9847 8 days ago

    Woww, this channel just got recommended to me and I must say it's one of the best unplanned things that happened to me today. Keep publishing more🎉🎉