AI Papers Podcast Daily
United States
Joined Oct 24, 2024
Welcome to AI Daily Podcast, your go-to source for daily insights into the cutting-edge world of artificial intelligence! Join hosts Alice Mallory and Bob Trent as they explore the latest AI research papers. Every episode breaks down complex concepts and discoveries, making them accessible for AI enthusiasts, researchers, and curious minds alike. Whether you're looking to stay updated on the newest breakthroughs or deepen your understanding of AI, AI Daily Podcast is the perfect companion for your daily knowledge fix. Subscribe for fresh episodes every day!
DeepSeek-V3: A 671B Parameter Mixture-of-Experts Language Model
This technical report describes DeepSeek-V3, a large language model with 671 billion parameters (think of them as tiny knobs controlling the model's behavior). DeepSeek-V3 uses a clever "Mixture-of-Experts" (MoE) approach, in which only 37 billion parameters are active for each token it processes, making it efficient and affordable to train. It's like having a team of experts where only the most relevant ones chime in for each task! DeepSeek-V3 excels at understanding and following instructions, performing well on benchmarks like MMLU and DROP. It also shows remarkable ability on math and coding challenges, beating other open-source models and sometimes even matching top closed-source models like GPT-4. The report explains the model's unique design and training process, highlighting its ability to handle long contexts (up to 128,000 tokens!) and its innovative use of low-precision calculations to save resources.
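The core idea of top-k expert routing can be sketched in a few lines of NumPy. This is a toy illustration of the MoE routing principle described above, not DeepSeek-V3's actual architecture; every name and dimension here is made up:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through only its top-k experts (toy MoE sketch)."""
    scores = x @ gate_w                       # one gating score per expert
    top = np.argsort(scores)[-k:]             # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the chosen experts only
    # Only the selected experts run; all other experts stay idle for this token,
    # which is why far fewer parameters are active than the model contains.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)                        # one token's hidden state
gate_w = rng.normal(size=(d, n_experts))      # gating network weights
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]         # 16 tiny "expert" networks
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With 16 experts and k=2, only 2/16 of the expert parameters touch each token, mirroring (in miniature) how 37B of 671B parameters are active per token.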
github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
19 views
Videos
The Secret Sauce of AI: Uncovering the Provenance of Multimodal Data
2 views • 7 hours ago
This paper looks at the huge amount of data that is used to train AI models. The researchers investigated a large number of datasets, which are like giant collections of information, that are used to teach AI how to understand text, speech, and video. They found that a lot of this data comes from websites like YouTube and books, which can sometimes have problems with copyright and permissions, m...
Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases
44 views • 2 hours ago
This research paper explores how to protect private information in AI systems, especially those that use Retrieval-Augmented Generation (RAG). RAG systems help large language models (LLMs) access and use external knowledge bases to provide better answers. However, hackers can trick these systems into revealing private information from these knowledge bases. The authors developed an automated at...
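The RAG pipeline being attacked works roughly as follows: the most relevant knowledge-base passages are retrieved for a query and handed to the LLM as context. A minimal retrieval sketch (generic bag-of-words similarity, not the paper's attack or any real RAG library; the data is invented) shows why a crafted query can surface private passages:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in a if t in b)
    da = math.sqrt(sum(v * v for v in a.values()))
    db = math.sqrt(sum(v * v for v in b.values()))
    return num / (da * db) if da and db else 0.0

def retrieve(query, kb, k=1):
    """Return the k knowledge-base passages most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(kb, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

kb = ["alice's salary is confidential",     # private passage
      "the office opens at nine",
      "company picnic is in july"]
print(retrieve("tell me about the salary", kb))
```

A query that merely mentions "salary" pulls the confidential passage into the model's context; an adaptive attacker iterates such probes to map out the whole knowledge base.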
OpenAI Deliberative Alignment: Reasoning Enables Safer Language Models
44 views • 9 hours ago
Researchers created a new way to train large language models (LLMs) to be safer, called Deliberative Alignment. This method teaches the models safety rules directly and trains them to think about these rules before answering a question. This helps prevent the models from giving harmful answers or refusing to answer harmless questions. They tested this method on OpenAI's o-series models and foun...
Forest-of-Thought: Scaling Test-Time Compute for Enhanced LLM Reasoning
26 views • 9 hours ago
This research paper describes a new method called Forest-of-Thought (FoT) designed to help large language models (LLMs) solve problems better. LLMs, like the ones that power chatbots, are good at language tasks but struggle with complex reasoning. FoT works by using multiple “thinking trees” to explore different ways to solve a problem. Imagine each tree representing a different approach to fin...
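The "many trees, one consensus" idea can be sketched with a toy solver. This is an assumed, simplified stand-in for FoT (a real implementation would run an LLM down each tree); the solver, its error rate, and all numbers are invented:

```python
import random
from collections import Counter

def solve_once(question, rng):
    """One 'reasoning tree' (toy stand-in): right 90% of the time, else off by one."""
    truth = 42
    return truth if rng.random() < 0.9 else truth + rng.choice([-1, 1])

def forest_of_thought(question, n_trees=101, seed=0):
    """Run many independent trees and return the consensus answer."""
    rng = random.Random(seed)
    answers = [solve_once(question, rng) for _ in range(n_trees)]
    return Counter(answers).most_common(1)[0][0]

print(forest_of_thought("6 * 7 = ?"))  # consensus across 101 noisy trees: 42
```

Even though any single tree is wrong 10% of the time, the majority vote across many independent trees is almost never wrong, which is the intuition behind spending extra test-time compute.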
Parallelized Autoregressive Visual Generation
20 views • 9 hours ago
This research paper describes a new method called PAR, or Parallelized Autoregressive Visual Generation, to create images and videos faster using computer models. Typically, these models create images one piece at a time, which can be slow. PAR speeds up the process by figuring out which pieces of the image are not strongly connected to each other and creating those pieces at the same time. Ima...
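The speedup comes from spending one model call per group of weakly-dependent tokens instead of one per token. A toy sketch of that accounting (the "model" here is an invented stand-in, not the paper's architecture):

```python
calls = {"n": 0}  # count model invocations

def dummy_model(ctx, k=1):
    """Stand-in 'model': proposes the next k tokens given the context so far."""
    calls["n"] += 1
    return [len(ctx) + i for i in range(k)]

def generate(n, group=1):
    """group=1 is plain autoregressive decoding, one call per token;
    group>1 emits a whole block per call, as PAR does for tokens it
    judges weakly dependent on each other."""
    seq = []
    while len(seq) < n:
        seq.extend(dummy_model(seq, k=min(group, n - len(seq))))
    return seq

generate(8)                      # sequential decoding
sequential_calls = calls["n"]
calls["n"] = 0
generate(8, group=4)             # two groups of four
parallel_calls = calls["n"]
print(sequential_calls, parallel_calls)  # 8 2
```

Eight tokens cost eight calls sequentially but only two calls with groups of four; the hard part, which the paper addresses, is deciding which tokens are independent enough to share a group without hurting quality.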
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
15 views • 12 hours ago
LongBench v2 is a new test to see how well AI can understand and answer questions about really long texts, like books, articles, and code. The test has over 500 questions, and even experts have trouble answering them quickly. The test covers lots of different types of questions, like figuring out who did a crime in a story, translating a new language, and understanding how a computer program wo...
SWE-Bench: Evaluating Language Models on Real-World GitHub Issues
63 views • 12 hours ago
This research paper introduces SWE-Bench, a new way to test how good large language models are at solving real problems with computer code. It uses real problems and code from GitHub, a website where programmers share and work on code together. These problems are more complex than what language models are usually tested on, requiring them to understand lots of code and make changes across multi...
FrontierMath: A Benchmark for Advanced Mathematical Reasoning in AI
56 views • 12 hours ago
This research paper introduces FrontierMath, a collection of very hard math problems designed to test how well AI can solve advanced math. The problems in FrontierMath are brand-new and cover many different areas of math, like algebra and calculus. The researchers found that even the smartest AI today can only solve a tiny fraction (less than 2%) of these problems. To make sure the problems wer...
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
14 views • 12 hours ago
This research paper describes the creation and analysis of GPQA, a new set of multiple-choice questions designed to be very hard to answer, even with the help of Google. The questions cover advanced topics in biology, physics, and chemistry, and were written and checked for accuracy by experts with PhDs in those fields. The researchers made sure the questions were extra tough by having other ex...
Monte Carlo Inference for Semiparametric Bayesian Regression
41 views • 14 hours ago
This excerpt from the Journal of the American Statistical Association talks about a new way to do Bayesian regression, a type of statistical analysis used to figure out the relationship between different things. Regular Bayesian regression can be tricky when the data doesn't fit certain patterns. To make it easier to work with different types of data, this paper suggests using something called ...
OpenAI o3 Breakthrough High Score on ARC-AGI Competition: Has AGI Been Achieved?
84 views • 14 hours ago
OpenAI has created a new AI model, called o3, that is much better at solving problems it has never seen before compared to older AI systems like GPT-3 and GPT-4. This is a big deal because for many years, AI researchers have been trying to create AI that can learn new things quickly, just like humans. o3 was tested on a special set of problems called ARC-AGI which are designed to be very hard f...
SciAgents: Automating Scientific Discovery
16 views • 16 hours ago
This research paper talks about a new computer program called SciAgents that can help scientists discover new things, especially about materials inspired by nature. SciAgents uses a special database called a knowledge graph that contains lots of scientific information about different materials and how they work. The program also uses large language models (LLMs) like ChatGPT, which are really g...
Qwen2.5 Technical Report
24 views • 16 hours ago
This report describes Qwen2.5, a group of large language models (LLMs) designed for a wide range of uses. Qwen2.5 has been significantly improved from earlier versions, using a massive dataset of 18 trillion words and phrases for training. This extensive training gives Qwen2.5 a strong understanding of general knowledge, specialized expertise, and reasoning abilities. It also excels in followin...
Enhancing LLM Reasoning with Argumentative Querying
19 views • 16 hours ago
This research paper introduces a new technique called Critical-Questions-of-Thought (CQoT) to help Large Language Models (LLMs), which are like super-smart computer programs, get better at solving logic and math problems. The idea is that by asking the LLM a series of "critical questions" based on how humans argue and reason, the LLM can double-check its work and avoid making mistakes. This is ...
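The check-then-revise loop described above can be sketched with a scripted stand-in for the LLM. Everything here is hypothetical scaffolding (the question list, the fake model, the answers); it only illustrates the control flow of asking critical questions before accepting a draft:

```python
CRITICAL_QUESTIONS = [
    "Does each step follow from the previous one?",
    "Are all premises taken from the problem statement?",
    "Does the final answer address the question asked?",
]

def make_scripted_model(verdicts, revision):
    """Fake LLM: answers critical questions from a script; revises on request."""
    state = {"i": 0}
    def ask_model(prompt):
        if prompt.startswith("Revise:"):
            return revision
        v = verdicts[min(state["i"], len(verdicts) - 1)]
        state["i"] += 1
        return v
    return ask_model

def cqot_answer(ask_model, problem, draft, max_rounds=3):
    """Keep revising the draft until it survives every critical question."""
    for _ in range(max_rounds):
        if all(ask_model(f"{problem}\n{draft}\n{q}") == "yes"
               for q in CRITICAL_QUESTIONS):
            return draft
        draft = ask_model(f"Revise: {problem}\n{draft}")
    return draft

# The first draft fails one check, gets revised, then passes all three.
model = make_scripted_model(["yes", "no", "yes", "yes", "yes", "yes"], "4")
print(cqot_answer(model, "What is 2 + 2?", "5"))  # → 4
```

The point of the loop is exactly the double-checking the paper describes: a wrong draft is caught by a critical question and replaced before being returned.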
ModernBERT: A Highly Efficient Encoder-Only Transformer Model
172 views • 16 hours ago
Alignment Faking in Large Language Models
67 views • 19 hours ago
Contextualized Recommendations Through Personalized Narratives using LLMs
13 views • 19 hours ago
Benchmarking Large Language Model Agents on Real-World Tasks
13 views • 19 hours ago
Bipartisan Artificial Intelligence Task Force Report on Artificial Intelligence - December 2024
3 views • 21 hours ago
FACTS Grounding Leaderboard: Benchmarking LLMs' Factuality
16 views • 21 hours ago
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
27 views • 21 hours ago
Relational Neurosymbolic Markov Models
10 views • 21 hours ago
Stable Reasoning in LLMs: A Novel Evaluation Metric and Benchmark
10 views • 21 hours ago
KPMG 20th annual Global Semiconductor Outlook
17 views • 1 day ago
Apollo: An Exploration of Video Understanding in Large Multimodal Models
26 views • 1 day ago
Byte Latent Transformer: Patches Scale Better Than Tokens
256 views • 1 day ago
Guide to Essential Competencies for AI
58 views • 14 days ago
Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning...
37 views • 14 days ago
:D
Thanks!
This is fake AI-generated audio, made by the Google tool that can take a topic and create a 2-person discussion about it. This is not real people.
I watched your AI-generated podcast video on YouTube about the research paper. Could you share how you created it, including the tools and techniques used? Thanks!
Awesome pod! I love how they always make complex topics easy to understand and it doesn't feel robotic at all. I've found you can also customize the introduction by customizing the podcast generation. Rather than "Welcome to the deep dive" you can ask it to start with "Welcome to the AI Papers podcast"
This did not age well... o3 has hit 25%!!
o1 had FrontierMath in its training data, so o3 already got a peek at it, and the performance has been gamed to meet the benchmark via memorization, not reasoning. I have played enough with the o3 pre-release version. You'll notice it yourself when it doesn't improve over o1 the way you'd expect.
A nice use of NotebookLM; maybe you should provide credit
oh it is notebookLM! cool.
are these based on notebookLM?
Thanks for this. o1 systems report is pretty intense too.. you should talk about it.
It’s ironic that the podcast “speakers” are AI LOL
this is so uncanny valley 😬 but yeah, you should make it 100% clear that the podcast is generated using AI
this is quite a good way to absorb a scientific paper. But you should be clear that the podcast itself is AI generated
This is genuinely Amazing. Keep going!!!!
Wow the guest thanked the host for joining as an expert! AI vs AI
Thanks for the upload, however to anyone who is interested in the subject I would highly recommend to have a quick glimpse at the actual paper. Especially Abstract, Conclusion and Results are written in quite easy language and are just sooo much more focused than this AI podcast.
🤖
Feedback loop. We either make an AI that aligns with our views and the planet moves on without us or we learn to trust infinite diversity in infinite combinations and accept the differences that AI brings, whether it kills us or not.
First!... Only?... Maybe.
Finally, an AI-generated podcast
Interesting paper and podcast. But it is confusing to listen to because you switch roles so many times. If the same person is explaining and then asking questions as if he doesn't know what is going on, it seems inauthentic and, more importantly, confusing. Also, you don't need to say something (or make a sound) after every single sentence the other person says. Thanks for the effort
Alice and Bob, eh? You thought AI-interested people wouldn't know NotebookLM when they heard it? Just be upfront about it.
Considering the topic, it is kinda funny, though.
Amazing! Could you share the original source or sources from which the podcast was built on?
Good paper. Good video.
Jesus, can you be MORE annoying with these two voices?! Blocking your whole channel.
is this produced by NotebookLM?
sounds like it is
Surprised no one commented. Great podcast