Weaviate • Vector Database
New embedding model: Contextual Document Embeddings
Traditional document embeddings have a significant limitation: they encode documents independently, without considering their context or neighboring documents.
This forces them to use a single, global weighting for terms: they can miss important contextual nuances, or overweight terms that occur frequently in the dataset at hand. This becomes a problem when embedding documents from different domains or contexts.
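To make the term-weighting problem concrete, here is a minimal sketch using classic smoothed IDF. This is not CDE itself; the function name `idf_weights` and the toy corpora are hypothetical. The same word, "patient", earns a high weight in general text but a low one in a medical corpus where every document mentions it, so no single global weight fits both:

```python
import math
from collections import Counter


def idf_weights(corpus: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency computed from ONE corpus only."""
    n_docs = len(corpus)
    # Count in how many documents each term appears (set() ignores repeats).
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    # Smoothed IDF: log((N + 1) / (df + 1)) + 1, so every weight stays positive.
    return {term: math.log((n_docs + 1) / (df + 1)) + 1 for term, df in doc_freq.items()}


# Hypothetical toy data: "patient" is rare in general text,
# but appears in every document of the medical corpus.
general = [["the", "market", "rose"], ["a", "patient", "visited"], ["the", "team", "won"]]
medical = [["patient", "fever"], ["patient", "dose"], ["patient", "scan"]]

w_general = idf_weights(general)
w_medical = idf_weights(medical)
print(round(w_general["patient"], 3), round(w_medical["patient"], 3))  # prints: 1.693 1.0
```

CDE does not compute IDF explicitly; the sketch only shows why a weighting that adapts to the surrounding corpus is useful.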
✨ The Solution: Contextual Document Embeddings (CDE) ✨
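CDE combines two ideas:
1️⃣ Contextual (adversarial) contrastive training: batch neighboring, related documents together during training, so the model learns to tell apart documents that are hard to distinguish
2️⃣ A two-stage contextual architecture: first embed a set of context documents from the same corpus, then embed the target document while conditioning on those context embeddings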
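A toy sketch of step 2️⃣, under loud assumptions: a hashed bag-of-words vector stands in for the encoder, and subtracting the corpus mean stands in for CDE's learned conditioning (the real model is a trained transformer that takes the context embeddings as extra input). All function names here are hypothetical; the point is the data flow, where context is computed once per corpus and each document still ends up as one vector:

```python
import math
import re
import zlib

DIM = 64  # toy embedding size (assumption)


def first_stage_embed(text: str) -> list[float]:
    """Stand-in encoder: hashed bag-of-words, L2-normalized.
    zlib.crc32 is used instead of hash() so results are stable across runs."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def context_vector(context_docs: list[str]) -> list[float]:
    """Stage 1: embed a (non-empty) set of corpus documents and average them."""
    embs = [first_stage_embed(d) for d in context_docs]
    return [sum(col) / len(embs) for col in zip(*embs)]


def second_stage_embed(text: str, ctx: list[float]) -> list[float]:
    """Stage 2: embed the target text conditioned on the corpus context.
    Stand-in conditioning: subtract the corpus mean so features shared by
    every document contribute less, then re-normalize."""
    centered = [r - c for r, c in zip(first_stage_embed(text), ctx)]
    norm = math.sqrt(sum(v * v for v in centered)) or 1.0
    return [v / norm for v in centered]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


corpus = [
    "patient presented with fever and cough",
    "patient dose adjusted after kidney scan",
    "patient scan showed a small fracture",
]
ctx = context_vector(corpus)  # computed once per corpus, at indexing time
doc_vectors = [second_stage_embed(d, ctx) for d in corpus]  # one vector per document
query_vec = second_stage_embed("fracture on scan", ctx)
best = max(range(len(corpus)), key=lambda i: dot(query_vec, doc_vectors[i]))
print(corpus[best])
```

Because the context vector is shared by the whole corpus, it can be computed once at indexing time; the stored index has the same size as with any single-vector model, which is why retrieval needs no extra storage.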
CDE can:
- Improve performance in domain-specific scenarios
- Better handle out-of-domain queries
It also comes with:
- No additional storage requirements during retrieval (each document is still a single vector)
- Fast search, just like standard single-vector embeddings
The approach has achieved state-of-the-art results on the MTEB benchmark: huggingface.co/spaces/mteb/leaderboard
Want to dive deeper? Check out the full research paper: arxiv.org/abs/2410.02525
Or try it out with this notebook: github.com/weaviate/recipes/blob/main/weaviate-features/services-research/contextual_document_embeddings.ipynb
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT WITH US ▬▬▬▬▬▬▬▬▬▬▬▬
- Visit weaviate.io/
- Star us on GitHub: github.com/weaviate/weaviate
- Stay updated and subscribe to our newsletter: newsletter.weaviate.io/
- Try out Weaviate Cloud for free here: console.weaviate.cloud/
Got a question?
- Forum: forum.weaviate.io/
- Slack: weaviate.io/slack
Connect with us on:
- Twitter: weaviate_io
- LinkedIn: www.linkedin.com/company/weaviate-io/
Views: 641

Videos

Agentic RAG with Erika Cardenas - Weaviate Podcast #109!
582 views • 1 day ago
Hey everyone! Thank you so much for watching the 109th episode of the Weaviate Podcast with Erika Cardenas! Erika, in collaboration with Leonie Monigatti, recently published "What is Agentic RAG", a blog post that was even covered in VentureBeat with additional quotes from Weaviate Co-Founder and CEO Bob van Luijt! This podcast continues the discussion on all things Agentic RAG, coverin...
Let Me Speak Freely? with Zhi Rui Tam - Weaviate Podcast #108!
256 views • 14 days ago
JSON mode has been one of the biggest enablers for working with Large Language Models! JSON mode is even expanding into Multimodal Foundation models! But how exactly is JSON mode achieved? There are generally 3 paths to JSON mode: (1) constrained generation (such as Outlines), (2) begging the model for a JSON response in the prompt, and (3) a two-stage process of generate-then-format. I am BEYO...
Optimize your vector database's search speed, accuracy, and costs
130 views • 14 days ago
Weaviate's new hot, warm, and cold storage tiers offer flexible options for managing resources to optimize search speed, accuracy, and costs 🚀 There are three main levers to adjust: • Choosing the vector index type (HNSW, flat, or dynamic) • Using compression techniques (binary, product, or scalar quantization) • Managing flexible tenant states (active, inactive, or offloaded) Learn when you sh...
SWE-bench with John Yang and Carlos E. Jimenez - Weaviate Podcast #107!
254 views • 21 days ago
Hey everyone! Thank you so much for watching the 107th episode of the Weaviate Podcast! This one dives into SWE-bench, SWE-agent, and most recently SWE-bench Multimodal with John Yang from Stanford University and Carlos E. Jimenez from Princeton University! One of the most impactful applications of AI we have seen so far is in programming and software engineering! John, Carlos, and team are at ...
AI in Education with Rose E. Wang - Weaviate Podcast #106!
332 views • 1 month ago
Hey everyone! I am SUPER excited to publish the 106th episode of the Weaviate Podcast featuring Rose E. Wang!! Rose is a Ph.D. student at Stanford University, where she has led incredible research at the cutting edge of AI applications in education. The podcast heavily discusses her recent work on Tutor CoPilot! Tutor CoPilot is one of the world's largest randomized controlled trials on the impact...
Compound AI Systems with Philip Kiely - Weaviate Podcast #105!
408 views • 1 month ago
Hey everyone! Thanks so much for watching the 105th episode of the Weaviate Podcast with Philip Kiely! This one dives into all sorts of aspects related to Compound AI Systems! We are now seeing far better results with AI models by breaking up tasks into multiple stages and inferences. Philip explains the work they are doing at Baseten to optimize and scale deployments of these emerging systems ...
Hack Night at GitHub with Weaviate
242 views • 1 month ago
Beyond hacking and writing code, there’s something incredibly fun about creating environments for like-minded and smart people to get together to learn and hack on new tech. It takes a lot of work, but the reward is great and it's pure vibes. It creates the perfect synergy for incredible things to happen, from rad demos by magically talented people like Leann Chen from Diffbot, Ben A. at Telepor...
Late chunking improves context recall in RAG pipelines
1.1K views • 1 month ago
Optimizing your chunking techniques is one of the top places to improve performance in your RAG pipelines, but what’s the best one? Jina AI just released a new method called late chunking that takes the same amount of storage space as naive chunking, but solves the problem of lost context, similarly to ColBERT. You can implement it super easily with just a few extra lines in your embedding step...
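Late chunking is Jina AI's technique; the sketch below only illustrates its core idea under stated assumptions. The encoder is a hypothetical toy (`encode_tokens` mixes each token's vector with the mean over all input tokens, standing in for a long-context transformer), and every function name is illustrative. Both approaches store one vector per chunk, but only late chunking lets the second chunk keep a trace of "berlin", the entity that "its" refers to:

```python
def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct token to one dimension of the toy embedding space."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


def encode_tokens(tokens: list[str], vocab: dict[str, int], mix: float = 0.5) -> list[list[float]]:
    """Hypothetical toy contextual encoder: each output blends the token's one-hot
    vector with the mean over ALL input tokens, a crude stand-in for attention."""
    vecs = []
    for tok in tokens:
        v = [0.0] * len(vocab)
        v[vocab[tok]] = 1.0
        vecs.append(v)
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [[(1 - mix) * x + mix * m for x, m in zip(v, mean)] for v in vecs]


def mean_pool(vecs: list[list[float]]) -> list[float]:
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def naive_chunking(tokens, spans, vocab):
    """Encode every chunk separately: a chunk never sees the rest of the document."""
    return [mean_pool(encode_tokens(tokens[s:e], vocab)) for s, e in spans]


def late_chunking(tokens, spans, vocab):
    """Encode the whole document once, THEN pool token embeddings per chunk span."""
    token_embs = encode_tokens(tokens, vocab)
    return [mean_pool(token_embs[s:e]) for s, e in spans]


tokens = "berlin is the capital of germany its population is large".split()
spans = [(0, 6), (6, 10)]  # chunk 2 is "its population is large"
vocab = build_vocab(tokens)
naive = naive_chunking(tokens, spans, vocab)
late = late_chunking(tokens, spans, vocab)
b = vocab["berlin"]
print(round(naive[1][b], 3), round(late[1][b], 3))  # prints: 0.0 0.05
```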
Matryoshka Representation Learning (MRL) for ML tasks and vector compression
477 views • 2 months ago
AI Agents That Matter with Sayash Kapoor and Benedikt Stroebl - Weaviate Podcast #104!
611 views • 2 months ago
Chat With Your Data With Verba
1.5K views • 2 months ago
MIPRO and DSPy with Krista Opsahl-Ong! - Weaviate Podcast #103
2K views • 2 months ago
AI-Native Development with Guy Podjarny and Bob van Luijt - Weaviate Podcast #102!
286 views • 3 months ago
Chat with your code: RAG with Weaviate and LlamaIndex
425 views • 4 months ago
Scaling Pandas with Devin Petersohn - Weaviate Podcast #101!
300 views • 4 months ago
Generative UIs with Lucas Negritto and Bob van Luijt - Weaviate Podcast #100!
705 views • 4 months ago
Advanced AI Agents with RAG
7K views • 4 months ago
ACORN with Liana Patel and Abdel Rodriguez - Weaviate Podcast #99!
813 views • 5 months ago
Window Search Tree with Josh Engels - Weaviate Podcast #98!
409 views • 5 months ago
Vector Quantization: The Vector Clubhouse Episode 2
265 views • 5 months ago
AI Renaissance Berlin - AI Buzzwords
198 views • 5 months ago
The Future of Search with Nils Reimers and Erika Cardenas - Weaviate Podcast #97!
1.3K views • 5 months ago
Deep Learning with Letitia Parcalabescu - Weaviate Podcast #96!
453 views • 5 months ago
All Your Vector Embeddings Are Belong To You
832 views • 6 months ago
Open Source RAG running LLMs locally with Ollama
29K views • 6 months ago
Guest Lecture: Vector Quantization Techniques with Etienne | Brown University CSCI
569 views • 6 months ago
DSPy End-to-End: Meetup in San Francisco
6K views • 6 months ago
Google Cloud Marketplace with Dai Vu and Bob van Luijt - Weaviate Podcast #95!
346 views • 6 months ago
ParlayANN with Magdalen Dobson Manohar - Weaviate Podcast #94!
391 views • 7 months ago