Merge pull request #1478 from qdrant/vst-deasylabs
sabrinaaquino authored Feb 24, 2025
2 parents 587314a + 7629fa6 commit f8e241d
Showing 6 changed files with 71 additions and 0 deletions.
71 changes: 71 additions & 0 deletions qdrant-landing/content/blog/metadata-deasy-labs.md
@@ -0,0 +1,71 @@
---
title: "Metadata automation and optimization - Reece Griffiths | Vector Space Talks"
draft: false
slug: metadata-deasy-labs # Change this slug to your page slug if needed
short_description: Reece Griffiths discusses the power of metadata in retrieval-augmented generation (RAG), how Deasy Labs automates metadata, and best practices for metadata optimization.
description: Metadata plays a critical role in vector search accuracy, yet it’s often overlooked. In this episode of Vector Space Talks, Reece Griffiths, CEO of Deasy Labs, explains why metadata automation is essential for scalable AI systems. He walks us through how Deasy Labs orchestrates metadata extraction, classification, and enrichment to boost retrieval efficiency.
preview_image: /blog/metadata-deasy-labs/preview.jpg
social_preview_image: /blog/metadata-deasy-labs/social_preview.png
title_preview_image: /blog/metadata-deasy-labs/title.jpg # Optional image used for
date: 2025-02-24T18:29:51-03:00
author: Sabrina Aquino # Change this
featured: false # if true, this post will be featured on the blog page
tags: # Change this, related by tags posts will be shown on the blog page
- Vector Search
- Retrieval Augmented Generation
- Vector Space Talks
- Metadata Optimization
---

> *"Metadata is one of the key unlocks to both segmentation and file organization, setting up the right knowledge base, and enriching it to hit that last mile of accuracy and speed."*\
> **— Reece Griffiths**

[Reece Griffiths](https://www.linkedin.com/in/reece-william-griffiths/) is the CEO and co-founder of [Deasy Labs](https://www.deasylabs.com/), a metadata automation platform that helps companies optimize their vector databases for retrieval accuracy. Previously part of Y Combinator, Deasy Labs focuses on improving metadata extraction, classification, and enrichment at scale.

<iframe width="560" height="315" src="https://www.youtube.com/embed/R-G2i63mmPw?si=lRtbuGmrrjqU8aZ5" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

## **Top takeaways:**

Retrieval-augmented generation (RAG) and vector search are incomplete without high-quality metadata. In this episode of **Vector Space Talks**, Reece Griffiths explains how **metadata automation and optimization** can significantly enhance retrieval accuracy, filtering, and indexing efficiency.

Here are some key insights from this episode:

1. **Why Metadata Matters in Vector Search:** Traditional approaches often focus on embedding models, but metadata can bridge the gap between mediocre and high-performance search systems.
2. **Metadata for Segmentation vs. Enrichment:** Segmentation metadata helps filter and categorize data, while enrichment metadata provides additional context that improves retrieval accuracy.
3. **Optimizing Hybrid Search with Metadata:** Reece explains how metadata can be embedded into sparse vectors for **hybrid search**, improving results when keyword and semantic search are combined.
4. **Scaling Metadata Extraction:** Learn how Deasy Labs uses LLM-powered extraction methods to generate metadata dynamically and update taxonomies in real time.
5. **Metadata as an Access Control Layer:** Metadata can also be leveraged for **role-based access control (RBAC)** by defining data slices that different teams or users can access within a knowledge base.
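
To make takeaways 2, 3, and 5 concrete, here is a minimal pure-Python sketch. It is not the Deasy Labs or Qdrant implementation. The `Doc` class, the `search` function, the `alpha` weight, and the tag-overlap score (a simple stand-in for a real sparse-vector score) are all hypothetical names and simplifications chosen for illustration. Segmentation metadata acts as a hard filter, which is also how a team-based access rule could be expressed, while enrichment tags contribute to the ranking score.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Doc:
    doc_id: str
    dense: list[float]                 # toy dense embedding (made-up values)
    metadata: dict[str, str]           # segmentation metadata, used only for filtering
    tags: list[str] = field(default_factory=list)  # enrichment metadata, used for scoring


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sparse_overlap(query_terms: list[str], tags: list[str]) -> float:
    """Fraction of query terms present in the tags.

    Assumption: this stands in for a sparse-vector similarity score.
    """
    if not query_terms:
        return 0.0
    tag_set = {t.lower() for t in tags}
    hits = sum(1 for term in query_terms if term.lower() in tag_set)
    return hits / len(query_terms)


def search(
    docs: list[Doc],
    query_dense: list[float],
    query_terms: list[str],
    allowed: dict[str, set[str]],
    alpha: float = 0.5,
) -> list[str]:
    """Return doc ids ranked by a blended dense + metadata score.

    `allowed` maps a metadata key to the values the caller may see,
    e.g. {"team": {"finance"}}; documents outside that slice are dropped
    before any scoring happens.
    """
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) in values for key, values in allowed.items())
    ]
    scored = [
        (alpha * cosine(query_dense, d.dense)
         + (1 - alpha) * sparse_overlap(query_terms, d.tags), d.doc_id)
        for d in candidates
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

In a production setup the filter would typically be pushed down to the vector database rather than applied in application code, and the tag overlap would be replaced by a real sparse-vector score. The weight `alpha` is an arbitrary choice here and would need tuning on real data.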

> Fun Fact: Reece and his team at Deasy Labs experimented with **pure metadata embeddings** (without the original data) and found that hybrid search using metadata alone can yield strong retrieval performance.

## **Show notes:**

00:00 Introduction to metadata automation and optimization.\
05:32 The role of metadata in retrieval-augmented generation (RAG).\
10:48 How Deasy Labs structures metadata extraction workflows.\
15:35 Implementing hybrid search with sparse metadata vectors.\
20:14 Automating metadata classification using LLMs.\
25:51 Best practices for maintaining metadata over time.\
30:18 Using metadata for segmentation and access control.\
35:43 Q&A and closing remarks.

## **More Quotes from Reece:**

*"Going from 75% retrieval accuracy to 95%+ is hard. In many cases, 80% accuracy might as well be zero. Metadata is the key to getting that last mile."*\
— Reece Griffiths

*"Metadata shouldn't rely on manual tagging by business teams. With LLMs, we can auto-suggest domain-specific metadata dynamically and refine it over time."*\
— Reece Griffiths

*"In a vector database, segmentation metadata helps you structure your knowledge base, while enrichment metadata boosts retrieval precision—both are critical."*\
— Reece Griffiths

---


### **Try Deasy Labs 🚀**
Want to enhance your vector search performance with **automated metadata workflows**?

**Start now at [app.deasylabs.com](https://app.deasylabs.com)!**

---