This is such an amazing explanation of product quantization! I'm surprised to see such a low number of views, and wondered how I never found it earlier - then I realized the video is barely a week old. This will be my go-to video to share with anyone who wants to understand the internals of how product quantization works. Thank you for putting this together!
That's really cool to hear - happy it was useful!
The search explanation looks like it touches on asymmetric product quantization (starts at 26:43). In this case, we can build the distances between the query vector and the centroids and re-use them in all the distance calculations between the query and the quantized vectors.
For symmetric PQ, the query *must* be quantized first (i.e. each subvector must find its closest centroid) and only then is the distance calculation against the database vectors (which are already quantized) performed. The query search speed gains come from the fact that the distances between the centroids have already been calculated, which eliminates the need to calculate the distances again between the query subvectors and the db vectors' subvectors.
Symmetric PQ will likely have lower recall, because of the query vector quantization, but will be faster, since there is no need to build a distance table for the query's sub-vectors on every query.
It's worth pointing out that the speed gains (as I understand it) come from the fact that you can basically reuse the distances calculated between the subvectors and centroids you show at 27:32, because there's a small, finite number of possible centroids.
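To make the asymmetric vs. symmetric distinction concrete, here is a minimal NumPy sketch (my own illustration, not code from the video; `codebooks`, `codes` and the function names are hypothetical). ADC builds a query-to-centroid table once per query; SDC quantizes the query and uses centroid-to-centroid tables that depend only on the codebooks, so they can be cached with the index:

```python
import numpy as np

def adc_distances(xq, codebooks, codes):
    """Asymmetric (ADC): keep the query un-quantized, precompute the
    query->centroid distances once per query, then only do lookups."""
    m, k, d_sub = codebooks.shape            # m subquantizers, k centroids each
    q_subs = xq.reshape(m, d_sub)
    # table[j, c] = squared L2 distance between query subvector j and centroid c
    table = ((codebooks - q_subs[:, None, :]) ** 2).sum(axis=-1)   # (m, k)
    # codes has shape (n, m): the centroid id of each database subvector
    return table[np.arange(m), codes].sum(axis=1)                  # (n,)

def sdc_distances(xq, codebooks, codes):
    """Symmetric (SDC): quantize the query first, then sum centroid-to-centroid
    distances, which depend only on the codebooks and can be cached per index."""
    m, k, d_sub = codebooks.shape
    q_subs = xq.reshape(m, d_sub)
    q_codes = ((codebooks - q_subs[:, None, :]) ** 2).sum(-1).argmin(axis=1)
    # cc[j, a, b] = squared distance between centroids a and b of subspace j
    cc = ((codebooks[:, :, None, :] - codebooks[:, None, :, :]) ** 2).sum(-1)
    return cc[np.arange(m), q_codes, codes].sum(axis=1)
```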
Please post more content, it's really helpful!
happy you're enjoying them - working on more!
Very good explanation, and this series got me intrigued about vector databases. I have a doubt regarding the usage of product quantization with cosine similarity (splitting up into subvectors and adding up distances makes sense intuitively for Euclidean distance). How would that work for cosine similarity, though?
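One hedged sketch of how this can work (my own reasoning, not an answer from the video): a dot product decomposes across the subvectors exactly like a squared Euclidean distance does, so for cosine similarity you can L2-normalise the vectors and fill the per-query lookup table with sub-dot-products instead of sub-distances, then sum them and take the maximum:

```python
import numpy as np

def adc_inner_product(xq, codebooks, codes):
    """Hypothetical ADC variant for cosine / inner-product search: the lookup
    table holds sub-dot-products, and their sum is the dot product between the
    query and each vector's PQ reconstruction. For cosine similarity, normalise
    the database vectors before training/encoding and normalise xq here."""
    m, k, d_sub = codebooks.shape
    q_subs = (xq / np.linalg.norm(xq)).reshape(m, d_sub)
    table = np.einsum('md,mkd->mk', q_subs, codebooks)   # (m, k) dot products
    return table[np.arange(m), codes].sum(axis=1)        # (n,) higher = closer
```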
You are the GOD, james!
Doubt 1: At 27:45, you mention that we "sum/add" up all the distances between the subvectors, and yet it is called "product" quantization. I didn't get that part (why "product" quantization when we are summing the distances?). Thanks for the awesome video!
Doubt 2:
At 28:10, is q(u) the dense vector of the closest centroid that u is mapped to? Then we are just calculating the Euclidean distance between the dense query xq and a dense vector q(u) for every single u in the dataset. How is that faster than just calculating the Euclidean distance between the query xq and each dense vector (not PQ) in the dataset?
[edit] I think the answer to doubt 2 is precomputation.
If nbits is 8, we only have 256 distinct centroid values, and each subvector in our query xq only has 2^nbits/m centroids associated with it. So we precompute the distance between each subvector (0 --> m) in xq and the corresponding centroids (0 --> 2^nbits/m for each subspace).
For each dense vector xb_i in xb, we only store the code value. E.g. if D = 128 and we PQ it with m = 8 (subspaces) and nbits = 8 (2^nbits centroids), each subspace has 2^nbits/m centroids = 2^8/8 = 32.
We only store PQ(xb_i) = [0, 17, 31, 2, 5, 6, 1, 28] (a random example of a vector in the dataset to be searched against) AND the codebook for reproducing the dense centroid of each subspace.
Rough algorithm:
0. Find the centroids (2^nbits/m in total) in each of the m subspaces for the vectors in xb (for all xb_i in xb).
1. Precompute the distances between the subvectors of query xq and the centroids in each subspace, and store them in a hash table for fast lookup.
2. Iterate over all PQ(xb_i).
3. Using the codebook, reproduce the dense centroid vector for each value in PQ(xb_i).
4. Calculate the approximate distance between query xq and PQ(xb_i) by summing up the Euclidean distances of each subspace (each lookup takes O(1) time; we do m lookups, one for each subspace, and then m additions). The time complexity of the approximate distance for a query against a single PQ(xb_i) is O(m).
5. Store the xb_i with the minimum approximate distance to xq. (A rough code sketch of these steps follows below.)
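A minimal end-to-end sketch of roughly this pipeline, assuming NumPy and scikit-learn are available (every function name and number below is my own illustration, not the video's code):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebooks(xb, m, k):
    """Step 0: run k-means independently in each of the m subspaces."""
    d_sub = xb.shape[1] // m
    return [
        KMeans(n_clusters=k, n_init=1, random_state=0)
        .fit(xb[:, j * d_sub:(j + 1) * d_sub])
        for j in range(m)
    ]

def encode(xb, kms):
    """Replace each subvector with the id of its nearest centroid."""
    d_sub = xb.shape[1] // len(kms)
    return np.stack(
        [km.predict(xb[:, j * d_sub:(j + 1) * d_sub]) for j, km in enumerate(kms)],
        axis=1,
    )  # shape (n, m): one small integer code per subspace

def search(xq, kms, codes, topk=5):
    """Steps 1-5: build a per-query table, do m O(1) lookups per stored code."""
    m = len(kms)
    d_sub = xq.shape[0] // m
    # Step 1: table[j][c] = squared distance between xq's subvector j and centroid c
    table = np.stack([
        ((km.cluster_centers_ - xq[j * d_sub:(j + 1) * d_sub]) ** 2).sum(axis=1)
        for j, km in enumerate(kms)
    ])
    # Steps 2-4: for every stored code, sum the m looked-up sub-distances
    approx = table[np.arange(m), codes].sum(axis=1)
    # Step 5: return the ids of the closest database vectors
    return np.argsort(approx)[:topk]

# usage sketch with random data
rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 128)).astype(np.float32)
xq = rng.standard_normal(128).astype(np.float32)
kms = train_codebooks(xb, m=8, k=32)   # 32 centroids per subspace, as in the comment
codes = encode(xb, kms)                # (in Faiss, each subquantizer gets 2^nbits)
print(search(xq, kms, codes))
```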
I probably didn't make it very clear: we sum up the distances, but the total number of possible 'quantization' distances is the *product* of the number of centroids within each subquantizer, i.e. C = C_1 * C_2 * ... * C_m, where m == number of subquantizers/subspaces. This is why product quantization is seen as good: from a much smaller set of subcentroids we produce a much larger set of actual (full vector) distances.
On your second point, q(u) is the quantized vector. It's not faster, but it is much more memory efficient, which is what we use PQ for: reducing the space of possible vector representations reduces the memory usage of the index (PQ == very good compression).
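A rough back-of-the-envelope example of both points (my own numbers, assuming D = 128 float32 vectors, m = 8 subquantizers with 256 centroids each):

```python
# Hypothetical back-of-the-envelope numbers for the compression / "product" point
D, m, k = 128, 8, 256            # dims, subquantizers, centroids per subquantizer

full_bytes = D * 4               # one float32 vector: 512 bytes
code_bytes = m * 1               # m codes of 8 bits each: 8 bytes per vector
print(full_bytes // code_bytes)  # 64x compression per stored vector

# A product quantizer can reconstruct k**m distinct vectors
# while only storing m*k sub-centroids.
print(k ** m)                    # 256^8 ~= 1.8e19 possible quantized vectors
print(m * k)                     # only 2,048 sub-centroids actually stored
```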
From the above rough implementation, PQ also reduces compute time, right?
For the second doubt, maybe what can be done is to use the centroid id vectors as buckets: we also map the query vector to a bucket and then limit our search to vectors in that bucket and nearby buckets only.
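For what it's worth, that bucketing idea is essentially what an inverted-file (IVF) index adds on top of PQ. A minimal Faiss sketch, assuming random data and illustrative parameter values:

```python
import numpy as np
import faiss  # assumes the Faiss library is installed

d, nlist, m, nbits = 128, 100, 8, 8              # illustrative values only
xb = np.random.rand(10000, d).astype('float32')  # database vectors
xq = np.random.rand(5, d).astype('float32')      # query vectors

coarse = faiss.IndexFlatL2(d)                    # coarse quantizer defines the buckets
index = faiss.IndexIVFPQ(coarse, d, nlist, m, nbits)
index.train(xb)                                  # learn buckets + PQ codebooks
index.add(xb)
index.nprobe = 8                                 # search only the 8 nearest buckets
D, I = index.search(xq, 10)                      # top-10 approximate neighbours
```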
Great explanation!
I have a side question, how do you generate such nice visuals in the slides?
I use excalidraw for the base charts, and edit/animate with adobe animate (when needed)
Good explanation. But I do not understand how you would retrieve the closest vector after the nearest neighbour search. So I could get the subvectors that are stored closest to the centroids, but how would I get the full vector?
We split our query into subvectors, calculate the distances between the query subvectors and the indexed, quantized (xb) subvectors, sum those distances, and then count the vectors with the smallest totals as the nearest. We don't store the full vectors because they take a lot of memory - only the quantized vectors.
If you wanted, you could add a reranking step where you just compare the full vectors for maybe the top 1000 most similar vectors after the ANN search, but then we would be storing the full vectors and not saving memory. I hope that helps - let me know if you have any more questions, we cover much more in the full playlist too :)
ua-cam.com/play/PLIUOU7oqGTLhlWpTz4NnuT3FekouIVlqc.html
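If it helps, here is a small sketch of that optional reranking step (my own illustration, assuming the full float32 vectors are kept around and `candidate_ids` are the ids returned by the approximate PQ search):

```python
import numpy as np

def rerank(xq, candidate_ids, xb_full, topk=10):
    """Hypothetical reranking step: re-score the ANN candidates with exact
    squared L2 distances against the full (uncompressed) vectors, then re-sort."""
    candidate_ids = np.asarray(candidate_ids)
    exact = ((xb_full[candidate_ids] - xq) ** 2).sum(axis=1)
    order = np.argsort(exact)[:topk]
    return candidate_ids[order], exact[order]

# usage sketch: candidate_ids could be the top 1000 ids from the PQ search,
# and xb_full the original float32 vectors (kept only if memory allows)
```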
link for the code?
1:35 Wait, what's the recall?
awesome