Product Quantization (PQ)
Product Quantization splits a vector into pq_dim equal-sized subvectors
and quantizes each one independently against a small learned codebook of
2^pq_bits centroids. The stored code is then pq_dim × pq_bits bits per
vector — orders of magnitude smaller than fp32. Distance computations
use precomputed lookup tables (LUT) per query.
Implementation:
src/quantization/product_quantization/product_quantizer.cpp, parameter fileproduct_quantizer_parameter.cpp.
When to use it
- High-dim float vectors (≥ 256 dim) where
sq8is still too large. - Memory-tight, accuracy-acceptable workloads where ~16× compression vs fp32 is required.
- Combined with
use_reorder: trueand a smallfp16/fp32precise store, PQ is the standard “compressed graph index” recipe at large scale.
For wider SIMD throughput at pq_bits = 4, see PQ FastScan.
Memory cost (codes only)
ceil(pq_dim × pq_bits / 8) bytes per vector for the codes, plus a small
codebook stored once (pq_dim × 2^pq_bits × subspace_dim × 4 bytes).
For typical settings (pq_dim = 32, pq_bits = 8, dim = 128):
- code size =
32 × 8 / 8 = 32bytes per vector (vs128 × 4 = 512for fp32 → 16× smaller).
Parameters
| Key | Type | Default | Meaning |
|---|---|---|---|
pq_dim | int | 1 | Number of subvectors. Must divide dim. Larger values give finer quantization at the cost of more codebooks and larger codes (product_quantizer_parameter.h:38). |
pq_bits | int | 8 | Bits per subvector (1–8). With 8, each subvector is one byte. Most reliable with 8; see PQ FastScan for the 4-bit SIMD variant. |
On HGraph these are exposed as the top-level keys base_pq_dim and
pq_bits (src/algorithm/hgraph.cpp:465-472).
{
"dtype": "float32",
"metric_type": "l2",
"dim": 128,
"index_param": {
"base_quantization_type": "pq",
"base_pq_dim": 32,
"max_degree": 32,
"ef_construction": 300,
"use_reorder": true,
"precise_quantization_type": "fp16"
}
}
Training
NEED_TRAIN is set. Training runs k-means per subspace to learn the
2^pq_bits centroids; this is typically the most expensive training step
of any built-in quantizer. Use a training sample of at least 256 × 2^pq_bits vectors per subspace for stable codebooks; Build(base)
samples from the input automatically.
Metric compatibility
l2, ip, cosine — all supported. Query-time distance is computed via
a per-subspace LUT: for l2 it is squared L2 between the query subvector
and each centroid; for ip it is the dot product. Cosine reduces to ip
on pre-normalized vectors.
Tips
pq_dimshould dividedimevenly. Common ratios aredim/4ordim/8.- Very small
pq_dim(e.g.dim/16) produces very compact codes but loses recall fast; combine with reorder. - For anisotropic data, a rotation transformer in front improves PQ
recall noticeably: use Transform Quantizer
with a chain like
"rom, pq".