IVF

IVF: Voronoi partition over k-means centroids; only the scan_buckets_count buckets closest to the query are scanned, with an optional precise rerank

IVF (Inverted File) is VSAG’s partition-based index. It clusters the corpus into buckets at build time, and at query time only scans the buckets whose centroids are closest to the query. This turns an O(N) linear scan into O(N · scan_buckets_count / buckets_count) with tunable recall/latency.

IVF trades a little recall (compared to graph indexes) for lower memory overhead, higher throughput on batch workloads, and simpler sharding — which makes it a good default when the corpus is large (hundreds of millions or more), when memory is tight, or when queries are naturally parallelizable.

Source: src/algorithm/ivf.{h,cpp}, src/algorithm/ivf_parameter.{h,cpp}
Example: examples/cpp/106_index_ivf.cpp

How it works

Clustering. A sample of the dataset is clustered with k-means (or sampled randomly, ivf_train_type: "random") to produce buckets_count centroids.
Assignment. Every vector is written to the inverted list of its nearest centroid, stored in the configured coarse quantization (base_quantization_type). Optionally, a second high-precision copy is kept (use_reorder: true) for post-filter reordering.
Search. For each query, the scan_buckets_count nearest centroids are computed first, then the vectors in those buckets are scored. When reordering is enabled, factor controls how many extra candidates are fetched from the coarse stage before being re-scored with the precise quantizer.

A second partition strategy, GNO-IMI (partition_strategy_type: "gno_imi"), splits the space into two orthogonal sets of centroids (first_order_buckets_count × second_order_buckets_count) for even finer partitioning on very large corpora.

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "buckets_count": 256,
        "base_quantization_type": "sq8",
        "partition_strategy_type": "ivf",
        "ivf_train_type": "kmeans"
    }
})";
auto index = vsag::Factory::CreateIndex("ivf", params).value();

// Build.
auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Build(base);

// Search.
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10,
    R"({"ivf": {"scan_buckets_count": 16}})").value();

Build parameters

Build-time parameters live under index_param. See Index Parameters for the exhaustive list.

Parameter	Type	Default	Description
`partition_strategy_type`	string	`"ivf"`	`ivf` (single-level) or `gno_imi` (two-level orthogonal)
`buckets_count`	int	`10`	Number of inverted lists (effective for `ivf`)
`first_order_buckets_count`	int	`10`	First-level count (effective for `gno_imi`)
`second_order_buckets_count`	int	`10`	Second-level count (effective for `gno_imi`)
`ivf_train_type`	string	`"kmeans"`	Centroid training: `kmeans` or `random`
`base_quantization_type`	string	`"fp32"`	`fp32`, `fp16`, `bf16`, `sq8`, `sq4`, `sq8_uniform`, `sq4_uniform`, `pq`, `pqfs`, `rabitq` — see the Quantization chapter for per-quantizer details
`base_pq_dim`	int	`1`	PQ subspaces (required with `pq` / `pqfs`)
`rabitq_pca_dim`	int	`0`	Optional PCA preprocessing dimension for `base_quantization_type: "rabitq"`
`rabitq_bits_per_dim_query`	int	`32`	Query bits for `rabitq`; allowed values are `4` or `32`
`rabitq_bits_per_dim_base`	int	`1`	Stored-code bits for `rabitq`; allowed range is `[1, 8]`
`rabitq_version`	string	`"standard"`	`rabitq` layout: `"standard"` or `"split_1bit_7bit"`
`rabitq_error_rate`	float	`1.9`	Positive error-budget parameter for `rabitq` encoding
`rabitq_use_fht`	bool	`false`	Enable FHT rotation before `rabitq` binarization
`fast_encode_rabitq`	bool	`true`	Use CAQ fast construction for multi-bit `rabitq`; set to `false` for exact encoding
`fast_encode_rabitq_rounds`	int	`6`	CAQ adjustment rounds; allowed range is `[1, 32]`
`use_reorder`	bool	`false`	Keep a high-precision copy and re-rank after the coarse scan
`precise_quantization_type`	string	`"fp32"`	Quantizer used for reordering (with `use_reorder: true`)
`base_io_type`	string	`"memory_io"`	Storage backend for coarse codes
`precise_io_type`	string	`"block_memory_io"`	Storage backend for precise codes (`memory_io`, `block_memory_io`, `mmap_io`, `buffer_io`, `async_io`, `reader_io`)
`precise_file_path`	string	`""`	File path when the precise IO type is disk-backed

A rule of thumb for buckets_count is sqrt(N) to 4 * sqrt(N) where N is the corpus size.

Search parameters

Search-time parameters live under the ivf sub-object:

Parameter	Type	Default	Description
`scan_buckets_count`	int	— (required)	Number of buckets probed per query. Must be ≤ `buckets_count` (except when `disable_bucket_scan` is true, where larger values are allowed and unavailable slots are padded with `-1`).
`disable_bucket_scan`	bool	`false`	Return bucket IDs and distances. Supports batch queries.
`factor`	float	`2.0`	With reordering enabled, pulls `factor * topk` coarse candidates before the precise rescore.
`enable_reorder`	bool	`true`	Set to `false` to skip the final reorder stage for this request even when the index was built with reorder enabled.
`parallelism`	int	`1`	Threads used to scan buckets in parallel for a single query.
`timeout_ms`	double	`+∞`	Hard cap in milliseconds; partial results are returned once exceeded.

auto result = index->KnnSearch(
    query, topk,
    R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "parallelism": 4}})").value();

auto fast_result = index->KnnSearch(
    query, topk,
    R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "enable_reorder": false}})").value();

When to use IVF

Large corpora (hundreds of millions of vectors and above), especially when the working set does not fit comfortably in RAM.
Batch or high-throughput workloads where per-query latency is less critical than queries-per-second.
Memory-tight deployments that benefit from aggressive quantization (sq8, sq4_uniform, pq, pqfs) combined with use_reorder to recover recall.
Shard-friendly setups: buckets map naturally onto shards or disk blocks.

For latency-sensitive, high-recall workloads on dense embeddings, compare against HGraph first.

VSAG Documentation

IVF

How it works

Quick start

Build parameters

Search parameters

When to use IVF

See also

Keyboard shortcuts

VSAG Documentation

IVF

How it works

Quick start

Build parameters

Search parameters

When to use IVF

See also