Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

IVF

IVF: Voronoi partition over k-means centroids; only the scan_buckets_count buckets closest to the query are scanned, with an optional precise rerank

IVF (Inverted File) is VSAG’s partition-based index. It clusters the corpus into buckets at build time, and at query time only scans the buckets whose centroids are closest to the query. This turns an O(N) linear scan into O(N · scan_buckets_count / buckets_count) with tunable recall/latency.

IVF trades a little recall (compared to graph indexes) for lower memory overhead, higher throughput on batch workloads, and simpler sharding — which makes it a good default when the corpus is large (hundreds of millions or more), when memory is tight, or when queries are naturally parallelizable.

How it works

  1. Clustering. A sample of the dataset is clustered with k-means (or sampled randomly, ivf_train_type: "random") to produce buckets_count centroids.
  2. Assignment. Every vector is written to the inverted list of its nearest centroid, stored in the configured coarse quantization (base_quantization_type). Optionally, a second high-precision copy is kept (use_reorder: true) for post-filter reordering.
  3. Search. For each query, the scan_buckets_count nearest centroids are computed first, then the vectors in those buckets are scored. When reordering is enabled, factor controls how many extra candidates are fetched from the coarse stage before being re-scored with the precise quantizer.

A second partition strategy, GNO-IMI (partition_strategy_type: "gno_imi"), splits the space into two orthogonal sets of centroids (first_order_buckets_count × second_order_buckets_count) for even finer partitioning on very large corpora.

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "buckets_count": 256,
        "base_quantization_type": "sq8",
        "partition_strategy_type": "ivf",
        "ivf_train_type": "kmeans"
    }
})";
auto index = vsag::Factory::CreateIndex("ivf", params).value();

// Build.
auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Build(base);

// Search.
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10,
    R"({"ivf": {"scan_buckets_count": 16}})").value();

Build parameters

Build-time parameters live under index_param. See Index Parameters and docs/ivf.md in the repository for the exhaustive list.

ParameterTypeDefaultDescription
partition_strategy_typestring"ivf"ivf (single-level) or gno_imi (two-level orthogonal)
buckets_countint10Number of inverted lists (effective for ivf)
first_order_buckets_countint10First-level count (effective for gno_imi)
second_order_buckets_countint10Second-level count (effective for gno_imi)
ivf_train_typestring"kmeans"Centroid training: kmeans or random
base_quantization_typestring"fp32"fp32, fp16, bf16, sq8, sq4, sq8_uniform, sq4_uniform, pq, pqfs, rabitq — see the Quantization chapter for per-quantizer details
base_pq_dimint1PQ subspaces (required with pq / pqfs)
use_reorderboolfalseKeep a high-precision copy and re-rank after the coarse scan
precise_quantization_typestring"fp32"Quantizer used for reordering (with use_reorder: true)
base_io_typestring"memory_io"Storage backend for coarse codes
precise_io_typestring"block_memory_io"Storage backend for precise codes (memory_io, block_memory_io, mmap_io, buffer_io, async_io, reader_io)
precise_file_pathstring""File path when the precise IO type is disk-backed

A rule of thumb for buckets_count is sqrt(N) to 4 * sqrt(N) where N is the corpus size.

Search parameters

Search-time parameters live under the ivf sub-object:

ParameterTypeDefaultDescription
scan_buckets_countint— (required)Number of buckets probed per query. Must be ≤ buckets_count.
factorfloat2.0With reordering enabled, pulls factor * topk coarse candidates before the precise rescore.
enable_reorderbooltrueSet to false to skip the final reorder stage for this request even when the index was built with reorder enabled.
parallelismint1Threads used to scan buckets in parallel for a single query.
timeout_msdouble+∞Hard cap in milliseconds; partial results are returned once exceeded.
auto result = index->KnnSearch(
    query, topk,
    R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "parallelism": 4}})").value();
auto fast_result = index->KnnSearch(
    query, topk,
    R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "enable_reorder": false}})").value();

When to use IVF

  • Large corpora (hundreds of millions of vectors and above), especially when the working set does not fit comfortably in RAM.
  • Batch or high-throughput workloads where per-query latency is less critical than queries-per-second.
  • Memory-tight deployments that benefit from aggressive quantization (sq8, sq4_uniform, pq, pqfs) combined with use_reorder to recover recall.
  • Shard-friendly setups: buckets map naturally onto shards or disk blocks.

For latency-sensitive, high-recall workloads on dense embeddings, compare against HGraph first.

See also