IVF
IVF (Inverted File) is VSAG’s partition-based index. It clusters the corpus into
buckets at build time, and at query time only scans the buckets whose centroids are
closest to the query. This turns an O(N) linear scan into O(N · scan_buckets_count
/ buckets_count) with tunable recall/latency.
IVF trades a little recall (compared to graph indexes) for lower memory overhead, higher throughput on batch workloads, and simpler sharding — which makes it a good default when the corpus is large (hundreds of millions or more), when memory is tight, or when queries are naturally parallelizable.
- Source:
src/algorithm/ivf.{h,cpp},src/algorithm/ivf_parameter.{h,cpp} - Example:
examples/cpp/106_index_ivf.cpp
How it works
- Clustering. A sample of the dataset is clustered with k-means (or sampled
randomly,
ivf_train_type: "random") to producebuckets_countcentroids. - Assignment. Every vector is written to the inverted list of its nearest
centroid, stored in the configured coarse quantization (
base_quantization_type). Optionally, a second high-precision copy is kept (use_reorder: true) for post-filter reordering. - Search. For each query, the
scan_buckets_countnearest centroids are computed first, then the vectors in those buckets are scored. When reordering is enabled,factorcontrols how many extra candidates are fetched from the coarse stage before being re-scored with the precise quantizer.
A second partition strategy, GNO-IMI (partition_strategy_type: "gno_imi"),
splits the space into two orthogonal sets of centroids
(first_order_buckets_count × second_order_buckets_count) for even finer
partitioning on very large corpora.
Quick start
#include <vsag/vsag.h>
std::string params = R"({
"dtype": "float32",
"metric_type": "l2",
"dim": 128,
"index_param": {
"buckets_count": 256,
"base_quantization_type": "sq8",
"partition_strategy_type": "ivf",
"ivf_train_type": "kmeans"
}
})";
auto index = vsag::Factory::CreateIndex("ivf", params).value();
// Build.
auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Build(base);
// Search.
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(
query, /*topk=*/10,
R"({"ivf": {"scan_buckets_count": 16}})").value();
Build parameters
Build-time parameters live under index_param. See
Index Parameters and docs/ivf.md in the
repository for the exhaustive list.
| Parameter | Type | Default | Description |
|---|---|---|---|
partition_strategy_type | string | "ivf" | ivf (single-level) or gno_imi (two-level orthogonal) |
buckets_count | int | 10 | Number of inverted lists (effective for ivf) |
first_order_buckets_count | int | 10 | First-level count (effective for gno_imi) |
second_order_buckets_count | int | 10 | Second-level count (effective for gno_imi) |
ivf_train_type | string | "kmeans" | Centroid training: kmeans or random |
base_quantization_type | string | "fp32" | fp32, fp16, bf16, sq8, sq4, sq8_uniform, sq4_uniform, pq, pqfs, rabitq — see the Quantization chapter for per-quantizer details |
base_pq_dim | int | 1 | PQ subspaces (required with pq / pqfs) |
use_reorder | bool | false | Keep a high-precision copy and re-rank after the coarse scan |
precise_quantization_type | string | "fp32" | Quantizer used for reordering (with use_reorder: true) |
base_io_type | string | "memory_io" | Storage backend for coarse codes |
precise_io_type | string | "block_memory_io" | Storage backend for precise codes (memory_io, block_memory_io, mmap_io, buffer_io, async_io, reader_io) |
precise_file_path | string | "" | File path when the precise IO type is disk-backed |
A rule of thumb for buckets_count is sqrt(N) to 4 * sqrt(N) where N is the
corpus size.
Search parameters
Search-time parameters live under the ivf sub-object:
| Parameter | Type | Default | Description |
|---|---|---|---|
scan_buckets_count | int | — (required) | Number of buckets probed per query. Must be ≤ buckets_count. |
factor | float | 2.0 | With reordering enabled, pulls factor * topk coarse candidates before the precise rescore. |
enable_reorder | bool | true | Set to false to skip the final reorder stage for this request even when the index was built with reorder enabled. |
parallelism | int | 1 | Threads used to scan buckets in parallel for a single query. |
timeout_ms | double | +∞ | Hard cap in milliseconds; partial results are returned once exceeded. |
auto result = index->KnnSearch(
query, topk,
R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "parallelism": 4}})").value();
auto fast_result = index->KnnSearch(
query, topk,
R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "enable_reorder": false}})").value();
When to use IVF
- Large corpora (hundreds of millions of vectors and above), especially when the working set does not fit comfortably in RAM.
- Batch or high-throughput workloads where per-query latency is less critical than queries-per-second.
- Memory-tight deployments that benefit from aggressive quantization (
sq8,sq4_uniform,pq,pqfs) combined withuse_reorderto recover recall. - Shard-friendly setups: buckets map naturally onto shards or disk blocks.
For latency-sensitive, high-recall workloads on dense embeddings, compare against HGraph first.