VSAG Documentation

VSAG is a high-performance, production-grade vector indexing library for similarity search. It powers vector retrieval in OceanBase and other projects at Ant Group, and is released under the Apache 2.0 license.

Features

Multiple index types: hgraph, ivf, pyramid, sindi, brute_force, covering in-memory, sparse and multi-tenant scenarios.
Rich quantization: fp32 / fp16 / bf16 / int8 / sq8 / sq4 / pq, with SIMD dispatch on x86_64 and AArch64.
Advanced capabilities: range search, filtered search, serialization, conjugate graph enhancement, online Tune-based optimization, custom allocator / thread pool.
Language bindings: native C++, Python via pyvsag, Node.js / TypeScript via the npm package vsag.

How to Read This Documentation

User Guide — start here if you are new to VSAG: install, create an index, and run search.
Indexes — compare supported index types and look up their parameters.
Advanced Features — deep dives into specific search, serialization, memory, and hybrid-index capabilities.
API Reference — the C++ class, method, and type reference for the public headers in include/vsag/.
Performance and Tuning — best practices, Tune, benchmarks, and evaluation tooling.
Developer Guide — building from source, running tests, and contributing.
Resources — release notes, roadmap, community links, related projects, papers, and contributors.

The Chinese version of the same documentation is available under docs/docs/zh/.

Project Links

Source: https://github.com/antgroup/vsag
Issues: https://github.com/antgroup/vsag/issues
Releases: https://github.com/antgroup/vsag/releases

Installation

VSAG can be installed as a C++ library, a Python package (pyvsag), or a Node.js/TypeScript package (vsag).

Using Docker (Recommended for Development)

The official development image includes the full toolchain (GCC 9.4+, CMake 3.18+, clang-format/clang-tidy 15, HDF5, etc.):

docker pull vsaglib/vsag:ubuntu
docker run -it --rm -v "$(pwd)":/work -w /work vsaglib/vsag:ubuntu bash

Building from Source

Requirements

Operating System: Ubuntu 20.04+ or CentOS 7+
Compiler: GCC 9.4.0+ or Clang 13.0.0+
CMake: 3.18.0+
clang-format / clang-tidy: exactly version 15 (enforced by make fmt / make lint)

Build

git clone https://github.com/antgroup/vsag.git
cd vsag
make release

Other common Makefile targets:

make debug — plain debug build (no sanitizers; tests/tools/examples disabled by default).
make dev — developer configuration: debug + tests + tools + examples.
make test — build with tests enabled and run the unit + functional suites.
make cov — build with coverage instrumentation; run tests afterwards to generate the report.
make asan / make tsan — sanitizer-enabled builds.
make pyvsag PY_VERSION=3.10 — build the Python wheel.
make dist-pre-cxx11-abi / dist-cxx11-abi / dist-libcxx — build redistributable tarballs.

See Building for details.

Python (pyvsag)

pip install pyvsag

Node.js / TypeScript

npm install vsag

The bindings source lives under typescript/ and the npm package name is vsag.

Optional Features

Enable or disable at CMake configure time with these cache options:

ENABLE_INTEL_MKL=ON — Intel MKL acceleration.
ENABLE_LIBAIO=ON — Linux AIO for DiskANN async IO.
ENABLE_TOOLS=ON — build tools under tools/ (including eval_performance).
ENABLE_EXAMPLES=ON — build sample programs under examples/cpp/.

If you build through the project Makefile, the corresponding environment variables are VSAG_ENABLE_INTEL_MKL=ON, VSAG_ENABLE_LIBAIO=ON, VSAG_ENABLE_TOOLS=ON, and VSAG_ENABLE_EXAMPLES=ON.

Creating an Index

All VSAG indexes are built through vsag::Factory::CreateIndex(name, build_params_json). The name selects the implementation; build_params_json configures dimension, metric, and index-specific options.

Supported Index Types

Name	Description	Page	Example
`hgraph`	Improved graph index with richer quantization options	HGraph	`examples/cpp/103_index_hgraph.cpp`
`ivf`	Inverted file with quantization	IVF	`examples/cpp/106_index_ivf.cpp`
`sindi`	Sparse-vector index (e.g. BM25, SPLADE)	SINDI	`examples/cpp/109_index_sindi.cpp`
`pyramid`	Multi-tenant / tag-partitioned graph index	Pyramid	`examples/cpp/107_index_pyramid.cpp`
`brute_force`	Exact exhaustive search; useful as baseline	—	`examples/cpp/105_index_brute_force.cpp`

Common Top-Level Fields

Field	Values	Notes
`dim`	positive integer	Fixed after build
`dtype`	`float32` / `float16` / `bfloat16` / `int8` / `sparse`	Examples on this page use `float32`; `sparse` is SINDI only
`metric_type`	`l2` / `ip` / `cosine`	Must match at query time

Examples

HGraph

HGraph uses index_param as the build-time sub-object (hgraph is reserved for search-time parameters like ef_search). See examples/cpp/103_index_hgraph.cpp.

std::string params = R"(
{
    "dim": 128,
    "dtype": "float32",
    "metric_type": "l2",
    "index_param": {
        "base_quantization_type": "fp32",
        "max_degree": 32,
        "ef_construction": 400
    }
}
)";
auto index = vsag::Factory::CreateIndex("hgraph", params).value();

HGraph with SQ8 quantization

Switch base_quantization_type to sq8 to store base vectors as 8-bit scalar-quantized codes (roughly a 4× reduction versus fp32), usually at a small cost in recall; other quantization types (fp16, bf16, pq, …) are selected the same way.

std::string params = R"(
{
    "dim": 768,
    "dtype": "float32",
    "metric_type": "ip",
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 32,
        "ef_construction": 400
    }
}
)";
auto index = vsag::Factory::CreateIndex("hgraph", params).value();
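The 4× figure follows directly from the bits stored per dimension. The sketch below works through that arithmetic; BaseCodeBytes is a hypothetical helper written for this page, not a VSAG function, and it counts only the vector codes (graph links, quantizer bounds, and other metadata are ignored, so real memory use will be higher).

```cpp
#include <cassert>
#include <cstdint>

// Approximate bytes needed for the base vector codes alone.
// Hypothetical helper: ignores graph links and quantizer metadata.
inline uint64_t BaseCodeBytes(uint64_t num_vectors, uint64_t dim, uint64_t bits_per_dim) {
    assert(bits_per_dim > 0 && bits_per_dim <= 32);
    // Round each vector up to a whole number of bytes.
    const uint64_t bytes_per_vector = (dim * bits_per_dim + 7) / 8;
    return num_vectors * bytes_per_vector;
}
```

For one million 768-dimensional vectors this gives 3,072,000,000 bytes at 32 bits per dimension (fp32) and 768,000,000 bytes at 8 bits per dimension (sq8).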

See Index Parameters for the full reference.

k-Nearest Neighbor Search

This page assumes VSAG is already installed. Examples are available in C++, Python, and TypeScript under the examples/ directory. This page uses the C++ BruteForce index for illustration; the full source is at examples/cpp/105_index_brute_force.cpp.

In most cases, your program should call vsag::init() once at startup to perform one-time initialization (global logger, allocator, etc.). The snippets below omit boilerplate to focus on the essential steps.

Prepare Vectors

VSAG operates on collections of fixed-dimensional vectors (typically a few hundred to a few thousand dimensions). Vectors are laid out row-major, equivalent to vector[num_vectors][dim] in C++. The API only requires a pointer (const float*) to the first element, so you can use a raw array, std::vector<float>, or a custom buffer.

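The examples on this page use 32-bit float vectors. Other input types (int8, float16, bfloat16) are selected with the dtype option and passed through the matching Dataset setter; see Supported input data types in the HGraph section.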

A k-NN search needs two datasets:

base: all vectors in the database; size = num_vectors * dim.
query: the query vector(s) for which to find nearest neighbors; size = num_queries * dim. Currently the public KnnSearch API processes one query at a time.

int64_t num_vectors = 10000;
int64_t dim = 128;
int64_t* ids = new int64_t[num_vectors];
float* datas = new float[num_vectors * dim];
std::mt19937 rng(47);
std::uniform_real_distribution<float> distrib;
for (int64_t i = 0; i < num_vectors; ++i) ids[i] = i;
for (int64_t i = 0; i < dim * num_vectors; ++i) datas[i] = distrib(rng);

float* query_vector = new float[dim];
for (int64_t i = 0; i < dim; ++i) query_vector[i] = distrib(rng);

Create an Index and Insert Vectors

The Index interface is the central abstraction. Multiple implementations exist; brute_force is the simplest (exhaustive comparison, used as a baseline).

All indexes must be created explicitly, specifying dimension and metric:

std::string build_params = R"(
{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128
}
)";
auto index = vsag::Factory::CreateIndex("brute_force", build_params).value();

Build performs any required training; Add appends vectors. BruteForce supports both:

auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)
    ->Dim(dim)
    ->Ids(ids)
    ->Float32Vectors(datas)
    ->Owner(false);
index->Add(base);

Search

KnnSearch takes the query, k, and a JSON search-params string. BruteForce has no tunable search params, so an empty object is passed.

auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(dim)->Float32Vectors(query_vector)->Owner(false);

int64_t topk = 10;
auto result = index->KnnSearch(query, topk, R"({})").value();

// In a search result, GetDim() is the number of neighbors returned for the query.
for (int64_t i = 0; i < result->GetDim(); ++i) {
    std::cout << result->GetIds()[i] << ": " << result->GetDistances()[i] << std::endl;
}

The result contains up to k neighbors sorted by ascending distance to the query.
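To make "exhaustive comparison" concrete, here is a minimal, self-contained sketch of a brute-force k-NN over a row-major buffer. It is an illustration only and not VSAG's implementation: SquaredL2 and ExhaustiveKnn are hypothetical names, and the sketch ranks by squared L2 distance, which yields the same ordering as L2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Squared L2 distance between two dim-dimensional vectors.
inline float SquaredL2(const float* a, const float* b, int64_t dim) {
    float sum = 0.0f;
    for (int64_t d = 0; d < dim; ++d) {
        const float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Exhaustive k-NN over a row-major base buffer (hypothetical helper).
// Returns up to k (distance, id) pairs sorted by ascending distance.
inline std::vector<std::pair<float, int64_t>> ExhaustiveKnn(
    const float* base, const int64_t* ids, int64_t num_vectors, int64_t dim,
    const float* query, int64_t k) {
    assert(num_vectors >= 0 && dim > 0 && k >= 0);
    std::vector<std::pair<float, int64_t>> scored;
    scored.reserve(static_cast<std::size_t>(num_vectors));
    for (int64_t i = 0; i < num_vectors; ++i) {
        // Vector i starts at offset i * dim in the row-major buffer.
        scored.emplace_back(SquaredL2(base + i * dim, query, dim), ids[i]);
    }
    const int64_t keep = std::min<int64_t>(k, num_vectors);
    // Only the first `keep` entries need to be in sorted order.
    std::partial_sort(scored.begin(), scored.begin() + keep, scored.end());
    scored.resize(static_cast<std::size_t>(keep));
    return scored;
}
```

Every base vector is scored for each query, so the cost grows linearly with num_vectors * dim. That is why brute_force serves as an exact baseline rather than a production index for large collections.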

pyvsag

pyvsag is the official Python binding for VSAG, implemented with pybind11. Sources live under python_bindings/ and python/.

Installation

pip install pyvsag

To build from source:

make pyvsag PY_VERSION=3.10
# Build wheels for multiple Python versions:
make pyvsag-all

Quick Start

pyvsag.Index(name, parameters) accepts the index name and a JSON-encoded parameter string, matching the C++ vsag::Factory::CreateIndex signature:

import json
import numpy as np
import pyvsag

dim = 128
num_elements = 10_000

data = np.random.random((num_elements, dim)).astype(np.float32)
ids = np.arange(num_elements, dtype=np.int64)

index_params = json.dumps({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": dim,
    "index_param": {
        "base_quantization_type": "fp32",
        "max_degree": 32,
        "ef_construction": 300,
    },
})

index = pyvsag.Index("hgraph", index_params)
index.build(vectors=data, ids=ids, num_elements=num_elements, dim=dim)

query = np.random.random(dim).astype(np.float32)
search_params = json.dumps({"hgraph": {"ef_search": 60}})
result_ids, result_dists = index.knn_search(
    vector=query, k=10, parameters=search_params,
)
print(result_ids, result_dists)

Saving & Loading

index.save("index.bin")

new_index = pyvsag.Index("hgraph", index_params)
new_index.load("index.bin")

Relationship with the C++ Library

pyvsag wraps the same vsag::Index API as C++ and shares the underlying index binaries. You can build an index in Python and load it in C++ (and vice versa) as long as parameters match.

More Examples

See examples/python/ in the repository.

Indexes

VSAG ships a family of index implementations that share a single builder-style API, one serialization format, and one set of operations (Build, Add, KnnSearch, RangeSearch, Remove, Serialize / Deserialize, …). They differ in the data structures and trade-offs they use under the hood.

The pages in this section cover the actively developed indexes:

Index	Page	Best for
`hgraph`	HGraph	General-purpose, high-recall graph with rich quantization options
`lazy_hgraph`	LazyHGraph	Small-to-growing FP32 collections that start exact and later convert to HGraph
`ivf`	IVF	Partition-based search, high-throughput batch queries, large corpora
`sindi`	SINDI	Sparse vectors (BM25 / learned sparse) on inner-product
`simq`	SIMQ	ColBERT-style multi-vector retrieval (MaxSim)
`pyramid`	Pyramid	Multi-tenant or tag-partitioned corpora with hierarchical paths

brute_force is also available as an exact-search baseline (see Creating an Index and examples/cpp/105_index_brute_force.cpp).

Parameter conventions

All indexes share the same top-level build fields:

Field	Values	Notes
`dim`	positive integer	Vector dimensionality; fixed after build
`dtype`	`float32` / `float16` / `bfloat16` / `int8` / `sparse`	`sparse` is SINDI only
`metric_type`	`l2` / `ip` / `cosine`	Must match at query time (SINDI is `ip` only)

Index-specific build parameters live under the index_param sub-object; search-time parameters live under a sub-object named after the index (e.g. hgraph, ivf, sindi, pyramid). LazyHGraph also uses the hgraph search object after it converts to graph phase. Concrete schemas are documented on each page and enumerated in Index Parameters.

Index Parameters

This page summarises the commonly used parameters for every VSAG index type. For the full enumeration, consult the source:

Build parameter keys: src/constants.cpp
Public constants: include/vsag/constants.h
Per-index examples: the examples/cpp/*_index_*.cpp files (e.g. 103_index_hgraph.cpp).

Common Fields

Every index requires these top-level fields at build time:

Field	Values	Description
`dim`	positive integer	Vector dimensionality; cannot change after build
`dtype`	`float32` / `float16` / `bfloat16` / `int8` / `sparse`	Input vector data type (how `Dataset` interprets the raw bytes); `sparse` is SINDI only
`metric_type`	`l2` / `ip` / `cosine`	Distance metric

HGraph

HGraph places its build parameters under the generic index_param key (see examples/cpp/103_index_hgraph.cpp); the hgraph key is reserved for search-time parameters.

{
    "dim": 128,
    "dtype": "float32",
    "metric_type": "l2",
    "index_param": {
        "base_quantization_type": "fp32",
        "max_degree": 32,
        "ef_construction": 400
    }
}

Field	Typical	Description
`max_degree`	16–48	Maximum out-degree per node
`ef_construction`	200–500	Candidate set size during build; larger = higher recall, slower build
`base_quantization_type`	`fp32` / `fp16` / `bf16` / `sq8` / `sq4` / `pq`	Quantization of the base storage — see the Quantization chapter for all supported values

At search time:

{"hgraph": {"ef_search": 100}}

The hgraph search-param object also accepts brute_force_threshold (a float in [0.0, 1.0], default 0.0). When set above zero and the request carries a filter whose ValidRatio() is at most this threshold, HGraph skips the graph traversal and runs an exact scan over the surviving ids. See the HGraph index page for details.

LazyHGraph

LazyHGraph can take its build parameters in a top-level lazy_hgraph object (preferred for clarity) or in the generic index_param object. The hgraph sub-object is forwarded to the internal HGraph used after transition.

{
    "dim": 128,
    "dtype": "float32",
    "metric_type": "l2",
    "lazy_hgraph": {
        "transition_threshold": 1000,
        "hgraph": {
            "base_quantization_type": "sq8",
            "max_degree": 26,
            "ef_construction": 100
        }
    }
}

Field	Typical	Description
`transition_threshold`	`1000` or workload-specific	Positive vector count at which the index converts from exact flat search to HGraph
`hgraph`	HGraph build object	Parameters for the graph phase; see HGraph

LazyHGraph only supports dtype: "float32". Search parameters use the hgraph object, for example {"hgraph": {"ef_search": 100}}. See the LazyHGraph index page for details.

The hgraph search-param object also accepts the following filter-related parameters:

Parameter	Type	Default	Description
`skip_ratio`	float	`0.2`	Controls the ratio of filtered-search candidate checks to skip, in range `[0.0, 1.0]`. Higher values mean more aggressive skipping, faster search, and potentially lower recall.
`skip_strategy`	string	`"deterministic_accumulative"`	Skip strategy. Supports `"random"` and `"deterministic_accumulative"`.

IVF

{
    "ivf": {
        "nlist": 4096,
        "base_quantization_type": "sq8",
        "nprobe": 32
    }
}

Brute Force

{"brute_force": {}}

No extra parameters.

Pyramid

Pyramid supports organising multiple subgraphs by tag:

{
    "pyramid": {
        "tag_dim": 1,
        "max_degree": 24,
        "ef_construction": 300
    }
}

SINDI (sparse vectors)

{
    "sindi": {
        "top_k": 32,
        "doc_prune_ratio": 0.1
    }
}

Runtime Parameters

Beyond build-time parameters, Index::Tune and SearchParam tweak runtime settings such as ef_search and nprobe. See Optimizer and the examples/cpp/3xx_feature_*.cpp examples.

HGraph

HGraph is VSAG’s flagship graph-based index. It builds a hierarchical proximity graph and offers a rich set of quantization options, a unified build-parameter schema (index_param), and first-class support for reordering, incremental updates, deletion, and ELP-based runtime tuning.

For most dense-vector workloads (text / image / multimodal embeddings, 64–4096 dims, from a few thousand up to hundreds of millions of points), HGraph is the recommended default.

Source: src/algorithm/hgraph.{h,cpp}
Example: examples/cpp/103_index_hgraph.cpp

How it works

Graph construction. Vectors are organised in a layered proximity graph; upper layers act as navigation aids, the bottom layer connects every data point to its nearest neighbours within a max_degree budget. The construction algorithm can be either NSW-style insertion (graph_type: "nsw", the default) or ODescent (graph_type: "odescent").
Quantization. The base storage is compressed with a configurable quantizer (base_quantization_type — fp32, fp16, bf16, sq8, sq4, sq8_uniform, sq4_uniform, pq, pqfs, rabitq, tq). Optionally, a second high-precision copy is kept (use_reorder: true with precise_quantization_type) and used to re-rank the candidates returned by the coarse search.
Search. Greedy beam search traverses the graph top-down, expanding the current frontier up to ef_search candidates. When reordering is enabled, the final list is re-scored against the precise representation.
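The search step can be illustrated with a simplified best-first search over a single graph layer. This is a sketch under stated assumptions, not HGraph's code: BeamSearch and Neighbor are hypothetical names, the upper navigation layers, quantization, and reordering are left out, and the caller supplies the distance function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Neighbor = std::pair<float, uint32_t>;  // (distance to query, node id)

// Best-first search over one graph layer, keeping at most ef results.
// graph[u] lists the out-neighbors of node u; dist(u) is the distance
// from node u to the query. Hypothetical sketch, single layer only.
inline std::vector<Neighbor> BeamSearch(
    const std::vector<std::vector<uint32_t>>& graph,
    const std::function<float(uint32_t)>& dist, uint32_t entry, std::size_t ef) {
    assert(ef > 0 && entry < graph.size());
    std::vector<bool> visited(graph.size(), false);
    // Min-heap of nodes still to expand; max-heap of the best ef seen so far.
    std::priority_queue<Neighbor, std::vector<Neighbor>, std::greater<Neighbor>> frontier;
    std::priority_queue<Neighbor> best;

    const float entry_dist = dist(entry);
    visited[entry] = true;
    frontier.emplace(entry_dist, entry);
    best.emplace(entry_dist, entry);

    while (!frontier.empty()) {
        const Neighbor current = frontier.top();
        frontier.pop();
        // Stop once the closest unexpanded node is worse than the worst kept result.
        if (best.size() >= ef && current.first > best.top().first) break;
        for (uint32_t next : graph[current.second]) {
            if (visited[next]) continue;
            visited[next] = true;
            const float d = dist(next);
            if (best.size() < ef || d < best.top().first) {
                frontier.emplace(d, next);
                best.emplace(d, next);
                if (best.size() > ef) best.pop();
            }
        }
    }
    // Drain the max-heap back to front so the result is in ascending order.
    std::vector<Neighbor> result(best.size());
    for (std::size_t i = result.size(); i-- > 0;) {
        result[i] = best.top();
        best.pop();
    }
    return result;
}
```

A larger ef keeps more candidates alive, which is why raising ef_search trades query time for recall.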

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 32,
        "ef_construction": 400
    }
})";
auto index = vsag::Factory::CreateIndex("hgraph", params).value();

// Build.
auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Build(base);

// Search.
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10, R"({"hgraph": {"ef_search": 100}})").value();

Build parameters

Build-time parameters live under index_param. The table below highlights the keys most users need; the exhaustive list is in Index Parameters.

Parameter	Type	Default	Description
`base_quantization_type`	string	— (required)	`fp32`, `fp16`, `bf16`, `sq8`, `sq4`, `sq8_uniform`, `sq4_uniform`, `pq`, `pqfs`, `rabitq`, `tq` — see the Quantization chapter for per-quantizer details
`max_degree`	int	`64`	Maximum out-degree per graph node
`ef_construction`	int	`400`	Candidate list size during build (higher = better recall, slower build)
`graph_type`	string	`"nsw"`	Graph algorithm: `nsw` or `odescent`
`use_reorder`	bool	`false`	Keep a high-precision copy and re-rank after the coarse search
`precise_quantization_type`	string	`"fp32"`	Quantizer used for reordering (takes effect only with `use_reorder: true`)
`base_pq_dim`	int	`1`	Number of PQ subspaces. When using `pq` / `pqfs`, set this explicitly instead of relying on the default.
`build_thread_count`	int	`100`	Threads used to parallelise build
`support_duplicate`	bool	`false`	Enable duplicate-ID detection on insert
`deduplicate_storage`	bool	`false`	Share vector storage between duplicates; requires `support_duplicate: true`
`duplicate_distance_threshold`	float	`0.0`	Duplicate-detection distance threshold. When greater than `0`, deduplicate based on the distance to the nearest candidate; when `0`, fall back to the existing byte-wise `memcmp` check
`support_remove`	bool	`false`	Enable graph delete-tracking metadata used by mark-remove recovery paths
`support_force_remove`	bool	`false`	Enable `RemoveMode::FORCE_REMOVE` and its extra synchronization on the built index
`store_raw_vector`	bool	`false`	Keep the raw vector in addition to the quantized copy (useful for `cosine`)
`use_elp_optimizer`	bool	`false`	Auto-tune search parameters after build
`base_io_type` / `precise_io_type`	string	`"block_memory_io"`	Storage backend (`memory_io`, `block_memory_io`, `buffer_io`, `async_io`, `mmap_io`)
`base_file_path` / `precise_file_path`	string	—	File path; required when the corresponding `*_io_type` is disk-backed (`buffer_io`, `async_io`, `mmap_io`)
`hgraph_init_capacity`	int	`100`	Initial capacity hint (doesn’t cap the final size)

Deduplicating vector storage

Set both support_duplicate: true and deduplicate_storage: true to let duplicate vectors share one physical code slot while retaining their individual labels. This option currently supports only dense-vector HGraph indexes using graph_type: "nsw"; it is not available for the separate HNSW index or for graph_type: "odescent".

The following operations and configurations are not supported while storage deduplication is enabled:

force removal (support_force_remove: true);
cache-assisted build after ImportCache();
Merge;
legacy v0.14 serialization.

UpdateVector is supported only for IDs whose vector storage is not shared with another duplicate-group member.

Both the current serialization format and streaming serialization are supported.

Supported input data types

The dtype field in the top-level build config selects how Dataset interprets the raw vector bytes. HGraph supports four input types; the dtype value, the corresponding Dataset setter, and the example demonstrating each combination are summarised below.

`dtype`	Element type	`Dataset` setter	Example
`float32`	`float`	`Float32Vectors`	`103_index_hgraph.cpp`
`int8`	`int8_t`	`Int8Vectors`	`316_index_int8_hgraph.cpp`
`float16`	`uint16_t` (IEEE 754 binary16, bit-pattern packed)	`Float16Vectors`	`321_index_fp16_hgraph.cpp`
`bfloat16`	`uint16_t` (Brain Float, bit-pattern packed)	`Float16Vectors` (shared with FP16)	adapt `321_index_fp16_hgraph.cpp` per the notes below

The dim value is the logical vector dimensionality (number of elements), not the byte length, so the same dim is reused across all four data types.

`int8` input

Quantized int8 vectors are passed directly via Int8Vectors:

std::vector<int8_t> data(num_vectors * dim);  // populate with int8 elements
auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)->Dim(dim)->Ids(ids)
    ->Int8Vectors(data.data())->Owner(false);

Build config (note dtype: "int8"):

{
    "dtype": "int8",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "pq",
        "max_degree": 26,
        "ef_construction": 100,
        "alpha": 1.2
    }
}

Queries use the same Int8Vectors setter and the same dtype. A runnable example is 316_index_int8_hgraph.cpp.

`float16` / `bfloat16` input

FP16 and BF16 vectors are both passed through Float16Vectors, which takes a const uint16_t* that points at the 16-bit storage of each element. Conversion from float is up to the caller; inside the VSAG source tree there are convenience helpers (vsag::generic::FloatToFP16 in src/simd/fp16_simd.h and vsag::generic::FloatToBF16 in src/simd/bf16_simd.h), but these are internal headers that are not installed under include/vsag/. Application code linking against an installed VSAG library should provide its own conversion (for example, copy the small helper, use _cvtss_sh / F16C intrinsics, or any FP16 library of choice). The snippet below uses the in-tree helper for brevity:

// The fp16/bf16 helpers below live in src/simd/ and are not part of the public
// installed headers. Replace with your own float -> uint16_t conversion when
// linking against an installed VSAG.
#include "simd/fp16_simd.h"  // FloatToFP16 (for BF16, use simd/bf16_simd.h / FloatToBF16)

std::vector<uint16_t> data(num_vectors * dim);
for (size_t i = 0; i < data.size(); ++i) {
    data[i] = vsag::generic::FloatToFP16(some_float_source());
}
auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)->Dim(dim)->Ids(ids)
    ->Float16Vectors(data.data())->Owner(false);

Build config:

{
    "dtype": "float16",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "pq",
        "max_degree": 26,
        "ef_construction": 100,
        "alpha": 1.2
    }
}

To switch the example to BF16, change dtype to "bfloat16" and replace FloatToFP16 with FloatToBF16; the Float16Vectors setter and the rest of the build/search flow stay the same. A runnable FP16 example is 321_index_fp16_hgraph.cpp.

Note. The header comment at the top of 321_index_fp16_hgraph.cpp currently mentions a BFloat16Vectors() setter, but no such setter exists — Float16Vectors is the single entry point for both FP16 and BF16. Use it for both dtype: "float16" and dtype: "bfloat16".
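For application code that cannot include the in-tree helpers, a float to BF16 conversion is short enough to write by hand. The sketch below is one common approach (keep the upper 16 bits of the binary32 encoding, rounding to nearest even); FloatToBF16Bits, BF16BitsToFloat, and ConvertToBF16 are hypothetical names, and the rounding may differ from VSAG's internal FloatToBF16. FP16 needs a different conversion and is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Converts a float to its bfloat16 bit pattern (upper 16 bits of the
// IEEE 754 binary32 encoding), rounding to nearest even.
inline uint16_t FloatToBF16Bits(float value) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");
    uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined type punning
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: keep the sign and force a quiet-NaN payload so that rounding
        // cannot turn it into infinity or zero.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Expands a bfloat16 bit pattern back to float (exact).
inline float BF16BitsToFloat(uint16_t half) {
    const uint32_t bits = static_cast<uint32_t>(half) << 16;
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Converts `count` floats into a caller-provided uint16_t buffer.
inline void ConvertToBF16(const float* src, std::size_t count, uint16_t* dst) {
    assert(count == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = FloatToBF16Bits(src[i]);
    }
}
```

The resulting uint16_t buffer is what Float16Vectors expects when dtype is "bfloat16".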

Choosing an input type

Pick float32 when accuracy matters most and memory budget allows; this is the default.
Pick float16 / bfloat16 to halve the input storage. FP16 has a smaller exponent range; BF16 has fewer mantissa bits but the same exponent range as FP32, which is often preferable for embedding-style vectors.
Pick int8 when your data is already integer-quantised (e.g. produced by an upstream quantiser or by a model with int8 outputs). With int8 input you typically still combine a coarse quantizer such as pq / sq8 for the in-index storage.

The chosen dtype only constrains the input representation. The on-disk / in-memory storage is still controlled by base_quantization_type (and optionally precise_quantization_type when use_reorder: true), so e.g. dtype: "float16" + base_quantization_type: "sq8" is valid.

Search parameters

Search-time parameters live under the hgraph sub-object:

Parameter	Type	Default	Description
`ef_search`	int	— (required)	Size of the search frontier. Larger = higher recall, slower query.
`hops_limit`	int	unlimited	Hard cap on the number of hops the beam search performs before returning the current frontier.
`skip_ratio`	float	`0.2`	Performance tuning parameter for filtered search. Controls the ratio of invalid points to skip, in range `[0.0, 1.0]`. `skip_ratio=0.2` means skip 20% of invalid points and only check 80%. Higher values improve performance but may reduce recall. Only applies to searches with filters. See Filter Skip Strategy below.
`skip_strategy`	string	`"deterministic_accumulative"`	Strategy for filter skipping. Options: `"random"` (random skipping) or `"deterministic_accumulative"` (deterministic cumulative skipping). See Filter Skip Strategy below.
`brute_force_threshold`	float	`0.0`	Selectivity-aware brute-force fallback. When `> 0` and the supplied filter’s `ValidRatio()` is `≤ brute_force_threshold`, the search bypasses the graph traversal entirely and runs an exact scan over the valid ids using the best available flatten codes (see the section below). Must lie in `[0.0, 1.0]`; the default `0.0` disables the feature and preserves legacy behavior.
`rabitq_one_bit_search`	bool	`false`	Enables the RaBitQ filter/lower-bound path. On an x+y split index it uses all x filter bits; see RaBitQ x+y Split.
`rabitq_error_rate`	float	index default	Positive lower-bound error multiplier for this search. It can be tuned without rebuilding the split index.

auto result = index->KnnSearch(
    query, topk, R"({"hgraph": {"ef_search": 200}})").value();

Brute-force fallback under highly selective filters (`brute_force_threshold`)

Graph traversal is the right strategy when most candidates pass the filter — the graph quickly reaches the neighborhood of the query. As filter selectivity increases (only a tiny fraction of vectors survive), the beam has to expand far more nodes just to fill ef_search with valid candidates, and recall drops. At some point an exhaustive scan over the surviving ids is both faster and exact.

brute_force_threshold lets HGraph make that switch automatically on a per-query basis:

// When the active filter keeps ≤ 1% of ids, run an exact scan instead.
auto params = R"({"hgraph": {"ef_search": 200, "brute_force_threshold": 0.01}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

How it works (src/algorithm/hgraph/hgraph_search.cpp):

The fallback only fires when all of the following hold:
- brute_force_threshold > 0.0, and
- a filter is supplied, and
- filter->ValidRatio() <= brute_force_threshold.
The accuracy of Filter::ValidRatio() matters — it is the user-supplied hint the dispatcher checks against the threshold. See Filtered Search for the API contract.
The scan iterates every valid inner id and computes distances in batches of 64 using the most precise flatten storage available (raw vectors if store_raw_vector was set, otherwise the high-precision reorder codes when use_reorder=true, otherwise the base quantized codes).
Because the scan already uses precise codes when present, the post-search reorder pass is skipped for queries that took the brute-force branch.
Applies to the non-iterator KnnSearch overload (the path used by SearchWithRequest and by the standard KnnSearch(query, k, params, filter) call) and to RangeSearch. It does not apply to the iterator-style KnnSearch(..., IteratorContext*&, ...), because a single sweep cannot be paged across multiple iterator calls.
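The three trigger conditions listed above reduce to a single predicate. The sketch below restates them in code; UseBruteForceFallback is a hypothetical function, with has_filter and valid_ratio standing in for the filter pointer and the value returned by Filter::ValidRatio().

```cpp
#include <cassert>

// Returns true when a query should take the exact-scan branch.
// Hypothetical restatement of the documented conditions, not VSAG code.
inline bool UseBruteForceFallback(float brute_force_threshold, bool has_filter,
                                  float valid_ratio) {
    assert(brute_force_threshold >= 0.0f && brute_force_threshold <= 1.0f);
    return brute_force_threshold > 0.0f && has_filter &&
           valid_ratio <= brute_force_threshold;
}
```

With the default threshold of 0.0 the first condition is never met, so the feature stays off regardless of the filter.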

Picking a value:

Leave at 0.0 (default) for unfiltered or weakly filtered workloads.
For highly selective filters, 0.01–0.05 is a reasonable starting point. Setting it higher than that effectively turns the index into a brute-force scanner whenever a filter is present.
The cost of the brute-force scan is roughly O(N × dim), where N is the total number of indexed vectors (regardless of selectivity, because every id is visited to call CheckValid). The benefit grows when graph search would otherwise need a much larger ef_search to recover recall.

See 322_feature_hgraph_brute_force_threshold.cpp for a runnable brute-force fallback example.

Filter Skip Strategy (skip_ratio and skip_strategy)

When searching with a filter, HGraph needs to frequently call Filter::CheckValid() during graph traversal to verify whether each candidate point is valid. This check can be expensive (especially for complex filter logic). skip_ratio and skip_strategy provide a probabilistic optimization: they skip some filter checks to speed up the search, but may reduce recall.

How It Works

This is a probabilistic optimization strategy: we don’t know in advance which points are valid, so we decide probabilistically whether to visit each point.

skip_ratio (default 0.2): Controls the aggressiveness of skipping filter checks. skip_ratio=0.2 means skip 20% of candidate checks and only check 80%. Higher values skip more, making search faster but potentially reducing recall.
skip_strategy (default "deterministic_accumulative"): Determines how skipping is distributed:
- "random": Random skipping. Each point is visited independently with probability visit_ratio = valid_ratio + (1 - valid_ratio) * (1 - skip_ratio), so roughly a skip_ratio fraction of invalid points are skipped.
- "deterministic_accumulative": Deterministic cumulative skipping. Emits visit decisions at fixed intervals so that the long-run visit ratio matches the target visit_ratio, with lower variance than the random strategy.

The specific formula:

Let valid_ratio be the filter’s global validity rate (from Filter::ValidRatio())
Probability of visiting each point = valid_ratio + (1 - valid_ratio) * (1 - skip_ratio)
In expectation, this targets skipping about skip_ratio of invalid candidate checks when Filter::ValidRatio() is accurate
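As a worked example, valid_ratio = 0.1 and skip_ratio = 0.2 give a visit probability of 0.1 + 0.9 × 0.8 = 0.82. The sketch below computes that value and shows one way a deterministic accumulative schedule could be realized. VisitRatio and AccumulativeSkipper are hypothetical names, and the credit-counter scheme is an assumption made for illustration; HGraph's actual bookkeeping may differ.

```cpp
#include <cassert>

// Target probability of running the filter check on a candidate.
inline double VisitRatio(double valid_ratio, double skip_ratio) {
    assert(valid_ratio >= 0.0 && valid_ratio <= 1.0);
    assert(skip_ratio >= 0.0 && skip_ratio <= 1.0);
    return valid_ratio + (1.0 - valid_ratio) * (1.0 - skip_ratio);
}

// Hypothetical deterministic scheduler: each candidate adds visit_ratio to a
// running credit, and a visit is emitted whenever the credit reaches 1.
// Over many candidates the visited fraction approaches visit_ratio.
class AccumulativeSkipper {
public:
    explicit AccumulativeSkipper(double visit_ratio) : visit_ratio_(visit_ratio) {}

    // Returns true when the next candidate should be checked.
    bool ShouldVisit() {
        credit_ += visit_ratio_;
        if (credit_ >= 1.0) {
            credit_ -= 1.0;
            return true;
        }
        return false;
    }

private:
    double visit_ratio_;
    double credit_ = 0.0;
};
```

Unlike independent random draws, a schedule of this kind produces the same decisions on every run, which keeps the visited fraction close to the target even over short candidate sequences.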

Usage Examples

// Conservative setting: skip 10% of invalid candidate checks, suitable for high-recall
// scenarios where latency is less critical
auto params = R"({"hgraph": {"ef_search": 200, "skip_ratio": 0.1}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

// Use random strategy
auto params = R"({"hgraph": {"ef_search": 200, "skip_ratio": 0.2, "skip_strategy": "random"}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

// Aggressive skipping: skip 50% of invalid candidate checks for lower latency
auto params = R"({"hgraph": {"ef_search": 200, "skip_ratio": 0.5}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

Choosing Values

Default 0.2: Suitable for most scenarios, balancing performance and recall.
0.1 or lower: Conservative setting, suitable for scenarios with high recall requirements where latency is less critical.
0.5 or higher: Aggressive skipping, suitable for latency-sensitive scenarios where recall degradation is acceptable (e.g., real-time recommendation systems).
0.0: Don’t skip any points, equivalent to disabling this optimization (all points will be checked).

Important notes:

Only applies to searches with filters. These parameters are ignored when no filter is present.
The optimization is most effective when Filter::ValidRatio() returns an accurate estimate.
Can be used together with brute_force_threshold: when the filter is very strict (ValidRatio is very small), brute_force_threshold will trigger brute-force fallback; otherwise, graph traversal + skip optimization is used.

When to use HGraph

Dense float vectors with dimensions roughly between 64 and 4096.
Latency-sensitive queries where high recall matters.
Mixed workloads with incremental insertion, and optionally forced removal via support_force_remove.
Memory-constrained deployments that benefit from sq8 / sq4_uniform / pq — often in combination with use_reorder to recover recall.

If your workload is partition-heavy (coarse-grained buckets scanned per query) or strongly I/O-bound on an SSD, compare against IVF before committing to HGraph.

LazyHGraph

LazyHGraph is an adaptive dense-vector index that starts as an exact BruteForce index and automatically converts to HGraph after the collection reaches a configurable transition_threshold. It is useful when a dataset starts small but is expected to grow: early searches stay exact and avoid graph build overhead, while larger collections get HGraph’s approximate-search latency and quantization options.

Source: src/algorithm/lazy_hgraph.{h,cpp}
Example: examples/cpp/111_index_lazy_hgraph.cpp

How it works

Flat phase. Before the threshold is reached, data is stored in an internal BruteForce index using FP32 vectors. Search is exact.
Transition. When Build receives at least transition_threshold vectors, or Add grows the flat phase to that size, LazyHGraph builds an internal HGraph from the flat data.
Graph phase. After transition, new data and search requests are handled by the internal HGraph. Search parameters keep using the hgraph search object.
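
The phase logic above can be modelled with a small state machine. This is a toy sketch only: LazyPhaseModel and Phase are hypothetical names, the model stores no vectors, and it ignores removal.

```cpp
#include <cstdint>

enum class Phase { kFlat, kGraph };

// Toy model of the flat-to-graph phase logic; not the VSAG implementation.
class LazyPhaseModel {
public:
    explicit LazyPhaseModel(uint64_t transition_threshold)
        : threshold_(transition_threshold) {}

    // Build picks the initial phase from the input size.
    void Build(uint64_t num_vectors) {
        count_ = num_vectors;
        phase_ = (count_ >= threshold_) ? Phase::kGraph : Phase::kFlat;
    }

    // Add can trigger the transition; the graph phase never reverts to flat.
    void Add(uint64_t num_vectors) {
        count_ += num_vectors;
        if (phase_ == Phase::kFlat && count_ >= threshold_) {
            phase_ = Phase::kGraph;
        }
    }

    Phase phase() const { return phase_; }

    // Search is exact only while the index is still in the flat phase.
    bool SearchIsExact() const { return phase_ == Phase::kFlat; }

private:
    uint64_t threshold_;
    uint64_t count_ = 0;
    Phase phase_ = Phase::kFlat;
};
```

With a threshold of 1000, building with 400 vectors and adding 599 more keeps the model flat; the next added vector moves it to the graph phase.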

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "lazy_hgraph": {
        "transition_threshold": 1000,
        "hgraph": {
            "base_quantization_type": "sq8",
            "max_degree": 26,
            "ef_construction": 100,
            "build_thread_count": 4
        }
    }
})";
auto index = vsag::Factory::CreateIndex("lazy_hgraph", params).value();

auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Add(base);

auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10, R"({"hgraph": {"ef_search": 100}})").value();

Build parameters

LazyHGraph accepts its build parameters in a top-level lazy_hgraph object. For compatibility with the generic factory shape, the same object may also be provided as index_param.

Parameter	Type	Default	Description
`transition_threshold`	uint64	`1000`	Number of vectors at which LazyHGraph converts from the flat phase to HGraph. Must be positive.
`hgraph`	object	`{}`	HGraph build parameters used after transition. See HGraph.

LazyHGraph only supports top-level dtype: "float32". The flat phase is fixed to FP32 BruteForce storage and does not accept separate flat quantization parameters.

Search parameters

Search parameters use the same hgraph object as HGraph:

{"hgraph": {"ef_search": 100}}

In the flat phase, search is exact. In the graph phase, the internal HGraph uses the supplied HGraph search parameters such as ef_search.

Lifecycle notes

Build chooses the initial phase from the input size: below transition_threshold stays flat; at or above the threshold builds HGraph directly.
Add can trigger the one-way transition from flat to graph.
Flat-phase Remove always performs physical removal, even if the caller passes RemoveMode::MARK_REMOVE, so graph transition does not carry tombstones.
GetExtraInfoByIds, UpdateExtraInfo, and extra-info filtering are supported in both phases. See Extra Info.

When to use LazyHGraph

A dense FP32 collection starts small and grows over time.
Exact results are preferred while the collection is small.
The same index should automatically switch to HGraph once approximate graph search becomes worthwhile.

Use HGraph directly when the dataset is already large at build time, when you need non-FP32 input types, or when you want graph behavior from the first insertion.

IVF

IVF: Voronoi partition over k-means centroids; only the scan_buckets_count buckets closest to the query are scanned, with an optional precise rerank

IVF (Inverted File) is VSAG’s partition-based index. It clusters the corpus into buckets at build time, and at query time only scans the buckets whose centroids are closest to the query. This turns an O(N) linear scan into O(N · scan_buckets_count / buckets_count) with tunable recall/latency.

IVF trades a little recall (compared to graph indexes) for lower memory overhead, higher throughput on batch workloads, and simpler sharding — which makes it a good default when the corpus is large (hundreds of millions or more), when memory is tight, or when queries are naturally parallelizable.

Source: src/algorithm/ivf.{h,cpp}, src/algorithm/ivf_parameter.{h,cpp}
Example: examples/cpp/106_index_ivf.cpp

How it works

Clustering. A sample of the dataset is clustered with k-means (or sampled randomly, ivf_train_type: "random") to produce buckets_count centroids.
Assignment. Every vector is written to the inverted list of its nearest centroid, stored in the configured coarse quantization (base_quantization_type). Optionally, a second high-precision copy is kept (use_reorder: true) for reordering after the coarse scan.
Search. For each query, the scan_buckets_count nearest centroids are computed first, then the vectors in those buckets are scored. When reordering is enabled, factor controls how many extra candidates are fetched from the coarse stage before being re-scored with the precise quantizer.

A second partition strategy, GNO-IMI (partition_strategy_type: "gno_imi"), splits the space into two orthogonal sets of centroids (first_order_buckets_count × second_order_buckets_count) for even finer partitioning on very large corpora.
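
To make the assignment and search steps concrete, here is a minimal single-level sketch. ToyIvf and L2Sqr are hypothetical names; centroids are supplied by the caller instead of being trained, and quantization, reordering, and GNO-IMI are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// Squared L2 distance between two equal-length vectors.
float L2Sqr(const Vec& a, const Vec& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

class ToyIvf {
public:
    // Centroids are given here; a real build would train them (e.g. k-means).
    explicit ToyIvf(std::vector<Vec> centroids)
        : centroids_(std::move(centroids)), lists_(centroids_.size()) {}

    // Assignment: append the vector id to the list of its nearest centroid.
    void Add(const Vec& v) {
        std::size_t best = 0;
        for (std::size_t c = 1; c < centroids_.size(); ++c) {
            if (L2Sqr(v, centroids_[c]) < L2Sqr(v, centroids_[best])) best = c;
        }
        lists_[best].push_back(data_.size());
        data_.push_back(v);
    }

    // Search: rank centroids, then score only the closest scan_buckets_count lists.
    std::vector<std::size_t> Search(const Vec& q, std::size_t topk,
                                    std::size_t scan_buckets_count) const {
        std::vector<std::pair<float, std::size_t>> buckets;
        for (std::size_t c = 0; c < centroids_.size(); ++c) {
            buckets.emplace_back(L2Sqr(q, centroids_[c]), c);
        }
        std::sort(buckets.begin(), buckets.end());
        const std::size_t scan = std::min(scan_buckets_count, buckets.size());

        std::vector<std::pair<float, std::size_t>> hits;
        for (std::size_t i = 0; i < scan; ++i) {
            for (std::size_t id : lists_[buckets[i].second]) {
                hits.emplace_back(L2Sqr(q, data_[id]), id);
            }
        }
        std::sort(hits.begin(), hits.end());
        if (hits.size() > topk) hits.resize(topk);

        std::vector<std::size_t> ids;
        for (const auto& hit : hits) ids.push_back(hit.second);
        return ids;
    }

private:
    std::vector<Vec> centroids_;
    std::vector<std::vector<std::size_t>> lists_;
    std::vector<Vec> data_;
};
```

The recall trade-off shows up even at this scale: a query near a bucket boundary can miss its true nearest neighbour when only one bucket is scanned, and finds it once a second bucket is probed.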

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "buckets_count": 256,
        "base_quantization_type": "sq8",
        "partition_strategy_type": "ivf",
        "ivf_train_type": "kmeans"
    }
})";
auto index = vsag::Factory::CreateIndex("ivf", params).value();

// Build.
auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Build(base);

// Search.
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10,
    R"({"ivf": {"scan_buckets_count": 16}})").value();

Build parameters

Build-time parameters live under index_param. See Index Parameters for the exhaustive list.

Parameter	Type	Default	Description
`partition_strategy_type`	string	`"ivf"`	`ivf` (single-level) or `gno_imi` (two-level orthogonal)
`buckets_count`	int	`10`	Number of inverted lists (effective for `ivf`)
`first_order_buckets_count`	int	`10`	First-level count (effective for `gno_imi`)
`second_order_buckets_count`	int	`10`	Second-level count (effective for `gno_imi`)
`ivf_train_type`	string	`"kmeans"`	Centroid training: `kmeans` or `random`
`base_quantization_type`	string	`"fp32"`	`fp32`, `fp16`, `bf16`, `sq8`, `sq4`, `sq8_uniform`, `sq4_uniform`, `pq`, `pqfs`, `rabitq` — see the Quantization chapter for per-quantizer details
`base_pq_dim`	int	`1`	PQ subspaces (required with `pq` / `pqfs`)
`rabitq_pca_dim`	int	`0`	Optional PCA preprocessing dimension for `base_quantization_type: "rabitq"`
`rabitq_bits_per_dim_query`	int	`32`	Query bits for `rabitq`; allowed values are `4` or `32`
`rabitq_bits_per_dim_base`	int	`1`	Stored-code bits for `rabitq`; allowed range is `[1, 8]`
`rabitq_version`	string	`"standard"`	`rabitq` layout: `"standard"` or `"split_1bit_7bit"`
`rabitq_error_rate`	float	`1.9`	Positive error-budget parameter for `rabitq` encoding
`rabitq_use_fht`	bool	`false`	Enable FHT rotation before `rabitq` binarization
`fast_encode_rabitq`	bool	`true`	Use CAQ fast construction for multi-bit `rabitq`; set to `false` for exact encoding
`fast_encode_rabitq_rounds`	int	`6`	CAQ adjustment rounds; allowed range is `[1, 32]`
`use_reorder`	bool	`false`	Keep a high-precision copy and re-rank after the coarse scan
`precise_quantization_type`	string	`"fp32"`	Quantizer used for reordering (with `use_reorder: true`)
`base_io_type`	string	`"memory_io"`	Storage backend for coarse codes
`precise_io_type`	string	`"block_memory_io"`	Storage backend for precise codes (`memory_io`, `block_memory_io`, `mmap_io`, `buffer_io`, `async_io`, `reader_io`)
`precise_file_path`	string	`""`	File path when the precise IO type is disk-backed

A rule of thumb for buckets_count is sqrt(N) to 4 * sqrt(N) where N is the corpus size.
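
A small helper can turn this rule of thumb into numbers and estimate the per-query scan cost from the O(N · scan_buckets_count / buckets_count) expression given earlier. SuggestBucketsCount and ExpectedScanned are hypothetical helpers, and the estimate assumes evenly filled buckets, which real data rarely gives exactly.

```cpp
#include <cmath>
#include <cstdint>

struct BucketsRange {
    uint64_t low;   // about sqrt(N)
    uint64_t high;  // about 4 * sqrt(N)
};

// Rule-of-thumb range for buckets_count given the corpus size N.
BucketsRange SuggestBucketsCount(uint64_t corpus_size) {
    const double root = std::sqrt(static_cast<double>(corpus_size));
    return {static_cast<uint64_t>(std::llround(root)),
            static_cast<uint64_t>(std::llround(4.0 * root))};
}

// Expected number of vectors scored per query, assuming evenly filled buckets.
double ExpectedScanned(uint64_t corpus_size, uint64_t buckets_count,
                       uint64_t scan_buckets_count) {
    return static_cast<double>(corpus_size) *
           static_cast<double>(scan_buckets_count) /
           static_cast<double>(buckets_count);
}
```

For a corpus of one million vectors the suggested range is 1000 to 4000 buckets. With 1000 buckets and scan_buckets_count of 16, a query scores about 16 000 vectors instead of one million.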

Search parameters

Search-time parameters live under the ivf sub-object:

Parameter	Type	Default	Description
`scan_buckets_count`	int	— (required)	Number of buckets probed per query. Must be ≤ `buckets_count` (except when `disable_bucket_scan` is true, where larger values are allowed and unavailable slots are padded with `-1`).
`disable_bucket_scan`	bool	`false`	When `true`, skip scanning bucket contents and return the selected bucket IDs and their distances instead. Supports batch queries.
`factor`	float	`2.0`	With reordering enabled, pulls `factor * topk` coarse candidates before the precise rescore.
`enable_reorder`	bool	`true`	Set to `false` to skip the final reorder stage for this request even when the index was built with reorder enabled.
`parallelism`	int	`1`	Threads used to scan buckets in parallel for a single query.
`timeout_ms`	double	`+∞`	Hard cap in milliseconds; partial results are returned once exceeded.

auto result = index->KnnSearch(
    query, topk,
    R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "parallelism": 4}})").value();

auto fast_result = index->KnnSearch(
    query, topk,
    R"({"ivf": {"scan_buckets_count": 32, "factor": 2.0, "enable_reorder": false}})").value();

When to use IVF

Large corpora (hundreds of millions of vectors and above), especially when the working set does not fit comfortably in RAM.
Batch or high-throughput workloads where per-query latency is less critical than queries-per-second.
Memory-tight deployments that benefit from aggressive quantization (sq8, sq4_uniform, pq, pqfs) combined with use_reorder to recover recall.
Shard-friendly setups: buckets map naturally onto shards or disk blocks.

For latency-sensitive, high-recall workloads on dense embeddings, compare against HGraph first.

SINDI

SINDI: per-term inverted lists grouped by window; only the lists matching the query’s non-zero terms are walked and accumulated into an n_candidate-sized heap

SINDI (Sparse Inverted Non-redundant Distance Index) is VSAG’s index for sparse vectors, the kind produced by BM25, SPLADE, and other learned-sparse encoders. Unlike the dense indexes (HGraph, IVF), SINDI operates directly on term/value pairs and is the only VSAG index that accepts dtype: "sparse".

Source: src/algorithm/sindi/
Example: examples/cpp/109_index_sindi.cpp

How it works

Window-based inverted lists. Documents are grouped into fixed-size windows (window_size). Within each window, an inverted list per term maps a term id to the (doc_id, value) pairs that mention it.
Optional pruning and quantization. During construction, doc_prune_ratio drops low-weight terms per document, and use_quantization compresses the term values to shrink memory further.
Scoring. At query time, SINDI iterates the non-zero terms of the query, walks the corresponding inverted lists in each window, aggregates contributions into a max-heap of size n_candidate, and returns the top-k. When use_reorder is enabled, the candidates are re-scored against a forward store. The default forward store keeps fp32 values, while rerank_type: "dmq8" uses a compressed DMQ store to reduce rerank memory.

Distance is returned as 1 - inner_product so results sort ascending as in the dense indexes.

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "sparse",
    "metric_type": "ip",
    "dim": 1024,
    "index_param": {
        "term_id_limit": 30000,
        "window_size": 50000,
        "doc_prune_ratio": 0.0,
        "use_quantization": false,
        "use_reorder": false,
        "remap_term_ids": false
    }
})";
auto index = vsag::Factory::CreateIndex("sindi", params).value();

// Build a dataset of SparseVector.
auto base = vsag::Dataset::Make();
base->NumElements(n)
    ->SparseVectors(sparse_vectors)  // vsag::SparseVector*
    ->Ids(ids)
    ->Owner(false);
index->Build(base);

// Search.
auto query = vsag::Dataset::Make();
query->NumElements(1)->SparseVectors(&query_vec)->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10,
    R"({"sindi": {"n_candidate": 100}})").value();

Build parameters

Build-time parameters live under index_param. dtype must be "sparse" and metric_type must be "ip".

Parameter	Type	Default	Description
`dim`	int	— (required)	Maximum number of non-zero elements per sparse vector. Not the vocabulary size.
`term_id_limit`	int	`1000000`	Upper bound on term id values (≥ max term id + 1, up to 50 000 000).
`window_size`	int	`50000`	Documents per window (range: 10 000 – 60 000).
`doc_prune_ratio`	float	`0.0`	Fraction of lowest-weight terms dropped per doc at build time (0.0 – 0.9).
`use_quantization`	bool	`false`	Quantize stored term values to cut memory; when enabled, uses 8-bit scalar quantization (SQ8).
`use_reorder`	bool	`false`	Keep a forward store and rescore candidates after coarse SINDI scoring.
`rerank_type`	string	`"fp32"`	Forward-store type used when `use_reorder` is enabled. `fp32` keeps exact values; `dmq8` stores compressed 8-bit DMQ codes.
`remap_term_ids`	bool	`false`	Remap term IDs before indexing; useful when term IDs are sparse or have large gaps.
`avg_doc_term_length`	int	`100`	Hint for memory estimation only.

dim vs term_id_limit. For the sparse vector {0:0.1, 2:0.5, 177:0.8}, dim is 3 (three non-zero entries) while term_id_limit must be ≥ 178 (largest term id + 1). Setting dim to the vocabulary size, which is what term_id_limit is for, is the most common first-time mistake.
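
One way to avoid mixing the two up is to derive both values from the data before building. ComputeSparseSizing below is a hypothetical helper that takes the term ids of each document and returns the smallest valid dim and term_id_limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SparseSizing {
    std::size_t dim;         // max non-zero entries in any single vector
    uint64_t term_id_limit;  // largest term id + 1
};

// Scans the term ids of every document and returns the minimum valid sizing.
SparseSizing ComputeSparseSizing(
    const std::vector<std::vector<uint32_t>>& term_ids_per_doc) {
    SparseSizing sizing{0, 0};
    for (const auto& term_ids : term_ids_per_doc) {
        sizing.dim = std::max(sizing.dim, term_ids.size());
        for (uint32_t id : term_ids) {
            sizing.term_id_limit =
                std::max<uint64_t>(sizing.term_id_limit, static_cast<uint64_t>(id) + 1);
        }
    }
    return sizing;
}
```

For the vector in the example above, the helper yields dim = 3 and term_id_limit = 178. In practice you would likely round both up to leave headroom for documents added later.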

Search parameters

Search-time parameters live under the sindi sub-object:

Parameter	Type	Default	Description
`n_candidate`	int	`0`	Candidate heap size. When `0`, defaults to `SPARSE_AMPLIFICATION_FACTOR · topk` (500×). If set, must satisfy `1 ≤ n_candidate ≤ SPARSE_AMPLIFICATION_FACTOR · topk`.
`query_prune_ratio`	float	`0.0`	Fraction of lowest-weight query terms skipped (0.0 – 0.9).
`term_prune_ratio`	float	`0.0`	Fraction of term-list entries skipped (0.0 – 0.9).

SINDI chooses the heap-insertion strategy automatically from the build-time doc_prune_ratio and search-time query_prune_ratio. With the current 0.1 threshold, SINDI uses the distance-array insertion path when both ratios are <= 0.1; if either ratio is greater than 0.1, it uses term-list heap insertion. The legacy use_term_lists_heap_insert search parameter is ignored; configure pruning ratios instead.
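
The two rules above, the n_candidate default and bounds and the automatic insertion-path choice, can be written out as follows. The function and constant names are hypothetical; the values 500 and 0.1 are taken from the text, and the real checks inside VSAG may be structured differently.

```cpp
#include <cstdint>
#include <stdexcept>

constexpr int64_t kSparseAmplificationFactor = 500;  // the "500x" factor from the table
constexpr double kPruneThreshold = 0.1;              // current threshold per the text

// Resolves n_candidate: 0 means "use the default", otherwise enforce the bounds.
int64_t EffectiveNCandidate(int64_t n_candidate, int64_t topk) {
    const int64_t upper = kSparseAmplificationFactor * topk;
    if (n_candidate == 0) return upper;
    if (n_candidate < 1 || n_candidate > upper) {
        throw std::invalid_argument("n_candidate out of range");
    }
    return n_candidate;
}

enum class HeapInsert { kDistanceArray, kTermLists };

// Distance-array insertion only when both pruning ratios are <= the threshold.
HeapInsert ChooseHeapInsert(double doc_prune_ratio, double query_prune_ratio) {
    const bool light_pruning =
        doc_prune_ratio <= kPruneThreshold && query_prune_ratio <= kPruneThreshold;
    return light_pruning ? HeapInsert::kDistanceArray : HeapInsert::kTermLists;
}
```

For topk = 10, leaving n_candidate at 0 resolves to 5000, and any explicit value above 5000 is rejected.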

auto result = index->KnnSearch(
    query, topk,
    R"({"sindi": {"n_candidate": 200, "query_prune_ratio": 0.1}})").value();

When to use SINDI

Sparse retrieval with BM25, SPLADE, uniCOIL, or similar learned-sparse encoders.
Hybrid dense+sparse pipelines where SINDI handles the sparse leg in parallel with HGraph / IVF for dense embeddings.
Memory-constrained deployments of sparse corpora (use_quantization: true roughly halves inverted-list memory with a small recall loss; use_reorder: true trades forward-store memory for recall, and rerank_type: "dmq8" reduces that forward-store overhead).

SINDI does not accept dense vectors and supports only inner-product similarity. Range search and id-based filtering are supported; see the example for usage. When rerank_type is dmq8, the codebooks are fixed by the initial build, so neither incremental Add after the model is established nor UpdateVector is supported.

Practical guidance

For Chinese corpora, we recommend encoding sparse vectors with BGE-M3. For English corpora, SPLADE is the more common default.
BGE-M3 can emit both sparse and dense vectors. Today SINDI handles the sparse leg, and VSAG plans to support fused sparse+dense scoring in a future release.
Sparse vectors are not a complete replacement for BM25 full-text retrieval. In practice, three-way recall with BM25 + sparse + dense usually outperforms any two-way combination.
At the index level, SINDI can also serve BM25-style scoring: use inverse document frequency as the query-side term weight, and use term-frequency-based weights as the document-side term value.
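
As a sketch of the BM25-style setup in the last point, the weights could be computed as below. The text does not specify a BM25 variant, so this assumes the common Okapi form with parameters k1 and b; Bm25Idf and Bm25DocWeight are hypothetical helpers, not part of VSAG.

```cpp
#include <cmath>

// Query-side term weight: inverse document frequency (assumed Okapi-style form).
double Bm25Idf(double num_docs, double doc_freq) {
    return std::log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5));
}

// Document-side term value: saturated, length-normalized term frequency.
double Bm25DocWeight(double tf, double doc_len, double avg_doc_len,
                     double k1 = 1.2, double b = 0.75) {
    const double norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
    return tf * (k1 + 1.0) / (tf + norm);
}
```

Summing Bm25Idf(term) * Bm25DocWeight(term, doc) over the terms shared by a query and a document is an inner product of the two sparse vectors, which is the similarity SINDI computes.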

Common configurations

Flat brute-force sparse index. Keep all non-zero terms in the inverted index (doc_prune_ratio: 0.0), disable the flat reranker (use_reorder: false), and disable quantization (use_quantization: false). This is the simplest high-recall baseline.
Pruned high-accuracy index. Prune the lowest-weight 40% of terms per document during build (doc_prune_ratio: 0.4), keep the flat copy for reranking (use_reorder: true), and enable quantization to shrink inverted-list memory (use_quantization: true). This is a common balance between memory and recall.
Pruned high-accuracy index with compressed reranking. Use the same pruning and inverted-list quantization as above, but set rerank_type: "dmq8" together with use_reorder: true to reduce forward-store memory.
Very large sparse vocabularies. When term IDs are sparse within the uint32 range, such as hash-based tokenizers, external vocabulary IDs, or vocabularies with large gaps, enable remap_term_ids: true. This avoids managing many empty posting lists and helps stay below the term_id_limit ceiling.

Mark remove

SINDI supports RemoveMode::MARK_REMOVE. Calling Remove(ids) (the default mode) tombstones the given ids so they no longer appear in search results; GetNumElements() drops accordingly and GetNumberRemoved() reports the running total. Removing an id that is absent or already removed is a no-op. RemoveMode::FORCE_REMOVE is not supported and returns an error.

Mark-removed documents still occupy memory until the index is rebuilt; the space is not physically reclaimed.

SIMQ

SIMQ is VSAG’s index for multi-vector retrieval — the kind of data where each document is a set of token-level vectors rather than a single embedding. This pattern arises in late-interaction models such as ColBERT, where a document is represented by one vector per token and relevance is computed via MaxSim (sum of maximum per-query-token similarities).

Source: src/algorithm/simq/

How it works

Dynamic clustering of token vectors. At build time, all token vectors across every document are extracted into a flat pool and clustered using an HGraph-based dynamic clustering algorithm. The initial cluster centers are sampled at a ratio controlled by init_cluster_ratio; clusters that grow beyond max_cluster_size are split incrementally.
Representative graph for coarse search. A representative HGraph is built over the cluster centroids. At query time, each query token searches this graph to find its nearest clusters (controlled by coarse_k). The cluster scores are accumulated across all query tokens to produce a candidate set.
Exact MaxSim reranking. The top rerank_k candidates are re-scored by reading back the original token vectors from disk (or memory) and computing the exact MaxSim similarity between query tokens and document tokens.

The combination of cluster-level coarse search and exact reranking gives SIMQ a tunable recall/latency tradeoff for multi-vector workloads.
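
For reference, the exact MaxSim score used in the rerank stage can be written in a few lines. TokenBlock is a hypothetical view over len * dim contiguous floats, the same flat layout the quick start uses for token vectors; it is not the VSAG MultiVector type, and the function is a brute-force sketch with no clustering.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical view over `len` token vectors stored as len * dim contiguous floats.
struct TokenBlock {
    const float* vectors;
    std::size_t len;
};

float InnerProduct(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) sum += a[i] * b[i];
    return sum;
}

// MaxSim: for each query token take its best-matching document token,
// then sum those maxima over all query tokens.
float MaxSim(const TokenBlock& query, const TokenBlock& doc, std::size_t dim) {
    if (doc.len == 0) return 0.0f;
    float total = 0.0f;
    for (std::size_t qi = 0; qi < query.len; ++qi) {
        float best = -std::numeric_limits<float>::infinity();
        for (std::size_t di = 0; di < doc.len; ++di) {
            best = std::max(best, InnerProduct(query.vectors + qi * dim,
                                               doc.vectors + di * dim, dim));
        }
        total += best;
    }
    return total;
}
```

The cost is query.len * doc.len inner products per document, which is why the coarse stage first narrows the corpus to a limited candidate set.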

Quick start

#include <vsag/vsag.h>

std::string build_params = R"({
    "dtype": "float32",
    "metric_type": "ip",
    "dim": 256,
    "index_param": {
        "base_io_type": "async_io",
        "base_file_path": "/path/to/simq_base_codes.bin",
        "init_cluster_ratio": 0.1,
        "max_cluster_size": 160,
        "split_start_idx": 80,
        "random_seed": 42,
        "coarse_k": 50,
        "rerank_k": 1000
    }
})";
auto index = vsag::Factory::CreateIndex("simq", build_params).value();

// Build a dataset of MultiVector.
// Each document has a variable number of token vectors, each of dimension `dim`.
std::vector<vsag::MultiVector> base_mvs(num_docs);
std::vector<int64_t> ids(num_docs);
for (int64_t i = 0; i < num_docs; ++i) {
    base_mvs[i].len_ = doc_token_counts[i];             // number of tokens in doc i
    base_mvs[i].vectors_ = doc_token_vectors[i];        // flat array: len_ * dim floats
    ids[i] = i;
}
auto base = vsag::Dataset::Make();
base->NumElements(num_docs)
    ->Dim(dim)
    ->Ids(ids.data())
    ->MultiVectors(base_mvs.data())
    ->MultiVectorDim(dim)
    ->Owner(false);
index->Build(base);

// Search with a multi-vector query.
vsag::MultiVector query_mv;
query_mv.len_ = query_token_count;
query_mv.vectors_ = query_token_vectors;
auto query = vsag::Dataset::Make();
query->NumElements(1)
    ->Dim(dim)
    ->MultiVectors(&query_mv)
    ->MultiVectorDim(dim)
    ->Owner(false);

std::string search_params = R"({
    "simq": {
        "coarse_k": 600,
        "rerank_k": 5000
    }
})";
auto result = index->KnnSearch(query, /*topk=*/100, search_params).value();

// Read results.
const int64_t* result_ids = result->GetIds();
const float* result_dists = result->GetDistances();
int64_t result_count = result->GetDim();
for (int64_t i = 0; i < result_count; ++i) {
    int64_t id = result_ids[i];
    float dist = result_dists[i];
}

Build parameters

SIMQ-specific build parameters live under index_param. The common fields dim, dtype, and metric_type are top-level. dtype must be "float32" and metric_type must be "ip".

Parameter	Type	Default	Description
`dim`	int	— (required)	Dimension of each token vector.
`base_io_type`	string	`"async_io"`	Storage backend for reranking multi-vector data.
`base_file_path`	string	`"./default_file_path"`	File path for disk-backed IO types.
`init_cluster_ratio`	float	`0.2`	Fraction of tokens sampled as initial cluster centers.
`max_cluster_size`	int	`64`	Maximum token vectors per cluster before split.
`split_start_idx`	int	`32`	Split position within an overflowing cluster.
`random_seed`	int	`42`	Random seed for clustering shuffle.
`coarse_k`	int	`8`	Default number of nearest clusters probed per query token; set at build time and overridable per search.
`rerank_k`	int	`100`	Default maximum number of rerank candidates; set at build time and overridable per search.

dim — shared across all documents and queries.
base_io_type — supported values: async_io, memory_io, block_memory_io, buffer_io, mmap_io, reader_io.
base_file_path — the default is a placeholder; provide a real path when using a disk-backed type (async_io, buffer_io, mmap_io).
init_cluster_ratio — range (0, 1]. Smaller values yield fewer, larger clusters; larger values produce more, finer-grained clusters.
max_cluster_size — must be > 1.
split_start_idx — typically half of max_cluster_size. Must be in (1, max_cluster_size).
coarse_k, rerank_k — must be > 0.

Choosing cluster parameters. init_cluster_ratio and max_cluster_size together control the number and size of clusters. A smaller init_cluster_ratio with a larger max_cluster_size yields fewer clusters and faster coarse search at the cost of recall. Start with init_cluster_ratio = 0.1–0.2 and max_cluster_size = 2 × split_start_idx, then tune with the search parameters.
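
The documented constraints can be checked up front before handing parameters to the factory. SimqBuildParams and ValidateSimqParams are hypothetical names for illustration; the defaults mirror the table above, and the error messages are not VSAG's.

```cpp
#include <cstdint>
#include <string>

struct SimqBuildParams {
    double init_cluster_ratio = 0.2;
    int64_t max_cluster_size = 64;
    int64_t split_start_idx = 32;
    int64_t coarse_k = 8;
    int64_t rerank_k = 100;
};

// Returns an empty string when every documented constraint holds,
// otherwise a description of the first violation found.
std::string ValidateSimqParams(const SimqBuildParams& p) {
    if (!(p.init_cluster_ratio > 0.0 && p.init_cluster_ratio <= 1.0)) {
        return "init_cluster_ratio must be in (0, 1]";
    }
    if (p.max_cluster_size <= 1) {
        return "max_cluster_size must be > 1";
    }
    if (!(p.split_start_idx > 1 && p.split_start_idx < p.max_cluster_size)) {
        return "split_start_idx must be in (1, max_cluster_size)";
    }
    if (p.coarse_k <= 0 || p.rerank_k <= 0) {
        return "coarse_k and rerank_k must be > 0";
    }
    return "";
}
```

Both the table defaults and the quick-start values (0.1, 160, 80, 50, 1000) pass this check, while setting split_start_idx equal to max_cluster_size fails it.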

Search parameters

Search-time parameters live under the simq sub-object:

Parameter	Type	Default	Description
`coarse_k`	int	(index default)	Nearest clusters per query token.
`rerank_k`	int	(index default)	Max rerank candidates.

coarse_k — overrides the build-time value. Larger values increase the candidate pool and improve recall at the cost of latency.
rerank_k — overrides the build-time value. Larger values improve recall at the cost of more disk reads and compute.
When omitted, the build-time defaults are used. Both values must be > 0 when explicitly set.

auto result = index->KnnSearch(
    query, topk,
    R"({"simq": {"coarse_k": 600, "rerank_k": 5000}})").value();

When to use SIMQ

Late-interaction retrieval with ColBERT or similar models where each document is a bag of token-level vectors and relevance is computed via MaxSim.
Multi-vector relevance where a single embedding per document loses too much information and fine-grained token-level matching is needed.
Large-scale multi-vector corpora where brute-force MaxSim is too slow and a two-stage coarse-then-rerank pipeline provides the right recall/latency tradeoff.

SIMQ only accepts float32 multi-vector data with inner-product similarity. It does not accept single dense vectors or sparse vectors (use HGraph or SINDI for those).

Practical guidance

Scaling coarse_k and rerank_k. Increasing coarse_k widens the cluster-level candidate net; increasing rerank_k admits more documents to exact scoring. In practice, rerank_k has a larger impact on recall but also on latency because each additional candidate requires a disk read and full MaxSim computation.
IO type selection. Use async_io for large corpora that do not fit in memory. Use memory_io or block_memory_io when the multi-vector data fits in RAM for the lowest reranking latency.
Cluster sizing. Set max_cluster_size to roughly twice split_start_idx. The split point determines how the token vectors are partitioned when a cluster overflows; centering it keeps the two halves balanced.

MultiVector field reference

Field	Type	Description
`len_`	`uint32_t`	Number of token vectors in this document or query.
`vectors_`	`float*`	Contiguous array of `len_ * dim` floats.

Pyramid

Pyramid: a tree of per-node proximity sub-graphs keyed by a path string; the search walks down the tree along the query’s path prefix and runs ef_search inside the leaf sub-graph

Pyramid is VSAG’s hierarchical, path-partitioned graph index. Every vector is tagged with a path string such as "a/d/f", and Pyramid builds a graph per node in that path tree. At query time you supply a path prefix, and Pyramid restricts the search to the corresponding sub-tree.

This is ideal for multi-tenant deployments, tag-partitioned catalogs, or any scenario where one logical index serves many groups that must not cross-contaminate results.

Source: src/algorithm/pyramid.{h,cpp}, src/algorithm/pyramid_zparameters.{h,cpp}
Example (single hierarchy): examples/cpp/107_index_pyramid.cpp
Example (multi-hierarchy): examples/cpp/112_index_pyramid_multi_hierarchy.cpp

How it works

Path tree. Each vector carries a path in addition to its id. Paths use / as separator (e.g. "tenant_a/lang_en/topic_news"). Pyramid builds one sub-index for every path prefix seen during build.
Per-level sub-graphs. By default every level gets its own proximity graph. Use no_build_levels to skip levels that are too small or too coarse to benefit from graph indexing — those levels still exist as passthrough containers, but search degrades to a scan.
Graph construction. Each sub-graph is built with the same machinery as HGraph: nsw insertion or odescent with graph_iter_turn, neighbor_sample_rate, and alpha for pruning. Base vectors are stored in base_quantization_type; optional reordering keeps a high-precision copy.
Search. Query vectors also carry a path. The search walks down the tree to the most specific sub-graph matching the query path and runs a graph search there with ef_search (and subindex_ef_search for intermediate levels).
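
The path semantics can be illustrated with two small helpers. SplitPath and IsUnderPrefix are hypothetical names, and the sketch shows only which stored paths a query prefix would cover, matching whole segments rather than raw characters. It does not model the sub-graphs themselves.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits "a/d/f" into {"a", "d", "f"}; an empty path yields no segments.
std::vector<std::string> SplitPath(const std::string& path) {
    std::vector<std::string> parts;
    if (path.empty()) return parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = path.find('/', start);
        if (end == std::string::npos) {
            parts.push_back(path.substr(start));
            break;
        }
        parts.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}

// True when the vector's path lies in the sub-tree rooted at query_prefix.
bool IsUnderPrefix(const std::string& vector_path, const std::string& query_prefix) {
    const std::vector<std::string> vec = SplitPath(vector_path);
    const std::vector<std::string> pre = SplitPath(query_prefix);
    if (pre.size() > vec.size()) return false;
    for (std::size_t i = 0; i < pre.size(); ++i) {
        if (vec[i] != pre[i]) return false;
    }
    return true;
}
```

Under this model a query path of "a/d" covers vectors tagged "a/d" and "a/d/f", but not "a/de/f", because segments are compared whole.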

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 32,
        "alpha": 1.2,
        "graph_type": "odescent",
        "graph_iter_turn": 15,
        "neighbor_sample_rate": 0.2,
        "no_build_levels": [0, 1],
        "use_reorder": true,
        "build_thread_count": 16
    }
})";
auto index = vsag::Factory::CreateIndex("pyramid", params).value();

// Build with per-vector paths.
auto base = vsag::Dataset::Make();
base->NumElements(n)
    ->Dim(128)
    ->Ids(ids)
    ->Paths(paths)          // std::string* of length n, e.g. "a/d/f"
    ->Float32Vectors(data)
    ->Owner(false);
index->Build(base);

// Search restricted to a path prefix.
std::string query_path = "a/d";
auto query = vsag::Dataset::Make();
query->NumElements(1)
    ->Dim(128)
    ->Float32Vectors(q)
    ->Paths(&query_path)
    ->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10,
    R"({"pyramid": {"ef_search": 100}})").value();

Build parameters

Build-time parameters live under index_param.

Parameter	Type	Default	Description
`base_quantization_type`	string	—	Coarse storage quantizer (`fp32`, `fp16`, `bf16`, `sq8`, `sq4`, `sq8_uniform`, `sq4_uniform`, `pq`, `pqfs`, `rabitq`). See the Quantization chapter for per-quantizer details.
`max_degree`	int	`64`	Maximum out-degree per node within a sub-graph.
`graph_type`	string	`"nsw"`	`nsw` or `odescent`.
`ef_construction`	int	`400`	Candidate list size for `nsw` builds.
`alpha`	float	`1.2`	Pruning factor during graph construction.
`graph_iter_turn`	int	—	ODescent iterations (effective with `graph_type: "odescent"`).
`neighbor_sample_rate`	float	—	ODescent neighbor sampling rate.
`no_build_levels`	int[]	`[]`	Tree levels that skip graph construction (0-indexed from the root).
`use_reorder`	bool	`false`	Keep a high-precision copy for rescoring.
`precise_quantization_type`	string	`"fp32"`	Quantizer for reordering.
`index_min_size`	int	`0`	Minimum sub-index size; smaller groups fall back to scan.
`support_duplicate`	bool	`false`	Allow duplicate ids.
`build_thread_count`	int	`1`	Threads used for parallel build.
`hierarchies`	array	`[]`	Named hierarchy definitions. Each element is either a string (inherits all top-level params) or an object with `name` and optional overrides (`max_degree`, `ef_construction`, `alpha`, `no_build_levels`, `index_min_size`). When present, multi-hierarchy mode is activated and each hierarchy maintains its own independent path tree.

Search parameters

Search-time parameters live under the pyramid sub-object:

Parameter	Type	Default	Description
`ef_search`	int	`100`	Candidate list size for the leaf-level graph search.
`subindex_ef_search`	int	`50`	Candidate list size used when traversing intermediate sub-graphs on the path.
`hierarchies`	string[]	`[]`	Select which hierarchy to search. Empty means use the default (unnamed) hierarchy.
`hierarchy_op`	string	`"single"`	How to combine results across hierarchies: `single` (search one hierarchy), `union`, or `intersection`. Note: `union` and `intersection` are not yet implemented — setting them will cause `KnnSearch`/`RangeSearch` to return an error.

auto result = index->KnnSearch(
    query, topk,
    R"({"pyramid": {"ef_search": 200, "subindex_ef_search": 80}})").value();

Multi-Hierarchy Support

A single Pyramid index can maintain multiple independent path trees, each identified by a name (e.g. "site", "category"). Vectors share IDs and data across all hierarchies — only the paths differ. Each hierarchy can optionally override graph construction parameters.

This is useful when the same set of vectors needs to be partitioned along different dimensions simultaneously. For example, an e-commerce platform might partition products by site (site-a/lang-en) and by category (electronics/phones) at the same time, and search can target either hierarchy independently.

Build configuration

Add a hierarchies array inside index_param. Each element is either:

A string (inherits all top-level params): "site"
An object with name and optional per-hierarchy overrides: {"name": "category", "max_degree": 64, "no_build_levels": [0]}

Overridable per-hierarchy parameters: max_degree, ef_construction, alpha, no_build_levels, index_min_size.

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 32,
        "alpha": 1.2,
        "graph_type": "odescent",
        "graph_iter_turn": 15,
        "neighbor_sample_rate": 0.2,
        "no_build_levels": [0, 1],
        "use_reorder": true,
        "build_thread_count": 16,
        "hierarchies": [
            "site",
            {"name": "category", "max_degree": 64, "no_build_levels": [0]}
        ]
    }
}

Dataset API for named hierarchies

Use the overloaded Paths(hierarchy_name, paths) method to assign paths per hierarchy. The same Ids() and Float32Vectors() are shared across all hierarchies:

auto base = vsag::Dataset::Make();
base->NumElements(n)
    ->Dim(128)
    ->Ids(ids)
    ->Float32Vectors(data)
    ->Paths("site", site_paths)         // std::string* of length n
    ->Paths("category", category_paths) // independent paths for 2nd hierarchy
    ->Owner(false);
index->Build(base);

Searching a specific hierarchy

Specify which hierarchy to search via "hierarchies" in the search parameters. The query dataset must also set its path on the matching hierarchy name:

auto query = vsag::Dataset::Make();
query->NumElements(1)
    ->Dim(128)
    ->Float32Vectors(q)
    ->Paths("site", &query_path)   // target the "site" hierarchy
    ->Owner(false);

auto result = index->KnnSearch(
    query, /*topk=*/10,
    R"({"pyramid": {"ef_search": 100, "hierarchies": ["site"]}})").value();

Incremental insertion (Add)

Add() works the same as Build() — provide named paths and the index inserts into all matching hierarchies:

auto new_data = vsag::Dataset::Make();
new_data->NumElements(count)
    ->Dim(128)
    ->Ids(new_ids)
    ->Float32Vectors(new_vectors)
    ->Paths("site", new_site_paths)
    ->Paths("category", new_cat_paths)
    ->Owner(false);  // buffers stay owned by the caller, as in the Build example
index->Add(new_data);

RangeSearch

RangeSearch also supports hierarchy selection via the same search parameters:

auto result = index->RangeSearch(
    query, /*radius=*/20.0f,
    R"({"pyramid": {"ef_search": 100, "hierarchies": ["category"]}})").value();

Serialize & Deserialize

Multi-hierarchy indexes serialize and deserialize transparently. The serialized format includes all hierarchy names and their graph structures:

// Serialize
auto binary_set = index->Serialize().value();

// Deserialize into a new index (must use the same build params)
auto new_index = vsag::Factory::CreateIndex("pyramid", build_params).value();
new_index->Deserialize(binary_set);

When to use Pyramid

Multi-tenant services where each tenant must see results only from its own partition, and you would otherwise maintain one index per tenant.
Content catalogs with hierarchical tags (language / region / category) where queries always scope to a known prefix.
Workloads with many small partitions: no_build_levels and index_min_size let you skip graph construction for partitions too small to benefit.

If you don’t need path-based scoping, HGraph is simpler and generally faster.

Mark remove

Pyramid supports RemoveMode::MARK_REMOVE. Calling Remove(ids) (the default mode) tombstones the given ids: they are excluded from subsequent search results, GetNumElements() drops by the number removed, and GetNumberRemoved() reports the running total. Removing an id that is absent or already removed is a no-op. RemoveMode::FORCE_REMOVE is not supported and returns an error.

Mark-removed vectors still occupy memory until the index is rebuilt; the space is not physically reclaimed.
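The bookkeeping described above can be modeled with a small toy class. `TombstoneBook` and its method names are hypothetical and exist only for this sketch; they mirror the documented behavior (no-op on absent or already removed ids, counters that move in opposite directions, storage that is not reclaimed) and do not reflect how Pyramid stores its data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical toy class, not VSAG code: it models mark-remove bookkeeping only.
class TombstoneBook {
public:
    void Add(std::int64_t id) {
        assert(live_.count(id) == 0 && removed_.count(id) == 0);
        live_.insert(id);
    }

    // Tombstones the given ids and returns how many were actually removed.
    // Ids that are absent or already removed are skipped (no-op).
    std::size_t Remove(const std::vector<std::int64_t>& ids) {
        std::size_t count = 0;
        for (std::int64_t id : ids) {
            if (live_.erase(id) == 1) {
                removed_.insert(id);
                ++count;
            }
        }
        return count;
    }

    bool VisibleToSearch(std::int64_t id) const { return live_.count(id) != 0; }
    std::size_t NumElements() const { return live_.size(); }
    std::size_t NumberRemoved() const { return removed_.size(); }
    // Tombstoned entries keep their storage until a rebuild.
    std::size_t SlotsInMemory() const { return live_.size() + removed_.size(); }

private:
    std::unordered_set<std::int64_t> live_;
    std::unordered_set<std::int64_t> removed_;
};
```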

BruteForce

BruteForce: vectors live in a flat store; the query is compared against every stored vector, with optional intra-query parallelism splitting the scan across threads, and the smallest distances are kept in a top-k heap

BruteForce is VSAG’s exact, flat index. At query time it scores the query against every vector in the corpus and returns the true top-k — no graph traversal, no inverted lists, no approximation. Its main role is to be the ground-truth baseline that approximate indexes (HGraph, IVF, …) are evaluated against, but it is also a reasonable production choice for small corpora or for workloads where 100% recall is mandatory.

Source: src/algorithm/brute_force.{h,cpp}
Example: examples/cpp/105_index_brute_force.cpp

How it works

Build. Vectors are stored in a single flat data cell encoded by base_quantization_type (default fp32 — i.e. raw). No graph, no clustering, no training is performed for the uncompressed quantizers; PQ/SQ-style quantizers that require training will still run their training pass when used.
Add. New vectors are appended to the flat store. There is no rebalancing or rebuild cost.
Search. For each query the distance is computed against every stored vector under the configured metric_type (l2, ip, or cosine), then a top-k heap returns the closest ids. Search uses SIMD kernels and supports intra-query parallelism — a single query can be split across multiple threads via the parallelism search parameter (see BruteForce::SearchWithRequest in src/algorithm/brute_force.cpp).

Because the index keeps every vector verbatim (modulo the chosen quantizer), the result is exact when base_quantization_type is fp32 and is the standard reference used to compute ground truth in the eval_performance tool.
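For readers who want to see the scan-plus-heap idea in isolation, here is a minimal standalone sketch. It is not the code in src/algorithm/brute_force.cpp: the function names are made up for this example, it handles squared L2 only, and it omits SIMD, filtering, and parallelism.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Squared L2 distance between two dim-length float arrays.
inline float SquaredL2(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Exact top-k by linear scan. `data` holds n row-major vectors of length dim.
// Returns (distance, id) pairs sorted by ascending distance.
inline std::vector<std::pair<float, std::int64_t>> ExactTopK(
    const float* data, const std::int64_t* ids, std::size_t n, std::size_t dim,
    const float* query, std::size_t k) {
    assert(k > 0);
    // Max-heap on distance: the top is the worst of the current best k.
    std::priority_queue<std::pair<float, std::int64_t>> heap;
    for (std::size_t i = 0; i < n; ++i) {
        const float dist = SquaredL2(data + i * dim, query, dim);
        if (heap.size() < k) {
            heap.emplace(dist, ids[i]);
        } else if (dist < heap.top().first) {
            heap.pop();
            heap.emplace(dist, ids[i]);
        }
    }
    // Drain the heap from worst to best, filling the result back to front.
    std::vector<std::pair<float, std::int64_t>> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0;) {
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}
```

The heap never holds more than k entries, so the cost is dominated by the n distance computations, which is why the scan parallelizes well.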

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128
})";
auto index = vsag::Factory::CreateIndex("brute_force", params).value();

// Build.
auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Build(base);

// Search — no index-specific knobs; pass an empty JSON object (or set `parallelism`).
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(query, /*topk=*/10, "{}").value();

A full runnable program is at examples/cpp/105_index_brute_force.cpp.

Build parameters

The minimal config consists of the three top-level fields (dtype, metric_type, dim). For most uses no index_param is needed — that is the form shown in example 105. Advanced users can pass an index_param object to enable quantization or storage tweaks:

Parameter	Type	Default	Description
`base_quantization_type`	string	`"fp32"`	`fp32`, `fp16`, `bf16`, `sq8`, `sq4`, `sq8_uniform`, `sq4_uniform`, `pq`, `pqfs`, `rabitq` — see the Quantization chapter for per-quantizer details
`use_attribute_filter`	bool	`false`	Enable attribute-based filtering (see Attribute Filter)

Note on store_raw_vector. The store_raw_vector flag is parsed by the shared InnerIndexParameter but BruteForce does not consult it when deciding whether GetRawVectorByIds is available. On BruteForce, raw-vector retrieval is enabled strictly when base_quantization_type is fp32 and either the metric is not cosine or the quantizer is configured to hold the per-vector norms (hold_molds). Setting store_raw_vector: true on BruteForce currently has no observable effect on the capability flags — use HGraph or IVF if you need a quantized index that still answers GetRawVectorByIds.

Example with sq8 quantization for memory savings while keeping linear scan semantics:

{
    "dtype": "float32",
    "metric_type": "ip",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8"
    }
}

When base_quantization_type is set to a quantizer that requires training (sq8, sq4, sq8_uniform, sq4_uniform, pq, pqfs, rabitq), Build will run the training pass on the supplied dataset before adding vectors; the resulting recall is no longer 100%. Only fp32, fp16, and bf16 skip training and preserve exact distances (modulo numeric precision).

Search parameters

BruteForce does not expose any index-specific search knobs (no ef, nprobe, etc.), but the generic IndexSearchParameter fields are honored:

Parameter	Type	Default	Description
`parallelism`	int	`1`	Split the linear scan of a single query across this many threads in the index’s internal thread pool. It applies to both `KnnSearch` and `RangeSearch`. Larger values cut single-query latency on large corpora at the cost of using more cores. Values `<= 0` are clamped to `1`.

// Single-threaded scan (default).
auto r1 = index->KnnSearch(query, topk, "{}").value();

// Use 8 threads to scan a single query in parallel.
auto r2 = index->KnnSearch(query, topk, R"({"parallelism": 8})").value();

// RangeSearch uses the same parallelism parameter.
auto r3 = index->RangeSearch(query, radius, R"({"parallelism": 8})").value();

For range search semantics, see Range Search.

Capabilities

BruteForce advertises the following capability flags (see BruteForce::InitFeatures in src/algorithm/brute_force.cpp):

Capability	Notes
`SUPPORT_BUILD` / `SUPPORT_ADD_AFTER_BUILD` / `SUPPORT_ADD_CONCURRENT`	Build once, append later, concurrent inserts.
`SUPPORT_ADD_FROM_EMPTY`	Available with non-training quantizers (`fp32`, `fp16`, `bf16`).
`SUPPORT_KNN_SEARCH` / `SUPPORT_KNN_SEARCH_WITH_ID_FILTER` / `SUPPORT_SEARCH_CONCURRENT`	Standard top-k API and id-list filters, with concurrent search.
`SUPPORT_RANGE_SEARCH` / `SUPPORT_RANGE_SEARCH_WITH_ID_FILTER`	Available with non-training quantizers (`fp32`, `fp16`, `bf16`).
`SUPPORT_DELETE_BY_ID` / `SUPPORT_DELETE_CONCURRENT`	`Remove` by id is supported and concurrency-safe.
`SUPPORT_CAL_DISTANCE_BY_ID`	Distance lookup against stored vectors (non-training quantizers only).
`SUPPORT_GET_RAW_VECTOR_BY_IDS`	Available only when `base_quantization_type` is `fp32` and either the metric is not `cosine` or the underlying quantizer holds molds (`hold_molds`). Quantized BruteForce indexes do not advertise this flag.
`SUPPORT_CHECK_ID_EXIST` / `SUPPORT_CLONE` / `SUPPORT_ESTIMATE_MEMORY` / `SUPPORT_GET_MEMORY_USAGE`	Standard introspection and lifecycle.
`SUPPORT_SERIALIZE_BINARY_SET` / `SUPPORT_SERIALIZE_FILE` / `SUPPORT_SERIALIZE_WRITE_FUNC`	Full save surface.
`SUPPORT_DESERIALIZE_BINARY_SET` / `SUPPORT_DESERIALIZE_FILE` / `SUPPORT_DESERIALIZE_READER_SET`	Full load surface. (There is no `DESERIALIZE_WRITE_FUNC` counterpart — read paths use `READER_SET` instead.)
`NEED_TRAIN`	Set when `base_quantization_type` is one of `sq8`, `sq4`, `sq8_uniform`, `sq4_uniform`, `pq`, `pqfs`, `rabitq`.

Notably not supported by BruteForce: SUPPORT_UPDATE_VECTOR_CONCURRENT, SUPPORT_UPDATE_ID_CONCURRENT, and SUPPORT_EXPORT_MODEL.

When to use BruteForce

Recall baseline. Compute the ground truth that approximate indexes are scored against (this is what the eval_performance tool does).
Tiny corpora. A few hundred to a few hundred thousand vectors, where the cost of a full scan is acceptable and you want to skip tuning altogether.
Strict-recall requirements. Use cases that cannot tolerate any approximation error.
Quantization experiments at small scale. Reuse the same scan pipeline but compare different base_quantization_type settings without the confounding effect of a graph or inverted-list structure.

For anything larger, prefer HGraph (latency-sensitive, high recall) or IVF (throughput-oriented, memory-friendly).

Quantization

Vector quantization is the central memory/recall lever in VSAG. Every index type stores vectors through a base quantizer (configured by base_quantization_type), and may keep a second precise quantizer for re-ranking (precise_quantization_type + use_reorder: true). This chapter documents each supported quantizer: what it does, what JSON parameters it takes, when it needs training, which metrics it supports, and when to choose it.

Quantization decision tree: pick a quantizer by memory budget

Storage and search pipeline

                 +---------------------+
   raw vector -->|  optional transform |   (TQ chain: pca / rom / fht / mrle)
                 +----------+----------+
                            |
                            v
                 +---------------------+
                 |   base quantizer    |   fp32 / fp16 / bf16 /
                 |                     |   sq8 / sq4 / sq8_uniform /
                 |                     |   sq4_uniform / pq / pqfs /
                 |                     |   rabitq
                 +----------+----------+
                            |
                            v
                  +-------------------+
                  |   index storage   |   (HGraph / IVF / Pyramid /
                  |                   |    BruteForce / SINDI)
                  +---------+---------+
                            |
                            v
                   graph / list walk
                            |
            +---------------+-----------------+
            |                                 |
   use_reorder: false                use_reorder: true
            |                                 |
            v                                 v
       top-K result               +---------------------+
                                  | precise quantizer   |  re-rank
                                  | (fp32 default;      |
                                  |  fp16/bf16/sq8 OK)  |
                                  +----------+----------+
                                             |
                                             v
                                        top-K result

use_reorder and precise_quantization_type are not specific to any single quantizer — they apply whenever the index supports reordering (see HGraph, IVF, Pyramid).
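The two branches at the bottom of the diagram can be illustrated with a small standalone sketch. `RerankTopK` is a hypothetical helper written for this page. For brevity it takes both distance arrays precomputed, whereas a real index would compute exact distances only for the selected candidates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Two-stage ranking sketch. approx[i] is the distance to item i under the lossy
// base quantizer, exact[i] the distance under the precise store.
// Returns the indices of the k best items after re-ranking num_candidates of them.
inline std::vector<std::size_t> RerankTopK(const std::vector<float>& approx,
                                           const std::vector<float>& exact,
                                           std::size_t num_candidates,
                                           std::size_t k) {
    assert(approx.size() == exact.size());
    num_candidates = std::min(num_candidates, approx.size());
    k = std::min(k, num_candidates);

    // Stage 1: cheap candidate selection using approximate distances.
    std::vector<std::size_t> order(approx.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(num_candidates),
                      order.end(),
                      [&](std::size_t a, std::size_t b) { return approx[a] < approx[b]; });
    order.resize(num_candidates);

    // Stage 2: re-score only the surviving candidates with exact distances.
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return exact[a] < exact[b]; });
    order.resize(k);
    return order;
}
```

Re-ranking can only reorder what stage 1 lets through: if the true nearest neighbor has a poor approximate distance and falls outside the candidate set, the precise store never sees it. That is why lossy base quantizers usually call for a larger candidate list.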

Supported quantizers at a glance

The factory in src/datacell/flatten_interface.cpp dispatches to the concrete quantizer based on the JSON type field.

`base_quantization_type`	Bits / dim (approx.)	Needs training	Lossless	Typical use
`fp32`	32	no	yes	Reference / precise reorder store
`fp16`	16	no	near-lossless	Half-precision storage; good default for high-dim float vectors
`bf16`	16	no	near-lossless	Same memory as `fp16`, wider dynamic range
`sq8`	8	yes	no	General memory-saving baseline
`sq4`	4	yes	no	Aggressive memory saving, expect recall drop without reorder
`sq8_uniform`	8	yes	no	SIMD-friendly SQ8 with global min/max
`sq4_uniform`	4	yes	no	SIMD-friendly SQ4; supports `sq4_uniform_trunc_rate`
`pq`	~`pq_bits` × `pq_dim` / `dim`	yes	no	Codebook-based, very compact
`pqfs`	4 × `pq_dim` / `dim`	yes	no	PQ FastScan — SIMD-accelerated PQ
`rabitq`	1 or HGraph x+y	yes	no	1-bit / low-bit split binary quantization, strongest compression
`tq`	depends on chain	depends on terminal quantizer	no	Transform Quantizer: prepend rotations / PCA before another quantizer

int8 and sparse are not exposed as general-purpose base_quantization_type values:

int8 is selected automatically when dtype: "int8" is used; it is not a compression mode.
sparse backs the inverted lists of SINDI and is not selectable on dense indexes.

Training requirement

Quantizers marked yes above implement the NEED_TRAIN flag and require either Build (which trains internally on the input vectors) or an explicit Train call before Add. See Build and Train for the full lifecycle.

For HGraph the training data is the base vectors passed to Build; for IVF the centroids are trained first and the residuals fed to the configured base quantizer.

Metric compatibility

All quantizers documented here support the three dense metrics (l2 / ip / cosine). For cosine, the index normalizes vectors before quantization, so the underlying quantizer never sees the original magnitude. A few practical notes:

pq / pqfs perform their distance lookup tables per subspace; very low pq_dim (≤ 4) on ip / cosine is more sensitive to anisotropy than l2.
rabitq works best when input vectors are decorrelated — either turn on rabitq_use_fht / rabitq_pca_dim, or wrap with a tq chain like "pca, rom, rabitq".
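The statement that cosine is handled by normalizing first can be checked with a few lines of standalone code. The helper names below are illustrative and not VSAG functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

inline float InnerProduct(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Scale a vector to unit length; an all-zero vector is returned unchanged.
inline std::vector<float> Normalize(std::vector<float> x) {
    const float norm = std::sqrt(InnerProduct(x, x));
    if (norm > 0.0f) {
        for (float& v : x) v /= norm;
    }
    return x;
}

// Cosine similarity expressed as an inner product of normalized vectors.
inline float Cosine(const std::vector<float>& a, const std::vector<float>& b) {
    return InnerProduct(Normalize(a), Normalize(b));
}
```

Because every stored vector has unit length after this step, a quantizer trained on the normalized data only has to cover coordinates in [-1, 1].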

Choosing a quantizer

A pragmatic decision tree:

Need exact distances or a precise reorder store? Use fp32.
Just want to halve memory with negligible recall loss? Use fp16 (or bf16 if the data has a wide dynamic range, e.g. unnormalized embeddings).
Want ~4× memory saving and willing to enable reorder? Use sq8 (or sq8_uniform for better SIMD throughput on l2 / ip).
Memory-tight and willing to lose more recall before reorder? Use sq4_uniform.
High-dim vectors, want strong compression with codebooks? Use pq, or pqfs when the platform supports the SIMD path.
Maximum compression (1-bit) and willing to pay reorder cost? Use rabitq, ideally with rabitq_use_fht: true or a tq chain.

For every lossy quantizer above, enabling use_reorder: true with precise_quantization_type: "fp32" is the standard way to recover recall at the cost of extra memory; see the HGraph parameter table for the exact behavior.

Where quantization is exposed

Not every index exposes every parameter as an external key. As of today:

HGraph exposes the richest set: base_quantization_type, precise_quantization_type, use_reorder, base_pq_dim, rabitq_pca_dim, rabitq_bits_per_dim_query, rabitq_bits_per_dim_base, rabitq_bits_per_dim_precise, rabitq_error_rate, rabitq_use_fht, sq4_uniform_trunc_rate, tq_chain (see src/algorithm/hgraph.cpp).
IVF exposes base_quantization_type, base_pq_dim, the common reorder keys, and the RabitQ tuning keys rabitq_pca_dim, rabitq_bits_per_dim_query, rabitq_bits_per_dim_base, rabitq_version, rabitq_error_rate, and rabitq_use_fht.
Pyramid exposes base_quantization_type, base_pq_dim, the common reorder keys, and the RabitQ PCA, base/query bit, and FHT keys.
BruteForce exposes base_quantization_type and the common reorder keys; some tunables (e.g. tq_chain) are wired internally but not exposed as external keys today.

Refer to each index page for its full parameter list.

FP32 (Baseline)

fp32 stores every coordinate as a 32-bit IEEE-754 float — the same layout as the input vectors. It is the only fully lossless option in VSAG and serves as the reference baseline that all other quantizers are compared against.

Implementation: src/quantization/fp32_quantizer.cpp, parameter file fp32_quantizer_parameter.cpp.

When to use it

Reorder / precise store. precise_quantization_type: "fp32" is the default precise store when use_reorder: true; the graph walk uses a cheap base quantizer and the top-K candidates are re-scored exactly against the fp32 copy.
Reference / ground truth. Building an index with base_quantization_type: "fp32" gives the highest possible recall for that index type and is the standard baseline for benchmarking other quantizers (docs/docs/en/src/resources/eval.md).
Small datasets where memory is not the bottleneck.
BruteForce with raw-vector retrieval. SUPPORT_GET_RAW_VECTOR_BY_IDS is only advertised when base_quantization_type is fp32 and the metric allows it (src/algorithm/brute_force.cpp).

Memory cost

4 × dim bytes per vector for the codes alone. When fp32 is used as a precise store on top of a base quantizer, the per-vector cost is base codes + 4 × dim.

Parameters

fp32 has no quantizer-specific JSON parameters.

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "fp32",
        "max_degree": 32,
        "ef_construction": 300
    }
}

Training

Not required. fp32 does not set NEED_TRAIN.

Metric compatibility

l2, ip, cosine — all supported with no special handling.

Quantization overview
HGraph index — see precise_quantization_type
Memory Management

Half-Precision (FP16 / BF16)

fp16 and bf16 store each coordinate in 16 bits instead of 32, cutting code memory in half with near-lossless accuracy. They have no quantizer-specific JSON parameters; the only difference is the bit layout of the float format itself.

FP32 vs FP16 vs BF16 bit layout: sign / exponent / mantissa widths

Implementation: src/quantization/scalar_quantization/half_precision_quantizer.cpp with the type traits at half_precision_traits.h. Runnable example: examples/cpp/321_index_fp16_hgraph.cpp.

FP16 vs BF16 at a glance

Format	Sign	Exponent	Mantissa	Effective range	Precision
`fp16`	1	5	10	~±6.55e4	~3 decimal digits
`bf16`	1	8	7	same as `fp32` (~±3.4e38)	~2 decimal digits

Practical implications:

fp16 keeps more mantissa bits — better precision for normalized embeddings whose values lie roughly in [-1, 1]. Standard choice for cosine-normalized vectors.
bf16 keeps the full fp32 exponent range, which makes it safer for raw, un-normalized features (e.g. weighted sums, accumulator-like embeddings). It has three fewer mantissa bits than fp16, so it is less precise for any value that fp16 can represent in its normal range.

If you do not know which one to pick, start with fp16 for normalized embeddings and bf16 for unnormalized or wide-range data.
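To make the bf16 row of the table concrete: bf16 is the upper 16 bits of an fp32 value, so a conversion can be sketched with plain bit operations. The functions below are an illustration, not the code in half_precision_quantizer.cpp. They assume IEEE-754 32-bit floats, use round-to-nearest-even, and do not handle NaN. An fp16 conversion needs exponent re-biasing and is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit IEEE-754 float");

// fp32 -> bf16: keep sign, 8 exponent bits and the top 7 mantissa bits.
inline std::uint16_t FloatToBF16(float value) {
    assert(!std::isnan(value));  // NaN handling is omitted in this sketch
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    // Round to nearest even on the 16 bits that are about to be dropped.
    const std::uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + bias) >> 16);
}

// bf16 -> fp32: the dropped mantissa bits come back as zeros.
inline float BF16ToFloat(std::uint16_t code) {
    const std::uint32_t bits = static_cast<std::uint32_t>(code) << 16;
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

A value such as 131072.0f (2^17) is above the fp16 maximum of about 65504 but survives a bf16 round trip exactly, while 3.14159f comes back as 3.140625f, which shows the coarser mantissa.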

When to use it

Default “drop-in” memory saving on top of an fp32 baseline. Recall loss is typically below 1% on standard benchmarks (SIFT, GIST, Glove, sentence embeddings).
As a precise reorder store that is half the size of fp32: precise_quantization_type: "fp16" or "bf16" with use_reorder: true.
High-dim float vectors where 32-bit storage is the bottleneck.

Memory cost

2 × dim bytes per vector for the codes alone.

Parameters

Neither fp16 nor bf16 has quantizer-specific JSON parameters.

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 768,
    "index_param": {
        "base_quantization_type": "fp16",
        "max_degree": 32,
        "ef_construction": 300
    }
}

Swap "fp16" for "bf16" to switch formats. The input dtype stays "float32": the quantizer converts on the fly.

Training

Not required. Neither fp16 nor bf16 sets NEED_TRAIN.

Metric compatibility

l2, ip, cosine — all supported. cosine is implemented by normalizing inputs before storing them at 16-bit precision.

When not to use it

When you also need a memory-aggressive base quantizer such as sq8 or pq — those already pull the storage well below 2 bytes/dim.
When you need exact distances (use fp32).

Quantization overview
HGraph index — precise_quantization_type table
Memory Management

Scalar Quantization (SQ4 / SQ8)

sq8 and sq4 are per-dimension scalar quantizers: each coordinate is mapped from float32 to an 8-bit (sq8) or 4-bit (sq4) integer using a per-dimension [min, max] range learned during training. They share the same implementation, parameterized by bit width, in src/quantization/scalar_quantization/scalar_quantizer.cpp and scalar_quantizer_parameter.h.

For SIMD-friendlier variants with a global [min, max], see Scalar Uniform.

Scalar Quantization: map a coordinate into one of 2^b bins on its per-dim range

SQ4 vs SQ8 at a glance

Type	Bits / dim	Memory vs fp32	Typical accuracy	Notes
`sq8`	8	~1/4	minor recall loss	General memory-saving baseline
`sq4`	4	~1/8	noticeable loss without reorder	Aggressive compression; pair with `use_reorder: true`

The training is per-dimension min/max, so heavy-tailed coordinates can waste code bits. If your data is anisotropic, consider either Scalar Uniform or a Transform Quantizer chain like "rom, sq8_uniform" to rotate first.

Memory cost (codes only)

sq8: dim bytes per vector.
sq4: ceil(dim / 2) bytes per vector.

There is also a small per-dimension range table (8 × dim bytes, amortized across all vectors).

Parameters

Neither sq8 nor sq4 has quantizer-specific JSON parameters today (scalar_quantizer_parameter.h:36-58). The bit width is selected by the type string alone.

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 32,
        "ef_construction": 300,
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
}

Replace "sq8" with "sq4" for 4-bit codes.

Training

NEED_TRAIN is set. Training collects per-dimension min / max from a sample of the input vectors. Calling Build(base) trains internally; on indexes that require an explicit Train (some IVF flows), call it before Add. See Build and Train.
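The train / encode / decode cycle for an 8-bit per-dimension scalar quantizer can be sketched as follows. This is a simplified standalone model, not the implementation in scalar_quantizer.cpp: the struct and function names are made up, training scans every vector instead of a sample, and no SIMD is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SQ8Model {
    std::vector<float> min;   // per-dimension lower bound
    std::vector<float> diff;  // per-dimension (max - min)
};

// "Training": collect per-dimension min / max over n row-major vectors.
inline SQ8Model TrainSQ8(const float* data, std::size_t n, std::size_t dim) {
    assert(n > 0);
    SQ8Model model;
    model.min.assign(data, data + dim);
    std::vector<float> max(data, data + dim);
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t d = 0; d < dim; ++d) {
            model.min[d] = std::min(model.min[d], data[i * dim + d]);
            max[d] = std::max(max[d], data[i * dim + d]);
        }
    }
    model.diff.resize(dim);
    for (std::size_t d = 0; d < dim; ++d) model.diff[d] = max[d] - model.min[d];
    return model;
}

// Encode one vector: map each coordinate to a bin index in [0, 255].
inline std::vector<std::uint8_t> EncodeSQ8(const SQ8Model& m, const float* x) {
    std::vector<std::uint8_t> codes(m.min.size());
    for (std::size_t d = 0; d < codes.size(); ++d) {
        float t = m.diff[d] > 0.0f ? (x[d] - m.min[d]) / m.diff[d] : 0.0f;
        t = std::min(1.0f, std::max(0.0f, t));  // clamp values outside the trained range
        codes[d] = static_cast<std::uint8_t>(std::lround(t * 255.0f));
    }
    return codes;
}

// Decode: reconstruct an approximation of the original coordinates.
inline std::vector<float> DecodeSQ8(const SQ8Model& m,
                                    const std::vector<std::uint8_t>& codes) {
    std::vector<float> x(codes.size());
    for (std::size_t d = 0; d < codes.size(); ++d) {
        x[d] = m.min[d] + m.diff[d] * (static_cast<float>(codes[d]) / 255.0f);
    }
    return x;
}
```

The reconstruction error in each dimension is bounded by about half a bin, that is diff[d] / 510, which is why a dimension with a very wide trained range loses more precision than a narrow one.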

Metric compatibility

l2, ip, cosine — all supported. Distances are computed by decoding the integer codes back to per-dimension scaled floats.

When to choose `sq8` vs `sq4`

sq8: default memory-saving choice for graph indexes (HGraph, Pyramid) when ~4× memory reduction is the target. Recall loss is small enough that use_reorder is often optional, but enabling it with precise_quantization_type: "fp32" is the safest setup.
sq4: choose when memory is tight and you can afford a precise reorder store. Almost always pair with use_reorder: true.
Pick sq*_uniform instead when the data is roughly homogeneous across dimensions; the uniform variants have higher SIMD throughput.
For heavy-tailed / anisotropic data, prefer a Transform Quantizer chain that rotates before quantization.

Scalar Uniform (SQ4 / SQ8 Uniform)
Transform Quantizer
Quantization overview

Scalar Quantization Uniform (SQ4 / SQ8 Uniform)

sq8_uniform and sq4_uniform are scalar quantizers like sq8 / sq4, except they learn a single global [min, max] range that applies to every dimension. This trade-off — slightly less adaptive per dimension, but a much simpler decode path — unlocks SIMD code that runs significantly faster on l2 and ip distance kernels and keeps the code layout tighter.

Uniform (global range) vs per-dimension Scalar Quantization

Implementation: src/quantization/scalar_quantization/sq8_uniform_quantizer.cpp, src/quantization/scalar_quantization/sq4_uniform_quantizer.cpp.

Why it is fast: distances stay in the integer domain

This is the core reason to prefer sq*_uniform over sq* whenever it applies. Because every dimension shares one (min, max) pair, the affine decode x = min + code · (max - min) / (2^b - 1) has the same scale and offset for every coordinate. That has three consequences in the hot path:

The query is encoded once with the same global (min, max) into a uint8 (or packed nibble) buffer, in ProcessQueryImpl (src/quantization/scalar_quantization/sq8_uniform_quantizer.cpp:179).
Each base vector code is never decoded back to fp32. The kernel SQ8UniformComputeCodesIP(uint8_t* q, uint8_t* x, dim) / SQ4UniformComputeCodesIP(...) reads both operands as raw integer codes and does the dot product on uint8 / packed nibble lanes using AVX-512 / AMX (or NEON on ARM), one cache-line at a time. There is no per-element fp dequantization in the inner loop.
The single shared scale factor and offset are applied once per pair, after the integer reduction, to recover the fp distance. Some metric-specific corrections (a per-vector norm or sum) are also added outside the loop; see the trailing metadata noted in sq8_uniform_quantizer.cpp:200 and the SQ8UniformComputeCodesIPBatch batch kernel.

In the per-dimension sq* quantizers, each coordinate has its own (min_i, max_i) so the kernel either has to multiply by a per-dim scale table inside the loop or decode at least one operand back to fp first. Skipping that work is what makes uniform variants significantly faster at the same recall.
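The algebra behind the integer-domain kernel is short. With a shared offset lo and scale s, a decoded coordinate is lo + s·a_i, so the inner product of two coded vectors expands to dim·lo² + lo·s·(Σa_i + Σb_i) + s²·Σa_i·b_i. The scalar sketch below implements that identity. It is not the SQ8UniformComputeCodesIP kernel: the struct is hypothetical, there is no SIMD, and the code sums are recomputed in the loop where a real layout would store them once per vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct UniformSQ8 {
    float lo;     // global minimum shared by every dimension
    float scale;  // (hi - lo) / 255

    std::vector<std::uint8_t> Encode(const std::vector<float>& x) const {
        std::vector<std::uint8_t> codes(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            float t = (x[i] - lo) / scale;
            t = std::min(255.0f, std::max(0.0f, t));
            codes[i] = static_cast<std::uint8_t>(std::lround(t));
        }
        return codes;
    }

    // Inner product of two encoded vectors. The loop touches integers only;
    // the shared scale and offset are applied once, after the reduction.
    float InnerProduct(const std::vector<std::uint8_t>& a,
                       const std::vector<std::uint8_t>& b) const {
        assert(a.size() == b.size());
        std::uint64_t dot = 0, sum_a = 0, sum_b = 0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            dot += static_cast<std::uint32_t>(a[i]) * b[i];
            sum_a += a[i];
            sum_b += b[i];
        }
        const float dim = static_cast<float>(a.size());
        return dim * lo * lo + lo * scale * static_cast<float>(sum_a + sum_b) +
               scale * scale * static_cast<float>(dot);
    }
};
```

With per-dimension ranges the scale would differ for every i and could not be pulled out of the sum, which is the extra work the uniform variants avoid.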

When to use it

HGraph / IVF / Pyramid hot paths. When the bottleneck is the base-quantizer distance computation, sq8_uniform / sq4_uniform are almost always faster than their non-uniform counterparts at comparable recall.
Data with similar coordinate ranges across dimensions. Normalized embeddings (cosine), or vectors that have already been rotated (e.g. through a Transform Quantizer chain like "rom, sq8_uniform" or "fht, sq8_uniform") are the ideal inputs.
As the terminal quantizer of a tq chain. The most common chain is "pca, rom, sq8_uniform"; see example 501.

SQ4 uniform vs SQ8 uniform

Type	Bits / dim	Memory vs fp32	Typical accuracy
`sq8_uniform`	8	~1/4	minor recall loss
`sq4_uniform`	4	~1/8	needs reorder for high recall

Parameters

Key	Type	Default	Applies to	Meaning
`sq4_uniform_trunc_rate`	float	`0.05`	`sq4_uniform` only	Symmetric truncation rate for outliers (`src/quantization/scalar_quantization/sq4_uniform_quantizer_parameter.h:39`). Higher values clip more extreme coordinates, reducing range loss for the bulk of the data at the cost of clipping the tails.

sq8_uniform has no quantizer-specific JSON parameters.

When using HGraph, sq4_uniform_trunc_rate is exposed as a top-level key and mapped into the nested quantization params (src/algorithm/hgraph.cpp:409-416).

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq4_uniform",
        "sq4_uniform_trunc_rate": 0.05,
        "max_degree": 32,
        "ef_construction": 300,
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
}

Set "base_quantization_type": "sq8_uniform" and drop the trunc_rate key for the 8-bit variant.

Training

NEED_TRAIN is set. Training estimates one global [min, max] across all dimensions (with optional truncation for sq4_uniform). Build will perform training internally.
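One way to picture range estimation with truncation is a quantile cut on the pooled coordinate values. The sketch below is an assumption-laden illustration, not VSAG's training code: `TruncatedRange` is a hypothetical helper, and it assumes the rate is applied to each tail separately, which may differ from how sq4_uniform_trunc_rate is interpreted internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {lower, upper} after discarding a trunc_rate fraction of the values
// at each tail. ASSUMPTION: per-tail truncation; VSAG's exact rule may differ.
inline std::pair<float, float> TruncatedRange(std::vector<float> values,
                                              float trunc_rate) {
    assert(!values.empty());
    assert(trunc_rate >= 0.0f && trunc_rate < 0.5f);
    std::sort(values.begin(), values.end());
    const std::size_t skip =
        static_cast<std::size_t>(trunc_rate * static_cast<float>(values.size()));
    return {values[skip], values[values.size() - 1 - skip]};
}
```

With only 16 bins available in a 4-bit code, letting a single outlier stretch the range wastes most of the bins on empty space. Clipping the tails keeps the bins where the bulk of the data lies.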

Metric compatibility

l2, ip, cosine — all supported. cosine normalizes before quantizing, which is also what makes uniform scaling close to optimal for that metric.

Choosing between uniform and non-uniform

Data is normalized (cosine or pre-normalized l2) → uniform.
Data has very heterogeneous per-dimension ranges (e.g. mixed feature blocks) → start with non-uniform sq*, or use uniform behind a rotation transformer ("rom, sq*_uniform").
Throughput matters more than the last bit of recall → uniform.

Scalar Quantization (SQ4 / SQ8)
Transform Quantizer
Quantization overview

Product Quantization (PQ)

Product Quantization splits a vector into pq_dim equal-sized subvectors and quantizes each one independently against a small learned codebook of 2^pq_bits centroids. The stored code is then pq_dim × pq_bits bits per vector, versus 32 × dim bits for fp32 (16× smaller for pq_dim = 32, pq_bits = 8, dim = 128). Distance computations use precomputed lookup tables (LUT) per query.

Product Quantization: sub-vector split and codebook lookup

Implementation: src/quantization/product_quantization/product_quantizer.cpp, parameter file product_quantizer_parameter.cpp.

When to use it

High-dim float vectors (≥ 256 dim) where sq8 is still too large.
Memory-tight, accuracy-acceptable workloads where ~16× compression vs fp32 is required.
Combined with use_reorder: true and a small fp16/fp32 precise store, PQ is the standard “compressed graph index” recipe at large scale.

For higher SIMD throughput at pq_bits = 4, see PQ FastScan.

Memory cost (codes only)

ceil(pq_dim × pq_bits / 8) bytes per vector for the codes, plus a small codebook stored once (pq_dim × 2^pq_bits × subspace_dim × 4 bytes). For typical settings (pq_dim = 32, pq_bits = 8, dim = 128):

code size = 32 × 8 / 8 = 32 bytes per vector (vs 128 × 4 = 512 for fp32 → 16× smaller).
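The size formulas above are easy to turn into helpers for capacity planning. The function names are made up for this sketch, and the numbers cover codes and codebook only, not graph or id storage.

```cpp
#include <cassert>
#include <cstddef>

// Code bytes per vector: ceil(pq_dim * pq_bits / 8).
inline std::size_t PQCodeBytes(std::size_t pq_dim, std::size_t pq_bits) {
    assert(pq_bits >= 1 && pq_bits <= 8);
    return (pq_dim * pq_bits + 7) / 8;
}

// Codebook bytes, stored once: pq_dim * 2^pq_bits * subspace_dim * sizeof(float).
inline std::size_t PQCodebookBytes(std::size_t dim, std::size_t pq_dim,
                                   std::size_t pq_bits) {
    assert(pq_dim > 0 && dim % pq_dim == 0);  // pq_dim must divide dim
    const std::size_t subspace_dim = dim / pq_dim;
    return pq_dim * (std::size_t{1} << pq_bits) * subspace_dim * sizeof(float);
}

// Raw fp32 bytes per vector, for comparison.
inline std::size_t FP32Bytes(std::size_t dim) { return dim * sizeof(float); }
```

For the settings above, `PQCodebookBytes(128, 32, 8)` is 131072 bytes (128 KiB). That cost is paid once, so it is negligible for large corpora but can dominate for very small ones.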

Parameters

Key	Type	Default	Meaning
`pq_dim`	int	`1`	Number of subvectors. Must divide `dim`. Larger values give finer quantization at the cost of more codebooks and larger codes (`product_quantizer_parameter.h:38`).
`pq_bits`	int	`8`	Bits per subvector (1–8). With `8`, each subvector is one byte. Most reliable with `8`; see PQ FastScan for the 4-bit SIMD variant.

On HGraph these are exposed as the top-level keys base_pq_dim and pq_bits (src/algorithm/hgraph.cpp:465-472).

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "pq",
        "base_pq_dim": 32,
        "max_degree": 32,
        "ef_construction": 300,
        "use_reorder": true,
        "precise_quantization_type": "fp16"
    }
}

Training

NEED_TRAIN is set. Training runs k-means per subspace to learn the 2^pq_bits centroids; this is typically the most expensive training step of any built-in quantizer. Use a training sample of at least 256 × 2^pq_bits vectors per subspace for stable codebooks; Build(base) samples from the input automatically.

Metric compatibility

l2, ip, cosine — all supported. Query-time distance is computed via a per-subspace LUT: for l2 it is squared L2 between the query subvector and each centroid; for ip it is the dot product. Cosine reduces to ip on pre-normalized vectors.
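The lookup-table mechanism for l2 can be sketched with a toy codebook. Everything below is illustrative: the struct layout and function names are assumptions made for this example, the codebook is supplied by hand instead of being trained with k-means, and codes are limited to one byte per subspace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy codebook: pq_dim subspaces, num_centroids centroids each, sub_dim floats per centroid.
struct PQCodebook {
    std::size_t pq_dim;
    std::size_t sub_dim;           // dim / pq_dim
    std::size_t num_centroids;     // 2^pq_bits, at most 256 here
    std::vector<float> centroids;  // laid out as [subspace][centroid][sub_dim]

    const float* Centroid(std::size_t m, std::size_t c) const {
        return centroids.data() + (m * num_centroids + c) * sub_dim;
    }
};

inline float SquaredL2(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Per-query lookup table: lut[m * num_centroids + c] is the squared L2 distance
// between the query's m-th subvector and centroid c of subspace m.
inline std::vector<float> BuildL2Table(const PQCodebook& cb, const float* query) {
    std::vector<float> lut(cb.pq_dim * cb.num_centroids);
    for (std::size_t m = 0; m < cb.pq_dim; ++m) {
        for (std::size_t c = 0; c < cb.num_centroids; ++c) {
            lut[m * cb.num_centroids + c] =
                SquaredL2(query + m * cb.sub_dim, cb.Centroid(m, c), cb.sub_dim);
        }
    }
    return lut;
}

// Encode: for each subspace, keep the index of the nearest centroid.
inline std::vector<std::uint8_t> Encode(const PQCodebook& cb, const float* x) {
    assert(cb.num_centroids <= 256);
    const std::vector<float> lut = BuildL2Table(cb, x);
    std::vector<std::uint8_t> codes(cb.pq_dim);
    for (std::size_t m = 0; m < cb.pq_dim; ++m) {
        std::size_t best = 0;
        for (std::size_t c = 1; c < cb.num_centroids; ++c) {
            if (lut[m * cb.num_centroids + c] < lut[m * cb.num_centroids + best]) best = c;
        }
        codes[m] = static_cast<std::uint8_t>(best);
    }
    return codes;
}

// Query-time distance: one table read per subspace, no decoding of the code.
inline float TableDistance(const PQCodebook& cb, const std::vector<float>& lut,
                           const std::vector<std::uint8_t>& codes) {
    float sum = 0.0f;
    for (std::size_t m = 0; m < cb.pq_dim; ++m) {
        sum += lut[m * cb.num_centroids + codes[m]];
    }
    return sum;
}
```

The table is built once per query at a cost of pq_dim × 2^pq_bits subvector distances, after which each stored vector costs only pq_dim table reads. For ip the same structure applies with a dot product in place of `SquaredL2`.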

Tips

pq_dim should divide dim evenly. Common ratios are dim/4 or dim/8.
Very small pq_dim (e.g. dim/16) produces very compact codes but loses recall fast; combine with reorder.
For anisotropic data, a rotation transformer in front improves PQ recall noticeably: use Transform Quantizer with a chain like "rom, pq".

PQ FastScan
Transform Quantizer
HGraph index
Quantization overview

PQ FastScan

pqfs is a SIMD-accelerated variant of Product Quantization that fixes pq_bits = 4 and uses a memory layout designed for the AVX-2 / AVX-512 “FastScan” lookup-table kernel. At the cost of being 4-bit only, it delivers significantly higher distance-computation throughput.

PQ FastScan: 16-vector 4-bit interleaved block and SIMD LUT lookup

Implementation: src/quantization/product_quantization/pq_fastscan_quantizer.cpp, parameter file pq_fastscan_quantizer_parameter.cpp.

When to use it

The platform has AVX-2 (and ideally AVX-512); the FastScan kernel is the main reason to choose pqfs over pq.
Search throughput, not just memory, matters.
4-bit subspace codebooks (16 centroids per subvector) are sufficient for your recall target — typically yes when combined with reorder.

If your platform does not advertise the required SIMD width, fall back to plain pq.

Memory cost (codes only)

ceil(pq_dim / 2) = (pq_dim + 1) / 2 bytes per vector (integer division); both even and odd pq_dim are supported (src/quantization/product_quantization/pq_fastscan_quantizer.cpp:41). Codebooks: pq_dim × 16 × subspace_dim × 4 bytes, which is 16× smaller than 8-bit pq at the same pq_dim because each subspace has 16 centroids instead of 256.

Parameters

Key	Type	Default	Meaning
`pq_dim`	int	`1`	Number of subvectors. Must divide `dim`. `pq_bits` is fixed to 4 internally and not configurable (`pq_fastscan_quantizer_parameter.cpp:28-33`).

Exposed on HGraph as base_pq_dim (src/algorithm/hgraph.cpp:465-472).

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "pqfs",
        "base_pq_dim": 32,
        "max_degree": 32,
        "ef_construction": 300,
        "use_reorder": true,
        "precise_quantization_type": "fp16"
    }
}

Training

NEED_TRAIN is set. Trains 16-centroid codebooks per subspace; cheaper than the 256-centroid training in pq.

Metric compatibility

l2, ip, cosine: same coverage as pq. The LUT layout is metric-specific but transparently handled by the quantizer.

Tips

pq_dim should be a multiple of the SIMD-batch width the kernel expects (the implementation uses 32 internally on AVX-512). When in doubt, choose pq_dim ∈ {32, 64, 96, 128}.
The benefit over pq is throughput at the same recall, not memory (4-bit codes are inherently smaller, but pq with pq_bits = 4 would match).
For maximum recall recovery, pair with use_reorder: true and an fp16 or fp32 precise store.

Product Quantization (PQ)
HGraph index
Transform Quantizer
Quantization overview

RaBitQ

rabitq is VSAG’s binary / low-bit quantizer. In its default mode each coordinate is encoded with 1 bit, giving the highest compression ratio of any built-in quantizer. On HGraph, an x+y split mode stores low-bit base codes as x filter bits plus y supplement bits, so graph traversal can use only the filter code and re-ranking can fetch only the supplement bits it needs.

RaBitQ: encode each coordinate by its sign relative to a random hyperplane

Implementation: src/quantization/rabitq_quantization/rabitq_quantizer.cpp, parameter file rabitq_quantizer_parameter.cpp. For the complete HGraph split layout, lower-bound formula, and IO modes, see RaBitQ x+y Split.

When to use it

Maximum compression. At 1 bit per coordinate a code takes ceil(dim / 8) bytes, which is 32× smaller than fp32.
High-dim embeddings where rotation + binarization preserves enough geometry for nearest-neighbor search.
Combined with a precise reorder store (fp16 / fp32) — the standard recipe is “RaBitQ + reorder”, because the binary distance is noisy on its own.

For best accuracy, also enable rabitq_use_fht: true or wrap with a Transform Quantizer chain such as "pca, rom, rabitq".

Memory cost (codes only)

rabitq_bits_per_dim_base = 1: ceil(dim / 8) bytes per vector. With dim = 768 that is 96 bytes (vs 3072 for fp32 → 32× smaller).
rabitq_bits_per_dim_base = x plus rabitq_bits_per_dim_precise = y on HGraph: split mode stores roughly (x + y) * dim / 8 bytes per vector for the RaBitQ code bytes. For example, 3+5 is about dim bytes per vector.

Parameters

Key	Type	Default	Meaning
`pca_dim`	int	`0` (= input dim)	Optional PCA preprocessing dimension applied inside RaBitQ. `0` means no PCA reduction (`rabitq_quantizer_parameter.cpp:30-32`).
`rabitq_bits_per_dim_query`	int	`32`	Bits per dimension used to encode the query during search. Allowed values: `4` or `32` (`rabitq_quantizer_parameter.cpp:38-43`).
`rabitq_bits_per_dim_base`	int	`1`	In standard RaBitQ, bits per dimension for the stored base code. In HGraph `x+y` split mode, this external key means `x`, the filter bits used during graph traversal. Allowed range `[1, 8]`.
`rabitq_bits_per_dim_precise`	int	unset	HGraph-only split-mode key. When present with `base_quantization_type: "rabitq"` and `precise_quantization_type: "rabitq"`, this means `y`, the supplement bits used for reorder/full-distance refinement. The sum `x + y` must be `<= 8`.
`rabitq_error_rate`	float	`1.9`	Default lower-bound error multiplier for HGraph split search; must be finite and positive. It can be overridden per search under the `hgraph` object.
`use_fht`	bool	`false`	If `true`, applies a Fast Hadamard Transform rotation before binarization. Improves accuracy on anisotropic data with cheap O(dim log dim) cost (`rabitq_quantizer_parameter.cpp:76-78`).
`fast_encode_rabitq`	bool	`true`	For stored codes wider than one bit, use CAQ-based fast encoding. Set to `false` to retain the exact RaBitQ encoder. The setting is ignored for one-bit codes.
`fast_encode_rabitq_rounds`	int	`6`	Number of CAQ coordinate-adjustment rounds. Allowed range: `[1, 32]`. Each coordinate moves by at most one level per round.

When fast_encode_rabitq is enabled, multi-bit RaBitQ uses an LVQ initialization followed by a fixed number of coordinate-adjustment rounds. This reduces code selection from approximately O(2^B * dim * log(dim)) to O(rounds * dim) while keeping the existing code layout and query estimator. The implementation follows the CAQ component of SAQ; when measuring the quality/speed trade-off on a new dataset, build once with fast_encode_rabitq set to false to get the exact encoder as a baseline. These settings apply only at build time and do not affect index loading compatibility. VSAG uses a clean-room implementation and does not depend on the Apache-2.0 licensed SAQ reference repository.

Index pages expose RaBitQ settings as top-level index_param keys:

HGraph: rabitq_pca_dim, rabitq_bits_per_dim_query, rabitq_bits_per_dim_base, rabitq_bits_per_dim_precise, rabitq_error_rate, and rabitq_use_fht.
IVF: rabitq_pca_dim, rabitq_bits_per_dim_query, rabitq_bits_per_dim_base, rabitq_version, rabitq_error_rate, and rabitq_use_fht.
Pyramid: the PCA, base/query bit, and FHT keys for its base quantizer.

The rabitq_use_fht key is an index-level alias for the quantizer’s internal use_fht key and is rewritten by the index layer. fast_encode_rabitq and fast_encode_rabitq_rounds are available on HGraph, IVF, and Pyramid and are propagated to both base and precise RaBitQ quantizers.

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 768,
    "index_param": {
        "base_quantization_type": "rabitq",
        "rabitq_use_fht": true,
        "rabitq_pca_dim": 0,
        "rabitq_bits_per_dim_base": 1,
        "rabitq_bits_per_dim_query": 32,
        "max_degree": 32,
        "ef_construction": 300,
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
}

Swap to the higher-accuracy x+y split mode by setting both base and precise quantization to RaBitQ and providing rabitq_bits_per_dim_precise. HGraph then automatically selects the split datacell. In the example below, traversal uses x = 3 filter bits and reorder reads only y = 5 supplement bits:

{
    "base_quantization_type": "rabitq",
    "precise_quantization_type": "rabitq",
    "rabitq_bits_per_dim_base": 3,
    "rabitq_bits_per_dim_precise": 5,
    "rabitq_use_fht": true
}

Training

NEED_TRAIN is set. Training learns the rotation and per-dimension statistics that make the 1-bit encoding well-balanced. The optional FHT rotation is fixed (not learned), so it adds no extra training cost; PCA preprocessing (when pca_dim > 0) trains a projection matrix.

Metric compatibility

l2, ip, cosine — all supported. The binary distance kernel is a popcount over XORed code words; for ip / cosine the implementation also tracks a residual norm so the inner-product estimate is unbiased.
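The popcount-over-XOR step mentioned above can be sketched in isolation. This is an illustration, not VSAG's kernel: it covers only sign binarization and bit counting, leaves out the rotation, norms, and estimator corrections, and uses hypothetical helper names.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: sign-binarize into 64-bit words, then count differing bits.

// Coordinate i maps to bit (i % 64) of word (i / 64); a positive value sets the bit.
inline std::vector<std::uint64_t> SignBits(const std::vector<float>& v) {
    std::vector<std::uint64_t> words((v.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > 0.0f) {
            words[i / 64] |= (std::uint64_t{1} << (i % 64));
        }
    }
    return words;
}

// Number of coordinates whose sign bits differ between two codes.
inline std::size_t HammingDistance(const std::vector<std::uint64_t>& a,
                                   const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size());
    std::size_t diff = 0;
    for (std::size_t w = 0; w < a.size(); ++w) {
        diff += std::bitset<64>(a[w] ^ b[w]).count();  // popcount of the XOR
    }
    return diff;
}
```

`std::bitset<64>::count()` is used here for portability; a production kernel would typically rely on a hardware popcount instruction.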

Tips

Always enable reorder unless you have validated that 1-bit recall is acceptable on your data. use_reorder: true + precise_quantization_type: "fp32" is the safe default.
Rotate first. For un-normalized data, set rabitq_use_fht: true or use a tq chain that includes rom / fht.
Split mode for accuracy. HGraph x+y split keeps an x-bit fast path for graph traversal and adds y supplement bits for re-ranking; expect significantly higher recall than pure 1-bit when using more total bits.

Transform Quantizer
HGraph index
RaBitQ x+y Split
Quantization overview

RaBitQ x+y Split

RaBitQ x+y split is an HGraph storage and search mode for low-bit base codes. Each vector is divided into two records:

x filter bits are read during graph traversal and lower-bound filtering.
y supplement bits are fetched only for candidates that reach reorder.
The final reorder distance uses all x+y bits.

This layout keeps the traversal record small while retaining a higher-precision RaBitQ distance for final ranking. It also allows the filter record to stay in memory while the colder supplement record is stored on disk.

Enable split mode

HGraph selects split mode when both quantization types are rabitq and rabitq_bits_per_dim_precise is present:

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 960,
    "index_param": {
        "base_quantization_type": "rabitq",
        "precise_quantization_type": "rabitq",
        "use_reorder": true,
        "rabitq_bits_per_dim_query": 32,
        "rabitq_bits_per_dim_base": 3,
        "rabitq_bits_per_dim_precise": 5,
        "rabitq_error_rate": 1.9,
        "max_degree": 64,
        "ef_construction": 400
    }
}

The relevant parameters are:

Parameter	Meaning
`base_quantization_type`	Must be `"rabitq"`.
`precise_quantization_type`	Must also be `"rabitq"` to select split mode.
`rabitq_bits_per_dim_base`	`x`, the number of filter bits read during traversal.
`rabitq_bits_per_dim_precise`	`y`, the number of supplement bits fetched during reorder.
`rabitq_bits_per_dim_query`	Must be `32` for split storage.
`rabitq_error_rate`	Default positive multiplier applied to the lower-bound error term.
`use_reorder`	Should be `true` so candidates are ranked with the `x+y` distance.

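The constraints are listed below. Because x + y <= 8 and each width is at least 1, neither x nor y can exceed 7 in split mode: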

1 <= x <= 8
1 <= y <= 8
x + y <= 8

If rabitq_bits_per_dim_precise is omitted, HGraph uses the standard RaBitQ path instead of split storage.

Enable the filter/lower-bound search path with:

{
    "hgraph": {
        "ef_search": 200,
        "parallelism": 4,
        "rabitq_one_bit_search": true,
        "rabitq_error_rate": 1.9
    }
}

The external search key is named rabitq_one_bit_search, but on a split index it uses all x filter bits configured by rabitq_bits_per_dim_base. hgraph.rabitq_error_rate overrides the index default for that search. It can be swept without rebuilding because the stored record contains the geometric error scale before this multiplier is applied.

Search pipeline

The split search path has four stages:

The query is transformed and normalized once. For supported filter widths, a byte lookup table is also built once per query.
Graph traversal reads only the filter record. It computes an x-bit distance estimate and a conservative lower bound for each visited vector.
Reorder discards candidates whose lower bound cannot enter the result set. It fetches the y-bit supplement record only for the remaining candidates.
The final distance combines the filter contribution and supplement contribution into one x+y-bit RaBitQ estimate.

The HGraph heap is therefore not populated with an x+y distance for every visited vector. The inexpensive x-bit distance drives traversal; the more accurate distance is evaluated only during candidate reorder.

Encoding and bit planes

Let:

d       = transformed dimension
x       = filter bits per dimension
y       = supplement bits per dimension
B       = x + y
P       = ceil(d / 8), bytes in one bit plane
q_i     = transformed and normalized query coordinate
u_i     = unsigned B-bit base code, 0 <= u_i < 2^B

The centered full code is:

c_B = (2^B - 1) / 2
z_i = u_i - c_B
N_B = sqrt(sum_i z_i^2)

PackIntoPlanes stores each logical bit of u_i in a separate bit plane. The split is defined by:

f_i = floor(u_i / 2^y)    # top x bits
s_i = u_i mod 2^y         # low y bits
u_i = 2^y * f_i + s_i

The physical order keeps the most significant filter planes contiguous:

filter record:     logical B-1, B-2, ..., B-x
supplement record: logical 0, 1, ..., y-1

This order lets traversal scan exactly x * P plane bytes and lets reorder fetch exactly y * P additional plane bytes, excluding metadata and alignment.
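The split and the bit-plane packing can be sketched as follows. The helper names are hypothetical and this is not VSAG's PackIntoPlanes or SplitCode; in particular, the bit order inside each plane byte is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch, not VSAG's PackIntoPlanes or SplitCode.

struct SplitParts {
    std::uint8_t filter;      // f_i: top x bits of u_i
    std::uint8_t supplement;  // s_i: low y bits of u_i
};

// Split one unsigned (x+y)-bit code u into f = floor(u / 2^y) and s = u mod 2^y.
inline SplitParts SplitUnsignedCode(std::uint8_t u, unsigned y) {
    assert(y >= 1 && y <= 7);
    return {static_cast<std::uint8_t>(u >> y),
            static_cast<std::uint8_t>(u & ((1u << y) - 1u))};
}

// Pack logical bit `bit` of every code into one bit plane of ceil(d / 8) bytes.
// Assumed bit order: coordinate i goes to bit (i % 8) of byte (i / 8).
inline std::vector<std::uint8_t> PackPlane(const std::vector<std::uint8_t>& codes,
                                           unsigned bit) {
    std::vector<std::uint8_t> plane((codes.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < codes.size(); ++i) {
        if ((codes[i] >> bit) & 1u) {
            plane[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
    }
    return plane;
}
```

With d = 960 each plane is P = 120 bytes, so a 3+5 index scans 360 plane bytes per vector during traversal and fetches 600 more for each reordered candidate.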

Datacell layout

RaBitQSplitDataCell owns two RaBitQSplitCodeStorage instances.

Filter record

The filter record in x_bit_cell_ contains:

x high bit planes
base norm
filter-code norm when x > 1
optional MRQ residual norm
optional raw norm for IP/cosine
lower-bound error
filter approximation error

For one vector, its plane payload is:

FilterPlanesSize = x * ceil(d / 8)

The filter record is the hot traversal record. Graph search and prefetch do not need the supplement record while the x-bit estimate is valid.

Supplement record

The supplement record in supplement_cell_ contains:

y low bit planes
full-code norm
full-code approximation error
remaining metadata required by the selected metric and transforms

Its plane payload is:

SupplementPlanesSize = y * ceil(d / 8)

The complete code payload is approximately (x+y) * d / 8 bytes per vector, plus aligned norms, errors, and optional transform metadata.

X-bit filter estimate and lower bound

The filter code for coordinate i is f_i in [0, 2^x - 1]. Define:

c_x   = (2^x - 1) / 2
N_x   = sqrt(sum_i (f_i - c_x)^2)
S_x   = sum_i q_i * f_i
Q_sum = sum_i q_i
rho_x = (S_x - c_x * Q_sum) / N_x

During index construction, RaBitQ stores the absolute filter approximation error E_x and the geometric error scale:

E_safe    = clamp(abs(E_x), 1e-5, 1)
epsilon_x = sqrt(max(0, 1 - E_safe^2) / max(1, d - 1))

The corrected filter inner-product estimate is:

rho_hat_x = rho_x / abs(E_x)

For L2, with base norm N_o and query norm N_q, the x-bit distance and lower bound are:

D_x = N_o^2 + N_q^2 - 2 * N_o * N_q * rho_hat_x

LB = D_x
     - 2 * N_o * N_q * rabitq_error_rate * epsilon_x / abs(E_x)

The implementation subtracts a small floating-point guard from LB. IP and cosine apply the corresponding metric conversion to the error term.

The lower bound is used only to reject candidates safely. D_x remains the traversal estimate, while the final ranking uses the x+y distance.
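The L2 formulas above translate into a few lines of arithmetic. The sketch below uses hypothetical names, takes rho_x and E_x as already computed inputs, and omits the small floating-point guard that the implementation subtracts from LB.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Illustrative sketch of the formulas above; names are hypothetical.

struct FilterBound {
    float dist;         // D_x, the traversal estimate
    float lower_bound;  // LB, used only to reject candidates
};

// epsilon_x = sqrt(max(0, 1 - E_safe^2) / max(1, d - 1)),
// with E_safe = clamp(|E_x|, 1e-5, 1).
inline float GeometricErrorScale(float e_x, std::size_t d) {
    const float e_safe = std::min(std::max(std::fabs(e_x), 1e-5f), 1.0f);
    const std::size_t dof = d > 1 ? d - 1 : 1;
    return std::sqrt(std::max(0.0f, 1.0f - e_safe * e_safe) / static_cast<float>(dof));
}

// D_x = N_o^2 + N_q^2 - 2 N_o N_q rho_hat_x, rho_hat_x = rho_x / |E_x|
// LB  = D_x - 2 N_o N_q * error_rate * epsilon_x / |E_x|
inline FilterBound L2FilterBound(float rho_x, float e_x, float epsilon_x,
                                 float base_norm, float query_norm, float error_rate) {
    assert(e_x != 0.0f && error_rate > 0.0f);
    const float abs_e = std::fabs(e_x);
    const float rho_hat = rho_x / abs_e;
    const float dist = base_norm * base_norm + query_norm * query_norm -
                       2.0f * base_norm * query_norm * rho_hat;
    const float margin = 2.0f * base_norm * query_norm * error_rate * epsilon_x / abs_e;
    return {dist, dist - margin};
}
```

Because error_rate only scales the margin, raising it lowers LB and rejects fewer candidates, which is why it can be swept at search time without touching the stored epsilon_x.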

Query lookup table and SIMD

For x = 2 and x = 3, the query computer builds a FastScan-style byte lookup table. Each table row corresponds to eight query coordinates and has 256 entries:

LUT[block][byte_value]
    = sum of q_i for the set bits in byte_value within that 8-D block

Each filter plane then contributes one lookup per byte instead of decoding eight coordinates separately. Binary weights combine the x planes into S_x.
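A scalar sketch of the byte lookup table and of how binary weights combine the planes into S_x. It is not one of the VSAG entry points named in this section: the function names are hypothetical, and the bit order inside a byte and the plane ordering are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative scalar sketch. Assumed bit order: coordinate i of a block maps to
// bit (i % 8) of the plane byte.

using ByteLut = std::vector<std::array<float, 256>>;

// lut[block][byte_value] = sum of q_i over the set bits of byte_value in that 8-D block.
inline ByteLut BuildByteLut(const std::vector<float>& q) {
    const std::size_t blocks = (q.size() + 7) / 8;
    ByteLut lut(blocks);
    for (std::size_t b = 0; b < blocks; ++b) {
        for (unsigned v = 0; v < 256; ++v) {
            float acc = 0.0f;
            for (unsigned bit = 0; bit < 8; ++bit) {
                const std::size_t i = b * 8 + bit;
                if (((v >> bit) & 1u) != 0 && i < q.size()) {
                    acc += q[i];
                }
            }
            lut[b][v] = acc;
        }
    }
    return lut;
}

// S_x = sum_i q_i * f_i. planes[0] is assumed to be the most significant filter
// plane, so plane p carries binary weight 2^(x - 1 - p).
inline float FilterInnerProduct(const ByteLut& lut,
                                const std::vector<std::vector<std::uint8_t>>& planes) {
    const std::size_t x = planes.size();
    float s = 0.0f;
    for (std::size_t p = 0; p < x; ++p) {
        assert(planes[p].size() == lut.size());
        float plane_sum = 0.0f;
        for (std::size_t b = 0; b < lut.size(); ++b) {
            plane_sum += lut[b][planes[p][b]];  // one lookup per plane byte
        }
        s += static_cast<float>(1u << (x - 1 - p)) * plane_sum;
    }
    return s;
}
```

The table costs 256 floats per 8-D block and is built once per query, after which every visited vector needs x lookups per block.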

The AVX2 and AVX512 kernels gather multiple lookup entries at once and also provide a batch-of-four path. The scalar implementation is kept as the portable fallback. The relevant entry points are:

RaBitQFloatMultiBitIPByLookup
RaBitQFloatMultiBitIPBatch4ByLookup
RaBitQFloatBuildByteIPLookupTable

An x-bit width outside the specialized set remains supported through the generic bit-plane computation path.

Reorder scans only y supplement bits

The full unsigned code satisfies:

sum_i q_i * u_i
    = 2^y * sum_i q_i * f_i
      + sum_i q_i * s_i

For L2 with an x-bit lookup filter, HGraph passes the previously computed filter distance to reorder as a hint. ComputeDistWithSplitCodeAndFilterDist recovers the first term from that hint and computes only the second term from the y supplement planes:

full contribution = shifted filter contribution + supplement contribution

Thus a 3+5 index reuses the 3-bit filter result and scans only 5 new bit planes for each reordered candidate. If the hint is unavailable or cannot be used, the code falls back to ComputeDistWithSplitCode, which computes the same final distance directly from both split records.
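The identity behind the shortcut can be checked numerically. In this sketch the hint is taken to be the filter sum S_x itself, which is a simplification: the text above says the implementation recovers that term from a filter distance. The helper names are hypothetical and are not ComputeDistWithSplitCodeAndFilterDist.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative check of sum_i q_i*u_i = 2^y * sum_i q_i*f_i + sum_i q_i*s_i.

inline float Dot(const std::vector<float>& q, const std::vector<std::uint8_t>& codes) {
    assert(q.size() == codes.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < q.size(); ++i) {
        acc += q[i] * static_cast<float>(codes[i]);
    }
    return acc;
}

// Full sum_i q_i * u_i from a cached filter sum and the supplement codes only.
inline float FullFromFilterHint(float filter_sum, const std::vector<float>& q,
                                const std::vector<std::uint8_t>& supplement, unsigned y) {
    return static_cast<float>(1u << y) * filter_sum + Dot(q, supplement);
}
```

Only the supplement term touches new data, so the reorder cost per candidate grows with y rather than with x + y.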

Memory, disk, and hybrid IO

Both records use the base IO type unless a separate supplement IO type is configured.

Both records in memory

{
    "base_io_type": "block_memory_io"
}

Both records on disk

{
    "base_io_type": "async_io",
    "base_file_path": "/data/hgraph_rabitq_split"
}

VSAG creates separate backing paths for the filter and supplement records.

Filter in memory, supplement on disk

{
    "base_io_type": "block_memory_io",
    "base_supplement_io_type": "async_io",
    "base_file_path": "/data/hgraph_rabitq_split"
}

The supported mixed-IO combination keeps x_bit_cell_ in block memory and places supplement_cell_ in async IO. During batched reorder, the filter record is read by direct pointer while MultiRead fetches only supplement records. base_supplement_file_path may be set explicitly; otherwise VSAG derives a supplement path from base_file_path.

Serialization and loading

Use the normal index-level serialization API. Applications do not need to persist the two records independently.

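// Write the whole index (both split records) to a single stream.
std::ofstream out("/path/to/index.bin", std::ios::binary);
auto serialized = index->Serialize(out);
out.close();  // flush before the file is reopened for reading

// Recreate an index with compatible parameters, then load the stream into it.
auto loaded = vsag::Factory::CreateIndex("hgraph", index_params).value();
std::ifstream in("/path/to/index.bin", std::ios::binary);
auto deserialized = loaded->Deserialize(in);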

The split datacell serializes, in order:

Base datacell state and supplement IO type.
Filter storage.
Supplement storage.
RaBitQ quantizer state.

Create the destination index with parameters compatible with the serialized index, especially dim, metric_type, x/y bit widths, and query bits. Changing an encoded parameter requires rebuilding the index. Tuning only the search-time hgraph.rabitq_error_rate does not.

Implementation map

Area	File / entry point
External x/y parameter mapping	`src/algorithm/hgraph/hgraph_param_mapping.cpp`
Split record ownership and IO	`src/datacell/rabitq_split_datacell.h`
Plane layout and code splitting	`RaBitQuantizer::StoredPlaneIndex`, `SplitCode`
Filter estimate and lower bound	`ComputeDistWithOneBitLowerBound`
Direct split distance	`ComputeDistWithSplitCode`
Reorder using the filter hint	`ComputeDistWithSplitCodeAndFilterDist`
SIMD dispatch	`src/simd/rabitq_simd.cpp`
AVX2 / AVX512 lookup kernels	`src/simd/avx2.cpp`, `src/simd/avx512.cpp`
Runnable memory/disk/hybrid example	`examples/cpp/323_index_hgraph_rabitq_split.cpp`

Operational notes

Split storage is currently an HGraph feature and requires fp32 query codes.
l2, ip, and cosine are supported. The filter-hint reorder shortcut is currently specialized for L2.
Keep use_reorder: true unless x-bit traversal accuracy alone has been validated for the dataset.
Changing x, y, metric, or transform parameters requires rebuilding the index. A search-time hgraph.rabitq_error_rate override does not.
Use RaBitQ for the general quantizer description and HGraph for the complete index parameter table.

Quantization Transform

The Transform Quantizer (base_quantization_type: "tq") chains one or more vector transformations in front of a final quantizer. Transformations reshape vectors so a downstream quantizer can encode them more accurately or compactly — for example, rotate vectors so their energy is spread across dimensions (RaBitQ / SQ benefit greatly), or reduce dimensionality with PCA before storing them.

Runnable example: examples/cpp/501_quantization_transform.cpp.

Why a transform layer

A pure quantizer compresses vectors directly. With low-bit quantizers (e.g. sq4, sq*_uniform, rabitq) accuracy depends heavily on the distribution of vector coordinates: heavy-tailed or anisotropic dimensions waste code bits. A transform layer mitigates this:

Random rotations (rom, fht) decorrelate coordinates so a uniform/scalar quantizer works better on each axis.
PCA (pca) reduces dimensions while keeping most of the variance — code size shrinks proportionally.
MRLE (mrle) is a metric-recoverable low-rank encoding tailored to L2/IP search.

The transform output then feeds a standard quantizer (fp32, sq8, sq8_uniform, rabitq, …), which actually stores the codes. The whole chain is referred to as tq (Transform Quantizer).

Quick start

tq is currently exposed as a public, externally configurable quantization type only by HGraph. HGraph maps the top-level keys tq_chain and rabitq_pca_dim into the nested base_codes.quantization_params JSON via its external-parameter mapping (src/algorithm/hgraph.cpp:370-385). IVF, BruteForce, Pyramid and WARP all internally render a tq_chain field into their inner JSON template, but none of them expose tq_chain (or any other TQ parameter) in their external mapping today. CheckAndMappingExternalParam rejects unknown external keys with invalid config param (src/utils/util_functions.cpp:50-53), so passing tq_chain in the index_param JSON of those indexes will fail at index construction. Configuring TQ on non-HGraph indexes therefore requires code-side changes to add the external mapping.

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "tq",
        "tq_chain": "pca, rom, sq8_uniform",
        "rabitq_pca_dim": 64,
        "max_degree": 32,
        "ef_construction": 300,
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
})";

vsag::Resource resource(vsag::Engine::CreateDefaultAllocator(), nullptr);
vsag::Engine engine(&resource);
auto index = engine.CreateIndex("hgraph", params).value();
index->Build(base);
auto result = index->KnnSearch(query, topk, search_params).value();

In the example above, base vectors are first projected from 128 to 64 dimensions (pca), randomly rotated (rom), then quantized with sq8_uniform. Reordering is enabled, so HGraph keeps an fp32 precise copy and re-ranks the top candidates returned by the graph search (include/vsag/index.h; see Memory Management for the storage implications).

`tq_chain` syntax

tq_chain is a comma-separated string: one or more transformer names followed by exactly one final quantizer name. Whitespace around tokens is trimmed (src/quantization/transform_quantization/transform_quantizer_parameter.cpp:53-74).

"<transform1>, <transform2>, ..., <quantizer>"

Examples:

Chain	Effect
`"rom, fp32"`	Random rotation, then store as fp32 (used for tests / sanity baselines).
`"fht, sq8_uniform"`	Fast Hadamard rotation, then 8-bit uniform scalar quantization.
`"pca, rom, sq8_uniform"`	PCA reduction, random rotation, then 8-bit uniform — the example chain.
`"pca, rom, rabitq"`	PCA + rotation feeding the RaBitQ binary quantizer.
`"mrle, fp32"`	MRLE projection then store as fp32 (MRLE must be first).

Constraints (transform_quantizer_parameter.cpp:33-45):

The chain must contain at least one transformer + one quantizer (length ≥ 2). An empty or single-token chain raises INVALID_ARGUMENT.
The last token must be a quantizer that the TQ flatten path can dispatch: one of fp32, sq8, sq8_uniform, sq4, sq4_uniform, bf16, fp16, pq, pqfs, rabitq (src/datacell/flatten_interface.cpp:126-164). TransformQuantizerParameter parses a slightly wider set of names (it also accepts sparse, int8, tq), but the flatten factory does not have a dispatch branch for int8/tq and explicitly rejects sparse when is_transform_quantizer is true (src/datacell/flatten_interface.cpp:166), so using any of those three as the terminal quantizer fails at index construction with an “unsupported quantization type” error.
Any unrecognized transformer name raises INVALID_ARGUMENT: invalid transformer name (transform_quantizer.h:225-227).

Supported transformers

The factory at src/quantization/transform_quantization/transform_quantizer.h:192-227 recognizes four transformer names today:

Name	Output dim	Description	Implementation
`pca`	`pca_dim` if set, else input dim	Principal-Component-Analysis projection; reduces dim while keeping variance.	`src/impl/transform/pca_transformer.h`
`rom`	input dim	Random Orthogonal Matrix; rotates vectors to decorrelate dimensions.	`src/impl/transform/random_orthogonal_transformer.h`
`fht`	input dim	Fast Hadamard / KAC random rotation; cheaper variant of `rom`.	`src/impl/transform/fht_kac_rotate_transformer.h`
`mrle`	`mrle_dim` (≤ input dim)	Metric-Recoverable Low-rank Encoding; must be the first transformer in the chain.	`src/impl/transform/mrle_transformer.h`

Notes:

mrle placement is enforced at transform_quantizer.h:155-159 and mrle_dim ≤ input_dim at transform_quantizer.h:217-220.
Other strings declared in headers (residual, normalize) are not wired into the factory and will be rejected.
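The chain rules in this section can be summarized in a small validator. This is a sketch, not VSAG's parser: `ParseTqChain` is a hypothetical name, it reports errors with std::invalid_argument rather than VSAG's error types, and it folds the terminal-quantizer check (which VSAG performs later, in the flatten factory) into the same function.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative sketch of the rules above; not VSAG's parser.

inline std::string Trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits on commas, trims tokens, and throws std::invalid_argument on a rule violation.
inline std::vector<std::string> ParseTqChain(const std::string& chain) {
    static const std::set<std::string> kTransformers{"pca", "rom", "fht", "mrle"};
    static const std::set<std::string> kQuantizers{
        "fp32", "sq8", "sq8_uniform", "sq4", "sq4_uniform",
        "bf16", "fp16", "pq", "pqfs", "rabitq"};

    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const auto comma = chain.find(',', start);
        const auto len = comma == std::string::npos ? std::string::npos : comma - start;
        tokens.push_back(Trim(chain.substr(start, len)));
        if (comma == std::string::npos) break;
        start = comma + 1;
    }

    if (tokens.size() < 2) {
        throw std::invalid_argument("chain needs at least one transformer and a quantizer");
    }
    if (kQuantizers.count(tokens.back()) == 0) {
        throw std::invalid_argument("unsupported terminal quantizer");
    }
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        if (kTransformers.count(tokens[i]) == 0) {
            throw std::invalid_argument("invalid transformer name");
        }
        if (tokens[i] == "mrle" && i != 0) {
            throw std::invalid_argument("mrle must be the first transformer");
        }
    }
    return tokens;
}
```

A check like this can run in application code before index construction, so that a malformed chain is caught without building anything.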

Transformer parameters

The transformer JSON is read by VectorTransformerParameter::FromJson (src/impl/transform/vector_transformer_parameter.cpp:22-35):

Key	Type	Default	Meaning
`pca_dim`	int	`0` (= input dim)	Output dim of the `pca` transformer.
`mrle_dim`	int	`0` (= input dim)	Output dim of the `mrle` transformer.
`input_dim`	int	auto	Auto-populated by the chain — do not set manually.

HGraph external mapping

When using HGraph, two top-level shortcuts are mapped into the nested quantizer params (src/algorithm/hgraph.cpp:370-385):

tq_chain → base_codes.quantization_params.tq_chain
rabitq_pca_dim → base_codes.quantization_params.pca_dim

The name rabitq_pca_dim predates Transform Quantizer; when the chain includes pca, it drives the pca transformer’s output dim (it is not RaBitQ-specific). When the chain ends in rabitq without pca, the same key configures RaBitQ’s own PCA preprocessing (src/quantization/rabitq_quantization/rabitq_quantizer_parameter.cpp:30).

Reordering and the precise codes store

Transform chains lose some information by design (rotation is lossless, but pca / sq*_uniform / rabitq are not). Combining tq with reorder restores accuracy at the cost of extra memory: HGraph keeps a precise (typically fp32) copy of every vector and re-ranks the top candidates against it.

use_reorder: true makes HGraph keep a second flatten store, the precise codes store (src/algorithm/hgraph.cpp:76-79).
precise_quantization_type selects its quantizer (fp32 by default; fp16 / bf16 / sq8 reduce memory at some cost in accuracy).
At search time the graph walk uses the cheap tq base codes, then the top-K are re-scored against the precise codes (hgraph.cpp:978-981 and surrounding sites).

use_reorder and precise_quantization_type are not specific to tq — they also apply when base_quantization_type is sq8, pq, rabitq, etc. See the table in HGraph index for the full per-index parameter list.

Choosing a chain

A pragmatic rule of thumb:

Goal	Suggested chain	Notes
Memory-aggressive, accuracy-restored	`"pca, rom, sq8_uniform"` + `use_reorder: true`, `precise_quantization_type: "fp32"`	Example 501 baseline.
Maximum compression	`"pca, rom, rabitq"` + reorder	1-bit quantization with rotation cleanup; expect noticeable accuracy loss without reorder.
Anisotropic data, no dim reduction	`"rom, sq8_uniform"` or `"fht, sq8_uniform"`	Use `fht` for lower build cost on high dim.
Distance-preserving low-rank	`"mrle, fp32"`	Metric-aware reduction, no further quantization.

Always benchmark on your own data — the right tradeoff between tq aggressiveness and use_reorder depends on dataset distribution, target recall, and memory budget.

Compatibility and merge

Two tq configurations are considered compatible only when the chain length, every transformer name, and the final quantizer all match (src/quantization/transform_quantization/transform_quantizer_parameter.cpp:99-117). This matters for serialization round-trips and for any future merge / clone operations across indexes — keep the chain string stable across builds you intend to combine.

Chain string equality is necessary but not sufficient. The tq_chain token list does not encode transformer parameters such as pca_dim / mrle_dim (read as separate sibling JSON keys at src/quantization/transform_quantization/transform_quantizer.h:200-216) or the internal parameters of the terminal quantizer (e.g. pq subspace count, rabitq rotation seed). These parameters change the effective code dimension and layout, so for two builds to be practically merge-/clone-compatible you must keep the entire transform + quantizer parameter set consistent, not just the chain string.

HGraph index — parameter reference for base_quantization_type, use_reorder, precise_quantization_type.
Memory Management — memory cost of base + precise stores.

Code Structure

This page gives a quick tour of the VSAG repository layout.

Top-Level Directories

Path	Contents
`include/vsag/`	Public C++ headers (`index.h`, `engine.h`, `resource.h`, `constants.h`, …)
`src/`	Core implementation and unit tests
`tests/`	Functional tests (Catch2)
`examples/cpp/`	C++ end-to-end examples
`examples/python/`	Python examples
`python/`	`pyvsag` packaging
`python_bindings/`	pybind11 bindings
`typescript/`	Node.js / TypeScript bindings (npm package `vsag`)
`tools/`	Utilities such as `eval_performance`, `analyze_index`, `check_compatibility`
`extern/`	Third-party dependencies (do not modify unless necessary)
`docs/`	Documentation (this site) and blog posts
`cmake/`	CMake modules

Core Subsystems (inside `src/`)

index: concrete index implementations (HGraph, IVF, Pyramid, SINDI, …; legacy: HNSW, DiskANN).
quantization: FP32 / FP16 / BF16 / SQ4 / SQ8 / PQ quantizers with SIMD dispatch.
graph: shared graph data structures used by HGraph and other graph-based indexes.
storage: binary/reader sets, streaming serialization.
allocator / thread pool: user-pluggable resource management.
simd: cascaded SIMD dispatch for x86_64 and AArch64.

Naming Conventions

Public API: vsag namespace, in include/vsag/.
Implementation: src/, same namespace unless the file explicitly needs otherwise.
File extension: .cpp (not .cc).

Build Artifacts

make debug / make release / make dev produce build trees:

build-debug/
build-release/
build-dev/

Each contains the test binaries, example executables, and libraries.

New Index Integration Checklist

Use this checklist when adding a new index implementation to VSAG. Keep the first pass small: make the index creatable through the public factory, support the lifecycle methods it advertises, and add feature flags only after the behavior is implemented and tested.

Required

Choose the public index name and type.
- Add the user-facing index name constant in include/vsag/constants.h or src/inner_string_params.h if the new index needs one.
- Add an IndexType value in include/vsag/index.h when callers must distinguish this index through Index::GetIndexType().
- Keep the public name stable. src/factory/index_registry.cpp normalizes factory names to lower case before lookup.
Implement the index behind the public Index API.
- Prefer the current IndexImpl<T> pattern in src/index/index_impl.h for new in-memory indexes: implement T as an InnerIndexInterface subclass under src/algorithm/<name>/.
- Implement static CheckAndMappingExternalParam(const JsonType&, const IndexCommonParam&) so IndexImpl<T> can validate external JSON and construct the internal parameter object.
- Implement GetName(), GetIndexType(), GetNumElements(), Add(), KnnSearch(), Serialize(StreamWriter&), and Deserialize(StreamReader&) as required by the inner-index contract; implement Build() when the index supports it. InnerIndexInterface::Add() is pure virtual, so every subclass must override it. An index that only supports Build() should make Add() throw UNSUPPORTED_INDEX_OPERATION and leave the corresponding feature flag disabled.
- Leave unsupported operations on the base class defaults instead of advertising them.
Wire creation through the factory and engine path.
- Add a creator in src/factory/index_creators.cpp.
- Register it in register_all_index_creators().
- Use IndexCommonParam::CheckAndCreate() from src/index_common_param.cpp for shared fields: dtype, metric_type, dim, optional repr, optional extra_info_size, allocator, thread pool, and old serialization format compatibility.
- Add factory tests for the accepted name, invalid parameters, and unsupported parameter shapes in src/factory/factory_test.cpp or a focused test near the implementation.
Add build-system wiring.
- Add src/algorithm/<name>/CMakeLists.txt and include it from src/algorithm/CMakeLists.txt.
- Add new sources to the closest existing target rather than creating a parallel build path.
- Keep file suffixes as .cpp; do not edit extern/ unless the dependency itself is part of the change.
Define and validate index parameters.
- Put implementation parameters in a <name>_parameter.{h,cpp} pair when the index has its own schema.
- Implement JSON parsing, ToJson(), and CheckCompatibility() for serialized/recreated parameter checks.
- Reject invalid dimensions, metric/data-type combinations, missing required blocks, and unknown modes with ErrorType::INVALID_ARGUMENT through the existing CHECK_ARGUMENT / VsagException flow.
- Update docs/docs/{en,zh}/src/resources/index_parameters.md and the per-index docs if the parameter becomes user-facing.
Implement lifecycle behavior deliberately.
- Decide whether the index supports Train(), Build(), ContinueBuild(), Add() after build, and Add() from empty.
- Decide whether Remove(), UpdateId(), UpdateVector(), UpdateAttribute(), and UpdateExtraInfo() are supported.
- For every supported mutation, test empty datasets, duplicate IDs if applicable, missing IDs, immutable index behavior, and search correctness after mutation.
- Keep InitFeatures() in sync with the implemented operations.
Implement search behavior and result packing.
- Support the public KnnSearch() overloads required by the index, including BitsetPtr, std::function<bool(int64_t)>, and FilterPtr filtering when advertised.
- Implement SearchWithRequest() if the index supports the newer request path.
- Return Dataset fields consistently: IDs, distances, num_elements, result dimension, and optional result statistics.
- Parse search parameters with the same nested index-name convention used by existing indexes such as HGraph.
Preserve serialization compatibility.
- Implement both Serialize(StreamWriter&) and Deserialize(StreamReader&); the base InnerIndexInterface adapts these to BinarySet, ReaderSet, and streams.
- Store enough metadata to reject incompatible binaries, including parameter compatibility and extra_info_size when extra info is present.
- Add round-trip tests through BinarySet and ReaderSet when the index supports both.
- If the binary format changes for an existing index, update compatibility tests and document the migration path.
Add tests before advertising features.
- Unit tests should cover parameter parsing, build/add/search, serialization, feature flags, memory estimation if implemented, and error paths.
- Functional tests under tests/ should cover public API behavior that users can reach through Factory::CreateIndex().
- Keep C++ unit-test coverage for src/ and include/ at or above the project threshold.
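
The factory-wiring step above can be pictured with a small, self-contained sketch. It is not VSAG's registry: ToyIndex, ToyRegistry, and to_lower_copy are hypothetical names invented for illustration. The only behavior borrowed from the checklist is that names are normalized to lower case before lookup, so a creator registered under one spelling is found under any casing.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an index object; not a VSAG type.
struct ToyIndex {
    std::string name;
};

// Lower-case copy of a factory name so lookups are case-insensitive.
inline std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical registry: maps a normalized name to a creator callback.
class ToyRegistry {
public:
    using Creator = std::function<std::shared_ptr<ToyIndex>()>;

    // Returns false when the normalized name is already registered.
    bool Register(const std::string& name, Creator creator) {
        return creators_.emplace(to_lower_copy(name), std::move(creator)).second;
    }

    // Returns nullptr for unknown names instead of throwing.
    std::shared_ptr<ToyIndex> Create(const std::string& name) const {
        auto it = creators_.find(to_lower_copy(name));
        if (it == creators_.end()) {
            return nullptr;
        }
        return it->second();
    }

private:
    std::unordered_map<std::string, Creator> creators_;
};
```

Registering "MyIndex" and then asking for "myINDEX" returns the same creator's result, while a second registration under "MYINDEX" is rejected as a duplicate. A factory test for a new index would assert the same three things: accepted name, duplicate handling, and unknown-name failure.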

Optional Adaptation Points

Add these only when the new index actually implements the behavior. When implemented, enable the matching IndexFeature values in InitFeatures() and add focused tests.

Extra info (extra_info / extrainfo).
- Parse extra_info_size through IndexCommonParam.
- Store fixed-size per-vector payloads from Dataset::GetExtraInfos() and validate Dataset::GetExtraInfoSize() during Build(), Add(), and UpdateExtraInfo().
- Implement GetExtraInfoByIds() and populate search-result extra info when the feature is supported.
- If the index supports extra-info filtering, document and test the search parameter that switches Filter::CheckValid(const char*) on.
- See docs/docs/en/src/advanced/extra_info.md and examples/cpp/320_feature_extra_info.cpp.
Statistics and analysis.
- Implement GetStats() for static structure data that helps operators understand an index.
- Implement AnalyzeIndexBySearch(const SearchRequest&) only for query-driven analysis.
- Include result statistics with Dataset::Statistics() when search-time metrics are useful.
- Keep tool output compatible with tools/analyze_index and docs/docs/en/src/resources/analyze_index.md.
Range search.
- Override the pure-virtual primary RangeSearch(..., const FilterPtr&, ...) required by InnerIndexInterface, even when the algorithm does not support range search; in that case, throw UNSUPPORTED_INDEX_OPERATION without enabling the corresponding feature flag.
- Implement the other RangeSearch() overloads only when the algorithm can honor radius semantics and limited_size.
- Test no-limit, limited, filtered, and empty-result cases.
- See docs/docs/en/src/advanced/range_search.md.
Filters and attributes.
- Support BitsetPtr, std::function<bool(int64_t)>, or FilterPtr only when each path is wired through search.
- If attribute filtering is supported, implement attribute storage/update paths and document accepted attribute schemas.
- Test the difference between bitset invalidation and Filter::CheckValid() keep semantics.
Allocator, resource, and threading integration.
- Allocate long-lived structures with IndexCommonParam::allocator_ or a derived allocator-aware component.
- Use the Resource thread pool when build/search work is parallelized.
- Verify custom allocator and custom thread-pool examples still describe the behavior accurately.
- Mark concurrency features only after add/search/delete/update interactions are tested.
Memory and introspection APIs.
- Implement EstimateMemory(), EstimateBuildMemory(), GetMemoryUsage(), and GetMemoryUsageDetail() when the index can report meaningful numbers.
- Implement GetMinAndMaxId(), CheckIdExist(), ExportIDs(), GetVectorByIds(), GetDataByIds(), GetIndexDetailInfos(), or GetDetailDataByName() only when the backing storage supports them.
Model export, clone, merge, tune, feedback, and cache import/export.
- Implement Clone() and ExportModel() when the index can be copied without sharing mutable storage incorrectly.
- Implement Merge() only when parameter compatibility, ID remapping, and deletion semantics are clear.
- Implement Tune(), Feedback(), ExportCache(), and ImportCache() only with explicit parameter parsing and tests.
Bindings, examples, benchmarks, and docs.
- Python bindings usually need updates only when the public API surface changes; current pyvsag users create indexes through names and JSON parameters.
- Add C++ examples under examples/cpp/ when the index introduces a new user workflow.
- Add Python examples/tests under tests/python/ if the behavior is reachable from pyvsag.
- Add benchmark YAML under benchs/ when reviewers need repeatable performance data.
- Add English and Chinese website docs under docs/docs/{en,zh}/src/ for user-facing indexes or parameters.
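
The rule "advertise a feature only after it is implemented" can be sketched as follows. All names here (ToyFeature, ToyInnerIndex, BuildOnlyIndex, UnsupportedOperation) are hypothetical and much simpler than VSAG's IndexFeature and error types. Unlike InnerIndexInterface, where Add() and the primary RangeSearch() are pure virtual, this toy base class gives every operation a throwing default so the example stays short.

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical feature flags; VSAG's real IndexFeature enum is larger.
enum class ToyFeature { SUPPORT_BUILD, SUPPORT_ADD, SUPPORT_RANGE_SEARCH };

// Hypothetical error type standing in for UNSUPPORTED_INDEX_OPERATION.
struct UnsupportedOperation : std::runtime_error {
    explicit UnsupportedOperation(const std::string& op)
        : std::runtime_error("unsupported index operation: " + op) {}
};

// Base class: every optional operation is unsupported unless overridden.
class ToyInnerIndex {
public:
    virtual ~ToyInnerIndex() = default;
    virtual void Build(int64_t /*count*/) { throw UnsupportedOperation("Build"); }
    virtual void Add(int64_t /*count*/) { throw UnsupportedOperation("Add"); }
    virtual int64_t RangeSearch(float /*radius*/) { throw UnsupportedOperation("RangeSearch"); }

    bool CheckFeature(ToyFeature feature) const { return features_.count(feature) != 0; }

protected:
    std::set<ToyFeature> features_;
};

// An index that implements Build() only, and advertises exactly that.
class BuildOnlyIndex : public ToyInnerIndex {
public:
    BuildOnlyIndex() { InitFeatures(); }
    void Build(int64_t count) override { num_elements_ = count; }
    int64_t GetNumElements() const { return num_elements_; }

private:
    // Keep this list in sync with the overrides above.
    void InitFeatures() { features_.insert(ToyFeature::SUPPORT_BUILD); }
    int64_t num_elements_ = 0;
};
```

A caller that checks the flag first never reaches the throwing default, and a caller that skips the check gets a clear error rather than silent misbehavior.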

Review Checklist

Factory::CreateIndex() and Engine::CreateIndex() create the index by the documented name.
CheckFeature() returns true only for implemented and tested behavior.
Unsupported operations return UNSUPPORTED_INDEX_OPERATION through existing wrappers.
Serialization round trips preserve IDs, vectors or compressed codes, parameters, deletions, attributes, and extra info that the index claims to support.
Search results remain valid after every supported lifecycle transition.
Documentation lists user-facing parameters, supported metrics/data types, and unsupported operations.
Practical validation has run: unit/functional tests for changed code, plus formatting or git diff --check for documentation-only changes.
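
The serialization item in the checklist is easiest to review with a round-trip test in mind. The sketch below uses a made-up binary layout (kToyMagic, ToyIndexData, serialize, and deserialize are all hypothetical, and this is not VSAG's format). It shows the two properties a reviewer looks for: data survives a write and read cycle, and a binary whose stored parameters disagree with the expected ones is rejected.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

const uint32_t kToyMagic = 0x544F5931;  // arbitrary tag for this toy format

template <typename T>
void write_pod(std::ostream& out, const T& value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

template <typename T>
T read_pod(std::istream& in) {
    T value{};
    in.read(reinterpret_cast<char*>(&value), sizeof(T));
    if (!in) throw std::runtime_error("truncated stream");
    return value;
}

struct ToyIndexData {
    uint32_t dim = 0;
    uint64_t extra_info_size = 0;
    std::vector<int64_t> ids;
    std::vector<float> vectors;  // ids.size() * dim values, row-major
};

// Header (magic, dim, extra_info_size, count) followed by the payload.
void serialize(const ToyIndexData& d, std::ostream& out) {
    write_pod(out, kToyMagic);
    write_pod(out, d.dim);
    write_pod(out, d.extra_info_size);
    write_pod(out, static_cast<uint64_t>(d.ids.size()));
    out.write(reinterpret_cast<const char*>(d.ids.data()),
              static_cast<std::streamsize>(d.ids.size() * sizeof(int64_t)));
    out.write(reinterpret_cast<const char*>(d.vectors.data()),
              static_cast<std::streamsize>(d.vectors.size() * sizeof(float)));
}

// Rejects binaries whose stored parameters do not match the expected ones.
ToyIndexData deserialize(std::istream& in, uint32_t expected_dim,
                         uint64_t expected_extra_info_size) {
    if (read_pod<uint32_t>(in) != kToyMagic) throw std::invalid_argument("not a toy index");
    ToyIndexData d;
    d.dim = read_pod<uint32_t>(in);
    d.extra_info_size = read_pod<uint64_t>(in);
    if (d.dim != expected_dim || d.extra_info_size != expected_extra_info_size) {
        throw std::invalid_argument("incompatible parameters");
    }
    const uint64_t count = read_pod<uint64_t>(in);
    d.ids.resize(count);
    d.vectors.resize(count * d.dim);
    in.read(reinterpret_cast<char*>(d.ids.data()),
            static_cast<std::streamsize>(d.ids.size() * sizeof(int64_t)));
    in.read(reinterpret_cast<char*>(d.vectors.data()),
            static_cast<std::streamsize>(d.vectors.size() * sizeof(float)));
    if (!in) throw std::runtime_error("truncated stream");
    return d;
}
```

Writing into a std::stringstream and reading back with the same dim and extra_info_size should reproduce ids and vectors exactly. Reading back with a different dim should throw before any payload is consumed.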

Building

This page documents how to build VSAG from source.

Prerequisites

OS: Ubuntu 20.04+ or CentOS 7+
Compiler: GCC 9.4.0+ or Clang 13.0.0+
CMake: 3.18.0+
clang-format / clang-tidy: exactly version 15 (enforced)
Optional: HDF5 (for tools/eval/eval_performance), libaio (for DiskANN async IO), Intel MKL.

We recommend using the official Docker dev image, which already contains the matching toolchain:

docker pull vsaglib/vsag:ubuntu

Makefile Targets

Running make help prints a concise list; the most common targets are:

debug       Build debug binaries (no sanitizers; tests/tools/examples OFF by default)
release     Build release binaries (tests/tools/examples OFF by default)
dev         Developer build: debug + tests + tools + examples
test        Build with tests enabled and run unit + functional tests
cov         Build with coverage instrumentation enabled
asan        Build with AddressSanitizer
tsan        Build with ThreadSanitizer
fmt         Run clang-format
lint        Run clang-tidy
fix-lint    Apply clang-tidy fix-its in-place (destructive)
pyvsag      Build pyvsag for a specific Python version (PY_VERSION=...)
pyvsag-all  Build pyvsag wheels for all supported Python versions
dist-pre-cxx11-abi  Build redistributable tarball (pre-C++11 ABI)
dist-cxx11-abi      Build redistributable tarball (C++11 ABI)
dist-libcxx         Build redistributable tarball (libc++)
clean       Remove build trees

Step-by-Step

git clone https://github.com/antgroup/vsag.git
cd vsag
make release

Resulting binaries from a plain make release:

Library: build-release/src/libvsag.{a,so}

Examples and tools are not built by default. To include them, either use make dev, or enable the corresponding Makefile variables (VSAG_ENABLE_EXAMPLES=ON, VSAG_ENABLE_TOOLS=ON) or the underlying CMake cache options (-DENABLE_EXAMPLES=ON, -DENABLE_TOOLS=ON).

Environment Variables / CMake Options

The Makefile exposes a few VSAG_ENABLE_* environment variables that are translated into CMake cache options (ENABLE_*). Defaults below reflect a plain make release.

Makefile env var	CMake option	Default	Effect
`VSAG_ENABLE_INTEL_MKL`	`ENABLE_INTEL_MKL`	`OFF`	Use Intel MKL for BLAS kernels
`VSAG_ENABLE_LIBAIO`	`ENABLE_LIBAIO`	`ON` on Linux	Enable DiskANN async IO via libaio
`VSAG_ENABLE_TOOLS`	`ENABLE_TOOLS`	`OFF`	Build utilities under `tools/`
`VSAG_ENABLE_EXAMPLES`	`ENABLE_EXAMPLES`	`OFF`	Build sample programs under `examples/cpp/`
n/a	`CMAKE_BUILD_TYPE`	driven by Makefile target	Debug / Release

When invoking CMake directly instead of using make, use the underlying CMake cache option names:

cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release -DENABLE_INTEL_MKL=ON
cmake --build build-release -j

Offline / Air-gapped Builds

VSAG downloads its third-party libraries at configure/build time. In offline or restricted-network environments, set the per-dependency VSAG_THIRDPARTY_* environment variables to fetch each archive from a local path or an internal mirror (internal HTTP server, OSS bucket, etc.). See Offline / Air-gapped Builds for the full list of variables and worked examples.

Python Wheel (pyvsag)

make pyvsag PY_VERSION=3.10
# Or build all supported versions in parallel:
make pyvsag-all

Wheels are emitted under python/dist/.

Distribution Tarballs

For ABI-compatible redistribution use one of:

make dist-pre-cxx11-abi   # _GLIBCXX_USE_CXX11_ABI=0
make dist-cxx11-abi       # _GLIBCXX_USE_CXX11_ABI=1
make dist-libcxx          # libc++ (Clang)

The produced tarballs contain headers, static/shared libraries, and version metadata.

Release Publishing

To publish a new GitHub Release, use the Build and Publish Release workflow in the GitHub Actions tab and run it manually with:

branch: the branch, tag, or commit SHA to release from
tag_name: the new release tag, such as v1.0.0
prerelease: whether to mark the release as a prerelease

For a local dry run of the same packaging script, run:

COMPILE_JOBS=6 bash ./scripts/release/dist.sh

You can increase COMPILE_JOBS if your machine has enough memory, but the default is conservative to avoid out-of-memory failures in CI runners.

Offline / Air-gapped Builds

VSAG downloads a set of third-party libraries at CMake configure / build time (via ExternalProject_Add and FetchContent). On a machine without internet access, or behind a slow / restricted network, those downloads can fail or time out. This page explains how to point each dependency at a local path or an internal mirror (internal HTTP server, OSS bucket, Artifactory, etc.) so the build can complete fully offline.

How third-party downloads are resolved

For every downloaded dependency, VSAG builds a list of candidate URLs and lets CMake try them in order, stopping at the first one that succeeds. Using antlr4 as the representative example (extern/antlr4/antlr4.cmake):

set (antlr4_urls
    https://github.com/antlr/antlr4/archive/refs/tags/4.13.2.tar.gz   # 1. upstream
    https://vsagcache.oss-rg-china-mainland.aliyuncs.com/antlr4/v4.13.2.tar.gz  # 2. project mirror
)
if (DEFINED ENV{VSAG_THIRDPARTY_ANTLR4})
    message (STATUS "Using local path for antlr4: $ENV{VSAG_THIRDPARTY_ANTLR4}")
    list (PREPEND antlr4_urls "$ENV{VSAG_THIRDPARTY_ANTLR4}")   # 0. your override (tried first)
endif ()

ExternalProject_Add (antlr4
    URL ${antlr4_urls}
    URL_HASH MD5=3b75610fc8a827119258cba09a068be5
    ...)

The resolution order is therefore:

VSAG_THIRDPARTY_<LIB> — your override, if the environment variable is set to a non-empty value. Tried first.
The upstream URL (GitHub / project release page).
The project-maintained Aliyun OSS mirror (vsagcache.oss-rg-china-mainland.aliyuncs.com). This fallback is always present and helps in mainland-China / poor-network environments, but it is not user-configurable — for a fully internal mirror, use the environment variable.

Availability: the VSAG_THIRDPARTY_* override is available on main and on the 0.15, 0.16, 0.17, and 0.18 release lines — see Version availability.

Key facts before you start

The value may be a local path or a URL. Accepted forms include an absolute filesystem path (/data/deps/fmt-10.2.1.tar.gz), a file:// URL, or any http(s):// URL — including an internal HTTP server or an OSS / S3 bucket.
The archive hash is still verified. Each dependency declares a URL_HASH (MD5 or SHA256). Your mirrored / local archive must be byte identical to the upstream archive, otherwise CMake aborts with a hash mismatch. The simplest safe approach is to download the exact upstream file once and re-host it unchanged.
Overrides are read at configure time. If you change a variable after a previous configure, re-run CMake configure or run make clean first so the new value takes effect.
Use a non-empty value, or leave it unset. CMake treats a variable that is exported but empty as defined, so export VSAG_THIRDPARTY_FMT= would prepend an empty entry to the URL list and break the download. To disable an override, unset it instead of setting it to an empty string.
Each dependency is independent. There is no single global mirror variable; set one VSAG_THIRDPARTY_<LIB> per dependency you need. You only need to set variables for the dependencies your build actually pulls in (see Which dependencies do I need?).
Confirmation in the log. When an override is picked up, CMake prints -- Using local path for <lib>: <your value>.
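
The resolution order and the empty-value pitfall can be modeled in a few lines. The C++ below is only a model of the CMake logic shown earlier, not code from the repository, and candidate_urls is a hypothetical helper. Its override argument follows the convention of std::getenv: a null pointer means the variable is unset, and an empty string means it was exported with no value.

```cpp
#include <string>
#include <vector>

// Models `if (DEFINED ENV{...}) list (PREPEND ...)`: an override that is
// defined is prepended even when it is the empty string.
std::vector<std::string> candidate_urls(std::vector<std::string> defaults,
                                        const char* override_value) {
    if (override_value != nullptr) {
        defaults.insert(defaults.begin(), std::string(override_value));
    }
    return defaults;
}
```

With the override unset, the list is just the upstream URL followed by the project mirror. With a real value, that value is tried first. With an empty string, the list starts with an empty entry, which is the broken case described above and the reason to unset the variable rather than clear it.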

Environment variables

Environment variable	Library	Upstream archive to mirror	Pulled in when
`VSAG_THIRDPARTY_JSON`	nlohmann/json 3.11.3	`github.com/nlohmann/json/.../v3.11.3.tar.gz`	always
`VSAG_THIRDPARTY_ANTLR4`	ANTLR4 runtime 4.13.2	`github.com/antlr/antlr4/.../4.13.2.tar.gz`	always
`VSAG_THIRDPARTY_BOOST`	Boost 1.67.0 (headers)	`archives.boost.io/.../boost_1_67_0.tar.gz`	always
`VSAG_THIRDPARTY_OPENBLAS`	OpenBLAS 0.3.23	`github.com/OpenMathLib/OpenBLAS/.../OpenBLAS-0.3.23.tar.gz`	default BLAS backend (when not using system / MKL)
`VSAG_THIRDPARTY_CPUINFO`	pytorch/cpuinfo	`github.com/pytorch/cpuinfo/archive/ca678952...tar.gz`	always
`VSAG_THIRDPARTY_FMT`	fmt 10.2.1	`github.com/fmtlib/fmt/.../10.2.1.tar.gz`	always (unless system fmt)
`VSAG_THIRDPARTY_THREAD_POOL`	log4cplus/ThreadPool	`github.com/log4cplus/ThreadPool/archive/3507796e...tar.gz`	always
`VSAG_THIRDPARTY_TSL`	Tessil/robin-map 1.4.0	`github.com/Tessil/robin-map/.../v1.4.0.tar.gz`	always
`VSAG_THIRDPARTY_ROARINGBITMAP`	CRoaring 3.0.1	`github.com/RoaringBitmap/CRoaring/.../v3.0.1.tar.gz`	always
`VSAG_THIRDPARTY_CATCH2`	Catch2 3.7.1	`github.com/catchorg/Catch2/.../v3.7.1.tar.gz`	`ENABLE_TESTS=ON`
`VSAG_THIRDPARTY_HDF5`	HDF5 1.14.4	`github.com/HDFGroup/hdf5/.../hdf5_1.14.4.tar.gz`	`ENABLE_TOOLS=ON` (+ C++11 ABI)
`VSAG_THIRDPARTY_ARGPARSE`	p-ranav/argparse 3.1	`github.com/p-ranav/argparse/.../v3.1.tar.gz`	`ENABLE_TOOLS=ON` (+ C++11 ABI)
`VSAG_THIRDPARTY_YAML_CPP`	yaml-cpp 0.9.0	`github.com/jbeder/yaml-cpp/.../yaml-cpp-0.9.0.tar.gz`	`ENABLE_TOOLS=ON` (+ C++11 ABI)
`VSAG_THIRDPARTY_TABULATE`	p-ranav/tabulate	`github.com/p-ranav/tabulate/archive/3a583010...tar.gz`	`ENABLE_TOOLS=ON` (+ C++11 ABI)
`VSAG_THIRDPARTY_HTTPLIB`	cpp-httplib 0.35.0	`github.com/yhirose/cpp-httplib/.../v0.35.0.tar.gz`	`ENABLE_TOOLS=ON` (+ C++11 ABI)
`VSAG_THIRDPARTY_PYBIND11`	pybind11 2.11.1	`github.com/pybind/pybind11/.../v2.11.1.tar.gz`	Python bindings (`pyvsag` / `ENABLE_PYBINDS=ON`)

The exact upstream URL and the expected URL_HASH for each dependency are the single source of truth in the corresponding extern/<lib>/<lib>.cmake file. Check that file when mirroring, especially after a version bump.

Not listed here (no download, so no override needed): Intel MKL (located on the host with find_path) and DiskANN (vendored in-tree under extern/diskann/).

Which dependencies do I need?

You only have to mirror what your specific build actually downloads:

Core library (make debug / make release): JSON, ANTLR4, BOOST, OPENBLAS, CPUINFO, FMT, THREAD_POOL, TSL, ROARINGBITMAP. Two of these are conditional: OPENBLAS is not downloaded when BLAS comes from Intel MKL (x86_64 with ENABLE_INTEL_MKL=ON) or from a system OpenBLAS, and FMT is skipped when a system fmt is found.
+ Tests (make test, ENABLE_TESTS=ON): also CATCH2.
+ Tools (ENABLE_TOOLS=ON and ENABLE_CXX11_ABI=ON): also HDF5, ARGPARSE, YAML_CPP, TABULATE, HTTPLIB — downloaded only when both options are enabled (see cmake/VSAGThirdParty.cmake).
+ Python wheel (make pyvsag): also PYBIND11.

Examples

A. Internal HTTP server or OSS bucket (recommended)

Re-host the upstream archives unchanged on an internal endpoint, then point each variable at it. A base-URL shell variable keeps this compact:

# Internal mirror that serves the upstream archives byte-for-byte
export VSAG_MIRROR=https://mirror.corp.example.com/vsag-thirdparty

export VSAG_THIRDPARTY_JSON=$VSAG_MIRROR/v3.11.3.tar.gz
export VSAG_THIRDPARTY_ANTLR4=$VSAG_MIRROR/antlr4-4.13.2.tar.gz
export VSAG_THIRDPARTY_BOOST=$VSAG_MIRROR/boost_1_67_0.tar.gz
export VSAG_THIRDPARTY_OPENBLAS=$VSAG_MIRROR/OpenBLAS-0.3.23.tar.gz
export VSAG_THIRDPARTY_CPUINFO=$VSAG_MIRROR/cpuinfo-ca678952.tar.gz
export VSAG_THIRDPARTY_FMT=$VSAG_MIRROR/fmt-10.2.1.tar.gz
export VSAG_THIRDPARTY_THREAD_POOL=$VSAG_MIRROR/thread_pool-3507796e.tar.gz
export VSAG_THIRDPARTY_TSL=$VSAG_MIRROR/robin-map-1.4.0.tar.gz
export VSAG_THIRDPARTY_ROARINGBITMAP=$VSAG_MIRROR/CRoaring-3.0.1.tar.gz

make release

An OSS / S3 bucket works identically — just use its public (or network-reachable) object URL, for example https://my-bucket.oss-cn-hangzhou.aliyuncs.com/vsag/OpenBLAS-0.3.23.tar.gz.

B. Pre-downloaded local files (fully air-gapped)

On a machine that has no network at all, copy the archives onto the box first (e.g. to /data/vsag-deps) and point the variables at the local files:

export VSAG_THIRDPARTY_JSON=/data/vsag-deps/v3.11.3.tar.gz
export VSAG_THIRDPARTY_ANTLR4=/data/vsag-deps/antlr4-4.13.2.tar.gz
export VSAG_THIRDPARTY_BOOST=/data/vsag-deps/boost_1_67_0.tar.gz
export VSAG_THIRDPARTY_OPENBLAS=/data/vsag-deps/OpenBLAS-0.3.23.tar.gz
export VSAG_THIRDPARTY_CPUINFO=/data/vsag-deps/cpuinfo-ca678952.tar.gz
export VSAG_THIRDPARTY_FMT=/data/vsag-deps/fmt-10.2.1.tar.gz
export VSAG_THIRDPARTY_THREAD_POOL=/data/vsag-deps/thread_pool-3507796e.tar.gz
export VSAG_THIRDPARTY_TSL=/data/vsag-deps/robin-map-1.4.0.tar.gz
export VSAG_THIRDPARTY_ROARINGBITMAP=/data/vsag-deps/CRoaring-3.0.1.tar.gz

make release

A file:// URL (export VSAG_THIRDPARTY_FMT=file:///data/vsag-deps/fmt-10.2.1.tar.gz) is equally valid.

C. Override a single dependency

If only one download is unreliable, override just that one and let the rest use the defaults:

export VSAG_THIRDPARTY_OPENBLAS=https://mirror.corp.example.com/OpenBLAS-0.3.23.tar.gz
make release

Alternative: reuse system libraries

For dependencies that are already installed on the host, you can skip the download entirely instead of mirroring it. Set VSAG_USE_SYSTEM_DEPS=ON (or the per-dependency VSAG_USE_SYSTEM_<DEP>=ON). See DEVELOPMENT.md for the list of dependencies that currently support system reuse.

Troubleshooting

Hash mismatch / “HASH mismatch” error — your mirrored or local archive is not byte-identical to the upstream file. Re-download the exact upstream archive and re-host it unchanged, or confirm the expected URL_HASH in extern/<lib>/<lib>.cmake.
Override seems ignored — make sure the variable was exported in the same shell that runs make / cmake, then re-run configure (or make clean), because the value is read at CMake configure time. Confirm the -- Using local path for <lib>: <your value> line appears in the configure output.
Still hitting the network — you probably missed a dependency that your build pulls in. Cross-check the list in Which dependencies do I need? against your enabled options (ENABLE_TESTS, ENABLE_TOOLS, Python bindings).

Version availability

The per-dependency VSAG_THIRDPARTY_* override is available on the main development line and on the 0.15, 0.16, 0.17, and 0.18 release lines, so local-path and internal-mirror overrides behave the same way across all of them. It was introduced on main by #1606 and backported to the release lines (tracked in #2308). The built-in upstream + Aliyun OSS mirror fallback remains present on every line, and system-library reuse is still available when you would rather not mirror a dependency at all.

Running Tests

VSAG uses Catch2 for testing, organized in two layers:

Unit tests live next to source files under src/.
Functional tests live under tests/ and cover cross-module, end-to-end behavior. Typical files include test_hnsw.cpp, test_hgraph.cpp, test_diskann.cpp, test_ivf.cpp, test_pyramid.cpp, test_sindi.cpp, test_brute_force.cpp, test_multi_thread.cpp, test_memleak.cpp.

Run the Full Suite

make test configures a Debug build with tests enabled and runs the full unit + functional suite:

make test

Note: make test does not enable coverage instrumentation. To produce a coverage report, use make cov — it configures the build with ENABLE_COVERAGE=ON; run the test binaries afterwards to collect and aggregate coverage data:

make cov
# then run the test binaries, e.g.:
./build-debug/tests/functional_tests
# open build-debug/coverage/index.html

Run a Single Binary

./build-debug/tests/functional_tests "[hgraph]"
./build-debug/tests/functional_tests "[hnsw][concurrent]"

Catch2 supports filtering by name, tag, and wildcards — see --help.

Coverage Expectations

Contributions are expected to keep the C++ line coverage over src/ and include/ at 90% or higher, as measured by the make cov flow and the CI coverage job.

Memory & Concurrency

test_memleak.cpp: run under AddressSanitizer / LeakSanitizer to verify construction and destruction paths.
test_multi_thread.cpp: concurrent Build / KnnSearch / RangeSearch correctness.

Python Tests

make pyvsag PY_VERSION=3.10
cd tests/python && pytest -q

References

tests/ directory
Makefile entries: test, cov, asan

Contributing to VSAG

First of all, thank you for taking the time to contribute to VSAG! Contributors like you are what keep the project alive and growing. 🎉

If this is your first open-source contribution, we recommend walking through the First Contributions tutorial to get familiar with the basic workflow.

The sections below cover what you may want to know before contributing.

Ways to Contribute

Report bugs. File a bug issue with enough detail to reproduce the problem. If you consider the issue urgent, mention the VSAG team in a comment.
Propose features. File a feature request issue describing the expected behavior. Discuss the design with the VSAG team and the community before implementation. Once the plan is agreed, follow the contribution flow.
Implement features or fix bugs. Pick up an open issue and follow the contribution flow. Feel free to ask for clarifications by commenting on the issue and @-mentioning the VSAG team.

Contribution Flow

We use GitHub Flow to collaborate on VSAG.

Fork the VSAG repository on GitHub.
Clone your fork locally: git clone git@github.com:<yourname>/vsag.git.
Create a working branch: git checkout -b my-topic-branch.
Make changes, run local checks, commit, and push with git push --set-upstream origin my-topic-branch.
Open a pull request on GitHub.

If you already have a local clone, update it before starting so that merge conflicts are less likely:

git remote add upstream git@github.com:antgroup/vsag.git
git checkout main
git pull upstream main
git checkout -b my-topic-branch

Guidelines

Before opening a pull request, make sure your changes pass local checks and follow the VSAG coding style.

New features must ship with tests that demonstrate correct behavior and guard against regressions.
Bug fixes should add a regression test covering the triggering case; a missing test is usually what allowed the bug in the first place.
Preserve API compatibility when editing code under include/.
Do not include internal headers (from src/) in public headers (under include/).
When contributing a new feature, remember that the maintenance cost shifts to the VSAG team by default — we evaluate contributions by weighing benefit against long-term maintenance.

Signing Off (DCO)

All contributions to this project must include a Developer Certificate of Origin (DCO) sign-off. The sign-off must be included in every commit message in the form Signed-off-by: Full Name <email address>, with your own name and email address substituted. Contributions without a DCO sign-off cannot be accepted.

This is my commit message

Signed-off-by: Random J Developer <random@developer.example.org>

Git provides a -s flag that appends the trailer automatically:

git commit -s -m "This is my commit message"

For contributions made with the help of an AI coding agent (OpenCode, Claude Code, Codex, etc.), the sign-off rule for humans is unchanged: each human contributor adds their own Signed-off-by: trailer as usual. The AI agent must not add a Signed-off-by trailer of its own, because only a human can legally certify the DCO. Attribute the agent instead with an Assisted-by: trailer that follows the Linux kernel AI Coding Assistants policy, in the form Assisted-by: AgentName:ModelVersion. Place the human Signed-off-by: line(s) first, followed by the Assisted-by: line, for example:

Signed-off-by: Random J Developer <random@developer.example.org>
Assisted-by: OpenCode:claude-opus-4.7

The human submitter is responsible for reviewing AI-generated changes, ensuring license compliance, and taking full responsibility for the contribution.
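
If you want to check trailers locally before pushing, the ordering rule above is simple enough to encode. The helper below is a hypothetical sketch and is not the check the project runs; has_valid_trailers and its two patterns are assumptions that only approximate the forms described in this section.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Returns true when the message has at least one well-formed Signed-off-by
// trailer and no Assisted-by trailer appears before a Signed-off-by trailer.
bool has_valid_trailers(const std::string& message) {
    static const std::regex signoff("Signed-off-by: .+ <[^<>@ ]+@[^<>@ ]+>");
    static const std::regex assisted("Assisted-by: [^: ]+:[^ ]+");

    std::istringstream lines(message);
    std::string line;
    bool seen_signoff = false;
    bool seen_assisted = false;
    while (std::getline(lines, line)) {
        if (std::regex_match(line, signoff)) {
            if (seen_assisted) {
                return false;  // human sign-offs must come first
            }
            seen_signoff = true;
        } else if (std::regex_match(line, assisted)) {
            seen_assisted = true;
        }
    }
    return seen_signoff;
}
```

A message with a sign-off followed by an Assisted-by line passes, a message with only an Assisted-by line fails, and so does one where the Assisted-by line precedes the sign-off.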

Commit Messages and PR Labels

Follow Conventional Commits; common prefixes include feat:, fix:, docs:, chore:, refactor:, test:, ci:.
If a commit must skip CI, put [skip ci] at the beginning of the subject line, e.g. [skip ci] docs: fix typo in README.
Every PR must carry two labels (enforced by Mergify, required to merge):
- kind/*: kind/bug, kind/feature, kind/improvement, or kind/documentation.
- version/*: the target release, e.g. version/1.0, version/0.18.

Coding Style

VSAG follows the Google C++ Style Guide with project-specific tweaks covering indentation, naming, and line width. The authoritative configuration lives in the repository:

clang-format: https://github.com/antgroup/vsag/blob/main/.clang-format
clang-tidy: https://github.com/antgroup/vsag/blob/main/.clang-tidy

clang-tidy enforces not only naming conventions but also style checks such as magic-number usage.

The Makefile exposes formatting targets; clang-format and clang-tidy (both version 15) must be installed.

Format code:

make fmt

Run static analysis (fix the reported issues manually):

make lint

Some clang-tidy findings can be auto-fixed:

make fix-lint

Local Testing

Run the full test suite and make sure it passes:

make test

Build and Train

VSAG exposes index construction through three calls, two real stages and one wrapper that combines them:

Train — fit any internal quantizers / partitioners on a sample of the data.
Add — insert vectors into the index using those trained encoders.
Build — convenience wrapper that does Train then Add on the same dataset.

Most users only call Build. Two situations are worth knowing about explicitly:

Train + streaming Add. When the corpus is large or arrives incrementally, train on a representative sample first and then stream the rest via Add (no rebuild). See examples/cpp/311_feature_train.cpp.
ODescent. An alternative graph-construction algorithm for HGraph / Pyramid that builds the whole neighbor graph in batch instead of insertion-by-insertion. See examples/cpp/312_feature_odescent.cpp.

The `Train` API

tl::expected<void, Error> Index::Train(const DatasetPtr& data);

Declared in include/vsag/index.h. Trains the index on a (typically sampled) dataset without inserting it. Returns tl::expected<void, Error>; check .has_value().

Indexes that perform meaningful training: HGraph, IVF, BruteForce, WARP, Pyramid. For all of them, Build(data) first trains and then inserts the vectors. With the default NSW graph, this is equivalent to Train(data) followed by Add(data). For HGraph/Pyramid configured with graph_type: "odescent", the insertion step is a batch ODescent graph build instead of Add (see HGraph::build_by_odescent / Pyramid::Build in src/algorithm/).

When you need to call `Train` explicitly

The base quantizer requires training. The capability flag IndexFeature::NEED_TRAIN reflects this on HGraph and IVF: HGraph sets it whenever base_quantization_type is not one of fp32, fp16, bf16 (src/algorithm/hgraph.cpp:1803); IVF always sets it (src/algorithm/ivf.cpp:316) because its centroids must be trained. Pyramid does not currently set NEED_TRAIN in InitFeatures() even when its underlying HGraph quantizer would need training, so do not rely on CheckFeature(IndexFeature::NEED_TRAIN) for Pyramid. Call Train explicitly when you choose a trained base_quantization_type. fp32 / fp16 / bf16 do not require training; calling Train on them anyway is a harmless no-op.
You want to insert vectors in many small batches rather than in one Build call.
You plan to export the trained model and reuse it on another index instance (via ExportModel).
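
To see why a quantizer such as sq8 needs Train while fp32 does not, consider a toy per-dimension 8-bit scalar quantizer. ToySq8 below is a hypothetical illustration and does not reproduce VSAG's quantizers. It has to learn each dimension's value range from data before it can map a float to one of 256 codes, so encoding before training has no meaningful result.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

class ToySq8 {
public:
    explicit ToySq8(std::size_t dim) : dim_(dim), lo_(dim, 0.0f), scale_(dim, 1.0f) {}

    // Learn the min and step size of every dimension from `count` row-major vectors.
    void Train(const float* data, std::size_t count) {
        if (count == 0) throw std::invalid_argument("empty training set");
        for (std::size_t d = 0; d < dim_; ++d) {
            float lo = data[d];
            float hi = data[d];
            for (std::size_t i = 1; i < count; ++i) {
                lo = std::min(lo, data[i * dim_ + d]);
                hi = std::max(hi, data[i * dim_ + d]);
            }
            lo_[d] = lo;
            scale_[d] = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;
        }
        trained_ = true;
    }

    // Map each component to the nearest of 256 levels inside the trained range.
    std::vector<uint8_t> Encode(const float* v) const {
        if (!trained_) throw std::logic_error("Encode called before Train");
        std::vector<uint8_t> code(dim_);
        for (std::size_t d = 0; d < dim_; ++d) {
            float q = std::round((v[d] - lo_[d]) / scale_[d]);
            code[d] = static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
        }
        return code;
    }

    // Approximate reconstruction of the original vector.
    std::vector<float> Decode(const std::vector<uint8_t>& code) const {
        std::vector<float> v(dim_);
        for (std::size_t d = 0; d < dim_; ++d) {
            v[d] = lo_[d] + scale_[d] * static_cast<float>(code[d]);
        }
        return v;
    }

private:
    std::size_t dim_;
    std::vector<float> lo_;
    std::vector<float> scale_;
    bool trained_ = false;
};
```

Training on a representative sample fixes the ranges once; every later Encode call reuses them, which is what makes the train-once, add-in-a-stream pattern possible. Values outside the trained range are clamped, which is one reason the sample should be representative.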

Pattern: train once, add in a stream

auto params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "max_degree": 32,
        "ef_construction": 100,
        "base_quantization_type": "sq8"
    }
})";
auto index_result = vsag::Factory::CreateIndex("hgraph", params);
if (!index_result.has_value()) {
    std::cerr << "Create index failed: " << index_result.error().message << std::endl;
    return -1;
}
auto index = index_result.value();

// Step 1 — train on the whole base (or a representative sample).
auto train_result = index->Train(base);
if (!train_result.has_value()) {
    std::cerr << "Train failed: " << train_result.error().message << std::endl;
    return -1;
}

// Step 2 — stream vectors in one at a time (or in small batches).
for (int64_t i = 0; i < num_vectors; ++i) {
    auto one = vsag::Dataset::Make();
    one->NumElements(1)
       ->Dim(dim)
       ->Ids(ids + i)
       ->Float32Vectors(vectors + i * dim)
       ->Owner(false);
    auto add_result = index->Add(one);
    if (!add_result.has_value()) { /* handle */ }
}

The complete program is examples/cpp/311_feature_train.cpp.

`Train` vs `Build` vs `Add`

Call	Trains quantizer?	Inserts vectors?	Use it when
`Build(data)`	yes	yes (all of `data`)	Bulk-load: you have the whole dataset already.
`Train(data)`	yes	no	You want to insert vectors later, possibly in batches.
`Add(data)`	no (requires prior `Train` or `Build`)	yes	Incremental inserts after the index is trained.

ODescent: an alternative graph builder

By default, HGraph and Pyramid build their graphs NSW-style — every vector is inserted one at a time and connects to the neighbors found by a search-on-insert (graph_type: "nsw"). ODescent (“Optimized NN-Descent”) is an alternative: it seeds a random k-NN graph over the entire dataset and then iteratively refines edges using sampled candidate exchanges.

ODescent typically produces graphs with comparable recall to NSW at lower build cost for large batches, because the refinement loop parallelizes cleanly over the data and avoids per-insert search.

ODescent is implemented in src/impl/odescent/odescent_graph_builder.{h,cpp} and is currently used by HGraph and Pyramid (build path).

Enabling ODescent on HGraph

Add graph_type: "odescent" to the HGraph index_param:

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 26,
        "ef_construction": 100,
        "graph_type": "odescent",
        "graph_iter_turn": 10,
        "neighbor_sample_rate": 0.3,
        "alpha": 1.2
    }
}

Then just call Build(data) — no other API change. The complete program is examples/cpp/312_feature_odescent.cpp.

ODescent build parameters

These keys go under index_param alongside the usual HGraph keys:

Parameter	Default (HGraph)	Description
`graph_type`	`"nsw"`	Set to `"odescent"` to switch on this builder.
`graph_iter_turn`	`30`	Number of refinement iterations. Higher → better graph quality, longer build.
`neighbor_sample_rate`	`0.2`	Fraction of each node’s neighbors sampled per iteration for candidate exchange.
`alpha`	`1.2`	α factor used by the diversity-aware edge pruning step. Larger `alpha` → sparser, more diverse edges.
`min_in_degree`	`1`	Minimum in-degree enforced when repairing the graph after pruning.
`build_block_size`	`10000`	Parallelization granularity (vectors per worker block).

max_degree is inherited from the HGraph top-level setting; you do not need to repeat it under ODescent. Upper graph layers automatically use half of max_degree.
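
When build parameters are generated by code rather than written by hand, the JSON above can be assembled with a small helper. The sketch below is an illustration only: `MakeODescentParams` is a hypothetical function, the fixed values (float32, l2, sq8, ef_construction 100) are copied from the example above, and the accepted range for `neighbor_sample_rate` is an assumption based on its description as a fraction.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of VSAG): assembles HGraph build parameters
// with ODescent enabled, using the key names documented above.
inline std::string MakeODescentParams(int dim, int max_degree, int graph_iter_turn,
                                      double neighbor_sample_rate, double alpha) {
    assert(dim > 0 && max_degree > 0 && graph_iter_turn > 0);
    // Assumption: a sampling "fraction" lies in (0, 1].
    assert(neighbor_sample_rate > 0.0 && neighbor_sample_rate <= 1.0);
    std::ostringstream out;
    out << R"({"dtype": "float32", "metric_type": "l2", "dim": )" << dim
        << R"(, "index_param": {"base_quantization_type": "sq8", "max_degree": )" << max_degree
        << R"(, "ef_construction": 100, "graph_type": "odescent", "graph_iter_turn": )"
        << graph_iter_turn << R"(, "neighbor_sample_rate": )" << neighbor_sample_rate
        << R"(, "alpha": )" << alpha << "}}";
    return out.str();
}
```

The returned string can be passed wherever the build parameter string is expected, for example as the second argument of vsag::Factory::CreateIndex.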

When to use ODescent vs NSW

Use ODescent when you have the full dataset up front and care about build throughput on a many-core machine. The batch refinement parallelizes better than insertion-by-insertion.
Use NSW (the default) when you build incrementally or care about strictly minimal memory during the build, or when you have not measured a build-time problem.

Both choices produce a graph that is searched the same way at query time, so search-side parameters (ef_search, pq_rerank, …) carry over unchanged.

Range Search

Besides k-nearest-neighbor search (KnnSearch), VSAG also supports range search (RangeSearch): return every result whose distance to the query vector is less than or equal to a given radius. It is useful for threshold filtering, de-duplication, and approximate recall scenarios.

Basic Usage

#include <vsag/vsag.h>

// 1. Create an index (HGraph in this example)
auto index = vsag::Factory::CreateIndex("hgraph", hgraph_build_params).value();
index->Build(dataset);

// 2. Prepare the query
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(dim)->Float32Vectors(query_vec)->Owner(false);

// 3. Range search
float radius = 0.5f;
auto result = index->RangeSearch(query, radius, search_params);
if (result.has_value()) {
    auto ids = result.value()->GetIds();
    auto dists = result.value()->GetDistances();
    int64_t n = result.value()->GetDim();  // for search results, GetDim() is the number of hits
    // ...
}

See examples/cpp/302_feature_range_search.cpp for a complete example.

`limited_size` Parameter

RangeSearch accepts a limited_size argument that caps the number of returned results:

// Return at most 100 results within the radius
auto result = index->RangeSearch(query, radius, search_params, /*limited_size=*/100);

limited_size = -1 (default): return every result inside the radius (unlimited).
limited_size > 0: return at most this many results.
limited_size = 0: invalid; the implementation explicitly rejects this value (CHECK_ARGUMENT(limited_size != 0, ...)).
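
The three cases can be modelled in a few lines. This is an illustrative model of the documented rules, not VSAG's implementation; `ExpectedResultCount` is a hypothetical name, and treating every negative value like -1 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Illustrative model of the limited_size rules; not VSAG code.
// Returns std::nullopt for the rejected value 0, otherwise the largest number
// of results a caller should expect when `in_radius` vectors lie in the radius.
inline std::optional<std::size_t> ExpectedResultCount(int64_t limited_size,
                                                      std::size_t in_radius) {
    if (limited_size == 0) return std::nullopt;   // rejected by the API
    if (limited_size < 0) return in_radius;       // assumption: any negative value means "no limit"
    const auto cap = static_cast<std::size_t>(limited_size);
    return in_radius < cap ? in_radius : cap;     // capped at limited_size
}
```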

Combining with Filter

RangeSearch has the same signature shape as KnnSearch and also accepts a filter (see examples/cpp/301_feature_filter.cpp). The filter is applied during the search, not afterwards, which is more efficient than post-filtering.

Support Matrix

Index type	Supports RangeSearch
hgraph	yes
ivf	yes
brute_force	yes
sindi	yes (sparse vectors)

Notes

The distance metric (IP / L2 / cosine) defines the semantics of radius. Make sure it matches the metric_type specified at index creation.
If radius is very large, the result set can be huge; combine with limited_size to avoid unbounded memory usage.
For graph-based indexes (HGraph), runtime parameters such as ef_search have the same meaning in RangeSearch and KnnSearch.

Calculate Distance by ID

Besides KnnSearch and RangeSearch, VSAG exposes APIs that compute the distance between a query vector and already-indexed vectors referenced by their IDs. This is useful for re-ranking external candidate sets, validating recall, or implementing custom retrieval pipelines on top of VSAG.

Two flavors are provided:

CalcDistanceById — single ID, returns one distance.
CalDistanceById — batch of IDs, returns a DatasetPtr containing distances.

Each flavor has two overloads: one taking a raw const float* (dense vectors) and one taking a DatasetPtr (works for both dense and sparse vectors).

Note on naming. The batch method is currently spelled CalDistanceById (missing the c in Calc). This is a historical typo introduced when the batch overload was first added; the two names do not indicate any semantic difference beyond single vs. batch. The current spelling is kept for backward compatibility and is expected to be deprecated in a future release in favor of a correctly spelled name (proposed: CalcDistancesById). New code is encouraged to centralize calls behind a thin wrapper to ease the eventual migration. See issue #2068 for tracking.
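
One possible shape for the thin wrapper mentioned in the note is sketched below. The name `BatchDistancesById` is an assumption made for this example, not a VSAG API, and `FakeIndex` exists only so the sketch compiles without the library. The template accepts any type that exposes the batch method with the dense-pointer signature shown in the API overview.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical wrapper: call sites depend on this function, so a future rename
// of the batch method only has to be applied in one place.
template <typename Index>
auto BatchDistancesById(const Index& index, const float* query,
                        const std::vector<int64_t>& ids) {
    assert(query != nullptr);
    return index.CalDistanceById(query, ids.data(), static_cast<int64_t>(ids.size()));
}

// Stand-in used only to make this sketch self-contained.
struct FakeIndex {
    int64_t CalDistanceById(const float*, const int64_t*, int64_t count) const { return count; }
};
```

With a real index held in a shared pointer, the call would look like `BatchDistancesById(*index, query_vector.data(), ids)`.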

API Overview

// Single, dense float pointer.
tl::expected<float, Error>
CalcDistanceById(const float* vector,
                 int64_t id,
                 bool calculate_precise_distance = true) const;

// Single, DatasetPtr (dense or sparse).
tl::expected<float, Error>
CalcDistanceById(const DatasetPtr& vector,
                 int64_t id,
                 bool calculate_precise_distance = true) const;

// Batch, dense float pointer.
tl::expected<DatasetPtr, Error>
CalDistanceById(const float* query,
                const int64_t* ids,
                int64_t count,
                bool calculate_precise_distance = true) const;

// Batch, DatasetPtr (dense or sparse).
tl::expected<DatasetPtr, Error>
CalDistanceById(const DatasetPtr& query,
                const int64_t* ids,
                int64_t count,
                bool calculate_precise_distance = true) const;

Declarations live in include/vsag/index.h.

`calculate_precise_distance`

true (default): the implementation tries to use the high-precision representation of the stored vector (e.g. full-precision float32). When the index only retains quantized codes, obtaining the precise value can be more expensive.
false: the implementation may use the quantized / approximate representation that the index already keeps in memory. Faster, but the returned distance is approximate.

Return Semantics

The single-ID overload returns the distance as a float.
The batch overload returns a DatasetPtr whose GetDistances() array has count entries aligned with the input ids. A value of -1 in that array indicates an invalid ID (e.g. the ID does not exist in the index).
The distance metric (IP / L2 / cosine) follows the metric_type chosen at index construction; see Metric Semantics.
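
Because the batch result mixes real distances with the -1 sentinel, callers usually separate the two before using the values. The following is a minimal sketch with hypothetical names (`BatchDistances`, `SplitBatchDistances`); `dists` stands for the array returned by GetDistances() and `ids` for the array passed to the batch call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct BatchDistances {
    std::vector<std::pair<int64_t, float>> found;  // (id, distance) for ids present in the index
    std::vector<int64_t> missing;                  // ids reported with the -1 sentinel
};

// Hypothetical helper: splits a batch result into valid entries and invalid ids.
inline BatchDistances SplitBatchDistances(const int64_t* ids, const float* dists,
                                          std::size_t count) {
    assert(count == 0 || (ids != nullptr && dists != nullptr));
    BatchDistances out;
    for (std::size_t i = 0; i < count; ++i) {
        if (dists[i] == -1.0f) {
            out.missing.push_back(ids[i]);
        } else {
            out.found.emplace_back(ids[i], dists[i]);
        }
    }
    return out;
}
```

Note that a genuine distance equal to -1 would be indistinguishable from the sentinel; whether that can occur depends on the metric in use.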

Basic Usage

#include <vsag/vsag.h>

// 1. Build an HGraph index over float32 vectors.
auto index = vsag::Factory::CreateIndex("hgraph", hgraph_build_parameters).value();
index->Build(base);

// 2. Single ID.
auto d = index->CalcDistanceById(query_vector.data(), /*id=*/42);
if (d.has_value()) {
    std::cout << "distance to id 42 = " << d.value() << std::endl;
}

// 3. Batch IDs.
std::vector<int64_t> ids = { 1, 2, 3, 4, 5 };
auto result = index->CalDistanceById(query_vector.data(), ids.data(), ids.size());
if (result.has_value()) {
    const float* dists = result.value()->GetDistances();
    for (size_t i = 0; i < ids.size(); ++i) {
        if (dists[i] == -1.0f) {
            std::cout << ids[i] << " -> invalid ID" << std::endl;
        } else {
            std::cout << ids[i] << " -> " << dists[i] << std::endl;
        }
    }
}

A runnable example is provided in examples/cpp/306_feature_calculate_distance_by_id.cpp.

Sparse Vectors

For sparse-vector indexes such as SINDI, the const float* overloads are not applicable. Pass the query as a DatasetPtr carrying sparse vectors via SparseVectors(...), and use the DatasetPtr overloads:

auto query = vsag::Dataset::Make();
query->NumElements(1)->SparseVectors(&sparse_query)->Owner(false);

auto d = index->CalcDistanceById(query, /*id=*/42);

Support Matrix

Index type	Dense overload (`const float*`)	DatasetPtr overload	Notes
hgraph	yes	yes	Honors `calculate_precise_distance`.
ivf	yes	yes (default loop)
brute_force	yes	yes (default loop)	Always precise (no quantization).
pyramid	yes	yes (default loop)
sindi	no	yes	Sparse vectors only.

Indexes that do not implement the API surface for a given overload return an UNSUPPORTED_INDEX_OPERATION error.

Notes

The query dimension (for dense overloads) must match the index dimension.
The batch overload has a default implementation that loops over single-ID calls; some indexes override it for batch-level optimization.
Like all VSAG read-only APIs, these methods are safe to call concurrently with other read-only operations (e.g. KnnSearch).

Filtered Search

Filtered search restricts the result set of a KnnSearch or RangeSearch to vectors that satisfy an application-defined predicate. VSAG applies the predicate during index traversal whenever the underlying algorithm supports it, so you avoid the recall loss and extra latency of post-filtering top-k results.

This page covers the three id-based filter APIs:

Bitset filter — a compact bit array indexed by vector id.
Function-callback filter — a std::function<bool(int64_t)>.
Filter object — a vsag::Filter subclass that can also expose hints (valid ratio, distribution) to the search algorithm.

For attribute / “hybrid” search where the predicate is an SQL-like expression over typed fields, see Attribute Filter (Hybrid Search). For filtering against an opaque per-vector byte payload during graph traversal, see Extra Info.

Truth-value Conventions

The three APIs disagree on how to spell “exclude this id”. Read this table carefully before mixing them.

API	Method	Returning `true` means …
`Bitset`	`Test(id)`	id is filtered out
`std::function`	`f(id)`	id is filtered out
`Filter::CheckValid`	`CheckValid(id)`	id is kept

The bitset and std::function overloads are wrapped internally as a BlackListFilter (src/impl/filter/black_list_filter.cpp): the bit being set, or the callback returning true, marks the id as excluded. The Filter::CheckValid API inverts that polarity — true keeps the id. If you maintain your own deletion bitmap, the bitset/function APIs are a natural fit. If you want predicate logic with hints, the Filter form is clearer.
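
If application code already has a predicate in the "true means keep" polarity, it can be adapted to the callback overload by negating it. The sketch below is self-contained; `IdPredicate` and `ToExcludeCallback` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

using IdPredicate = std::function<bool(int64_t)>;

// Hypothetical adapter: turns a "true means keep" predicate into the
// "true means exclude" callback expected by the std::function overloads.
inline IdPredicate ToExcludeCallback(IdPredicate keep) {
    assert(static_cast<bool>(keep));  // an empty std::function cannot be called
    return [keep = std::move(keep)](int64_t id) { return !keep(id); };
}
```

For example, a predicate that keeps odd ids becomes a callback that returns true for even ids, which matches the blacklist convention in the table.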

Bitset Filter

vsag::Bitset (include/vsag/bitset.h) is a growable, ordinal-indexed bit array.

auto invalid = vsag::Bitset::Make();
for (int64_t i = 0; i < num_vectors; ++i) {
    if (ids[i] % 2 == 0) {
        invalid->Set(ids[i]);    // even ids are excluded
    }
}

auto search_params = R"({ "hgraph": { "ef_search": 100 } })";
auto result = index->KnnSearch(query, /*topk=*/10, search_params, invalid).value();

The bitset is indexed by vector id, not by insertion order, and ids are masked to their low 32 bits before lookup (bit_index = id & ROW_ID_MASK in src/impl/filter/black_list_filter.cpp, where ROW_ID_MASK = 0xFFFFFFFFLL). Two ids that share the same low 32 bits collide in the bitset, so keep ids within [0, 2^32) if you rely on this filter; otherwise switch to the Filter form. Reused or recycled ids must be handled by your application.
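
The masking rule is easy to check before relying on a bitset filter. The sketch below reproduces the mask value quoted above and adds a hypothetical pre-flight check, `HasBitsetCollision`, that reports whether two distinct ids would map to the same bit.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

constexpr int64_t kRowIdMask = 0xFFFFFFFFLL;  // value quoted above for ROW_ID_MASK

// Bit position an id maps to under the documented masking rule.
constexpr int64_t BitIndex(int64_t id) { return id & kRowIdMask; }

// Hypothetical pre-flight check: true if any two distinct ids share a bit.
inline bool HasBitsetCollision(const std::vector<int64_t>& ids) {
    std::unordered_set<int64_t> seen_ids;
    std::unordered_set<int64_t> seen_bits;
    for (int64_t id : ids) {
        if (!seen_ids.insert(id).second) continue;  // same id repeated: not a collision
        if (!seen_bits.insert(BitIndex(id)).second) return true;
    }
    return false;
}
```

Ids 5 and 2^32 + 5 both map to bit 5, so excluding one of them through a bitset would also exclude the other.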

Function-callback Filter

A plain lambda or std::function<bool(int64_t)> works directly. The callback must return true for ids that should be excluded (it is wrapped as a BlackListFilter):

// Drop even ids: return true to exclude.
std::function<bool(int64_t)> drop_even = [](int64_t id) { return id % 2 == 0; };
auto result = index->KnnSearch(query, 10, search_params, drop_even).value();

This is the easiest way to drop in a small amount of custom logic without subclassing. If you prefer the “return true to keep” polarity, use the Filter object instead.

`Filter` Object

The richest API is vsag::Filter (include/vsag/filter.h). Subclass it when the search algorithm can benefit from hints about the predicate:

class MyFilter : public vsag::Filter {
public:
    bool CheckValid(int64_t id) const override {
        return id % 2 == 1;
    }

    // Approximate fraction of ids that pass the predicate. The search uses this to
    // size internal candidate buffers; an accurate estimate improves latency and recall.
    float ValidRatio() const override { return 0.5F; }

    // Hint whether passing ids cluster spatially. NONE means "no correlation"; use
    // RELATED_TO_VECTOR if the predicate correlates with vector position (e.g. region tags).
    Distribution FilterDistribution() const override { return Distribution::NONE; }
};

auto filter = std::make_shared<MyFilter>();
auto result = index->KnnSearch(query, 10, search_params, filter).value();

Important methods:

Method	Default	Purpose
`CheckValid(int64_t id)`	pure virtual	Required. `true` keeps the id.
`CheckValid(const char* data)`	returns `true`	Used for in-graph filtering against the per-vector byte payload; see Extra Info.
`ValidRatio()`	`1.0F`	Hint, in `[0, 1]`, of the fraction of ids that pass.
`FilterDistribution()`	`NONE`	`NONE` or `RELATED_TO_VECTOR`.
`GetValidIds(...)`	empty	Optional whitelist for very selective filters.

Passing the wrong ValidRatio is not a correctness bug, but a poor estimate may either inflate latency (overestimate) or hurt recall (underestimate).
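
One way to obtain a reasonable ValidRatio is to evaluate the predicate on a sample of ids ahead of time. The helper below is a sketch under that assumption; `EstimateValidRatio` is a hypothetical name, and how the sample is drawn is left to the application.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: estimates the fraction of ids that pass `keep`
// by evaluating it on a caller-supplied sample of ids.
inline float EstimateValidRatio(const std::vector<int64_t>& sample_ids,
                                const std::function<bool(int64_t)>& keep) {
    assert(!sample_ids.empty());
    std::size_t passed = 0;
    for (int64_t id : sample_ids) {
        if (keep(id)) ++passed;
    }
    return static_cast<float>(passed) / static_cast<float>(sample_ids.size());
}
```

The result can be cached in the Filter subclass and returned from ValidRatio(), so the estimate is not recomputed per query.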

Available Overloads

KnnSearch and RangeSearch each expose four overloads, one without a filter and one per filter form (include/vsag/index.h):

// KnnSearch
index->KnnSearch(query, topk, params);                                    // no filter
index->KnnSearch(query, topk, params, BitsetPtr invalid);
index->KnnSearch(query, topk, params, std::function<bool(int64_t)> f);
index->KnnSearch(query, topk, params, FilterPtr filter);

// RangeSearch
index->RangeSearch(query, radius, params, limited_size);                  // no filter
index->RangeSearch(query, radius, params, BitsetPtr invalid, limited_size);
index->RangeSearch(query, radius, params, std::function<bool(int64_t)> f, limited_size);
index->RangeSearch(query, radius, params, FilterPtr filter, limited_size);

limited_size is the maximum number of results returned by RangeSearch:

limited_size < 0: no limit (the default -1).
limited_size == 0: rejected explicitly by the API (CHECK_ARGUMENT(limited_size != 0, ...)); pass -1 for “no limit”.
limited_size > 0: cap the result list at this many entries.

A filtered iterator-style search is also exposed:

vsag::IteratorContext* ctx = nullptr;
index->KnnSearch(query, topk, params, filter, ctx, /*is_last_search=*/false);
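// Repeat with the same ctx to fetch further results. Passing true on a final call only
// drains the buffered candidates; the caller must still `delete ctx` to release resources.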

Index Support Matrix

All index types accept the bitset, function, and FilterPtr overloads — the inner implementation wraps bitsets and lambdas into a FilterPtr automatically. The columns below reflect the capability flags each index registers (see include/vsag/index_features.h), which is what runtime feature checks return.

Index	`_KNN_SEARCH_WITH_ID_FILTER`	`_RANGE_SEARCH_WITH_ID_FILTER`	`_KNN_ITERATOR_FILTER_SEARCH`
HGraph	Yes	Yes	Yes
IVF	Yes	Yes	—
BruteForce	Yes	Yes	—
Pyramid	Yes	Yes	—
SINDI / WARP	Yes	Yes	—

For id-based filtering, query support at runtime via index->CheckFeature(vsag::SUPPORT_KNN_SEARCH_WITH_ID_FILTER), SUPPORT_RANGE_SEARCH_WITH_ID_FILTER, and SUPPORT_KNN_ITERATOR_FILTER_SEARCH. The flag SUPPORT_KNN_SEARCH_WITH_EX_FILTER is unrelated — it covers extra-info (byte-payload) filtering, see Extra Info.

Performance Notes

The more selective the filter (smaller ValidRatio), the more candidates the search has to expand. For graph indexes, increase ef_search in proportion to the filter's selectivity; without that, recall can drop sharply once fewer than roughly 1% of ids pass the filter.
HGraph also offers a selectivity-aware brute-force fallback: set brute_force_threshold (e.g. 0.01–0.05) in the search params so that, when Filter::ValidRatio() is small enough, HGraph automatically skips graph traversal and runs an exact scan over the surviving ids. This is often a better choice than chasing recall by raising ef_search to very large values. See the HGraph index page and example 322_feature_hgraph_brute_force_threshold.cpp.
Bitset filters are fastest because Test() is a single bit lookup. A Filter object that performs heavy work in CheckValid will be called many times per query.
For RangeSearch, set a finite limited_size when filters can let through millions of ids — otherwise the result set may grow unbounded.
Filters compose cheaply with Attribute Filter when using SearchRequest: all enabled filters are combined with logical AND.
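
The first note suggests raising ef_search for selective filters. One simple way to do that is sketched below. The scaling rule (divide by the valid ratio, then clamp) is an assumption made for illustration, not something VSAG prescribes, and `ScaledEfSearch` is a hypothetical helper; measure recall on your own data before adopting any such rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical heuristic (an assumption, not VSAG behaviour): scale ef_search
// by 1 / valid_ratio so that roughly `base_ef` passing candidates are explored,
// and clamp the result to `max_ef`.
inline int64_t ScaledEfSearch(int64_t base_ef, double valid_ratio, int64_t max_ef) {
    assert(base_ef > 0 && max_ef >= base_ef);
    assert(valid_ratio > 0.0 && valid_ratio <= 1.0);
    const double scaled = std::ceil(static_cast<double>(base_ef) / valid_ratio);
    return static_cast<int64_t>(std::min(scaled, static_cast<double>(max_ef)));
}
```

When the clamp is reached regularly, the brute_force_threshold fallback described above is likely the better tool.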

Combining Filters via `SearchRequest`

SearchRequest (include/vsag/search_request.h) is the unified entry point used by SearchWithRequest. It can carry a bitset filter, a Filter object, and an attribute expression simultaneously; all are ANDed together.

vsag::SearchRequest req;
req.query_                = query;
req.mode_                 = vsag::SearchMode::KNN_SEARCH;
req.topk_                 = 10;
req.params_str_           = R"({ "hgraph": { "ef_search": 200 } })";
req.enable_filter_        = true;
req.filter_               = std::make_shared<MyFilter>();
req.enable_bitset_filter_ = true;
req.bitset_filter_        = invalid;
auto result = index->SearchWithRequest(req).value();

See Attribute Filter for the attribute_filter_str_ field.

Examples

C++: examples/cpp/301_feature_filter.cpp — bitset, function, and Filter-object styles.
C++: examples/cpp/320_feature_extra_info.cpp — in-graph filtering using the CheckValid(const char*) byte-buffer overload.

Python Status

Python bindings for the filter APIs are not yet exposed; the placeholder at examples/python/todo_examples/301_feature_filter.py is intentionally empty. Use the C++ API for filtered search today.

Iterator Search

VSAG supports iterator-based search (also called iterative search): instead of asking for the top-k results in one shot, the caller can request results in successive chunks while VSAG preserves the internal search state between calls. Each subsequent call resumes from where the previous one left off and returns new, non-overlapping results.

This is useful when:

The application implements an external re-ranker or post-filter and wants to keep pulling more candidates until enough survivors are collected.
Result consumption is lazy / streaming (e.g. UI pagination, server-side cursor).
The eventual k is unknown up front and may grow on demand.

How It Works

Iterator search relies on a long-lived IteratorContext object that holds:

the current candidate heap / visited bitmap, and
the cursor into the underlying graph or inverted lists.

The first call creates the context (when the pointer is nullptr); follow-up calls reuse it so the search continues instead of restarting. When the caller is done, the IteratorContext object itself must be deleted by the caller — that is what releases the iterator’s internal state.

The is_last_search flag is optional: when set to true, the index drains the candidates that are still buffered inside the context (the “discard heap”) and returns them as the result of that call. This is useful when the caller wants the long tail of explored-but-not-yet-emitted candidates; if you don’t need them, you can simply skip the final call and delete the context directly. Note that the returned set is still capped to k, so if you want all tail candidates, pass a sufficiently large k on the finalize call.

Basic Usage (`SearchParam` API)

#include <vsag/vsag.h>

// 1. Build an index (HGraph in this example)
auto index = vsag::Factory::CreateIndex("hgraph", hgraph_build_params).value();
index->Build(dataset);

// 2. Prepare query
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(dim)->Float32Vectors(query_vec)->Owner(false);

// 3. Configure SearchParam in iterator mode
nlohmann::json search_parameters = {
    {"hgraph", {{"ef_search", 100}}},
};
std::string param_str = search_parameters.dump();

vsag::SearchParam search_param(
    /*iter_filter_flag=*/true,   // enable iterator mode
    param_str,
    /*filter=*/nullptr,
    /*allocator=*/&allocator,
    /*iter_ctx=*/nullptr,        // first call: context is created internally
    /*last_search_flag=*/false);

// 4. First page
auto page1 = index->KnnSearch(query, /*k=*/10, search_param).value();

// 5. Next page — context carries over, results do not overlap with page1
auto page2 = index->KnnSearch(query, /*k=*/10, search_param).value();

// 6. (Optional) drain the candidates still buffered in the context.
//    Skip this call if you don't need the tail candidates; cleanup
//    happens through `delete` below either way.
search_param.is_last_search = true;
auto page3 = index->KnnSearch(query, /*k=*/10, search_param).value();

// 7. The caller owns the context object — this is what releases resources.
delete search_param.iter_ctx;

Reference: examples/cpp/313_feature_search_allocator.cpp and examples/cpp/314_feature_hgraph_search_allocator.cpp.

Alternative: Explicit `IteratorContext` Argument

The lower-level KnnSearch overload accepts the context pointer directly. This is the form used by VSAG’s own tests (tests/test_index/test_index_search.cpp) when calling KnnSearch several times in a row:

vsag::IteratorContext* iter_ctx = nullptr;

auto r1 = index->KnnSearch(query, k1, param_str, filter, iter_ctx, /*is_last_search=*/false);
auto r2 = index->KnnSearch(query, k2, param_str, filter, iter_ctx, /*is_last_search=*/false);
auto r3 = index->KnnSearch(query, k3, param_str, filter, iter_ctx, /*is_last_search=*/false);

delete iter_ctx;

Each call advances iter_ctx. The chunks returned by successive calls are disjoint and together form a continuation of the search, ordered by distance. To also emit the candidates still buffered in the context, pass is_last_search=true on a trailing call.

SearchRequest API. SearchRequest declares enable_iterator_search_, p_iter_ctx_, and is_last_search_ fields, but no in-tree SearchWithRequest implementation currently consults them. Until that wiring lands, use one of the two KnnSearch forms above to drive iterator search.

Combining With Filters

Iterator search composes with regular filters (label filter, attribute filter, bitset filter). A common use case is “keep iterating until enough results pass my external check”:

size_t needed = 50;
std::vector<int64_t> kept;
vsag::IteratorContext* ctx = nullptr;

while (kept.size() < needed) {
    auto page = index->KnnSearch(query, 32, param_str, filter, ctx, /*is_last_search=*/false);
    if (!page.has_value() || page.value()->GetDim() == 0) break;

    for (int64_t i = 0; i < page.value()->GetDim(); ++i) {
        if (external_check(page.value()->GetIds()[i])) {
            kept.push_back(page.value()->GetIds()[i]);
        }
    }
}

// Release the iterator state. No `is_last_search=true` call is required —
// add one only if you also want the candidates still buffered in `ctx`.
delete ctx;

The HGraph index supports an additional runtime parameter — skip_ratio — that controls how aggressively the iterator skips already-explored regions during continuation. See examples/cpp/314_feature_hgraph_search_allocator.cpp.

Support Matrix

Indexes that advertise the SUPPORT_KNN_ITERATOR_FILTER_SEARCH feature (queryable via Index::CheckFeature):

Index type	Supports iterator search
hgraph	yes
ivf	no
brute_force	no
sindi	no

Always check index->CheckFeature(vsag::SUPPORT_KNN_ITERATOR_FILTER_SEARCH) at runtime before relying on this capability — coverage may expand in future releases.

Notes and Pitfalls

Ownership. The IteratorContext is owned by the caller. Forgetting to delete it leaks the internal search state (heap, visited bitmap, allocator scratch). Resource release is driven entirely by delete, not by is_last_search.
Optional last call. is_last_search = true is not required for cleanup. Its only effect is to make the index drain the candidates that are still buffered in the context and return them as that call’s result, still capped to k. Use it only when you want those tail candidates, and pick a k large enough not to truncate them.
Parameter stability. Do not change the query vector, distance metric, or filter between calls that share a context — results are only meaningful when the search state is reused for the same logical query.
k per call. The k argument applies to each call individually; the returned chunks are disjoint, so the cumulative result size grows by k (or less if the index is exhausted) each iteration.
Thread safety. A single IteratorContext must not be used concurrently from multiple threads. Different queries should each have their own context.
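
Since cleanup depends on the caller remembering `delete`, a small scope guard can make the ownership rule harder to forget, including on early returns. This is a sketch only: `ContextGuard` is a hypothetical helper, not part of VSAG, and it is written as a template so it compiles without the library.

```cpp
#include <cassert>

// Hypothetical RAII helper. It holds a reference to the raw context pointer
// that the search call fills in, and deletes whatever that pointer holds
// when the guard leaves scope.
template <typename Context>
class ContextGuard {
public:
    explicit ContextGuard(Context*& ctx) : ctx_(ctx) {}
    ~ContextGuard() {
        delete ctx_;     // deleting nullptr is a no-op
        ctx_ = nullptr;  // prevents accidental reuse or double delete
    }
    ContextGuard(const ContextGuard&) = delete;
    ContextGuard& operator=(const ContextGuard&) = delete;

private:
    Context*& ctx_;
};

// Intended use with the explicit-context overload shown earlier:
//   vsag::IteratorContext* ctx = nullptr;
//   ContextGuard<vsag::IteratorContext> guard(ctx);
//   auto page = index->KnnSearch(query, k, param_str, filter, ctx, false);
```

The guard must be declared after the pointer so that it is destroyed first, and the pointer must not also be deleted manually.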

Attribute Filter (Hybrid Search)

Attribute filtering, sometimes called hybrid search or filtered ANN with structured predicates, restricts a KnnSearch / RangeSearch to vectors whose structured tags satisfy an SQL-like expression. Compared to the id-based filters in Filtered Search, it lets you express predicates like:

category = "electronics" AND price <= 1000 AND multi_in(tag, "promo|new", "|")

without writing a callback. VSAG builds an attribute inverted index alongside the vector index; the predicate is parsed once and evaluated during graph traversal, so candidates that cannot satisfy the predicate are pruned early.

“Hybrid search” on this page means vector + structured attributes (not a storage-layout hybrid).

When to Use Each Filter API

You want to …	Use
Exclude a known set of ids (e.g. tombstones)	Bitset / function filter
Run user-defined logic over an id	`Filter` object
Filter on opaque per-vector bytes inside the graph	Extra Info
Filter on named, typed fields with AND/OR/IN	This page

A bitset filter, a Filter object, and an attribute expression can be combined inside a single SearchRequest; they are ANDed together.

Index Support

Index	Build with `use_attribute_filter`	`SearchWithRequest` + attribute string	`UpdateAttribute`
HGraph	Yes	Yes	Yes
IVF	Yes	Yes	Yes
BruteForce	Yes	Yes	Yes
WARP (sparse)	Yes	Yes	Yes
SINDI / Pyramid	—	id-based filters only (see Filtered Search)	—

When use_attribute_filter is enabled, BruteForce currently rejects Remove calls; to delete entries, rebuild the index without them.

Attribute Data Model

Attributes are defined per vector and grouped into an AttributeSet (include/vsag/attribute.h). Each attribute has:

a name (string),
a value type (AttrValueType enum),
a list of values — every field is multi-valued by design, so IN-style membership works naturally for tag-like fields.

Supported value types:

enum AttrValueType {
    INT8 = 5,  INT16 = 7,  INT32 = 1,  INT64  = 3,
    UINT8 = 6, UINT16 = 8, UINT32 = 2, UINT64 = 4,
    STRING = 9,
};

The schema is auto-discovered from the first build/add: the (name, type) pair seen for each field is locked. Subsequent inserts must match.
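
The locking rule can be pictured with a small model. The class below is an illustration only: `AttributeSchema` is a hypothetical type, VSAG's internal bookkeeping may differ, and how a mismatch is reported by the library is not covered here. The type is kept as a plain int standing for an AttrValueType value so the sketch stays self-contained.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Illustrative model only. The first (name, type) pair seen for a field is
// recorded; later inserts are accepted only if they use the same type.
class AttributeSchema {
public:
    bool Accept(const std::string& name, int type) {
        auto [it, inserted] = schema_.emplace(name, type);
        return inserted || it->second == type;
    }

private:
    std::unordered_map<std::string, int> schema_;
};
```

With the enum values above, a field first seen as INT32 (1) is accepted again as INT32 but not as STRING (9).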

Building an `AttributeSet`

auto* category = new vsag::AttributeValue<std::string>();
category->name_ = "category";
category->GetValue() = { "electronics" };

auto* tags = new vsag::AttributeValue<std::string>();
tags->name_ = "tag";
tags->GetValue() = { "promo", "new" };       // multi-valued

auto* price = new vsag::AttributeValue<int32_t>();
price->name_ = "price";
price->GetValue() = { 899 };

vsag::AttributeSet set;
set.attrs_ = { category, tags, price };

Lifetime of the Attribute* entries depends on the Dataset::Owner(...) flag passed to the dataset that carries the AttributeSet:

Owner(true) (the default): DatasetImpl’s destructor will delete each Attribute* and delete[] the AttributeSet array; do not free them yourself.
Owner(false) (used in the example below): the caller retains ownership and must free the Attribute* entries (and the AttributeSet array, if heap-allocated) after Build/Add returns.

Pick one and stick with it for a given dataset to avoid double-free or leaks.

Building an Index with Attribute Support

Set index_param.use_attribute_filter to true and (optionally) tune the attribute-inverted-index parameters under attr_params.

std::string build_params = R"(
{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "use_attribute_filter": true,
        "attr_params": {
            "has_buckets": false
        }
    }
}
)";
auto index = vsag::Factory::CreateIndex("hgraph", build_params).value();

has_buckets controls how the inverted index lays out posting lists. Defaults differ by index:

Index	Default `has_buckets`
HGraph	`false`
IVF	`true`
BruteForce	`true`

Leave the defaults unless profiling indicates otherwise.

Attaching Attributes During Build / Add

Dataset::AttributeSets accepts a contiguous array of AttributeSet, one per vector (include/vsag/dataset.h):

std::vector<vsag::AttributeSet> sets(num_vectors);
for (int64_t i = 0; i < num_vectors; ++i) {
    sets[i] = build_attrs_for_row(i);
}

auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)
    ->Dim(dim)
    ->Ids(ids)
    ->Float32Vectors(vectors)
    ->AttributeSets(sets.data())
    ->Owner(false);

index->Build(base);     // or index->Add(base)

Querying with `SearchRequest`

Attribute filtering is only exposed via SearchWithRequest (include/vsag/search_request.h):

vsag::SearchRequest req;
req.query_                    = query;
req.mode_                     = vsag::SearchMode::KNN_SEARCH;
req.topk_                     = 10;
req.params_str_               = R"({ "hgraph": { "ef_search": 200 } })";
req.enable_attribute_filter_  = true;
req.attribute_filter_str_     =
    "category = \"electronics\" AND price <= 1000 "
    "AND multi_in(tag, \"promo|new\", \"|\")";

auto result = index->SearchWithRequest(req).value();
for (int64_t i = 0; i < result->GetDim(); ++i) {
    std::cout << result->GetIds()[i] << " " << result->GetDistances()[i] << "\n";
}

You can simultaneously enable enable_filter_ (with a FilterPtr) and enable_bitset_filter_ (with a BitsetPtr); all enabled filters are combined with AND.

Filter Expression Language

The expression grammar is defined in src/attr/grammar/FC.g4. It is small but covers the common needs of structured filtering.

Logical operators

Form	Aliases
AND	`AND`, `and`, `&&`
OR	`OR`, `or`, `||`
NOT	`!(expr)`
Grouping	`(...)`

NOT is only available in the prefixed form !(...).

Comparison operators

For numeric fields: =, !=, >, <, >=, <=. For string fields: only = and !=.

Numeric comparands may include arithmetic (+, -, *, /):

(price - discount) <= 100

List membership

Two forms are supported. They use the same set of keywords (IN and NOT_IN, with the aliases listed below) but different argument shapes.

Infix bracket form — use this with a literal list:

id IN [1, 2, 3, 4]
category NOT_IN ["electronics", "clothing"]

The list members must be INTEGER literals or double-quoted strings. Single quotes are not accepted by the grammar.

Function pipe form — use this when the candidate values are produced by string concatenation upstream. The second argument must be a single pipe-delimited string literal, and the third (optional) argument is the separator and must be "|":

multi_in(category, "electronics|clothing", "|")
multi_notin(uid, "1961|8669|9090", "|")

Bracket lists are not accepted in the function form (multi_in(field, [...]) is a syntax error). Pipe strings are not accepted in the infix form.

Aliases for both forms: IN / in / MULTI_IN / multi_in, NOT_IN / not_in / NOTIN / notin / MULTI_NOTIN / multi_notin.

A field with multiple values matches the membership predicate if any of its values is contained in the literal list.
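When the candidate values live in a container, both membership forms can be rendered with a few lines of string building. The sketch below is illustrative only: `BuildInList` and `BuildMultiIn` are hypothetical helpers, not part of VSAG, and they assume the values contain neither the `|` separator nor a double quote, since no escaping rule is described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a VSAG API): renders the infix bracket form,
// for example: id IN [1, 2, 3, 4]
// An empty list is not handled specially; callers should avoid it.
std::string BuildInList(const std::string& field, const std::vector<int64_t>& values) {
    std::string expr = field + " IN [";
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) {
            expr += ", ";
        }
        expr += std::to_string(values[i]);
    }
    expr += "]";
    return expr;
}

// Hypothetical helper (not a VSAG API): renders the function pipe form,
// for example: multi_in(category, "electronics|clothing", "|")
std::string BuildMultiIn(const std::string& field, const std::vector<std::string>& values) {
    std::string joined;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Assumption: no escaping is applied, so reject values that would
        // break the pipe-delimited string literal.
        assert(values[i].find('|') == std::string::npos);
        assert(values[i].find('"') == std::string::npos);
        if (i > 0) {
            joined += "|";
        }
        joined += values[i];
    }
    return "multi_in(" + field + ", \"" + joined + "\", \"|\")";
}
```

The returned string can be assigned to `req.attribute_filter_str_` directly or combined with other predicates using `AND` / `OR`.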

Literals

Kind	Examples
Integer	`42`, `-7`
Float	`3.14`, `1.5e-3`
String	`"electronics"`, `"new"` (always double-quoted)
Quoted integer (string)	`"123"` (treated as a string in `multi_in`)

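Identifiers start with a letter or underscore, followed by letters, digits, or underscores ([a-zA-Z_][a-zA-Z0-9_]*). Dot-separated names are also accepted, and namespace.field is treated as one identifier.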

Comments start with # and run to end of line.

Examples

# simple equality
category = "electronics"

# numeric range, multi-valued field
price >= 100 AND price <= 1000 AND tag IN ["promo", "new"]

# negation
!(status = "archived") AND multi_notin(region, "us-east|us-west", "|")

# arithmetic on the left side of the comparison
(end_ts - start_ts) > 3600 AND charge_type = 5

Updating Attributes

Use index->UpdateAttribute(id, new_attrs) (or the overload that also takes the previous attribute set for cheaper inverted-index updates):

vsag::AttributeSet new_attrs = build_new_attrs();
auto status = index->UpdateAttribute(/*id=*/123, new_attrs);

The vector itself is unchanged; only the inverted index is updated. Subsequent searches see the new attribute values immediately.

Performance Notes

The attribute inverted index adds memory roughly proportional to the average number of values per field times the number of vectors. For string fields, the dictionary cost is proportional to the number of distinct values.
Highly selective predicates accelerate search (more candidates pruned early); very unselective predicates approach the cost of unfiltered search plus a constant overhead.
For graph indexes, increase ef_search when predicates are very selective so the search has enough surviving candidates to converge.
Use multi_in / IN instead of long OR chains; the inverted index can resolve list membership in a single pass.

Tests as Reference

The most complete usage sample lives in the test suite:

tests/test_index.cpp — TestIndex::TestWithAttr (build attributes, search via SearchRequest, then UpdateAttribute and re-search).
tests/fixtures/data/vector_generator.cpp — generate_attributes shows how to construct AttributeSet* arrays of mixed types programmatically.
src/attr/expression_visitor_test.cpp — exhaustive grammar coverage; useful as a working reference for the DSL.

Python Status

The attribute / hybrid-search API is currently C++-only. There is no pyvsag binding yet, and the placeholder example at examples/python/todo_examples/301_feature_filter.py is intentionally empty.

Serialization

VSAG indexes can be serialized and deserialized through the existing serialization interfaces, supporting persistence, cross-process sharing, and distributed deployment.

This page describes the existing serialization format used by Serialize and Deserialize. For the header-first streaming format introduced later, see New Serialization. The two formats are not compatible with each other.

Three Interfaces

1. `BinarySet` / `ReaderSet`

The most flexible option. The index is split into named binary segments, and the caller owns the storage medium (object store, KV, sharded uploads, etc.).

// Save
vsag::BinarySet bs = index->Serialize().value();
for (const auto& key : bs.GetKeys()) {
    auto binary = bs.Get(key);
    // Write to storage
}

// Load
vsag::BinarySet bs_loaded;
// Populate bs_loaded by reading each key from storage.
auto empty = vsag::Factory::CreateIndex("hgraph", build_params).value();
empty->Deserialize(bs_loaded);

ReaderSet is similar to BinarySet but uses a user-supplied Reader to read on demand, which avoids loading everything at once. This is useful for memory-constrained or partial-deserialization scenarios.

2. File Streams (`std::ostream` / `std::istream`)

The simplest option is to serialize the whole index to a file or memory stream:

std::ofstream out("index.bin", std::ios::binary);
index->Serialize(out);

std::ifstream in("index.bin", std::ios::binary);
empty->Deserialize(in);

3. Custom Write Function (`WriteFuncType`)

For streaming or chunked backends, supply a write callback:

index->Serialize([&](const void* buf, uint64_t offset, uint64_t size) {
    // Write [buf, buf+size) at offset
});
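As one possible shape for such a callback, the sketch below collects chunks into an in-memory buffer, placing each chunk at its reported offset. The parameter order `(buf, offset, size)` is copied from the snippet above; `WriteFn` and `MakeBufferWriter` are names local to this sketch, and it is an assumption that the result is convertible to `WriteFuncType`, so check the declaration in the public headers before relying on it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Callback shape taken from the snippet above; the alias name is local to this sketch.
using WriteFn = std::function<void(const void* buf, uint64_t offset, uint64_t size)>;

// Returns a callback that copies each chunk into `sink` at `offset`,
// growing the buffer when a chunk lands past the current end.
// `sink` must outlive the returned callback.
WriteFn MakeBufferWriter(std::vector<char>& sink) {
    return [&sink](const void* buf, uint64_t offset, uint64_t size) {
        if (size == 0) {
            return;  // nothing to copy
        }
        assert(buf != nullptr);
        if (sink.size() < offset + size) {
            sink.resize(offset + size);
        }
        std::memcpy(sink.data() + offset, buf, size);
    };
}
```

Because chunks are positioned by offset, the buffer ends up correct even if the chunks do not arrive in order. A chunked upload backend could follow the same pattern and replace the `memcpy` with a ranged write.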

Notes

Deserialize requires an empty target index whose configuration (dim, metric_type, etc.) matches the one used at serialization time.
Serialize/Deserialize keep the existing footer-based format. The new SerializeStreaming format is header-first and must be read with DeserializeStreaming or Load.
When upgrading across major versions, check the compatibility notes in the release notes.
References: examples/cpp/318_feature_tune.cpp, examples/cpp/401_persistent_kv.cpp, and examples/cpp/402_persistent_streaming.cpp.

New Serialization

The new serialization format is designed for large index artifacts and forward-only readers. Its main goals are:

Make the file self-describing from the beginning, so readers can inspect the magic, version, metadata, and block manifest without seeking to a footer.
Split index content into typed TLV blocks, so tools can inspect block sizes and future readers can skip unknown non-critical blocks.
Provide one streaming path for full restoration (DeserializeStreaming) and policy-based loading (Load).
Support debugging and operations tooling through a stable layout that can be visualized.

The new serialization format is not compatible with the previous Serialize/Deserialize format. Files written by SerializeStreaming must be read with DeserializeStreaming or Load; files written by Serialize must be read with Deserialize.

Usage Model

Serialization and deserialization are the persistence and transport path for index artifacts. SerializeStreaming writes a built index into a self-describing file, and DeserializeStreaming restores the complete in-memory index when a caller already knows which index object to create. Index::Load is the serving path: it creates the index from the file metadata and returns an IndexPtr that can be used for search.

Figure: Streaming serialization usage model.

Streaming Serialization

SerializeStreaming, DeserializeStreaming, and Load write and read a forward-only index file. The format is designed for large index artifacts where the reader should not seek to a footer before it can understand the file layout. It is currently implemented for BruteForce, HGraph, IVF, SINDI, and Pyramid.

auto index = vsag::Factory::CreateIndex("hgraph", build_params).value();
index->Build(base).value();

{
    std::ofstream out("hgraph.streaming", std::ios::binary);
    index->SerializeStreaming(out).value();
}

auto restored = vsag::Factory::CreateIndex("hgraph", build_params).value();
{
    std::ifstream in("hgraph.streaming", std::ios::binary);
    restored->DeserializeStreaming(in).value();
}

vsag::IndexPtr loaded;
{
    std::ifstream in("hgraph.streaming", std::ios::binary);
    loaded = vsag::Index::Load(in, "{}").value();
}

Static Load

Index::Load is the entry point for policy-based loading from the new streaming format. Unlike DeserializeStreaming, callers do not create an empty index first. Load reads the streaming metadata, checks the serialized index type and basic_info["index_param"], creates the matching index internally, and then loads the TLV body blocks according to the load parameters.

std::ifstream in("hgraph.streaming", std::ios::binary);
vsag::LoadParameters load_parameters(R"({"base_io_type":"block_memory_io"})");
auto loaded = vsag::Index::Load(in, load_parameters).value();

The returned value is a ready-to-use IndexPtr, so this is the preferred path for loading an index that will serve search traffic. Load parameters control placement policy for supported blocks. The parameters object can be built from a JSON string and can also carry reader objects with SetReader. Unsupported policies return an error instead of silently falling back. The API currently supports streaming BruteForce, HGraph, IVF, SINDI, and Pyramid indexes. BruteForce supports limited block placement policies. HGraph can bind high_precision_codes to an external reader through precise_reader. IVF, SINDI, and Pyramid currently load all emitted streaming blocks into memory.

File Layout

A streaming file starts with a fixed header and then a sequence of TLV blocks:

magic("vsagstm0")
format_version
metadata_length
metadata_json
metadata_checksum
block_header + block_payload
block_header + block_payload
...
section_end

The metadata JSON stores the index name, basic index information, and a block manifest; build parameters are stored in basic_info["index_param"]. The manifest lists the expected block tags, block versions, and whether a block is critical. Unknown critical blocks fail deserialization; unknown non-critical blocks can be skipped by compatible readers.
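Because the magic comes first, a tool can tell a streaming file from a file in the older format by reading only the leading bytes. The sketch below assumes the magic is stored as the eight raw ASCII bytes `vsagstm0` at offset 0. `LooksLikeStreamingIndex` is a hypothetical helper, not a VSAG API, and it rewinds the stream afterwards, so it only suits seekable streams such as `std::ifstream`.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>  // only needed for in-memory streams such as std::istringstream
#include <string>

// Hypothetical helper (not a VSAG API): returns true when the stream starts
// with the streaming magic. Assumes the magic is 8 raw ASCII bytes at offset 0.
bool LooksLikeStreamingIndex(std::istream& in) {
    const char kMagic[] = "vsagstm0";  // 8 characters plus the terminating '\0'
    const std::streamsize kLen = static_cast<std::streamsize>(sizeof(kMagic) - 1);

    std::array<char, sizeof(kMagic) - 1> head{};
    const std::istream::pos_type start = in.tellg();
    in.read(head.data(), kLen);
    const bool matches =
        in.gcount() == kLen &&
        std::memcmp(head.data(), kMagic, static_cast<std::size_t>(kLen)) == 0;

    // A short read sets failbit/eofbit; clear the state and rewind so the
    // caller can hand the same stream to the real reader.
    in.clear();
    in.seekg(start);
    return matches;
}
```

For a forward-only source that cannot seek, the same comparison applies, but the caller would have to keep the consumed bytes or reopen the source.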

TLV Block Version Compatibility

format_version describes the whole streaming file structure, such as the fixed header, metadata layout, and TLV framing. When only one block payload changes in a binary-incompatible way, the format should not bump the global format version. Instead, bump the block_version of that TLV block. For example, if HGraph base_codes cannot be parsed by older readers after a basic_flatten_codes implementation change, the base_codes block version must be increased.

Each independently evolving block must distinguish two kinds of version information:

Current write version: the block_version written by the current code when serializing that block.
Supported read versions: the set or range of block versions that the current code can read for that block.

After a reader reads the TLV header, it checks whether tag + block_version is supported by the current code:

Supported versions are parsed by the matching block reader.
Unsupported critical blocks fail fast, preventing older code from misreading newer bytes.
Unsupported non-critical blocks are skipped with value_len, then reading continues from the next block.

Therefore, when a block is upgraded from v1 to v2, the implementation must not only change the current write version to v2; it must also update that block’s supported read versions. If the new code keeps the v1 reader, supported versions should include both v1 and v2 so v2 code can still load v1 indexes. If v1 is intentionally no longer supported, remove it from the supported versions and make old critical blocks fail explicitly.

The block manifest in metadata lets tools and readers inspect expected block versions before reading the body. During body parsing, the block_version stored in each TLV header remains the authoritative version for that payload.
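The parse / fail / skip rule described in this section can be sketched as a small decision function. Everything named here (`BlockHeader`, `SupportedVersions`, `DecideBlockAction`) is hypothetical and only models the rule; in particular, carrying the critical flag on the header struct is a simplification made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

enum class BlockAction { kParse, kSkip, kFail };

// Hypothetical in-memory view of one TLV header (not the VSAG layout).
struct BlockHeader {
    std::string tag;
    uint32_t block_version;
    bool critical;       // simplification: taken from the manifest in this sketch
    uint64_t value_len;  // payload size, used to skip the block
};

// tag -> set of block versions the current code can read.
using SupportedVersions = std::map<std::string, std::set<uint32_t>>;

// Supported (tag, version) pairs are parsed, unsupported critical blocks
// fail fast, and unsupported non-critical blocks are skipped.
BlockAction DecideBlockAction(const BlockHeader& header, const SupportedVersions& supported) {
    const auto it = supported.find(header.tag);
    const bool readable =
        it != supported.end() && it->second.count(header.block_version) > 0;
    if (readable) {
        return BlockAction::kParse;
    }
    return header.critical ? BlockAction::kFail : BlockAction::kSkip;
}
```

With this shape, upgrading a block from v1 to v2 while keeping the v1 reader means adding 2 to that tag's set, and dropping v1 support means removing 1 so that old critical blocks fail explicitly.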

Figure: TLV block version check.

BruteForce Blocks

BruteForce writes these streaming blocks in order:

Block	Contents	Required
`attribute_filter`	optional attribute filter index	conditional
`base_codes`	flatten codes used for exhaustive search	yes
`label_table`	external labels and label remap	yes

DeserializeStreaming restores the full in-memory BruteForce index. Load currently requires base_codes to be loaded into memory; reader-based loading for required BruteForce codes is rejected.

HGraph Blocks

HGraph writes these streaming blocks in order:

Block	Contents	Required
`label_table`	external labels, label remap, optional source id table	yes
`base_codes`	base flatten codes used by graph search	yes
`bottom_graph`	bottom-layer graph over all vectors	yes
`high_precision_codes`	precise reorder codes when reorder uses separate codes	conditional
`route_graphs`	all upper route graph layers	yes
`extra_info`	optional extra info payloads	conditional
`attribute_filter`	optional attribute filter index	conditional
`raw_vector`	optional stored raw vectors	conditional

DeserializeStreaming restores the full in-memory index. Load loads HGraph blocks into memory by default. Load parameters can set precise_io_type to override the IO type for precise_codes. If they also provide precise_reader, and that reader size matches the high_precision_codes payload size, Load validates the external reader payload checksum and then binds reorder codes to that reader.

IVF Blocks

IVF writes these streaming blocks in order:

Block	Contents	Required
`ivf_bucket`	bucket datacell payloads for inverted lists	yes
`ivf_partition_strategy`	partition strategy state, such as trained centroids	yes
`label_table`	external labels and label remap	yes
`high_precision_codes`	reorder codes when IVF reorder is enabled	conditional
`attribute_filter`	optional attribute filter index	conditional

DeserializeStreaming restores the full in-memory IVF index. Index::Load can create the IVF index directly from streaming metadata and currently loads all emitted IVF blocks into memory.

SINDI Blocks

SINDI writes these streaming blocks in order:

Block	Contents	Required
`sindi_windows`	sparse term windows and quantization runtime state	yes
`label_table`	external labels and label remap	yes
`sindi_rerank_index`	optional rerank flat index when rerank is enabled	conditional
`sindi_term_id_mapper`	optional term-id remapping table	conditional

DeserializeStreaming restores the full in-memory SINDI index. Index::Load can create the SINDI index directly from streaming metadata and currently loads all emitted SINDI blocks into memory. Immutable SINDI runtime serialization is not supported by this streaming path.

Pyramid Blocks

Pyramid writes these streaming blocks in order:

Block	Contents	Required
`label_table`	external labels and label remap	yes
`base_codes`	base flatten codes used by graph search	yes
`high_precision_codes`	precise reorder codes when reorder is enabled	conditional
`pyramid_hierarchies`	hierarchy names and graph roots	yes

DeserializeStreaming restores the full in-memory Pyramid index. Index::Load can create the Pyramid index directly from streaming metadata and currently loads all emitted Pyramid blocks into memory.

Visualizing a Streaming Index

Build the tool and point it at a streaming index file:

cmake --build build --target visualize_index
build/tools/visualize_index/visualize_index \
  --index_path /tmp/vsag-hgraph-streaming.index \
  --html /tmp/vsag-hgraph-streaming.html

The CLI output includes a raw horizontal layout by real byte proportion and a compact logical-block layout. The HTML output groups related small segments, such as a TLV header and its payload, and shows exact segment details in tables.

See examples/cpp/403_persistent_streaming_load.cpp for runnable examples of streaming serialization and Index::Load.

Memory Management

VSAG uses custom Allocator and Resource objects on its hot paths, allowing users to:

plug in existing in-house memory pools;
measure and cap index memory usage;
route allocations precisely in multi-process or NUMA environments.

Custom Allocator

class MyAllocator : public vsag::Allocator {
public:
    std::string Name() override { return "my_allocator"; }
    void* Allocate(size_t size) override;
    void Deallocate(void* p) override;
    void* Reallocate(void* p, size_t size) override;
    // ...
};

auto allocator = std::make_shared<MyAllocator>();
auto resource = std::make_shared<vsag::Resource>(allocator, /*thread_pool=*/nullptr);
auto engine = vsag::Engine(resource);

auto index = engine.CreateIndex("hgraph", build_params).value();

See examples/cpp/201_custom_allocator.cpp for a full example.
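To illustrate the "measure index memory usage" use case, here is a minimal sketch of an allocator that tracks the bytes currently requested through it. `AllocatorLike` is a stand-in that mirrors only the four methods shown above so the sketch compiles without VSAG; a real implementation would derive from `vsag::Allocator` and implement whatever else that interface requires. `CountingAllocator` and `BytesInUse` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <string>
#include <unordered_map>

// Stand-in for the interface shown above (assumption: real code derives
// from vsag::Allocator instead).
class AllocatorLike {
public:
    virtual ~AllocatorLike() = default;
    virtual std::string Name() = 0;
    virtual void* Allocate(std::size_t size) = 0;
    virtual void Deallocate(void* p) = 0;
    virtual void* Reallocate(void* p, std::size_t size) = 0;
};

// Tracks requested bytes per live pointer. One mutex guards all state,
// which keeps the sketch thread-safe at the cost of some contention.
class CountingAllocator : public AllocatorLike {
public:
    std::string Name() override { return "counting_allocator"; }

    void* Allocate(std::size_t size) override {
        std::lock_guard<std::mutex> lock(mutex_);
        void* p = std::malloc(size);
        if (p != nullptr) {
            sizes_[p] = size;
            in_use_ += size;
        }
        return p;
    }

    void Deallocate(void* p) override {
        if (p == nullptr) return;
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = sizes_.find(p);
        if (it != sizes_.end()) {
            in_use_ -= it->second;
            sizes_.erase(it);
        }
        std::free(p);
    }

    void* Reallocate(void* p, std::size_t size) override {
        if (p == nullptr) return Allocate(size);  // same contract as Allocate
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t old_size = 0;
        const auto it = sizes_.find(p);
        if (it != sizes_.end()) {
            old_size = it->second;
            sizes_.erase(it);  // drop the old entry before p may be freed
        }
        void* q = std::realloc(p, size);
        if (q == nullptr) {
            sizes_[p] = old_size;  // realloc failed, the old block is still live
            return nullptr;
        }
        sizes_[q] = size;
        in_use_ = in_use_ - old_size + size;
        return q;
    }

    std::size_t BytesInUse() {
        std::lock_guard<std::mutex> lock(mutex_);
        return in_use_;
    }

private:
    std::mutex mutex_;
    std::unordered_map<void*, std::size_t> sizes_;
    std::size_t in_use_ = 0;
};
```

The counter reflects requested sizes, not the underlying heap's real consumption. A cap could be added by returning `nullptr` from `Allocate` once `in_use_ + size` exceeds a limit, provided the index handles allocation failure.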

Per-Search Temporary Allocator

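KnnSearch can take a per-call Allocator through its SearchParam overload, for example one backed by a thread-local arena, which avoids contention with the global heap. RangeSearch has no allocator-bearing overload at this time (see Per-Search Allocator):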

vsag::SearchParam search_param;
search_param.allocator = thread_local_allocator.get();
auto result = index->KnnSearch(query, k, search_param);

See examples/cpp/313_feature_search_allocator.cpp and examples/cpp/314_feature_hgraph_search_allocator.cpp.

Estimating and Querying Memory

`EstimateMemory(data_num)`

Index::EstimateMemory(data_num) returns a byte-level estimate of the memory the index will occupy once data_num vectors have been inserted. It is computed from the build parameters (dimension, quantization, max_degree, etc.) without allocating any vector storage, so it is safe to call on an empty index and is the recommended way to size a node before ingest:

if (index->CheckFeature(vsag::SUPPORT_ESTIMATE_MEMORY)) {
    uint64_t estimated = index->EstimateMemory(1'000'000);  // bytes
}

See examples/cpp/308_feature_estimate_memory.cpp for a full run.

`EstimateBuildMemory(num_elements)`

Index::EstimateBuildMemory(num_elements) returns the estimated memory (in bytes) required during the build process for num_elements vectors. Unlike EstimateMemory, which estimates the steady-state size of the final index, this accounts for temporary buffers and intermediate data structures that exist only while Build is running. The peak memory during build is typically higher than the post-build footprint:

uint64_t peak = index->EstimateBuildMemory(1000000);  // bytes

Currently only DiskANN provides a non-trivial implementation; other index types throw an exception by default.

`GetMemoryUsage()`

Index::GetMemoryUsage() returns the current memory footprint of an index in bytes:

uint64_t bytes = index->GetMemoryUsage();

Properties:

Implemented by every index type, but only indexes that advertise vsag::SUPPORT_GET_MEMORY_USAGE via CheckFeature are formally guaranteed to return a meaningful value. HGraph, IVF, BruteForce, Pyramid and WARP set the flag (see src/algorithm/{hgraph,ivf,brute_force,pyramid,warp}.cpp); SINDI implements the call (since the method is pure-virtual on Index) but does not currently set the feature flag, so treat its value as informational only.
Thread-safe; can be polled concurrently with searches.
Latency is on the order of microseconds — suitable for production-grade real-time monitoring loops.
Reports memory attributable to the index itself (vectors, graph, quantizer state). The number is typically smaller than the resident set size observed at the OS level, which also includes allocator overhead, scratch buffers, and any data held outside the index (e.g. user-owned input vectors). For SINDI in particular, call GetMemoryUsage() after the build completes to get a representative value.

See examples/cpp/319_feature_get_memory_usage.cpp for a runnable example, including a helper that compares the interface value with the process resident size.

`GetMemoryUsageDetail()`

Index::GetMemoryUsageDetail() returns a breakdown of the current memory usage by component:

std::unordered_map<std::string, uint64_t> detail = index->GetMemoryUsageDetail();
for (const auto& [component, bytes] : detail) {
    std::cout << component << ": " << bytes << " bytes\n";
}

The returned map keys are component names and values are memory in bytes. This is useful for understanding where the memory is going inside an index.

Currently only HGraph provides a meaningful implementation, returning components such as basic_flatten_codes, bottom_graph, route_graph, neighbors_mutex, pool, label_table, high_precise_codes, extra_infos, and raw_vector. SINDI returns an empty map. Other index types throw an exception by default.

Capability Flags

Flag	Meaning
`vsag::SUPPORT_ESTIMATE_MEMORY`	`EstimateMemory(data_num)` is available.
`vsag::SUPPORT_GET_MEMORY_USAGE`	`GetMemoryUsage()` is available.

Both flags can be checked via index->CheckFeature(...) — see Index Introspection.

Thread Pool

Resource also accepts a user-supplied ThreadPool which, combined with a custom allocator, gives full control over parallelism and resource ownership. See examples/cpp/203_custom_thread_pool.cpp.

Notes

A custom allocator attached to a Resource must be thread-safe.
The allocator must outlive every index and result object that references it.
If nothing is configured, VSAG falls back to a default malloc-based allocator.

Per-Search Allocator

VSAG exposes a per-call Allocator hook that is separate from the index’s own allocator, intended for use cases such as:

isolating per-query memory from the index’s long-lived heap;
backing high-concurrency online traffic with a thread-local arena that has no atomic contention with neighbours;
accounting or capping each query’s footprint independently of the index.

The hook is exposed through two surfaces — SearchRequest::search_allocator_ (recommended) and the legacy SearchParam::allocator — but how much of a search actually consumes that allocator depends on the index and the entry point. As of today, only HGraph::SearchWithRequest plumbs search_allocator_ end-to-end (scratch buffers and the result Dataset); the other SearchWithRequest implementations (IVF / BruteForce / WARP) use it for some scratch state but still allocate the result Dataset from the index’s own allocator. See Relationship to the Index’s Allocator below for the per-surface breakdown.

Scope. The allocator hook is currently exposed through KnnSearch (SearchParam overload) and SearchWithRequest. RangeSearch does not have an allocator-bearing overload at this time, and SearchRequest::search_allocator_ is not consulted by the range-search path.

Recommended API — `SearchRequest::search_allocator_`

#include "vsag/search_request.h"

vsag::SearchRequest req;
req.query_ = query;
req.mode_ = vsag::SearchMode::KNN_SEARCH;
req.topk_ = 10;
req.params_str_ = R"({"hgraph":{"ef_search":100}})";
req.search_allocator_ = thread_local_allocator.get();  // optional, may stay nullptr

auto result = index->SearchWithRequest(req).value();

SearchRequest (include/vsag/search_request.h) is the recommended, non-deprecated way to drive a single search call. The search_allocator_ field is optional — when left at nullptr, the index falls back to the allocator that was attached to its owning Resource.

Availability. Index::SearchWithRequest has a default implementation that returns an unsupported error. Only HGraph, IVF, BruteForce and WARP implement it today (src/algorithm/{hgraph,ivf,brute_force,warp}.cpp). For indexes that do not yet override SearchWithRequest (HNSW, DiskANN, SINDI, Pyramid), use the legacy SearchParam path described below.

Legacy API — `SearchParam::allocator` (deprecated)

#include "vsag/search_param.h"

nlohmann::json search_params = {{"hgraph", {{"ef_search", 100}}}};
std::string param_str = search_params.dump();

vsag::SearchParam search_param(/*iter_filter=*/false,
                               param_str,
                               /*filter=*/nullptr,
                               /*allocator=*/thread_local_allocator.get());
auto result = index->KnnSearch(query, /*k=*/10, search_param).value();

SearchParam is documented as deprecated in include/vsag/search_param.h (“Use SearchRequest instead”) and remains only for source compatibility. The wording is currently a doc comment — the struct itself does not carry the C++ [[deprecated]] attribute, so the compiler will not emit deprecation warnings, but new code should still target SearchRequest / SearchWithRequest on indexes that support it. The example examples/cpp/314_feature_hgraph_search_allocator.cpp (HGraph) demonstrates the legacy form.

Result Ownership

The result-Dataset ownership contract depends on which index implements SearchWithRequest:

HGraph is the only index that currently plumbs request.search_allocator_ into create_fast_dataset (see src/algorithm/hgraph.cpp — ctx.alloc = request.search_allocator_). The resulting Dataset is marked Owner(true, allocator) and its destructor will call allocator->Deallocate(...) on ids / distances automatically.
IVF / BruteForce / WARP currently construct the result Dataset via create_fast_dataset(..., allocator_) — i.e. the index’s own allocator (src/algorithm/ivf/ivf.cpp, src/algorithm/bruteforce/bruteforce.cpp; WARP uses the BruteForce implementation in WARP mode). request.search_allocator_ is only consulted for scratch state on those paths today; the result buffers are owned by the index’s allocator. Treat the result Dataset’s lifetime as tied to the index’s allocator on these indexes.

What this means in practice:

Do not manually Deallocate the result buffers. Letting the Dataset go out of scope is enough; double-freeing through both manual Deallocate(...) and the destructor is undefined behaviour.
Whichever allocator owns the result must outlive that result Dataset. For HGraph that is the per-search allocator; for IVF / BruteForce / WARP that is the index allocator (always alive while the index is alive).
examples/cpp/314_feature_hgraph_search_allocator.cpp currently makes the deallocation explicit. That pattern is left over from earlier API iterations; new code that targets the current owner-tracking behaviour should rely on the Dataset destructor instead.

The simplest safe pattern is “one allocator per thread, reset between batches”:

ArenaAllocator arena;       // thread-local, big enough for one batch

for (const auto& q : batch) {
    vsag::SearchRequest req;
    req.query_ = q;
    req.topk_ = topk;
    req.params_str_ = params;
    req.search_allocator_ = &arena;
    auto result = index->SearchWithRequest(req).value();
    consume(result);
    // result Dataset destroyed here; arena frees ids/distances via its Deallocate.
}
arena.reset();              // drops every per-query buffer at once

Relationship to the Index’s Allocator

Surface	Allocator used
Index build, insert, persistent state	`Resource`’s allocator (or default if none was passed).
`HGraph::SearchWithRequest` scratch + result `Dataset`	`search_allocator_` if set, otherwise the `Resource`’s allocator. HGraph is the only index that plumbs `search_allocator_` into the result.
`IVF` / `BruteForce` / `WARP` `SearchWithRequest` result `Dataset`	Always the index’s own allocator (`allocator_`). `search_allocator_` is not consulted for result buffers today.
`IVF` / `BruteForce` / `WARP` `SearchWithRequest` scratch state	Uses `search_allocator_` for some intermediate buffers when set; otherwise the index’s allocator.
`KnnSearch(query, k, SearchParam)` (legacy)	Uses `SearchParam::allocator` if set, on indexes whose `KnnSearch` honors it (e.g. HGraph examples). Otherwise the `Resource` allocator.
`KnnSearch(query, k, parameters_str)`	No per-search allocator hook — uses the `Resource` allocator.
`RangeSearch(...)` (all forms)	Uses the `Resource` allocator; no per-search allocator hook.

Setting a per-search allocator never affects the index’s permanent data structures. It only narrows the lifetime of memory touched by one specific search call, and only to the extent that the index/entry point actually consumes it (see the per-row notes above).

Requirements

The allocator must be thread-safe only if it is shared across threads. A thread-local arena does not need internal synchronization.
The allocator’s lifetime must outlive every result Dataset it produced.
Reallocate(nullptr, size) must behave like Allocate(size). VSAG relies on this contract for its internal containers.
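A thread-local arena that meets these requirements can be as simple as a bump allocator. The sketch below is standalone and hypothetical: a real per-search allocator would derive from `vsag::Allocator`, and the capacity, header layout, and method set here are choices made for illustration. It is deliberately unsynchronized, which is only acceptable while each instance stays on one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Bump-pointer arena: Allocate carves from one buffer, Deallocate is a
// no-op, and reset() releases everything at once. Not thread-safe.
class ArenaAllocator {
public:
    explicit ArenaAllocator(std::size_t capacity = 1 << 20) : buffer_(capacity), used_(0) {}

    std::string Name() { return "arena_allocator"; }

    void* Allocate(std::size_t size) {
        if (size > buffer_.size()) return nullptr;  // also guards overflow below
        const std::size_t need = kHeader + RoundUp(size);
        if (need > buffer_.size() - used_) return nullptr;  // arena exhausted
        unsigned char* base = buffer_.data() + used_;
        std::memcpy(base, &size, sizeof(size));  // remember the size for Reallocate
        used_ += need;
        return base + kHeader;
    }

    void Deallocate(void* /*p*/) {}  // memory is reclaimed by reset()

    void* Reallocate(void* p, std::size_t size) {
        if (p == nullptr) return Allocate(size);  // required contract
        std::size_t old_size = 0;
        std::memcpy(&old_size, static_cast<unsigned char*>(p) - kHeader, sizeof(old_size));
        if (size <= old_size) return p;  // shrinking keeps the block in place
        void* q = Allocate(size);
        if (q != nullptr) std::memcpy(q, p, old_size);
        return q;
    }

    void reset() { used_ = 0; }
    std::size_t used() const { return used_; }

private:
    // Assumes the vector's storage is aligned for max_align_t, which holds
    // for the default allocator on mainstream platforms.
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static constexpr std::size_t kHeader = kAlign;  // large enough for a size_t
    static std::size_t RoundUp(std::size_t n) { return (n + kAlign - 1) / kAlign * kAlign; }

    std::vector<unsigned char> buffer_;
    std::size_t used_;
};
```

Calling `reset()` is only safe once every result Dataset that points into the arena has been destroyed, which is the lifetime requirement listed above.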

Runnable Examples

examples/cpp/314_feature_hgraph_search_allocator.cpp — HGraph (sq8) + custom allocator.

See also Memory Management for the index-level Allocator / Resource setup, and Filtered Search for combining a per-search allocator with custom filtering in a SearchRequest.

Index Introspection

VSAG indexes expose three families of introspection APIs that let callers discover what an index can do, compute distances against existing vectors, and read back structured information about the built index without re-running a search:

CheckFeature(IndexFeature) — runtime capability discovery.
CalDistanceById(...) — distance from a query to specific stored ids.
GetIndexDetailInfos() / GetDetailDataByName(...) — structured per-index detail data.

These APIs are read-only and safe to call concurrently with search.

Capability Discovery — `CheckFeature`

index->CheckFeature(vsag::SUPPORT_*) returns true when the underlying index implementation advertises the given feature. Use it whenever a code path takes an IndexPtr of unknown concrete type (e.g. user-supplied configuration, polymorphic store):

if (index->CheckFeature(vsag::SUPPORT_ESTIMATE_MEMORY)) {
    uint64_t est = index->EstimateMemory(100'000);
}

if (not index->CheckFeature(vsag::SUPPORT_DELETE_BY_ID)) {
    // Skip / fall back to remove + re-add via a different index.
}

Feature flags cover almost every optional surface in the library: build / add / serialize variants, concurrent combinations, metric types, attribute and extra-info filters, Clone, ExportModel, Tune, and more. See include/vsag/index_features.h for the full enumeration.

A runnable example is available at examples/cpp/307_feature_check_features.cpp.

Distances to Existing Ids — `CalDistanceById`

CalDistanceById computes the distance between a query and one or more vectors that are already stored in the index, without running a search. This is useful for re-ranking, A/B evaluation, ground-truth checks, or computing pairwise distances to a known shortlist.

Two overloads are provided:

// Dense vector indexes (HGraph, BruteForce, IVF)
auto dense_result = index->CalDistanceById(query_ptr, ids, count, /*calculate_precise_distance=*/true);

// Sparse vector indexes (SINDI): wrap the query in a Dataset
auto query_ds = vsag::Dataset::Make();
query_ds->NumElements(1)->SparseVectors(/* ... */);
auto sparse_result = index->CalDistanceById(query_ds, ids, count, /*calculate_precise_distance=*/true);

The result Dataset holds count distances in GetDistances(). A value of -1.0F means the corresponding id was invalid (not present in the index).
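Callers usually want to drop the invalid entries before using the distances. A minimal sketch of that post-processing step follows. `KeepValidDistances` is a hypothetical helper that works on plain arrays, such as the `ids` passed in and the buffer returned by `GetDistances()`. It assumes a valid distance is never exactly `-1.0F`, which may not hold for every metric.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not a VSAG API): pairs each id with its distance and
// drops entries marked invalid by the -1.0F sentinel.
// Assumption: a valid distance never equals exactly -1.0F.
std::vector<std::pair<int64_t, float>> KeepValidDistances(const int64_t* ids,
                                                          const float* distances,
                                                          int64_t count) {
    std::vector<std::pair<int64_t, float>> valid;
    if (count <= 0) {
        return valid;
    }
    valid.reserve(static_cast<std::size_t>(count));
    for (int64_t i = 0; i < count; ++i) {
        if (distances[i] != -1.0F) {
            valid.emplace_back(ids[i], distances[i]);
        }
    }
    return valid;
}
```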

`calculate_precise_distance`

The trailing bool argument trades precision for latency:

Value	Behavior
`true` (default)	Use the full-precision vector representation. May incur disk I/O on hybrid memory-disk indexes.
`false`	Use the quantized / approximate representation cached for search. Faster, no I/O.

A runnable example is available at examples/cpp/306_feature_calculate_distance_by_id.cpp.

Detail Data — `GetIndexDetailInfos` / `GetDetailDataByName`

GetIndexDetailInfos() returns a list of IndexDetailInfo records that describe every named piece of structured data the index can expose. Each record carries a name, a description, and a type enum that selects the right typed accessor on DetailData.

Support is index-dependent — there is no dedicated SUPPORT_* flag for these two APIs. The Index base class throws std::runtime_error("Index doesn't support ...") by default (GetIndexDetailInfos and GetDetailDataByName in include/vsag/index.h:658,674); HGraph / IVF / BruteForce / Pyramid / SINDI / WARP implement them through InnerIndexInterface. Always handle the tl::expected error path when calling these APIs.

auto infos = index->GetIndexDetailInfos().value();
for (const auto& info : infos) {
    std::cout << info.name << " : " << info.description << '\n';
}

Once you know which entries are available, call GetDetailDataByName(name, info) to retrieve the typed payload:

vsag::IndexDetailInfo info;
auto detail = index->GetDetailDataByName(vsag::INDEX_DETAIL_NAME_NUM_ELEMENTS, info).value();
int64_t n = detail->GetDataScalarInt64();

detail = index->GetDetailDataByName(vsag::INDEX_DETAIL_NAME_LABEL_TABLE, info).value();
auto table = detail->GetData2DArrayInt64();   // [row][col] int64 matrix

detail = index->GetDetailDataByName(vsag::INDEX_DETAIL_DATA_TYPE, info).value();
std::string dt = detail->GetDataScalarString();

Data Types

info.type selects which accessor on DetailData is valid:

`IndexDetailDataType`	Accessor
`TYPE_SCALAR_INT64`	`GetDataScalarInt64()`
`TYPE_SCALAR_DOUBLE`	`GetDataScalarDouble()`
`TYPE_SCALAR_BOOL`	`GetDataScalarBool()`
`TYPE_SCALAR_STRING`	`GetDataScalarString()`
`TYPE_1DArray_INT64`	`GetData1DArrayInt64()`
`TYPE_2DArray_INT64`	`GetData2DArrayInt64()`

Standard detail names exposed as constants in include/vsag/index_detail_info.h:

Constant	Typical type	Meaning
`INDEX_DETAIL_NAME_NUM_ELEMENTS`	`TYPE_SCALAR_INT64`	Number of vectors currently in the index.
`INDEX_DETAIL_NAME_LABEL_TABLE`	`TYPE_2DArray_INT64`	Per-vector label table (e.g. internal-to-user id mapping).
`INDEX_DETAIL_DATA_TYPE`	`TYPE_SCALAR_STRING`	Underlying vector data type (e.g. `"float32"`).

Individual indexes may expose additional names; iterate GetIndexDetailInfos() to discover them at runtime. A runnable example is available at examples/cpp/317_feature_get_detail_data.cpp.

Notes and Limitations

CheckFeature is constant-time. Prefer it over try / catch around an unsupported call.
CalDistanceById requires the underlying index to retain enough information to recompute the distance. For purely quantized indexes (no raw vectors retained), calculate_precise_distance = true may return the quantized distance instead.
GetIndexDetailInfos and GetDetailDataByName are read-only snapshots. The values returned reflect the index state at the moment of the call; concurrent mutations may invalidate them.

Extensibility

VSAG exposes a small set of stable C++ extension points so applications can plug in their own infrastructure without forking the library. This page summarizes what is extensible and what is not, and links to runnable examples.

Public extension points

Extension point	Header	Purpose
`vsag::Allocator`	`vsag/allocator.h`	Custom memory allocation strategy.
`vsag::Logger`	`vsag/logger.h`	Redirect VSAG logs to your logging stack.
`vsag::ThreadPool`	`vsag/thread_pool.h`	Reuse an external worker pool for builds and IO.
`vsag::Filter`	`vsag/filter.h`	Custom pre-filter for `KnnSearch` / `RangeSearch`.
`vsag::Reader` (+ `ReaderSet`)	`vsag/readerset.h`	Custom IO backend for deserialization.

All five are abstract base classes. Each declares at least one pure-virtual method that you must implement; some also declare non-pure-virtual methods with sensible defaults (for example, Filter::CheckValid(const char*), Filter::ValidRatio(), Filter::FilterDistribution(), Filter::GetValidIds(), and Reader::MultiRead()) that you can override only when you need custom behaviour. Implement the required methods, wrap your instance in a std::shared_ptr (or pass a raw pointer where the API requires it), and hand it to VSAG.

Wiring extensions into an index

There are two main entry points.

1. Per-index resources via `Engine`

vsag::Engine (vsag/engine.h) is the recommended way to bind a custom Allocator and ThreadPool to every index it creates:

auto allocator   = std::make_shared<MyAllocator>();
auto thread_pool = std::make_shared<MyThreadPool>();
vsag::Resource resource(allocator, thread_pool);
vsag::Engine engine(&resource);

auto index = engine.CreateIndex("hgraph", parameters).value();
// ... use index ...
engine.Shutdown();

Engine(Resource*) takes a non-owning pointer — the caller is responsible for keeping the Resource alive for at least as long as the engine and every index it produced (until Shutdown() returns / those indexes are destroyed). The Resource itself owns the Allocator / ThreadPool shared pointers. See Memory Management for the full ownership model, and Per-Search Allocator for scoping an allocator to a single search call.

For quick prototypes, Engine::CreateDefaultAllocator() and Engine::CreateThreadPool(num_threads) return ready-to-use implementations.

2. `Factory::CreateIndex` with a raw allocator

vsag::Factory::CreateIndex(name, params, allocator) (vsag/factory.h) accepts an optional Allocator*. This path does not take a thread pool; new code should prefer Engine.

Filter

Implement vsag::Filter, pass a FilterPtr through SearchRequest::filter_, and set SearchRequest::enable_filter_ = true (the filter is ignored when the flag is off). The legacy SearchParam::filter path remains supported. Only CheckValid(int64_t id) is required; the other hooks are optional optimizations:

CheckValid(const char* data) — filter on per-vector extra info.
ValidRatio() — hint the planner about selectivity.
FilterDistribution() — hint about the spatial distribution of the valid ids: NONE (default) means no hint, RELATED_TO_VECTOR means the valid ids are correlated with vector position. See vsag/filter.h.
GetValidIds(...) — expose a precomputed valid-id list for very selective filters.

Runnable example: examples/cpp/301_feature_filter.cpp. The Filtered Search page describes filter integration in detail.

Reader / ReaderSet

Index::Deserialize(const ReaderSet&) lets you stream an index from any storage backend (local file, object storage, remote FS, …) by providing a Reader per named binary stream. Implement Read, AsyncRead, and Size at minimum; MultiRead is optional and improves throughput when the backend supports batched IO. vsag::Factory::CreateLocalFileReader is a reference implementation for local files.

Runnable example: examples/cpp/102_index_diskann.cpp (DiskANN deserialization uses ReaderSet). See Serialization for the full serialize / deserialize matrix.

Logger

VSAG uses a single global logger configured through the Options singleton:

class MyLogger : public vsag::Logger { /* implement Trace/Debug/Info/... */ };
static MyLogger my_logger;
vsag::Options::Instance().set_logger(&my_logger);

The logger pointer is not owned by VSAG — keep it alive for the duration of any VSAG call. Pass nullptr to fall back to the built-in logger.

Runnable example: examples/cpp/202_custom_logger.cpp.

Global tuning via `Options`

vsag::Options::Instance() (vsag/options.h) is a process-wide singleton for settings that do not belong to a specific index:

Setter	Default	Notes
`set_num_threads_io(n)`	`8`	Threads used for disk-index IO during search. Must be in `[1, 200]`.
`set_num_threads_building(n)`	`4`	Threads used while building disk indexes.
`set_block_size_limit(bytes)`	`128 MiB`	Maximum size of a single allocation block. Must be `≥ 256 KiB` (`src/options.cpp:53-57`).
`set_direct_IO_object_align_bit(bits)`	`9`	Direct-IO alignment, in bits. Must be `≤ 21` (alignment size up to 2 MiB; `src/options.cpp:40-46`).
`set_logger(Logger*)`	built-in	See Logger.

These options affect every index in the process; set them once at startup. They do not override per-index parameters such as HGraph’s build_thread_count.
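Note that set_direct_IO_object_align_bit takes an exponent rather than a byte count. As a quick check of the numbers in the table, the helpers below convert a bit count into bytes and round a size up to that alignment. They are illustrative only and are not VSAG functions.

```cpp
#include <cstdint>

// Alignment in bytes for a given bit count: 2^bits.
constexpr uint64_t alignment_bytes(uint32_t bits) {
    return uint64_t{1} << bits;
}

// Round size up to the next multiple of 2^bits.
constexpr uint64_t align_up(uint64_t size, uint32_t bits) {
    const uint64_t mask = alignment_bytes(bits) - 1;
    return (size + mask) & ~mask;
}

static_assert(alignment_bytes(9) == 512, "default: 512-byte alignment");
static_assert(alignment_bytes(21) == 2 * 1024 * 1024, "upper bound: 2 MiB");
```

With the default of 9 bits, a 1000-byte object would occupy 1024 bytes once aligned.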

What is not publicly extensible

VSAG does not currently provide stable public interfaces for the following:

Quantizers. Concrete quantizer types (SQ8, PQ, RaBitQ, …) are selected via index parameters; subclassing them from user code is not supported.
Distance computers / metric types. Distance metrics are fixed to l2, ip, and cosine per index.
DataCell / IO / storage backends inside an index. These are implementation details. Use the Reader interface for custom IO at the deserialization boundary.

If you need one of these, please open an issue describing the use case.

A note on `vsag::ext`

The vsag/vsag_ext.h header defines a thin handle-based API (IndexHandler, DatasetHandler, BitsetHandler, …) intended for language bindings and FFI. It is not a user-facing extension surface; prefer the standard vsag::Index API for C++ applications.

Runnable examples

examples/cpp/201_custom_allocator.cpp
examples/cpp/202_custom_logger.cpp
examples/cpp/203_custom_thread_pool.cpp
examples/cpp/301_feature_filter.cpp
examples/cpp/102_index_diskann.cpp

Graph Index Enhancement

Graph-based indexes may see recall drops on “hard queries”, that is, queries that are poorly connected to their true nearest neighbors. VSAG patches these queries online or offline using a conjugate graph, improving tail recall at a small index-size cost (see How It Works for the typical overhead).

Enabling the Conjugate Graph

At build time:

{
    "hnsw": {
        "max_degree": 32,
        "ef_construction": 400,
        "use_conjugate_graph": true
    }
}

At search time, toggle it via the use_conjugate_graph_search key in the search-parameter JSON (there is no boolean overload on KnnSearch):

std::string search_param_json = R"({
    "hnsw": {
        "ef_search": 100,
        "use_conjugate_graph_search": true
    }
})";
auto result = index->KnnSearch(query, k, search_param_json);

How It Works

The conjugate graph is built by inverting “failure paths” over the training data on the original graph and then used as additional candidate edges during greedy expansion at search time. It is a lightweight patch on the main graph, typically below 10% of the main graph’s size.

Example

examples/cpp/304_feature_enhance_graph.cpp walks through building, training, and comparing recall end-to-end.

When to Use It

Data distributions with sparse clusters or outliers.
Online services sensitive to P99 recall.
Workloads that need a recall boost without rebuilding the index.

Notes

Build time increases slightly when enabled.
Conjugate-graph data is serialized together with the index.
It can be combined with Tune — they target route quality and runtime parameters respectively.

Extra Info

extra_info is a fixed-size, opaque per-vector byte payload stored alongside vectors inside the index. It lets you keep small pieces of non-vector metadata, such as timestamps, category ids, permission tags, or application-specific fields, right next to the vectors, so you can:

Retrieve metadata by vector id without a separate KV store.
Update a vector’s metadata in place without re-inserting the vector.
Filter candidates during search using your own metadata instead of post-filtering results.

The library treats the payload as raw bytes. You fully own its layout, serialization, and interpretation.

Index Support

Supported operations by index:

HGraph: store on Build/Add, GetExtraInfoByIds, UpdateExtraInfo, use_extra_info_filter, and search-result extra info.
LazyHGraph: the same support as HGraph in both phases. The flat phase is served by BruteForce, and after transition the graph phase is served by HGraph.
BruteForce: store on Build/Add, GetExtraInfoByIds, UpdateExtraInfo, use_extra_info_filter, and search-result extra info.
IVF and SINDI: store extra info on Build/Add, but do not expose retrieval, update, extra-info filtering, or search-result extra info.

HGraph, LazyHGraph, and BruteForce advertise the related capability flags when extra_info_size > 0. You can always check at runtime with index->CheckFeature(...).

Enabling Extra Info

Add the top-level integer field extra_info_size to the build parameters. The value is the size in bytes of the payload reserved per vector. Once an index is built, the size is fixed and is serialized together with the index.

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "extra_info_size": 12,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 26,
        "ef_construction": 100
    }
}

For LazyHGraph, extra_info_size is still a top-level field; the LazyHGraph-specific parameters stay in the lazy_hgraph object:

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "extra_info_size": 12,
    "lazy_hgraph": {
        "transition_threshold": 1000,
        "hgraph": {
            "base_quantization_type": "sq8",
            "max_degree": 26,
            "ef_construction": 100
        }
    }
}

If extra_info_size is omitted or set to 0, the feature is disabled.

Providing Extra Info on Build / Add

Use the Dataset builder API to attach the payload. The buffer must be contiguous, with vector i’s payload at byte offset i * extra_info_size.

auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)
    ->Dim(dim)
    ->Ids(ids.data())
    ->Float32Vectors(vectors.data())
    ->ExtraInfos(extra_infos.data())   // num_vectors * extra_info_size bytes
    ->ExtraInfoSize(extra_info_size)   // must match the index's extra_info_size
    ->Owner(false);

index->Build(base);   // or index->Add(base)

ExtraInfoSize must equal the index’s extra_info_size; otherwise the call is rejected.
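One way to produce the contiguous buffer described above is to serialize each record with std::memcpy at byte offset i * extra_info_size. The sketch below assumes a hypothetical 12-byte layout (a 4-byte category id followed by an 8-byte timestamp), matching the extra_info_size of 12 used in the earlier JSON. The Payload struct and the function name are illustrative and not part of VSAG.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical application-side record; the index only ever sees raw bytes.
struct Payload {
    uint32_t category_id;
    uint64_t timestamp;
};

// 4 + 8 bytes, written field by field so struct padding never enters the buffer.
constexpr std::size_t kExtraInfoSize = sizeof(uint32_t) + sizeof(uint64_t);

// Serialize records back to back: record i starts at byte i * kExtraInfoSize.
std::vector<char> pack_extra_infos(const std::vector<Payload>& records) {
    std::vector<char> buffer(records.size() * kExtraInfoSize);
    for (std::size_t i = 0; i < records.size(); ++i) {
        char* slot = buffer.data() + i * kExtraInfoSize;
        std::memcpy(slot, &records[i].category_id, sizeof(uint32_t));
        std::memcpy(slot + sizeof(uint32_t), &records[i].timestamp, sizeof(uint64_t));
    }
    return buffer;
}
```

The resulting buffer.data() is what would be handed to ExtraInfos(...), with kExtraInfoSize passed to ExtraInfoSize(...). Writing the fields individually keeps each slot at 12 bytes even where sizeof(Payload) is larger because of alignment padding.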

Retrieving Extra Info

From Search Results

When extra_info_size > 0, supported indexes populate the result Dataset with the matching extra_info bytes for every returned id:

auto result = index->KnnSearch(query, k, search_params).value();
const char* infos = result->GetExtraInfos();
auto info_size = result->GetExtraInfoSize();

Use info_size to compute offsets in the returned buffer.
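The offset arithmetic can be wrapped in two small helpers. This is a sketch that assumes an application-defined layout whose first four bytes are a uint32_t category id; only the infos pointer and info_size come from the result Dataset, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Return a pointer to the payload of the i-th returned result.
inline const char* extra_info_at(const char* infos, std::size_t info_size, std::size_t i) {
    return infos + i * info_size;
}

// Decode the leading uint32_t of a payload (layout chosen by the application).
inline uint32_t read_category_id(const char* payload) {
    uint32_t category_id = 0;
    std::memcpy(&category_id, payload, sizeof(category_id));  // safe for unaligned data
    return category_id;
}
```

For the i-th hit, read_category_id(extra_info_at(infos, info_size, i)) yields the category stored with that vector.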

By Ids (`GetExtraInfoByIds`)

Allocate a count * extra_info_size byte buffer and call:

if (index->CheckFeature(vsag::SUPPORT_GET_EXTRA_INFO_BY_ID)) {
    std::vector<char> out(count * extra_info_size);
    index->GetExtraInfoByIds(ids, count, out.data());
}

If the feature is not enabled, the call returns UNSUPPORTED_INDEX_OPERATION.

Updating Extra Info In Place

Update a single vector’s payload without touching the vector itself:

if (index->CheckFeature(vsag::SUPPORT_UPDATE_EXTRA_INFO_CONCURRENT)) {
    auto upd = vsag::Dataset::Make();
    upd->NumElements(1)
       ->Ids(&id)
       ->ExtraInfos(buffer.data())
       ->ExtraInfoSize(extra_info_size)
       ->Owner(false);
    index->UpdateExtraInfo(upd);
}

The dataset must contain exactly one element, and its ExtraInfoSize must match the index’s extra_info_size.

Filtering with Extra Info

Post-filtering can be wasteful when the filter prunes many candidates. HGraph and LazyHGraph can call your filter on each candidate’s extra_info bytes during graph traversal, so disqualified candidates never enter the result set. LazyHGraph also supports the same byte-payload filter before transition, where the flat phase runs an exact scan.

Override the byte-buffer overload of vsag::Filter:

class CategoryFilter : public vsag::Filter {
public:
    CategoryFilter(uint32_t lo, uint32_t hi) : lo_(lo), hi_(hi) {}
    bool CheckValid(int64_t /*id*/) const override { return true; }
    bool CheckValid(const char* data) const override {
        uint32_t category_id;
        std::memcpy(&category_id, data, sizeof(category_id));
        return category_id >= lo_ && category_id <= hi_;
    }
    float ValidRatio() const override { return 0.5F; }
private:
    uint32_t lo_, hi_;
};

Enable use_extra_info_filter inside the hgraph block of the search parameters and pass the filter to KnnSearch:

std::string search_params = R"({
    "hgraph": {
        "ef_search": 100,
        "use_extra_info_filter": true
    }
})";
auto filter = std::make_shared<CategoryFilter>(3, 7);
auto result = index->KnnSearch(query, k, search_params, filter).value();

When use_extra_info_filter is true, the search path calls CheckValid(const char*) instead of CheckValid(int64_t). You can guard with index->CheckFeature(vsag::SUPPORT_KNN_SEARCH_WITH_EX_FILTER).
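The switch between the two overloads can be pictured with a small stand-in. MiniFilter below only mirrors the two CheckValid overloads named above and is not the real vsag::Filter. The candidate loop is an assumption about the general shape of the check, not VSAG's search code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in with the same two overloads as the filter described above (hypothetical type).
struct MiniFilter {
    virtual ~MiniFilter() = default;
    virtual bool CheckValid(int64_t id) const = 0;
    virtual bool CheckValid(const char* /*data*/) const { return true; }
};

struct CategoryRange : MiniFilter {
    uint32_t lo, hi;
    CategoryRange(uint32_t l, uint32_t h) : lo(l), hi(h) {}
    bool CheckValid(int64_t /*id*/) const override { return true; }
    bool CheckValid(const char* data) const override {
        uint32_t c = 0;
        std::memcpy(&c, data, sizeof(c));
        return c >= lo && c <= hi;
    }
};

// Keep the ids that pass the filter; the flag picks which overload is consulted.
std::vector<int64_t> keep_valid(const std::vector<int64_t>& ids,
                                const char* extra_infos, std::size_t info_size,
                                const MiniFilter& filter, bool use_extra_info_filter) {
    std::vector<int64_t> kept;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        const bool ok = use_extra_info_filter
                            ? filter.CheckValid(extra_infos + i * info_size)
                            : filter.CheckValid(ids[i]);
        if (ok) kept.push_back(ids[i]);
    }
    return kept;
}
```

With the flag off, CategoryRange accepts every id because its id overload always returns true, which is why a filter intended for byte payloads has no effect unless use_extra_info_filter is enabled.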

LazyHGraph Notes

extra_info_size must be configured when the LazyHGraph index is created; it is not nested under lazy_hgraph or hgraph.
Extra info supplied while the index is still in the flat phase is migrated into the internal HGraph during transition.
GetExtraInfoByIds, UpdateExtraInfo, search-result extra info, and use_extra_info_filter work before and after transition.
Serialized LazyHGraph indexes preserve both the current phase and the stored extra info.

Capability Flags

vsag::SUPPORT_GET_EXTRA_INFO_BY_ID: GetExtraInfoByIds is available.
vsag::SUPPORT_UPDATE_EXTRA_INFO_CONCURRENT: UpdateExtraInfo is available and thread-safe.
vsag::SUPPORT_KNN_SEARCH_WITH_EX_FILTER: use_extra_info_filter is available in search.

Notes and Limitations

The payload is opaque bytes; you are responsible for serialization/deserialization. The library only copies by offset.
extra_info_size is fixed at build time and persisted in the serialized index.
Storage cost is extra_info_size * num_elements bytes, accounted for in EstimateMemory by indexes that support memory estimation for this storage.
Keep the payload compact because it is read during extra-info filtering.
The feature is currently C++ only; there is no Python binding for extra_info.

Example

A complete, runnable example is available at examples/cpp/320_feature_extra_info.cpp. It demonstrates building an HGraph index with extra_info, retrieval by id, extra-info filtering, and in-place updates.

Index Lifecycle Management

After an index is built, VSAG provides several operations that mutate the index in place or produce a new index derived from it. This page documents the full lifecycle surface:

Remove — delete vectors by id.
UpdateVector / UpdateId — modify an existing vector or rename its id.
Clone — produce a deep copy of an existing index.
ExportModel — extract the trained model as an empty index for reuse.

Each operation is optional. Except for Remove, which has no dedicated flag, an operation is available only when the underlying index advertises the matching capability flag via index->CheckFeature(...).

Capability Flags

Operation	Capability Flag	HGraph	IVF	SINDI
`Remove`	(no dedicated flag — see below)	Yes	—	—
`UpdateVector`	`SUPPORT_UPDATE_VECTOR_CONCURRENT`	Yes	—	Yes
`UpdateId`	`SUPPORT_UPDATE_ID_CONCURRENT`	Yes	—	Yes
`Clone`	`SUPPORT_CLONE`	Yes	Yes	—
`ExportModel`	`SUPPORT_EXPORT_MODEL`	Yes	Yes	—

For the flag-gated operations, check at runtime with index->CheckFeature(vsag::SUPPORT_*) before calling; unsupported indexes return UNSUPPORTED_INDEX_OPERATION. Remove does not currently have a dedicated capability flag — see the next section for how to determine whether your index supports it and which mode it supports.

Removing Vectors

Remove deletes vectors by id. HGraph supports two deletion modes with different requirements:

RemoveMode::MARK_REMOVE (the default) only writes a tombstone via the label table and works regardless of support_force_remove. The id is filtered out of subsequent searches, but the underlying graph node and vector storage are kept.
RemoveMode::FORCE_REMOVE physically rewrites the graph and reclaims the slot. This mode is only available when the index was built with support_force_remove: true in index_param. That flag enables the force-remove path and its extra synchronization; calling FORCE_REMOVE on an index built without support_force_remove: true will fail.

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 16,
        "ef_construction": 100,
        "support_force_remove": true
    }
}

The JSON snippet above is only required if you intend to use FORCE_REMOVE. For MARK_REMOVE alone you can omit the support_force_remove flag:

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 16,
        "ef_construction": 100
    }
}

// Single-id and batch overloads are available.
index->Remove(id);
index->Remove(std::vector<int64_t>{id1, id2, id3});

Remove Modes

The optional RemoveMode argument selects the deletion strategy:

Mode	Behavior
`RemoveMode::MARK_REMOVE` (default)	Tombstones the id; fast, no shrink or graph repair. Subsequent searches skip the id. Does not require `support_force_remove: true`.
`RemoveMode::FORCE_REMOVE`	Physically removes the vector and repairs the graph. Heavier. Requires the index to be built with `support_force_remove: true`.

Remove returns the number of ids that were successfully removed. Ids that did not exist are silently skipped and not counted.
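The difference between the two modes, and the return-count rule, can be illustrated with a toy container. This is a simplified model written for this page: ToyIndex and ToyRemoveMode are hypothetical, and graph repair is ignored entirely. It only shows which bookkeeping each mode touches.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

enum class ToyRemoveMode { MARK_REMOVE, FORCE_REMOVE };

class ToyIndex {
public:
    void Add(int64_t id, float value) { data_[id] = value; }

    // Returns how many of the given ids were removed; unknown ids are skipped.
    uint32_t Remove(const std::vector<int64_t>& ids,
                    ToyRemoveMode mode = ToyRemoveMode::MARK_REMOVE) {
        uint32_t removed = 0;
        for (int64_t id : ids) {
            if (!Contains(id)) continue;   // missing or already tombstoned
            if (mode == ToyRemoveMode::MARK_REMOVE) {
                tombstones_.insert(id);    // storage stays allocated
            } else {
                data_.erase(id);           // storage is reclaimed
            }
            ++removed;
        }
        return removed;
    }

    // A search would only see ids for which this returns true.
    bool Contains(int64_t id) const {
        return data_.count(id) != 0 && tombstones_.count(id) == 0;
    }

    std::size_t StoredCount() const { return data_.size(); }

private:
    std::unordered_map<int64_t, float> data_;
    std::unordered_set<int64_t> tombstones_;
};
```

In this model a mark-removed id disappears from Contains while StoredCount is unchanged, which corresponds to the point that MARK_REMOVE hides the id but does not free memory.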

A runnable example is available at examples/cpp/303_feature_remove.cpp.

Updating Vectors and Ids

`UpdateVector`

UpdateVector(id, new_base, force_update = false) replaces the vector data of an existing id in place. The default force_update = false mode performs a connectivity check: if the new vector is far from the original (which would degrade graph quality), the update is rejected and the caller is expected to fall back to Remove + Add.

std::vector<float> new_vec(dim);  // populate with the replacement vector
auto upd = vsag::Dataset::Make();
upd->NumElements(1)->Dim(dim)->Ids(&id)->Float32Vectors(new_vec.data())->Owner(false);

auto status = index->UpdateVector(id, upd, /*force_update=*/false);
if (status.has_value() && *status) {
    // updated in place
} else if (status.has_value() && not *status) {
    // rejected: new vector is too far from the old one — fall back to remove + add
    index->Remove(id);
    index->Add(upd);
}

Setting force_update = true skips the check and always applies the update; use with caution as it may degrade recall.

`UpdateId`

UpdateId(old_id, new_id) renames an existing id without touching the underlying vector. Returns true on success, false if old_id was not found or new_id already exists.

index->UpdateId(123, 456);
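The success and failure cases can be modelled with a plain id table. ToyIdTable below is a hypothetical illustration of the documented return values, not the label table VSAG uses internally.

```cpp
#include <cstdint>
#include <unordered_map>

// Toy id table: user id -> internal slot (for illustration only).
class ToyIdTable {
public:
    void Insert(int64_t id, uint32_t slot) { slots_[id] = slot; }

    // Rename old_id to new_id; the slot (and thus the vector) is untouched.
    bool UpdateId(int64_t old_id, int64_t new_id) {
        auto it = slots_.find(old_id);
        if (it == slots_.end() || slots_.count(new_id) != 0) {
            return false;  // old_id missing or new_id already taken
        }
        const uint32_t slot = it->second;
        slots_.erase(it);
        slots_[new_id] = slot;
        return true;
    }

    bool Has(int64_t id) const { return slots_.count(id) != 0; }

private:
    std::unordered_map<int64_t, uint32_t> slots_;
};
```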

A runnable example combining UpdateVector, Remove, and Add is available at examples/cpp/305_feature_update.cpp.

Cloning an Index

Clone() produces a deep copy of the entire index — vectors, graph, quantizer state, and metadata — as an independent IndexPtr. The clone can be searched, mutated, or serialized independently of the source.

auto cloned = index->Clone().value();

// Both indexes return identical search results immediately after cloning.
auto r1 = index->KnnSearch(query, k, params).value();
auto r2 = cloned->KnnSearch(query, k, params).value();

Clone optionally accepts a custom Allocator so that the cloned index uses a different memory region than the source — useful for handing an index off to a thread or component that owns its own allocator. See Memory Management for allocator details.

A runnable example is available at examples/cpp/309_feature_clone.cpp.

Exporting the Trained Model

ExportModel() returns an empty index that retains all trained state (quantization codebooks, centroids, hyperparameters) of the source but contains no vectors. It is the canonical way to share a pre-trained model across shards, processes, or hosts without re-running training.

auto exported = index->ExportModel();
if (not exported.has_value()) {
    // index does not support ExportModel — handle the error
    return;
}
auto model = *exported;

// Populate the empty model with a new (potentially different) vector set.
for (int64_t i = 0; i < num_vectors; ++i) {
    auto one = vsag::Dataset::Make();
    one->NumElements(1)->Dim(dim)->Ids(ids + i)
       ->Float32Vectors(vectors + i * dim)->Owner(false);
    model->Add(one);
}

The returned index behaves identically to one freshly created via Factory::CreateIndex(...) and trained on the source data — only the per-vector storage is empty. This pattern is particularly useful for IVF-style indexes where training (k-means on centroids) is the dominant cost.

A runnable example is available at examples/cpp/310_feature_export_model.cpp.

Notes and Limitations

UpdateVector and UpdateId are concurrent-safe on HGraph when the matching *_CONCURRENT capability flag is set. Remove has no dedicated flag; safe combinations of removal with concurrent search and add are gated by combined flags such as SUPPORT_ADD_SEARCH_DELETE_CONCURRENT.
MARK_REMOVE does not free memory; use FORCE_REMOVE or rebuild periodically if you need to reclaim space.
Clone cost scales linearly with index size. For large indexes prefer serialization + deserialization with a dedicated reader if you only need a snapshot on disk.
ExportModel preserves training but not any inserted vectors. The exported model can be freely serialized and shipped before any vectors are added.

API Reference

This chapter is a curated reference for VSAG’s public C++ API — the headers installed under include/vsag/. It documents the classes, structs, enums, and free functions an application links against, grouped by responsibility. The installed headers remain the authoritative source of truth; the pages here explain intent, ownership, and how the pieces fit together.

Looking for how to configure an index (the JSON index_param / search keys)? That is covered in Index Parameters and each index page. This chapter covers the code surface (types and methods), not the JSON schema.

Include and namespace

A single umbrella header pulls in the whole public API, and every symbol lives in the vsag namespace:

#include <vsag/vsag.h>   // includes factory.h, index.h, dataset.h, engine.h, ...

int main() {
    vsag::init();                       // one-time process initialization
    std::string ver = vsag::version();  // git-derived version string
}

Free function	Header	Description
`bool vsag::init()`	`vsag/vsag.h`	Initializes the library. Call once before other APIs. Always returns `true`.
`std::string vsag::version()`	`vsag/vsag.h`	Returns the build version derived from the git revision.

Error-handling model

Almost every fallible call returns tl::expected<T, Error> (a std::expected-style type shipped in vsag/expected.hpp) instead of throwing. A handful of legacy statistics accessors still throw std::runtime_error when unsupported; those are called out on the Index page.

auto result = vsag::Factory::CreateIndex("hgraph", params);
if (not result.has_value()) {
    const vsag::Error& err = result.error();
    std::cerr << "create failed: " << static_cast<int>(err.type) << " " << err.message << "\n";
    return;
}
std::shared_ptr<vsag::Index> index = result.value();

Error carries a machine-readable type and a human-readable message:

struct Error {
    ErrorType type;
    std::string message;
};

`ErrorType`

Defined in vsag/errors.h. Values start at 1 (0 is reserved).

Category	Value	Meaning
Common	`UNKNOWN_ERROR`	Unknown error.
Common	`INTERNAL_ERROR`	Internal algorithm error.
Common	`INVALID_ARGUMENT`	An argument was invalid.
Behavior	`WRONG_STATUS`	Index is in the wrong state for the call.
Behavior	`BUILD_TWICE`	The index was already built and cannot be built again.
Behavior	`INDEX_NOT_EMPTY`	Deserializing onto a non-empty index.
Behavior	`UNSUPPORTED_INDEX`	Requested an index type that does not exist.
Behavior	`UNSUPPORTED_INDEX_OPERATION`	This index does not implement the called method.
Behavior	`DIMENSION_NOT_EQUAL`	Request dimension differs from the index dimension.
Behavior	`INDEX_EMPTY`	Index is empty; cannot search or serialize.
Runtime	`NO_ENOUGH_MEMORY`	Memory allocation failed.
Runtime	`READ_ERROR`	Failed to read from a binary.
Runtime	`MISSING_FILE`	A required file is missing (e.g. DiskANN deserialization).
Runtime	`INVALID_BINARY`	Serialized binary content is invalid.

Because most index methods are virtual with a default body that returns UNSUPPORTED_INDEX_OPERATION, an “unsupported” result is normal and expected: it means the concrete index does not implement that optional capability. Use Index::CheckFeature to probe support ahead of time.

Header map

Header	Primary symbols	Reference page
`factory.h`, `engine.h`, `vsag.h`	`Factory`, `Engine`, `init`, `version`	Factory & Engine
`index.h`	`Index`, `IndexType`, `RemoveMode`, `MergeUnit`	Index
`dataset.h`	`Dataset`, `SparseVector`, `MultiVector`	Dataset
`search_request.h`, `filter.h`, `bitset.h`, `search_param.h`, `iterator_context.h`	`SearchRequest`, `Filter`, `Bitset`	Search Request & Filters
`binaryset.h`, `readerset.h`	`BinarySet`, `Binary`, `Reader`, `ReaderSet`	Serialization Types
`resource.h`, `allocator.h`, `thread_pool.h`, `options.h`, `logger.h`	`Resource`, `Allocator`, `ThreadPool`, `Options`, `Logger`	Resource Management
`attribute.h`, `index_features.h`, `index_detail_info.h`, `utils.h`, `constants.h`	`Attribute`, `IndexFeature`, `IndexDetailInfo`	Auxiliary Types

In this chapter

Factory & Engine — create indexes and readers; own resources with Engine.
Index — the core index interface: build, search, update, serialize, inspect.
Dataset — the builder-pattern container for vectors, ids, and metadata.
Search Request & Filters — SearchRequest, Filter, Bitset, iterator context.
Serialization Types — BinarySet / Binary and Reader / ReaderSet.
Resource Management — allocator, thread pool, engine resources, options, logger.
Auxiliary Types — attributes, feature flags, index detail info, and utility helpers.

Factory & Engine

Every VSAG workflow begins by obtaining an Index. There are two entry points:

Factory — the simplest way to create an index or a file reader. It uses a default (or caller-supplied) allocator and manages resources internally.
Engine — an explicit owner of shared resources (allocator + thread pool). Use it when you want several indexes to share one memory allocator / thread pool, or when you need deterministic control over resource lifetime.

This page also documents the process-level initialization helpers and the top-level helper functions for parameter generation and validation.

Library initialization

#include <vsag/vsag.h>

vsag::init();                       // call once, before any other API
std::string ver = vsag::version();  // e.g. the git-derived build string

Function	Signature	Notes
`vsag::init`	`bool init()`	One-time process initialization. Returns `true`.
`vsag::version`	`std::string version()`	Build version derived from the git revision.

Factory

Declared in vsag/factory.h. Factory is a stateless utility class with only static methods; it cannot be instantiated.

`CreateIndex`

static tl::expected<std::shared_ptr<Index>, Error>
CreateIndex(const std::string& name,
            const std::string& parameters,
            Allocator* allocator = nullptr);

Creates an index of the given type.

Parameter	Description
`name`	Index type name, e.g. `"hgraph"`, `"ivf"`, `"diskann"`, `"brute_force"`, `"sindi"`, `"pyramid"`.
`parameters`	A JSON string describing the index configuration (dtype, dim, metric, index-specific keys). See Index Parameters.
`allocator`	Optional custom `Allocator`. When `nullptr`, VSAG uses a built-in default allocator. The caller must keep the allocator alive for the whole lifetime of the returned index.

Returns a std::shared_ptr<Index> on success, or an Error (typically UNSUPPORTED_INDEX for an unknown name, or INVALID_ARGUMENT for malformed parameters).

auto index = vsag::Factory::CreateIndex("hgraph", R"(
{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": { "base_quantization_type": "sq8" }
})");
if (not index.has_value()) {
    std::cerr << index.error().message << std::endl;
    return;
}
std::shared_ptr<vsag::Index> hgraph = index.value();

`CreateLocalFileReader`

static std::shared_ptr<Reader>
CreateLocalFileReader(const std::string& filename, int64_t base_offset, int64_t size);

Creates a Reader that reads a window of a local file starting at base_offset for size bytes. This is most often used to build a ReaderSet for streaming deserialization of on-disk indexes. Unlike CreateIndex, it returns a plain std::shared_ptr (there is no fallible Error channel).

Engine

Declared in vsag/engine.h. An Engine binds a Resource (allocator + thread pool) and lets you create indexes that share it. The engine never takes ownership of a Resource* passed to it; you control its lifetime.

auto allocator = vsag::Engine::CreateDefaultAllocator();   // keep the shared_ptr alive
vsag::Resource resource(allocator, nullptr);
vsag::Engine engine(&resource);

auto index = engine.CreateIndex("hgraph", params);
// ... use index ...

engine.Shutdown();   // release engine-held state; warns on dangling references

Constructor & lifecycle

Member	Signature	Description
Constructor	`explicit Engine(Resource* resource)`	Binds an externally-owned `Resource`. The `Resource` is not managed by the engine.
`Shutdown`	`void Shutdown()`	Gracefully tears down engine-held state. Warns if external references to engine resources still exist, guarding against dangling references.

`CreateIndex`

[[nodiscard]] tl::expected<std::shared_ptr<Index>, Error>
CreateIndex(const std::string& name, const std::string& parameters);

Same semantics as Factory::CreateIndex, except the index is created against the engine’s shared Resource (allocator and thread pool) instead of a per-call allocator.

Static resource helpers

Member	Signature	Description
`CreateDefaultAllocator`	`static std::shared_ptr<Allocator> CreateDefaultAllocator()`	Creates VSAG’s built-in allocator. Returns an empty pointer on failure — check for null.
`CreateThreadPool`	`static tl::expected<std::shared_ptr<ThreadPool>, Error> CreateThreadPool(uint32_t num_threads)`	Creates a thread pool with `num_threads` workers. Returns an `Error` for an invalid count.

See Resource Management for how Resource, Allocator, and ThreadPool fit together, and examples/cpp/201_custom_allocator.cpp / 203_custom_thread_pool.cpp for runnable samples.

Top-level helper functions

These free functions (declared in vsag/index.h) help you generate and validate configuration strings before creating an index. All return tl::expected<..., Error>.

`generate_build_parameters`

tl::expected<std::string, Error>
generate_build_parameters(std::string metric_type,
                          int64_t num_elements,
                          int64_t dim,
                          bool use_conjugate_graph = false);

(Experimental.) Produces a suggested build-parameter JSON string from the dataset shape (metric_type, num_elements, dim). Pass use_conjugate_graph = true to enable conjugate-graph enhancement.

`estimate_search_time`

tl::expected<float, Error>
estimate_search_time(const std::string& index_name,
                     int64_t data_num,
                     int64_t data_dim,
                     const std::string& parameters);

Estimates the per-query search time (in milliseconds) for the given index type and configuration.

`check_diskann_hnsw_build_parameters` / `check_diskann_hnsw_search_parameters`

tl::expected<bool, Error>
check_diskann_hnsw_build_parameters(const std::string& json_string);

tl::expected<bool, Error>
check_diskann_hnsw_search_parameters(const std::string& json_string);

Validate DiskANN/HNSW build and search parameter JSON respectively. On success the value is true; on failure the Error message explains what is wrong. See the Compatibility Check Tool for a CLI wrapper around this kind of validation.

Index

vsag::Index (declared in vsag/index.h) is the central abstraction of the library. Every concrete index type — HGraph, IVF, DiskANN, BruteForce, SINDI, Pyramid, and so on — implements this interface. You never instantiate Index directly; obtain one from Factory::CreateIndex or Engine::CreateIndex and hold it through IndexPtr (std::shared_ptr<Index>).

using IndexPtr = std::shared_ptr<Index>;

How to read this reference

Index exposes many optional capabilities. The base class provides a default implementation for almost every method:

Most methods return tl::unexpected(Error(ErrorType::UNSUPPORTED_INDEX_OPERATION, ...)) when the concrete index does not implement them.
A handful of statistics accessors instead throw std::runtime_error (called out explicitly below). Wrap those in try/catch if you call them on an index that may not support them.

Because “unsupported” is a normal, expected outcome, probe capabilities up front with CheckFeature rather than assuming a method works. Methods marked (pure virtual) must be implemented by every index and are always safe to call.

Pointer/handle types used throughout this page: DatasetPtr (Dataset), FilterPtr (Filter), BitsetPtr (Bitset), BinarySet / ReaderSet (Serialization Types).

Enumerations and helper types

`IndexType`

enum class IndexType {
    HNSW, DISKANN, HGRAPH, IVF, PYRAMID, BRUTEFORCE, SPARSE, SINDI, WARP, LAZY_HGRAPH, SIMQ
};

Returned by GetIndexType.

`RemoveMode`

enum class RemoveMode {
    MARK_REMOVE = 0,   // mark as deleted; no shrink/repair — fast
    FORCE_REMOVE = 1,  // physically remove and repair the graph — heavy
};

Passed to Remove.

`MergeUnit` and `IdMapFunction`

using IdMapFunction = std::function<std::tuple<bool, int64_t>(int64_t)>;

struct MergeUnit {
    IndexPtr index = nullptr;         // source sub-index to merge from
    IdMapFunction id_map_func = nullptr;  // per-id filter + remap
};

For each source id, id_map_func returns {keep, new_id}: keep == true includes the vector under target id new_id. Used by Merge.
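As a rough illustration of the {keep, new_id} contract, the sketch below builds an id-mapping function using only the standard library. The alias is re-declared locally so the snippet compiles without VSAG headers. The helper name make_even_id_mapper and its policy (keep even ids, shift them by an offset) are hypothetical and not part of VSAG.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <tuple>

// Local re-declaration of the alias above so the sketch builds without VSAG headers.
using IdMapFunction = std::function<std::tuple<bool, int64_t>(int64_t)>;

// Hypothetical policy: keep only even source ids and shift them by `offset`
// so they do not collide with ids already present in the target index.
IdMapFunction make_even_id_mapper(int64_t offset) {
    assert(offset >= 0);
    return [offset](int64_t source_id) {
        const bool keep = (source_id % 2 == 0);
        // new_id is only meaningful when keep is true; -1 is a placeholder otherwise.
        return std::make_tuple(keep, keep ? source_id + offset : int64_t{-1});
    };
}
```

A function built this way could be assigned to MergeUnit::id_map_func before the unit is passed to Merge.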

`Checkpoint`

struct Index::Checkpoint {
    BinarySet data;       // intermediate state
    bool finish = false;  // true once the build is complete
};

Returned by ContinueBuild to drive incremental builds.

Data-selection flags

Bit flags for GetDataByIdsWithFlag, combined with bitwise OR:

Macro	Value	Selects
`DATA_FLAG_FLOAT32_VECTOR`	`0x01`	float32 vectors
`DATA_FLAG_INT8_VECTOR`	`0x02`	int8 vectors
`DATA_FLAG_SPARSE_VECTOR`	`0x04`	sparse vectors
`DATA_FLAG_EXTRA_INFO`	`0x10`	extra info blobs
`DATA_FLAG_ATTRIBUTE`	`0x20`	attributes
`DATA_FLAG_ID`	`0x40`	ids
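To make the bitwise-OR combination concrete, here is a minimal sketch. The constants are local stand-ins that copy the values from the table above so the snippet compiles on its own; with the VSAG headers available you would use the DATA_FLAG_* macros directly. The helper names select and is_selected are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins mirroring the DATA_FLAG_* values listed in the table above.
constexpr uint64_t kFloat32Vector = 0x01;
constexpr uint64_t kExtraInfo = 0x10;
constexpr uint64_t kAttribute = 0x20;
constexpr uint64_t kId = 0x40;

// Adds one field to a selection mask with bitwise OR.
uint64_t select(uint64_t mask, uint64_t flag) {
    assert(flag != 0 && (flag & (flag - 1)) == 0);  // exactly one bit must be set
    return mask | flag;
}

// Reports whether a field is part of a selection mask.
bool is_selected(uint64_t mask, uint64_t flag) {
    return (mask & flag) != 0;
}
```

A mask built as select(select(0, kFloat32Vector), kId) has the value 0x41 and is the kind of argument GetDataByIdsWithFlag expects as selected_data_flag.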

`WriteFuncType`

using OffsetType = uint64_t;
using SizeType = uint64_t;
using WriteFuncType = std::function<void(OffsetType, SizeType, const void*)>;

A sink callback for streaming Serialize. Each call hands you a logical offset (OffsetType), a byte count (SizeType), and a source pointer: persist that many bytes from the pointer at that offset in the output.
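The following sketch shows one possible sink that collects the stream into an in-memory buffer. The aliases are re-declared locally so it builds without VSAG headers, and make_memory_sink is a hypothetical helper. Whether chunks arrive in offset order is not stated here, so the sketch does not rely on it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

using OffsetType = uint64_t;
using SizeType = uint64_t;
using WriteFuncType = std::function<void(OffsetType, SizeType, const void*)>;

// Returns a sink that copies each chunk into `buffer` at its logical offset,
// growing the buffer when a chunk extends past the current end.
// `buffer` must outlive the returned callback.
WriteFuncType make_memory_sink(std::vector<char>& buffer) {
    return [&buffer](OffsetType offset, SizeType size, const void* src) {
        if (size == 0) {
            return;  // nothing to persist
        }
        assert(src != nullptr);
        if (buffer.size() < offset + size) {
            buffer.resize(offset + size);
        }
        std::memcpy(buffer.data() + offset, src, size);
    };
}
```

A file-backed sink would follow the same shape, seeking to the offset and writing the bytes instead of copying into a vector.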

Build & train

Method	Signature	Notes
`Build`	`tl::expected<std::vector<int64_t>, Error> Build(const DatasetPtr& base)`	(pure virtual) Builds the index from all vectors. Returns the ids that failed to insert.
`Train`	`tl::expected<void, Error> Train(const DatasetPtr& data)`	Trains an index (e.g. IVF centroids, quantizer) without inserting.
`Tune`	`tl::expected<bool, Error> Tune(const std::string& parameters, bool disable_future_tuning = false)`	Applies runtime tuning. See Optimizer (Tune).
`ContinueBuild`	`tl::expected<Checkpoint, Error> ContinueBuild(const DatasetPtr& base, const BinarySet& binary_set)`	Builds in resumable steps, for indexes that cannot insert incrementally; drive it with the returned `Checkpoint` until `finish` is `true`.
`Add`	`tl::expected<std::vector<int64_t>, Error> Add(const DatasetPtr& base)`	Inserts new vectors into an already-built index. Returns ids that failed to insert.

See Build and Train and examples/cpp/311_feature_train.cpp.

Update & remove

Method	Signature	Notes
`Remove`	`tl::expected<uint32_t, Error> Remove(const std::vector<int64_t>& ids, RemoveMode mode = RemoveMode::MARK_REMOVE)`	Removes many ids; returns the count removed.
`Remove`	`tl::expected<uint32_t, Error> Remove(int64_t id, RemoveMode mode = RemoveMode::MARK_REMOVE)`	Single-id convenience overload.
`UpdateId`	`tl::expected<bool, Error> UpdateId(int64_t old_id, int64_t new_id)`	Relabels a base point.
`UpdateVector`	`tl::expected<bool, Error> UpdateVector(int64_t id, const DatasetPtr& new_base, bool force_update = false)`	Replaces the vector for `id`. `force_update = false` performs a connectivity check.
`UpdateExtraInfo`	`tl::expected<bool, Error> UpdateExtraInfo(const DatasetPtr& new_base)`	Updates stored extra-info blobs.
`UpdateAttribute`	`tl::expected<void, Error> UpdateAttribute(int64_t id, const AttributeSet& new_attrs)`	Replaces attributes of `id`.
`UpdateAttribute`	`tl::expected<void, Error> UpdateAttribute(int64_t id, const AttributeSet& new_attrs, const AttributeSet& origin_attrs)`	Same, but supplies the previous attributes for a faster in-place update.

See examples/cpp/303_feature_remove.cpp.

Search

The recommended entry point is SearchWithRequest, which takes a single SearchRequest carrying the query, mode, top-k / radius, and any filters. The older per-argument KnnSearch / RangeSearch overloads remain for compatibility.

Every search returns a DatasetPtr: for KNN, num_elements == 1 and ids / distances have length k; for range search, the result length is the number of matches. See Dataset for how to read results.

`SearchWithRequest`

[[nodiscard]] tl::expected<DatasetPtr, Error>
SearchWithRequest(const SearchRequest& request) const;

Unified KNN or range search driven by SearchRequest. This is the preferred API for new code; it supports attribute filters, callback filters, bitset filters, a per-search allocator, and iterator search through one struct.

`KnnSearch` overloads

// (1) bitset pre-filter — pure virtual
tl::expected<DatasetPtr, Error>
KnnSearch(const DatasetPtr& query, int64_t k, const std::string& parameters,
          BitsetPtr invalid = nullptr) const;

// (2) callback pre-filter — pure virtual
tl::expected<DatasetPtr, Error>
KnnSearch(const DatasetPtr& query, int64_t k, const std::string& parameters,
          const std::function<bool(int64_t)>& filter) const;

// (3) Filter object
tl::expected<DatasetPtr, Error>
KnnSearch(const DatasetPtr& query, int64_t k, const std::string& parameters,
          const FilterPtr& filter) const;

// (4) Filter + iterator context
tl::expected<DatasetPtr, Error>
KnnSearch(const DatasetPtr& query, int64_t k, const std::string& parameters,
          const FilterPtr& filter, IteratorContext*& iter_ctx, bool is_last_search) const;

// (5) SearchParam — [[deprecated]], use SearchWithRequest
tl::expected<DatasetPtr, Error>
KnnSearch(const DatasetPtr& query, int64_t k, SearchParam& search_param) const;

Notes on the filter argument:

In overloads (1)/(2) the predicate/bitset marks vectors filtered out. For a bitset, Test(id) == true means id is excluded. For the std::function predicate, returning true means the id is excluded.
Overloads (3)/(4) take a Filter object, whose CheckValid(id) uses the opposite convention (true means keep). See Filtered Search for the full semantics, and examples/cpp/301_feature_filter.cpp.
Overload (4) powers Iterator Search; pass the same iter_ctx across calls and set is_last_search on the final call.

`RangeSearch` overloads

// (1) plain — pure virtual
tl::expected<DatasetPtr, Error>
RangeSearch(const DatasetPtr& query, float radius, const std::string& parameters,
            int64_t limited_size = -1) const;

// (2) bitset pre-filter — pure virtual
tl::expected<DatasetPtr, Error>
RangeSearch(const DatasetPtr& query, float radius, const std::string& parameters,
            BitsetPtr invalid, int64_t limited_size = -1) const;

// (3) callback pre-filter — pure virtual
tl::expected<DatasetPtr, Error>
RangeSearch(const DatasetPtr& query, float radius, const std::string& parameters,
            const std::function<bool(int64_t)>& filter, int64_t limited_size = -1) const;

// (4) Filter object
tl::expected<DatasetPtr, Error>
RangeSearch(const DatasetPtr& query, float radius, const std::string& parameters,
            const FilterPtr& filter, int64_t limited_size = -1) const;

radius bounds the distance; limited_size caps the result count (a negative value means no limit; 0 is an error). See Range Search and examples/cpp/302_feature_range_search.cpp.

Distance by id

Method	Signature	Notes
`CalcDistanceById`	`tl::expected<float, Error> CalcDistanceById(const float* vector, int64_t id, bool calculate_precise_distance = true) const`	Distance from a dense query to the stored vector `id`.
`CalcDistanceById`	`tl::expected<float, Error> CalcDistanceById(const DatasetPtr& vector, int64_t id, bool calculate_precise_distance = true) const`	Same, accepting a `DatasetPtr` (works for sparse indexes such as SINDI).
`CalDistanceById`	`tl::expected<DatasetPtr, Error> CalDistanceById(const float* query, const int64_t* ids, int64_t count, bool calculate_precise_distance = true) const`	Batch variant; `-1` in the result marks an invalid distance.
`CalDistanceById`	`tl::expected<DatasetPtr, Error> CalDistanceById(const DatasetPtr& query, const int64_t* ids, int64_t count, bool calculate_precise_distance = true) const`	Batch variant accepting a `DatasetPtr` query.

calculate_precise_distance = true may load full-precision vectors (possibly from disk) instead of quantized codes. See Calculate Distance by ID and examples/cpp/306_feature_calculate_distance_by_id.cpp.

Conjugate-graph enhancement

Method	Signature	Notes
`Pretrain`	`tl::expected<uint32_t, Error> Pretrain(const std::vector<int64_t>& base_tag_ids, uint32_t k, const std::string& parameters)`	Enhances chosen base vectors by searching generated queries. Returns successful insertions.
`Feedback`	`tl::expected<uint32_t, Error> Feedback(const DatasetPtr& query, int64_t k, const std::string& parameters, int64_t global_optimum_tag_id = INT64_MAX)`	Feeds a known optimum back into the conjugate graph.

See Graph Enhancement.

Data retrieval

Method	Signature	Notes
`GetMinAndMaxId`	`tl::expected<std::pair<int64_t, int64_t>, Error> GetMinAndMaxId() const`	Smallest and largest ids in the index.
`GetExtraInfoByIds`	`tl::expected<void, Error> GetExtraInfoByIds(const int64_t* ids, int64_t count, char* extra_infos) const`	Copies extra-info blobs for `ids` into a caller-provided buffer.
`GetRawVectorByIds`	`tl::expected<DatasetPtr, Error> GetRawVectorByIds(const int64_t* ids, int64_t count, Allocator* specified_allocator = nullptr) const`	Returns stored vectors. Values are close to the originals but not guaranteed bit-identical (quantization/precision).
`GetDataByIds`	`tl::expected<DatasetPtr, Error> GetDataByIds(const int64_t* ids, int64_t count) const`	Returns all stored data (vectors, attributes, extra info) for `ids`.
`GetDataByIdsWithFlag`	`tl::expected<DatasetPtr, Error> GetDataByIdsWithFlag(const int64_t* ids, int64_t count, uint64_t selected_data_flag) const`	Like `GetDataByIds` but selects fields via `DATA_FLAG_*`.
`GetIndexDetailInfos`	`tl::expected<std::vector<IndexDetailInfo>, Error> GetIndexDetailInfos() const`	Lists the introspectable detail fields. See `IndexDetailInfo`.
`GetDetailDataByName`	`tl::expected<DetailDataPtr, Error> GetDetailDataByName(const std::string& name, IndexDetailInfo& info) const`	Fetches one detail-data payload by name.

See Index Introspection and examples/cpp/317_feature_get_detail_data.cpp.

Capabilities, merge, clone, export

Method	Signature	Notes
`CheckFeature`	`bool CheckFeature(IndexFeature feature) const`	Probes whether an optional capability is supported. See `IndexFeature`.
`Merge`	`tl::expected<void, Error> Merge(const std::vector<MergeUnit>& merge_units)`	Merges same-type sub-indexes with id remapping. See `MergeUnit`.
`Clone`	`tl::expected<IndexPtr, Error> Clone(const std::shared_ptr<Allocator>& allocator = nullptr) const`	Deep-copies the index.
`ExportModel`	`tl::expected<IndexPtr, Error> ExportModel() const`	Returns an empty index carrying only the trained model.
`ExportIDs`	`tl::expected<DatasetPtr, Error> ExportIDs() const`	Returns all ids as a dataset.
`SetImmutable`	`tl::expected<void, Error> SetImmutable()`	Freezes the index; further add/delete is rejected.

See examples/cpp/309_feature_clone.cpp, 310_feature_export_model.cpp, and 315_feature_hgraph_merge.cpp, plus Index Lifecycle Management.

Serialization

Method	Signature	Notes
`Serialize`	`tl::expected<BinarySet, Error> Serialize() const`	(pure virtual) Serializes to an in-memory `BinarySet`.
`Serialize`	`tl::expected<void, Error> Serialize(WriteFuncType write_func) const`	Streams the serialized index through a `WriteFuncType` sink.
`Serialize`	`tl::expected<void, Error> Serialize(std::ostream& out_stream)`	Serializes to an open output stream.
`Deserialize`	`tl::expected<void, Error> Deserialize(const BinarySet& binary_set)`	(pure virtual) Restores from a `BinarySet`. Fails if the index is not empty.
`Deserialize`	`tl::expected<void, Error> Deserialize(const ReaderSet& reader_set)`	(pure virtual) Restores from a `ReaderSet` (e.g. on-disk readers).
`Deserialize`	`tl::expected<void, Error> Deserialize(std::istream& in_stream)`	Restores from an open input stream.

Deserializing onto a non-empty index yields INDEX_NOT_EMPTY. See Serialization and examples/cpp/401_persistent_kv.cpp / 402_persistent_streaming.cpp.

Cache (build acceleration)

Method	Signature	Notes
`ExportCache`	`tl::expected<void, Error> ExportCache(std::ostream& out_stream) const`	Writes a build-time cache (e.g. graph neighbors) that can accelerate a later `Build`.
`ImportCache`	`tl::expected<void, Error> ImportCache(std::istream& in_stream)`	Loads a previously exported cache; the next `Build` reuses it.

Statistics & introspection

Unless noted, these return values directly. The methods marked “throws” raise std::runtime_error (not tl::expected) when the index does not support them.

Method	Signature	Notes
`GetIndexType`	`IndexType GetIndexType() const`	Throws if unsupported.
`GetNumElements`	`int64_t GetNumElements() const`	(pure virtual) Live element count.
`GetNumberRemoved`	`int64_t GetNumberRemoved() const`	Throws if unsupported. Count of removed elements.
`GetMemoryUsage`	`int64_t GetMemoryUsage() const`	(pure virtual) Bytes occupied by the index.
`GetMemoryUsageDetail`	`std::string GetMemoryUsageDetail() const`	Throws if unsupported. Per-component memory as JSON.
`EstimateMemory`	`uint64_t EstimateMemory(uint64_t num_elements) const`	Throws if unsupported. Estimated bytes for `num_elements`.
`GetEstimateBuildMemory`	`int64_t GetEstimateBuildMemory(int64_t num_elements) const`	Throws if unsupported. Estimated peak build memory.
`GetStats`	`std::string GetStats() const`	Throws if unsupported. Runtime statistics as JSON.
`AnalyzeIndexBySearch`	`std::string AnalyzeIndexBySearch(const SearchRequest& request)`	Throws if unsupported. Analysis JSON for a probe search.
`CheckIdExist`	`bool CheckIdExist(int64_t id) const`	Throws if unsupported. Whether `id` is present.

See examples/cpp/308_feature_estimate_memory.cpp, 319_feature_get_memory_usage.cpp, and the Index Analysis Tool.

Dataset

vsag::Dataset (declared in vsag/dataset.h) is the universal container VSAG uses for inputs (base vectors to build/add, query vectors to search) and outputs (search results, retrieved vectors). You always hold it through DatasetPtr:

using DatasetPtr = std::shared_ptr<Dataset>;

Builder pattern

Dataset uses a fluent builder: Make() creates an instance, and every setter returns the same DatasetPtr so calls chain. Setters only store pointers/values — they do not copy your buffers.

auto base = vsag::Dataset::Make()
                ->Dim(128)
                ->NumElements(10000)
                ->Ids(ids)                 // const int64_t*
                ->Float32Vectors(vectors)  // const float*
                ->Owner(false);            // caller keeps ownership of ids/vectors

Ownership

Ownership controls who frees the underlying buffers:

Call	Meaning
`Owner(true)`	The dataset owns its buffers and frees them on destruction (using the default allocator).
`Owner(true, allocator)`	The dataset owns its buffers and frees them via the supplied `Allocator`.
`Owner(false)`	The caller keeps ownership; the dataset only borrows the pointers. They must outlive the dataset.

Use Owner(false) for build/query inputs you already hold. Search results returned by the index use Owner(true), so you can read them and let the DatasetPtr free everything.

DatasetPtr Make();               // static factory

DatasetPtr Owner(bool is_owner, Allocator* allocator);
DatasetPtr Owner(bool is_owner);              // uses the default allocator
DatasetPtr Append(const DatasetPtr& other);   // concatenate another dataset
DatasetPtr DeepCopy(Allocator* allocator = nullptr) const;  // independent copy

Metadata

Setter	Getter	Type	Meaning
`NumElements(int64_t)`	`GetNumElements()`	`int64_t`	Number of elements (vectors/rows).
`Dim(int64_t)`	`GetDim()`	`int64_t`	Dense vector dimensionality.
`Ids(const int64_t*)`	`GetIds()`	`const int64_t*`	Per-element ids (length `NumElements`).
`Distances(const float*)`	`GetDistances()`	`const float*`	Distances (search output; length depends on `k`/matches).

Vector payloads

A dataset carries exactly one vector representation, chosen to match the index’s dtype:

Setter	Getter	Element type	Use with
`Float32Vectors(const float*)`	`GetFloat32Vectors()`	`float`	`dtype: float32`
`Float16Vectors(const uint16_t*)`	`GetFloat16Vectors()`	`uint16_t`	`dtype: float16` and `bfloat16` (raw 16-bit payload)
`Int8Vectors(const int8_t*)`	`GetInt8Vectors()`	`int8_t`	`dtype: int8`
`SparseVectors(const SparseVector*)`	`GetSparseVectors()`	`SparseVector`	`dtype: sparse` (SINDI)

Dense vectors are laid out row-major: element i, dimension j lives at vectors[i * dim + j].
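The row-major rule can be sketched with plain standard-library containers. The helper names flatten_row_major and at are hypothetical; nothing here calls into VSAG.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flattens `rows` (each of length `dim`) into one contiguous row-major buffer,
// so that element i, dimension j ends up at flat[i * dim + j].
std::vector<float> flatten_row_major(const std::vector<std::vector<float>>& rows, int64_t dim) {
    std::vector<float> flat;
    flat.reserve(rows.size() * static_cast<std::size_t>(dim));
    for (const auto& row : rows) {
        assert(static_cast<int64_t>(row.size()) == dim);  // every row needs `dim` values
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Reads dimension j of element i from a row-major buffer.
float at(const std::vector<float>& flat, int64_t dim, int64_t i, int64_t j) {
    return flat[static_cast<std::size_t>(i * dim + j)];
}
```

The pointer from flat.data() is the kind of buffer Float32Vectors expects; with Owner(false) the vector has to outlive the dataset that borrows it.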

Multi-vector payloads

For documents that hold several dense sub-vectors each:

Setter	Getter	Type	Meaning
`MultiVectors(const MultiVector*)`	`GetMultiVectors()`	`MultiVector`	One entry per document.
`MultiVectorDim(int64_t)`	`GetMultiVectorDim()`	`int64_t`	Floats per sub-vector (independent of `Dim`).
`VectorCounts(const uint32_t*)`	`GetVectorCounts()`	`const uint32_t*`	Sub-vector count per document.

Metadata payloads

Setter	Getter	Type	Meaning
`AttributeSets(const AttributeSet*)`	`GetAttributeSets()`	`AttributeSet`	Per-element attributes for hybrid search.
`ExtraInfos(const char*)`	`GetExtraInfos()`	`const char*`	Packed extra-info blobs.
`ExtraInfoSize(int64_t)`	`GetExtraInfoSize()`	`int64_t`	Bytes per extra-info blob.
`Paths(const std::string*)`	`GetPaths()`	`const std::string*`	Hierarchy paths (Pyramid). Default hierarchy.
`Paths(const std::string& hierarchy, const std::string*)`	`GetPaths(const std::string& hierarchy)`	`const std::string*`	Paths for a named hierarchy.
`SourceID(const std::string*)`	`GetSourceID()`	`const std::string*`	Optional source identifier.

See Attribute Filter (Hybrid Search) and Extra Info.
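As an illustration of the ExtraInfos / ExtraInfoSize pairing, the sketch below packs one fixed-size blob per element into a contiguous buffer. It assumes that blob i starts at byte i * ExtraInfoSize, which is inferred from the "packed" and "bytes per blob" descriptions above rather than confirmed here. The helper name pack_extra_infos and the zero-padding policy are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Packs one fixed-size blob per element into a single contiguous buffer
// (assumed layout: blob i occupies bytes [i * blob_size, (i + 1) * blob_size)).
// Blobs shorter than `blob_size` are zero-padded; longer ones are truncated.
std::vector<char> pack_extra_infos(const std::vector<std::string>& blobs, int64_t blob_size) {
    assert(blob_size >= 0);
    const auto stride = static_cast<std::size_t>(blob_size);
    std::vector<char> packed(blobs.size() * stride, '\0');
    for (std::size_t i = 0; i < blobs.size(); ++i) {
        const std::size_t n = std::min(blobs[i].size(), stride);
        if (n == 0) {
            continue;  // nothing to copy for an empty blob or a zero stride
        }
        std::memcpy(packed.data() + i * stride, blobs[i].data(), n);
    }
    return packed;
}
```

Under that assumption, packed.data() would be passed to ExtraInfos and blob_size to ExtraInfoSize, with the buffer kept alive while the dataset borrows it.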

Diagnostics payloads

Setter	Getter	Type	Meaning
`Statistics(const std::string&)`	`GetStatistics()` / `GetStatistics(keys)`	`std::string` / `std::vector<std::string>`	Serialized statistics; the keyed getter returns values for the requested keys.
`Reasoning(const std::string&)`	`GetReasoning()`	`std::string`	Reasoning report (JSON) explaining recall of `expected_labels_`.

Reading search results

Search methods return a DatasetPtr you read back with the getters:

auto result = index->KnnSearch(query, 10, search_params);
if (result.has_value()) {
    auto r = result.value();
    // The number of results is reported through GetDim() (k for a KNN search).
    for (int64_t i = 0; i < r->GetDim(); ++i) {
        int64_t id = r->GetIds()[i];
        float dist = r->GetDistances()[i];
    }
}

For KNN, GetNumElements() is 1 and the ids/distances arrays have length k. For range search, the number of matches is reported through the result’s dimension. See k-Nearest Neighbor Search.

`SparseVector`

struct SparseVector {
    uint32_t len_ = 0;         // number of non-zero entries
    uint32_t* ids_ = nullptr;  // term ids, length len_ (sorted ascending inside the index)
    float* vals_ = nullptr;    // term weights, length len_

    // optional original tokenization (order/duplicates preserved, unlike ids_)
    uint32_t token_seq_len_ = 0;
    uint32_t* token_sequence_ = nullptr;
};

Sorting ids_ ascending before insertion is recommended. token_sequence_ is optional and only used by indexes that consume raw token order.
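One way to follow the sorting recommendation is to sort the (term id, weight) pairs together so that each weight stays attached to its id. This is a minimal sketch on plain vectors; sort_sparse_terms is a hypothetical helper, not a VSAG function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sorts parallel (term id, weight) arrays by ascending term id, in place.
void sort_sparse_terms(std::vector<uint32_t>& ids, std::vector<float>& vals) {
    assert(ids.size() == vals.size());
    std::vector<std::pair<uint32_t, float>> terms;
    terms.reserve(ids.size());
    for (std::size_t i = 0; i < ids.size(); ++i) {
        terms.emplace_back(ids[i], vals[i]);
    }
    std::sort(terms.begin(), terms.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    for (std::size_t i = 0; i < terms.size(); ++i) {
        ids[i] = terms[i].first;
        vals[i] = terms[i].second;
    }
}
```

After sorting, ids.data() and vals.data() can populate ids_ and vals_, with len_ set to ids.size().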

`MultiVector`

struct MultiVector {
    uint32_t len_ = 0;          // number of sub-vectors in this document
    float* vectors_ = nullptr;  // flat array of len_ * MultiVectorDim floats
};

When Owner(true) is set, each element’s vectors_ must be independently allocated, because the destructor frees each vectors_ separately.

Search Request & Filters

This page covers the types that describe how to search: the unified SearchRequest, the filtering primitives Filter and Bitset, and the IteratorContext used for incremental search. The deprecated SearchParam is documented at the end for migration.

`SearchRequest`

Declared in vsag/search_request.h. SearchRequest is a plain struct that bundles every option for Index::SearchWithRequest. Fill in the fields you need and leave the rest at their defaults.

vsag::SearchRequest request;
request.query_ = query;      // DatasetPtr with one query vector
request.mode_ = vsag::SearchMode::KNN_SEARCH;
request.topk_ = 10;
request.params_str_ = R"({"hgraph": {"ef_search": 100}})";

auto result = index->SearchWithRequest(request);

`SearchMode`

enum class SearchMode {
    KNN_SEARCH = 1,    // return the top-k nearest vectors
    RANGE_SEARCH = 2,  // return all vectors within radius_
};

Basic fields

Field	Type	Default	Meaning
`query_`	`DatasetPtr`	`nullptr`	The query. Exactly one query vector is allowed.
`mode_`	`SearchMode`	`KNN_SEARCH`	KNN vs. range search.
`topk_`	`int64_t`	`10`	Neighbors to return (KNN mode). Must be positive.
`radius_`	`float`	`0.5`	Distance threshold (range mode). Non-negative.
`limited_size_`	`int64_t`	`-1`	Cap on range results; `-1` means no limit.
`params_str_`	`std::string`	`""`	Algorithm-specific search params as JSON (e.g. `ef_search`).

IVF bucket routing

IVF accepts {"ivf":{"scan_buckets_count":N,"disable_bucket_scan":true}} through params_str_. This routing-only mode returns the N selected bucket IDs per query in the result Dataset instead of vector labels. NumElements() equals the number of queries, Dim() equals scan_buckets_count, GetIds() contains bucket IDs (with -1 for empty slots), and GetDistances() has distances to bucket centroids. No vector scan is performed, so filters, topk, range limits, reordering, and reasoning options are ignored.

Filtering fields

Three filtering mechanisms are available and are combined with logical AND when more than one is enabled.

Field	Type	Default	Meaning
`enable_attribute_filter_`	`bool`	`false`	Enable SQL-style attribute filtering.
`attribute_filter_str_`	`std::string`	`""`	The filter expression (see below). Requires `enable_attribute_filter_`.
`enable_filter_`	`bool`	`false`	Enable a custom `Filter` callback.
`filter_`	`FilterPtr`	`nullptr`	The filter object. Requires `enable_filter_`.
`enable_bitset_filter_`	`bool`	`false`	Enable a `Bitset` filter.
`bitset_filter_`	`BitsetPtr`	`nullptr`	The bitset. `Test(id) == true` excludes id. Requires `enable_bitset_filter_`.

The attribute_filter_str_ grammar is SQL-like. Examples:

category = 'electronics' AND price != 1000
multi_in(category, ['electronics', 'clothing']) AND multi_notin(color, ['red', 'blue'])

See Attribute Filter (Hybrid Search) and Filtered Search.

Resource & iterator fields

Field	Type	Default	Meaning
`search_allocator_`	`Allocator*`	`nullptr`	Per-search allocator; falls back to the index allocator when null.
`enable_iterator_search_`	`bool`	`false`	Enable incremental (iterator) search.
`p_iter_ctx_`	`IteratorContext**`	`nullptr`	Handle to the iterator state, reused across calls.
`is_last_search_`	`bool`	`false`	Marks the final call of an iterator sequence.
`expected_labels_`	`std::vector<int64_t>`	`{}`	Ids expected in the result; enables reasoning analysis of missed recalls.

See Per-Search Allocator and Iterator Search, plus examples/cpp/313/314 for the allocator.

`Filter`

Declared in vsag/filter.h. Implement this abstract class to express arbitrary “keep this id?” logic. Hold it through FilterPtr (std::shared_ptr<Filter>).

class Filter {
public:
    enum class Distribution { NONE = 0, RELATED_TO_VECTOR };

    virtual bool CheckValid(int64_t id) const = 0;          // true  => KEEP the id
    virtual bool CheckValid(const char* data) const;         // extra-info variant (default true)
    virtual float ValidRatio() const;                        // fraction kept (default 1.0)
    virtual Distribution FilterDistribution() const;         // hint (default NONE)
    virtual void GetValidIds(const int64_t** valid_ids, int64_t& count) const;
};

Convention: Filter::CheckValid(id) returns true to keep a vector. This is the opposite of the bitset / std::function<bool(int64_t)> pre-filter overloads on Index, where true means filtered out. Keep this distinction in mind when choosing an overload.

Member	Purpose
`CheckValid(int64_t id)`	Core predicate. `true` keeps the id in results.
`CheckValid(const char* data)`	Predicate over an element’s extra-info bytes. Defaults to `true`.
`ValidRatio()`	Estimated fraction of vectors that pass; lets the engine pick a strategy.
`FilterDistribution()`	Hint for the engine: `RELATED_TO_VECTOR` signals that validity correlates with vector position.
`GetValidIds(...)`	Optionally expose the explicit valid-id set.

See examples/cpp/301_feature_filter.cpp.
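Because the two conventions are easy to mix up, a small adapter can make the inversion explicit. The sketch below uses only std::function; IdPredicate, make_keep_predicate, and to_exclude_predicate are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>

using IdPredicate = std::function<bool(int64_t)>;

// "Keep" convention (as in Filter::CheckValid): true means the id stays in the results.
IdPredicate make_keep_predicate(std::unordered_set<int64_t> allowed) {
    return [allowed = std::move(allowed)](int64_t id) { return allowed.count(id) > 0; };
}

// Converts a keep-predicate into the "exclude" convention used by the
// std::function / bitset pre-filter overloads: true means the id is filtered out.
IdPredicate to_exclude_predicate(IdPredicate keep) {
    assert(static_cast<bool>(keep));  // an empty std::function cannot be inverted
    return [keep = std::move(keep)](int64_t id) { return !keep(id); };
}
```

The same keep-logic can then back either a Filter subclass (returned as-is from CheckValid) or a callback pre-filter (after inversion).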

`Bitset`

Declared in vsag/bitset.h. A compact set of bit flags keyed by position, held through BitsetPtr. It is used both as a filtering input and as a utility (e.g. the result of l2_and_filtering).

static BitsetPtr Random(int64_t length);  // random bitset of the given length
static BitsetPtr Make();                  // empty bitset

void Set(int64_t pos, bool value);
void Set(int64_t pos);       // = Set(pos, true)
bool Test(int64_t pos) const;
uint64_t Count();            // number of set bits
std::string Dump();          // debug dump

When a Bitset is used as a search pre-filter (bitset_filter_, or the invalid argument of KnnSearch / RangeSearch), Test(id) == true means the id is filtered out.

`IteratorContext`

Declared in vsag/iterator_context.h. An opaque handle that stores the position of an in-progress iterator search so that subsequent calls resume where the previous one stopped.

class IteratorContext {
public:
    virtual ~IteratorContext() = default;
};

You do not construct or inspect it directly. VSAG allocates it on the first iterator search; pass the same handle back (via SearchRequest::p_iter_ctx_, or the KnnSearch iterator overload) on each subsequent call, and set the last-search flag on the final call so the engine can release it. See Iterator Search.

`SearchParam` (deprecated)

Declared in vsag/search_param.h. SearchParam predates SearchRequest and is retained only for the deprecated KnnSearch(query, k, SearchParam&) overload.

struct SearchParam {  // [[deprecated]] use SearchRequest
    bool is_iter_filter{false};
    bool is_last_search{false};
    const std::string& parameters;
    FilterPtr filter{nullptr};
    Allocator* allocator{nullptr};
    IteratorContext* iter_ctx{nullptr};
};

Prefer SearchRequest + SearchWithRequest for all new code. SearchParam holds parameters by reference, so the referenced string must outlive the call.

Serialization Types

VSAG can persist an index in two shapes: an in-memory BinarySet (a named collection of byte blobs) or, for on-disk / streaming scenarios, a ReaderSet of lazy Reader objects. These types are the payloads passed to Index::Serialize / Index::Deserialize.

For end-to-end workflows and stream-based serialization, see Serialization and examples/cpp/401_persistent_kv.cpp / 402_persistent_streaming.cpp.

`Binary`

Declared in vsag/binaryset.h. A single named byte buffer with its length.

struct Binary {
    std::shared_ptr<int8_t[]> data;  // the bytes
    uint64_t size;                   // number of bytes
};

The shared_ptr owns the buffer, so a Binary can be copied and stored freely without worrying about lifetime.
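The shared-ownership behaviour can be seen in a small sketch. The struct is mirrored locally so the snippet compiles without VSAG headers (C++17 is assumed for std::shared_ptr of an array type), and make_binary and byte_at are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>

// Local mirror of the struct above so the sketch compiles without VSAG.
struct Binary {
    std::shared_ptr<int8_t[]> data;
    uint64_t size = 0;
};

// Copies `bytes` into a freshly allocated buffer owned by the returned Binary.
Binary make_binary(const std::string& bytes) {
    Binary b;
    b.data = std::shared_ptr<int8_t[]>(new int8_t[bytes.size()]);
    b.size = bytes.size();
    std::memcpy(b.data.get(), bytes.data(), bytes.size());
    return b;
}

// Reads byte i, checking bounds in debug builds.
int8_t byte_at(const Binary& b, uint64_t i) {
    assert(i < b.size);
    return b.data[i];
}
```

Copying a Binary copies only the shared_ptr, so both copies refer to the same bytes and the buffer is released when the last copy goes away.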

`BinarySet`

Declared in vsag/binaryset.h. A string-keyed map of Binary blobs — the standard in-memory serialization container. An index serializes itself into several named parts (graph, vectors, quantizer, etc.), all gathered in one BinarySet.

class BinarySet {
public:
    void Set(const std::string& name, Binary binary);   // store a blob
    Binary Get(const std::string& name) const;          // {nullptr, 0} if absent
    std::vector<std::string> GetKeys() const;            // all stored names
    bool Contains(const std::string& key) const;
};

Method	Description
`Set(name, binary)`	Stores `binary` under `name`, overwriting any existing entry.
`Get(name)`	Returns the blob, or an empty `Binary{nullptr, 0}` if the name is absent.
`GetKeys()`	Returns every stored name.
`Contains(key)`	Whether a blob is stored under `key`.

// Serialize to a BinarySet, then persist each part however you like.
auto serialized = index->Serialize();
if (serialized.has_value()) {
    vsag::BinarySet bs = serialized.value();
    for (const auto& key : bs.GetKeys()) {
        vsag::Binary part = bs.Get(key);
        // write part.data[0 .. part.size) to your store, keyed by `key`
    }
}

To restore, rebuild the BinarySet from your store and call Deserialize(const BinarySet&) on a fresh (empty) index.

`Reader`

Declared in vsag/readerset.h. An abstract source of bytes that the index reads on demand — the basis for deserializing large, disk-resident indexes without loading everything into memory. Obtain a local-file reader from Factory::CreateLocalFileReader, or implement Reader for a custom backend (object storage, mmap, etc.). Hold it through ReaderPtr (std::shared_ptr<Reader>).

class Reader {
public:
    virtual void Read(uint64_t offset, uint64_t len, void* dest) = 0;                  // sync
    virtual void AsyncRead(uint64_t offset, uint64_t len, void* dest, CallBack cb) = 0; // async
    virtual bool MultiRead(uint8_t* dests, const uint64_t* lens,
                           const uint64_t* offsets, uint64_t count);                    // batched
    virtual uint64_t Size() const = 0;
};

Method	Description
`Read(offset, len, dest)`	Synchronously copy `len` bytes from `offset` into `dest`. Thread-safe.
`AsyncRead(offset, len, dest, callback)`	Asynchronous read; `callback` is invoked with an `IOErrorCode` and message on completion.
`MultiRead(dests, lens, offsets, count)`	Perform `count` synchronous reads in one call; returns `false` on any failure.
`Size()`	Total size of the underlying source in bytes.

`IOErrorCode`

enum class IOErrorCode {
    IO_SUCCESS = 0,  // operation succeeded
    IO_ERROR = 1,    // general I/O error
    IO_TIMEOUT = 2,  // operation timed out
};

`CallBack`

using CallBack = std::function<void(IOErrorCode code, const std::string& message)>;

The completion handler for AsyncRead.
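A custom backend only has to honor the methods declared above. The sketch below is a minimal in-memory implementation. To stay self-contained it redeclares stand-ins for Reader, IOErrorCode, and CallBack in a hypothetical `sketch` namespace instead of including the VSAG headers, and `MemoryReader` is an illustrative name that is not part of VSAG. MultiRead is left out because it is not pure virtual in the declaration above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // stand-ins mirroring the declarations above, not the real headers

enum class IOErrorCode { IO_SUCCESS = 0, IO_ERROR = 1, IO_TIMEOUT = 2 };
using CallBack = std::function<void(IOErrorCode code, const std::string& message)>;

class Reader {
public:
    virtual ~Reader() = default;
    virtual void Read(uint64_t offset, uint64_t len, void* dest) = 0;
    virtual void AsyncRead(uint64_t offset, uint64_t len, void* dest, CallBack cb) = 0;
    virtual uint64_t Size() const = 0;
};

// Hypothetical backend: serves reads from a byte buffer held in memory.
class MemoryReader : public Reader {
public:
    explicit MemoryReader(std::vector<uint8_t> bytes) : bytes_(std::move(bytes)) {}

    void Read(uint64_t offset, uint64_t len, void* dest) override {
        // Reject any range that is not fully inside the buffer.
        if (offset > bytes_.size() || len > bytes_.size() - offset) {
            throw std::out_of_range("read past end of source");
        }
        std::memcpy(dest, bytes_.data() + offset, len);
    }

    // Completes inline on the calling thread; a real backend would finish on an I/O thread.
    void AsyncRead(uint64_t offset, uint64_t len, void* dest, CallBack cb) override {
        try {
            Read(offset, len, dest);
            cb(IOErrorCode::IO_SUCCESS, "");
        } catch (const std::exception& e) {
            cb(IOErrorCode::IO_ERROR, e.what());
        }
    }

    uint64_t Size() const override { return bytes_.size(); }

private:
    std::vector<uint8_t> bytes_;
};

}  // namespace sketch
```

Reporting failures through the callback rather than by throwing keeps the asynchronous path usable from a thread that cannot propagate exceptions to the caller.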

`ReaderSet`

Declared in vsag/readerset.h. A string-keyed map of Reader objects — the streaming analogue of BinarySet. Each named part of a serialized index maps to a Reader that fetches that part on demand. Pass a fully populated ReaderSet to Deserialize(const ReaderSet&).

class ReaderSet {
public:
    void Set(const std::string& name, ReaderPtr reader);
    ReaderPtr Get(const std::string& name) const;   // nullptr if absent
    std::vector<std::string> GetKeys() const;
    bool Contains(const std::string& key) const;
};

The method semantics mirror BinarySet, except values are ReaderPtr instead of Binary.

vsag::ReaderSet readers;
readers.Set("graph", vsag::Factory::CreateLocalFileReader("index.graph", 0, graph_size));
readers.Set("vectors", vsag::Factory::CreateLocalFileReader("index.vectors", 0, vec_size));

auto fresh = vsag::Factory::CreateIndex("hgraph", params).value();
fresh->Deserialize(readers);

Resource Management

VSAG lets you take control of the memory and threads it uses. This page covers Allocator (custom memory management), ThreadPool (custom concurrency), Resource (a bundle of the two shared by an Engine), the process-wide Options singleton, and the pluggable Logger.

Runnable samples: examples/cpp/201_custom_allocator.cpp, 202_custom_logger.cpp, and 203_custom_thread_pool.cpp. See also Memory Management and Extensibility.

`Allocator`

Declared in vsag/allocator.h. An abstract interface for custom memory management. Implement it to route all of an index’s allocations through your own pool, arena, or accounting layer, then pass it to Factory::CreateIndex or a Resource.

class Allocator {
public:
    virtual std::string Name() = 0;
    virtual void* Allocate(uint64_t size) = 0;
    virtual void Deallocate(void* p) = 0;
    virtual void* Reallocate(void* p, uint64_t size) = 0;

    template <typename T, typename... Args> T* New(Args&&... args);  // Allocate + construct
    template <typename T> void Delete(T* p);                          // destruct + Deallocate
};

Member	Description
`Name()`	Identifier for the allocator implementation (used in diagnostics).
`Allocate(size)`	Return a block of at least `size` bytes.
`Deallocate(p)`	Free a block previously returned by this allocator.
`Reallocate(p, size)`	Resize a block, preserving contents.
`New<T>(args...)`	Helper: allocate and construct a `T`; frees and rethrows if the constructor throws.
`Delete<T>(p)`	Helper: destruct `*p` and free its storage (null-safe).

An allocator passed to an index must outlive that index. VSAG’s built-in allocator is available via Engine::CreateDefaultAllocator.
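As an illustration of an accounting layer, the sketch below forwards to the C heap and counts live blocks. It redeclares a stand-in for the four virtual methods in a hypothetical `sketch` namespace so that it compiles without the VSAG headers; `CountingAllocator` and `LiveBlocks()` are illustrative names, not part of VSAG.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

namespace sketch {  // stand-in for the interface above, not the real header

class Allocator {
public:
    virtual ~Allocator() = default;
    virtual std::string Name() = 0;
    virtual void* Allocate(uint64_t size) = 0;
    virtual void Deallocate(void* p) = 0;
    virtual void* Reallocate(void* p, uint64_t size) = 0;
};

// Hypothetical accounting layer: forwards to malloc/realloc/free and tracks live blocks.
class CountingAllocator : public Allocator {
public:
    std::string Name() override { return "counting_allocator"; }

    void* Allocate(uint64_t size) override {
        void* p = std::malloc(size);
        if (p != nullptr) {
            ++live_blocks_;
        }
        return p;
    }

    void Deallocate(void* p) override {
        if (p == nullptr) {
            return;
        }
        std::free(p);
        --live_blocks_;
    }

    void* Reallocate(void* p, uint64_t size) override {
        if (p == nullptr) {
            return Allocate(size);  // behaves like a fresh allocation
        }
        // Resizing keeps exactly one live block whether or not it moves.
        return std::realloc(p, size);
    }

    int64_t LiveBlocks() const { return live_blocks_.load(); }

private:
    std::atomic<int64_t> live_blocks_{0};  // atomic because indexes may allocate from several threads
};

}  // namespace sketch
```

A non-zero `LiveBlocks()` after the index has been destroyed would point at a leak, which is one reason to keep the allocator alive longer than the index.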

`ThreadPool`

Declared in vsag/thread_pool.h. An abstract task executor. Supply your own to make VSAG share your application’s threads instead of spawning its own.

class ThreadPool {
public:
    virtual void WaitUntilEmpty() = 0;
    virtual void SetQueueSizeLimit(std::uint64_t limit) = 0;
    virtual void SetPoolSize(std::uint64_t limit) = 0;
    virtual std::future<void> Enqueue(std::function<void(void)> task) = 0;
};

Member	Description
`WaitUntilEmpty()`	Block until all enqueued tasks finish.
`SetQueueSizeLimit(limit)`	Cap the pending-task queue; behavior past the cap is implementation-defined.
`SetPoolSize(limit)`	Cap the number of worker threads.
`Enqueue(task)`	Submit a task; returns a `std::future<void>` for its completion.

A ready-made pool can be created with Engine::CreateThreadPool.

`Resource`

Declared in vsag/resource.h. A Resource bundles an Allocator and a ThreadPool so that an Engine — and every index it creates — can share them.

class Resource {
public:
    explicit Resource(Allocator* allocator, ThreadPool* thread_pool);
    explicit Resource(const std::shared_ptr<Allocator>& allocator,
                      const std::shared_ptr<ThreadPool>& thread_pool);
    explicit Resource();  // default allocator, no thread pool

    std::shared_ptr<Allocator> GetAllocator() const;
    std::shared_ptr<ThreadPool> GetThreadPool() const;
};

Constructor / method	Description
`Resource(Allocator*, ThreadPool*)`	Use raw pointers you own. A null allocator means “create and own a default”; a null thread pool means “no pool”.
`Resource(shared_ptr, shared_ptr)`	Same, with shared ownership.
`Resource()`	Default allocator, no thread pool.
`GetAllocator()`	The resource’s allocator (a default one if none was supplied).
`GetThreadPool()`	The resource’s thread pool, or null if none was supplied.

auto alloc = vsag::Engine::CreateDefaultAllocator();
auto pool = vsag::Engine::CreateThreadPool(4).value();
vsag::Resource resource(alloc, pool);
vsag::Engine engine(&resource);
auto index = engine.CreateIndex("hgraph", params);

`Options`

Declared in vsag/options.h. A process-wide singleton for global configuration, accessed via Options::Instance(). Thread-safe. Option is a type alias for Options.

vsag::Options::Instance().set_num_threads_building(8);
vsag::Options::Instance().set_logger(&my_logger);

Setting	Accessors	Default	Meaning
IO threads	`num_threads_io()` / `set_num_threads_io(n)`	`8`	Threads for disk-index IO during search (1–200).
Build threads	`num_threads_building()` / `set_num_threads_building(n)`	`4`	Threads for constructing an index.
Block size limit	`block_size_limit()` / `set_block_size_limit(bytes)`	`128 MB`	Max bytes per allocation block (must be > 2 MB).
Direct-IO align	`direct_IO_object_align_bit()` / `set_direct_IO_object_align_bit(bits)`	`9`	Direct-IO object alignment, in bits (< 21).
Logger	`logger()` / `set_logger(Logger*)`	`nullptr`	Active `Logger`; `set_logger` returns `true` on success.

`Logger`

Declared in vsag/logger.h. An abstract logging sink. Implement it and register it via Options::set_logger to route VSAG’s log output through your application’s logging system.

The built-in logger defaults to info. Set VSAG_LOG_LEVEL before the built-in logger is created to choose trace, debug, info, warn/warning, error, critical, or off. Invalid values are ignored and keep the default level. An explicit SetLevel call still overrides the environment-derived level.

VSAG logs a startup initialization banner when vsag::init() runs. To suppress that banner, set VSAG_SUPPRESS_INIT_BANNER before starting the process. This is useful for tests, CI jobs, or applications that need quieter startup logs.

Truthy values are 1, on, and true; matching for on and true is ASCII case-insensitive, so ON, On, and TRUE also work. Other values leave the banner enabled.

The variable must be set before the process starts: VSAG runs vsag::init() during static initialization, so setting it later from inside the process cannot suppress the first banner.

The banner includes an instance spec value such as 48C503G, combining the cpuinfo core count with total physical memory. Memory is reported in whole GiB using floor division by 1024^3; if the platform query fails, the memory portion is shown as ?G. SIMD lines, including neon and sve, retain their existing distribution/platform/using capability semantics.

VSAG_SUPPRESS_INIT_BANNER=1 ./your_vsag_app
VSAG_SUPPRESS_INIT_BANNER=true ./your_vsag_test
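The truthy-value rule and the instance spec format described above can be sketched as two small functions. This is an illustration of the documented behavior, not VSAG's own code: the function names are hypothetical, and a negative byte count is used here as a stand-in for "the platform query failed".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Truthy per the rule above: "1", or "on" / "true" compared ASCII case-insensitively.
bool is_truthy_banner_value(const std::string& raw) {
    if (raw == "1") {
        return true;
    }
    std::string lower;
    for (char c : raw) {
        // ASCII-only lowering, independent of the process locale.
        lower.push_back((c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c);
    }
    return lower == "on" || lower == "true";
}

// Builds a spec such as "48C503G": core count, then whole GiB by floor division.
std::string format_instance_spec(unsigned cores, int64_t total_memory_bytes) {
    const int64_t kGiB = static_cast<int64_t>(1) << 30;  // 1024^3
    const std::string memory =
        total_memory_bytes < 0 ? "?" : std::to_string(total_memory_bytes / kGiB);
    return std::to_string(cores) + "C" + memory + "G";
}
```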

class Logger {
public:
    enum Level : int {
        kTRACE = 0, kDEBUG = 1, kINFO = 2, kWARN = 3, kERR = 4, kCRITICAL = 5, kOFF = 6, kN_LEVELS
    };

    virtual void SetLevel(Level log_level) = 0;
    virtual void Trace(const std::string& msg) = 0;
    virtual void Debug(const std::string& msg) = 0;
    virtual void Info(const std::string& msg) = 0;
    virtual void Warn(const std::string& msg) = 0;
    virtual void Error(const std::string& msg) = 0;
    virtual void Critical(const std::string& msg) = 0;
};

Member	Description
`SetLevel(level)`	Only messages at or above `level` are emitted. `kOFF` disables logging.
`Trace` / `Debug` / `Info` / `Warn` / `Error` / `Critical`	Emit a message at the corresponding severity.

See examples/cpp/202_custom_logger.cpp.
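For a quick picture of the level filter, here is a minimal sketch of a sink that stores messages in memory. It redeclares a stand-in for the interface above in a hypothetical `sketch` namespace so that it compiles on its own; `CollectingLogger` and `Lines()` are illustrative names, and the real sample in the repository may be organized differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {  // stand-in for the interface above, not the real header

class Logger {
public:
    enum Level : int {
        kTRACE = 0, kDEBUG = 1, kINFO = 2, kWARN = 3, kERR = 4, kCRITICAL = 5, kOFF = 6, kN_LEVELS
    };
    virtual ~Logger() = default;
    virtual void SetLevel(Level log_level) = 0;
    virtual void Trace(const std::string& msg) = 0;
    virtual void Debug(const std::string& msg) = 0;
    virtual void Info(const std::string& msg) = 0;
    virtual void Warn(const std::string& msg) = 0;
    virtual void Error(const std::string& msg) = 0;
    virtual void Critical(const std::string& msg) = 0;
};

// Hypothetical sink: keeps every message that passes the level filter.
class CollectingLogger : public Logger {
public:
    void SetLevel(Level log_level) override { level_ = log_level; }
    void Trace(const std::string& msg) override { Emit(kTRACE, msg); }
    void Debug(const std::string& msg) override { Emit(kDEBUG, msg); }
    void Info(const std::string& msg) override { Emit(kINFO, msg); }
    void Warn(const std::string& msg) override { Emit(kWARN, msg); }
    void Error(const std::string& msg) override { Emit(kERR, msg); }
    void Critical(const std::string& msg) override { Emit(kCRITICAL, msg); }

    const std::vector<std::string>& Lines() const { return lines_; }

private:
    void Emit(Level severity, const std::string& msg) {
        // kOFF is above every severity, so nothing passes once it is set.
        if (severity >= level_) {
            lines_.push_back(msg);
        }
    }

    Level level_ = kINFO;  // same default as the built-in logger
    std::vector<std::string> lines_;
};

}  // namespace sketch
```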

Auxiliary Types

This page gathers the remaining public types: the attribute system used for hybrid (attribute-filtered) search, the IndexFeature capability flags, the index detail info introspection types, the utility functions in utils.h, and the string constants in constants.h.

Attributes

Declared in vsag/attribute.h. Attributes are typed, named metadata attached to each vector, enabling SQL-style filtering during search (see Attribute Filter (Hybrid Search)).

`AttrValueType`

enum AttrValueType {
    INT32 = 1, UINT32 = 2, INT64 = 3, UINT64 = 4,
    INT8 = 5, UINT8 = 6, INT16 = 7, UINT16 = 8,
    STRING = 9,
};

The element type carried by an attribute.

`Attribute`

class Attribute {
public:
    std::string name_{};

    virtual AttrValueType GetValueType() const = 0;
    virtual uint64_t GetValueCount() const = 0;
    virtual Attribute* DeepCopy() const = 0;
    virtual bool Equal(const Attribute* other) const = 0;
};
using AttributePtr = std::shared_ptr<Attribute>;

An abstract, named attribute. Each attribute may hold multiple values (GetValueCount()), so a single field can represent a multi-valued tag set.

Member	Description
`name_`	The attribute (field) name.
`GetValueType()`	The `AttrValueType` of the stored values.
`GetValueCount()`	Number of values held.
`DeepCopy()`	Allocate an independent copy.
`Equal(other)`	Value equality against another attribute.

`AttributeValue<T>`

template <class T>
class AttributeValue : public Attribute {
public:
    AttrValueType GetValueType() const override;
    uint64_t GetValueCount() const override;
    std::vector<T>& GetValue();
    const std::vector<T>& GetValue() const;
    Attribute* DeepCopy() const override;
    bool Equal(const Attribute* other) const override;
};

The concrete, typed implementation of Attribute. Instantiate it with the C++ type matching the desired AttrValueType (e.g. AttributeValue<int32_t>, AttributeValue<std::string>), set name_, and push values into GetValue().

auto tag = std::make_shared<vsag::AttributeValue<int32_t>>();
tag->name_ = "category";
tag->GetValue().push_back(7);

`AttributeSet`

struct AttributeSet {
    std::vector<Attribute*> attrs_;
};

A bag of attributes describing one element. Attach a per-element array of AttributeSet to a Dataset via AttributeSets(...), or pass one to Index::UpdateAttribute.

`IndexFeature`

Declared in vsag/index_features.h. An enum of optional capabilities you can probe with Index::CheckFeature before calling an optional method.

enum IndexFeature {
    NEED_TRAIN = 1,
    SUPPORT_BUILD,
    SUPPORT_ADD_AFTER_BUILD,
    SUPPORT_KNN_SEARCH,
    SUPPORT_RANGE_SEARCH,
    SUPPORT_DELETE_BY_ID,
    SUPPORT_SERIALIZE_BINARY_SET,
    SUPPORT_CAL_DISTANCE_BY_ID,
    SUPPORT_MERGE_INDEX,
    SUPPORT_CLONE,
    /* ... many more ... */
    INDEX_FEATURE_COUNT   // sentinel; always the last value
};

The enum groups capabilities into families:

Family	Examples
Lifecycle	`NEED_TRAIN`, `SUPPORT_BUILD`, `SUPPORT_ADD_AFTER_BUILD`, `SUPPORT_ADD_FROM_EMPTY`, `SUPPORT_RESET`
Search	`SUPPORT_KNN_SEARCH`, `SUPPORT_RANGE_SEARCH`, `SUPPORT_*_WITH_ID_FILTER`, `SUPPORT_KNN_ITERATOR_FILTER_SEARCH`, `SUPPORT_BATCH_SEARCH`
Metric	`SUPPORT_METRIC_TYPE_L2`, `SUPPORT_METRIC_TYPE_INNER_PRODUCT`, `SUPPORT_METRIC_TYPE_COSINE`
Serialization	`SUPPORT_SERIALIZE_FILE` / `_BINARY_SET` / `_WRITE_FUNC`, `SUPPORT_DESERIALIZE_FILE` / `_BINARY_SET` / `_READER_SET`
Concurrency	`SUPPORT_ADD_CONCURRENT`, `SUPPORT_SEARCH_CONCURRENT`, `SUPPORT_ADD_SEARCH_DELETE_CONCURRENT`, and the `SUPPORT_*_WITH_MULTI_THREAD` build/train variants
Introspection & ops	`SUPPORT_ESTIMATE_MEMORY`, `SUPPORT_GET_MEMORY_USAGE`, `SUPPORT_CHECK_ID_EXIST`, `SUPPORT_MERGE_INDEX`, `SUPPORT_CLONE`, `SUPPORT_EXPORT_MODEL`, `SUPPORT_EXPORT_IDS`, `SUPPORT_TUNE`, `SUPPORT_CAL_DISTANCE_BY_ID`, `SUPPORT_GET_*_BY_ID(S)`

INDEX_FEATURE_COUNT marks the end of the enum and is not a real feature. See examples/cpp/307_feature_check_features.cpp.

Index detail info

Declared in vsag/index_detail_info.h. These types describe and carry the structured data returned by Index::GetIndexDetailInfos and Index::GetDetailDataByName. See Index Introspection and examples/cpp/317_feature_get_detail_data.cpp.

`IndexDetailDataType`

enum class IndexDetailDataType {
    TYPE_2DArray_INT64,
    TYPE_1DArray_INT64,
    TYPE_SCALAR_INT64,
    TYPE_SCALAR_DOUBLE,
    TYPE_SCALAR_STRING,
    TYPE_SCALAR_BOOL,
};

Tells you which DetailData getter is valid for a given field.

`IndexDetailInfo`

class IndexDetailInfo {
public:
    std::string name;
    std::string description;
    IndexDetailDataType type;
};

A descriptor for one introspectable field: its name, a human-readable description, and the payload type.

`DetailData`

class DetailData {
public:
    virtual std::vector<int64_t> GetData1DArrayInt64();
    virtual std::vector<std::vector<int64_t>> GetData2DArrayInt64();
    virtual std::string GetDataScalarString();
    virtual bool GetDataScalarBool();
    virtual int64_t GetDataScalarInt64();
    virtual double GetDataScalarDouble();
    // ... const overloads ...
};
using DetailDataPtr = std::shared_ptr<DetailData>;

The payload itself. Read it through the getter matching the descriptor’s IndexDetailDataType; calling a mismatched getter is not meaningful.

Utility functions

Declared in vsag/utils.h. Free helper functions for clustering and recall evaluation.

`kmeans_clustering`

float kmeans_clustering(uint64_t d, uint64_t n, uint64_t k, const float* x,
                        float* centroids, const std::string& dis_type);

Runs k-means over n points of dimension d, writing k centroids into the pre-allocated centroids (size k * d). dis_type is one of "l2", "cosine", "ip". Returns the final quantization error.

`l2_and_filtering`

BitsetPtr l2_and_filtering(int64_t dim, int64_t nb, const float* base,
                           const float* query, float threshold);

Returns a Bitset in which bit i is set (true) when base vector i falls within threshold L2 distance of query — the ground truth consumed by range_search_recall. Note the polarity is the opposite of a search pre-filter, where a set bit excludes an id (see Bitset); invert it before reusing it as an invalid / bitset_filter_ mask.
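The polarity difference is easy to get wrong, so here is a self-contained sketch that uses std::vector<bool> in place of Bitset. The function names are hypothetical, and the comparison (plain, non-squared L2 with an inclusive threshold) is an assumption made for the illustration; check the implementation in utils.h before relying on either detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit i is true when base vector i lies within `threshold` of `query`.
// Assumption: plain (non-squared) L2 distance, inclusive comparison.
std::vector<bool> within_l2_threshold(int64_t dim, int64_t nb, const float* base,
                                      const float* query, float threshold) {
    std::vector<bool> hits(static_cast<std::size_t>(nb), false);
    for (int64_t i = 0; i < nb; ++i) {
        float sum = 0.0f;
        for (int64_t d = 0; d < dim; ++d) {
            const float diff = base[i * dim + d] - query[d];
            sum += diff * diff;
        }
        hits[static_cast<std::size_t>(i)] = std::sqrt(sum) <= threshold;
    }
    return hits;
}

// Flips the mask so that a set bit means "exclude this id", the pre-filter polarity.
std::vector<bool> to_exclusion_mask(std::vector<bool> hits) {
    hits.flip();
    return hits;
}
```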

`knn_search_recall` / `range_search_recall`

float knn_search_recall(const float* base, const int64_t* id_map, int64_t base_num,
                        const float* query, int64_t data_dim,
                        const int64_t* result_ids, int64_t result_size);

float range_search_recall(const float* base, const int64_t* base_ids, int64_t num_base,
                          const float* query, int64_t dim,
                          const int64_t* result_ids, int64_t result_size, float threshold);

Compute the recall of a KNN or range search result against the ground truth derived from the base vectors. Handy for tests and benchmarks; see Benchmarks.
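The arithmetic behind a recall figure is small enough to sketch. The version below assumes the common definition (the fraction of ground-truth ids that appear in the result) and takes the ground truth as an explicit list rather than deriving it from the base vectors, so it illustrates the idea rather than reproducing the two helpers; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Fraction of ground-truth ids present in the result. Assumes truth_ids has no duplicates.
float recall_against_ground_truth(const std::vector<int64_t>& truth_ids,
                                  const std::vector<int64_t>& result_ids) {
    if (truth_ids.empty()) {
        return 1.0f;  // nothing to find, so nothing was missed
    }
    const std::unordered_set<int64_t> found(result_ids.begin(), result_ids.end());
    int64_t hits = 0;
    for (int64_t id : truth_ids) {
        if (found.count(id) > 0) {
            ++hits;
        }
    }
    return static_cast<float>(hits) / static_cast<float>(truth_ids.size());
}
```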

Constants

Declared in vsag/constants.h. A large set of extern const char* const string constants for the keys and enumerated string values used throughout the JSON-based configuration. Using the constants instead of raw string literals avoids typos. They fall into several groups:

Group	Examples
Index type names	`INDEX_HGRAPH`, `INDEX_IVF`, `INDEX_DISKANN`, `INDEX_BRUTE_FORCE`, `INDEX_SINDI`, `INDEX_PYRAMID`
Dataset field names	`DIM`, `NUM_ELEMENTS`, `IDS`, `DISTS`, `FLOAT32_VECTORS`, `SPARSE_VECTORS`
Metric names	`METRIC_L2`, `METRIC_COSINE`, `METRIC_IP`
Data type names	`DATATYPE_FLOAT32`, `DATATYPE_FLOAT16`, `DATATYPE_BFLOAT16`, `DATATYPE_INT8`, `DATATYPE_SPARSE`
Top-level params	`PARAMETER_DTYPE`, `PARAMETER_DIM`, `PARAMETER_METRIC_TYPE`, `INDEX_PARAM`
Per-index params	`HGRAPH_*`, `IVF_*`, `DISKANN_PARAMETER_*`, `PYRAMID_*`, `BRUTE_FORCE_*`
Statistics keys	`STATSTIC_MEMORY`, `STATSTIC_KNN_TIME`, `STATSTIC_RANGE_TIME`

For the meaning of each parameter key, see Index Parameters and the individual index pages.

Best Practices

This page gathers practical advice for running VSAG in production, as a companion to the parameter reference and performance tuning guide.

Index Selection

Scenario	Recommended index	Rationale
Medium scale (≤ 10M), in-memory, recall/latency critical	`hgraph`	Unified high-quality graph index with multiple quantizations and Tune support
Coarse recall / candidate layer	`ivf`	Trains once, parallelizes widely
Small scale, exact results (100% recall) required	`brute_force`	Exhaustive search; useful as a recall baseline
Multi-tenant or partitioned data	`pyramid`	Multiple subgraphs inside one index, supports tag-based retrieval
Sparse vectors (BM25 / SPLADE-style)	`sindi`	Dedicated sparse-vector index

Detailed parameters: Index Parameters.

Build Time

Pick the metric first: l2 / ip / cosine cannot be changed after the index is built.
ef_construction: typically 200–500. Too small hurts recall; too large slows builds.
max_degree / M: typically 16–48. Larger values mean higher recall and memory.
Quantization: latency-sensitive scenarios favor sq8 or pq; accuracy-sensitive ones favor fp32 or fp16.
Parallel builds: use a custom ThreadPool (see examples/cpp/203_custom_thread_pool.cpp) to control concurrency.

Search Time

ef_search: commonly topk to topk * 10; do a QPS/recall grid search to settle on the right value.
Batch search: merging multiple queries improves cache utilization; either batch at the caller or follow the batch-capable examples.
Filter: use the built-in Filter (examples/cpp/301_feature_filter.cpp) rather than post-filtering.
Per-search allocator: for high-concurrency online services, use a per-thread arena allocator; see Memory Management.
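The ef_search grid search mentioned above reduces to a simple selection once the measurements exist. The sketch below assumes you have already measured recall and QPS for each candidate (for example with eval_performance) and picks the fastest point that meets the recall target; `GridPoint` and `pick_ef_search` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One measured point of the grid: the ef_search used and what it achieved.
struct GridPoint {
    int ef_search;
    double recall;
    double qps;
};

// Returns the ef_search of the fastest point whose recall meets the target, or -1 if none does.
int pick_ef_search(const std::vector<GridPoint>& grid, double target_recall) {
    int best_ef = -1;
    double best_qps = -1.0;
    for (const GridPoint& p : grid) {
        if (p.recall >= target_recall && p.qps > best_qps) {
            best_ef = p.ef_search;
            best_qps = p.qps;
        }
    }
    return best_ef;
}
```

Returning -1 instead of the closest miss makes an unreachable target explicit, which usually means the build-time parameters or the quantization need to change rather than ef_search.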

Tuning

Use Tune against realistic query distributions.
Enable the conjugate graph for tail-heavy workloads.
Treat eval_performance as a continuous regression test.

Deployment

The official Docker image is the recommended starting point; see Installation.
For production binaries, pick the distribution matching your ABI: dist-pre-cxx11-abi, dist-cxx11-abi, or dist-libcxx (see Building).
Enable VSAG_ENABLE_INTEL_MKL=ON on Intel CPUs for additional acceleration.

Observability

Index::GetMemoryUsage() exposes runtime memory usage.
The search path supports a custom Logger (examples/cpp/202_custom_logger.cpp) to integrate with your logging stack.
eval_performance can write its metrics directly to InfluxDB for long-term monitoring.

Disk-Based Index Best Practices

Figure: Disk-backed HGraph. The graph and compact base codes stay in memory for traversal, while a higher-precision precise copy on disk is read only for the ef_search finalists during reorder.

When a corpus grows past the point where every vector fits in RAM, moving the coldest, largest part of the index onto an SSD is the most direct way to control cost. VSAG does this by letting each part of an index choose its own storage backend, so hot data stays in memory while cold data is served from disk. This page covers HGraph with disk-backed IO and shows copy-pasteable configurations, a capacity model, and a tuning checklist.

“Disk index” here means tiering the index data across memory and disk. It is unrelated to filtering by scalar attributes alongside the vector — for that, see Attribute Filter (Hybrid Search).

When to move to disk

A large corpus does not automatically require disk. Weigh these signals first:

Signal	In-memory is fine	Consider disk
Corpus size	≤ tens of millions	Hundreds of millions / billions
Full `fp32` fits in RAM	Yes	No
Latency budget	Sub-millisecond, strict	A few to low-tens of milliseconds is acceptable
Cost structure	RAM cost is acceptable	Want to replace most RAM with SSD

A quick memory estimate: a full fp32 copy occupies about N × dim × 4 bytes. For example 1e9 × 128 × 4 ≈ 512 GB and 1e8 × 768 × 4 ≈ 307 GB — sizes that rarely fit in a single machine’s RAM, which is exactly the disk-index target.

The cost of going to disk is a few random-read I/Os per query, so latency is higher than a pure in-memory index. If your service needs sub-millisecond latency and the data can be compressed to fit in RAM, prefer in-memory HGraph with quantization instead.

The core idea

VSAG’s disk-backed indexing follows one principle: keep a small, approximate representation in memory to navigate, and keep the large, precise representation on disk to rank. Graph traversal touches only the in-memory approximate codes; a small number of finalists are then re-scored (“reordered”) against the precise copy read from disk. Because the disk reads happen only at the end, for only a handful of candidates, the I/O cost stays bounded while recall is recovered by the precise rescore.
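A toy version of this navigate-then-rank principle is sketched below. It is a deliberately simplified model, not HGraph: a linear scan stands in for graph traversal, rounding each component stands in for the quantizer, and an in-memory vector stands in for the precise copy on disk. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

float l2sqr(const Vec& a, const Vec& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return sum;
}

// Crude stand-in for a quantizer: round every component to the nearest integer.
Vec coarse(const Vec& v) {
    Vec q(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        q[i] = std::round(v[i]);
    }
    return q;
}

// Stage 1 ranks all ids by distance to the coarse codes and keeps `ef` finalists.
// Stage 2 re-scores only those finalists against the precise vectors.
std::vector<int64_t> search_with_reorder(const std::vector<Vec>& precise,
                                         const std::vector<Vec>& codes, const Vec& query,
                                         std::size_t ef, std::size_t k) {
    std::vector<std::pair<float, int64_t>> cand;
    for (std::size_t i = 0; i < codes.size(); ++i) {
        cand.emplace_back(l2sqr(codes[i], query), static_cast<int64_t>(i));
    }
    std::sort(cand.begin(), cand.end());
    cand.resize(std::min(ef, cand.size()));
    for (auto& c : cand) {
        // The only accesses to the precise copy: at most `ef` of them per query.
        c.first = l2sqr(precise[static_cast<std::size_t>(c.second)], query);
    }
    std::sort(cand.begin(), cand.end());
    std::vector<int64_t> ids;
    for (std::size_t i = 0; i < std::min(k, cand.size()); ++i) {
        ids.push_back(cand[i].second);
    }
    return ids;
}
```

With too few finalists the precise rescore never sees the true nearest neighbor, which is why recall depends on ef_search even when the precise copy is exact.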

How HGraph tiers storage

HGraph stores an index as several independent cells, and each cell can be pointed at its own IO backend. That is what enables “hot in memory, cold on disk”:

Cell	Holds	Access pattern	Recommended placement
`graph`	Adjacency lists of the proximity graph	Read on every hop	Memory (or `mmap_io` under pressure)
`base`	Quantized codes used to traverse and prune	Read on every hop	Memory
`precise`	High-precision copy used to reorder (`use_reorder`)	Read for a few finalists	Disk
`raw_vector`	Optional raw vectors (`store_raw_vector`)	Rarely, e.g. `cosine`/exact	Memory or disk

The IO backends you can assign to a cell:

Backend (`*_io_type`)	Location	Needs `*_file_path`	Notes
`memory_io`	Memory (contiguous)	No	Basic in-memory storage
`block_memory_io`	Memory (block-allocated)	No	Default backend for large cells
`buffer_io`	Disk (buffered `pread`)	Yes	Portable disk reads; works everywhere
`mmap_io`	Disk (mmap + page cache)	Yes	Near-memory speed when the working set fits the page cache
`async_io`	Disk (Linux libaio)	Yes	High-concurrency disk reads; Linux + libaio only, otherwise falls back to `buffer_io`
`reader_io`	Custom `Reader`	No	Read through a user `ReaderSet` at load time (e.g. remote / object storage)

Each cell is wired with a flat pair of build parameters: graph_io_type / graph_file_path, base_io_type / base_file_path, precise_io_type / precise_file_path, and raw_vector_io_type / raw_vector_file_path. A *_file_path is required whenever the matching *_io_type is disk-backed (buffer_io, mmap_io, or async_io); the in-memory backends ignore it. All cells default to block_memory_io (fully in memory).
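The file-path rule can be checked before an index is created. The helper below is a hypothetical pre-flight check written for this page, not VSAG's own validation; it encodes only the rule stated above (a path is required for buffer_io, mmap_io, and async_io).

```cpp
#include <cassert>
#include <string>

// True for the backends that the table above lists as needing a *_file_path.
bool is_disk_backed(const std::string& io_type) {
    return io_type == "buffer_io" || io_type == "mmap_io" || io_type == "async_io";
}

// Returns an empty string when the pair is consistent, otherwise a description of the problem.
std::string check_cell_io(const std::string& cell, const std::string& io_type,
                          const std::string& file_path) {
    if (is_disk_backed(io_type) && file_path.empty()) {
        return cell + "_file_path is required when " + cell + "_io_type is " + io_type;
    }
    return "";
}
```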

Recommended configuration: base in memory, precise on disk

The workhorse layout keeps a very compact 3-bit RaBitQ base in memory for traversal and pushes a higher-precision sq8 copy to disk for reorder, with use_reorder turned on so the finalists are re-ranked against it:

{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "rabitq",
        "rabitq_bits_per_dim_base": 3,
        "max_degree": 32,
        "ef_construction": 400,
        "use_reorder": true,
        "precise_quantization_type": "sq8",
        "precise_io_type": "async_io",
        "precise_file_path": "/data/vsag/hgraph_precise.data"
    }
}

Here rabitq_bits_per_dim_base selects the standard multi-bit RaBitQ base code (range [1, 8]); leave rabitq_bits_per_dim_precise unset so the base stays a plain RaBitQ code rather than switching to the x+y split variant.

Search is unchanged — set ef_search as usual and reorder transparently reads the precise copy from disk:

{"hgraph": {"ef_search": 200}}

The per-query data flow: graph traversal reads only the in-memory base codes and graph adjacency; disk I/O happens solely at the end, when reorder fetches the precise copy for the small candidate set before returning the top-k. This is why a disk-backed HGraph adds only a bounded number of reads per query rather than one read per hop.

Hardware and deployment

Use NVMe SSDs. Disk-backed vector search is dominated by random-read latency; NVMe is an order of magnitude better than SATA SSDs and essential for async_io / mmap_io.
async_io requires Linux with libaio, which is enabled by default. The CMake option ENABLE_LIBAIO defaults to ON, and the Makefile passes VSAG_ENABLE_LIBAIO=ON; you only set these flags to turn libaio back on if a previous build disabled it. When libaio is absent (including on macOS), async_io logs a one-time warning and falls back to buffer_io, so configs remain portable but lose asynchronous batching. For production throughput, build on Linux with libaio.
Warm the page cache for mmap_io cells after load (e.g. a sequential read of the file, or a warm-up query pass) so early queries do not pay cold-miss latency.
Plan file paths and lifecycle. Disk-backed cells write to the *_file_path you supply; place them on a fast, dedicated volume with enough space for the precise copy, and clean up stale files when rebuilding. Serialize and load the index through the normal Serialization API — the backing files are managed with it.

Capacity planning

Approximate per-vector storage for the main quantizers (plus small per-vector metadata such as norms and errors):

Representation	Bytes per vector	Typical placement
`fp32` (precise)	`dim × 4`	Disk
`fp16` / `bf16`	`dim × 2`	Memory or disk
`sq8`	`dim × 1`	Memory or disk
`sq4`	`dim × 0.5`	Memory
`rabitq` (b-bit)	`dim × b / 8`	Memory

Worked example for N = 1e9, dim = 128:

3-bit rabitq base in memory: 1e9 × 128 × 3 / 8 ≈ 48 GB RAM.
sq8 precise on disk: 1e9 × 128 × 1 ≈ 128 GB SSD.
A full fp32 precise instead (maximum reorder accuracy): 1e9 × 128 × 4 ≈ 512 GB SSD.
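Add the graph: roughly N × max_degree × 4 bytes for neighbor ids, so 1e9 × 32 × 4 ≈ 128 GB at max_degree = 32 (memory or mmap_io).

This is how a billion-scale index that would need ~0.5 TB of RAM for pure fp32 vectors collapses to tens of GB of RAM for the base codes, plus the graph and an SSD for the precise copy.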
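The worked example can be reproduced with two one-line formulas, which makes it easy to plug in your own N, dim, and bit width. This is a sketch of the approximate model only: the function names are hypothetical, sizes are in decimal gigabytes (1 GB = 1e9 bytes) to match the figures above, and the small per-vector metadata is ignored.

```cpp
#include <cassert>
#include <cmath>

// Storage for the vector codes: n vectors, `dim` components, `bits_per_dim` bits each.
double code_gb(double n, double dim, double bits_per_dim) {
    return n * dim * bits_per_dim / 8.0 / 1e9;
}

// Storage for neighbor ids, assuming 4 bytes per id and max_degree ids per vector.
double graph_gb(double n, double max_degree) {
    return n * max_degree * 4.0 / 1e9;
}
```

For N = 1e9 and dim = 128, `code_gb` gives 48 for a 3-bit base, 128 for sq8 (8 bits), and 512 for fp32 (32 bits); `graph_gb(1e9, 32)` gives 128.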

Tuning and troubleshooting

Symptom	Likely cause	Action
Recall too low	Base quantization too coarse, or reorder off	Keep `use_reorder: true`; raise `precise_quantization_type` toward `fp32`; increase `ef_search`
Latency too high	Too many disk reads per query	Lower `ef_search`; keep `graph`/`base` in memory; ensure only `precise` is on disk; use NVMe + `async_io`
Memory still too high	Base or graph too large	Move base to `sq4` / `pq` / `rabitq`; push `graph` to `mmap_io`
`async_io` seems synchronous	libaio not compiled in	Rebuild with `VSAG_ENABLE_LIBAIO=ON` on Linux; check for the fallback warning
Cold-start latency spikes (mmap)	Page cache not warm	Warm the file after load before serving traffic

Treat these as starting points and validate with eval_performance against a realistic query distribution; the Optimizer (Tune) can then settle search-time parameters automatically.

Metric Semantics in VSAG

This page explains how VSAG treats l2, ip, and cosine in practice.

Warning: VSAG’s internal metric implementations are optimized for performance and consistency. Their behavior may differ from the textbook mathematical definitions, so use the semantics described here when comparing results or preparing ground truth.

VSAG keeps all search APIs in a “smaller is better” distance model. For that reason, several internal implementations reuse squared distances, normalized vectors, or cached norms to keep behavior fast and consistent across index types.

`l2`

The distance is L2Sqr (squared L2 distance); kernels use the squared form for speed.
Ranking is the same as with plain L2 distance, because squaring preserves the order of non-negative values.
Returned distance values and range-search thresholds are squared.

`ip`

The distance is 1 - inner_product.
Larger inner product means smaller distance.

`cosine`

The distance is 1 - cosine_similarity.
For performance, implementations may normalize vectors or store extra norm information so cosine can reuse IP-oriented kernels.

Cosine search generally assumes normalized vectors on the internal compute path. Because the implementation may normalize or cache norms, the returned value is intended to behave like a distance, but floating-point error can still push it slightly outside the ideal mathematical range.

Return Value Range

l2: 0 to +infinity
ip: unbounded; values may be negative when inner_product > 1
cosine: ideally 0 to 2 when cosine similarity is in [-1, 1], but small floating-point deviations are possible
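When preparing ground truth outside VSAG, the three conventions above can be written as reference functions. The sketch below follows the definitions on this page (squared L2, 1 - inner_product, 1 - cosine_similarity). The function names are hypothetical, the code is a plain scalar reference rather than VSAG's SIMD kernels, and the cosine version assumes non-zero vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

float inner_product(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// "l2": squared L2 distance, range [0, +inf).
float l2_distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return sum;
}

// "ip": 1 - inner_product, negative whenever the inner product exceeds 1.
float ip_distance(const std::vector<float>& a, const std::vector<float>& b) {
    return 1.0f - inner_product(a, b);
}

// "cosine": 1 - cosine_similarity, ideally in [0, 2]. Assumes non-zero vectors.
float cosine_distance(const std::vector<float>& a, const std::vector<float>& b) {
    const float norms = std::sqrt(inner_product(a, a)) * std::sqrt(inner_product(b, b));
    return 1.0f - inner_product(a, b) / norms;
}
```

Note that a range-search threshold of 25 under `l2` corresponds to a plain L2 radius of 5.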

Why this matters

Dataset ground truth, query semantics, and index internals need to agree on the same metric family.
l2, ip, and cosine are not interchangeable after an index is built.
When comparing results across tools, check whether the tool uses a distance or a similarity convention.

See also: Creating an Index, Index Parameters, HDF5 Dataset Format.

Optimizer (Tune)

For graph-based indexes (HGraph), VSAG exposes the Tune interface, which automatically adjusts runtime parameters based on a representative query set to get a better trade-off between recall and latency. Internally this is the historical “ELP Optimizer”.

Basic Usage

#include <vsag/vsag.h>

auto index = vsag::Factory::CreateIndex("hgraph", build_params).value();
index->Build(base_dataset);

std::string tune_params = R"(
{
    "queries_dataset": "path/or/inline/queries",
    "target_recall": 0.95,
    "top_k": 10
}
)";
auto ret = index->Tune(tune_params);

The second argument disable_future_tuning defaults to false, allowing repeated calls to keep refining. Set it to true to freeze the parameters.

Relationship with the ELP Optimizer

Older literature (see Research Papers) refers to the “ELP Optimizer”. Its implementation key is use_elp_optimizer, which now lives behind the unified Tune API — users no longer need to flip it directly.

Supported Indexes

Index type	Supports Tune
hgraph	yes
ivf / sindi / brute_force	no

Example

examples/cpp/318_feature_tune.cpp walks through an end-to-end tuning flow:

Create the index and Build.
Call Tune with a representative query set.
Serialize the tuned index for production use.

Notes

Tuning is sensitive to the query distribution — use samples that reflect real traffic.
Tuned parameters are persisted together with the index metadata via Serialize/Deserialize and remain in effect after deployment.

Reference Performance

This page explains how the official performance numbers are produced and where to find them. For concrete figures, see the latest GitHub releases, and reproduce them with the performance evaluation tool in your target environment.

Reference Hardware

Official benchmarks typically run on hardware in the following class (concrete SKUs vary per release):

CPU: mainstream x86_64 server CPUs (with AVX2 / AVX-512)
Memory: enough DDR4/DDR5 to cover the index plus OS page cache
OS: Ubuntu 20.04 / 22.04 or CentOS 7 / 8
Build: make release by default; MKL is off by default (VSAG_ENABLE_INTEL_MKL=OFF). To enable it explicitly, use VSAG_ENABLE_INTEL_MKL=ON make release (or -DENABLE_INTEL_MKL=ON when invoking CMake directly)

Reference Datasets

Official comparisons use HDF5 datasets compatible with ann-benchmarks:

Dataset	Dim	Metric	Size
SIFT-1M	128	L2	1,000,000
GIST-1M	960	L2	1,000,000
Deep-10M	96	L2	10,000,000
Text-to-Image-1M	200	IP	1,000,000

Key Metrics

QPS (single- and multi-threaded)
Average recall (Recall@k)
P50 / P95 / P99 latency
Peak memory and index size
Build time
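If you post-process raw per-query latencies yourself, the percentile metrics can be computed as below. This sketch uses the nearest-rank definition, which is an assumption: this page does not state which percentile method eval_performance uses, so small differences against its output are possible. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
// Takes the samples by value because it sorts them.
double percentile_nearest_rank(std::vector<double> samples, double p) {
    assert(!samples.empty());
    assert(p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    std::size_t rank =
        static_cast<std::size_t>(std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    rank = std::max<std::size_t>(1, std::min(rank, samples.size()));  // guard against rounding
    return samples[rank - 1];
}
```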

Reproduction

VSAG_ENABLE_TOOLS=ON make release
./build-release/tools/eval/eval_performance tools/eval/eval_template.yaml

Compare the resulting JSON / Markdown output against the official figures to catch performance regressions or quantization degradations.

Contributing Numbers

Pull requests that extend this page with “results on additional hardware” sections are welcome. Please include:

Detailed CPU / memory / disk information.
The VSAG version (git rev-parse HEAD).
The eval_performance output (JSON and Markdown are both helpful).
The exact build command and environment variables (e.g. VSAG_ENABLE_INTEL_MKL).

Performance Evaluation Tool (`eval_performance`)

eval_performance is the command-line performance evaluation tool shipped with VSAG, under tools/eval/. After building, the binary lives at build-release/tools/eval/eval_performance. It is used to compare throughput, latency, and recall across different indexes or parameter combinations.

Building

Tools are not built by default — enable them explicitly:

# via the project Makefile
VSAG_ENABLE_TOOLS=ON make release
# or: make dev

# or directly through CMake
cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release -DENABLE_TOOLS=ON
cmake --build build-release -j
# Output: ./build-release/tools/eval/eval_performance

HDF5 must be installed on the system (Ubuntu: apt install libhdf5-dev; CentOS: yum install hdf5-devel).

Two Modes

1. Command-line mode (quick, one-off experiments)

./build-release/tools/eval/eval_performance \
    --datapath /tmp/sift-128-euclidean.hdf5 \
    --index_name hgraph \
    --type search \
    --create_params '{"dim":128,"dtype":"float32","metric_type":"l2","index_param":{"base_quantization_type":"fp32","max_degree":32,"ef_construction":300}}' \
    --search_params '{"hgraph":{"ef_search":60}}' \
    --topk 10

Useful flags include --search_mode (knn / range / knn_filter / range_filter), --search-query-count, --delete-index-after-search, and the various --disable_* switches that turn off individual metrics. The reference template at tools/eval/eval_template.yaml shows the complete YAML shape.
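Hand-writing nested JSON inside shell quotes is error-prone. The following sketch, which is only a convenience and not part of VSAG, assembles the same command from Python dictionaries. It uses only the flags shown in the example above; the variable names are illustrative.

```python
import json
import shlex

# Index creation parameters, mirroring the command-line example above.
create_params = {
    "dim": 128,
    "dtype": "float32",
    "metric_type": "l2",
    "index_param": {
        "base_quantization_type": "fp32",
        "max_degree": 32,
        "ef_construction": 300,
    },
}
search_params = {"hgraph": {"ef_search": 60}}

argv = [
    "./build-release/tools/eval/eval_performance",
    "--datapath", "/tmp/sift-128-euclidean.hdf5",
    "--index_name", "hgraph",
    "--type", "search",
    # Compact separators keep the JSON free of spaces.
    "--create_params", json.dumps(create_params, separators=(",", ":")),
    "--search_params", json.dumps(search_params, separators=(",", ":")),
    "--topk", "10",
]

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
command_line = shlex.join(argv)
print(command_line)
```

The argv list can also be passed directly to subprocess.run, which avoids shell quoting altogether.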

2. Config-file mode (batch comparisons)

The YAML file is passed directly as a positional argument (no --config flag):

./build-release/tools/eval/eval_performance my_eval.yaml

A reference template is available at tools/eval/eval_template.yaml. A single configuration can define multiple named cases, plus an optional global section that holds shared settings such as thread counts, exporters, and an embedded HTTP monitor.

A minimal example:

global:
  num_threads_building: 8
  num_threads_searching: 16
  exporters:
    print-directly:
      to: stdout
      format: table
    save-to-file:
      to: "file:///tmp/eval_results.json"
      format: json

eval_case1:
  datapath: /tmp/sift-128-euclidean.hdf5
  type: search
  index_name: hgraph
  create_params: '{"dim":128,"dtype":"float32","metric_type":"l2","index_param":{"base_quantization_type":"fp32","max_degree":32,"ef_construction":300}}'
  search_params: '{"hgraph":{"ef_search":60}}'
  index_path: /tmp/vsag_eval/hgraph_fp32
  topk: 10

Note: under global.exporters, each entry is a named exporter (a YAML map), not a list item.
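For batch comparisons it can be convenient to generate the case list instead of writing it by hand. The sketch below renders one case per ef_search value using only the keys from the minimal example. The case names (hgraph_ef20 and so on) are hypothetical, and reusing a single index_path across cases is an assumption that this page does not confirm.

```python
import json

CREATE_PARAMS = {
    "dim": 128,
    "dtype": "float32",
    "metric_type": "l2",
    "index_param": {
        "base_quantization_type": "fp32",
        "max_degree": 32,
        "ef_construction": 300,
    },
}


def render_case(name, ef_search):
    """Render one named case using only the keys shown in the minimal example."""
    create = json.dumps(CREATE_PARAMS, separators=(",", ":"))
    search = json.dumps({"hgraph": {"ef_search": ef_search}}, separators=(",", ":"))
    return "\n".join([
        f"{name}:",
        "  datapath: /tmp/sift-128-euclidean.hdf5",
        "  type: search",
        "  index_name: hgraph",
        # Single quotes let the JSON keep its double quotes inside YAML.
        f"  create_params: '{create}'",
        f"  search_params: '{search}'",
        "  index_path: /tmp/vsag_eval/hgraph_fp32",
        "  topk: 10",
    ])


GLOBAL_SECTION = "\n".join([
    "global:",
    "  num_threads_building: 8",
    "  num_threads_searching: 16",
])

# Hypothetical case names; one case per ef_search value.
cases = [render_case(f"hgraph_ef{ef}", ef) for ef in (20, 60, 100)]
config_text = "\n\n".join([GLOBAL_SECTION, *cases]) + "\n"
print(config_text)
```

Writing config_text to a file such as my_eval.yaml gives a config that can be passed as the positional argument.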

Supported Dimensions

Efficiency: QPS, TPS
Quality: average recall and quantile recall (P0/P10/P50/P90…)
Latency: average, P50/P95/P99
Resource: peak memory usage

Search Modes

search_mode accepts knn, range, knn_filter, and range_filter.

Output Formats and Destinations

Each exporter combines a format with a to destination.

Formats: table (or its alias text), json, line_protocol (for InfluxDB).
Destinations:
- stdout — print to standard output.
- file://<path> — write (overwrite) to a file.
- influxdb://<host>:<port>/<path>?<query> — POST to an InfluxDB v2 endpoint. Use format: line_protocol and pass an authentication token via vars.token (the value must include the Token prefix, e.g. Token <your-influxdb-token>).

If no exporter is configured, results are printed to stdout in table format by default.

HTTP Monitor (optional)

When configured, the tool starts an embedded HTTP server for the duration of a batch run and exposes live progress (current case, total cases, completion %) plus the latest metrics. This is helpful for long-running evaluations.

global:
  http_server:
    enabled: true
    port: 8080

Datasets

Any HDF5 dataset from ann-benchmarks (e.g. sift-128-euclidean.hdf5, gist-960-euclidean.hdf5) works out of the box.

References

Source: tools/eval/
Local tool entry point: tools/eval/README.md
Reference numbers on standard hardware: Reference Performance.

HDF5 Dataset Format

VSAG’s evaluation and benchmark tooling (most notably eval_performance) consumes datasets in the HDF5 format used by ann-benchmarks. This page documents the exact layout VSAG expects so you can prepare custom datasets or debug failing evaluations.

The dataset layout described below is the dense layout (selected by the global attribute type="dense", or by omitting the attribute). For sparse datasets (type="sparse"), /train and /test are flat INT8 byte streams of shape (X,) produced by VSAG’s sparse-vector serialization (decoded by parse_sparse_vectors in tools/eval/eval_dataset.cpp); all other datasets and attributes below still apply.

Mandatory Datasets

`/train` (base vectors)

Type: INT8 or FLOAT32
Shape: (N, D)
- N — number of base vectors (number_of_base)
- D — feature dimensionality (dim)
Notes: the element type is inferred from HDF5:
- H5T_INTEGER (1-byte) → INT8
- H5T_FLOAT (4-byte) → FLOAT32

`/test` (query vectors)

Type: must match /train
Shape: (Q, D)
- Q — number of query vectors (number_of_query)
- D — must equal /train’s D

`/neighbors` (ground-truth indices)

Type: INT64
Shape: (Q, K)
- K — number of ground-truth neighbors per query
Content: precomputed top-K indices into /train.

`/distances` (ground-truth distances)

Type: FLOAT32
Shape: (Q, K) (identical to /neighbors)
Note: each entry must align with the same position in /neighbors.

Global Attributes

`type` (vector type)

Type: ASCII string
Required: no (defaults to "dense" if the attribute is missing)
Allowed values (a third value, "multi_vector", is described under Multi-Vector Datasets):
- "dense" — dense vectors stored as standard matrices in /train and /test
- "sparse" — sparse vectors stored in the serialized format produced by VSAG’s sparse-vector helpers

`distance` (metric definition)

The evaluation tool treats distance values as distances (smaller is better) when comparing against the ground truth in /distances. Prepare ground-truth distances using the formulas below.

Type: ASCII string
Required: yes
Allowed values for dense vectors:
- "euclidean" — L2 distance, computed as sqrt(L2Sqr)
- "ip" — inner-product distance (1 - inner_product); data type auto-detected
- "angular" — cosine distance (1 - cosine_similarity)
Allowed values for sparse vectors:
- "ip" — sparse inner-product distance (1 - sparse_inner_product); other metrics are not supported for sparse vectors
Allowed values for multi-vector:
- Same as dense vectors ("euclidean", "ip", "angular"); multi-vector uses the same per-sub-vector distance function as dense vectors
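The three dense formulas above can be sketched in plain Python as follows. This is a brute-force illustration for preparing small ground-truth sets, not VSAG's own implementation; the helper names are hypothetical.

```python
import math


def euclidean(a, b):
    """sqrt(L2Sqr): square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip_distance(a, b):
    """1 - inner_product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def angular(a, b):
    """1 - cosine_similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


DISTANCE_FUNCS = {"euclidean": euclidean, "ip": ip_distance, "angular": angular}


def ground_truth(query, base, metric, k):
    """Brute-force top-k: returns (indices, distances), smallest distance first."""
    dist = DISTANCE_FUNCS[metric]
    scored = sorted((dist(query, vec), idx) for idx, vec in enumerate(base))
    top = scored[:k]
    return [idx for _, idx in top], [d for d, _ in top]


base = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
ids, dists = ground_truth([0.0, 0.0], base, "euclidean", 2)
```

One row of ids and dists per query corresponds to one row of /neighbors and /distances respectively.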

Optional Datasets

`/train_labels` and `/test_labels`

Type: INT64
Shapes:
- /train_labels: (N,)
- /test_labels: (Q,)
Requirement: if labels are present, both datasets must exist.

`/valid_ratios`

Type: FLOAT32
Shape: (L,)
Usage: stores per-class validation ratios. The evaluation tool indexes this array with the raw label value (valid_ratio_[label], see tools/eval/eval_dataset.h:71), so labels must be non-negative integers and L must be strictly greater than the maximum label value, giving valid indices 0..L-1. The dataset author is responsible for keeping the array large enough to cover every label that appears in /train_labels and /test_labels.
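A dataset author could check the label constraint before writing the file with something like the sketch below. The function name is hypothetical and it operates on plain Python lists rather than HDF5 datasets.

```python
def check_valid_ratios(valid_ratios, train_labels, test_labels):
    """Return a list of problems; an empty list means every label is a valid index."""
    problems = []
    labels = list(train_labels) + list(test_labels)
    if any(label < 0 for label in labels):
        problems.append("labels must be non-negative")
    if labels and max(labels) >= len(valid_ratios):
        problems.append(
            f"L={len(valid_ratios)} must exceed max label {max(labels)}"
        )
    return problems


# Labels 0..2 need at least three entries in /valid_ratios.
ok = check_valid_ratios([0.5, 0.2, 0.3], [0, 1, 2], [2, 0])
too_short = check_valid_ratios([0.5, 0.2], [0, 1, 2], [2, 0])
```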

Multi-Vector Datasets

When type="multi_vector", the file uses a flat-expanded layout where each document’s sub-vectors are concatenated into a single 2D matrix, and a companion vector_counts array records how many sub-vectors belong to each document.

Additional Global Attribute

Attribute	Type	Required	Description
`multi_vector_dim`	`INT64`	yes	Sub-vector dimensionality (number of floats per sub-vector)

Additional Datasets

Dataset	Shape	Type	Description
`/train_multi_vectors`	`(sum_counts_train, D)`	`FLOAT32`	All training sub-vectors, flat-concatenated row by row
`/test_multi_vectors`	`(sum_counts_test, D)`	`FLOAT32`	All query sub-vectors, flat-concatenated row by row
`/train_vector_counts`	`(N,)`	`UINT32`	Number of sub-vectors per training document
`/test_vector_counts`	`(Q,)`	`UINT32`	Number of sub-vectors per query document

D equals multi_vector_dim. sum_counts_train is the sum of all values in /train_vector_counts, and sum_counts_test is the sum of all values in /test_vector_counts.

When type="multi_vector", the standard /train and /test datasets are not required; the document counts (N, Q) are derived from /train_vector_counts and /test_vector_counts instead. /neighbors and /distances remain mandatory, and the label datasets remain optional.

The evaluation tool reconstructs one vsag::MultiVector per document from the flat array plus the counts, then passes the full array to vsag::Dataset::MultiVectors(), VectorCounts(), and MultiVectorDim().
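The relationship between the flat matrix and the counts array can be sketched as below. This only illustrates the layout with plain lists; the function name is hypothetical and it does not use the vsag::MultiVector type.

```python
def split_multi_vectors(flat_rows, vector_counts):
    """Group flat sub-vector rows into one list of sub-vectors per document."""
    if sum(vector_counts) != len(flat_rows):
        raise ValueError("sum of vector_counts must equal the number of flat rows")
    documents, start = [], 0
    for count in vector_counts:
        documents.append(flat_rows[start:start + count])
        start += count
    return documents


# Three documents with 2, 1, and 3 sub-vectors; multi_vector_dim = 2.
flat = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]
docs = split_multi_vectors(flat, [2, 1, 3])
```

Here len(docs) plays the role of N (or Q), and len(flat) plays the role of sum_counts_train (or sum_counts_test).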

Structural Requirements

Dimensional compatibility
- train_shape[1] == test_shape[1] (same D)
- neighbors.shape == distances.shape

Type mapping

HDF5 Specification	Internal Type	Size	Used In
`H5T_INTEGER` (size=1)	`INT8`	1 byte	`/train`, `/test`
`H5T_FLOAT` (size=4)	`FLOAT32`	4 bytes	`/train`, `/test`, `/distances`, `/valid_ratios`
`H5T_INTEGER` (size=8)	`INT64`	8 bytes	`/neighbors`, `/train_labels`, `/test_labels`

Memory organization
- Row-major storage for all matrices.
- Feature vectors stored contiguously:
  - /train total size = N × D × element_size (1 or 4 bytes per element).
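As a worked example of the size formula, the sketch below computes the raw /train payload for two of the reference datasets, assuming both are stored as FLOAT32 (an assumption about those files, not something this page states). HDF5 metadata and chunking overhead are not included.

```python
# Bytes per element for the two element types accepted in /train and /test.
ELEMENT_SIZE = {"INT8": 1, "FLOAT32": 4}


def train_payload_bytes(n, d, dtype):
    """Raw payload size of /train: N x D x element_size."""
    return n * d * ELEMENT_SIZE[dtype]


sift_bytes = train_payload_bytes(1_000_000, 128, "FLOAT32")
gist_bytes = train_payload_bytes(1_000_000, 960, "FLOAT32")
print(sift_bytes, gist_bytes)
```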

Sparse layout

When the global attribute type equals "sparse", /train and /test do not follow the (N, D) dense matrix layout. They are instead stored as flat INT8 (H5T_INTEGER, size 1) datasets whose payload is a raw byte stream of packed sparse vectors. Calling f["/train"].shape from h5py returns (X,) where X is the total number of bytes; the int8 storage class is a transport detail only — the bytes are not int8 vector elements.

`/train`, `/test` (sparse byte stream)

HDF5 type: H5T_INTEGER, size 1 (INT8)
HDF5 shape: (X,), where X is the total byte-stream length (sum of all per-vector record sizes)
Endianness: little-endian

Content: a contiguous sequence of records, one per sparse vector, in order. Each record has the following fields, concatenated with no padding or separators:

Field	Type	Size	Description
`len`	`uint32`	4 bytes	Number of non-zero entries in the vector
`ids[len]`	`uint32[]`	`4 * len` bytes	Feature indices (column ids)
`vals[len]`	`float32[]`	`4 * len` bytes	Values associated with `ids`

A len == 0 record is allowed and occupies only the 4-byte length field.

Key ordering: on load, the eval tool sorts each vector's ids in ascending order (and reorders vals to match). Writers may therefore emit ids in any order, and readers must not assume that the on-disk ids are already sorted.
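The record layout can be sketched with Python's struct module. This is an independent illustration of the format as described above, not the code in tools/eval/eval_dataset.cpp; the function names are hypothetical.

```python
import struct


def encode_sparse(vectors):
    """Pack {id: value} dicts into the little-endian record stream."""
    out = bytearray()
    for vec in vectors:
        ids = list(vec.keys())  # on-disk order is whatever the writer emits
        out += struct.pack("<I", len(ids))
        out += struct.pack(f"<{len(ids)}I", *ids)
        out += struct.pack(f"<{len(ids)}f", *(vec[i] for i in ids))
    return bytes(out)


def decode_sparse(buffer):
    """Walk the stream record by record, sorting each vector's ids ascending."""
    vectors, pos = [], 0
    while pos < len(buffer):
        if pos + 4 > len(buffer):
            raise ValueError("truncated length field")
        (count,) = struct.unpack_from("<I", buffer, pos)
        pos += 4
        end = pos + 8 * count  # 4 bytes per id plus 4 bytes per value
        if end > len(buffer):
            raise ValueError("truncated sparse record")
        ids = struct.unpack(f"<{count}I", buffer[pos:pos + 4 * count])
        vals = struct.unpack(f"<{count}f", buffer[pos + 4 * count:end])
        pos = end
        vectors.append(dict(sorted(zip(ids, vals))))
    return vectors


# The first vector is written with unsorted ids; the second is empty.
stream = encode_sparse([{7: 0.5, 3: 1.5}, {}, {2: 2.0}])
decoded = decode_sparse(stream)
```

The empty vector occupies only its 4-byte length field, so the three records take 20, 4, and 12 bytes.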

`/train_offsets`, `/test_offsets` (random-access index, optional)

These two datasets store the per-record byte offsets into the matching /train and /test sparse byte streams so that the i-th sparse vector can be located in O(1) without scanning the stream.

HDF5 type: H5T_INTEGER, size 8 (UINT64)
HDF5 shape: (N + 1,) for /train_offsets and (Q + 1,) for /test_offsets
Content: offsets[i] is the byte offset of record i; offsets[N] is the sentinel and equals the total byte stream length. The size of record i is offsets[i + 1] - offsets[i]. The array is non-decreasing.

Both datasets are optional. VSAG writers always emit them when writing sparse files, but legacy sparse files that only contain /train and /test keep loading: the offsets are recomputed on load by walking the byte stream once. When the on-disk offsets are present, they are cross-checked against the recomputed offsets and the file is rejected as corrupted on any mismatch.
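Recomputing and cross-checking the offsets can be sketched as follows. The functions are hypothetical stand-ins for what the loader is described as doing, written against the record layout given earlier.

```python
import struct


def compute_offsets(stream):
    """Walk the sparse byte stream once and return the N + 1 record offsets."""
    offsets, pos = [0], 0
    while pos < len(stream):
        if pos + 4 > len(stream):
            raise ValueError("truncated length field")
        (count,) = struct.unpack_from("<I", stream, pos)
        pos += 4 + 8 * count  # length field, then ids and vals
        offsets.append(pos)
    if pos != len(stream):
        raise ValueError("byte stream is truncated")
    return offsets


def check_offsets(stream, stored_offsets):
    """Reject the file when stored offsets disagree with recomputed ones."""
    if list(stored_offsets) != compute_offsets(stream):
        raise ValueError("offsets mismatch: file is corrupted")


def record_bytes(stream, offsets, i):
    """O(1) access to the raw bytes of record i."""
    return stream[offsets[i]:offsets[i + 1]]


# Three records: two entries, zero entries, one entry.
stream = (
    struct.pack("<I2I2f", 2, 3, 7, 1.5, 0.5)
    + struct.pack("<I", 0)
    + struct.pack("<IIf", 1, 2, 2.0)
)
offsets = compute_offsets(stream)
```

The last entry of offsets is the sentinel and equals len(stream).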

`/train_token_sequences`, `/test_token_sequences` (optional)

These two datasets carry the original tokenized document that produced each sparse vector. They are entirely optional: sparse HDF5 files that omit both datasets still load correctly. When present, they must appear in lockstep with /train and /test: the i-th record in /train_token_sequences corresponds to the i-th sparse vector in /train (same for /test).

HDF5 type: H5T_INTEGER, size 1 (INT8)
HDF5 shape: (X,), where X is the total byte-stream length (sum of all per-record sizes)
Endianness: little-endian

Content: a contiguous sequence of records, one per sparse vector, in the same order as /train / /test. Each record has the layout:

Field	Type	Size	Description
`seq_len`	`uint32`	4 bytes	Number of tokens in the original document
`term_ids[seq_len]`	`uint32[]`	`4 * seq_len` bytes	Term ids in tokenization order (duplicates and order are preserved)

Records are concatenated with no padding or separators. A seq_len == 0 record is allowed and occupies only the 4-byte length field; readers should treat it as “no original document available for this vector”.

Number of records: must equal the number of sparse vectors in the matching split. Readers raise an error if counts disagree or if the stream is truncated.
Ordering vs. ids: term_ids are stored in the original token order (duplicates kept). This is intentionally different from ids, which the loader sorts ascending.
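A reader for the token-sequence stream can be sketched in the same style. The function name is hypothetical; the count check mirrors the rule that the number of records must equal the number of sparse vectors in the split.

```python
import struct


def decode_token_sequences(buffer, expected_count):
    """Decode seq_len-prefixed uint32 term ids, preserving order and duplicates."""
    sequences, pos = [], 0
    while pos < len(buffer):
        if pos + 4 > len(buffer):
            raise ValueError("truncated length field")
        (seq_len,) = struct.unpack_from("<I", buffer, pos)
        pos += 4
        end = pos + 4 * seq_len
        if end > len(buffer):
            raise ValueError("truncated token-sequence record")
        sequences.append(list(struct.unpack(f"<{seq_len}I", buffer[pos:end])))
        pos = end
    if len(sequences) != expected_count:
        raise ValueError("record count does not match the number of sparse vectors")
    return sequences


# Two records: tokens [9, 4, 9] (duplicate kept) and an empty sequence.
stream = struct.pack("<I3I", 3, 9, 4, 9) + struct.pack("<I", 0)
sequences = decode_token_sequences(stream, expected_count=2)
```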

`/train_token_sequences_offsets`, `/test_token_sequences_offsets` (required when sequences are present)

Whenever /train_token_sequences (resp. /test_token_sequences) is present, the paired UINT64 offset index must also be present.

HDF5 type: H5T_INTEGER, size 8 (UINT64)
HDF5 shape: (N + 1,) (resp. (Q + 1,))
Content: same contract as /train_offsets, enabling O(1) random access to the i-th token-sequence record.

Contract: the byte-stream dataset and its offsets dataset live or die together. Readers reject the file if exactly one of the pair exists (either a *_token_sequences dataset without its *_offsets, or vice versa). When both are present, the on-disk offsets are cross-checked against the offsets rebuilt from the byte stream; a mismatch is treated as corruption and aborts the load.

Ground truth and metric

/neighbors and /distances follow the same shape and type rules as in the dense layout above. Only "ip" (sparse inner-product distance, 1 - sparse_inner_product) is supported via the distance attribute.

Python helper

The Python package pyvsag ships a decoder in pyvsag.sparse:

from pyvsag.sparse import load_sparse_hdf5

data = load_sparse_hdf5("sparse.hdf5")
# data["type"]      -> "sparse"
# data["distance"]  -> "ip"
# data["train"]     -> list[dict[int, float]]   one dict per sparse vector, keys ascending
# data["test"]      -> list[dict[int, float]]
# data["neighbors"] -> numpy.ndarray  shape (Q, K) int64
# data["distances"] -> numpy.ndarray  shape (Q, K) float32

pyvsag.sparse.decode_sparse_bytes(buffer) is also exposed for callers that already hold the raw byte stream.

Reference implementation

The byte-stream encoder/decoder lives at tools/eval/eval_dataset.cpp (see parse_sparse_vectors and serialize_sparse_vectors).

References

Public benchmark datasets compatible with this layout are available from ann-benchmarks (e.g. sift-128-euclidean.hdf5, gist-960-euclidean.hdf5).
See Evaluation Tool for how datasets in this format are consumed.

Index Analysis (`AnalyzeIndexBySearch` & `analyze_index`)

VSAG ships an introspection capability for inspecting an index that has already been built or loaded, so you can diagnose recall regressions, quantization quality, graph health and search performance without rebuilding the index. This capability is exposed in two ways:

the C++ API Index::AnalyzeIndexBySearch (declared in include/vsag/index.h);
the command-line diagnostic tool analyze_index, located under tools/analyze_index/.

The `AnalyzeIndexBySearch` API

// include/vsag/index.h
virtual std::string
AnalyzeIndexBySearch(const SearchRequest& request);

Input: a SearchRequest (query dataset + topk + search parameter JSON).
Output: a JSON-formatted string containing dynamic, query-driven metrics.
Supported indexes: currently HGraph, IVF, and SINDI. Pyramid only supports static analysis through GetStats() — it does not yet override AnalyzeIndexBySearch. Indexes that do not implement this API will throw an exception when called.

It is complementary to Index::GetStats(), which reports static structural properties of the index without needing query data. For graph-based indexes, additional graph-health details such as degree distribution, entry-point quality, sub-index recall and low-recall hot-spots are exposed through GetStats() rather than through AnalyzeIndexBySearch.

Static metrics from `GetStats()`

HGraph metrics

Metric	Meaning
`total_count`	Total number of vectors in the index
`deleted_count`	Vectors marked for deletion
`connect_components`	Connected components in the proximity graph
`maximal_component_size`	Size of the largest connected component
`in_degree_distribution`	Histogram of graph in-degrees
`out_degree_distribution`	Histogram of graph out-degrees
`average_degree`	Average graph degree over valid nodes
`duplicate_ratio`	Proportion of duplicate vectors in the dataset
`avg_distance_base`	Average distance on sampled base vectors
`recall_base`	Self-recall on sampled base vectors
`time_cost_query`	Average latency when sampled base vectors are searched as queries
`proximity_recall_neighbor`	Recall of graph neighbor lists against true nearest neighbors
`quantization_bias_ratio`	Quantized-distance bias against exact distance
`quantization_inversion_count_rate`	Rate of distance-order inversions caused by quantization
`build_cache_hit_rate`	Fraction of nodes warm-started from an imported cache during the last `Build()`; emits a `skipped_reason` when the index was not built from a cache imported via `ImportCache()`
`build_cache_hit_nodes` / `build_cache_missed_nodes`	Node counts behind `build_cache_hit_rate` (only present when the index was built from an imported cache)

SINDI metrics

Metric	Meaning
`total_count`	Total number of sparse vectors in the index
`window_count`	Number of SINDI windows
`active_term_count.mean` / `min` / `max`	Per-window ratio of non-empty terms to term capacity
`active_term_count.avg_count`	Average count of non-empty terms per window
`posting_length_distribution.mean` / `max` / `p95` / `p99`	Distribution of non-empty posting-list lengths
`posting_length_distribution.long_tail_threshold`	P99 posting-list length used as the long-tail threshold
`posting_length_distribution.long_tail_mean`	Ratio of posting lists longer than the P99 threshold
`mean_doc_retained.mean`	Average ratio of retained terms after document pruning
`recall_base`	Self-recall using sampled base vectors as queries and exact sparse ground truth
`doc_prune_recall`	Candidate recall from the doc-pruned index with query pruning disabled
`doc_prune_bias_mean`	Average relative distance bias between doc-pruned distance and exact sparse distance
`doc_prune_inversion_count_rate`	Candidate-pair order inversion rate introduced by document pruning
`quantization_range.min_val` / `max_val` / `diff`	SQ8 quantization range, emitted only when quantization is enabled
`quantization_recall`	Candidate recall from quantized coarse scoring, emitted only when quantization is enabled
`quantization_bias_ratio`	Average relative distance bias between quantized distance and decoded doc-pruned distance
`quantization_inversion_count_rate`	Candidate-pair order inversion rate introduced by quantization

Metrics that require original base vectors output a skipped_reason object when the data is not available. Original vectors are available inside the index when use_reorder=true; otherwise pass base_path through the analyze parameters or the --base_path command-line option described below.

Dynamic metrics from `AnalyzeIndexBySearch`

HGraph metrics

Metric	Meaning
`recall_query`	Recall on the supplied query set against true nearest neighbors
`avg_distance_query`	Average distance between query vectors and retrieved neighbors
`time_cost_query`	Average per-query latency in milliseconds
`quantization_bias_ratio_query`	Quantization bias observed during query search
`quantization_inversion_count_rate_query`	Query-time ordering errors introduced by quantization

SINDI metrics

Metric	Meaning
`recall_query`	Search-result recall against supplied or generated sparse ground truth
`mean_latency_ms`	Average per-query latency measured while running `KnnSearch`
`time_cost_query`	Alias of `mean_latency_ms`, kept consistent with other analyzers
`postings_scanned.query_term_count_after_prune_mean`	Average number of query terms left after query pruning
`postings_scanned.query_term_with_posting_mean`	Average number of retained query terms that hit at least one non-empty posting list
`postings_scanned.posting_hit_mean`	Average hit ratio of retained query terms against non-empty posting lists
`doc_prune_recall`	Recall of doc-pruned pre-rerank candidates against sparse ground truth with query pruning disabled
`doc_prune_bias_mean`	Average relative distance bias between doc-pruned distance and exact sparse distance on sampled queries
`doc_prune_inversion_count_rate`	Candidate-pair order inversion rate introduced by document pruning on sampled queries
`quantization_recall`	Recall of quantized pre-rerank candidates, emitted only when quantization is enabled
`quantization_bias_ratio`	Average relative distance bias between quantized distance and decoded doc-pruned distance
`quantization_inversion_count_rate`	Candidate-pair order inversion rate introduced by quantization
`reorder_recall.before_reorder_recall_k_at_k`	Recall of coarse top-k candidates before precise reorder
`reorder_recall.after_reorder_recall_k_at_k`	Recall of final top-k candidates after precise reorder
`last_topk_rank_in_heap.mean` / `p95` / `p99` / `max`	Rank distribution of final top-k results inside the pre-rerank candidate heap

SINDI dynamic recall and distance-quality metrics need ground truth. Pass groundtruth_path to reuse an existing .dev.gt file, or pass base_path so the analyzer can generate exact sparse ground truth. save_groundtruth_path can persist generated ground truth for later runs. Without ground truth, those fields return skipped_reason; postings_scanned still runs because it only needs the query and index postings.

Quantization-related fields differ by index type — they are not unified across implementations:

Index	Field	Meaning
`HGraph`	`quantization_bias_ratio_query`	Quantization bias observed during search
`HGraph`	`quantization_inversion_count_rate_query`	Quantization-induced ordering errors during search
`IVF`	`quantization_bias_ratio`	Quantization bias observed during search (only when `use_reorder` is enabled)
`IVF`	`quantization_inversion_count_rate`	Quantization-induced ordering errors during search (only when `use_reorder` is enabled)

If you also need degree distribution, entry-point analysis or sub-index quality breakdown, look in the GetStats() JSON instead — AnalyzeIndexBySearch focuses on dynamic, query-driven signals.

The `analyze_index` Tool

analyze_index is the user-facing wrapper around the analyzer APIs. It loads a serialized VSAG index from disk, prints its metadata and GetStats() output, and (optionally) runs AnalyzeIndexBySearch against a query file.

Building

Tools are not built by default — enable them explicitly:

# via the project Makefile
VSAG_ENABLE_TOOLS=ON make release

# or directly through CMake
cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release -DENABLE_TOOLS=ON
cmake --build build-release -j
# Output: ./build-release/tools/analyze_index/analyze_index

Command-line arguments

Argument	Alias	Required	Description
`--index_path`	`-i`	Yes	Path to the serialized VSAG index file.
`--build_parameter`	`-bp`	No	Build parameters (JSON) used when reloading the index. Defaults to the parameters embedded in the index file.
`--query_path`	`-qp`	No	Binary query dataset path. If omitted, only static analysis is performed.
`--query_data_type`		No	Query dataset type: `auto`, `dense`, or `sparse`. `auto` uses sparse loading for SINDI.
`--base_path`		No	Optional sparse CSR base dataset for SINDI analysis and ground-truth generation.
`--groundtruth_path`		No	Optional SINDI `.dev.gt` ground-truth file. If present, it is reused.
`--save_groundtruth_path`		No	Optional path for saving generated SINDI ground truth.
`--search_parameter`	`-sp`	No	Search parameters (JSON) used during dynamic analysis.
`--topk`	`-k`	No	Top-K for dynamic analysis (default `100`).

The query file format is the simple binary (uint32 rows, uint32 cols, float32 data...) layout consumed by load_query() in tools/analyze_index/analyze_index.cpp.
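A dense query file in this layout could be produced with a sketch like the one below. Little-endian byte order and row-major data are assumptions here, since the description above does not state them, and the function names are hypothetical.

```python
import struct


def pack_queries(rows):
    """Serialize equal-length float vectors as: uint32 rows, uint32 cols, float32 data."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all query vectors must have the same dimension")
    flat = [x for row in rows for x in row]
    # "<" selects little-endian with no padding (an assumption, see lead-in).
    return struct.pack(f"<II{len(flat)}f", n_rows, n_cols, *flat)


def unpack_queries(blob):
    """Inverse of pack_queries, useful for checking a file before running the tool."""
    n_rows, n_cols = struct.unpack_from("<II", blob, 0)
    flat = struct.unpack_from(f"<{n_rows * n_cols}f", blob, 8)
    return [list(flat[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)]


blob = pack_queries([[1.0, 2.0], [3.0, 4.0]])
roundtrip = unpack_queries(blob)
```

Writing blob to a file with open(path, "wb") yields something that can be passed as --query_path, provided the assumptions above match load_query().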

For SINDI, query and base datasets use CSR sparse binary layout: int64 nrow, int64 ncol, int64 nnz, followed by int64 indptr[nrow + 1], int32 indices[nnz], and float32 data[nnz]. SINDI ground truth uses .dev.gt layout: uint32 query_count, uint32 topk, followed by flattened int32 ids and float32 distances. If --groundtruth_path is not provided but --base_path is available, SINDI analysis generates ground truth from the original sparse base vectors and can save it through --save_groundtruth_path.
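The CSR and .dev.gt layouts can be sketched with struct as well. Little-endian byte order is an assumption, as is the reading that the ground-truth file stores all ids first and all distances second; the function names are hypothetical.

```python
import struct


def pack_csr(vectors, ncol):
    """Serialize {column: value} dicts as nrow, ncol, nnz, indptr, indices, data."""
    indptr, indices, data = [0], [], []
    for vec in vectors:
        for col in sorted(vec):
            indices.append(col)
            data.append(vec[col])
        indptr.append(len(indices))
    nrow, nnz = len(vectors), len(indices)
    return (
        struct.pack("<qqq", nrow, ncol, nnz)       # three int64 header fields
        + struct.pack(f"<{nrow + 1}q", *indptr)    # int64 indptr[nrow + 1]
        + struct.pack(f"<{nnz}i", *indices)        # int32 indices[nnz]
        + struct.pack(f"<{nnz}f", *data)           # float32 data[nnz]
    )


def pack_dev_gt(ids, distances):
    """Serialize uint32 query_count, uint32 topk, then flattened int32 ids and float32 distances."""
    query_count, topk = len(ids), len(ids[0])
    flat_ids = [i for row in ids for i in row]
    flat_dist = [d for row in distances for d in row]
    return (
        struct.pack("<II", query_count, topk)
        + struct.pack(f"<{len(flat_ids)}i", *flat_ids)
        + struct.pack(f"<{len(flat_dist)}f", *flat_dist)
    )


csr = pack_csr([{5: 1.0, 1: 2.0}, {}, {3: 0.5}], ncol=8)
gt = pack_dev_gt([[1, 2], [3, 4]], [[0.1, 0.2], [0.3, 0.4]])
```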

Two analysis modes

1. Static analysis (no query file)

./build-release/tools/analyze_index/analyze_index \
    --index_path /path/to/my_index.hgraph

Reports the index name, dimension, data type, metric, build parameters, and GetStats() JSON.

2. Static + dynamic analysis

./build-release/tools/analyze_index/analyze_index \
    --index_path /path/to/my_index.ivf \
    --query_path /path/to/queries.bin \
    --search_parameter '{"ivf":{"scan_buckets_count":16}}' \
    --topk 50

In addition to the static section, prints a Search Analyze: { ... } JSON block produced by AnalyzeIndexBySearch.

When a serialized index only embeds index_param, analyze_index can still reload it without --build_parameter; missing metadata fields are filled with analyzer defaults where possible.

Typical Use Cases

Recall regression triage: confirm whether a drop is caused by quantization (quantization_* metrics), graph structure (connect_components, proximity_recall_neighbor), or query-side parameters (recall_query vs. recall_base).
Capacity / health checks: detect duplicated data (duplicate_ratio), disconnected components, or excessive deletions.
Parameter tuning: re-run AnalyzeIndexBySearch with different search_parameter values to pick an operating point that balances recall_query and time_cost_query — without rebuilding the index.
What-if experiments: override --build_parameter on load to evaluate alternative settings for indexes whose parameters are not embedded in the file.

References

API: Index::AnalyzeIndexBySearch in include/vsag/index.h
Implementations: src/analyzer/{analyzer,hgraph_analyzer,pyramid_analyzer}.h
Tool source: tools/analyze_index/
Local tool entry point: tools/analyze_index/README.md

Compatibility Check Tool (`check_compatibility`)

check_compatibility verifies whether the current VSAG build can load and search index files created by older VSAG versions. It is mainly used in CI to catch serialization and backward compatibility regressions.

Build

Using the project Makefile, enable tools with VSAG_ENABLE_TOOLS=ON; the underlying CMake options are ENABLE_TOOLS=ON and ENABLE_CXX11_ABI=ON:

VSAG_ENABLE_TOOLS=ON make release
# Output: ./build-release/tools/check_compatibility/check_compatibility

When invoking CMake directly, pass both options explicitly:

cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release \
  -DENABLE_TOOLS=ON -DENABLE_CXX11_ABI=ON
cmake --build build-release -j

Inputs

The command accepts one positional identifier in the form <tag>_<algo_name>, for example v1.0.0_hnsw. For that identifier, the tool expects these files under /tmp/:

File	Purpose
`/tmp/<tag>_<algo_name>.index`	Serialized index produced by the older VSAG version
`/tmp/<tag>_<algo_name>_build.json`	Build parameters used for that index
`/tmp/<tag>_<algo_name>_search.json`	Search parameters used for the sanity check
`/tmp/random_512d_10K.bin`	Test vectors used by the search verification

These files are usually generated by compatibility fixtures from previous releases.
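Before invoking the tool, a CI script could confirm that all four inputs exist. The sketch below derives the expected paths from the identifier according to the table above; the function names are hypothetical and not part of VSAG.

```python
import os
from pathlib import PurePosixPath


def fixture_paths(identifier, root="/tmp"):
    """Expected input files for one <tag>_<algo_name> identifier."""
    base = PurePosixPath(root)
    return [
        str(base / f"{identifier}.index"),
        str(base / f"{identifier}_build.json"),
        str(base / f"{identifier}_search.json"),
        str(base / "random_512d_10K.bin"),
    ]


def missing_fixtures(identifier, root="/tmp"):
    """Subset of the expected files that is not present on disk."""
    return [p for p in fixture_paths(identifier, root) if not os.path.exists(p)]


paths = fixture_paths("v1.0.0_hnsw")
print(paths)
```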

Usage

./build-release/tools/check_compatibility/check_compatibility v1.0.0_hnsw

The tool creates a current-version index instance, deserializes the old index file, then runs a small KNN search. It prints <identifier> success when both load and search succeed; otherwise it prints <identifier> failed and exits with a non-zero status.

Local Entry Point

A short local pointer is kept at tools/check_compatibility/README.md for developers browsing the tool directory.

FAQ

This page collects common questions that VSAG users hit while choosing indexes, tuning performance, and integrating APIs. Follow the linked pages for details.

Which index should I choose?

Common VSAG indexes target different workloads. Choose by data type, scale, and recall / latency target.

hgraph is the default choice for dense vectors. It fits text, image, and multimodal embeddings in online search systems that need high recall and low latency. It supports multiple quantizers, incremental insertion, deletion, reranking, and automatic tuning.

ivf fits large-scale or high-throughput workloads where memory is tight and queries can tolerate bucket-based recall tradeoffs. It reduces scanning by partitioning vectors into buckets.

sindi is for sparse vector retrieval, such as BM25, SPLADE, or BGE-M3 sparse outputs. It only accepts dtype: "sparse" and primarily uses metric_type: "ip".

pyramid fits multi-tenant, partitioned, or tag-path workloads. It keeps multiple subgraphs inside one index and supports tag / path based retrieval.

brute_force is for small datasets, functional validation, and exact-recall baselines. It is exact, but latency and throughput usually do not scale to large datasets.

Practical guidance:

If you are unsure and your vectors are dense, start with hgraph.
Use sindi for sparse vectors.
Use brute_force for small datasets or recall baselines.
Compare ivf when throughput and memory matter more than single-query latency.
Consider pyramid when your data has clear partition, tenant, or path structure.

Related pages: Index Overview, Best Practices.

Why does the same parameter set perform very differently on different datasets?

This is common in vector search. Even if the vector count, dimensionality, and index parameters are identical, different datasets can have very different search difficulty.

The root cause is data distribution:

Some datasets have clear neighbor structure, so graph search reaches the right region quickly.
Some datasets have ambiguous neighbor boundaries, so search must expand more candidates for the same recall.
Embedding normalization, clustering, per-dimension distribution, and noise all affect search difficulty.

For HGraph-like graph indexes, ef_search is a key search-time parameter for recall and latency. It controls how many candidates the search keeps and expands:

Larger ef_search usually improves recall.
Larger ef_search usually increases per-query latency.
When other factors are similar, query latency is often approximately linear in ef_search.

Therefore, do not compare datasets only by QPS at the same ef_search. A more meaningful process is:

Tune ef_search separately on each dataset.
Make each dataset reach the same target recall, such as 95% or 98% recall.
Compare P50 / P95 / P99 latency and QPS at that target recall.

If dataset A reaches 95% recall with ef_search = 80, while dataset B needs ef_search = 300, B being much slower is expected. It means B is harder to search; it does not necessarily mean the index degraded.
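Tuning ef_search to a target recall on each dataset can be sketched as a binary search. The measure_recall callable is hypothetical (in practice it would run a search benchmark and return the measured recall), and the two recall curves below are synthetic stand-ins chosen to reproduce the 80 versus 300 example above.

```python
def min_ef_for_recall(measure_recall, target, low=10, high=1000):
    """Smallest ef_search in [low, high] whose measured recall meets the target.

    Assumes recall does not decrease as ef_search grows. Returns None when
    even `high` misses the target.
    """
    if measure_recall(high) < target:
        return None
    while low < high:
        mid = (low + high) // 2
        if measure_recall(mid) >= target:
            high = mid
        else:
            low = mid + 1
    return low


# Synthetic recall curves standing in for real measurements on two datasets.
def dataset_a(ef_search):
    return 1.0 - 3.99 / ef_search


def dataset_b(ef_search):
    return 1.0 - 14.99 / ef_search


ef_a = min_ef_for_recall(dataset_a, 0.95)
ef_b = min_ef_for_recall(dataset_b, 0.95)
```

Latency and QPS are then compared at ef_a and ef_b respectively, rather than at a shared ef_search value.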

When reporting performance, record:

Dataset name and scale.
Dimensionality.
Index parameters.
Target recall and actual recall.
ef_search.
QPS.
P50 / P95 / P99 latency.

Related pages: HGraph, Evaluation Tool.

Why is `sq8_uniform` usually faster than `sq8`? When should I enable `use_reorder`?

Both sq8 and sq8_uniform are 8-bit scalar quantizers, but they use different scaling strategies.

sq8 is per-dimension quantization:

Each dimension has its own min_i / max_i / scale_i.
This adapts better to each dimension’s value range.
Distance computation has to handle per-dimension scales, so the hot path is more complex.

sq8_uniform is global uniform quantization:

All dimensions share one min / max / scale.
Query and base codes can more easily be computed directly in the integer domain.
Vectorized code paths (AVX-512, AMX, NEON) are more efficient.
Distance computation can avoid per-element dequantization and per-dimension scale handling.

sq8_uniform is therefore often faster than sq8. Its accuracy holds up when a single global value range fits every dimension well.
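The accuracy side of this trade-off can be illustrated with a toy 8-bit scalar quantizer. This is a simplified sketch, not VSAG's quantizer code: it compares reconstruction error when each dimension has its own range against a single shared range, on data whose two dimensions have very different scales.

```python
# Dimension 0 spans roughly [0, 1]; dimension 1 spans [0, 100].
vectors = [[0.10, 10.0], [0.52, 55.0], [0.90, 100.0], [0.00, 0.0]]
dims = list(zip(*vectors))

# Per-dimension scheme: one (min, max) per dimension.
per_dim_ranges = [(min(d), max(d)) for d in dims]
# Uniform scheme: a single (min, max) shared by all dimensions.
global_range = (min(map(min, dims)), max(map(max, dims)))


def roundtrip_error(ranges):
    """Max absolute error per dimension after quantizing to 0..255 and decoding."""
    errors = []
    for values, (lo, hi) in zip(dims, ranges):
        scale = (hi - lo) / 255
        codes = [round((v - lo) / scale) for v in values]
        errors.append(max(abs(v - (lo + c * scale)) for v, c in zip(values, codes)))
    return errors


per_dim_error = roundtrip_error(per_dim_ranges)
uniform_error = roundtrip_error([global_range] * len(dims))
print(per_dim_error, uniform_error)
```

With a shared range the small-scale dimension is squeezed into only a few codes, so its error grows by roughly two orders of magnitude, while the large-scale dimension is unaffected.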

Good use cases for sq8_uniform:

Normalized vectors, especially cosine workloads.
Dimensions have similar value ranges.
Distance computation is the query bottleneck.
Throughput and latency matter more than the last bit of recall.
You can enable use_reorder to correct coarse-ranking errors.

Less suitable cases:

Different dimensions have very different value ranges.
Vectors concatenate heterogeneous feature blocks.
Some dimensions have heavy tails or strong outliers.
You do not plan to enable reorder and are very sensitive to recall.

use_reorder first uses the compressed base quantizer for coarse ranking, then reranks candidates with a higher-precision precise quantizer.

Common configuration:

{
    "index_param": {
        "base_quantization_type": "sq8_uniform",
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
}
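The coarse-then-rerank flow that use_reorder enables can be sketched as a two-stage search. This is a toy model, not VSAG's implementation: `coarse_code` is a hypothetical stand-in for a lossy base quantizer, and the `candidates` argument is an assumed knob for how many coarse results are reranked.

```python
import heapq

def l2_sqr(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def coarse_code(vec, step=0.5):
    """Hypothetical lossy encoding: snap each value to a coarse grid."""
    return [round(x / step) * step for x in vec]

def search_with_reorder(query, base, k, candidates):
    codes = [coarse_code(v) for v in base]  # a real index builds these once
    # Stage 1: coarse ranking over the compressed codes.
    coarse = heapq.nsmallest(
        candidates, range(len(base)), key=lambda i: l2_sqr(query, codes[i])
    )
    # Stage 2: rerank only those candidates with full-precision vectors.
    return heapq.nsmallest(k, coarse, key=lambda i: l2_sqr(query, base[i]))

query = [0.0, 0.0]
base = [[0.26, 0.0], [0.24, 0.24], [1.0, 1.0], [0.2, 0.0]]
print(search_with_reorder(query, base, k=2, candidates=3))
print(search_with_reorder(query, base, k=2, candidates=2))
```

Reranking can only fix errors among the candidates that survive the coarse stage, so the second call, with a smaller candidate pool, can still miss a true neighbor.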

Enable use_reorder when:

You use lossy quantizers such as sq4, sq4_uniform, pq, pqfs, or rabitq.
Recall is not stable enough with sq8 or sq8_uniform.
topk is small but final ranking quality is important.
You can afford an additional higher-precision representation.
Production recall stability matters more than maximum compression.

You can skip use_reorder when:

fp32 or fp16 already meets recall.
sq8_uniform reaches recall targets without reorder.
Memory budget is very tight.
Latency is extremely sensitive and reranking overhead is unacceptable.

Simple guidance:

Throughput first: try sq8_uniform without reorder and measure recall.
Safer default: sq8_uniform + use_reorder: true + precise_quantization_type: "fp32".
Strong compression: sq4_uniform / pq / rabitq usually need reorder.

What are the distance semantics of `l2`, `ip`, and `cosine`?

VSAG always sorts search results by ascending distance. Even when the underlying metric is inner product or cosine similarity, VSAG converts the score into distance semantics.

Specific semantics:

l2 returns L2Sqr, the squared L2 distance.
ip returns 1 - inner_product.
cosine returns 1 - cosine_similarity.

Why does l2 return squared distance? Squared L2 distance has the same ordering as L2 distance, and avoiding the square root improves performance. VSAG therefore commonly uses L2Sqr internally and in returned distances.

This affects RangeSearch radius settings:

If you want L2 distance smaller than 2.0, pass radius 4.0.
For ip, radius means 1 - inner_product.
For cosine, radius means 1 - cosine_similarity.

For example, if you want cosine similarity at least 0.8:

distance = 1 - cosine_similarity
radius = 1 - 0.8 = 0.2
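The same conversions can be wrapped in a small helper so that callers think in their own units. The function name `range_search_radius` is introduced here for illustration and is not a VSAG API. It only encodes the three rules stated above.

```python
def range_search_radius(metric, threshold):
    """Convert a user-facing threshold into a distance radius.

    For "l2", threshold is a plain (non-squared) L2 distance.
    For "ip" and "cosine", threshold is a minimum similarity score.
    """
    if metric == "l2":
        if threshold < 0:
            raise ValueError("L2 distance threshold must be non-negative")
        return threshold ** 2      # distances are reported as squared L2
    if metric in ("ip", "cosine"):
        return 1.0 - threshold     # distance = 1 - similarity
    raise ValueError(f"unknown metric: {metric}")

print(range_search_radius("l2", 2.0))
print(range_search_radius("cosine", 0.8))
```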

Notes:

Different systems may return similarity or distance.
Before comparing with another library or ground truth, confirm the scoring semantics.
After an index is created, metric_type cannot be changed at search time.

Related pages: Metric Semantics, Range Search.

What is the difference between `base_quantization_type` and `precise_quantization_type`?

These two parameters control coarse storage and rerank storage.

base_quantization_type is the main storage quantizer:

It stores the main vectors inside the index.
It is used for coarse distance computation during graph search or inverted-list scanning.
It directly affects memory usage, search speed, and coarse-ranking recall.
Common values include fp32, fp16, bf16, sq8, sq8_uniform, and pq.

precise_quantization_type is the higher-precision quantizer for reranking:

It only takes effect when use_reorder: true.
It reranks coarse candidates.
It corrects distance errors introduced by lossy quantization.
The common choice is fp32; depending on memory budget, fp16, bf16, or sq8 may also be used.

A useful mental model:

base_quantization_type    = format used to quickly find candidates
precise_quantization_type = format used to recompute candidate distances

High-recall baseline:

{
    "index_param": {
        "base_quantization_type": "fp32"
    }
}

Memory / recall tradeoff:

{
    "index_param": {
        "base_quantization_type": "sq8_uniform",
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
}

Lower memory:

{
    "index_param": {
        "base_quantization_type": "sq4_uniform",
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
}

More aggressive compression:

{
    "index_param": {
        "base_quantization_type": "pq",
        "use_reorder": true,
        "precise_quantization_type": "fp32"
    }
}

Setting guidance:

Recall first, enough memory: base_quantization_type: "fp32".
General production choice: base_quantization_type: "sq8_uniform".
Data distribution does not fit uniform scaling: try sq8.
Tight memory: try sq4_uniform, pq, or rabitq, and enable reorder.
If use_reorder is enabled, start with precise_quantization_type: "fp32".
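One way to keep these choices consistent across services is to generate the index_param fragment from a small helper. The sketch below only assembles the JSON keys shown in this answer. The preset names are hypothetical, and any other build parameters an index needs are out of scope.

```python
import json

def build_index_param(base, use_reorder=False, precise="fp32"):
    """Assemble the index_param fragment used in the examples above."""
    index_param = {"base_quantization_type": base}
    if use_reorder:
        # precise_quantization_type only takes effect together with use_reorder.
        index_param["use_reorder"] = True
        index_param["precise_quantization_type"] = precise
    return {"index_param": index_param}

# Hypothetical preset names that mirror the guidance list.
PRESETS = {
    "recall_first": build_index_param("fp32"),
    "general": build_index_param("sq8_uniform", use_reorder=True),
    "non_uniform_data": build_index_param("sq8"),
    "tight_memory": build_index_param("sq4_uniform", use_reorder=True),
}

print(json.dumps(PRESETS["general"], indent=4))
```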

Note that dtype is the input data type, while base_quantization_type is the internal storage / computation format. They are not the same. For example, input can be dtype: "float32" while the index stores vectors with base_quantization_type: "sq8_uniform".

Related pages: Quantization Overview, HGraph, IVF.

Should I use Bitset, lambda, `Filter`, attribute filtering, or `extra_info`?

VSAG provides multiple filtering APIs for different workloads.

Bitset filtering fits a known set of ids to exclude, such as tombstones, blacklists, or permission-denied ids. Bitset::Test(id) == true means the id is filtered out.

Lambda or std::function<bool(int64_t)> fits simple filtering logic. Returning true means the id is filtered out.

A Filter object fits more complex filtering logic, or cases where you can provide hints such as ValidRatio(). Filter::CheckValid(id) == true means the id is kept.

Attribute filtering fits structured predicates, such as category = "book" AND price <= 100. It is used through SearchRequest and fits vector + structured field hybrid search.

extra_info filtering fits fixed-size byte payloads stored beside each vector. HGraph can filter on those bytes during graph traversal. Filter::CheckValid(const char*) == true means the vector is kept.

How to choose:

Use Bitset if you only need to exclude a known id set.
Use lambda for simple ad hoc logic.
Use a Filter object for complex logic and when you can estimate pass ratio.
Use attribute filtering for named, typed structured fields.
Use extra_info when metadata is a fixed-size byte payload stored with vectors.

The most confusing part is true / false semantics:

Bitset::Test(id) returns true to filter out this id.
A lambda returns true to filter out this id.
Filter::CheckValid(id) returns true to keep this id.
Filter::CheckValid(const char*) returns true to keep the vector.
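Because the two conventions are inverted, it helps to convert explicitly when moving logic from one API style to the other. The sketch below models the polarity only, with plain Python callables. It does not call VSAG, and `keep_to_exclude` is a name made up for this example.

```python
def keep_to_exclude(keep):
    """Turn a 'True means keep' predicate into a 'True means filter out' one."""
    return lambda vector_id: not keep(vector_id)

allowed_ids = {1, 3, 5}

def keep(vector_id):
    # CheckValid-style polarity: True keeps the id.
    return vector_id in allowed_ids

# Bitset / lambda-style polarity: True filters the id out.
exclude = keep_to_exclude(keep)

ids = range(6)
kept_via_keep = [i for i in ids if keep(i)]
kept_via_exclude = [i for i in ids if not exclude(i)]
print(kept_via_keep, kept_via_exclude)
```

Both lists contain the same ids. Passing a keep-style predicate where an exclude-style one is expected would silently return the complement.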

When using bitset filtering, keep ids in [0, 2^32) when possible to avoid low-32-bit collisions. If the predicate is very selective, graph search may need to expand more candidates to collect enough valid results. For HGraph, consider brute_force_threshold so highly selective filters can automatically fall back to brute-force scanning.
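A quick pre-check can reveal whether an id set is exposed to this problem. The sketch assumes, as the collision warning implies, that only the low 32 bits of an id select the bitset position. The helper names are illustrative.

```python
MASK_32 = 0xFFFFFFFF

def low32(vector_id):
    """Low 32 bits of an id (assumed here to be the bitset position)."""
    return vector_id & MASK_32

def find_low32_collisions(ids):
    """Return (first_id, later_id) pairs that share the same low 32 bits."""
    seen = {}
    collisions = []
    for vector_id in ids:
        key = low32(vector_id)
        if key in seen and seen[key] != vector_id:
            collisions.append((seen[key], vector_id))
        else:
            seen[key] = vector_id
    return collisions

print(find_low32_collisions([5, 7, 5 + 2**32]))
print(find_low32_collisions([0, 1, 2**32 - 1]))
```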

Related pages: Filtered Search, Attribute Filter, Extra Info.

Release Notes

Release notes on the VSAG website are organized by MAJOR.MINOR series. Each series page covers the first release and every later patch release in that line. GitHub Releases remains the source for the complete per-patch pull request list, assets, and contributor credits.

Release Series

VSAG 1.0
- First release: v1.0.0, July 12, 2026
- Latest patch: v1.0.0
- Status: stable

Future release notes follow the same layout: v1.1, v1.2, v2.0, and so on. Patch releases update their existing series page instead of creating a separate website page.

Version and Note Grouping

Release tags use the vMAJOR.MINOR.PATCH form. The website groups them by MAJOR.MINOR so each page can explain the full series, while GitHub Releases records the exact contents of each tag.
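The grouping rule is simple enough to express directly. In the sketch below, v1.0.1 and v1.1.0 are hypothetical tags used only to show how patches fall into a series. The function names are not part of any VSAG tooling.

```python
import re
from collections import defaultdict

TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

def series_of(tag):
    """Map a vMAJOR.MINOR.PATCH tag to its MAJOR.MINOR series."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, _patch = match.groups()
    return f"{major}.{minor}"

def group_by_series(tags):
    groups = defaultdict(list)
    for tag in tags:
        groups[series_of(tag)].append(tag)
    return dict(groups)

# v1.0.1 and v1.1.0 are hypothetical future tags.
print(group_by_series(["v1.0.0", "v1.0.1", "v1.1.0"]))
```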

Getting a Specific Version

C++ / source

git checkout vX.Y.Z
make release

Python

Check PyPI for an available binding version, then install that exact version:

pip install pyvsag==X.Y.Z

Binding releases may not match every core C++ tag. The repository also contains C and Node.js/TypeScript bindings. See the corresponding release series page and repository examples for their support and packaging state.

Upgrade Guidance

Read the compatibility section of the target release series before upgrading.
When a serialization format changes, validate old artifacts with the compatibility check tool in a staging environment.
Roll out gradually in production and use the performance evaluation tool to compare recall, latency, and resource use.

For complete patch-level history, see all VSAG releases on GitHub.

VSAG 1.0 Release Notes

v1.0.0 was released on July 12, 2026, the project’s third anniversary.

Official GitHub Release
Full v0.18.0…v1.0.0 changelog
Tag commit: efdaf17a10e96cdb5222baf558d50dfacbdc672e

Overview

VSAG 1.0 is the project’s first long-term support (LTS) major release. At the time of its release, the public v0.x history comprised 81 tags from v0.11 through v0.18. v1.0.0 brings that work together across dense, sparse, hierarchical, and multi-vector retrieval with structured filtering.

The official v1.0.0 release contains 375 changes since v0.18.0: 48 features, 134 improvements, 105 bug fixes, and 88 other changes. The broader v0.11.0...v1.0.0 history contains 1,252 mainline commits.

This note covers only the APIs and features available in the v1.0.0 tag.

Highlights

Index families

The following matrix groups VSAG 1.0 indexes by role: Pyramid and LazyHGraph provide composite or adaptive behavior, while the lower row presents the five core index families.

VSAG 1.0 index matrix

Composite and adaptive indexes:

The partitioned index Pyramid supports assigning one vector to multiple Pyramid paths and scoping searches to a selected path for hierarchical and multi-tenant retrieval (PR #2226).
The self-scaling graph index LazyHGraph starts with exact BruteForce search and converts to HGraph after a configurable threshold, avoiding graph-build overhead for small, growing collections (PR #2151).

Core index families:

The brute-force index BruteForce supports exact search for both single-vector and multi-vector datasets. It is the exact-search baseline and an option for smaller collections.
The graph index HGraph targets high-recall, low-latency dense-vector search. Since its initial implementation, it has added quantization, filtering, range and iterator searches, updates, mark and force-removal modes, cache import and export, diagnostics, and memory-plus-disk configurations.
The space-partitioning index IVF targets large datasets, batch queries, and large top-k workloads. It supports quantization, reordering, attribute filters, parallel build and search, and on-disk bucket storage. See the original IVF PR.
The sparse-vector index SINDI supports BM25-style and learned sparse retrieval, including term-ID remapping, index analysis, immutable reads, FP16 sparse values, term-list compaction, and a low-memory immutable build.
The multi-vector index SIMQ targets ColBERT-style late-interaction retrieval. It generates candidates at the cluster level, then performs exact MaxSim reranking to balance recall and latency (PR #2357).
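For readers unfamiliar with late interaction, the MaxSim score used in the reranking step can be written in a few lines. This is the generic ColBERT-style formula on toy data, not SIMQ's implementation. The vectors and function names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Sum, over query token vectors, of the best inner product
    with any token vector of the document."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[0.9, 0.1], [0.2, 0.8]],
    "doc_b": [[0.7, 0.7]],
}

# Exact rerank: score every candidate document and sort by descending MaxSim.
ranked = sorted(docs, key=lambda name: maxsim(query, docs[name]), reverse=True)
print(ranked)
```

Scoring every document this way is exact but expensive, which is why a cheaper candidate-generation stage comes first.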

For configuration and usage details, see Indexes.

Quantization, data types, and hardware acceleration

VSAG 1.0 supports the following input formats, quantizers, transforms, and hardware-acceleration paths:

FP32, INT8, FP16, and BF16 dense inputs, plus sparse and multi-vector datasets, with direct FP16/BF16 input support (PR #1731).
SQ4/SQ8 and their uniform variants.
Product Quantization and PQ FastScan (PR #626, PR #691).
RaBitQ, extended-bit and x+y split base/reorder layouts, FHT/PCA-assisted transforms, and dedicated SIMD kernels.
Transform Quantizer chains and MRL-E dimension reduction.
x86_64 dispatch across SSE, AVX, AVX2, AVX-512, and selected AMX kernels, plus ARM NEON and SVE.
AMX acceleration for SQ8U inner product and KMeans BF16 GEMM (PR #2032).

See Quantization for supported combinations and tuning guidance.

Search, filtering, and index management

Basic search: KnnSearch provides KNN queries, while RangeSearch provides range queries with an optional result limit.
Unified request API: SearchRequest and Index::SearchWithRequest use one request object to select KNN or range search and carry index-specific JSON parameters, supported filters, and diagnostic inputs. In v1.0, HGraph, IVF, and BruteForce implement this interface; supported fields vary by index.
Filtering: ID callbacks/FilterPtr, bitsets, and SQL-like attribute expressions are available. HGraph, IVF, and BruteForce support structured filtering with inverted attribute indexes. HGraph also supports iterator filtering and can switch to brute-force search when the valid ratio is at or below hgraph.brute_force_threshold.
Training and model reuse: Train, Clone, ExportModel, and Tune cover standalone training, deep copies, trained-model export, and index tuning.
Data maintenance and access: The APIs cover batch removal, mark/force removal, ID/vector/attribute updates, source IDs, extra_info, index-detail reads, and CalcDistanceById.
Missed-recall diagnosis: SearchRequest::expected_labels_ helps HGraph, IVF, and BruteForce explain why expected vectors were not recalled. The result Dataset carries the reasoning report (PR #1838).
Statistics and capacity planning: Search, I/O, memory, and index-specific statistics expose runtime state, while memory estimates, index introspection, and analysis tools support capacity planning and troubleshooting.

Support varies by index; Index::CheckFeature(IndexFeature) reports capabilities represented by IndexFeature.

Serialization and compatibility

VSAG 1.0 maintains two serialization families in parallel. The legacy Serialize/Deserialize APIs remain maintained for compatibility with existing integrations. The newer streaming serialization APIs use a header-first, forward-only format and extend the interface with the following capabilities:

SerializeStreaming writes metadata followed by typed TLV blocks.
DeserializeStreaming restores into an existing compatible, empty index object.
Index::Load reads metadata, creates the matching index, and applies supported placement policy.
The v1.0 streaming path supports BruteForce, HGraph, IVF, mutable SINDI, and Pyramid. Streaming serialization of immutable SINDI indexes is not supported in v1.0.

Both API families remain available, but their formats are not compatible; a file must be read by the matching family. Existing integrations can continue to use the legacy APIs, while new integrations should prefer the streaming APIs. See New Serialization for format and block-version details.
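To illustrate what a header-first, forward-only TLV stream means in general, here is a minimal sketch. The layout (a uint32 type tag followed by a uint64 length), the type numbers, and the payloads are all hypothetical and do not describe VSAG's actual on-disk format.

```python
import io
import struct

# Hypothetical block header: little-endian uint32 type tag + uint64 payload length.
HEADER = struct.Struct("<IQ")

def write_block(stream, block_type, payload):
    stream.write(HEADER.pack(block_type, len(payload)))
    stream.write(payload)

def read_blocks(stream):
    """Yield (type, payload) pairs by reading strictly forward, with no seeking."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return
        if len(header) != HEADER.size:
            raise ValueError("truncated block header")
        block_type, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated block payload")
        yield block_type, payload

buffer = io.BytesIO()
write_block(buffer, 1, b'{"index_type": "hgraph"}')  # metadata goes first
write_block(buffer, 2, b"\x00\x01\x02\x03")          # then typed data blocks
buffer.seek(0)
blocks = list(read_blocks(buffer))
print(blocks)
```

Because each block declares its own type and length, a reader can consume the stream from a pipe or socket and skip block types it does not understand.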

Cross-version index fixtures and the compatibility check tool provide a repeatable way to validate old artifacts before an upgrade.

Platforms, bindings, and tooling

The VSAG core C++ library supports Linux and macOS, with most development and the full validation pipeline centered on Linux. Linux x86_64 and AArch64 are both covered by CI, while macOS validation currently targets arm64 builds (source build, PR CI). Prebuilt C++ release archives are currently limited to Linux x86_64.
The Python bindings are packaged as pyvsag; v1.0.0 declares support for CPython 3.6-3.14 and configures wheel builds for that range. Its build uses native CMake integration (PR #1599); the bindings also provide broader index operations, FP16/BF16 inputs, sparse-vector support, and sparse HDF5 helpers.
VSAG now includes a C API and Node.js/TypeScript bindings with quickstart examples. Language bindings are released independently; check the corresponding package version before use.
Builds support system OpenBLAS/fmt dependencies, custom dependency mirrors, and installable CMake package metadata.
eval_performance supports dense, sparse, and multi-vector datasets. analyze_index, check_compatibility, visualize_index, and the HTTP monitor support index analysis, compatibility testing, serialization inspection, and monitoring.

Reliability and validation

Functional and regression tests cover allocation, leaks, out-of-memory paths, and concurrent build, insertion, search, update, removal, and destruction. CI reinforces these checks with ASan for memory safety and TSan for data races. Compatibility fixtures validate legacy-index upgrade paths.

Compatibility and Upgrade Notes from v0.18

VSAG 1.0 is a major release and includes source-level API changes. Review these points before upgrading:

Remove returns a count and supports batches. The v0.18 method tl::expected<bool, Error> Remove(int64_t) now returns tl::expected<uint32_t, Error>. v1.0 also adds a vector overload and explicit remove modes (PR #1551). The final API exposes RemoveMode::MARK_REMOVE and RemoveMode::FORCE_REMOVE; HGraph force removal landed in PR #1810.
Unsupported operations usually return an error. Many default methods that return tl::expected now return tl::unexpected with ErrorType::UNSUPPORTED_INDEX_OPERATION instead of throwing std::runtime_error (PR #2141). Check the tl::expected result before calling .value().
Memory-statistics signatures changed. GetMemoryUsage uses uint64_t, GetMemoryUsageDetail returns a std::unordered_map<std::string, uint64_t>, and GetEstimateBuildMemory became EstimateBuildMemory (PR #2388).
Search migration can be incremental. Prefer SearchRequest / SearchWithRequest for new integration work, but the existing search overloads remain in v1.0.
Do not mix serialization families. Legacy output must use legacy deserialization; streaming output must use DeserializeStreaming or Index::Load.
SINDI heap insertion is automatic. The legacy use_term_lists_heap_insert search parameter is ignored. SINDI derives the strategy from doc_prune_ratio and query_prune_ratio; update configurations that forced the old path.
Intel MKL is opt-in. The default is OFF. Enable it with VSAG_ENABLE_INTEL_MKL=ON through the Makefile or -DENABLE_INTEL_MKL=ON through CMake when required.

For persisted indexes, validate the exact source and target versions in staging. Serialization compatibility can differ by index, feature flags, and format family.

The v0.x Journey to 1.0

v0.11.0 was VSAG’s first formal release after the project was open-sourced. Earlier version numbers were used only for internal iterations and were not published as GitHub Releases, so this history begins with v0.11.

Foundations: v0.11-v0.14

v0.11, September 2024: established the initial HNSW/DiskANN, C++/Python, pre-filter, cosine-distance, locking, and serialization baseline.
v0.12, December 2024: introduced DataCell, I/O, and graph abstractions; HGraph; SQ4/SQ8/INT8 paths; the Engine/factory model; and pyvsag packaging.
v0.13, February 2025: added BruteForce, expanded Pyramid, memory estimation, index feature discovery, filter hints, and the eval_performance tool.
v0.14, April 2025: introduced IVF, FP16/BF16 and RaBitQ support, async/buffer I/O, sparse datasets, HGraph extra_info, iterator filtering, and systematic compatibility checks.

Expansion: v0.15-v0.18

v0.15, June 2025: added Train/Clone/ExportModel, PQ/PQ FastScan, attribute expressions, compressed graphs, HGraph merge and mark-delete, and self-describing legacy serialization with compatibility CI.
v0.16, August 2025: added mmap HGraph, SINDI, parallel IVF, attribute updates, raw-vector access, parameter compatibility checks, and numerous ABI, concurrency, and legacy-index fixes in subsequent patches.
v0.17, October 2025: expanded SearchRequest to cover the main search cases and added search timeouts, broader extra_info, Transform Quantizer support, single-query parallel HGraph search, export APIs, and richer SINDI lifecycle and statistics support.
v0.18, January 2026: added the C API, automated Python wheels, sparse-vector Python bindings, on-disk IVF, index detail/search/I/O statistics, MRL-E and HGraph tuning, extended RaBitQ, and additional SINDI and Pyramid capabilities.

See the v0.11.0…v1.0.0 comparison for the complete commit history.

v1.0 Patch Releases

v1.0.0 — July 12, 2026: first long-term support major release.

Future v1.0.x patches will be added to this section. Their complete PR list and contributor credits remain on GitHub Releases.

Acknowledgments

VSAG 1.0 is the result of work by the Ant Group VSAG team and the wider open-source community. Thank you to every contributor who designed algorithms, implemented features, reported issues, reviewed changes, improved tests, and wrote documentation.

See the contributors page and the official release for contributor details.

Roadmap 2025

This historical roadmap led to VSAG 1.0. v1.0.0 was released on July 12, 2026. See the VSAG 1.0 Release Notes for the delivered result.

As AI capabilities keep advancing and strong open-source LLMs become widespread, demand for unstructured-data retrieval has exploded. Vector algorithms are the cornerstone of unstructured retrieval, and the VSAG community will keep investing in algorithmic research to help partners improve retrieval performance, reduce latency, and cut costs.

The roadmap defined the first major release as follows:

VSAG 1.0 would provide comprehensive support for both graph-based and inverted-index structures, as well as in-memory and memory-plus-disk hybrid retrieval modes, delivering low memory cost and outstanding search performance.

Planned algorithms and features:

Support for common data types to cover diverse unstructured retrieval scenarios
- FP32 vectors: mainstream retrieval scenarios
- INT8, BF16, FP16 vectors: adapt to quantized embedding models without extra storage overhead
- Sparse vectors: extending text-retrieval workloads
Fully optimized core index types covering the majority of retrieval scenarios
- Graph index HGraph: high precision and low latency
- Inverted index IVF: large K and batch query workloads
Rich quantization options for the memory/recall trade-off
- RaBitQ (BQ): ultra-high compression with minimal memory
- PQ: flexible compression ratios for accuracy-tolerant scenarios
- SQ4, SQ8: standard quantization with minor recall loss and large memory/perf gains
Multi-platform instruction support to simplify distribution
- x86_64: SSE, AVX, AVX2, AVX-512
- ARM: NEON, SVE
- Optional matrix-multiplication libraries: Intel MKL, OpenBLAS
Resource isolation and fine-grained runtime configurability
- Memory: per-index allocators, enabling tenant-level memory management
- CPU: injectable thread pools to boost write and search throughput

Beyond these, there is much more we want to discuss, design, and build in the open-source community — follow the VSAG project to stay up to date!

Community

VSAG is open-sourced by Ant Group and is actively maintained on GitHub. Developers, researchers, and users are all welcome to join the community.

Channels

GitHub Issues — bug reports, feature requests, design discussions. https://github.com/antgroup/vsag/issues
GitHub Discussions (when enabled) — long-running topics, Q&A, best practices.
Pull Requests — every code, doc, or example change goes through a PR. See Contributing to VSAG.
DingTalk / WeChat groups — if announced by the community, the latest invite links are pinned at the top of the repository README.

Governance

A maintainer team owns code review, releases, and the roadmap.
Every PR requires at least one approving review plus the required CI checks.
Every PR must carry both a kind/* label and a version/* label (enforced by Mergify). See the contributors’ guide.

Ways to Contribute

More than just code:

Docs — fix typos, add examples, translate pages.
Examples — contribute end-to-end demos to examples/cpp/ or examples/python/.
Benchmarks — share results on new hardware or datasets, extending the reference performance page.
Ecosystem integrations — write bindings or adapters for other languages / databases.
Articles — guest posts are welcome under docs/blog/ (see the repository README).

Code of Conduct

The community follows the Contributor Covenant Code of Conduct. Please participate constructively and respectfully.

See Related Projects.

Filing Issues with an AI Agent

You can use an AI coding agent (Claude Code, OpenCode, or Codex) together with the VSAG repository’s built-in /create-issue slash command to draft and submit a high-quality GitHub issue for VSAG. The agent maps your request onto the project’s issue templates, fills in the required fields, and submits the issue through GitHub CLI.

This page walks through the end-to-end setup. The canonical workflow that the agent itself follows lives in .github/agent-prompts/create-issue.md; this page focuses on the user-facing steps.

Prerequisites

A GitHub account.
One of the supported AI coding agents installed and configured locally: Claude Code, OpenCode, or Codex.
git available on your machine.

1. Install the GitHub CLI

Install gh by following the official quickstart for your platform:

https://docs.github.com/en/github-cli/github-cli/quickstart

2. Sign in to GitHub

Sign in from your terminal:

gh auth login

Choose GitHub.com, pick an authentication protocol (HTTPS is fine), and follow the browser prompts to complete sign-in. Then check the result:

gh auth status

Confirm that GitHub.com authentication is active before continuing.

3. Clone the VSAG repository

git clone https://github.com/antgroup/vsag.git
cd vsag

The /create-issue command and its prompt files live inside the repository, so the agent must be launched from within the vsag/ working directory to pick them up.

4. Launch your agent inside the repo

From the vsag/ directory, start one of the supported agents:

# Claude Code
claude

# OpenCode
opencode

# Codex CLI
codex

5. Run `/create-issue`

In the agent prompt, invoke the slash command and describe your need in natural language. For example:

/create-issue HGraph build crashes when dim=0; want a clear error instead.

The agent will:

Pick the most appropriate template under .github/ISSUE_TEMPLATE/.
Ask follow-up questions if required fields are missing.
Draft the issue body with code/doc references in path:line form.
Show you the final draft for confirmation.
Submit the issue via gh issue create once you approve.

You can iterate with the agent freely — ask it to revise wording, add reproduction steps, switch templates, or attach logs before it submits.

Tips

Be specific: include the index type, parameters, dataset shape, error message, and platform when filing a bug.
For feature requests, describe the use case and the expected API or behavior. The agent will mirror this into the template’s required fields.
Issues do not carry Signed-off-by: — DCO applies only to commits.
If you prefer to drive the workflow without an interactive agent, see the shell wrapper at tools/issue-helper/new-issue.sh.

Projects Using VSAG

OceanBase — Ant Group’s open-source distributed relational database; its vector search is powered by VSAG.
Other vector databases / integrations — if you maintain an integration, feel free to open a PR to list it here.

Dependencies and Inspirations

Faiss — Meta’s vector search library; VSAG borrows ideas in IVF and quantization.
SPANN / SPTAG — Microsoft’s large-scale retrieval system; an inspiration for VSAG’s large-scale search design.

Ecosystem Tooling

ann-benchmarks — the de-facto ANN benchmark harness; VSAG’s performance evaluation tool is compatible with its dataset format.
pybind11 — powers the pyvsag Python binding.
napi-rs — powers the Node.js binding under typescript/.

Bindings / Language Support

C++ (native)
Python — pyvsag, source under python_bindings/ and python/.
Node.js / TypeScript — source under typescript/, npm package name vsag.

Pull requests to extend this list are welcome.

Research Papers

1. Effective and General Distance Computation for Approximate Nearest Neighbor Search [ICDE’25]

Approximate K-nearest-neighbor (AKNN) search in high-dimensional spaces is a key and challenging problem. Distance computation dominates AKNN runtime, and existing approaches rely on approximate distances to gain efficiency, usually at the cost of accuracy. The state-of-the-art ADSampling uses random projection to estimate distances and a correction step to mitigate accuracy loss, but is limited in both effectiveness and generality because both steps depend on random projection. This work improves distance computation by using data-aware orthogonal projections and a data-driven correction procedure decoupled from the approximation step. Extensive experiments show 1.6×–2.1× speedups over ADSampling on real-world datasets with higher accuracy.

Integrated into VSAG under the name BSA; used to reduce the amount of high-precision re-ranking data inside disk-based indexes.

2. VSAG: An Optimized Search Framework for Graph-based Approximate Nearest Neighbor Search [VLDB’25]

Approximate nearest-neighbor search (ANNS) is foundational to vector databases and AI infrastructure. Recent graph-based ANNS algorithms deliver both high accuracy and practical efficiency, but production performance is still limited by random memory access patterns and expensive distance computations. Moreover, graph-based ANNS is highly parameter-sensitive, and finding optimal parameters traditionally requires repeatedly rebuilding the index. This paper introduces VSAG, an open-source framework that targets these issues in production. VSAG is widely deployed across Ant Group services and combines three key optimizations: (i) efficient memory access via prefetching and cache-friendly vector layout to reduce L3 misses; (ii) automatic parameter tuning without rebuilding the index; and (iii) efficient distance computation leveraging modern hardware, scalar quantization, and low-precision fallbacks. On real-world datasets VSAG matches or exceeds state-of-the-art accuracy while achieving up to 4× higher throughput than HNSWlib.

Integrated into VSAG; enabled through the Tune API (historically called the “ELP Optimizer” and implemented behind the use_elp_optimizer key).

3. EnhanceGraph: A Continuously Enhanced Graph-based Index for High-dimensional Approximate Nearest Neighbor Search [arXiv]

Driven by rapid progress in deep learning, high-dimensional ANNS has received growing attention. We observe that graph-based indexes generate large amounts of search and construction logs over their lifetime, but static indexes fail to exploit these valuable signals. This paper proposes EnhanceGraph, a framework that folds both log types into a novel structure called a conjugate graph to improve search quality. Guided by theoretical analysis and observations of the limitations of graph-based indexes, we propose several optimizations: for search logs, the conjugate graph stores edges from local optima to the global optimum to strengthen routing; for construction logs, it stores edges pruned from the proximity graph to improve k-NN recall. Experiments on public and real industrial datasets show EnhanceGraph significantly improves accuracy without sacrificing search efficiency, with recall improving from 41.74% to 93.42%. EnhanceGraph has been integrated into VSAG.

Available in VSAG via the use_conjugate_graph parameter.

4. SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors [arXiv]

Maximum inner product search (MIPS) on sparse vectors is critical for multi-way retrieval used in retrieval-augmented generation (RAG). Recent inverted-index and graph-based algorithms combine high accuracy with practical efficiency, but production performance is often limited by redundant distance computations and frequent random memory accesses. Furthermore, the compressed storage format of sparse vectors makes it hard to take advantage of SIMD acceleration. This paper presents the Sparse Inverted Non-redundant Distance Index (SINDI), which combines three key optimizations: (i) efficient inner-product computation that uses SIMD acceleration and eliminates redundant identifier lookups for batched computations; (ii) memory-friendly design that replaces random access on raw vectors with sequential access on inverted lists, greatly reducing memory-access latency; and (iii) vector pruning that keeps only the non-zero entries with larger magnitude, so query throughput improves while accuracy is preserved. On real-world datasets SINDI is state-of-the-art across scales, languages, and models. On MS MARCO, for Recall@50 above 99%, SINDI delivers 4.2×–26.4× higher single-thread QPS than SEISMIC and PyANNs. SINDI has been integrated into VSAG.

SINDI is an index type inside VSAG.

Contributors

The following is the list of VSAG contributors (updated 2026-06-09), ordered by the date of their first contribution:

2024-07-26 Xiangyu Wang (wxyucs) from AntGroup
2024-08-21 Jiabao Jin (inabao) from AntGroup
2024-08-30 👑 Haotian Li (LHT129) from AntGroup
2024-09-04 Xiaoyao Zhong (ShawnShawnYou) from AntGroup
2024-10-23 Jiacai Liu (jiacai2050)
2024-10-28 Coien-rr
2024-12-16 Mingyu Yang (mingyu-hkustgz) from HKUST(GZ)
2025-01-24 Carrot-77 from OceanBase
2025-03-05 Deming Chu (nedchu) from AntGroup
2025-03-25 Liyao Xiong (lyxiong0) from OceanBase
2025-04-15 skylhd from OceanBase
2025-04-23 azl (shadowao) from OceanBase
2025-06-10 dasurax from AntGroup
2025-06-12 L J. Yun (yulijunzj) from AntGroup
2025-06-13 Danbaiwq from OceanBase
2025-06-17 jingyueob from OceanBase
2025-07-28 jac (jac0626)
2025-07-29 mly (mly5269)
2025-08-25 HuMing He (HeHuMing)
2025-10-22 cubicc from ByteDance
2025-10-29 Roxanne0321 from AntGroup
2025-11-12 baoyuan (misaka0714)
2025-11-20 Zihao Wang (hhy3)
2025-12-19 Xinger (Ningsir) from OceanBase
2026-02-05 stuBirdFly from OceanBase
2026-04-08 Sun Jiayu (pkusunjy) from AntGroup
2026-04-20 wei (jiaweizone) from AntGroup
2026-04-20 XFMENG17 from AntGroup
2026-05-07 liric24 from AntGroup
2026-05-14 LightWant from AntGroup
2026-06-09 mukejane
2026-06-09 Jiangtian Feng (jfeng18) from Alibaba
