HGraph

HGraph is VSAG’s flagship graph-based index. It builds a hierarchical proximity graph and offers a rich set of quantization options, a unified build-parameter schema (index_param), and first-class support for reordering, incremental updates, deletion, and ELP-based runtime tuning.

For most dense-vector workloads (text / image / multimodal embeddings, 64–4096 dims, from a few thousand up to hundreds of millions of points), HGraph is the recommended default.

Source: src/algorithm/hgraph.{h,cpp}
Example: examples/cpp/103_index_hgraph.cpp

How it works

Graph construction. Vectors are organised in a layered proximity graph; upper layers act as navigation aids, the bottom layer connects every data point to its nearest neighbours within a max_degree budget. The construction algorithm can be either NSW-style insertion (graph_type: "nsw", the default) or ODescent (graph_type: "odescent").
Quantization. The base storage is compressed with a configurable quantizer (base_quantization_type — fp32, fp16, bf16, sq8, sq4, sq8_uniform, sq4_uniform, pq, pqfs, rabitq, tq). Optionally, a second high-precision copy is kept (use_reorder: true with precise_quantization_type) and used to re-rank the candidates returned by the coarse search.
Search. Greedy beam search traverses the graph top-down, expanding the current frontier up to ef_search candidates. When reordering is enabled, the final list is re-scored against the precise representation.

Quick start

#include <vsag/vsag.h>

std::string params = R"({
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "sq8",
        "max_degree": 32,
        "ef_construction": 400
    }
})";
auto index = vsag::Factory::CreateIndex("hgraph", params).value();

// Build.
auto base = vsag::Dataset::Make();
base->NumElements(n)->Dim(128)->Ids(ids)->Float32Vectors(data)->Owner(false);
index->Build(base);

// Search.
auto query = vsag::Dataset::Make();
query->NumElements(1)->Dim(128)->Float32Vectors(q)->Owner(false);
auto result = index->KnnSearch(
    query, /*topk=*/10, R"({"hgraph": {"ef_search": 100}})").value();

Build parameters

Build-time parameters live under index_param. The table below highlights the keys most users need; the exhaustive list is in Index Parameters.

Parameter	Type	Default	Description
`base_quantization_type`	string	— (required)	`fp32`, `fp16`, `bf16`, `sq8`, `sq4`, `sq8_uniform`, `sq4_uniform`, `pq`, `pqfs`, `rabitq`, `tq` — see the Quantization chapter for per-quantizer details
`max_degree`	int	`64`	Maximum out-degree per graph node
`ef_construction`	int	`400`	Candidate list size during build (higher = better recall, slower build)
`graph_type`	string	`"nsw"`	Graph algorithm: `nsw` or `odescent`
`use_reorder`	bool	`false`	Keep a high-precision copy and re-rank after the coarse search
`precise_quantization_type`	string	`"fp32"`	Quantizer used for reordering (takes effect only with `use_reorder: true`)
`base_pq_dim`	int	`1`	Number of PQ subspaces. When using `pq` / `pqfs`, set this explicitly instead of relying on the default.
`build_thread_count`	int	`100`	Threads used to parallelise build
`support_duplicate`	bool	`false`	Enable duplicate-ID detection on insert
`deduplicate_storage`	bool	`false`	Share vector storage between duplicates; requires `support_duplicate: true`
`duplicate_distance_threshold`	float	`0.0`	Duplicate-detection distance threshold. When greater than `0`, deduplicate by the nearest candidate distance; when `0`, fall back to the current code `memcmp` check
`support_remove`	bool	`false`	Enable graph delete-tracking metadata used by mark-remove recovery paths
`support_force_remove`	bool	`false`	Enable `RemoveMode::FORCE_REMOVE` and its extra synchronization on the built index
`store_raw_vector`	bool	`false`	Keep the raw vector in addition to the quantized copy (useful for `cosine`)
`use_elp_optimizer`	bool	`false`	Auto-tune search parameters after build
`base_io_type` / `precise_io_type`	string	`"block_memory_io"`	Storage backend (`memory_io`, `block_memory_io`, `buffer_io`, `async_io`, `mmap_io`)
`base_file_path` / `precise_file_path`	string	—	File path; required when the corresponding `*_io_type` is disk-backed (`buffer_io`, `async_io`, `mmap_io`)
`hgraph_init_capacity`	int	`100`	Initial capacity hint (doesn’t cap the final size)

Deduplicating vector storage

Set both support_duplicate: true and deduplicate_storage: true to let duplicate vectors share one physical code slot while retaining their individual labels. This option currently supports only dense-vector HGraph indexes using graph_type: "nsw"; it is not available for the separate HNSW index or for graph_type: "odescent".

The following operations and configurations are not supported while storage deduplication is enabled:

force removal (support_force_remove: true);
cache-assisted build after ImportCache();
Merge;
legacy v0.14 serialization.

UpdateVector is supported only for IDs whose vector storage is not shared with another duplicate-group member.

Current serialization and streaming serialization are supported.

Supported input data types

The dtype field in the top-level build config selects how Dataset interprets the raw vector bytes. HGraph supports four input types; the dtype value, the corresponding Dataset setter, and the example demonstrating each combination are summarised below.

`dtype`	Element type	`Dataset` setter	Example
`float32`	`float`	`Float32Vectors`	`103_index_hgraph.cpp`
`int8`	`int8_t`	`Int8Vectors`	`316_index_int8_hgraph.cpp`
`float16`	`uint16_t` (IEEE 754 binary16, bit-pattern packed)	`Float16Vectors`	`321_index_fp16_hgraph.cpp`
`bfloat16`	`uint16_t` (Brain Float, bit-pattern packed)	`Float16Vectors` (shared with FP16)	adapt `321_index_fp16_hgraph.cpp` per the notes below

The dim value is the logical vector dimensionality (number of elements), not the byte length, so the same dim is reused across all four data types.

`int8` input

Quantized int8 vectors are passed directly via Int8Vectors:

std::vector<int8_t> data(num_vectors * dim);  // populate with int8 elements
auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)->Dim(dim)->Ids(ids)
    ->Int8Vectors(data.data())->Owner(false);

Build config (note dtype: "int8"):

{
    "dtype": "int8",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "pq",
        "max_degree": 26,
        "ef_construction": 100,
        "alpha": 1.2
    }
}

Queries use the same Int8Vectors setter and the same dtype. A runnable example is 316_index_int8_hgraph.cpp.

`float16` / `bfloat16` input

FP16 and BF16 vectors are both passed through Float16Vectors, which takes a const uint16_t* that points at the 16-bit storage of each element. Conversion from float is up to the caller; inside the VSAG source tree there are convenience helpers (vsag::generic::FloatToFP16 in src/simd/fp16_simd.h and vsag::generic::FloatToBF16 in src/simd/bf16_simd.h), but these are internal headers that are not installed under include/vsag/. Application code linking against an installed VSAG library should provide its own conversion (for example, copy the small helper, use _cvtss_sh / F16C intrinsics, or any FP16 library of choice). The snippet below uses the in-tree helper for brevity:

// The fp16/bf16 helpers below live in src/simd/ and are not part of the public
// installed headers. Replace with your own float -> uint16_t conversion when
// linking against an installed VSAG.
#include "simd/fp16_simd.h"  // FloatToFP16 (for BF16, use simd/bf16_simd.h / FloatToBF16)

std::vector<uint16_t> data(num_vectors * dim);
for (size_t i = 0; i < data.size(); ++i) {
    data[i] = vsag::generic::FloatToFP16(some_float_source());
}
auto base = vsag::Dataset::Make();
base->NumElements(num_vectors)->Dim(dim)->Ids(ids)
    ->Float16Vectors(data.data())->Owner(false);

Build config:

{
    "dtype": "float16",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "pq",
        "max_degree": 26,
        "ef_construction": 100,
        "alpha": 1.2
    }
}

To switch the example to BF16, change dtype to "bfloat16" and replace FloatToFP16 with FloatToBF16; the Float16Vectors setter and the rest of the build/search flow stay the same. A runnable FP16 example is 321_index_fp16_hgraph.cpp.

Note. The header comment at the top of 321_index_fp16_hgraph.cpp currently mentions a BFloat16Vectors() setter, but no such setter exists — Float16Vectors is the single entry point for both FP16 and BF16. Use it for both dtype: "float16" and dtype: "bfloat16".

Choosing an input type

Pick float32 when accuracy matters most and memory budget allows; this is the default.
Pick float16 / bfloat16 to halve the input storage. FP16 has a smaller exponent range; BF16 has fewer mantissa bits but the same exponent range as FP32, which is often preferable for embedding-style vectors.
Pick int8 when your data is already integer-quantised (e.g. produced by an upstream quantiser or by a model with int8 outputs). With int8 input you typically still combine a coarse quantizer such as pq / sq8 for the in-index storage.

The chosen dtype only constrains the input representation. The on-disk / in-memory storage is still controlled by base_quantization_type (and optionally precise_quantization_type when use_reorder: true), so e.g. dtype: "float16" + base_quantization_type: "sq8" is valid.

Search parameters

Search-time parameters live under the hgraph sub-object:

Parameter	Type	Default	Description
`ef_search`	int	— (required)	Size of the search frontier. Larger = higher recall, slower query.
`hops_limit`	int	unlimited	Hard cap on the number of hops the beam search performs before returning the current frontier.
`skip_ratio`	float	`0.2`	Performance tuning parameter for filtered search. Controls the ratio of invalid points to skip, in range `[0.0, 1.0]`. `skip_ratio=0.2` means skip 20% of invalid points and only check 80%. Higher values improve performance but may reduce recall. Only applies to searches with filters. See Filter Skip Strategy below.
`skip_strategy`	string	`"deterministic_accumulative"`	Strategy for filter skipping. Options: `"random"` (random skipping) or `"deterministic_accumulative"` (deterministic cumulative skipping). See Filter Skip Strategy below.
`brute_force_threshold`	float	`0.0`	Selectivity-aware brute-force fallback. When `> 0` and the supplied filter’s `ValidRatio()` is `≤ brute_force_threshold`, the search bypasses the graph traversal entirely and runs an exact scan over the valid ids using the best available flatten codes (see the section below). Must lie in `[0.0, 1.0]`; the default `0.0` disables the feature and preserves legacy behavior.
`rabitq_one_bit_search`	bool	`false`	Enables the RaBitQ filter/lower-bound path. On an x+y split index it uses all x filter bits; see RaBitQ x+y Split.
`rabitq_error_rate`	float	index default	Positive lower-bound error multiplier for this search. It can be tuned without rebuilding the split index.

auto result = index->KnnSearch(
    query, topk, R"({"hgraph": {"ef_search": 200}})").value();

Brute-force fallback under highly selective filters (`brute_force_threshold`)

Graph traversal is the right strategy when most candidates pass the filter — the graph quickly reaches the neighborhood of the query. As filter selectivity increases (only a tiny fraction of vectors survive), the beam has to expand far more nodes just to fill ef_search with valid candidates, and recall drops. At some point an exhaustive scan over the surviving ids is both faster and exact.

brute_force_threshold lets HGraph make that switch automatically on a per-query basis:

// When the active filter keeps ≤ 1% of ids, run an exact scan instead.
auto params = R"({"hgraph": {"ef_search": 200, "brute_force_threshold": 0.01}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

How it works (src/algorithm/hgraph/hgraph_search.cpp):

The fallback only fires when all of the following hold:
- brute_force_threshold > 0.0, and
- a filter is supplied, and
- filter->ValidRatio() <= brute_force_threshold.
The accuracy of Filter::ValidRatio() matters — it is the user-supplied hint the dispatcher checks against the threshold. See Filtered Search for the API contract.
The scan iterates every valid inner id and computes distances in batches of 64 using the most precise flatten storage available (raw vectors if store_raw_vector was set, otherwise the high-precision reorder codes when use_reorder=true, otherwise the base quantized codes).
Because the scan already uses precise codes when present, the post-search reorder pass is skipped for queries that took the brute-force branch.
Applies to KnnSearch (the non-iterator overload, which is what SearchWithRequest and the standard KnnSearch(query, k, params, filter) call) and to RangeSearch. It does not apply to the iterator-style KnnSearch(..., IteratorContext*&, ...), because a single sweep cannot be paged across multiple iterator calls.

Picking a value:

Leave at 0.0 (default) for unfiltered or weakly filtered workloads.
For highly selective filters, 0.01–0.05 is a reasonable starting point. Setting it higher than that effectively turns the index into a brute-force scanner whenever a filter is present.
The cost of the brute-force scan is roughly O(N × dim) where N is the total number of indexed vectors (regardless of selectivity, because every id is visited to check CheckValid). The benefit grows when graph search would otherwise need a much larger ef_search to recover recall.

See 322_feature_hgraph_brute_force_threshold.cpp for a runnable brute-force fallback example.

Filter Skip Strategy (skip_ratio and skip_strategy)

When searching with a filter, HGraph needs to frequently call Filter::CheckValid() during graph traversal to verify whether each candidate point is valid. This check can be expensive (especially for complex filter logic). skip_ratio and skip_strategy provide a probabilistic optimization: they skip some filter checks to speed up the search, but may reduce recall.

How It Works

This is a probabilistic optimization strategy: we don’t know in advance which points are valid, so we decide probabilistically whether to visit each point.

skip_ratio (default 0.2): Controls the aggressiveness of skipping filter checks. skip_ratio=0.2 means skip 20% of candidate checks and only check 80%. Higher values skip more, making search faster but potentially reducing recall.
skip_strategy (default “deterministic_accumulative”): Determines how skipping is distributed:
- “random”: Random skipping. Each point is visited independently with probability visit_ratio = valid_ratio + (1 - valid_ratio) * (1 - skip_ratio), so roughly a 1 - skip_ratio fraction of invalid points are skipped.
- “deterministic_accumulative”: Deterministic cumulative skipping. Emits visit decisions at fixed intervals so that the long-run visit ratio matches the target visit_ratio, with lower variance than the random strategy.

The specific formula:

Let valid_ratio be the filter’s global validity rate (from Filter::ValidRatio())
Probability of visiting each point = valid_ratio + (1 - valid_ratio) * (1 - skip_ratio)
In expectation, this targets skipping about skip_ratio of invalid candidate checks when Filter::ValidRatio() is accurate

Usage Examples

// Conservative setting: skip 10% of invalid candidate checks, suitable for high-recall
// scenarios where latency is less critical
auto params = R"({"hgraph": {"ef_search": 200, "skip_ratio": 0.1}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

// Use random strategy
auto params = R"({"hgraph": {"ef_search": 200, "skip_ratio": 0.2, "skip_strategy": "random"}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

// Aggressive skipping: skip 50% of invalid candidate checks for lower latency
auto params = R"({"hgraph": {"ef_search": 200, "skip_ratio": 0.5}})";
auto result = index->KnnSearch(query, topk, params, my_filter).value();

Choosing Values

Default 0.2: Suitable for most scenarios, balancing performance and recall.
0.1 or lower: Conservative setting, suitable for scenarios with high recall requirements where latency is less critical.
0.5 or higher: Aggressive skipping, suitable for latency-sensitive scenarios where recall degradation is acceptable (e.g., real-time recommendation systems).
0.0: Don’t skip any points, equivalent to disabling this optimization (all points will be checked).

Important notes:

Only applies to searches with filters. These parameters are ignored when no filter is present.
Performance optimization works better when Filter::ValidRatio() is accurately estimated.
Can be used together with brute_force_threshold: when the filter is very strict (ValidRatio is very small), brute_force_threshold will trigger brute-force fallback; otherwise, graph traversal + skip optimization is used.

When to use HGraph

Dense float vectors with dimensions roughly between 64 and 4096.
Latency-sensitive queries where high recall matters.
Mixed workloads with incremental insertion (optionally force removal via support_force_remove).
Memory-constrained deployments that benefit from sq8 / sq4_uniform / pq — often in combination with use_reorder to recover recall.

If your workload is partition-heavy (coarse-grained buckets scanned per query) or strongly I/O-bound on a SSD, compare against IVF before committing to HGraph.

VSAG Documentation