# Memory Management

## Overview

Endee does not keep every index resident all the time. `IndexManager` keeps a bounded set of
live indices in memory in `indices_`, and uses `indices_list_` to choose eviction candidates when
that live set is full.

When an index is created or loaded on demand, `ensureLiveIndexCapacity()` may call
`evictIfNeeded()` before admitting the new live index.
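
A minimal sketch of that admission path, assuming a plausible shape for `IndexManager`. Only
`ensureLiveIndexCapacity()`, `evictIfNeeded()`, `indices_`, `indices_list_`, and the value 255
come from this document; the container types and everything else are illustrative:

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a live index; the real entry owns the HNSW
// graph, IDMapper, VectorStorage, and WAL listed in the next section.
struct LiveIndex {};

class IndexManager {
public:
    // Called before createIndex()/getIndexEntry() admits a new live index.
    void ensureLiveIndexCapacity() {
        if (indices_.size() >= kMaxLiveIndices) {
            evictIfNeeded();  // free one slot before admission
        }
    }

private:
    static constexpr std::size_t kMaxLiveIndices = 255;  // MAX_LIVE_INDICES

    // Trimmed here; the full steps are in "How Eviction Works Today".
    void evictIfNeeded() {
        if (indices_list_.empty()) return;
        indices_.erase(indices_list_.back());  // drop the eviction candidate
        indices_list_.pop_back();
    }

    // Live set, plus the list used to pick eviction candidates.
    std::unordered_map<std::string, std::shared_ptr<LiveIndex>> indices_;
    std::list<std::string> indices_list_;
};
```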

In the current implementation, a live index consists of the following (an ownership sketch
follows the list):

- the in-memory `HierarchicalNSW<float>` graph
- `IDMapper`
- `VectorStorage`, which itself owns separate MDBX-backed stores for vectors, metadata, and filters
- optional `SparseVectorStorage`
- the per-index WAL object
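
To make that ownership explicit, a hypothetical aggregate; the placeholder types and member
names are illustrative, not Endee's real declarations:

```cpp
#include <memory>

// Empty placeholders for the real types, just so this sketch is self-contained.
struct HierarchicalNSWFloat {};  // stands in for hnswlib's HierarchicalNSW<float>
struct IDMapper {};
struct VectorStorage {};         // owns the vector, metadata, and filter MDBX stores
struct SparseVectorStorage {};
struct WriteAheadLog {};         // stands in for the per-index WAL object

// Illustrative layout of one live index.
struct LiveIndexEntry {
    std::unique_ptr<HierarchicalNSWFloat> graph;   // in-memory HNSW graph
    std::unique_ptr<IDMapper> idMapper;
    std::unique_ptr<VectorStorage> storage;
    std::unique_ptr<SparseVectorStorage> sparse;   // nullptr when not enabled
    std::unique_ptr<WriteAheadLog> wal;            // per-index WAL
};
```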

## What Actually Uses DRAM

The dominant in-memory cost of a live dense index is the HNSW structure (a rough estimator
follows the list):

- the base layer, allocated as `maxElements * sizeDataAtBaseLayer_`
- upper-layer node storage in `dataUpperLayer_`
- the vector cache, sized from `VECTOR_CACHE_PERCENTAGE` and `VECTOR_CACHE_MIN_BITS`
- the visited-list pool and other small bookkeeping structures
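
A back-of-the-envelope estimator for the items above. Only the base-layer term
(`maxElements * sizeDataAtBaseLayer_`) matches an allocation named in this document; the
upper-layer and cache terms are loudly assumed shapes, not the real accounting:

```cpp
#include <cstddef>
#include <cstdio>

// Rough DRAM estimate for one live dense index (assumed formula, see above).
std::size_t estimateHnswDramBytes(std::size_t maxElements,
                                  std::size_t sizeDataAtBaseLayer,
                                  std::size_t amortizedUpperLayerBytes,
                                  double vectorCachePercent,
                                  std::size_t bytesPerVector) {
    std::size_t base = maxElements * sizeDataAtBaseLayer;
    // HNSW gives roughly 1/M of nodes an upper layer; that ratio is folded
    // into amortizedUpperLayerBytes here for simplicity.
    std::size_t upper = maxElements * amortizedUpperLayerBytes;
    // Vector cache sized as a percentage of the full dense corpus.
    auto cache = static_cast<std::size_t>(
        vectorCachePercent / 100.0 * maxElements * bytesPerVector);
    return base + upper + cache;
}

int main() {
    // Example: 1M vectors, 128 B of links/ids per base node, 16 B amortized
    // upper-layer cost, and a 2% cache over 512 B vectors -> ~147 MiB.
    std::printf("~%zu MiB\n",
                estimateHnswDramBytes(1000000, 128, 16, 2.0, 512) >> 20);
    return 0;
}
```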

One important detail: Endee does not load the full dense vector corpus into the HNSW object.
Dense vectors stay in `VectorStorage` and are fetched on demand through the vector fetcher and the
vector cache. So the main DRAM cost is the graph plus cache, not a second full copy of the vector
database.
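
A minimal read-through sketch of that pattern. The cache and fetcher interfaces here are
hypothetical stand-ins, not Endee's actual API:

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>
#include <vector>

using Vector = std::vector<float>;
using FetchFn = std::function<std::optional<Vector>(std::uint64_t)>;

// Read-through cache: hits are served from DRAM, misses go to the
// MDBX-backed VectorStorage via the fetcher and are then cached.
class VectorCache {
public:
    explicit VectorCache(FetchFn fetchFromStorage)
        : fetch_(std::move(fetchFromStorage)) {}

    std::optional<Vector> get(std::uint64_t id) {
        if (auto it = cache_.find(id); it != cache_.end()) {
            return it->second;                 // hit: no storage I/O
        }
        std::optional<Vector> v = fetch_(id);  // miss: read from MDBX
        if (v) cache_.emplace(id, *v);         // populate for next time
        return v;
    }

private:
    FetchFn fetch_;
    std::unordered_map<std::uint64_t, Vector> cache_;  // no eviction in this sketch
};
```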

## Scaling

### 1. Virtual Address Space

Each live dense index opens multiple MDBX environments, each with a large configured upper bound:

| Component | Max map size |
| --- | --- |
| `IDMapper` | 8 GiB |
| dense vector store | 4 TiB |
| dense metadata store | 512 GiB |
| filter store | 64 GiB |
| sparse storage | 1 TiB |

These are the default configured maxima. In `settings.hpp`, both the initial/current map size and
the maximum map size are runtime-configurable through environment variables such as
`NDD_INDEX_META_MAP_SIZE_BITS`, `NDD_INDEX_META_MAP_SIZE_MAX_BITS`,
`NDD_ID_MAPPER_MAP_SIZE_BITS`, `NDD_ID_MAPPER_MAP_SIZE_MAX_BITS`,
`NDD_FILTER_MAP_SIZE_BITS`, `NDD_FILTER_MAP_SIZE_MAX_BITS`,
`NDD_METADATA_MAP_SIZE_BITS`, `NDD_METADATA_MAP_SIZE_MAX_BITS`,
`NDD_VECTOR_MAP_SIZE_BITS`, `NDD_VECTOR_MAP_SIZE_MAX_BITS`, and
`NDD_SPARSE_MAP_SIZE_MAX_BITS`.
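
The `_BITS` suffix and the power-of-two defaults in the table (4 TiB = 2^42, 512 GiB = 2^39, and
so on) suggest these variables are exponents. Assuming so, a sketch of how one such setting might
be resolved; the helper and the default value of 42 are illustrative:

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Resolve a map size from an NDD_*_MAP_SIZE[_MAX]_BITS variable.
// Assumption: the value is a power-of-two exponent, so e.g. 42 -> 4 TiB.
std::uint64_t mapSizeFromEnv(const char* name, unsigned defaultBits) {
    unsigned bits = defaultBits;
    if (const char* raw = std::getenv(name)) {
        bits = static_cast<unsigned>(std::stoul(raw));
    }
    return std::uint64_t{1} << bits;
}

// Usage: the dense vector store's 4 TiB default would correspond to 42 bits.
// std::uint64_t maxBytes = mapSizeFromEnv("NDD_VECTOR_MAP_SIZE_MAX_BITS", 42);
```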

Summing the table, that is about 5.57 TiB of configured MDBX map capacity for a live index with
sparse storage enabled (8 GiB + 4 TiB + 512 GiB + 64 GiB + 1 TiB); dropping the 1 TiB sparse
environment leaves about 4.57 TiB for a dense-only index. There is also one global metadata
environment for index metadata with a 128 MiB upper bound.

Because these environments stay open while the index is live, virtual address space becomes a
scaling constraint, especially when many indices are resident at once.
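
As a rough sanity check of why this matters: if every environment actually reserved its
configured maximum, the typical 128 TiB of user address space on x86-64 Linux would hold only
about `128 TiB / 5.57 TiB ≈ 22` fully configured dense+sparse indices, far below
`MAX_LIVE_INDICES = 255`. (A hedged upper-bound illustration; how much address space MDBX really
reserves depends on its geometry settings.)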

### 2. Server DRAM

Each live index allocates its HNSW graph structures eagerly when the index is created or loaded.
That memory must fit in RAM for the server to stay healthy.

### 3. Sticky-Thread MDBX Environments and `PTHREAD_KEYS_MAX`

All dense-index MDBX environments are currently opened without `MDBX_NOSTICKYTHREADS`. The only
place that enables `MDBX_NOSTICKYTHREADS` today is sparse storage.

That means a live dense index currently opens four sticky-thread MDBX environments:

- `IDMapper`
- dense vector store
- dense metadata store
- filter store

There is also one global sticky-thread environment in `MetadataManager`.

If libmdbx consumes one pthread TLS key per sticky-thread environment, the current constant
`MAX_LIVE_INDICES = 255` lines up with that key budget:

- `255 * 4 = 1020` per-index sticky environments
- `+1` global metadata environment
- a total of `1021` keys, which stays just below a `PTHREAD_KEYS_MAX` of `1024`
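
Under that same one-key-per-environment assumption, a sketch of deriving the cap at runtime
rather than hard-coding 255; the 64-key margin for other libraries is a guess:

```cpp
#include <cstddef>
#include <unistd.h>

// Derive a safe live-index cap from the runtime TLS-key budget, assuming
// one key per sticky-thread environment, four such environments per live
// dense index, plus one global MetadataManager environment.
std::size_t deriveMaxLiveIndices() {
    long keys = sysconf(_SC_THREAD_KEYS_MAX);  // 1024 on glibc
    if (keys <= 0) keys = 128;                 // POSIX minimum fallback
    const long reservedForOtherLibs = 64;      // guessed safety margin
    const long globalEnvs = 1;                 // MetadataManager
    const long perIndexEnvs = 4;               // IDMapper + vector + metadata + filter
    long budget = keys - reservedForOtherLibs - globalEnvs;
    return budget > 0 ? static_cast<std::size_t>(budget / perIndexEnvs) : 0;
}
```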

On glibc-based systems, `PTHREAD_KEYS_MAX` is a libc build-time constant, so increasing it would
require rebuilding glibc.

## How Eviction Works Today

`evictIfNeeded()` is currently a live-index-count guard. It runs when:

- `createIndex()` is about to create a new live index
- `getIndexEntry()` needs to load a cold index from disk

When eviction runs, it (a code sketch follows these steps):

1. picks the candidate at the back of `indices_list_`
2. saves the index first if it is dirty
3. marks the cache entry invalid
4. removes it from `indices_`
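
Those four steps as a sketch; the `dirty`/`valid` flags and `save()` hook are hypothetical names
standing in for the real calls:

```cpp
#include <list>
#include <memory>
#include <string>
#include <unordered_map>

struct LiveIndex {
    bool dirty = false;
    bool valid = true;
    void save() { dirty = false; }  // stand-in for the real persist path
};

class IndexManager {
public:
    void evictIfNeeded() {
        if (indices_list_.empty()) return;
        // 1. candidate comes from the back of indices_list_
        const std::string victim = indices_list_.back();
        auto it = indices_.find(victim);
        if (it != indices_.end()) {
            if (it->second->dirty) it->second->save();  // 2. save if dirty
            it->second->valid = false;                  // 3. invalidate entry
            indices_.erase(it);                         // 4. drop from indices_
        }
        indices_list_.pop_back();
    }

private:
    std::unordered_map<std::string, std::shared_ptr<LiveIndex>> indices_;
    std::list<std::string> indices_list_;
};
```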

One subtle but important detail: this is not yet a true inactivity-based or LRU policy.
`indices_list_` is updated on create/load, but not refreshed on search or mutation, and
`last_access` is currently not used anywhere in the eviction path.

In practice, that means eviction is closer to "oldest loaded/created live index first" than
"least recently used index first".

## TODO

1. There is no DRAM-based admission or eviction policy yet. `MAX_ANON_MEM` exists only as a
   commented-out placeholder. In practice, the usable memory ceiling should be computed at
   startup from the effective deployment limit: cgroup limits, container limits, and host/server
   memory limits. Discovering that limit is also OS-specific, with different logic needed for
   Linux and macOS (see the sketch after this list).
2. `MAX_LIVE_INDICES` is a fixed compile-time constant. A better implementation would derive a
   safe cap from the actual runtime environment, for example by checking the system's
   `PTHREAD_KEYS_MAX` value via `getconf PTHREAD_KEYS_MAX` (or `sysconf(_SC_THREAD_KEYS_MAX)` at
   startup, as sketched earlier).
3. The server does not currently refuse startup when the machine or container cannot satisfy the
   minimum memory footprint needed to keep the required live indices healthy.
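
A Linux-only sketch of discovering that ceiling, assuming a cgroup v2 mount at the usual path;
macOS would need different logic entirely:

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <unistd.h>

// Effective memory ceiling on Linux: the cgroup v2 limit if one is set,
// otherwise total physical RAM. Returns bytes, or 0 if unknown.
std::uint64_t effectiveMemoryLimitBytes() {
    // cgroup v2: "max" means unlimited, otherwise a byte count.
    std::ifstream f("/sys/fs/cgroup/memory.max");
    std::string value;
    if (f >> value && value != "max") {
        return std::stoull(value);
    }
    // Fallback: host RAM via sysconf (glibc extension, common on Linux).
    long pages = sysconf(_SC_PHYS_PAGES);
    long pageSize = sysconf(_SC_PAGE_SIZE);
    if (pages > 0 && pageSize > 0) {
        return static_cast<std::uint64_t>(pages) *
               static_cast<std::uint64_t>(pageSize);
    }
    return 0;  // unknown; the caller should refuse startup (TODO item 3)
}
```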