API reference
Vectorlite provides the following APIs. Please note that vectorlite is currently in beta, so there may be breaking changes.
Free-standing Application-Defined SQL functions
The following functions can be used in any context.
vectorlite_info() -- prints version info and some compile-time info, e.g. whether SSE or AVX is enabled.
vector_from_json(json_string) -- converts a JSON array of type TEXT into a BLOB (a C-style float32 array)
vector_to_json(vector_blob) -- converts a vector of type BLOB (a C-style float32 array) into a JSON array of type TEXT
vector_distance(vector_blob1, vector_blob2, distance_type_str) -- calculates the distance between two vectors; distance_type_str can be 'l2', 'cosine', or 'ip'
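To make the BLOB format and the three distance types concrete, here is a minimal pure-Python sketch of what these functions compute. It rests on two assumptions that the list above does not spell out. First, a vector BLOB is assumed to be the raw bytes of a float32 array in native byte order with no header. Second, the distances are assumed to follow hnswlib-style conventions: 'l2' is squared Euclidean distance (consistent with the "l2 squared" comment in the brute-force example), 'ip' is 1 minus the dot product, and 'cosine' is 1 minus cosine similarity. These functions are hypothetical stand-ins, not vectorlite's implementation.

```python
import json
import math
import struct

# Hypothetical pure-Python stand-ins for vectorlite's free-standing functions.
# Assumption: a vector BLOB is the raw bytes of a C-style float32 array
# (native byte order, no header), and distances follow hnswlib conventions.


def vector_from_json(json_string: str) -> bytes:
    """Encode a JSON array of numbers as a float32 BLOB."""
    values = json.loads(json_string)
    return struct.pack(f"{len(values)}f", *values)


def vector_to_json(vector_blob: bytes) -> str:
    """Decode a float32 BLOB back into a JSON array."""
    if len(vector_blob) % 4 != 0:
        raise ValueError("BLOB length must be a multiple of 4 (one float32 per element)")
    count = len(vector_blob) // 4
    return json.dumps(list(struct.unpack(f"{count}f", vector_blob)))


def vector_distance(a: bytes, b: bytes, distance_type: str) -> float:
    """Distance between two float32 BLOBs of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    x = struct.unpack(f"{len(a) // 4}f", a)
    y = struct.unpack(f"{len(b) // 4}f", b)
    dot = sum(p * q for p, q in zip(x, y))
    if distance_type == "l2":
        # Squared Euclidean distance: no square root is taken.
        return sum((p - q) ** 2 for p, q in zip(x, y))
    if distance_type == "ip":
        # Inner-product distance: 1 - dot(x, y).
        return 1.0 - dot
    if distance_type == "cosine":
        # Cosine distance: 1 - cosine similarity.
        norms = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
        return 1.0 - dot / norms
    raise ValueError(f"unknown distance type: {distance_type!r}")


blob = vector_from_json("[1.0, 2.5, -3.0]")
print(len(blob))             # 12 bytes: 3 float32 values
print(vector_to_json(blob))  # [1.0, 2.5, -3.0]
```

Because each element is a 4-byte float32, round-tripping values that float32 cannot represent exactly (such as 0.1) will return slightly different decimals.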
In fact, one can easily implement brute-force search using vector_distance, which returns 100% accurate search results:
-- use a normal sqlite table
create table my_table(rowid integer primary key, embedding blob);
-- insert
insert into my_table(rowid, embedding) values (0, {your_embedding});
-- search for 10 nearest neighbors using l2 squared distance
select rowid from my_table order by vector_distance({query_vector}, embedding, 'l2') asc limit 10;
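The brute-force pattern can be tried end to end with only Python's standard `sqlite3` module. The sketch below registers a hypothetical stand-in named `vector_distance` via `create_function`, in place of loading the vectorlite extension. The stand-in supports only 'l2' (squared Euclidean distance) and assumes the same float32 BLOB layout as above. The query itself has the same shape as the SQL above, with the vector and `k` bound as parameters.

```python
import sqlite3
import struct


def to_blob(values):
    # Assumed BLOB layout: C-style float32 array in native byte order.
    return struct.pack(f"{len(values)}f", *values)


def l2_distance(a, b, distance_type):
    # Hypothetical stand-in for vectorlite's vector_distance; only 'l2'
    # (squared Euclidean distance) is implemented here.
    if distance_type != "l2":
        raise ValueError("this stand-in only supports 'l2'")
    x = struct.unpack(f"{len(a) // 4}f", a)
    y = struct.unpack(f"{len(b) // 4}f", b)
    return sum((p - q) ** 2 for p, q in zip(x, y))


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("vector_distance", 3, l2_distance)
    conn.execute("create table my_table(rowid integer primary key, embedding blob)")
    points = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [5.0, 5.0], 3: [0.5, 0.0]}
    conn.executemany(
        "insert into my_table(rowid, embedding) values (?, ?)",
        [(rowid, to_blob(v)) for rowid, v in points.items()],
    )
    return conn


def brute_force_knn(conn, query, k):
    # Every row's distance is computed, then sorted: exact but O(n) per query.
    rows = conn.execute(
        "select rowid from my_table "
        "order by vector_distance(?, embedding, 'l2') asc limit ?",
        (to_blob(query), k),
    )
    return [rowid for (rowid,) in rows]


conn = make_demo_db()
print(brute_force_knn(conn, [0.0, 0.0], 2))  # [0, 3]
```

Since every row is scanned on every query, this approach stays exact but gets slower linearly with table size. That cost is what the HNSW-backed virtual table trades accuracy to avoid.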
Virtual Table
The core of vectorlite is its virtual table module, which holds a vector index and is much faster than the brute-force approach, at the cost of not being 100% accurate. A vectorlite table can be created using:
-- Required fields: table_name, vector_name, dimension, max_elements
-- Optional fields:
-- 1. distance_type: defaults to l2
-- 2. ef_construction: defaults to 200
-- 3. M: defaults to 16
-- 4. random_seed: defaults to 100
-- 5. allow_replace_deleted: defaults to true
-- 6. index_file_path: no default value. If not provided, the table will be memory-only. If provided, vectorlite will try to load the index from the file and save the index to that file when the database connection is closed.
create virtual table {table_name} using vectorlite({vector_name} float32[{dimension}] {distance_type}, hnsw(max_elements={max_elements}, {ef_construction=200}, {M=16}, {random_seed=100}, {allow_replace_deleted=true}), {index_file_path});
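The template above packs several optional pieces into one statement. The sketch below uses a hypothetical helper, `build_create_table_sql`, to fill it in from keyword arguments and omit any optional argument left as `None`, so that vectorlite's documented defaults apply. Two details are assumptions read off the template rather than confirmed syntax: `index_file_path` is emitted as a single-quoted SQL string literal, and `allow_replace_deleted` is written as `true`/`false`.

```python
def build_create_table_sql(table_name, vector_name, dimension, max_elements,
                           distance_type=None, ef_construction=None, M=None,
                           random_seed=None, allow_replace_deleted=None,
                           index_file_path=None):
    """Hypothetical helper that fills in the create virtual table template.

    Optional arguments left as None are omitted so vectorlite's defaults apply.
    Names are interpolated as-is, so they must be trusted identifiers.
    """
    column = f"{vector_name} float32[{dimension}]"
    if distance_type is not None:
        column += f" {distance_type}"

    hnsw_args = [f"max_elements={max_elements}"]
    for name, value in (("ef_construction", ef_construction), ("M", M),
                        ("random_seed", random_seed)):
        if value is not None:
            hnsw_args.append(f"{name}={value}")
    if allow_replace_deleted is not None:
        hnsw_args.append(
            f"allow_replace_deleted={'true' if allow_replace_deleted else 'false'}")

    args = [column, f"hnsw({', '.join(hnsw_args)})"]
    if index_file_path is not None:
        # Assumption: the path is passed as a single-quoted SQL string literal,
        # with embedded single quotes doubled per SQL escaping rules.
        args.append("'" + index_file_path.replace("'", "''") + "'")
    return f"create virtual table {table_name} using vectorlite({', '.join(args)});"


print(build_create_table_sql("my_vectorlite_table", "my_embedding", 3, 1000,
                             distance_type="cosine", M=32,
                             index_file_path="index.bin"))
# create virtual table my_vectorlite_table using vectorlite(my_embedding float32[3] cosine, hnsw(max_elements=1000, M=32), 'index.bin');
```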
You can insert into, update, and delete from a vectorlite table as if it were a normal SQLite table.
-- rowid is required during insertion, because rowid is used to connect the vector to its metadata stored elsewhere. Auto-generating rowid doesn't make sense.
insert into my_vectorlite_table(rowid, vector_name) values ({your_rowid}, {vector_blob});
-- Note: update and delete statements that use a rowid filter require sqlite3_version >= 3.38 to run.
update my_vectorlite_table set vector_name = {new_vector_blob} where rowid = {your_rowid};
delete from my_vectorlite_table where rowid = {your_rowid};
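From application code, it is usually safer to bind the rowid and vector BLOB as parameters than to splice them into the SQL text. The sketch below shows parameterized forms of the three statements above, plus a version gate for the SQLite >= 3.38 requirement. `dml_statements` and `supports_rowid_filter` are hypothetical helpers, and the float32 BLOB layout is assumed as before. Only stdlib `sqlite3` attributes are used; actually executing the statements would require a connection with vectorlite loaded.

```python
import sqlite3
import struct


def to_blob(values):
    # Assumed BLOB layout: C-style float32 array in native byte order.
    return struct.pack(f"{len(values)}f", *values)


def supports_rowid_filter(version_info=sqlite3.sqlite_version_info):
    # update/delete with a rowid filter on a vectorlite table need SQLite >= 3.38.
    return tuple(version_info) >= (3, 38, 0)


def dml_statements(table, vector_name):
    # Parameterized forms of the insert/update/delete templates.
    # Table and column names are interpolated, so they must be trusted.
    return {
        "insert": f"insert into {table}(rowid, {vector_name}) values (?, ?)",
        "update": f"update {table} set {vector_name} = ? where rowid = ?",
        "delete": f"delete from {table} where rowid = ?",
    }


stmts = dml_statements("my_vectorlite_table", "my_embedding")
print(stmts["update"])  # update my_vectorlite_table set my_embedding = ? where rowid = ?
print(supports_rowid_filter((3, 37, 2)))  # False
```

With vectorlite loaded, the insert would take `(rowid, to_blob(vector))` and the update `(to_blob(new_vector), rowid)`, in that order. The version check lets code fail early on older SQLite builds instead of at query time.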
The following functions should only be used when querying a vectorlite table:
-- Returns a knn_parameter that will be passed to knn_search().
-- vector_blob: vector to search
-- k: how many nearest neighbors to search for
-- ef: optional. An HNSW parameter that controls the speed-accuracy trade-off. Defaults to 10. Once set to a value x, ef stays x for later queries on the same database connection until another query sets it again.
knn_param(vector_blob, k, ef)
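Because ef is sticky per connection, omitting it does not mean "use the default" once an earlier query has set it. The toy model below captures that rule. `EfSetting` is hypothetical and only mirrors the behavior described above; it is not vectorlite code.

```python
class EfSetting:
    """Toy model (hypothetical, not vectorlite code) of how ef is resolved
    across queries on one database connection."""

    DEFAULT_EF = 10

    def __init__(self):
        # A fresh connection starts with the documented default.
        self.current = self.DEFAULT_EF

    def resolve(self, ef=None):
        # Passing ef updates the connection-wide value; omitting it reuses
        # whatever the previous query set.
        if ef is not None:
            self.current = ef
        return self.current


conn_ef = EfSetting()
print(conn_ef.resolve())    # 10: knn_param(v, k) on a fresh connection
print(conn_ef.resolve(64))  # 64: knn_param(v, k, 64)
print(conn_ef.resolve())    # 64: knn_param(v, k) reuses the last value
```

A practical consequence is that, after one high-ef query, later queries that omit ef keep paying the higher cost. Passing ef explicitly on every query avoids depending on earlier statements.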
-- Should only be used in the `where` clause of a `select` statement, to tell vectorlite to speed up the query using the HNSW index
-- vector_name should match the vectorlite table's definition
-- knn_parameter is usually constructed using knn_param()
knn_search(vector_name, knn_parameter)
-- An example of a vector search query. `distance` is an implicit column of a vectorlite table.
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k}));
-- An example of a vector search query with a pushed-down metadata (rowid) filter; requires sqlite_version >= 3.38 to run.
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k})) and rowid in (1,2,3,4,5);
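When these queries are issued from Python, the query vector, k, optional ef, and rowid filter can all be bound as parameters. The sketch below builds such a query and its parameter list. `build_knn_query` is a hypothetical helper, and binding `?` placeholders inside `knn_param(...)` is an assumption based on ordinary SQLite function-argument binding. The output is plain `(sql, params)`, which a connection with vectorlite loaded could pass to `conn.execute(sql, params)`.

```python
import struct


def build_knn_query(table, vector_name, query, k, ef=None, rowids=None):
    """Hypothetical helper returning (sql, params) for a vectorlite KNN query.

    The query vector is bound as a float32 BLOB (assumed native byte order);
    table and column names are interpolated and must be trusted.
    """
    params = [struct.pack(f"{len(query)}f", *query), k]
    knn_args = "?, ?"
    if ef is not None:
        knn_args += ", ?"
        params.append(ef)
    sql = (f"select rowid, distance from {table} "
           f"where knn_search({vector_name}, knn_param({knn_args}))")
    if rowids:
        # Pushed-down rowid filter; needs SQLite >= 3.38.
        sql += f" and rowid in ({', '.join('?' for _ in rowids)})"
        params.extend(rowids)
    return sql, params


sql, params = build_knn_query("my_vectorlite_table", "my_embedding",
                              [0.1, 0.2, 0.3], 10, rowids=[1, 2, 3])
print(sql)
# select rowid, distance from my_vectorlite_table where knn_search(my_embedding, knn_param(?, ?)) and rowid in (?, ?, ?)
```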