In this section, you’ll learn the basic operations in the Python, TypeScript, and Rust SDKs.
For the LanceDB Cloud/Enterprise API Reference, check the HTTP REST API Specification.
Prerequisites
Installation Options
pip install lancedb
npm install @lancedb/lancedb
cargo add lancedb
Preview Releases
Stable releases are created about every 2 weeks. For the latest features and bug fixes, you can install the Preview Release. These releases receive the same level of testing as stable releases but are not guaranteed to be available for more than 6 months after they are released. Once your application is stable, we recommend switching to stable releases.
pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb
npm install @lancedb/lancedb@preview
For Rust, we don’t push Preview Releases to crates.io, but you can reference the tag in GitHub within your Cargo dependencies:
[dependencies]
lancedb = { git = "https://github.com/lancedb/lancedb.git", tag = "vX.Y.Z-beta.N" }
Useful Libraries
For this tutorial, we use some common libraries to help us work with data.
import lancedb
import pandas as pd
import numpy as np
import pyarrow as pa
import os
import * as lancedb from "@lancedb/lancedb";
import * as arrow from "apache-arrow";
1. Connect to LanceDB
LanceDB Cloud
You need a LanceDB Cloud API key to connect. The trial is free and you don’t need a credit card.
uri = "db://your-database-uri"
api_key = "your-api-key"
region = "us-east-1"
db = lancedb.connect(uri, api_key=api_key, region=region)
const dbUri = process.env.LANCEDB_URI || 'db://your-database-uri';
const apiKey = process.env.LANCEDB_API_KEY;
const region = process.env.LANCEDB_REGION;
const db = await lancedb.connect(dbUri, { apiKey, region });
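The TypeScript snippet reads its connection settings from environment variables, which keeps the API key out of source code. A minimal Python sketch of the same idea follows. The helper name `load_cloud_settings` is hypothetical, and the fallback values are just the placeholders used above.

```python
import os
from typing import Mapping, Optional


def load_cloud_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect LanceDB Cloud connection settings from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("LANCEDB_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of an auth error later on.
        raise RuntimeError("LANCEDB_API_KEY is not set")
    return {
        "uri": env.get("LANCEDB_URI", "db://your-database-uri"),
        "api_key": api_key,
        "region": env.get("LANCEDB_REGION", "us-east-1"),
    }


# An explicit mapping is passed here so the sketch runs without real credentials.
settings = load_cloud_settings({"LANCEDB_API_KEY": "your-api-key"})
print(settings["uri"], settings["region"])

# With the lancedb package installed, the settings could then be passed on:
# db = lancedb.connect(**settings)
```

Calling `load_cloud_settings()` with no argument reads from the real process environment.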
LanceDB OSS
In case you want to use the embedded version locally, you can connect without credentials:
import lancedb
import pandas as pd
import pyarrow as pa
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
const databaseDir = "data/sample-lancedb";
const db = await lancedb.connect(databaseDir);
use lancedb::{connect, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let uri = "data/sample-lancedb";
    let db = connect(uri).execute().await?;
    Ok(())
}
LanceDB will create the directory if it doesn’t exist (including parent directories).
If you need a reminder of the URI, you can call db.uri().
2. Working with Tables
Create a Table From Data
If you have data to insert into the table at creation time, you can simultaneously create a table and insert the data into it. The schema of the data will be used as the schema of the table.
If the table already exists, LanceDB will raise an error by default. If you want to overwrite the table, you can pass in mode="overwrite" to the create_table method.
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]
tbl = db.create_table("my_table", data=data)
const _tbl = await db.createTable(
"myTable",
[
{ vector: [3.1, 4.1], item: "foo", price: 10.0 },
{ vector: [5.9, 26.5], item: "bar", price: 20.0 },
],
{ mode: "overwrite" },
);
let initial_data = create_some_records()?;
let tbl = db
.create_table("my_table", initial_data)
.execute()
.await
.unwrap();
Create an Empty Table
Sometimes you may not have the data to insert into the table at creation time. In this case, you can create an empty table and specify the schema, so that you can add data to the table at a later time (as long as it conforms to the schema). This is similar to a CREATE TABLE statement in SQL.
schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), list_size=2))])
tbl = db.create_table("empty_table", schema=schema)
const schema = new arrow.Schema([
new arrow.Field("id", new arrow.Int32()),
new arrow.Field("name", new arrow.Utf8()),
]);
const emptyTbl = await db.createEmptyTable("empty_table", schema);
let schema = Arc::new(Schema::new(vec![
Field::new("id", DataType::Int32, false),
Field::new("item", DataType::Utf8, true),
]));
db.create_empty_table("empty_table", schema).execute().await
Open a Table
Once created, you can open a table as follows:
tbl = db.open_table("my_table")
const _tbl = await db.openTable("myTable");
let table = db.open_table("my_table").execute().await.unwrap();
List Tables
If you forget your table’s name, you can always get a listing of all table names:
print(db.table_names())
const tableNames = await db.tableNames();
println!("{:?}", db.table_names().execute().await?);
Drop Table
Use the drop_table() method on the database to remove a table.
db.drop_table("my_table")
await db.dropTable("myTable");
db.drop_table("my_table").await.unwrap();
This permanently removes the table and is not recoverable, unlike deleting rows.
By default, if the table does not exist, an exception is raised. To suppress this, you can pass in ignore_missing=True.
3. Adding Data
LanceDB supports data in several formats: pyarrow, pandas, polars and pydantic. You can also work with regular Python lists and dictionaries, as well as JSON and CSV files.
Add Data to a Table
By default, new data is appended to the existing table. To replace the existing data instead, pass mode="overwrite".
# Option 1: Add a list of dicts to a table
data = [
{"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
{"vector": [9.5, 56.2], "item": "buzz", "price": 200.0},
]
tbl.add(data)
# Option 2: Add a pandas DataFrame to a table
df = pd.DataFrame(data)
tbl.add(df)
const data = [
{ vector: [1.3, 1.4], item: "fizz", price: 100.0 },
{ vector: [9.5, 56.2], item: "buzz", price: 200.0 },
];
await tbl.add(data);
let new_data = create_some_records()?;
tbl.add(new_data).execute().await.unwrap();
Delete Rows
Use the delete() method on tables to delete rows from a table. To choose which rows to delete, provide a filter that matches on the metadata columns. This can delete any number of rows that match the filter.
tbl.delete('item = "fizz"')
await tbl.delete('item = "fizz"');
tbl.delete("id > 24").await.unwrap();
The deletion predicate is a SQL expression that supports the same expressions as the where() clause (only_if() in Rust) on a search. They can be as simple or complex as needed. To see what expressions are supported, see the SQL filters section.
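Because the predicate is a plain SQL string, values that come from user input need to be quoted carefully before they are spliced in. The sketch below builds an `IN` predicate using standard SQL single-quoted string literals. The helper names `sql_string` and `in_list` are hypothetical and not part of the LanceDB API.

```python
def sql_string(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def in_list(column: str, values) -> str:
    """Build a `column IN (...)` expression from a list of string values."""
    quoted = ", ".join(sql_string(v) for v in values)
    return f"{column} IN ({quoted})"


# Combine the IN expression with a numeric comparison.
predicate = in_list("item", ["fizz", "buzz"]) + " AND price > 100"
print(predicate)

# The resulting string could then be passed to delete() or where():
# tbl.delete(predicate)
```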
4. Vector Search
Once you’ve embedded the query, you can find its nearest neighbors as follows. LanceDB uses L2 (Euclidean) distance by default, but supports other distance metrics like cosine similarity and dot product.
tbl.search([100, 100]).limit(2).to_pandas()
const res = await tbl.search([100, 100]).limit(2).toArray();
table
.query()
.limit(2)
.nearest_to(&[1.0; 128])?
.execute()
.await?
.try_collect::<Vec<_>>()
.await
In Python, to_pandas() returns the results as a pandas DataFrame.
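To make the choice of distance metric concrete, here is a small pure-Python sketch that ranks the two sample rows from "my_table" against the query [100, 100] under L2 and cosine distance. It illustrates the metrics only and is not how LanceDB computes them internally. The values a database reports may also be on a different scale (some engines return squared L2 distance, for instance).

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def nearest(rows, query, metric, k=1):
    """Brute-force scan: sort every row by its distance to the query."""
    return sorted(rows, key=lambda row: metric(row["vector"], query))[:k]


rows = [
    {"vector": [3.1, 4.1], "item": "foo"},
    {"vector": [5.9, 26.5], "item": "bar"},
]
query = [100.0, 100.0]

by_l2 = nearest(rows, query, l2_distance)
by_cosine = nearest(rows, query, cosine_distance)
print(by_l2[0]["item"], by_cosine[0]["item"])
```

The two metrics disagree here: "bar" is closer in absolute position (L2), while "foo" points in nearly the same direction as the query (cosine). This is why the metric should match how the embeddings were trained.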
5. Building an Index
By default, LanceDB runs a brute-force scan over the dataset to find the K nearest neighbors (KNN). For larger datasets, this can be computationally expensive.
Indexing Threshold: If your table has more than 50,000 vectors, you should create an ANN index to speed up search performance. The IVF_PQ Index uses IVF (Inverted File) partitioning to reduce the search space.
tbl.create_index(num_sub_vectors=1)
await tbl.createIndex("vector");
table.create_index(&["vector"], Index::Auto).execute().await
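The idea behind IVF partitioning can be sketched in a few lines of pure Python: vectors are grouped by their nearest centroid, and a query scans only the partitions whose centroids are closest to it. This is a toy illustration under simplifying assumptions, not LanceDB's implementation. The centroids are fixed by hand (a real index would normally learn them, for example with k-means), the PQ compression step is omitted, and the function and parameter names are illustrative.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_partitions(vectors, centroids):
    """Assign each vector's index to the partition of its nearest centroid."""
    partitions = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        partitions[nearest].append(idx)
    return partitions


def ivf_search(query, vectors, centroids, partitions, k=2, nprobes=1):
    """Scan only the nprobes partitions whose centroids are closest to the query."""
    probed = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobes]
    candidates = [idx for c in probed for idx in partitions[c]]
    top = sorted(candidates, key=lambda idx: l2(query, vectors[idx]))[:k]
    return top, len(candidates)


vectors = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
centroids = [[0.3, 0.3], [10.3, 10.3]]  # fixed by hand for this sketch
partitions = build_partitions(vectors, centroids)

query = [9.5, 9.0]
top, scanned = ivf_search(query, vectors, centroids, partitions)
brute_force = sorted(range(len(vectors)), key=lambda idx: l2(query, vectors[idx]))[:2]
print(top, scanned, brute_force)
```

In this example the partitioned search scans 3 of the 6 vectors and still returns the same two neighbors as the brute-force scan. With fewer probed partitions than needed, true neighbors in unprobed partitions can be missed, which is the accuracy-for-speed trade-off of an ANN index.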
6. Embedding API
You can use the Embedding API when working with embedding models. It automatically vectorizes the data at ingestion and query time and comes with built-in integrations with popular embedding models like OpenAI, Hugging Face, Sentence Transformers, CLIP and more.
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry
db = lancedb.connect("/tmp/db")
func = get_registry().get("openai").create(name="text-embedding-ada-002")
class Words(LanceModel):
    text: str = func.SourceField()
    vector: Vector(func.ndims()) = func.VectorField()
table = db.create_table("words", schema=Words, mode="overwrite")
table.add([{"text": "hello world"}, {"text": "goodbye world"}])
query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
print(actual.text)
import * as lancedb from "@lancedb/lancedb";
import "@lancedb/lancedb/embedding/openai";
import { LanceSchema, getRegistry, register } from "@lancedb/lancedb/embedding";
import { EmbeddingFunction } from "@lancedb/lancedb/embedding";
import { type Float, Float32, Utf8 } from "apache-arrow";
const db = await lancedb.connect(databaseDir);
const func = getRegistry()
.get("openai")
?.create({ model: "text-embedding-ada-002" }) as EmbeddingFunction;
const wordsSchema = LanceSchema({
text: func.sourceField(new Utf8()),
vector: func.vectorField(),
});
const tbl = await db.createEmptyTable("words", wordsSchema, {
mode: "overwrite",
});
await tbl.add([{ text: "hello world" }, { text: "goodbye world" }]);
const query = "greetings";
const actual = (await tbl.search(query).limit(1).toArray())[0];
use std::{iter::once, sync::Arc};
use arrow_array::{Float64Array, Int32Array, RecordBatch, RecordBatchIterator, StringArray};
use arrow_schema::{DataType, Field, Schema};
use futures::StreamExt;
use lancedb::{
arrow::IntoArrow,
connect,
embeddings::{openai::OpenAIEmbeddingFunction, EmbeddingDefinition, EmbeddingFunction},
query::{ExecutableQuery, QueryBase},
Result,
};
#[tokio::main]
async fn main() -> Result<()> {
let tempdir = tempfile::tempdir().unwrap();
let tempdir = tempdir.path().to_str().unwrap();
let api_key = std::env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY is not set");
let embedding = Arc::new(OpenAIEmbeddingFunction::new_with_model(
api_key,
"text-embedding-3-large",
)?);
let db = connect(tempdir).execute().await?;
db.embedding_registry()
.register("openai", embedding.clone())?;
let table = db
.create_table("vectors", make_data())
.add_embedding(EmbeddingDefinition::new(
"text",
"openai",
Some("embeddings"),
))?
.execute()
.await?;
let query = Arc::new(StringArray::from_iter_values(once("something warm")));
let query_vector = embedding.compute_query_embeddings(query)?;
let mut results = table
.vector_search(query_vector)?
.limit(1)
.execute()
.await?;
let rb = results.next().await.unwrap()?;
let out = rb
.column_by_name("text")
.unwrap()
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let text = out.iter().next().unwrap().unwrap();
println!("Closest match: {}", text);
Ok(())
}
What’s Next?
This section covered the very basics of using LanceDB.
- To learn more about vector databases, you may want to read about Search or Indexing to get familiar with the concepts.
- If you’ve already worked with other vector databases, dive into the Table Docs to learn how to work with LanceDB Tables in more detail.