LanceDB Cloud and Enterprise provide performant full-text search based on BM25, allowing you to incorporate keyword-based search in your retrieval solutions.
The `create_fts_index` API returns immediately, but the FTS index is built asynchronously.
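Because the build is asynchronous, a query issued right after index creation may run before the index is ready. The following is a minimal polling sketch, not the official approach. It assumes the table object exposes `list_indices()` (returning objects with a `name` attribute) and `index_stats(name)` (returning an object with `num_unindexed_rows`); verify both against your client version. The helper name `wait_until_indexed` is hypothetical.

```python
import time


def wait_until_indexed(table, index_name, timeout_s=60.0, poll_s=1.0):
    """Poll until `index_name` is listed and reports no unindexed rows.

    Assumption: `table.list_indices()` yields objects with a `.name`
    attribute and `table.index_stats(name)` returns an object with
    `.num_unindexed_rows` (or None while the index does not exist yet).
    Returns True when the index is ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        names = [idx.name for idx in table.list_indices()]
        if index_name in names:
            stats = table.index_stats(index_name)
            if stats is not None and stats.num_unindexed_rows == 0:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A call such as `wait_until_indexed(table, "text_idx")` would return `False` rather than raise if the index is not ready within the timeout, so the caller decides how to handle a slow build.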
Creating FTS Indexes
import lancedb
# Connect to LanceDB
db = lancedb.connect(
uri="db://your-project-slug",
api_key="your-api-key",
region="us-east-1"
)
table_name = "lancedb-cloud-quickstart"
table = db.open_table(table_name)
table.create_fts_index("text")

import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect({
uri: "db://your-project-slug",
apiKey: "your-api-key",
region: "us-east-1"
});
const tableName = "lancedb-cloud-quickstart"
const table = await db.openTable(tableName);
await table.createIndex("text", {
config: lancedb.Index.fts()
});

Check the FTS index status using the methods above.
index_name = "text_idx"
table.wait_for_index([index_name])

const indexName = "text_idx"
await table.waitForIndex([indexName], 60)

Configuration Options
FTS Configuration Parameters
LanceDB supports the following configurable parameters for full-text search:
| Parameter | Type | Default | Description |
|---|---|---|---|
| with_position | bool | False | Store token positions (required for phrase queries) |
| base_tokenizer | str | “simple” | Text splitting method: “simple” splits on whitespace and punctuation; “whitespace” splits on whitespace only; “raw” treats the whole text as a single token |
| language | str | “English” | Language for tokenization (stemming/stop words) |
| max_token_length | int | 40 | Maximum token size in bytes; tokens exceeding this length are omitted from the index |
| lower_case | bool | True | Convert tokens to lowercase |
| stem | bool | True | Apply stemming (e.g., “running” → “run”) |
| remove_stop_words | bool | True | Remove common stop words |
| ascii_folding | bool | True | Normalize accented characters |
💡 Key Parameters
- The `max_token_length` parameter helps optimize indexing performance by filtering out non-linguistic content such as base64 data and long URLs.
- When `with_position` is disabled, phrase queries will not work, but index size is reduced and indexing is faster.
- `ascii_folding` is useful for handling international text (e.g., “café” → “cafe”).
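To make the effect of these options concrete, here is a toy tokenizer in plain Python. It is an illustration only and not LanceDB's implementation: the stop-word list, the order of the steps, and the function names `toy_tokenize` and `fold_ascii` are all assumptions, and stemming is left out entirely.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (assumption); real lists are larger
# and language-specific.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def fold_ascii(token):
    """Strip combining accents so that 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_tokenize(text, base_tokenizer="simple", max_token_length=40,
                 lower_case=True, remove_stop_words=True, ascii_folding=True):
    """Mimic the documented options on a single string (stemming omitted)."""
    if base_tokenizer == "raw":
        tokens = [text]                      # whole text as one token
    elif base_tokenizer == "whitespace":
        tokens = text.split()                # split on whitespace only
    else:
        tokens = re.findall(r"\w+", text)    # split on whitespace/punctuation
    out = []
    for tok in tokens:
        # Over-long tokens (measured in bytes) are dropped, not truncated.
        if len(tok.encode("utf-8")) > max_token_length:
            continue
        if lower_case:
            tok = tok.lower()
        if ascii_folding:
            tok = fold_ascii(tok)
        if remove_stop_words and tok in STOP_WORDS:
            continue
        out.append(tok)
    return out


# A 50-character blob stands in for base64-like noise.
print(toy_tokenize("The Caf\u00e9 menu " + "A" * 50))  # ['cafe', 'menu']
```

With `remove_stop_words=False` the same input keeps `"the"`, which is why that option matters for exact phrase matching.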
Phrase Query Configuration
To enable phrase queries, you must modify these parameters from their default values:
| Parameter | Required Value | Purpose |
|---|---|---|
| with_position | True | Enables tracking of token positions for phrase matching |
| remove_stop_words | False | Preserves all words, including stop words, for exact phrase matching |
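The two settings in the table can be passed as keyword arguments when creating the index. The sketch below assumes `create_fts_index` accepts the parameters listed in the configuration table as keyword arguments, and that a phrase query is expressed by wrapping the phrase in double quotes and searching with `query_type="fts"`; check both against your client version. The helper names `create_phrase_fts_index` and `phrase_search` are hypothetical.

```python
def create_phrase_fts_index(table, column="text"):
    """Create an FTS index on `column` configured for phrase queries."""
    table.create_fts_index(
        column,
        with_position=True,       # store token positions for phrase matching
        remove_stop_words=False,  # keep stop words so exact phrases can match
    )


def phrase_search(table, phrase, limit=10):
    """Search for `phrase` as an exact phrase.

    Assumption: a double-quoted query string is treated as a phrase query.
    """
    query = f'"{phrase}"'
    return table.search(query, query_type="fts").limit(limit).to_list()
```

Since index creation is asynchronous, `phrase_search` should only be called once the index has finished building. If the column already has an FTS index built with the defaults, it would need to be rebuilt with these settings for phrase queries to work.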