Schema and Data Evolution with LanceDB

Schema evolution enables non-breaking modifications to a database table’s structure — such as adding columns, altering data types, or dropping fields — to adapt to evolving data requirements without service interruptions. LanceDB supports ACID-compliant schema evolution through granular operations (add/alter/drop columns), allowing you to:

  • Iterate Safely: Modify schemas in production with versioned datasets and backward compatibility
  • Scale Seamlessly: Handle ML model iterations, regulatory changes, or feature additions
  • Optimize Continuously: Remove unused fields or enforce new constraints without downtime

Schema Evolution Operations

LanceDB supports three primary schema evolution operations:

  1. Adding new columns: Extend your table with additional attributes
  2. Altering existing columns: Change column names, data types, or nullability
  3. Dropping columns: Remove unnecessary columns from your schema
💡 Schema Evolution Performance
Schema evolution operations are applied immediately but do not typically require rewriting all data. However, data type changes may involve more substantial operations.

Adding New Columns

You can add new columns to a table with the add_columns method in Python or addColumns in TypeScript/JavaScript. New columns are populated based on SQL expressions you provide.

Setting Up the Example Table

First, let’s create a sample table with product data to demonstrate schema evolution:

python
table_name = "schema_evolution_add_example"

data = [
    {
        "id": 1,
        "name": "Laptop",
        "price": 1200.00,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 2,
        "name": "Smartphone",
        "price": 800.00,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 3,
        "name": "Headphones",
        "price": 150.00,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 4,
        "name": "Monitor",
        "price": 350.00,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 5,
        "name": "Keyboard",
        "price": 80.00,
        "vector": np.random.random(128).tolist(),
    },
]

table = db.create_table(table_name, data, mode="overwrite")
typescript
const tableName = "schema_evolution_add_example";

const data = [
  {
    id: 1,
    name: "Laptop",
    price: 1200.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 2,
    name: "Smartphone",
    price: 800.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 3,
    name: "Headphones",
    price: 150.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 4,
    name: "Monitor",
    price: 350.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 5,
    name: "Keyboard",
    price: 80.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
];

const table = await db.createTable(tableName, data, { mode: "overwrite" });

Adding Calculated Columns

You can add new columns that are calculated from existing data using SQL expressions:

python
# Add a discounted price column (10% discount)
table.add_columns({"discounted_price": "cast((price * 0.9) as float)"})
typescript
// Add a discounted price column (10% discount)
await table.addColumns([
  { name: "discounted_price", valueSql: "cast((price * 0.9) as float)" },
]);

Adding Columns with Default Values

Add boolean columns with default values for status tracking:

python
# Add a stock status column with default value
table.add_columns({"in_stock": "cast(true as boolean)"})
typescript
// Add a stock status column with default value
await table.addColumns([
  { name: "in_stock", valueSql: "cast(true as boolean)" },
]);

Adding Nullable Columns

Add timestamp columns that can contain NULL values:

python
# Add a nullable timestamp column
table.add_columns({"last_ordered": "cast(NULL as timestamp)"})
typescript
// Add a nullable timestamp column
await table.addColumns([
  { name: "last_ordered", valueSql: "cast(NULL as timestamp)" },
]);
💡 NULL Values in New Columns
When adding columns that should contain NULL values, be sure to cast the NULL to the appropriate type, e.g., cast(NULL as timestamp).

Altering Existing Columns

You can alter columns using the alter_columns method in Python or alterColumns in TypeScript/JavaScript. This allows you to:

  • Rename a column
  • Change a column’s data type
  • Modify nullability (whether a column can contain NULL values)

Setting Up the Example Table

Create a table with a custom schema to demonstrate column alterations:

python
table_name = "schema_evolution_alter_example"

data = [
    {
        "id": 1,
        "name": "Laptop",
        "price": 1200,
        "discount_price": 1080.0,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 2,
        "name": "Smartphone",
        "price": 800,
        "discount_price": 720.0,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 3,
        "name": "Headphones",
        "price": 150,
        "discount_price": 135.0,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 4,
        "name": "Monitor",
        "price": 350,
        "discount_price": 315.0,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 5,
        "name": "Keyboard",
        "price": 80,
        "discount_price": 72.0,
        "vector": np.random.random(128).tolist(),
    },
]
schema = pa.schema(
    {
        "id": pa.int64(),
        "name": pa.string(),
        "price": pa.int32(),
        "discount_price": pa.float64(),
        "vector": pa.list_(pa.float32(), 128),
    }
)

table = db.create_table(table_name, data, schema=schema, mode="overwrite")
typescript
const tableName = "schema_evolution_alter_example";

const data = [
  {
    id: 1,
    name: "Laptop",
    price: 1200,
    discount_price: 1080.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 2,
    name: "Smartphone",
    price: 800,
    discount_price: 720.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 3,
    name: "Headphones",
    price: 150,
    discount_price: 135.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 4,
    name: "Monitor",
    price: 350,
    discount_price: 315.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 5,
    name: "Keyboard",
    price: 80,
    discount_price: 72.0,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
];

const table = await db.createTable(tableName, data, { mode: "overwrite" });

Renaming Columns

Change column names to better reflect their purpose:

python
# Rename discount_price to sale_price
table.alter_columns({"path": "discount_price", "rename": "sale_price"})
typescript
// Rename discount_price to sale_price
await table.alterColumns([
  {
    path: "discount_price",
    rename: "sale_price",
  },
]);

Changing Data Types

Convert column data types for better performance or compatibility:

python
# Change price from int32 to int64 for larger numbers
table.alter_columns({"path": "price", "data_type": pa.int64()})

Making Columns Nullable

Allow columns to contain NULL values:

python
# Make the name column nullable
table.alter_columns({"path": "name", "nullable": True})
typescript
// Make the name column nullable
await table.alterColumns([
  {
    path: "name",
    nullable: true,
  },
]);

Multiple Changes at Once

Apply several alterations in a single operation:

python
# Rename, change type, and make nullable in one operation
table.alter_columns(
    {
        "path": "sale_price",
        "rename": "final_price",
        "data_type": pa.float64(),
        "nullable": True,
    }
)
typescript
// Rename and make nullable in one operation
await table.alterColumns([
  {
    path: "sale_price",
    rename: "final_price",
    nullable: true,
  },
]);
💡 Data Type Changes
Changing data types requires rewriting the column data and may be resource-intensive for large tables. Renaming columns or changing nullability is more efficient as it only updates metadata.

Dropping Columns

You can remove columns using the drop_columns method in Python or dropColumns in TypeScript/JavaScript.

Setting Up the Example Table

Create a table with temporary columns that we’ll remove:

python
table_name = "schema_evolution_drop_example"

data = [
    {
        "id": 1,
        "name": "Laptop",
        "price": 1200.00,
        "temp_col1": "X",
        "temp_col2": 100,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 2,
        "name": "Smartphone",
        "price": 800.00,
        "temp_col1": "Y",
        "temp_col2": 200,
        "vector": np.random.random(128).tolist(),
    },
    {
        "id": 3,
        "name": "Headphones",
        "price": 150.00,
        "temp_col1": "Z",
        "temp_col2": 300,
        "vector": np.random.random(128).tolist(),
    },
]

table = db.create_table(table_name, data, mode="overwrite")
typescript
const tableName = "schema_evolution_drop_example";

const data = [
  {
    id: 1,
    name: "Laptop",
    price: 1200.0,
    temp_col1: "X",
    temp_col2: 100,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 2,
    name: "Smartphone",
    price: 800.0,
    temp_col1: "Y",
    temp_col2: 200,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
  {
    id: 3,
    name: "Headphones",
    price: 150.0,
    temp_col1: "Z",
    temp_col2: 300,
    vector: Array.from({ length: 128 }, () => Math.random()),
  },
];

const table = await db.createTable(tableName, data, { mode: "overwrite" });

Dropping Single Columns

Remove individual columns that are no longer needed:

python
# Remove the first temporary column
table.drop_columns(["temp_col1"])
typescript
// Remove the first temporary column
await table.dropColumns(["temp_col1"]);

Dropping Multiple Columns

Remove several columns at once for efficiency:

python
# Remove the second temporary column
table.drop_columns(["temp_col2"])
typescript
// Remove the second temporary column
await table.dropColumns(["temp_col2"]);
💡 Irreversible Column Deletion
Dropping columns cannot be undone. Make sure you have backups or are certain before removing columns.

Vector Column Considerations

Vector columns (used for embeddings) have special considerations. When altering vector columns, you should ensure consistent dimensionality.

Converting List to FixedSizeList

A common schema evolution task is converting a generic list column to a fixed-size list for performance:

python
vector_dim = 768  # Your embedding dimension
table.alter_columns(dict(path="embedding", data_type=pa.list_(pa.float32(), vector_dim)))