A Guide to Uploading Lance Datasets on the Hugging Face Hub
Build a multimodal Lance dataset, publish it to the Hub, and query its precomputed vector and full-text search (FTS) indexes with LanceDB, without downloading the dataset locally.