Because Lance is built on top of Apache Arrow, LanceDB is tightly integrated with the Python data ecosystem, including Pandas and PyArrow. The steps of a typical workflow are shown below.
Create dataset
First, we need to connect to a LanceDB database.
=== "Sync API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:import-lancedb"
    --8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb"
    ```

=== "Async API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:import-lancedb"
    --8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb_async"
    ```
We can load a Pandas `DataFrame` into LanceDB directly.
=== "Sync API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:import-pandas"
    --8<-- "python/python/tests/docs/test_python.py:create_table_pandas"
    ```

=== "Async API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:import-pandas"
    --8<-- "python/python/tests/docs/test_python.py:create_table_pandas_async"
    ```
Similar to the `pyarrow.write_dataset()` method, LanceDB’s `db.create_table()` accepts data in a variety of forms.
If you have a dataset that is larger than memory, you can create a table from an `Iterator[pyarrow.RecordBatch]` to lazily load the data:
=== "Sync API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:import-iterable"
    --8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
    --8<-- "python/python/tests/docs/test_python.py:make_batches"
    --8<-- "python/python/tests/docs/test_python.py:create_table_iterable"
    ```

=== "Async API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:import-iterable"
    --8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
    --8<-- "python/python/tests/docs/test_python.py:make_batches"
    --8<-- "python/python/tests/docs/test_python.py:create_table_iterable_async"
    ```
You will find detailed instructions for creating a LanceDB dataset in the Getting Started and API sections.
Vector search
We can now perform similarity search via the LanceDB Python API.
=== "Sync API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:vector_search"
    ```

=== "Async API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:vector_search_async"
    ```
|   | vector | item | price | _distance |
|---|---|---|---|---|
| 0 | [5.9, 26.5] | bar | 20.0 | 14257.05957 |
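The `_distance` value in this result is consistent with a squared Euclidean (L2) distance. The following back-of-the-envelope check assumes the query vector was `[100, 100]`; the query is defined in the included snippet rather than in this text, so treat it as an assumption.

```python
# Assumed query vector (not shown in the output above).
query = [100.0, 100.0]
# Vector of the returned row.
row_vector = [5.9, 26.5]

# Squared L2 distance: sum of squared per-dimension differences.
squared_l2 = sum((q - v) ** 2 for q, v in zip(query, row_vector))
print(round(squared_l2, 2))  # 14257.06
```

The small gap between this figure and the reported 14257.05957 is plausibly due to the vectors being stored in 32-bit floats.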
If you have a simple filter, it’s faster to provide a `where` clause to LanceDB’s `search` method.
For more complex filters or aggregations, you can always fall back on the underlying `DataFrame` methods after performing a search.
=== "Sync API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter"
    ```

=== "Async API"

    ```python
    --8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter_async"
    ```