Pandas and PyArrow

Because Lance is built on top of Apache Arrow , LanceDB is tightly integrated with the Python data ecosystem, including Pandas and PyArrow. The sequence of steps in a typical workflow is shown below.

Create dataset

First, we need to connect to a LanceDB database.

=== “Sync API”

```python
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb"
```

=== “Async API”

```python
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb_async"
```

We can load a Pandas DataFrame to LanceDB directly.

=== “Sync API”

```python
--8<-- "python/python/tests/docs/test_python.py:import-pandas"
--8<-- "python/python/tests/docs/test_python.py:create_table_pandas"
```

=== “Async API”

```python
--8<-- "python/python/tests/docs/test_python.py:import-pandas"
--8<-- "python/python/tests/docs/test_python.py:create_table_pandas_async"
```

Similar to the pyarrow.write_dataset() method, LanceDB’s db.create_table() accepts data in a variety of forms.

If you have a dataset that is larger than memory, you can create a table with Iterator[pyarrow.RecordBatch] to lazily load the data:

=== “Sync API”

```python
--8<-- "python/python/tests/docs/test_python.py:import-iterable"
--8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
--8<-- "python/python/tests/docs/test_python.py:make_batches"
--8<-- "python/python/tests/docs/test_python.py:create_table_iterable"
```

=== “Async API”

```python
--8<-- "python/python/tests/docs/test_python.py:import-iterable"
--8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
--8<-- "python/python/tests/docs/test_python.py:make_batches"
--8<-- "python/python/tests/docs/test_python.py:create_table_iterable_async"
```

You will find detailed instructions of creating a LanceDB dataset in Getting Started and API sections.

We can now perform similarity search via the LanceDB Python API.

=== “Sync API”

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search"
```

=== “Async API”

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_async"
```
code
    vector     item  price    _distance
0  [5.9, 26.5]  bar   20.0  14257.05957

If you have a simple filter, it’s faster to provide a where clause to LanceDB’s search method. For more complex filters or aggregations, you can always resort to using the underlying DataFrame methods after performing a search.

=== “Sync API”

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter"
```

=== “Async API”

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter_async"
```