How to deploy & self-host SedonaDB on DigitalOcean

This is a practical guide to deploying and self-hosting SedonaDB on DigitalOcean. It assumes you already understand the basics of SedonaDB (its geospatial-native design, SQL and Python APIs, and so on). If not, read an introduction to SedonaDB first.

Before you begin, make sure you have an active DigitalOcean account with billing set up, since you’ll provision droplets, storage, and possibly networking services.

Why host SedonaDB yourself

SedonaDB is a single-node analytical database built from the ground up to treat spatial data as a first-class citizen, with geometry types, CRS awareness, and spatial joins built in. It fits a niche between lightweight tools like DuckDB (with spatial extensions) and full relational spatial databases like PostGIS. You may prefer self-hosting when you want full control over hardware, need data sovereignty, or want to avoid managed database costs.

On DigitalOcean, you can provision a droplet (Virtual Machine) and set up SedonaDB there, giving you a dedicated environment you can scale (vertically) and configure to your needs.

In this guide I’ll walk you through:

Droplet provisioning & basic setup
Installing dependencies, Rust toolchain, and SedonaDB
Configuration (service, ports, security)
Connecting to SedonaDB remotely or via Python
Optional enhancements (backups, TLS, monitoring)

Let’s begin.

1. Provision a Droplet

Log into DigitalOcean’s dashboard and click Create → Droplets.
Choose an image — I recommend Ubuntu 24.04 LTS (or latest stable LTS).
Choose the size (vCPUs, RAM, disk) based on your expected workload. For moderate geospatial queries, 4 vCPUs / 8–16 GB RAM is often a safe starting point; you can scale up later.
Choose a region close to your users.
Add your SSH keys (so you can SSH in securely).
Optionally enable monitoring, backups, private networking, and other features.
Click “Create Droplet”.

Once the droplet is ready, note its public IP address (e.g. 203.0.113.45).
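
The dashboard steps above can also be expressed as a request to the DigitalOcean API. The sketch below only builds the JSON body for a POST to /v2/droplets and sends nothing. The droplet name, region, and size and image slugs are assumptions to check against what your account offers, and the SSH key ID is a placeholder.

```python
import json


def droplet_request(name, region, size, image, ssh_key_ids):
    """Build the JSON body for DigitalOcean's POST /v2/droplets endpoint."""
    return {
        "name": name,
        "region": region,
        "size": size,
        "image": image,
        "ssh_keys": list(ssh_key_ids),
        "monitoring": True,   # optional feature from the step list above
        "backups": False,
    }


payload = droplet_request(
    name="sedonadb-01",        # hypothetical droplet name
    region="nyc3",             # assumed region slug
    size="s-4vcpu-8gb",        # assumed slug for 4 vCPUs / 8 GB RAM
    image="ubuntu-24-04-x64",  # assumed slug for Ubuntu 24.04 LTS
    ssh_key_ids=[12345678],    # placeholder SSH key ID
)
print(json.dumps(payload, indent=2))
```

Sending this body with an Authorization: Bearer header that carries your API token creates the droplet; the response includes the droplet ID you can poll for the public IP.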

2. Initial server setup

SSH into your machine:

ssh root@203.0.113.45

I recommend these initial steps:

Add a non-root user (e.g. sedonadb) and give it sudo privileges.
Harden SSH (disable root login, change port if desired).
Install basic packages:

apt update
apt upgrade -y
apt install -y build-essential curl git \
    ca-certificates pkg-config libssl-dev

Enable a firewall (ufw) and allow SSH:

ufw allow OpenSSH
ufw enable

This gives you a safe baseline before installing SedonaDB.

3. Installing Rust, dependencies, and SedonaDB

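SedonaDB is implemented in Rust and builds on Apache Arrow and Apache DataFusion for columnar, in-memory execution. The project also publishes a Python package (apache-sedona[db]) that installs via pip, so building from source is only needed if you want the native binaries or a custom build.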

Rust toolchain

If you want to build from source:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# follow prompts to install stable toolchain
source $HOME/.cargo/env

Confirm the toolchain is installed by running rustc --version and cargo --version.

Clone the SedonaDB repo

git clone https://github.com/apache/sedona-db.git
cd sedona-db

You may wish to check out a stable release tag (e.g. the 0.1.0 release) rather than building from the main branch; run git tag to list the tags that exist.

Build SedonaDB

Inside the repo directory:

cargo build --release

This produces optimized binaries under target/release. List that directory to find the executable name, which this guide assumes is sedonadb.

Python wrapper & CLI

To make it accessible from Python, you’ll want to install the apache-sedona[db] Python package in a virtual environment:

apt install -y python3 python3-venv python3-pip
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install "apache-sedona[db]"

This gives you access to sedona.db.connect() and corresponding SQL/CLI interfaces.

At this point you should have:

A compiled SedonaDB binary
Python environment with SedonaDB package

You can test with a minimal query:

import sedona.db
sd = sedona.db.connect()
res = sd.sql("SELECT ST_Point(0,0) AS geom")
res.show()

If this returns a point geometry, you’re on the right track.

4. Service setup & configuration

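You’ll want SedonaDB to run as a service, open a port, and accept connections. The binary name and command-line flags used in this section are assumptions; SedonaDB’s Python package runs the engine in-process, so check your build’s help output to confirm that a network listener is available before relying on this setup.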

Choose a port & firewall

Decide on a listening port. This guide uses 54321 as an arbitrary example; substitute whatever port your SedonaDB release expects. Then permit that port:

ufw allow 54321

Systemd service (example)

Create a systemd unit file /etc/systemd/system/sedonadb.service:

[Unit]
Description=SedonaDB single-node analytical server
After=network.target

[Service]
User=sedonadb
ExecStart=/home/sedonadb/sedona-db/target/release/sedonadb \
    --port 54321 --data-dir /home/sedonadb/data
Restart=on-failure

[Install]
WantedBy=multi-user.target

Adjust paths and user accordingly. Then enable and start:

systemctl daemon-reload
systemctl enable sedonadb
systemctl start sedonadb

Check logs with journalctl -u sedonadb -f.
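
If you manage more than one droplet, it can help to generate the unit file from parameters rather than editing it by hand. The following is a minimal sketch that renders the same unit shown above; the --port and --data-dir flags are copied from that example and are not verified against any SedonaDB release.

```python
from string import Template

# Mirrors the unit file shown above; the flags are the guide's assumptions.
UNIT_TEMPLATE = Template("""\
[Unit]
Description=SedonaDB single-node analytical server
After=network.target

[Service]
User=$user
ExecStart=$binary --port $port --data-dir $data_dir
Restart=on-failure

[Install]
WantedBy=multi-user.target
""")


def render_unit(user, binary, port, data_dir):
    """Return the text of a systemd unit for the given settings."""
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"port out of range: {port}")
    return UNIT_TEMPLATE.substitute(
        user=user, binary=binary, port=port, data_dir=data_dir
    )


unit_text = render_unit(
    user="sedonadb",
    binary="/home/sedonadb/sedona-db/target/release/sedonadb",
    port=54321,
    data_dir="/home/sedonadb/data",
)
print(unit_text)
```

Write the output to /etc/systemd/system/sedonadb.service as root, then run the daemon-reload, enable, and start commands as before.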

5. Connecting remotely & client usage

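With the service running and the port open, you can connect. If connect() in your installed version does not accept host and port arguments, run your queries on the droplet itself with a plain sedona.db.connect().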

From Python on another machine:

import sedona.db
sd = sedona.db.connect(host="203.0.113.45", port=54321)

From your server itself:

sedonadb-cli --host localhost --port 54321

Run SQL and spatial functions just as in the local test query in section 3.

You can now ingest GeoParquet, shapefiles, GeoJSON, or other supported formats, issue spatial joins, KNN, etc. SedonaDB retains CRS metadata through transformations.
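
Before debugging a client, it is worth confirming that the port is reachable from the machine you are connecting from. Here is a small sketch using only the standard library; it checks that a TCP connection can be opened and says nothing about whether SedonaDB itself is answering on that port.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling port_is_open("203.0.113.45", 54321) from your workstation should return True once the service is listening and the firewall rule is in place. A False result points at the firewall, the listener, or the network path rather than at your client code.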

6. Optional enhancements & production hardening

TLS / SSL encryption

To secure client connections, you can wrap SedonaDB behind a TLS proxy (e.g. Nginx or Caddy) or integrate TLS termination in the service if supported. For example, you can terminate TLS in Nginx and proxy to the SedonaDB port internally.

Persistent storage & backups

Ensure your data directory is backed by a reliable disk or volume. You can attach DigitalOcean Block Storage volumes and mount them at /mnt/data. Use regular snapshot or rsync backups. Also consider WAL (write-ahead log) or export strategies depending on how SedonaDB handles durability (check official docs for version you use).
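
As one way to implement the file-level backups mentioned above, the sketch below packs the data directory into a timestamped .tar.gz archive. The function name and archive naming scheme are my own choices, and it assumes nothing is writing to the directory while the archive is created.

```python
import tarfile
import time
from pathlib import Path


def archive_data_dir(data_dir, backup_dir):
    """Write a timestamped .tar.gz of data_dir into backup_dir and return its path."""
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # UTC timestamp so archive names sort chronologically.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    target = backup_dir / f"{data_dir.name}-{stamp}.tar.gz"

    with tarfile.open(target, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(data_dir, arcname=data_dir.name)
    return target
```

For example, archive_data_dir("/home/sedonadb/data", "/mnt/data/backups") could run from a nightly cron job or systemd timer. Stop the service or take a volume snapshot first if files may change during the copy.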

Monitoring & logging

Install a monitoring agent (DigitalOcean’s agent, or Prometheus + node exporter) to track CPU, memory, disk I/O. Also rotate SedonaDB logs (via logrotate or journald policies).
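
Until a full monitoring agent is in place, a quick script can report the numbers that matter most on a single-node analytical box. This is a rough sketch using only the standard library; it runs on Linux (os.getloadavg is not available on Windows), and the field names are my own.

```python
import os
import shutil


def host_snapshot(path="/"):
    """Return disk usage for path plus the 1-minute load average and CPU count."""
    usage = shutil.disk_usage(path)
    load_1m, _load_5m, _load_15m = os.getloadavg()
    return {
        "disk_used_pct": round(100 * usage.used / usage.total, 1),
        "load_1m": load_1m,
        "cpu_count": os.cpu_count(),
    }


print(host_snapshot("/"))
```

A 1-minute load that stays above the CPU count, or disk usage creeping toward 100% on the data volume, is a sign that it is time to resize the droplet or attach more storage.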

Scaling

SedonaDB is single-node. If your workload grows beyond its capacity, you’ll need to scale the droplet vertically (more RAM/CPU) or migrate to a distributed setup (e.g. SedonaSpark or SedonaFlink).

Access controls

If SedonaDB supports authentication, configure user accounts and permissions. Otherwise, restrict network access (e.g. only allow specific IPs via firewall).
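
To restrict access by source address, you can replace the blanket port rule with one ufw rule per trusted client. The sketch below validates each address and prints the commands to run; the addresses are documentation-range examples and 54321 is the example port used throughout this guide.

```python
import ipaddress


def ufw_allow_rules(allowed, port=54321):
    """Return one 'ufw allow from ...' command per address or CIDR block."""
    rules = []
    for entry in allowed:
        # Raises ValueError on malformed input; a bare IP becomes a /32 network.
        network = ipaddress.ip_network(entry, strict=False)
        rules.append(f"ufw allow from {network} to any port {port} proto tcp")
    return rules


for rule in ufw_allow_rules(["198.51.100.7", "192.0.2.0/24"]):
    print(rule)
# ufw allow from 198.51.100.7/32 to any port 54321 proto tcp
# ufw allow from 192.0.2.0/24 to any port 54321 proto tcp
```

These rules only narrow access if the earlier open rule is removed as well, for example with ufw delete allow 54321.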

7. Troubleshooting & tips

If connecting fails, confirm that the port is listening (ss -tlnp | grep 54321) and check the firewall rules (ufw status).
Use logs (journalctl -u sedonadb) to catch startup errors (e.g. missing dependencies, permission issues).
Ensure Python version compatibility; this guide assumes Python ≥ 3.8 for the apache-sedona[db] package, but check the requirement listed on PyPI for the release you install.
For large datasets, allocate sufficient RAM to avoid swapping.
Always test core spatial operations (joins, KNN) early to catch missing features or CRS issues.
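
The Python version check from the list above is easy to script so that it fails early in a provisioning run. This is a minimal sketch; the minimum version is the one stated in this guide and should be adjusted to whatever the package actually requires.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum stated in this guide; verify against PyPI


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if version (default: the running interpreter) meets minimum."""
    version = sys.version_info if version is None else version
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version[:2]) >= tuple(minimum)


print(python_is_supported())
```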

8. Example walk-through

Suppose you have two GeoParquet files, roads.parquet and buildings.parquet, and you want to load them and run a spatial join.

CREATE TABLE roads AS SELECT * FROM read_parquet('/home/sedonadb/data/roads.parquet');
CREATE TABLE buildings AS SELECT * FROM read_parquet('/home/sedonadb/data/buildings.parquet');

SELECT b.id, COUNT(*) AS cnt
FROM buildings b, roads r
WHERE ST_Intersects(b.geom, r.geom)
GROUP BY b.id;

You can run this from Python:

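import sedona.db

# Connect to the remote instance (host and port as configured in section 4)
sd = sedona.db.connect(host="203.0.113.45", port=54321)

# Count the roads that intersect each building
q = """
    SELECT b.id, COUNT(*) AS cnt
    FROM read_parquet('/home/sedonadb/data/buildings.parquet') b,
         read_parquet('/home/sedonadb/data/roads.parquet') r
    WHERE ST_Intersects(b.geom, r.geom)
    GROUP BY b.id
"""
# show() prints the result, as in the test query in section 3
sd.sql(q).show()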

This demonstrates SedonaDB’s native spatial SQL support: no external spatial engine or extension layer is required.
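
To make the semantics of the join concrete, here is a toy pure-Python version of the same query. It simplifies every geometry to a bounding box and compares all pairs, so it illustrates what the result means rather than how SedonaDB evaluates ST_Intersects; the sample IDs and coordinates are made up.

```python
from collections import Counter


def boxes_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def count_intersections(buildings, roads):
    """Equivalent of: SELECT b.id, COUNT(*) ... WHERE intersects ... GROUP BY b.id."""
    counts = Counter()
    for b_id, b_box in buildings.items():
        for r_box in roads.values():
            if boxes_intersect(b_box, r_box):
                counts[b_id] += 1
    return dict(counts)


buildings = {1: (0, 0, 2, 2), 2: (10, 10, 12, 12), 3: (4, 0, 6, 2)}
roads = {"a": (1, 1, 5, 1), "b": (1, -1, 1, 3)}
print(count_intersections(buildings, roads))
# {1: 2, 3: 1}
```

Building 2 is missing from the output because it intersects no road, which matches the inner-join behavior of the SQL above.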

Final thoughts

Hosting SedonaDB on DigitalOcean gives you full control over your geospatial analytics engine, with a balance between performance and simplicity. Because SedonaDB is designed for spatial-first operations on a single node, it complements cluster-based Sedona workflows. Use this setup for prototyping, dashboards, moderate spatial workloads, or as a stepping stone toward more distributed architectures.