Here’s a detailed, practical guide to deploying and self-hosting SedonaDB on DigitalOcean. I assume you already understand the basics of SedonaDB (its geospatial-native design, SQL/Python APIs, etc.). If not, start with the introduction in the official SedonaDB documentation.
Before you begin, make sure you have an active DigitalOcean account with billing configured, since you’ll provision droplets, storage, and possibly networking services.
Why host SedonaDB yourself
SedonaDB is a single-node analytical database built from the ground up to treat spatial data as first-class (e.g. geometry types, CRS awareness, spatial joins). It fits a niche between lightweight tools like DuckDB (with its spatial extension) and full relational spatial databases like PostGIS. You may prefer self-hosting when you want full control over hardware or data sovereignty, or want to avoid managed-database costs.
On DigitalOcean, you can provision a droplet (a virtual machine) and set up SedonaDB there, giving you a dedicated environment that you can scale vertically and configure to your needs.
In this guide I’ll walk you through:
- Droplet provisioning & basic setup
- Installing dependencies, the Rust toolchain, and SedonaDB
- Configuration (service, ports, security)
- Connecting to SedonaDB remotely or via Python
- Optional enhancements (backups, TLS, monitoring)
Let’s begin.
1. Provision a Droplet
- Log into DigitalOcean’s dashboard and click Create → Droplets.
- Choose an image — I recommend Ubuntu 24.04 LTS (or latest stable LTS).
- Choose the size (vCPUs, RAM, disk) based on your expected workload. For moderate geospatial queries, 4 vCPUs / 8–16 GB RAM is often a safe starting point; you can scale up later.
- Choose a region close to your users.
- Add your SSH keys (so you can SSH in securely).
- Optionally enable monitoring, backups, private networking, and other features.
- Click “Create Droplet”.
Once the droplet is ready, note its public IP address (e.g. 203.0.113.45).
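If you prefer to script the dashboard steps above, DigitalOcean’s public API creates droplets with a POST to /v2/droplets. The sketch below only builds the request so you can inspect it before sending anything. The slugs nyc3, s-4vcpu-8gb, and ubuntu-24-04-x64 are assumptions chosen to match the sizing suggested above; confirm them against the API’s region, size, and image listings for your account.

```python
import json
import urllib.request

DO_API_URL = "https://api.digitalocean.com/v2/droplets"


def build_droplet_request(token, name, region="nyc3", size="s-4vcpu-8gb",
                          image="ubuntu-24-04-x64", ssh_keys=(),
                          monitoring=True, backups=False):
    """Build (but do not send) a POST request that creates one droplet.

    The default slugs are assumptions; check them against your account.
    """
    body = {
        "name": name,
        "region": region,
        "size": size,
        "image": image,
        "ssh_keys": list(ssh_keys),  # IDs or fingerprints of keys already in your account
        "monitoring": monitoring,
        "backups": backups,
    }
    return urllib.request.Request(
        DO_API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To actually create the droplet, pass the request to urllib.request.urlopen using a personal access token (for example read from an environment variable you define yourself). The JSON response contains a droplet object; its public IPv4 address appears under networks once provisioning finishes.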
2. Initial server setup
SSH into your machine:
ssh root@203.0.113.45
I recommend these initial steps:
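- Add a non-root user (e.g. sedonadb) and give it sudo privileges.
- Harden SSH: disable root login (after confirming you can log in as the new user with your key) and change the port if desired. If you change the port, allow the new port in ufw before enabling the firewall, because the OpenSSH profile only covers port 22.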
- Install basic packages:
apt update
apt upgrade -y
apt install -y build-essential curl git \
ca-certificates pkg-config libssl-dev
- Enable a firewall (ufw) and allow SSH:
ufw allow OpenSSH
ufw enable
This gives you a safe baseline before installing SedonaDB.
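Before relying on the SSH hardening, it helps to check what sshd will actually enforce. This sketch audits sshd_config text against an assumed policy (root login and password authentication both disabled; edit EXPECTED to match your own). It follows sshd’s rule that the first value seen for a keyword wins, but it does not follow Include lines or evaluate Match blocks, so the most reliable input is the effective configuration printed by sudo sshd -T.

```python
import re

# Assumed hardening policy; adjust to your own requirements.
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
}


def audit_sshd_config(text, expected=EXPECTED):
    """Return human-readable warnings where the config differs from `expected`."""
    effective = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # sshd accepts both "Keyword value" and "Keyword=value".
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) < 2:
            continue
        key, value = parts[0].lower(), parts[1].strip().lower()
        if key == "match":
            break  # everything after a Match line is conditional; not modelled here
        effective.setdefault(key, value)  # first value wins, as in sshd
    warnings = []
    for key, want in expected.items():
        got = effective.get(key)
        shown = got if got is not None else "unset (default)"
        if got != want:
            warnings.append(f"{key} is {shown}, expected {want}")
    return warnings
```

For example, run as root: audit_sshd_config(subprocess.run(["sshd", "-T"], capture_output=True, text=True).stdout). An empty list means both settings match the policy.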
3. Installing Rust, dependencies, and SedonaDB
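SedonaDB is implemented in Rust on top of Apache Arrow and Apache DataFusion. The project also publishes a Python package, apache-sedona[db], installable via pip; if you only need the Python API, that route is usually enough, and the Rust build is for working from source.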
Rust toolchain
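If you want to build from source (run these steps as the sedonadb user, since the service unit later expects the checkout under /home/sedonadb/sedona-db):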
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# follow prompts to install stable toolchain
source $HOME/.cargo/env
Confirm the toolchain works by running rustc --version and cargo --version.
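If you provision more than one droplet, the version checks above are easy to script. This sketch prints the first line of each tool’s --version output and lists anything missing; the tool list is an assumption based on the packages installed earlier. Run it from a shell where $HOME/.cargo/env has been sourced, otherwise rustc and cargo will not be on PATH.

```python
import shutil
import subprocess


def tool_version(cmd):
    """Return the first line of `cmd --version`, or None if cmd is not found."""
    path = shutil.which(cmd)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True,
                            text=True, check=False)
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def missing_tools(tools=("rustc", "cargo", "git", "pkg-config")):
    """List the build prerequisites that are not available on PATH."""
    return [tool for tool in tools if tool_version(tool) is None]
```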
Clone the SedonaDB repo
git clone https://github.com/apache/sedona-db.git
cd sedona-db
You may wish to check out a stable release tag (e.g. v0.1.0, or the latest; git tag lists the available tags).
Build SedonaDB
Inside the repo directory:
cargo build --release
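This produces optimized build artifacts in target/release. The executable’s name depends on the workspace’s binary crates; this guide assumes sedonadb, so substitute the actual name if it differs.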
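Since the exact binary name can vary, a quick way to see what the release build produced is to list the executable files directly in target/release. This sketch skips subdirectories such as deps and build, and filters out Cargo’s library and dependency-info outputs by file extension, which is a Linux-specific assumption.

```python
import os
from pathlib import Path

# Extensions of Cargo outputs that are not runnable programs on Linux.
NON_BINARY_SUFFIXES = {".d", ".rlib", ".so", ".a", ".dylib"}


def release_binaries(repo_dir):
    """Return names of executable files directly under <repo_dir>/target/release."""
    release = Path(repo_dir) / "target" / "release"
    if not release.is_dir():
        return []
    return sorted(
        p.name
        for p in release.iterdir()
        if p.is_file()
        and not p.name.startswith(".")          # e.g. .cargo-lock
        and p.suffix not in NON_BINARY_SUFFIXES
        and os.access(p, os.X_OK)               # executable bit set
    )
```

For example, release_binaries("/home/sedonadb/sedona-db") returns the candidate names to use in the systemd unit.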
Python wrapper & CLI
To make it accessible from Python, install the apache-sedona[db] Python package in a virtual environment:
apt install -y python3 python3-venv python3-pip
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install "apache-sedona[db]"
This gives you sedona.db.connect() and the SQL interface it exposes.
At this point you should have:
- A compiled SedonaDB build (if you built from source)
- A Python virtual environment with the SedonaDB package
You can test with a minimal query:
import sedona.db
sd = sedona.db.connect()
res = sd.sql("SELECT ST_Point(0,0) AS geom")
res.show()
If this prints a one-row result containing a point geometry, you’re on the right track.
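A common failure at this stage is installing the package into one interpreter and running another. This sketch reports the interpreter version, whether a virtual environment is active, the installed version of the apache-sedona distribution, and whether sedona.db can be imported, without running a query. The distribution and module names are taken from the install step above.

```python
import importlib.metadata
import importlib.util
import sys


def module_available(name):
    """True if `name` can be imported; a missing parent package counts as unavailable."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


def env_status(dist="apache-sedona", module="sedona.db"):
    """Summarize the interpreter, virtualenv, and SedonaDB install state."""
    try:
        dist_version = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        dist_version = None
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "in_venv": sys.prefix != sys.base_prefix,  # True inside an activated venv
        "dist_version": dist_version,
        "module_importable": module_available(module),
    }
```

If in_venv is False or dist_version is None, activate the venv created earlier and rerun the pip install.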
4. Service setup & configuration
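If you want SedonaDB to run as a long-lived service that accepts network connections, you need a server process, an open port, and a systemd unit. The documented Python examples call sedona.db.connect() with no arguments, which runs the engine inside your Python process, so first confirm that your build actually provides a server mode and note its real binary name and flags; the ones below are placeholders.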
Choose a port & firewall
Decide on a listening port (e.g. 54321 or 8888, whatever convention SedonaDB uses). Then permit that port:
ufw allow 54321
Systemd service (example)
Create a systemd unit file at /etc/systemd/system/sedonadb.service:
[Unit]
Description=SedonaDB single-node analytical server
After=network.target
[Service]
User=sedonadb
Group=sedonadb
# Placeholder binary name and flags; replace them with what your build actually provides.
ExecStart=/home/sedonadb/sedona-db/target/release/sedonadb \
  --port 54321 --data-dir /home/sedonadb/data
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Adjust paths and user accordingly. Then enable and start:
systemctl daemon-reload
systemctl enable sedonadb
systemctl start sedonadb
Check logs with journalctl -u sedonadb -f.
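Beyond tailing the journal, a scriptable health check can read the unit’s state from systemctl show, which prints key=value lines. The sketch below asks for ActiveState, SubState, and NRestarts; with Restart=on-failure, a crashing binary tends to show up as an auto-restart sub-state with a climbing NRestarts rather than as a clean failure. The runner parameter exists only so the parser can be exercised without systemd.

```python
import subprocess


def parse_show_output(text):
    """Turn systemctl show's key=value lines into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def service_state(unit="sedonadb", runner=subprocess.run):
    """Query systemd for a unit's ActiveState, SubState and NRestarts."""
    result = runner(
        ["systemctl", "show", unit,
         "-p", "ActiveState", "-p", "SubState", "-p", "NRestarts"],
        capture_output=True, text=True, check=False,
    )
    return parse_show_output(result.stdout)


def is_healthy(props):
    """True only when systemd reports the unit as active and running."""
    return props.get("ActiveState") == "active" and props.get("SubState") == "running"
```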
5. Connecting remotely & client usage
With the service running and port open, you can connect:
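- From Python on another machine (hypothetical: this assumes your build runs as a network server and that its connect() accepts host and port):
import sedona.db
# Hypothetical signature; the documented examples call sedona.db.connect() with no arguments.
sd = sedona.db.connect(host="203.0.113.45", port=54321)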
- From the server itself (sedonadb-cli and its flags are placeholders for whatever client your build provides):
sedonadb-cli --host localhost --port 54321
- Use SQL and spatial functions just as in the test query shown earlier.
You can now load GeoParquet and any other formats your version supports (check the docs before relying on shapefile or GeoJSON input), and run spatial joins, KNN queries, and so on. SedonaDB retains CRS metadata through transformations.
6. Optional enhancements & production hardening
TLS / SSL encryption
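To secure client connections, put SedonaDB behind a TLS-terminating proxy (e.g. Nginx or Caddy), or enable TLS in the service itself if it supports it. For example, Nginx can terminate TLS on port 443 and forward traffic to the SedonaDB port on 127.0.0.1. If the client protocol is plain TCP rather than HTTP, use Nginx’s stream module instead of an http proxy_pass block. Once the proxy is in place, close the plaintext port in ufw so that only the encrypted endpoint is reachable.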
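Once a proxy terminates TLS, it is worth checking from a client machine that the certificate validates and how long it has left. This sketch performs a verified handshake (chain and hostname checking on) and converts the certificate’s notAfter field into days remaining. It assumes the proxy listens on 443 and that you connect with a DNS name covered by the certificate; the host name in the usage note is hypothetical.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Handshake with the TLS endpoint and return the certificate's notAfter string."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days from `now` (epoch seconds, default: current time) until notAfter."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400
```

For example, days_until_expiry(fetch_not_after("sedona.example.com")) for a hypothetical sedona.example.com. A certificate that fails validation typically raises ssl.SSLCertVerificationError during the handshake, which is itself a useful signal.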
Persistent storage & backups
Ensure your data directory is backed by a reliable disk or volume. You can attach DigitalOcean Block Storage volumes and mount them at /mnt/data. Take regular volume snapshots or rsync backups. Also consider write-ahead logging (WAL) or export strategies, depending on how SedonaDB handles durability (check the official docs for the version you use).
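A minimal alternative to rsync is a timestamped archive with simple rotation. This sketch assumes the data directory from the unit file (/home/sedonadb/data) and a backup directory outside it, ideally on a different disk or copied off the droplet afterwards; archiving a directory into itself would recurse. For a consistent copy, stop the service or snapshot the volume first, since files written mid-archive may be captured half-updated.

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir, backup_dir, keep=7, now=None):
    """Archive data_dir as a timestamped .tar.gz and keep only the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp in a fixed-width format, so lexical order matches time order.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    archive = backup_dir / f"sedonadb-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=data_dir.name)
    # Delete everything except the newest `keep` archives.
    for old in sorted(backup_dir.glob("sedonadb-*.tar.gz"))[:-keep]:
        old.unlink()
    return archive
```

Run it from cron or a systemd timer, for example backup_data_dir("/home/sedonadb/data", "/mnt/backups", keep=7), where /mnt/backups is a hypothetical mount point.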
Monitoring & logging
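Install a monitoring agent (DigitalOcean’s agent, or Prometheus with node_exporter) to track CPU, memory, and disk I/O. Under systemd, the service’s stdout and stderr go to the journal, so journald retention settings (such as SystemMaxUse in /etc/systemd/journald.conf) usually control log growth; use logrotate only if SedonaDB writes its own log files.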
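Even without a full monitoring stack, a cron-driven disk check catches the most common failure on a data-heavy droplet: a full disk. This sketch reports filesystems above a threshold; 85% is an arbitrary default. Percentages may read slightly lower than df because shutil.disk_usage counts reserved blocks in the total.

```python
import shutil


def disk_usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disks_over_threshold(paths, threshold=85.0):
    """Return {path: percent_used} for each path whose filesystem exceeds threshold."""
    over = {}
    for path in paths:
        pct = disk_usage_percent(path)
        if pct > threshold:
            over[path] = pct
    return over
```

For example, disks_over_threshold(["/", "/mnt/data"]) checks the root disk and the mounted volume.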
Scaling
SedonaDB is single-node. If your workload outgrows one machine, you can resize the droplet vertically (more RAM/CPU; DigitalOcean requires the droplet to be powered off during a resize) or migrate to a distributed setup (e.g. SedonaSpark or SedonaFlink).
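To judge when a vertical resize is due, watch memory headroom rather than raw usage: Linux’s MemAvailable estimates how much memory new work can use without swapping. This sketch parses /proc/meminfo (most values are in kB) and returns the available fraction plus the swap in use; which thresholds justify a resize is your call.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_headroom(info):
    """Return (fraction of RAM available, swap in use in kB)."""
    available_fraction = info["MemAvailable"] / info["MemTotal"]
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return available_fraction, swap_used_kb
```

On the droplet, read the live values with open("/proc/meminfo") and pass the text to parse_meminfo. A persistently low available fraction, or swap that keeps growing during queries, suggests a larger droplet.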
Access controls
If SedonaDB supports authentication, configure user accounts and permissions. Otherwise, restrict network access, for example by allowing only specific IPs through ufw or a DigitalOcean Cloud Firewall.
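Restricting the SedonaDB port to known clients with ufw can be scripted as below. The sketch validates each source with Python’s ipaddress module, so a typo fails loudly instead of producing a bad rule, and emits argv lists of the form ufw allow from SOURCE to any port PORT proto tcp. The addresses in the usage note are documentation placeholders. If you opened the port to everyone earlier with ufw allow 54321, remove that rule with ufw delete allow 54321; otherwise the allowlist has no effect.

```python
import ipaddress


def ufw_allowlist_commands(sources, port=54321, proto="tcp"):
    """Return one ufw argv list per source IP or CIDR, opening only `port`."""
    commands = []
    for source in sources:
        # ip_network validates the input; strict=False accepts host bits in a CIDR.
        network = ipaddress.ip_network(source, strict=False)
        commands.append([
            "ufw", "allow", "from", str(network),
            "to", "any", "port", str(port), "proto", proto,
        ])
    return commands
```

Apply the rules as root, e.g. for cmd in ufw_allowlist_commands(["198.51.100.7", "192.0.2.0/24"]): subprocess.run(cmd, check=True).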
7. Troubleshooting & tips
- If connecting fails, confirm the port is listening (ss -tlnp | grep 54321) and check the firewall rules (ufw status).
- Use the logs (journalctl -u sedonadb) to catch startup errors such as missing dependencies or permission issues.
- Ensure Python version compatibility: check the apache-sedona[db] listing on PyPI for the supported Python versions (Ubuntu 24.04 ships Python 3.12).
- For large datasets, allocate sufficient RAM to avoid swapping.
- Always test core spatial operations (joins, KNN) early to catch missing features or CRS issues.
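The listening check with ss can be complemented by a connection test run from both ends. This sketch attempts a plain TCP connection with a timeout. If it succeeds on the droplet against 127.0.0.1 but fails from your own machine against the public IP, the cause is the firewall (ufw or a DigitalOcean Cloud Firewall) or a server bound only to 127.0.0.1, which ss -tlnp shows in its Local Address column.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False
```

For example, port_is_open("203.0.113.45", 54321) from your workstation, and port_is_open("127.0.0.1", 54321) on the droplet.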
8. Example walk-through
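Suppose you have GeoParquet files roads.parquet and buildings.parquet in /home/sedonadb/data, and you want to load them and run a spatial join. The queries assume each file has an id column and a geometry column named geom; GeoParquet files often call it geometry, so adjust the names to match your data.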
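-- Assumes a read_parquet() table function; if your version lacks one, register the files as views from Python instead.
CREATE TABLE roads AS SELECT * FROM read_parquet('/home/sedonadb/data/roads.parquet');
CREATE TABLE buildings AS SELECT * FROM read_parquet('/home/sedonadb/data/buildings.parquet');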
SELECT b.id, COUNT(*) AS cnt
FROM buildings b, roads r
WHERE ST_Intersects(b.geom, r.geom)
GROUP BY b.id;
You can run this from Python:
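import sedona.db
# In-process connection, as in the documented examples.
sd = sedona.db.connect()
# Register each GeoParquet file as a named view so SQL can refer to it.
sd.read_parquet("/home/sedonadb/data/buildings.parquet").to_view("buildings")
sd.read_parquet("/home/sedonadb/data/roads.parquet").to_view("roads")
q = """
SELECT b.id, COUNT(*) AS cnt
FROM buildings b, roads r
WHERE ST_Intersects(b.geom, r.geom)
GROUP BY b.id
"""
# Print the result table, as in the earlier test query.
sd.sql(q).show()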
This demonstrates SedonaDB’s native spatial SQL support: no external spatial engine or extension layer is required.
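To make the join’s semantics concrete, here is a brute-force model in plain Python in which every geometry is reduced to its bounding box (minx, miny, maxx, maxy). This is a simplification: ST_Intersects tests the exact geometries, so bounding-box overlap is only a necessary condition. The model does show one property of the query as written: the inner join drops buildings that touch no road, so they never appear with a count of 0. If you need them, a LEFT JOIN with COUNT(r.geom) in place of COUNT(*) keeps them.

```python
def bbox_intersects(a, b):
    """Boxes are (minx, miny, maxx, maxy); touching edges count as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def count_intersections(buildings, roads):
    """Brute-force model of the join: {building_id: number of intersecting roads}.

    Like the SQL inner join plus GROUP BY, buildings with no match are omitted.
    """
    counts = {}
    for b_id, b_box in buildings.items():
        n = sum(1 for r_box in roads.values() if bbox_intersects(b_box, r_box))
        if n:
            counts[b_id] = n
    return counts
```

The nested loop is O(buildings × roads), which is exactly the cost a spatial join engine avoids by indexing or partitioning geometries; it is only meant for checking results on a tiny sample.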
Final thoughts
Hosting SedonaDB on DigitalOcean gives you full control over your geospatial analytics engine, with a balance between performance and simplicity. Because SedonaDB is designed for spatial-first operations on a single node, it complements cluster-based Sedona workflows. Use this setup for prototyping, dashboards, moderate spatial workloads, or as a stepping stone toward more distributed architectures.