Docker

This image is not for production. It runs every service in a single container with an embedded database and no replication, high availability, or automated backups. Use the Kubernetes deployment for production workloads.

The Classifyre all-in-one Docker image bundles everything into a single container — PostgreSQL, the NestJS API, the Next.js web UI, and a Caddy reverse proxy. One command and the full application is running on port 3000.

Good for

  • Local development and feature exploration
  • Sales demos and proof-of-concept trials
  • Offline or air-gapped environments
  • CI integration tests against a real running instance

The image

ghcr.io/andrebanandre/unstructured

Available for linux/amd64 and linux/arm64. Tags follow the same scheme as the Kubernetes images — pin to a release version for reproducible demos.

docker pull ghcr.io/andrebanandre/unstructured:latest
 
# Pin to a specific release (recommended)
docker pull ghcr.io/andrebanandre/unstructured:0.1.8

What runs inside

All services start automatically via s6-overlay, a lightweight process supervisor that manages boot order and handles graceful shutdown.

Container  →  port 3000

└── s6-overlay (PID 1)
    ├── PostgreSQL 16    — database for all application data
    ├── NestJS API       — REST + WebSocket backend  (internal :8000)
    ├── Next.js Web      — dashboard UI              (internal :3100)
    └── Caddy            — reverse proxy, single public endpoint
                           /         → web UI
                           /api/*    → API
                           /socket.io/* → WebSocket

Prisma migrations run automatically every time the container starts. You never need to run them manually.


Quick start

Run without persistence

docker run --rm \
  -p 3000:3000 \
  ghcr.io/andrebanandre/unstructured:latest

Open http://localhost:3000 in your browser.

The API health endpoint is at http://localhost:3000/api/ping.

Without a volume, everything — sources, findings, settings, and credentials — is lost when the container stops.

Add a data volume

docker run --rm \
  -p 3000:3000 \
  -v classifyre-data:/data \
  ghcr.io/andrebanandre/unstructured:latest

With -v classifyre-data:/data the database and all application state survive container restarts and image upgrades.


Volumes

The container writes everything to /data. This single mount point covers the entire application state.

/data
├── postgres/        PostgreSQL data directory (all your sources, findings, detectors, jobs)
└── logs/
    ├── api.log
    ├── postgres.log
    └── caddy.log

What you lose without a volume

Data                 Impact if lost
PostgreSQL database  All sources, findings, custom detectors, and job history gone
Encryption key       Stored connector credentials become permanently unreadable
Logs                 No audit trail between sessions

Why the encryption key matters

Classifyre encrypts connector credentials (API tokens, passwords) at rest using CLASSIFYRE_MASKED_CONFIG_KEY. When no volume is mounted, a new random key is generated on every container start. Any credentials you saved in the previous session become unreadable because the key that encrypted them no longer exists.

Always mount a volume for any session where you configure real connectors.


Docker Compose

For demos that survive machine reboots, Docker Compose is simpler than bare docker run flags.

services:
  classifyre:
    image: ghcr.io/andrebanandre/unstructured:latest
    ports:
      - "3000:3000"
    volumes:
      - classifyre-data:/data
    environment:
      LOG_LEVEL: info
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/ping"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
 
volumes:
  classifyre-data:

# Start in the background
docker compose up -d
 
# Follow logs
docker compose logs -f
 
# Stop (volume is preserved)
docker compose down

Environment variables

Variable                      Default         Description
CLASSIFYRE_MASKED_CONFIG_KEY  auto-generated  32-character key for encrypting connector credentials. Set explicitly if you need the same key across sessions without a persistent volume.
LOG_LEVEL                     info            Log verbosity: debug, info, warn, error.
NODE_ENV                      production      Runtime environment.

docker run \
  -p 3000:3000 \
  -v classifyre-data:/data \
  -e LOG_LEVEL=debug \
  ghcr.io/andrebanandre/unstructured:latest
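
If you do set CLASSIFYRE_MASKED_CONFIG_KEY yourself, one way to produce a 32-character value is sketched below. This assumes any 32-character string is accepted as a key, which the table does not confirm; the generate_key helper is a hypothetical name, not part of Classifyre.

```shell
#!/bin/sh
# generate_key (hypothetical helper)
# Prints a random 32-character hex string: 16 random bytes, two hex digits each.
generate_key() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Assumed usage: generate the key once, store it somewhere safe, and pass the
# same value on every run so saved credentials stay readable.
# KEY=$(generate_key)
# docker run --rm -p 3000:3000 \
#   -e CLASSIFYRE_MASKED_CONFIG_KEY="$KEY" \
#   ghcr.io/andrebanandre/unstructured:latest
```

Generating a fresh key on every run would defeat the purpose, so keep the value outside the container, for example in a secrets manager or a local file with restricted permissions.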

System requirements

Resource  Minimum  Recommended
CPU       1 core   2 cores
RAM       1 GB     2 GB
Disk      2 GB     5 GB

Playwright (browser-based crawling) is bundled in the image. Connectors that use it consume an additional ~500 MB RAM per browser instance during active scans.
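
As a rough sizing aid, the figures above can be turned into a container memory limit. The sketch below assumes the recommended 2 GB as a baseline plus roughly 500 MB per concurrent browser instance. memory_budget_mb is a hypothetical helper; --cpus and --memory are standard docker run flags.

```shell
#!/bin/sh
# memory_budget_mb BROWSERS (hypothetical helper)
# Rough estimate in MB: 2048 MB baseline (the recommended RAM) plus ~500 MB
# per concurrent Playwright browser instance. Both figures are approximations.
memory_budget_mb() {
  browsers=${1:-0}
  echo $((2048 + 500 * browsers))
}

# Example: cap the container for two concurrent browser instances.
# docker run --rm -p 3000:3000 \
#   --cpus 2 --memory "$(memory_budget_mb 2)m" \
#   -v classifyre-data:/data \
#   ghcr.io/andrebanandre/unstructured:latest
```

Treat the result as a starting point and adjust it after observing real usage with docker stats.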


Upgrading

Pull the new image, stop the existing container, and start it again against the same volume. Migrations run automatically on startup.

docker pull ghcr.io/andrebanandre/unstructured:latest
 
docker compose down
docker compose up -d

The data volume is untouched by image upgrades.


Backup and restore

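Even for demos, you may want to preserve a working state. Stop the container before backing up or restoring so PostgreSQL is not writing to /data while the archive is created or unpacked.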

Backup:

docker run --rm \
  -v classifyre-data:/data \
  -v "$(pwd)/backups:/backup" \
  alpine \
  tar czf /backup/classifyre-$(date +%Y%m%d).tar.gz -C /data .

Restore:

docker run --rm \
  -v classifyre-data:/data \
  -v "$(pwd)/backups:/backup" \
  alpine \
  tar xzf /backup/classifyre-20240101.tar.gz -C /data
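
Before relying on an archive, it can help to check that it is readable and contains the PostgreSQL directory. The sketch below assumes the archive was created with the backup command above, so its entries start with ./ and include ./postgres/. verify_backup is a hypothetical helper name.

```shell
#!/bin/sh
# verify_backup ARCHIVE (hypothetical helper)
# Returns 0 if the archive can be listed and contains a ./postgres/ entry.
verify_backup() {
  archive=$1
  # tar tzf lists entries without extracting; a corrupt or non-gzip file fails here.
  listing=$(tar tzf "$archive") || return 1
  # The backup archives /data with "-C /data .", so paths begin with "./".
  case "$listing" in
    *"./postgres/"*) return 0 ;;
  esac
  return 1
}

# Usage:
# verify_backup backups/classifyre-20240101.tar.gz && echo "archive looks usable"
```

This only checks the archive structure. It does not prove that the database inside is consistent, which is why the container should be stopped while the backup is taken.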

Troubleshooting

Container exits immediately

docker logs <container-id>

Common causes: port 3000 already in use (lsof -i :3000), insufficient disk space (docker system df).

Web UI loads but API returns errors

# Tail the API log
docker exec <container-id> tail -f /data/logs/api.log
 
# Check s6 service status
docker exec <container-id> s6-rc -a list

Verify health

curl -i http://localhost:3000/api/ping
# → 200 {"status":"ok"}
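
For CI jobs or scripts that need to wait until the instance is up, the same endpoint can be polled. This is a minimal sketch, assuming curl is installed on the host; wait_for_url is a hypothetical helper, and the 90-second timeout in the usage line is an arbitrary choice.

```shell
#!/bin/sh
# wait_for_url URL [TIMEOUT_SECONDS] (hypothetical helper)
# Polls URL about once per second until it answers with a successful HTTP
# status, or gives up after TIMEOUT_SECONDS attempts (default 60).
wait_for_url() {
  url=$1
  timeout=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -f: fail on HTTP errors; -s: no progress output; --max-time: cap each try.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Usage:
# wait_for_url http://localhost:3000/api/ping 90 || echo "Classifyre did not become ready"
```

The function returns non-zero on timeout, so a CI step can fail fast instead of running tests against an instance that is still booting.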

Moving to production

When you outgrow the single-container setup, deploy Classifyre on Kubernetes with proper separation of concerns, a managed database, and horizontal scaling.

Kubernetes Deployment
