Screening deployment

This guide covers deploying Marble with sanctions and watchlist screening. Screening requires Elasticsearch and two additional services: yente (indexer) and motiva (search engine). Continuous screening adds optional monitoring of internal watchlists.

Prerequisites

Before you start, make sure the rest of Marble is running (API server, worker, and database all functional). You'll also need:

  • Elasticsearch (v9+) — standalone or managed service, reachable by yente and motiva
  • Container orchestration (Kubernetes, Cloud Run, or equivalent) for yente and motiva
  • Network connectivity between services (Elasticsearch, yente, motiva, Marble backend)

Step 1: Deploy Elasticsearch

Elasticsearch v9+ powers the search backend. Use a managed service (such as Elastic Cloud) or self-host it.

OpenSearch Compatibility

We do not provide official support for OpenSearch, but several users have reported no problems running Marble screening with OpenSearch.

Minimum Configuration

  • Cluster health: Green status with at least 1 node operational
  • Shards: At least 2 shards recommended for production deployments (YENTE_INDEX_SHARDS=2)
  • Storage: Allocate sufficient disk space for sanctions data (typically 50GB+)
  • Memory: 16GB+ heap size recommended

Managed Services

AWS OpenSearch (not officially supported; see OpenSearch Compatibility above):

  • Create domain via AWS Console
  • Enable encryption at rest and in transit
  • Configure IAM authentication or IP whitelist
  • Note the domain endpoint and credentials

Elastic Cloud:

  • Create deployment
  • Configure security settings
  • Retrieve credentials for yente and motiva

Verify connectivity:

curl http://elasticsearch:9200/_cluster/health
# Response: {"cluster_name":"elasticsearch","status":"green",...}

With authentication: if your cluster requires authentication, generate credentials and note the username and password. You will need them to configure both yente and motiva.
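For an authenticated cluster, the same health check can be scripted with the credentials you generated. A minimal sketch, assuming the hypothetical variables ES_URL, ES_USER and ES_PASS; the status parser expects Elasticsearch's default compact JSON (no ?pretty) and avoids a jq dependency.

```shell
# Sketch: check Elasticsearch health before deploying yente and motiva.
# ES_URL, ES_USER and ES_PASS are hypothetical names used only in this example.

# Print the value of "status" from a compact _cluster/health JSON body on stdin.
cluster_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Fetch cluster health, adding basic auth only when ES_USER is set.
fetch_health() {
  url="${ES_URL:-http://elasticsearch:9200}"
  if [ -n "${ES_USER:-}" ]; then
    curl -s -u "${ES_USER}:${ES_PASS:-}" "${url}/_cluster/health"
  else
    curl -s "${url}/_cluster/health"
  fi
}

# Succeed only when the cluster reports green, as required above.
require_green() {
  status="$(fetch_health | cluster_status)"
  if [ "$status" != "green" ]; then
    echo "Elasticsearch status is '${status:-unreachable}', expected green" >&2
    return 1
  fi
}
```

Export ES_USER and ES_PASS, then call require_green; it returns non-zero unless the cluster reports green, so it can gate the rest of the rollout.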

Step 2: Deploy yente (Indexer)

yente indexes sanctions data into Elasticsearch. Run it as a scheduled job that triggers periodically (every 4-6 hours recommended).

Docker image: ghcr.io/opensanctions/yente:5.4.0 (check releases for latest)

Manifest file:

Create a manifest JSON file (see Manifest Configuration below). The manifest should be:

  • Accessible from both yente and motiva (via HTTP URL or mounted file)
  • Not authenticated — it contains no sensitive data
  • Protected by network isolation if needed

Required environment variables:

| Variable | Example | Notes |
| --- | --- | --- |
| YENTE_INDEX_URL | http://elasticsearch:9200 | Elasticsearch cluster URL |
| YENTE_MANIFEST | http://marble-backend:8080/screening-manifest.json | Path, or unauthenticated URL reachable from the yente service |
| YENTE_INDEX_USERNAME | elastic | Required if Elasticsearch has authentication enabled |
| YENTE_INDEX_PASSWORD | | Required if Elasticsearch has authentication enabled |
| YENTE_INDEX_SHARDS | 2 | Number of Elasticsearch shards (minimum 2 for production) |

Optional environment variables:

| Variable | Default | Notes |
| --- | --- | --- |
| YENTE_INDEX_AUTO_REPLICAS | 0-all | Replica configuration |
| SCREENING_INDEXER_TOKEN | | Required for continuous screening; token used to authenticate against Marble's catalog endpoint |
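Before the scheduled indexing job runs, a short pre-flight check can catch missing or inconsistent variables from the tables above. A minimal sketch; check_yente_env is a hypothetical helper, not part of yente, and it only warns (rather than fails) below 2 shards because that value is a production recommendation.

```shell
# Sketch: sanity-check yente's environment before the indexing job starts.
# check_yente_env is hypothetical; it mirrors the tables above and is not part of yente.
check_yente_env() {
  rc=0
  # Required in every deployment.
  [ -n "${YENTE_INDEX_URL:-}" ] || { echo "YENTE_INDEX_URL is not set" >&2; rc=1; }
  [ -n "${YENTE_MANIFEST:-}" ] || { echo "YENTE_MANIFEST is not set" >&2; rc=1; }
  # Credentials are only useful as a pair.
  if [ -n "${YENTE_INDEX_USERNAME:-}" ] && [ -z "${YENTE_INDEX_PASSWORD:-}" ]; then
    echo "YENTE_INDEX_USERNAME is set but YENTE_INDEX_PASSWORD is not" >&2; rc=1
  fi
  if [ -z "${YENTE_INDEX_USERNAME:-}" ] && [ -n "${YENTE_INDEX_PASSWORD:-}" ]; then
    echo "YENTE_INDEX_PASSWORD is set but YENTE_INDEX_USERNAME is not" >&2; rc=1
  fi
  # At least 2 shards are recommended for production; only warn below that.
  case "${YENTE_INDEX_SHARDS:-}" in
    '') echo "warning: YENTE_INDEX_SHARDS is unset (2+ recommended for production)" >&2 ;;
    *[!0-9]*) echo "YENTE_INDEX_SHARDS must be a number" >&2; rc=1 ;;
    *) [ "$YENTE_INDEX_SHARDS" -ge 2 ] ||
         echo "warning: YENTE_INDEX_SHARDS=$YENTE_INDEX_SHARDS (2+ recommended for production)" >&2 ;;
  esac
  return "$rc"
}
```

Run it at the start of the job's entrypoint and stop the job if it returns non-zero.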

Step 3: Deploy motiva (Search Engine)

Motiva handles screening queries from the Marble API. Run it as a long-lived service.

Docker image: ghcr.io/apognu/motiva:v0.8.1 (check releases for latest)

Required environment variables:

| Variable | Example | Notes |
| --- | --- | --- |
| INDEX_URL | http://elasticsearch:9200 | Elasticsearch cluster URL |

With Elasticsearch authentication:

| Variable | Example | Notes |
| --- | --- | --- |
| INDEX_AUTH_METHOD | basic, bearer, api_key, or encoded_api_key | Authentication method |
| INDEX_CLIENT_ID | elastic | Username or API key ID (required for basic and api_key) |
| INDEX_CLIENT_SECRET | | Password, token, or secret (required unless the method is none) |

Optional environment variables:

| Variable | Default | Notes |
| --- | --- | --- |
| INDEX_NAME | yente | Index name prefix (must match yente's YENTE_INDEX_NAME) |
| MANIFEST_URL | | Path, or unauthenticated URL reachable from the motiva service |
| SCREENING_INDEXER_TOKEN | | Token for Marble catalog authentication (required for continuous screening) |
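The authentication rules in the table are easy to get wrong across environments. A minimal sketch that checks them, plus the INDEX_NAME/YENTE_INDEX_NAME pairing; both helper names are hypothetical, and treating an unset INDEX_AUTH_METHOD as none (and an unset YENTE_INDEX_NAME as yente) is an assumption.

```shell
# Sketch: validate motiva's Elasticsearch authentication settings against the
# table above. Assumption: an unset INDEX_AUTH_METHOD means "none".
check_motiva_auth() {
  method="${INDEX_AUTH_METHOD:-none}"
  case "$method" in
    none) return 0 ;;
    basic|api_key)
      # These methods also need a username or API key ID.
      if [ -z "${INDEX_CLIENT_ID:-}" ]; then
        echo "INDEX_CLIENT_ID is required for $method" >&2
        return 1
      fi ;;
    bearer|encoded_api_key) ;;
    *)
      echo "unknown INDEX_AUTH_METHOD: $method" >&2
      return 1 ;;
  esac
  # Every method other than none needs a secret.
  if [ -z "${INDEX_CLIENT_SECRET:-}" ]; then
    echo "INDEX_CLIENT_SECRET is required for $method" >&2
    return 1
  fi
}

# motiva's INDEX_NAME must match yente's YENTE_INDEX_NAME.
# Assumption: both default to "yente" when unset.
check_index_names() {
  [ "${INDEX_NAME:-yente}" = "${YENTE_INDEX_NAME:-yente}" ]
}
```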

Step 4: Configure Marble Backend

Set these environment variables on both the API server and the worker:

# URL of motiva instance (internal network address)
OPENSANCTIONS_API_HOST=http://motiva:8000

# Optional: token for continuous screening
SCREENING_INDEXER_TOKEN=<random-secure-token>

# Optional: blob storage for continuous screening datasets
CONTINUOUS_SCREENING_BUCKET_URL=s3://bucket-name/path
# or: gs://bucket-name/path
# or: file:///local/path (development only)
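Because only three URL schemes are listed for CONTINUOUS_SCREENING_BUCKET_URL, a deploy script can reject typos early. A minimal sketch; bucket_scheme_ok is a hypothetical helper, and it checks only the scheme, not whether the bucket exists or is writable.

```shell
# Sketch: accept only the s3://, gs:// and file:// schemes listed above.
# bucket_scheme_ok is hypothetical; it does not contact the storage backend.
bucket_scheme_ok() {
  case "$1" in
    s3://?*|gs://?*) return 0 ;;
    file://?*)
      # Allowed, but intended for development only.
      echo "warning: file:// storage is intended for development only" >&2
      return 0 ;;
    *)
      echo "unsupported bucket URL: ${1:-<empty>}" >&2
      return 1 ;;
  esac
}
```

For example, bucket_scheme_ok "$CONTINUOUS_SCREENING_BUCKET_URL" || exit 1 stops a deploy script before Marble starts with a bad value.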

Manifest Configuration

The manifest is a JSON file that defines which data catalogs yente and motiva will use. Both services read from the same manifest.

Manifest location: serve it from S3 or GCS, or mount it as a file.

Basic manifest (public OpenSanctions only):

{
  "catalogs": [
    {
      "url": "https://data.opensanctions.org/datasets/latest/index.json",
      "scope": "default",
      "resource_name": "entities.ftm.json"
    }
  ]
}

With continuous screening (custom watchlists):

{
  "catalogs": [
    {
      "url": "https://data.opensanctions.org/datasets/latest/index.json",
      "scope": "default",
      "resource_name": "entities.ftm.json"
    },
    {
      "url": "https://your-marble-instance.com/screening-indexer/catalogs",
      "auth_token": "$SCREENING_INDEXER_TOKEN"
    }
  ]
}

Notes:

  • $SCREENING_INDEXER_TOKEN is a variable reference that yente and motiva resolve at runtime from the environment variable of the same name
  • Do not substitute the token value into the manifest by hand
  • For public URLs, ensure they're reachable from yente and motiva containers
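If you generate the manifest from a shell script, quote the heredoc delimiter so the shell leaves $SCREENING_INDEXER_TOKEN unexpanded, as the notes above require. A minimal sketch; write_manifest is a hypothetical helper, and the Marble URL is the placeholder from the example manifest.

```shell
# Sketch: write the continuous screening manifest to the path given in $1.
# The quoted 'EOF' delimiter stops the shell from expanding $SCREENING_INDEXER_TOKEN,
# so the literal reference is kept for yente and motiva to resolve at runtime.
# write_manifest is hypothetical; replace the Marble URL with your instance.
write_manifest() {
  cat > "$1" <<'EOF'
{
  "catalogs": [
    {
      "url": "https://data.opensanctions.org/datasets/latest/index.json",
      "scope": "default",
      "resource_name": "entities.ftm.json"
    },
    {
      "url": "https://your-marble-instance.com/screening-indexer/catalogs",
      "auth_token": "$SCREENING_INDEXER_TOKEN"
    }
  ]
}
EOF
}
```

An unquoted delimiter (<<EOF) would paste the token's value into the file instead, which is exactly what the notes say to avoid.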

Optional: Continuous Screening

Continuous screening checks the entities you monitor in Marble against updates to external sanctions lists. It requires the screening setup above plus the additional configuration below.

Prerequisites

  • Screening (steps 1-4 above) must be deployed and working
  • Generate a secure token for SCREENING_INDEXER_TOKEN (used for API authentication):
    openssl rand -hex 32
  • Set SCREENING_INDEXER_TOKEN on Marble API, Marble worker, yente, and motiva
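A small wrapper can generate the token and confirm its shape before you distribute it to the four services. A sketch; gen_indexer_token and is_hex_token are hypothetical helpers, and the /dev/urandom fallback for hosts without openssl is an assumption about your tooling, not a Marble requirement.

```shell
# Sketch: generate a 32-byte token as 64 hex characters, as described above.
# Falls back to /dev/urandom + od when openssl is not installed (an assumption
# about your tooling, not a Marble requirement).
gen_indexer_token() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Succeed only if $1 is exactly 64 lowercase hex characters.
is_hex_token() {
  case "$1" in
    ''|*[!0-9a-f]*) return 1 ;;
  esac
  [ "${#1}" -eq 64 ]
}
```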

Configuration

1. Update Marble backend environment variables:

Set these on both the API server and the worker:

CONTINUOUS_SCREENING_BUCKET_URL=s3://your-bucket/continuous-screening
# or gs://your-bucket/continuous-screening
# or file:///local/path (for development)

SCREENING_INDEXER_TOKEN=<same-token-used-above>

2. Configure yente and motiva:

Pass the shared token and manifest URL (as described in Steps 2-3 above):

# yente
YENTE_MANIFEST=https://your-marble-instance.com/screening-manifest.json
SCREENING_INDEXER_TOKEN=<same-token>

# motiva
MANIFEST_URL=https://your-marble-instance.com/screening-manifest.json
SCREENING_INDEXER_TOKEN=<same-token>

How It Works

  1. Marble tracks monitored entities: when you configure continuous screening in the Marble UI, new and updated entities are tracked in the database
  2. Marble creates datasets: periodic background jobs generate full and delta datasets in blob storage (such as S3 or GCS)
  3. yente indexes custom data: during reindexing, yente pulls the datasets from Marble and indexes them into Elasticsearch
  4. Marble monitors updates: periodic jobs scan OpenSanctions for changes and match them against your entities
  5. Alerts are generated: matches above a confidence threshold create continuous screening alerts

Environment Variables

Marble backend (API server and worker):

| Variable | Required | Default | Notes |
| --- | --- | --- | --- |
| CONTINUOUS_SCREENING_BUCKET_URL | Yes | | Blob storage for datasets: s3://, gs://, or file:// |
| SCREENING_INDEXER_TOKEN | Yes | | Shared secret for API authentication |
| CREATE_FULL_DATASET_INTERVAL | No | 24h | How often to generate full datasets |
| SCAN_DATASET_UPDATES_INTERVAL | No | 24h | How often to scan for OpenSanctions updates |

Verification

Check motiva status

curl http://motiva:8000/catalog
# Should return the data catalog

Test motiva matching

Once motiva is running and yente has indexed data:

curl -X POST http://motiva:8000/match/default \
  -H "Content-Type: application/json" \
  -d '{
    "queries": {
      "entity1": {
        "schema": "Person",
        "properties": {
          "name": ["Vladimir Putin"]
        }
      }
    }
  }'

Expected: 200 response with matches and confidence scores.
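To repeat this test with other names, the query body can be generated by a small function. A minimal sketch; match_query, motiva_match and MOTIVA_URL are hypothetical names, the default URL is the internal address used above, and the response is printed as-is rather than parsed.

```shell
# Sketch: build and send a single-entity match query, mirroring the curl call above.
# match_query, motiva_match and MOTIVA_URL are hypothetical. The values are inserted
# into JSON without escaping, so they must not contain double quotes or backslashes.
match_query() {
  printf '{"queries":{"entity1":{"schema":"%s","properties":{"name":["%s"]}}}}' "$1" "$2"
}

# Usage: motiva_match Person "Vladimir Putin"
motiva_match() {
  match_query "$1" "$2" |
    curl -s -X POST "${MOTIVA_URL:-http://motiva:8000}/match/default" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```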

Check Elasticsearch indices

curl http://elasticsearch:9200/_cat/indices
# Should list yente_* indices
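To script this check, the _cat/indices API accepts an h=index parameter that prints only index names, one per line. A sketch assuming the hypothetical ES_URL variable and the yente prefix (the INDEX_NAME default above); list_indices and count_prefixed are illustrative names.

```shell
# Sketch: count Elasticsearch indices whose names start with a prefix.
# ES_URL, list_indices and count_prefixed are hypothetical names.
# Add -u "user:password" to curl if your cluster requires authentication.
list_indices() {
  # h=index limits _cat/indices output to one index name per line.
  curl -s "${ES_URL:-http://elasticsearch:9200}/_cat/indices?h=index"
}

count_prefixed() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the exit status 0.
  grep -c "^$1" || true
}
```

list_indices | count_prefixed yente should print a non-zero count once yente has finished its first indexing run.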