Architecture

Kreuzberg Cloud is the managed SaaS layer on top of Kreuzberg, the open-source document intelligence library. Documents go in via REST; text, tables, and metadata come out via REST polling or webhooks.

Request flow

sequenceDiagram
    participant C as Client
    participant A as API
    participant N as NATS JetStream
    participant W as Worker
    participant KV as Result cache

    C->>A: POST /v1/extract (multipart)
    A->>N: Publish JobMessage
    A-->>C: 202 Accepted + job_ids

    N->>W: Pull JobMessage
    W->>W: Kreuzberg extraction
    W->>KV: Store result
    W->>N: Publish completion event
    W->>N: ACK

    C->>A: GET /v1/jobs/{id}
    A->>KV: Lookup result
    A-->>C: 200 OK + result

Extraction is asynchronous: POST /v1/extract returns 202 Accepted with one or more job_ids, and the extracted text becomes available later. Use polling for quick scripts and webhooks for production.
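For the polling path, a minimal sketch of a client-side loop might look like the following. It is not the official client: the `status` field and its values (`completed`, `failed`) are assumptions about the job payload, and `fetch_job` is a hypothetical callable that would wrap GET /v1/jobs/{id} with whatever HTTP library you use.

```python
import time
from typing import Callable

# Assumed status values; the real job payload may name these differently.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 1.0,
    max_interval_s: float = 10.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until the job reaches a terminal status or time runs out.

    The wait between attempts doubles each round, capped at max_interval_s,
    so a slow job does not hammer the API.
    """
    deadline = time.monotonic() + timeout_s
    delay = interval_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_interval_s)
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any HTTP client and makes it easy to exercise without a network.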

Components

API — public REST surface (/v1/extract, /v1/jobs/{id}). Authenticates the request, validates the project quota, and publishes a JobMessage to NATS JetStream. Returns immediately with the job_ids.
Workers — pull JobMessages from JetStream, run Kreuzberg extraction on the documents, write results to the NATS KV result cache (12-hour TTL), and emit a completion event. Two pools: a CPU pool for text/table extraction and a GPU pool for layout and embedding work. Jobs are routed to one pool or the other based on the ExtractionConfig (job.extract.standard.> vs job.extract.gpu.>).
Webhook delivery — consumes completion events and POSTs signed payloads to registered URLs. At-least-once delivery with HMAC-SHA256 signatures.
Result cache — NATS KV bucket keyed by {project_id}.{job_id}, 12-hour TTL. Sub-millisecond reads on GET /v1/jobs/{id}. PostgreSQL holds the full history beyond TTL.
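Because webhook delivery is at-least-once and signed with HMAC-SHA256, a receiver typically needs two things: signature verification and duplicate handling. The sketch below shows one way to do both. It assumes the signature is a hex digest of the raw request body and that the payload carries a `job_id` field; neither detail is stated here, so check the webhook reference for the real header name, encoding, and payload shape.

```python
import hashlib
import hmac
import json


def compute_signature(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 over the raw body, hex-encoded (the encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    expected = compute_signature(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


def handle_delivery(secret: bytes, body: bytes, received: str, seen_job_ids: set) -> str:
    """Verify first, then drop repeats caused by at-least-once delivery."""
    if not verify_signature(secret, body, received):
        return "rejected"
    event = json.loads(body)
    job_id = event["job_id"]  # assumed field name
    if job_id in seen_job_ids:
        return "duplicate"
    seen_job_ids.add(job_id)
    return "processed"
```

An in-memory set is enough to illustrate the idea; a real receiver would keep the seen IDs in durable storage so that restarts do not reprocess old deliveries.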

Data model

Project — billing, quota, and webhook scope. Each request resolves to a project_id; PostgreSQL Row-Level Security enforces tenant isolation.
API key — kz_ prefix (live) or sk_sandbox_ (anonymous sandbox). SHA-256 hashed before lookup, cached in NATS KV for sub-ms validation. See Authentication.
Job — one document or one batch entry. Carries either inline bytes (≤1 MB) or a storage reference to GCS/S3.
Result — { content, metadata, tables?, chunks? }. Same shape as the open-source Kreuzberg ExtractionResult.
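The API key handling described above (prefix check, SHA-256 hash, lookup by hash) can be sketched as follows. This is an illustration, not the service's implementation: the in-memory `key_index` dict stands in for the NATS KV cache, and the function names and the `proj_123` value used in the usage note are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def classify_key(api_key: str) -> str:
    """Tell live keys from sandbox keys by their prefix."""
    if api_key.startswith("sk_sandbox_"):
        return "sandbox"
    if api_key.startswith("kz_"):
        return "live"
    raise ValueError("unrecognized API key prefix")


def hash_key(api_key: str) -> str:
    """Hash the key so that lookups never use the raw secret."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def resolve_project(api_key: str, key_index: Dict[str, str]) -> Optional[str]:
    """Map a presented key to a project_id, or None if the key is unknown.

    key_index maps SHA-256 hex digests to project IDs and is a stand-in
    for the real key cache.
    """
    classify_key(api_key)  # reject malformed keys before hashing
    return key_index.get(hash_key(api_key))
```

For example, with `key_index = {hash_key("kz_example"): "proj_123"}`, calling `resolve_project("kz_example", key_index)` returns `"proj_123"`, while an unknown but well-formed key returns `None`.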

Scaling

API and webhook delivery scale via Kubernetes HPA on CPU/memory.
Workers scale via KEDA on NATS JetStream consumer lag. The GPU pool scales to zero between bursts; the CPU pool keeps a warm baseline.
Large documents (>10 MB) are chunked and processed in parallel; chunks rejoin into a single result.
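The chunk-and-rejoin step for large documents follows a common fan-out pattern, sketched below. How Kreuzberg Cloud actually splits a document (by page, byte range, or something else) is not specified here, so this example assumes page-level chunks, and `extract_chunk` is a hypothetical stand-in for the real extraction call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(pages: List[str], chunk_size: int) -> List[List[str]]:
    """Group pages into consecutive chunks of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]


def extract_chunk(pages: List[str]) -> str:
    """Stand-in for real extraction: trims whitespace and joins the pages."""
    return "\n".join(page.strip() for page in pages)


def extract_parallel(pages: List[str], chunk_size: int = 2, max_workers: int = 4) -> str:
    """Process chunks concurrently, then rejoin them in document order."""
    chunks = split_into_chunks(pages, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the rejoined
        # text keeps the original page order even if chunks finish late.
        parts = list(pool.map(extract_chunk, chunks))
    return "\n".join(parts)
```

The order-preserving `map` is the important design choice: chunks can complete in any order, but the single result must read as one document.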

Self-hosting

The same extraction engine ships as the open-source kreuzberg library and Docker image. Cloud differentiates on managed infrastructure (NATS, PostgreSQL, GCS/S3, KEDA), multi-tenancy, billing, and webhook delivery, not on extraction quality. If you'd rather run the extractor yourself, the Cloud REST response shape and the open-source ExtractionResult are intentionally close.

Web crawling is handled by the sister kreuzcrawl project. Today the Cloud extract endpoint takes bytes; crawl-then-extract pipelines run kreuzcrawl on the client side or via the open-source library.
