
Architecture

Kreuzberg Cloud is the managed SaaS layer on top of Kreuzberg, the open-source document intelligence library. Documents go in via REST; text, tables, and metadata come out via REST polling or webhooks.

Request flow

sequenceDiagram
    participant C as Client
    participant A as API
    participant N as NATS JetStream
    participant W as Worker
    participant KV as Result cache

    C->>A: POST /v1/extract (multipart)
    A->>N: Publish JobMessage
    A-->>C: 202 Accepted + job_ids

    N->>W: Pull JobMessage
    W->>W: Kreuzberg extraction
    W->>KV: Store result
    W->>N: Publish completion event
    W->>N: ACK

    C->>A: GET /v1/jobs/{id}
    A->>KV: Lookup result
    A-->>C: 200 OK + result

Extraction is async — POST /v1/extract returns 202 with one or more job_ids, and the actual text shows up later. Use polling for quick scripts, webhooks for production.
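
A minimal polling loop for the quick-script case might look like the sketch below. The page does not say what GET /v1/jobs/{id} returns while a job is still running, so the HTTP call is left to a caller-supplied `fetch_job` function that is assumed to return `None` until the result is ready. `poll_job`, `fetch_job`, and the interval and timeout defaults are illustrative names and values, not part of the Cloud API.

```python
import time
from typing import Callable, Optional


def poll_job(
    fetch_job: Callable[[str], Optional[dict]],
    job_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job(job_id) until it returns a result or timeout_s elapses.

    fetch_job is a hypothetical wrapper around GET /v1/jobs/{id}; it is
    assumed to return None while the job is pending and a dict once done.
    """
    waited = 0.0
    while True:
        result = fetch_job(job_id)
        if result is not None:
            return result
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

In a script, `fetch_job` would wrap an authenticated GET against the jobs endpoint. The `sleep` parameter is only there so the loop can be exercised without real delays.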

Components

  • API — public REST surface (/v1/extract, /v1/jobs/{id}). Authenticates the request, validates the project quota, and publishes a JobMessage to NATS JetStream. Returns immediately with the job_ids.
  • Workers — pull JobMessages from JetStream, run Kreuzberg extraction on the documents, write results to the NATS KV result cache (12-hour TTL), and emit a completion event. Two pools: a CPU pool for text/table extraction and a GPU pool for layout and embedding work. The ExtractionConfig determines which pool a job is routed to, via the NATS subject it is published on (job.extract.standard.> for the CPU pool, job.extract.gpu.> for the GPU pool).
  • Webhook delivery — consumes completion events and POSTs signed payloads to registered URLs. At-least-once delivery with HMAC-SHA256 signatures.
  • Result cache — NATS KV bucket keyed by {project_id}.{job_id}, 12-hour TTL. Sub-millisecond reads on GET /v1/jobs/{id}. PostgreSQL holds the full job history once the TTL has expired.
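
On the receiving side, HMAC-SHA256 signatures and at-least-once delivery imply two checks: verify the signature, then ignore repeats. The sketch below shows one way to do both. It assumes the signature is a hex digest computed over the raw request body and that the job ID is a suitable deduplication key. The header name, signature encoding, and payload layout are not specified on this page, so `sign_payload`, `verify_webhook`, and `WebhookReceiver` are hypothetical helpers.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (hex encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class WebhookReceiver:
    """Rejects bad signatures and drops repeated deliveries."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.seen_job_ids = set()  # in production this would be persistent

    def handle(self, body: bytes, signature_hex: str, job_id: str) -> str:
        if not verify_webhook(self.secret, body, signature_hex):
            return "rejected"
        if job_id in self.seen_job_ids:
            return "duplicate"  # at-least-once delivery can repeat events
        self.seen_job_ids.add(job_id)
        return "processed"
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking timing information about how much of the signature matched.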

Data model

  • Project — billing, quota, and webhook scope. Each request resolves to a project_id; PostgreSQL Row-Level Security enforces tenant isolation.
  • API key — kz_ prefix (live) or sk_sandbox_ (anonymous sandbox). SHA-256 hashed before lookup, cached in NATS KV for sub-ms validation. See Authentication.
  • Job — one document or one batch entry. Carries either inline bytes (≤1 MB) or a storage reference to GCS/S3.
  • Result — { content, metadata, tables?, chunks? }. Same shape as the open-source Kreuzberg ExtractionResult.
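
The sketch below illustrates three rules from this model and the result cache: hashing an API key before lookup, building the {project_id}.{job_id} cache key, and choosing between inline bytes and a storage reference. The function and class names, the reading of "1 MB" as 1 MiB, and the `upload` callback are assumptions made for illustration. The service's actual types are not shown on this page.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: the 1 MB inline limit is interpreted as 1 MiB here.
INLINE_LIMIT_BYTES = 1024 * 1024


def hash_api_key(api_key: str) -> str:
    """SHA-256 hex digest used as the lookup key instead of the raw key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def is_sandbox_key(api_key: str) -> bool:
    """Anonymous sandbox keys carry the sk_sandbox_ prefix."""
    return api_key.startswith("sk_sandbox_")


def result_cache_key(project_id: str, job_id: str) -> str:
    """Key layout of the result cache: {project_id}.{job_id}."""
    return f"{project_id}.{job_id}"


@dataclass
class JobPayload:
    """Hypothetical job body: exactly one of the two fields is set."""
    inline_bytes: Optional[bytes] = None
    storage_ref: Optional[str] = None


def build_payload(document: bytes, upload: Callable[[bytes], str]) -> JobPayload:
    """Inline small documents; otherwise upload and carry a reference."""
    if len(document) <= INLINE_LIMIT_BYTES:
        return JobPayload(inline_bytes=document)
    return JobPayload(storage_ref=upload(document))
```

Here `upload` stands in for whatever writes the document to GCS or S3 and returns its location.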

Scaling

  • API and webhook delivery scale via Kubernetes HPA on CPU/memory.
  • Workers scale via KEDA on NATS JetStream consumer lag. The GPU pool scales to zero between bursts; the CPU pool keeps a warm baseline.
  • Large documents (>10 MB) are chunked and processed in parallel; chunks rejoin into a single result.
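
As a rough illustration of lag-based scaling, the sketch below sizes a pool from consumer lag and clamps it between a minimum and a maximum. It is a simplified model, not KEDA's exact algorithm, and the per-replica lag target and replica bounds are hypothetical values. A minimum of 0 corresponds to the GPU pool scaling to zero, and a non-zero minimum corresponds to the CPU pool's warm baseline.

```python
import math


def desired_replicas(
    consumer_lag: int,
    lag_per_replica: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Replicas needed to hold lag near lag_per_replica each, clamped to bounds."""
    if lag_per_replica <= 0:
        raise ValueError("lag_per_replica must be positive")
    wanted = math.ceil(consumer_lag / lag_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# GPU-style pool: no pending messages means no replicas.
gpu_idle = desired_replicas(0, lag_per_replica=10, min_replicas=0, max_replicas=8)
# CPU-style pool: a warm baseline stays up even with an empty queue.
cpu_idle = desired_replicas(0, lag_per_replica=10, min_replicas=2, max_replicas=8)
```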

Self-hosting

The same extraction engine ships as the open-source kreuzberg library and Docker image. Cloud differentiates on managed infrastructure (NATS, PostgreSQL, GCS/S3, KEDA), multi-tenancy, billing, and webhook delivery — not on extraction quality. If you'd rather run the extractor yourself, the Cloud REST shape and the open-source ExtractionResult are intentionally close.

Web crawling is handled by the sister kreuzcrawl project. Today the Cloud extract endpoint takes bytes; crawl-then-extract pipelines run kreuzcrawl on the client side or via the open-source library.
