Architecture
Kreuzberg Cloud is the managed SaaS layer on top of Kreuzberg, the open-source document intelligence library. Documents go in via REST; text, tables, and metadata come out via REST polling or webhooks.
Request flow
sequenceDiagram
participant C as Client
participant A as API
participant N as NATS JetStream
participant W as Worker
participant KV as Result cache
C->>A: POST /v1/extract (multipart)
A->>N: Publish JobMessage
A-->>C: 202 Accepted + job_ids
N->>W: Pull JobMessage
W->>W: Kreuzberg extraction
W->>KV: Store result
W->>N: Publish completion event
W->>N: ACK
C->>A: GET /v1/jobs/{id}
A->>KV: Lookup result
A-->>C: 200 OK + result
Extraction is async — POST /v1/extract returns 202 with one or more
job_ids, and the actual text shows up later. Use polling for
quick scripts, webhooks for production.
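A polling client for quick scripts might look like the sketch below. It assumes the job resource returned by GET /v1/jobs/{id} carries a `status` field with values such as `completed` and `failed`; those names are illustrative and not confirmed by this page. The HTTP call is passed in as `fetch_job`, a hypothetical callable, so the backoff logic stays independent of any particular HTTP library.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal state or the time budget runs out.

    fetch_job is expected to wrap GET /v1/jobs/{id} and return the decoded JSON.
    """
    delay = initial_delay_s
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        # Assumption: "status" and its values are placeholder names.
        if job.get("status") in ("completed", "failed"):
            return job
        if waited + delay > timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.1f}s")
        sleep(delay)
        waited += delay
        # Exponential backoff, capped so long jobs are still checked regularly.
        delay = min(delay * 2, max_delay_s)
```

Capping the delay keeps polling cheap for long-running jobs; for production traffic the webhook path avoids polling altogether.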
Components
- API: public REST surface (/v1/extract, /v1/jobs/{id}). Authenticates the request, validates the project quota, and publishes a JobMessage to NATS JetStream. Returns immediately with the job_ids.
- Workers: pull JobMessages from JetStream, run Kreuzberg extraction on the documents, write results to the NATS KV result cache (12-hour TTL), and emit a completion event. Two pools: a CPU pool for text/table extraction and a GPU pool for layout and embedding work. Jobs are routed to one pool or the other based on the ExtractionConfig (job.extract.standard.> vs job.extract.gpu.>).
- Webhook delivery: consumes completion events and POSTs signed payloads to registered URLs. At-least-once delivery with HMAC-SHA256 signatures.
- Result cache: NATS KV bucket keyed by {project_id}.{job_id}, 12-hour TTL. Sub-millisecond reads on GET /v1/jobs/{id}. PostgreSQL holds the full history beyond TTL.
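On the receiving side, at-least-once delivery with HMAC-SHA256 signatures implies two checks: verify the signature over the raw request body, and tolerate the same event arriving more than once. The sketch below assumes the signature is a hex digest of the raw body keyed with a shared webhook secret, and that each event can be identified by its job ID. The header name, signature encoding, and payload layout are not specified on this page, so treat those details as placeholders.

```python
import hashlib
import hmac
from typing import Set


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 hex digest of body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class WebhookReceiver:
    """Hypothetical receiver: rejects bad signatures, ignores repeated events."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        # In-memory set for illustration; a real service would persist this.
        self._seen: Set[str] = set()

    def handle(self, body: bytes, signature_hex: str, job_id: str) -> str:
        if not verify_signature(self._secret, body, signature_hex):
            return "rejected"
        if job_id in self._seen:
            # At-least-once delivery means duplicates are expected, not errors.
            return "duplicate"
        self._seen.add(job_id)
        return "processed"
```

Verifying against the raw bytes, before any JSON parsing, matters because re-serializing the payload can change whitespace or key order and invalidate the digest.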
Data model
- Project: billing, quota, and webhook scope. Each request resolves to a project_id; PostgreSQL Row-Level Security enforces tenant isolation.
- API key: kz_ prefix (live) or sk_sandbox_ (anonymous sandbox). SHA-256 hashed before lookup, cached in NATS KV for sub-ms validation. See Authentication.
- Job: one document or one batch entry. Carries either inline bytes (≤1 MB) or a storage reference to GCS/S3.
- Result: { content, metadata, tables?, chunks? }. Same shape as the open-source Kreuzberg ExtractionResult.
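Two of the rules above are easy to illustrate: API keys are classified by prefix and hashed with SHA-256 before lookup, and a job carries inline bytes only up to 1 MB. The following is a minimal sketch, not the service's implementation. The function names, the payload field names (`inline_bytes`, `storage_ref`), the base64 encoding, and the reading of "1 MB" as 1,048,576 bytes are all assumptions made for the example.

```python
import base64
import hashlib
from typing import Any, Dict, Optional

# Assumption: "1 MB" is interpreted as 2**20 bytes; the page does not say.
INLINE_LIMIT_BYTES = 1024 * 1024


def key_kind(api_key: str) -> str:
    """Classify a key by its prefix: live (kz_) or sandbox (sk_sandbox_)."""
    if api_key.startswith("sk_sandbox_"):
        return "sandbox"
    if api_key.startswith("kz_"):
        return "live"
    raise ValueError("unrecognized API key prefix")


def key_digest(api_key: str) -> str:
    """SHA-256 hex digest, so the raw key never needs to be stored or cached."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def build_job_payload(data: bytes, storage_uri: Optional[str] = None) -> Dict[str, Any]:
    """Inline small documents; refer to object storage for larger ones."""
    if len(data) <= INLINE_LIMIT_BYTES:
        # Hypothetical field name; base64 keeps the bytes JSON-safe.
        return {"inline_bytes": base64.b64encode(data).decode("ascii")}
    if storage_uri is None:
        raise ValueError("documents over the inline limit need a storage reference")
    # Hypothetical field name for a GCS/S3 object reference.
    return {"storage_ref": storage_uri}
```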
Scaling
- API and webhook delivery scale via Kubernetes HPA on CPU/memory.
- Workers scale via KEDA on NATS JetStream consumer lag. The GPU pool scales to zero between bursts; the CPU pool keeps a warm baseline.
- Large documents (>10 MB) are chunked and processed in parallel; chunks rejoin into a single result.
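The chunk-and-rejoin pattern for large documents can be sketched as follows. This page does not say how documents are split or how partial results are merged, so the example assumes page-based chunks and a simple newline join. `extract_chunk` is a hypothetical stand-in for whatever performs the extraction.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_pages(pages: Sequence[str], chunk_size: int) -> List[Sequence[str]]:
    """Split a page sequence into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]


def extract_in_chunks(
    pages: Sequence[str],
    extract_chunk: Callable[[Sequence[str]], str],
    chunk_size: int = 2,
    max_workers: int = 4,
) -> str:
    """Run extract_chunk over chunks in parallel and rejoin in document order."""
    chunks = split_pages(pages, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, even if a later
        # chunk finishes first, so the rejoined text keeps page order.
        parts = list(pool.map(extract_chunk, chunks))
    return "\n".join(parts)
```

Preserving input order at the join step is the important part: parallel workers may finish in any order, but the single result has to read like the original document.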
Self-hosting
The same extraction engine ships as the open-source kreuzberg
library and Docker image. Cloud differentiates on managed
infrastructure (NATS, PostgreSQL, GCS/S3, KEDA), multi-tenancy, billing, and
webhook delivery — not on extraction quality. If you'd rather run the
extractor yourself, the Cloud REST shape and the open-source ExtractionResult
are intentionally close.
Web crawling is handled by the sister kreuzcrawl project.
Today the Cloud extract endpoint takes bytes; crawl-then-extract pipelines
run kreuzcrawl on the client side or via the open-source library.