Getting your results

Extraction is asynchronous: POST /v1/extract returns immediately with one or more job_ids, and the actual text shows up later. There are two ways to collect it.

Pattern | Best for | What you do
Polling | Quick scripts, sandbox experiments, batches of dozens | Call GET /v1/jobs/{id} until status is terminal.
Webhooks | Production, batches of hundreds+, always-on integrations | Register a URL once; we POST signed payloads when jobs finish.

If in doubt, start with polling — every SDK ships an extract_and_wait helper that does it for you in one line. Switch to webhooks when you stop wanting to keep a poller process alive.


Polling

POST /v1/extract            →  { "job_ids": ["..."], "status": "pending" }
GET  /v1/jobs/{id}          →  { "id", "filename", "status", "result"? }

Statuses

status | Meaning | result.content present
pending | Queued, not yet picked up | no
processing | Worker is extracting | no
completed | Done, full text available | yes
partial_success | Some pages/files failed, partial text available | yes
failed | Unrecoverable error | no (error in error_message)

Treat completed, partial_success, and failed as terminal. Anything else means "poll again."
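
As a minimal sketch of the rule above, the status table can be encoded in two small helpers. The names is_terminal and extracted_text are hypothetical (they are not SDK functions), and the job is assumed to be the parsed JSON body of GET /v1/jobs/{id} as a plain dict.

```python
from typing import Optional

# Statuses after which the job will not change again (from the table above).
TERMINAL_STATUSES = {"completed", "partial_success", "failed"}

# Statuses for which the table says result.content is present.
STATUSES_WITH_CONTENT = {"completed", "partial_success"}


def is_terminal(status: str) -> bool:
    """Return True once polling can stop."""
    return status in TERMINAL_STATUSES


def extracted_text(job: dict) -> Optional[str]:
    """Return result.content when the status guarantees it, otherwise None.

    `job` is assumed to be the decoded JSON of GET /v1/jobs/{id}.
    """
    if job.get("status") in STATUSES_WITH_CONTENT:
        return (job.get("result") or {}).get("content")
    return None
```

A caller would keep polling while is_terminal(job["status"]) is false, then read extracted_text(job) and treat None as "no text available".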

Cadence

The SDKs default to a 1-second poll interval with exponential backoff (×2, capped at 30 s) and a 5-minute total timeout. Tune these via the poll_interval, timeout, and backoff options on wait_for_job / waitForJob. Don't poll more often than once per second; rate limits apply.
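
To make the default cadence concrete, here is a rough sketch of the sleep schedule those numbers imply. The function poll_delays is hypothetical and is not how the SDKs are necessarily implemented; it also assumes the timeout counts sleep time only and ignores the time spent on each request.

```python
from typing import Iterator


def poll_delays(initial: float = 1.0, factor: float = 2.0,
                cap: float = 30.0, timeout: float = 300.0) -> Iterator[float]:
    """Yield sleep durations: `initial`, then multiplied by `factor` each
    round, never above `cap`, until `timeout` seconds of sleep are used up."""
    elapsed = 0.0
    delay = initial
    while elapsed < timeout:
        # Shorten the last sleep so the total never exceeds the timeout.
        wait = min(delay, timeout - elapsed)
        yield wait
        elapsed += wait
        delay = min(delay * factor, cap)


if __name__ == "__main__":
    # With the defaults this starts 1, 2, 4, 8, 16, 30, 30, ...
    print(list(poll_delays()))
```

Under these assumptions a job that never finishes is polled 14 times over the 5 minutes, which is why backoff matters for larger batches.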

Code

Python (SDK)
from kreuzberg_cloud import KreuzbergCloud

with KreuzbergCloud(api_key="...") as client:
    # One call: submit, poll, return when terminal.
    job = client.extract_and_wait(file="invoice.pdf")
    print(job.result.content)

    # Or split it if you need to do other work in between:
    accepted = client.extract(file="invoice.pdf")
    job = client.wait_for_job(accepted.job_ids[0], timeout=60)
TypeScript (SDK)
import { KreuzbergCloud } from "@kreuzberg/cloud";
import { readFile } from "node:fs/promises";

const client = new KreuzbergCloud({ apiKey: process.env.KREUZBERG_API_KEY! });
const data = await readFile("invoice.pdf");

const job = await client.extractAndWait({
  file: { name: "invoice.pdf", data, mimeType: "application/pdf" },
});
console.log(job.result?.content);
Go (SDK)
ctx := context.Background()
client, _ := kreuzbergcloud.New(kreuzbergcloud.WithAPIKey(os.Getenv("KREUZBERG_API_KEY")))
file, _ := os.Open("invoice.pdf")
defer file.Close()

result, err := client.ExtractAndWait(ctx,
    kreuzbergcloud.FileSource{Name: "invoice.pdf", Reader: file},
    nil,
)
Dart (SDK)
final accepted = await client.extractMultipart(
  files: [await MultipartFile.fromFile('invoice.pdf')],
);
final job = await client.waitForJob(accepted.jobIds.first);
print(job.result?.content);
curl
JOB=$(curl -sX POST https://api.kreuzberg.cloud/v1/extract \
  -H "Authorization: Bearer $KREUZBERG_API_KEY" \
  -F "file=@invoice.pdf" | jq -r '.job_ids[0]')

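# Poll until the job reaches any terminal status, not just "completed";
# otherwise a failed job would keep this loop running forever.
while :; do
  STATUS=$(curl -s "https://api.kreuzberg.cloud/v1/jobs/$JOB" \
    -H "Authorization: Bearer $KREUZBERG_API_KEY" | jq -r .status)
  case "$STATUS" in completed|partial_success|failed) break ;; esac
  sleep 1
done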

curl -s "https://api.kreuzberg.cloud/v1/jobs/$JOB" \
  -H "Authorization: Bearer $KREUZBERG_API_KEY" | jq -r .result.content

Webhooks

Register a URL on your project, and Kreuzberg Cloud POSTs the result to it as soon as the job reaches a terminal status — no poller, no keep-alive process.

Set up a webhook

The fastest path: open the dashboard and add a webhook to your project. You'll need:

  • URL: must be https://. It should respond with 2xx quickly (we time out at 30 s).
  • Events: pick from job.completed, job.failed, job.cancelled.
  • Secret: leave blank to have one generated, or paste a random string of 32 bytes or more. Save it right away; we store it only in hashed form.

Payload

We POST JSON like this:

{
  "event_id":      "01HZQ...",
  "job_id":        "550e8400-e29b-41d4-a716-446655440000",
  "project_id":    "1cbb9d72-660a-4df2-ba3d-66d83b6afaff",
  "status":        "completed",
  "error_message": null,
  "timestamp":     1747038551,
  "attempt_count": 1
}

Headers:

Header | Value
Content-Type | application/json
User-Agent | kreuzberg-webhook/<version>
X-Webhook-Signature | sha256=<hex> (only if you set a secret)
X-Idempotency-Key | the event_id; use it to deduplicate retries

Webhook payloads are intentionally small and do not carry the extracted text. After receiving one, call GET /v1/jobs/{job_id} once with your API key to fetch it.
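
A minimal sketch of turning a webhook body into a typed object and deriving the follow-up request URL might look like this. WebhookEvent, parse_event, and job_url are hypothetical names, the field list is taken from the sample payload above, and the base URL is the one used in the curl example; treat all of them as assumptions rather than an official client.

```python
import json
from dataclasses import dataclass
from typing import Optional

API_BASE = "https://api.kreuzberg.cloud"  # assumption: same host as the curl example


@dataclass(frozen=True)
class WebhookEvent:
    event_id: str
    job_id: str
    project_id: str
    status: str
    error_message: Optional[str]
    timestamp: int       # integer timestamp, as in the sample payload
    attempt_count: int


def parse_event(raw_body: bytes) -> WebhookEvent:
    """Decode the JSON body, keeping only the documented fields."""
    data = json.loads(raw_body)
    return WebhookEvent(
        event_id=data["event_id"],
        job_id=data["job_id"],
        project_id=data["project_id"],
        status=data["status"],
        error_message=data.get("error_message"),
        timestamp=data["timestamp"],
        attempt_count=data["attempt_count"],
    )


def job_url(event: WebhookEvent) -> str:
    """URL to GET (with your API key) to fetch the extracted text."""
    return f"{API_BASE}/v1/jobs/{event.job_id}"
```

Picking fields explicitly, rather than unpacking the whole dict, keeps the handler working if extra fields are ever added to the payload.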

Verify the signature

X-Webhook-Signature is HMAC-SHA256 of the raw request body with your webhook secret, hex-encoded, prefixed with sha256=. Verify it before trusting the payload.

Python (FastAPI)
import hmac
import hashlib
from fastapi import FastAPI, Header, HTTPException, Request

SECRET = b"..."  # your webhook secret

app = FastAPI()

@app.post("/webhooks/kreuzberg")
async def receive(request: Request,
                  x_webhook_signature: str = Header(...)):
    body = await request.body()
    expected = "sha256=" + hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(x_webhook_signature, expected):
        raise HTTPException(401, "bad signature")
    # ... fetch GET /v1/jobs/{job_id} and process
    return {"ok": True}
TypeScript (Node / Express)
import crypto from "node:crypto";
import express from "express";

const SECRET = process.env.KREUZBERG_WEBHOOK_SECRET!;
const app = express();

app.post("/webhooks/kreuzberg", express.raw({ type: "application/json" }),
  (req, res) => {
    const sig = req.header("x-webhook-signature") ?? "";
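    const expected = Buffer.from("sha256=" + crypto
      .createHmac("sha256", SECRET)
      .update(req.body)
      .digest("hex"));
    const given = Buffer.from(sig);
    // timingSafeEqual throws if the lengths differ, so check them first.
    if (given.length !== expected.length ||
        !crypto.timingSafeEqual(given, expected)) {
      return res.status(401).end();
    }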
    // ... fetch GET /v1/jobs/{job_id} and process
    res.json({ ok: true });
  });
Go (net/http)
import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "io"
    "net/http"
)

var secret = []byte("...")

func receive(w http.ResponseWriter, r *http.Request) {
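    // Read the raw body; the signature is computed over these exact bytes.
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "cannot read body", http.StatusBadRequest)
        return
    }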
    mac := hmac.New(sha256.New, secret)
    mac.Write(body)
    expected := "sha256=" + hex.EncodeToString(mac.Sum(nil))
    sig := r.Header.Get("X-Webhook-Signature")
    if !hmac.Equal([]byte(sig), []byte(expected)) {
        http.Error(w, "bad signature", http.StatusUnauthorized)
        return
    }
    // ... fetch GET /v1/jobs/{job_id} and process
    w.WriteHeader(http.StatusOK)
}

Retries

We deliver each event at least once. If your endpoint returns non-2xx or times out:

  • Up to 5 attempts total.
  • Backoff: 5 s → 30 s → 5 min (then dead-letter).
  • 4xx other than 429 is treated as permanent — we stop retrying.
  • 429, 5xx, timeouts, and connection errors are retried until the cap.

Use the event_id (also in X-Idempotency-Key) to deduplicate: the same event may arrive more than once if your endpoint responds slowly.
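
One way to deduplicate, sketched under the assumption of a single handler process, is to remember recently seen event_ids in memory. SeenEvents is a hypothetical helper; a multi-process or restart-safe deployment would need a shared store (a database unique constraint, for example) instead.

```python
from collections import OrderedDict


class SeenEvents:
    """Bounded in-memory record of event_ids that were already handled."""

    def __init__(self, max_size: int = 10_000) -> None:
        self._max_size = max_size
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def first_time(self, event_id: str) -> bool:
        """Return True the first time an id is seen, False on redeliveries."""
        if event_id in self._seen:
            return False
        self._seen[event_id] = None
        # Evict the oldest id once the bound is exceeded.
        if len(self._seen) > self._max_size:
            self._seen.popitem(last=False)
        return True
```

In a handler, the check goes after signature verification: if first_time(event_id) returns False, respond with 2xx and skip processing so the duplicate is not retried.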

Testing

The dashboard has a Send test button that fires a synthetic payload at your URL with a real signature — use it to verify your handler before going live.
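
For automated tests that run without the dashboard, the signature scheme described under "Verify the signature" can be reproduced locally. This is only a sketch: sign and verify are hypothetical helper names, and the payload and secret below are made-up test values.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Build the X-Webhook-Signature value for a raw request body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the received header with the expected one."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), header_value.encode())


if __name__ == "__main__":
    secret = b"test-secret"  # made-up value, for local tests only
    body = json.dumps({"event_id": "test", "status": "completed"}).encode()
    header = sign(secret, body)
    print(header, verify(secret, body, header))
```

Posting body with the computed header to a locally running handler exercises the same verification path that a real delivery would.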