Introduction

DBL — DocDigitizer Benchmark Language

DBL is a declarative domain-specific language for defining and executing document extraction benchmarks. In a single script you declare which documents to process, which extraction engines to compare, how to run them, what metrics to compute, and how to visualize the results.

Your First Script

A DBL script is a benchmark block containing one or more inner blocks. In the simplest form, the two required blocks are documents and engines.

benchmark "My First Benchmark" {

    documents {
        from dataset "my-dataset"
        where document.file_type in ["pdf"]
        limit 20
    }

    engines {
        engine "openai" {
            model "gpt-4o"
            mode "vision"
        }
        engine "anthropic" {
            model "claude-sonnet-4-5"
            mode "text"
        }
    }

    execution {
        runs 3
        mode parallel
        max_concurrent 4
    }

    analysis {
        compute accuracy
        compute timing {
            percentiles [50, 90, 95, 99]
            group_by engine
        }
        compute cost {
            group_by engine
        }
    }

    output {
        graph bar {
            title "Accuracy by Provider"
            x: engine.name
            y: accuracy.score
        }
    }
}

How to Run

  1. Open the DBL Editor from the app sidebar.
  2. Write or paste your script, or choose a template from the Templates Gallery.
  3. Click Validate to check for syntax and semantic errors.
  4. Click Run to start the benchmark execution.
  5. Monitor progress in the Executions panel; results appear as interactive charts.

Language Reference

Block Structure

Every DBL script is a benchmark block with a name string, containing one or more of the following blocks in order:

Block | Required | Purpose
documents | Yes | Select which documents to process from a named dataset
engines | Yes* | Declare one or more extraction engines — required when not using run blocks
run | No | Named execution lane with its own engines and ordered passes (multi-pass mode)
execution | No | Control run count, parallelism, and concurrency limits
analysis | No | Specify which metrics to compute (accuracy, timing, cost)
output | No | Define charts to generate from analysis results
throttle | No | Simulate customer bandwidth constraints during document fetching
sweep | No | Run the benchmark across multiple values of a parameter to find optimal configurations
labels | No | Tag the benchmark (or a run) with metadata labels for filtering and grouping
include | No | Inline a named snippet at the current insertion point

Keywords

All reserved keywords, grouped by their primary block context:

Group | Keywords
Top-level | benchmark, documents, engines, execution, analysis, output
Documents | from, dataset, where, limit
Engines | engine, model, mode
Execution | runs, parallel, sequential, max_concurrent, max_per_engine
Analysis | compute, accuracy, timing, cost, percentiles, group_by
Output / Graph | graph, bar, pie, line, candlestick, line_regression, histogram, normalized_histogram, title, series, x, y, label, value, category, open, high, low, close, values, bins, group_by
Filter operators | in, contains, matches, and, or, not
Literals | true, false
Throttle | throttle, bandwidth
Sweep | sweep
Multi-pass / Run | run, pass, when, empty, include, labels
Prompts / Schemas | prompts, schemas, select, pattern, set, expand
Post-extraction | post_extraction, extract_schemas, validate_extraction, consolidate, schema
Compare | compare, one_to_many, many_to_many, reference, min_engines, normalization, confidence_threshold

Operators

Comparison Operators

Operator | Symbol | Applicable to
Greater than | > | integer, float, size, duration
Less than | < | integer, float, size, duration
Greater or equal | >= | integer, float, size, duration
Less or equal | <= | integer, float, size, duration
Equal | == | any type
Not equal | != | any type

Keyword Operators

Operator | Role | Example
in | Membership test against an array | document.file_type in ["pdf", "png"]
contains | Substring test (string fields) | document.file_name contains "invoice"
matches | Regex test (string fields) | document.file_name matches "^INV-[0-9]+"
and | Logical conjunction (multiple where clauses) | ... and document.file_size > 10kb
or | Logical disjunction | ... or document.category == "image"
not | Logical negation (prefix) | where not document.status == "error"

Literals

Type | Pattern | Examples
String | "..." (double-quoted; supports \\, \", \n, \t) | "invoices", "gpt-4o"
Integer | [0-9]+ | 3, 50, 100
Float | [0-9]+.[0-9]+ | 0.95, 1.0
Size | <int>(kb|mb|gb) | 100kb, 5mb, 2gb
Duration | <int>(s|m|h) | 30s, 5m, 2h
Boolean | true or false | true
Array | [ value, value, ... ] | ["pdf", "png"], [50, 90, 95, 99]

Comments

# This is a line comment

/*
  This is a block comment.
  It can span multiple lines.
*/

benchmark "Example" {
    documents {
        from dataset "my-dataset"  # inline comment
    }
}

Blocks Reference

Detailed syntax and semantics for each block.

documents

Selects which documents to include in the benchmark from a named dataset, with optional filtering and a row limit. This block is required.

documents {
    from dataset "dataset-name"
    where <field> <operator> <value>
    limit <number>
}

Available filter fields

Field | Type | Description
document.file_type | string | File extension, e.g. "pdf", "png", "docx"
document.file_size | size (bytes) | File size; supports size unit literals (100kb, 5mb)
document.file_name | string | Original file name including extension
document.category | string | Broad category: "pdf", "office", "image"
document.page_count | integer | Number of pages (where applicable)
document.status | string | Processing status, e.g. "ready", "error"

Filter examples

documents {
    from dataset "invoices"

    # Type filter with array membership
    where document.file_type in ["pdf", "png"]

    # Size range
    where document.file_size > 100kb
    where document.file_size < 10mb

    # File name substring match
    where document.file_name contains "invoice"

    # Regex pattern
    where document.file_name matches "^INV-[0-9]{4}"

    # Exact status match
    where document.status == "ready"

    # Page count range
    where document.page_count >= 1
    where document.page_count <= 20

    limit 50
}

engines

Declares one or more extraction engines. Each engine maps to a provider and optionally specifies a model ID and extraction mode. This block is required and must contain at least one engine declaration.

engines {
    engine "ProviderName" {
        model "model-id"
        mode "text" | "vision" | "default"
    }
}

Supported providers and models

Provider name | Example model IDs | Modes
"openai" | gpt-4o, gpt-4o-mini, o3-mini | text, vision
"anthropic" | claude-sonnet-4-5, claude-haiku-4-5 | text, vision
"google" | gemini-2.5-flash, gemini-2.5-pro | text, vision
"mistral" | mistral-large-latest, mistral-medium-latest | text, vision
"cohere" | command-r-plus, command-r | text only
"azure" | gpt-4o, gpt-4o-mini (deployment names) | text, vision
"bedrock" | anthropic.claude-3-sonnet, amazon.titan-text-express | text, vision
"docdigitizer" | (no model required) | default

engines {
    engine "openai" {
        model "gpt-4o"
        mode "vision"
    }
    engine "anthropic" {
        model "claude-sonnet-4-5"
        mode "text"
    }
    engine "docdigitizer" {
        mode "default"
    }
}

execution

Controls how many times each document/engine combination is run, and whether runs happen in parallel or sequentially. This block is optional; defaults are shown below.

execution {
    runs <number>                 # default: 1
    mode parallel | sequential    # default: sequential
    max_concurrent <number>       # default: 1 (parallel only)
    max_per_engine <number>       # default: unbounded
}

Property | Default | Description
runs | 1 | Number of times each document is run through each engine
mode | sequential | parallel runs engines concurrently; sequential runs them one at a time
max_concurrent | 1 | Maximum number of simultaneous workers (parallel mode only)
max_per_engine | unbounded | Maximum concurrent workers per engine; must be <= runs

execution {
    runs 5
    mode parallel
    max_concurrent 10
    max_per_engine 3
}

analysis

Specifies which metrics the platform should compute after all extraction runs complete. Results are used by the output block to generate charts. This block is optional.

analysis {
    compute accuracy

    compute timing {
        percentiles [50, 90, 95, 99]
        group_by engine, model
    }

    compute cost {
        group_by engine
    }
}

Metric | What it computes
compute accuracy | Compares extraction output against ground-truth schema; populates accuracy.* properties
compute timing | Aggregates wall-clock durations; supports custom percentile arrays and group-by; populates timing.*
compute cost | Aggregates token-based cost estimates; populates cost.*

output

Defines which charts to render. Each graph directive specifies a chart type and maps domain properties to visual axes. This block is optional.

output {
    graph <type> {
        title "Chart Title"
        <axis>: <domain.property>
    }
}

Property values use dot-access notation referencing domain objects. String values are used only for title.
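
For example, an output block combining the two chart types shown elsewhere in this reference:

output {
    graph bar {
        title "Accuracy by Provider"
        x: engine.name
        y: accuracy.score
    }
    graph pie {
        title "Cost Distribution"
        value: cost.total
        label: engine.name
    }
}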

throttle

Simulates customer bandwidth constraints by limiting the rate at which documents are fetched during a benchmark run. This is useful for testing how extraction engines behave under real-world network conditions. This block is optional; at most one throttle block is allowed per benchmark.

throttle {
    bandwidth 5mb
}

Property | Type | Description
bandwidth | size (bytes/sec) | Maximum download rate for document fetching. Use size unit literals: kb, mb, gb (e.g. 512kb, 5mb, 1gb). Plain integers (raw bytes/sec) are also accepted.

Examples

# Simulate a slow broadband connection (5 MB/s)
throttle {
    bandwidth 5mb
}

# Simulate a constrained mobile connection (512 KB/s)
throttle {
    bandwidth 512kb
}

# Use raw bytes/sec for precise control
throttle {
    bandwidth 1048576   # 1 MB/s
}

A throttle block with no bandwidth directive has no effect and produces a warning. Bandwidth values ≤ 0 are a semantic error (SEM013). Very low bandwidths (below 100 KB/s) produce a warning about potential request timeouts.
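
A sketch of these rules in practice (values illustrative):

# Warning — below 100 KB/s, requests may time out
throttle {
    bandwidth 50kb
}

# Semantic error SEM013 — bandwidth must be positive
throttle {
    bandwidth 0
}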

sweep

Runs the benchmark once for each value in a list, varying a named execution parameter across the runs. Each value produces a separate, independent benchmark execution. This is useful for finding the optimal configuration for concurrency, run counts, or bandwidth limits. This block is optional.

# Syntax
sweep <variable> [<value>, <value>, ...]

Valid variables

Variable | Value type | Description
bandwidth | size (e.g. 1mb) | Sweeps the throttle bandwidth limit — requires size literals
runs | integer | Sweeps the number of repetitions per document/engine pair
max_concurrent | integer | Sweeps the maximum number of simultaneous workers
max_per_engine | integer | Sweeps the per-engine concurrency limit

Examples

# Find the right bandwidth throttle for realistic testing
sweep bandwidth [1mb, 5mb, 10mb, 50mb]

# Measure how result quality changes with more repetitions
sweep runs [1, 3, 5, 10]

# Find the optimal concurrency level
sweep max_concurrent [1, 2, 4, 8, 16]

# Tune per-engine limits
sweep max_per_engine [1, 2, 3]

Each value in the sweep array produces a full, independent benchmark run. All values in a sweep array must be the same type — integers for runs, max_concurrent, and max_per_engine; size literals for bandwidth. Unknown variable names and empty value arrays are semantic errors (SEM014). Sweeps with more than 20 values produce a performance warning.
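
The validity rules can be sketched as follows (threads is an invented example of an unknown variable name):

sweep runs [1, 3, 5]          # valid — all integers
sweep bandwidth [1mb, 5mb]    # valid — all size literals
sweep runs [1, 5mb]           # SEM014 — mixed integer and size values
sweep threads [2, 4]          # SEM014 — unknown variable name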

Multi-Pass Execution

DBL v1.1 introduces run blocks — named execution lanes that contain their own engines and an ordered sequence of passes. Multi-pass benchmarks let you chain extraction steps so that later passes can branch based on the results of earlier ones.

Scripts that do not use run blocks continue to work unchanged — a flat engines block at benchmark level is treated as a single implicit run with one pass.

run

A run block declares a named execution lane. Each benchmark may contain multiple runs. Runs execute in declaration order. Each run specifies its own engines and an ordered list of passes.

run "run-name" {
    labels ["label1", "label2"]

    engine "provider" {
        model "model-id"
        mode "text" | "vision" | "default"
    }

    pass 1 "first-pass" {
        prompts { labels ["categorization"] }
    }

    pass 2 "second-pass" {
        when pass.1.category not empty
        prompts { labels ["schema_bounded"] }
    }
}

Engine selection inside a run

Engines may be declared explicitly, or selected from the model registry by labels:

run "explicit_engine" {
    engine "openai" { model "gpt-4o" mode "vision" }
    engine "anthropic" { model "claude-sonnet-4-20250514" }
}

run "label_based" {
    # Resolves to all active models with ALL listed labels
    engines { labels ["fast", "vision"] }
}

Semantic rules

Code | Rule
SEM021 | Run name must be a non-empty string
SEM021 | Run must contain at least one engine declaration or engine label selection
SEM021 | Duplicate run names within the same benchmark are an error

pass

A pass block is a numbered execution step within a run. Passes run in ascending number order. Each pass may specify its own prompts, schemas, and post_extraction directives. A pass with no when clause always executes. A pass whose when evaluates to false is silently skipped.

pass <number> ["optional-name"] {
    [when <boolean-expression>]
    [prompts { ... }]
    [schemas { ... }]
    [post_extraction { ... }]
}

Examples

# Pass 1 — no condition, always runs
pass 1 "categorize" {
    prompts { labels ["categorization"] }
}

# Pass 2 — runs only when pass 1 returned a category
pass 2 "extract" {
    when pass.1.category not empty and pass.1.confidence > 0.5
    prompts { labels ["schema_bounded"] }
    schemas { labels ["invoices"] expand true }
}

# Pass 3 — fallback when categorization produced no category
pass 3 "fallback" {
    when pass.1.category empty
    prompts { labels ["freewheel"] }
}

Semantic rules

Code | Rule
SEM022 | Pass number must be a positive integer >= 1
SEM022 | Pass numbers must be unique within the same run
SEM022 | A when clause may only reference passes with a lower number (no forward references)
SEM026 | Pass when clause references a pass number that is not lower than the current pass number

when expressions

The when clause is a boolean expression evaluated at runtime for each document, using the JSON result of prior passes as its context. It supports the same operators as where clauses, plus empty and not empty for presence testing.

Pass references

Syntax | Meaning
pass.1.field_name | Access field_name in the result JSON of pass number 1
pass.categorize.field_name | Access field_name in the result JSON of the pass named categorize
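
For instance, assuming pass 1 is named "categorize", these two conditions are equivalent:

when pass.1.category not empty
when pass.categorize.category not empty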

Operators

Operator | Meaning | Example
empty | Field is absent or null | pass.1.category empty
not empty | Field is present and non-null | pass.1.category not empty
== | Equality | pass.1.type == "invoice"
!= | Inequality | pass.1.status != "error"
> < >= <= | Numeric comparison | pass.1.confidence > 0.5
contains | Substring test | pass.1.notes contains "urgent"
matches | Regex test | pass.1.ref matches "^INV-"
in | Array membership | pass.1.type in ["invoice", "receipt"]
and | Logical conjunction | pass.1.category not empty and pass.1.confidence > 0.5
or | Logical disjunction | pass.1.type == "invoice" or pass.1.type == "receipt"
not | Logical negation (prefix) | not pass.1.verified == true

Operator precedence

Precedence from highest to lowest: not > and > or. Use parentheses to override:

# Compound condition
when pass.1.category not empty and pass.1.confidence > 0.5

# Type-based branching
when pass.1.type == "invoice" or pass.1.type == "receipt"

# Parenthesised override
when pass.1.category not empty and (pass.1.confidence > 0.8 or pass.1.manual_review == true)

# Fallback
when pass.1.category empty

Semantic rules

Code | Rule
SEM023 | when expression is invalid, references an undefined pass, or uses a type-incompatible operator

include & snippets

The include directive inserts the content of a named snippet at the current position. Snippets are reusable DBL fragments stored in the platform as scripts of type snippet; they may contain any valid block-level content.

include "snippet-name"

Contextual resolution

The snippet is pasted verbatim at the insertion point. What it may inject depends on where it appears:

benchmark "Multi-pass Test" {
    include "standard_analysis_output"      # injects analysis{} and output{} blocks

    documents { from dataset "Dataset1" }

    run "llm" {
        engine "openai" { model "gpt-4o" }
        include "two_pass_categorize"       # injects pass 1 and pass 2 definitions
    }

    analysis { compute accuracy }           # explicit block wins over the snippet's
}

Override rule

Context | Rule
Benchmark level | If the script and the snippet both declare the same block (e.g. analysis), the explicit block wins and the snippet's version is discarded.
Inside a run block | If both declare pass N for the same number, the explicit pass wins.

Semantic rules

Code | Rule
SEM024 | Snippet name must be a non-empty string; circular includes and depth > 5 are errors

labels & engine labels

labels directive

The labels directive tags the benchmark or a run with metadata strings. Labels are attached to every extraction result produced in that scope and are used for filtering and grouping in analysis and comparison.

benchmark "Invoice Pipeline" {
    labels ["experiment_q1", "invoices"]    # benchmark-level labels

    documents { from dataset "Dataset1" limit 100 }

    run "idp_baseline" {
        labels ["idp"]                      # run-level labels
        engine "docdigitizer" { mode "default" }
    }

    run "llm_two_pass" {
        labels ["llm", "multi_pass"]        # run-level labels
        engine "openai" { model "gpt-4o" mode "vision" }

        pass 1 "categorize" {
            prompts { labels ["categorization"] }
        }
    }
}

Engine selection by labels

Inside a run, engines can be selected from the model registry by label instead of declared explicitly. The executor resolves to all active models matching all listed labels.

run "fast_run" {
    # Selects all active models labelled "fast" AND "vision"
    engines { labels ["fast", "vision"] }

    pass 1 {
        prompts { labels ["categorization"] }
    }
}

Label-based engine selection requires models to be registered and marked active in the Model Registry. Explicit engine declarations and engines { labels [...] } may coexist in the same run.
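
A run combining both selection styles might look like this (label values illustrative):

run "mixed" {
    engine "docdigitizer" { mode "default" }
    engines { labels ["fast", "vision"] }

    pass 1 {
        prompts { labels ["categorization"] }
    }
}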

Full multi-pass LLM vs IDP benchmark example

benchmark "Invoice Pipeline — LLM vs IDP" {
    labels ["experiment_q1", "invoices"]

    documents {
        from dataset "Dataset1"
        where document.file_type in ["pdf"]
        limit 100
    }

    # Run 1: DocDigitizer IDP baseline (single pass)
    run "idp_baseline" {
        labels ["idp"]
        engine "docdigitizer" { mode "default" }
    }

    # Run 2: Two-pass LLM pipeline
    run "llm_two_pass" {
        labels ["llm", "multi_pass"]
        engine "openai" { model "gpt-4o" mode "vision" }
        engine "anthropic" { model "claude-sonnet-4-20250514" }

        pass 1 "categorize" {
            prompts { labels ["categorization"] }
        }

        pass 2 "extract" {
            when pass.1.category not empty and pass.1.confidence > 0.5
            prompts { labels ["schema_bounded"] }
            schemas { labels ["invoices"] expand true }
        }

        pass 3 "fallback" {
            when pass.1.category empty
            prompts { labels ["freewheel"] }
        }
    }

    execution {
        runs 2
        mode parallel
        max_concurrent 6
    }

    schemas {
        labels ["invoices"]
        expand true
    }

    analysis {
        compute accuracy
        compute timing { percentiles [50, 90, 95] group_by engine }
        compute cost { group_by engine }
    }

    output {
        graph bar { title "Accuracy by Engine" x: engine.name y: accuracy.score }
        graph pie { title "Cost Distribution" value: cost.total label: engine.name }
    }

    compare many_to_many { }
}

Domain Objects

These are the valid dot-access paths in filter expressions, group_by clauses, and graph property values. Any path not listed here is a semantic error.

document.*

Path | Type | Description
document.file_type | string | File extension (e.g. "pdf", "png", "docx")
document.file_size | size (bytes) | File size in bytes; supports size unit literals
document.file_name | string | Original file name including extension
document.category | string | Broad category: "pdf", "office", "image"
document.page_count | integer | Number of pages (where applicable)
document.status | string | Processing status string

engine.*

Path | Type | Description
engine.name | string | Provider name (e.g. "openai", "docdigitizer")
engine.model | string | Model identifier (e.g. "gpt-4o")
engine.mode | string | Processing mode: "text", "vision", "default"

extraction.*

Properties of a single extraction run (one document x one engine x one repetition).

Path | Type | Description
extraction.duration_ms | number | Wall-clock extraction time in milliseconds
extraction.input_tokens | integer | Number of input tokens consumed
extraction.output_tokens | integer | Number of output tokens produced
extraction.estimated_cost | number | Estimated monetary cost in USD
extraction.status | string | Result status: "success", "error", "timeout"

timing.* after compute timing

Path | Type | Description
timing.min | number | Minimum duration across all runs (ms)
timing.max | number | Maximum duration across all runs (ms)
timing.avg | number | Mean duration (ms)
timing.p50 | number | 50th percentile / median duration (ms)
timing.p90 | number | 90th percentile duration (ms)
timing.p95 | number | 95th percentile duration (ms)
timing.p99 | number | 99th percentile duration (ms)

Non-standard percentiles requested via percentiles [...] are accessible as timing.p{N} where N is the requested integer.
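
For example, a sketch requesting non-standard percentiles and charting one of them:

analysis {
    compute timing {
        percentiles [25, 75]
        group_by engine
    }
}

output {
    graph bar {
        title "75th Percentile Latency"
        x: engine.name
        y: timing.p75
    }
}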

accuracy.* after compute accuracy

Path | Type | Description
accuracy.score | number | Overall accuracy ratio, 0.0 to 1.0
accuracy.fields_matched | integer | Count of correctly extracted fields
accuracy.fields_total | integer | Total fields in the expected schema

cost.* after compute cost

Path | Type | Description
cost.total | number | Total cost across all runs (USD)
cost.average | number | Average cost per run (USD)
cost.per_document | number | Average cost per document (USD)

Visualization

Graph Types

Each graph type has specific required and optional axis properties. Missing required properties are reported as semantic errors (SEM005).

bar

Grouped bar chart. Best for comparing a metric across providers or models.

Property | Required
x | Yes — categorical axis (e.g. engine.name)
y | Yes — numeric value (e.g. accuracy.score)
title | No — chart title string
series | No — grouping dimension (e.g. engine.model)

graph bar {
    title "Accuracy by Provider"
    x: engine.name
    y: accuracy.score
    series: engine.model
}

pie

Pie/donut chart. Best for showing proportional distribution (e.g. cost share).

Property | Required
value | Yes — numeric slice size
label | Yes — slice label (e.g. engine.name)
title | No

graph pie {
    title "Cost Distribution"
    value: cost.total
    label: engine.name
}

line

Line chart. Best for time-series or ordered comparisons across multiple runs.

Property | Required
x | Yes — x-axis domain
y | Yes — y-axis metric
title | No
series | No — multi-line grouping

graph line {
    title "Latency over Runs"
    x: extraction.duration_ms
    y: timing.avg
    series: engine.name
}

candlestick

Candlestick / box chart. Shows timing distribution (min, p50, p95, max) per provider.

Property | Required
category | Yes — x-axis category
open | Yes — lower quartile / p50
high | Yes — high whisker / p95
low | Yes — low whisker / min
close | Yes — upper quartile / avg
title | No

graph candlestick {
    title "Time Distribution"
    category: engine.name
    open: timing.p50
    high: timing.p95
    low: timing.min
    close: timing.avg
}

line_regression

Scatter plot with regression line. Best for showing correlation (e.g. file size vs latency).

Property | Required
x | Yes — independent variable
y | Yes — dependent variable
series | Yes — color grouping
title | No

graph line_regression {
    title "Size vs Latency"
    x: document.file_size
    y: extraction.duration_ms
    series: engine.name
}

histogram

Vertical bar histogram showing frequency distribution of values. Supports multiple series for comparing distributions across engines or documents.

Property | Required
values | Yes — numeric field to bin (e.g. extraction.duration_ms)
series | No — grouping dimension (e.g. engine.name)
bins | No — integer bucket count (default 10)
title | No — chart title string

graph histogram {
    title "Extraction Time Distribution"
    values: extraction.duration_ms
    series: engine.name
    bins: 15
}

normalized_histogram

For each group (e.g. document), values are normalized to [0, 1] relative to the group's maximum. Shows relative performance independent of document complexity.

Property | Required
values | Yes — numeric field to bin (e.g. extraction.duration_ms)
group_by | Yes — normalization group (e.g. document.file_name)
series | No — comparison dimension (e.g. engine.name)
bins | No — integer bucket count (default 10)
title | No — chart title string

graph normalized_histogram {
    title "Normalized Time Distribution"
    values: extraction.duration_ms
    group_by: document.file_name
    series: engine.name
}

Templates

Templates Gallery

Ready-made scripts to get you started quickly. Click Copy to copy the script source to your clipboard, then paste it into the DBL Editor.

Templates are loaded from the DBL Editor.

Quick Start

Minimal benchmark with two providers and basic accuracy/timing analysis.

Invoice Extraction

PDF invoice extraction across OpenAI, Anthropic, and DocDigitizer.

Vision vs Text

Compare vision and text extraction modes on the same provider.

Cost Analysis

Cost breakdown across all configured providers with group_by engine and model.

Timing Deep Dive

Full percentile timing analysis (p50/p90/p95/p99) with candlestick chart.

Large File Study

Filter documents > 5mb and measure how file size correlates with latency.

Multi-Run Stability

Run each document 10 times to measure extraction consistency over repetitions.

Category Comparison

Separate accuracy benchmarks for PDF, image, and Office document categories.

Patterns

Common Snippets

Copy-paste patterns for the most frequent use cases.

Filter PDFs only

where document.file_type in ["pdf"]

Large files only (over 1 MB)

where document.file_size > 1mb

File name contains keyword

where document.file_name contains "invoice"

Parallel execution with limits

execution {
    runs 3
    mode parallel
    max_concurrent 5
    max_per_engine 2
}

Custom percentile array

compute timing {
    percentiles [25, 50, 75, 90, 95, 99]
    group_by engine
}

Exclude errored documents

where not document.status == "error"

Multi-page documents only

where document.page_count >= 2

File name regex match

where document.file_name matches "^INV-[0-9]+"

Throttle bandwidth (simulate 5 MB/s)

throttle {
    bandwidth 5mb
}

Sweep runs to measure consistency

sweep runs [1, 3, 5, 10]

Sweep bandwidth to find realistic throttle

sweep bandwidth [1mb, 5mb, 10mb, 50mb]

Sweep concurrency to tune parallel execution

sweep max_concurrent [1, 2, 4, 8, 16]

Errors

Error Reference

DBL reports errors with a code, category, line number, and description. The compiler collects all errors before stopping — you will see every issue in a single pass.
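
As an illustrative sketch, a script with several issues would have all of them reported in a single validation pass:

benchmark "Broken" {
    documents {
        from dataset "my-dataset"
        where document.file_name > 100kb   # SEM002 — operator/type mismatch
        limit 0                            # SEM008 — limit must be positive
    }
    # PAR003 — required engines block is missing
}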

Lexical Errors (LEX)

Code | Description
LEX001 | Unterminated string literal — missing closing "
LEX002 | Unknown escape sequence inside a string literal
LEX003 | Unterminated block comment — missing closing */
LEX004 | Unrecognized character in source input

Syntax Errors (PAR)

Code | Description
PAR001 | Expected token not found (e.g. expected { after block name)
PAR002 | Unexpected token inside a block
PAR003 | Missing required block — documents or engines not present
PAR004 | Empty array literal [] — arrays must contain at least one element

Semantic Errors (SEM)

Code | Description
SEM001 | Unknown dot-access path — the property does not exist in the domain model
SEM002 | Operator/type mismatch in where clause (e.g. > on a string field)
SEM003 | Duplicate engine name within the same engines block
SEM004 | Duplicate metric type in analysis (e.g. two compute timing directives)
SEM005 | Missing required graph property for the given graph type
SEM006 | Unknown graph property — not in the allowed set for this graph type
SEM007 | Invalid engine.mode value — must be "text", "vision", or "default"
SEM008 | limit must be a positive integer greater than zero
SEM009 | runs must be a positive integer >= 1
SEM010 | Duplicate property key inside an execution block
SEM011 | percentiles value out of range — each value must be between 1 and 99
SEM012 | Unknown group_by identifier — does not resolve to a valid domain property
SEM013 | bandwidth in a throttle block must be a positive value (> 0)
SEM014 | sweep has an unknown variable name, an empty values array, or mismatched value types
SEM015 | prompts block has an invalid select pattern or empty label values
SEM016 | schemas block has an invalid select/pattern or empty label values
SEM017 | post_extraction directive unknown, duplicate, or has an invalid property
SEM018 | compare block has an invalid mode, missing/invalid property, or reference engine not declared
SEM021 | run block has an empty name, no engine declarations, or a duplicate run name
SEM022 | pass block has an invalid number, a duplicate number within the run, or a forward pass reference in when
SEM023 | when expression is invalid, references an undefined pass, or uses a type-incompatible operator
SEM024 | include has an empty snippet name, a circular include chain, or exceeds maximum include depth (5)
SEM025 | Engine label selection (engines { labels [...] }) has an empty array or non-string values
SEM026 | when clause references a pass number that is not lower than the current pass number