DBL — DocDigitizer Benchmark Language
DBL is a declarative domain-specific language for defining and executing document extraction benchmarks. In a single script you declare which documents to process, which extraction engines to compare, how to run them, what metrics to compute, and how to visualize the results.
Your First Script
A DBL script is a single benchmark block containing inner blocks (see the Blocks Reference below for the full list). The two required blocks are documents and engines.
```dbl
benchmark "My First Benchmark" {
  documents {
    from dataset "my-dataset"
    where document.file_type in ["pdf"]
    limit 20
  }
  engines {
    engine "openai" { model "gpt-4o" mode "vision" }
    engine "anthropic" { model "claude-sonnet-4-5" mode "text" }
  }
  execution {
    runs 3
    mode parallel
    max_concurrent 4
  }
  analysis {
    compute accuracy
    compute timing { percentiles [50, 90, 95, 99] group_by engine }
    compute cost { group_by engine }
  }
  output {
    graph bar {
      title "Accuracy by Provider"
      x: engine.name
      y: accuracy.score
    }
  }
}
```
How to Run
- Open the DBL Editor from the app sidebar.
- Write or paste your script, or choose a template from the Templates Gallery.
- Click Validate to check for syntax and semantic errors.
- Click Run to start the benchmark execution.
- Monitor progress in the Executions panel; results appear as interactive charts.
Language Reference
Block Structure
Every DBL script is a benchmark block with a name string, containing one or more of the following blocks in order:
| Block | Required | Purpose |
|---|---|---|
| documents | Yes | Select which documents to process from a named dataset |
| engines | Yes* | Declare one or more extraction engines — required when not using run blocks |
| run | No | Named execution lane with its own engines and ordered passes (multi-pass mode) |
| execution | No | Control run count, parallelism, and concurrency limits |
| analysis | No | Specify which metrics to compute (accuracy, timing, cost) |
| output | No | Define charts to generate from analysis results |
| throttle | No | Simulate customer bandwidth constraints during document fetching |
| sweep | No | Run the benchmark across multiple values of a parameter to find optimal configurations |
| labels | No | Tag the benchmark (or a run) with metadata labels for filtering and grouping |
| include | No | Inline a named snippet at the current insertion point |
Keywords
The reserved keywords, grouped by their primary block context:
| Group | Keywords |
|---|---|
| Top-level | benchmark, documents, engines, execution, analysis, output |
| Documents | from, dataset, where, limit |
| Engines | engine, model, mode |
| Execution | runs, parallel, sequential, max_concurrent, max_per_engine |
| Analysis | compute, accuracy, timing, cost, percentiles, group_by |
| Output / Graph | graph, bar, pie, line, candlestick, line_regression, histogram, normalized_histogram, title, series, x, y, label, value, category, open, high, low, close, values, bins, group_by |
| Filter operators | in, contains, matches, and, or, not |
| Literals | true, false |
| Throttle | throttle, bandwidth |
| Sweep | sweep |
| Multi-pass / Run | run, pass, when, empty, include, labels |
| Prompts / Schemas | prompts, schemas, select, pattern, set, expand |
| Post-extraction | post_extraction, extract_schemas, validate_extraction, consolidate, schema |
| Compare | compare, one_to_many, many_to_many, reference, min_engines, normalization, confidence_threshold |
Operators
Comparison Operators
| Operator | Symbol | Applicable To |
|---|---|---|
| Greater than | > | integer, float, size, duration |
| Less than | < | integer, float, size, duration |
| Greater or equal | >= | integer, float, size, duration |
| Less or equal | <= | integer, float, size, duration |
| Equal | == | any type |
| Not equal | != | any type |
Keyword Operators
| Operator | Role | Example |
|---|---|---|
| in | Membership test against an array | document.file_type in ["pdf", "png"] |
| contains | Substring test (string fields) | document.file_name contains "invoice" |
| matches | Regex test (string fields) | document.file_name matches "^INV-[0-9]+" |
| and | Logical conjunction (multiple where clauses) | ... and document.file_size > 10kb |
| or | Logical disjunction | ... or document.category == "image" |
| not | Logical negation (prefix) | where not document.status == "error" |
Literals
| Type | Pattern | Examples |
|---|---|---|
| String | "..." (double-quoted, supports \\, \", \n, \t) | "invoices", "gpt-4o" |
| Integer | [0-9]+ | 3, 50, 100 |
| Float | [0-9]+\.[0-9]+ | 0.95, 1.0 |
| Size | <int>(kb|mb|gb) | 100kb, 5mb, 2gb |
| Duration | <int>(s|m|h) | 30s, 5m, 2h |
| Boolean | true \| false | true, false |
| Array | [ value, value, ... ] | ["pdf", "png"], [50, 90, 95, 99] |
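Taken together, a single filter block can mix several of these literal types (an illustrative combination using the documented filter fields):

```dbl
documents {
  from dataset "samples"
  where document.file_type in ["pdf", "png"]   # string array literal
  where document.file_size > 2mb               # size literal
  where document.page_count <= 10              # integer literal
  limit 25
}
```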
Comments
```dbl
# This is a line comment

/* This is a block comment.
   It can span multiple lines. */

benchmark "Example" {
  documents {
    from dataset "my-dataset"  # inline comment
  }
}
```
Blocks Reference
Detailed syntax and semantics for each block.
documents
Selects which documents to include in the benchmark from a named dataset, with optional filtering and a row limit. This block is required.
```dbl
documents {
  from dataset "dataset-name"
  where <field> <operator> <value>
  limit <number>
}
```
Available filter fields
| Field | Type | Description |
|---|---|---|
| document.file_type | string | File extension, e.g. "pdf", "png", "docx" |
| document.file_size | size (bytes) | File size; supports size unit literals (100kb, 5mb) |
| document.file_name | string | Original file name including extension |
| document.category | string | Broad category: "pdf", "office", "image" |
| document.page_count | integer | Number of pages (where applicable) |
| document.status | string | Processing status, e.g. "ready", "error" |
Filter examples
```dbl
documents {
  from dataset "invoices"

  # Type filter with array membership
  where document.file_type in ["pdf", "png"]

  # Size range
  where document.file_size > 100kb
  where document.file_size < 10mb

  # File name substring match
  where document.file_name contains "invoice"

  # Regex pattern
  where document.file_name matches "^INV-[0-9]{4}"

  # Exact status match
  where document.status == "ready"

  # Page count range
  where document.page_count >= 1
  where document.page_count <= 20

  limit 50
}
```
engines
Declares one or more extraction engines. Each engine maps to a provider and optionally specifies a model ID and extraction mode. This block is required and must contain at least one engine declaration.
```dbl
engines {
  engine "ProviderName" {
    model "model-id"
    mode "text" | "vision" | "default"
  }
}
```
Supported providers and models
| Provider name | Example model IDs | Modes |
|---|---|---|
| "openai" | gpt-4o, gpt-4o-mini, o3-mini | text, vision |
| "anthropic" | claude-sonnet-4-5, claude-haiku-4-5 | text, vision |
| "google" | gemini-2.5-flash, gemini-2.5-pro | text, vision |
| "mistral" | mistral-large-latest, mistral-medium-latest | text, vision |
| "cohere" | command-r-plus, command-r | text only |
| "azure" | gpt-4o, gpt-4o-mini (deployment names) | text, vision |
| "bedrock" | anthropic.claude-3-sonnet, amazon.titan-text-express | text, vision |
| "docdigitizer" | (no model required) | default |
```dbl
engines {
  engine "openai" { model "gpt-4o" mode "vision" }
  engine "anthropic" { model "claude-sonnet-4-5" mode "text" }
  engine "docdigitizer" { mode "default" }
}
```
execution
Controls how many times each document/engine combination is run, and whether runs happen in parallel or sequentially. This block is optional; defaults are shown below.
```dbl
execution {
  runs <number>               # default: 1
  mode parallel | sequential  # default: sequential
  max_concurrent <number>     # default: 1 (parallel only)
  max_per_engine <number>     # default: unbounded
}
```
| Property | Default | Description |
|---|---|---|
| runs | 1 | Number of times each document is run through each engine |
| mode | sequential | parallel runs engines concurrently; sequential runs them one at a time |
| max_concurrent | 1 | Maximum number of simultaneous workers (parallel mode only) |
| max_per_engine | unbounded | Maximum concurrent workers per engine; must be <= runs |
```dbl
execution {
  runs 5
  mode parallel
  max_concurrent 10
  max_per_engine 3
}
```
analysis
Specifies which metrics the platform should compute after all extraction runs complete. Results are used by the output block to generate charts. This block is optional.
```dbl
analysis {
  compute accuracy
  compute timing {
    percentiles [50, 90, 95, 99]
    group_by engine, model
  }
  compute cost { group_by engine }
}
```
| Metric | What it computes |
|---|---|
| compute accuracy | Compares extraction output against ground-truth schema; populates accuracy.* properties |
| compute timing | Aggregates wall-clock durations; supports custom percentile arrays and group-by; populates timing.* |
| compute cost | Aggregates token-based cost estimates; populates cost.* |
output
Defines which charts to render. Each graph directive specifies a chart type and maps domain properties to visual axes. This block is optional.
```dbl
output {
  graph <type> {
    title "Chart Title"
    <axis>: <domain.property>
  }
}
```
Property values use dot-access notation referencing domain objects. String values are used only for title.
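An output block may declare several graphs side by side; each is rendered independently. For instance (an illustrative combination of chart types):

```dbl
output {
  graph bar {
    title "Accuracy by Provider"
    x: engine.name
    y: accuracy.score
  }
  graph pie {
    title "Cost Share"
    value: cost.total
    label: engine.name
  }
}
```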
throttle
Simulates customer bandwidth constraints by limiting the rate at which documents are fetched during a benchmark run. This is useful for testing how extraction engines behave under real-world network conditions. This block is optional; at most one throttle block is allowed per benchmark.
```dbl
throttle {
  bandwidth 5mb
}
```
| Property | Type | Description |
|---|---|---|
| bandwidth | size (bytes/sec) | Maximum download rate for document fetching. Use size unit literals: kb, mb, gb (e.g. 512kb, 5mb, 1gb). Plain integers (raw bytes/sec) are also accepted. |
Examples
```dbl
# Simulate a slow broadband connection (5 MB/s)
throttle { bandwidth 5mb }

# Simulate a constrained mobile connection (512 KB/s)
throttle { bandwidth 512kb }

# Use raw bytes/sec for precise control
throttle { bandwidth 1048576 }  # 1 MB/s
```
A throttle block with no bandwidth directive has no effect and produces a warning. Bandwidth values ≤ 0 are a semantic error (SEM013). Very low bandwidths (below 100 KB/s) produce a warning about potential request timeouts.
sweep
Runs the benchmark once for each value in a list, varying a named execution parameter across the runs. Each value produces a separate, independent benchmark execution. This is useful for finding the optimal configuration for concurrency, run counts, or bandwidth limits. This block is optional.
```dbl
# Syntax
sweep <variable> [<value>, <value>, ...]
```
Valid variables
| Variable | Value type | Description |
|---|---|---|
| bandwidth | size (e.g. 1mb) | Sweeps the throttle bandwidth limit — requires size literals |
| runs | integer | Sweeps the number of repetitions per document/engine pair |
| max_concurrent | integer | Sweeps the maximum number of simultaneous workers |
| max_per_engine | integer | Sweeps the per-engine concurrency limit |
Examples
```dbl
# Find the right bandwidth throttle for realistic testing
sweep bandwidth [1mb, 5mb, 10mb, 50mb]

# Measure how result quality changes with more repetitions
sweep runs [1, 3, 5, 10]

# Find the optimal concurrency level
sweep max_concurrent [1, 2, 4, 8, 16]

# Tune per-engine limits
sweep max_per_engine [1, 2, 3]
```
Each value in the sweep array produces a full, independent benchmark run. All values in a sweep array must be the same type — integers for runs, max_concurrent, and max_per_engine; size literals for bandwidth. Unknown variable names and empty value arrays are semantic errors (SEM014). Sweeps with more than 20 values produce a performance warning.
Multi-Pass Execution
DBL v1.1 introduces run blocks — named execution lanes that contain their own engines and an ordered sequence of passes. Multi-pass benchmarks let you chain extraction steps so that later passes can branch based on the results of earlier ones.
Scripts that do not use run blocks continue to work unchanged — a flat engines block at benchmark level is treated as a single implicit run with one pass.
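For example, a v1.0-style script like the following is executed as one implicit run with a single pass:

```dbl
benchmark "Flat Style" {
  documents { from dataset "my-dataset" limit 10 }
  engines {
    engine "openai" { model "gpt-4o" mode "vision" }
  }
}
```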
run
A run block declares a named execution lane. Each benchmark may contain multiple runs. Runs execute in declaration order. Each run specifies its own engines and an ordered list of passes.
```dbl
run "run-name" {
  labels ["label1", "label2"]

  engine "provider" {
    model "model-id"
    mode "text" | "vision" | "default"
  }

  pass 1 "first-pass" {
    prompts { labels ["categorization"] }
  }

  pass 2 "second-pass" {
    when pass.1.category not empty
    prompts { labels ["schema_bounded"] }
  }
}
```
Engine selection inside a run
Engines may be declared explicitly, or selected from the model registry by labels:
```dbl
run "explicit_engine" {
  engine "openai" { model "gpt-4o" mode "vision" }
  engine "anthropic" { model "claude-sonnet-4-20250514" }
}

run "label_based" {
  # Resolves to all active models with ALL listed labels
  engines { labels ["fast", "vision"] }
}
```
Semantic rules
| Code | Rule |
|---|---|
| SEM021 | Run name must be a non-empty string |
| SEM021 | Run must contain at least one engine declaration or engine label selection |
| SEM021 | Duplicate run names within the same benchmark are an error |
pass
A pass block is a numbered execution step within a run. Passes run in ascending number order. Each pass may specify its own prompts, schemas, and post_extraction directives. A pass with no when clause always executes. A pass whose when evaluates to false is silently skipped.
```dbl
pass <number> ["optional-name"] {
  [when <boolean-expression>]
  [prompts { ... }]
  [schemas { ... }]
  [post_extraction { ... }]
}
```
Examples
```dbl
# Pass 1 — no condition, always runs
pass 1 "categorize" {
  prompts { labels ["categorization"] }
}

# Pass 2 — runs only when pass 1 returned a category
pass 2 "extract" {
  when pass.1.category not empty and pass.1.confidence > 0.5
  prompts { labels ["schema_bounded"] }
  schemas { labels ["invoices"] expand true }
}

# Pass 3 — fallback when categorization produced no category
pass 3 "fallback" {
  when pass.1.category empty
  prompts { labels ["freewheel"] }
}
```
Semantic rules
| Code | Rule |
|---|---|
| SEM022 | Pass number must be a positive integer >= 1 |
| SEM022 | Pass numbers must be unique within the same run |
| SEM022 | A when clause may only reference passes with a lower number (no forward references) |
| SEM026 | Pass when clause references a pass number that is not lower than the current pass number |
when expressions
The when clause is a boolean expression evaluated at runtime for each document, using the JSON result of prior passes as its context. It supports the same operators as where clauses, plus empty and not empty for presence testing.
Pass references
| Syntax | Meaning |
|---|---|
| pass.1.field_name | Access field_name in the result JSON of pass number 1 |
| pass.categorize.field_name | Access field_name in the result JSON of the pass named categorize |
Operators
| Operator | Meaning | Example |
|---|---|---|
| empty | Field is absent or null | pass.1.category empty |
| not empty | Field is present and non-null | pass.1.category not empty |
| == | Equality | pass.1.type == "invoice" |
| != | Inequality | pass.1.status != "error" |
| > < >= <= | Numeric comparison | pass.1.confidence > 0.5 |
| contains | Substring test | pass.1.notes contains "urgent" |
| matches | Regex test | pass.1.ref matches "^INV-" |
| in | Array membership | pass.1.type in ["invoice", "receipt"] |
| and | Logical conjunction | pass.1.category not empty and pass.1.confidence > 0.5 |
| or | Logical disjunction | pass.1.type == "invoice" or pass.1.type == "receipt" |
| not | Logical negation (prefix) | not pass.1.verified == true |
Operator precedence
Precedence from highest to lowest: not > and > or. Use parentheses to override:
```dbl
# Compound condition
when pass.1.category not empty and pass.1.confidence > 0.5

# Type-based branching
when pass.1.type == "invoice" or pass.1.type == "receipt"

# Parenthesised override
when pass.1.category not empty and (pass.1.confidence > 0.8 or pass.1.manual_review == true)

# Fallback
when pass.1.category empty
```
Semantic rules
| Code | Rule |
|---|---|
| SEM023 | when expression is invalid, references an undefined pass, or uses a type-incompatible operator |
include & snippets
The include directive inserts the content of a named snippet at the current position. Snippets are reusable DBL fragments stored in the platform (script type snippet). They may contain any valid block-level content.
```dbl
include "snippet-name"
```
Contextual resolution
The snippet is pasted verbatim at the insertion point. What it may inject depends on where it appears:
```dbl
benchmark "Multi-pass Test" {
  include "standard_analysis_output"  # injects analysis{} and output{} blocks

  documents { from dataset "Dataset1" }

  run "llm" {
    engine "openai" { model "gpt-4o" }
    include "two_pass_categorize"     # injects pass 1 and pass 2 definitions
  }

  analysis { compute accuracy }       # explicit block wins over the snippet's
}
```
Override rule
| Context | Rule |
|---|---|
| Benchmark level | If the script and the snippet both declare the same block (e.g., analysis), the explicit block wins and the snippet's version is discarded. |
| Inside a run block | If both declare pass N for the same number, the explicit pass wins. |
Semantic rules
| Code | Rule |
|---|---|
| SEM024 | Snippet name must be a non-empty string; circular includes and depth > 5 are errors |
labels & engine labels
labels directive
The labels directive tags the benchmark or a run with metadata strings. Labels are attached to every extraction result produced in that scope and are used for filtering and grouping in analysis and comparison.
```dbl
benchmark "Invoice Pipeline" {
  labels ["experiment_q1", "invoices"]  # benchmark-level labels

  documents { from dataset "Dataset1" limit 100 }

  run "idp_baseline" {
    labels ["idp"]                      # run-level labels
    engine "docdigitizer" { mode "default" }
  }

  run "llm_two_pass" {
    labels ["llm", "multi_pass"]        # run-level labels
    engine "openai" { model "gpt-4o" mode "vision" }
    pass 1 "categorize" {
      prompts { labels ["categorization"] }
    }
  }
}
```
Engine selection by labels
Inside a run, engines can be selected from the model registry by label instead of declared explicitly. The executor resolves to all active models matching all listed labels.
```dbl
run "fast_run" {
  # Selects all active models labelled "fast" AND "vision"
  engines { labels ["fast", "vision"] }

  pass 1 {
    prompts { labels ["categorization"] }
  }
}
```
Label-based engine selection requires models to be registered and marked active in the Model Registry. Explicit engine declarations and engines { labels [...] } may coexist in the same run.
Full multi-pass LLM vs IDP benchmark example
```dbl
benchmark "Invoice Pipeline — LLM vs IDP" {
  labels ["experiment_q1", "invoices"]

  documents {
    from dataset "Dataset1"
    where document.file_type in ["pdf"]
    limit 100
  }

  # Run 1: DocDigitizer IDP baseline (single pass)
  run "idp_baseline" {
    labels ["idp"]
    engine "docdigitizer" { mode "default" }
  }

  # Run 2: Two-pass LLM pipeline
  run "llm_two_pass" {
    labels ["llm", "multi_pass"]
    engine "openai" { model "gpt-4o" mode "vision" }
    engine "anthropic" { model "claude-sonnet-4-20250514" }

    pass 1 "categorize" {
      prompts { labels ["categorization"] }
    }

    pass 2 "extract" {
      when pass.1.category not empty and pass.1.confidence > 0.5
      prompts { labels ["schema_bounded"] }
      schemas { labels ["invoices"] expand true }
    }

    pass 3 "fallback" {
      when pass.1.category empty
      prompts { labels ["freewheel"] }
    }
  }

  execution {
    runs 2
    mode parallel
    max_concurrent 6
  }

  schemas { labels ["invoices"] expand true }

  analysis {
    compute accuracy
    compute timing { percentiles [50, 90, 95] group_by engine }
    compute cost { group_by engine }
  }

  output {
    graph bar {
      title "Accuracy by Engine"
      x: engine.name
      y: accuracy.score
    }
    graph pie {
      title "Cost Distribution"
      value: cost.total
      label: engine.name
    }
  }

  compare many_to_many { }
}
```
Domain Objects
These are the valid dot-access paths in filter expressions, group_by clauses, and graph property values. Any path not listed here is a semantic error.
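For example, document.file_size is a valid path in a where clause, while a misspelled path is rejected at validation time with SEM001:

```dbl
documents {
  from dataset "my-dataset"
  where document.file_size > 1mb    # valid domain path
  # where document.filesize > 1mb   # SEM001: unknown dot-access path
}
```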
document.*
| Path | Type | Description |
|---|---|---|
| document.file_type | string | File extension (e.g. "pdf", "png", "docx") |
| document.file_size | size (bytes) | File size in bytes; supports size unit literals |
| document.file_name | string | Original file name including extension |
| document.category | string | Broad category: "pdf", "office", "image" |
| document.page_count | integer | Number of pages (where applicable) |
| document.status | string | Processing status string |
engine.*
| Path | Type | Description |
|---|---|---|
| engine.name | string | Provider name (e.g. "openai", "docdigitizer") |
| engine.model | string | Model identifier (e.g. "gpt-4o") |
| engine.mode | string | Processing mode: "text", "vision", "default" |
extraction.*
Properties of a single extraction run (one document x one engine x one repetition).
| Path | Type | Description |
|---|---|---|
| extraction.duration_ms | number | Wall-clock extraction time in milliseconds |
| extraction.input_tokens | integer | Number of input tokens consumed |
| extraction.output_tokens | integer | Number of output tokens produced |
| extraction.estimated_cost | number | Estimated monetary cost in USD |
| extraction.status | string | Result status: "success", "error", "timeout" |
timing.* after compute timing
| Path | Type | Description |
|---|---|---|
| timing.min | number | Minimum duration across all runs (ms) |
| timing.max | number | Maximum duration across all runs (ms) |
| timing.avg | number | Mean duration (ms) |
| timing.p50 | number | 50th percentile / median duration (ms) |
| timing.p90 | number | 90th percentile duration (ms) |
| timing.p95 | number | 95th percentile duration (ms) |
| timing.p99 | number | 99th percentile duration (ms) |
Non-standard percentiles requested via percentiles [...] are accessible as timing.p{N} where N is the requested integer.
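For example, requesting quartile boundaries makes them available under the corresponding timing.p{N} paths:

```dbl
analysis {
  compute timing {
    percentiles [25, 75]   # results exposed as timing.p25 and timing.p75
    group_by engine
  }
}
```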
accuracy.* after compute accuracy
| Path | Type | Description |
|---|---|---|
| accuracy.score | number | Overall accuracy ratio, 0.0 to 1.0 |
| accuracy.fields_matched | integer | Count of correctly extracted fields |
| accuracy.fields_total | integer | Total fields in the expected schema |
cost.* after compute cost
| Path | Type | Description |
|---|---|---|
| cost.total | number | Total cost across all runs (USD) |
| cost.average | number | Average cost per run (USD) |
| cost.per_document | number | Average cost per document (USD) |
Graph Types
Each graph type has specific required and optional axis properties. Missing required properties are reported as semantic errors (SEM005).
bar
Grouped bar chart. Best for comparing a metric across providers or models.
| Property | Required |
|---|---|
| x | Yes — categorical axis (e.g. engine.name) |
| y | Yes — numeric value (e.g. accuracy.score) |
| title | No — chart title string |
| series | No — grouping dimension (e.g. engine.model) |
```dbl
graph bar {
  title "Accuracy by Provider"
  x: engine.name
  y: accuracy.score
  series: engine.model
}
```
pie
Pie/donut chart. Best for showing proportional distribution (e.g. cost share).
| Property | Required |
|---|---|
| value | Yes — numeric slice size |
| label | Yes — slice label (e.g. engine.name) |
| title | No |
```dbl
graph pie {
  title "Cost Distribution"
  value: cost.total
  label: engine.name
}
```
line
Line chart. Best for time-series or ordered comparisons across multiple runs.
| Property | Required |
|---|---|
| x | Yes — x-axis domain |
| y | Yes — y-axis metric |
| title | No |
| series | No — multi-line grouping |
```dbl
graph line {
  title "Latency over Runs"
  x: extraction.duration_ms
  y: timing.avg
  series: engine.name
}
```
candlestick
Candlestick / box chart. Shows timing distribution (min, p50, p95, max) per provider.
| Property | Required |
|---|---|
| category | Yes — x-axis category |
| open | Yes — lower quartile / p50 |
| high | Yes — high whisker / p95 |
| low | Yes — low whisker / min |
| close | Yes — upper quartile / avg |
| title | No |
```dbl
graph candlestick {
  title "Time Distribution"
  category: engine.name
  open: timing.p50
  high: timing.p95
  low: timing.min
  close: timing.avg
}
```
line_regression
Scatter plot with regression line. Best for showing correlation (e.g. file size vs latency).
| Property | Required |
|---|---|
| x | Yes — independent variable |
| y | Yes — dependent variable |
| series | Yes — color grouping |
| title | No |
```dbl
graph line_regression {
  title "Size vs Latency"
  x: document.file_size
  y: extraction.duration_ms
  series: engine.name
}
```
histogram
Vertical bar histogram showing frequency distribution of values. Supports multiple series for comparing distributions across engines or documents.
| Property | Required |
|---|---|
| values | Yes — numeric field to bin (e.g. extraction.duration_ms) |
| series | No — grouping dimension (e.g. engine.name) |
| bins | No — integer bucket count (default 10) |
| title | No — chart title string |
```dbl
graph histogram {
  title "Extraction Time Distribution"
  values: extraction.duration_ms
  series: engine.name
  bins: 15
}
```
normalized_histogram
For each group (e.g. document), values are normalized to [0, 1] relative to the group's maximum. Shows relative performance independent of document complexity.
| Property | Required |
|---|---|
| values | Yes — numeric field to bin (e.g. extraction.duration_ms) |
| group_by | Yes — normalization group (e.g. document.file_name) |
| series | No — comparison dimension (e.g. engine.name) |
| bins | No — integer bucket count (default 10) |
| title | No — chart title string |
```dbl
graph normalized_histogram {
  title "Normalized Time Distribution"
  values: extraction.duration_ms
  group_by: document.file_name
  series: engine.name
}
```
Templates Gallery
Ready-made scripts to get you started quickly. Click Copy to copy the script source to your clipboard, then paste it into the DBL Editor.
Templates are loaded from the DBL Editor. Book a demo to access the full template library in the editor.
- Minimal benchmark with two providers and basic accuracy/timing analysis.
- PDF invoice extraction across OpenAI, Anthropic, and DocDigitizer.
- Compare vision and text extraction modes on the same provider.
- Cost breakdown across all configured providers with group_by engine and model.
- Full percentile timing analysis (p50/p90/p95/p99) with candlestick chart.
- Filter documents > 5mb and measure how file size correlates with latency.
- Run each document 10 times to measure extraction consistency over repetitions.
- Separate accuracy benchmarks for PDF, image, and Office document categories.
Common Snippets
Copy-paste patterns for the most frequent use cases.
```dbl
where document.file_type in ["pdf"]

where document.file_size > 1mb

where document.file_name contains "invoice"

execution { runs 3 mode parallel max_concurrent 5 max_per_engine 2 }

compute timing { percentiles [25, 50, 75, 90, 95, 99] group_by engine }

where not document.status == "error"

where document.page_count >= 2

where document.file_name matches "^INV-[0-9]+"

throttle { bandwidth 5mb }

sweep runs [1, 3, 5, 10]

sweep bandwidth [1mb, 5mb, 10mb, 50mb]

sweep max_concurrent [1, 2, 4, 8, 16]
```
Error Reference
DBL reports errors with a code, category, line number, and description. The compiler collects all errors rather than stopping at the first one, so you see every issue in a single validation pass.
Lexical Errors (LEX)
| Code | Description |
|---|---|
| LEX001 | Unterminated string literal — missing closing " |
| LEX002 | Unknown escape sequence inside a string literal |
| LEX003 | Unterminated block comment — missing closing */ |
| LEX004 | Unrecognized character in source input |
Syntax Errors (PAR)
| Code | Description |
|---|---|
| PAR001 | Expected token not found (e.g. expected { after block name) |
| PAR002 | Unexpected token inside a block |
| PAR003 | Missing required block — documents or engines not present |
| PAR004 | Empty array literal [] — arrays must contain at least one element |
Semantic Errors (SEM)
| Code | Description |
|---|---|
| SEM001 | Unknown dot-access path — the property does not exist in the domain model |
| SEM002 | Operator/type mismatch in where clause (e.g. > on a string field) |
| SEM003 | Duplicate engine name within the same engines block |
| SEM004 | Duplicate metric type in analysis (e.g. two compute timing directives) |
| SEM005 | Missing required graph property for the given graph type |
| SEM006 | Unknown graph property — not in the allowed set for this graph type |
| SEM007 | Invalid engine.mode value — must be "text", "vision", or "default" |
| SEM008 | limit must be a positive integer greater than zero |
| SEM009 | runs must be a positive integer >= 1 |
| SEM010 | Duplicate property key inside an execution block |
| SEM011 | percentiles value out of range — each value must be between 1 and 99 |
| SEM012 | Unknown group_by identifier — does not resolve to a valid domain property |
| SEM013 | bandwidth in a throttle block must be a positive value (> 0) |
| SEM014 | sweep has an unknown variable name, an empty values array, or mismatched value types |
| SEM015 | prompts block has an invalid select pattern or empty label values |
| SEM016 | schemas block has an invalid select/pattern or empty label values |
| SEM017 | post_extraction directive unknown, duplicate, or has an invalid property |
| SEM018 | compare block has an invalid mode, missing/invalid property, or reference engine not declared |
| SEM021 | run block has an empty name, no engine declarations, or a duplicate run name |
| SEM022 | pass block has an invalid number, a duplicate number within the run, or a forward pass reference in when |
| SEM023 | when expression is invalid, references an undefined pass, or uses a type-incompatible operator |
| SEM024 | include has an empty snippet name, circular include chain, or exceeds maximum include depth (5) |
| SEM025 | Engine label selection (engines { labels [...] }) has an empty array or non-string values |
| SEM026 | when clause references a pass number that is not lower than the current pass number |