data-dict.yaml is a lightweight, YAML-based data dictionary specification. It
describes a collection of related tables — their columns, types, constraints,
relationships, and the domain vocabulary you need to understand them — in a
single file that humans and AI agents can co-author and keep in sync with your
data.
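As a rough illustration of the pieces described above, here is one way a parsed dictionary could be held in memory in Rust. Every type, field, and variant name is hypothetical. None of them come from the spec or the crates/data-dict library. YAML parsing is left out so the sketch has no dependencies.

```rust
use std::collections::HashMap;

// Hypothetical column types; the spec defines its own type vocabulary.
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    Integer,
    Double,
    Text,
    Date,
}

// A column: its name, declared type, and one example constraint.
#[derive(Debug, Clone)]
struct Column {
    name: String,
    ty: ColumnType,
    required: bool, // hypothetical "may not be missing" constraint
}

// A link from a column in one table to a column in another (e.g. a foreign key).
#[derive(Debug, Clone)]
struct Relationship {
    from: (String, String), // (table, column)
    to: (String, String),
}

// The whole dictionary: tables, the relationships between them, and a
// glossary of domain terms.
#[derive(Debug, Default)]
struct Dictionary {
    tables: HashMap<String, Vec<Column>>,
    relationships: Vec<Relationship>,
    glossary: HashMap<String, String>, // term -> definition
}

impl Dictionary {
    // Declared type of `table.column`, or None if either is unknown.
    fn column_type(&self, table: &str, column: &str) -> Option<&ColumnType> {
        self.tables
            .get(table)?
            .iter()
            .find(|c| c.name == column)
            .map(|c| &c.ty)
    }
}

fn main() {
    // A made-up two-table dictionary.
    let mut dict = Dictionary::default();
    dict.tables.insert(
        "flights".into(),
        vec![
            Column { name: "carrier".into(), ty: ColumnType::Text, required: true },
            Column { name: "dep_delay".into(), ty: ColumnType::Double, required: false },
        ],
    );
    dict.tables.insert(
        "airlines".into(),
        vec![Column { name: "carrier".into(), ty: ColumnType::Text, required: true }],
    );
    dict.relationships.push(Relationship {
        from: ("flights".into(), "carrier".into()),
        to: ("airlines".into(), "carrier".into()),
    });
    dict.glossary.insert("carrier".into(), "Two-letter airline code.".into());

    println!("{:?}", dict.column_type("flights", "dep_delay"));
}
```

A real implementation would build this model from the YAML file. Storing relationships as plain (table, column) pairs turns cross-table checks into simple lookups.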
Full documentation, including the detailed specification, lives at data-dict.tidyverse.org.
This repo contains two things:
- The specification: the prose definition of the format, in site/spec.md (rendered at data-dict.tidyverse.org).
- The CLI: a Rust command-line tool that validates a data-dict.yaml file against the spec and against the underlying data.
See the examples (source in
site/examples/) for complete data dictionaries, or the
overview for the motivation behind the
project.
The data-dict CLI validates dictionaries. It can:
- Check that a file is structurally valid and internally consistent (validate-schema). Pass a file, or a directory containing a data-dict.yaml (defaults to the current directory).
- Compare a dictionary against a real Parquet file to confirm the data matches what the dictionary claims (parquet validate).
- Print the column types of a Parquet file (parquet types).
- Print an embedded agent skill for reading or writing data dictionaries (skill read / skill write).
- Print the full specification (spec).
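One part of being "internally consistent" is that a dictionary never refers to something it does not define. The following is a minimal sketch of two such checks. It assumes a simplified model in which each table is just a list of column names. Both rules are illustrative assumptions. The actual rules are the ones the spec defines and validate-schema enforces.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical relationship between two (table, column) pairs.
struct Relationship {
    from: (String, String),
    to: (String, String),
}

// Collect every problem rather than stopping at the first one.
fn lint(tables: &HashMap<String, Vec<String>>, rels: &[Relationship]) -> Vec<String> {
    let mut problems = Vec::new();

    // Rule 1 (hypothetical): column names must be unique within a table.
    for (table, cols) in tables {
        let mut seen = HashSet::new();
        for col in cols {
            if !seen.insert(col) {
                problems.push(format!("{table}: duplicate column `{col}`"));
            }
        }
    }

    // Rule 2 (hypothetical): both ends of a relationship must exist.
    let exists = |(t, c): &(String, String)| {
        tables.get(t).map_or(false, |cols| cols.contains(c))
    };
    for rel in rels {
        for end in [&rel.from, &rel.to] {
            if !exists(end) {
                problems.push(format!(
                    "relationship refers to unknown column {}.{}",
                    end.0, end.1
                ));
            }
        }
    }
    problems
}

fn main() {
    let mut tables = HashMap::new();
    tables.insert("flights".to_string(), vec!["carrier".to_string(), "carrier".to_string()]);
    tables.insert("airlines".to_string(), vec!["carrier".to_string(), "name".to_string()]);
    let rels = vec![Relationship {
        from: ("flights".into(), "carrier".into()),
        to: ("airlines".into(), "code".into()),
    }];
    for p in lint(&tables, &rels) {
        println!("{p}");
    }
}
```

Linters commonly report every problem at once. This lets an author fix a whole file in one pass instead of rerunning after each error.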
Build and install from source with Cargo:
```shell
cargo install --git https://github.com/tidyverse/data-dict data-dict-cli
```
Or clone the repo and build locally:
```shell
git clone https://github.com/tidyverse/data-dict.git
cd data-dict
cargo build --workspace --release
# binary is at target/release/data-dict
```
Run data-dict with no arguments to see the usage:
```text
Usage: data-dict <COMMAND>

Commands:
  validate-schema   Validate a data-dict.yaml file or directory against the schema [default: .]
  spec              Print the data-dict.yaml specification
  parquet types     Print column types for a parquet file
  parquet validate  Validate a parquet file's columns against a data dictionary
  skill read        Skill for reading and understanding a data dictionary
  skill write       Skill for creating or updating a data dictionary
  help              Print this message or the help of the given subcommand(s)
```
This is a Rust workspace with three crates:
- crates/data-dict/: the core library (YAML parsing, schema validation, lowering to a typed model, and semantic linting).
- crates/data-dict-cli/: a thin CLI wrapper.
- crates/data-dict-parquet/: reads Parquet schemas and maps column types to data-dict types.
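The data-dict-parquet crate maps Parquet column types onto data-dict types, and parquet validate presumably builds on that mapping to compare a file with its dictionary. The following sketch shows the general shape of such a comparison. It uses a simplified, hypothetical enum in place of Parquet's real schema types. A real implementation would read those types from the schema stored in the file's footer, for example via the parquet crate. The data-dict type names are made up for the example.

```rust
use std::collections::HashMap;

// Simplified, hypothetical stand-in for Parquet column types (physical type
// plus a few logical annotations). Not the `parquet` crate's actual types.
#[derive(Debug, Clone, Copy)]
enum ParquetType {
    Boolean,
    Int32,
    Int64,
    Double,
    Utf8String, // BYTE_ARRAY annotated as STRING
    Date32,     // INT32 annotated as DATE
    Binary,     // un-annotated BYTE_ARRAY
}

// Map to hypothetical data-dict type names; None means "no counterpart".
fn to_dict_type(t: ParquetType) -> Option<&'static str> {
    match t {
        ParquetType::Boolean => Some("boolean"),
        ParquetType::Int32 | ParquetType::Int64 => Some("integer"),
        ParquetType::Double => Some("double"),
        ParquetType::Utf8String => Some("string"),
        ParquetType::Date32 => Some("date"),
        ParquetType::Binary => None,
    }
}

// Compare declared column types (name -> type) with the columns in a file.
fn compare(declared: &HashMap<&str, &str>, observed: &[(&str, ParquetType)]) -> Vec<String> {
    let mut problems = Vec::new();
    for (name, pq) in observed {
        match (declared.get(name), to_dict_type(*pq)) {
            (None, _) => problems.push(format!("{name}: in file but not in dictionary")),
            (Some(_), None) => problems.push(format!("{name}: unsupported Parquet type {pq:?}")),
            (Some(want), Some(got)) if want != &got => {
                problems.push(format!("{name}: dictionary says {want}, file has {got}"))
            }
            _ => {}
        }
    }
    // Columns the dictionary promises but the file lacks.
    for name in declared.keys() {
        if !observed.iter().any(|(n, _)| n == name) {
            problems.push(format!("{name}: in dictionary but not in file"));
        }
    }
    problems
}

fn main() {
    let declared = HashMap::from([("carrier", "string"), ("dep_delay", "double"), ("year", "integer")]);
    let observed = [
        ("carrier", ParquetType::Utf8String),
        ("dep_delay", ParquetType::Int32),
        ("tailnum", ParquetType::Utf8String),
    ];
    for p in compare(&declared, &observed) {
        println!("{p}");
    }
}
```

Returning None for types with no obvious counterpart lets the comparison report them explicitly rather than silently guessing a mapping.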
Build and test the workspace with:
```shell
cargo build --workspace
cargo test --workspace
```
The website is a Quarto project in site/, published automatically to data-dict.tidyverse.org on every push to main.