Add derivative filename validation by astewartau · Pull Request #70 · bids-standard/python-validator

astewartau · 2026-03-05T23:47:40Z

Summary

Adds schema.rules.files.deriv to the regex chain in BIDSValidator._init_regexes() so derivative filenames are recognized as valid BIDS
Derivative-specific entities (space, desc, res, den, label, atlas, seg, hemi, scale) are now parsed correctly by parse() and accepted by is_bids()

Before

>>> BIDSValidator.is_bids("/sub-01/anat/sub-01_space-MNI152_desc-preproc_T1w.nii.gz")
False
>>> BIDSValidator.parse("/sub-01/anat/sub-01_space-MNI152_desc-preproc_T1w.nii.gz")
{}

After

>>> BIDSValidator.is_bids("/sub-01/anat/sub-01_space-MNI152_desc-preproc_T1w.nii.gz")
True
>>> BIDSValidator.parse("/sub-01/anat/sub-01_space-MNI152_desc-preproc_T1w.nii.gz")
{'subject': '01', 'datatype': 'anat', 'space': 'MNI152', 'description': 'preproc', 'suffix': 'T1w', 'extension': '.nii.gz'}
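
To make the After output easier to read, here is a minimal, standard-library sketch of how key-value entities can be split out of such a filename. It is not the regex chain that BIDSValidator builds from the schema: the pattern, the `LONG_NAMES` table, and the `sketch_parse` helper are all hypothetical, and the only mapping assumed is the one visible in the output above (`sub` is reported as `subject`, `desc` as `description`).

```python
import re

# Assumed short-to-long entity names, taken from the example output above;
# the authoritative mapping lives in the BIDS schema, not in this table.
LONG_NAMES = {"sub": "subject", "desc": "description"}

# Hypothetical, simplified pattern: zero or more key-value entities,
# then a suffix, then an extension that may contain several dots.
FILENAME = re.compile(
    r"^(?P<entities>(?:[a-z]+-[a-zA-Z0-9]+_)*)"
    r"(?P<suffix>[a-zA-Z0-9]+)"
    r"(?P<extension>\.[a-zA-Z0-9.]+)$"
)


def sketch_parse(path: str) -> dict:
    """Split a BIDS-style path into entities, datatype, suffix, and extension."""
    parts = path.strip("/").split("/")
    match = FILENAME.match(parts[-1])
    if match is None:
        return {}
    result = {}
    for pair in match["entities"].rstrip("_").split("_"):
        if pair:
            key, value = pair.split("-", 1)
            result[LONG_NAMES.get(key, key)] = value
    # Assumption: the directory directly above the file is the datatype.
    if len(parts) >= 2:
        result["datatype"] = parts[-2]
    result["suffix"] = match["suffix"]
    result["extension"] = match["extension"]
    return result
```

Unlike the real validator, this sketch accepts any lowercase key, so it cannot tell a derivative entity from a made-up one; that distinction is what the schema rules provide.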

Closes #62

Test plan

New tests/test_derivatives.py with 12 parametrized tests (valid filenames, entity parsing, invalid filenames)
Existing tests pass

astewartau · 2026-03-05T23:53:56Z

One question to discuss: Should derivative filenames only be accepted under bids/derivatives/? This is currently not the case under this PR.
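
One way to picture the location-based option raised in this question is a plain path check. The sketch below is only an illustration: `in_derivatives_tree` is a hypothetical helper, not part of the validator, and it assumes paths are POSIX-style and relative to the dataset root.

```python
from pathlib import PurePosixPath


def in_derivatives_tree(path: str) -> bool:
    """Return True if any parent directory of the file is named 'derivatives'."""
    # parts[:-1] drops the filename so only directories are inspected.
    return "derivatives" in PurePosixPath(path).parts[:-1]
```

A check like this would miss derivative datasets that are stored on their own rather than nested under a raw dataset; those declare `"DatasetType": "derivative"` in `dataset_description.json`, so a location test alone is probably not sufficient.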

codecov · 2026-03-06T00:42:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.98%. Comparing base (878d57d) to head (640b2fc).

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   90.89%   90.98%   +0.08%     
==========================================
  Files          13       14       +1     
  Lines         846      854       +8     
  Branches      124      124              
==========================================
+ Hits          769      777       +8     
  Misses         47       47              
  Partials       30       30

effigies · 2026-03-06T01:46:49Z

I just want to check what your goal is here. Is this intended to flesh out the BIDSValidator object for PyBIDS use, or is it to work on the full validator?

If it's the former, then you're right that we should distinguish between raw and derivative datasets, instead of declaring derivative files always valid. I might suggest giving BIDSValidator.__init__ a dataset_type parameter that enables derivative rules.
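
A rough sketch of what an opt-in `dataset_type` parameter could look like follows. Everything in it is an assumption made for illustration: `SketchValidator` is a toy class rather than the real BIDSValidator, the two pattern lists stand in for regexes that the real code compiles from the schema, and the accepted values `"raw"` and `"derivative"` are borrowed from the `DatasetType` field of `dataset_description.json`.

```python
import re

# Hypothetical stand-ins for the regexes compiled from the schema's rule groups.
RAW_PATTERNS = [r"^/sub-[a-zA-Z0-9]+/anat/sub-[a-zA-Z0-9]+_T1w\.nii\.gz$"]
DERIV_PATTERNS = [
    r"^/sub-[a-zA-Z0-9]+/anat/sub-[a-zA-Z0-9]+"
    r"(_space-[a-zA-Z0-9]+)?(_desc-[a-zA-Z0-9]+)?_T1w\.nii\.gz$"
]


class SketchValidator:
    """Toy validator whose derivative rules are opt-in (not the real class)."""

    def __init__(self, dataset_type: str = "raw") -> None:
        if dataset_type not in ("raw", "derivative"):
            raise ValueError(f"unknown dataset_type: {dataset_type!r}")
        patterns = list(RAW_PATTERNS)
        # Derivative rules extend the raw rules instead of replacing them.
        if dataset_type == "derivative":
            patterns += DERIV_PATTERNS
        self.regexes = [re.compile(p) for p in patterns]

    def is_bids(self, path: str) -> bool:
        """Return True if any enabled pattern matches the path."""
        return any(rx.match(path) for rx in self.regexes)
```

With this shape, `SketchValidator()` keeps today's behaviour for raw datasets, while `SketchValidator("derivative")` additionally accepts names such as `sub-01_space-MNI152_desc-preproc_T1w.nii.gz`.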

If it's the latter, then we need to do some design. Here's the current validation loop:

python-validator/src/bids_validator/__main__.py

Lines 55 to 71 in 878d57d

    
def validate(tree: FileTree, schema: Namespace) -> None:
    """Check if the file path is BIDS compliant.

    Parameters
    ----------
    tree : FileTree
        Full FileTree object to iterate over and check
    schema : Namespace
        Schema object to validate dataset against

    """
    validator = BIDSValidator()
    dataset = Dataset(tree, schema)
    for file in walk(tree, dataset):
        if not validator.is_bids(file.path):
            print(f'{file.path} is not a valid bids filename')

file is a Context:

python-validator/src/bids_validator/context.py

Lines 378 to 482 in 878d57d

    
@attrs.define
class Context:
    """A context object that creates context for file on access."""

    file: FileTree
    dataset: Dataset
    subject: ctx.Subject | None
    file_parts: FileParts = attrs.field(init=False)

    def __attrs_post_init__(self) -> None:
        self.file_parts = FileParts.from_file(self.file, self.schema)

    @property
    def schema(self) -> Namespace:
        """The BIDS specification schema."""
        return self.dataset.schema

    @property
    def path(self) -> str:
        """Path of the current file."""
        return self.file_parts.path

    @property
    def entities(self) -> dict[str, str | None]:
        """Entities parsed from the current filename."""
        return self.file_parts.entities

    @property
    def datatype(self) -> str | None:
        """Datatype of current file, for examples, anat."""
        return self.file_parts.datatype

    @property
    def suffix(self) -> str | None:
        """Suffix of current file."""
        return self.file_parts.suffix

    @property
    def extension(self) -> str | None:
        """Extension of current file including initial dot."""
        return self.file_parts.extension

    @property
    def modality(self) -> str | None:
        """Modality of current file, for examples, MRI."""
        if (datatype := self.file_parts.datatype) is not None:
            return datatype_to_modality(datatype, self.schema)
        return None

    @property
    def size(self) -> int:
        """Length of the current file in bytes."""
        return self.file.path_obj.stat().st_size

    @property
    def associations(self) -> ctx.Associations:
        """Associated files, indexed by suffix, selected according to the inheritance principle."""
        return ctx.Associations()

    @property
    def columns(self) -> Namespace | None:
        """TSV columns, indexed by column header, values are arrays with column contents."""
        if self.extension == '.tsv':
            return load_tsv(self.file)
        elif self.extension == '.tsv.gz':
            columns = tuple(self.sidecar.Columns) if self.sidecar else ()
            return load_tsv_gz(self.file, columns)
        return None

    @property
    def json(self) -> Namespace | None:
        """Contents of the current JSON file."""
        if self.file_parts.extension == '.json':
            return Namespace(load_json(self.file))
        return None

    @property
    def gzip(self) -> None:
        """Parsed contents of gzip header."""
        pass

    @cached_property
    def nifti_header(self) -> ctx.NiftiHeader | None:
        """Parsed contents of NIfTI header referenced elsewhere in schema."""
        if self.extension in ('.nii', '.nii.gz'):
            return load_nifti_header(self.file)
        return None

    @property
    def ome(self) -> None:
        """Parsed contents of OME-XML header, which may be found in OME-TIFF or OME-ZARR files."""
        pass

    @property
    def tiff(self) -> None:
        """TIFF file format metadata."""
        pass

    @property
    def sidecar(self) -> Namespace | None:
        """Sidecar metadata constructed via the inheritance principle."""
        sidecar = load_sidecar(self.file) or {}
        return Namespace(sidecar)

We could rename it context.

Broadly speaking we need something like:

for rule in file_rules:
    if full_match(context, rule):
        break
    if partial_match(context, rule):
        partials.append(rule)
else:  # Break is never called
    suggestion = generate_suggestion(partials)  # May be empty
    error("UNKNOWN_FILE", suggestion)
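
The `for`/`else` loop above can be exercised with toy stand-ins. In the sketch below, `ToyRule`, the two matcher functions, and `check` are hypothetical and far simpler than real schema rules; the point is only to show that the `else` branch runs when no rule fully matches, and that partial matches can feed a suggestion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyRule:
    """Hypothetical rule: one suffix and the extensions allowed with it."""
    suffix: str
    extensions: List[str]


def full_match(name: str, rule: ToyRule) -> bool:
    """Suffix and extension both agree with the rule."""
    return any(name.endswith(f"_{rule.suffix}{ext}") for ext in rule.extensions)


def partial_match(name: str, rule: ToyRule) -> bool:
    """Suffix agrees with the rule; only checked after full_match has failed."""
    return f"_{rule.suffix}." in name


def check(name: str, rules: List[ToyRule]) -> Optional[str]:
    """Return None if some rule matches fully, else an UNKNOWN_FILE message."""
    partials = []
    for rule in rules:
        if full_match(name, rule):
            break
        if partial_match(name, rule):
            partials.append(rule)
    else:  # Runs only when the loop finished without hitting break
        hint = ""
        if partials:  # The suggestion may be empty
            hint = f" (did you mean one of {partials[0].extensions}?)"
        return f"UNKNOWN_FILE: {name}{hint}"
    return None
```

Collecting partial matches costs little inside the same loop and lets the error say what was nearly right, which is presumably the motivation for `generate_suggestion` in the pseudocode.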

We don't currently have a rule data structure, but we can create dataclasses to match the types of rules:

from typing import Literal

import attrs

@attrs.define
class PathRule:
    selectors: list[str]
    level: Literal["optional", "required"]
    path: str

@attrs.define
class StemRule:
    selectors: list[str]
    level: Literal["optional", "required"]
    stem: str
    datatypes: list[str]
    extensions: list[str]

@attrs.define
class SuffixRule:
    selectors: list[str]
    level: Literal["optional", "required"]
    suffixes: list[str]
    entities: dict[str, Literal["optional", "required"]]
    datatypes: list[str]
    extensions: list[str]

Possibly our full_match could look like:

def full_match(context: Context, rule: PathRule | StemRule | SuffixRule) -> bool:
    match rule:
        case PathRule(path=path):
            return context.path == path
        case StemRule(stem=stem, datatypes=dtypes, extensions=exts):
            return fnmatch(context.file.name, stem) and ...
        case SuffixRule(suffixes=suffixes, ...):
            return context.suffix in suffixes

Anyway, that got a bit long. What's the target?

bendhouseart · 2026-03-09T17:43:35Z

> If it's the former, then you're right that we should distinguish between raw and derivative datasets, instead of declaring derivative files always valid. I might suggest giving BIDSValidator.__init__ a dataset_type parameter that enables derivative rules.

I believe it was the former: to help validate generated (or existing) filenames from PyBIDS and other tools.

Add derivative filename validation

0125508

Include schema.rules.files.deriv in the regex chain so derivative filenames (with entities like space, desc, res, den) are recognized as valid BIDS. Closes #62

astewartau added 4 commits March 6, 2026 09:54

Fix line length lint error

78f6959

Remove unnecessary fixture from derivative tests

a098ac4

Add comments noting derivative-specific features in tests

67dd6da

Remove redundant invalid filename tests

640b2fc

These tested general BIDS validation (wrong extensions, missing entities) rather than anything derivative-specific.

astewartau wants to merge 5 commits into main from validate-derivatives
