
add_columns_with_schema rejects all-null Map<K, V> columns #6835

Description

@kushudai

Dataset.add_columns(pa.Schema) fails with "Invalid user input: All-null columns must be nullable." on any Arrow Map<K, V> field, even when the outer Map field is declared nullable=True.

Schema::all_fields_nullable walks every nested field in pre-order and requires f.nullable on each. The Arrow Map layout mandates entries.nullable = false and entries.key.nullable = false (spec), which Lance itself enforces at field construction (field.rs#L1123-L1128).
The two checks contradict each other, so no Map column can pass the metadata-only AllNulls path regardless of the outer field's nullable flag.
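
To make the contradiction concrete, here is a minimal pure-Python model of the two rules. It is only a sketch: the `Field` class, `map_field`, and this `all_fields_nullable` are hypothetical stand-ins written for illustration, not Lance's or pyarrow's actual types, and the child names `entries`, `key`, and `value` are assumed from the Arrow Map layout.

```python
from dataclasses import dataclass, field as dc_field
from typing import Iterator, List


@dataclass
class Field:
    """Toy stand-in for a schema field; not Lance's actual Field type."""
    name: str
    nullable: bool
    children: List["Field"] = dc_field(default_factory=list)


def map_field(name: str, nullable: bool) -> Field:
    # Arrow's Map layout: the "entries" struct and its "key" child must be
    # non-nullable; only the outer field and the value may be nullable.
    key = Field("key", nullable=False)
    value = Field("value", nullable=True)
    entries = Field("entries", nullable=False, children=[key, value])
    return Field(name, nullable=nullable, children=[entries])


def pre_order(f: Field) -> Iterator[Field]:
    # Visit the field itself first, then each child subtree.
    yield f
    for child in f.children:
        yield from pre_order(child)


def all_fields_nullable(fields: List[Field]) -> bool:
    # Models the behaviour described above: every field reached by the
    # pre-order walk must be nullable, including nested ones.
    return all(f.nullable for top in fields for f in pre_order(top))


cutoffs = map_field("cutoffs", nullable=True)
offenders = [f.name for f in pre_order(cutoffs) if not f.nullable]
print(all_fields_nullable([cutoffs]))  # False
print(offenders)  # ['entries', 'key']
```

In this model the outer `cutoffs` field is nullable, yet the check still fails because the walk reaches the two children that the Map layout forces to be non-nullable.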

Reproduction

import lance
import pyarrow as pa

ds = lance.write_dataset(
    pa.table({"x": pa.array([1, 2, 3], type=pa.int64())}),
    "/tmp/lance_map_allnulls_repro",
    mode="overwrite",
)
ds.add_columns(
    pa.schema([
        pa.field("cutoffs", pa.map_(pa.string(), pa.float64()), nullable=True),
    ])
)
OSError: Invalid user input: All-null columns must be nullable.,
.../rust/lance/src/dataset/schema_evolution.rs:361:28

Reproduced on pylance 4.0.0 and 6.0.0. The bug also exists on main: the body of all_fields_nullable is byte-identical from v4.0.1 to main as of this writing.

Related

NewColumnTransform::AllNulls introduced in #3391; existing test at schema_evolution.rs#L1177-L1209 covers the leaf non-nullable-outer-field case but no nested types.

Proposed Fix

I'm not actually sure, but collection types like lists and maps should be allowed to be nullable?
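
One possible direction, sketched in pure Python under stated assumptions: keep requiring nullability everywhere except the two Map slots that Arrow fixes as non-nullable. Everything here (`Field`, its `kind` tag, `map_field`, `all_fields_nullable_relaxed`) is hypothetical and written only for illustration; it is not Lance's implementation, and whether this is the right rule for Lance is an open question.

```python
from dataclasses import dataclass, field as dc_field
from typing import List


@dataclass
class Field:
    """Toy schema field (hypothetical; not Lance's Field)."""
    name: str
    nullable: bool
    kind: str = "leaf"  # "leaf", "struct", or "map"
    children: List["Field"] = dc_field(default_factory=list)


def map_field(name: str, nullable: bool, value_nullable: bool = True) -> Field:
    # Build a Map-shaped field: outer -> entries (struct) -> key, value.
    key = Field("key", nullable=False)
    value = Field("value", nullable=value_nullable)
    entries = Field("entries", nullable=False, kind="struct", children=[key, value])
    return Field(name, nullable=nullable, kind="map", children=[entries])


def all_fields_nullable_relaxed(fields: List[Field]) -> bool:
    """Require nullability everywhere except the Map slots Arrow fixes."""

    def ok(f: Field) -> bool:
        if not f.nullable:
            return False
        if f.kind == "map":
            # Skip "entries" and "key": Arrow mandates they are non-nullable.
            # Still check the value field, which may itself be nested.
            value = f.children[0].children[1]
            return ok(value)
        return all(ok(c) for c in f.children)

    return all(ok(f) for f in fields)


print(all_fields_nullable_relaxed([map_field("cutoffs", nullable=True)]))   # True
print(all_fields_nullable_relaxed([map_field("cutoffs", nullable=False)]))  # False
```

With this rule a nullable Map column passes, while a Map whose outer field is non-nullable, or a struct with a non-nullable child, is still rejected.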
