Dataset.add_columns(pa.Schema) fails with "Invalid user input: All-null columns must be nullable." on any Arrow Map<K, V> field, even when the outer Map field is declared nullable=True.
Schema::all_fields_nullable walks every nested field in pre-order and requires f.nullable on each. The Arrow Map layout mandates entries.nullable = false and entries.key.nullable = false (spec), which Lance itself enforces at field construction (field.rs#L1123-L1128).
The two checks contradict each other, so no Map column can pass the metadata-only AllNulls path, regardless of the outer field's nullable flag.
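The contradiction can be illustrated without Lance or Arrow installed. The following is a toy model only: the Field dataclass, map_field helper, and the Python all_fields_nullable function are hypothetical stand-ins written for this issue, not Lance's actual Rust code. It assumes the check behaves as described above (every nested field must be nullable) and that a Map field has the shape entries{key, value} with entries and key non-nullable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not Lance's type)."""
    name: str
    nullable: bool
    children: List["Field"] = field(default_factory=list)


def map_field(name: str, nullable: bool) -> Field:
    # Arrow Map layout: the entries struct and its key are never nullable;
    # only the outer field and the value may be.
    return Field(name, nullable, [
        Field("entries", False, [
            Field("key", False),
            Field("value", True),
        ]),
    ])


def all_fields_nullable(fields: List[Field]) -> bool:
    # Pre-order walk: check the field itself, then recurse into its children.
    for f in fields:
        if not f.nullable:
            return False
        if not all_fields_nullable(f.children):
            return False
    return True


cutoffs = map_field("cutoffs", nullable=True)
leaf = Field("x", nullable=True)

strict_ok_for_map = all_fields_nullable([cutoffs])
strict_ok_for_leaf = all_fields_nullable([leaf])
print(strict_ok_for_map, strict_ok_for_leaf)
```

In this model the nullable leaf passes, while the nullable Map is rejected as soon as the walk reaches entries, which matches the behavior reported here.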
Reproduction
import lance
import pyarrow as pa

# Create a small dataset with a single int64 column.
ds = lance.write_dataset(
    pa.table({"x": pa.array([1, 2, 3], type=pa.int64())}),
    "/tmp/lance_map_allnulls_repro",
    mode="overwrite",
)

# Add an all-null Map column; the outer field is explicitly nullable.
ds.add_columns(
    pa.schema([
        pa.field("cutoffs", pa.map_(pa.string(), pa.float64()), nullable=True),
    ])
)
OSError: Invalid user input: All-null columns must be nullable.,
.../rust/lance/src/dataset/schema_evolution.rs:361:28
Reproduced on pylance 4.0.0 and 6.0.0. The bug also exists on main: the body of all_fields_nullable is byte-identical from v4.0.1 through main as of this writing.
Related
NewColumnTransform::AllNulls was introduced in #3391. The existing test at schema_evolution.rs#L1177-L1209 covers the leaf case with a non-nullable outer field, but no nested types.
Proposed Fix
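I'm not sure what the right fix is, but collection types such as lists and maps should presumably be allowed to be nullable here, i.e. accepted on the AllNulls path when the outer field is nullable?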
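One possible direction, sketched in the same toy model rather than in Lance's Rust code: only require the top-level field to be nullable. The reasoning is that for list-like layouts such as Map, a null top-level slot references no child entries, so the non-nullable entries and key fields never need a value. The Field dataclass and top_level_fields_nullable are hypothetical names for illustration, and whether this rule is also right for struct columns with non-nullable children is an open question that this sketch does not settle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not Lance's type)."""
    name: str
    nullable: bool
    children: List["Field"] = field(default_factory=list)


def top_level_fields_nullable(fields: List[Field]) -> bool:
    # Relaxed rule (an assumption, not a confirmed fix): look only at the
    # outer fields and ignore the nullability of nested children.
    return all(f.nullable for f in fields)


def map_field(name: str, nullable: bool) -> Field:
    # Same Map shape as in the Arrow layout: entries and key are non-nullable.
    return Field(name, nullable, [
        Field("entries", False, [Field("key", False), Field("value", True)]),
    ])


accepts_nullable_map = top_level_fields_nullable([map_field("cutoffs", True)])
accepts_required_map = top_level_fields_nullable([map_field("cutoffs", False)])
print(accepts_nullable_map, accepts_required_map)
```

Under this rule a nullable Map is accepted, while a Map whose outer field is declared non-nullable is still rejected, so the existing leaf-level guarantee is preserved.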