When using the new onSchemaChange feature the generated SQL adds each new column with a separate ALTER TABLE statement, which hits BigQuery's 5 alterations per 10 seconds rate limit.
Feature Announcement
Error:
Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas
Minimal case to reproduce
config {
schema: 'scratch',
type: "incremental",
onSchemaChange: "EXTEND",
uniqueKey: "a"
}
${when(incremental(),
`SELECT 1 as a, 2 as b, 3 as c, 4 as d, 5 as e, 6 as f, 7 as g`,
`SELECT 1 as a`
)}
Current Behaviour
Dataform maps all new column additions to separate ALTER TABLE statements:
IF ARRAY_LENGTH(columns_added) > 0 THEN
FOR new_column IN (SELECT * FROM UNNEST(columns_added) AS column_info)
DO
EXECUTE IMMEDIATE FORMAT(
'ALTER TABLE `project.schema.table` ADD COLUMN %s %s',
new_column.column_name, new_column.data_type
);
END FOR;
END IF;
Expected Behaviour
Dataform should use a single ALTER TABLE ADD COLUMN statement to add multiple columns at once:
A single statement has the following benefits:
- Atomic changes — either all fields are added or none are. Currently a single incorrect column configuration can leave a table in a partially updated state.
- Better resource use — avoid hitting the 5 updates/10s rate limit
Concerns
It's not clear to me if this is a dataform core issue -- there is no logic in dataform core for the actual SQL generation, if this is the case let me know.
When using the new
onSchemaChangefeature the generated SQL adds each new column with a separateALTER TABLEstatement, which hits BigQuery's 5 alterations per 10 seconds rate limit.Feature Announcement
Error:
Minimal case to reproduce
Current Behaviour
Dataform maps all new column additions to separate
ALTER TABLEstatements:Expected Behaviour
Dataform should use a single
ALTER TABLE ADD COLUMNstatement to add multiple columns at once:A single statement has the following benefits:
Concerns
It's not clear to me if this is a dataform core issue -- there is no logic in dataform core for the actual SQL generation, if this is the case let me know.