Problem — apify call has a high non-zero-exit rate; hypothesis is avoidable input errors, agent-skewed. The --input flag help is just "Optional JSON input to be given to the Actor."
Hard-fail paths (src/lib/commands/resolve-input.ts): file path in --input; JSON array; malformed JSON (shell quoting); --json + --output-dataset.
Changes (help/UX only — no new telemetry):
- Expand
--input description: must be a JSON object; mind shell quoting; "for a file, use --input-file."
- On the file-path and parse errors, append a pointer to
apify actors info <actor> --input to discover the schema.
- Add the discover→inspect→call flow to the command's long description/examples.
Cohort + measurement (existing telemetry only):
- Source:
cli_command Segment event. Filter commandString ∈ {call, actors call}.
- Metric: failure rate =
exitCode != 0 share.
- Cohorts:
userAgent (per-skill attribution), aiAgent set vs unset, flagsUsed (input vs input-file vs neither), wasRetried (friction proxy).
- Method: 2–4 week baseline, then compare the same cohorts across CLI versions. Success = failure-rate drop in the
aiAgent/--input cohort. Guardrail: total call volume holds.
- Limitation to state explicitly: without a failure-reason field we see that calls fail, not why — so this validates the aggregate hypothesis, not per-error attribution.
Measurement = failure rate by version, old vs new
Cohort directly on context.app.version.
- Baseline cohort = last release without the fix; treatment cohort = first release(s) with it.
- Compare
exitCode != 0 share for commandString ∈ {call, actors call} between the two version cohorts.
- Cohort by version, not calendar date — releases don't update atomically, so old and new run side-by-side in the field; the version field is what lets you compare them cleanly instead of fighting adoption lag.
- Hold confounders constant: compare within
aiAgent-set, ideally within same userAgent, since early upgraders skew (more CI / power users). Two-proportion z-test on the failure rate; check per-version sample size is large enough.
Measurement section → "compare exitCode != 0 for call between the pre-fix and post-fix context.app.version cohorts; segment by aiAgent/userAgent to control for population drift; two-proportion test."
Problem —
apify callhas a high non-zero-exit rate; hypothesis is avoidable input errors, agent-skewed. The--inputflag help is just "Optional JSON input to be given to the Actor."Hard-fail paths (
src/lib/commands/resolve-input.ts): file path in--input; JSON array; malformed JSON (shell quoting);--json+--output-dataset.Changes (help/UX only — no new telemetry):
--inputdescription: must be a JSON object; mind shell quoting; "for a file, use--input-file."apify actors info <actor> --inputto discover the schema.Cohort + measurement (existing telemetry only):
cli_commandSegment event. FiltercommandString ∈ {call, actors call}.exitCode != 0share.userAgent(per-skill attribution),aiAgentset vs unset,flagsUsed(inputvsinput-filevs neither),wasRetried(friction proxy).aiAgent/--inputcohort. Guardrail: totalcallvolume holds.Measurement = failure rate by version, old vs new
Cohort directly on
context.app.version.exitCode != 0share forcommandString ∈ {call, actors call}between the two version cohorts.aiAgent-set, ideally within sameuserAgent, since early upgraders skew (more CI / power users). Two-proportion z-test on the failure rate; check per-version sample size is large enough.Measurement section → "compare
exitCode != 0forcallbetween the pre-fix and post-fixcontext.app.versioncohorts; segment byaiAgent/userAgentto control for population drift; two-proportion test."