fix: stream multipart uploads to avoid OOM on large files#477
Conversation
Replace buffered file read + build_multipart_body in build_http_request with streaming build_multipart_stream using tokio_util::io::ReaderStream. Memory usage drops from O(file_size) to O(64 KB) regardless of upload size. Content-Length is pre-computed from file metadata so Google APIs still receive the correct header without buffering. Fixes #244
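The key idea — knowing the body length without buffering the file — comes down to simple arithmetic over the fixed parts of the `multipart/related` body. A minimal std-only sketch (function name and the `--{boundary}--` postamble format are assumptions for illustration, not the PR's actual code):

```rust
// Hypothetical sketch: Content-Length for a multipart/related body can be
// computed from the preamble, the file size (from metadata), and the
// postamble, without ever reading the file into memory.
fn multipart_content_length(
    boundary: &str,
    metadata_json: &str,
    media_mime: &str,
    file_size: u64,
) -> u64 {
    let preamble = format!(
        "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n--{boundary}\r\nContent-Type: {media_mime}\r\n\r\n"
    );
    // Closing boundary terminates the multipart body (assumed format).
    let postamble = format!("\r\n--{boundary}--\r\n");
    preamble.len() as u64 + file_size + postamble.len() as u64
}

fn main() {
    // In the real code the file size would come from tokio::fs::metadata.
    let len = multipart_content_length("BOUNDARY", "{}", "text/plain", 256 * 1024);
    println!("{len}");
}
```

Because all three parts have knowable lengths up front, the header can be set before the first file chunk is read.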
🦋 Changeset detected — latest commit: f267bee. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical out-of-memory issue that occurred when uploading large files by switching from a buffered approach to a streaming mechanism. The change ensures that file uploads are processed in chunks, significantly reducing the memory footprint and improving the stability of the application when dealing with substantial data volumes. This enhancement makes the upload process more robust and scalable without compromising the accuracy of request headers.

Highlights
Code Review
This pull request effectively addresses a critical out-of-memory issue by replacing buffered file uploads with a streaming approach. The use of ReaderStream and pre-calculating Content-Length from file metadata is a solid implementation. My feedback focuses on improving error message clarity to aid in future debugging. Overall, this is an excellent and important improvement.
- Metadata error now says 'Failed to get metadata' instead of the misleading 'Failed to read upload file'
- `File::open` error in the stream now includes the file path for easier debugging
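The second point — attaching the file path to open errors — can be sketched in a few lines of std-only Rust (the function name and error wording here are hypothetical, not the PR's actual code):

```rust
use std::fs::File;

// Hypothetical sketch of the review feedback: wrap File::open errors with the
// offending path so CI logs point directly at the file that failed.
fn open_with_context(path: &str) -> Result<File, String> {
    File::open(path)
        .map_err(|e| format!("Failed to open upload file '{path}': {e}"))
}

fn main() {
    // A missing file now yields an error that names the path.
    match open_with_context("/nonexistent/upload.bin") {
        Ok(_) => println!("opened"),
        Err(e) => println!("{e}"),
    }
}
```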
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #477      +/-   ##
==========================================
+ Coverage   67.71%   67.81%   +0.10%
==========================================
  Files          38       38
  Lines       17044    17136      +92
==========================================
+ Hits        11541    11621      +80
- Misses       5503     5515      +12
```

☔ View full report in Codecov by Sentry.
/gemini review
Code Review
This pull request effectively resolves a critical out-of-memory issue during large file uploads by replacing the buffered implementation with a streaming multipart body. The use of tokio_util::ReaderStream and pre-calculating Content-Length from file metadata is a solid approach. The changes are well-structured, and the new tests provide good coverage for the streaming logic. I have one suggestion to further improve the robustness of file path handling.
Uploads a small text file, verifies the response has a file ID, then cleans up by deleting it. Validates the streaming multipart upload path end-to-end against real Google APIs.
/gemini review
Code Review
This pull request effectively addresses the out-of-memory issue with large file uploads by switching to a streaming multipart body. The implementation using ReaderStream is solid and the pre-computation of Content-Length is correct. I've identified one high-severity security concern regarding the handling of the upload file path and have provided a detailed suggestion to mitigate it.
The upload is via the `+upload` helper command, not `files create --upload`. Also pipe stderr through `tee` so errors are visible in CI logs.
/gemini review
Code Review
This pull request successfully refactors multipart uploads to use streaming, which is a great improvement to prevent out-of-memory errors with large files. The use of ReaderStream and pre-calculating Content-Length is well-implemented. However, I've identified a critical security vulnerability related to path validation for the uploaded file, which could allow an attacker to read arbitrary files from the system. My review includes suggestions to address this by leveraging the existing validation utilities in the codebase.
/gemini review
Code Review
This pull request introduces a significant improvement by replacing buffered multipart uploads with a streaming approach, effectively fixing an out-of-memory issue with large files. The implementation is well-structured and includes new tests. However, I've found a critical issue in how the multipart preamble is constructed, which includes unintended whitespace that will corrupt the request body and cause uploads to fail. The new tests unfortunately replicate this bug and will also need to be corrected.
```rust
let preamble = format!(
    "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n\
    --{boundary}\r\nContent-Type: {media_mime}\r\n\r\n"
);
```
The multi-line `format!` macro for `preamble` includes leading whitespace from the source code indentation on the second line. This will create a malformed `multipart/related` body because the boundary separator will be `    --{boundary}` instead of `--{boundary}`. This will cause the upload to fail.
Suggested change:

```diff
-let preamble = format!(
-    "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n\
-    --{boundary}\r\nContent-Type: {media_mime}\r\n\r\n"
-);
+let preamble = format!("--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n--{boundary}\r\nContent-Type: {media_mime}\r\n\r\n");
```
```rust
let preamble = format!(
    "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n\
    --{boundary}\r\nContent-Type: text/plain\r\n\r\n"
);
```
The test calculates the expected preamble length using the same buggy format string as in build_multipart_stream. The leading whitespace on the second line of the format string should be removed to correctly test against a valid multipart body.
Suggested change:

```diff
-let preamble = format!(
-    "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n\
-    --{boundary}\r\nContent-Type: text/plain\r\n\r\n"
-);
+let preamble = format!("--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n--{boundary}\r\nContent-Type: text/plain\r\n\r\n");
```
```rust
let preamble = format!(
    "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{{}}\r\n\
    --{boundary}\r\nContent-Type: application/octet-stream\r\n\r\n"
);
```
Similar to the other test, the preamble calculation here includes extra whitespace that will not match a correctly formed multipart body. The leading whitespace on the second line of the format string should be removed.
Suggested change:

```diff
-let preamble = format!(
-    "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{{}}\r\n\
-    --{boundary}\r\nContent-Type: application/octet-stream\r\n\r\n"
-);
+let preamble = format!("--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{{}}\r\n--{boundary}\r\nContent-Type: application/octet-stream\r\n\r\n");
```
Summary
Fixes #244 — uploading large files via `--upload` causes an out-of-memory crash because the entire file is read into memory (`tokio::fs::read`), then copied into a second `Vec` by `build_multipart_body`. A 5 GB file requests ~20 GB of contiguous RAM.

This replaces the buffered approach with a streaming `multipart/related` body:

- `build_multipart_stream` yields the body in three chained streams: preamble (`Bytes`) → file chunks via `ReaderStream` → postamble (`Bytes`)
- `Content-Length` is computed from `tokio::fs::metadata`, so Google APIs still receive the correct header without buffering the file
- Zero-copy chunks (`bytes::Bytes`)
- `Result` error propagation for metadata serialization (no `unwrap_or`)

The old `build_multipart_body` is retained under `#[cfg(test)]` for the existing unit tests.

Supersedes #418 — incorporates all review feedback from that PR (`ReaderStream` instead of manual unfold, zero-copy `Bytes`, proper error handling).
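The memory win comes from reading the file in fixed-size chunks instead of all at once. A std-only sketch of the idea (the real code uses `tokio_util::io::ReaderStream` with async I/O; the function name, 64 KB chunk size source, and temp-file setup here are illustrative assumptions):

```rust
use std::io::{Read, Write};

// Hypothetical synchronous sketch of the streaming approach: copy a file into
// a body writer 64 KB at a time, so peak memory is O(chunk), not O(file_size).
fn stream_file<W: Write>(path: &str, mut body: W) -> std::io::Result<u64> {
    let mut file = std::fs::File::open(path)?;
    let mut buf = [0u8; 64 * 1024]; // fixed 64 KB chunk, matching the PR's figure
    let mut total = 0u64;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        body.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // Demo: a 256 KB file streams through in four 64 KB chunks.
    let path = std::env::temp_dir().join("stream_demo.bin");
    std::fs::write(&path, vec![0u8; 256 * 1024])?;
    let mut sink = Vec::new();
    let n = stream_file(path.to_str().unwrap(), &mut sink)?;
    println!("{n}");
    std::fs::remove_file(&path)?;
    Ok(())
}
```

In the async version, `ReaderStream` plays the role of this loop, yielding `Bytes` chunks that the HTTP client forwards as they arrive.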
Test Plan
- `cargo clippy -- -D warnings` ✅
- `cargo test` — 610/610 pass (2 new tests added)
  - `test_build_multipart_stream_content_length` — verifies the declared `Content-Length` matches the expected preamble + file + postamble arithmetic
  - `test_build_multipart_stream_large_file` — 256 KB file (larger than the 64 KB chunk size) verifies multi-chunk content-length accuracy

New Dependencies
- `tokio-util = { version = "0.7", features = ["io"] }` — provides `ReaderStream`
- `bytes = "1"` — zero-copy byte buffers (already a transitive dependency via reqwest)