fix: stream multipart uploads to avoid OOM on large files#477

Merged

jpoehnelt merged 5 commits into main from fix/stream-multipart-uploads on Mar 13, 2026

Conversation

@jpoehnelt
Member

Summary

Fixes #244 — uploading large files via --upload causes an out-of-memory crash because the entire file is read into memory (tokio::fs::read), then copied into a second Vec by build_multipart_body. A 5 GB file requests ~20 GB of contiguous RAM.

This replaces the buffered approach with a streaming multipart/related body:

  • build_multipart_stream yields the body in three chained streams: preamble (Bytes) → file chunks via ReaderStream → postamble (Bytes)
  • Content-Length is computed from tokio::fs::metadata so Google APIs still receive the correct header without buffering the file
  • Memory usage is now O(64 KB) regardless of file size (zero-copy via bytes::Bytes)
  • Proper Result error propagation for metadata serialization (no unwrap_or)

The old build_multipart_body is retained under #[cfg(test)] for the existing unit tests.

Supersedes #418 — incorporates all review feedback from that PR (ReaderStream instead of manual unfold, zero-copy Bytes, proper error handling).

Test Plan

  • cargo clippy -- -D warnings
  • cargo test: 610/610 pass (2 new tests added)
    • test_build_multipart_stream_content_length — verifies declared Content-Length matches expected preamble + file + postamble arithmetic
    • test_build_multipart_stream_large_file — 256 KB file (larger than 64 KB chunk size) verifies multi-chunk content-length accuracy
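The arithmetic those two tests verify reduces to summing three segment lengths, with only file metadata consulted. The helper below is a hypothetical sketch for illustration (the field layout and names are assumptions, not the PR's code):

```rust
// Hypothetical helper: declared Content-Length = preamble + file size + postamble.
// The file size comes from metadata, so nothing is buffered.
fn multipart_content_length(preamble: &str, file_size: u64, postamble: &str) -> u64 {
    preamble.len() as u64 + file_size + postamble.len() as u64
}

fn main() {
    let boundary = "BOUNDARY";
    // A typical multipart/related layout (headers assumed for illustration).
    let preamble = format!(
        "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{{}}\r\n--{boundary}\r\nContent-Type: application/octet-stream\r\n\r\n"
    );
    let postamble = format!("\r\n--{boundary}--\r\n");
    // A 256 KiB file, as in test_build_multipart_stream_large_file.
    let total = multipart_content_length(&preamble, 256 * 1024, &postamble);
    println!("Content-Length: {total}");
}
```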

New Dependencies

  • tokio-util = { version = "0.7", features = ["io"] } — provides ReaderStream
  • bytes = "1" — zero-copy byte buffers (already a transitive dependency via reqwest)

Replace buffered file read + build_multipart_body in build_http_request
with streaming build_multipart_stream using tokio_util::io::ReaderStream.

Memory usage drops from O(file_size) to O(64 KB) regardless of upload size.
Content-Length is pre-computed from file metadata so Google APIs still
receive the correct header without buffering.

Fixes #244
@changeset-bot

changeset-bot bot commented Mar 13, 2026

🦋 Changeset detected

Latest commit: f267bee

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

  • @googleworkspace/cli (Patch)


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical out-of-memory issue that occurred when uploading large files by switching from a buffered approach to a streaming mechanism. The change ensures that file uploads are processed in chunks, significantly reducing memory footprint and improving the stability of the application when dealing with substantial data volumes. This enhancement makes the upload process more robust and scalable without compromising the accuracy of request headers.

Highlights

  • Memory Optimization: Implemented streaming multipart uploads to prevent Out-Of-Memory (OOM) errors when handling large files, reducing memory usage from O(file_size) to O(64 KB).
  • Streaming Implementation: Introduced build_multipart_stream which constructs the request body using three chained streams: preamble, file chunks via ReaderStream, and postamble.
  • Content-Length Calculation: The Content-Length header is now accurately computed from tokio::fs::metadata without buffering the entire file, ensuring compatibility with Google APIs.
  • Zero-Copy Efficiency: Utilized bytes::Bytes for zero-copy operations, further enhancing memory efficiency.
  • Error Handling: Improved error propagation for metadata serialization by replacing unwrap_or with proper Result handling.
  • Legacy Support & Testing: The original build_multipart_body function has been retained under #[cfg(test)] for existing unit tests, and two new tests were added to validate the streaming functionality and content-length calculation.
Changelog
  • .changeset/stream-multipart-uploads.md
    • Added a new changeset file documenting the streaming multipart upload feature.
  • Cargo.lock
    • Updated dependency tree to include bytes and tokio-util.
  • Cargo.toml
    • Added tokio-util with io feature and bytes as new dependencies.
  • src/executor.rs
    • Imported TryStreamExt for stream manipulation.
    • Modified build_http_request to use tokio::fs::metadata for file size and call the new build_multipart_stream function.
    • Updated HTTP request headers to include Content-Length and set the body from the streaming function.
    • Introduced build_multipart_stream for constructing a streaming multipart body using ReaderStream and bytes::Bytes.
    • Annotated the legacy build_multipart_body function with #[cfg(test)].
    • Added test_build_multipart_stream_content_length to verify content length calculation for streamed bodies.
    • Added test_build_multipart_stream_large_file to ensure correct content length for multi-chunk files.

@googleworkspace-bot googleworkspace-bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses a critical out-of-memory issue by replacing buffered file uploads with a streaming approach. The use of ReaderStream and pre-calculating Content-Length from file metadata is a solid implementation. My feedback focuses on improving error message clarity to aid in future debugging. Overall, this is an excellent and important improvement.

@github-actions github-actions bot added the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
- Metadata error now says 'Failed to get metadata' instead of misleading
  'Failed to read upload file'
- File::open error in stream now includes the file path for easier debugging
@codecov

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 82.47423% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.81%. Comparing base (835e1f1) to head (f267bee).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/executor.rs 82.47% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #477      +/-   ##
==========================================
+ Coverage   67.71%   67.81%   +0.10%     
==========================================
  Files          38       38              
  Lines       17044    17136      +92     
==========================================
+ Hits        11541    11621      +80     
- Misses       5503     5515      +12     


@github-actions github-actions bot removed the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@googleworkspace-bot
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves a critical out-of-memory issue during large file uploads by replacing the buffered implementation with a streaming multipart body. The use of tokio_util::ReaderStream and pre-calculating Content-Length from file metadata is a solid approach. The changes are well-structured, and the new tests provide good coverage for the streaming logic. I have one suggestion to further improve the robustness of file path handling.

@github-actions github-actions bot added the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
Uploads a small text file, verifies the response has a file ID,
then cleans up by deleting it. Validates the streaming multipart
upload path end-to-end against real Google APIs.
@github-actions github-actions bot removed the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@googleworkspace-bot
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the out-of-memory issue with large file uploads by switching to a streaming multipart body. The implementation using ReaderStream is solid and the pre-computation of Content-Length is correct. I've identified one high-severity security concern regarding the handling of the upload file path and have provided a detailed suggestion to mitigate it.

@github-actions github-actions bot added the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
The upload is via the +upload helper command, not files create --upload.
Also pipe stderr through tee so errors are visible in CI logs.
@github-actions github-actions bot removed the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@googleworkspace-bot
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully refactors multipart uploads to use streaming, which is a great improvement to prevent out-of-memory errors with large files. The use of ReaderStream and pre-calculating Content-Length is well-implemented. However, I've identified a critical security vulnerability related to path validation for the uploaded file, which could allow an attacker to read arbitrary files from the system. My review includes suggestions to address this by leveraging the existing validation utilities in the codebase.

@github-actions github-actions bot added the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@googleworkspace-bot
Collaborator

/gemini review

@jpoehnelt jpoehnelt merged commit 945ac91 into main Mar 13, 2026
20 checks passed
@jpoehnelt jpoehnelt deleted the fix/stream-multipart-uploads branch March 13, 2026 20:56
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant improvement by replacing buffered multipart uploads with a streaming approach, effectively fixing an out-of-memory issue with large files. The implementation is well-structured and includes new tests. However, I've found a critical issue in how the multipart preamble is constructed, which includes unintended whitespace that will corrupt the request body and cause uploads to fail. The new tests unfortunately replicate this bug and will also need to be corrected.

Comment on lines +858 to +861

    let preamble = format!(
        "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n\
        --{boundary}\r\nContent-Type: {media_mime}\r\n\r\n"
    );

critical

The multi-line format! macro for preamble includes leading whitespace from the source code indentation on the second line. This will create a malformed multipart/related body because the boundary separator will be "        --{boundary}" (with the indentation included) instead of "--{boundary}". This will cause the upload to fail.

Suggested change

    let preamble = format!("--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n--{boundary}\r\nContent-Type: {media_mime}\r\n\r\n");
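One caveat worth checking against the Rust reference before acting on this finding: Rust's string continuation escape (a backslash at the end of a line inside a string literal) skips the newline and any leading whitespace on the next line, so the indented source may not actually leak spaces into the literal, and this finding could be a false positive. The self-contained demo below (boundary and header values made up) compares the two forms:

```rust
// Demonstrates Rust's string continuation escape: `\` at end of line removes
// the newline AND the leading whitespace of the following line, so the
// continued literal is byte-identical to the single-line form.
fn main() {
    let boundary = "BOUNDARY";
    let continued = format!(
        "--{boundary}\r\nContent-Type: application/json\r\n\r\n{{}}\r\n\
        --{boundary}\r\nContent-Type: text/plain\r\n\r\n"
    );
    let single_line = format!(
        "--{boundary}\r\nContent-Type: application/json\r\n\r\n{{}}\r\n--{boundary}\r\nContent-Type: text/plain\r\n\r\n"
    );
    assert_eq!(continued, single_line);
    println!("identical: {}", continued == single_line); // prints "identical: true"
}
```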

Comment on lines +1466 to +1469

    let preamble = format!(
        "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n\
        --{boundary}\r\nContent-Type: text/plain\r\n\r\n"
    );

high

The test calculates the expected preamble length using the same buggy format string as in build_multipart_stream. The leading whitespace on the second line of the format string should be removed to correctly test against a valid multipart body.

Suggested change

    let preamble = format!("--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{metadata_json}\r\n--{boundary}\r\nContent-Type: text/plain\r\n\r\n");

Comment on lines +1505 to +1508

    let preamble = format!(
        "--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{{}}\r\n\
        --{boundary}\r\nContent-Type: application/octet-stream\r\n\r\n"
    );

high

Similar to the other test, the preamble calculation here includes extra whitespace that will not match a correctly formed multipart body. The leading whitespace on the second line of the format string should be removed.

Suggested change

    let preamble = format!("--{boundary}\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n{{}}\r\n--{boundary}\r\nContent-Type: application/octet-stream\r\n\r\n");


Labels

  • area: distribution
  • area: http
  • cla: yes (This human has signed the Contributor License Agreement)
  • gemini: reviewed (Gemini Code Assist has reviewed the latest changes)


Development

Successfully merging this pull request may close these issues.

Bug: Out-Of-Memory (OOM) Crash on Large File Uploads (Google Drive/YouTube)

3 participants