Skip to content

Clear buffers of filtered clips after processing in ClipWriterStage#1876

Open
suiyoubi wants to merge 4 commits into
mainfrom
aot/free-buffer
Open

Clear buffers of filtered clips after processing in ClipWriterStage#1876
suiyoubi wants to merge 4 commits into
mainfrom
aot/free-buffer

Conversation

@suiyoubi
Copy link
Copy Markdown
Contributor

Description

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

…o prevent memory bloat. Added unit test to verify buffer clearance functionality.

Signed-off-by: Ao Tang <aot@nvidia.com>
@suiyoubi suiyoubi requested a review from abhinavg4 as a code owner April 27, 2026 17:48
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR fixes a memory leak in ClipWriterStage where filtered_clips (e.g. motion-filtered) never had their clip.buffer cleared after processing. Because these clips bypass ClipFrameExtractionStage (which normally does the cleanup), the buffers were accumulating and causing ~1.8 GB serialised task objects. The fix also refactors the existing cleanup block into a dedicated _cleanup_video_data static method.

Confidence Score: 5/5

Safe to merge — focused bug fix with targeted tests and no regressions introduced.

The change is a minimal, well-scoped fix that closes a real memory leak. The refactoring into _cleanup_video_data is clean, the omission of cosmos_embed1_embedding for filtered clips is correct (as confirmed by prior review discussion), and the new test directly validates the fix.

No files require special attention.

Important Files Changed

Filename Overview
nemo_curator/stages/video/io/clip_writer.py Extracts cleanup logic into _cleanup_video_data and adds clip.buffer = None for filtered_clips; fix is correct and consistent with the stage pipeline ordering discussed in prior review.
tests/stages/video/io/test_clip_writer.py Adds assertion to existing process test and a new dedicated test verifying filtered-clip buffer is cleared after process(); coverage is adequate.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ClipWriterStage.process] --> B[Write clip embeddings to buffer]
    B --> C[Submit clip write tasks]
    C --> D[Submit filtered_clip write tasks]
    D --> E[Wait for all futures]
    E --> F[Write video-level embeddings & metadata]
    F --> G[_cleanup_video_data]
    G --> H[video.clips: clear buffer, embedding, windows]
    G --> I[video.filtered_clips: clear buffer]
    H --> J[Task returned]
    I --> J
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into aot/free-buffer" | Re-trigger Greptile

Comment on lines +183 to +184
for clip in video.filtered_clips:
clip.buffer = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Consider clearing cosmos_embed1_embedding on filtered clips

The cleanup block for regular video.clips clears both clip.buffer and clip.cosmos_embed1_embedding. If a filtered clip somehow acquires an embedding (e.g., via a future pipeline change or if the ordering of stages varies), its embedding would not be released here. For symmetry with the existing cleanup pattern and defensive memory management, consider also setting clip.cosmos_embed1_embedding = None.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptile-apps, no need : filtered clips are moved to video.filtered_clips by MotionFilterStage or AestheticFilterStage — both run before the embedding stages.
CosmosEmbed1FrameCreationStage and CosmosEmbed1EmbeddingStage only iterate video.clips, so filtered clips can never have cosmos_embed1_embedding set.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense — if the filtering stages always run before the embedding stages and the embedding stages only touch video.clips, there's no path for filtered clips to acquire cosmos_embed1_embedding. The existing cleanup is sufficient.

Tip: You can give me more context with .greptile/rules so I get this right next time.

@suiyoubi
Copy link
Copy Markdown
Contributor Author

/ok to test a8b63ef

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant