HDDS-14020. Use ForkJoinPool instead of a ScheduledThreadPoolExecutor in BackgroundService #9686
smengcl wants to merge 6 commits into apache:master
Conversation
…rvice Change-Id: I4c1a051e8574d32375cbebeb10546563f4a4f817
Change-Id: Ie2118356f902443a93fe666d890ad4d59e9dd467
…omment: apache#9390 (comment)
Refactor BackgroundTask to use wrapper pattern for ForkJoinPool integration. This commit introduces BackgroundTaskForkJoin as a wrapper class to integrate BackgroundTask with ForkJoinPool, avoiding the need to change all service implementations from 'implements' to 'extends'. Key changes:
- Reverted BackgroundTask from abstract class back to interface
- Created BackgroundTaskForkJoin wrapper extending RecursiveTask
- Updated BackgroundService to wrap tasks before forking
- Reverted all service task classes to 'implements BackgroundTask'
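For readers unfamiliar with the wrapper pattern described in the commit above, a minimal sketch follows. It assumes BackgroundTask is the existing Ozone interface whose call() returns a BackgroundTaskResult (both in org.apache.hadoop.hdds.utils); the class name and exception handling here are illustrative and simplified relative to the actual BackgroundTaskForkJoin in this PR.

```java
import java.util.concurrent.RecursiveTask;

import org.apache.hadoop.hdds.utils.BackgroundTask;
import org.apache.hadoop.hdds.utils.BackgroundTaskResult;

// Illustrative sketch only: adapts an existing BackgroundTask to ForkJoinPool by
// wrapping it in a RecursiveTask, so task classes keep "implements BackgroundTask".
public class BackgroundTaskForkJoinSketch
    extends RecursiveTask<BackgroundTaskResult> {
  private static final long serialVersionUID = 1L;

  // transient: BackgroundTask implementations are not expected to be Serializable
  private final transient BackgroundTask backgroundTask;

  public BackgroundTaskForkJoinSketch(BackgroundTask backgroundTask) {
    this.backgroundTask = backgroundTask;
  }

  @Override
  protected BackgroundTaskResult compute() {
    try {
      // Delegate to the wrapped task; callers obtain the result via join().
      return backgroundTask.call();
    } catch (Exception e) {
      // RecursiveTask surfaces this to join() as a completion exception.
      throw new RuntimeException("Background task failed", e);
    }
  }
}
```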
There are some issues after rebasing. Let me fix it.
Pull request overview
Refactors Ozone background services to run tasks via a ForkJoinPool (instead of a ScheduledThreadPoolExecutor) and updates directory deletion to support fork/join-style parallelism, targeting shutdown deadlock avoidance (HDDS-14020).
Changes:
- Reworked BackgroundService scheduling/execution to use a ForkJoinPool plus a shared ScheduledExecutorService (see the sketch after this list).
- Updated DirectoryDeletingService to optionally fork internal deletion work and adjusted related tests.
- Introduced fork/join wrappers and minor task-wrapping/refactoring in deleting services and SST filtering.
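In rough terms, the new scheduling shape is: each run of the service is submitted to a ForkJoinPool, and when it finishes, a shared single-threaded ScheduledExecutorService schedules the next submission after the service interval. The class and method names below (PeriodicForkJoinRunner, runOnce, scheduleNextRun) are illustrative, not the PR's actual code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the scheduling shape described above; not the PR's exact code.
class PeriodicForkJoinRunner {
  private final ForkJoinPool exec = new ForkJoinPool(4);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicBoolean isShutdown = new AtomicBoolean(false);
  private final long intervalMillis;

  PeriodicForkJoinRunner(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  void start() {
    exec.submit(this::runOnce);
  }

  private void runOnce() {
    try {
      // ... fork and join the service's background tasks here ...
    } finally {
      scheduleNextRun();
    }
  }

  private void scheduleNextRun() {
    // Re-check shutdown state before submitting to the pool again.
    scheduler.schedule(() -> {
      if (!isShutdown.get()) {
        exec.submit(this::runOnce);
      }
    }, intervalMillis, TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    isShutdown.set(true);
    exec.shutdown();
    scheduler.shutdown();
  }
}
```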
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/BackgroundService.java | Core switch to ForkJoin-based execution and custom periodic scheduling. |
| hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/BackgroundTaskForkJoin.java | New ForkJoin wrapper for BackgroundTask execution. |
| hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/BackgroundServiceScheduler.java | New shared scheduler supplier for periodic rescheduling. |
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/DirectoryDeletingService.java | Uses fork/join-style parallelism instead of internal executors; adds allowTasksToFork plumbing. |
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/AbstractKeyDeletingService.java | Refactors task wrapping into a reusable BackgroundDeleteTask. |
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/SstFilteringService.java | Adjusts call() signature and adds interrupt handling. |
| hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestDirectoryDeletingService.java | Updates task construction for the new DirDeletingTask signature. |
| hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/BlockDeletingServiceTestImpl.java | Updates PeriodicalTask construction to match new signature. |
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        LOG.error("SST filtering task interrupted for snapshot: {}", snapShotTableKey, e);
After catching InterruptedException, the code re-interrupts the thread but continues processing subsequent snapshots. Typically interruption should stop the task (e.g., break the loop / return) to allow the service to shut down promptly and avoid doing more work on an interrupted thread.
| LOG.error("SST filtering task interrupted for snapshot: {}", snapShotTableKey, e); | |
| LOG.error("SST filtering task interrupted for snapshot: {}", snapShotTableKey, e); | |
| break; |
    scheduledExecuterService.schedule(() -> exec.submit(new PeriodicalTask(this)),
        intervalInMillis, TimeUnit.MILLISECONDS);
The scheduled runnable in scheduleNextTask() calls exec.submit(...) without re-checking shutdown state or whether exec is still non-null. Since shutdown() sets exec = null and the shared scheduler uses shutdown() (not shutdownNow()), already-scheduled tasks can still execute after shutdown and hit NPE / RejectedExecutionException. Guard inside the scheduled runnable (check shutdown flag + exec != null) and/or cancel scheduled tasks on shutdown.
Suggested change:

    -   scheduledExecuterService.schedule(() -> exec.submit(new PeriodicalTask(this)),
    -       intervalInMillis, TimeUnit.MILLISECONDS);
    +   scheduledExecuterService.schedule(() -> {
    +     if (!isShutdown.get() && exec != null) {
    +       exec.submit(new PeriodicalTask(this));
    +     }
    +   }, intervalInMillis, TimeUnit.MILLISECONDS);
    exec = new ForkJoinPool(threadPoolSize, factory, null, false);
    isShutdown = new AtomicReference<>(false);
Background tasks frequently do blocking I/O (e.g. OM Ratis submitRequest, RocksDB calls). Using a ForkJoinPool for blocking work can reduce parallelism and hurt throughput unless blocking sections use ForkJoinPool.managedBlock(...) (or a dedicated blocking pool is used). Consider addressing blocking sections or documenting why ForkJoinPool is safe here.
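For reference, one way to keep a ForkJoinPool responsive around blocking calls is ForkJoinPool.managedBlock, which lets the pool add a compensating worker while the current one blocks. A hedged sketch follows; the Callable passed in is a stand-in for whatever blocking call (a Ratis submit, a RocksDB read) the task performs, not an actual Ozone API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;

// Illustrative only: runs an arbitrary blocking call under ForkJoinPool.managedBlock so
// the pool may create a compensating worker while this thread is blocked.
final class BlockingCallAdapter<T> implements ForkJoinPool.ManagedBlocker {
  private final Callable<T> blockingCall; // stand-in for e.g. a Ratis submit or RocksDB read
  private T result;
  private boolean done;

  BlockingCallAdapter(Callable<T> blockingCall) {
    this.blockingCall = blockingCall;
  }

  @Override
  public boolean block() throws InterruptedException {
    try {
      result = blockingCall.call(); // the actual blocking work happens here
    } catch (InterruptedException e) {
      throw e;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
    done = true;
    return true; // blocking finished
  }

  @Override
  public boolean isReleasable() {
    return done; // not releasable until block() has produced the result
  }

  T call() throws InterruptedException {
    ForkJoinPool.managedBlock(this); // pool may add a spare worker while we block
    return result;
  }
}
```

A task would then call `new BlockingCallAdapter<>(() -> blockingRead()).call()` instead of invoking the blocking read directly; whether that complexity is warranted here, versus simply documenting the blocking behavior, is up to the authors.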
    public static synchronized UncheckedAutoCloseableSupplier<ScheduledExecutorService> get() {
      if (executor == null) {
        ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);
        executor = ReferenceCountedObject.wrap(scheduler, () -> { }, (shutdown) -> {
BackgroundServiceScheduler uses the default thread factory for ScheduledThreadPoolExecutor, which creates non-daemon threads. Since this is a shared background scheduler, a non-daemon thread can keep the JVM alive if something forgets to close/release it. Consider using a daemon thread factory (and naming the thread) for the scheduler.
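A named daemon thread for the shared scheduler could be set up roughly as below; the class and thread names are illustrative, and ScheduledThreadPoolExecutor's ThreadFactory constructor is standard JDK.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;

final class DaemonSchedulerFactory {
  private DaemonSchedulerFactory() { }

  // Illustrative: builds the shared single-threaded scheduler with a named daemon thread,
  // so an unreleased scheduler cannot keep the JVM alive on its own.
  static ScheduledThreadPoolExecutor newScheduler() {
    ThreadFactory daemonFactory = runnable -> {
      Thread t = new Thread(runnable, "BackgroundServiceScheduler");
      t.setDaemon(true);
      return t;
    };
    return new ScheduledThreadPoolExecutor(1, daemonFactory);
  }
}
```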
    public class BackgroundTaskForkJoin extends RecursiveTask<BackgroundTaskForkJoin.BackgroundTaskForkResult> {
      private static final long serialVersionUID = 1L;
      private final transient BackgroundTask backgroundTask;

      public BackgroundTaskForkJoin(BackgroundTask backgroundTask) {
This class is added under the hdds-common module but depends on BackgroundTask/BackgroundTaskResult, which are defined in hdds-server-framework (and hdds-common does not depend on it). Since hdds-server-framework already depends on hdds-common, adding the reverse dependency would create a cycle; as-is, this likely won’t compile. Consider moving this class into hdds-server-framework, or into a new shared module that both can depend on.
    }, exec).exceptionally(e -> null), (Void1, Void) -> null);
    Consumer<BackgroundTaskForkJoin> taskForkHandler = task -> {
      task.fork();
      tasksInFlight.offer(task);
Method accept ignores the return value of Queue.offer, which signals whether the task was actually enqueued.
Suggested change:

    -   tasksInFlight.offer(task);
    +   if (!tasksInFlight.offer(task)) {
    +     LOG.error("Failed to enqueue background task for service {}. Task will not be tracked.", serviceName);
    +   }
      }
    };
    task.fork();
    recursiveTasks.offer(task);
Method processDeletedDirsForStore ignores the return value of Queue&lt;RecursiveTask&gt;.offer, which signals whether the task was actually enqueued.
Suggested change:

    -   recursiveTasks.offer(task);
    +   if (!recursiveTasks.offer(task)) {
    +     // If the task cannot be enqueued, ensure it is joined and
    +     // mark that not all deleted directories were processed.
    +     task.join();
    +     processedAllDeletedDirs = false;
    +     break;
    +   }
        pendingDeletedDirInfo.getValue(),
        pendingDeletedDirInfo.getKey(), isDirReclaimable, allSubDirList,
    -   getOzoneManager().getKeyManager(), reclaimableFileFilter, remainNum);
    +   dds.getOzoneManager().getKeyManager(), reclaimableFileFilter, remainNum);

        dds.optimizeDirDeletesAndSubmitRequest(dirNum, subDirNum,
            subFileNum, allSubDirList, purgePathRequestList, snapshotTableKey,
    -       startTime, getOzoneManager().getKeyManager(),
    +       startTime, dds.getOzoneManager().getKeyManager(),

        omSnapshotManager.getActiveSnapshot(snapInfo.getVolumeName(), snapInfo.getBucketName(),
            snapInfo.getName())) {
    -   KeyManager keyManager = snapInfo == null ? getOzoneManager().getKeyManager()
    +   KeyManager keyManager = snapInfo == null ? dds.getOzoneManager().getKeyManager()
What changes were proposed in this pull request?
This is the continuation of #9390.
This addresses a comment from @sumitagrawl (#9390 (comment)), which significantly reduces the number of files touched.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-14020
How was this patch tested?