Add SYCL support for t/geometry/t/pipelines kernels, Transform ops, and NNS CPU fallback #7443

Draft
Copilot wants to merge 24 commits into main from copilot/add-sycl-kernels-for-cuda

Copilot AI commented Feb 20, 2026

Contributor

Port missing CUDA kernels to SYCL and fix all "Unimplemented device" errors for SYCL tensors in the t/pipelines and t/geometry subsystems.

Type

  • New feature (non-breaking change which adds functionality). Resolves #

Motivation and Context

SYCL:0 devices hit utility::LogError("Unimplemented device") in several hot paths:

  • Transform{Points,Normals} / Rotate{Points,Normals} — no SYCL dispatch in Transform.cpp
  • EstimateNormals, EstimateColorGradients, RemoveRadiusOutliers, RemoveStatisticalOutliers, ComputeBoundaryPoints: NearestNeighborSearch asserts non-SYCL
  • All t/pipelines/kernel functions (ComputeFPFHFeature, ComputePosePoint*, FillInLinearSystem, RGBDOdometry, TransformationConverter) — no SYCL kernels existed

Checklist:

  • I have run python util/check_style.py --apply to apply Open3D code style
    to my code.
  • This PR changes Open3D behavior or adds new functionality.
    • Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
      updated accordingly.
    • I have added or updated C++ and / or Python unit tests OR included test
      results
      (e.g. screenshots or numbers) here.
  • I will follow up and update the code if CI fails.
  • For fork PRs, I have selected Allow edits from maintainers.

Description

New SYCL pipeline kernels (t/pipelines/kernel/)

Five new *SYCL.cpp files implement SYCL equivalents of every CUDA kernel:

| File | Functions |
|------|-----------|
| FeatureSYCL.cpp | ComputeFPFHFeatureSYCL |
| FillInLinearSystemSYCL.cpp | FillInRigidAlignmentTermSYCL, FillInSLACAlignmentTermSYCL, FillInSLACRegularizerTermSYCL |
| RegistrationSYCL.cpp | ComputePosePointToPointSYCL, ComputePosePointToPlaneSYCL, ComputeColoredICPResidualAndGradientSYCL, ComputePointToPlaneDistancesSYCL |
| RGBDOdometrySYCL.cpp | All four RGBD odometry pose solvers |
| TransformationConverterSYCL.cpp | Pose↔transformation converters |

Kernels use sycl::nd_range + sycl::reduce_over_group for reduction paths (matching CUDA's cub::BlockReduce) instead of plain global atomics, and sycl::local_accessor for shared local memory.
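The two-level reduction idea can be illustrated without a SYCL toolchain. The sketch below is a CPU-only emulation in standard C++, not the code from this PR: each chunk of `group_size` elements stands in for one work-group, `std::accumulate` plays the role of `sycl::reduce_over_group`, and a single `fetch_add` per chunk stands in for the one global atomic issued per group. The name `GroupedSum` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Open3D): sums `data` in chunks of
// `group_size`. If `atomic_updates` is non-null, it receives the number of
// atomic updates that were issued (one per chunk).
inline int GroupedSum(const std::vector<int>& data,
                      std::size_t group_size,
                      std::size_t* atomic_updates = nullptr) {
    assert(group_size > 0);
    std::atomic<int> global_sum{0};
    std::size_t updates = 0;
    for (std::size_t begin = 0; begin < data.size(); begin += group_size) {
        const std::size_t end = std::min(begin + group_size, data.size());
        // Stage 1: reduce inside the "work-group" without touching the
        // global accumulator.
        const int partial = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0);
        // Stage 2: a single atomic update per group instead of one per
        // element.
        global_sum.fetch_add(partial, std::memory_order_relaxed);
        ++updates;
    }
    if (atomic_updates != nullptr) {
        *atomic_updates = updates;
    }
    return global_sum.load();
}
```

With 10 elements and a group size of 4, this issues 3 atomic updates rather than 10, which is the saving the group-reduction path is after.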

New SYCL geometry kernel (t/geometry/kernel/)

TransformSYCL.cpp — SYCL implementations of TransformPoints, TransformNormals, RotatePoints, RotateNormals via sycl::queue::parallel_for. Reuses the existing TransformImpl.h per-element kernels, guarded by a new OPEN3D_SKIP_TRANSFORM_MAIN macro (mirrors OPEN3D_SKIP_FPFH_MAIN in FeatureImpl.h) to avoid duplicate symbol errors.
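A rough sketch of the skip-macro pattern, written as a single file in standard C++ so it can be compiled anywhere. It is not the contents of TransformImpl.h: the macro `SKETCH_SKIP_TRANSFORM_MAIN`, the function names, and the row-major 4x4 layout with `p <- R * p + t` are assumptions made for illustration, and the plain loop stands in for `sycl::queue::parallel_for`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the SYCL translation unit defines the skip macro before
// including the shared header. The macro name is illustrative only.
#define SKETCH_SKIP_TRANSFORM_MAIN

// ---- Stand-in for the shared implementation header ----

// Per-element kernel, reusable from any backend: p <- R * p + t, where T is
// a row-major 4x4 matrix and p points at three consecutive floats.
inline void TransformPointKernel(const float* T, float* p) {
    const float x = p[0], y = p[1], z = p[2];
    p[0] = T[0] * x + T[1] * y + T[2] * z + T[3];
    p[1] = T[4] * x + T[5] * y + T[6] * z + T[7];
    p[2] = T[8] * x + T[9] * y + T[10] * z + T[11];
}

#ifndef SKETCH_SKIP_TRANSFORM_MAIN
// Driver owned by the CPU/CUDA translation unit. It is compiled out here, so
// a second translation unit including the header does not redefine it.
void TransformPointsMain(const float* T, float* points, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        TransformPointKernel(T, points + 3 * i);
    }
}
#endif

// ---- Stand-in for the SYCL translation unit ----

// Backend-specific driver that reuses the shared per-element kernel. The
// loop is where a parallel_for over the point index would go.
inline void TransformPointsSketch(const std::array<float, 16>& T,
                                  std::vector<float>& points) {
    assert(points.size() % 3 == 0);
    for (std::size_t i = 0; i < points.size() / 3; ++i) {
        TransformPointKernel(T.data(), points.data() + 3 * i);
    }
}
```

The point of the guard is that only the per-element kernels are shared; each backend keeps its own launch code.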

Transform.cpp and Transform.h updated with IsSYCL() dispatch branches.
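The shape of such a dispatch branch might look like the following minimal sketch. The `Device` struct and `DispatchTransform` are hypothetical stand-ins rather than Open3D's `core::Device` or the actual code in Transform.cpp, and the returned strings stand in for calls to the per-backend implementations.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical device model used only for this sketch.
enum class DeviceType { CPU, CUDA, SYCL, Other };

struct Device {
    DeviceType type;
    bool IsCPU() const { return type == DeviceType::CPU; }
    bool IsCUDA() const { return type == DeviceType::CUDA; }
    bool IsSYCL() const { return type == DeviceType::SYCL; }
};

// Returns the name of the backend that would be invoked. Without the
// IsSYCL() branch, a SYCL device would fall through to the error below.
inline std::string DispatchTransform(const Device& device) {
    if (device.IsCPU()) {
        return "cpu";
    } else if (device.IsCUDA()) {
        return "cuda";
    } else if (device.IsSYCL()) {
        return "sycl";
    }
    throw std::runtime_error("Unimplemented device");
}
```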

NNS CPU fallback for SYCL devices (t/geometry/PointCloud.cpp, t/pipelines/registration/)

NearestNeighborSearch explicitly rejects SYCL tensors. Every NNS call site now transparently redirects to CPU when device.IsSYCL():

const core::Device nns_device = device.IsSYCL() ? core::Device("CPU:0") : device;
const core::Tensor positions_nns = GetPointPositions().To(nns_device);
core::nns::NearestNeighborSearch nns(positions_nns);
// ... run NNS ...
indices = indices.To(device);  // bring results back for SYCL kernels

Affected: EstimateNormals (Hybrid/KNN/Radius), EstimateColorGradients (Hybrid/KNN/Radius), RemoveRadiusOutliers, RemoveStatisticalOutliers, ComputeBoundaryPoints, ComputeFPFHFeature, CorrespondencesFromFeatures, EvaluateRegistration, MultiScaleICP, GetInformationMatrix.

Test coverage

All t/pipelines tests (Registration, Feature, TransformationEstimation, RGBDOdometry, TransformationConverter, SLAC) updated to use PermuteDevicesWithSYCL so SYCL:0 is exercised in CI.


update-docs Bot commented Feb 20, 2026


Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

Copilot AI and others added 2 commits February 20, 2026 13:01
… Registration, RGBDOdometry, TransformationConverter

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
…L declaration

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
Copilot AI changed the title [WIP] Add equivalent SYCL kernels for existing CUDA kernels Add SYCL kernels for t/pipelines: Feature, FillInLinearSystem, Registration, RGBDOdometry, TransformationConverter Feb 20, 2026
Copilot AI requested a review from ssheorey February 20, 2026 13:07
… to SYCL kernel files

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
Copilot AI changed the title Add SYCL kernels for t/pipelines: Feature, FillInLinearSystem, Registration, RGBDOdometry, TransformationConverter Add SYCL kernels for t/pipelines/kernel (Feature, FillInLinearSystem, Registration, RGBDOdometry, TransformationConverter) Feb 23, 2026
…e coverage

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
Copilot AI changed the title Add SYCL kernels for t/pipelines/kernel (Feature, FillInLinearSystem, Registration, RGBDOdometry, TransformationConverter) Add SYCL kernels for t/pipelines (Feature, FillInLinearSystem, Registration, RGBDOdometry, TransformationConverter) and enable SYCL test coverage Mar 13, 2026
…ead of plain global atomics

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
Copilot AI changed the title Add SYCL kernels for t/pipelines (Feature, FillInLinearSystem, Registration, RGBDOdometry, TransformationConverter) and enable SYCL test coverage Optimize SYCL pipeline kernels: group reduction instead of plain global atomics Mar 13, 2026
Copilot AI changed the title Optimize SYCL pipeline kernels: group reduction instead of plain global atomics Add SYCL support for NNS-dependent operations via CPU fallback Mar 14, 2026
Copilot AI and others added 5 commits March 14, 2026 07:30
Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
…/Open3D into copilot/add-sycl-kernels-for-cuda
@ssheorey ssheorey force-pushed the copilot/add-sycl-kernels-for-cuda branch from 43e7713 to ef3f5a1 Compare March 16, 2026 19:38
…als, RotatePoints, RotateNormals

Co-authored-by: ssheorey <41028320+ssheorey@users.noreply.github.com>
Copilot AI changed the title Add SYCL support for NNS-dependent operations via CPU fallback Add SYCL support for t/geometry/t/pipelines kernels, Transform ops, and NNS CPU fallback Mar 16, 2026
@ssheorey ssheorey added this to the v0.20 milestone Jun 19, 2026
@ssheorey ssheorey added the status / to merge Looks good, merge after minor updates. label Jun 19, 2026
ssheorey added 5 commits June 20, 2026 20:45
- Implemented SYCL device support in ImageImpl.h and PointCloud.h for various functions including ToSYCL, ClipTransformSYCL, and PyrDownDepthSYCL.
- Added corresponding SYCL implementations in ImageSYCL.cpp and PointCloudSYCL.cpp.
- Updated PointCloud.cpp and PointCloudImpl.h to handle SYCL devices in functions like Unproject, Project, and GetPointMaskWithinAABB.
- Modified tests to accommodate SYCL devices, ensuring compatibility and skipping unsupported tests where necessary.
- Introduced new test cases for SYCL in Python tests for nearest neighbor search and registration.
…rnels

Implement SYCL custom kernels for TriangleMesh normals/areas and VoxelBlockGrid
touch operations. Stub out remaining VoxelBlockGrid kernels due to core::HashMap
limitations on SYCL, and add corresponding C++ and Python tests.
FP64 check
missing hash function
improved object copy
fix early return bug in IndexAddContiguousSYCL
use single workgroup launch to avoid atomics in ArgReduce (Check and
revert)
custom kernel for merging top-k results in KnnIndex (k<256)
fix nanoflann call missing sort option
Add RGB2Gray with tensor ops (for SYCL)
…L kernel invocations to use direct queue parallel_for calls.

Add tests for nearest nbr and hashmap