[PERF]: Epic for binding overhead improvements #1645

This issue tracks performance improvements to, and investigations of, Python-to-C binding overhead, mostly driven by the benchmark of `cuTensorMapEncodeTiled` devised in #659.  That benchmark is useful because the function takes an unusually high number of arguments (and therefore has unusually high Python-to-C overhead).
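As a rough illustration of why argument count matters, here is a minimal sketch of measuring per-call time with the standard library's `timeit`. The functions `convert`, `few_args`, `many_args`, and `per_call_us` are hypothetical pure-Python stand-ins, not the `cuda-bindings` API and not the benchmark from #659; the twelve-argument signature is arbitrary.

```python
import timeit


def convert(arg):
    # Hypothetical stand-in for the per-argument type conversion a binding performs.
    return int(arg)


def few_args(a, b):
    return convert(a) + convert(b)


def many_args(a, b, c, d, e, f, g, h, i, j, k, l):
    # Twelve arguments, chosen arbitrarily for illustration.
    return sum(convert(x) for x in (a, b, c, d, e, f, g, h, i, j, k, l))


def per_call_us(func, args, number=100_000, repeat=5):
    """Return the best observed time per call, in microseconds."""
    timer = timeit.Timer(lambda: func(*args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    print(f"few_args:  {per_call_us(few_args, (1, 2)):.3f} us/call")
    print(f"many_args: {per_call_us(many_args, tuple(range(12))):.3f} us/call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; absolute numbers will vary by machine and Python version.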

## Comparison to a more limited Cython binding

As an interesting experimental datapoint, a colleague provided a vibe-coded Cython binding for `cuTensorMapEncodeTiled` [that runs about 4x faster than the official `cuda-bindings` one](https://gist.github.com/mdboom/09998002d205b682e60b6d6175e3d6f2).  It is useful for seeing where some overheads may be reduced, but care should be taken when interpreting its raw performance: this wrapper accepts far fewer kinds of input than the CUDA bindings, and doesn't include developer niceties, like enums.
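The trade-off between input flexibility and per-argument cost can be sketched in plain Python. This is an illustration only: `Swizzle`, `to_c_int_flexible`, and `to_c_int_strict` are hypothetical names, and neither function is taken from `cuda-bindings` or from the linked gist.

```python
import enum


class Swizzle(enum.IntEnum):
    # Hypothetical enum, standing in for the enum types a full binding exposes.
    NONE = 0
    BYTES_32 = 1


def to_c_int_flexible(value):
    """Accept an enum member, a plain int, or any object implementing __int__."""
    if isinstance(value, enum.Enum):
        return int(value.value)
    if isinstance(value, int):
        return value
    if hasattr(value, "__int__"):
        return int(value)
    raise TypeError(f"cannot convert {type(value).__name__} to a C int")


def to_c_int_strict(value):
    """Accept only exact ints: one check, but callers must pre-convert everything."""
    if type(value) is not int:
        raise TypeError("expected int")
    return value
```

The flexible path does more checks per argument, and that cost is paid once for every argument of every call. The strict path is cheaper but rejects enum members outright, which is the kind of convenience a more limited wrapper gives up.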

## Merged or in-progress fixes

Timings below are per iteration of the benchmark in #659.  Each includes *both* binding overhead and some fixed amount of time in the actual CUDA call.

- 4.80us Baseline time
- 3.63us #1543
- 2.73us #1545 
- 2.70us #1581 
- 2.59us #1616 
- 2.38us #1638
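
To put the timings above in relative terms, here is a small sketch that computes the time saved by each entry and the speedup over the baseline. It assumes each timing is cumulative, that is, measured with the preceding fixes also applied; the issue does not state this explicitly. The `summarize` helper is hypothetical.

```python
# Per-iteration timings in microseconds, copied from the list above.
timings_us = [
    ("Baseline", 4.80),
    ("#1543", 3.63),
    ("#1545", 2.73),
    ("#1581", 2.70),
    ("#1616", 2.59),
    ("#1638", 2.38),
]


def summarize(timings):
    """Return (label, time_us, saved_vs_previous_us, speedup_vs_baseline) rows."""
    baseline = timings[0][1]
    previous = baseline
    rows = []
    for label, t in timings:
        rows.append((label, t, round(previous - t, 2), round(baseline / t, 2)))
        previous = t
    return rows


if __name__ == "__main__":
    for label, t, saved, speedup in summarize(timings_us):
        print(f"{label:>8}  {t:.2f}us  saved {saved:.2f}us  {speedup:.2f}x")
```

Under that assumption, the last entry corresponds to roughly a 2.02x speedup over the baseline. Because every timing also includes the fixed cost of the CUDA call itself, the reduction in binding overhead alone is proportionally larger than these ratios suggest.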

## Under investigation

Issues in this category are theoretical findings that would reduce the number of operations required for type conversion, but they have not necessarily been confirmed to have a measurable effect yet.

- #1639
- #1640
- #1642

## Deferred (effective, but high effort)

- #1643

## Rejected (ineffective)

- #1605
- #1649
- #1637
- #1644
