Add Highway SIMD acceleration to ImageBufAlgo [add, sub, mul, div, mad, resample] #4994
ssh4net wants to merge 70 commits into AcademySoftwareFoundation:main
I suspect you used an LLM for some of this? That's fine, but I think you should document in the PR description (commit comment) which tool you used and for which parts.
```cpp
template<class Rtype, class Atype, class Btype>
static bool
add_impl_hwy(ImageBuf& R, const ImageBuf& A, const ImageBuf& B, ROI roi,
             int nthreads)
{
```
I haven't done a line-by-line comparison, but it seems to me that the only difference between `add_impl_hwy`, `sub_impl_hwy`, and `mul_impl_hwy` is likely to be
`[](auto d, auto a, auto b) { return hn::Add(a, b); }`
versus that same lambda calling `hn::Sub` or `hn::Mul` instead.
I would love for even the initial commit to reduce this whole thing to a shared `hwy_binary_perpixel_op()` template that takes the lambda housing the op kernel as a template parameter.
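A minimal sketch of the suggested refactor, written in plain C++ rather than Highway so it compiles on its own: the pixel loop lives once in a hypothetical `binary_perpixel_op()` template, and each operation supplies only its kernel as a lambda. In the real code the lambda would receive Highway vectors and call `hn::Add(a, b)` and friends; passing scalars here is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared driver: walks contiguous channel data once and
// delegates the per-element arithmetic to the templated Op lambda.
// (A Highway version would step by Lanes(d) and pass vectors to op.)
template<class Rtype, class Atype, class Btype, class Op>
void binary_perpixel_op(Rtype* r, const Atype* a, const Btype* b,
                        std::size_t nvalues, Op op)
{
    for (std::size_t i = 0; i < nvalues; ++i)
        r[i] = static_cast<Rtype>(op(a[i], b[i]));
}

// Each operation shrinks to a one-line kernel.
template<class R, class A, class B>
void add_impl(R* r, const A* a, const B* b, std::size_t n)
{
    binary_perpixel_op(r, a, b, n, [](auto x, auto y) { return x + y; });
}

template<class R, class A, class B>
void sub_impl(R* r, const A* a, const B* b, std::size_t n)
{
    binary_perpixel_op(r, a, b, n, [](auto x, auto y) { return x - y; });
}

template<class R, class A, class B>
void mul_impl(R* r, const A* a, const B* b, std::size_t n)
{
    binary_perpixel_op(r, a, b, n, [](auto x, auto y) { return x * y; });
}
```

Only the lambda differs between `add_impl`, `sub_impl`, and `mul_impl`; the loop and the type handling are instantiated from the single driver template.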
```cpp
// Process pixel by pixel (scalar fallback for strided channels)
for (int x = roi.xbegin; x < roi.xend; ++x) {
    Rtype* r_ptr = ChannelPtr<Rtype>(Rv, x, y, roi.chbegin);
    const Atype* a_ptr = ChannelPtr<Atype>(Av, x, y, roi.chbegin);
    const Btype* b_ptr = ChannelPtr<Btype>(Bv, x, y, roi.chbegin);
```
I think we should benchmark the strided case and see how it compares to the contiguous case and to the full scalar fallback that we've always had. If there is no big speed gain, I would be in favor of eliminating this whole clause and letting non-contiguous strides use the old scalar path; then there is much less template expansion for hwy in the cases where there is no large gain to be had. Note that this means that the "to hwy or not to hwy" test would need to check contiguity in addition to just `localpixels()`.
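One way the combined "to hwy or not to hwy" test could look, as a sketch under assumptions: the `ChannelLayout` struct and the `roi_is_contiguous()` / `use_hwy_path()` helpers are hypothetical, and real code would query the `ImageBuf` itself (`localpixels()` and its strides) instead of a plain struct. SIMD is taken only when every image has local pixels and the ROI channel range forms one unbroken run across each scanline.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical description of one image's in-memory layout.
struct ChannelLayout {
    bool local_pixels;            // pixels live in memory we can point at
    int nchannels;                // channels per pixel in the buffer
    std::size_t value_bytes;      // bytes per channel value
    std::ptrdiff_t pixel_stride;  // bytes from one pixel to the next
};

// True when channels [chbegin, chend) of consecutive pixels in a scanline
// are adjacent in memory, so a SIMD loop can stream them as one array.
inline bool roi_is_contiguous(const ChannelLayout& L, int chbegin, int chend)
{
    const bool packed
        = L.pixel_stride
          == static_cast<std::ptrdiff_t>(L.nchannels * L.value_bytes);
    const bool all_channels = (chbegin == 0 && chend == L.nchannels);
    return packed && all_channels;
}

// Take the SIMD path only when every image involved qualifies;
// otherwise the caller falls back to the existing scalar loop.
inline bool use_hwy_path(const ChannelLayout& R, const ChannelLayout& A,
                         const ChannelLayout& B, int chbegin, int chend)
{
    for (const ChannelLayout* L : { &R, &A, &B })
        if (!L->local_pixels || !roi_is_contiguous(*L, chbegin, chend))
            return false;
    return true;
}
```

With this test, an RGB ROI on an RGBA buffer is rejected as strided; later commits in this PR add a dedicated fast path for exactly that layout rather than dropping to scalar code.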
@ssh4net It's been a while since this PR was updated, and after your last push it's failing to build. Can you please rebase on main, fix it so it passes CI, and ensure that there is a DCO sign-off on each commit? I would like to proceed with this in some form.
@lgritz sure! Give me a bit of time. I've added some fixes based on the discussion above but haven't fully verified them yet, and I got switched to other projects 😅
Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library:
• add/sub
• mul/div
• mad
• resample
Adds CMake and build system support, new implementation helpers, and developer documentation. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
This reverts commit 4d3b1f3. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad Erium <shaamaan@gmail.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad Erium <shaamaan@gmail.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad Erium <shaamaan@gmail.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Adds generic per-pixel HWY operation helpers for binary and ternary ops, refactors the add/sub/mul/div/mad HWY implementations to use these helpers, and ensures HWY SIMD is only used for contiguous channel ranges. Adds a new test to verify correct fallback to scalar code for strided (non-contiguous) ROI channel ranges. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Add specialized HWY fast-paths for add/sub/mul/div/mad to handle the common case where the ROI selects RGB channels of 4-channel (RGBA) images by processing full 4-channel interleaved data and preserving alpha bitwise. Introduce small op lambdas for each operator and handle float/half/double same-type cases with contiguous-channel checks, half-promote/demote paths, and division zero-safety. Also update tests to pre-fill destination buffers and compare results (removed ROI from compare) to validate the strided-ROI fallback behavior. Affects imagebufalgo_addsub.cpp, imagebufalgo_mad.cpp, imagebufalgo_muldiv.cpp and imagebufalgo_test.cpp. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
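A scalar sketch of the RGBA strategy described above: compute the op on all four interleaved channels (a SIMD loop can do this without gathers), then restore the destination's original alpha bits so only R, G, B change. The function names are hypothetical, and the divide-by-zero convention shown (quotient 0 when the divisor is 0) is an assumption; the commit only says the division path is zero-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical: apply op to the RGB channels of packed RGBA float pixels,
// keeping the destination alpha bit-for-bit. r must be pre-filled.
template<class Op>
void rgb_of_rgba_op(float* r, const float* a, const float* b,
                    std::size_t npixels, Op op)
{
    for (std::size_t p = 0; p < npixels; ++p) {
        float* rp = r + 4 * p;
        float saved_alpha;
        std::memcpy(&saved_alpha, rp + 3, sizeof(float));  // bitwise save
        // Full-width pass: a SIMD version handles all 4 lanes at once.
        for (int c = 0; c < 4; ++c)
            rp[c] = op(a[4 * p + c], b[4 * p + c]);
        std::memcpy(rp + 3, &saved_alpha, sizeof(float));  // restore alpha
    }
}

// Zero-safe division kernel (assumed convention): 0 instead of inf/NaN.
inline float safe_div(float x, float y) { return y == 0.0f ? 0.0f : x / y; }
```

The alpha lane is computed and then thrown away, which costs a little arithmetic but keeps the loads and stores fully contiguous.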
Replace duplicated ad-hoc SIMD special-cases in add/sub/mad with generalized HWY helpers that handle the common packed-RGBA-but-ROI-is-RGB case. Introduce PromoteVec/DemoteVec, lane-type mapping for half, interleaved Load/Store helpers (including partial-vector variants), and per-pixel/ternary routines that preserve alpha or mask it for native integer ops. Also switch HwyPixels to use pixel/scanline stride, add necessary forward declarations and includes, and simplify callers to use the new helpers, broadening support to integer/native ops and reducing code duplication. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
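The "lane-type mapping for half" can be pictured as a small trait: a storage type that arithmetic cannot use directly maps to a wider compute type, promoted on load and demoted on store (in Highway terms, typically via `PromoteTo` / `DemoteTo`). This sketch uses a stand-in `half_tag` type and a hypothetical `ComputeLane` trait; it does not reproduce the PR's actual `PromoteVec` / `DemoteVec` helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for a 16-bit half-float storage type (bits only, no math).
struct half_tag {
    std::uint16_t bits;
};

// Map each storage type to the lane type arithmetic is done in.
// Native types compute as themselves; half is promoted to float,
// operated on, and demoted back to half on store.
template<class T> struct ComputeLane {
    using type = T;
};
template<> struct ComputeLane<half_tag> {
    using type = float;
};

template<class T> using compute_lane_t = typename ComputeLane<T>::type;

static_assert(std::is_same<compute_lane_t<float>, float>::value, "");
static_assert(std::is_same<compute_lane_t<std::uint8_t>, std::uint8_t>::value,
              "");
static_assert(std::is_same<compute_lane_t<half_tag>, float>::value, "");
```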
…()` (AcademySoftwareFoundation#4987) Rearrangements in 3.1 dropped the list of recognized attributes from the visible online docs and failed to document the span varieties. We fix and also reword a lot of the descriptions for clarity and uniformity. The previous organization was that there were several varieties of attribute(). In the header, the first one had the overall long explanation, including the list of all the recognized attributes. The other ones had short explanations of how they differed. In the docs, each one was referenced explicitly, pulling in its attendant bit of documentation. What really happened is that in the header, I made the new span-based version the "flagship" one with the full explanation, but I neglected to reference it in the docs, so the long description disappeared. I could have fixed it by just adding refs to the new functions to the docs, as I originally meant to. But while I was there, I took the opportunity to surround the whole collection with a group marker, and then include the lot of them with a single reference to the group, rather than needing to refer to each function variant individually. And while I was at it, I also reworded (and hopefully improved) some of those explanations. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…Foundation#4990) Implement RLE compression support for the SGI output plugin. Reading RLE-encoded images was already supported, but writing was never done up until this point. The existing sgi test seems sufficient to catch issues, and it covers input/output of both 1-byte-per-pixel and 2-byte-per-pixel files. The documentation for the image plugins is sometimes not very clear about which attributes are relevant for input vs. output. There are usually 3 sections: Attributes, Attributes for Input, and Attributes for Output. Before this PR, SGI mentioned the "compression" attribute in the "general" Attributes section (rather than just the Input section), which caused a bit of grief, as the only way to discover that RLE was not implemented for output was to glance at the size of the resulting file... I had assumed that compression was supported for output too but discovered that it was not. Now that this PR implements the attribute for output, I've left the documentation as-is in the "general" Attributes section since it applies to both reading and writing now. But I'm open to suggestions here. Signed-off-by: Jesse Yurkovich <jesse.y@gmail.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
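For reference, a standalone sketch of the 8-bit SGI RLE row scheme, as I understand the SGI format: each packet starts with a count byte; if its high bit is set, the low 7 bits give a number of literal bytes that follow, otherwise the next single byte is repeated that many times; a count of 0 ends the row. The function names are hypothetical and this is not the plugin's code; it also omits the 2-byte-per-channel variant and the per-row offset/length tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Encode one row of 8-bit samples into SGI-style RLE packets.
inline Bytes sgi_rle_encode_row(const Bytes& row)
{
    Bytes out;
    const std::size_t n = row.size();
    std::size_t i = 0;
    while (i < n) {
        // Length of the run of identical values starting at i (max 127).
        std::size_t run = 1;
        while (i + run < n && run < 127 && row[i + run] == row[i])
            ++run;
        if (run >= 3) {
            out.push_back(static_cast<std::uint8_t>(run));  // high bit clear
            out.push_back(row[i]);
            i += run;
        } else {
            // Literal packet: copy until 3 equal values begin, or 127 bytes.
            const std::size_t start = i;
            while (i < n && i - start < 127
                   && !(i + 2 < n && row[i] == row[i + 1]
                        && row[i] == row[i + 2]))
                ++i;
            out.push_back(static_cast<std::uint8_t>(0x80 | (i - start)));
            out.insert(out.end(), row.begin() + start, row.begin() + i);
        }
    }
    out.push_back(0);  // end-of-row marker
    return out;
}

// Decode packets until the 0 terminator (or the end of the input).
inline Bytes sgi_rle_decode_row(const Bytes& in)
{
    Bytes out;
    std::size_t p = 0;
    while (p < in.size()) {
        const std::uint8_t code = in[p++];
        const std::size_t count = code & 0x7f;
        if (count == 0)
            break;
        if (code & 0x80) {  // literal packet
            if (p + count > in.size())
                break;      // truncated input: stop rather than overread
            out.insert(out.end(), in.begin() + p, in.begin() + p + count);
            p += count;
        } else {            // repeat packet
            if (p >= in.size())
                break;
            out.insert(out.end(), count, in[p++]);
        }
    }
    return out;
}
```

A flat row of 300 equal bytes encodes as three repeat packets (127 + 127 + 46) plus the terminator, 7 bytes total.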
Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Starting with 1.21, libheif appears to have changed behavior: when no CICP metadata is present, libheif now returns 2,2,2 (all unspecified) on read. OIIO convention, though, is to not set the attribute if valid CICP data is not in the file. --------- Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…wareFoundation#4993) For IBA::resample() when bilinear interpolation is used, almost all of the expense was due to its reliance on ImageBuf::interppixel, which is simple but constructs a new ImageBuf::ConstIterator EVERY TIME, which is very expensive. Reimplement in a way that reuses a single iterator. This typically speeds up IBA::resample by 20x or more. Also refactor resample to pull the handling of deep images into a separate helper function and out of the main inner loop. And add some benchmarking. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
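The idea is easiest to see outside of OIIO. The sketch below resamples a plain interleaved float image through a hypothetical `BilinearSampler` that is built once and reused for every output pixel, analogous to reusing one iterator instead of constructing a new `ImageBuf::ConstIterator` per `interppixel` call. The pixel-center convention (pixel `i` centered at `i + 0.5`) and clamping at the edges are assumptions of this sketch, not necessarily what `IBA::resample` does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reusable sampler over a w x h image with nc channels.
struct BilinearSampler {
    const float* data;
    int w, h, nc;

    // Bilinear sample at continuous coords (x, y); pixel (i, j) has its
    // center at (i + 0.5, j + 0.5). Out-of-range neighbors are clamped.
    void sample(float x, float y, float* result) const
    {
        const float fx = x - 0.5f, fy = y - 0.5f;
        const int x0 = static_cast<int>(std::floor(fx));
        const int y0 = static_cast<int>(std::floor(fy));
        const float tx = fx - x0, ty = fy - y0;
        auto px = [&](int i, int j, int c) {
            i = std::clamp(i, 0, w - 1);
            j = std::clamp(j, 0, h - 1);
            return data[(j * w + i) * nc + c];
        };
        for (int c = 0; c < nc; ++c) {
            const float top = (1 - tx) * px(x0, y0, c) + tx * px(x0 + 1, y0, c);
            const float bot = (1 - tx) * px(x0, y0 + 1, c)
                              + tx * px(x0 + 1, y0 + 1, c);
            result[c] = (1 - ty) * top + ty * bot;
        }
    }
};

// Resample src (sw x sh) into a new dw x dh image using one sampler.
inline std::vector<float> resample_bilinear(const std::vector<float>& src,
                                            int sw, int sh, int nc, int dw,
                                            int dh)
{
    std::vector<float> dst(static_cast<std::size_t>(dw) * dh * nc);
    const BilinearSampler s{ src.data(), sw, sh, nc };  // built once
    for (int j = 0; j < dh; ++j)
        for (int i = 0; i < dw; ++i) {
            // Map the destination pixel center into source coordinates.
            const float x = (i + 0.5f) * sw / dw;
            const float y = (j + 0.5f) * sh / dh;
            s.sample(x, y, &dst[(static_cast<std::size_t>(j) * dw + i) * nc]);
        }
    return dst;
}
```

Inside the loop the per-pixel work is pure arithmetic; nothing is constructed or allocated per sample.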
* CI test vs the latest freetype 2.14.1
* Bump the version of freetype that we auto-build to the latest (from 2.13.2)
* Simplify BZip2 finding logic, switch to using targets

Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…areFoundation#4998) The Intel MacOS 15 CI testing is getting dicier... lots of times, Homebrew doesn't have cached versions of updated packages, so it tries to build from source, which takes forever. The big culprit today is Qt. So, basically, just on this one CI job variant, don't ask it to install Qt. If it's there, it's there. If not, just skip it. It's tested plenty in other variants. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…cademySoftwareFoundation#4997) Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Fixes AcademySoftwareFoundation#5000 Signed-off-by: Brad Smith <brad@comstyle.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…wareFoundation#4995) Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Reflecting this month's releases and other things that recently went into main. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…areFoundation#5026) Even though our CI tests on ARM Macs were passing, after getting a new laptop I saw some test failures that were due to just a few pixels on a few tests needing a higher comparison threshold. Results are correct, just different due to the math. I guess this machine (CPU? build flags? specific compiler or library versions?) is ever so slightly different than the CI Macs, so I caught a few more instances that needed to be adjusted. I tried to increase the thresholds as little as possible to fix the problem. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…AcademySoftwareFoundation#5025) I think it was basically harmless, since we do all the metadata name comparisons using case-insensitive comparisons. But we use "Exif:" as our prefix for Exif data throughout OIIO by convention, and there was this tiny handful of places where we said "exif:". Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…ySoftwareFoundation#5027) Need to test some MSVS-specific macros to determine what architecture to report. And especially, if it doesn't know the processor architecture, it still should be *appending* that to the platform, not replacing it! This caused MSVS-compiled OIIO on Windows to report "unknown arch?" Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
AcademySoftwareFoundation#5031) Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…ademySoftwareFoundation#5029) Since the OpenImageIO 2.5 series, when calls to `check_open` were added, any format that did not declare support for "tiles" would immediately fail to open. But many of the formats which attempted to emulate tiles, by buffering the contents and writing it all as scanlines at the end, were not updated. All of the tile emulation code for these formats is effectively dead code and untested. Remove the tile emulation code from these formats.

An example of what the failure currently looks like:

```python
>>> out = oiio.ImageOutput.create("test.png")
>>> spec = oiio.ImageSpec(64, 64, 3, 'uint8')
>>> spec.tile_width = 64
>>> out.open("test.png", spec)
False
>>> out.geterror()
'png does not support tiled images'
```

No tests were impacted. Signed-off-by: Jesse Yurkovich <jesse.y@gmail.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…ftwareFoundation#5006) Review: we have long had two assertion macros: OIIO_ASSERT, which aborts upon failure in Debug builds and prints but continues in Release builds, and OIIO_DASSERT, which aborts in Debug builds and is completely inactive for Release builds. Inspired by C++26 contracts, and the increasingly available "hardening modes" in major compilers (especially with the LLVM/clang project's libc++), I'm introducing some new verification helpers. The new macro `OIIO_CONTRACT_ASSERT` more closely mimics C++26 contract_assert in many ways, and perhaps will simply wrap C++ contract_assert when C++26 is on our menu. Important ways that OIIO_CONTRACT_ASSERT differs from OIIO_ASSERT and OIIO_DASSERT:

* Keeping in line with C++ contracts, there are 4 possible responses to a failed contract assertion: Ignore, Observe (print only), Enforce (print and abort), and Quick-Enforce (just abort).
* Also define hardening levels: None, Fast, Extensive, and Debug, mimicking the levels of libc++. The idea is that maybe there will be some CONTRACT_ASSERT checks you only want to do for certain hardening levels.
* By default, the contract failure response is Enforce, unless it's both a release build and the hardening level is set to None (in which case the response will be Ignore). But it can also optionally be overridden on a per-translation-unit basis by setting OIIO_ASSERTION_RESPONSE_DEFAULT before any OIIO headers are included (though obviously that only applies to inline functions or templates, not to any already-compiled code in the library).
* Macros for explicit hardening levels: OIIO_HARDENING_ASSERT_FAST(), EXTENSIVE(), and DEBUG(), which call CONTRACT_ASSERT only when the hardening level is what's required or stricter.

I also changed the bounds checking in operator[] of string_view, span, and image_span to use the contract assertions.
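To make the four failure responses and the hardening gate concrete, here is a toy sketch, not OIIO's implementation: `TOY_CONTRACT_ASSERT`, `TOY_HARDENING_ASSERT`, and the runtime `g_response` / `g_hardening` globals are all hypothetical. Per the description above, OIIO picks the response at compile time (e.g. via `OIIO_ASSERTION_RESPONSE_DEFAULT`); runtime globals are used here only so every behavior can be exercised in one program.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// The four responses, mirroring C++26 contract semantics.
enum class Response { Ignore, Observe, Enforce, QuickEnforce };
// Hardening levels, ordered from least to most checking.
enum class Hardening { None, Fast, Extensive, Debug };

inline Response g_response   = Response::Enforce;  // hypothetical knob
inline Hardening g_hardening = Hardening::Fast;    // hypothetical knob
inline int g_violations      = 0;                  // failures seen so far

inline void toy_contract_violation(const char* expr, const char* file, int line)
{
    ++g_violations;
    if (g_response == Response::QuickEnforce)
        std::abort();  // no message, just stop
    std::fprintf(stderr, "%s:%d: contract violated: %s\n", file, line, expr);
    if (g_response == Response::Enforce)
        std::abort();  // print, then stop
    // Observe: print and keep going.
}

// Ignore skips evaluating the condition entirely.
#define TOY_CONTRACT_ASSERT(cond)                                  \
    do {                                                           \
        if (g_response != Response::Ignore && !(cond))             \
            toy_contract_violation(#cond, __FILE__, __LINE__);     \
    } while (0)

// Check only when the active hardening level is at least `level`.
#define TOY_HARDENING_ASSERT(level, cond)                          \
    do {                                                           \
        if (g_hardening >= Hardening::level)                       \
            TOY_CONTRACT_ASSERT(cond);                             \
    } while (0)
```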
Note that this adds a tiny bit of overhead, since the default is "enforce" for release builds (previously, using OIIO_DASSERT, it did no checks for release builds). But the benchmarks seem to indicate that the perf difference is barely measurable. I added some benchmarking that proves that the bounds check adds a minute overhead to an element access for a trivial `span<float>`, maybe even an indiscernible one. Here are benchmarks comparing raw pointer access, std::array access, span access with the new checks, and span access carefully bypassing the checks.

Linux workstation, gcc-11, on my work computer:

* pointer operator[]: 647.8 ns (+/- 0.1ns)
* std::array operator[]: 647.8 ns (+/- 0.1ns)
* span operator[]: 657.6 ns (+/- 0.5ns)
* span unsafe indexing: 648.2 ns (+/- 0.2ns)
* span range: 648.1 ns (+/- 0.1ns)

These are the most stable tests I have, with the least trial-to-trial variation, and show about a 1.5% speed hit on the bounds-checked span access itself, which I think will be truly unmeasurable in the context of being interleaved with any other operations that you do with the data you pull from the span.

Mac Intel, Apple Clang 17, on my (old) personal laptop (much more variable timing, probably from MacOS scheduler quirks):

* pointer operator[]: 929.2 ns (+/- 6.7ns)
* std::array operator[]: 913.1 ns (+/- 20.6ns)
* span operator[]: 905.8 ns (+/- 13.3ns)
* span unsafe indexing: 913.9 ns (+/- 16.6ns)
* span range: 916.4 ns (+/- 20.3ns)

You can see that here there is no obvious penalty; in fact it appears a little faster, but all within the timing uncertainty of the multiple trials, so statistically it's hard to discern any penalty.
And a couple more for good measure from our CI, but note that because these are uncontrolled machines somewhere in the GitHub cloud, the timings might not be as reliable:

Windows, MSVS 2022:

* pointer operator[]: 3716.3 ns (+/- 6.3ns)
* std::array operator[]: 3715.5 ns (+/- 3.4ns)
* span operator[]: 3715.6 ns (+/- 2.6ns)
* span unsafe indexing: 3712.1 ns (+/- 0.7ns)
* span range: 3714.2 ns (+/- 2.9ns)

Linux, gcc-14, C++20:

* pointer operator[]: 1130.9 ns (+/- 0.2ns), 884.2 k/s
* std::array operator[]: 1132.0 ns (+/- 0.4ns), 883.4 k/s
* span operator[]: 1133.7 ns (+/- 0.4ns), 882.1 k/s
* span unsafe indexing: 1134.2 ns (+/- 1.6ns), 881.7 k/s
* span range: 1133.9 ns (+/- 0.7ns), 881.9 k/s

MacOS ARM:

* pointer operator[]: 3456.6 ns (+/- 7.5ns)
* std::array operator[]: 3466.8 ns (+/- 12.2ns)
* span operator[]: 3610.9 ns (+/- 11.0ns)
* span unsafe indexing: 3607.4 ns (+/- 4.9ns)
* span range: 3612.4 ns (+/- 12.2ns)

Windows with MSVS and Linux with newer g++ don't appear to show any penalty, and the bracketing of trial times indicates that maybe it's consistent enough to be meaningful? I can't think of anything I'm doing wrong here that would throw off the timing or disable the range checking on these tests. For MacOS ARM, the span looks like it has about a 4% penalty versus raw pointers? But OTOH, span bounds-checked vs. non-checked vs. range-for are all the same, so maybe the speed vs. raw pointer is something else entirely?

Also please note that a preferred way to avoid these extra bounds checks entirely is to change an index-oriented loop like

```cpp
span s;
for (size_t i = 0; i < s.size(); ++i)
    foo(s[i]);  // maybe bounds check on each iteration?
```

to a range-based loop:

```cpp
span s;
for (auto& v : s)
    foo(v);
```

which should be inherently safe and require no in-loop checks at all. --------- Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
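A small self-contained illustration of that last point, using a hypothetical `checked_span` rather than OIIO's `span`: its `operator[]` performs (and here counts) a bounds check on every call, while range-based `for` walks from `begin()` to `end()` and needs no per-element check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical minimal span with a counted bounds check in operator[].
template<class T> class checked_span {
public:
    checked_span(T* data, std::size_t size) : m_data(data), m_size(size) {}
    std::size_t size() const { return m_size; }
    T& operator[](std::size_t i) const
    {
        ++checks;  // one check per indexed access
        if (i >= m_size)
            std::abort();  // stand-in for a contract failure
        return m_data[i];
    }
    // Range-for uses these two; no per-element check is needed.
    T* begin() const { return m_data; }
    T* end() const { return m_data + m_size; }
    static inline std::size_t checks = 0;

private:
    T* m_data;
    std::size_t m_size;
};

inline float sum_indexed(checked_span<float> s)
{
    float total = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        total += s[i];  // bounds check on each iteration
    return total;
}

inline float sum_range(checked_span<float> s)
{
    float total = 0;
    for (float v : s)
        total += v;  // iterates begin()..end(), no checks
    return total;
}
```

Both functions return the same sum; only the indexed loop increments the check counter, once per element.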
…ndation#5032) Mac Intel is getting long in the tooth, and quite often the Homebrew packages for Intel are found to be uncached and will try to build from source. When it's OpenCV, that's disastrous for our CI build times, it can get stalled for hours building all of OpenCV and its dependencies. So disable it for that one build variant. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…mySoftwareFoundation#5030) Extra protections for corrupted BMP files that claim to be palette images, but have a BPP that doesn't support palette images. Also an extra guard around accessing the palette array if it is empty. Add an extra test case for this kind of corruption. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…mySoftwareFoundation#5035) Fixes AcademySoftwareFoundation#5023

This was crashing when writing TIFF information that was supposed to be arrays of more than one rational but was in fact provided as a single value: it was reading past the end of a memory array. I noticed that this whole region needs a cleanup; this is not the only problem. But a full overhaul seems too risky to backport, so my strategy is as follows:

* THIS fix first, which I will backport right away to 3.0 and 3.1.
* I will then submit a separate PR (already implemented and tested) that is a much more complete fix and overhaul of this portion of the code (and other places). That will get merged into main when approved.
* After the second PR is merged, I'll hold it in main for a while to test its safety, and then decide if it seems ok to backport to 3.1 (but definitely not 3.0).

Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…ademySoftwareFoundation#5039) Bump the version of 'fmt' library that we download and build (if not found) from 10.2 to 12.1. Some other touch-ups in build_fmt.cmake. Also, we have seen that recent fmt versions will fail to compile on MSVC unless using the `/utf-8` compiler flag, so ensure that is used and also passed on to other clients of libOpenImageIO_Util (which expose templates using those headers). Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
AcademySoftwareFoundation#5036) This is a more comprehensive fix for issues discovered in PR AcademySoftwareFoundation#5035. The original problem reported in Issue AcademySoftwareFoundation#5023 was a crash when writing TIFF information that was supposed to be arrays of more than one rational: it was reading past the end of a memory array. AcademySoftwareFoundation#5035 is a minimal, immediate fix to address the crashes. But in the process, I saw a number of ways in which we were dropping metadata on the floor when the types didn't exactly match, but that we *could* handle with automatic conversion. The new cases that we handle with this PR are:

* The Exif RESOLUTIONUNIT tag is a short, but by convention we store it by name as a string in OIIO metadata, so we need to convert back to a code (we did so for the main TIFF metadata, but not for Exif in TIFF).
* Handle Exif "version" and "flashpixversion" metadata, which have unusual encoding in TIFF files (they are 4-character strings, but must be stored in a TIFF tag of type BYTES, not as the usual type ASCII that most strings use).
* Handle things that TIFF insists are ASCII but that come to us as metadata that isn't a string. Easy -- our `ParamValue.get_string()` automatically converts other things like ints or floats into string representation.
* Much more flexibility in automatically converting among the signed and unsigned, 16- and 32-bit integer types when the metadata in our ImageSpec is integer but not the specific type of integer that TIFF/Exif thinks it should be.

This doesn't appear to change the results of anything in our testsuite, but it's possible that some non-TIFF-to-TIFF image conversions that contain Exif data may now do certain type conversions properly instead of just silently dropping the metadata that had non-matching (but reasonably valid) types.
Additionally, to do this nicely, I ended up adding a new TypeURational alias in typedesc.h (similar to TypeRational, but the case where both numerator and denominator are unsigned ints). And also fixed a random comment typo I noticed in tiffinput.cpp. --------- Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
This is a PR proposing to keep the gamma precision. In some cases, we need more precise gamma values, while the existing rounding operation loses most of the precision. This change will continue to use rounded values to calculate and store color space information, but retain the original value in the "Gamma" parameter. In addition, it can also tidy up existing code. I've verified with png/exif.png & python-colorconfig tests. No regression is introduced. Signed-off-by: Lumina Wang <lumina.wang@autodesk.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…ion#5042) Also switch to a better idiom for detecting if we're a fork. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…demySoftwareFoundation#5043) Implement support for reading and writing monochrome images. Reading requires libheif 1.17+ for heif_image_handle_get_preferred_decoding_colorspace. Previously, writing a single-channel image would cause an exception due to wrong parameters, but close() would continue writing the image and crash. Destroy m_ctx on exception to prevent that for other potential errors. Test added for monochrome read and write. --------- Signed-off-by: Brecht Van Lommel <brecht@blender.org> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…ixVersion (AcademySoftwareFoundation#5045) This allows us to correctly read the ExifVersion and FlashPixVersion metadata in an EXIF block of a TIFF file. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
AcademySoftwareFoundation#5046) Fixes AcademySoftwareFoundation#5044 Oops, the logic was a little mixed up when there were exactly two images. One reason that this was a special case is that conceptually, there is just a stack, but the implementation is that there is a separate variable for the top item, and then the actual stack is all the other items. Also add more thorough testing of TOP/BOTTOM, including what happens for 2, 1, and also 0 items on the image stack (errors in that last case). Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…ftwareFoundation#5034)

* cmake utility build_dependency_with_cmake was unconditionally doing a shallow clone and using `clone -b`, but that only works if it's got a branch or tag name, not if it has a commit hash. So change the logic so it does a shallow clone only if GIT_TAG is specified but GIT_COMMIT is not.
* pybind11 self-builder is modified to allow a git commit override.

Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…size (AcademySoftwareFoundation#5037) For various tile sizes (and scanline), benchmark how long it takes to read and write a 4k x 2k image. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…areFoundation#5040) Intel icc is deprecated and hasn't had a release for a few years. It's holding us back, both by making us work around an ever-growing number of icc bugs and limitations that will never be fixed, and by not allowing us to upgrade minimum versions of certain dependencies, because icc can't correctly compile newer versions (as an example, it cannot use a 'fmt' library newer than the oldest we support, 7.0). So it's time to thank icc for its service and put it on the ice floe for the polar bears to eat. This is of course in main (future 3.2), and will not be backported to release branches, since we never stop support of a dependency or toolchain of existing releases. People requiring icc for whatever reason may keep using OIIO 3.1 or older. We will continue to support and test icx, the fully supported Intel LLVM-based compiler. --------- Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…eFoundation#5041) The previous minimum, 7.0, dated from mid-2020. We are raising now (in main / future 3.2 only) to 9.0, which dates from mid-2022, so we're still supporting several versions and/or years back. Because this changes minimum dependency versions, it will NOT be backported to release branches (3.1 or earlier). I had to remove the CI test variant for icc, because ancient icc can't correctly build newer versions of fmt, it seems. There is a separate PR to simply drop icc from our list of supported compilers. If anybody wants to argue for pulling the minimum up even farther (say, to fmt 10.0, released in 2023, so still supporting 3 years back), which would simplify even more places, I would consider it. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
…on#5061) The CI stub generation has been broken for a few days, failing CI every time. The checked-in stub files seem fine. Just turn off this check until we can figure out why it is broken. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
) I was seeing warnings with instantiation of the ispow2 function template for unsigned type, where the `x >= 0` clause is meaningless. Use a constexpr if to eliminate that pointless test for unsigned types. Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
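The shape of that change, sketched (the actual OIIO function body may differ): `if constexpr` discards the sign test when `T` is unsigned, so it is never instantiated and the warning goes away. In this sketch the sign test also runs first, so the most negative signed value never reaches `x - 1`; note that with this bit trick 0 counts as a power of two.

```cpp
#include <cassert>
#include <type_traits>

// A power of two has exactly one bit set, so x & (x - 1) clears it to 0.
template<typename T> constexpr bool ispow2(T x) noexcept
{
    static_assert(std::is_integral<T>::value, "integer types only");
    if constexpr (std::is_unsigned<T>::value) {
        return (x & (x - 1)) == 0;  // no meaningless x >= 0 test
    } else {
        return x >= 0 && (x & (x - 1)) == 0;  // reject negatives first
    }
}
```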
…cademySoftwareFoundation#5054) I tested out the JPEG XL CICP support and noticed that color primaries 12 was not supported. This pull request extends P3 support to color primaries 12. Note: color primaries 11 uses the DCI white point and color primaries 12 uses the D65 white point. The JxlPrimaries enum only covers P3 primaries as value 11, not 12 (see https://github.com/libjxl/libjxl/blob/main/lib/include/jxl/color_encoding.h#L55-L75), so further code is required to account for this on read and write. Tests for read and write of color primaries 11 and 12 were added. Signed-off-by: Shane Smith <shane.smith@dreamworks.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
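The read/write logic described above amounts to a small two-way mapping. In this sketch the `Primaries` and `WhitePoint` enums are hypothetical stand-ins for libjxl's `JxlPrimaries` and `JxlWhitePoint`, and their real enumerator values are deliberately not reproduced. On write, CICP 11 and 12 both become P3 primaries and differ only in white point; on read, the white point decides which CICP code to report.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <utility>

// Hypothetical stand-ins for the libjxl color-encoding enums.
enum class Primaries { SRGB, P3, Other };
enum class WhitePoint { D65, DCI, Other };

// CICP color_primaries code -> (primaries, white point), used on write.
inline std::optional<std::pair<Primaries, WhitePoint>> from_cicp(int code)
{
    switch (code) {
    case 1: return std::make_pair(Primaries::SRGB, WhitePoint::D65);  // BT.709
    case 11: return std::make_pair(Primaries::P3, WhitePoint::DCI);   // DCI-P3
    case 12: return std::make_pair(Primaries::P3, WhitePoint::D65);   // P3-D65
    default: return std::nullopt;  // not handled in this sketch
    }
}

// (primaries, white point) -> CICP color_primaries code, used on read.
inline std::optional<int> to_cicp(Primaries p, WhitePoint w)
{
    if (p == Primaries::SRGB && w == WhitePoint::D65)
        return 1;
    if (p == Primaries::P3 && w == WhitePoint::DCI)
        return 11;
    if (p == Primaries::P3 && w == WhitePoint::D65)
        return 12;
    return std::nullopt;
}
```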
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library:
• add/sub
• mul/div
• mad
• resample
Adds CMake and build system support, new implementation helpers, and developer documentation.
The code was mostly written using the frontier models Opus4.5 and Codex GPT5.2 High, under a strict set of rules.
Checklist:
behavior.
testsuite.
PR, by pushing the changes to my fork and seeing that the automated CI
passed there. (Exceptions: If most tests pass and you can't figure out why
the remaining ones fail, it's ok to submit the PR and ask for help. Or if
any failures seem entirely unrelated to your change; sometimes things break
on the GitHub runners.)
fixed any problems reported by the clang-format CI test.
corresponding Python bindings. If altering ImageBufAlgo functions, I also
exposed the new functionality as oiiotool options.