
Conversation

@Abdelrahman912 (Collaborator)

closes #230

@Abdelrahman912 (Collaborator, Author)

============================================================
Preconditioner Construction Timings:
============================================================
─────────────────────────────────────────────────────────────────────────────────
                                        Time                    Allocations      
                               ───────────────────────   ────────────────────────
       Tot / % measured:            152ms /  99.9%           6.26MiB / 100.0%    

Section                ncalls     time    %tot     avg     alloc    %tot      avg
─────────────────────────────────────────────────────────────────────────────────
_precompute_blocks          1    152ms  100.0%   152ms   6.26MiB  100.0%  6.26MiB
  kernel execution          1    100ms   65.5%   100ms   8.20KiB    0.1%  8.20KiB
  allocate SLbuffer         1   52.1ms   34.2%  52.1ms   5.22MiB   83.4%  5.22MiB
  isapprox(A, A')           1    464μs    0.3%   464μs   1.00MiB   16.0%  1.00MiB
  kernel setup              1   25.1μs    0.0%  25.1μs      128B    0.0%     128B
  allocate D_Dl1            1   24.6μs    0.0%  24.6μs   25.9KiB    0.4%  25.9KiB
  adapt(backend, _A)        1   40.0ns    0.0%  40.0ns     0.00B    0.0%    0.00B
  synchronize               1   36.0ns    0.0%  36.0ns     0.00B    0.0%    0.00B
─────────────────────────────────────────────────────────────────────────────────
============================================================
ldiv! Timings during solve:
============================================================
───────────────────────────────────────────────────────────────────────────────────────────
                                                  Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 136ms /  83.0%           4.22MiB /  22.1%    

Section                          ncalls     time    %tot     avg     alloc    %tot      avg
───────────────────────────────────────────────────────────────────────────────────────────
ldiv! (CPU)                         120    113ms  100.0%   942μs    955KiB  100.0%  7.96KiB
  _forward_sweep!                   120    113ms   99.7%   939μs    954KiB   99.9%  7.95KiB
    _forward_sweep! (internal)      120    113ms   99.6%   938μs    953KiB   99.8%  7.94KiB
      kernel call                   120    111ms   98.6%   929μs    936KiB   98.0%  7.80KiB
      kernel construction           120    722μs    0.6%  6.01μs   15.0KiB    1.6%     128B
      synchronize                   120   10.6μs    0.0%  87.9ns     0.00B    0.0%    0.00B
      convert size_A                120   4.75μs    0.0%  39.6ns     0.00B    0.0%    0.00B
      @unpack partitioning          120   4.34μs    0.0%  36.2ns     0.00B    0.0%    0.00B
      ndrange calc                  120   4.18μs    0.0%  34.9ns     0.00B    0.0%    0.00B
      @unpack P                     120   3.37μs    0.0%  28.0ns     0.00B    0.0%    0.00B
  y .= x                            120    208μs    0.2%  1.73μs     0.00B    0.0%    0.00B
───────────────────────────────────────────────────────────────────────────────────────────
============================================================
Unprec. no. iters: 981, time: 1.05442742
Prec. no. iters: 119, time: 0.136057039

============================================================
Preconditioner Construction Timings:
============================================================
─────────────────────────────────────────────────────────────────────────────────
                                        Time                    Allocations      
                               ───────────────────────   ────────────────────────
       Tot / % measured:            355ms /  99.8%           11.3MiB /  91.5%    

Section                ncalls     time    %tot     avg     alloc    %tot      avg
─────────────────────────────────────────────────────────────────────────────────
_precompute_blocks          1    355ms  100.0%   355ms   10.3MiB  100.0%  10.3MiB
  kernel execution          1    349ms   98.3%   349ms   8.20KiB    0.1%  8.20KiB
  allocate SLbuffer         1   4.19ms    1.2%  4.19ms   7.42MiB   72.0%  7.42MiB
  isapprox(A, A')           1   1.71ms    0.5%  1.71ms   2.85MiB   27.6%  2.85MiB
  kernel setup              1   53.5μs    0.0%  53.5μs      128B    0.0%     128B
  allocate D_Dl1            1   15.2μs    0.0%  15.2μs   30.9KiB    0.3%  30.9KiB
  synchronize               1   37.0ns    0.0%  37.0ns     0.00B    0.0%    0.00B
  adapt(backend, _A)        1   36.0ns    0.0%  36.0ns     0.00B    0.0%    0.00B
─────────────────────────────────────────────────────────────────────────────────
============================================================
ldiv! Timings during solve:
============================================================
───────────────────────────────────────────────────────────────────────────────────────────
                                                  Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 797ms /  62.8%           18.0MiB /  17.9%    

Section                          ncalls     time    %tot     avg     alloc    %tot      avg
───────────────────────────────────────────────────────────────────────────────────────────
ldiv! (CPU)                         415    501ms  100.0%  1.21ms   3.21MiB  100.0%  7.93KiB
  _forward_sweep!                   415    499ms   99.8%  1.20ms   3.21MiB  100.0%  7.93KiB
    _forward_sweep! (internal)      415    499ms   99.7%  1.20ms   3.21MiB  100.0%  7.93KiB
      kernel call                   415    495ms   98.9%  1.19ms   3.16MiB   98.3%  7.80KiB
      kernel construction           415   2.63ms    0.5%  6.33μs   51.9KiB    1.6%     128B
      synchronize                   415   14.9μs    0.0%  36.0ns     0.00B    0.0%    0.00B
      convert size_A                415   12.3μs    0.0%  29.6ns     0.00B    0.0%    0.00B
      @unpack P                     415   12.1μs    0.0%  29.1ns     0.00B    0.0%    0.00B
      ndrange calc                  415   11.2μs    0.0%  27.0ns     0.00B    0.0%    0.00B
      @unpack partitioning          415   11.0μs    0.0%  26.6ns     0.00B    0.0%    0.00B
  y .= x                            415    796μs    0.2%  1.92μs     0.00B    0.0%    0.00B
───────────────────────────────────────────────────────────────────────────────────────────
============================================================
Unprec. no. iters: 3067, time: 17.799613799
Prec. no. iters: 414, time: 0.796364037

@termi-official these are the outputs from the two examples in test_preconditioners. There is the initial construction, which is done once and for all, and there is ldiv!, which is called many times. The huge chunk of allocations basically comes from the kernel constructions.
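
If the per-call kernel construction ends up dominating, one option might be to construct the kernel once at preconditioner build time and reuse it in every ldiv!. A minimal sketch of that KernelAbstractions pattern with a toy kernel (_scale_kernel!, SweepKernelCache, and apply! are hypothetical names, not the PR's):

using KernelAbstractions

@kernel function _scale_kernel!(y, @Const(d))
    i = @index(Global)
    @inbounds y[i] *= d[i]
end

# Instantiate the kernel for the backend once, at construction time ...
struct SweepKernelCache{K}
    kernel!::K
end
SweepKernelCache(backend) = SweepKernelCache(_scale_kernel!(backend))

# ... and reuse it on every application, paying only the launch cost.
function apply!(c::SweepKernelCache, y, d)
    c.kernel!(y, d; ndrange = length(y))
    KernelAbstractions.synchronize(get_backend(y))
    return y
end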

Also, one thing that might be optimized is isapprox(A, A'), but I believe we need prior information from the user for that.
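
A rough sketch of what that could look like, assuming a user-supplied flag such as the isSymA keyword that appears later in this thread (_issym is a hypothetical helper):

using LinearAlgebra

# Only fall back to the expensive comparison when neither the
# wrapper type nor the user asserts symmetry.
function _issym(A; isSymA::Bool = false)
    isSymA && return true          # trust the user's assertion
    A isa Symmetric && return true # symmetric by construction
    return isapprox(A, A')         # expensive fallback
end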

codecov bot commented Nov 29, 2025

Codecov Report

❌ Patch coverage is 56.73289% with 196 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.39%. Comparing base (a07d127) to head (0ec39a1).

Files with missing lines                                Patch %   Lines
...c/solver/linear/preconditioners/l1_gauss_seidel.jl   56.73%    196 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #238      +/-   ##
==========================================
- Coverage   71.71%   70.39%   -1.32%     
==========================================
  Files          78       78              
  Lines        6515     6864     +349     
==========================================
+ Hits         4672     4832     +160     
- Misses       1843     2032     +189     


@termi-official (Collaborator) left a comment

Thanks for investigating, Abdelrahman! I left some comments on things I caught on the fly.

Also take a look at Codecov; it looks like not all of your code is covered in the CI.

@termi-official (Collaborator) left a comment

Thanks for the changes, Abdelrahman. There are some minor things left, though.

@AbdAlazezAhmed can you confirm that this works now on the ECG?

Collaborator

I think we are missing tests for the symmetric case.

Collaborator Author

@testset "Symmetric A" begin
    md = mdopen("HB/bcsstk15")
    A = md.A # type here is Symmetric{Float64, SparseMatrixCSC{Float64, Int64}}
    b = ones(size(A, 1))
    test_l1gs_prec(A, b)
end

The matrix here is already of type Symmetric.
Also, for the algorithmic logic of the test, I added this:

function test_sym(testname, A, x, y_exp, D_Dl1_exp, SLbuffer_exp, partsize)
    @testset "$testname Symmetric" begin
        total_ncores = 8 # Assuming 8 cores for testing
        for ncores in 1:total_ncores # test with multiple core counts to check that the answer is independent of the number of cores
            builder = L1GSPrecBuilder(PolyesterDevice(ncores))
            P = A isa Symmetric ? builder(A, partsize) : builder(A, partsize; isSymA=true)
            @test P.D_Dl1 ≈ D_Dl1_exp
            @test P.SLbuffer ≈ SLbuffer_exp
            y = P \ x
            @test y ≈ y_exp
        end
    end
end

Collaborator

I see. Thanks!

However, we should use CG for the symmetric problems to ensure that the preconditioner actually conserves the symmetry.
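
For illustration, such a test could look roughly like this, reusing the builder API from this PR and assuming any CG implementation that accepts a left preconditioner (Krylov.jl here); CG requires a symmetric preconditioner, so a symmetry-breaking sweep shows up as stagnation that GMRES would hide:

using Test, SparseArrays, LinearAlgebra
using Krylov # assumption: Krylov.jl's cg with the M/ldiv keywords

B = sprand(1000, 1000, 0.01)
A = sparse(Symmetric(B + B' + 100I)) # SPD test matrix
b = ones(size(A, 1))
P = L1GSPrecBuilder(PolyesterDevice(8))(A, 125; isSymA = true)
x, stats = Krylov.cg(A, b; M = P, ldiv = true) # preconditioner applied via ldiv!
@test stats.solved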

@termi-official (Collaborator) left a comment

There are still some issues.

  1. ThreadedSparseMatrixCSC support is missing
  2. The preconditioner tanks the symmetry
  3. There is too much of memory usage

Let's try to make the following work:

using Thunderbolt
using SparseArrays,OrderedCollections,TimerOutputs,LinearAlgebra

# TimerOutputs.enable_debug_timings(Thunderbolt)
# TimerOutputs.enable_debug_timings(Thunderbolt.Preconditioners)
geo = Hexahedron
nel = 2

size = 2.
signal_strength = 0.04
nel_heart = (6, 6, 6) .* 1 .* nel
nel_torso = (8, 8, 8) .* Int(size) .* nel
ground_vertex = Vec(0.0, 0.0, 0.0)
electrodes = [
    ground_vertex,
    Vec(-size,  0.,  0.), 
    Vec( size,  0.,  0.),
    Vec( 0., -size,  0.),
    Vec( 0.,  size,  0.),
    Vec( 0.,  0., -size),
    Vec( 0.,  0.,  size),
]
electrode_pairs = [[i,1] for i in 2:length(electrodes)]

heart_grid = generate_mesh(geo, nel_heart)
Ferrite.transform_coordinates!(heart_grid, x->Vec{3}(sign.(x) .* x.^2))

κ  = ConstantCoefficient(SymmetricTensor{2,3,Float64}((1.0, 0, 0, 1.0, 0, 1.0)))
κᵢ = AnalyticalCoefficient(
    (x,t) -> norm(x,Inf) ≤ 1.0 ? SymmetricTensor{2,3,Float64}((1.0, 0, 0, 1.0, 0, 1.0)) : SymmetricTensor{2,3,Float64}((0.0, 0, 0, 0.0, 0, 0.0)),
    CartesianCoordinateSystem{3}()
)

heart_model = TransientDiffusionModel(
    κᵢ,
    NoStimulationProtocol(), # Poisoning to detect if we accidentally touch these
    :φₘ
)
heart_fun = semidiscretize(
    heart_model,
    FiniteElementDiscretization(
        Dict(:φₘ => LagrangeCollection{1}()),
        Dirichlet[]
    ),
    heart_grid
)

op = Thunderbolt.setup_assembled_operator(
    Thunderbolt.PerColorAssemblyStrategy(Thunderbolt.PolyesterDevice(8)),
    Thunderbolt.BilinearDiffusionIntegrator(
        κ,
        QuadratureRuleCollection(2),
        :φₘ,
    ),
    Thunderbolt.SparseMatrixCSC,
    heart_fun.dh,
)
Thunderbolt.update_operator!(op, 0.0) # trigger assembly


torso_grid_ = generate_grid(geo, nel_torso, Vec((-size,-size,-size)), Vec((size,size,size)))
addcellset!(torso_grid_, "heart", x->norm(x,Inf) ≤ 1.0)
# addcellset!(torso_grid_, "surrounding-tissue", x->norm(x,Inf) ≥ 1.0)
torso_grid_.cellsets["surrounding-tissue"] = OrderedSet([i for i in 1:getncells(torso_grid_) if i ∉ torso_grid_.cellsets["heart"]])
torso_grid = Thunderbolt.to_mesh(torso_grid_)
u = zeros(Thunderbolt.solution_size(heart_fun))
geselowitz_electrodes = [[electrodes[1], electrodes[i]] for i in 2:length(electrodes)]

TimerOutputs.reset_timer!()
@info prod(nel_torso)
geselowitz_ecg = Thunderbolt.Geselowitz1989ECGLeadCache(
    heart_fun,
    torso_grid,
    κᵢ, κ,
    geselowitz_electrodes; 
    ground = OrderedSet([Thunderbolt.get_closest_vertex(ground_vertex, torso_grid)]),
    torso_heart_domain=["heart"],
    ipc                  = LagrangeCollection{1}(),
    qrc                  = QuadratureRuleCollection(3),
    linear_solver        = Thunderbolt.LinearSolve.KrylovJL_CG(precs = (A, p) -> (L1GSPrecBuilder(PolyesterDevice(8))(A, ceil(Int, SparseArrays.size(A,1)/8); isSymA=true), LinearAlgebra.I)),
    # linear_solver        = Thunderbolt.LinearSolve.KrylovJL_GMRES(precs = (A, p) -> (L1GSPrecBuilder(PolyesterDevice(8))(A, ceil(Int, 24)), LinearAlgebra.I)),
    system_matrix_type   = Thunderbolt.SparseMatrixCSR{Float64,Int64}, #Thunderbolt.ThreadedSparseMatrixCSR{Float64,Int64},
    strategy = Thunderbolt.PerColorAssemblyStrategy(Thunderbolt.PolyesterDevice(8)),
)

TimerOutputs.print_timer()

You need to apply these patches.

diff --git a/src/solver/interface.jl b/src/solver/interface.jl
index 8234949c..4279e657 100644
--- a/src/solver/interface.jl
+++ b/src/solver/interface.jl
@@ -148,13 +148,17 @@ update_constraints_block!(f::NullFunction, i::Block, solver_cache::AbstractTimeS
 
 create_system_matrix(T::Type{<:AbstractMatrix}, f::AbstractSemidiscreteFunction) = create_system_matrix(T, f.dh)
 
-function create_system_matrix(::Type{<:ThreadedSparseMatrixCSR{Tv,Ti}}, dh::AbstractDofHandler) where {Tv,Ti}
-    Acsct = transpose(convert(SparseMatrixCSC{Tv,Ti}, allocate_matrix(dh)))
+function create_system_matrix(::Type{<:ThreadedSparseMatrixCSR}, dh::AbstractDofHandler)
+    Acsct = allocate_matrix(SparseMatrixCSR{1,Float64,Int64}, dh)
     return ThreadedSparseMatrixCSR(Acsct)
 end
 
 function create_system_matrix(SpMatType::Type{<:SparseMatrixCSC}, dh::AbstractDofHandler)
-    A = convert(SpMatType, allocate_matrix(dh))
+    A = allocate_matrix(SparseMatrixCSC{Float64,Int64}, dh)
+    return A
+end
+function create_system_matrix(SpMatType::Type{<:SparseMatrixCSR}, dh::AbstractDofHandler)
+    A = allocate_matrix(SparseMatrixCSR{1,Float64,Int64}, dh)
     return A
 end
diff --git a/src/modeling/electrophysiology/ecg.jl b/src/modeling/electrophysiology/ecg.jl
index b0f47170..9db426db 100644
--- a/src/modeling/electrophysiology/ecg.jl
+++ b/src/modeling/electrophysiology/ecg.jl
@@ -402,6 +402,7 @@ function Geselowitz1989ECGLeadCache(
     linear_solver        = LinearSolve.KrylovJL_CG(),
     solution_vector_type = Vector{Float64},
     system_matrix_type   = ThreadedSparseMatrixCSR{Float64,Int64},
+    strategy             = SequentialAssemblyStrategy(SequentialCPUDevice()),
 )
     return Geselowitz1989ECGLeadCache(
         heart_fun,
@@ -416,6 +417,7 @@ function Geselowitz1989ECGLeadCache(
         linear_solver       ,
         solution_vector_type,
         system_matrix_type  ,
+        strategy            ,
     )
 end


η = convert(Tf, η)
D_Dl1 = adapt(backend, zeros(Tf, N)) # D + Dˡ
last_partsize = N - (nparts - 1) * partsize # size of the last partition
SLbuffer_size = (partsize * (partsize - 1) * (nparts-1)) ÷ 2 + last_partsize * (last_partsize - 1) ÷ 2
Collaborator

We should store the lower/upper triangular parts in some sparse format. I do not have opinions about which one exactly.
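
For illustration, SparseArrays can give the strictly triangular parts directly, so the memory footprint scales with the number of stored nonzeros instead of with partsize² per partition; a minimal sketch:

using SparseArrays

L = tril(A, -1) # strictly lower triangular part, stays sparse (CSC)
U = triu(A, 1)  # strictly upper triangular part
# A sweep then only visits the stored nonzeros of each block,
# rather than a densely packed partsize*(partsize-1)/2 buffer per partition.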

(; backend) = partitioning
# The following code is required because there is no assumption on the compatibility of x with the backend.
@timeit_debug "adapt(backend, y)" _y = adapt(backend, y)
@timeit_debug "_forward_sweep!" _forward_sweep!(_y, P)
Collaborator

The type of sweep should ideally be a user option -- and it must be symmetric if the problem is symmetric, as we otherwise tank the symmetry needed for the solver.

Comment on lines 5 to 60
## Structs & Constructors ##
## User defined types for sweep direction and storage strategy ##
abstract type Sweep end
struct ForwardSweep <: Sweep end # forward sweep -> lower triangular
struct BackwardSweep <: Sweep end # backward sweep -> upper triangular
struct SymmetricSweep <: Sweep end # symmetric sweep -> both lower and upper triangular

abstract type StorageStrategy end
struct OriginalMatrix <: StorageStrategy end # store original matrix
struct PackedBuffer <: StorageStrategy end # store lower/upper triangular parts in a vector buffer -> efficient for small partitions
struct SparseTriangular <: StorageStrategy end # store sparse lower/upper triangular matrices


## Internal types to encapsulate sweep and storage ##
abstract type SweepStorage end

# Forward sweep variants
abstract type ForwardSweepStorage <: SweepStorage end
struct ForwardSweepOriginalMatrix{MatrixType} <: ForwardSweepStorage
    A::MatrixType # Original matrix
end
struct ForwardSweepPackedBuffer{VectorType} <: ForwardSweepStorage
    SLbuffer::VectorType # Packed strictly lower triangular elements
end
struct ForwardSweepSparseTriangular{MatrixType} <: ForwardSweepStorage
    L::MatrixType # Sparse strictly lower triangular matrix
end

# Backward sweep variants
abstract type BackwardSweepStorage <: SweepStorage end
struct BackwardSweepOriginalMatrix{MatrixType} <: BackwardSweepStorage
    A::MatrixType # Original matrix
end
struct BackwardSweepPackedBuffer{VectorType} <: BackwardSweepStorage
    SUbuffer::VectorType # Packed strictly upper triangular elements
end
struct BackwardSweepSparseTriangular{MatrixType} <: BackwardSweepStorage
    U::MatrixType # Sparse strictly upper triangular matrix
end

# Symmetric sweep variants
abstract type SymmetricSweepStorage <: SweepStorage end
struct SymmetricSweepOriginalMatrix{MatrixType} <: SymmetricSweepStorage
    A::MatrixType # Original symmetric matrix
end


# TODO: are these two buffers necessary? won't be implemented for now
struct SymmetricSweepPackedBuffer{VectorType} <: SymmetricSweepStorage
    SLbuffer::VectorType # Packed strictly lower triangular elements
    SUbuffer::VectorType # Packed strictly upper triangular elements
end
struct SymmetricSweepSparseTriangular{MatrixType} <: SymmetricSweepStorage
    L::MatrixType # Sparse strictly lower triangular matrix
    U::MatrixType # Sparse strictly upper triangular matrix
end

Collaborator Author

I need some opinions on the design here:
There are three user-facing options for the sweep (forward, backward, symmetric) and three user-facing options for the storage strategy: the original matrix A, a packed buffer (what we had previously, since it might be performant in a GPU context), and sparse strictly lower/upper triangular matrices. These options are the high-level API for the user; internally they boil down to the permutations of SweepStorage (e.g. ForwardSweepOriginalMatrix), which encapsulate the sweep type and the buffer and make it easy to dispatch to the correct kernels (see the sketch after the snippet below).

Questions here:

  1. Is this design appropriate?
  2. If so, do we need to implement these?
# TODO: are these two buffers necessary? won't be implemented for now
struct SymmetricSweepPackedBuffer{VectorType} <: SymmetricSweepStorage
    SLbuffer::VectorType  # Packed strictly lower triangular elements
    SUbuffer::VectorType  # Packed strictly upper triangular elements
end
struct SymmetricSweepSparseTriangular{MatrixType} <: SymmetricSweepStorage
    L::MatrixType  # Sparse strictly lower triangular matrix
    U::MatrixType  # Sparse strictly upper triangular matrix
end
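
For concreteness, a sketch of how the user-facing options could boil down to the internal permutations (_build_cache and _pack_strict_lower are hypothetical helper names):

_build_cache(::ForwardSweep,  ::OriginalMatrix,   A) = ForwardSweepOriginalMatrix(A)
_build_cache(::ForwardSweep,  ::PackedBuffer,     A) = ForwardSweepPackedBuffer(_pack_strict_lower(A))
_build_cache(::ForwardSweep,  ::SparseTriangular, A) = ForwardSweepSparseTriangular(tril(A, -1))
_build_cache(::BackwardSweep, ::SparseTriangular, A) = BackwardSweepSparseTriangular(triu(A, 1))
# ... remaining (Sweep, StorageStrategy) combinations analogously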

Collaborator

Why do we need different storage types for the different sweep types? To control the dispatch? These are orthogonal, right? Also, to have unified naming, I think the storage types should be called "caches"; naming the inner vectors buffers is fine.

So, just to understand: the use case for storing A directly would be that the buffer stores the indices of the triangular parts in the block?

Note that symmetric sweep typically refers to a forward sweep followed by a backward sweep.

A small hint here: it might be helpful to have a layer to dispatch on, i.e. apply_sweep! (or something better, I am out of steam for today already), and some internal function like apply_forward_sweep_internal! which gets called on the unpacked data. This way you can reuse the forward/backward sweep code for the symmetric sweep.
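
A rough sketch of that layering (apply_sweep! and the internal helpers are placeholder names from the hint above; P is the preconditioner object):

apply_sweep!(y, P, ::ForwardSweep)  = apply_forward_sweep_internal!(y, P)
apply_sweep!(y, P, ::BackwardSweep) = apply_backward_sweep_internal!(y, P)
function apply_sweep!(y, P, ::SymmetricSweep)
    # a symmetric sweep is a forward sweep followed by a backward sweep,
    # so both internal kernels are reused instead of writing a third one
    apply_forward_sweep_internal!(y, P)
    apply_backward_sweep_internal!(y, P)
    return y
end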

Collaborator Author

Why do we need different storage types for the different sweep types?

Yes, mainly to dispatch to the right kernel easily. An alternative is to use something like

struct ForwardSweepCache
    cache::Union{Vector,SparseMatrix} # this sparse matrix might be strictly lower or `A` itself, so this might get hairy
end

So, just to understand: the use case for storing A directly would be that the buffer stores the indices of the triangular parts in the block?

No, A literally means the original matrix A; no indices need to be stored, because later in the iterator we know the start and the end of each block.

For caches, no problem, I will change that right away. For a dispatch layer, sure, no problem; this implementation was just a quick prototype on top of the previous structs.

@Abdelrahman912 (Collaborator, Author)

# Fields
- `device::AbstractDevice`: The backend used for the preconditioner. More info [AbstractDevice](@ref).
"""

Also, I don't know whether this was done intentionally or simply overlooked, but the reference to AbstractDevice gives a 404 (i.e. there is no API reference for it), even though devices.jl already has docstrings.

@termi-official (Collaborator)

reference to AbstractDevice gives 404

Yeah, I am not sure how to fix this. Nothing worth spending time on right now, though.

@AbdAlazezAhmed (Collaborator) commented Dec 9, 2025

reference to AbstractDevice gives 404

Doesn't seem to be referenced in an md file
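
(For reference, Documenter.jl resolves [AbstractDevice](@ref) only once the docstring is spliced into some page, e.g. via a @docs block; a minimal sketch, assuming a hypothetical docs/src/api-reference page:

```@docs
Thunderbolt.AbstractDevice
```
)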

@Abdelrahman912 Abdelrahman912 marked this pull request as draft January 7, 2026 17:18


Development

Successfully merging this pull request may close these issues.

Big memory allocation in the L1GS
