Skip to content

Conversation

@d-w-moore
Copy link
Collaborator

@d-w-moore d-w-moore commented May 19, 2025

Wherein we close down threads in an orderly way, so that things don't leave things to be disposed in the wrong order for the ever persnickety SSL shutdown logic.

Experiments show that SIGTERM actually does induce the Python interpreter to shut down non-daemonic threads, so installing a signal handler for that may not be necessary in the end.

@d-w-moore d-w-moore changed the title [_722] fix segfault and hung threads on SIGINT during parallel get [#722] fix segfault and hung threads on KeyboardIinterrupt during parallel get May 19, 2025
@d-w-moore d-w-moore self-assigned this May 19, 2025
@d-w-moore d-w-moore marked this pull request as draft May 19, 2025 17:09
@d-w-moore
Copy link
Collaborator Author

After a bit of manual testing, will attempt to make a proper test for SIGINT and SIGTERM to ensure things are left in an ok state.

@d-w-moore
Copy link
Collaborator Author

d-w-moore commented Jun 5, 2025

A GUI for example that maintains background asynch parallel transfers using PRC could trap and guard against Ctrl-C thusly:

from irods.parallel import abort_asynchronous_transfers
signal(SIGINT, lambda *_:exit(0 if abort_asynchronous_transfers() else 0))

Update: abort_asynchronous_transfers has transformed. It is now abort_parallel_transfers and may now be used to abort the current (just interrupted) synchronous transfer as well as all pending background ones. See the README updates in this pull request.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from abafff5 to fb36836 Compare June 6, 2025 13:48
Copy link
Contributor

@alanking alanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. Just a couple of things in the test

@korydraughn
Copy link
Contributor

Looks like we have a conflict.

Seems this PR is close to completion?

@alanking
Copy link
Contributor

alanking commented Dec 5, 2025

Just checking to see if this PR is still being considered for 3.3.0.

@d-w-moore
Copy link
Collaborator Author

Just checking to see if this PR is still being considered for 3.3.0.

Will check its currency to see whether the segfault is still a concern. If so, then I think we can consider it for this release.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from fb36836 to 481952c Compare December 12, 2025 15:25
@korydraughn
Copy link
Contributor

What's the status of this PR?

@d-w-moore
Copy link
Collaborator Author

What's the status of this PR?

I believe it's almost ready. I want to look over it once more.

Copy link
Contributor

@alanking alanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awaiting signal that this is ready

@d-w-moore
Copy link
Collaborator Author

Awaiting signal that this is ready

I've added a test (first draft, will run soon) that interrupts a put . We weren't testing that previously.

@d-w-moore
Copy link
Collaborator Author

Now open to comment ... once more. These changes are final in my mind, as far as the significant changes to library functionality.

@d-w-moore
Copy link
Collaborator Author

I can squash the last 6 commits or so, if that helps reviewers.

@korydraughn
Copy link
Contributor

Yes please.

@alanking
Copy link
Contributor

Also, take a peek at the 2 failing checks to see if there's anything actionable.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from 4b0458e to 14037f9 Compare January 15, 2026 21:23
@d-w-moore
Copy link
Collaborator Author

d-w-moore commented Jan 15, 2026

Sorry about the delay. (I see some reviews are already made.) I have just submitted a squash - but no actual changes of any kind since the last note I posted above.

@d-w-moore
Copy link
Collaborator Author

d-w-moore commented Jan 15, 2026

Still a TODO: Introduce a DataTransferInterruptedException and raise the RuntimeError from it, in the case that we need to signal a put or get didn't complete. We could also, possibly, deprecate catching the RuntimeError in case at time of 4.0.0 release we want to make it a bare DataTransferInterruptedException. (Name up for suggestions, but that's what I've settled on.)

Going to have to make an issue for this one and handle it later, that is in 4.0.0, I think. Things are looking pretty complex so far, and I don't want to complicate this release with unforeseen and as yet too-difficult-to-test problems....

@alanking
Copy link
Contributor

Awaiting resolution of open review comments.

@d-w-moore
Copy link
Collaborator Author

Awaiting resolution of open review comments.

Will take care of those later today. For now, putting in work done tho' new tests do not pass.

Multithread programming is evil.

@d-w-moore
Copy link
Collaborator Author

This appears ready for review. Should be the final round....

@korydraughn
Copy link
Contributor

Okay. Please take a look at codacy to see if there's anything worth addressing.

Copy link
Contributor

@korydraughn korydraughn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're very close.

@d-w-moore
Copy link
Collaborator Author

Okay. Please take a look at codacy to see if there's anything worth addressing.

Already done. I don't see any more codacy issues worth looking at , beyond the ones already addressed.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from 5fe8bda to af51ad0 Compare January 23, 2026 18:20
@d-w-moore
Copy link
Collaborator Author

Pushed another example to the README, that I thought might be informative for folks trying to understand the PUT/GET abort facility.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from e05f038 to e02c504 Compare January 23, 2026 20:10
@d-w-moore
Copy link
Collaborator Author

Ok. Things are quiet again and ready for final review.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from 0f73366 to 96fb8b2 Compare January 26, 2026 15:43
@korydraughn
Copy link
Contributor

New wording looks good.

Now we're waiting to see tests pass and then squashing?

README, docstring, and comment changes/corrections.
@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from 96fb8b2 to 4268312 Compare January 26, 2026 18:09
@d-w-moore
Copy link
Collaborator Author

New wording looks good.

Now we're waiting to see tests pass and then squashing?

Just squashed. Last code changes seemed good in terms of test results, but one more pass can't hurt.

@d-w-moore
Copy link
Collaborator Author

d-w-moore commented Jan 27, 2026

Ok ...recommended changes made. Ready for final review / approval.

Copy link
Contributor

@alanking alanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing else from me. Up to you if you want to address Codacy or the type checker, I guess.

@korydraughn
Copy link
Contributor

Yeah, let's look into the type checker and codacy issues.

The type checker issue looks like a quick fix.

There are some medium to high issues reported by codacy though. Please stare at them and determine whether they are real issues.

@d-w-moore
Copy link
Collaborator Author

d-w-moore commented Jan 28, 2026

Yeah, let's look into the type checker and codacy issues.

The type checker issue looks like a quick fix.

There are some medium to high issues reported by codacy though. Please stare at them and determine whether they are real issues.

Fixed the type checker issue , pushed the change ....

Went over the codacy issues again, and to my eyes they do not seem to matter. The "cell-variable defined in loop" problems appear moot because the results of the lambda functions depending on those variables are all queried and used only within the same iteration of the loop that defined them. Also, with the Popen call - I can't imagine a scenario in which any of its input is untrusted.

I'd say we're good to go.

Copy link
Contributor

@alanking alanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your assessment. The variable used in the closure is defined on each iteration and the closure is passed to another method to be executed immediately (that is, not stored). Contrasting with the problematic example in the Pylint documentation here: https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/cell-var-from-loop.html

I think we are good to squash.

@korydraughn
Copy link
Contributor

Sounds good to me.

Squash to taste.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants