Send SIGHUP to podman-executed tests#4975
Conversation
There was a problem hiding this comment.
Code Review
This pull request attempts to send a HUP signal to all processes in the podman container before stopping it. Use bash -c 'kill -HUP -1' to avoid dependency on an external kill binary, and catch tmt.utils.RunError instead of a broad Exception.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
The original `container stop --time` doesn't actually do anything useful, podman signals and waits only for the PID 1 to stop (an interactive bash session typically for a container's CMD), not the exec'd test. The SIGHUP is useful for tests that set up external state that needs to be shut down safely, and even for (most) use cases where the container is thrown away, the extra `kill` doesn't add much delay. Note that this is only a partial fix to the problem, because `podman stop --time N` always waits the full duration, because the interactive bash has SIG_IGN for SIGTERM (and even if it hadn't, it's running as PID 1 with kernel protection), so it sticks around. A proper fix would probably be to re-engineer the whole stopping logic, and perhaps overriding container CMD / ENTRYPOINT and maybe not closing the test output immediately. For now, this small fix keeps backwards compatibility for people who rely on the (public API) `stop-time` to continue working as it now does, while allowing tests to trap HUP. Note that I chose SIGHUP because sending SIGTERM to a hierarchy of processes (test wrapper, etc.) causes all of them to die at the same time, triggering SIGHUP anyway in the child processes (because tmt always allocates a TTY). Also, arguably, it's the more appropriate signal here. Related issue: teemtee#1362 Signed-off-by: Jiri Jaburek <comps@nomail.dom>
|
@comps why |
| @@ -646,6 +614,9 @@ def stop(self) -> None: | |||
| """ | |||
|
|
|||
| if self.container: | |||
| with contextlib.suppress(tmt.utils.RunError): | |||
There was a problem hiding this comment.
I for one would be interested in knowing that tmt tried this command, and it failed. A warning would be nice, try/except + tmt.utils.show_exception_as_warning() would take care of it without terminating the effort.
There was a problem hiding this comment.
I did try this too,
try:
self.podman(Command('exec', self.container, 'kill', '-HUP', '-1'))
except tmt.utils.RunError as exc:
tmt.utils.show_exception_as_warning(
exception=exc,
message="Failed to send SIGHUP to all processes while stopping.",
logger=self._logger,
)and it works nicely (tested by replacing -1 with 12345 to make it fail):
warn: Failed to send SIGHUP to all processes while stopping.
warn:
warn: Command 'podman exec tmt-022-SKDSfdeE kill -HUP 12345' returned 1.
warn:
warn: # stderr (1 lines)
warn: # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warn: kill: sending signal to 12345 failed: No such process
warn: # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warn:
but the problem is that for a test that does not leave background processes running, ie. a regular /bin/true, the "kill everything except PID 1" command doesn't have anything to kill, so it just fails with
kill: sending signal to -1 failed: No such process
So I can either add some logic for counting /proc/[0-9]* or spawning a shell with || true, or the PID extraction I mentioned for SIGTERM, or just ignore the failure for now.
Any suggestions?
I think the commit message already does mention it:
The PR description also mentions a version using I would avoid |
Well, yes and no. It does mention why you picked
^^ This is the missing piece, the bit about foreground process group and SSH example. It explains that
Yes, it's meaning is very much defined by the application itself, and that was my point. I was missing the "Why |
The original
container stop --timedoesn't actually do anything useful, podman signals and waits only for the PID 1 to stop (an interactive bash session typically for a container's CMD), not the exec'd test.The SIGHUP is useful for tests that set up external state that needs to be shut down safely, and even for (most) use cases where the container is thrown away, the extra
killdoesn't add much delay.Note that this is only a partial fix to the problem, as
podman stop --time Nalways waits the full duration, because the interactive bash hasSIG_IGNforSIGTERM(and even if it hadn't, it's running as PID 1 with kernel protection), so it sticks around.A proper fix would probably be to re-engineer the whole stopping logic, and perhaps overriding container CMD / ENTRYPOINT and maybe not closing the test output immediately.
For now, this small fix keeps backwards compatibility for people who rely on the (public API)
stop-timeto continue working as it now does, while allowing tests to trap HUP.Note that I chose
SIGHUPbecause sendingSIGTERMto a hierarchy of processes (test wrapper, etc.) causes all of them to die at the same time, triggeringSIGHUPanyway in the child processes (becausetmtalways allocates a TTY).Also, arguably, it's the more appropriate signal here, but opinions might vary on that.
SIGTERM version
Sending
SIGTERMwould be a lot more involved without much benefit overSIGHUP(both can be trapped, and typically are, along withSIGINT, as any of the 3 can legitimately appear on a clean termination).(probably with less path hardcoding)
Thanks!
Related issue: #1362
I've tested it on this main.fmf:
Pull Request Checklist