Send `SIGHUP` to podman-executed tests by comps · Pull Request #4975 · teemtee/tmt

comps · 2026-06-09T15:58:42Z

The original container stop --time doesn't actually do anything useful, podman signals and waits only for the PID 1 to stop (an interactive bash session typically for a container's CMD), not the exec'd test.

The SIGHUP is useful for tests that set up external state that needs to be shut down safely, and even for (most) use cases where the container is thrown away, the extra kill doesn't add much delay.

Note that this is only a partial fix to the problem, as podman stop --time N always waits the full duration, because the interactive bash has SIG_IGN for SIGTERM (and even if it hadn't, it's running as PID 1 with kernel protection), so it sticks around.
A proper fix would probably be to re-engineer the whole stopping logic, and perhaps overriding container CMD / ENTRYPOINT and maybe not closing the test output immediately.

For now, this small fix keeps backwards compatibility for people who rely on the (public API) stop-time to continue working as it now does, while allowing tests to trap HUP.

Note that I chose SIGHUP because sending SIGTERM to a hierarchy of processes (test wrapper, etc.) causes all of them to die at the same time, triggering SIGHUP anyway in the child processes (because tmt always allocates a TTY).
Also, arguably, it's the more appropriate signal here, but opinions might vary on that.

SIGTERM version

Sending SIGTERM would be a lot more involved without much benefit over SIGHUP (both can be trapped, and typically are, along with SIGINT, as any of the 3 can legitimately appear on a clean termination).

    Command('exec', self.container, '/bin/bash', '-c', 'read pid _ < /var/tmp/tmt-test.pid && kill -TERM $pid')

(probably with less path hardcoding)

Thanks!

Related issue: #1362

I've tested it on this main.fmf:

/plan:
  provision:
    how: container
    stop-time: 30
  discover:
    how: fmf
  execute:
    how: tmt

/test:
  test: |
    trap 'echo "trap fired" > $TMT_TEST_DATA/trap-proof; exit 0' HUP
    echo "waiting for signal..."
    sleep 300 &
    wait $!
    echo "sleep finished normally"

Pull Request Checklist

gemini-code-assist

Code Review

This pull request attempts to send a HUP signal to all processes in the podman container before stopping it. Use bash -c 'kill -HUP -1' to avoid dependency on an external kill binary, and catch tmt.utils.RunError instead of a broad Exception.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

LecrisUT

Aha, this might explain why I sometimes had issues killing a tmt run

The original `container stop --time` doesn't actually do anything useful, podman signals and waits only for the PID 1 to stop (an interactive bash session typically for a container's CMD), not the exec'd test. The SIGHUP is useful for tests that set up external state that needs to be shut down safely, and even for (most) use cases where the container is thrown away, the extra `kill` doesn't add much delay. Note that this is only a partial fix to the problem, because `podman stop --time N` always waits the full duration, because the interactive bash has SIG_IGN for SIGTERM (and even if it hadn't, it's running as PID 1 with kernel protection), so it sticks around. A proper fix would probably be to re-engineer the whole stopping logic, and perhaps overriding container CMD / ENTRYPOINT and maybe not closing the test output immediately. For now, this small fix keeps backwards compatibility for people who rely on the (public API) `stop-time` to continue working as it now does, while allowing tests to trap HUP. Note that I chose SIGHUP because sending SIGTERM to a hierarchy of processes (test wrapper, etc.) causes all of them to die at the same time, triggering SIGHUP anyway in the child processes (because tmt always allocates a TTY). Also, arguably, it's the more appropriate signal here. Related issue: teemtee#1362 Signed-off-by: Jiri Jaburek <comps@nomail.dom>

happz · 2026-06-09T19:14:44Z

@comps why SIGHUP? Does it have some special meaning to podman or shell sessions? I mean, it's not apparent why this signal would make such a difference, why not send e.g. SIGUSR1? E.g. "while allowing tests to trap HUP" - why they can't trap SIGUSR1? I know there is a special meaning to this signal, but 1. I'm not sure I understand it all, and 2. I'm absolutely sure in 6 months I will not recall it at all, mentioning the importance of SIGHUP in the commit message would be much appreciated. It looks as being picked arbitrarily, but that's not true.

happz · 2026-06-09T19:17:07Z

@@ -646,6 +614,9 @@ def stop(self) -> None:
        """

        if self.container:
+            with contextlib.suppress(tmt.utils.RunError):


I for one would be interested in knowing that tmt tried this command, and it failed. A warning would be nice, try/except + tmt.utils.show_exception_as_warning() would take care of it without terminating the effort.

I did try this too,

try: self.podman(Command('exec', self.container, 'kill', '-HUP', '-1')) except tmt.utils.RunError as exc: tmt.utils.show_exception_as_warning( exception=exc, message="Failed to send SIGHUP to all processes while stopping.", logger=self._logger, )

and it works nicely (tested by replacing -1 with 12345 to make it fail):

warn: Failed to send SIGHUP to all processes while stopping. warn: warn: Command 'podman exec tmt-022-SKDSfdeE kill -HUP 12345' returned 1. warn: warn: # stderr (1 lines) warn: # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ warn: kill: sending signal to 12345 failed: No such process warn: # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ warn:

but the problem is that for a test that does not leave background processes running, ie. a regular /bin/true, the "kill everything except PID 1" command doesn't have anything to kill, so it just fails with

kill: sending signal to -1 failed: No such process

So I can either add some logic for counting /proc/[0-9]* or spawning a shell with || true, or the PID extraction I mentioned for SIGTERM, or just ignore the failure for now.

Any suggestions?

comps · 2026-06-10T07:48:52Z

@comps why SIGHUP? Does it have some special meaning to podman or shell sessions? I mean, it's not apparent why this signal would make such a difference, why not send e.g. SIGUSR1? E.g. "while allowing tests to trap HUP" - why they can't trap SIGUSR1? I know there is a special meaning to this signal, but 1. I'm not sure I understand it all, and 2. I'm absolutely sure in 6 months I will not recall it at all, mentioning the importance of SIGHUP in the commit message would be much appreciated. It looks as being picked arbitrarily, but that's not true.

I think the commit message already does mention it:

Note that I chose SIGHUP because sending SIGTERM to a hierarchy
of processes (test wrapper, etc.) causes all of them to die at
the same time, triggering SIGHUP anyway in the child processes
(because tmt always allocates a TTY).

SIGHUP is also sent on any "hang up", historically a telephone line hang-up. These days it's sent by the kernel to the foreground process group (set by tcsetpgrp()) on a TTY if the session leader exits - typically when you run a program via ssh and the ssh disconnects. That's why I implied it's still a common signal for programs to trap for cleanup.

The PR description also mentions a version using SIGTERM, which would need to be more targeted in extracting the PID of the test process specifically (on the remote end), but I could re-do it like that if SIGHUP is a problem.

I would avoid SIGUSR1 as it typically has very specific per-application meaning, like "print dd statistics on console while copying".

happz · 2026-06-10T07:54:50Z

@comps why SIGHUP? Does it have some special meaning to podman or shell sessions? I mean, it's not apparent why this signal would make such a difference, why not send e.g. SIGUSR1? E.g. "while allowing tests to trap HUP" - why they can't trap SIGUSR1? I know there is a special meaning to this signal, but 1. I'm not sure I understand it all, and 2. I'm absolutely sure in 6 months I will not recall it at all, mentioning the importance of SIGHUP in the commit message would be much appreciated. It looks as being picked arbitrarily, but that's not true.

I think the commit message already does mention it:

Note that I chose SIGHUP because sending SIGTERM to a hierarchy
of processes (test wrapper, etc.) causes all of them to die at
the same time, triggering SIGHUP anyway in the child processes
(because tmt always allocates a TTY).

Well, yes and no. It does mention why you picked SIGHUP over SIGTERM, but it does not really say why no other signal was eligible.

SIGHUP is also sent on any "hang up", historically a telephone line hang-up. These days it's sent by the kernel to the foreground process group (set by tcsetpgrp()) on a TTY if the session leader exits - typically when you run a program via ssh and the ssh disconnects. That's why I implied it's still a common signal for programs to trap for cleanup.

^^ This is the missing piece, the bit about foreground process group and SSH example. It explains that SIGHUP has a special meaning rather than being an arbitrary selected signal whose only qualification was not being SIGTERM.

I would avoid SIGUSR1 as it typically has very specific per-application meaning, like "print dd statistics on console while copying".

Yes, it's meaning is very much defined by the application itself, and that was my point. I was missing the "Why SIGHUP and not let's say SIGUSR1? That's also a signal, right?" bit.

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread tmt/steps/provision/podman.py Outdated

comps force-pushed the podman_kill_hup branch from fc23a69 to 43044c3 Compare June 9, 2026 16:08

LecrisUT approved these changes Jun 9, 2026

View reviewed changes

comps force-pushed the podman_kill_hup branch from 43044c3 to d7e7985 Compare June 9, 2026 16:11

comps force-pushed the podman_kill_hup branch from d7e7985 to a0eea62 Compare June 9, 2026 16:14

happz reviewed Jun 9, 2026

View reviewed changes

therazix added this to planning Jun 10, 2026

github-project-automation Bot moved this to backlog in planning Jun 10, 2026

therazix moved this from backlog to review in planning Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Send `SIGHUP` to podman-executed tests#4975

Send `SIGHUP` to podman-executed tests#4975
comps wants to merge 1 commit into
teemtee:mainfrom
comps:podman_kill_hup

comps commented Jun 9, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

LecrisUT left a comment •

edited

Loading

Uh oh!

happz commented Jun 9, 2026

Uh oh!

happz Jun 9, 2026

Uh oh!

comps Jun 10, 2026

Uh oh!

comps commented Jun 10, 2026

Uh oh!

happz commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

comps commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

LecrisUT left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

happz commented Jun 9, 2026

Uh oh!

happz Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

comps Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

comps commented Jun 10, 2026

Uh oh!

happz commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

comps commented Jun 9, 2026 •

edited

Loading

LecrisUT left a comment •

edited

Loading