Skip to content

Send SIGHUP to podman-executed tests#4975

Open
comps wants to merge 1 commit into
teemtee:mainfrom
comps:podman_kill_hup
Open

Send SIGHUP to podman-executed tests#4975
comps wants to merge 1 commit into
teemtee:mainfrom
comps:podman_kill_hup

Conversation

@comps

@comps comps commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

The original container stop --time doesn't actually do anything useful, podman signals and waits only for the PID 1 to stop (an interactive bash session typically for a container's CMD), not the exec'd test.

The SIGHUP is useful for tests that set up external state that needs to be shut down safely, and even for (most) use cases where the container is thrown away, the extra kill doesn't add much delay.

Note that this is only a partial fix to the problem, as podman stop --time N always waits the full duration, because the interactive bash has SIG_IGN for SIGTERM (and even if it hadn't, it's running as PID 1 with kernel protection), so it sticks around.
A proper fix would probably be to re-engineer the whole stopping logic, and perhaps overriding container CMD / ENTRYPOINT and maybe not closing the test output immediately.

For now, this small fix keeps backwards compatibility for people who rely on the (public API) stop-time to continue working as it now does, while allowing tests to trap HUP.

Note that I chose SIGHUP because sending SIGTERM to a hierarchy of processes (test wrapper, etc.) causes all of them to die at the same time, triggering SIGHUP anyway in the child processes (because tmt always allocates a TTY).
Also, arguably, it's the more appropriate signal here, but opinions might vary on that.

SIGTERM version

Sending SIGTERM would be a lot more involved without much benefit over SIGHUP (both can be trapped, and typically are, along with SIGINT, as any of the 3 can legitimately appear on a clean termination).

    Command('exec', self.container, '/bin/bash', '-c', 'read pid _ < /var/tmp/tmt-test.pid && kill -TERM $pid')

(probably with less path hardcoding)

Thanks!

Related issue: #1362


I've tested it on this main.fmf:

/plan:
  provision:
    how: container
    stop-time: 30
  discover:
    how: fmf
  execute:
    how: tmt

/test:
  test: |
    trap 'echo "trap fired" > $TMT_TEST_DATA/trap-proof; exit 0' HUP
    echo "waiting for signal..."
    sleep 300 &
    wait $!
    echo "sleep finished normally"

Pull Request Checklist

  • implement the feature
  • write the documentation
  • extend the test coverage
  • update the specification
  • adjust plugin docstring
  • modify the json schema
  • mention the version
  • include a release note

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request attempts to send a HUP signal to all processes in the podman container before stopping it. Use bash -c 'kill -HUP -1' to avoid dependency on an external kill binary, and catch tmt.utils.RunError instead of a broad Exception.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tmt/steps/provision/podman.py Outdated
@comps comps force-pushed the podman_kill_hup branch from fc23a69 to 43044c3 Compare June 9, 2026 16:08

@LecrisUT LecrisUT left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, this might explain why I sometimes had issues killing a tmt run

@comps comps force-pushed the podman_kill_hup branch from 43044c3 to d7e7985 Compare June 9, 2026 16:11
The original `container stop --time` doesn't actually do anything
useful, podman signals and waits only for the PID 1 to stop
(an interactive bash session typically for a container's CMD),
not the exec'd test.

The SIGHUP is useful for tests that set up external state that
needs to be shut down safely, and even for (most) use cases where
the container is thrown away, the extra `kill` doesn't add much
delay.

Note that this is only a partial fix to the problem, because
`podman stop --time N` always waits the full duration, because
the interactive bash has SIG_IGN for SIGTERM (and even if it hadn't,
it's running as PID 1 with kernel protection), so it sticks around.
A proper fix would probably be to re-engineer the whole stopping
logic, and perhaps overriding container CMD / ENTRYPOINT and maybe
not closing the test output immediately.

For now, this small fix keeps backwards compatibility for people
who rely on the (public API) `stop-time` to continue working as
it now does, while allowing tests to trap HUP.

Note that I chose SIGHUP because sending SIGTERM to a hierarchy
of processes (test wrapper, etc.) causes all of them to die at
the same time, triggering SIGHUP anyway in the child processes
(because tmt always allocates a TTY).
Also, arguably, it's the more appropriate signal here.

Related issue: teemtee#1362

Signed-off-by: Jiri Jaburek <comps@nomail.dom>
@comps comps force-pushed the podman_kill_hup branch from d7e7985 to a0eea62 Compare June 9, 2026 16:14
@happz

happz commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

@comps why SIGHUP? Does it have some special meaning to podman or shell sessions? I mean, it's not apparent why this signal would make such a difference, why not send e.g. SIGUSR1? E.g. "while allowing tests to trap HUP" - why they can't trap SIGUSR1? I know there is a special meaning to this signal, but 1. I'm not sure I understand it all, and 2. I'm absolutely sure in 6 months I will not recall it at all, mentioning the importance of SIGHUP in the commit message would be much appreciated. It looks as being picked arbitrarily, but that's not true.

@@ -646,6 +614,9 @@ def stop(self) -> None:
"""

if self.container:
with contextlib.suppress(tmt.utils.RunError):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I for one would be interested in knowing that tmt tried this command, and it failed. A warning would be nice, try/except + tmt.utils.show_exception_as_warning() would take care of it without terminating the effort.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try this too,

try:
    self.podman(Command('exec', self.container, 'kill', '-HUP', '-1'))
except tmt.utils.RunError as exc:
    tmt.utils.show_exception_as_warning(
        exception=exc,
        message="Failed to send SIGHUP to all processes while stopping.",
        logger=self._logger,
    )

and it works nicely (tested by replacing -1 with 12345 to make it fail):

warn: Failed to send SIGHUP to all processes while stopping.
warn: 
warn: Command 'podman exec tmt-022-SKDSfdeE kill -HUP 12345' returned 1.
warn: 
warn: # stderr (1 lines)
warn: # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warn: kill: sending signal to 12345 failed: No such process 
warn: # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warn: 

but the problem is that for a test that does not leave background processes running, ie. a regular /bin/true, the "kill everything except PID 1" command doesn't have anything to kill, so it just fails with

kill: sending signal to -1 failed: No such process

So I can either add some logic for counting /proc/[0-9]* or spawning a shell with || true, or the PID extraction I mentioned for SIGTERM, or just ignore the failure for now.

Any suggestions?

@comps

comps commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

@comps why SIGHUP? Does it have some special meaning to podman or shell sessions? I mean, it's not apparent why this signal would make such a difference, why not send e.g. SIGUSR1? E.g. "while allowing tests to trap HUP" - why they can't trap SIGUSR1? I know there is a special meaning to this signal, but 1. I'm not sure I understand it all, and 2. I'm absolutely sure in 6 months I will not recall it at all, mentioning the importance of SIGHUP in the commit message would be much appreciated. It looks as being picked arbitrarily, but that's not true.

I think the commit message already does mention it:

Note that I chose SIGHUP because sending SIGTERM to a hierarchy
of processes (test wrapper, etc.) causes all of them to die at
the same time, triggering SIGHUP anyway in the child processes
(because tmt always allocates a TTY).

SIGHUP is also sent on any "hang up", historically a telephone line hang-up. These days it's sent by the kernel to the foreground process group (set by tcsetpgrp()) on a TTY if the session leader exits - typically when you run a program via ssh and the ssh disconnects. That's why I implied it's still a common signal for programs to trap for cleanup.

The PR description also mentions a version using SIGTERM, which would need to be more targeted in extracting the PID of the test process specifically (on the remote end), but I could re-do it like that if SIGHUP is a problem.

I would avoid SIGUSR1 as it typically has very specific per-application meaning, like "print dd statistics on console while copying".

@happz

happz commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@comps why SIGHUP? Does it have some special meaning to podman or shell sessions? I mean, it's not apparent why this signal would make such a difference, why not send e.g. SIGUSR1? E.g. "while allowing tests to trap HUP" - why they can't trap SIGUSR1? I know there is a special meaning to this signal, but 1. I'm not sure I understand it all, and 2. I'm absolutely sure in 6 months I will not recall it at all, mentioning the importance of SIGHUP in the commit message would be much appreciated. It looks as being picked arbitrarily, but that's not true.

I think the commit message already does mention it:

Note that I chose SIGHUP because sending SIGTERM to a hierarchy
of processes (test wrapper, etc.) causes all of them to die at
the same time, triggering SIGHUP anyway in the child processes
(because tmt always allocates a TTY).

Well, yes and no. It does mention why you picked SIGHUP over SIGTERM, but it does not really say why no other signal was eligible.

SIGHUP is also sent on any "hang up", historically a telephone line hang-up. These days it's sent by the kernel to the foreground process group (set by tcsetpgrp()) on a TTY if the session leader exits - typically when you run a program via ssh and the ssh disconnects. That's why I implied it's still a common signal for programs to trap for cleanup.

^^ This is the missing piece, the bit about foreground process group and SSH example. It explains that SIGHUP has a special meaning rather than being an arbitrary selected signal whose only qualification was not being SIGTERM.

I would avoid SIGUSR1 as it typically has very specific per-application meaning, like "print dd statistics on console while copying".

Yes, it's meaning is very much defined by the application itself, and that was my point. I was missing the "Why SIGHUP and not let's say SIGUSR1? That's also a signal, right?" bit.

@github-project-automation github-project-automation Bot moved this to backlog in planning Jun 10, 2026
@therazix therazix moved this from backlog to review in planning Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: review

Development

Successfully merging this pull request may close these issues.

4 participants