Discussion:
Proposed Patch for Zombies Left by Dpi_start_dpid() in dpi.c
JG
2013-08-27 19:15:14 UTC
Permalink
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.

I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).

######## BEGIN PATCH ########

--- a/src/IO/dpi.c 2013-01-27 12:26:38.000000000 -0500
+++ b/src/IO/dpi.c 2013-08-27 14:58:07.995235867 -0400
@@ -33,6 +33,8 @@
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <netdb.h>
+#include <sys/wait.h>
+#include <signal.h>

#include "../msg.h"
#include "../klist.h"
@@ -331,16 +333,33 @@
}
}

+static void Sigchld_handler (int signum) {
+ int status;
+ waitpid (-1, &status, WNOHANG);
+ }
+
/*
* Start dpid.
* Return: 0 starting now, 1 Error.
*/
static int Dpi_start_dpid(void)
{
+ static int install_sigchld_handler_flag = TRUE;
pid_t pid;
int st_pipe[2], ret = 1;
char *answer;

+ if ( install_sigchld_handler_flag ) {
+ /* Install SIGCHLD handler. */
+ struct sigaction sa;
+ sa.sa_handler = Sigchld_handler;
+ sigemptyset(&sa.sa_mask);
+ sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
+ if (sigaction(SIGCHLD, &sa, 0))
+ MSG ("Unable to install SIGCHLD handler.\n");
+ install_sigchld_handler_flag = FALSE;
+ }
+
/* create a pipe to track our child's status */
if (pipe(st_pipe))
return 1;

######## END PATCH ########
Johannes Hofmann
2013-08-28 17:51:34 UTC
Permalink
Hi JG,
Post by JG
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.
I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).
I can reproduce the issue here by killing dpid manually.
For me the following also fixes it:

diff -r 659dc205c377 src/dillo.cc
--- a/src/dillo.cc Wed Aug 21 11:42:38 2013 +0200
+++ b/src/dillo.cc Wed Aug 28 19:48:10 2013 +0200
@@ -329,6 +329,7 @@

// Some OSes exit dillo without this (not GNU/Linux).
signal(SIGPIPE, SIG_IGN);
+ signal(SIGCHLD, SIG_IGN);

/* Handle command line options */
opt_argv = dNew0(char*, numOptions(Options) + 1);

Can you please check whether it works for you too? It would have the
advantage that we have all signal related stuff in one place.

Cheers,
Johannes
Jorge Arellano Cid
2013-08-29 00:48:35 UTC
Permalink
Hi,
Post by Johannes Hofmann
Hi JG,
Post by JG
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.
I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).
I can reproduce the issue here by killing dpid manually.
diff -r 659dc205c377 src/dillo.cc
--- a/src/dillo.cc Wed Aug 21 11:42:38 2013 +0200
+++ b/src/dillo.cc Wed Aug 28 19:48:10 2013 +0200
@@ -329,6 +329,7 @@
// Some OSes exit dillo without this (not GNU/Linux).
signal(SIGPIPE, SIG_IGN);
+ signal(SIGCHLD, SIG_IGN);
/* Handle command line options */
opt_argv = dNew0(char*, numOptions(Options) + 1);
Can you please check whether it works for you too? It would have the
advantage that we have all signal related stuff in one place.
Oh, I'm looking at it now too!
For me this is an old TODO: with subtle issues lurking.


Somehow, setting SIG_IGN to a child rings me wrong... :-)


After some digging (Manual page sigaction(2)):

...

"A child created via fork(2) inherits a copy of its parent's
signal dispositions. During an execve(2), the dispositions of
handled signals are reset to the default; the dispositions of
ignored signals are left unchanged."

...

" POSIX.1-1990 disallowed setting the action for SIGCHLD to
SIG_IGN. POSIX.1-2001 allows this possibility, so that ignoring
SIGCHLD can be used to prevent the creation of zombies (see
wait(2)). Nevertheless, the historical BSD and System V behaviors
for ignoring SIGCHLD differ, so that the only completely portable
method of ensuring that terminated children do not become zombies
is to catch the SIGCHLD signal and perform a wait(2) or similar."


So, I'm more inclined to using the same method as in dpid and
downloads.dpi ...
--
Cheers
Jorge.-
Johannes Hofmann
2013-08-29 07:13:49 UTC
Permalink
Hi Jorge,
Post by Jorge Arellano Cid
Hi,
Post by Johannes Hofmann
Hi JG,
Post by JG
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.
I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).
I can reproduce the issue here by killing dpid manually.
diff -r 659dc205c377 src/dillo.cc
--- a/src/dillo.cc Wed Aug 21 11:42:38 2013 +0200
+++ b/src/dillo.cc Wed Aug 28 19:48:10 2013 +0200
@@ -329,6 +329,7 @@
// Some OSes exit dillo without this (not GNU/Linux).
signal(SIGPIPE, SIG_IGN);
+ signal(SIGCHLD, SIG_IGN);
/* Handle command line options */
opt_argv = dNew0(char*, numOptions(Options) + 1);
Can you please check whether it works for you too? It would have the
advantage that we have all signal related stuff in one place.
Oh, I'm looking at it now too!
For me this is an old TODO: with subtle issues lurking.
Somehow, setting SIG_IGN to a child rings me wrong... :-)
...
"A child created via fork(2) inherits a copy of its parent's
signal dispositions. During an execve(2), the dispositions of
handled signals are reset to the default; the dispositions of
ignored signals are left unchanged."
I guess this is no problem, as dpid explicitely adds a signal
handler for SIGCHLD again, but I'm happy if you take care of the
whole issue :-)

Cheers,
Johannes
Jorge Arellano Cid
2013-08-30 16:41:41 UTC
Permalink
Post by Johannes Hofmann
Hi Jorge,
Post by Jorge Arellano Cid
Hi,
Post by Johannes Hofmann
Hi JG,
Post by JG
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.
I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).
I can reproduce the issue here by killing dpid manually.
diff -r 659dc205c377 src/dillo.cc
--- a/src/dillo.cc Wed Aug 21 11:42:38 2013 +0200
+++ b/src/dillo.cc Wed Aug 28 19:48:10 2013 +0200
@@ -329,6 +329,7 @@
// Some OSes exit dillo without this (not GNU/Linux).
signal(SIGPIPE, SIG_IGN);
+ signal(SIGCHLD, SIG_IGN);
/* Handle command line options */
opt_argv = dNew0(char*, numOptions(Options) + 1);
Can you please check whether it works for you too? It would have the
advantage that we have all signal related stuff in one place.
Oh, I'm looking at it now too!
For me this is an old TODO: with subtle issues lurking.
Somehow, setting SIG_IGN to a child rings me wrong... :-)
...
"A child created via fork(2) inherits a copy of its parent's
signal dispositions. During an execve(2), the dispositions of
handled signals are reset to the default; the dispositions of
ignored signals are left unchanged."
I guess this is no problem, as dpid explicitely adds a signal
handler for SIGCHLD again, but I'm happy if you take care of the
whole issue :-)
The attached program is a stress test for SIGCHLD. The idea is to
find a formula that works well even in such cases. IIRC both of you
have access to BSD machines so please test it there.

The program spawns 255 children which sleep 5 seconds and
terminate generating a SIGCHLD avalanche, which should be handled
cleanly. It works OK in Debian.

$gcc -W -Wall sigcld.c -o sigcld
$./sigcld

in another terminal (during the 5 secs. sleep window):

$ps aux|grep sigcld

(shows the long list)

then while main is Sleeping 10 seconds:

$ps aux|grep sigcld

(shows only one sigcld process)

after it exits:

$ps aux|grep sigcld

(no process left)

TIA.
--
Cheers
Jorge.-
Johannes Hofmann
2013-09-01 18:49:06 UTC
Permalink
Hi Jorge,
Post by Jorge Arellano Cid
Post by Johannes Hofmann
Hi Jorge,
Post by Jorge Arellano Cid
Hi,
Post by Johannes Hofmann
Hi JG,
Post by JG
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.
I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).
I can reproduce the issue here by killing dpid manually.
diff -r 659dc205c377 src/dillo.cc
--- a/src/dillo.cc Wed Aug 21 11:42:38 2013 +0200
+++ b/src/dillo.cc Wed Aug 28 19:48:10 2013 +0200
@@ -329,6 +329,7 @@
// Some OSes exit dillo without this (not GNU/Linux).
signal(SIGPIPE, SIG_IGN);
+ signal(SIGCHLD, SIG_IGN);
/* Handle command line options */
opt_argv = dNew0(char*, numOptions(Options) + 1);
Can you please check whether it works for you too? It would have the
advantage that we have all signal related stuff in one place.
Oh, I'm looking at it now too!
For me this is an old TODO: with subtle issues lurking.
Somehow, setting SIG_IGN to a child rings me wrong... :-)
...
"A child created via fork(2) inherits a copy of its parent's
signal dispositions. During an execve(2), the dispositions of
handled signals are reset to the default; the dispositions of
ignored signals are left unchanged."
I guess this is no problem, as dpid explicitely adds a signal
handler for SIGCHLD again, but I'm happy if you take care of the
whole issue :-)
The attached program is a stress test for SIGCHLD. The idea is to
find a formula that works well even in such cases. IIRC both of you
have access to BSD machines so please test it there.
The program spawns 255 children which sleep 5 seconds and
terminate generating a SIGCHLD avalanche, which should be handled
cleanly. It works OK in Debian.
$gcc -W -Wall sigcld.c -o sigcld
$./sigcld
$ps aux|grep sigcld
(shows the long list)
$ps aux|grep sigcld
(shows only one sigcld process)
$ps aux|grep sigcld
(no process left)
Works fine on DragonFlyBSD with the #include <wait.h> line removed.

Cheers,
Johannes
Jorge Arellano Cid
2013-09-01 18:58:02 UTC
Permalink
Post by Johannes Hofmann
Hi Jorge,
Post by Jorge Arellano Cid
Post by Johannes Hofmann
Hi Jorge,
Post by Jorge Arellano Cid
Hi,
Post by Johannes Hofmann
Hi JG,
Post by JG
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.
I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).
I can reproduce the issue here by killing dpid manually.
diff -r 659dc205c377 src/dillo.cc
--- a/src/dillo.cc Wed Aug 21 11:42:38 2013 +0200
+++ b/src/dillo.cc Wed Aug 28 19:48:10 2013 +0200
@@ -329,6 +329,7 @@
// Some OSes exit dillo without this (not GNU/Linux).
signal(SIGPIPE, SIG_IGN);
+ signal(SIGCHLD, SIG_IGN);
/* Handle command line options */
opt_argv = dNew0(char*, numOptions(Options) + 1);
Can you please check whether it works for you too? It would have the
advantage that we have all signal related stuff in one place.
Oh, I'm looking at it now too!
For me this is an old TODO: with subtle issues lurking.
Somehow, setting SIG_IGN to a child rings me wrong... :-)
...
"A child created via fork(2) inherits a copy of its parent's
signal dispositions. During an execve(2), the dispositions of
handled signals are reset to the default; the dispositions of
ignored signals are left unchanged."
I guess this is no problem, as dpid explicitely adds a signal
handler for SIGCHLD again, but I'm happy if you take care of the
whole issue :-)
The attached program is a stress test for SIGCHLD. The idea is to
find a formula that works well even in such cases. IIRC both of you
have access to BSD machines so please test it there.
The program spawns 255 children which sleep 5 seconds and
terminate generating a SIGCHLD avalanche, which should be handled
cleanly. It works OK in Debian.
$gcc -W -Wall sigcld.c -o sigcld
$./sigcld
$ps aux|grep sigcld
(shows the long list)
$ps aux|grep sigcld
(shows only one sigcld process)
$ps aux|grep sigcld
(no process left)
Works fine on DragonFlyBSD with the #include <wait.h> line removed.
Good!

I'll patch and commit soon.
--
Cheers
Jorge.-
Jorge Arellano Cid
2013-09-02 15:33:43 UTC
Permalink
Post by Jorge Arellano Cid
Post by Johannes Hofmann
Hi Jorge,
[...]
Works fine on DragonFlyBSD with the #include <wait.h> line removed.
Good!
I'll patch and commit soon.
Committed.
--
Cheers
Jorge.-
JG
2013-08-29 00:28:56 UTC
Permalink
Post by Johannes Hofmann
Hi JG,
Post by JG
It seems that the dpid children forked by Dpi_start_dpid() in dpi.c
are not reaped when they terminate themselves after a period of
inactivity while the parent dillo process is still running. The patch
below is simply to reap these zombies. The SIGCHLD handler should
probably do more, but I am not yet familiar enough with Dillo to know
what.
I believe this patch adheres to the Dillo rules of coding style, but
am new to the list and a novice at Dillo patching. And hence also
apologies to Jorge for this duplication of a previous communication by
private email (although with one small change to get the patch to
compile under OpenBSD as well as Linux).
I can reproduce the issue here by killing dpid manually.
diff -r 659dc205c377 src/dillo.cc
--- a/src/dillo.cc Wed Aug 21 11:42:38 2013 +0200
+++ b/src/dillo.cc Wed Aug 28 19:48:10 2013 +0200
@@ -329,6 +329,7 @@
// Some OSes exit dillo without this (not GNU/Linux).
signal(SIGPIPE, SIG_IGN);
+ signal(SIGCHLD, SIG_IGN);
/* Handle command line options */
opt_argv = dNew0(char*, numOptions(Options) + 1);
Can you please check whether it works for you too? It would have the
advantage that we have all signal related stuff in one place.
Cheers,
Johannes
Thank you: your solution is of course not only correct but far simpler
and better.

(Though I thought that in general sigaction() is to be preferred to
signal()?)
John Gaffney
2013-08-31 00:44:47 UTC
Permalink
[...]
Post by Jorge Arellano Cid
The attached program is a stress test for SIGCHLD. The idea is to
find a formula that works well even in such cases. IIRC both of you
have access to BSD machines so please test it there.
The program spawns 255 children which sleep 5 seconds and
terminate generating a SIGCHLD avalanche, which should be handled
cleanly. It works OK in Debian.
[...]

1) OpenBSD is very strict about compliance with standards, so that,
whereas on my (and I believe most if not all) Linux systems wait.h
occurs redundantly in /usr/include/ and /usr/include/sys, on
OpenBSD wait.h is solely in /usr/include/sys and your code will not
compile unless the `#include<wait.h>' line is removed. Then `gcc
-W -Wall' compiles it without complaint.

2) OpenSBD is not so extravagant with the resources allocated to
unprivileged users (max 128 processes). But spawning 64 rather
than 256 children the test is passed with the expected behavior on
OpenBSD 5.2 GENERIC.MP#368 amd64.

The OpenBSD 5.2 ports include Dillo 2.2 and FLTK2, but I just recently
compiled FLTK 1.3.2 and Dillo 3.0.3 from their unmodified source on
that machine without (IIRC) any errors. If you were curious more
generally to see how they are modifying the Dillo source, the OpenBSD
patches for Dillo 3.0.3 (the version currently in ports for OpenBSD
5.3) may be found at

http://www.openbsd.org/cgi-bin/cvsweb/ports/www/dillo/patches/
Jorge Arellano Cid
2013-08-31 19:41:42 UTC
Permalink
Post by John Gaffney
[...]
Post by Jorge Arellano Cid
The attached program is a stress test for SIGCHLD. The idea is to
find a formula that works well even in such cases. IIRC both of you
have access to BSD machines so please test it there.
The program spawns 255 children which sleep 5 seconds and
terminate generating a SIGCHLD avalanche, which should be handled
cleanly. It works OK in Debian.
[...]
1) OpenBSD is very strict about compliance with standards, so that,
whereas on my (and I believe most if not all) Linux systems wait.h
occurs redundantly in /usr/include/ and /usr/include/sys, on
OpenBSD wait.h is solely in /usr/include/sys and your code will not
compile unless the `#include<wait.h>' line is removed. Then `gcc
-W -Wall' compiles it without complaint.
2) OpenSBD is not so extravagant with the resources allocated to
unprivileged users (max 128 processes). But spawning 64 rather
than 256 children the test is passed with the expected behavior on
OpenBSD 5.2 GENERIC.MP#368 amd64.
The OpenBSD 5.2 ports include Dillo 2.2 and FLTK2, but I just recently
compiled FLTK 1.3.2 and Dillo 3.0.3 from their unmodified source on
that machine without (IIRC) any errors. If you were curious more
generally to see how they are modifying the Dillo source, the OpenBSD
patches for Dillo 3.0.3 (the version currently in ports for OpenBSD
5.3) may be found at
http://www.openbsd.org/cgi-bin/cvsweb/ports/www/dillo/patches/
Thanks a lot, very informative.
--
Cheers
Jorge.-
Loading...