Discussion:
Dpi_connect_socket retry
p***@public.gmane.org
2012-12-09 10:58:43 UTC
Permalink
Dpi_connect_socket function from src/IO/dpi.c (also duplicated in
test/cookies.c) has retry argument. It is set to TRUE on first call,
and Dpi_connect_socket calls itself recursively with retry == FALSE in
case of error.

This retry function is not working. If connection fails, at line 645
sock_fd is set, but ret is still set to -1. Even if the second attempt
to connect is successful, -1 is returned and conected sock_fd is lost.

It is possible to fix it by adding "ret = sock_fd", but I think it is
better to just remove retries completely. It is unlikely that
connecting using loopback interface will fail on first try and succeed
on retry.

Also, comment says "the server closes sock_fd on error". However,
sock_fd should still be closed by client to free allocated file
descriptor. OS don't know if Dillo is going to read from it (in which
case error such as ECONNRESET should happen) so file descriptor can't
be reused.

File descriptor should be closed in case of error in a_Dpip_build_cmd
too.

Patch to fix this function and its calls in src/IO/dpi.c and
test/cookies.c is attached.
Jorge Arellano Cid
2012-12-16 22:43:19 UTC
Permalink
Hi,

Finally I had the time to review this issue in more detail...
Post by p***@public.gmane.org
Dpi_connect_socket function from src/IO/dpi.c (also duplicated in
test/cookies.c) has retry argument. It is set to TRUE on first call,
and Dpi_connect_socket calls itself recursively with retry == FALSE in
case of error.
This retry function is not working. If connection fails, at line 645
sock_fd is set, but ret is still set to -1. Even if the second attempt
to connect is successful, -1 is returned and conected sock_fd is lost.
It is possible to fix it by adding "ret = sock_fd", but I think it is
better to just remove retries completely. It is unlikely that
connecting using loopback interface will fail on first try and succeed
on retry.
Yes, this code is obsolete now. It was meant for Unix domain
sockets (when dpis/dpid used them). Sometimes the filename
lingered in the filesystem after a dpi or server crash. With
unlink&retry it was possible to recover then.

Details in patchset: 7faa2c7a544f (Nov, 2009 ;)

Removed in the repo (see below).
Post by p***@public.gmane.org
Also, comment says "the server closes sock_fd on error". However,
sock_fd should still be closed by client to free allocated file
descriptor. OS don't know if Dillo is going to read from it (in which
case error such as ECONNRESET should happen) so file descriptor can't
be reused.
File descriptor should be closed in case of error in a_Dpip_build_cmd
too.
The file descriptor is properly closed by its CCC afterwards
(Visible by defining VERBOSE in chain.c).

The CCC manages the error propagation/handling, in both the
requesting and answer branches.
Post by p***@public.gmane.org
Patch to fix this function and its calls in src/IO/dpi.c and
test/cookies.c is attached.
No need to fix, it works properly.
--
Cheers
Jorge.-
p***@public.gmane.org
2012-12-20 14:49:49 UTC
Permalink
Post by Jorge Arellano Cid
Post by p***@public.gmane.org
Also, comment says "the server closes sock_fd on error". However,
sock_fd should still be closed by client to free allocated file
descriptor. OS don't know if Dillo is going to read from it (in which
case error such as ECONNRESET should happen) so file descriptor can't
be reused.
File descriptor should be closed in case of error in a_Dpip_build_cmd
too.
The file descriptor is properly closed by its CCC afterwards
(Visible by defining VERBOSE in chain.c).
The CCC manages the error propagation/handling, in both the
requesting and answer branches.
File descriptor is properly closed if there were no errors. But if
there is an error in a_Dpip_build_cmd or Dpi_blocking_write, sock_fd
should be closed before returning from Dpi_connect_socket.

Dpi_connect_socket calls a_Dpip_build_cmd to build command and
Dpi_blocking_write to write this command to sock_fd. a_Dpip_build_cmd
does not accept sock_fd as its argument so it can't close it.
Dpi_blocking_write only writes to fd, but does not close it on error.

So if one of these two function fails, Dpi_connect_socket should close
sock_fd before returning -1. In the current state it just returns -1
and leave sock_fd open.

By the time Dpi_connect_socket returns, sock_fd is lost and will not
be closed by any code outside Dpi_connect_socket.

Attached patch adds code to close sock_fd in case of error.
Jorge Arellano Cid
2012-12-27 19:37:36 UTC
Permalink
Hi,
Post by p***@public.gmane.org
Post by Jorge Arellano Cid
Post by p***@public.gmane.org
Also, comment says "the server closes sock_fd on error". However,
sock_fd should still be closed by client to free allocated file
descriptor. OS don't know if Dillo is going to read from it (in which
case error such as ECONNRESET should happen) so file descriptor can't
be reused.
File descriptor should be closed in case of error in a_Dpip_build_cmd
too.
The file descriptor is properly closed by its CCC afterwards
(Visible by defining VERBOSE in chain.c).
The CCC manages the error propagation/handling, in both the
requesting and answer branches.
File descriptor is properly closed if there were no errors. But if
there is an error in a_Dpip_build_cmd or Dpi_blocking_write, sock_fd
should be closed before returning from Dpi_connect_socket.
Dpi_connect_socket calls a_Dpip_build_cmd to build command and
Dpi_blocking_write to write this command to sock_fd. a_Dpip_build_cmd
does not accept sock_fd as its argument so it can't close it.
Dpi_blocking_write only writes to fd, but does not close it on error.
So if one of these two function fails, Dpi_connect_socket should close
sock_fd before returning -1. In the current state it just returns -1
and leave sock_fd open.
By the time Dpi_connect_socket returns, sock_fd is lost and will not
be closed by any code outside Dpi_connect_socket.
Attached patch adds code to close sock_fd in case of error.
As from the patch you sent (same as before), and the
explanation you give (ignoring my explanation), I guess you're
making conclusions out of reading the code in
Dpi_connect_socket() *only*.

Have you checked what you state as fact? (i.e. that the FD is
leaked and not closed by the CCC). FWIW, I did check before
writing my previous answer.

Of course I'm human and do make mistakes, so if this is the
case: can you please verify and show how to reproduce the bug?
--
Cheers
Jorge.-
p***@public.gmane.org
2012-12-29 01:44:22 UTC
Permalink
Post by Jorge Arellano Cid
As from the patch you sent (same as before), and the
explanation you give (ignoring my explanation), I guess you're
making conclusions out of reading the code in
Dpi_connect_socket() *only*.
I made conclusions from reading Dpi_connect_socket() only, because in
the case of error in Dpi_blocking_write file descriptor is left open
and not passed anywhere else. If it was passed somewhere else, I would
read the code there and check if file descriptor was closed there.
Post by Jorge Arellano Cid
Have you checked what you state as fact? (i.e. that the FD is
leaked and not closed by the CCC). FWIW, I did check before
writing my previous answer.
Patch that I did for checking is attached. It replaces
Dpi_blocking_write call with -1 to simulate fault in
Dpi_blocking_write for Dpi_connect_socket().

Introducing the error in Dpi_blocking_write for all functions doesn't
work, in this case Dpi_connect_socket is not called at all (because
Dpi_start_dpid fails).

With this patch applied, I recompiled dillo and pressed "bookmarks"
button several times to trigger dpi code. Each time I pressed
"bookmarks" button, new descriptor was leaked:

% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6 7
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6 7 8

Then I applied the patch I sent before (with hg qpush) on top of the
fault patch and rebuilt dillo: "make clean && make". I ran dillo again
with ./src/dillo and did the same test with bookmarks button:

% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
Jorge Arellano Cid
2012-12-30 22:11:28 UTC
Permalink
Hi,
Post by p***@public.gmane.org
Post by Jorge Arellano Cid
As from the patch you sent (same as before), and the
explanation you give (ignoring my explanation), I guess you're
making conclusions out of reading the code in
Dpi_connect_socket() *only*.
I made conclusions from reading Dpi_connect_socket() only, because in
the case of error in Dpi_blocking_write file descriptor is left open
and not passed anywhere else. If it was passed somewhere else, I would
read the code there and check if file descriptor was closed there.
Post by Jorge Arellano Cid
Have you checked what you state as fact? (i.e. that the FD is
leaked and not closed by the CCC). FWIW, I did check before
writing my previous answer.
Patch that I did for checking is attached. It replaces
Dpi_blocking_write call with -1 to simulate fault in
Dpi_blocking_write for Dpi_connect_socket().
Introducing the error in Dpi_blocking_write for all functions doesn't
work, in this case Dpi_connect_socket is not called at all (because
Dpi_start_dpid fails).
With this patch applied, I recompiled dillo and pressed "bookmarks"
button several times to trigger dpi code. Each time I pressed
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6 7
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6 7 8
Then I applied the patch I sent before (with hg qpush) on top of the
fault patch and rebuilt dillo: "make clean && make". I ran dillo again
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
diff -r 8e10859b2aab src/IO/dpi.c
--- a/src/IO/dpi.c
+++ b/src/IO/dpi.c
@@ -641,7 +641,7 @@
/* send authentication Key (the server closes sock_fd on error) */
} else if (!(cmd = a_Dpip_build_cmd("cmd=%s msg=%s", "auth", SharedKey))) {
MSG_ERR("[Dpi_connect_socket] Can't make auth message.\n");
- } else if (Dpi_blocking_write(sock_fd, cmd, strlen(cmd)) == -1) {
+ } else if (-1 == -1) {
MSG_ERR("[Dpi_connect_socket] Can't send auth message.\n");
} else {
ret = sock_fd;
Ah, OK, now I see what you mean.

There's a semantic ambiguity, the comment:

/* send authentication Key (the server closes sock_fd on error) */

means: "if there's an error with the authentication" (I know because
I wrote the comment).

(and as the patch removed this comment) I made the same test but with:

- } else if (!(cmd = a_Dpip_build_cmd("cmd=%s msg=%s", "auth", SharedKey))) {
+ } else if (!(cmd = a_Dpip_build_cmd("cmd=%s msg=%s","auth",SharedKey+1))) {

and the CCC closes the file descriptor as expected.


The error case you noticed exists, and although highly unlikely
to happen, requires some thought. As from the test above, the
socket is connected, then nothing is written to simulate an
error, but the OS never sees an error, as it doesn't exist, and
doesn't notify either end (IOW, it leaves the socket passively
expecting for data to come).

Now, the canonical way to handle the whole process should have
been to build the CCC upon connect(), then try to authenticate,
and let the CCC handle any errors that may come, but this is
probably too much code for little gain, so it was all coded in
Dpi_connect_socket().

The problem with closing FDs outside the CCC is that it can
introduce race conditions, but in this case the CCC is partially
built, so your patch suggestion may be OK.

I'll have to give it some more thought and tests...
--
Cheers
Jorge.-
Jorge Arellano Cid
2013-01-01 22:38:46 UTC
Permalink
Post by Jorge Arellano Cid
Hi,
Post by p***@public.gmane.org
Post by Jorge Arellano Cid
As from the patch you sent (same as before), and the
explanation you give (ignoring my explanation), I guess you're
making conclusions out of reading the code in
Dpi_connect_socket() *only*.
I made conclusions from reading Dpi_connect_socket() only, because in
the case of error in Dpi_blocking_write file descriptor is left open
and not passed anywhere else. If it was passed somewhere else, I would
read the code there and check if file descriptor was closed there.
Post by Jorge Arellano Cid
Have you checked what you state as fact? (i.e. that the FD is
leaked and not closed by the CCC). FWIW, I did check before
writing my previous answer.
Patch that I did for checking is attached. It replaces
Dpi_blocking_write call with -1 to simulate fault in
Dpi_blocking_write for Dpi_connect_socket().
Introducing the error in Dpi_blocking_write for all functions doesn't
work, in this case Dpi_connect_socket is not called at all (because
Dpi_start_dpid fails).
With this patch applied, I recompiled dillo and pressed "bookmarks"
button several times to trigger dpi code. Each time I pressed
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6 7
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5 6 7 8
Then I applied the patch I sent before (with hg qpush) on top of the
fault patch and rebuilt dillo: "make clean && make". I ran dillo again
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
% ls /proc/$(pidof dillo)/fd
0 1 2 3 4 5
diff -r 8e10859b2aab src/IO/dpi.c
--- a/src/IO/dpi.c
+++ b/src/IO/dpi.c
@@ -641,7 +641,7 @@
/* send authentication Key (the server closes sock_fd on error) */
} else if (!(cmd = a_Dpip_build_cmd("cmd=%s msg=%s", "auth", SharedKey))) {
MSG_ERR("[Dpi_connect_socket] Can't make auth message.\n");
- } else if (Dpi_blocking_write(sock_fd, cmd, strlen(cmd)) == -1) {
+ } else if (-1 == -1) {
MSG_ERR("[Dpi_connect_socket] Can't send auth message.\n");
} else {
ret = sock_fd;
Ah, OK, now I see what you mean.
/* send authentication Key (the server closes sock_fd on error) */
means: "if there's an error with the authentication" (I know because
I wrote the comment).
- } else if (!(cmd = a_Dpip_build_cmd("cmd=%s msg=%s", "auth", SharedKey))) {
+ } else if (!(cmd = a_Dpip_build_cmd("cmd=%s msg=%s","auth",SharedKey+1))) {
and the CCC closes the file descriptor as expected.
The error case you noticed exists, and although highly unlikely
to happen, requires some thought. As from the test above, the
socket is connected, then nothing is written to simulate an
error, but the OS never sees an error, as it doesn't exist, and
doesn't notify either end (IOW, it leaves the socket passively
expecting for data to come).
Now, the canonical way to handle the whole process should have
been to build the CCC upon connect(), then try to authenticate,
and let the CCC handle any errors that may come, but this is
probably too much code for little gain, so it was all coded in
Dpi_connect_socket().
The problem with closing FDs outside the CCC is that it can
introduce race conditions, but in this case the CCC is partially
built, so your patch suggestion may be OK.
I'll have to give it some more thought and tests...
OK, just committed a slightly modified and bigger patch that
also corrects the wrong error error message given (CCC side).

Please check how it works, and send feedback.
--
Cheers
Jorge.-
Loading...