Segfault when opening Dillo bugs database
Alexander Voigt
2012-12-14 13:11:16 UTC
Dear Dillo developers,

with the current Dillo development version 2672:4d0bdcf10ee7 (Fri Dec
14 12:24:54 2012 +0100) I get a segfault when I try to access the
Dillo bug database.

How to reproduce
================

1. open www.dillo.org/bugtrack/Dquery.html
2. press the "Find it!" button in the "Bug type search" section
3. segfault

gdb output and backtrace
========================

Nav_open_url: new url='http://www.dillo.org/cgi-bin/bugtrack/Dillo_query.cgi?what=all&Submit=Find+It%21'
Connecting to 134.102.206.165
*** [dillo/3.0.2] This should not happen! ***

Program received signal SIGABRT, Aborted.
0x00007ffff5eed1b5 in *__GI_raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
Current language: auto
The current source language is "auto; currently c".
(gdb) bt
#0 0x00007ffff5eed1b5 in *__GI_raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff5eeffc0 in *__GI_abort () at abort.c:92
#2 0x0000000000471616 in assertNotReached (s=<value optimized out>)
at misc.hh:38
#3 _nextUtf8Char (s=<value optimized out>) at unicode.cc:92
#4 0x000000000047163c in lout::unicode::nextUtf8Char (s=0x98d10c "\267",
len=1) at unicode.cc:114
#5 0x000000000044e8d0 in dw::Textblock::addText (this=<value optimized out>,
text=0x98d10c "\267", len=<value optimized out>,
style=<value optimized out>) at textblock.cc:1430
#6 0x000000000042ee54 in Html_process_word (html=0x933960,
buf=<value optimized out>, bufsize=<value optimized out>,
Eof=<value optimized out>) at html.cc:1216
#7 Html_write_raw (html=0x933960, buf=<value optimized out>,
bufsize=<value optimized out>, Eof=<value optimized out>) at html.cc:3923
#8 0x000000000042f6bc in DilloHtml::write (this=0x1bcb,
Buf=<value optimized out>, BufSize=<value optimized out>, Eof=-1)
at html.cc:531
#9 0x0000000000419fd3 in Cache_process_queue (entry=0x847120) at cache.c:1214
#10 0x00000000004167c9 in a_Chain_fcb (Op=7115, Info=<value optimized out>,
Data1=<value optimized out>, Data2=<value optimized out>) at chain.c:114
#11 0x0000000000441cd3 in Dpi_parse_token (Op=<value optimized out>,
Branch=<value optimized out>, Dir=<value optimized out>,
Info=<value optimized out>, Data1=<value optimized out>, Data2=0x0)
at dpi.c:220
#12 Dpi_process_dbuf (Op=<value optimized out>, Branch=<value optimized out>,
Dir=<value optimized out>, Info=<value optimized out>,
Data1=<value optimized out>, Data2=0x0) at dpi.c:339
#13 a_Dpi_ccc (Op=<value optimized out>, Branch=<value optimized out>,
Dir=<value optimized out>, Info=<value optimized out>,
Data1=<value optimized out>, Data2=0x0) at dpi.c:735
#14 0x00000000004167c9 in a_Chain_fcb (Op=7115, Info=<value optimized out>,
Data1=<value optimized out>, Data2=<value optimized out>) at chain.c:114
#15 0x000000000044223f in a_IO_ccc (Op=2, Branch=<value optimized out>, Dir=1,
Info=0x7576a0, Data1=0x883af0, Data2=0x0) at IO.c:428
#16 0x000000000044246d in IO_read (io=0x883af0) at IO.c:197
#17 0x00000000004424ed in IO_callback (io=0x883af0) at IO.c:262
#18 0x00000000004425ec in IO_fd_read_cb (fd=8, data=<value optimized out>)
at IO.c:283
#19 0x0000000000496932 in fl_wait(double) ()
#20 0x000000000047306b in Fl::wait(double) ()
#21 0x00000000004730db in Fl::run() ()
#22 0x000000000040810d in main (argc=1, argv=0x7fffffffe3d8) at dillo.cc:451

Could you please have a look at this issue?

Kind regards,
Alex
Sebastian Geerken
2012-12-14 15:44:18 UTC
The HTML parser passes invalid UTF-8 to dw::Textblock. I will make
nextUtf8Char more robust (of course, dillo should not crash), ...
Done. Both pages work now.

Sebastian
Sebastian Geerken
2012-12-14 15:48:23 UTC
Somehow this post got lost ...
Jorge Arellano Cid
2012-12-14 16:43:11 UTC
Post by Sebastian Geerken
Somehow this post got lost ...
Date: Fri, 14 Dec 2012 15:43:41 +0100
Subject: Re: [Dillo-dev] Dillo early exit
Nav_open_url: new url='http://news.bress.net/search.php?feed=149'
Dns_server [0]: news.bress.net is 67.205.59.213
Connecting to 67.205.59.213
NumPendingStyleSheets=1
*** [dillo/3.0.2] This should not happen! ***
Aborted
This is new, as dillo from Nov 14 doesn't exit.
Any clues?
with the current Dillo development version 2672:4d0bdcf10ee7 (Fri Dec
14 12:24:54 2012 +0100) I get a segfault when I try to access the
Dillo bug database.
[...]
#3 _nextUtf8Char (s=<value optimized out>) at unicode.cc:92
#4 0x000000000047163c in lout::unicode::nextUtf8Char (s=0x98d10c "\267",
len=1) at unicode.cc:114
#5 0x000000000044e8d0 in dw::Textblock::addText (this=<value optimized out>,
text=0x98d10c "\267", len=<value optimized out>,
style=<value optimized out>) at textblock.cc:1430
The HTML parser passes invalid UTF-8 to dw::Textblock. I will make
nextUtf8Char more robust (of course, dillo should not crash), but
000009b0 34 38 22 3e 2d 20 4b 65 79 73 74 72 6f 6b 65 20 |48">- Keystroke |
000009c0 4c 6f 67 67 69 6e 67 20 77 69 74 68 20 42 65 61 |Logging with Bea|
000009d0 63 6f 6e 20 ab 20 53 74 72 61 74 65 67 69 63 20 |con . Strategic |
^^
It seems that the Fltk functions do some checks, and sometimes decode
as ISO-8859-1.
AFAIR from comments in fltk, some utf8 functions dealt with mixed
latin1, utf8 and some windows codec.

They got into it because the mix was inevitable for them.
--
Cheers
Jorge.-
Sebastian Geerken
2012-12-15 14:59:47 UTC
Post by Jorge Arellano Cid
The HTML parser passes invalid UTF-8 to dw::Textblock. I will make
nextUtf8Char more robust (of course, dillo should not crash), but
000009b0 34 38 22 3e 2d 20 4b 65 79 73 74 72 6f 6b 65 20 |48">- Keystroke |
000009c0 4c 6f 67 67 69 6e 67 20 77 69 74 68 20 42 65 61 |Logging with Bea|
000009d0 63 6f 6e 20 ab 20 53 74 72 61 74 65 67 69 63 20 |con . Strategic |
^^
It seems that the Fltk functions do some checks, and sometimes decode
as ISO-8859-1.
AFAIR from comments in fltk, some utf8 functions dealt with mixed
latin1, utf8 and some windows codec.
They got into it because the mix was inevitable for them.
I've modified my code so that it works in a similar way, but I have not
yet dealt with the differences between ISO-8859-1, ISO-8859-15, and
Windows-1252. Anyway, these differences are marginal.
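
To illustrate the idea, here is a minimal sketch of such a lenient
decoder. It is not the actual change to nextUtf8Char; the name
decodeLenient and its signature are hypothetical. It assumes that any
byte which does not start a well-formed UTF-8 sequence is read as a
single ISO-8859-1 character, and it ignores the ISO-8859-15 and
Windows-1252 differences.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT Dillo's real nextUtf8Char.
// Decodes one code point from s[0..len). If the bytes at s do not form a
// well-formed UTF-8 sequence, the first byte is interpreted as ISO-8859-1
// and only that one byte is consumed. *used receives the bytes consumed.
static uint32_t decodeLenient(const unsigned char *s, size_t len, size_t *used)
{
    assert(len > 0);
    const unsigned char b0 = s[0];
    size_t need;    // number of continuation bytes expected
    uint32_t cp;    // code point being assembled
    uint32_t min;   // smallest value allowed for this length (no overlongs)

    if (b0 < 0x80) { *used = 1; return b0; }                       // ASCII
    else if ((b0 & 0xE0) == 0xC0) { need = 1; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { need = 3; cp = b0 & 0x07; min = 0x10000; }
    else { *used = 1; return b0; }   // stray continuation or invalid lead byte

    if (need >= len) { *used = 1; return b0; }                     // truncated

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) { *used = 1; return b0; }       // not 10xxxxxx
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    // Reject overlong forms, surrogates, and values beyond U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        *used = 1;
        return b0;
    }

    *used = need + 1;
    return cp;
}
```

With this, the lone byte "\267" from the backtrace would come out as
U+00B7 instead of hitting an assertion, and the 0xab byte from the hex
dump as U+00AB.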

However, IMO there should be a conversion to clean UTF-8 so that only
a small part of dillo should have to bother about such problems, while
most parts can rely on clean UTF-8. (Something to consider after the
release.)

Sebastian
corvid
2012-12-15 16:50:42 UTC
Post by Sebastian Geerken
However, IMO there should be a conversion to clean UTF-8 so that only
a small part of dillo should have to bother about such problems, while
most parts can rely on clean UTF-8. (Something to consider after the
release.)
I just checked whether we could get free stripping of non-utf-8 by asking
iconv to convert from utf-8 to utf-8 when that's the claimed charset, but
unsurprisingly it didn't do anything.
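
A hand-written pass at the input boundary would be one alternative. The
following is only a sketch under assumptions: validSeqLen and
toCleanUtf8 are hypothetical names, not existing dillo functions, and
bytes that are not part of well-formed UTF-8 are re-encoded as if they
were ISO-8859-1 rather than stripped.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length (1..4) of the well-formed UTF-8 sequence at
// s[0..len), or 0 if the bytes there are not well-formed UTF-8.
static size_t validSeqLen(const unsigned char *s, size_t len)
{
    if (len == 0) return 0;
    const unsigned char b0 = s[0];
    size_t n;
    unsigned long cp, min;

    if (b0 < 0x80) return 1;
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;                        // stray continuation or bad lead

    if (n > len) return 0;                // truncated sequence
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    // No overlong forms, surrogates, or values beyond U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return n;
}

// Hypothetical boundary filter: copies well-formed sequences unchanged and
// re-encodes every other byte as the two-byte UTF-8 form of the
// ISO-8859-1 character with the same value.
static std::string toCleanUtf8(const std::string &in)
{
    std::string out;
    const unsigned char *p =
        reinterpret_cast<const unsigned char *>(in.data());
    const size_t len = in.size();
    size_t i = 0;

    while (i < len) {
        const size_t n = validSeqLen(p + i, len - i);
        if (n > 0) {
            out.append(in, i, n);
            i += n;
        } else {
            const unsigned char b = p[i];   // always >= 0x80 here
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
            i++;
        }
    }
    return out;
}
```

Run once on incoming data, something like this would let the rest of
the code assume clean UTF-8.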
