Discussion:
Implementing </div> closing <span>, and friends
James C
2014-09-15 09:11:33 UTC
Permalink
Hello all,

[Sorry if this is a dup; gmail may be playing tricks on me.]

This page displays, with distortions, in dillo 3.0.4, but the image
does not display after commit 317f010:
https://medium.com/@verynicetweets/getting-out-of-the-house-is-so-hard-these-days-68d6c36b5f44

A partially cleaned-up and stripped version is attached, where the
comment block shows a suppression of the problem. The page is
illegal: two unclosed <span>s interfere with a </div> and the
following <div>.

There is code around html.cc:1355, which is intended to implement
heuristics to close unclosed tags but, in the cases where it is
triggered, no tags are closed.

Here is a patch which I think implements the intent of the heuristics:
-----------------
$ hg diff
diff -r 26378f85c4ea src/html.cc
--- a/src/html.cc Sun Sep 14 09:56:22 2014 +0000
+++ b/src/html.cc Mon Sep 15 18:13:21 2014 +1200
@@ -1355,7 +1355,9 @@
i_SELECT = a_Html_tag_index("select"),
i_TEXTAREA = a_Html_tag_index("textarea");
int w3c_mode = !prefs.w3c_plus_heuristics;
- int stack_idx, tag_idx, matched = 0, expected = 0;
+ // TODO: If there was a clean way of never having a tag index zero,
+ // (it's currently <a>), then expected_idx could be prettier.
+ int stack_idx, tag_idx, matched = 0, expected_idx = -1, skipped = 0;
TagInfo new_tag = Tags[new_idx];

/* Look for the candidate tag to close */
@@ -1368,26 +1370,42 @@
break;
} else if (Tags[tag_idx].EndTag == 'O') {
/* skip an optional tag */
+ ++skipped;
continue;
} else if ((new_idx == i_BUTTON && html->InFlags & IN_BUTTON) ||
(new_idx == i_SELECT && html->InFlags & IN_SELECT) ||
(new_idx == i_TEXTAREA && html->InFlags & IN_TEXTAREA)) {
/* let these elements close tags inside them */
+ ++skipped;
continue;
} else if (w3c_mode || Tags[tag_idx].TagLevel >= new_tag.TagLevel) {
/* this is the tag that should have been closed */
- expected = 1;
- break;
+ /* skip it as if it were optional, but complain */
+ ++skipped;
+ if (-1 == expected_idx) {
+ // ie this is the first thing we're unwinding
+ expected_idx = tag_idx;
+ }
+ continue;
}
}

+ // Either or both matched and expected_idx could now be set
+
+ // TODO: decide whether to carry line numbers in the parser state stack,
+ // in order to report them here.
+
if (matched) {
+ if (-1 != expected_idx) {
+ BUG_MSG("Unexpected closing tag: </%s> -- expected </%s>",
+ new_tag.name, Tags[expected_idx].name);
+ }
+
Html_tag_cleanup_to_idx(html, stack_idx);
- } else if (expected) {
- BUG_MSG("Unexpected closing tag: </%s> -- expected </%s>.",
- new_tag.name, Tags[tag_idx].name);
} else {
- BUG_MSG("Unexpected closing tag: </%s>.", new_tag.name);
+ BUG_MSG("Unexpected closing tag: </%s>"
+ " -- explored %d layer(s) and not closed.",
+ new_tag.name, skipped);
}
}

-----------------

Closing tags that match no opening tag (I used </area>, which is
illegal) do not cause catastrophic failure. So far, it has not broken
on my usual workload. Gmail is noisy, so I wonder if something's
going wrong there, but it looks fairly normal.

I have no good idea what a general test suite for this code looks
like. If someone wants to point me toward related tests, then I will
think about extending them.

Regards,
James.

Loading...