On Tue, May 18, 2004 at 06:31:33PM -0700, Joshua Kwan wrote:
> I confirm that this patch fixes the problem for me as well. Just like Bernd,
> I got a spam with binary junk in the header, and it segfaulted Mutt.
> Please apply it for the next Mutt upload...

Another problem has been found and fixed.
See http://www.emaillab.org/mutt/download15.html.en

The adjust_line.3 patch fixes a problem displaying Chinese characters
in a UTF-8 environment. (This problem does not cause any serious
symptoms like segfaults.)

For debian packagers:
Note that the ``compat'' patch is equal to
(assumed_charset + adjust_edited_file + adjust_line + create_rfc2047_params).

For mutt developers:
The attached HTML file describes this bug in detail and explains
why we need the adjust_line patch.

-- 
tamo

Title: Mutt: Bug 1869
The original problem was reported to be caused by an invalid cast:
j = (int)*s;
Here j was a size_t and s was a char*. TAKIZAWA Takashi sent a patch to TAKAHASHI Tamotsu, who reported it to the mutt BTS. The patch removed the cast, and that seemed to fix the problem.
The second problem was found by TAKIZAWA Takashi himself. It affects only Chinese, Japanese and Korean text in a UTF-8 environment, i.e. languages whose multibyte characters are built from bytes in the 0x80-0xFF range. The symptom is that mutt displays fewer characters per line than it should.
This is caused by removing the cast. Without the cast,
(*s < M_MAX_TREE)
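is true even when *s is in 0x80-0xFF, because where char is signed those bytes are held as negative values. mutt_mbswidth() then treats each such byte as one column wide.
The conditional really has to check ((0 <= *s) && (*s < M_MAX_TREE)). So, this problem is fixed by an unsigned cast: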
unsigned int i;
i = (unsigned int)*s;
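To make the difference concrete, here is a minimal sketch of the two comparisons. It is not code from mutt or from the patch: the value given to M_MAX_TREE is a placeholder (the real one lives in mutt's headers), both helper functions are hypothetical, and signed char is spelled out because the signedness of plain char is implementation-defined.

```c
/* Placeholder value, an assumption for this sketch only. */
#define M_MAX_TREE 16

/* Comparison on the raw signed byte. A byte in 0x80-0xFF is negative
 * here, so it compares below M_MAX_TREE and is wrongly accepted. */
static int below_max_tree_signed(signed char c)
{
    return c < M_MAX_TREE;
}

/* Comparison after the unsigned cast. A negative byte becomes a very
 * large unsigned value, so only the small non-negative codes pass. */
static int below_max_tree_cast(signed char c)
{
    unsigned int i;

    i = (unsigned int)c;
    return i < M_MAX_TREE;
}
```

For the byte 0xE6 (the first byte of a UTF-8 kanji, -26 as a signed 8-bit value), the first function returns true and the second returns false. For a small value such as 3, both return true.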
TAKIZAWA Takashi has found the real root cause of Problem 1. Before going into detail, see this table:
Characters | Data Length (bytes) | Display Width (columns) | Unpatched | adjust_line.1 (compat.1) | adjust_line.2 | adjust_line.3 |
---|---|---|---|---|---|---|
ASCII | 1 | 1 | OK | OK | OK | OK |
some Japanese chars | 2 | 1 | NG | OK | OK | OK |
kanji(EUC-JP) | 2 | 2 | OK | OK | OK | OK |
kanji(UTF-8) | 3 | 2 | NG | OK | NG | OK |
In many cases, data length is equal to column width. In UTF-8, however, each kanji (Chinese character) takes three bytes, all in the 0x80-0xFF range, and is two columns wide. So mutt_FormatString() has to handle two separate parameters: data length and column width. mutt_mbswidth() calculates the latter.
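The gap between the two numbers can be sketched as follows. This is an illustration only, not mutt's mutt_mbswidth(): utf8_decode(), toy_width() and display_columns() are hypothetical helpers, the input is assumed to be well-formed UTF-8, and the width rule is a deliberately small stand-in for wcwidth() that knows only kana and the CJK Unified Ideographs block.

```c
#include <stddef.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp and return its byte length.
 * Assumes well-formed input; a stray byte is passed through as-is. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((s[0] & 0x1FUL) << 6) | (s[1] & 0x3FUL);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((s[0] & 0x0FUL) << 12) | ((s[1] & 0x3FUL) << 6)
            | (s[2] & 0x3FUL);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0) {
        *cp = ((s[0] & 0x07UL) << 18) | ((s[1] & 0x3FUL) << 12)
            | ((s[2] & 0x3FUL) << 6) | (s[3] & 0x3FUL);
        return 4;
    }
    *cp = s[0];
    return 1;
}

/* Toy width rule (an assumption of this sketch): kana and CJK Unified
 * Ideographs are 2 columns wide, everything else is 1. */
static int toy_width(unsigned long cp)
{
    if ((cp >= 0x3040 && cp <= 0x30FF) || (cp >= 0x4E00 && cp <= 0x9FFF))
        return 2;
    return 1;
}

/* Column width of a whole string; strlen() gives the data length. */
static size_t display_columns(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t col = 0;

    while (*s) {
        unsigned long cp;
        s += utf8_decode(s, &cp);
        col += (size_t)toy_width(cp);
    }
    return col;
}
```

For the two kanji "\xE6\xBC\xA2\xE5\xAD\x97", strlen() reports 6 bytes while display_columns() reports 4 columns. For "abc" both are 3.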
TAKIZAWA Takashi tried to store both in a single variable, wlen, and this was the root cause of muttbug#1869. He has since written a correct patch, which uses two variables: wlen and col.
Anyway, default mutt can't handle multibyte characters correctly. The default mutt_FormatString() treats COLS, strlen() and sizeof() as both length and width, which must be a pain for people who use multibyte characters. So, try the adjust_line patch.
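The two-variable idea might look roughly like the sketch below. Only the names wlen and col come from the report. The function copy_fitting() and its helpers are hypothetical and are not taken from the patch, the input is assumed to be well-formed UTF-8, and the width rule (3-byte sequences are 2 columns, everything else 1) is a toy assumption that matches the kanji row of the table but is not true of UTF-8 in general.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence that starts with lead byte b. */
static size_t seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1; /* stray byte: treat as a single unit */
}

/* Toy width rule for this sketch only; real code would ask wcwidth(). */
static size_t seq_cols(size_t nbytes)
{
    return nbytes == 3 ? 2 : 1;
}

/* Copy whole characters from src to dst while BOTH limits hold:
 *   wlen (bytes written) must leave room for the terminator in dstlen,
 *   col  (columns used)  must not exceed max_cols.
 * Returns the number of columns used. */
static size_t copy_fitting(char *dst, size_t dstlen,
                           const char *src, size_t max_cols)
{
    size_t wlen = 0; /* data length in bytes */
    size_t col = 0;  /* display width in columns */

    if (dstlen == 0)
        return 0;
    while (*src) {
        size_t n = seq_len((unsigned char)*src);
        size_t w = seq_cols(n);

        if (wlen + n >= dstlen || col + w > max_cols)
            break;
        memcpy(dst + wlen, src, n);
        wlen += n;
        col += w;
        src += n;
    }
    dst[wlen] = '\0';
    return col;
}
```

With a 5-column limit, two kanji followed by "ab" are cut after the "a": 7 bytes are written but only 5 columns are used. A single counter cannot represent both of those numbers at once.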
Thomas, please include this patch. This is not only useful, but also stable and safe now.