mutt/2029: Non-ASCII characters stripped from manual.txt
>Number: 2029
>Notify-List:
>Category: mutt
>Synopsis: Non-ASCII characters stripped from manual.txt
>Confidential: no
>Severity: minor
>Priority: medium
>Responsible: mutt-dev
>State: open
>Keywords:
>Class: doc-bug
>Submitter-Id: net
>Arrival-Date: Mon Aug 08 16:44:17 +0200 2005
>Originator: Vincent Lefevre
>Release:
>Organization:
>Environment:
>Description:
The file manual.sgml.tail contains non-ASCII characters in "Björn Jacke" and
"Jimmy Mäkelä". This leads to warnings when the file manual.txt is generated:
$ linuxdoc -B txt --pass='-P -c' manual
Processing file manual
<standard input>:12561: warning: can't find numbered character 246
<standard input>:12575: warning: can't find numbered character 228
<standard input>:12575: warning: can't find numbered character 228
and these characters are stripped from manual.txt: "Bjrn Jacke" and "Jimmy
Mkel". The same problem affects this passage of the manual:
Equivalence Classes
An equivalence class is a locale-specific name for a list of
characters that are equivalent. The name is enclosed in ``[=''
and ``=]''. For example, the name ``e'' might be used to
represent all of ``e'' ``e'' and ``e''. In this case, [[=e=]]
is a regexp that matches any of ``e'', ``e'' and ``e''.
Both problems can be solved with the --charset=latin option, but then another
warning occurs:
$ linuxdoc -B txt --charset=latin --pass='-P -c' manual
Processing file manual
<standard input>:4038: warning: can't find numbered character 160
But this seems to be due to a different bug (which I'm going to report separately).
So I think that manual.txt should be generated with the --charset=latin option.
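
For reference, the character numbers in the warnings can be mapped back to the
characters they stand for. The following is only a minimal Python sketch, and it
assumes that the source file is encoded in ISO-8859-1 (which is consistent with
the two names quoted above, but is not stated by linuxdoc itself). The names
describe and find_non_ascii are hypothetical helpers, not part of mutt or
linuxdoc.

```python
import unicodedata

# Character numbers taken from the groff warnings quoted in this report.
WARNED_CODES = [246, 228, 160]


def describe(code):
    """Return the ISO-8859-1 character for a byte value and its Unicode name.

    Assumption: the byte value is to be read as ISO-8859-1 (Latin-1).
    """
    ch = bytes([code]).decode("latin-1")
    return ch, unicodedata.name(ch)


def find_non_ascii(data):
    """Yield (line number, byte value) for each non-ASCII byte in the data.

    `data` is a bytes object, e.g. the raw contents of manual.sgml.tail.
    """
    for lineno, line in enumerate(data.splitlines(), start=1):
        for byte in line:
            if byte > 127:
                yield lineno, byte


if __name__ == "__main__":
    for code in WARNED_CODES:
        ch, name = describe(code)
        print(code, repr(ch), name)
```

Under that assumption, 246 and 228 are the "o" and "a" with diaeresis from the
two names, and 160 is a no-break space. Passing the raw bytes of
manual.sgml.tail to find_non_ascii would list the lines that need attention.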
>How-To-Repeat:
>Fix:
Unknown
>Add-To-Audit-Trail:
>Unformatted: