
mutt/2029: Non-ASCII characters stripped from manual.txt



>Number:         2029
>Notify-List:    
>Category:       mutt
>Synopsis:       Non-ASCII characters stripped from manual.txt
>Confidential:   no
>Severity:       minor
>Priority:       medium
>Responsible:    mutt-dev
>State:          open
>Keywords:       
>Class:          doc-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Aug 08 16:44:17 +0200 2005
>Originator:     Vincent Lefevre
>Release:        
>Organization:
>Environment:
>Description:
The file manual.sgml.tail contains non-ASCII characters in "Björn Jacke" and 
"Jimmy Mäkelä". This leads to warnings when the file manual.txt is generated:

$ linuxdoc -B txt --pass='-P -c' manual
Processing file manual
<standard input>:12561: warning: can't find numbered character 246
<standard input>:12575: warning: can't find numbered character 228
<standard input>:12575: warning: can't find numbered character 228

and these characters are stripped from manual.txt: "Bjrn Jacke" and "Jimmy 
Mkel". There's also a problem here:

     Equivalence Classes
        An equivalence class is a locale-specific name for a list of
        characters that are equivalent. The name is enclosed in ``[=''
        and ``=]''.  For example, the name ``e'' might be used to
        represent all of ``e'' ``e'' and ``e''.  In this case, [[=e=]]
        is a regexp that matches any of ``e'', ``e'' and ``e''.

These problems can be solved with the --charset=latin option, but another 
warning then appears:

$ linuxdoc -B txt --charset=latin --pass='-P -c' manual
Processing file manual
<standard input>:4038: warning: can't find numbered character 160

But this seems to be due to another bug (which I'm going to report separately). 
So I think that using the --charset=latin option is the right fix.
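
The numbers in the warnings above can be read as character codes. The following is a minimal Python sketch, assuming the SGML source is encoded in ISO-8859-1 (Latin-1), which is consistent with --charset=latin fixing the output but is not stated anywhere in the build. The helper name describe is made up for illustration and is not part of linuxdoc or mutt.

```python
# Map the "numbered character" values from the warnings to the characters
# they denote, ASSUMING the input is encoded in ISO-8859-1 (Latin-1).
import unicodedata

WARNING_CODES = [246, 228, 160]  # values copied from the warnings above


def describe(code: int) -> str:
    """Return 'code U+XXXX NAME' for a single Latin-1 byte value."""
    ch = bytes([code]).decode("latin-1")
    return f"{code} U+{ord(ch):04X} {unicodedata.name(ch)}"


for code in WARNING_CODES:
    print(describe(code))
```

Under that assumption, 246 and 228 are the o and a with diaeresis from the two names, and 160 is a no-break space, which fits the last warning coming from a different place in the file than the names.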
>How-To-Repeat:
>Fix:
Unknown
>Add-To-Audit-Trail:

>Unformatted: