mutt/2029: Non-ASCII characters stripped from manual.txt

To: Mutt Developers <mutt-dev@xxxxxxxx>, cb@xxxxxxxx
Subject: mutt/2029: Non-ASCII characters stripped from manual.txt
From: vincent@xxxxxxxxxx
Date: Mon, 08 Aug 2005 16:44:17 +0200
List-unsubscribe: <mailto:mutt-dev-request@mutt.org?body=unsubscribe>
References: <mutt-pr-2029@xxxxxxxxxxxxx>
Reply-to: vincent@xxxxxxxxxx
Sender: owner-mutt-dev@xxxxxxxx

>Number:         2029
>Notify-List:    
>Category:       mutt
>Synopsis:       Non-ASCII characters stripped from manual.txt
>Confidential:   no
>Severity:       minor
>Priority:       medium
>Responsible:    mutt-dev
>State:          open
>Keywords:       
>Class:          doc-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Aug 08 16:44:17 +0200 2005
>Originator:     Vincent Lefevre
>Release:        
>Organization:
>Environment:
>Description:
The file manual.sgml.tail contains non-ASCII characters in "Björn Jacke" and 
"Jimmy Mäkelä". This leads to warnings when the file manual.txt is generated:

$ linuxdoc -B txt --pass='-P -c' manual
Processing file manual
<standard input>:12561: warning: can't find numbered character 246
<standard input>:12575: warning: can't find numbered character 228
<standard input>:12575: warning: can't find numbered character 228

and these characters are stripped from manual.txt: "Bjrn Jacke" and "Jimmy 
Mkel". The accented characters are also lost in this passage of manual.txt:

     Equivalence Classes
        An equivalence class is a locale-specific name for a list of
        characters that are equivalent. The name is enclosed in ``[=''
        and ``=]''.  For example, the name ``e'' might be used to
        represent all of ``e'' ``e'' and ``e''.  In this case, [[=e=]]
        is a regexp that matches any of ``e'', ``e'' and ``e''.
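
To see which bytes trigger the warnings, one can list the non-ASCII bytes in
the source file. This is only a sketch: it assumes manual.sgml.tail is encoded
in Latin-1 (one byte per character), and the helper name nonascii_codes is
hypothetical, not part of the mutt build.

```shell
# Hypothetical helper: print the decimal value of every non-ASCII byte
# in the file given as $1 (assumes a single-byte encoding such as Latin-1).
nonascii_codes() {
    # In the C locale tr works on raw bytes; delete the ASCII range 0-127
    # and let od print whatever is left as unsigned decimal numbers.
    LC_ALL=C tr -d '\000-\177' < "$1" | od -An -tu1
}

# Usage (file name taken from the report):
#   nonascii_codes manual.sgml.tail
```

If the assumption holds, the values printed should match the numbers in the
warnings (246 for "ö", 228 for "ä").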

These problems can be solved with the --charset=latin option, but then a 
different warning appears:

$ linuxdoc -B txt --charset=latin --pass='-P -c' manual
Processing file manual
<standard input>:4038: warning: can't find numbered character 160

But this last warning seems to be due to a different bug (which I'm going to 
report separately). So I think adding --charset=latin when generating 
manual.txt is the right fix.
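
To compare the two runs, the warnings can be reduced to the list of distinct
character numbers. A minimal sketch, assuming the stderr output of linuxdoc
has been saved to a log file; the helper name missing_chars and the log file
name are hypothetical.

```shell
# Hypothetical helper: print each distinct character number mentioned in
# "can't find numbered character N" warnings found in the log file $1.
missing_chars() {
    # Keep only the number from each matching warning line, then sort
    # numerically and drop duplicates.
    sed -n "s/.*can't find numbered character \([0-9][0-9]*\).*/\1/p" "$1" |
        sort -n -u
}

# Usage (linuxdoc invocation as in the report, stderr saved to a log):
#   linuxdoc -B txt --charset=latin --pass='-P -c' manual 2> linuxdoc.log
#   missing_chars linuxdoc.log
```

An empty result would mean that no such warning was emitted.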
>How-To-Repeat:
>Fix:
Unknown
>Add-To-Audit-Trail:

>Unformatted:
