
Re: Help needed to configure w3m as my HTML viewer



On 2008-08-07, Moritz Barsnick <barsnick@xxxxxxx> wrote:

> I use the attached lines in my ~/.mailcap to use w3m as an HTML viewer.
> I also have another entry (earlier within mailcap, not attached here)
> which tests for a display and starts w3m in a separate xterm.
> 
> One rule "dump"s the rendered HTML page in the mail view, which is very
> often sufficient. If I want to open it in a browsable way (with the
> requested hyperlinks), I press "v" (<view-attachments>) and press
> return (<view-attach>) on the HTML attachment.
> 
> Let me capture this thread and ask a question: How can mutt tell w3m
> which charset to use? In the attached mailcap, I force ISO-8859-1, but
> that's just a hack. In true honesty, I get some ISO and some UTF-8
> coded HTML attachments. Assuming the attachment is correctly
> MIME-denoted in the header (which is mostly the case), how can I pass
> this on to w3m? Any ideas?

> text/html;    w3m -T text/html -I ISO-8859-1 -o frame=0 -o meta_refresh=0 -o auto_image=0 %s; needsterminal; \
>       description=HTML Text; nametemplate=%s.html
> text/html;    w3m -T text/html -I ISO-8859-1 -o frame=0 -o meta_refresh=0 -o auto_image=0 -dump %s; copiousoutput; \
>       description=HTML Text; nametemplate=%s.html

The last time I tried to do this, I couldn't get w3m to properly
display some character sets, so I gave up trying with w3m as a
browser and wrote a script around "w3m -dump".  From my mailcap
file:

    text/html;          w3m %s; nametemplate=%s.html
    text/html;          html2text %{charset} %s; \
                                    nametemplate=%s.html; \
                                    copiousoutput

The core of the html2text script is this:

    charset="$1"

    # Implement the equivalent of mutt's charset-hook for these:
    #
    # charset-hook ^us-ascii$   windows-1252
    # charset-hook ^iso-8859-1$ windows-1252
    #
    case $charset in
        us-ascii)               charset=windows-1252;;
        US-ASCII)               charset=windows-1252;;
        iso-8859-1)             charset=windows-1252;;
        ISO-8859-1)             charset=windows-1252;;
        ks_c_5601-1987)         charset=euc-kr;;
        KS_C_5601-1987)         charset=euc-kr;;
    esac

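    file="$2"
    w3m_args="-dump -T text/html -o frame=0 -o meta_refresh=0 -o auto_image=0 -I $charset -O $charset"

    # $w3m_args is deliberately unquoted so it splits into separate options;
    # the file name is quoted in case it contains spaces.
    w3m $w3m_args "$file" |
    iconv -c -f "$charset" -t ISO-8859-1//TRANSLIT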

Note that rather than have w3m perform the charset conversion, I
told w3m to leave the charset unchanged and instead let iconv do the
conversion.  This is admittedly a hack, but it has worked well for
displaying HTML in mutt's pager.
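
The case statement in the script lists the upper- and lower-case spelling
of each charset name separately. A minimal sketch of a case-insensitive
variant follows, assuming a POSIX shell with tr available. The function
name normalize_charset is hypothetical and is not part of the script
above; it only covers the two Latin-1 overrides.

```shell
# Hypothetical helper (the name normalize_charset is illustrative only):
# lower-case the MIME charset name, then apply the same overrides as the
# charset-hook equivalents for us-ascii and iso-8859-1.
normalize_charset() {
    cs=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $cs in
        us-ascii|iso-8859-1) cs=windows-1252 ;;
    esac
    printf '%s\n' "$cs"
}

normalize_charset "ISO-8859-1"   # prints windows-1252
normalize_charset "UTF-8"        # prints utf-8
```

In the script this would be used as charset=$(normalize_charset "$1"),
so mixed-case spellings such as Iso-8859-1 are handled as well.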

The //TRANSLIT suffix is important for me to be able to display M$
characters in my environment, and as I recall, w3m doesn't
understand //TRANSLIT.
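
To see what the suffix changes, here is a small sketch that converts two
windows-1252 "smart quotes" to ISO-8859-1 with and without //TRANSLIT.
The sample string is made up for illustration, and the substitute
characters that come out depend on the iconv implementation and locale,
so no particular output is claimed.

```shell
# Bytes 0x93 and 0x94 (octal 223 and 224) are the curly double quotes in
# windows-1252. ISO-8859-1 has no such characters.

# With //TRANSLIT, iconv substitutes a similar-looking character where it can.
# "|| true" is there because some iconv implementations exit non-zero when
# a character could not be converted exactly.
translit=$(printf '\223quoted\224' |
    iconv -c -f WINDOWS-1252 -t ISO-8859-1//TRANSLIT) || true

# Without the suffix, -c drops (or otherwise replaces) what cannot be converted.
dropped=$(printf '\223quoted\224' |
    iconv -c -f WINDOWS-1252 -t ISO-8859-1) || true

printf 'with //TRANSLIT:    %s\n' "$translit"
printf 'without //TRANSLIT: %s\n' "$dropped"
```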

Regards,
Gary