Re: Help needed to configure w3m as my HTML viewer
On 2008-08-07, Moritz Barsnick <barsnick@xxxxxxx> wrote:
> I use the attached lines in my ~/.mailcap to use w3m as an HTML viewer.
> I also have another entry (earlier within mailcap, not attached here)
> which tests for a display and starts w3m in a separate xterm.
>
> One rule "dump"s the rendered HTML page in the mail view, which is very
> often sufficient. If I want to open it in a browsable way (with the
> requested hyperlinks), I press "v" (<view-attachments>) and press
> return (<view-attach>) on the HTML attachment.
>
> Let me capture this thread and ask a question: How can mutt tell w3m
> which charset to use? In the attached mailcap, I force ISO-8859-1, but
> that's just a hack. In all honesty, I get some ISO and some UTF-8
> coded HTML attachments. Assuming the attachment is correctly
> MIME-denoted in the header (which is mostly the case), how can I pass
> this on to w3m? Any ideas?
> text/html; w3m -T text/html -I ISO-8859-1 -o frame=0 -o meta_refresh=0 -o auto_image=0 %s; needsterminal; \
>   description=HTML Text; nametemplate=%s.html
> text/html; w3m -T text/html -I ISO-8859-1 -o frame=0 -o meta_refresh=0 -o auto_image=0 -dump %s; copiousoutput; \
>   description=HTML Text; nametemplate=%s.html
The last time I tried to do this, I couldn't get w3m to properly
display some character sets, so I gave up trying with w3m as a
browser and wrote a script around "w3m -dump". From my mailcap
file:
text/html; w3m %s; nametemplate=%s.html
text/html; html2text %{charset} %s; \
nametemplate=%s.html; \
copiousoutput
The core of the html2text script is this:
charset="$1"
# Implement the equivalent of mutt's charset-hook for these:
#
# charset-hook ^us-ascii$ windows-1252
# charset-hook ^iso-8859-1$ windows-1252
#
case $charset in
us-ascii) charset=windows-1252;;
US-ASCII) charset=windows-1252;;
iso-8859-1) charset=windows-1252;;
ISO-8859-1) charset=windows-1252;;
ks_c_5601-1987) charset=gb2312;;
KS_C_5601-1987) charset=gb2312;;
esac
file="$2"
w3m_args="-dump -T text/html -o frame=0 -o meta_refresh=0 -o auto_image=0
-I $charset -O $charset"
# $w3m_args is deliberately left unquoted so it splits into separate arguments.
w3m $w3m_args "$file" |
iconv -c -f "$charset" -t ISO-8859-1//TRANSLIT
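The paired upper- and lower-case patterns in the case statement could probably be folded into one case-insensitive lookup. What follows is only a sketch under that assumption: normalize_charset is a hypothetical helper name that is not part of the script above, and it relies on tr being available. Charsets without a mapping are passed through exactly as received.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): map a MIME charset
# name to the charset that should be handed to w3m and iconv.
normalize_charset() {
    # Lower-case a copy of the name so that one pattern covers every spelling.
    cs=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $cs in
        us-ascii|iso-8859-1) echo windows-1252 ;;
        ks_c_5601-1987)      echo gb2312 ;;
        # No mapping: return the name unchanged, in its original case.
        *)                   printf '%s\n' "$1" ;;
    esac
}
```

With such a helper, the first lines of the script would reduce to something like charset=$(normalize_charset "$1").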
Note that rather than have w3m perform the charset conversion, I
told w3m to leave the charset unchanged and instead let iconv do the
conversion. This is admittedly a hack, but it has worked well for
displaying HTML in mutt's pager.
The //TRANSLIT suffix is important for me to be able to display M$
characters in my environment, and as I recall, w3m doesn't
understand //TRANSLIT.
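To see what //TRANSLIT contributes, here is a small sketch. It assumes an iconv that supports the //TRANSLIT suffix (GNU libc and GNU libiconv do; other implementations may not), and to_latin1 is a hypothetical wrapper name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the iconv call from the script above.
# $1 is the source charset; the text to convert is read from stdin.
to_latin1() {
    iconv -c -f "$1" -t ISO-8859-1//TRANSLIT
}

# Byte 0x92 (octal 222) is a right single quotation mark in windows-1252,
# a character that ISO-8859-1 cannot represent. With //TRANSLIT, iconv may
# substitute a similar-looking character; without it, -c simply drops
# whatever cannot be converted.
printf 'it\222s\n' | to_latin1 windows-1252
```

The exact substitution depends on the iconv implementation, so it is worth running this once in the target environment.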
Regards,
Gary