Re: Docbook patch



On Thu, 11 Aug 2005, Brendan Cully wrote:
> On Thursday, 11 August 2005 at 03:10, Vincent Lefevre wrote:
> > Perhaps html2text[*] from Martin Bayer, with a good html2textrc file?
> > 
> > * http://www.mbayer.de/html2text/
> 
> It supports less HTML than w3m, lynx, and links. It doesn't do a very
> good job with the manual produced by the docbook xsl I'm using. links,
> btw, doesn't produce output as good as w3m's, and doesn't do
> bold/underline anyway.

I don't think it is so bad. It is the only program that
natively supports bold/underline, isn't it?

An example html2textrc:
=======================
 DD.indent.left=6
 DT.indent.left=3
 PRE.indent.left=3
 H1.vspace.before=1
 H2.vspace.before=1
 H3.prefix=
 H3.suffix=
 P.vspace.after=1
 PRE.vspace.after=1
 HR.vspace.after=1
 A.attributes.internal_link=NONE
 A.attributes.external_link=NONE
 EM.attributes=UNDERLINE
=======================


w3m would be fine if bold/underline were implemented.
For example, a dirty hack:
=======================
 sed 's,<strong>,MsS,g;s,</strong>,MsE,g;s,<em>,MiS,g;s,</em>,MiE,g' \
 manual.html | \
 w3m -T text/html -O latin-1 -S -no-graph -dump | \
 ruby strong-em.rb
=======================
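
The marker step can also be sketched in Ruby. This is only an
illustration: the names MARKERS and mark_tags are made up here, and it
assumes the tags appear in lowercase and without attributes, as the sed
command does. gsub replaces every occurrence on a line, not just the
first one.

```ruby
# Hypothetical helper (not part of the original hack): swap <strong>/<em>
# tags for the MsS/MsE and MiS/MiE markers that strong-em.rb looks for.
MARKERS = {
  "<strong>"  => "MsS",
  "</strong>" => "MsE",
  "<em>"      => "MiS",
  "</em>"     => "MiE",
}.freeze

def mark_tags(html)
  # gsub with a hash replaces each matched tag by its hash value.
  html.gsub(%r{</?(?:strong|em)>}, MARKERS)
end

puts mark_tags("<strong>bold</strong> and <em>em</em>")
```

The markers are plain ASCII, so they should pass through w3m untouched
and can be picked up again afterwards.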

strong-em.rb:
=======================
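 # Filter w3m's text dump: turn the marker pairs into overstrike sequences.
 STDIN.each do |l|
        # MsS...MsE (was <strong>): bold is char, backspace, same char
        l.gsub!(/MsS(.*?)MsE/) do
                s = $1
                t = ""
                s.scan(/./) do |c|
                        t += c
                        t += "\b"
                        t += c
                end
                t
        end
        # MiS...MiE (was <em>): underline is "_", backspace, char
        l.gsub!(/MiS(.*?)MiE/) do
                s = $1
                t = ""
                s.scan(/./) do |c|
                        t += "_"
                        t += "\b"
                        t += c
                end
                t
        end
        print l
 end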
=======================
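
For checking the conversion without running w3m, the same logic might
be split into small methods. This is a rough sketch: the method names
are made up for illustration, and like the script it only handles a
marker pair that starts and ends on the same line.

```ruby
# Hypothetical helpers; the names are illustrative only.

# Bold: each character, a backspace, then the same character again.
def overstrike_bold(text)
  text.gsub(/./) { |c| "#{c}\b#{c}" }
end

# Underline: an underscore, a backspace, then the character.
def overstrike_underline(text)
  text.gsub(/./) { |c| "_\b#{c}" }
end

# Replace every MsS...MsE and MiS...MiE span on a line.
def render_markers(line)
  line
    .gsub(/MsS(.*?)MsE/) { overstrike_bold($1) }
    .gsub(/MiS(.*?)MiE/) { overstrike_underline($1) }
end

# inspect makes the backspaces visible as \b
puts render_markers("MsSboldMsE and MiSemMiE").inspect
```

If w3m wraps a line in the middle of a marked span, the markers are
left in the output as they are, so long <strong> or <em> runs may need
a closer look.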

-- 
tamo