Re: Docbook patch
On Thu, 11 Aug 2005, Brendan Cully wrote:
> On Thursday, 11 August 2005 at 03:10, Vincent Lefevre wrote:
> > Perhaps html2text[*] from Martin Bayer, with a good html2textrc file?
> >
> > * http://www.mbayer.de/html2text/
>
> It supports less HTML than w3m, lynx, and links. It doesn't do a very
> good job with the manual produced by the docbook xsl I'm using. links,
> btw, doesn't produce output as good as w3m's, and doesn't do
> bold/underline anyway.
I don't think it is so bad. It is the only program that
natively supports bold/underline, isn't it?
An example html2textrc:
=======================
DD.indent.left=6
DT.indent.left=3
PRE.indent.left=3
H1.vspace.before=1
H2.vspace.before=1
H3.prefix=
H3.suffix=
P.vspace.after=1
PRE.vspace.after=1
HR.vspace.after=1
A.attributes.internal_link=NONE
A.attributes.external_link=NONE
EM.attributes=UNDERLINE
=======================
w3m would be good if bold/underline were implemented.
For example, a dirty hack:
=======================
sed 's,<strong>,MsS,g;s,</strong>,MsE,g;s,<em>,MiS,g;s,</em>,MiE,g' \
manual.html | \
w3m -T text/html -O latin-1 -S -no-graph -dump | \
ruby strong-em.rb
=======================
strong-em.rb:
=======================
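STDIN.each do |l|
  # Bold: emit each character as char, backspace, char.
  l.gsub!(/MsS(.*?)MsE/) do
    s = $1
    t = ""
    s.scan(/./) do |c|
      t += c
      t += "\b"
      t += c
    end
    t
  end
  # Underline: emit each character as underscore, backspace, char.
  l.gsub!(/MiS(.*?)MiE/) do
    s = $1
    t = ""
    s.scan(/./) do |c|
      t += "_"
      t += "\b"
      t += c
    end
    t
  end
  print l
end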
=======================
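To check the overstrike idea without running the whole pipeline, here is a
minimal sketch of the same conversion as small functions. The names
overstrike_bold, overstrike_underline and convert_markers are hypothetical
(they are not in strong-em.rb above), and the sample line is made up.

```ruby
# Hypothetical helper names; a sketch of the same technique as strong-em.rb.

# Bold: print each character, back up one column, print it again.
def overstrike_bold(text)
  text.each_char.map { |c| "#{c}\b#{c}" }.join
end

# Underline: print an underscore, back up one column, print the character.
def overstrike_underline(text)
  text.each_char.map { |c| "_\b#{c}" }.join
end

# Replace the MsS...MsE and MiS...MiE markers left in the w3m dump.
def convert_markers(line)
  line.gsub(/MsS(.*?)MsE/) { overstrike_bold($1) }
      .gsub(/MiS(.*?)MiE/) { overstrike_underline($1) }
end

if __FILE__ == $0
  # Made-up sample line; p shows the backspaces as \b escapes.
  sample = "a MsSboldMsE and MiSemMiE word"
  p convert_markers(sample)
end
```

A pager such as less renders these backspace sequences as bold and
underline, so piping the converted text through it is an easy way to see
the result.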
--
tamo