Wireshark-dev: Re: [Wireshark-dev] Buildbot Man Page Generation

From: Jeff Morriss <jeff.morriss.ws@xxxxxxxxx>
Date: Thu, 14 Aug 2014 15:13:11 -0400
On 08/12/14 14:31, Jeff Morriss wrote:
On 08/09/14 22:41, Evan Huus wrote:
http://buildbot.wireshark.org/trunk/builders/Clang%20Code%20Analysis/builds/2911/steps/check-abi/logs/stdio


I took a quick look at the recent check-abi buildbot failure, which
appears to be manpage related:

wireshark.pod around line 3525: Non-ASCII character seen before
=encoding in 'KovE<aacute>ř'. Assuming UTF-8
POD document had syntax errors at /usr/bin/pod2man line 71.

Which is curious, because wireshark.pod.template *does* have an
=encoding line...
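
One way to narrow this down would be to check whether any non-ASCII text ends up ahead of the =encoding line in the generated wireshark.pod. The following is only a minimal Python sketch of that check, not how pod2man itself does it; the function name and the sample text are made up for illustration, and it assumes the .pod file has already been read and decoded as UTF-8.

```python
def non_ascii_before_encoding(pod_text):
    """Return (line_number, line) pairs for lines that contain non-ASCII
    characters and appear before the first =encoding directive.

    If the text has no =encoding line at all, every non-ASCII line is
    reported.
    """
    hits = []
    for number, line in enumerate(pod_text.splitlines(), start=1):
        if line.startswith("=encoding"):
            break  # anything after the directive is covered by it
        if not line.isascii():  # str.isascii() needs Python 3.7+
            hits.append((number, line))
    return hits


# Hypothetical sample: one accented word before =encoding, one after.
sample = "=head1 NAME\n\ncaf\u00e9\n\n=encoding utf8\n\nna\u00efve\n"
for number, line in non_ascii_before_encoding(sample):
    print(number, line)
```

Only the line before the directive is reported here; the one after it is not.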

[As discussed later on this thread] the current master doesn't give this
warning, but I did notice that the generated man page doesn't contain the
actual UTF-8 characters that some people's names require. E.g.,
"doc/wireshark.1" on my system contains:

        XXXXX XXXXXXXX      <dpb[AT]...]

though, interestingly, Joerg's name got "translated" from what's in
AUTHORS (which contains an o-umlaut) to:

        Joerg Mayer              <jmayer[AT]...]

Ah, OK, pod2man does that unless you specify "-u":

       -u, --utf8
           By default, pod2man produces the most conservative possible
           *roff output to try to ensure that it will work with as many
           different *roff implementations as possible.  Many *roff
           implementations cannot handle non-ASCII characters, so this
           means all non-ASCII characters are converted either to a *roff
           escape sequence that tries to create a properly accented
           character (at least for troff output) or to "X".

           This option says to instead output literal UTF-8 characters.
           If your *roff implementation can handle it, this is the best
           output format to use and avoids corruption of documents
           containing non-ASCII characters.  However, be warned that *roff
           source with literal UTF-8 characters is not supported by many
           implementations and may even result in segfaults and other bad
           behavior.

At least on my system here "-u" makes for an ugly rendering of the man
page (though this system seems to have LANG/locale issues).
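
To compare the two outputs without relying on how man renders them, one could look at the generated file directly and list the lines that still carry literal non-ASCII characters. This is a rough Python sketch under the assumption that the man page bytes are valid UTF-8; the function name is hypothetical and the sample bytes are invented, not taken from doc/wireshark.1.

```python
def lines_with_literal_utf8(roff_bytes):
    """Return the lines of a generated man page that contain literal
    non-ASCII characters (expected only when pod2man was run with -u)."""
    text = roff_bytes.decode("utf-8")  # assumption: the file is valid UTF-8
    return [line for line in text.splitlines() if not line.isascii()]


# Invented samples standing in for output without and with "-u".
conservative = b".SH AUTHORS\nJoerg Example\n"
with_utf8 = ".SH AUTHORS\nJ\u00f6rg Example\n".encode("utf-8")

print(lines_with_literal_utf8(conservative))
print(lines_with_literal_utf8(with_utf8))
```

An empty list for the default output and a non-empty one for the "-u" output would suggest the characters survive generation and that any ugliness comes from the rendering side.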