[Wireshark-bugs] [Bug 5405] Unescaped accent in interface name
Date: Tue, 21 Dec 2010 11:49:59 -0800 (PST)

--- Comment #7 from Guy Harris 2010-12-21 11:49:58 PST ---
> I would say yes, convert it to UTF-8 right away.  UTF-8 can represent any Unicode character, but it may take more bytes than a native UTF-16 or UTF-32 would.

Although, as most of the characters in the description will probably be ASCII,
in practice it'll probably take fewer bytes.
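To put rough numbers on this, here is a small Python sketch using a made-up, mostly-ASCII description (the sample string is an assumption, not a real adapter name). ASCII characters take one byte each in UTF-8 versus two in UTF-16 and four in UTF-32, so even with an accented character costing two bytes, UTF-8 comes out smallest.

```python
# Hypothetical interface description: mostly ASCII, one accented character.
description = "Connexion au réseau local"

utf8_len = len(description.encode("utf-8"))
# The "-le" variants avoid the byte-order mark that plain "utf-16"/"utf-32" add.
utf16_len = len(description.encode("utf-16-le"))
utf32_len = len(description.encode("utf-32-le"))

print(utf8_len, utf16_len, utf32_len)  # prints: 26 50 100
```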

> However, if the conversion is done from UTF-16 in Windows to a local ANSI code page by WinPcap, and then by us to UTF-8, I'm afraid that information could be lost in the conversion to and from an ANSI code page.
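The loss described above can be illustrated with a short Python sketch. It assumes Windows-1252 as the "ANSI" code page (typical on Western European systems) and a hypothetical description containing Greek letters, which that code page cannot represent. Python's codec substitutes "?" for unmappable characters; Windows' own conversion routines may pick a different replacement or a best-fit character, but in either case the original text cannot be recovered afterwards.

```python
# Hypothetical description with characters outside Windows-1252.
original = "Réseau Ελληνικά"

# UTF-16/Unicode -> ANSI code page: unmappable characters become "?".
ansi_bytes = original.encode("cp1252", errors="replace")

# ANSI code page -> Unicode again: the replaced characters are gone for good.
round_tripped = ansi_bytes.decode("cp1252")
print(round_tripped)  # prints: Réseau ????????

# Treating the ANSI bytes as if they were already UTF-8 fails instead:
# 0xE9 ("é" in cp1252) is not followed by a valid UTF-8 continuation byte.
try:
    ansi_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # prints: False
```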

Perhaps WinPcap should be changed to provide the description in UTF-8 rather than
the local code page - or, when I get around to doing the new *pcap APIs to get
interface lists and information (providing the information in a form similar to
the pcap-ng Interface Description Block, so it's extensible), define them as
returning the description in UTF-8 on Windows.
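For reference, pcap-ng stores interface metadata as type-length-value options, which is what makes the Interface Description Block extensible, and its `if_description` option (code 3) holds a UTF-8 string. The Python sketch below encodes one such option followed by the end-of-options marker. The helper name and the little-endian byte order are assumptions for illustration; this is not an existing *pcap API.

```python
import struct

OPT_ENDOFOPT = 0    # pcap-ng opt_endofopt: terminates an option list
IF_DESCRIPTION = 3  # pcap-ng if_description option code

def encode_string_option(code: int, text: str) -> bytes:
    """Hypothetical helper: encode one pcap-ng string option, little-endian."""
    value = text.encode("utf-8")  # pcap-ng string options are UTF-8
    # Option length is the unpadded value length; struct.pack raises
    # struct.error if it does not fit in 16 bits.
    header = struct.pack("<HH", code, len(value))
    padding = b"\x00" * (-len(value) % 4)  # pad the value to a 32-bit boundary
    return header + value + padding

options = encode_string_option(IF_DESCRIPTION, "Connexion au réseau local")
options += struct.pack("<HH", OPT_ENDOFOPT, 0)
print(len(options))  # prints: 36
```

A reader that doesn't recognize an option code can skip it using the length field, so new fields can be added later without breaking older consumers.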

> On Unix, the OS only provides the short name for the adapter (such as eth0 or re1 for example).  The interface's longer description (where this bug is happening) is only available on Windows AFAIK.

Depends on the UN*X:

    FreeBSD and OpenBSD have a network interface ioctl to set and get a
description string - if it's set, recent versions of libpcap will return it.

    The "any" device has a description string on Linux.

There's no guarantee that the ioctl-based description is in any particular
encoding (other than that the encoding is presumably ASCII-based).  At best, we
can assume it's in the current locale, although it's more likely in the
"system" locale, which might not be the same as the user's locale.
(Hopefully UN*Xes will drift towards UTF-8 as the encoding in most locales.)
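Since nothing guarantees the encoding, a consumer of these descriptions has to guess. The Python sketch below shows one possible policy, which is an assumption and not what libpcap or Wireshark actually do: accept the bytes if they are valid UTF-8, otherwise try the current locale's encoding, and as a last resort escape undecodable bytes as `\xNN` so nothing non-ASCII is passed through unescaped. `decode_description` is a hypothetical name.

```python
import locale

def decode_description(raw: bytes) -> str:
    """Hypothetical best-effort decoding of an interface description.

    The fallback order is an assumption for illustration only.
    """
    # 1. Valid UTF-8 (which includes plain ASCII) is accepted as-is.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # 2. Otherwise assume the user's current locale encoding. As noted above,
    #    the description may really be in the "system" locale instead.
    encoding = locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        pass
    # 3. Last resort: keep ASCII, escape every other byte as \xNN.
    return raw.decode("ascii", errors="backslashreplace")

print(decode_description(b"em0 uplink"))
print(decode_description("réseau".encode("utf-8")))
```

With a Latin-1 locale, the bytes `b"r\xe9seau"` would come out as "réseau"; with a UTF-8 or plain C locale they fall through to the escaped form "r\xe9seau".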
