
Wireshark-dev: Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Tue, 28 Jun 2011 10:35:20 -0700
On Jun 28, 2011, at 10:01 AM, Guy Harris wrote:

> In any case, that means that using strerror() is probably not going to be sufficient to fix the problem.  What we might want to do is use UTF-8 everywhere we can, and, for non-GUI output, convert to the appropriate character encoding - whatever that might be - at the last minute.

And then there's input.

Input from the GUI is in UTF-8.

We don't have any programs that read interactive user input from the command line, unless I've missed something, *but* we have programs that take arguments from the command line.  If you're typing commands interactively, those are presumably in the encoding of the terminal or terminal emulator.  If you're running commands from a script file, they're in whatever the character encoding is for the file.
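
To make the command-line case concrete, here is a rough sketch (in Python rather than C, purely for brevity) of normalizing an argument to UTF-8 as early as possible. The helper name arg_to_utf8 is made up for illustration, and the assumption that arguments arrive in the locale's preferred encoding is exactly that, an assumption; as noted above, a script file could be in some other encoding entirely.

```python
import locale


def arg_to_utf8(raw, encoding=None):
    """Convert the raw bytes of a command-line argument to UTF-8 bytes.

    `encoding` is the encoding the bytes are ASSUMED to be in. If it is
    not given, fall back to the locale's preferred encoding, which is
    only a guess at what the terminal or script file actually used.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
    text = raw.decode(encoding)
    return text.encode("utf-8")


# "café" typed in an ISO 8859-1 terminal: the e-acute is the single byte 0xE9.
latin1_arg = b"caf\xe9"
utf8_arg = arg_to_utf8(latin1_arg, "iso-8859-1")
```

After that, the rest of the program only ever sees UTF-8, and the conversion back to some other encoding happens only when writing non-GUI output.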

I note that GLib, at least, appears to allow the file name character encoding to differ from the locale's character encoding.  I originally didn't see why this made sense, but I guess they might differ if, say, you're looking at somebody else's files and the two of you aren't both using UTF-8 but are using different encodings, which could conceivably happen on UN*X.

(As

	http://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html#g-get-filename-charsets

notes, "on Unix, regardless of the locale character set or G_FILENAME_ENCODING value, the actual file names present on a system might be in any random encoding or just gibberish", but I digress.)
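
Given that file names might be in any encoding or none, anything that displays them needs a fallback. A minimal sketch, again in Python for brevity: try a list of candidate encodings in order and, if none fits, substitute replacement characters so that there is at least something to show. The function name and the candidate list are hypothetical; I believe GLib's g_filename_display_name() works in a similar spirit, but this is not a description of what it does.

```python
def filename_for_display(raw, candidates=("utf-8",)):
    """Return a displayable string for the raw bytes of a file name.

    Each encoding in `candidates` is tried in order; the first one that
    decodes the bytes without error wins. If none does, decode as UTF-8
    and replace every undecodable byte sequence with U+FFFD.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


# A UTF-8 name decodes on the first try.
name_a = filename_for_display(b"caf\xc3\xa9")

# An ISO 8859-1 name is not valid UTF-8, so the second candidate is used.
name_b = filename_for_display(b"caf\xe9", ("utf-8", "iso-8859-1"))

# With no matching candidate, the stray byte becomes U+FFFD.
name_c = filename_for_display(b"caf\xe9")
```

The order of the candidates matters: an encoding such as ISO 8859-1 accepts every possible byte, so it has to come last or nothing after it will ever be tried.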