Wireshark-dev: Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)
From: Guy Harris <[email protected]>
Date: Mon, 27 Jun 2011 17:58:35 -0700
On Jun 27, 2011, at 11:54 AM, Stig Bjørlykke wrote:

> When looking at bug 5715 I found that we use both UTF8 (from file
> names) and locale (from strerror()) in the error messages presented
> from simple_dialog().  In vsimple_dialog() we convert all messages
> with g_locale_to_utf8(), which will wrongly convert the file name
> (like in the bug report).  When using Norwegian characters in the file
> name the text in the dialog is empty.

I suspect this wouldn't be an issue on my machine: my locale's encoding is UTF-8, so if g_locale_to_utf8() behaves differently from strcpy() here, there's either a misconfiguration or a bug in g_locale_to_utf8():

	$ echo $LANG
	en_US.UTF-8

I.e., this issue should, modulo bugs, only show up in locales where the character encoding isn't UTF-8, meaning:

	1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with Norwegian characters on your system?  If so, what's the setting of LANG?);

	2) Windows, where "Unicode" generally means "UTF-16", and APIs that return strings encoded as sequences of octets rather than 16-bit units probably return strings in the local code page.
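
A minimal sketch of checking the first case programmatically, assuming a POSIX system that provides nl_langinfo(CODESET). The helper names codeset_is_utf8() and locale_is_utf8() are made up for illustration and are not Wireshark or GLib functions:

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <strings.h>

/* Hypothetical helper: true if the codeset name denotes UTF-8. */
bool codeset_is_utf8(const char *codeset)
{
    assert(codeset != NULL);
    /* Codeset names are spelled differently across platforms. */
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "UTF8") == 0;
}

/* Hypothetical helper: true if the locale chosen by LANG/LC_* uses UTF-8. */
bool locale_is_utf8(void)
{
    /* Adopt the environment's locale; until this is called, the "C"
     * locale is in effect regardless of what LANG says. */
    setlocale(LC_ALL, "");
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

If locale_is_utf8() returns false, strings from strerror() and UTF-8 file names are in different encodings, which is the situation where the mix-up described above would be expected to show.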

> Any ideas how we should fix this?  Convert all messages from
> strerror() when putting the text into the error string and remove the
> conversion in vsimple_dialog()?

I would say "yes", given that GTK+ uses UTF-8 as the string encoding for all GUI functions, and I think any other toolkit we might use as an alternative would also use some encoding of Unicode (UTF-8 or UTF-16, most likely).

> We have about 240 calls to strerror().

...and, unfortunately, a variant that converts to UTF-8 and is API-compatible is non-trivial, as any version that allocates a buffer for the result of the conversion would leak memory if we just globally replaced strerror() with ws_strerror().
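
One way around the leak is to give up API compatibility and have the caller supply the buffer. The following is only a sketch under stated assumptions: the name ws_strerror_utf8() and its signature are hypothetical, and it uses POSIX iconv() in place of g_locale_to_utf8() so that it builds without GLib:

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical replacement for strerror(): writes a UTF-8 message for
 * errnum into the caller's buffer and returns that buffer. Nothing is
 * allocated, so nothing can leak, but every call site has to change.
 * Like strerror() itself, this is not thread-safe.
 */
char *ws_strerror_utf8(int errnum, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    const char *msg = strerror(errnum);   /* in the locale's encoding */
    iconv_t cd = iconv_open("UTF-8", nl_langinfo(CODESET));

    if (cd != (iconv_t)-1) {
        char *in = (char *)msg;           /* iconv() takes char **, not const */
        size_t inleft = strlen(msg);
        char *out = buf;
        size_t outleft = buflen - 1;      /* keep room for the terminator */
        size_t rc = iconv(cd, &in, &inleft, &out, &outleft);

        iconv_close(cd);
        if (rc != (size_t)-1) {
            *out = '\0';
            return buf;
        }
    }

    /* No converter, or the message did not fit: fall back to the error
     * number, which is plain ASCII and therefore valid UTF-8. */
    snprintf(buf, buflen, "Error %d", errnum);
    return buf;
}
```

A call site would then look something like ws_strerror_utf8(errno, errbuf, sizeof errbuf), with errbuf a local array, which is exactly the sort of change that can't be made with a global search-and-replace.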

(Of course, strerror() is also not thread-safe, so there might be other reasons to avoid routines with such an API; the latest shiniest Single UNIX Specification has strerror_r(), which takes a buffer that it fills in, which has its own issues (as in "how big a buffer do you need"?), and I don't know how many platforms have it.

But if you're doing enough calls to strerror() that throwing a mutex around strerror() in your wrapper causes performance problems, those performance problems are probably the least of your problems....)
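
A minimal sketch of the "mutex around strerror()" idea, assuming POSIX threads are available. The name locked_strerror() is made up for illustration; the message is copied into a caller-supplied buffer while the lock is held, so the caller never reads strerror()'s shared storage:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t strerror_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: serialize strerror() and copy the result out
 * before releasing the lock. Returns buf. */
char *locked_strerror(int errnum, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    pthread_mutex_lock(&strerror_lock);
    /* snprintf() truncates if necessary and always NUL-terminates. */
    snprintf(buf, buflen, "%s", strerror(errnum));
    pthread_mutex_unlock(&strerror_lock);
    return buf;
}
```

This only protects against other callers that go through the same wrapper; code that still calls strerror() directly, including code inside libraries, is not serialized by it.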