Wireshark · Wireshark-dev: Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

From: Guy Harris <guy@xxxxxxxxxxxx>

Date: Mon, 27 Jun 2011 17:58:35 -0700

On Jun 27, 2011, at 11:54 AM, Stig Bjørlykke wrote:

> When looking at bug 5715 I found that we use both UTF8 (from file
> names) and locale (from strerror()) in the error messages presented
> from simple_dialog().  In vsimple_dialog() we convert all messages
> with g_locale_to_utf8(), which will wrongly convert the file name
> (like in the bug report).  When using Norwegian characters in the file
> name the text in the dialog is empty.

I suspect this wouldn't be an issue on my machine: my locale's encoding is UTF-8, so if g_locale_to_utf8() behaves differently from strcpy() here, there's either a misconfiguration or a bug in g_locale_to_utf8():

	$ echo $LANG
	en_US.UTF-8

I.e., this issue should, modulo bugs, only show up in locales where the character encoding isn't UTF-8, meaning:

	1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with Norwegian characters on your system?  If so, what's the setting of LANG?);

	2) Windows, where "Unicode" generally means "UTF-16", and APIs that return strings encoded as sequences of octets rather than 16-bit code units probably return strings in the local code page.

> Any ideas how we should fix this?  Convert all messages from
> strerror() when putting the text into the error string and remove the
> conversion in vsimple_dialog()?

I would say "yes", given that GTK+ uses UTF-8 as the string encoding for all GUI functions, and I think any other toolkit we might use as an alternative would also use some encoding of Unicode (UTF-8 or UTF-16, most likely).

> We have about 240 calls to strerror().

...and, unfortunately, a variant that converts to UTF-8 and is API-compatible is non-trivial, as any version that allocates a buffer for the result of the conversion would leak memory if we just globally replaced strerror() with ws_strerror().
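
One way around the leak is to give up API compatibility and have the caller supply the buffer. The following is only a sketch under that assumption: strerror_utf8() is a hypothetical name, not an existing Wireshark routine, and it uses POSIX iconv() in place of g_locale_to_utf8() so that it stands alone without GLib.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes strerror(err), converted from the
 * locale's encoding to UTF-8, into the caller-supplied buffer and
 * returns buf.  Nothing is allocated, so nothing can leak. */
char *strerror_utf8(int err, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    const char *msg = strerror(err);   /* locale-encoded text */
    iconv_t cd = iconv_open("UTF-8", nl_langinfo(CODESET));

    if (cd != (iconv_t)-1) {
        char *in = (char *)msg;        /* iconv() does not modify the input */
        size_t inleft = strlen(msg);
        char *out = buf;
        size_t outleft = buflen - 1;   /* reserve room for the NUL */
        size_t rc = iconv(cd, &in, &inleft, &out, &outleft);

        iconv_close(cd);
        if (rc != (size_t)-1) {
            *out = '\0';
            return buf;
        }
    }

    /* No converter, or the conversion failed (for example, the buffer
     * was too small): fall back to a truncating copy of the raw text. */
    snprintf(buf, buflen, "%s", msg);
    return buf;
}
```

Each of the roughly 240 call sites would then need a local buffer, which is exactly why this is not a drop-in replacement.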

(Of course, strerror() is also not thread-safe, so there might be other reasons to avoid routines with such an API; the latest shiniest Single UNIX Specification has strerror_r(), which takes a buffer that it fills in, which has its own issues (as in "how big a buffer do you need"?), and I don't know how many platforms have it.

But if you're doing enough calls to strerror() that throwing a mutex around strerror() in your wrapper causes performance problems, those performance problems are probably the least of your problems....)
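
A minimal sketch of the "mutex around strerror()" idea, assuming POSIX threads are available. The name strerror_locked() and the caller-supplied-buffer signature are assumptions made for this example; the lock only protects callers that go through the wrapper, not code that calls strerror() directly.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Serializes all calls made through the wrapper below. */
static pthread_mutex_t strerror_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: copies strerror(err) into the caller's buffer
 * while holding a mutex, so the shared result of strerror() is never
 * read after another wrapped call may have overwritten it.  The copy
 * is truncated if buf is too small.  Returns buf. */
char *strerror_locked(int err, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    pthread_mutex_lock(&strerror_lock);
    snprintf(buf, buflen, "%s", strerror(err));
    pthread_mutex_unlock(&strerror_lock);
    return buf;
}
```

The critical section is a single string copy, so contention should only matter for a program that is reporting errors at an implausibly high rate.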
