Ethereal-dev: Re: [Ethereal-dev] packet-x11-keysym.h cleanup

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <guy@xxxxxxxxxx>
Date: Wed, 3 Oct 2001 17:05:52 -0700 (PDT)
> This seemed reasonable and an easy problem to fix so I did a quick
> grep for lines with "a character with MSB=1".  Attached are the two
> trivial patches.

Actually, it's not clear that non-ASCII characters should be used at all
in some of the places you found.  For example:

	proto_tree_add_text(ext_tree_hdr_list, tvb, offset+2+i, 1, "N°%u --> Extension Header Type value : %s (%u)", i+1, val_to_str(hdr, gtp_val, "Unknown Extension Header Type"), hdr);
	/* \260 == ° (degrees) */
	proto_tree_add_text(ext_tree_hdr_list, tvb, offset+2+i, 1, "N\260%u --> Extension Header Type value : %s (%u)", i+1, val_to_str(hdr, gtp_val, "Unknown Extension Header Type"), hdr);

\260 is a degree symbol only in some character sets; it may be a degree
symbol in ISO 8859/1, but I don't know whether it's one in all of the
other 8859/n character sets, and it's probably not one in other
character sets.

We should probably not use the degree symbol in strings.
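One way to catch such strings mechanically, in the spirit of the grep for "a character with MSB=1" mentioned above, is a small check that a string contains only 7-bit ASCII bytes. This is only a sketch; the function name is_pure_ascii is hypothetical and not part of the Ethereal source.

```c
/*
 * Hypothetical helper (not an Ethereal API): return 1 if every byte in
 * the NUL-terminated string s has its most significant bit clear,
 * i.e. the string is plain 7-bit ASCII; return 0 otherwise.
 */
int is_pure_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; p++) {
        if (*p & 0x80)      /* MSB set: not ASCII */
            return 0;
    }
    return 1;
}
```

With this, is_pure_ascii("N\260%u") is 0, while an ASCII-only label such as "No. %u" (one possible rewording, not necessarily the one the dissector should use) passes.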

(This brings up another problem - when displaying strings from a packet,
we currently hand them, ultimately, to the standard I/O library in
Tethereal, and to GTK+ in Ethereal, but

	there's no guarantee that the terminal, if you're sending
	output in Tethereal, will interpret the characters in the same
	character set that the protocol from which the string came uses;

	if Tethereal is writing to a file, there's no guarantee that
	whatever tools you use to view/edit/compare/print/etc. the file
	will interpret the characters in the right character set;

	if Tethereal is writing to a pipe, there's no guarantee that
	whatever tool you're piping its output to will interpret the
	characters in the right character set;

	if you're using Ethereal, there's no guarantee that GTK+ will
	interpret the characters in the right character set, and the
	above comments about Tethereal writing to a file or pipe also
	apply to the "File->Print" and "File->Print Packet" menu items.

Character strings in GTK+ 1.x are, I think, in whatever character set
the font you're using happens to use - we make an effort to get ISO
8859/1 character sets, but somebody could manually override that.

GTK+ 2.0 will use UTF-8 internally, I think, so we could just use UTF-8
and rely on GTK+ to do the right thing.

Qt 2.0 and later, I think, also support Unicode.

As for text output from Tethereal and File->Print in Ethereal, I don't
know whether there's any better way of handling it than attempting to
find out the current locale's codeset, as per, say,

	http://www.ibm.com/developerworks/linux/library/l-linuni?open&l=252,t=grl,p=uniLX

(most of which applies, to some degree, to all flavors of UNIX) and
somehow translating from UTF-8 to that codeset.  I'm not sure what the
right thing to do would be on Windows - output text as Unicode?)
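A rough sketch of that UNIX approach, assuming the strings are already held as UTF-8 and that the platform provides the POSIX nl_langinfo() and iconv() interfaces. The function names current_codeset and utf8_to_codeset are hypothetical, and nothing here is existing Ethereal code.

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Return the codeset name of the user's locale, e.g. "UTF-8" or "ISO-8859-1". */
const char *current_codeset(void)
{
    setlocale(LC_CTYPE, "");    /* adopt the locale from the environment */
    return nl_langinfo(CODESET);
}

/*
 * Convert NUL-terminated UTF-8 text to the given codeset, storing a
 * NUL-terminated result in out (capacity outsize bytes).
 * Returns 0 on success, -1 if the conversion is unsupported, the text
 * cannot be represented in the target codeset, or out is too small.
 */
int utf8_to_codeset(const char *utf8, const char *codeset,
                    char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)utf8;   /* iconv() does not modify the input */
    size_t inleft = strlen(utf8);
    char *outp = out;
    size_t outleft;
    size_t rc;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;      /* reserve room for the terminating NUL */

    cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return 0;
}
```

Text output would then call utf8_to_codeset(text, current_codeset(), buf, sizeof buf) before writing. What to do when a character has no equivalent in the target codeset (fail, substitute, or escape it) is left open here, as is the Windows case.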