Wireshark-dev: Re: [Wireshark-dev] guint8* and gchar* ... and Vim ?! :)
From: Guy Harris <[email protected]>
Date: Thu, 14 Dec 2006 11:49:35 -0800
Sebastien Tandel wrote:

   is there any reason to use guint8* instead of gchar*?
For what purpose?

If you're dealing with an array of 8-bit bytes, or a pointer to a sequence of those, guint8 is the right type; it makes it clear that they're bytes, not characters (it might be binary, it might be a sequence of 16-bit "bytes" in a UTF-16-encoded string, it might be a UTF-8 string, etc.).
I.e., tvb_get_ptr(), for example, should return a "guint8 *", as should 
tvb_memdup(), and the raw packet data you get from Wiretap should be 
pointed to by a "guint8 *".
Note also that you can safely pass a guint8 or guchar to one of the 
<ctype.h> routines, but you can't safely pass a gchar to them, as it 
might get sign-extended into a negative value if the 8th bit is set (I 
think that none of the popular platforms for Windows and modern UN*Xes 
have C compilers with "char" an unsigned type, so I think "might" can be 
replaced by "will" in practice).
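A minimal sketch of the <ctype.h> point above.  It uses the standard uint8_t in place of GLib's guint8 so that it builds without GLib, and the helper names byte_is_printable() and char_is_printable() are hypothetical, not Wireshark or GLib functions.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: safe because a uint8_t (like guint8) always
 * promotes to an int in the range 0..255, which <ctype.h> accepts. */
static int byte_is_printable(uint8_t b)
{
    return isprint(b) != 0;
}

/* Hypothetical helper for code that holds a plain char (like gchar):
 * convert to unsigned char first, so that a byte with the 8th bit set
 * is never handed to isprint() as a negative value. */
static int char_is_printable(char c)
{
    return isprint((unsigned char)c) != 0;
}
```

Calling isprint(c) directly on a plain char with the 8th bit set is the case to avoid; the conversion to unsigned char is what keeps the argument in the range the <ctype.h> routines are defined for.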
   With gcc-4.0, there is a new warning that a "pointer target
   differs in signedness" (which is not such a bad thing).
I suspect most of those warnings are for cases where you're treating 
byte sequences as character strings.
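As an illustration of where such a warning typically comes from, here is a small sketch, assuming a byte buffer that happens to hold a NUL-terminated string.  The helper name byte_string_length() is hypothetical, and uint8_t stands in for guint8.  It only shows the mechanics; it is not a suggestion to scatter casts through a dissector.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of a NUL-terminated byte sequence.
 * Writing strlen(bytes) directly would pass a pointer to unsigned
 * bytes where strlen() expects "const char *", which is the kind of
 * call the signedness warning flags.  The cast makes the
 * reinterpretation of bytes as characters explicit. */
static size_t byte_string_length(const uint8_t *bytes)
{
    return strlen((const char *)bytes);
}
```

Keeping the cast in one helper, rather than at every call site, at least confines the places where bytes are treated as characters.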
What I think we *really* need to do, for those cases, is have a 
different way of handling strings.  The current way we handle strings 
doesn't take into account the fact that there are a number of different 
character encodings for strings - "ASCII" (which would imply that a byte 
with the 8th bit set is an error), ISO 8859/n, other EUC encodings, 
Shift-JIS, KOI8, UTF-8, UTF-16, etc.
See the first item under "Dissector infrastructure" on the


page. (That discusses two items - the dissector APIs for handling strings, and the UI aspects of this. The former doesn't require the latter - we can continue to display non-ASCII characters as escape sequences - but the latter, which is something we should ultimately do, requires some way of getting all strings from packets translated into Unicode.)
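To make the "ASCII" case concrete (a byte with the 8th bit set is an error), here is a minimal sketch of such a check.  The function name bytes_are_ascii() is hypothetical and is not an existing Wireshark API; uint8_t stands in for guint8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true only if every byte is 7-bit,
 * i.e. no byte in the buffer has the 8th bit (0x80) set. */
static bool bytes_are_ascii(const uint8_t *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] & 0x80)
            return false;
    }
    return true;
}
```

An encoding-aware string API would need a check of this kind per encoding; for the other encodings listed, the validity rules are more involved than a single bit test.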
   May we change these guint8* to gchar*? I mean, may we change the type of
   the variables concerned rather than adding a cast to every function call?
Which ones are you thinking of?  We shouldn't globally replace guint8 
with gchar, as per my comments in the beginning.