Wireshark-dev: Re: [Wireshark-dev] How to print out string encoded data that contains nul chara

From: "John Dill" <John.Dill@xxxxxxxxxxxxxxxxx>
Date: Thu, 10 Apr 2014 09:44:45 -0400
>Message: 1
>Date: Wed, 9 Apr 2014 14:24:53 -0700
>From: Guy Harris <guy@xxxxxxxxxxxx>
>To: Developer support list for Wireshark <wireshark-dev@xxxxxxxxxxxxx>
>Subject: Re: [Wireshark-dev] How to print out string encoded data that
>	contains nul characters?
>Message-ID: <570E5517-8137-466F-AEB1-C32CC47C12B3@xxxxxxxxxxxx>
>Content-Type: text/plain; charset=iso-8859-1
>
>
>On Apr 9, 2014, at 2:06 PM, "John Dill" <John.Dill@xxxxxxxxxxxxxxxxx> wrote:
>
>>I have several character data fields that happen to contain sections of
>>non-ascii binary data including nul characters.  I'd like to get a string
>>display that shows all of the characters according to the length of the
>>field, i.e.
>> 
>> 20 20 20 20 20 20 01 00 01 00 48 31 20 20 20 20
>> 
>> produces
>> 
>> "      \001\000\001\000H1    "
>> 
>> In proto.c, I see that all of the format_text calls use strlen(bytes) as the length.
>> 
>> case FT_STRING:
>> case FT_STRINGZ:
>> case FT_UINT_STRING:
>>         bytes = (guint8 *)fvalue_get(&fi->value);
>>         label_fill(label_str, hfinfo, format_text(bytes, strlen(bytes)));
>> 
>>What is the recommended way of creating a text string that uses the octal
>>encoding '\xxx' for non-ASCII data including nul characters that uses the
>>'length' field of 'proto_tree_add_item'?
>
>The right short-term way would be to use proto_tree_add_string_format_value()
>to add the field, and format the string's value yourself, using format_text()
>with a byte count rather than strlen().

Cool, thanks!
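
A minimal sketch of that short-term approach, for reference.  It does not call
Wireshark itself: escape_counted() is a hypothetical stand-in for format_text()
that is driven by an explicit byte count, so NUL bytes are escaped rather than
treated as the end of the string.  Unlike the example output above, it also
escapes a literal backslash so the result stays unambiguous; that choice is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for format_text(): copies printable ASCII as-is
 * and writes every other byte (including NUL and backslash) as a
 * three-digit octal escape.  The caller supplies the length, so nothing
 * here depends on strlen().  Output is truncated at a whole character or
 * escape if it does not fit.  Returns the number of characters written,
 * not counting the terminating NUL.
 */
static size_t escape_counted(const unsigned char *bytes, size_t len,
                             char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0)
        return 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = bytes[i];

        if (c >= 0x20 && c < 0x7f && c != '\\') {
            if (o + 1 >= out_size)      /* keep room for the NUL */
                break;
            out[o++] = (char)c;
        } else {
            if (o + 4 >= out_size)      /* "\ooo" plus the NUL */
                break;
            snprintf(out + o, 5, "\\%03o", (unsigned)c);
            o += 4;
        }
    }
    out[o] = '\0';
    return o;
}
```

For the 16 bytes in the example, strlen() would stop after 7 bytes, while
escape_counted(field, 16, buf, sizeof buf) covers the whole field.  In a
dissector, the resulting buffer would then be passed as the "%s" argument of
proto_tree_add_string_format_value(); the hf index and offsets depend on the
dissector and are not shown here.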

>The right long-term way is to modify Wireshark so that this works.  The way we
>handle strings should probably be changed so that we:
>
>store the raw string octets as a counted array, along with the string encoding;
>
>convert the octets from the encoding to UTF-8 *with invalid octets and sequences
>shown as escapes* when displaying the strings;
>
>convert the octets from the encoding to UTF-8 with invalid octets and sequences
>shown as Unicode REPLACEMENT CHARACTERS when making the string available for
>processing by other software (e.g., "-T fields", etc.) (or somehow saying "this
>isn't a valid string in this encoding");
>
>somehow arrange that strings with invalid octets or sequences are *always* unequal
>to any character string in packet-matching expressions (display/read filters,
>color "filters", etc.), and perhaps allow strings to be compared against octet
>sequences (e.g. "foobar.name = 20:20:20:20:20:20:01:00:01:00:48:31:20:20:20:20"
>matches the raw octets of the string), and use that with "Prepare As Filter" etc.

Sounds pretty reasonable to me.  For now I'll have to use the short method since
this dissector is still hanging over my head (and I haven't even started the TCP
side yet).
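
To make the long-term idea above concrete, here is a rough sketch of it outside
of Wireshark.  Everything in it is hypothetical (counted_str, render(),
equals_text(), equals_octets()), and it makes one large simplifying assumption:
the encoding is ASCII, and any octet outside printable ASCII is treated as
"invalid".  A real implementation would have to decode the field's actual
encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counted string: raw octets plus a length (encoding: ASCII). */
typedef struct {
    const unsigned char *octets;
    size_t len;
} counted_str;

typedef enum { RENDER_DISPLAY, RENDER_EXPORT } render_mode;

/* Assumption of this sketch: only printable ASCII counts as valid. */
static bool octet_is_plain(unsigned char c)
{
    return c >= 0x20 && c < 0x7f;
}

/*
 * RENDER_DISPLAY writes invalid octets as octal escapes; RENDER_EXPORT
 * writes them as U+FFFD REPLACEMENT CHARACTER (EF BF BD in UTF-8).
 * Returns the number of bytes written, not counting the NUL.
 */
static size_t render(const counted_str *s, render_mode mode,
                     char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0)
        return 0;

    for (size_t i = 0; i < s->len; i++) {
        unsigned char c = s->octets[i];
        char piece[5];
        size_t n;

        if (octet_is_plain(c)) {
            piece[0] = (char)c;
            n = 1;
        } else if (mode == RENDER_DISPLAY) {
            snprintf(piece, sizeof piece, "\\%03o", (unsigned)c);
            n = 4;
        } else {
            memcpy(piece, "\xEF\xBF\xBD", 3);
            n = 3;
        }
        if (o + n >= out_size)          /* keep room for the NUL */
            break;
        memcpy(out + o, piece, n);
        o += n;
    }
    out[o] = '\0';
    return o;
}

/* A string containing any invalid octet never equals a text string. */
static bool equals_text(const counted_str *s, const char *text)
{
    for (size_t i = 0; i < s->len; i++)
        if (!octet_is_plain(s->octets[i]))
            return false;
    return strlen(text) == s->len && memcmp(s->octets, text, s->len) == 0;
}

/* Comparison against a raw octet sequence always looks at the raw bytes. */
static bool equals_octets(const counted_str *s,
                          const unsigned char *raw, size_t raw_len)
{
    return raw_len == s->len && memcmp(s->octets, raw, raw_len) == 0;
}
```

With the 16-byte example field, the display rendering contains the octal
escapes, the export rendering contains four replacement characters,
equals_text() is false for every text string, and equals_octets() still matches
the original bytes.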

>Alternatively, if they're *not* really character strings, display them as a set
>of subfields, with the text part shown as strings and the binary data shown as
>whatever it is, e.g.
>
>Frobozz text 1: {blanks}
>Frobozz count 1: 1
>Frobozz count 2: 1
>Frobozz text 2: H1{and more blanks}
>
>or whatever it is.

I had considered that, but I've found that the field is mapped to a 16-byte
character array and, for whatever reason, the array is used differently
depending on the radio it's addressed to.  Sometimes the 16 bytes of data are a
valid string, and other times they look like a pair of 6-character strings with
binary data that is either garbage or has some unknown meaning (at least to me).

The string itself is in a structure that is grouped as an array, so preset 1
out of 10 could be a 16-character string in one message and a pair of strings
in another.  I would need to analyze the other fields in the message to be
able to make that distinction, and I didn't want to go to that level of effort
(since there are over 20 different preset types where each type could
potentially have its own string format).  Just displaying the string with the
nul bytes shown as escapes is simpler.

Best regards,
John Dill
