Wireshark-dev: [Wireshark-dev] [PATCH] nstring and astring support in PIDL and Wireshark

From: Julien Kerihuel <j.kerihuel@xxxxxxxxxxxxxx>
Date: Thu, 14 Jan 2010 14:54:25 +0100
Hi Lists,

First of all, sorry for this cross-posted, non-subscriber message which
furthermore also includes attachments. This probably breaks some of the
most elementary mailing-list rules, so feel free to tell me if it's
inappropriate and/or where such emails would better belong in the future.

Introduction:
=============
I'm currently working on improving the MAPI Wireshark dissector by adding
support for the Exchange 2003/2007/2010 RPC functions (EcDoConnectEx and
EcDoRpcExt2, respectively 0xA and 0xB).

The OpenChange IDL is pretty big and works smoothly with Samba4, but
MAPI structures are not NDR-compatible, so passing the IDL to pidl's
Wireshark parser produces numerous errors.

The series of patches I'm working on tries to solve these
incompatibility issues, providing (as much as possible) an
auto-generated dissector rather than a long, painful and error-prone
hand-written one.

nstring and astring patch explained:
====================================
Two of the recurring string types we use in OpenChange for MAPI
structures are astring and nstring. These strings are not prefixed
with any NDR fields such as [size][offset][length]; their overall
length is found by parsing the string until we reach the terminating
NUL character(s).

To make it more obvious, a classical NDR-encoded string would be:
[size][offset][length][my super string][NULL]

The astring version would be:
[my super string][NULL]
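
To make the difference concrete, here is a minimal standalone sketch
(plain C, no Wireshark types) of how the size of such an unprefixed
string has to be discovered. astring_size is a made-up name used only
for illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: size in bytes of an
 * astring starting at buf, including the terminating NUL.
 * Returns 0 if no NUL is found within len bytes. */
static size_t
astring_size(const unsigned char *buf, size_t len)
{
	const unsigned char *nul;

	assert(buf != NULL);
	nul = memchr(buf, '\0', len);
	if (nul == NULL)
		return 0;
	return (size_t)(nul - buf) + 1;
}
```

With an NDR-encoded string the dissector would read the length from the
prefix fields instead; here the only option is to scan for the
terminator and stop at the end of the available data.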

Wireshark already provides an API for UTF-8 strings with the
tvb_get_stringz family in epan/tvbuff.c.

This patch adds:
- similar support for Unicode strings in epan/tvbuff.c
- a dissect_nastring function in packet-dcerpc.c, which lets pidl
generate code for both astring and nstring.
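
The Unicode helpers hand the string back through the existing
*_fake_unicode functions. As far as I can tell, that conversion keeps
each 16-bit unit that fits in one byte and replaces everything else
with '.'. The standalone sketch below shows that idea for
little-endian input only; narrow_utf16le is a made-up name and this is
not the Wireshark implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: narrow n_units UTF-16LE code units from src into
 * dst, replacing anything above 0xFF with '.'. dst must have room for
 * n_units + 1 bytes; the result is always NUL-terminated. */
static void
narrow_utf16le(const unsigned char *src, size_t n_units, char *dst)
{
	size_t i;

	assert(src != NULL && dst != NULL);
	for (i = 0; i < n_units; i++) {
		unsigned int unit = src[2 * i] | ((unsigned int)src[2 * i + 1] << 8);

		dst[i] = (unit < 256) ? (char)unit : '.';
	}
	dst[n_units] = '\0';
}
```

This is why the patch passes size / 2 to the fake_unicode functions:
the size is counted in bytes, the conversion in 16-bit units.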


Notes:
======
The provided pidl patch is naive, probably has side effects and
surely requires changes. It basically falls back to dissect_nastring
if the string couldn't be handled by the previous cases. There's
probably a saner way to do this.

Regarding the Wireshark patch, the only thing I'm not really happy with
is the guint16_find function, which clearly goes beyond its intended
scope and assumes the tvbuff parameter IS a Unicode string.

Furthermore, the way the end of a Unicode string is detected is pretty
bad, but I followed the logic of Wireshark's *_fake_unicode functions here.
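
One possible saner approach, sketched below in plain C, is to walk the
buffer one 16-bit unit at a time instead of byte by byte, assuming the
string starts exactly at the given pointer. utf16_strsize is a made-up
name; this is only an illustration of the idea, not what the patch
does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alternative to the byte-wise scan: size in bytes of a
 * UTF-16 string starting at buf, including its two-byte terminator.
 * Assumes buf points at the first code unit. Returns 0 if no
 * terminator is found within len bytes. */
static size_t
utf16_strsize(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i + 1 < len; i += 2) {
		if (buf[i] == 0 && buf[i + 1] == 0)
			return i + 2;
	}
	return 0;
}
```

For the little-endian bytes 61 00 00 01 00 00 (U+0061, U+0100, NUL) a
byte-wise search for two consecutive zero bytes would already match at
offset 1, in the middle of the string, while the aligned walk only
matches the real terminator at offset 4.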

Conclusion:
===========
The wireshark patch can be applied independently from Samba's PIDL one.
The new functions won't impact the existing wireshark source code.

Some Wireshark developers may be reluctant to add such non-NDR
compatible dissection functions to packet-dcerpc.c, but I really do think
this is a worthwhile move to speed up MAPI dissector development.

For the record, I'm currently developing the new MAPI dissector on a
separate repository where I can play, apply changes to pidl and save
wireshark patches properly, until this goes upstream (or not):

$ svn co https://svn.openchange.org/dissector openchange_dissector


Cheers,
Julien.

-- 
Julien Kerihuel
j.kerihuel@xxxxxxxxxxxxxx
OpenChange Project Manager

GPG Fingerprint: 0B55 783D A781 6329 108A  B609 7EF6 FE11 A35F 1F79

Index: epan/tvbuff.c
===================================================================
--- epan/tvbuff.c	(revision 31523)
+++ epan/tvbuff.c	(working copy)
@@ -990,7 +990,27 @@
 	return NULL;
 }
 
+static const guint8*
+guint16_find(const guint8* haystack, size_t haystacklen, guint16 needle)
+{
+	const guint8	*b;
+	int		i;
 
+	for (b = haystack, i = 0; (guint) i < haystacklen; i++, b++) {
+		if (!i && (*b == ((guint8)(needle >> 8))) &&
+		    ((guint)(i + 1) < haystacklen) &&
+		    (haystack[i+1] == ((guint8)(needle & 0xFF)))) {
+			return b;
+		} else if (i && !(haystack[i-1]) &&
+			   (*b == ((guint8)(needle >> 8))) &&
+			   ((guint)(i + 1) < haystacklen) && 
+			   (haystack[i+1] == ((guint8)(needle & 0xFF)))) {
+			return b;
+		}
+	}
+	
+	return NULL;
+}
 
 /************** ACCESSORS **************/
 
@@ -1783,6 +1803,72 @@
 	return tvb_get_bits8(tvb, bit_offset, no_of_bits);
 }
 
+/* Find first occurrence of needle in tvbuff, starting at
+ * offset. Searches at most maxlength number of bytes; if maxlength is
+ * -1, searches to end of tvbuff.
+ * Return the offset of the found needle, or -1 if not found.
+ * Will not throw an exception, even if maxlength exceeds boundary of
+ * tvbuff; in that case, -1 will be returned if the boundary is
+ * reached before finding needle.
+ */
+gint
+tvb_find_guint16(tvbuff_t *tvb, gint offset, gint maxlength, guint16 needle)
+{
+	const guint8	*result;
+	guint		abs_offset, junk_length;
+	guint		tvbufflen;
+	guint		limit;
+
+	DISSECTOR_ASSERT(tvb && tvb->initialized);
+
+	check_offset_length(tvb->length, tvb->reported_length, offset, 0, &abs_offset, &junk_length);
+
+	/* Only search to end of tvbuff, w/o throwing exception. */
+	tvbufflen = tvb_length_remaining(tvb, abs_offset);
+	if (maxlength == -1) {
+		/* No maximum length specified; search to end of tvbuff. */
+		limit = tvbufflen;
+	}
+	else if (tvbufflen < (guint) maxlength) {
+		/* Maximum length goes past end of tvbuff; search to end
+		   of tvbuff. */
+		limit = tvbufflen;
+	}
+	else {
+		/* Maximum length doesn't go past end of tvbuff; search
+		   to that value. */
+		limit = maxlength;
+	}
+
+	/* If we have real data, perform our search now. */
+	if (tvb->real_data) {
+		result = guint16_find(tvb->real_data + abs_offset, limit, needle);
+		if (result == NULL) {
+			return -1;
+		}
+		else {
+			return (gint) (result - tvb->real_data);
+		}
+	}
+
+	switch(tvb->type) {
+		case TVBUFF_REAL_DATA:
+			DISSECTOR_ASSERT_NOT_REACHED();
+
+		case TVBUFF_SUBSET:
+			return tvb_find_guint16(tvb->tvbuffs.subset.tvb,
+					abs_offset - tvb->tvbuffs.subset.offset,
+					limit, needle);
+
+		case TVBUFF_COMPOSITE:
+			DISSECTOR_ASSERT_NOT_REACHED();
+			/* XXX - return composite_find_guint16(tvb, offset, limit, needle); */
+	}
+
+	DISSECTOR_ASSERT_NOT_REACHED();
+	return -1;
+}
+
 /* Find first occurence of needle in tvbuff, starting at offset. Searches
  * at most maxlength number of bytes; if maxlength is -1, searches to
  * end of tvbuff.
@@ -1913,6 +1999,42 @@
 	return -1;
 }
 
+/*
+ * Find size of unicode stringz (NUL-terminated string) by looking for
+ * 2 terminating NUL. The size of the string includes the terminating NULs.
+ *
+ * If the NUL isn't found, it throws the appropriate exception.
+ */
+guint
+tvb_strsize_unicode(tvbuff_t *tvb, gint offset)
+{
+	guint	abs_offset, junk_length;
+	gint	nul_offset;
+
+	DISSECTOR_ASSERT(tvb && tvb->initialized);
+
+	check_offset_length(tvb->length, tvb->reported_length, offset, 0, &abs_offset, &junk_length);
+	nul_offset = tvb_find_guint16(tvb, abs_offset, -1, 0);
+	if (nul_offset == -1) {
+		/*
+		 * OK we hit the end of the tvbuff, so we should throw
+		 * an exception.
+		 *
+		 * Did we hit the end of the captured data, or the end
+		 * of the actual data? If there's less captured data
+		 * than actual data, we presumably hit the end of the
+		 * captured data, otherwise we hit the end of the
+		 * actual data.
+		 */
+		if (tvb_length(tvb) < tvb_reported_length(tvb)) {
+			THROW(BoundsError);
+		} else {
+			THROW(ReportedBoundsError);
+		}
+	}
+	return (nul_offset - abs_offset) + 2;
+}
+
 /* Find size of stringz (NUL-terminated string) by looking for terminating
  * NUL.  The size of the string includes the terminating NUL.
  *
@@ -2369,6 +2491,57 @@
 	return strptr;
 }
 
+/*
+ * Given a tvbuff and an offset, with the offset assumed to refer to a
+ * null-terminated unicode string, find the length of that string (and
+ * throw an exception if the tvbuff ends before we find the null),
+ * allocate a buffer big enough to hold the string, copy the (fake
+ * unicode) string into it, and return a pointer to the string. Also
+ * return the length of the string (including the terminating nulls)
+ * through a pointer
+ *
+ */
+guint8 *
+tvb_get_stringz_unicode(tvbuff_t *tvb, gint offset, gint *lengthp)
+{
+	guint size;
+	guint8 *strptr;
+
+	size = tvb_strsize_unicode(tvb, offset);
+	strptr = tvb_fake_unicode(tvb, offset, size / 2, TRUE);
+	if (lengthp)
+		*lengthp = size;
+	return strptr;
+}
+
+/*
+ * Given a tvbuff and an offset, with the offset assumed to refer to a
+ * null-terminated unicode string, find the length of that string (and
+ * throw an exception if the tvbuff ends before we find the null),
+ * allocate a buffer big enough to hold the string, copy the (fake
+ * unicode) string into it, and return a pointer to the string. Also
+ * return the length of the string (including the terminating null)
+ * through a pointer.
+ *
+ * This function allocates memory from a buffer with packet
+ * lifetime. You do not have to free this buffer, it will be
+ * automatically freed when wireshark starts decoding the next packet.
+ * Do not use this function if you want the allocated memory to be
+ * persistent after the current packet has been dissected.
+ */
+guint8 *
+tvb_get_ephemeral_stringz_unicode(tvbuff_t *tvb, gint offset, gint *lengthp)
+{
+	guint size;
+	guint8 *strptr;
+
+	size = tvb_strsize_unicode(tvb, offset);
+	strptr = tvb_get_ephemeral_faked_unicode(tvb, offset, size / 2, TRUE);
+	if (lengthp)
+		*lengthp = size;
+	return strptr;
+}
+
 /* Looks for a stringz (NUL-terminated string) in tvbuff and copies
  * no more than bufsize number of bytes, including terminating NUL, to buffer.
  * Returns length of string (not including terminating NUL), or -1 if the string was
Index: epan/tvbuff.h
===================================================================
--- epan/tvbuff.h	(revision 31523)
+++ epan/tvbuff.h	(working copy)
@@ -408,6 +408,16 @@
  * Will not throw an exception, even if maxlength exceeds boundary of tvbuff;
  * in that case, -1 will be returned if the boundary is reached before
  * finding needle. */
+extern gint tvb_find_guint16(tvbuff_t*, gint offset, gint maxlength,
+    guint16 needle);
+
+/** Find first occurrence of needle in tvbuff, starting at offset.
+ * Searches at most maxlength number of bytes; if maxlength is -1, searches
+ * to end of tvbuff.
+ * Returns the offset of the found needle, or -1 if not found.
+ * Will not throw an exception, even if maxlength exceeds boundary of tvbuff;
+ * in that case, -1 will be returned if the boundary is reached before
+ * finding needle. */
 extern gint tvb_find_guint8(tvbuff_t*, gint offset, gint maxlength,
     guint8 needle);
 
@@ -419,6 +429,14 @@
 extern gint tvb_pbrk_guint8(tvbuff_t *, gint offset, gint maxlength,
     const guint8 *needles);
 
+/** Find size of unicode stringz (NUL-terminated string) by looking
+ * for 2 terminating NUL. The size of the string includes the
+ * terminating NULs.
+ *
+ * If the NUL isn't found, it throws the appropriate exception.
+ */
+extern guint tvb_strsize_unicode(tvbuff_t *tvb, gint offset);
+
 /** Find size of stringz (NUL-terminated string) by looking for terminating
  * NUL.  The size of the string includes the terminating NUL.
  *
@@ -521,6 +539,28 @@
 extern guint8 *tvb_get_ephemeral_stringz(tvbuff_t *tvb, gint offset, gint *lengthp);
 extern guint8 *tvb_get_seasonal_stringz(tvbuff_t *tvb, gint offset, gint *lengthp);
 
+/**
+ * Given a tvbuff and an offset, with the offset assumed to refer to
+ * a null-terminated unicode string, find the length of that string
+ * (and throw an exception if the tvbuff ends before we find the
+ * nulls), allocate a buffer big enough to hold the string, copy the
+ * string into it, and return a pointer to the string. Also return
+ * the length of the string (including the terminating nulls) through a
+ * pointer.
+ *
+ * tvb_get_stringz_unicode() returns a string (initially unicode
+ *			     encoded) allocated by g_malloc() and
+ *			     therefore MUST be g_free() by the caller
+ *			     in order not to leak memory.
+ *
+ * tvb_get_ephemeral_stringz_unicode() returns a string (initially
+ *			     unicode encoded) that does not need to be
+ *			     freed, instead it will automatically be
+ *			     freed once the next packet is dissected.
+ */
+extern guint8 *tvb_get_stringz_unicode(tvbuff_t *tvb, gint offset, gint *lengthp);
+extern guint8 *tvb_get_ephemeral_stringz_unicode(tvbuff_t *tvb, gint offset, gint *lengthp);
+
 /** Looks for a stringz (NUL-terminated string) in tvbuff and copies
  * no more than bufsize number of bytes, including terminating NUL, to buffer.
  * Returns length of string (not including terminating NUL), or -1 if the string was
Index: epan/dissectors/packet-dcerpc.c
===================================================================
--- epan/dissectors/packet-dcerpc.c	(revision 31523)
+++ epan/dissectors/packet-dcerpc.c	(working copy)
@@ -1379,6 +1379,71 @@
 
 /* For dissecting arrays that are to be interpreted as strings.  */
 
+/* Dissect a non-NDR compatible string which doesn't have any
+   prefixing NDR fields such as [size][offset][length]; the elements
+   are assumed to be characters or wide characters */
+int
+dissect_nastring(tvbuff_t *tvb, int offset, packet_info *pinfo,
+		 proto_tree *tree, guint8 *drep, int size_is,
+		 int hfindex, gboolean add_subtree, char **data)
+{
+	dcerpc_info *di;
+	proto_item *string_item;
+	proto_tree *string_tree;
+	gint buffer_len;
+	char *s;
+	header_field_info *hfinfo;
+
+	di = pinfo->private_data;
+	if (di->conformant_run) {
+		/* just a run to handle conformant arrays, no scalars to dissect */
+		return offset;
+	}
+
+	if (add_subtree) {
+		string_item = proto_tree_add_text(tree, tvb, offset, -1, "%s",
+						  proto_registrar_get_name(hfindex));
+		string_tree = proto_item_add_subtree(string_item, ett_dcerpc_string);
+	} else {
+		string_item = NULL;
+		string_tree = tree;
+	}
+	
+	/* Adjust offset */
+	if (offset % size_is)
+		offset += size_is - (offset % size_is);
+
+	if (size_is == sizeof(guint16)) {
+		s = tvb_get_stringz_unicode(tvb, offset, &buffer_len);
+		if (tree && buffer_len) {
+			hfinfo = proto_registrar_get_nth(hfindex);
+			tvb_ensure_bytes_exist(tvb, offset, buffer_len);
+			if (hfinfo->type == FT_STRING) {
+				proto_tree_add_string(string_tree, hfindex, tvb, offset,
+						      buffer_len, s);
+			} else {
+				proto_tree_add_item(string_tree, hfindex, tvb, offset,
+						    buffer_len, drep[0] & 0x10);
+			}
+		}
+	} else {
+		s = tvb_get_stringz(tvb, offset, &buffer_len);
+		if (tree && buffer_len) {
+			proto_tree_add_item(string_tree, hfindex, tvb, offset,
+					    buffer_len, drep[0] & 0x10);
+		}
+	}
+
+	if (data)
+		*data = s;
+
+	offset += buffer_len;
+
+	proto_item_set_end(string_item, tvb, offset);
+
+	return offset;
+}
+
 /* Dissect an NDR conformant varying string of elements.
    The length of each element is given by the 'size_is' parameter;
    the elements are assumed to be characters or wide characters.
Index: epan/dissectors/packet-dcerpc.h
===================================================================
--- epan/dissectors/packet-dcerpc.h	(revision 31523)
+++ epan/dissectors/packet-dcerpc.h	(working copy)
@@ -264,6 +264,12 @@
 int dissect_ndr_wchar_vstring(tvbuff_t *tvb, int offset, packet_info *pinfo, 
                             proto_tree *tree, guint8 *drep);
 
+/* dissect non-NDR compatible strings */
+int
+dissect_nastring(tvbuff_t *tvb, int offset, packet_info *pinfo,
+		 proto_tree *tree, guint8 *drep, int size_is,
+		 int hfindex, gboolean add_subtree, char **data);
+
 typedef struct _dcerpc_sub_dissector {
     guint16 num;
     const gchar   *name;
Index: lib/Parse/Pidl/Wireshark/NDR.pm
===================================================================
--- lib/Parse/Pidl/Wireshark/NDR.pm	(revision 18)
+++ lib/Parse/Pidl/Wireshark/NDR.pm	(revision 19)
@@ -324,7 +324,7 @@
 			} elsif (property_matches($e, "flag", ".*LIBNDR_FLAG_STR_SIZE4.*")) {
 				$self->pidl_code("offset = dissect_ndr_vstring(tvb, offset, pinfo, tree, drep, $bs, $hf, FALSE, NULL);");
 			} else {
-				warn("Unable to handle string with flags $e->{PROPERTIES}->{flag}");
+				$self->pidl_code("offset = dissect_nastring(tvb, offset, pinfo, tree, drep, $bs, $hf, FALSE, NULL);");
 			}
 		} else {
 			my $call;
