Wireshark · Ethereal-dev: Re: [Ethereal-dev] Proposed change to tethereal hex dump format

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Ashok Narayanan <ashokn@xxxxxxxxx>

Date: Wed, 2 May 2001 17:40:56 -0400

> >    0  0010 7b2c 78c0 0010 7b2c 785d 0800 4500   ..{,x...{,x]..E. 
> >   10  0074 248f 0000 ff2e 7ebb 0a01 020f 0a01   .t$.....~....... 
> >   20  0201 1002 e3e8 ff00 0060 000c 0101 e600   .........`...... 
> >   30  0001 1100 000a 000c 0301 0a01 020f 0000   ................ 
> >
> >
> > 0000  00 10 7b 2c 78 c3 00 10 7b 2c 78 d5 08 00 45 00   ..{,x...{,x...E.
> > 0010  00 74 46 53 00 00 ff 2e 5a e9 0a 01 03 10 0a 01   .tFS....Z.......
> > 0020  03 0e 10 02 e4 67 ff 00 00 60 00 0c 01 01 e6 00   .....g...`......
> > 0030  00 01 11 00 00 0a 00 0c 03 01 0a 01 03 10 00 00   ................
> 
> This is more than 79 characters per line (82 if I calculate
> correctly). Please don't make the lines longer than that they
> still fit in a "standard" 80 characters wide window.

This has already been addressed. The dump is only 72 characters wide.
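
As a quick check of that width, here is a minimal Python sketch that formats one line in the proposed layout and measures it. The helper name `format_line` and the exact column spacing (copied from the sample dump above) are assumptions for illustration; this is not the tethereal code.

```python
def format_line(offset: int, chunk: bytes) -> str:
    """Format up to 16 bytes as: 4-digit hex offset, spaced bytes, ASCII column."""
    hex_part = " ".join(f"{b:02x}" for b in chunk)
    # Non-printable bytes are shown as '.', as in the sample dump.
    ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
    # 16 bytes take 16*2 digits + 15 separators = 47 columns; pad short lines.
    return f"{offset:04x}  {hex_part:<47}   {ascii_part}"


line = format_line(0, bytes.fromhex("00107b2c78c300107b2c78d508004500"))
print(line)
print(len(line))  # 4 + 2 + 47 + 3 + 16 = 72
```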

> > My reasons are:
> > 
> > 1) It is a more standard hexdump format; we use it internally 
> > in Ethereal (GUI) as well.
> > 
> > 2) This format is easier to deal with during parsing as well.
> 
> I fail to see how the second format should be any easier to
> parse than the first one. Unless you consider endianness...

The problem lies in telling the offset apart from the bytes themselves. In the
proposed (standard?) format, you can differentiate between offset and bytes
simply by looking at the length of the hex string: two characters is a byte,
more than two is an offset. In the earlier format, you quickly run into
situations where offsets and bytes are indistinguishable, except for the fact
that the offset is at the start of the line. But what if you don't have any
offsets at all - just bytestrings?
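
To make the length rule concrete, here is a minimal Python sketch. The function name `classify` is hypothetical and the rule is only the one described above (two hex characters is a byte, more than two is an offset); it is not the actual text2pcap logic.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def classify(token: str) -> str:
    """Classify a whitespace-separated token purely by its format."""
    if not token or any(c not in HEX_DIGITS for c in token):
        return "text"
    if len(token) == 2:
        return "byte"
    if len(token) > 2:
        return "offset"
    return "text"  # a single hex digit is neither


# Proposed format: the offset and the bytes are told apart by length alone.
new_style = [classify(t) for t in "0010 00 74 46 53".split()]

# Earlier format: the offset "10" looks like a byte, and the byte pairs
# "0074" and "248f" look like offsets, so length alone is not enough.
old_style = [classify(t) for t in "10 0074 248f".split()]

print(new_style)
print(old_style)
```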

Also, it seems strange that the GUI Ethereal displays hexdumps in one format,
and the text dump (or text printout) of the same hexdump appears in another
format. A case could be made to unify these two formats at a minimum.

> > It's a very small change to the code; I've tried it out. If 
> > this proposed change is made, then text2pcap will be able to
> > read in a trace dumped by tethereal using -V -x, and be able
> > to build a capture file out of the packets (minus the 
> > timestamps), a feature which I think is pretty cool.
> > 
> > Thoughts?
> 
> Preferably you should be able to parse both formats. There is
> no reason to limit yourself to just one format when reading
> in the file. 

True. I am trying to make this as flexible as possible. The tradeoff with
flexibility is how much context you place on a value based on its position in
a line versus its format (two digits, more than two digits, etc.). I've tried
to place less context on its position in the line; this allows for better
processing of strange formats. The cost is that today it only works for
individual bytes, not pairs of bytes. For example, text2pcap is able to
extract the packet from this email without editing (you don't need to remove
the '> '). I've even tried prefixing eight '> ' forward marks and then sending
the text through a word-wrapping email editor - text2pcap handles that as
well.

> Actually, you should be able to parse a number
> followed by any number of two-digit hexnumbers (with or
> without separating whitespace).

Yeah, but you want slightly stronger rules in order to a) discard the text at
the end of the bytestring, even if it contains hex digits, and b) actually use
the offset for counting, which means you need to differentiate the offset from
the bytes, which brings us to the above point. 

In point of fact, my parser does almost exactly what you mentioned. I
recognize a line as optional prefix text, an offset, one or more bytes, and
optional suffix text. Prefix and suffix text can include bytes which are
ignored. The offset is used for counting as well as to indicate the start of a
bytestring (or a new packet, if the offset is 0). In addition, the code will
be capable of not using offsets altogether (not yet, though).
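
The line shape described above (optional prefix text, an offset, one or more bytes, optional suffix text) can be sketched in Python as follows. This is an illustration under stated assumptions, not the text2pcap implementation: the names `LINE_RE` and `parse_packets` are hypothetical, an offset is taken to be a run of three or more hex digits, and a line whose offset does not match the number of bytes collected so far is simply skipped. A suffix that happens to begin with a standalone two-digit hex token would still be misread as data.

```python
import re

# An offset (3+ hex digits) followed by one or more whitespace-separated
# two-digit hex bytes. Anything before or after the match is ignored.
LINE_RE = re.compile(
    r"(?<![0-9A-Fa-f])([0-9A-Fa-f]{3,})((?:\s+[0-9A-Fa-f]{2}(?=\s|$))+)"
)


def parse_packets(text: str) -> list[bytes]:
    """Collect packets from hex dump lines; offset 0 starts a new packet."""
    packets: list[bytearray] = []
    current = None
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # no offset-plus-bytes pattern on this line
        offset = int(match.group(1), 16)
        data = bytes.fromhex("".join(match.group(2).split()))
        if offset == 0:
            current = bytearray()
            packets.append(current)
        if current is None or offset != len(current):
            continue  # offset does not line up with the byte count
        current.extend(data)
    return [bytes(p) for p in packets]


sample = """\
> > 0000  00 10 7b 2c 78 c3 00 10 7b 2c 78 d5 08 00 45 00   ..{,x...{,x...E.
> > 0010  00 74 46 53 00 00 ff 2e 5a e9 0a 01 03 10 0a 01   .tFS....Z.......
> This is more than 79 characters per line
"""
packets = parse_packets(sample)
print(len(packets), len(packets[0]))
```

Because the match is anchored on the format of the tokens rather than on their column, the quoting prefix '> > ' and the trailing ASCII column are both skipped without special handling.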

-Ashok

--- Asok the Intern ----------------------------------------
Ashok Narayanan
IOS Network Protocols, Cisco Systems
250 Apollo Drive, Chelmsford, MA 01824
Ph: 978-244-8387.  Fax: 978-244-8126 (Attn: Ashok Narayanan)

References:
- RE: [Ethereal-dev] Proposed change to tethereal hex dump format
  - From: Peter Kjellerstedt
