Wireshark-users: Re: [Wireshark-users] Get all the duplicate packets

From: Jim Young <jyoung@xxxxxxx>
Date: Thu, 20 Sep 2012 03:38:02 +0000
Hello Boaz,

On 9/19/12 4:19 PM, "Boaz Galil" <boaz20@xxxxxxxxx> wrote:
> Editcap -d will remove all the duplicates! I actually want
> to find all the duplicate packets....
<snip>

Assuming that you are looking for frame-level duplicates,
there are a couple of ways to determine which frames may
be duplicates.  Both involve displaying the MD5 hash
for each frame.

NOTE: The MD5 hash technique described below will NOT work
for L3 level duplicates one might see on a one-armed router
interface where each packet might be seen twice, ingressing
on one vlan tag and egressing on another.

NOTE2: Using the MD5 hash technique, it's possible (though
very unlikely) to have a false positive where two unrelated
packets generate the exact same MD5 hash value.

Assuming that the Wireshark preference frame.generate_md5_hash
is TRUE (see note towards bottom of this message on how to
check and set) then the following tshark command line can be
used to generate a potentially large display filter for
duplicate packets. 

echo "+++The filter ..."
echo $(tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
  | sort \
  | uniq -c \
  | sort -n -r \
  | grep -v ' 1 ' \
  | awk 'BEGIN {printf "frame.number==0"} \
    {printf "||frame.md5_hash=="$2} END {print ""}')

The constructed display filter starts with the do-nothing
clause "frame.number==0" so that the first "||" has something
to its left.
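
The filter-building step can be tried on its own, without a capture
file.  The sketch below is a variation on the pipeline above, not the
same command: build_dup_filter is a hypothetical helper name, it reads
one hash per line on stdin, and it uses "uniq -d" (print only repeated
lines) in place of "uniq -c | grep -v ' 1 '".

```shell
# Hypothetical helper: read one MD5 hash per line on stdin and print a
# display filter that matches every hash seen more than once.
build_dup_filter() {
  sort \
    | uniq -d \
    | awk 'BEGIN { printf "frame.number==0" }
           NF    { printf "||frame.md5_hash==%s", $1 }  # skip blank lines
           END   { print "" }'
}

# Example with three made-up hashes, one of them repeated:
printf '%s\n' aaaa bbbb aaaa | build_dup_filter
```

With real data, the input would presumably come from the same
"tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash" command used above.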

The command line above can be augmented to actually display the
duplicate frames by invoking tshark twice.

echo "+++The command to display the duplicates..."
tshark -r MYFILE.PCAP \
  $(tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
  | sort \
  | uniq -c \
  | sort -n -r \
  | grep -v ' 1 ' \
  | awk 'BEGIN {printf "frame.number==0"} \
    {printf "||frame.md5_hash=="$2} END {print ""}')

Or the command line can be modified to save the duplicates
to a new pcap file (MYDUPLICATES.PCAP).  Again this involves
invoking tshark twice.

echo "+++The command to save the duplicates..."
tshark -w MYDUPLICATES.PCAP -r MYFILE.PCAP \
  $(tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
  | sort \
  | uniq -c \
  | sort -n -r \
  | grep -v ' 1 ' \
  | awk 'BEGIN {printf "frame.number==0"} \
    {printf "||frame.md5_hash=="$2} END {print ""}')

FWIW:  In addition to using tshark, you can also use editcap to
display the frame MD5 hashes for each frame.

Here's an MD5 hash example using editcap:

> $ editcap -v -D 0 MYFILE.PCAP /dev/null
> File MYFILE.pcap is a Wireshark - pcapng capture file.
> Packet: 1, Len: 42, MD5 Hash: dc69bb2da069731e40367bed2cb44d56
> Packet: 2, Len: 42, MD5 Hash: dc69bb2da069731e40367bed2cb44d56
> Packet: 3, Len: 42, MD5 Hash: dc69bb2da069731e40367bed2cb44d56
> <snip>

And here's the MD5 example using tshark:

> $ tshark -o frame.generate_md5_hash:TRUE -r MYFILE.PCAP -Tfields \
>   -e frame.md5_hash
> dc69bb2da069731e40367bed2cb44d56
> dc69bb2da069731e40367bed2cb44d56
> dc69bb2da069731e40367bed2cb44d56
> <snip>

While the tshark report is cleaner (1 column versus 7), both
the editcap and tshark output can then be post processed to
extract the same counts of any duplicate MD5 hashes.

I believe using editcap to generate the MD5 hashes is faster
than using tshark if processing large trace files.

The examples below illustrate how both editcap and
tshark can be used to generate virtually identical lists
of any duplicate MD5 hashes.

#1 - Using editcap:

> $ editcap -v -D 0 MYFILE.PCAP /dev/null \
>   | grep Hash: \
>   | awk '{ print $7 }' \
>   | sort \
>   | uniq -c \
>   | grep -v ' 1 '
> File MYFILE.pcap is a Wireshark - pcapng capture file.
>   18 198e273fe9792cbf54919701db49b9cf
>   12 1e848f674c60a07d23f7104b8a205a1c
>    4 28c92df42bbf9c94a93560a5fb3decf0
>    2 3aabbf2969b96da88ee9b5937345eb75
>    6 636c43db7e87aa86c0afaf479ded30cf
>    4 67a1a4f23bf565d2ab946955a0dc4b70
>    3 6e30d01d335343eed4dca273d95d6347
>   24 8d7780d026fb1d883717a6957abf2476
>   12 92063b2f67c0246413959046bf455c26
>    3 dc69bb2da069731e40367bed2cb44d56
>    2 e7177c946c4638b72fc62fe05bc5e30a
>    9 fdaf0bcb2fe45420232fdd990c4fa655
> $

#2 - Using tshark:

> $ tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
>   | sort \
>   | uniq -c \
>   | sort -n -r \
>   | grep -v ' 1 '
>   18 198e273fe9792cbf54919701db49b9cf
>   12 1e848f674c60a07d23f7104b8a205a1c
>    4 28c92df42bbf9c94a93560a5fb3decf0
>    2 3aabbf2969b96da88ee9b5937345eb75
>    6 636c43db7e87aa86c0afaf479ded30cf
>    4 67a1a4f23bf565d2ab946955a0dc4b70
>    3 6e30d01d335343eed4dca273d95d6347
>   24 8d7780d026fb1d883717a6957abf2476
>   12 92063b2f67c0246413959046bf455c26
>    3 dc69bb2da069731e40367bed2cb44d56
>    2 e7177c946c4638b72fc62fe05bc5e30a
>    9 fdaf0bcb2fe45420232fdd990c4fa655
> $
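
Since the original goal was to find the duplicate packets themselves,
it may also help to map each repeated hash back to its packet numbers.
The following is a rough sketch that assumes input lines in the
"Packet: N, Len: N, MD5 Hash: H" form shown in the editcap example
earlier; list_dup_packets is a hypothetical helper name, not part of
editcap or tshark.

```shell
# Hypothetical helper: read editcap-style "Packet: N, ... MD5 Hash: H"
# lines on stdin and print "hash: packet numbers" for repeated hashes.
list_dup_packets() {
  awk '/MD5 Hash:/ {
         num = $2; sub(/,$/, "", num)        # "1," -> "1"
         hash = $NF                          # hash is the last field
         if (count[hash]++ == 0) {           # first time this hash is seen
           order[++n] = hash
           pkts[hash] = num
         } else {
           pkts[hash] = pkts[hash] " " num
         }
       }
       END {
         for (i = 1; i <= n; i++)
           if (count[order[i]] > 1)
             print order[i] ": " pkts[order[i]]
       }'
}

# Example with made-up lines; packets 1 and 3 share a hash:
printf '%s\n' \
  'Packet: 1, Len: 42, MD5 Hash: aaaa' \
  'Packet: 2, Len: 60, MD5 Hash: bbbb' \
  'Packet: 3, Len: 42, MD5 Hash: aaaa' \
  | list_dup_packets
```

With a real trace this would presumably be fed from
"editcap -v -D 0 MYFILE.PCAP /dev/null 2>&1"; the 2>&1 is a precaution
in case the verbose lines are written to stderr.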

NOTE:  For the tshark MD5 hash pipelines to work the
Wireshark preference "frame.generate_md5_hash" must be
enabled.  You can easily determine if the frame.generate_md5_hash
preference is enabled using the following tshark pipeline:

> $ tshark -G currentprefs | grep frame.generate_md5_hash
> frame.generate_md5_hash: TRUE
> $

If MD5 hashes are disabled (which I believe is the default)
then they can be manually enabled on the tshark command line
using tshark's -o option: -o frame.generate_md5_hash:TRUE

That would make the tshark command line that saved the
packets to a new file look like:

tshark -o frame.generate_md5_hash:TRUE \
  -w MYDUPLICATES.PCAP -r MYFILE.PCAP \
  $(tshark -o frame.generate_md5_hash:TRUE \
    -r MYFILE.PCAP -Tfields -e frame.md5_hash \
<snip>

But it's probably easier to just permanently enable MD5 hashes
within Wireshark's preference file so that you don't have to
remember to use the tshark -o frame.generate_md5_hash:TRUE
option.
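
One way to make that permanent from the command line is sketched
below.  It assumes the preference file uses "name: value" lines, as the
"tshark -G currentprefs" output above suggests, and that the file's
location is passed in explicitly, since the path varies by platform and
version.  enable_md5_pref is a hypothetical helper name; back up the
file before trying it.

```shell
# Hypothetical helper: make sure the given Wireshark preference file
# contains exactly one active "frame.generate_md5_hash: TRUE" line.
enable_md5_pref() {
  prefs_file=$1
  tmp_file="${prefs_file}.tmp.$$"
  touch "$prefs_file"
  # Copy everything except an existing (uncommented) setting; grep exits
  # non-zero when nothing is left to copy, which is fine here.
  { grep -v '^frame\.generate_md5_hash:' "$prefs_file" || true; } > "$tmp_file"
  # Append the enabled setting and swap the new file into place.
  echo 'frame.generate_md5_hash: TRUE' >> "$tmp_file"
  mv "$tmp_file" "$prefs_file"
}

# Usage (the path is an assumption; check where your install keeps it):
#   enable_md5_pref "$HOME/.wireshark/preferences"
```

Afterwards the "tshark -G currentprefs | grep frame.generate_md5_hash"
check shown above should confirm whether the setting was picked up.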

Hope this helps,

Jim Y.