Wireshark-bugs: [Wireshark-bugs] [Bug 4070] Add a facility within wireshark to remove duplicate

Date: Thu, 17 Dec 2009 20:01:08 -0800 (PST)
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=4070

Jim Young <jyoung@xxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jyoung@xxxxxxx

--- Comment #7 from Jim Young <jyoung@xxxxxxx> 2009-12-17 20:01:04 PST ---
I think that running editcap's de-dup logic in dumpcap would seriously impair
dumpcap's ability to capture every packet during dense bursts of traffic on
high-speed circuits.

editcap's de-dup technique involves calculating the MD5 hash of each packet and
then comparing that hash and the packet length to those of the previous N
packets.  Because every packet is checked against up to N earlier ones, large N
values can noticeably slow down how quickly editcap processes a tracefile.
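
To illustrate the comparison described above, here is a minimal Python sketch
of a sliding-window de-dup check.  It is not editcap's actual code: the
function name find_duplicates, the default window of 5 and the choice to keep
duplicates in the window are all assumptions made for the example.

```python
import hashlib
from collections import deque


def find_duplicates(packets, window=5):
    """Return 0-based indices of packets matching one of the previous `window` packets.

    `packets` is an iterable of bytes objects (raw frame data).
    The window size default is illustrative, not taken from editcap.
    """
    # Holds (length, MD5 digest) for at most the last `window` packets;
    # the oldest entry is dropped automatically when the deque is full.
    recent = deque(maxlen=window)
    duplicates = []
    for index, data in enumerate(packets):
        key = (len(data), hashlib.md5(data).digest())
        # Linear scan of the window: the cost per packet grows with `window`.
        if key in recent:
            duplicates.append(index)
        # Assumption: every packet, duplicate or not, enters the window.
        recent.append(key)
    return duplicates
```

The linear scan over the window is what makes a large N expensive: each packet
costs one MD5 calculation plus up to N comparisons.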

Wireshark and tshark can already calculate the MD5 hash of each packet, but the
MD5 calculation is disabled by default.  It can be enabled via the "frame"
protocol preference or via the command-line option -o
frame.generate_md5_hash:TRUE.

Quite a while ago I started to implement editcap's de-dup techniques (both -D
and -w) within Wireshark/tshark.  I added a de-dup option to the frame protocol
preferences panel with some parameters.  When the feature was enabled, the
packet details included a boolean frame.duplicate field.  A value of TRUE meant
that the packet was determined to be a duplicate of some previous packet.
Unfortunately I got sidetracked by other issues/problems, so I never got a
chance to finish that particular piece of code.  Perhaps it's time to dust off
and clean up that de-dup patch.

FWIW: Without the Wireshark de-dup patch, I have found that enabling MD5 hash
generation, creating a "frame.md5_hash" custom column and sorting the trace by
that column generally works as a quick and dirty way of identifying duplicates:
identical packets end up on adjacent rows.
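
The sort-by-hash trick can be sketched outside the GUI as well.  The Python
below is only an illustration of the idea, not anything Wireshark does
internally; the function name duplicate_groups and the in-memory list of raw
frames are assumptions for the example.

```python
import hashlib
from itertools import groupby


def duplicate_groups(packets):
    """Return lists of 1-based frame numbers whose frames share an MD5 hash.

    `packets` is a sequence of bytes objects (raw frame data).
    """
    # One row per frame: (hex MD5, frame number), like a custom hash column.
    rows = [
        (hashlib.md5(data).hexdigest(), number)
        for number, data in enumerate(packets, start=1)
    ]
    # Sorting by the hash puts identical frames next to each other.
    rows.sort()
    groups = []
    for _digest, group in groupby(rows, key=lambda row: row[0]):
        numbers = [number for _, number in group]
        if len(numbers) > 1:  # keep only hashes seen more than once
            groups.append(numbers)
    return groups
```

Unlike the windowed comparison, this matches duplicates anywhere in the trace,
no matter how far apart they are, so unrelated but byte-identical packets are
grouped together too.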

-- 
Configure bugmail: https://bugs.wireshark.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.