Wireshark-users: Re: [Wireshark-users] filter for ONLY initial get request
On 8/11/2010 9:35 AM, Thierry Emmanuel wrote:

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jeffs
Sent: mercredi 11 août 2010 15:07
To: Community support list for Wireshark
Subject: Re: [Wireshark-users] filter for ONLY initial get request

This formula, however, only returns results minus the links and images
embedded in the web page:

tshark -r test.cap -T fields -e http.host | sed 's/?.*$//' | sed -n '/www\./p' | sort | uniq -c | sort -rn | head -n 100

      15 www.propertyshark.com
       8 www.nytimes.com
       2 www.google-analytics.com
       1 www.facebook.com
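As a quick illustration of why the `/www\./p` step loses links, here is the same text-processing stage run over a few hypothetical host names (the sample hosts are made up for illustration, not taken from the capture):

```shell
# Hypothetical http.host values, as `tshark -T fields -e http.host` might emit them.
# Hosts without a "www." label (cdn.nytimes.com, static.ak.fbcdn.net) are silently dropped.
printf '%s\n' www.nytimes.com cdn.nytimes.com www.facebook.com static.ak.fbcdn.net \
  | sed -n '/www\./p' | sort | uniq -c | sort -rn
```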

However, I am new to regex, so I may well be missing something or
losing some links.

It is a common mistake to assume that every website has its main
address on a "www" subdomain. If you want a generic filter, you cannot
rely on that. If you want relevant results, you will have to build a
non-restrictive regexp and manually filter out inappropriate results,
perhaps adding some rules to exclude well-known advertising sites.

A fully automatic solution would be to parse the data, checking that it
is a well-formed HTML (or XML, or plain-text) document. This would purge
videos and images from your results.
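One way to approximate this without actually downloading or parsing anything is to filter on HTTP headers that tshark already dissects. Browsers typically send an Accept header listing text/html when fetching a page, but not when fetching images or scripts. A sketch along those lines (the capture name test.cap is taken from the thread; `http.accept` is the Wireshark field for the Accept request header, and the exact matching behavior should be verified against your own capture):

```shell
# Sketch: keep only requests whose Accept header asks for HTML.
# -R applies a read/display filter before field extraction.
tshark -r test.cap -R 'http.accept contains "text/html"' \
  -T fields -e http.host | sort | uniq -c | sort -rn | head -n 100
```

The counting stage after the filter is the same as in your original pipeline, so the output format does not change.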

I agree that not all websites have their main address at "www". But since I have so far been unable to remove all the extra domains that are captured, and am therefore bringing in a lot of extraneous domain names, I have to choose between the lesser of two evils: lose some domains, or pull in a lot of unwanted domain names that pollute my desired results.

I wish there was a way to capture ONLY the initially requested URL that is either clicked or typed into the browser address bar.
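For what it's worth, one heuristic that comes close to this with a plain display filter: requests the user typed or clicked usually arrive without a Referer header, while embedded images, scripts, and stylesheets carry the enclosing page's URL as their Referer. A sketch (test.cap is the capture name from earlier in the thread; links clicked on one page to reach another will still carry a Referer, so this is only an approximation):

```shell
# Sketch: keep GET requests that carry no Referer header --
# a rough proxy for "initially requested" URLs.
tshark -r test.cap -R 'http.request.method == "GET" && !http.referer' \
  -T fields -e http.host -e http.request.uri | sort -u
```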

I was thinking that a tap might solve this problem, because it would capture only one half of a duplex conversation on one wire (the outgoing request) and thus capture only the requested URL.

Your suggestion of parsing the data is unique and interesting. Are you suggesting that dumpcap or ethereal would somehow interrogate the link, follow it, and then make a determination? This sounds like a very interesting prospect, but I'm not fully sure I understand how it would work.

Thank you.