
Wireshark-dev: Re: [Wireshark-dev] [tcpdump-workers] mmap consumes more CPU

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Mon, 26 Nov 2012 14:24:13 -0800
On Nov 26, 2012, at 12:58 PM, abhinav narain <abhinavnarain10@xxxxxxxxx> wrote:

> @Guy,
> Basically, I was adding my own header (instead of radiotap) in kernel and
> processing it in userland with my own code. Basically I wrote my own pcap
> for that.

For your own radio header, what you'd need would be:

	your own ARPHRD_ value (which you'd need the Linux kernel developers to assign - *DO NOT* just pick one and use it yourself unless the Linux kernel has a "private use" range of ARPHRD_ values, in which case use one of those but don't expect the official libpcap, tcpdump, or Wireshark releases to support it);

	your own LINKTYPE_/DLT_ value to which to map that ARPHRD_ value (which you'd need the libpcap/tcpdump developers to assign, unless you choose to use one of the "private use" values DLT_USER0 through DLT_USER15, in which case don't expect the official libpcap, tcpdump, or Wireshark releases to use that value);

	a version of libpcap with a pcap-linux.c that maps from your ARPHRD_ value to your DLT_ value.
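The shape of that last step can be sketched as follows. This is not the actual pcap-linux.c code, just a minimal illustration of the kind of mapping it performs; ARPHRD_MYRADIO is a placeholder value that would have to be officially assigned, while 147 really is libpcap's DLT_USER0:

```c
/* Hypothetical sketch of the ARPHRD_-to-DLT_ mapping pcap-linux.c
 * performs.  ARPHRD_MYRADIO is a placeholder - a real value must be
 * assigned by the Linux kernel developers.  DLT_USER0 (147) is the
 * first of libpcap's "private use" values, DLT_USER0..DLT_USER15. */
#define ARPHRD_MYRADIO  65534   /* placeholder - NOT an assigned value */
#define DLT_USER0       147     /* first private-use DLT_ value */

static int map_arphrd_to_dlt(int arphrd)
{
    switch (arphrd) {
    case ARPHRD_MYRADIO:
        return DLT_USER0;   /* private-use link-layer type */
    default:
        return -1;          /* unknown ARPHRD_ value; caller must cope */
    }
}
```

A capture program reading such a capture would then have to know, out of band, what DLT_USER0 means for that file, since the official tools won't.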

> Since I did not get the performance, now I have added extra fields in
> radiotap.

Note that, unless those extra fields are listed in

	http://www.radiotap.org

the official tcpdump and Wireshark releases will not ever support them, and, if some other extra fields get officially assigned the same "presence bit" values, tcpdump and Wireshark will interpret those values as corresponding to the official field assignment, not corresponding to your field assignment.  If you plan to add extra fields to radiotap, you should follow the official procedure for standardizing them, as indicated on that page.
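To make the "presence bit" collision concrete, here is a minimal sketch of reading the fixed radiotap header and testing a presence bit. The field layout and byte order (little-endian) follow the radiotap specification; the bit numbers tested below are the officially assigned ones (bit 2 = Rate, bit 5 = dBm Antenna Signal), which is exactly why a privately chosen bit would be misread by official tools:

```c
#include <stdint.h>

/* The fixed portion of a radiotap header, per http://www.radiotap.org:
 * version (0), pad, total length, and the first presence-bit word,
 * all little-endian in the packet buffer. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)p[0] | ((uint16_t)p[1] << 8);
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Nonzero if presence bit 'bit' is set in the first presence word of
 * the radiotap header starting at 'buf'. */
static int radiotap_has_field(const uint8_t *buf, unsigned bit)
{
    uint32_t present = le32(buf + 4);   /* it_present at offset 4 */
    return (present >> bit) & 1u;
}
```

A dissector that sees bit 5 set will decode the corresponding field as dBm Antenna Signal no matter what a private variant intended that bit to mean.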

> But I still see high CPU usage.

So you're getting high CPU usage with regular libpcap?  Are you getting higher CPU usage if libpcap is using the memory-mapped mechanism than if libpcap is built from the same source, but with the memory-mapped mechanism artificially compiled out, and therefore is *not* using the memory-mapped mechanism?

Have you built profiled versions of libpcap, and a profiled version of whatever program you're using, and gotten the result of profiling, to see where the CPU time is being spent?

> It's interesting that you point out there are more errors during mmap calls.

No, I don't.

What I point out is that *if* recv() is being called a lot *in the standard libpcap mmap-on-Linux code*, *then* you are getting a lot of errors; I am *not* saying that you would be getting *more* errors from that code than from anything else, such as the non-mmap code.  For the non-mmap-on-Linux code, the recvfrom() call will return both packets and error indications, so you wouldn't be making more system calls if you have more errors; for the mmap-on-Linux code, the only system calls made in the non-error case are the select() calls that wait for a new packet to arrive.
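The system-call pattern of the mmap code path in the non-error case - block in select() until something is readable, then consume data with no further system calls - can be illustrated in miniature. Here an ordinary pipe stands in for the packet socket backing the memory-mapped ring; this is an illustration of the pattern, not libpcap's actual code:

```c
#include <stddef.h>
#include <sys/select.h>

/* Block until 'fd' is readable, as the mmap capture path does while
 * waiting for the kernel to publish a new frame into the ring.
 * Returns the number of ready descriptors (1), or -1 on error. */
static int wait_readable(int fd)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, NULL);
}
```

In the real mmap path, once select() returns, the frames are read directly out of the shared ring buffer, so no recv() is needed unless something has gone wrong.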

> Is this anything to do with alignment of frames?

No.  The errors come from the Linux kernel, which just processes raw skbuffs, and, in that code path, doesn't know what's part of the radiotap header and what isn't.

*IF* you are getting errors from recv() calls in the memory-mapped code path, then you need to find out *what those errors are* - i.e., what errno value is being returned - to have any chance of being able to figure out why the errors are occurring.
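The kind of diagnostic being suggested looks like this - capture errno immediately after the failing call (before anything else can clobber it) and report its symbolic meaning. This is a generic sketch, not libpcap code:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Call recv() and, on failure, report *which* error occurred;
 * the errno value is the clue to why the errors are happening.
 * Returns the byte count on success, -errno on failure. */
static int recv_with_diagnostic(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
    if (n == -1) {
        int err = errno;   /* save before any later call can overwrite it */
        fprintf(stderr, "recv failed: errno=%d (%s)\n", err, strerror(err));
        return -err;
    }
    return (int)n;
}
```

ENETDOWN, EAGAIN, and EBADF, for example, would each point at a very different cause.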

> @Dave : I am running this code on a Netgear router running OpenWrt, so I am
> not sure if there is profiler that can help me out.

The "profiler" is a combination of:

	1) the kernel and libc support for profiling;

	2) the support for the "-pg" flag in the compiler and linker you're using;

	3) the tools to process the files written out by a profiled program after it executes (e.g., the gprof command).

The only part that needs to be present on the machine running the profiled program is 1).  If you're doing cross-compiling, 2) needs to be present in the cross-development tools on the machine on which you're cross-compiling, and 3) needs to be present on some machine in a form that can handle the files as they're written on the machine running the profiled program - e.g., one that can handle the endianness of those files.  According to

	http://www.sourceware.org/binutils/docs-2.10/gprof_9.html#SEC26

the files in question identify the byte order of the file and gprof automatically handles that.