Wireshark-dev: Re: [Wireshark-dev] Wireshark memory handling
From: Guy Harris <[email protected]>
Date: Fri, 9 Oct 2009 19:08:08 -0700
On Oct 9, 2009, at 7:43 AM, Jeff Morriss wrote:

> One advantage of using memory mapped files instead of swap is that if
> your OS is swapping, *everything* is slow.  If only Wireshark is, er,
> swapping, only Wireshark is slow.

That depends on the OS's policies for managing main memory, and on
any policy hints the application gives the OS.  If, for example, the OS
uses the same policy to find a page frame whether the page fault is for
a page backed by a mapped file or for a page backed by swap space (an
"anonymous" page), the only advantages to memory mapping would be:
	1) if the file is mapped into multiple processes' address spaces (and
is either read-only or not copy-on-write), those processes can share a
single page frame for a page from the file - but that's not the case
here, as I understand it;
	2) if the data in anonymous pages is a copy of data from a file,  
memory-mapping the file even in only one process means that you don't  
even temporarily have two copies of the data in memory.
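
To make point 2 and the "policy hints" remark concrete, here is a minimal sketch in Python rather than Wireshark's own C.  It is not Wireshark code: the function names checksum_via_read and checksum_via_mmap are hypothetical, and it assumes Python 3.8 or later, where mmap.madvise exists on platforms that provide madvise().  Whether the kernel acts on the hint is OS-specific.

```python
import mmap
import os


def checksum_via_read(path):
    """Read the whole file into an anonymous (heap) buffer, then sum it.

    On OSes with a unified page cache, the data can briefly exist twice:
    once in the cache and once in this process's buffer.
    """
    with open(path, "rb") as f:
        data = f.read()
    return sum(data) & 0xFFFFFFFF


def checksum_via_mmap(path):
    """Map the file read-only and sum it in place, one chunk at a time.

    The mapped pages are the file's own pages; only one chunk-sized
    slice is copied into the heap at any moment.
    """
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Optional policy hint: tell the OS we will walk the mapping
            # front to back.  Guarded because not every platform has it.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            chunk = 1 << 16
            for off in range(0, len(m), chunk):
                total += sum(m[off:off + chunk])
            return total & 0xFFFFFFFF
```

Both functions return the same value for the same file; the difference is only in which pages the OS has to find frames for while they run.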

> Using memory mapped files would probably help quite a bit with keeping
> the UI responsive because only Wireshark's, for example, packet data
> would be on disk but the executable pages and "core" memory like the
> statistics could be kept in RAM (or at least whatever the OS gives us).

As per my mail to Erlend, the frame data isn't kept in Wireshark's  
address space, although reassembled data is (and frame_data structures  
are, and some or all column text is).
However, if Wireshark reads a large capture file, on many OSes the
blocks of the file will be brought into the page pool.  On those OSes
the "buffer cache" is implemented atop the page pool, so pages read in
with read()/ReadFile() compete for memory with pages that are faulted
in.  A read may even be done by mapping the region of the file being
read into the kernel's address space and copying from that region into
the userland buffer, so that the actual file system reads happen in
response to page faults.

*Hopefully* the OS will recognize the access as sequential and, at
least, not completely blow the page cache if the file is big enough.
(If you have enough memory that you *don't* blow the page cache, you
might as well keep the pages in memory; the menagerie of capture files
I use to regression-test some Wireshark/tcpdump changes can fit
entirely in main memory on my machine, so if I run the tests twice in
a row, the disk hardly does anything.)
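
An application does not have to rely on the OS guessing that the access is sequential; where posix_fadvise() is available it can say so.  The following is a minimal sketch under that assumption, again in Python rather than Wireshark's C.  The names read_sequentially and drop_cache are hypothetical, the hint is purely advisory, and what the kernel does with it varies by OS and file system.

```python
import os


def read_sequentially(path, chunk_size=1 << 20, drop_cache=False):
    """Read a file front to back with read(), hinting the access pattern.

    Returns the number of bytes read.  If drop_cache is true, the kernel
    is also told afterwards that the cached pages are no longer needed.
    """
    def advise(fd, advice_name):
        # posix_fadvise is missing on some platforms (e.g. macOS), and
        # the call may fail on some file types; the hint is optional.
        if hasattr(os, "posix_fadvise") and hasattr(os, advice_name):
            try:
                os.posix_fadvise(fd, 0, 0, getattr(os, advice_name))
            except OSError:
                pass

    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        advise(fd, "POSIX_FADV_SEQUENTIAL")
        while True:
            buf = os.read(fd, chunk_size)
            if not buf:
                break
            total += len(buf)
        if drop_cache:
            advise(fd, "POSIX_FADV_DONTNEED")
    finally:
        os.close(fd)
    return total
```

Passing drop_cache=True only makes sense for a file much larger than memory; for a set of files that fits in RAM, as in the regression-test case above, leaving the pages cached is what makes the second run fast.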