Wireshark-dev: [Wireshark-dev] RFD: The Future of Memory Management in Wireshark

From: Evan Huus <eapache@xxxxxxxxx>
Date: Thu, 18 Oct 2012 21:01:24 -0400

TL;DR - Jakub recently proposed a few changes to emem [1]. While I
think they are a very good idea, I believe that in the long term the
current emem design has too many fundamental limitations to make it
worth adapting for our future needs. I propose that it should be
gradually deprecated in favour of something else, and I have written a
simple example of what that something else might be.

--- The Long Version ---

After my recent adventures with emem, I took some time to work through
what features Wireshark will likely want in its memory manager now and
in the future. Predicting the future is always a tricky business, but
I tried not to stray too far from the obvious:
- An arbitrary number of instances of the current seasonal (se_) and
ephemeral (ep_) allocators. These will be necessary if we ever want to
support opening multiple files at once.
- Thread-safe allocators. Necessary if we ever want to do
multi-threaded dissection or re-dissection of a single file.
- Allocators with different scopes. I can think of a couple of
different places where a new allocator scope might simplify existing
code, and I'm sure there are others.

With these ideas in mind, I then took a closer look at the current
emem design and tried to estimate the amount of work involved in
adapting (and maintaining) it going forward. I concluded that in the
long run, emem has too many fundamental limitations to make it worth
adapting:
- Supporting arbitrary seasonal and ephemeral pools would require
non-trivial API changes, causing a lot of pain in dependent code.
- The current glib and mmap allocators are jammed together, often
living side-by-side in the same functions. As I found out recently,
trying to tweak either one can have a lot of unexpected side-effects.
- Adding a single new scope requires writing a new wrapper for every
API function (*_alloc, *_alloc0, *_strdup, *_strndup, etc). At best
this is just extra work, but at worst it leads to API inconsistencies,
where, for example, there's an ep_strbuf implementation but no
equivalent se_ functions (that's a real example by the way).

I would like to propose that emem be gradually deprecated in favour of
a new memory management framework that is designed with these
requirements in mind. A separate new interface will ease the pain of
migration by allowing us to maintain emem as a deprecated interface
for as long as necessary.

In the spirit of backing up ideas with working code, I have written a
simple version of what I think such a new framework might look like.
It cleanly separates out the allocator back-ends (glib, mmap, etc) and
the utility functions (strdup etc.) from the core interface. It also
requires that all API functions are explicitly passed the scoped
allocator instance they want to use. This makes it trivial to add new
scopes or to support multiple instances of the current scopes. Because
the allocators are cleanly distinguished, making any one of them
thread-safe is a fairly simple operation.
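
To make the separation concrete, here is a minimal sketch of one way
the core interface and a back-end could fit together. It is not taken
from the tarball: the struct layout, the field names and the sketch_*
functions are assumptions made for illustration, and the back-end uses
plain malloc instead of g_malloc so that it builds without glib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed interface: a back-end fills in these function pointers.
 * The field names are hypothetical. */
typedef struct wmem_allocator_t {
    void *(*alloc)(void *private_data, size_t size);
    void  (*free_all)(void *private_data);
    void  *private_data;
} wmem_allocator_t;

/* Core API: every call names the allocator (scope) it wants to use. */
void *wmem_alloc(wmem_allocator_t *allocator, size_t size)
{
    assert(allocator != NULL);
    return allocator->alloc(allocator->private_data, size);
}

void *wmem_alloc0(wmem_allocator_t *allocator, size_t size)
{
    void *buf = wmem_alloc(allocator, size);
    if (buf != NULL)
        memset(buf, 0, size);
    return buf;
}

void wmem_free_all(wmem_allocator_t *allocator)
{
    assert(allocator != NULL);
    allocator->free_all(allocator->private_data);
}

/* Stand-in back-end: malloc plus a singly-linked list of blocks. */
typedef union list_block {
    union list_block *next;
    max_align_t       align; /* keeps the payload suitably aligned */
} list_block;

typedef struct {
    list_block *head;
} list_state;

static void *list_alloc(void *private_data, size_t size)
{
    list_state *state = private_data;
    list_block *block = malloc(sizeof(list_block) + size);
    if (block == NULL)
        return NULL;
    block->next = state->head;
    state->head = block;
    return block + 1; /* payload starts right after the header */
}

static void list_free_all(void *private_data)
{
    list_state *state = private_data;
    while (state->head != NULL) {
        list_block *next = state->head->next;
        free(state->head);
        state->head = next;
    }
}

/* Hypothetical constructor, analogous to a per-back-end create call. */
wmem_allocator_t *sketch_create_list_allocator(void)
{
    wmem_allocator_t *allocator = malloc(sizeof *allocator);
    list_state *state = malloc(sizeof *state);
    if (allocator == NULL || state == NULL) {
        free(allocator);
        free(state);
        return NULL;
    }
    state->head = NULL;
    allocator->alloc = list_alloc;
    allocator->free_all = list_free_all;
    allocator->private_data = state;
    return allocator;
}

/* Hypothetical destructor: releases all blocks, then the allocator. */
void sketch_destroy_list_allocator(wmem_allocator_t *allocator)
{
    list_free_all(allocator->private_data);
    free(allocator->private_data);
    free(allocator);
}
```

In this sketch the core functions never look inside private_data, so a
different back-end (or a locking wrapper around an existing one) only
has to supply its own pair of function pointers.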

I have linked a tarball [2] containing the following files:
- wmem_allocator.h - the definition of the allocator interface
- wmem_allocator_glib.* - a simple implementation of the allocator
interface backed by g_malloc and a singly-linked list.
- wmem_core.* - implementations of wmem_alloc() and wmem_alloc0()
- wmem_strutl.* - implementations of wmem_strdup() and wmem_strndup()
- wmem.h - the general header file for inclusion by consumers of wmem,
it simply wraps inclusion of wmem_core.h, wmem_strutl.h and any others
that might be created
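
As a rough illustration of how the string utilities can sit on top of
the core interface without knowing which back-end is in use, here is a
small sketch. The helper names follow the *_strdup / *_strndup pattern
mentioned above and are assumptions, as are the struct layout and the
toy fixed-buffer back-end (bump_state, bump_alloc); none of this is
copied from the tarball.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for the allocator interface (hypothetical layout). */
typedef struct wmem_allocator_t {
    void *(*alloc)(void *private_data, size_t size);
    void  *private_data;
} wmem_allocator_t;

void *wmem_alloc(wmem_allocator_t *allocator, size_t size)
{
    assert(allocator != NULL);
    return allocator->alloc(allocator->private_data, size);
}

/* String helpers written only against wmem_alloc(). */
char *wmem_strdup(wmem_allocator_t *allocator, const char *src)
{
    size_t len = strlen(src) + 1; /* include the terminating NUL */
    char *dst = wmem_alloc(allocator, len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}

char *wmem_strndup(wmem_allocator_t *allocator, const char *src, size_t n)
{
    size_t len = 0;
    char *dst;

    /* Copy at most n characters, stopping early at a NUL. */
    while (len < n && src[len] != '\0')
        len++;
    dst = wmem_alloc(allocator, len + 1);
    if (dst != NULL) {
        memcpy(dst, src, len);
        dst[len] = '\0';
    }
    return dst;
}

/* Toy back-end for the demo: bump allocation from a fixed buffer.
 * It ignores alignment, which is acceptable only for byte strings. */
typedef struct {
    char   buf[256];
    size_t used;
} bump_state;

static void *bump_alloc(void *private_data, size_t size)
{
    bump_state *state = private_data;
    void *out;

    if (size > sizeof state->buf - state->used)
        return NULL; /* buffer exhausted */
    out = state->buf + state->used;
    state->used += size;
    return out;
}
```

A caller would build a wmem_allocator_t around bump_alloc and a
bump_state and pass it to the helpers; swapping in another back-end
would not require touching wmem_strdup() or wmem_strndup().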

The usage might look something like this:

    wmem_allocator_t *ep_scope = wmem_create_glib_allocator();
    doWork(ep_scope);
    wmem_destroy_glib_allocator(ep_scope);

and then in doWork, instead of ep_alloc(numBytes) you would call
wmem_alloc(ep_scope, numBytes).

Alternatively, if the outer block is in a loop then it doesn't have to
create/destroy each time, and can simply call wmem_free_all(ep_scope)
between calls to doWork().

Ideas, comments and constructive criticisms are always welcome.
What do you think?
Evan

[1] https://www.wireshark.org/lists/wireshark-dev/201210/msg00116.html
[2] https://dl.dropbox.com/u/171647/wmem.tar.gz