
Wireshark-dev: Re: [Wireshark-dev] [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks

From: Evan Huus <eapache@xxxxxxxxx>
Date: Tue, 22 Apr 2014 08:23:41 -0400
> On Apr 22, 2014, at 7:52 AM, Bálint Réczey <balint@xxxxxxxxxxxxxxx> wrote:
> 
> [Bringing the discussion to -dev with Evan's permission]
> 
> 2014-04-22 10:15 GMT+02:00 Anders Broman <anders.broman@xxxxxxxxxxxx>:
>> 
>> 
>> -----Original Message-----
>> From: wireshark-core-bounces@xxxxxxxxxxxxx [mailto:wireshark-core-bounces@xxxxxxxxxxxxx] On Behalf Of Evan Huus
>> Sent: den 22 april 2014 05:36
>> To: Wireshark core developers
>> Subject: Re: [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks
>> 
>>> On Mon, Apr 21, 2014 at 6:28 PM, Balint Reczey <rbalint@xxxxxxxxx> wrote:
>>> 
>>>> On Apr 22, 2014 12:11 AM, Evan Huus <eapache@xxxxxxxxx> wrote:
>>>> 
>>>>> On Mon, Apr 21, 2014 at 3:59 PM, Bálint Réczey <balint@xxxxxxxxxxxxxxx> wrote:
>>>>> Hi Evan,
>>>>> 
>>>>> 2014-04-21 18:21 GMT+02:00 Evan Huus <eapache@xxxxxxxxx>:
>>>>>> (sending to -core because of potential security implications)
>>>>>> 
>>>>>> I was recently investigating implementing a wmem-based hash table,
>>>>>> since many of the current users of the wmem-tree do not actually
>>>>>> need the predecessor-lookup feature, which is the only advantage it
>>>>>> provides over a hash table (a hash table is otherwise much faster).
>>>>>> 
>>>>>> I ended up wandering down the rabbit-hole of hash collisions,
>>>>>> algorithmic complexity attacks, and universal hashing ([1], [2],
>>>>>> [3] and more).
>>>>>> 
>>>>>> Then I read the GLib hash table documentation: [4]. A quick grep
>>>>>> through the Wireshark source reveals numerous locations where we
>>>>>> use packet data in hash tables with insecure hash functions. As
>>>>>> such, I believe that Wireshark is vulnerable to a whole class of
>>>>>> denial-of-service attacks in this area.
>>>>>> 
>>>>>> Has this come up before? Am I overlooking something clever we're doing?
>>>>> It is true that Wireshark is vulnerable to hash collision attacks,
>>>>> and I think it did not come up before because we had enough
>>>>> vulnerabilities of other simpler classes. ;-)
>>>> 
>>>> Makes sense :)
>>>> 
>>>>> I think we could create wrappers that provide a random seed to the
>>>>> standard GLib hash functions, which is the standard way of handling
>>>>> such vulnerabilities.
>>>> 
>>>> Unfortunately (based on my reading) that will not be sufficient.
>>>> There are two vectors for an attack via collisions: the mapping from
>>>> key to guint, and the mapping from guint to bucket index. The first
>>>> (key->guint) is user-settable, so we can provide a random seed. The
>>>> second (guint->bucket) is not user-controllable. From the GLib
>>>> documentation:
>>>> 
>>>> "The modulus is taken with the hash table size (a prime number) to
>>>> find the 'bucket' to place each key into."
>>>> and then a few paragraphs down
>>>> "Even cryptographic hashes are very easy to find collisions for when
>>>> the remainder is taken modulo a somewhat predictable prime number."
>>>> 
>>>> So basically, it doesn't matter how strong we make the initial
>>>> mapping, because the secondary bucket mapping is predictable anyway.
>>>> Fortunately there are easy and efficient ways to make the secondary
>>>> mapping unpredictable (it's actually simpler than the initial
>>>> mapping), so I guess a good secure wmem map implementation is
>>>> actually fairly important to have.
>>> Luckily GHashTable resizes itself, increasing the number of buckets
>>> when the load gets high, so this attack can't be performed effectively.
>>> The described problem is valid only for hash tables with a fixed
>>> bucket count.
>>> 
>>> Regarding the wmem hash tables, I think a C wrapper around the C++ STL
>>> hash tables with wmem-based custom allocators would do the job.
>>> 
>>> Unfortunately this doesn't work; the free-all operation which wmem provides doesn't play nice with C++ destructors (besides which we are still pure-C for libwireshark for the time being).
> We lost being C-only with the Qt UI. :-)

Not in libwireshark; there may be third-party C-only apps which build against it.

> It can be implemented in several ways that work; one is registering each
> hash table with the proper wmem pool and calling its destructor when
> freeing the pool - this way we don't have to play with a custom
> allocator.

But then we lose the fast free-all which was the entire point. Otherwise we might as well use smart pointers.
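
For reference, the registration idea would look roughly like the sketch
below, assuming wmem's user-callback hook (wmem_register_callback) behaves
as documented; illustrative only:

    #include <glib.h>
    #include <epan/wmem/wmem.h>

    /* Destroy a registered GHashTable when its wmem pool is freed. The
     * table itself lives outside the pool, so the pointer is still
     * valid when the callback fires. */
    static gboolean
    destroy_table_cb(wmem_allocator_t *allocator, wmem_cb_event_t event,
                     void *user_data)
    {
        (void)allocator;
        (void)event;
        g_hash_table_destroy((GHashTable *)user_data);
        return FALSE; /* one-shot: unregister after firing */
    }

    static GHashTable *
    pool_scoped_hash_table(wmem_allocator_t *pool)
    {
        GHashTable *table = g_hash_table_new(g_str_hash, g_str_equal);
        wmem_register_callback(pool, destroy_table_cb, table);
        return table;
    }

The g_hash_table_destroy() calls walk every entry, which is exactly the
per-element work the pooled free-all was meant to avoid.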

> Another technique would be not calling the destructors and just freeing
> all the connected allocated objects. This is not nice, but would work as
> well IMO, since we would not hold refs to the free()d objects.

Might work in practice, but definitely not guaranteed to work by the standard.

>>> 
>>> I have whipped up a very basic wmem-map implementation at [1]. It uses universal multiply-shift hashing [2] for bucket placement, and provides a wmem_secure_hash function for replacing g_str_hash and friends, based on work done by the Perl community [3] in hardening their hash tables. To the best of my knowledge the resulting hash map is secure against complexity attacks.
>>> 
>>> It still needs cleanup (comments, tests, etc). I would also like to replace one tree somewhere and run benchmarks to make sure it's actually faster. Suggestions and concerns are welcome.
> It would be nice to see how it compares to other implementations:
> http://incise.org/hash-table-benchmarks.html
> 
> I prefer using existing building blocks instead of inventing our own.

I agree, I just couldn't find an existing implementation that was easily adaptable to memory pools.
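
For the curious, the API under review at [1] below looks roughly like the
following (names may still change before it merges):

    #include <glib.h>
    #include <epan/wmem/wmem.h>

    /* Illustrative usage of the proposed wmem map: it is allocated in a
     * pool and freed implicitly when that pool is freed. g_str_hash is
     * used here only for brevity; tables keyed on packet data would use
     * the seeded wmem_secure_hash instead. */
    static void
    example(void)
    {
        wmem_map_t *map;
        guint       value;

        map = wmem_map_new(wmem_file_scope(), g_str_hash, g_str_equal);
        wmem_map_insert(map, "knock", GUINT_TO_POINTER(42u));
        value = GPOINTER_TO_UINT(wmem_map_lookup(map, "knock"));
        g_assert(value == 42);
        /* no explicit destroy: the map vanishes with the file scope */
    }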

> Cheers,
> Balint
> 
>>> 
>>> [1] https://code.wireshark.org/review/1272
>>> [2] https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic
>>> [3] http://blog.booking.com/hardening-perls-hash-function.html
>> 
>> 
>> Would it be good to have an initial hash table size in the API, so a table could be preallocated to a custom size for the cases where we know the size will be larger than the standard default?
> I think it would not be good. It would save a constant number of resizes
> (an O(1) gain) but we would reserve more memory for each hash table
> initially.
> 
> Cheers,
> Balint
> 
>> Regards
>> Anders
>> 
>> 
>>> Cheers,
>>> Balint
>>> 
>>>> 
>>>>> AFAIK GHashTable does not employ randomization of the hash function seed:
>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655044
>>>>> 
>>>>> Cheers,
>>>>> Balint
>>>>> 
>>>>>> 
>>>>>> Evan
>>>>>> 
>>>>>> P.S. I believe it's still perfectly OK to use hash tables with
>>>>>> non-packet data such as static value_strings or as in [5].
>>>>>> 
>>>>>> [1] http://lwn.net/Articles/474912/
>>>>>> [2] http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attacks
>>>>>> [3] https://en.wikipedia.org/wiki/Universal_hashing
>>>>>> [4] https://developer.gnome.org/glib/unstable/glib-Hash-Tables.html#GHashFunc
>>>>>> [5] https://code.wireshark.org/review/gitweb?p=wireshark.git;a=commit;h=5983cda769fa6615dda5fc7b8f87819d40f0a8d5
>>>>> 