ANNOUNCEMENT: Live Wireshark University & Allegro Packets online APAC Wireshark Training Session
April 17th, 2024 | 14:30-16:00 SGT (UTC+8) | Online

Wireshark-dev: Re: [Wireshark-dev] [Wireshark-core] Hash Tables and Algorithmic Complexity Atta

From: Bálint Réczey <balint@xxxxxxxxxxxxxxx>
Date: Tue, 22 Apr 2014 13:52:32 +0200
[Bringing the discussion to -dev with Evan's permission]

2014-04-22 10:15 GMT+02:00 Anders Broman <anders.broman@xxxxxxxxxxxx>:
>
>
> -----Original Message-----
> From: wireshark-core-bounces@xxxxxxxxxxxxx [mailto:wireshark-core-bounces@xxxxxxxxxxxxx] On Behalf Of Evan Huus
> Sent: den 22 april 2014 05:36
> To: Wireshark core developers
> Subject: Re: [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks
>
> On Mon, Apr 21, 2014 at 6:28 PM, Balint Reczey <rbalint@xxxxxxxxx> wrote:
>>
>> On Apr 22, 2014 12:11 AM, Evan Huus <eapache@xxxxxxxxx> wrote:
>>>
>>> On Mon, Apr 21, 2014 at 3:59 PM, Bálint Réczey <balint@xxxxxxxxxxxxxxx> wrote:
>>> > Hi Evan,
>>> >
>>> > 2014-04-21 18:21 GMT+02:00 Evan Huus <eapache@xxxxxxxxx>:
>>> >> (sending to -core because of potential security implications)
>>> >>
>>> >> I was recently investigating implementing a wmem-based hash table,
>>> >> since many of the current users of the wmem-tree do not actually
>>> >> need the predecessor-lookup feature which is the only advantage it
>>> >> provides over a hash table (whereas a hash table is otherwise much faster).
>>> >>
>>> >> I ended up wandering down the rabbit-hole of hash collisions,
>>> >> algorithmic complexity attacks, and universal hashing ([1], [2],
>>> >> [3] and more).
>>> >>
>>> >> Then I read the Glib hash table documentation: [4]. A quick grep
>>> >> through the Wireshark source reveals numerous locations where we
>>> >> use packet data in hash tables with insecure hash functions. As
>>> >> such, I believe that Wireshark is vulnerable to a whole class of
>>> >> denial-of-service attacks in this area.
>>> >>
>>> >> Has this come up before? Am I overlooking something clever we're doing?
>>> > It is true that Wireshark is vulnerable to hash collision attacks,
>>> > and I think it did not come up before because we had enough
>>> > vulnerabilities of other simpler classes. ;-)
>>>
>>> Makes sense :)
>>>
>>> > I think we could create wrappers to provide random seed to standard
>>> > glib hash functions which is the standard way of handling such
>>> > vulnerabilities.
>>>
>>> Unfortunately (based on my reading) that will not be sufficient.
>>> There are two vectors for an attack via collisions: the mapping from
>>> key to guint, and the mapping from guint to bucket index. The first
>>> (key->guint) is user-settable so we can provide a random seed. The
>>> second (guint->bucket) is not user-controllable. From the glib
>>> documentation:
>>>
>>> "The modulus is taken with the hash table size (a prime number) to
>>> find the 'bucket' to place each key into."
>>> and then a few paragraphs down
>>> "Even cryptographic hashes are very easy to find collisions for when
>>> the remainder is taken modulo a somewhat predictable prime number."
>>>
>>> So basically, it doesn't matter how strong we make the initial
>>> mapping because the secondary bucket mapping is predictable anyways.
>>> Fortunately there are easy and efficient ways to make the secondary
>>> mapping unpredictable (it's actually simpler than the initial
>>> mapping) so I guess that a good secure wmem map implementation is
>>> actually fairly important to have.
>> Luckily ghashtable resizes itself increasing the number of buckets when the collision rate is high, thus this attack can't be performed effectively.
>> The described problem is valid only for hash tables with fixed bucket count.
>>
>> Regarding the wmem hash tables I think C wrapping C++ STL hash tables with wmem based custom allocators would do the job.
>>
>>Unfortunately this doesn't work; the free-all operation which wmem provides doesn't play nice with C++ destructors (besides which we are still pure-C for libwireshark for the time being).
We lost being C only with the Qt UI. :-)
It can be implemented several ways which work, one is registering each
hash table to the proper wmem pool and calling their destructor when
freeing the pool - this way we don't have to play with a custom
allocator.
Other technique would be not calling destructors, just freeing all the
allocated connected objects. This is not nice, but would work as well
IMO, since we would not hold refs to the free()-d objects.

>>
>>I have whipped up a very basic wmem-map implementation at [1]. It uses universal multiply-shift hashing [2] for bucket placement, and provides a wmem_secure_hash function for replacing >g_str_hash and friends, based on work done by the Perl community [3] in hardening their hash tables. To the best of my knowledge the resulting hash map is secure against complexity attacks.
>>
>>It still needs cleanup (comments, tests, etc). I would also like to replace one tree somewhere and run benchmarks to make sure it's actually faster. Suggestions and concerns are welcome.
It would be nice to see how it compares to other implementations:
http://incise.org/hash-table-benchmarks.html

I prefer using existing building building blocks instead of inventing our own.

Cheers,
Balint

>>
>>[1] https://code.wireshark.org/review/1272
>>[2] https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic
>>[3] http://blog.booking.com/hardening-perls-hash-function.html
>
>
> Would it be good to have initial hastable size in the API? So a table could be pre allocated to a custom size for the cases where we know the size will be larger than standard default.
I think it would not be good. I would save a constant number resizes
(O(1) gain) but we would reserve more memory for each hash table
initially.

Cheers,
Balint

> Regards
> Anders
>
>
>> Cheers,
>> Balint
>>
>>>
>>> > AFAIK ghashtable does not employ randomization of hash function seed:
>>> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655044
>>> >
>>> > Cheers,
>>> > Balint
>>> >
>>> >>
>>> >> Evan
>>> >>
>>> >> P.S. I believe it's still perfectly OK to use hash tables with
>>> >> non-packet data such as static value_strings or as in [5].
>>> >>
>>> >> [1] http://lwn.net/Articles/474912/ [2]
>>> >> http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attack
>>> >> s [3] https://en.wikipedia.org/wiki/Universal_hashing
>>> >> [4]
>>> >> https://developer.gnome.org/glib/unstable/glib-Hash-Tables.html#GH
>>> >> ashFunc [5]
>>> >> https://code.wireshark.org/review/gitweb?p=wireshark.git;a=commit;
>>> >> h=5983cda769fa6615dda5fc7b8f87819d40f0a8d5
>>> >