Huge thanks to our Platinum Members Endace and LiveAction,
and our Silver Member Veeam, for supporting the Wireshark Foundation and project.

Wireshark-dev: [Wireshark-dev] slow sip voip flow for large captures

From: Cristian Constantin <const.crist@xxxxxxxxxxxxxx>
Date: Fri, 3 Feb 2012 13:01:29 +0100
hi!

wireshark can draw call flows for sip voip calls
(accessible through the menu Telephony/VoIP Calls).

however, when the capture is large, containing tens of
thousands of sip voip calls, wireshark becomes very slow
at producing the list of calls and the call flows.

here are my experiences with a capture file of 500 MB containing
approx. 40K calls on a debian machine having:

4xAMD Phenom(tm) II X4 940 Processor
4GB RAM

with the svn stock it takes about 10 hours to do it; the
report of the profiling done with oprofile looks like this:

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples  %        image name               app name                 symbol name
8146520  55.2011  lt-wireshark             lt-wireshark
append_to_frame_graph.isra.7
3698478  25.0610  libglib-2.0.so.0.2800.6  lt-wireshark             g_list_last
1566933  10.6176  lt-wireshark             lt-wireshark
SIPcalls_packet
1162377   7.8763  libc-2.13.so             lt-wireshark
__strcmp_sse2
8734      0.0592  libwireshark.so.0.0.1    lt-wireshark
serv_name_lookup
8130      0.0551  libwireshark.so.0.0.1    lt-wireshark
dissect_sip_common
6504      0.0441  libwireshark.so.0.0.1    lt-wireshark             format_text
5749      0.0390  libwireshark.so.0.0.1    lt-wireshark
tap_push_tapped_queue
4753      0.0322  libglib-2.0.so.0.2800.6  lt-wireshark
g_unichar_isprint
4627      0.0314  libglib-2.0.so.0.2800.6  lt-wireshark
g_hash_table_lookup
3699      0.0251  libwireshark.so.0.0.1    lt-wireshark             guint8_pbrk

step 1
------

by changing SIPcalls_packet() to use a hash table for the lookup of the calls
(on sip call-id) the runtime decreases to an (estimated) 3-4 hours.
SIPcalls_packet()'s cpu usage has decreased from 10.61% to 0.0115%
(1000 times faster)

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
411305   69.4523  lt-wireshark             append_to_frame_graph.isra.7
171549   28.9675  libglib-2.0.so.0.2800.6  g_list_last
475       0.0802  libwireshark.so.0.0.1    serv_name_lookup
439       0.0741  libwireshark.so.0.0.1    dissect_sip_common
375       0.0633  libwireshark.so.0.0.1    format_text
360       0.0608  libwireshark.so.0.0.1    tap_push_tapped_queue
303       0.0512  libglib-2.0.so.0.2800.6  g_hash_table_lookup
283       0.0478  libglib-2.0.so.0.2800.6  g_unichar_isprint
202       0.0341  libwireshark.so.0.0.1    guint8_pbrk
172       0.0290  libwireshark.so.0.0.1    proto_tree_add_item
149       0.0252  libwireshark.so.0.0.1    compute_offset_length
143       0.0241  libc-2.13.so             __memcpy_sse2
141       0.0238  libwireshark.so.0.0.1    call_dissector_work
141       0.0238  libwireshark.so.0.0.1    emem_free_all
115       0.0194  libc-2.13.so             __strncmp_sse2
112       0.0189  libc-2.13.so             vfprintf
105       0.0177  libwireshark.so.0.0.1    proto_item_add_subtree
104       0.0176  libglib-2.0.so.0.2800.6  g_slice_free_chain_with_offset
99        0.0167  libwireshark.so.0.0.1    dissect_sdp
91        0.0154  libc-2.13.so             free
91        0.0154  libwireshark.so.0.0.1    dissect_ip
90        0.0152  libwireshark.so.0.0.1    proto_field_is_referenced
90        0.0152  libwireshark.so.0.0.1    tvb_length_remaining
90        0.0152  lt-wireshark
process_specified_packets.constprop.10
89        0.0150  libglib-2.0.so.0.2800.6  g_slice_free1
89        0.0150  libwiretap.so.0.0.1      file_seek
88        0.0149  libwireshark.so.0.0.1    dissect
86        0.0145  libwireshark.so.0.0.1    dissect_frame
82        0.0138  libwireshark.so.0.0.1    proto_is_protocol_enabled
79        0.0133  libc-2.13.so             __strcmp_sse2
79        0.0133  libc-2.13.so             _int_malloc
78        0.0132  libglib-2.0.so.0.2800.6  g_slice_alloc
75        0.0127  libpthread-2.13.so       pthread_getspecific
73        0.0123  libwireshark.so.0.0.1    tvb_find_guint8
72        0.0122  libc-2.13.so             malloc
71        0.0120  libwireshark.so.0.0.1    fast_ensure_contiguous
68        0.0115  lt-wireshark             SIPcalls_packet

step 2
------

the next improvment was to change append_to_frame_graph() to use a
hash table for graph analysis (still slow, over 1 hour):

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
565268   94.6185  libglib-2.0.so.0.2800.6  g_list_last
1863      0.3118  libwireshark.so.0.0.1    serv_name_lookup
1620      0.2712  libwireshark.so.0.0.1    dissect_sip_common
1340      0.2243  libwireshark.so.0.0.1    format_text
987       0.1652  libglib-2.0.so.0.2800.6  g_unichar_isprint
982       0.1644  libglib-2.0.so.0.2800.6  g_hash_table_lookup
819       0.1371  libwireshark.so.0.0.1    tap_push_tapped_queue
731       0.1224  libwireshark.so.0.0.1    guint8_pbrk
628       0.1051  libwireshark.so.0.0.1    proto_tree_add_item
587       0.0983  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
574       0.0961  libwireshark.so.0.0.1    compute_offset_length
405       0.0678  libwireshark.so.0.0.1    proto_is_protocol_enabled
401       0.0671  libwireshark.so.0.0.1    emem_free_all
395       0.0661  libwireshark.so.0.0.1    dissect_ip
391       0.0654  libc-2.13.so             __strncmp_sse2
367       0.0614  libwireshark.so.0.0.1    dissect
...
20        0.0017  lt-wireshark             append_to_frame_graph.isra.8

however, append_to_frame_graph() got from 69.45% to 0.0017% (40852 times better)

step 3
------

I have changed the graph analysis list and the callsinfo list to use
prepend instead of append:

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
130285   18.6875  libc-2.13.so             msort_with_tmp
66464     9.5333  libglib-2.0.so.0.2800.6  node_get_pos
53790     7.7154  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
51146     7.3361  libc-2.13.so             __memcpy_sse2
44419     6.3712  libglib-2.0.so.0.2800.6  node_get_next
42461     6.0904  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_fixup
41265     5.9189  libglib-2.0.so.0.2800.6  is_end
40819     5.8549  libglib-2.0.so.0.2800.6  g_hash_table_resize
38905     5.5803  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_next
36069     5.1736  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_reorder
35828     5.1390  libglib-2.0.so.0.2800.6  g_hash_table_lookup
29469     4.2269  libgtk-x11-2.0.so.0.2400.4 save_positions
26858     3.8524  libgtk-x11-2.0.so.0.2400.4 generate_order
7892      1.1320  libc-2.13.so             __memset_sse2
7752      1.1119  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_sort_func
6860      0.9840  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_invert_func
6835      0.9804  libglib-2.0.so.0.2800.6  g_hash_table_remove_all_nodes
4085      0.5859  libglib-2.0.so.0.2800.6  g_array_append_vals
3864      0.5542  libglib-2.0.so.0.2800.6  g_sequence_iter_is_end
3591      0.5151  libc-2.13.so             __memset_x86_64
2057      0.2950  libglib-2.0.so.0.2800.6  g_array_maybe_expand

step 4
------

reordering is performed _continously_ while adding elements to the
voip calls list.
(by _gtk_rbtree_reorder which is calling qsort from libc which is
calling msort_with_tmp)

after disabling continous reordering, by using in voip_calls_dlg_update():

       /* disable the re-ordering */
       gtk_tree_sortable_set_sort_column_id(sortable,
GTK_TREE_SORTABLE_UNSORTED_SORT_COLUMN_ID, GTK_SORT_ASCENDING);

       listx = g_list_first(listx);
       while (listx) {
           add_to_list_store((voip_calls_info_t*)(listx->data));
           listx = g_list_next(listx);
       }

       /* enable the re-ordering */
       gtk_tree_sortable_set_sort_column_id(sortable,
CALL_COL_START_TIME, GTK_SORT_ASCENDING);

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
13495     8.8003  libwireshark.so.0.0.1    format_text
10698     6.9763  libglib-2.0.so.0.2800.6  g_unichar_isprint
7421      4.8394  libwireshark.so.0.0.1    guint8_pbrk
4973      3.2430  libwireshark.so.0.0.1    compute_offset_length
4776      3.1145  libglib-2.0.so.0.2800.6  g_hash_table_lookup
4029      2.6274  libwireshark.so.0.0.1    dissect_sip_common
3851      2.5113  libglib-2.0.so.0.2800.6  g_atomic_pointer_get
3186      2.0776  libc-2.13.so             __memcpy_sse2
3064      1.9981  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
3008      1.9616  libwireshark.so.0.0.1    emem_free_all
2964      1.9329  libwireshark.so.0.0.1    fast_ensure_contiguous
2803      1.8279  libwireshark.so.0.0.1    serv_name_lookup
2793      1.8214  libwireshark.so.0.0.1    check_offset_length_no_exception
2114      1.3786  libc-2.13.so             _int_malloc
2063      1.3453  libglib-2.0.so.0.2800.6  g_slice_free1
2046      1.3342  libwireshark.so.0.0.1    tap_push_tapped_queue
1841      1.2005  libwireshark.so.0.0.1    tvb_get_guint8
1713      1.1171  libc-2.13.so             memchr
1668      1.0877  libwireshark.so.0.0.1    tvb_length_remaining
1625      1.0597  libglib-2.0.so.0.2800.6  g_atomic_int_get
1550      1.0108  libwireshark.so.0.0.1    proto_tree_add_item
1524      0.9938  libc-2.13.so             vfprintf
1475      0.9619  libwireshark.so.0.0.1    emem_alloc_chunk
1439      0.9384  libc-2.13.so             __strncmp_sse2
1395      0.9097  libglib-2.0.so.0.2800.6  g_slice_alloc
1312      0.8556  libwireshark.so.0.0.1    tvb_find_guint8
1279      0.8341  libwireshark.so.0.0.1    ensure_contiguous_no_exception
1052      0.6860  libc-2.13.so             malloc
1028      0.6704  libwireshark.so.0.0.1    dissect_sdp
1022      0.6665  libwireshark.so.0.0.1    tvb_pbrk_guint8

the runtime decreased now to approx. 1 minute (pretty reasonable
when compared to 10 hours)

I can publish the patches on bugzilla; however,
the changes/tests have been explicitly targeted at sip voip calls.
some of the data structures are global and used by _all_
voip protocol taps; it won't work correctly with dumps having
also non-sip voip calls.

pls. let me know your opinion.

thanks.
cristian