Wireshark-dev: Re: [Wireshark-dev] slow sip voip flow for large captures
From: Cristian Constantin <[email protected]>
Date: Fri, 3 Feb 2012 14:53:36 +0100
On Fri, Feb 3, 2012 at 1:44 PM, Anders Broman
<[email protected]> wrote:
> Hi,
> Please put the patches in bugzilla, if possible split them in two parts one that's ok for all types(Step 2,3,4?) of calls and one which
> Implements the stuff for SIP but not for the other call types.

cristian: step 1 is dealing only with sip packets.

step 2's improvement could be used all voip taps.

step 3 is tricky since the lists have to be reversed (trivial operation
for double linked lists; glib offers an api for this). however
currently the prepend
and reversing is performed only for sip calls. extra-work to be done here
for other voip taps to use more efficient linked lists operations (prepend and
eventl. reverse). also, this step modifies the callback for the button
"Flow" of the
"VoIP" window  - on_graph_bt_clicked() - to reverse (once) the lists:

* voip_calls_get_info()->callsinfo_list
* voip_calls_get_info()->graph_analysis->list

so, in order to work correctly with the other taps, they have to use
the "prepend" operation/convention as well.

step 4 could be used for all taps

let me see what I can do about the patches.

> I think the SIP dissector keeps a table over the calls as well could that table be shared?

cristian: I do not know which table is that.

> Regards
> Anders
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Cristian Constantin
> Sent: den 3 februari 2012 13:01
> To: Developer support list for Wireshark
> Subject: [Wireshark-dev] slow sip voip flow for large captures
>
> hi!
>
> wireshark can draw call flows for sip voip calls (accessible through the menu Telephony/VoIP Calls).
>
> however, when the capture is large, containing tens of thousands of sip voip calls, wireshark becomes very slow at producing the list of calls and the call flows.
>
> here are my experiences with a capture file of 500 MB containing approx. 40K calls on a debian machine having:
>
> 4xAMD Phenom(tm) II X4 940 Processor
> 4GB RAM
>
> with the svn stock it takes about 10 hours to do it; the report of the profiling done with oprofile looks like this:
>
> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples  %        image name               app name                 symbol name 8146520  55.2011  lt-wireshark             lt-wireshark
> append_to_frame_graph.isra.7
> 3698478  25.0610  libglib-2.0.so.0.2800.6  lt-wireshark             g_list_last
> 1566933  10.6176  lt-wireshark             lt-wireshark SIPcalls_packet
> 1162377   7.8763  libc-2.13.so             lt-wireshark
> __strcmp_sse2
> 8734      0.0592  libwireshark.so.0.0.1    lt-wireshark serv_name_lookup 8130      0.0551  libwireshark.so.0.0.1    lt-wireshark dissect_sip_common
> 6504      0.0441  libwireshark.so.0.0.1    lt-wireshark             format_text
> 5749      0.0390  libwireshark.so.0.0.1    lt-wireshark tap_push_tapped_queue
> 4753      0.0322  libglib-2.0.so.0.2800.6  lt-wireshark g_unichar_isprint
> 4627      0.0314  libglib-2.0.so.0.2800.6  lt-wireshark g_hash_table_lookup
> 3699      0.0251  libwireshark.so.0.0.1    lt-wireshark             guint8_pbrk
>
> step 1
> ------
>
> by changing SIPcalls_packet() to use a hash table for the lookup of the calls (on sip call-id) the runtime decreases to an (estimated) 3-4 hours.
> SIPcalls_packet()'s cpu usage has decreased from 10.61% to 0.0115% (1000 times faster)
>
> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples  %        image name               symbol name
> 411305   69.4523  lt-wireshark             append_to_frame_graph.isra.7
> 171549   28.9675  libglib-2.0.so.0.2800.6  g_list_last
> 475       0.0802  libwireshark.so.0.0.1    serv_name_lookup
> 439       0.0741  libwireshark.so.0.0.1    dissect_sip_common
> 375       0.0633  libwireshark.so.0.0.1    format_text 360       0.0608  libwireshark.so.0.0.1    tap_push_tapped_queue
> 303       0.0512  libglib-2.0.so.0.2800.6  g_hash_table_lookup
> 283       0.0478  libglib-2.0.so.0.2800.6  g_unichar_isprint
> 202       0.0341  libwireshark.so.0.0.1    guint8_pbrk
> 172       0.0290  libwireshark.so.0.0.1    proto_tree_add_item
> 149       0.0252  libwireshark.so.0.0.1    compute_offset_length
> 143       0.0241  libc-2.13.so             __memcpy_sse2
> 141       0.0238  libwireshark.so.0.0.1    call_dissector_work
> 141       0.0238  libwireshark.so.0.0.1    emem_free_all
> 115       0.0194  libc-2.13.so             __strncmp_sse2
> 112       0.0189  libc-2.13.so             vfprintf
> 105       0.0177  libwireshark.so.0.0.1    proto_item_add_subtree
> 104       0.0176  libglib-2.0.so.0.2800.6  g_slice_free_chain_with_offset
> 99        0.0167  libwireshark.so.0.0.1    dissect_sdp
> 91        0.0154  libc-2.13.so             free
> 91        0.0154  libwireshark.so.0.0.1    dissect_ip 90        0.0152  libwireshark.so.0.0.1    proto_field_is_referenced 90        0.0152  libwireshark.so.0.0.1    tvb_length_remaining 90        0.0152  lt-wireshark process_specified_packets.constprop.10
> 89        0.0150  libglib-2.0.so.0.2800.6  g_slice_free1
> 89        0.0150  libwiretap.so.0.0.1      file_seek
> 88        0.0149  libwireshark.so.0.0.1    dissect
> 86        0.0145  libwireshark.so.0.0.1    dissect_frame
> 82        0.0138  libwireshark.so.0.0.1    proto_is_protocol_enabled
> 79        0.0133  libc-2.13.so             __strcmp_sse2
> 79        0.0133  libc-2.13.so             _int_malloc
> 78        0.0132  libglib-2.0.so.0.2800.6  g_slice_alloc
> 75        0.0127  libpthread-2.13.so       pthread_getspecific
> 73        0.0123  libwireshark.so.0.0.1    tvb_find_guint8
> 72        0.0122  libc-2.13.so             malloc
> 71        0.0120  libwireshark.so.0.0.1    fast_ensure_contiguous
> 68        0.0115  lt-wireshark             SIPcalls_packet
>
> step 2
> ------
>
> the next improvment was to change append_to_frame_graph() to use a hash table for graph analysis (still slow, over 1 hour):
>
> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples  %        image name               symbol name
> 565268   94.6185  libglib-2.0.so.0.2800.6  g_list_last
> 1863      0.3118  libwireshark.so.0.0.1    serv_name_lookup 1620      0.2712  libwireshark.so.0.0.1    dissect_sip_common 1340      0.2243  libwireshark.so.0.0.1    format_text
> 987       0.1652  libglib-2.0.so.0.2800.6  g_unichar_isprint
> 982       0.1644  libglib-2.0.so.0.2800.6  g_hash_table_lookup
> 819       0.1371  libwireshark.so.0.0.1    tap_push_tapped_queue
> 731       0.1224  libwireshark.so.0.0.1    guint8_pbrk
> 628       0.1051  libwireshark.so.0.0.1    proto_tree_add_item
> 587       0.0983  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
> 574       0.0961  libwireshark.so.0.0.1    compute_offset_length
> 405       0.0678  libwireshark.so.0.0.1    proto_is_protocol_enabled
> 401       0.0671  libwireshark.so.0.0.1    emem_free_all
> 395       0.0661  libwireshark.so.0.0.1    dissect_ip
> 391       0.0654  libc-2.13.so             __strncmp_sse2
> 367       0.0614  libwireshark.so.0.0.1    dissect ...
> 20        0.0017  lt-wireshark             append_to_frame_graph.isra.8
>
> however, append_to_frame_graph() got from 69.45% to 0.0017% (40852 times better)
>
> step 3
> ------
>
> I have changed the graph analysis list and the callsinfo list to use prepend instead of append:
>
> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples  %        image name               symbol name
> 130285   18.6875  libc-2.13.so             msort_with_tmp
> 66464     9.5333  libglib-2.0.so.0.2800.6  node_get_pos 53790     7.7154  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
> 51146     7.3361  libc-2.13.so             __memcpy_sse2
> 44419     6.3712  libglib-2.0.so.0.2800.6  node_get_next
> 42461     6.0904  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_fixup
> 41265     5.9189  libglib-2.0.so.0.2800.6  is_end
> 40819     5.8549  libglib-2.0.so.0.2800.6  g_hash_table_resize
> 38905     5.5803  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_next
> 36069     5.1736  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_reorder
> 35828     5.1390  libglib-2.0.so.0.2800.6  g_hash_table_lookup
> 29469     4.2269  libgtk-x11-2.0.so.0.2400.4 save_positions
> 26858     3.8524  libgtk-x11-2.0.so.0.2400.4 generate_order
> 7892      1.1320  libc-2.13.so             __memset_sse2
> 7752      1.1119  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_sort_func 6860      0.9840  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_invert_func
> 6835      0.9804  libglib-2.0.so.0.2800.6  g_hash_table_remove_all_nodes
> 4085      0.5859  libglib-2.0.so.0.2800.6  g_array_append_vals
> 3864      0.5542  libglib-2.0.so.0.2800.6  g_sequence_iter_is_end
> 3591      0.5151  libc-2.13.so             __memset_x86_64
> 2057      0.2950  libglib-2.0.so.0.2800.6  g_array_maybe_expand
>
> step 4
> ------
>
> reordering is performed _continously_ while adding elements to the voip calls list.
> (by _gtk_rbtree_reorder which is calling qsort from libc which is calling msort_with_tmp)
>
> after disabling continous reordering, by using in voip_calls_dlg_update():
>
>        /* disable the re-ordering */
>        gtk_tree_sortable_set_sort_column_id(sortable,
> GTK_TREE_SORTABLE_UNSORTED_SORT_COLUMN_ID, GTK_SORT_ASCENDING);
>
>        listx = g_list_first(listx);
>        while (listx) {
>            add_to_list_store((voip_calls_info_t*)(listx->data));
>            listx = g_list_next(listx);
>        }
>
>        /* enable the re-ordering */
>        gtk_tree_sortable_set_sort_column_id(sortable,
> CALL_COL_START_TIME, GTK_SORT_ASCENDING);
>
> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples  %        image name               symbol name
> 13495     8.8003  libwireshark.so.0.0.1    format_text
> 10698     6.9763  libglib-2.0.so.0.2800.6  g_unichar_isprint
> 7421      4.8394  libwireshark.so.0.0.1    guint8_pbrk
> 4973      3.2430  libwireshark.so.0.0.1    compute_offset_length
> 4776      3.1145  libglib-2.0.so.0.2800.6  g_hash_table_lookup
> 4029      2.6274  libwireshark.so.0.0.1    dissect_sip_common
> 3851      2.5113  libglib-2.0.so.0.2800.6  g_atomic_pointer_get
> 3186      2.0776  libc-2.13.so             __memcpy_sse2
> 3064      1.9981  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
> 3008      1.9616  libwireshark.so.0.0.1    emem_free_all
> 2964      1.9329  libwireshark.so.0.0.1    fast_ensure_contiguous
> 2803      1.8279  libwireshark.so.0.0.1    serv_name_lookup
> 2793      1.8214  libwireshark.so.0.0.1    check_offset_length_no_exception
> 2114      1.3786  libc-2.13.so             _int_malloc
> 2063      1.3453  libglib-2.0.so.0.2800.6  g_slice_free1
> 2046      1.3342  libwireshark.so.0.0.1    tap_push_tapped_queue
> 1841      1.2005  libwireshark.so.0.0.1    tvb_get_guint8
> 1713      1.1171  libc-2.13.so             memchr
> 1668      1.0877  libwireshark.so.0.0.1    tvb_length_remaining
> 1625      1.0597  libglib-2.0.so.0.2800.6  g_atomic_int_get 1550      1.0108  libwireshark.so.0.0.1    proto_tree_add_item
> 1524      0.9938  libc-2.13.so             vfprintf
> 1475      0.9619  libwireshark.so.0.0.1    emem_alloc_chunk
> 1439      0.9384  libc-2.13.so             __strncmp_sse2
> 1395      0.9097  libglib-2.0.so.0.2800.6  g_slice_alloc
> 1312      0.8556  libwireshark.so.0.0.1    tvb_find_guint8
> 1279      0.8341  libwireshark.so.0.0.1    ensure_contiguous_no_exception
> 1052      0.6860  libc-2.13.so             malloc
> 1028      0.6704  libwireshark.so.0.0.1    dissect_sdp
> 1022      0.6665  libwireshark.so.0.0.1    tvb_pbrk_guint8
>
> the runtime decreased now to approx. 1 minute (pretty reasonable when compared to 10 hours)
>
> I can publish the patches on bugzilla; however, the changes/tests have been explicitly targeted at sip voip calls.
> some of the data structures are global and used by _all_ voip protocol taps; it won't work correctly with dumps having also non-sip voip calls.
>
> pls. let me know your opinion.
>
> thanks.
> cristian
> ___________________________________________________________________________
> Sent via:    Wireshark-dev mailing list <[email protected]>
> Archives:    http://www.wireshark.org/lists/wireshark-dev
> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>             mailto:[email protected]?subject=unsubscribe
> ___________________________________________________________________________
> Sent via:    Wireshark-dev mailing list <[email protected]>
> Archives:    http://www.wireshark.org/lists/wireshark-dev
> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>             mailto:[email protected]?subject=unsubscribe