Sampling is a powerful tool to reduce the processing overhead in various systems. NetFlow uses a local table for counting records per flow, and sFlow sends out the collected packet headers periodically to a collecting server over the network. Any measurement system falls into either one of these two models. To reduce the overhead, as in sFlow, simple random sampling (SRS) has been widely used in practice because of its simplicity. However, SRS provides non-uniform sampling rates for different fine-grained flows (defined by 5-tuple), because it samples packets over an aggregated data flow (defined by switch port or VLAN). Consequently, some flows are sampled more than the designated sampling rate (resulting in over-estimation), and others are sampled fewer (resulting in under-estimation). Starting with a simple idea that independent per-flow packet sampling provides the most accurate estimation of each flow, we introduce a new concept of per-flow systematic sampling, aiming to provide the same sampling rate across all flows. In addition, we provide a concrete sampling method called SketchFlow, which approximates the idea of the per-flow systematic sampling using a sketch saturation event. We demonstrate SketchFlow's performance in terms of accuracy, sampling rate, and overhead using real-world datasets, including a backbone network trace, I/O trace, and Twitter dataset. Experimental results show that SketchFlow outperforms SRS (i.e., sFlow) and the non-linear sampling method while requiring a small CPU overhead to measure high-speed traffic in real-time.
|Title of host publication
|INFOCOM 2020 - IEEE Conference on Computer Communications
|Institute of Electrical and Electronics Engineers Inc.
|Number of pages
|Published - Jul 2020
|38th IEEE Conference on Computer Communications, INFOCOM 2020 - Toronto, Canada
Duration: 6 Jul 2020 → 9 Jul 2020
|Proceedings - IEEE INFOCOM
|38th IEEE Conference on Computer Communications, INFOCOM 2020
|6/07/20 → 9/07/20
Bibliographical noteFunding Information:
This research was supported by Grobal Research Laboratory (GRL) Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2016K1A1A2912757).This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2017R1A2B4010657). DaeHun Nyang is the corresponding author.
© 2020 IEEE.