The TCP incast problem in data centers has attracted considerable attention in the research community. TCP incast is a catastrophic throughput collapse that occurs when multiple senders transmit TCP data simultaneously to a single aggregator. Experimental studies have found that TCP timeouts are the primary cause of the incast problem. In particular, a timeout due to insufficient duplicate acknowledgments is unavoidable when at least one of the last three segments in the tail of a window is lost, because too few segments follow the loss to generate the three duplicate acknowledgments that trigger fast retransmission. Reducing this type of timeout is therefore important for improving the goodput of TCP in data center networks. Several attempts have been made to reduce timeouts, but the problem is not completely solved, especially for timeouts due to insufficient duplicate acknowledgments. In this paper, we present an efficient TCP fast retransmission approach, called TCP-EFR, which reduces TCP timeouts caused by a lack of duplicate acknowledgments when packets are lost from the tail of a window in data center networks. TCP-EFR modifies the fast retransmission and recovery algorithm of TCP by using the congestion signal mechanism of DCTCP, which is based on instantaneous queue length. In addition, TCP-EFR controls the sending rate to avoid switch buffer overflow and thereby reduce packet loss. The results of a series of simulations in single- and multiple-bottleneck topologies using QualNet 4.5 demonstrate that TCP-EFR significantly reduces timeouts due to inadequate duplicate acknowledgments and noticeably improves performance compared to DCTCP, ICTCP, and TCP in terms of goodput, accuracy, and stability under various network conditions.
- Data center networks
- Insufficient duplicate acknowledgments
- TCP incast
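
The tail-loss condition mentioned in the abstract can be illustrated with a minimal sketch. It models standard TCP only, not TCP-EFR, and assumes a single lost segment per window, no reordering, no new data sent after the window, and a receiver that returns one duplicate acknowledgment for each segment arriving after the loss. The function names and the 0-based segment index are hypothetical and chosen only for illustration.

```python
# Standard TCP fast-retransmit threshold: three duplicate ACKs.
DUPACK_THRESHOLD = 3


def dupacks_after_loss(window_size: int, lost_index: int) -> int:
    """Count duplicate ACKs produced when one segment of a window is lost.

    Assumption: every segment delivered after the lost one (0-based
    `lost_index`) triggers exactly one duplicate ACK.
    """
    if not 0 <= lost_index < window_size:
        raise ValueError("lost_index must lie inside the window")
    return window_size - 1 - lost_index


def recovers_by_fast_retransmit(window_size: int, lost_index: int) -> bool:
    """True if enough duplicate ACKs arrive to trigger fast retransmission."""
    return dupacks_after_loss(window_size, lost_index) >= DUPACK_THRESHOLD


def timeout_positions(window_size: int) -> list[int]:
    """Segment positions whose loss leaves the sender waiting for a timeout."""
    return [
        i for i in range(window_size)
        if not recovers_by_fast_retransmit(window_size, i)
    ]


if __name__ == "__main__":
    # For a 10-segment window, only the last three positions force a timeout.
    print(timeout_positions(10))  # [7, 8, 9]
```

Under these assumptions, a loss anywhere except the last three positions is followed by at least three later segments, so fast retransmission can recover it; a loss in the tail cannot be recovered this way and ends in a retransmission timeout.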