We propose a BitTorrent-like protocol based on an online learning (reinforcement learning) mechanism that can replace the peer selection mechanisms of the regular BitTorrent protocol. We model the peers' interactions in the BitTorrent-like network as a repeated stochastic game in which the strategic behaviors of the peers are explicitly considered. A peer that applies the reinforcement learning (RL)-based mechanism uses its observations of the associated peers' statistical reciprocal behaviors to determine its best responses and to estimate their impact on its expected utility. The resulting policy determines the peer's resource reciprocations so that the peer maximizes its long-term performance. We have implemented the proposed mechanism and incorporated it into an existing BitTorrent client. Our experiments on a controlled PlanetLab testbed confirm that the proposed protocol (1) promotes fairness and provides incentives to contribute resources, i.e., high-capacity peers improve their download completion time by up to 33 percent; (2) improves system stability and robustness, i.e., reduces peer selection fluctuations by 57 percent; and (3) discourages free-riding, i.e., peers reduce their uploads to free-riders by 64 percent, as compared to the regular BitTorrent protocol.
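The abstract describes a peer that learns, from its neighbors' observed reciprocation, which peers to upload to so that it maximizes its long-term (discounted) utility. The sketch below illustrates that idea with plain tabular Q-learning, a generic RL technique swapped in for illustration; it is not the authors' mechanism, which models the interactions as a repeated stochastic game whose exact states, actions, and utilities are not given in this abstract. The class `ForesightedPeerSelector`, the three-level reciprocation states, the parameter values, and the toy environment are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action set toward each neighbor in a round.
ACTIONS = ("choke", "unchoke")


class ForesightedPeerSelector:
    """Tabular Q-learning over neighbors' observed reciprocation levels.

    Hypothetical sketch: a state is a discretized level of how much a
    neighbor uploaded to us (0 = nothing, 1 = low, 2 = high), and
    q[(state, action)] estimates the discounted long-term download obtained
    by taking `action` toward a neighbor in that state.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor; gamma > 0 makes the peer foresighted
        self.epsilon = epsilon  # probability of exploring a lower-ranked neighbor
        self.q = defaultdict(float)
        self.rng = random.Random(seed)

    def update(self, state, action, reward, next_state):
        # Q-learning target: immediate download plus the discounted value
        # of the best action in the neighbor's next observed state.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def select_peers(self, neighbor_states, slots):
        """Return up to `slots` neighbor ids to unchoke this round."""
        def advantage(peer):
            s = neighbor_states[peer]
            return self.q[(s, "unchoke")] - self.q[(s, "choke")]

        ranked = sorted(neighbor_states, key=advantage, reverse=True)
        chosen, others = ranked[:slots], ranked[slots:]
        # Epsilon-greedy exploration: occasionally give the last slot to a
        # random lower-ranked neighbor (loosely like an optimistic unchoke).
        if chosen and others and self.rng.random() < self.epsilon:
            chosen[-1] = self.rng.choice(others)
        return chosen


# Toy environment (assumption): unchoking a neighbor of capacity c yields
# download c, choking yields nothing, and its observed level stays c.
capacity = {"free_rider": 0, "slow": 1, "fast": 2}
selector = ForesightedPeerSelector(epsilon=0.0)
for _ in range(500):
    for c in capacity.values():
        selector.update(c, "unchoke", reward=c, next_state=c)
        selector.update(c, "choke", reward=0, next_state=c)

chosen_peers = selector.select_peers(capacity, slots=2)
print(chosen_peers)  # ['fast', 'slow']
```

In this toy setting the learned advantage of unchoking a neighbor grows with its reciprocation level, so the free-rider is ranked last and receives no upload slot; setting `epsilon` above zero reintroduces exploration of lower-ranked neighbors.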
| Field | Value |
|---|---|
| Number of pages | 9 |
| Journal | IEEE Transactions on Parallel and Distributed Systems |
| State | Published - 2012 |
Bibliographical note
Funding information:
The material in this paper was presented in part at the Thirtieth IEEE International Conference on Computer Communications (IEEE INFOCOM 2011), Shanghai, China, April 2011. This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012-0002917), in part by the MKE (Ministry of Knowledge Economy) under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0301-12-1008, NIPA-2012-H0301-12-4004), in part by the Korea Meteorological Administration Research and Development Program under grant CATER 2012-3064, and in part by NSF grant CNS 0831549. The corresponding author is H. Park. This work was mainly performed while R. Izhak-Ratzin and H. Park were with UCLA.
- Peer-to-peer (P2P)
- foresighted resource reciprocation strategy
- reinforcement learning