A significant percentage of Web objects are replicas. For example, a large majority of image files such as banners, buttons, and logos are duplicated throughout the WWW. Nevertheless, Web caching systems generally treat replicas as distinct objects because they have different URLs. In this paper, we propose a simple and efficient way to manage replicated objects in Web proxy caches. In the proposed scheme, the MD5 checksum of an object, together with its size, forms an identifier that allows replicas to be recognized even when their URLs differ. Experimental results show that the proposed scheme significantly improves both the cache hit rate and the byte hit rate by removing redundant objects from the cache and by reflecting the popularity of objects more precisely.
- World Wide Web
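
The identifier described in the abstract can be illustrated with a minimal sketch. It assumes an in-memory cache with no eviction policy. The class name `ReplicaAwareCache`, its methods, and the example URLs are hypothetical and are not taken from the paper; only the (MD5 checksum, size) identifier comes from the text above.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Identifier from the abstract: (MD5 hex digest, size in bytes).
ObjectId = Tuple[str, int]


def object_id(body: bytes) -> ObjectId:
    """Compute the content-based identifier of a Web object."""
    return hashlib.md5(body).hexdigest(), len(body)


class ReplicaAwareCache:
    """Hypothetical cache that stores each distinct body once.

    URLs map to identifiers, so replicas fetched under different
    URLs share one stored copy and one popularity counter.
    """

    def __init__(self) -> None:
        self._url_to_id: Dict[str, ObjectId] = {}
        self._bodies: Dict[ObjectId, bytes] = {}
        self._hits: Dict[ObjectId, int] = {}

    def store(self, url: str, body: bytes) -> ObjectId:
        """Record a fetched object; a replica adds only a URL mapping."""
        oid = object_id(body)
        self._bodies.setdefault(oid, body)
        self._hits.setdefault(oid, 0)
        self._url_to_id[url] = oid
        return oid

    def lookup(self, url: str) -> Optional[bytes]:
        """Return the cached body for a known URL, counting the hit."""
        oid = self._url_to_id.get(url)
        if oid is None:
            return None
        self._hits[oid] += 1
        return self._bodies[oid]

    def hits(self, oid: ObjectId) -> int:
        """Hits accumulated by an object across all of its URLs."""
        return self._hits.get(oid, 0)

    def stored_bytes(self) -> int:
        """Total bytes held, counting each distinct body once."""
        return sum(size for _, size in self._bodies)


cache = ReplicaAwareCache()
logo = b"GIF89a" + b"\x00" * 100  # stand-in for a 106-byte image

id_a = cache.store("http://a.example/logo.gif", logo)
id_b = cache.store("http://b.example/img/logo.gif", logo)

cache.lookup("http://a.example/logo.gif")
cache.lookup("http://b.example/img/logo.gif")
```

In this sketch both URLs resolve to the same identifier, so the body is held once and the two lookups are credited to a single object. A real proxy would additionally need an eviction policy and HTTP freshness handling, which are omitted here.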