ACCELERATING DYNAMIC WEB CONTENT DELIVERY USING KEYWORD-BASED FRAGMENT DETECTION
Keywords:
Object Characteristics, Dynamic Web Content, Dynamic Content Adapter, Fragment-based Caching
Abstract
Recent advances in Web engineering have enabled the rapid growth of dynamic Web services such as Web-based email, online banking, online shopping and entertainment. We believe that two important Web engineering issues have not yet been seriously considered by the community: finding an effective way to deliver these dynamic Web services, and understanding the relationship between Web application design and delivery. In this paper, we tackle the first problem and pave the way for solving the second in future work. To serve dynamic content efficiently, several server-side and cache-side fragment-based techniques have been proposed that exploit reuse of Web pages at the sub-document (fragment) level. Most of these techniques, however, do not address how fragmented content is created from existing dynamic content. In addition, existing caching techniques do not support fragment movement across the document, a common behavior in dynamic Web content. This paper presents two proposals to solve these problems. The first, DyCA, a dynamic content adapter, takes original dynamic Web content and converts it to fragment-enabled content, separating the dynamic parts of the document into fragments distinct from the document's static template. DyCA relies on our proposed keyword-based fragment detection approach, which uses predefined keywords to find these fragments and split them out of the core document. Our second proposal, an augmentation to the ESI standard, uses a mapping table to separate the position of each fragment in the template from the template data itself. With this table, a fragment-enabled cache can identify fragments at a finer granularity, independent of their location in the template, which lets it account for fragment behaviors such as fragment movement.
We used content taken from three real Web sites to perform a detailed performance evaluation of our proposals. Our results show that our keyword-based approach to fragment detection and extraction yields cacheable fragments that, when combined with our proposed mapping table augmentation, can provide significant advantages for fragment-based Web caching of existing dynamic Web content.
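The abstract describes DyCA's output (a static template plus separately cacheable fragments) and the mapping-table augmentation only at a high level. The following is a minimal sketch of how the two ideas could fit together, under assumptions that are ours and not the paper's: the predefined keywords (`headline`, `weather`, `stocks`) are hypothetical and are assumed to delimit dynamic regions as `<!--keyword-->...<!--/keyword-->` comments, fragments are identified by a content hash, and the `{{slot:N}}` placeholder is an invented notation, not ESI syntax. `split_page` and `assemble` are hypothetical helpers.

```python
import hashlib
import re

# Hypothetical keyword set (an assumption): each keyword delimits a dynamic
# region written as <!--keyword-->...<!--/keyword--> in the generated page.
KEYWORDS = ["headline", "weather", "stocks"]


def fragment_id(body: str) -> str:
    """Identify a fragment by its content, not by where it sits on the page."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]


def split_page(html: str, keywords=KEYWORDS):
    """Split a page into a static template, a fragment store, and a mapping table."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    # Group 1 is the keyword, group 2 the fragment body; \1 forces a matching end marker.
    pattern = re.compile(r"<!--(" + alternatives + r")-->(.*?)<!--/\1-->", re.S)
    fragments = {}  # fragment id -> fragment body (what a fragment cache would hold)
    mapping = {}    # slot number -> fragment id (the position information only)
    counter = 0

    def to_slot(match):
        nonlocal counter
        body = match.group(2)
        fid = fragment_id(body)
        fragments[fid] = body
        mapping[counter] = fid
        placeholder = "{{slot:%d}}" % counter  # hypothetical placeholder notation
        counter += 1
        return placeholder

    template = pattern.sub(to_slot, html)
    return template, fragments, mapping


def assemble(template: str, fragments: dict, mapping: dict) -> str:
    """Rebuild the page by resolving each template slot through the mapping table."""
    return re.sub(r"\{\{slot:(\d+)\}\}",
                  lambda m: fragments[mapping[int(m.group(1))]],
                  template)


# Usage: the same two fragments appear in swapped positions on two versions of a page.
page1 = ("<html><!--headline-->Storm hits coast<!--/headline-->"
         "<p>static</p><!--weather-->Sunny<!--/weather--></html>")
page2 = ("<html><!--weather-->Sunny<!--/weather-->"
         "<p>static</p><!--headline-->Storm hits coast<!--/headline--></html>")

t1, f1, m1 = split_page(page1)
t2, f2, m2 = split_page(page2)
# Serve the moved page using the fragments cached from page1 plus page2's new mapping table.
moved = assemble(t2, f1, m2)
```

Because fragment identifiers in this sketch do not depend on position, moving the headline and weather blocks changes only the mapping table: the template and the cached fragment bodies are reused unchanged, which is the kind of reuse the mapping table is meant to enable.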