ACCELERATING DYNAMIC WEB CONTENT DELIVERY USING KEYWORD-BASED FRAGMENT DETECTION

Authors

  • DANIEL BRODIE, Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA
  • AMRISH GUPTA, Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA
  • WEISONG SHI, Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA

Keywords:

Object Characteristics, Dynamic Web Content, Dynamic Content Adapter, Fragment-based Caching

Abstract

Recent advances in Web engineering have enabled the rapid growth of dynamic Web services such as Web-based email, online banking, online shopping, and entertainment. We believe that finding an effective way to deliver these dynamic Web services and understanding the relationship between Web application design and delivery are two important Web engineering issues that have not been seriously considered in the community. In this paper, we tackle the first problem and pave the way for solving the second in the future. To serve this growing class of content efficiently, several server-side and cache-side fragment-based techniques, which exploit reuse of Web pages at the sub-document (also known as fragment) level, have been proposed. Most of these techniques do not address the creation of fragmented content from existing dynamic content. In addition, existing caching techniques do not support fragment movement across the document, a common behavior in dynamic Web content. This paper presents two proposals to solve these problems. The first, DyCA, a dynamic content adapter, takes original dynamic Web content and converts it to fragment-enabled content, so that the dynamic parts of the document are separated as fragments from the static template of the document. DyCA relies on our proposed keyword-based fragment detection approach, which uses predefined keywords to find these fragments and split them out of the core document. Our second proposal, an augmentation to the ESI standard, uses a mapping table to separate the information about the position of each fragment in the template from the template data itself. With this separation, a fragment-enabled cache can identify fragments at a finer granularity, independent of their location in the template, which enables it to take into account fragment behaviors such as fragment movement.
We used content taken from three real Web sites to carry out a detailed performance evaluation of our proposals. Our results show that our keyword-based approach to fragment detection and extraction yields cacheable fragments that, when combined with our proposed mapping table augmentation, can provide significant advantages for fragment-based Web caching of existing dynamic Web content.
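
The two ideas summarized in this abstract can be illustrated with a minimal sketch. It is not the DyCA implementation or the proposed ESI syntax, neither of which is given on this page. It assumes, purely for illustration, that dynamic regions are delimited by comment markers of the form `<!-- begin:KEYWORD -->` and `<!-- end:KEYWORD -->`; the keyword list, the `{{slot:N}}` placeholder, and the names `split_page` and `assemble` are all hypothetical.

```python
import hashlib
import re

# Hypothetical keyword list; the actual predefined keywords are not stated here.
KEYWORDS = ("headline", "ad", "weather")

# Assumed marker convention: <!-- begin:KEYWORD --> body <!-- end:KEYWORD -->
MARKER = re.compile(
    r"<!-- begin:(?P<kw>%s) -->(?P<body>.*?)<!-- end:(?P=kw) -->"
    % "|".join(map(re.escape, KEYWORDS)),
    re.DOTALL,
)
SLOT = re.compile(r"\{\{slot:(\d+)\}\}")


def split_page(html):
    """Split a page into (template, mapping, fragments).

    template  -- the page with each detected fragment replaced by a numbered slot
    mapping   -- slot number -> fragment id (the illustrative mapping table)
    fragments -- fragment id -> fragment body, keyed by a content hash
    """
    mapping = {}
    fragments = {}

    def to_slot(match):
        body = match.group("body")
        frag_id = hashlib.sha1(body.encode("utf-8")).hexdigest()[:8]
        fragments[frag_id] = body
        slot = len(mapping)  # slots are numbered in document order
        mapping[slot] = frag_id
        return "{{slot:%d}}" % slot

    template = MARKER.sub(to_slot, html)
    return template, mapping, fragments


def assemble(template, mapping, cache):
    """Rebuild a page from a template, a mapping table, and cached fragments."""
    return SLOT.sub(lambda m: cache[mapping[int(m.group(1))]], template)


if __name__ == "__main__":
    v1 = ("<html><!-- begin:headline -->News A<!-- end:headline -->"
          "<p>static</p><!-- begin:ad -->Buy X<!-- end:ad --></html>")
    # Same fragments, but they have swapped positions in the page.
    v2 = ("<html><!-- begin:ad -->Buy X<!-- end:ad -->"
          "<p>static</p><!-- begin:headline -->News A<!-- end:headline --></html>")
    t1, m1, f1 = split_page(v1)
    t2, m2, f2 = split_page(v2)
    print(t1 == t2, f1 == f2, m1 == m2)
    print(assemble(t2, m2, f1))
```

In this sketch, when two fragments swap positions the template and the fragment bodies stay identical and only the small mapping table differs, so a cache that already holds the fragments would need to fetch just the new mapping. Keying fragments by a content hash is a choice made for this example only.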

 

Published

2005-12-30

How to Cite

BRODIE, D., GUPTA, A., & SHI, W. (2005). ACCELERATING DYNAMIC WEB CONTENT DELIVERY USING KEYWORD-BASED FRAGMENT DETECTION. Journal of Web Engineering, 4(1), 79–99. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/4301

Section

Articles