INVESTIGATING THE DISTRIBUTIONAL PROPERTY OF THE SESSION WORKLOAD

Authors

  • JAMES MILLER University of Alberta, Canada
  • TOAN HUYNH University of Alberta, Canada

Keywords:

Web session length, Session workload property, Web log analysis

Abstract

Companies now rely on the World Wide Web to communicate with their customers. As reliance on web servers grows, so does the need for companies to better understand the workload placed upon these servers. The session workload unit is a popular unit of measurement used to analyze information recorded in server logs. Indeed, many web applications, from shopping carts to online banking systems, require session information to function correctly, and web data mining also depends on session workload information. However, the distributional properties of the session workload are not understood. Whether the session workload follows a short-tailed or a heavy-tailed distribution is a fundamental question for any investigation of the session workload unit. This paper empirically explores claims that the session workload can be described by a heavy-tailed distribution. It concludes that, for the samples used in this paper, no method exists that can accurately determine whether the session workload is drawn from a heavy-tailed distribution. Hence, the conclusion that it is drawn from such a distribution cannot be made.

 

Published

2010-02-26

How to Cite

MILLER, J., & HUYNH, T. (2010). INVESTIGATING THE DISTRIBUTIONAL PROPERTY OF THE SESSION WORKLOAD. Journal of Web Engineering, 9(1), 25–47. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/4027

Section

Articles