• QICHEN MA, School of Computer Engineering and Science, Shanghai University, Shanghai, China
• XIANGFENG LUO, School of Computer Engineering and Science, Shanghai University, Shanghai, China
• JUNYU XUAN, School of Computer Engineering and Science, Shanghai University, Shanghai, China
• HUIMIN LIU, School of Computer Engineering and Science, Shanghai University, Shanghai, China


topic detection and tracking, event classification, Bayesian model, web mining


A large number of web events emerge and attract public attention every day, and distinguishing the different types of these events is of great practical interest. For example, web events identified as emergent deserve more attention from government departments, to save lives and reduce damage, or from news websites, to increase their hit rates with limited resources. However, efficiently distinguishing the types of web events remains a challenging issue because the community has devoted little effort to it. In this paper, we consider the problem thoroughly and propose an innovative Bayesian-based model to distinguish the different types of web events. Specifically, all web events are first assumed to fall into three types, whose formal definitions are given by considering their properties. To describe and distinguish the three types of web events sufficiently, a set of specially designed features is then extracted from the volume and the content of web events. Finally, a Bayesian-based model is built on the designed features. The experimental results demonstrate the capability of the proposed model to distinguish the types of web events, and comparisons with other state-of-the-art classifiers also show its efficiency.
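As a rough illustration of the general idea of Bayesian classification over volume and content features, the following minimal sketch fits a Gaussian naive Bayes classifier on toy data. It is not the model proposed in the paper: the two features (`volume_burst`, `content_novelty`), the placeholder labels (`type_A`, `type_B`, `type_C`), and all numbers are hypothetical and chosen only for demonstration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a log-prior and per-feature Gaussian parameters for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    n_total = len(samples)
    for y, rows in by_class.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small variance floor avoids division by zero on constant features.
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / n + 1e-6
            for d in range(dims)
        ]
        model[y] = (math.log(n / n_total), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for y, (log_prior, means, variances) in model.items():
        score = log_prior
        for xi, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy data: each row is [volume_burst, content_novelty].
samples = [
    [0.90, 0.80], [0.85, 0.75], [0.95, 0.90],  # placeholder type_A
    [0.50, 0.30], [0.45, 0.35], [0.55, 0.25],  # placeholder type_B
    [0.10, 0.10], [0.15, 0.20], [0.05, 0.15],  # placeholder type_C
]
labels = ["type_A"] * 3 + ["type_B"] * 3 + ["type_C"] * 3

model = fit_gaussian_nb(samples, labels)
```

Calling `predict(model, [0.9, 0.85])` then assigns a new event to whichever placeholder type gives its feature vector the highest posterior score. Naive Bayes treats the features as conditionally independent given the type, which keeps the sketch short but may not match the structure of the paper's model.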




