BAYESIAN-BASED TYPE DISCRIMINATION OF WEB EVENTS
Keywords: topic detection and tracking, event classification, Bayesian model, web mining
A large number of web events emerge every day and attract public attention, so distinguishing the different types of these events is of great practical interest and significance. For example, emergent web events, once identified, deserve more attention from government departments, which can save lives and reduce damage, and from news websites, which can increase their hit rates with limited resources. However, efficiently distinguishing the types of web events remains a challenging issue, because the community has devoted little effort to it. In this paper, we examine this problem thoroughly and propose an innovative Bayesian-based model to distinguish the different types of web events. Specifically, all web events are first assumed to fall into three types, whose formal definitions are given by considering their properties. To describe and distinguish the three types of web events sufficiently, a set of specially designed features is then extracted from the volume and the content of web events. Finally, a Bayesian-based model is built on the designed features. The experimental results demonstrate that the proposed model can distinguish the types of web events, and comparisons with other state-of-the-art classifiers also show its efficiency.
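The pipeline outlined in the abstract (three event types, features drawn from volume and content, a Bayesian classifier on top) can be illustrated with a minimal sketch. It is not the model proposed in the paper: it assumes a Gaussian naive Bayes classifier, and the type labels (type_a, type_b, type_c), the two features (peak volume and burstiness), and the toy data are all hypothetical placeholders chosen only to make the example runnable.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate a log prior and per-feature Gaussian parameters for each type."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    total = len(samples)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small variance floor keeps the density finite for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-6)
            for col, m in zip(zip(*rows), means)
        ]
        model[y] = (math.log(n / total), means, variances)
    return model


def log_posteriors(model, x):
    """Unnormalized log P(type | x): log prior plus summed Gaussian log-likelihoods."""
    scores = {}
    for y, (log_prior, means, variances) in model.items():
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        scores[y] = log_prior + log_lik
    return scores


def predict(model, x):
    """Return the type with the highest posterior score."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Hypothetical toy data: each row is [peak_volume, burstiness], scaled to [0, 1].
train_x = [[0.9, 0.8], [0.8, 0.9], [0.5, 0.2], [0.4, 0.3], [0.1, 0.1], [0.2, 0.05]]
train_y = ["type_a", "type_a", "type_b", "type_b", "type_c", "type_c"]
model = fit(train_x, train_y)
```

Calling predict(model, features) returns the most probable of the three placeholder types for a new event. The naive independence assumption between features is a simplification made here for brevity; the features and the structure of the model in the paper may differ.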