A FRAMEWORK FOR PRODUCT DESCRIPTION CLASSIFICATION IN E-COMMERCE

Authors

  • DAMIR VANDIC, Econometric Institute, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, the Netherlands
  • FLAVIUS FRASINCAR, Econometric Institute, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, the Netherlands
  • UZAY KAYMAK, Department of Industrial Engineering & Innovation Sciences, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, the Netherlands

Keywords:

Product descriptions, hierarchical clustering, feature selection, e-commerce

Abstract

We propose the Hierarchical Product Classification (HPC) framework for classifying products using a hierarchical product taxonomy. The framework uses a classification system with multiple classification nodes, each residing on a different level of the taxonomy. The innovative part of the framework stems from the definition of classification recipes that can be used to construct high-quality classifier nodes, making the best use of the product descriptions. These classifier recipes are specifically tailored for the e-commerce domain. The use of these classifier recipes enables flexible classifiers that adjust to the depth-specific characteristics of product taxonomies. Furthermore, in order to gain insight into which components are required to perform high-quality product classification, we evaluate several feature selection methods and classification techniques in the context of our framework. Based on 3000 product descriptions obtained from Amazon.com, HPC achieves an overall accuracy of 76.80% for product classification. Using 110 categories from CircuitCity.com and Amazon.com, we obtain a precision of 93.61% for mapping the categories to the taxonomy of shopping.com.

Published

2017-10-01

Section

Articles