A FRAMEWORK FOR PRODUCT DESCRIPTION CLASSIFICATION IN E-COMMERCE
Keywords:
Product descriptions, hierarchical clustering, feature selection, e-commerce
Abstract
We propose the Hierarchical Product Classification (HPC) framework for classifying products using a hierarchical product taxonomy. The framework uses a classification system with multiple classification nodes, each residing on a different level of the taxonomy. The innovative part of the framework is the definition of classification recipes that can be used to construct high-quality classifier nodes and that make the best use of the product descriptions. These classifier recipes are specifically tailored for the e-commerce domain. They enable flexible classifiers that adjust to the depth-specific characteristics of product taxonomies. Furthermore, to gain insight into which components are required for high-quality product classification, we evaluate several feature selection methods and classification techniques in the context of our framework. Based on 3000 product descriptions obtained from Amazon.com, HPC achieves an overall accuracy of 76.80% for product classification. Using 110 categories from CircuitCity.com and Amazon.com, we obtain a precision of 93.61% for mapping the categories to the taxonomy of shopping.com.
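The idea of placing one classifier node at each level of the taxonomy can be illustrated with a minimal sketch. It is not the HPC implementation: the abstract does not specify the classifier recipes, so the term-overlap scoring, the `Node` class, the `tokenize` helper, and the toy taxonomy and training descriptions below are all assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


@dataclass
class Node:
    """A taxonomy category holding a toy classifier over its direct children."""
    name: str
    children: list = field(default_factory=list)
    # Hypothetical model: term counts per child, learned from descriptions.
    term_counts: dict = field(default_factory=dict)

    def train(self, examples):
        """examples: (description, path) pairs; path lists categories below this node."""
        by_child = {}
        for text, path in examples:
            if path:
                by_child.setdefault(path[0], []).append((text, path[1:]))
        for child in self.children:
            child_examples = by_child.get(child.name, [])
            counts = Counter()
            for text, _ in child_examples:
                counts.update(tokenize(text))
            self.term_counts[child.name] = counts
            # Each child trains its own classifier on its own subset only.
            child.train(child_examples)

    def classify(self, text):
        """Descend from this node to a leaf, choosing one child per level."""
        if not self.children:
            return []
        tokens = tokenize(text)

        def score(child):
            counts = self.term_counts.get(child.name, Counter())
            return sum(counts[t] for t in tokens)

        best = max(self.children, key=score)
        return [best.name] + best.classify(text)


# Assumed toy taxonomy and training data, not taken from the paper.
root = Node("root", [
    Node("Electronics", [Node("Cameras"), Node("Laptops")]),
    Node("Books"),
])
training = [
    ("digital camera with zoom lens", ["Electronics", "Cameras"]),
    ("laptop with 15 inch screen", ["Electronics", "Laptops"]),
    ("paperback novel by a famous author", ["Books"]),
]
root.train(training)
predicted = root.classify("compact camera with optical zoom")
print(predicted)
```

Because every node is trained only on the descriptions that reach it, the scoring function at each level could be swapped for a different feature selection method or classification technique without touching the other levels, which is the kind of per-depth flexibility the abstract attributes to classifier recipes.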