DISCOVER SEMANTIC TOPICS IN PATENTS WITHIN A SPECIFIC DOMAIN
Keywords:
Patent topic discovery, Latent Dirichlet Allocation, Backbone Association Link Network, Domain knowledge
Abstract
Patent topic discovery is critical for innovation-oriented enterprises seeking to hedge patent application risks and raise the success rate of patent applications. Researchers in both academia and industry widely recognize topic models as an efficient tool for this task. However, many well-known topic models, e.g., Latent Dirichlet Allocation (LDA), were designed for documents represented as word vectors, and they exhibit low accuracy and poor interpretability on the patent topic discovery task. There are two reasons: 1) the semantics of documents within a specific domain remain under-explored, and 2) domain background knowledge is not effectively used to guide topic discovery. To improve accuracy and interpretability, we propose a new patent representation and organization that adds inter-word relationships mined from the title, abstract, and claims of each patent. This representation endows each patent with richer semantics than a word vector does. Meanwhile, we build a Backbone Association Link Network (Backbone ALN) that incorporates domain background semantics to further enrich the semantics of patents. Using these semantic-rich patent representations, we propose a Semantic LDA model to discover semantic topics from patents within a specific domain. The model discovers semantic topics expressed through association relations between words rather than through a single word vector. Finally, the accuracy and interpretability of the proposed model are verified on real-world patent datasets from the United States Patent and Trademark Office. The experimental results show that the Semantic LDA model yields better performance than conventional models such as LDA. Furthermore, the proposed model can easily be generalized to other text mining corpora.
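The abstract does not say how the inter-word relationships are mined from the title, abstract, and claims. The sketch below is a hypothetical illustration only, not the authors' method: it assumes that two words are related when they co-occur in the same sentence, and it scores a directed relation a -> b with a confidence-style strength, cooc(a, b) / freq(a). The names `association_relations`, `STOP_WORDS`, and `min_strength`, as well as the threshold value, are illustrative assumptions. The sketch shows how such relations add structure on top of plain per-word counts.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real pipeline would use a domain-tuned one.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "on", "for",
              "is", "are", "by", "with", "said", "wherein", "which", "that"}


def sentences(text):
    """Split a patent section into sentences on '.', ';' and ':' boundaries."""
    return [s for s in re.split(r"[.;:]+", text) if s.strip()]


def terms(sentence):
    """Lower-case alphabetic tokens without stop words, deduplicated in order."""
    unique = []
    for tok in re.findall(r"[a-z]+", sentence.lower()):
        if tok not in STOP_WORDS and tok not in unique:
            unique.append(tok)
    return unique


def association_relations(patent, min_strength=0.5):
    """Mine directed word associations a -> b from title, abstract and claims.

    Assumption (not from the paper): freq[a] counts sentences containing a,
    cooc[(a, b)] counts sentences containing both, and
    strength(a -> b) = cooc[(a, b)] / freq[a].
    Returns (freq, relations), where relations maps (a, b) to its strength.
    """
    freq, cooc = Counter(), Counter()
    for section in ("title", "abstract", "claims"):
        for sent in sentences(patent.get(section, "")):
            words = terms(sent)
            freq.update(words)  # words are unique per sentence
            for a, b in combinations(sorted(words), 2):
                cooc[(a, b)] += 1
    relations = {}
    for (a, b), n in cooc.items():
        # Score both directions; the measure is asymmetric.
        for src, dst in ((a, b), (b, a)):
            strength = n / freq[src]
            if strength >= min_strength:
                relations[(src, dst)] = strength
    return freq, relations


if __name__ == "__main__":
    patent = {
        "title": "Battery thermal management system",
        "abstract": "A battery pack is cooled by a coolant loop. "
                    "The coolant loop includes a pump.",
        "claims": "A battery pack cooled by a coolant pump; "
                  "the pump drives the coolant loop.",
    }
    word_counts, relations = association_relations(patent)
    for (src, dst), s in sorted(relations.items(), key=lambda kv: -kv[1]):
        print(f"{src} -> {dst}: {s:.2f}")
```

On this toy input, "loop -> coolant" scores 1.0 while "coolant -> loop" scores 0.75, an asymmetry that a bag-of-words vector cannot express. Weak pairs such as "battery -> thermal" (1/3) fall below the assumed threshold and are dropped.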