Similarity Analysis of Single-Vendor Marketplaces in the Tor-Network




Tor, vendor sites, similarity detection, feature importance, darknet, offer


Single-vendor shops are darknet marketplaces where individuals offer their own goods or services on their own darknet website. There are many single-vendor shops with a wide range of offers in the Tor-network. This paper presents a method to find similarities between these vendor websites in order to discover possible operational structures between them. To this end, similarity values between the darknet websites are determined by combining different features from the categories content, structure and metadata. Our results show that the features HTML-Tag, HTML-Class and HTML-DOM-Tree as well as File-Content, Open Ports and Links-To are particularly important and very effective in revealing commonalities between darknet websites. Using the similarity detection method, we found that only 49% of the 258 single-vendor marketplaces were unique, meaning that no similar website was found for them. In addition, 20% of all vendor shops were duplicates, and the remaining 31% could be sorted into seven similarity groups.
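The idea of combining per-feature similarity values can be illustrated with a minimal sketch. It is not the method of the paper: the use of Jaccard similarity on feature sets, the equal weights, the function names and the two example shops are all assumptions made for illustration. Only the feature names (HTML-Tag, HTML-Class, Open Ports) are taken from the abstract.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two sets.

    Assumption: if both sets are empty, there is no evidence of
    similarity, so the function returns 0.0.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def combined_similarity(
    site_a: Dict[str, Set[str]],
    site_b: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> float:
    """Weighted mean of the per-feature Jaccard similarities.

    The weights are hypothetical; features missing from a site are
    treated as empty sets.
    """
    total_weight = sum(weights.values())
    score = sum(
        weight * jaccard(site_a.get(name, set()), site_b.get(name, set()))
        for name, weight in weights.items()
    )
    return score / total_weight


# Two made-up shops described by three of the features named in the abstract.
shop_a = {
    "html_tag": {"div", "span", "img", "form"},
    "html_class": {"product", "price", "cart"},
    "open_ports": {"80", "22"},
}
shop_b = {
    "html_tag": {"div", "span", "img", "table"},
    "html_class": {"product", "price", "cart"},
    "open_ports": {"80"},
}

# Equal weights are an assumption, not values from the paper.
weights = {"html_tag": 1.0, "html_class": 1.0, "open_ports": 1.0}

similarity = combined_similarity(shop_a, shop_b, weights)
print(f"combined similarity: {similarity:.2f}")
```

In this sketch, a pair of sites with a score near 1 would be a candidate duplicate, while intermediate scores would point to a looser relationship. Any concrete threshold would have to be chosen and validated on real data.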



Author Biographies

Fabian Brenner, Fraunhofer SIT, ATHENE, Germany

Fabian Brenner is a former auxiliary scientist at the Fraunhofer Institute for Secure Information Technology. He wrote his master’s thesis on the darknet and its marketplaces within the institute’s Panda project in 2020. Fabian studied IT security at the Technical University of Darmstadt and currently works as a penetration tester at a consulting company.

Martin Steinebach, Fraunhofer SIT, ATHENE, Germany

Martin Steinebach is the manager of the Media Security and IT Forensics division at Fraunhofer SIT. From 2003 to 2007 he was the manager of the Media Security in IT division at Fraunhofer IPSI. He studied computer science at the Technical University of Darmstadt and finished his diploma thesis on copyright protection for digital audio in 1999. In 2003 he received his PhD from the Technical University of Darmstadt for his work on digital audio watermarking. In 2016 he became an honorary professor at the TU Darmstadt, where he lectures on Multimedia Security and Civil Security. He is a Principal Investigator at ATHENE, where he represents IT Forensics and AI security. Previously, he was a Principal Investigator at CASED, working on Multimedia Security and IT Forensics. In 2012 his work on robust image hashing for the detection of child pornography won second place in the “Deutscher IT-Sicherheitspreis”, an award funded by Horst Görtz.

