Big Data Security Analysis with TARZAN Platform


  • Marek Rychl Brno University of Technology, Faculty of Information Technology, Department of Information Systems, IT4Innovations Centre of Excellence, Brno, Czech Republic
  • Ondˇrej Ryˇsav´ Brno University of Technology, Faculty of Information Technology, Department of Information Systems, IT4Innovations Centre of Excellence, Brno, Czech Republic



Security, Big data, Framework


The TARZAN platform is an integrated platform for analysis of digital data from security incidents. The platform serves primarily as a middleware between data sources and data processing applications, however, it also provides several supporting services and a runtime environment for the applications. The supporting services, such as a data storage, a resource and application registry, a synchronization service, and a distributed computing platform, are utilized by the TARZAN applications for various securityoriented analyses on the integrated data ranging from an IT security incident detection to inference analyses of data from social networks or crypto-currency transactions. To cope with a large amount of distributed data, both streamed in real-time and stored, and for the need of a large scale distributed computing, the platform has been designed as a big data processing system ensuring reliable, scalable, and cost-effective solution. The platform is demonstrated on the case of a security analysis of network traffic.



Download data is not yet available.

Author Biographies

Marek Rychl, Brno University of Technology, Faculty of Information Technology, Department of Information Systems, IT4Innovations Centre of Excellence, Brno, Czech Republic

Marek Rychlý is an assistant professor at Brno University of Technology, Faculty of Information Technology (BUT FIT). He received PhD in Computer Science and Engineering in 2010 from BUT FIT. His research interests are in the area of software architecture and focus on dynamic reconfiguration and component mobility in component-based and service-oriented architectures, formal description of software architectures and their evolution, functional and quality-driven automatic Web services composition and testing, and on distributed software systems. He authored over 20 scholarly journal articles and conference papers on varied topics related to software engineering and software architectures.

Ondˇrej Ryˇsav´, Brno University of Technology, Faculty of Information Technology, Department of Information Systems, IT4Innovations Centre of Excellence, Brno, Czech Republic

Ondřej Ryšavý is an associate professor at Brno University of Technology (Czech Republic). He has a PhD in Information Technology. His research projects include Programmability in Rina for European Supremacy of vir- tualised Networks, Modern Tools for Detection and Mitigation of Cyber Criminality on the New Generation Internet, SCADA system for control and monitoring RT processes and Dependable Systems International Research and Educational Experience.


Apache ZooKeeper, 2010.

Welcome to Apache Hadoop! 2014.

Apache Cassandra, 2016.

Apache Kafka: A high-throughput distributed messaging system, 2016.

Apache Metron: Real-time big data security, 2016.

Apache Spark: Lightning-fast cluster computing, 2016.

Apache Spot (incubating). (2016). A community approach to fighting cyber threats.

Aupetit, M., Zhauniarovich, Y., Vasiliadis, G., Dacier, M., and Boshmaf, Y. (2016). Visualization of actionable knowledge to mitigate DRDoS attacks. In 2016 IEEE Symposium on Visualization for Cyber Security (VizSec), (pp. 1–8). IEEE.

Cardenas, A. A., Manadhata, P. K., and Rajan, S. P. (2013). Big data analytics for security. IEEE Security and Privacy, 11(6), 74–76.

Cohen, M. I. (2008). Pyfiag: An advanced network forensic framework. In Proceedings of the 2008 Digital Forensics Research Workshop. DFRWS.

Casey, E. (2004). Network traffic as a source of evidence: tool strengths, weaknesses, and future needs. Digital Investigation, 1(1), 28–43.

Gantz, J., and Reinsel, D. (2011). Extracting value from chaos. IDC iview, 1142(2011), 1–12.

Guarino, A. (2013). Digital forensics as a big data challenge. In ISSE 2013 securing electronic business processes (pp. 197–203). Springer Vieweg, Wiesbaden.

He, L., Tang, B., Zhu, M., Lu, B., and Huang, W. (2016). NetflowVis: A Temporal Visualization System for Netflow Logs Analysis. In International Conference on Cooperative Design, Visualization and Engineering (pp. 202–209). Springer, Cham.

Irons, A., and Lallie, H. S. (2014). Digital forensics to intelligent forensics. Future Internet, 6(3), 584–596.

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., and Shahabi, C. (2014). Big data and its technical challenges. Communications of the ACM, 57(7), 86–94.

Lukashin, A., Laboshin, L., Zaborovsky, V., and Mulukha, V. (2014). Distributed packet trace processing method for information security analysis. In International Conference on Next Generation Wired/Wireless Networking (pp. 535–543). Springer, Cham.

Mohammed, H., Clarke, N., and Li, F. (2016). An automated approach for digital forensic analysis of heterogeneous big data. JDFSL, 11(2), 137–152.

Pilli, E. S., Joshi, R. C., and Niyogi, R. (2010). Network forensic frameworks: Survey and research challenges. Digital Investigation, 7(1–2), 14–27.

Promrit, N., and Mingkhwan, A. (2015). Traffic flow classification and visualization for network forensic analysis. In 2015 IEEE 29th International Conference on Advanced Information Networking and Applications (AINA), (pp. 358–364). IEEE.

Quick, D., and Choo, K. K. R. (2016). Big forensic data reduction: digital forensic images and electronic evidence. Cluster Computing, 19(2), 723–740.

Schales, D. L., Hu, X., Jang, J., Sailer, R., Stoecklin, M. P., and Wang, T. (2015). FCCE: highly scalable distributed feature collection and correlation engine for low latency big data analytics. In 2015 IEEE 31st International Conference on Data Engineering (ICDE), (pp. 1316–1327). IEEE.

Van der Veen, J. S., Van der Waaij, B., and Meijer, R. J. (2012). Sensor data storage performance: SQL or NoSQL, physical or virtual. In 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), (pp. 431–438). IEEE.

Wullink, M., Moura, G. C., Müller, M., and Hesselman, C. (2016). ENTRADA: A high-performance network traffic data streaming warehouse. In 2016 IEEE/IFIP Network Operations and Management Symposium (NOMS), (pp. 913–918). IEEE.

Zaharia, M., Das, T., Li, H., Shenker, S., and Stoica, I. (2012). Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, HotCloud’12, Berkeley, CA, USA, 2012. USENIX Association.

Zawoad, S., and Hasan, R. (2015). Digital forensics in the age of big data: Challenges, approaches, and opportunities. In 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), (pp. 1320–1325). IEEE.