Big Data Security Analysis with TARZAN Platform
Keywords:Security, Big data, Framework
The TARZAN platform is an integrated platform for analysis of digital data from security incidents. The platform serves primarily as a middleware between data sources and data processing applications, however, it also provides several supporting services and a runtime environment for the applications. The supporting services, such as a data storage, a resource and application registry, a synchronization service, and a distributed computing platform, are utilized by the TARZAN applications for various securityoriented analyses on the integrated data ranging from an IT security incident detection to inference analyses of data from social networks or crypto-currency transactions. To cope with a large amount of distributed data, both streamed in real-time and stored, and for the need of a large scale distributed computing, the platform has been designed as a big data processing system ensuring reliable, scalable, and cost-effective solution. The platform is demonstrated on the case of a security analysis of network traffic.
Apache ZooKeeper, 2010.
Welcome to Apache Hadoop! 2014.
Apache Cassandra, 2016.
Apache Kafka: A high-throughput distributed messaging system, 2016.
Apache Metron: Real-time big data security, 2016.
Apache Spark: Lightning-fast cluster computing, 2016.
Apache Spot (incubating). (2016). A community approach to fighting cyber threats.
Aupetit, M., Zhauniarovich, Y., Vasiliadis, G., Dacier, M., and Boshmaf, Y. (2016). Visualization of actionable knowledge to mitigate DRDoS attacks. In 2016 IEEE Symposium on Visualization for Cyber Security (VizSec), (pp. 1–8). IEEE.
Cardenas, A. A., Manadhata, P. K., and Rajan, S. P. (2013). Big data analytics for security. IEEE Security and Privacy, 11(6), 74–76.
Cohen, M. I. (2008). Pyfiag: An advanced network forensic framework. In Proceedings of the 2008 Digital Forensics Research Workshop. DFRWS.
Casey, E. (2004). Network traffic as a source of evidence: tool strengths, weaknesses, and future needs. Digital Investigation, 1(1), 28–43.
Gantz, J., and Reinsel, D. (2011). Extracting value from chaos. IDC iview, 1142(2011), 1–12.
Guarino, A. (2013). Digital forensics as a big data challenge. In ISSE 2013 securing electronic business processes (pp. 197–203). Springer Vieweg, Wiesbaden.
He, L., Tang, B., Zhu, M., Lu, B., and Huang, W. (2016). NetflowVis: A Temporal Visualization System for Netflow Logs Analysis. In International Conference on Cooperative Design, Visualization and Engineering (pp. 202–209). Springer, Cham.
Irons, A., and Lallie, H. S. (2014). Digital forensics to intelligent forensics. Future Internet, 6(3), 584–596.
Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., and Shahabi, C. (2014). Big data and its technical challenges. Communications of the ACM, 57(7), 86–94.
Lukashin, A., Laboshin, L., Zaborovsky, V., and Mulukha, V. (2014). Distributed packet trace processing method for information security analysis. In International Conference on Next Generation Wired/Wireless Networking (pp. 535–543). Springer, Cham.
Mohammed, H., Clarke, N., and Li, F. (2016). An automated approach for digital forensic analysis of heterogeneous big data. JDFSL, 11(2), 137–152.
Pilli, E. S., Joshi, R. C., and Niyogi, R. (2010). Network forensic frameworks: Survey and research challenges. Digital Investigation, 7(1–2), 14–27.
Promrit, N., and Mingkhwan, A. (2015). Traffic flow classification and visualization for network forensic analysis. In 2015 IEEE 29th International Conference on Advanced Information Networking and Applications (AINA), (pp. 358–364). IEEE.
Quick, D., and Choo, K. K. R. (2016). Big forensic data reduction: digital forensic images and electronic evidence. Cluster Computing, 19(2), 723–740.
Schales, D. L., Hu, X., Jang, J., Sailer, R., Stoecklin, M. P., and Wang, T. (2015). FCCE: highly scalable distributed feature collection and correlation engine for low latency big data analytics. In 2015 IEEE 31st International Conference on Data Engineering (ICDE), (pp. 1316–1327). IEEE.
Van der Veen, J. S., Van der Waaij, B., and Meijer, R. J. (2012). Sensor data storage performance: SQL or NoSQL, physical or virtual. In 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), (pp. 431–438). IEEE.
Wullink, M., Moura, G. C., Müller, M., and Hesselman, C. (2016). ENTRADA: A high-performance network traffic data streaming warehouse. In 2016 IEEE/IFIP Network Operations and Management Symposium (NOMS), (pp. 913–918). IEEE.
Zaharia, M., Das, T., Li, H., Shenker, S., and Stoica, I. (2012). Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, HotCloud’12, Berkeley, CA, USA, 2012. USENIX Association.
Zawoad, S., and Hasan, R. (2015). Digital forensics in the age of big data: Challenges, approaches, and opportunities. In 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), (pp. 1320–1325). IEEE.