Research Article
SimCoDe-NET: Similarity Detection in Binary Code using Deep Learning Network
Author(s): S. Poornima and R. Mahalakshmi
Published In: International Journal of Electrical and Electronics Research (IJEER), Volume 12, Issue 1
Publisher: FOREX Publication
Published: 20 March 2024
e-ISSN: 2347-470X
Page(s): 262–267
Abstract
Binary code similarity detection is a fundamental task in binary code security. It has become increasingly important because the steady growth in software scale has made plagiarism, code cloning, and code reuse widespread. To address these issues, a novel SIMilarity detection in binary COde using DEep learning NETwork (SimCoDe-NET) is proposed. First, opcode features are extracted from the input binaries through reverse engineering, and opcode embeddings are generated using the N-skip gram method. The extracted features are then fed into a Bi-GRU neural network, which compares two samples in feature space and classifies them as similar or non-similar. The SimCoDe-NET framework is evaluated on a generated dataset in terms of precision, accuracy, sensitivity, recall, similarity detection time, and similarity detection rate. The proposed method achieves an accuracy of 99.10%, which is higher than that of existing methods; SimCoDe-NET improves accuracy by 84.9%, 88.58%, and 93.9% over jTrans, UPPC, and HEBCS, respectively.
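The opcode-extraction and N-skip gram steps described in the abstract can be illustrated with a short sketch. It assumes disassembly text in the tab-separated layout printed by GNU `objdump -d` (address, raw bytes, instruction), and it reads N as the skip-gram window radius, so every opcode within N positions of a target opcode yields one (target, context) training pair. The abstract does not define N precisely, and the names `extract_opcodes`, `build_vocab`, and `skipgram_pairs` are hypothetical. Training the embedding itself (for example with a word2vec-style model) is not shown.

```python
from typing import Dict, Iterable, List, Tuple


def extract_opcodes(disasm_lines: Iterable[str]) -> List[str]:
    """Pull the mnemonic out of each instruction line of objdump-style text.

    Assumed (hypothetical) line layout: "  addr:<TAB>bytes<TAB>mnemonic operands".
    Header lines, labels, and byte-only continuation lines are skipped.
    """
    opcodes = []
    for line in disasm_lines:
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # not a full instruction line
        tokens = parts[2].split()
        if tokens:
            opcodes.append(tokens[0])  # first token of the instruction is the mnemonic
    return opcodes


def build_vocab(opcodes: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct opcode an integer id in first-seen order."""
    vocab: Dict[str, int] = {}
    for op in opcodes:
        vocab.setdefault(op, len(vocab))
    return vocab


def skipgram_pairs(opcodes: List[str], n: int) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for every context within n positions of a target."""
    if n < 1:
        raise ValueError("window radius n must be at least 1")
    pairs = []
    for i, target in enumerate(opcodes):
        lo, hi = max(0, i - n), min(len(opcodes), i + n + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, opcodes[j]))
    return pairs


if __name__ == "__main__":
    disasm = [
        "0000000000401126 <main>:",
        "  401126:\t55                   \tpush   %rbp",
        "  401127:\t48 89 e5             \tmov    %rsp,%rbp",
        "  40112a:\tb8 00 00 00 00       \tmov    $0x0,%eax",
        "  40112f:\t5d                   \tpop    %rbp",
        "  401130:\tc3                   \tret",
    ]
    ops = extract_opcodes(disasm)
    print(ops)                         # ['push', 'mov', 'mov', 'pop', 'ret']
    print(build_vocab(ops))            # {'push': 0, 'mov': 1, 'pop': 2, 'ret': 3}
    print(skipgram_pairs(ops, 1)[:3])  # [('push', 'mov'), ('mov', 'push'), ('mov', 'mov')]
```

With a radius of 1, a sequence of L opcodes yields 2(L − 1) pairs; larger N adds more distant context at the cost of more training pairs.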
Keywords: Binary code analysis, N-skip gram, Deep Learning, Bi-GRU Network, Internet of Things.
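The Bi-GRU comparison step in the abstract, where two samples are compared in feature space and labeled similar or non-similar, is read here as a Siamese (shared-weight) encoder in the style of Bromley et al. [9]. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the weights are random and untrained, each opcode-id sequence is encoded by concatenating the final forward and backward GRU hidden states, and a cosine-similarity threshold stands in for whatever trained classification head SimCoDe-NET actually uses. The class and function names and the 0.9 threshold are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """One-direction GRU with update gate z, reset gate r, and candidate state n."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: np.random.Generator):
        scale = 1.0 / np.sqrt(hidden_dim)
        # Index 0/1/2 on the first axis = update gate, reset gate, candidate state.
        self.W = rng.uniform(-scale, scale, (3, hidden_dim, input_dim))
        self.U = rng.uniform(-scale, scale, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.hidden_dim = hidden_dim

    def run(self, xs: np.ndarray) -> np.ndarray:
        """Consume a (T, input_dim) sequence and return the last hidden state."""
        h = np.zeros(self.hidden_dim)
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * n + z * h
        return h


class BiGRUEncoder:
    """Shared encoder: opcode ids -> embeddings -> Bi-GRU feature vector."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.embedding = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        self.fwd = GRUCell(embed_dim, hidden_dim, rng)
        self.bwd = GRUCell(embed_dim, hidden_dim, rng)

    def encode(self, token_ids) -> np.ndarray:
        xs = self.embedding[np.asarray(token_ids, dtype=np.int64)]
        h_fwd = self.fwd.run(xs)        # left-to-right pass
        h_bwd = self.bwd.run(xs[::-1])  # right-to-left pass
        return np.concatenate([h_fwd, h_bwd])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def is_similar(encoder: BiGRUEncoder, ids_a, ids_b, threshold: float = 0.9) -> bool:
    """Encode both samples with the same weights and compare them in feature space."""
    score = cosine_similarity(encoder.encode(ids_a), encoder.encode(ids_b))
    return score >= threshold


if __name__ == "__main__":
    enc = BiGRUEncoder(vocab_size=4, embed_dim=8, hidden_dim=16, seed=0)
    a = [0, 1, 1, 2, 3]  # e.g. ids for push, mov, mov, pop, ret
    b = [0, 1, 2, 3]
    print(enc.encode(a).shape)  # (32,)
    print(cosine_similarity(enc.encode(a), enc.encode(b)))
```

Sharing one encoder between both inputs keeps the feature space consistent, which is what makes a distance between the two vectors meaningful; in practice the weights would be trained on labeled similar and non-similar pairs before the score is used.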
S. Poornima*, Research Scholar, Department of Computer Science Engineering, Presidency University, Bangalore, Karnataka, India; Email: poornima.spa@gmail.com
R. Mahalakshmi, Professor, Department of Computer Science Engineering, Presidency University, Bangalore, Karnataka, India
References
[1] Shin, E.C.R., Song, D. and Moazzezi, R. 2015. Recognizing functions in binaries with neural networks. In 24th USENIX Security Symposium, 611–626.
[2] Lou, A., Cheng, S., Huang, J. and Jiang, F. 2019. TFDroid: Android malware detection by topics and sensitive data flows using machine learning techniques. In Proceedings of the 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT), Hawaii, HI, USA, 30–36.
[3] Shalev, N. and Partush, N. 2018. Binary similarity detection using machine learning. In Proceedings of the 13th Workshop on Programming Languages and Analysis for Security. ACM, New York, NY, USA, 42–47.
[4] Egele, M., Woo, M., Chapman, P. and Brumley, D. 2014. Blanket execution: Dynamic similarity testing for program binaries and components. In Proceedings of the 23rd USENIX Conference on Security Symposium. Berkeley, CA, USA: USENIX Association, 303–317.
[5] Eschweiler, S., Yakdan, K. and Gerhards-Padilla, E. 2016. discovRE: Efficient cross-architecture identification of bugs in binary code. In Proceedings of the 2016 Network and Distributed System Security Symposium (NDSS).
[6] Wang, Y., Shen, J., Lin, J. and Lou, R. 2019. Staged method of code similarity analysis for firmware vulnerability detection. IEEE Access, 7, 14171–14185.
[7] Zhao, L., Li, D., Zheng, G. and Shi, W. 2018. Deep neural network based on Android mobile malware detection system using opcode sequences. In 2018 IEEE 18th International Conference on Communication Technology (ICCT), 1141–1147.
[8] Ananya Aswathy, Amal, T. R., Swathy, P. G., Vinod, P. and Mohammad, S. 2020. SysDroid: a dynamic ML-based Android malware analyzer using system call traces. Cluster Computing, 23(4), 2789–2808.
[9] Bromley, J., Guyon, I., LeCun, Y., Säckinger, E. and Shah, R. Signature verification using a Siamese time delay neural network. Advances in Neural Information Processing Systems, 737–737.
[10] Zhang, X., Sun, M., Wang, J. and Wang, J. 2018. Malware detection based on opcode sequence and ResNet. In Proceedings of the International Conference on Security with Intelligent Computing and Big-Data Services, Springer, Guilin, China, 489–502.
[11] Şahin, D.Ö., Kural, O.E., Akleylek, S. and Kılıç, E. 2018. New results on permission based static analysis for Android malware. In 2018 6th International Symposium on Digital Forensic and Security (ISDFS), 1–4.
[12] Yu, Z., Cao, R., Tang, Q., Nie, S., Huang, J. and Wu, S. 2020. Order matters: semantic-aware neural networks for binary code similarity detection. In Proceedings of the AAAI Conference on Artificial Intelligence, 34, 1145–1152.
[13] Shukla, S., Kolhe, G., PD, S.M. and Rafatirad, S. 2019. Stealthy malware detection using RNN-based automated localized feature extraction and classifier. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), 590–597.
[14] Wang, H., Qu, W., Katz, G., Zhu, W., Gao, Z., Qiu, Zhuge, J. and Zhang, C. 2022. jTrans: Jump-aware transformer for binary code similarity.
[15] Zhang, W., Xu, Z., Xiao, Y. and Xue, Y. 2022. Unleashing the power of pseudo-code for binary code similarity analysis. Cybersecurity, 5(1), 23.
[16] Jiang, S., Wang, X., Yu and Gong, Y. 2022. Double-Layer Positional Encoding Embedding Method for Cross-Platform Binary Function Similarity Detection. Chinese Journal of Electronics, 31(4), 604–611.
[17] Sun, X., Wei, Q., Du, J. and Wang, Y. 2023. HEBCS: A High-Efficiency Binary Code Search Method. Electronics, 12(16), 3464.
[18] Liu, G., Zhou, X., Pang, J., Yue, F., Liu, W. and Wang, J. 2023. Codeformer: A GNN-Nested Transformer Model for Binary Code Similarity Detection. Electronics, 12(7), 1722.
[19] Wu, G. and Tang, H. 2023. Binary Code Vulnerability Detection Based on Multi-level Feature Fusion. IEEE Access.
[20] Guo, J., Zhao, B., Liu, H., Leng, D., An, Y. and Shu, G. 2023. DeepDual-SD: Deep Dual Attribute-Aware Embedding for Binary Code Similarity Detection. International Journal of Computational Intelligence Systems, 16(1), 35.
[21] Huang, C., Zhu, G., Ge, G., Li, T. and Wang, J. FastBCSD: Fast and Efficient Neural Network for Binary Code Similarity Detection.
[22] Yang, S., Dong, C., Xiao, Y., Cheng, Y., Shi, Z., Li, Z. and Sun, L. 2023. Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge. ACM Transactions on Software Engineering and Methodology.
[23] Zuo, F., Li, X., Young, P., Luo, L., Zeng, Q. and Zhang, Z. 2018. Neural machine translation inspired binary code similarity comparison beyond function pairs. arXiv preprint arXiv:1808.04706.
[24] Arutunian, M., Hovhannisyan, H., Vardanyan, V., Sargsyan, S., Kurmangaleev, S. and Aslanyan, H. 2021. A Method to Evaluate Binary Code Comparison Tools. In 2021 Ivannikov Memorial Workshop (IVMEM), 3–5.
[25] Massarelli, L., Di Luna, G.A., Petroni, F., Baldoni, R. and Querzoni, L. 2019. SAFE: Self-attentive function embeddings for binary similarity. In Detection of Intrusions and Malware, and Vulnerability Assessment: 16th International Conference, DIMVA 2019, Gothenburg, Sweden, June 19–20, 2019, Proceedings 16, 309–329.