Our paper was published at the Network and Distributed Systems Security Symposium (NDSS) 2019. You are encouraged to cite the following paper if you use the provided resources for academic research.
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs.
@inproceedings{zuo2019neural,
  title={Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs},
  author={Zuo, Fei and Li, Xiaopeng and Young, Patrick and Luo, Lannan and Zeng, Qiang and Zhang, Zhexin},
  booktitle={Proceedings of the 2019 Network and Distributed Systems Security Symposium (NDSS)},
  year={2019}
}
* We previously submitted the paper to NDSS 2018 in August 2017 and to S&P 2019 in May 2018; it was finally accepted to NDSS 2019 after significant improvement. However, the main NMT-inspired idea remains the same. Here is our NDSS 2018 submission page.
Make sure you have installed all of the following packages and libraries (including their dependencies, if necessary) on your computer:
We have trained three sets of instruction embeddings, with dimensions 50, 100, and 150. To use our Siamese-network-based tool for binary similarity detection, first download them from the link.
Ideally, every instruction has an embedding in the pre-trained .w2v files. Any instruction that does not (an unknown word) is replaced with a zero vector.
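The zero-vector fallback described above can be sketched as follows. This is a minimal illustration, not the tool's actual loading code: the helper name `embed_instructions`, the plain dictionary standing in for a loaded .w2v file, and the toy tokens are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

EMBEDDING_DIM = 100  # one of the three provided sizes (50, 100, 150)


def embed_instructions(
    instructions: Sequence[str],
    table: Dict[str, List[float]],
    dim: int = EMBEDDING_DIM,
) -> List[List[float]]:
    """Map each instruction token to its vector (hypothetical helper).

    Tokens missing from the table are replaced with a zero vector of
    length `dim`.
    """
    zero = [0.0] * dim
    # list(...) copies each vector so callers cannot mutate the table
    # or the shared zero vector by accident.
    return [list(table.get(token, zero)) for token in instructions]


# Toy stand-in for the pre-trained embeddings; real keys and values
# would come from the downloaded .w2v file.
toy_table = {
    "mov": [0.1] * EMBEDDING_DIM,
    "add": [0.2] * EMBEDDING_DIM,
}

block = ["mov", "add", "unseen_instruction"]
vectors = embed_instructions(block, toy_table)
```

Here `vectors` holds one 100-dimensional vector per instruction, and the entry for `unseen_instruction` is all zeros.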
The Siamese-neural-network-based binary similarity detector accepts only properly pre-processed data. To demonstrate how the Siamese model is used, we provide some input samples (e.g. test_set_O2.csv), which you can download from the link.
As an example, we provide pre-trained model weights for a Siamese-based binary similarity detector; click the link to download them. Each sub-network in this Siamese network is a two-layer LSTM that takes 100-dimensional vectors (one per instruction embedding) as input. Please refer to the example Python script to re-run the test case.
You can modify the LLVM backend to automatically output the boundaries of basic blocks (in the .s file) and assign an identifier to each of them. To do that, replace the original AsmPrinter.cpp file in the LLVM project with the one provided here.