Goshawk is an automated memory corruption bug detection system. It first annotates memory management (MM) functions automatically using NLP-assisted classification, and then uses structure-aware, object-centric memory operation synopses (MOS) to abstract the MM functions in source code and help find bugs.
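Goshawk's NLP-assisted classifier is not reproduced here. As a point of contrast, the sketch below shows a naive, purely name-based filter. `list_mm_candidates` is a hypothetical helper (not part of Goshawk) that greps C sources for function-like identifiers whose names contain alloc/free-style keywords. Its obvious false positives (a function named `freeze`) and false negatives (a deallocator named `put_ref`) show the limits of matching on names alone.

```shell
#!/usr/bin/env bash
# list_mm_candidates FILE...: print the unique identifiers that appear in a
# call/declaration position ("name(") and contain an MM-style keyword.
# Naive keyword heuristic for illustration only; NOT Goshawk's classifier.
list_mm_candidates() {
  # 1. extract "identifier(" tokens, 2. drop the trailing "(",
  # 3. keep names with an MM keyword, 4. de-duplicate.
  grep -ohE '[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\(' "$@" \
    | sed -E 's/[[:space:]]*\($//' \
    | grep -iE 'alloc|free|release|destroy' \
    | sort -u
}
```

For example, on a header declaring `my_alloc(size_t n)`, `my_free(void *p)` and `compute(int x)`, it prints `my_alloc` and `my_free`.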
Goshawk is currently built on top of the Clang Static Analyzer, and we used it to discover 92 use-after-free and double-free bugs in the Linux kernel, FreeBSD, OpenSSL, Redis, and IoT SDKs.
We have implemented Goshawk as multiple Clang plugins and Python scripts. The source code of Goshawk is available here: source code.
We also provide a prebuilt binary built from the above source code, which runs Goshawk with a single command. The prebuilt binary is available here: prebuilt binary.
The MOS dataset from our experiments is available here: dataset. It covers the following projects and versions:
The evaluation dataset contains 200 allocators, 200 deallocators, and 600 non-MM functions, all selected manually, which are used to test the accuracy of Goshawk's MM function identification. This dataset is available here: dataset. The relevant experiment is presented in Section V-B of our paper.
This dataset contains the MM functions that existing approaches (i.e., K-MELD and SinkFinder) identified correctly and incorrectly; these functions are used to compare the effectiveness of Goshawk's MM function identification with that of the existing approaches. This dataset is available here: dataset. The relevant experiment is presented in Section V-B of our paper.
Taking the OpenSSL project as an example, the following step-by-step tutorial shows how Goshawk analyzes a project.
git clone https://github.com/openssl/openssl.git
cd openssl/
./Configure CC=clang HOSTCC=clang
CodeChecker log -b "make CC=clang HOSTCC=clang -j128" -o compilation.json
After these steps, the compilation command of each C/C++ file is recorded in compilation.json.
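Before moving on, it can help to confirm that the log step actually captured compile commands. The sketch below is a hypothetical `check_compile_db` helper; it assumes `python3` is available and that compilation.json follows the standard JSON compilation database format (a list of entries). It prints the number of recorded commands and fails if the file is missing, is not valid JSON, or is empty.

```shell
#!/usr/bin/env bash
# check_compile_db [DB]: print the number of compile commands in DB
# (default: compilation.json); return non-zero if there are none.
check_compile_db() {
  local db="${1:-compilation.json}"
  if [ ! -s "$db" ]; then
    echo "error: $db is missing or empty" >&2
    return 1
  fi
  # Parse the JSON with Python; invalid JSON raises and exits non-zero.
  python3 - "$db" <<'EOF'
import json, sys
with open(sys.argv[1]) as f:
    entries = json.load(f)
if not isinstance(entries, list) or not entries:
    sys.exit("error: no compile commands recorded")
print(len(entries))
EOF
}
```

Running `check_compile_db xxx/openssl/compilation.json` after the build should print a positive count; a failure usually means the build did not go through `CodeChecker log`.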
cd Goshawk_binary/
./run xxx/openssl
After the above two steps, the MOS of the identified MM functions can be found in the "output" directory.
mkdir -p /tmp/CSA && cp xxx/Goshawk_binary/output/alloc/* /tmp/CSA/ && cp xxx/Goshawk_binary/output/free/* /tmp/CSA/
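The copy command above assumes that Goshawk wrote MOS files into output/alloc and output/free. A quick way to check what was produced is to count the files in each directory. `summarize_mos` is a hypothetical helper, not part of Goshawk, and it relies only on that two-directory layout.

```shell
#!/usr/bin/env bash
# summarize_mos GOSHAWK_DIR: print how many MOS files exist under
# GOSHAWK_DIR/output/alloc and GOSHAWK_DIR/output/free.
summarize_mos() {
  if [ -z "${1:-}" ]; then
    echo "usage: summarize_mos <goshawk_dir>" >&2
    return 1
  fi
  local out="$1/output" kind n
  for kind in alloc free; do
    if [ ! -d "$out/$kind" ]; then
      echo "error: $out/$kind not found" >&2
      return 1
    fi
    # wc -l may pad its output with spaces; $((n)) normalizes it to a plain number.
    n=$(find "$out/$kind" -maxdepth 1 -type f | wc -l)
    echo "$kind: $((n))"
  done
}
```

For example, `summarize_mos xxx/Goshawk_binary` prints one `alloc: N` line and one `free: N` line; a count of zero suggests the MOS generation step did not find any MM functions.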
echo " -Xclang -load -Xclang xxx/Goshawk_binary/plugins/MemMisuseAnalyzerProPlugin.so -Xclang -analyzer-checker=alpha.unix.MemMisuseChecker -Xclang -analyzer-max-loop -Xclang 1 -Xclang -analyzer-inline-max-stack-depth -Xclang 5" > static_analyzer.cfg
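The same configuration file can be written with one flag group per line, which makes each analyzer option easier to read and adjust. In this sketch, `write_sa_cfg` is a hypothetical helper whose first argument stands in for the xxx/Goshawk_binary path; the flags are exactly those of the echo command above, and the comments summarize what the corresponding Clang analyzer options do.

```shell
#!/usr/bin/env bash
# write_sa_cfg GOSHAWK_DIR [CFG]: write the --saargs file (default: static_analyzer.cfg).
write_sa_cfg() {
  local goshawk_dir="$1" cfg="${2:-static_analyzer.cfg}"
  local plugin="$goshawk_dir/plugins/MemMisuseAnalyzerProPlugin.so"
  local -a args
  args=(
    -Xclang -load -Xclang "$plugin"                        # load the Goshawk checker plugin
    -Xclang -analyzer-checker=alpha.unix.MemMisuseChecker  # enable its checker
    -Xclang -analyzer-max-loop -Xclang 1                   # go through each loop at most once
    -Xclang -analyzer-inline-max-stack-depth -Xclang 5     # bound the inlining stack depth to 5
  )
  [ -f "$plugin" ] || echo "warning: $plugin not found" >&2
  # Write all flags on one line, each preceded by a space, as the echo command does.
  { printf ' %s' "${args[@]}"; printf '\n'; } > "$cfg"
}
```

Because the flags are joined with spaces, this sketch assumes the Goshawk_binary path contains no whitespace.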
CodeChecker analyze --analyzers clangsa -j128 xxx/openssl/compilation.json --saargs static_analyzer.cfg -d apiModeling -d cplusplus -d nullability -d optin -d security -d unix -d valist -d deadcode -d core -d security.insecureAPI.rand --ctu --output ./reports
CodeChecker parse ./reports -e html -o ./reports_html
Finally, the violations detected in the current project are written to "reports_html" in human-readable HTML format.
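For repeated runs, the tutorial steps can be collected into one script. The sketch below is hypothetical: `goshawk_pipeline` and its arguments (project directory, Goshawk_binary directory, job count) are placeholders for the xxx paths and -j128 used above, and it assumes absolute paths, an installed CodeChecker, and the Goshawk_binary `./run` script. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the tutorial steps; not part of Goshawk.
# Usage: goshawk_pipeline <project_dir> <goshawk_dir> [jobs]   (absolute paths)

# run_in DIR CMD...: run CMD inside DIR in a subshell, or only print it when DRY_RUN=1.
run_in() {
  local dir="$1"; shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '(cd %s && %s)\n' "$dir" "$*"
  else
    (cd "$dir" && "$@")
  fi
}

goshawk_pipeline() {
  local project_dir="$1" goshawk_dir="$2" njobs="${3:-8}"
  local cfg="$goshawk_dir/static_analyzer.cfg"
  local plugin="$goshawk_dir/plugins/MemMisuseAnalyzerProPlugin.so"

  # 1. Configure the project and record its compile commands.
  run_in "$project_dir" ./Configure CC=clang HOSTCC=clang || return 1
  run_in "$project_dir" CodeChecker log -b "make CC=clang HOSTCC=clang -j$njobs" \
    -o compilation.json || return 1

  # 2. Annotate MM functions and generate their MOS.
  run_in "$goshawk_dir" ./run "$project_dir" || return 1

  # 3. Copy the MOS files from output/alloc and output/free into /tmp/CSA.
  run_in "$goshawk_dir" mkdir -p /tmp/CSA || return 1
  run_in "$goshawk_dir/output" find alloc free -maxdepth 1 -type f \
    -exec cp {} /tmp/CSA/ ';' || return 1

  # 4. Write the analyzer arguments (same flags as the echo command above).
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "write $cfg"
  else
    printf ' -Xclang -load -Xclang %s -Xclang -analyzer-checker=alpha.unix.MemMisuseChecker -Xclang -analyzer-max-loop -Xclang 1 -Xclang -analyzer-inline-max-stack-depth -Xclang 5\n' \
      "$plugin" > "$cfg" || return 1
  fi

  # 5. Analyze, then export HTML. The analyze status is deliberately not
  # checked so that partial results still get exported.
  run_in "$goshawk_dir" CodeChecker analyze --analyzers clangsa -j"$njobs" \
    "$project_dir/compilation.json" --saargs "$cfg" \
    -d apiModeling -d cplusplus -d nullability -d optin -d security -d unix \
    -d valist -d deadcode -d core -d security.insecureAPI.rand \
    --ctu --output ./reports
  # The parse status is returned as-is: CodeChecker may exit non-zero
  # simply because reports were found.
  run_in "$goshawk_dir" CodeChecker parse ./reports -e html -o ./reports_html
}
```

A dry run such as `DRY_RUN=1 goshawk_pipeline /path/to/openssl /path/to/Goshawk_binary 128` only prints the commands, which is a cheap way to check the paths before starting a long build.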
Please refer to our IEEE S&P paper: pdf|slides|presentation
© 2022 G.O.S.S.I.P / Email: [contact at securitygossip dot com] or [romangol at securitygossip dot com]