goshawk

Goshawk: Hunting Memory Corruptions via Structure-Aware and Object-Centric Memory Operation Synopsis

1. Introduction

Goshawk is an automated memory corruption bug detection system. It first annotates memory management (MM) functions automatically using NLP-assisted classification, and then uses a structure-aware and object-centric memory operation synopsis (MOS) to abstract MM functions in source code and help find bugs.

Goshawk is currently built on top of the Clang Static Analyzer (CSA). We used it to discover 92 use-after-free and double-free bugs in the Linux kernel, FreeBSD, OpenSSL, Redis, and IoT SDKs.

2. Latest news

2022-03-13 Goshawk is now public on GitHub. You can obtain the latest version of Goshawk at: https://github.com/Yunlongs/Goshawk.
2022-02-22 The data on the call-chain lengths of identified MM functions can be downloaded from here.
2021-12-20 The evaluation dataset is released.
2021-12-10 The comparison dataset is now released.
2021-12-05 The bug detection results for the IoT SDKs are provided.

3. Source Code & Prebuilt binary

We have implemented Goshawk as multiple Clang plugins and Python scripts. The source code of Goshawk is available here: source code.

3.1 Environment Prerequisites

Clang compiler v10.0.1, which can be downloaded from here.
CodeChecker. The installation instructions can be found here.
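Before running Goshawk, it can help to confirm that both prerequisites are reachable from the shell. The following is a minimal sketch, not part of Goshawk; the helper names (`find_tools`, `parse_clang_version`, `installed_clang_version`) are hypothetical, and it assumes the tools are installed under the executable names `clang` and `CodeChecker`.

```python
import re
import shutil
import subprocess

# Executable names assumed for the two prerequisites listed above.
REQUIRED_TOOLS = ("clang", "CodeChecker")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its location on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def parse_clang_version(version_output):
    """Extract 'X.Y.Z' from the text printed by `clang --version`."""
    match = re.search(r"clang version (\d+\.\d+\.\d+)", version_output)
    return match.group(1) if match else None


def installed_clang_version():
    """Return the version of the clang found on PATH, or None if absent."""
    path = shutil.which("clang")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_clang_version(result.stdout)


for tool, location in find_tools().items():
    print(f"{tool}: {location or 'not found'}")
print("clang version:", installed_clang_version())
```

If the reported clang version differs from 10.0.1, the prebuilt plugins may fail to load, since Clang plugins are generally tied to the compiler version they were built against.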

3.2 Source Code Structure

Python Scripts: The main scripts of Goshawk. They perform the NLP-assisted classification and call the plugins to perform data-flow-analysis-based validation and MOS generation.
Clang Plugins: The data flow tracking engines of Goshawk, together with an instance of the MOS interface and a use-after-free and double-free checker built on the Clang Static Analyzer.

3.3 Prebuilt binary

We provide a prebuilt binary of the above source code, which runs Goshawk with a single command. The prebuilt binary is available here: prebuilt binary.

4. Dataset

4.1 MOS Dataset

The MOS dataset of our experiment is available: dataset. It covers the following projects and versions:

Linux kernel v5.12-rc2
FreeBSD v13.0.0
OpenSSL v3.0.0
Redis v6.2.1
Azure SDK LTS_01_2021_Ref01
QcloudE SDK v3.1.8
QcloudH SDK v3.2.3

4.2 Evaluation Dataset

The evaluation dataset contains 200 allocators, 200 deallocators, and 600 non-MM functions that were manually selected. It is used to test the accuracy of Goshawk's MM function identification. This dataset is available: dataset. The relevant experiment is presented in Section V-B of our paper.
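As an illustration of how a labeled set like this can be used to score an identifier, the sketch below computes per-class precision and recall. It is not Goshawk's evaluation script: the function name `precision_recall`, the class labels `alloc`, `free`, and `other`, and the four toy entries are assumptions made for the example, and the numbers it prints are not experimental results.

```python
def precision_recall(truth, predicted, label):
    """Precision and recall for one class, given name -> class mappings."""
    true_positives = sum(
        1 for name, cls in predicted.items()
        if cls == label and truth.get(name) == label
    )
    predicted_positives = sum(1 for cls in predicted.values() if cls == label)
    actual_positives = sum(1 for cls in truth.values() if cls == label)
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


# Toy ground truth and toy predictions (hypothetical, for illustration only).
truth = {"kmalloc": "alloc", "kfree": "free", "strlen": "other", "my_buf_new": "alloc"}
predicted = {"kmalloc": "alloc", "kfree": "free", "strlen": "alloc", "my_buf_new": "other"}

for label in ("alloc", "free"):
    p, r = precision_recall(truth, predicted, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

With the real dataset, `truth` would hold the 1,000 manually labeled functions and `predicted` the classes reported by the tool under test.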

4.3 Comparison Dataset

This dataset contains the MM functions that existing approaches (i.e., K-MELD and SinkFinder) identified, both correctly and incorrectly. These functions are used to compare the effectiveness of Goshawk's MM function identification with that of the existing approaches. This dataset is available: dataset. The relevant experiment is presented in Section V-B of our paper.

5. Tutorials

Taking the OpenSSL project as an example, the following step-by-step tutorial shows how Goshawk analyzes a project.

5.1 Record Compilation Commands

Download the OpenSSL project: git clone https://github.com/openssl/openssl.git
Enter the OpenSSL project: cd openssl/
Generate the Makefile: ./Configure CC=clang HOSTCC=clang
Use CodeChecker to wrap the compilation process: CodeChecker log -b "make CC=clang HOSTCC=clang -j128" -o compilation.json

After these steps, the compilation commands of each C/C++ file are recorded in compilation.json.
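A quick sanity check on the recorded file can catch an empty or partial build log before the analysis starts. The sketch below assumes compilation.json follows the standard JSON compilation database layout (a list of entries with `directory`, `command`, and `file` keys); the helper names and the two sample entries are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_compile_commands(path):
    """Read a JSON compilation database and return its list of entries."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_compile_commands(entries):
    """Count the recorded translation units by source-file extension."""
    return Counter(Path(entry["file"]).suffix for entry in entries)


# Hypothetical sample entries in the compilation database layout.
sample = [
    {"directory": "/src/openssl",
     "command": "clang -c crypto/mem.c -o crypto/mem.o",
     "file": "crypto/mem.c"},
    {"directory": "/src/openssl",
     "command": "clang -c ssl/ssl_lib.c -o ssl/ssl_lib.o",
     "file": "ssl/ssl_lib.c"},
]
print(summarize_compile_commands(sample))
```

On a real build, `summarize_compile_commands(load_compile_commands("compilation.json"))` should report a nonzero number of `.c` files; an empty result usually means the build was already up to date when it was logged.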

5.2 MM functions and MOS Annotation

Enter the Goshawk_binary directory: cd Goshawk_binary/
Run the binary 'run' with the project path as its argument: ./run xxx/openssl

After the above two steps, the MOS of the identified MM functions is generated in the "output" directory.

5.3 Bug Detection On CSA

Copy the output MM function files to a certain directory: cp xxx/Goshawk_binary/output/alloc/* /tmp/CSA/ && cp xxx/Goshawk_binary/output/free/* /tmp/CSA/
Write the CSA configuration file: echo " -Xclang -load -Xclang xxx/Goshawk_binary/plugins/MemMisuseAnalyzerProPlugin.so -Xclang -analyzer-checker=alpha.unix.MemMisuseChecker -Xclang -analyzer-max-loop -Xclang 1 -Xclang -analyzer-inline-max-stack-depth -Xclang 5" > static_analyzer.cfg
Call CodeChecker to perform bug detection: CodeChecker analyze --analyzers clangsa -j128 xxx/openssl/compilation.json --saargs static_analyzer.cfg -d apiModeling -d cplusplus -d nullability -d optin -d security -d unix -d valist -d deadcode -d core -d security.insecureAPI.rand --ctu --output ./reports
Generate the HTML bug reports: CodeChecker parse ./reports -e html -o ./reports_html

Finally, the violations found in the current project are described in "reports_html" in a human-readable HTML format.
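The analyzer argument string in the configuration step is long and easy to mistype, so it can be convenient to generate it. The sketch below is one hedged way to do that; `build_saargs` is a hypothetical helper, not part of Goshawk, and the option names and default values are copied from the tutorial command rather than from any other source.

```python
def build_saargs(plugin_path, max_loop=1, inline_stack_depth=5):
    """Build the analyzer arguments, passing each option through -Xclang."""
    options = [
        "-load", plugin_path,
        "-analyzer-checker=alpha.unix.MemMisuseChecker",
        "-analyzer-max-loop", str(max_loop),
        "-analyzer-inline-max-stack-depth", str(inline_stack_depth),
    ]
    return " ".join(f"-Xclang {option}" for option in options)


# The plugin location is left as elided in the tutorial ("xxx").
saargs = build_saargs("xxx/Goshawk_binary/plugins/MemMisuseAnalyzerProPlugin.so")
print(saargs)
```

Writing the returned string to static_analyzer.cfg gives the same content as the echo command, and the two keyword parameters make it easy to experiment with deeper loop unrolling or inlining at the cost of longer analysis time.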

6. Detection Results

Please refer to our IEEE S&P paper: pdf|slides|presentation