A Novel Pipeline for Classifying and Repairing Smart Contracts at Scale


Authors:

(1) Abhinav Jain, Westborough High School, Westborough, MA; contributed equally to this work (jain3abhinav@gmail.com);

(2) Ehan Masud, Sunset High School, Portland, OR; contributed equally to this work (ehanmasud2006@gmail.com);

(3) Michelle Han, Granite Bay High School, Granite Bay, CA (michellehan2007agt@gmail.com);

(4) Rohan Dhillon, Lakeside School, Seattle, WA (rohand25@lakesideschool.org);

(5) Sumukh Rao, Bellarmine College Preparatory, San Jose, CA (sumukhsf@gmail.com);

(6) Arya Joshi, Robbinsville High School, Robbinsville, NJ (arya.joshi@gmail.com);

(7) Salar Cheema, University of Illinois, Champaign, IL (salarwc2@illinois.edu);

(8) Saurav Kumar, University of Illinois, Champaign, IL (sauravk4@illinois.edu).

Abstract and I. Introduction

II. Methods

III. Results

IV. Conclusion, Future Work, and References

Abstract—Due to the modern relevance of blockchain technology, smart contracts present both substantial risks and benefits. Vulnerabilities within them can trigger a cascade of consequences, resulting in significant losses. Many current papers primarily focus on classifying smart contracts for malicious intent, often relying on limited contract characteristics such as bytecode or opcodes. This paper proposes a novel, two-layered framework: 1) classifying and 2) directly repairing malicious contracts. Slither’s vulnerability report is combined with the source code and passed through a pre-trained RandomForestClassifier (RFC) and Large Language Models (LLMs), which classify and repair each suggested vulnerability. Experiments demonstrate the effectiveness of fine-tuned and prompt-engineered LLMs. The smart contract repair models, built from pre-trained GPT-3.5-Turbo and fine-tuned Llama-2-7B models, reduced the overall vulnerability count by 97.5% and 96.7%, respectively. A manual inspection of the repaired contracts shows that all retain their functionality, indicating that the proposed method is appropriate for automatic batch classification and repair of vulnerabilities in smart contracts.

I. INTRODUCTION

As we examine the crucial role smart contracts play in the global blockchain ecosystem, it becomes increasingly imperative that we understand the severity of cyberattacks that exploit weak code. In 2018, $23.5 million worth of cryptocurrency was stolen from the Bancor network after the compromise of a wallet used to upgrade smart contracts, sparking controversy online over the safety of decentralized exchanges and smart contract systems [16]. More recently, in 2020, a hacker drained Harvest Finance of $24 million by deploying a smart contract that manipulated the share values of its vaults [17]. The common theme across these hacks is that vulnerabilities within smart contracts were exploited to steal millions of dollars, highlighting the importance of strengthening smart contracts to prevent such vulnerabilities from arising.

Smart contracts provide a secure platform for transactions without the need for a trusted intermediary, and for this reason they have become increasingly common in blockchain applications. Because most blockchain applications prevent users from editing smart contracts after they have been deployed, there is a need for analysis tools that can accurately and precisely identify vulnerabilities in smart contracts. Although most tools rely on expert-developed frameworks, recent research has begun developing deep learning models that can evaluate a smart contract’s vulnerability. However, most existing deep learning models fail to provide helpful feedback on a smart contract’s specific vulnerabilities; instead, they only determine whether or not a contract is vulnerable.

DLVA [1] introduces a three-step approach: mapping bytecode to high-dimensional vectors, classifying those vectors based on training data, and using neural networks to infer vulnerable contracts. However, a significant weakness of this approach was its high false positive rate during prediction. Similarly, MRN-GCN [5] applies deep learning to a nested contract graph that captures syntactic and semantic information, enabling the classification of vulnerable functions, but, like [1], it reported mixed recall percentages ranging from 79.59% to 98.18%. The authors of [3] take a different approach, proposing peer-to-peer voting and reward-and-slash mechanisms to mitigate and discourage malicious behavior in smart contracts.

Large Language Models (LLMs) have proven exceptional at performing complex tasks. The authors of [8] demonstrated the capabilities of various LLMs in identifying vulnerabilities in DeFi smart contracts, with F1-scores significantly higher than random baselines, which has the potential to be improved further by the tool-enhancement framework developed in [4]. Prompt engineering can substantially enhance LLM performance. One powerful prompt engineering method is Chain-of-Thought (CoT) prompting [2], which significantly improves the ability of LLMs to perform complex reasoning. With eight CoT exemplars, [2] achieves 56.9% accuracy with PaLM-540B on the GSM8K benchmark, an accuracy improvement of 39 percentage points; however, that work relies solely on CoT and neglects fine-tuning entirely. In a similar vein, the authors of [7] present a framework that improves upon CoT by transferring advanced reasoning abilities from large models to smaller ones through knowledge distillation, resulting in improved question-answering performance. In another scenario, [6] applied prompt engineering by giving ChatGPT specific information, such as the translation’s purpose and target audience, leading to industry-standard translation quality.
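To make the prompting technique concrete, the following minimal sketch shows how a CoT exemplar might be prepended to a smart-contract audit question before querying a chat model; the exemplar text, question, and model name are illustrative assumptions rather than details taken from [2] or from our pipeline.

```python
# Minimal sketch of Chain-of-Thought (CoT) prompting, assuming the OpenAI
# chat-completions API; the exemplar, question, and model name are
# illustrative only and are not drawn from [2].
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A CoT exemplar pairs a question with an explicit step-by-step rationale,
# encouraging the model to reason before answering the real query.
COT_EXEMPLAR = (
    "Q: A function sends Ether to the caller before updating the caller's "
    "balance. Is it vulnerable?\n"
    "A: The external call happens before the state update, so the callee can "
    "re-enter and withdraw again before its balance is zeroed. This is a "
    "reentrancy vulnerability. Answer: vulnerable.\n"
)

def cot_prompt(question: str) -> str:
    """Prepend the worked exemplar and ask the model to reason step by step."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA: Let's think step by step."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": cot_prompt(
        "A withdraw() function authorizes callers with tx.origin. "
        "Is it vulnerable?")}],
)
print(response.choices[0].message.content)
```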

A comprehensive survey [11] described the current landscape of smart contract security, identifying eight core defense methods across 133 models. This finding underscores the complexity of the field but also reveals limitations. One limitation appears when automated smart contract tools are applied to DeFi systems [12]: surprisingly, these tools detected only 8% of attacks, indicating a challenge with intricate vulnerabilities. Addressing this, [13] evaluated five smart contract detection tools, focusing on three types of vulnerabilities, and determined that different detection models have varying strengths and weaknesses, suggesting that a combination of methods may be more effective. This notion is corroborated by [9] and [10], which both utilize Multi-Task Learning, a combination method that leverages concurrent learning and optimization of multiple tasks. Notably, [14] advances this methodology with an approach that blends K-means clustering and LSTM networks with a universal sentence encoder; the approach captures the semantic meaning of smart contract code and outperforms baseline models.
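As a rough illustration of the embedding-plus-clustering idea described in [14], the sketch below embeds a few contract snippets and groups them with K-means, using a sentence-transformers model as a stand-in for the universal sentence encoder; the snippets and model name are assumptions for demonstration only.

```python
# Toy sketch of embedding contract code and clustering it with K-means.
# A sentence-transformers model stands in for the universal sentence encoder
# used in [14]; snippets and model name are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

snippets = [
    'function withdraw() public { msg.sender.call{value: balances[msg.sender]}(""); }',
    "function deposit() public payable { balances[msg.sender] += msg.value; }",
    "function destroy() public { selfdestruct(payable(owner)); }",
    "function getBalance() public view returns (uint) { return balances[msg.sender]; }",
]

embeddings = encoder.encode(snippets)                 # one vector per snippet
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(labels)                                         # cluster label per snippet
```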

Moreover, current work on repairing smart contracts has been shown to be reliable. For example, [19] uses a framework called ContractFix to repair vulnerabilities with 94% accuracy; ContractFix is built around static code analyzers and focuses on generating and applying security patches. Similarly, [15] uses a tool, Elysium, to patch seven types of vulnerabilities at the bytecode level. This paper improves on these frameworks in two main ways. First, our framework is built on LLMs, which allow for a more robust repair process that can adapt to zero-day vulnerabilities. Second, we work directly with source code, a novel approach to repairing vulnerabilities.

These existing methods have been shown to work well for vulnerability detection across various situations with relatively little statistical error. However, we show that existing vulnerability detection methods face the following problems: 1) lack of a broad approach, 2) little detail on specific errors, 3) high false positive rates, and 4) lack of a direct repair framework. To address these problems, we propose a novel pipeline. The pipeline first utilizes Slither and a RandomForestClassifier to detect and report specific vulnerabilities within smart contract source code. After filtering out non-malicious contracts, two LLMs, GPT-3.5-Turbo and a fine-tuned Llama-2-7B generation model, each repair the vulnerable smart contract source code. The repaired contract is then evaluated by Slither against its vulnerable counterpart to assess the effectiveness of the repair.
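The sketch below outlines one way this detect-classify-repair loop could be wired together, combining Slither’s JSON report, a pre-trained RandomForestClassifier, and an LLM repair call; the feature extraction, file names, label convention, and prompt wording are illustrative assumptions, not the paper’s exact implementation.

```python
# Hedged sketch of the detect -> classify -> repair -> re-check loop.
# Feature extraction, file names, labels, and the prompt are assumptions.
import json
import subprocess

from joblib import load
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def slither_report(contract_path: str) -> dict:
    """Run Slither on a contract and return its JSON vulnerability report."""
    out = subprocess.run(
        ["slither", contract_path, "--json", "-"],
        capture_output=True, text=True,
    )
    return json.loads(out.stdout)

def to_features(report: dict) -> list:
    """Toy featurization: count detector hits by Slither impact level."""
    counts = {"High": 0, "Medium": 0, "Low": 0, "Informational": 0}
    for finding in report.get("results", {}).get("detectors", []):
        impact = finding.get("impact", "Informational")
        counts[impact] = counts.get(impact, 0) + 1
    return [[counts["High"], counts["Medium"], counts["Low"], counts["Informational"]]]

def repair_with_llm(source: str, report: dict) -> str:
    """Ask an LLM to patch the flagged vulnerabilities while keeping behavior."""
    prompt = (
        "Repair the following Solidity contract so the listed Slither findings "
        "no longer apply, preserving its functionality.\n\n"
        f"Findings:\n{json.dumps(report['results']['detectors'], indent=2)}\n\n"
        f"Contract:\n{source}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

contract = "Vulnerable.sol"                     # placeholder path
report = slither_report(contract)
rfc = load("rfc.joblib")                        # pre-trained RandomForestClassifier
if rfc.predict(to_features(report))[0] == 1:    # assume label 1 = vulnerable
    repaired = repair_with_llm(open(contract).read(), report)
    with open("Repaired.sol", "w") as f:
        f.write(repaired)
    print(slither_report("Repaired.sol"))       # re-check the repaired contract
```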

The rest of this paper is organized as follows: Section II details our novel two-layer pipeline, which uses Slither and a RandomForestClassifier to classify vulnerable smart contracts and two LLMs (Llama-2-7B and GPT-3.5-Turbo) to repair them. Section III presents the results of our approach in comparison to existing methods. Section IV provides a conclusion.

This paper is available on arXiv under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.