Can AI Fix a Broken Contract? The Answer is 97.5% Yes.

2 Jul 2025

III. RESULTS

A. Results from the RandomForestClassifier (RFC)

Of the 2,000 contracts used for the model, the RFC was tested on 800 (40%). It classified 717 of these 800 contracts correctly, for an accuracy of 89.6% and an F1 score of 0.76. The confusion matrix shows that, of the positive (“True”) predictions, 133 were true positives and 23 were false positives; of the negative predictions, 584 were true negatives and 60 were false negatives. The false positive rate was only 3.8% (23 of the 607 actual negatives), fulfilling our goal. This is a significant improvement over static analysis tools alone, such as Slither, which has a false positive rate of 10.9% [20]. Furthermore, the RFC examines the source code directly rather than relying on a fixed set of vulnerability detectors, making it more adaptable to syntax changes.
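The reported metrics can be rechecked from the four confusion-matrix counts alone. The following is a minimal sketch rather than the authors' evaluation code: the function name `summarize_confusion` is hypothetical, and only the four counts are taken from the text above.

```python
def summarize_confusion(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)  # share of positive predictions that were correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # False positive rate: false alarms among all actual negatives.
        "false_positive_rate": fp / (fp + tn),
    }


# Counts as reported for the 800 test contracts.
metrics = summarize_confusion(tp=133, fp=23, tn=584, fn=60)
print(f"contracts: {metrics['total']}")
print(f"accuracy:  {metrics['accuracy']:.3f}")
print(f"F1 score:  {metrics['f1']:.2f}")
print(f"FPR:       {metrics['false_positive_rate']:.3f}")
```

Under these counts the accuracy, F1 score, and false positive rate round to the 89.6%, 0.76, and 3.8% quoted above, with the false positive rate defined as FP / (FP + TN).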

B. Results from the GPT-3.5-Turbo and Llama-2-7B Error Correction Models

To test GPT-3.5-Turbo and the fine-tuned Llama-2-7B model with our prompt, we aimed to repair vulnerabilities as reported by Slither. The results are shown in the graphs above. The Slither checks on the corrected smart contracts are promising: the GPT-3.5-Turbo model repaired 97.5% of vulnerabilities. Specifically, of the 40 vulnerabilities encountered in the source code, only a single medium-impact vulnerability remained. Meanwhile, the fine-tuned Llama-2-7B model corrected all but two of the 60 vulnerabilities encountered, with one medium-impact and one low-impact vulnerability remaining, a 96.7% reduction in the number of vulnerabilities. We reviewed a random third of the repaired smart contracts and found that all of them retained their previous functionality, with the models usually correcting syntax-level errors rather than changing underlying structures.
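Both repair percentages follow from the vulnerability counts before and after repair. A small sketch of that arithmetic is shown below; the helper name `repair_rate` is an assumption made for illustration, and only the counts come from the text.

```python
def repair_rate(found: int, remaining: int) -> float:
    """Fraction of initially reported vulnerabilities no longer reported after repair."""
    if found <= 0:
        raise ValueError("found must be positive")
    if not 0 <= remaining <= found:
        raise ValueError("remaining must be between 0 and found")
    return (found - remaining) / found


# Counts reported for each model's Slither re-check.
gpt_rate = repair_rate(found=40, remaining=1)
llama_rate = repair_rate(found=60, remaining=2)
print(f"GPT-3.5-Turbo: {gpt_rate:.1%}")
print(f"Llama-2-7B:    {llama_rate:.1%}")
```

Note that the two models were evaluated on different numbers of vulnerabilities (40 and 60), so the two rates are not computed over the same set.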

The CoT prompts for GPT-3.5-Turbo and the fine-tuning of the Llama-2-7B model were vital to the accuracy of these models. In initial testing, GPT-3.5-Turbo repaired fewer than 85% of smart contracts, and the Llama-2-7B model was unable to produce code that could be compiled. With the methods outlined above, however, the results demonstrate a reliable process for repairing smart contracts.

Indeed, these results demonstrate that the LLMs were able to repair vulnerable smart contracts with near-perfect accuracy, with only three vulnerabilities remaining in total. The error correction rate was well above that of any existing method, making the models state-of-the-art tools with impressive error reduction capabilities. Moreover, because of the “Two Timin’” framework described above, only malicious contracts were repaired, cutting down on computing time and maximizing the quantity of secure, reliable smart contracts available. Given the tens of millions of smart contracts on blockchains such as Ethereum [21], minimizing computational complexity and cost in an already energy-intensive industry is beneficial to users, companies, and the environment.

IV. CONCLUSION

In this paper, we used the Solidity source code of smart contracts to build a novel approach to identifying and repairing vulnerabilities. This approach uses a two-tiered flow. First, the Slither static code analyzer and a Random Forest Classifier were used to identify malicious smart contracts and their specific vulnerabilities. These malicious smart contracts and their vulnerabilities were then used as parameters in a prompt given to two separate LLMs, GPT-3.5-Turbo and Llama-2-7B. The prompt was the result of prompt engineering using Chain of Thought reasoning. The two smart contract repair models, one using pre-trained GPT-3.5-Turbo and the other a fine-tuned Llama-2-7B, reduced the overall vulnerability count by 97.5% and 96.7%, respectively. This novel approach, with state-of-the-art accuracy, allows smart contracts to be screened and repaired before being deployed, so that cybercriminals are unable to exploit vulnerabilities in the contracts. Indeed, this paper establishes a framework that is easy to use and gives reliable results, increasing access to safe smart contracts for all. Using the “Two Timin’” framework, businesses and DAOs can utilize LLMs to repair smart contracts efficiently and effectively, an important step forward as the prevalence of blockchain continues to increase.
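The two-tiered flow summarized above can be outlined in code. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Finding` type, the prompt wording, and the `classify`, `detect`, and `repair` callables are all hypothetical stand-ins for the Random Forest Classifier, Slither, and the LLM call, and the early return on an empty findings list is an added assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    """One reported vulnerability (hypothetical shape, not Slither's output format)."""
    check: str
    impact: str


def build_prompt(source: str, findings: List[Finding]) -> str:
    """Assemble a step-by-step repair prompt; the wording is illustrative only."""
    listed = "\n".join(f"- {f.check} (impact: {f.impact})" for f in findings)
    return (
        "The Solidity contract below has these reported vulnerabilities:\n"
        f"{listed}\n"
        "Reason step by step about each one, then return the corrected contract.\n\n"
        f"{source}"
    )


def two_tier_repair(
    source: str,
    classify: Callable[[str], bool],         # stand-in for the classifier tier
    detect: Callable[[str], List[Finding]],  # stand-in for the static analyzer
    repair: Callable[[str], str],            # stand-in for the LLM call
) -> Optional[str]:
    """Return repaired source, or None when no repair is attempted."""
    if not classify(source):
        return None  # tier 1: contracts not flagged as malicious are skipped
    findings = detect(source)
    if not findings:
        return None  # assumption: nothing to list in the prompt, so no LLM call
    return repair(build_prompt(source, findings))  # tier 2: LLM repair


# Usage with stub callables in place of the real models and tools.
demo = two_tier_repair(
    "contract C {}",
    classify=lambda src: True,
    detect=lambda src: [Finding("reentrancy", "Medium")],
    repair=lambda prompt: "// repaired source would go here",
)
print(demo)
```

Passing the three stages in as callables keeps the gating logic separate from any particular model or analyzer, which is the property the framework relies on to avoid sending benign contracts to the LLM.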

FUTURE WORK

Different classifiers, such as those powered by transformers or neural networks, could be used to identify malicious smart contracts. These could learn across a broader range of data with access to a larger proportion of malicious smart contracts. In addition, more fine-tuning could be completed on Llama-2-7B, with more hidden layers and a larger dataset, in order to raise its error correction rate above that of GPT-3.5-Turbo. At the time of writing, GPT-3.5-Turbo cannot be fine-tuned; if fine-tuning capabilities were to be developed, further research could focus on fine-tuning GPT-3.5-Turbo for repairing smart contracts. Moreover, advances in PEFT and/or QLoRA could allow for a less memory-intensive but more accurate LLM for repairing smart contracts.

REFERENCES

[1] Abdelaziz, T., Hobor, A. (2023). Smart learning to find dumb contracts (extended version). ArXiv Preprint, arXiv:2304.10726.

[2] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D. (2023). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. ArXiv Preprint, arXiv:2201.11903.

[3] Nanqing Dong, Zhipeng Wang, Jiahao Sun, Michael Kampffmeyer, Yizhe Wen, Shuoying Zhang, William Knottenbelt, Eric Xing. (2023). Defending Against Malicious Behaviors in Federated Learning with Blockchain. ArXiv Preprint, arXiv:2307.00543.

[4] Monika di Angelo, Thomas Durieux, João F. Ferreira, Gernot Salzer. (2023). SmartBugs 2.0: An Execution Framework for Weakness Detection in Ethereum Smart Contracts. ArXiv Preprint, arXiv:2306.05057.

[5] Haiyang Liu, Yuqi Fan, Lin Feng and Zhenchun Wei. (2023). Vulnerable Smart Contract Function Locating Based on Multi-Relational Nested Graph Convolutional Network. ArXiv Preprint, arXiv:2306.04479.

[6] Masaru Yamada. (2023). Optimizing Machine Translation through Prompt Engineering: An Investigation into ChatGPT’s Customizability. ArXiv Preprint, arXiv:2308.01391.

[7] Yuhan Ma, Chenyou Fan, Haiqi Jiang. (2023). Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA. ArXiv Preprint, arXiv:2308.04679.

[8] Isaac David, Liyi Zhou, Kaihua Qin, Dawn Song, Lorenzo Cavallaro, Arthur Gervais. (2023). Do you still need a manual smart contract audit? ArXiv Preprint, arXiv:2306.12338.

[9] Jing Huang, Kuo Zhou, Ao Xiong, Dongmeng Li. (2022). Smart Contract Vulnerability Detection Model Based on Multi-Task Learning. MDPI, Volume(22)

[10] Sameep Vani, Malav Doshi, Amit A. Nanavati, Ashish Kundu. (2022). Vulnerability Analysis of Smart Contracts. ArXiv Preprint, arXiv:2212.07387.

[11] Nikolay Ivanov, Chenning Li, Qiben Yan, Zhiyuan Sun, Zhichao Cao, Xiapu Luo. (2023). Security Defense For Smart Contracts: A Comprehensive Survey. ArXiv Preprint, arXiv:2302.07347.

[12] Stefanos Chaliasos, Marcos Antonios Charalambous, Liyi Zhou, Rafaila Galanopoulou, Arthur Gervais, Dimitris Mitropoulos, Benjamin Livshits. (2023). Smart Contract and DeFi Security: Insights from Tool Evaluations and Practitioner Surveys. ArXiv Preprint, arXiv:2304.02981.

[13] Peng Qian, Zhenguang Liu, Qinming He, Butian Huang, Duanzheng Tian, Xun Wang. (2022). Smart Contract Vulnerability Detection Technique: A Survey. ArXiv Preprint, arXiv:2209.05872.

[14] Youwei Huang, Tao Zhang, Sen Fang, Youshuai Tan and Jiachun Tao. (2022). Deep Smart Contract Intent Detection. ArXiv Preprint, arXiv:2211.10724.

[15] Christof Ferreira Torres, Hugo Jonker, Radu State. (2022). Elysium: Context-Aware Bytecode-Level Patching to Automatically Heal Vulnerable Smart Contracts. ArXiv Preprint, arXiv:2108.10071.

[16] Jon Russell. (2018, July 10). Bancor Loses $23.5M. Retrieved from https://shorturl.at/hCSV9

[17] William Foxley. (2020, October 26). Harvest Finance $24M Attack Triggers $570M Bank Run in Latest DeFi Exploit. Retrieved from https://www.coindesk.com/tech/2020/10/26/harvest-finance-24m-attack-triggers-570m-bank-run-in-latest-defi-exploit/

[18] PHCorner. (n.d.). ChatGPT Jailbreak IQ. Retrieved from https://phcorner.net/threads/chatgpt-jailbreak-iq.1668370/

[19] Pengcheng Fang, et al. (2023). ContractFix: A Framework for Automatically Fixing Vulnerabilities in Smart Contracts. ArXiv Preprint, arXiv:2307.08912.

[20] Feist, J., Grieco, G., & Groce, A. (2019, May). Slither: a static analysis framework for smart contracts. In 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB) (pp. 8-15). IEEE.

[21] Ethereum. (2023, August 17). Ethereum: Build unstoppable applications. https://ethereum.org/en/

Authors:

(1) Abhinav Jain, Westborough High School, Westborough, MA (contributed equally to this work);

(2) Ehan Masud, Sunset High School, Portland, OR (contributed equally to this work);

(3) Michelle Han, Granite Bay High School, Granite Bay, CA;

(4) Rohan Dhillon, Lakeside School, Seattle, WA;

(5) Sumukh Rao, Bellarmine College Preparatory, San Jose, CA;

(6) Arya Joshi, Robbinsville High School, Robbinsville, NJ;

(7) Salar Cheema, University of Illinois, Champaign, IL;

(8) Saurav Kumar, University of Illinois, Champaign, IL.

This paper is available on arXiv under the Attribution-NonCommercial-ShareAlike 4.0 International license.
