NSF-funded project aims to mitigate malware and viruses by making them easily understandable

April 25, 2022

As the software development landscape evolves, new security vulnerabilities emerge. Traditionally, software’s source code could shed light on its vulnerabilities, but acquiring high-quality source code for the purpose of finding weaknesses can be difficult due to “compilation”.

Compilation refers to the process of transforming and optimizing a program's source code to generate a final executable, which is a file that causes a computer to perform specified tasks according to coded instructions. Although an executable works well and runs quickly on computers, it no longer retains most of the information in the original source code.

Assistant Professor Ruoyu (Fish) Wang has received recognition and financial support from the National Science Foundation for his work in mitigating the effects of malware and computer viruses by making them easily understandable. The results of his research could allow analysts and researchers to recover source code in a way that reveals vulnerabilities. Photo by Erika Gronek/ASU

Today, more and more software is developed in high-level programming languages, such as C++, Go, and Rust, because of their many advantages, including faster development and better software engineering practices. More importantly, programs written in high-level languages are compiled into machine code, the basic language of computers, and run at what is known as native speed, the fastest speed at which a program can execute.

Unfortunately, cybercriminals have also joined the transition to high-level programming, which means that an increasing number of computer viruses and malware are written in these languages. However, existing techniques do not allow security analysts and researchers to recover the source code of these malicious programs with satisfactory quality.

Ruoyu (Fish) Wang, an assistant professor of computer science and engineering in the Ira A. Fulton Schools of Engineering at Arizona State University since 2018, is addressing this security issue with a 2022 National Science Foundation Faculty Early Career Development Program (CAREER) award by developing new techniques for source code recovery, a process known as decompilation.

"My project will develop a set of generic, automated decompilation techniques that will turn these virus and malware samples into accurate, concise, and human-readable source code," Wang said. "As an added benefit, this project will enable hardening of software and mitigation of vulnerabilities without access to the software's high-level source code, which will help improve security in scenarios where legacy software is used."

Researchers have worked on binary decompilation for over 25 years, but a critical issue that continues to hamper progress is the lack of a clear metric to assess output quality.

“A fundamental problem, in my view, is that decompilation can lead to many different end goals, such as analyzing software behavior, finding vulnerabilities, generic hardening, patching, and recompiling,” Wang says. “These goals can have very different demands on various aspects of the output.”

With his students and colleagues from the School of Computing and Augmented Intelligence, one of the seven Fulton Schools, Wang will first define a set of sub-goals for each end goal and then create standardized metrics to assess the quality of the decompilation output.

“Guided by these metrics, we will develop new techniques that transform machine code into a high-level intermediate language known as angr IL or AIL,” says Wang. “With different end goals, we may have different goals or make different trade-offs when transforming code.”

Developing a new decompiler for each high-level programming language can be tedious and expensive. With this in mind, Wang and his team will aim to automatically generate programming language-specific decompilation transformation rules using a new technique called Compiler Transformation Inference and Inversion, or CTII.

"We will use the latest advancements in the fields of natural language processing and scalable computing to help generate these transformation rules," Wang said. "We will open source all research artifacts as part of this award. The foundations of our research, angr and the angr decompiler, are already available on GitHub."

Wang's research will take place in ASU's Laboratory of Security Engineering for Future Computing, known as SEFCOM. Wang credits the strong reputations of his SEFCOM colleagues, Assistant Professor Yan Shoshitaishvili, Associate Professor Adam Doupe, and Assistant Professor Tiffany Bao, all of whom are computer science and engineering faculty in the School of Computing and Augmented Intelligence, as one of the reasons his project received NSF funding.

"Our team is well known in the computer security community for conducting open, usable, and reproducible research in binary analysis," says Wang. "I enjoy working with fun, great people who share similar ideologies, and I strongly believe that modern systems research is only possible through a coordinated team effort. My colleagues and I make a great team at SEFCOM and ASU, and I don't see anywhere else where I could experience the same level of productivity through teamwork."
