The project "String Constraint Solving with Real-World Regular Expressions" has recently been funded by EPSRC. The project will run for three years and include collaborations in the UK, China, Germany, and Sweden.
Strings in programming languages are sequences of characters that represent text. They are a fundamental aspect of information representation: user names, passwords, and most other textual data are handled as strings.
The manipulation of strings, however, can also lead to subtle programming errors that affect both program correctness and information security. For example, a malicious user may enter computer code as part of their username. If the system is not sufficiently secure, this code may be executed, allowing the user to compromise the system. The Open Web Application Security Project lists this kind of attack among the top 10 application security risks.
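As a rough illustration of the attack described above, the following sketch builds a database query by string concatenation. The choice of SQL and Python's built-in sqlite3 module, the table layout, and the function names are all assumptions made for this example; they are not taken from the project itself.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database with two users (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, username):
    # The username is pasted directly into the query text, so any
    # SQL it contains becomes part of the query.
    query = "SELECT secret FROM users WHERE name = '" + username + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, username):
    # A parameterised query keeps the username as data, never as code.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (username,))]


if __name__ == "__main__":
    conn = make_db()
    malicious = "nobody' OR '1'='1"
    print(lookup_unsafe(conn, malicious))  # leaks every user's secret
    print(lookup_safe(conn, malicious))    # matches no user
```

A test-case generator that reasons accurately about strings could, in principle, discover an input like the malicious username automatically rather than relying on a developer to think of it.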
Although this kind of attack is well known, it has proved surprisingly difficult to avoid because of the complex nature of computer programs. Ideally, programming mistakes are caught during testing. However, manual testing is a tedious and time-consuming process that requires the developer to imagine every possible user input. Automatic test-case generation can take this burden away from the developer and allow more complete testing to be done more efficiently. However, this relies on the test-case generation tools being able to reason accurately about how software will run.
This project will focus on how software deals with strings. Typically, "regular expressions" are used for this purpose. However, current research takes an idealised view of regular expressions that omits many important features of the regular expressions provided by modern programming languages. We will address this shortcoming both in the theory of computer science and in practice. In particular, we will create a tool-chain that provides better test-case generation for software dealing with strings. These tools will be tested on real-world code provided by our industrial partners.
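To give a flavour of the gap between idealised and real-world regular expressions, the sketch below uses Python's re module to show three widely available features that classical regular expressions lack: backreferences, lookahead, and greedy versus lazy capture groups. The specific patterns and names are illustrative assumptions, not examples drawn from the project.

```python
import re

# Backreference: \1 must repeat exactly the text captured by group 1.
# The set of "doubled" words is not a regular language, so no classical
# regular expression can describe it.
DOUBLED = re.compile(r"([a-z]+)\1")

# Lookahead: (?=.*\d) checks that a digit occurs somewhere ahead
# without consuming any characters.
NEEDS_DIGIT = re.compile(r"(?=.*\d)[a-z\d]{6,}")

# Greedy and lazy quantifiers accept the same strings overall but
# capture different substrings, which matters when the program
# goes on to use the captured text.
GREEDY = re.compile(r"<(.+)>")
LAZY = re.compile(r"<(.+?)>")


def is_doubled(text):
    return DOUBLED.fullmatch(text) is not None


def has_required_digit(text):
    return NEEDS_DIGIT.fullmatch(text) is not None


def first_tag(pattern, text):
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(is_doubled("abcabc"), is_doubled("abcabd"))
    print(has_required_digit("secret1"), has_required_digit("secrets"))
    print(first_tag(GREEDY, "<a><b>"), first_tag(LAZY, "<a><b>"))
```

A tool that models only the idealised operators (concatenation, alternation, and repetition) cannot predict what these patterns accept or capture, and so cannot generate inputs that reliably reach the code depending on them.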