research agenda

vision:

The modern world has embraced cyber-physical systems. From small implanted medical devices to large-scale industrial plants, from electrical smart grids to autonomous vehicles, and from the Internet-of-Things to wearable electronics, cyber-physical systems are everywhere. However, increased adoption brings increased risk, as demonstrated by the year-on-year rise in cyber-attacks on this infrastructure. Despite the pervasiveness of these devices, their cybersecurity remains an afterthought, with other competing design requirements taking precedence. This challenge motivates my research focus: we can and must do better in understanding and improving the cybersecurity of cyber-physical systems throughout their complete product lifecycle. My current emphasis in this area is twofold: firstly, the emerging implications of machine learning within the design space of a given system; and secondly, attack methodologies and novel defensive mechanisms in modern (Industry 4.0) cyber-physical systems. My interest in these topics stems from a broader research vision of end-to-end, first-class cybersecurity frameworks and design flows that promote correct-by-construction engineering and automated tooling wherever possible.

recent, ongoing, and future research:

Implications of AI / Large Language Models (LLMs) in Software and Hardware Development:

Recent advances in program synthesis by AI-based LLMs (such as GPT-3, Codex, and Jurassic-1) are enabling products like GitHub Copilot to transform the ways that developers write code. Despite their novelty and still-restricted access, these tools are quickly seeing mass adoption (e.g., GitHub reports that nearly 30% of new code in common languages committed to its platform is now written with Copilot, and that 50% of developers who have tried it since its technical preview launched in June 2021 have continued using it). However, this adoption is not without risk. As an early adopter of GitHub Copilot, I performed a study demonstrating that up to 40% of its generated code in cybersecurity-relevant contexts contained potentially exploitable vulnerabilities, indicating that fundamental questions about how these LLMs are developed and used still need consideration. Further consequences can occur downstream when these language models are used to write hardware in languages like Verilog, as bugs introduced at this stage of a product’s design may have catastrophic consequences for the security of the end system. One solution may be to pair an LLM with a suitable security-aware toolchain; another may be to augment the training of LLMs to discourage vulnerable outputs.

In addition to this fundamental question, there are many other angles this research may consider. For instance, given that LLMs can produce prodigious quantities of functional code from program descriptions, can they be used for automated vulnerability repair? In this domain, state-of-the-art tooling relies on exhaustive searches over program mutations generated from sets of encoded rules. Because LLMs can produce new code given suitable prompting, they offer a new way to perform this mutation and may be able to find unconventional (‘outside-the-box’) repair strategies. To investigate this, I designed an experiment that paired ensembles of LLMs with automated functional and security checkers to produce an automated program repair pipeline. This pipeline patched 100% of a proof-of-concept set of synthetic vulnerability examples, as well as 58% of vulnerabilities from a selection of historical bugs in real-world open-source projects. Most impressively, the investigated models achieved this in the zero-shot setting, that is, without being specifically trained for this purpose. Another angle, supported in part by funding from the Office of Naval Research, investigated the use of LLMs for reverse engineering: rather than producing code, we explored whether an LLM can produce equivalent documentation and extract key information about given program snippets from the malware and industrial cyber-physical systems domains. Here, the LLM answered a narrow majority (51%) of ‘Q&A’-style questions correctly, again in the zero-shot setting.
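The repair pipeline described above follows a generate-and-validate pattern. The Python sketch below is a simplified illustration under stated assumptions, not the experimental pipeline itself: the "models" are hypothetical stand-in functions rather than real LLM APIs, and the two string-matching checkers are crude placeholders for the functional test suites and security analysis tools a real pipeline would use. It only shows how candidates from an ensemble can be filtered through both kinds of checker.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical interfaces: a "model" maps a prompt to candidate patches,
# and a "check" accepts or rejects a candidate source string.
Model = Callable[[str], List[str]]
Check = Callable[[str], bool]


def repair(vulnerable_src: str, models: Iterable[Model],
           functional_check: Check, security_check: Check) -> Optional[str]:
    """Generate-and-validate loop: query each model in the ensemble and
    return the first candidate that passes both checkers, else None."""
    prompt = "// Fix the vulnerability in the following code:\n" + vulnerable_src
    for model in models:
        for candidate in model(prompt):
            if functional_check(candidate) and security_check(candidate):
                return candidate
    return None


# Toy stand-ins (hypothetical) for real LLMs and real checkers.
VULNERABLE = "strcpy(buf, input);"
PATCH = "strncpy(buf, input, sizeof(buf) - 1);\nbuf[sizeof(buf) - 1] = '\\0';"


def model_a(prompt: str) -> List[str]:
    return [VULNERABLE]  # returns the vulnerable code unchanged


def model_b(prompt: str) -> List[str]:
    return [PATCH]  # proposes a bounded copy instead


def still_copies_input(src: str) -> bool:
    return "buf" in src and "input" in src  # crude functional proxy


def no_unbounded_strcpy(src: str) -> bool:
    return "strcpy(" not in src  # crude security proxy
```

For instance, `repair(VULNERABLE, [model_a, model_b], still_copies_input, no_unbounded_strcpy)` rejects `model_a`'s unchanged code and returns `model_b`'s bounded copy. Trying each model in turn means a weak member of the ensemble costs only extra queries, since its candidates must still pass both checkers before being accepted.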

What’s next? Given the expanding range of vulnerabilities in hardware and embedded software, developing the next generation of trustworthy and reliable tooling is extremely important. I believe that security-aware language models will be a large part of this solution, and my focus will be on two areas within this: bug detection (can language models find vulnerabilities in written code?) and bug mitigation (can they repair them?). This could involve the creation of bespoke language models, especially for register-transfer level (RTL) languages such as Verilog and VHDL. Open challenges include the ‘prompt engineering’ of model inputs to find representations of designs that are amenable to analysis, and defining datasets of working examples to validate model performance (especially for hardware designs, where the available hardware CWE examples are too few for most ML tasks). Other avenues I intend to explore include the consequences of LLM adoption (does it increase or decrease the incidence of bugs?), data poisoning and information leakage attacks (will an LLM unwittingly reproduce errors or sensitive information from its training dataset?), and stylometry and developer impersonation (can malicious developers alter code to introduce vulnerabilities that appear to have been written by other authors?). I am particularly interested in new collaborations to explore these questions, especially with industry partners, building on those I have already fostered with OpenAI and Micron.

Cybersecurity of Embedded and Cyber-Physical Systems:

There is an ever-increasing use of complex embedded systems across many different domains: manufacturing, electrical grids, transport, aerospace, and more. However, as reliance on these systems and their complexity have grown, so too have their connectivity and attack surface. Where industries once restricted access to their control systems to local human operators, internet-connected systems are now the norm in most environments, and complex computer controllers now feature full operating systems and programming support. Numerous hacks have already been documented in the literature (and, in spectacular cases, in the media), including attacks on consumer vehicles, uranium centrifuges, electrical generators, steel mills, and water/wastewater processing plants. As such, research in this area has the potential for great impact within the multi-billion-dollar worldwide Industry 4.0 domain.

My recent work in this area has primarily been in the hardware and software Trojan domain. Firstly, within a project supported by the National Science Foundation, I have been examining how Trojans can alter the outputs of additive manufacturing, and I designed a firmware Trojan (under 1.7 KB in size) that could subtly degrade the parts produced by commercially available 3D printers (reducing printed part tensile strength by up to 50%). I am also part of a larger team, supported by funding from the Department of Energy, that is performing a design space exploration of hardware Trojans in the printed circuit boards (PCBs) of industrial programmable logic controllers (PLCs), covering both their insertion and novel defensive and detection strategies. I took responsibility for the design of our bespoke OpenPLC-based experimental hardware platform, on which we have successfully demonstrated both the deployment and the detection of Trojans.

I have also examined how approaches from the formal methods domain can be applied to cybersecurity in this space. Using run-time enforcement hardware, I demonstrated that small hardware extensions could guarantee a minimum quality of service from an implantable medical device (specifically, a pacemaker) even in the face of a malicious attacker; this relied on a new, formal approach for compiling protection logic into Verilog. Extending this work, I then demonstrated how the approach could guarantee certain run-time safety properties in industrial applications (such as smart grid substation control). Crucially, because mechanisms for ensuring security can themselves contain security holes, my methodology enabled automated model checking of the enforcers after synthesis, guaranteeing that the enforcer implementations met their designs.
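As a rough illustration of what a run-time enforcer does, the Python sketch below models a synchronous enforcer sitting between a (possibly compromised) pacemaker controller and the heart, editing the controller's proposed pacing output each tick so that two simplified timing properties always hold. This is an assumption-laden toy: the class name, the tick-based timing, and the single-chamber `lri`/`uri` (lower and upper rate interval) properties are hypothetical simplifications, and the work described above compiles enforcers into Verilog hardware rather than software.

```python
from dataclasses import dataclass


@dataclass
class PacingEnforcer:
    """Toy synchronous run-time enforcer (hypothetical, simplified).

    Each tick it observes the sensed input `vs` (ventricular sense) and the
    controller's proposed output `vp` (ventricular pace), then edits `vp` so:
      P1: no pace occurs fewer than `uri` ticks after the last ventricular event;
      P2: at most `lri` ticks pass without a ventricular event.
    Assumes uri < lri.
    """
    lri: int             # lower rate interval, in ticks (hypothetical units)
    uri: int             # upper rate interval, in ticks (hypothetical units)
    since_last: int = 0  # ticks since the last ventricular event

    def step(self, vs: bool, vp: bool) -> bool:
        self.since_last += 1
        if vs:
            # A natural beat was sensed: reset the timer and inhibit pacing.
            self.since_last = 0
            return False
        if vp and self.since_last < self.uri:
            vp = False  # P1: suppress a pace that would come too early
        if not vp and self.since_last >= self.lri:
            vp = True   # P2: force a pace once the deadline is reached
        if vp:
            self.since_last = 0
        return vp
```

For example, with `lri=5` and `uri=2`, a controller that never paces still yields a pace every 5 ticks, while one that tries to pace on every tick is throttled to every 2nd tick. Because the enforcer only edits outputs that would violate a property, a well-behaved controller passes through unchanged.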

What’s next? My future lines of inquiry in this area seek to continue both threads. While ML-based approaches using side-channel leakage profiles have shown promise for Trojan detection, catching subtle Trojans with complex activation triggers remains an open challenge, prompting further inquiry into other kinds of measurement-based detection. Expanding the defensive side of hardware security to include trusted systems that guarantee a minimal set of safety properties will involve modifying the internal software layers of these systems and/or changing their I/O modules and design. I will also consider other industrial directions, including the cybersecurity implications of manufacturing with collaborative robot (‘cobot’) platforms that share workspaces with human collaborators.

Privacy-preserving hardware and shielding:

My most recent research area examines novel shielding for preventing unwanted electromagnetic interference (EMI) emission and absorption, where we have proposed MXenes for this purpose. We have found that very thin layers (less than 3 µm) can achieve shielding in excess of industry-standard copper layers (35 µm). Along similar lines, I have also begun considering a ‘digital shield’ implemented in hardware to provide fine-grained protection over potentially sensitive data.