GenAI for Software Engineering: "If you can specify it, I can synthesise it"

Will AI agents become integral members of software engineering teams?

I recently attended the ESEC/FSE conference in San Francisco, where one of the keynote talks was titled "Towards AI-driven software development: challenges and lessons from the field" and delivered by Professor Eran Yahav from Technion. This post summarises my notes on the key messages of the talk.

Part of the context for the talk was a recognition that generative artificial intelligence (GenAI) technologies are seeing increasing usage across all stages of the software development lifecycle. Of course, the extent to which they are used is not uniform across all stages, with the most significant usage being for code and test generation. However, as demonstrated by some of the papers presented in other sessions of the conference, the step change in capability realised by the public release of LLMs over the past year has led to researchers exploring the use of GenAI in deployment and maintenance scenarios, for example generating outage reports based on runtime log information (Jin et al., 2023), and automating program repair (Wei et al., 2023).

A key point is that automation will be partial: tasks such as generating code from requirements that have been converted into specifications can be fully automated, while key tasks such as requirements elaboration and acceptance testing will rely on human-machine teaming. Such teaming requires the ability to precisely specify the task for the machine, suggesting a future where GenAI agents can say "If you can specify it, I can synthesise it". However, delegation is only worthwhile if the agent can also satisfy the 'rule of delegation':

Cost of specification (input) + Cost of consumption (output) << Cost of manual work
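As an illustrative sketch (the function name, cost units, and margin factor are my own, not from the talk), the rule of delegation might be expressed as a simple check, with the "much less than" modelled as a margin factor:

```python
def worth_delegating(spec_cost: float, consumption_cost: float,
                     manual_cost: float, margin: float = 10.0) -> bool:
    """Rule of delegation: delegating to an agent only pays off when the
    combined cost of writing the specification (input) and consuming/
    verifying the output is much lower than doing the work manually.
    The '<<' is modelled here as an arbitrary margin factor."""
    return (spec_cost + consumption_cost) * margin < manual_cost

# 1h to specify + 2h to review vs 20h of manual work:
print(worth_delegating(1, 2, 20))            # False under a 10x margin
print(worth_delegating(1, 2, 20, margin=2))  # True under a 2x margin
```

The point of the margin is that a merely marginal saving does not justify the coordination overhead of delegation; the inequality must hold comfortably.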

To satisfy this, the mechanism of communication between the human software engineer and the AI agent is key. The agent should be:

  • concise: both input and output should be precise and short, working across different abstraction levels (i.e. input can be a specification at a different abstraction level from output). Similarly, modalities of input and output can be different (e.g., object model diagram in, code out). 
  • reliable: giving human software engineers confidence that the output is correct without needing to expend lots of effort to verify it. Structuring the output appropriately, providing explanations, etc., can all help reduce this aspect of the cost of consumption.
  • inquisitive: the agent should be able to explore the input/output space and get help from the human software engineer to verify the relevance of outputs to the task, analogous to programming by example.
  • reflective (self-aware): requires long-term memory to learn from past interactions and user feedback on performance, as well as self-evaluation of performance (links to inquisitiveness). Currently, this is not a property of GenAI-driven agents.
  • personalised: the interaction of the agent should be tailored to the needs of the human software engineer.
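The five properties above could be captured as an interface that a teaming agent would implement. This is a hypothetical sketch; none of the class or method names come from the talk:

```python
from abc import ABC, abstractmethod

class TeamingAgent(ABC):
    """Hypothetical interface for an agent with the five properties above."""

    @abstractmethod
    def synthesise(self, specification: str) -> str:
        """Concise: accept a short, precise spec (possibly at a different
        abstraction level or modality than the output) and return an artefact."""

    @abstractmethod
    def explain(self, output: str) -> str:
        """Reliable: justify the output so the engineer can trust it without
        expending lots of verification effort."""

    @abstractmethod
    def ask(self, candidate_outputs: list[str]) -> str:
        """Inquisitive: query the engineer to check which candidate outputs
        are relevant to the task (cf. programming by example)."""

    @abstractmethod
    def record_feedback(self, interaction_id: str, feedback: str) -> None:
        """Reflective: store user feedback in long-term memory for
        self-evaluation and learning from past interactions."""

    @abstractmethod
    def set_preferences(self, engineer_profile: dict) -> None:
        """Personalised: tailor the interaction to the engineer's needs."""
```

As the talk noted, the reflective property in particular is not something current GenAI-driven agents provide.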

The integration of GenAI agents into the software engineering process also has implications for the architecture of software engineering tools (and of any software that uses LLMs as its base processing layer). Such systems would be composed of agent programs that interact with a long-term memory layer, a context layer, and an LLM layer.

Architecture for GenAI Agent Systems

It was suggested that these layers are analogous to the main memory, registers, and CPU of a classical computing architecture. Taking the analogy further, the CPU itself would likely be replaced by several application-specific integrated circuits (ASICs), which maps to the idea that, instead of a monolithic general-purpose LLM, the architecture would integrate several specialised LLMs, selected according to the context of the task being performed. The rationale is that smaller, specialised models offer better performance and flexibility, and allow the GenAI agents to be customised to the problem and solution domains of the software engineering team. This architecture also suggests the need for an adaptive management layer that monitors and changes the structure and behaviour of the specialised LLMs depending on the context and the quality requirements that need to be satisfied.
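As a minimal sketch of this layered architecture (all class names, model names, and routing logic here are hypothetical illustrations, not from the talk), an agent program might route each task to a specialised model while updating the context and memory layers:

```python
# Hypothetical sketch of the layered agent architecture described above.

class LongTermMemory:              # analogous to main memory
    def __init__(self):
        self.interactions = []
    def remember(self, record):
        self.interactions.append(record)

class ContextLayer:                # analogous to CPU registers
    def __init__(self):
        self.state = {}
    def update(self, key, value):
        self.state[key] = value

class SpecialisedLLM:              # analogous to an ASIC
    def __init__(self, domain):
        self.domain = domain
    def complete(self, prompt):    # stand-in for a real model call
        return f"[{self.domain}] response to: {prompt}"

class AgentProgram:
    """Routes each task to a specialised model based on the task type,
    instead of calling one monolithic general LLM; an adaptive management
    layer could swap or reconfigure the models dict at runtime."""
    def __init__(self, memory, context, models):
        self.memory, self.context, self.models = memory, context, models
    def run(self, task_type, prompt):
        model = self.models[task_type]
        self.context.update("last_task", task_type)
        result = model.complete(prompt)
        self.memory.remember((task_type, prompt, result))
        return result

agent = AgentProgram(
    LongTermMemory(), ContextLayer(),
    {"codegen": SpecialisedLLM("codegen"), "testgen": SpecialisedLLM("testgen")},
)
print(agent.run("codegen", "parse a CSV file"))
```

The routing table plays the role of the adaptive management layer in miniature: which specialised model handles a task is a runtime decision, driven by context rather than baked into the agent.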

One key observation following the talk was that it focussed on the software development lifecycle stages from development onwards, with little to no consideration of requirements elicitation, analysis, and system design. This was not because those stages were considered irrelevant or unimportant, but because they were not the speaker's main area of work. The talk also didn't address the challenges of 'supervisory control' that arise if a GenAI agent is allowed to evolve the code independently (e.g., automatically patching code following failures): such automation can make it harder for the human software engineer to make sense of the codebase when the agent is no longer able to cope.

A broader summary of the use of GenAI in software engineering was presented by Miroslaw Staron from the University of Gothenburg at a recent reporting workshop.

Additionally, several papers present surveys of the various applications of AI/ML to software engineering, including:

  • Hou, Xinyi, et al. "Large language models for software engineering: A systematic literature review." arXiv preprint arXiv:2308.10620 (2023).
  • Fan, Angela, et al. "Large language models for software engineering: Survey and open problems." arXiv preprint arXiv:2310.03533 (2023).
  • Ozkaya, Ipek. "Application of Large Language Models to Software Engineering Tasks: Opportunities, Risks, and Implications." IEEE Software 40.3 (2023): 4-8.

Title image created by OpenAI's DALL-E, generated on 15 Dec 2023.
