GenAI for Software Engineering: "If you can specify it, I can synthesise it"
I recently attended the ESEC/FSE conference in San Francisco, where one of the keynote talks was titled "Towards AI-driven software development: challenges and lessons from the field" and delivered by Professor Eran Yahav from Technion. This post summarises my notes on the key messages of the talk.
Part of the context for the talk was a recognition that generative artificial intelligence (GenAI) technologies are seeing increasing usage across all stages of the software development lifecycle. Of course, the extent of that usage is not uniform across stages, with code and test generation seeing the most significant uptake. However, as demonstrated by papers presented in other sessions of the conference, the step change in capability brought by the public release of LLMs over the past year has led researchers to explore GenAI in deployment and maintenance scenarios, for example generating outage reports from runtime log information (Jin et al., 2023) and automating program repair (Wei et al., 2023).
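To make the first of those deployment-time use cases concrete, here is a minimal sketch of the log-to-outage-report idea. Everything in it is my own illustration rather than the approach of Jin et al.: `call_llm` is a placeholder for whatever completion API is in use, and the error-line filtering heuristic is purely for exposition.

```python
# Hypothetical sketch of the log-to-outage-report idea: none of this is
# from Jin et al. (2023); `call_llm` stands in for any completion API.

def draft_outage_report(log_lines: list[str], call_llm) -> str:
    # Keep the prompt concise: forward only error-level lines, capped at 50.
    relevant = [line for line in log_lines if "ERROR" in line or "FATAL" in line]
    prompt = (
        "You are an SRE assistant. Draft a concise outage report "
        "(impact, timeline, suspected root cause) from these log lines:\n"
        + "\n".join(relevant[-50:])
    )
    return call_llm(prompt)

# Usage with a stub model, just to show the shape of the interaction:
if __name__ == "__main__":
    logs = ["INFO boot ok", "ERROR db timeout", "FATAL service crashed"]
    print(draft_outage_report(logs, call_llm=lambda p: f"[draft report, {len(p)}-char prompt]"))
```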
A key point is that automation will be partial: tasks such as generating code from requirements that have been refined into specifications will be fully automated, while key tasks such as requirements elaboration and acceptance testing will rest on human-machine teaming. Such teaming requires the ability to specify the task for the machine precisely, suggesting a future where GenAI agents can say "If you can specify it, I can synthesise it". However, this is only worth doing if the agent can also satisfy the 'rule of delegation':
Cost of specification (input) + Cost of consumption (output) << Cost of manual work
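In symbols (my restatement of the rule; the \(C\) terms are notation introduced here for clarity, not Yahav's):

```latex
% Rule of delegation: delegating to the agent pays off only when the
% cost of writing the specification plus the cost of consuming
% (reading and verifying) the output is much smaller than the cost
% of doing the work manually.
\[
  C_{\text{specification}} + C_{\text{consumption}} \ll C_{\text{manual}}
\]
```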
To satisfy this rule, the mechanism for communication between the human software engineer and the AI agent is key. The agent should be (a hypothetical sketch of these properties follows the list):
- concise: both input and output should be precise and short, and they may sit at different abstraction levels (i.e., the input can be a specification at a different abstraction level from the output). Similarly, the modalities of input and output can differ (e.g., object model diagram in, code out).
- reliable: giving human software engineers confidence that the output is correct without their needing to expend significant effort verifying it. Structuring the output appropriately, providing explanations, and so on can all help reduce this part of the cost of consumption.
- inquisitive: the agent should be able to explore the input/output space and enlist the human software engineer's help to verify the relevance of outputs to the task, analogous to programming by example.
- reflective (self-aware): this requires long-term memory to learn from past interactions and from user feedback on performance, as well as self-evaluation of performance (which links to inquisitiveness). Currently, this is not a property of GenAI-driven agents.
- personalised: the agent's interaction should be tailored to the needs of the individual human software engineer.
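Here is a minimal, entirely hypothetical Python sketch of where each property could surface in a single synthesis call. The class, its methods, and the heuristics are my own illustration, not anything presented in the talk:

```python
from dataclasses import dataclass, field

@dataclass
class SynthesisAgent:
    """Hypothetical sketch, not a real system: one synthesis call
    illustrating the five properties listed above."""
    memory: list = field(default_factory=list)        # reflective: record of past interactions
    preferences: dict = field(default_factory=dict)   # personalised: per-engineer settings

    def _generate(self, spec: str) -> str:
        # Placeholder for the actual model call.
        return f"# code synthesised from: {spec!r}"

    def _explain(self, code: str) -> str:
        # Placeholder explanation, intended to lower the cost of consumption.
        return f"The output implements the spec in {len(code.splitlines())} line(s)."

    def synthesise(self, spec: str) -> dict:
        # concise: a short spec in, code out; abstraction levels and
        # modalities of input and output may differ.
        code = self._generate(spec)
        # reliable: ship an explanation alongside the code so the
        # engineer can verify it cheaply.
        result = {"code": code, "explanation": self._explain(code)}
        # inquisitive: when the spec looks ambiguous, ask rather than
        # guess (in the spirit of programming by example).
        if len(spec.split()) < 3:
            result["question"] = "The spec is terse; can you give an example input/output pair?"
        # reflective: remember the interaction for later self-evaluation.
        self.memory.append((spec, result))
        return result
```

Calling `SynthesisAgent().synthesise("sort it")` would return code, an explanation, and a clarifying question, showing how the properties combine to keep the costs of specification and consumption low.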
> At the reporting workshop @MiroslawStaron discusses the applications of generative AI to software engineering based on recent @FSEconf. "In the future there will be no place for bad programmers."
>
> — Alexander Serebrenik (@aserebrenik), 14 December 2023
Additionally, several papers present surveys of the various applications of AI/ML to software engineering, including:
- Hou, Xinyi, et al. "Large language models for software engineering: A systematic literature review." arXiv preprint arXiv:2308.10620 (2023).
- Fan, Angela, et al. "Large language models for software engineering: Survey and open problems." arXiv preprint arXiv:2310.03533 (2023).
- Ozkaya, Ipek. "Application of Large Language Models to Software Engineering Tasks: Opportunities, Risks, and Implications." IEEE Software 40.3 (2023): 4-8.
Acknowledgements:
Title image created by OpenAI's DALL-E, generated on 15 Dec 2023.