Software engineering is on the brink of a revolution with the emergence of large language models (LLMs). LLMs are AI systems that have been trained on large amounts of data, allowing them to generate natural language text and source code.

LLMs allow developers to specify intent using prompts rather than writing complex code to have tasks executed, and they can also take on the task of writing and debugging code itself, enabling developers to focus on higher-level tasks.

We can draw a parallel between this shift in software engineering and the “bitter lesson” in reinforcement learning research described by Richard Sutton: simpler approaches that scale better with more compute will eventually and inevitably outperform more complex approaches.

In this blog post, we will explore how LLMs will change the way we approach software engineering. We will discuss the potential implications of this shift, and the opportunities it presents for software engineers.

Additionally, I have released a proof-of-concept Python package called llm-strategy, based on langchain, which allows developers to use LLMs to implement functions and interfaces in a more visible and direct way. The package includes a decorator that connects to an LLM (such as OpenAI’s GPT-3) and uses it to implement abstract methods in interface classes, forwarding requests to the LLM and converting the responses back into Python data using Python’s dataclasses.

What has happened?

Large language models (LLMs) have recently made significant progress in their ability to understand and follow human instructions.

As a result, software engineering is facing a potential revolution with the emergence of LLMs. These models have the potential to change the way we approach software engineering in two main ways:

  1. LLMs allow developers to specify their intent using prompts, rather than writing complex code.
  2. LLMs can take on the task of writing and debugging code, enabling developers to focus on higher-level tasks.

The llm-strategy package can be useful for prototyping applications without writing a lot of backend code while still having the app react in meaningful ways. It uses the docstrings, type annotations, and method/function names as prompts for the LLM, and it can automatically convert the results back into Python types (currently only dataclasses are supported). It can also extract a data schema to send to the LLM for interpretation. While the llm-strategy package still relies on a fair amount of Python code for serialization, there is the potential to reduce the need for this code in the future by using additional, cheaper LLM calls to automate the parsing of structured data. For example, an interface function can be specified like this:

@llm_strategy(OpenAI)
def query_database(database: Database, query: str) -> Table:
    """Query the database using a natural language query `query` and return
    the resulting table.

    Example
    =======
        >>> query_database(database, "SELECT * FROM EMPLOYEES")
        Table(columns=("employee_id", "name", "address", ...),
              data=[["1123123", "John Miller", ...], [...]])

    Arguments
    =========
    ...
    """
    raise NotImplementedError()
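
When query_database is called, the decorator forwards the function’s name, signature, docstring, and arguments to the LLM and parses the response back into the annotated return type. Conceptually, such a decorator can be sketched in a few lines. The following is a simplified illustration, not the actual llm-strategy implementation; llm_implemented and complete (a callable that sends a prompt string to an LLM and returns its text response) are hypothetical names:

import dataclasses
import inspect
import json


def llm_implemented(complete):
    """Hypothetical sketch of an @llm_strategy-style decorator."""

    def decorator(func):
        def wrapper(*args, **kwargs):
            signature = inspect.signature(func)
            bound = signature.bind(*args, **kwargs)
            # Serialize the arguments, unwrapping dataclass instances.
            arguments = {
                name: dataclasses.asdict(value)
                if dataclasses.is_dataclass(value)
                else value
                for name, value in bound.arguments.items()
            }
            # The function name, signature, and docstring become the prompt.
            prompt = (
                f"Implement `{func.__name__}{signature}`.\n"
                f"Docstring:\n{inspect.getdoc(func)}\n"
                f"Arguments (JSON): {json.dumps(arguments, default=str)}\n"
                "Respond only with the JSON fields of the return value."
            )
            # Parse the response back into the annotated return type
            # (assuming a flat dataclass for simplicity).
            fields = json.loads(complete(prompt))
            return signature.return_annotation(**fields)

        return wrapper

    return decorator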

Example: Mock Customer Database

As an example, we used the llm-strategy package to create a mock customer database viewer with Textual as the console UI. It generates mock customer data using GPT-3 and implements basic lookup functionality using GPT-3 as well. This was achieved by defining the relevant interfaces and dataclasses and providing the necessary docstrings; the rest was handled by calling into LLMs via the llm-strategy package.

Here are some screenshots of the mock customer database viewer in action:

[Screenshot: Customer Database Viewer]
[Screenshot: Searching for a Customer]

Here is an example of the Python code used to create the mock customer database viewer:

from dataclasses import dataclass
from llm_strategy import llm_strategy
from langchain.llms import OpenAI


@llm_strategy(OpenAI(max_tokens=256))
@dataclass
class Customer:
    key: str
    first_name: str
    last_name: str
    birthdate: str
    address: str

    @property
    def age(self) -> int:
        """Return the current age of the customer.

        This is a computed property based on `birthdate` and the current year (2022).
        """

        raise NotImplementedError()


@dataclass
class CustomerDatabase:
    customers: list[Customer]

    def find_customer_key(self, query: str) -> list[str]:
        """Find the keys of the customers that match a natural language query best (sorted by closeness to the match).

        We support semantic queries instead of SQL, so we can search for things like
        "the customer that was born in 1990".

        Args:
            query: Natural language query

        Returns:
            The keys of the best-matching customers in the database, sorted by match quality.
        """
        raise NotImplementedError()

    def load(self):
        """Load the customer database from a file."""
        raise NotImplementedError()

    def store(self):
        """Store the customer database to a file."""
        raise NotImplementedError()


@llm_strategy(OpenAI(max_tokens=1024))
@dataclass
class MockCustomerDatabase(CustomerDatabase):
    def load(self):
        self.customers = self.create_mock_customers(10)

    def store(self):
        pass

    @staticmethod
    def create_mock_customers(num_customers: int = 1) -> list[Customer]:
        """
        Create mock customers with believable data (our customers are world citizens).
        """
        raise NotImplementedError()
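
To see how these pieces fit together, here is a minimal usage sketch (assuming an OpenAI API key is configured and that the @llm_strategy decorator also fills in the unimplemented methods inherited from CustomerDatabase, as in this example; the Textual UI from the screenshots is omitted):

# Minimal usage sketch: the LLM fills in the unimplemented methods.
db = MockCustomerDatabase(customers=[])
db.load()  # calls create_mock_customers(10), implemented by the LLM

# Semantic lookup, implemented by the LLM from the docstring alone.
for key in db.find_customer_key("the customer that was born in 1990"):
    customer = next(c for c in db.customers if c.key == key)
    print(customer.first_name, customer.last_name, customer.age)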

The full example is here.

Richard Sutton’s Bitter Lesson

The emergence of LLMs in software engineering can be seen as a manifestation of Richard Sutton’s “bitter lesson” in reinforcement learning research, which states that simpler approaches that scale well with more compute will eventually outperform more complex approaches. In the context of software engineering, LLMs offer a potentially simpler and more scalable approach to implementing complex tasks, such as writing and debugging code.

As LLMs continue to improve and become more widely adopted, it is likely that they will eventually surpass more traditional, complex approaches to software development in terms of efficiency and effectiveness. This shift towards simpler, more scalable approaches is similar to the trend that Sutton observed in reinforcement learning research, and highlights the importance of staying attuned to advancements in technology and continuously seeking out more efficient ways of solving problems.

LLMs offer a cost-effective and efficient way to realize intent in software engineering. For example, the llm-strategy package in Python allows developers to quickly prototype and experiment, without having to write complex code themselves. LLMs can also generate code quickly, enabling software engineers to focus their time and resources on other aspects of the software development process, such as debugging and fixing code that fails.

This shift towards using LLMs to encapsulate complexity and execute intent in software engineering is reminiscent of the deep learning revolution, which has unlocked new value and opportunities.

However, in the near term, developers still need to weigh the trade-offs between using an LLM and writing code manually. For example, in cases where reproducibility or performance is important, developers still need to write the code themselves. But in other (simpler) cases, using an LLM to execute intent and to debug and fix code will be a more efficient and effective approach.

Language models are changing the way we write software, allowing us to focus more on intent and less on implementation. By adapting to these changes and leveraging the power of LLMs, we can avoid the “bitter lesson” of being left behind as simpler approaches outperform more complex ones.