AI Assistants 3 – The future of AI assistants

Over the past six months, I conducted a within-subjects study involving 24 software developers to observe their interactions with various LLM-based programming assistants. The developers tackled a range of programming tasks, while I collected data for 16 distinct productivity metrics using an activity tracker, self-assessment surveys, and screen recordings. The results of my study, which are summarized in this series of articles, offer a nuanced perspective on the current state of AI programming assistants and their influence on software developers.

Articles: Part 1 | Part 2 | Part 3


💭 The two main beliefs about the future of AI assistants:

▪ Separation: The artificial intelligence we are experimenting with is fundamentally different from human intelligence. Thus, AI assistants will not replace human workers for a long time, or perhaps ever.

▪ Convergence: Artificial intelligence and human intelligence evolve along the same trajectory. The current distinction between the work of AI assistants and that of humans is merely a matter of sophistication. Thus, AI assistants could replace human workers in the long term.

🛠️ Current AI assistants are clearly tools as they are limited by the following characteristics:

▪ Tool-like: Current AI assistants are used like any other tool: humans lead the problem-solving, and AI assistants merely support them by automating operational activities.

▪ Mindless: While current AI assistants generate outputs similar to human outputs, they achieve these results through probabilistic pattern matching instead of intelligent thought processes. At the moment, they do not possess a human-like intellect that includes phenomena like intuition, empathy, and wisdom.

▪ Human-Dependent: AI assistants require precise human guidance to generate correct outputs. Thus, the quality of their answers is largely determined by the abilities of the human who guides and corrects them.

▪ Small-scoped: The context that current AI assistants are able to handle is quite small compared to the scope of real-world projects. Therefore, they are often unable to support complex, large-scale tasks.

🧠 Some sources argue that AI assistants differ from previous tools in ways that could lead to disruptive advancements and real artificial intellect in the future. They put forward the following arguments:

▪ Training larger AI models with more data could lead to true understanding
▪ AI assistants are versatile and they could become all-purpose agents in the future
▪ Neural networks mimic the human brain and develop on the same trajectory

🚧 Nevertheless, there are many strong counterarguments to these claims. I believe that one needs to be careful when projecting intelligence onto AI assistants, as there are many hurdles, some going far beyond computer science, that have to be overcome before truly intelligent assistants can be achieved:

▪ AI assistants need to become better at reasoning and begin to truly understand information
▪ AI assistants need to be able to make sense of massive contexts
▪ AI assistants need to be able to autonomously find coherent solution paths to abstract goals
▪ AI assistants need to be able to understand and empathize with human perception
▪ AI assistants need to be able to access and learn from implicit knowledge
▪ AI assistants likely need to exhibit intellectual mechanisms that we do not understand

The big debate about the future of AI assistants

The introduction of LLM-based assistants and the discovery that they can generate human-like outputs sparked a debate about their future impact. Two main viewpoints exist in recent discussions on this subject.

Separation: One viewpoint suggests that the artificial intelligence we are experimenting with is fundamentally different from human intelligence. According to this view, even significant advancements in technology will not result in AI assistants possessing human-like intellect. Both human intellect and the capabilities of AI assistants advance, but each in its own lane. Humans will be able to think on an increasingly abstract level, while AI assistants will become increasingly advanced tools. Humans remain the driving intelligence, while AI assistants remain supporting tools. Consequently, the tasks performed by humans and AI assistants will remain separate for a considerable time, possibly even indefinitely.

Convergence: The other side argues that, because of the unique potential of technologies like LLMs, artificial intelligence can advance rapidly over the next years and become increasingly similar to human intelligence. Eventually, we would reach a point where human intelligence and artificial intelligence converge. In this scenario, the relationship between humans and AI assistants changes over time, as increasingly sophisticated assistants could become the driving intelligence behind projects. Thus, humans and AI assistants could perform identical tasks in the long term, becoming virtually indistinguishable from each other.

I plan to explore these two viewpoints using the example of software developers and current LLM-based programming assistants. The field of software development is a suitable example as it includes tasks that span a wide range of cognitive complexity, from simple tasks like typing out code according to pre-defined specifications to complex ones like designing the architecture of a whole system.

With software development as an example, this article first outlines the two main beliefs about the relationship between AI assistants and humans. Then, using my study and existing literature, I establish a clear definition of what AI assistants currently are, which is rooted in empirical data and the technologies that we already know. Lastly, I discuss how AI assistants might evolve and affect us in the future to contribute to the larger debate about AI assistants.

💭 Separation or Convergence?

It is important to clearly define the two highlighted perspectives to understand their roots, what they involve, and their implications. I will outline them by considering the arguments put forward by various organizations, leaders, essayists, and developers. I highly recommend checking out these reads for yourself!

Separation of humans and AI assistants

In an essay called “AI and the Automation of Work”, the analyst and essayist Benedict Evans draws parallels between AI assistants and past automation waves, arguing that humans and AI assistants will remain distinct for a considerable time. He reasons that since the Industrial Revolution, automation has merely shifted human work towards more complex tasks, rather than replacing human intellect. In his opinion, LLMs are just another automation wave, accelerating routine tasks in white-collar jobs without causing a fundamental shift.

Peter Körte, the Chief Technology and Strategy Officer of Siemens, paints a similar picture in the Siemens Industrial AI Podcast. He outlines how robotics introduced automation on the shop floor, thus impacting blue-collar workers. Even though machines are widely used in production companies nowadays, blue-collar workers are still needed. The machines did not replace the workers; rather, they augmented them so they could solve repetitive tasks more easily. From his point of view, artificial intelligence will play out the same way, just in the domain of white-collar work. In both cases, the new technology is simply another step in the automation of routine tasks, which enables workers to focus on high-level activities that require problem-solving, creative, or empathic capabilities.

Doug Seven and Sandeep Pokkunuri, who lead the development of the LLM-based programming assistant CodeWhisperer at Amazon, share a similar perspective on the impact of their AI assistant in software development. They highlight that CodeWhisperer can provide great support to developers by performing repetitive tasks like generating code. However, one needs to be aware that generating code is just a small part of developers’ actual work. At their core, software developers are problem solvers, and this intellectual skill is unaffected by AI assistants. Developers will remain the driving intelligence in software development projects, or as Sandeep Pokkunuri says: “The paradigm shift is going to happen not in the core programming software development process. We are traveling on the same road. Instead of going on a bicycle, you’re going on a Ferrari”.

The broad consensus of these voices is that there is a clear distinction between humans and AI assistants. Human intelligence and artificial intelligence occupy two different lanes and stand in an intelligence-tool relationship. Human intelligence comes with unique features that put it in the driver’s seat of our work and personal lives. Artificial intelligence, on the other hand, is a technology that can be leveraged in tools to automate the operational tasks we face. However, the lanes do not converge: AI assistants become increasingly sophisticated but stay in their lane as tools.

Convergence of humans and AI assistants

Contrary to the previous perspective, some sources suggest a more dramatic change where AI assistants will quickly evolve and their intelligence will match that of humans. They expect these changes to happen not in the distant future, but quite soon.

In March 2023, a petition surfaced calling for a halt to the advancement of artificial intelligence beyond the capabilities of GPT-4. This petition, backed by numerous public figures, attracted widespread attention. The accompanying document, “Policy Making in the Pause”, put forth several proposals. While many of them, like the establishment of an independent AI auditing process and regulatory frameworks, were not novel, some statements stood out. One such statement was that “advanced AI could represent a profound change in the history of life on Earth”. This claim was justified by the absence of natural laws or barriers that would hinder AGI, suggesting that AI assistants could become “competitive with humanity’s best at almost any task […] sooner than many expected”. The petition clearly expresses the viewpoint that AI assistants are not merely tools. Instead of improving in their own lane, they will start to swerve into the lane of human intellect and could take on the driving role in work and daily life.

A Business Insider article titled “The End of Coding as We Know It” elaborates on this point of view by contemplating the full replacement of developers in the future. It cites several voices who believe that current AI technology is geared towards replacing humans rather than enhancing their capabilities, potentially leading to the replacement of human developers within 5 – 10 years. The article mentions the Medium post “ChatGPT Will Replace Programmers Within 10 Years” by Adam Hughes, which lays out a hypothetical timeline of how the author believes AI assistants will take over software development completely by replacing human intelligence with artificial intelligence.

Researchers at Microsoft who experimented with an early version of OpenAI’s GPT-4 have also fueled the notion of a new form of intelligence emerging, as evidenced by the title of their paper: “Sparks of Artificial General Intelligence”. The paper states: “The central claim of our work is that GPT-4 attains a form of general intelligence, indeed showing sparks of artificial general intelligence. This is demonstrated by its core mental capabilities (such as reasoning, creativity, and deduction), its range of topics on which it has gained expertise (such as literature, medicine, and coding), and the variety of tasks it is able to perform (e.g., playing games, using tools, explaining itself)”. These conclusions suggest that GPT-4 has achieved a new level of general intelligence, aligning with previous predictions that artificial intelligence is becoming increasingly similar to human intelligence.

The highlighted voices do not draw a clear distinction between human intelligence and artificial intelligence. According to their argumentation, AI assistants and humans evolve along the same trajectory. Therefore, the current distinction between AI assistants and human intelligence is merely a matter of sophistication, and artificial intelligence could rapidly progress to catch up to human intellect.

🛠️ Current AI assistants are clearly tools

Despite the different long-term predictions, both beliefs agree that current AI assistants, such as Copilot and ChatGPT, are merely tools. The capabilities of these assistants have been thoroughly researched, and their limitations have been outlined. In this section, I will use insights from my research and related studies, which are presented in this article, to establish a clear description of how current AI assistants should be characterized based on empirical evidence.

Tool-like: In the studies, both researchers and participants classified AI assistants as tools rather than the intelligent helpers their names might suggest. The assistants are seen as additions to a developer’s existing toolbox, akin to tools like Stack Overflow, GitHub, code formatters, etc. This is further illustrated by the fact that existing non-AI tools specialized in a single domain, such as automatic program repair, often outperform LLM-based assistants.

Mindless: Despite their proficiency in generating code and suggestions, participants were quick to realize that AI assistants lack a human-like thought process. They are useful pattern-matching and reasoning engines, which makes their outputs seem intelligent. Nevertheless, they do not possess a human-like intellect that includes phenomena like intuition, empathy, and wisdom. For example, participants noted that they themselves use these human traits to visualize users and their behavior, which allows them to design software that is valuable and intuitive to human users. AI assistants do not possess such abilities, as they simply try to find probable outputs that fit into the given code context. I also highlighted this characteristic in another article, where I experimented with LLM agents that program an application autonomously. The interface of the application became increasingly incoherent as the application grew, since the LLM agents were not able to define and follow an overall direction or vision for it.
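The phrase “probable outputs that fit into the given code context” can be made concrete with a deliberately tiny sketch. It assumes a toy bigram model as a stand-in: real LLMs are neural networks conditioned on much longer contexts, so this is only an analogy. The code counts which token follows which in a made-up corpus and then greedily appends the most frequent successor. Nothing in it represents users, goals, or intent; it only reproduces patterns it has seen. The names `train_bigram`, `most_probable_next`, and `complete` are illustrative and not part of any real assistant.

```python
from collections import Counter, defaultdict

# Toy analogy only: a bigram "model", not how production LLMs are built.

def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def most_probable_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

def complete(counts, prompt, max_tokens=5):
    """Greedily extend the prompt with the most probable next token."""
    out = list(prompt)
    for _ in range(max_tokens):
        nxt = most_probable_next(counts, out[-1])
        if nxt is None:
            break  # no pattern to match, so the "model" has nothing to say
        out.append(nxt)
    return out

# A tiny, made-up "training corpus" of code tokens.
corpus = "for i in range ( n ) : total += i".split()
model = train_bigram(corpus)
print(" ".join(complete(model, ["for"])))  # prints: for i in range ( n
```

The completion looks like plausible code only because the same sequence appeared in the corpus; the model has no idea what `n` is or why the loop exists.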

Human-Dependent: The lack of a self-determined direction also makes AI assistants dependent on humans. Participants in the studies noted that current AI assistants require exact guidance from a developer who is literate in the programming language. When interacting with AI assistants, developers must craft prompts precisely, review recommendations thoroughly, and know how to leverage the suggestions in the right way to create sophisticated and coherent applications.

Small-scoped: The context that AI assistants are able to process is quite small compared to the scope of real-world projects. Participants noted that using the assistant to generate single lines of code worked reliably, but more abstract prompts to generate large functions introduced ambiguity and often led to incorrect code. Furthermore, they found similar limitations when using AI to tackle conceptual problems such as defining the overall architecture of a system. Because of the limited context size, the AI assistants could not grasp all the complex dependencies, interfaces, and data structures needed to make an informed decision about how to structure the application.
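The limited context can be illustrated with a simplified model. Assume the assistant sees only a fixed number of tokens immediately before the cursor; this is an assumption, since real tools assemble their prompts in more elaborate, product-specific ways. In the hypothetical file below, the definition of `Order` sits far above the cursor, so it falls outside the window and the assistant cannot know which fields an order has. The function `visible_context` and the file contents are illustrative only.

```python
def visible_context(tokens, cursor, window):
    """Return only the `window` tokens immediately before the cursor.

    Simplified model of a fixed-size context window: everything earlier
    is invisible to the assistant.
    """
    start = max(0, cursor - window)
    return tokens[start:cursor]

# Hypothetical file: a data structure is defined far above the line
# where the developer asks for a completion.
file_tokens = (
    ["class", "Order", ":", "id", "total"]      # definition at the top
    + ["pass"] * 50                              # unrelated code in between
    + ["def", "refund", "(", "order", ")", ":"]  # cursor sits after this
)
cursor = len(file_tokens)
ctx = visible_context(file_tokens, cursor, window=16)
print("Order" in ctx)  # prints False: the definition fell outside the window
```

Enlarging `window` would bring the definition back into view in this toy file, but in a real repository the relevant definitions, interfaces, and data structures may be spread across many files.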

Keeping these characteristics in mind helps to demystify current AI assistants. Interacting with them makes it clear that they are versatile but mindless, human-dependent, and small-scoped tools. While they positively affected the developers in my study by increasing their speed, satisfaction, and understanding of the code, there is no evidence to suggest that they are more than tools. Thus, their current impact is limited by the fact that they only augment a human-driven workflow.

But what about the future?

The two opposing viewpoints presented at the beginning of this article predict a fundamentally different development of artificial intelligence in the future. One side argues that there are substantial hurdles that prevent AI assistants from evolving beyond their current status as tools. The other side suggests that artificial intelligence could eventually achieve a level of intellect on par with humans, implying that AI assistants could potentially replace white-collar workers in nearly any task. While I personally agree with the former, I want to explore both sides for the sake of this discussion.

🧠 AI assistants are different from previous tools

In the past, there was no speculation about whether tools like online forums or code formatters would replace human developers. It was widely understood that these resources and extensions are simply tools, improving within their distinct areas. Therefore, it is important to understand what makes AI-based tools unique and why some sources believe these unique characteristics will allow them to evolve past their current status as tools.

Training larger AI models with more data could lead to true understanding

On the surface, the output of language models like GPT-based assistants can be indistinguishable from human-generated content. For instance, Brown et al. (2020) conducted a study in which participants found it challenging to differentiate between news articles written by humans and those generated by GPT-3.

The “Sparks of Artificial General Intelligence” paper on GPT-4 argues that these human-like outputs are not just reproductions of the training data but stem from the fact that the neural network developed general structures for understanding information during training. Thus, LLMs might be able to develop increasingly abstract skills through further training, a capability that previous tools lacked. With increasingly sophisticated models and increasingly large data sets, very general reasoning structures could form, potentially equipping AI assistants with human-like understanding and intellectual abilities.

AI assistants are versatile and they could become all-purpose agents in the future

In addition to their capabilities in generating text, code, and multimedia, researchers have discovered numerous intriguing additional use cases for LLMs. For instance, LLMs can serve as reasoning engines in autonomous agents, enabling these agents to simulate human behavior in virtual environments. LLM-based agents have also been used to control physical laboratory equipment and conduct experiments in the real world.

Unlike previous tools that were limited to specific use cases, LLM-based tools display versatility across various domains. This phenomenon has given rise to the idea that highly advanced AI assistants could handle diverse challenges, which could make them all-purpose agents that are able to navigate our complex world on their own.

Neural networks mimic the human brain and develop on the same trajectory

While previous tools like code formatters use fixed logical structures such as decision trees, AI assistants are based on neural networks. This sets them apart, because some people subscribe to the idea that intelligence arises solely from the way our brain is structured. In this view, the brain is a self-contained system that transforms the inputs we receive through perception into outputs, namely our behavior. If the human brain is just an input-output system, then one can speculate that this system can be rebuilt using neural networks in computers.

Joseph Carlsmith published an interesting report called “New Report on How Much Computational Power It Takes to Match the Human Brain”, which is based on this belief. Working from the assumption that the neural networks in computers and the neural structures of our brains are comparable, Carlsmith estimates how much computing power a computer needs to solve tasks as well as the human brain. Based on this report, one might conclude that creating intelligent AI assistants simply requires a massive increase in computational power to match the capabilities of the human brain.

Nevertheless, Carlsmith cautions that constructing intelligence is likely not just a “hardware” problem. He highlights that building true intelligence involves not only constructing the fundamental structure of the system but also designing the “software” that operates within it. Thus, we would need to re-engineer aspects such as consciousness, the subconscious, emotions, intuitions, and other brain phenomena in order to construct software that is able to “run” the artificial brain.

🚧 Being different does not make AI assistants intelligent

Even though the presented arguments might sound compelling, one needs to be careful about labeling something as intelligent. The fact that AI assistants are able to mimic human outputs does not prove that they can transition from stochastic tools to genuinely intelligent agents. Many researchers criticize the previous arguments for being too simplistic and for projecting human characteristics onto current AI assistants, which are merely “dumb” probabilistic algorithms.

With the current state of AI assistants serving as a foundation, I want to engage in a small thought experiment. In this thought experiment, I will start with the current state of AI assistants, envision their future development towards genuinely intelligent AGI assistants, and highlight the obstacles that stand in the way of achieving this goal. Personally, I believe that these hurdles indicate that AI assistants will likely remain tools. However, by reflecting on the challenges I present, you can form your own perspective on the likelihood of truly intelligent AI assistants emerging.

AI assistants need to become better at reasoning and begin to truly understand information

AI assistants have made significant progress in reasoning, but further improvement is necessary to approach human-level intelligence. Personally, I believe that advanced reasoning capabilities with LLMs are feasible, given the continuous improvement of model performance. Notably, GPT-4 has already shown remarkable advancements compared to GPT-3. However, LLMs may hit barriers when reasoning about certain topics involving ethics, emotions, or values.

Additionally, researchers need to definitively prove that AI models truly understand information rather than simply replicating patterns in the training data. Melanie Mitchell, a professor at the Santa Fe Institute who researches artificial intelligence, points out that AI models produce non-humanlike errors, which suggests that the models do not truly understand the information they are processing. The debate on whether a fundamental component of understanding is missing in neural networks, or whether their limited understanding can be corrected with more sophisticated models, is still ongoing.

AI assistants need to be able to handle massive contexts

Advancements in reasoning capabilities likely need to be accompanied by improvements in how the assistants handle context. Current AI programming assistants, for example, can only process the small chunk of code currently opened in the editor. However, constructing sophisticated software systems inevitably involves piecing modules and components together into one coherent system.

Senior developers solve such tasks by thinking on an abstract level, using visualization, and considering huge contexts to make decisions. Current LLM-based assistants are not able to make such decisions, because the massive context they would need to consider cannot be communicated to them effectively. Furthermore, Liu et al. (2023) demonstrated that expanding the context size of LLMs does not necessarily result in more precise outputs. Therefore, we need new techniques that can effectively break down complex and voluminous information, such as a code repository, so that only the chunks relevant to the current task are fed to the LLM.
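One family of techniques for this is retrieval: split the repository into chunks, score each chunk against the current task, and pass only the best-scoring chunks that fit into a size budget. The sketch below assumes plain keyword overlap as the relevance score and a character budget; production systems more commonly use embedding similarity and token budgets, so this illustrates the idea rather than how any particular assistant works. The repository contents and the functions `tokenize` and `select_chunks` are hypothetical.

```python
import re

def tokenize(text):
    """Crude stand-in for real tokenization: lowercase identifier-like words."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))

def select_chunks(chunks, task, budget):
    """Pick the chunks most relevant to `task` without exceeding `budget`.

    `chunks` maps a chunk name (here: a file path) to its text.
    `budget` is the maximum total number of characters passed to the model.
    """
    task_words = tokenize(task)
    # Assumed relevance score: number of distinct words shared with the task.
    scored = [
        (len(task_words & tokenize(text)), name, text)
        for name, text in chunks.items()
    ]
    scored.sort(key=lambda entry: entry[0], reverse=True)

    selected, used = [], 0
    for score, name, text in scored:
        if score == 0:
            break  # remaining chunks share no words with the task
        if used + len(text) > budget:
            continue  # does not fit; a smaller, less relevant chunk still might
        selected.append(name)
        used += len(text)
    return selected

# Hypothetical mini-repository, one chunk per file.
repo = {
    "billing/invoice.py": "def create_invoice(order): total = order.total",
    "billing/refund.py": "def refund(order): order.total = 0",
    "ui/theme.py": "PRIMARY_COLOR = '#336699'",
}
print(select_chunks(repo, "fix the refund of an order total", budget=200))
# prints ['billing/refund.py', 'billing/invoice.py']
```

Lowering `budget` to 40 keeps only `billing/refund.py`, the most relevant chunk, instead of blindly truncating the repository. The hard part the paragraph above points to is the scoring step: deciding what is relevant to an abstract task is exactly where keyword overlap, and often more sophisticated methods too, fall short.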

I personally believe that this problem is hard to solve, as we know quite little about how our brains store and retrieve information. Current databases that reside in huge data centers seem quite primitive compared to our small organic brains that store decades of memories and can retrieve relevant information either subconsciously or consciously to solve tasks at hand. Without knowing how memories, intuition, and other storage processes in our minds work, it seems difficult to build an artificial intelligence that can handle a comparable context size.

AI assistants need to be able to autonomously find coherent solution paths to abstract goals

As the capabilities of LLM-based programming assistants continue to improve, there is a growing need for them to operate with greater autonomy. As observations by Mozannar et al. (2022) and my study show, humans are in the driver’s seat when working with current AI assistants and spend a lot of time guiding and correcting them. However, solving high-level tasks will likely require more autonomous agents that can break down a complex problem into small steps, research and conceptualize how to solve those steps, and then execute their plan.

The research of Park et al. (2023) and Boiko et al. (2023) is noteworthy in this regard, since the authors successfully used autonomously running LLMs to simulate character behavior and to interact with lab equipment, respectively. While these approaches seem promising, I am still skeptical whether LLM agents can tackle abstract goals with unknown paths to success. Human-like intelligence includes crafting long-term visions and acting on them with a coherent direction over long periods of time. It is unclear whether AI assistants can follow long-term visions through rational reasoning alone or whether other concepts like intuition, motivation, or intrinsic values are needed. After all, the real world is not completely logical, and many problems cannot be broken down into a logical step-by-step plan. If the path to success is unknown, humans often proceed based on their instincts rather than logical reasoning.

During the initial phases of novel software projects, for example, developers typically dedicate weeks to brainstorming sessions, engaging in debates to explore different abstract approaches for solving the given problem. In my opinion, the nature of these tasks is quite explorative and value is created from the developers’ ability to think creatively beyond a strictly logical plan.

AI assistants need to be able to understand and empathize with human perception

Our world is defined by our perception. For example, human developers constantly visualize users and real-life use cases when writing code. They imagine who will use the application and in which context they will engage with it in order to predict the tasks, interaction patterns, and goals of the users. Only by taking on the perspective of the users are developers able to program an application that is useful to humans and matches the way humans think. When programming autonomously, LLM agents struggle to create applications that are coherent to humans, as they are trained only on the code itself and not on the real-world context that the original developers had in mind.

In my opinion, it is quite difficult to make AI models understand human behavior and real-world contexts. AI assistants will inevitably need to comprehend concepts like motivation, emotions, and experiences in order to behave like humans. Human workers instinctively incorporate these elements into their work, and for AI to replicate this behavior, it likely will have to possess a similar understanding.

AI assistants need to be able to access and learn from implicit knowledge

The chemist and philosopher Michael Polanyi argued that human intelligence is based on a combination of implicit (tacit) and explicit knowledge. In contrast, current large language models (LLMs) are trained solely on explicit information.

To illustrate this further, let’s consider the analogy of software developers programming user interfaces. When developers create interfaces, they draw inspiration from existing interfaces but more importantly, they also empathize with users by envisioning how they would interact with the application. On the one hand, the developers produce the code, the interface of an application, and supporting documentation as the explicit information that is written down and readily available. On the other hand, the thought process of envisioning the users and their use context also produces implicit knowledge, which only resides in the mind of the developer.

As LLMs are trained on explicit knowledge, they only consider the explicit code of other interfaces but do not learn the implicit thought processes that led to the development of the interface. In other words, the AI assistants are trained on WHAT humans produced but never WHY humans produced this. Programming code, the same as natural language, is a simplified representation of our human experience. Thus, I believe that even if artificial intelligence has the capability to mirror human intelligence, it certainly would have to work with all the information humans perceive. It seems logical to me that if we keep on training LLMs on explicit representations, we will only teach them to reproduce these representations. In order to train LLMs to think like humans, we need to give them access to the implicit information we process consciously or subconsciously. This seems like a hard problem to me as it is not clear how we can process or transfer implicit information and debates about this topic quickly reach the limits of philosophy.

AI assistants likely need to exhibit intellectual mechanisms that we do not understand

In his blog article “Why transformative AI is really, really hard to achieve”, DeepMind engineer Zhengdong Wang points out that, from philosophical, psychological, and neuroscientific perspectives, we are still quite unsure about what intelligence truly is. He notes that we have been debating related philosophical questions, like morality, consciousness, and personal identity, for thousands of years.

Furthermore, human intelligence comes with mysterious mechanisms like intuition, emotion, and wisdom. These mechanisms clearly matter for our intelligence: developers, for example, often report that they are most productive in a state of “flow”. In this state, they do not actively reason about solutions; instead, the solutions emerge naturally from their subconscious or intuition.

Melanie Mitchell adds to this point in her paper “Why AI is Harder Than We Think”. In the paper, she argues that intelligence should not be seen as a disconnected process that takes place in a confined input-output system like the brain. Instead, she provides examples highlighting that intelligence could be closely intertwined with our organic body. Thus, to replicate human intelligence, one might need to understand not only the function of the brain but also how intelligence is connected to the whole human body.

All of these arguments seem plausible to me, since past inventions were often understood on a theoretical level before they could be turned into concrete implementations. This is probably also true for intelligence, which means we first have to understand its natural occurrence in humans and solve the problems mentioned above before we are able to replicate this phenomenon in computers.

🚪 Closing thoughts

In conclusion, current AI assistants can best be described as tools. I hope that using software developers as a relatable example illustrates the difference between how humans and AI assistants generate outputs. While human developers produce code that deliberately advances a code base in line with an abstract, coherent, and human-centered vision, current AI assistants merely fill gaps in small context windows with the most probable code snippets.

Predictions about the development of truly intelligent AGI agents, and the timeline in which they will emerge, differ drastically depending on which fundamental belief you subscribe to. Personally, I think that we are far away from AI assistants that can compete with human workers. As long as we do not fully understand our own intelligence on a philosophical, psychological, and neuroscientific level, it seems unlikely that we can rebuild it using computers. Thus, it is important not to be blinded by the human-like outputs of current AI assistants and not to project human thought processes onto a machine that does not think the way we do.