Junior Developer GPT 2 – Programming a browser game just by chatting

This article is part of a small blog series called Junior-Dev-GPT, in which I explore ideas for turning LLMs into autonomous junior developers.

Articles: Part 1 | Part 2 | Part 3

ChatGPT but it returns applications?

The game showcased in the video was developed entirely by LLMs. In this browser-based game, the player controls a tank and navigates a randomly generated map. The objective is to shoot projectiles at enemies while avoiding being hit by them.

I did not write, modify, or debug any of the code myself. I only had a back-and-forth conversation with the LLMs, in which I acted as a non-technical client providing natural language instructions like: “Make a browser-based game where the user can control a tank with WASD keys.”, “Add a barrel that points into the direction that tank is driving”, “Make the tank shoot projectiles out of the barrel”, etc.

Recap

In the previous article, I experimented with fully autonomous LLMs to achieve a specific objective – collaborating and developing an application. To ensure effective communication, I implemented a fixed communication structure that utilized five differently prompted LLMs. This approach allowed me to generate simple Python applications with graphical user interfaces like a calculator, a to-do app, an image viewer, and a drawing application.

Although the LLMs succeeded in generating functional code (which was a time-consuming process), the complexity of the applications remained relatively low. Development stagnated at around 250 lines of code. Furthermore, as the applications became more complex, the LLMs were increasingly likely to drift in an odd direction, resulting in features that did not fit together.

The problem of aligning LLMs with human expectations

As mentioned in the previous article, it is hard to align the LLMs with human expectations. Without human guidance, the LLMs will come up with something but often not with the result you expect, especially on long autonomous tasks like developing an application.

In my proof of concept, the ability to evaluate the application from a user perspective instead of a code- and error-based perspective is missing. In order to create applications that meet human expectations, the LLMs must comprehend the overall purpose, context, concepts, and relationships associated with the application.

This task is quite challenging, as it involves comprehending the application’s user base, the specific context in which they use the application, and their subjective, human assessments of these concepts.

Currently, I have not come across any evidence indicating that an LLM or any other technology can evaluate an application from a user’s perspective as effectively as a human. The closest I found was the startup AdeptAI, which utilizes a model capable of interacting with GUIs to tackle simpler tasks. However, autonomous models that can navigate GUIs like humans are still considerably far from being fully developed.

Adding a human into the loop

Luckily, new technology is not the only way to introduce this missing component into my proof of concept, JuniorDev-GPT. There is also a much simpler way to get a human perspective into the tool: adding me to the communication flow.

To achieve this, I switched things up in my autonomous coding tool to also allow human input to enter the communication flow. Instead of just having the LLMs stream their output to a web interface, I programmed an interactive tool that runs on your local PC and allows the user to chat with the LLMs that develop the application. This way the human user can give feedback on each iteration of the application.
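The internals of JuniorDev-GPT are not shown in this article, so the following is only a minimal sketch of what such a human-in-the-loop flow could look like. The function names `run_feedback_loop` and `fake_llm` are hypothetical, and the LLM call is replaced by a stand-in so that the example runs without an API key. The only detail taken from the article is the message prefix, which matches the feedback log further down.

```python
from typing import Callable, List

# Prefix as it appears in the feedback log of this article.
FEEDBACK_PREFIX = "The user gave the following feedback: "


def run_feedback_loop(
    feedback_items: List[str],
    generate_code: Callable[[List[str], str], str],
) -> List[str]:
    """Feed each piece of human feedback to the code generator in turn.

    generate_code receives the message history and the current code and
    returns the next version. Every version is kept, oldest first, so the
    human can run and judge each iteration.
    """
    history: List[str] = []  # everything the human has said so far
    current_code = ""        # empty until the first iteration
    versions: List[str] = []
    for feedback in feedback_items:
        history.append(FEEDBACK_PREFIX + feedback)
        current_code = generate_code(history, current_code)
        versions.append(current_code)
    return versions


def fake_llm(history: List[str], current_code: str) -> str:
    """Hypothetical stand-in for the real LLM call (assumption, not the tool's API)."""
    return current_code + f"// iteration {len(history)}: {history[-1]}\n"


if __name__ == "__main__":
    for version in run_feedback_loop(
        ["Make a tank game.", "Add a barrel."], fake_llm
    ):
        print(version)
```

The design point is that the human only ever writes plain sentences, while the loop carries the growing history and the latest code forward into every new request.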

This is what the interface looks like. On the left is the new version of JuniorDev-GPT, and on the right is a VSCode instance with the workspace folder opened in which the LLMs program autonomously. All you have to do is chat with JuniorDev-GPT in the left window using natural language, and the LLMs will automatically program for you. If you are curious, you can watch what the LLMs are programming in VSCode on the right and execute each iteration of the application to test it.

First results

The LLMs were able to implement the browser-based tank game that can be seen in the video at the beginning of the article. In terms of size, the maximum complexity was reached after 18 iterations (that is, I gave feedback 18 times), which produced a final code base of 438 lines. Development cost about 80 cents in OpenAI API fees, and I chatted with JuniorDev-GPT for about 45 minutes to get to this result.
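To put these numbers into perspective, the small calculation below breaks them down per iteration. It uses only the figures stated above; since the cost and the chat time are approximate ("about"), the results are rough averages and not measurements of individual iterations.

```python
# Figures from the article; cost and time are approximate.
iterations = 18
lines_of_code = 438
cost_cents = 80
minutes = 45

cents_per_iteration = cost_cents / iterations
lines_per_iteration = lines_of_code / iterations
minutes_per_iteration = minutes / iterations

print(f"{cents_per_iteration:.1f} cents per iteration")
print(f"{lines_per_iteration:.1f} lines of final code per iteration")
print(f"{minutes_per_iteration:.1f} minutes per iteration")
```

On average, that is roughly 4.4 cents, 24 lines of final code, and 2.5 minutes per round of feedback.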

Complexity Ceiling

The LLMs were able to correctly implement the following concepts into one coherent game: tank, enemies, projectiles, walls, and score.

  • Tank: The LLMs implemented a tank that can drive forward and backward in a straight line and rotate to change direction.
  • Enemies: The LLMs understood the concept of enemies that move freely and head toward the tank in order to attack. They also implemented the relationship between the tank and the enemies: a collision between the tank and an enemy means the game is over.
  • Projectiles: The LLMs accurately modeled projectiles as objects that are propelled from the tank in its direction of travel and keep moving forward after launch. They also programmed the projectiles to always originate from the front of the tank and to eliminate enemies upon collision.
  • Walls: I was also able to communicate the concept of walls to the LLMs. The walls were implemented as randomly generated blocks that can overlap to build more sophisticated structures. The LLMs were able to model the relationship between the walls and the other concepts. For example, I defined that tanks cannot move through walls, while enemies and projectiles can move through walls to make the game harder.
  • Score: The LLMs grasped the idea of a score and effectively incorporated its features into the gameplay by adding the score to the canvas and to the game over alert. They also comprehended the connection between projectiles, enemies, and the score, wherein hitting an enemy with a projectile increases the score.
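To make the relationships between these five concepts concrete, here is a minimal sketch of the underlying logic. It is not the code the LLMs generated (that code runs in the browser and is not reproduced in this article); it is written in Python for brevity, and all names, the point-based wall check, and the circular hit test are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


def inside(x, y, wall) -> bool:
    """wall is (left, top, width, height); True if the point lies within it."""
    left, top, width, height = wall
    return left <= x <= left + width and top <= y <= top + height


@dataclass
class Tank:
    x: float
    y: float
    heading: float = 0.0  # radians; 0 points along the positive x-axis

    def rotate(self, delta: float) -> None:
        """A / D keys: turn the whole tank, barrel included."""
        self.heading += delta

    def drive(self, distance: float, walls=()) -> None:
        """W / S keys: move along the heading; a negative distance reverses.

        The move is skipped if it would end inside a wall.
        """
        nx = self.x + math.cos(self.heading) * distance
        ny = self.y + math.sin(self.heading) * distance
        if not any(inside(nx, ny, wall) for wall in walls):
            self.x, self.y = nx, ny


@dataclass
class Projectile:
    x: float
    y: float
    vx: float
    vy: float

    def step(self) -> None:
        """Keep flying in the direction fixed at launch; walls are ignored."""
        self.x += self.vx
        self.y += self.vy


def fire(tank: Tank, barrel_length: float, speed: float) -> Projectile:
    """Spawn a projectile at the barrel tip, moving along the tank's heading."""
    dx, dy = math.cos(tank.heading), math.sin(tank.heading)
    return Projectile(
        tank.x + dx * barrel_length, tank.y + dy * barrel_length,
        dx * speed, dy * speed,
    )


def chase(ex: float, ey: float, tank: Tank, speed: float):
    """Move an enemy one step straight toward the tank; walls are ignored."""
    dist = math.hypot(tank.x - ex, tank.y - ey)
    if dist == 0:
        return ex, ey
    return ex + (tank.x - ex) / dist * speed, ey + (tank.y - ey) / dist * speed


def collides(ax, ay, bx, by, radius) -> bool:
    """Circular hit test used for projectile/enemy and enemy/tank contact."""
    return math.hypot(ax - bx, ay - by) <= radius


def resolve_hits(projectiles, enemies, radius):
    """Remove every enemy touched by a projectile; return (survivors, points)."""
    survivors = [
        e for e in enemies
        if not any(collides(p.x, p.y, e[0], e[1], radius) for p in projectiles)
    ]
    return survivors, len(enemies) - len(survivors)
```

In a game loop, each frame would call `drive` or `rotate` depending on the pressed keys, `step` for every projectile, `chase` for every enemy, add the points from `resolve_hits` to the score, and end the game as soon as `collides` reports contact between an enemy and the tank.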

I was not able to introduce the additional concepts of powerups and shields, since the overall code base had become too complex for the LLMs. The LLMs did add some parts of the powerups and shields to the game, but previous concepts were dropped or altered as a consequence. After the 18th iteration (which produced the final code base I used in this article), 10 more iterations followed in which no progress was made. Instead, already implemented concepts got swapped out. Some of those versions had working walls, some had working shields, and others had randomly spawning powerups, but no version managed to combine all concepts in one code base, and the code never grew beyond 440 lines.

My input into JuniorDev-GPT

As mentioned earlier, my sole objective was to provide input as a client who wanted the application to be programmed. Therefore, I used simple, non-technical language and descriptions. The following statements outline the feedback I provided during the approximately 45-minute programming process.

The user gave the following feedback: Make a browser based game where the user can control a tank with WASD keys. W and S are for driving forwards and backwards, A and D for rotation.
The user gave the following feedback: Add a barrel that points into the direction that tank is driving.
The user gave the following feedback: The barrel does not point into the direction that the tank is driving.
The user gave the following feedback: The barrel should be fixed onto the tank.
The user gave the following feedback: The barrel rotates when the tank rotates but around its own axis. This does not make sense. The barrel and tank are one fixed entitity that rotates as a whole.
The user gave the following feedback: The barrel needs to be rotated 90 degress relative to the tank.
The user gave the following feedback: The barrel should be a little more towards to front to the tank so it sticks out. Also make the tank shoot projectiles out of the barrel.
The user gave the following feedback: Uncaught ReferenceError: projectileActive is not defined at render (main7-0.html:107:13) at gameLoop (main7-0.html:118:13) at main7-0.html:123:9.
The user gave the following feedback: Once a projectile is shot, it should move away from the tank. Always in the direction that the tank is facing.
The user gave the following feedback: I want to be able to shoot an unlimited amount of projectiles. Also they are still shooting out of the side of the tank and not out of the front.
The user gave the following feedback: Make the projectiles shoot into the direction that the tank is facing. The direction that the tank is facing is the direction it moves when the users presses W.
The user gave the following feedback: Spawn enemies which move towards the tank from all sides.
The user gave the following feedback: Make the enemies follow the tank.
The user gave the following feedback: If the projectiles collide with an enemy, the emeny should be removed.
The user gave the following feedback: If an enemy hits the tank, show a game over alert. The user should accept the alert and then the game resets.
The user gave the following feedback: Reset the key inputs on game over. Also add a score. Each time a enemy is hit by a projectile, increase the score by one. Show the score in the game over alert.
The user gave the following feedback: Show the score on the canvas while the user is playing.
The user gave the following feedback: Randomly spawn walls on the canvas. The tank should not be able to move through those walls but has to drive around them.

Ideas for the future

The first results are interesting because they reveal a window of about 400 lines of code, or about five abstract concepts, that the LLMs are able to implement purely in response to natural language instructions.

While the browser game is an easy-to-understand example, one could probably fill the 400 lines of code with something more useful. I think there are some interesting use cases out there where small application sizes like this can be used to solve worthwhile tasks, especially when thinking about creating little automation scripts, connectors for integrations, and maybe even custom-made dashboards. If you have interesting ideas that I should try out – let me know!

On a technical level, this proof of concept could still be improved by the same approach that I shared in the previous article. I will probably stick to the human-in-the-loop application and not let the LLMs run without oversight, since the usefulness of the produced application is much higher if a human guides the development, as shown in this article.