This past Thursday I finally broke through my AI skepticism and spent the day trying to answer the question “Can ChatGPT write software?” to my own satisfaction. I don’t want to bury the lede here, so the short answer is “yes, ChatGPT can write software”. The long answer is, of course, more nuanced. So, let me tell you what I did, and let you form your own opinion from there.
My test of ChatGPT was to ask the system to write Conway’s Game of Life (Life), a cellular automaton simulation, which I have written several times in various languages, and which I have used as the subject of programming interviews over the years. The reason I have used this as a test of programmers, and languages, is that the core of the program is an incredibly simple set of rules, while the options for optimizing and extending the program in interesting ways are almost limitless. I was particularly interested in finding out if ChatGPT could do this without my having to tweak the code myself. In other words, I didn’t want to know if ChatGPT could help me write the code. I wanted to know if it could write and modify the code with only my prompts.
I started by simply telling ChatGPT that I wanted it to write Life in Python with no additional features. The response, much to my surprise, was a working program that used NumPy and Matplotlib to generate a version of the game in a 50 by 50 grid. The solution was not complete spaghetti code, but imposed rather good coding structures, and was simple to read.
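For readers who have not seen Life before, the core rule fits in a few lines. What follows is my own minimal sketch using NumPy, not the code ChatGPT produced; the function name `step` and the wrap-around edges are assumptions made for illustration.

```python
import numpy as np


def step(grid: np.ndarray) -> np.ndarray:
    """Return the next generation of a 2-D board of 0s (dead) and 1s (alive)."""
    # Count the eight neighbours of every cell by summing shifted copies of
    # the board. np.roll wraps around the edges, so this board is a torus
    # (an assumption; the generated program may treat edges differently).
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A live cell survives with two or three live neighbours.
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    # A dead cell comes alive with exactly three live neighbours.
    born = (grid == 0) & (neighbours == 3)
    return (survive | born).astype(grid.dtype)
```

Everything else in a Life program (display, controls, saving) is built around repeatedly calling a function like this one.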
My next step, once I had a basic working program in hand, was to prompt ChatGPT to modify the code to add useful features. The first change I wanted to see was the ability to configure certain parameters of the game. Initially I asked ChatGPT to add the ability to define the size of the game grid, which it did instantly. I was less than delighted to see that the initial implementation accepted the size as an optional command line argument in the form of a bare number. I’d have preferred that ChatGPT use the argparse library, so that the user would get useful help messages about the input arguments, and I instructed ChatGPT to change this implementation, which it did without issue.
Next I requested that the program be modified to support arguments that allow a user to specify the interval between the display of new generations of the game (the initial code regenerated the board 5 times per second), and the number of generations of the game to be generated. While ChatGPT did add these arguments, the initial implementation did not stop after the defined number of generations. I told ChatGPT about the problem and it promptly fixed it.
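To make the command-line options from the last two paragraphs concrete, here is a rough sketch of how they might be declared with argparse. The flag names and the helper `should_stop` are hypothetical; only the 50 by 50 default grid and the 5-generations-per-second default (200 ms) come from the program described above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the options; argparse generates the --help text for free."""
    parser = argparse.ArgumentParser(description="Conway's Game of Life")
    parser.add_argument("--size", type=int, default=50,
                        help="width and height of the square grid")
    parser.add_argument("--interval", type=int, default=200,
                        help="milliseconds between generations")
    parser.add_argument("--generations", type=int, default=None,
                        help="stop after this many generations "
                             "(default: run until the window is closed)")
    return parser


def should_stop(generation: int, limit) -> bool:
    """True once the requested number of generations has been shown."""
    # The first version ChatGPT wrote accepted the limit but never checked it;
    # a test like this one in the update loop is the missing piece.
    return limit is not None and generation >= limit
```

A caller would run `args = build_parser().parse_args()` once at startup and consult `should_stop` on every frame.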
After that I moved on to a feature that I believe to be essential to any implementation of Life: the ability to store previous generations of the board and allow the user to scroll back through them. The initial implementation ChatGPT came up with allows the user to press the spacebar to pause the game, and the left arrow key to scroll back through previous generations. When the spacebar is pressed again, the game resumes from the last viewed generation. This all worked as expected.
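The bookkeeping behind that scroll-back behavior can be sketched roughly as below. The `History` class and its method names are hypothetical, and discarding the later generations on resume is my assumption about one reasonable way to "resume from the last viewed generation", not a description of the generated code.

```python
class History:
    """Keeps every generation so the viewer can step backwards while paused."""

    def __init__(self, first_board):
        self.boards = [first_board]
        self.cursor = 0  # index of the generation currently on screen

    @property
    def current(self):
        return self.boards[self.cursor]

    def back(self):
        """Left arrow: show the previous generation, stopping at the first."""
        if self.cursor > 0:
            self.cursor -= 1
        return self.current

    def advance(self, next_board):
        """Record a newly computed generation and move the cursor onto it."""
        # If the user scrolled back before resuming, drop everything after
        # the cursor so the game continues from the board being viewed.
        del self.boards[self.cursor + 1:]
        self.boards.append(next_board)
        self.cursor += 1
        return next_board
```

In the real program the spacebar would toggle a paused flag, `back` would be called from the left-arrow handler, and `advance` from the animation loop.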
At this point I started to feel more like a product manager. I was simply feeding requirements, and errors, to ChatGPT, then waiting for the changes. So, my next request was to ask ChatGPT how I should attribute the code it was writing. Interestingly, it suggested that I license the code under the MIT open source license, with a note that the code was generated by ChatGPT and modified by me. The attribution also suggested that I be listed as the copyright holder.
I spent a couple more hours asking ChatGPT to add other important features, including the ability to tune the likelihood of cells being populated at the start of a new game, saving a particular generation of the game to a file and loading that saved data as a starting point for a new game, and adding useful information, like how to control the program, to the game’s display. The final result is shown in the screenshot below.
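As a rough illustration of the density and save/load features just mentioned, a NumPy version might look like the sketch below. The function names, the `density` parameter, and the plain-text file format are my assumptions for the example, not details taken from the generated program.

```python
import numpy as np


def random_board(size: int, density: float, seed=None) -> np.ndarray:
    """Create a size x size board where each cell is alive with probability `density`."""
    rng = np.random.default_rng(seed)
    # rng.random yields values in [0, 1); comparing against density gives
    # roughly that fraction of live cells.
    return (rng.random((size, size)) < density).astype(int)


def save_board(grid: np.ndarray, path: str) -> None:
    """Write the board as rows of 0s and 1s in a plain text file."""
    np.savetxt(path, grid, fmt="%d")


def load_board(path: str) -> np.ndarray:
    """Read a board written by save_board, for use as a starting generation."""
    # ndmin=2 keeps a one-row board two-dimensional.
    return np.loadtxt(path, dtype=int, ndmin=2)
```

A text format has the side benefit that a saved generation can be opened and edited by hand before it is loaded again.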
This review of ChatGPT’s coding abilities has been fairly glowing thus far, so I want to spend a little time talking about what didn’t work so well. First of all, ChatGPT seemed to have a glitch where, after a while, it would fail to print the entire contents of the program. When I asked it to regenerate the program, it would fail over and over again to print the last 20 to 50 lines. I got around this by giving it a copy of the main functions I wanted it to modify and telling it to only show the salient changes.
Second, there were numerous times when I asked it to add a new piece of functionality and it did as I requested, but in the process removed other features that had been working. I got around this by informing ChatGPT that it had broken existing functionality, and it generally corrected the mistakes pretty well.
The biggest issues I ran into came while working on the feature to save existing generations, and when generating text within the Matplotlib rendering to explain to the user what was happening. ChatGPT decided that the ‘s’ key should be used for saving data, which conflicted with an existing Matplotlib keybinding for saving images. The effect was that a generation of the board would be saved as text, but a dialog box would also pop up on screen asking you to name and save an image file. Trying to explain to ChatGPT what was happening and getting it to fix the problem was surprisingly difficult. It eventually came up with the correct solution of removing the native keybinding for ‘s’, which allowed the functionality that I wanted for the ‘s’ key.
I also had a bit of trouble explaining to the program that I wanted to include labels in the plot screen that told the user what keys to use for various functionality, and that displayed useful information messages, like the name of a saved file and which generation of the board was currently being rendered. ChatGPT got stuck on the idea of using a legend, which is implemented in Matplotlib as a rendering that obscures a portion of the plot. For Life this meant not being able to see what was happening in part of the simulation. I also struggled to get ChatGPT to properly include messages about saved files. Initially, ChatGPT simply rendered new save messages directly on top of the previous messages. This is the only place where I actually gave up on having ChatGPT do the coding and adjusted the location and re-rendering of messages myself.
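Both of the Matplotlib problems described above have fairly small fixes. The sketch below shows one common way to handle each; the helper names are hypothetical, the text coordinates are arbitrary, and this is not necessarily what ended up in the final program.

```python
import matplotlib
from matplotlib.figure import Figure


def release_save_key(key: str = "s") -> None:
    """Remove `key` from Matplotlib's built-in save-image shortcut."""
    # rcParams["keymap.save"] lists the keys that open the save-image dialog.
    # Calling this early, before the figure is created, is the cautious choice.
    matplotlib.rcParams["keymap.save"] = [
        k for k in matplotlib.rcParams["keymap.save"] if k != key
    ]


def add_status_line(fig: Figure):
    """Reserve a strip under the plot and return one reusable text artist."""
    # Shrinking the axes leaves room for text that never covers the board,
    # which is the problem a legend causes.
    fig.subplots_adjust(bottom=0.15)
    return fig.text(0.5, 0.04, "", ha="center")


def show_message(status, message: str) -> None:
    """Replace the current message instead of drawing a new one on top of it."""
    status.set_text(message)
```

Keeping a single text artist and calling `set_text` on it is what avoids the pile-up of overlapping save messages; the next redraw of the animation picks up the new string.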
I should probably note that during this experiment I was using the free version of ChatGPT. I’ve read that the paid version, GPT-4, I think, is supposedly much better. I frankly don’t care enough at this moment to pay $20 for a month to explore the differences. Perhaps I will at some point in the future.
Overall, I was surprised how effective ChatGPT was at completing this task. Not only was it able to generate the lion’s share of the code I requested with nothing more than my prompting, but it generally did a good job of explaining the changes it was making. When prompted about its reasons for using Matplotlib and NumPy as the core of the program, it correctly responded that these libraries are faster than native Python data structures and include useful support for rendering the game board as plots.
I’m not really sure what I think about all of this. It will take me a while to digest the possibilities. In the near term, I plan to see how well ChatGPT works for turning this simple application into a multi-platform mobile app using something like Flutter. I’m curious to see how well it works using a less common programming language, and one that I have no personal experience with. I’ll post an update with my findings.
My parting thought for this blog post is that AI chatbots are clearly much more powerful than I have given them credit for, but what this means for programmers, or society as a whole, is beyond me. This was not a terribly difficult program for an experienced programmer to write, and I have no idea how well an AI chatbot would do with implementing something truly complex like SQLite, or the Linux kernel. I suspect not as well, and I will leave it to an expert on those systems to determine how useful an AI chatbot can be for that type of development.
To view the code generated by ChatGPT, with very minor tweaks on my part, see this gist.