During the summer of 2015 I became aware of a project by Google called DeepDream. This project uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia. The results of this processing are fascinating and eerily resemble a psychedelic experience. As soon as I saw the first image coming out of this project I knew I had to learn more.
My research into the project led me down numerous paths, along which I learned more than I ever imagined about artificial intelligence and the work being done in that field using graphics processing units (GPUs).
It was during this research that I discovered a paper by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge titled "A Neural Algorithm of Artistic Style". The process this team developed uses a convolutional neural network to take the aesthetic of one image and apply it to another. That is, you could take a painting by Vincent van Gogh and apply its style to a regular photo. At a very high level, the network produces two separate representations of an image, one capturing its style and the other capturing its content. The algorithm then generates a new image that matches the style representation of one image and the content representation of the other. Amazing.
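For the curious, here is a rough sketch of the two quantities the paper balances against each other. It is a toy illustration, not the paper's implementation: real feature maps come from a pretrained convolutional network, whereas the tiny hand-written matrices below are made up, the normalization constants are simplified, and the default weights are arbitrary placeholders. The point is that style is compared through a Gram matrix of channel correlations, which ignores where things are in the image, while content is compared position by position.

```python
from typing import List

# Rows are feature channels, columns are pixel positions (toy stand-in for CNN features).
Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Correlation of every channel with every other channel; spatial layout is discarded."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
            for row_i in features]


def mean_squared_difference(x: Matrix, y: Matrix) -> float:
    """Average squared element-wise difference between two same-shaped matrices."""
    diffs = [(a - b) ** 2 for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y)]
    return sum(diffs) / len(diffs)


def content_loss(generated: Matrix, content: Matrix) -> float:
    """Compares features position by position, so layout matters."""
    return mean_squared_difference(generated, content)


def style_loss(generated: Matrix, style: Matrix) -> float:
    """Compares Gram matrices, so only channel co-occurrence matters."""
    return mean_squared_difference(gram_matrix(generated), gram_matrix(style))


def total_loss(generated: Matrix, content: Matrix, style: Matrix,
               content_weight: float = 1.0, style_weight: float = 100.0) -> float:
    """Weighted sum that an optimizer would try to minimize (weights are placeholders)."""
    return (content_weight * content_loss(generated, content)
            + style_weight * style_loss(generated, style))


# Made-up features: 2 channels x 3 positions, and the same values with positions shuffled.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shuffled = [[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]

print(style_loss(features, shuffled))    # 0.0: same "style", layout ignored
print(content_loss(features, shuffled))  # 2.0: different "content", layout matters
```

In the full method an optimizer repeatedly adjusts the pixels of the generated image to push the combined loss down, which is where all the computation goes.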
After I saw the results of this algorithm I knew I had to start making my own images. The problem with creating these images is that they require a lot of computational power. The algorithm can be run on a CPU, but with only a handful of cores available much of the work ends up running one step after another, so anything beyond a trivial resolution takes a prohibitive amount of time. This is where GPU processing comes into play. By utilizing a parallel computing platform - such as NVIDIA's CUDA - the processing can be spread across the thousands of cores available on a modern GPU. The processing still takes a while but it is orders of magnitude faster than running it on a CPU.
Because I have never been much of a gamer, the GPUs I had on hand were older and not up to the task, so I started my multi-thousand dollar journey into the world of GPUs by buying a GTX970. With the 1,664 CUDA cores available on the GTX970 I figured I would be good to go; however, I quickly learned that the neural networks being constructed are memory hungry, and the 4GB available on my GTX970 limited the output size of any images I could render to around 600px X 600px. The ideas that I had for applying this technology required high-resolution images, and thus I went deeper down the GPU rabbit hole.
Not wanting to be disappointed again I went straight to the top and acquired a TitanX video card with 12GB of memory. To my surprise the output images produced by the TitanX started to max out around 2000px X 2000px. With upscaling algorithms I am able to get the images up to around 4000px X 4000px, which works for my purposes, but I had thought the 3072 CUDA cores and 12GB of memory I bought for $1700 would have enabled me to produce much larger images. C'est la vie.
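A bit of arithmetic helps explain why resolution is so expensive: the network has to process every pixel, and the pixel count grows with the square of the side length. The snippet below only counts pixels for the sizes mentioned above. How pixel count translates into actual memory use depends on the model, backend and settings, so treat it as rough intuition rather than a memory model.

```python
def pixel_count(side_px: int) -> int:
    """Number of pixels in a square image with the given side length."""
    return side_px * side_px


def relative_pixels(from_side_px: int, to_side_px: int) -> float:
    """How many times more pixels the larger square has than the smaller one."""
    return pixel_count(to_side_px) / pixel_count(from_side_px)


for side in (600, 2000, 4000):
    print(f"{side}px square: {pixel_count(side):,} pixels")

# Doubling the side length from 2000px to 4000px quadruples the pixel count.
print(relative_pixels(2000, 4000))  # 4.0
```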
Another unforeseen problem with shoving all this computing power into standard PC cases was the heat. At first I tried to modify the cases in an effort to increase the ventilation, but that proved to be a futile exercise (although I did have some fun taking a Sawzall to my PC case). I knew that air flow was the name of the game, so I took a cue from the Bitcoin community and built myself a milk crate machine.
With the processing power in place I started working on the pipeline. I started with the neural-style implementation by jcjohnson and was able to get some promising first results. There was a lot of trial and error involved in exploring the different parameters and how different style images affected the outcome. With a bash script and a little Lua hacking I was able to start running batches of images to home in on the aesthetics I liked.
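To give a flavor of what such a batch run can look like, here is a minimal sketch that builds one neural-style command per combination of content image, style image and style weight. It is written in Python purely for illustration and is not the bash and Lua setup described above. The flags are the ones I recall from the neural-style README; the file names, weight values and output naming scheme are hypothetical. The sketch only prints the commands, it does not run them.

```python
import itertools
import shlex
from pathlib import Path
from typing import List, Sequence


def build_commands(content_images: Sequence[str],
                   style_images: Sequence[str],
                   style_weights: Sequence[int],
                   image_size: int = 512,
                   out_dir: str = "output") -> List[List[str]]:
    """Return one neural-style invocation per (content, style, weight) combination."""
    commands = []
    for content, style, weight in itertools.product(content_images, style_images, style_weights):
        # Encode the parameters in the file name so results are easy to compare later.
        out_name = f"{Path(content).stem}__{Path(style).stem}__sw{weight}.png"
        commands.append([
            "th", "neural_style.lua",
            "-content_image", content,
            "-style_image", style,
            "-style_weight", str(weight),
            "-image_size", str(image_size),
            "-output_image", f"{out_dir}/{out_name}",
            "-gpu", "0",
        ])
    return commands


# Hypothetical inputs: one portrait, two style paintings, two style weights.
commands = build_commands(
    content_images=["portrait.jpg"],
    style_images=["starry_night.jpg", "the_scream.jpg"],
    style_weights=[100, 1000],
)

for command in commands:
    print(shlex.join(command))
```

Each printed line could be pasted into a shell or handed to a job runner, and putting the parameters in the output file name makes it easy to compare results side by side.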
Using my wife as a first subject I quickly realized that portraits were what I wanted to pursue. The following images represent the output from some of these first experiments.
Excited about the outcome I began homing in on a refined process and started creating portraits for a bunch of friends and family. The following set of images represents some of the varied styles I was able to generate. Obviously some of the results are better than others, but I included a wide variety of styles to give you a good sense of what is possible.
At some point I created a batch of images for a good friend of mine who used the results as a profile image on social media. Her profile image created quite a buzz and quickly resulted in many requests from other people to get their portraits done too.
Not one to pass up an opportunity I quickly assembled a "brand" and started to test whether or not people would be willing to pay for these images. I acquired the domain monkeyandbot.com as well as the various social media assets under the same name. Knowing that people do not like to pay for digital goods - and following a bit of a hunch - I set up an Etsy page, dusted off my Canon Selphy printer and made a trip to the dollar store to get some picture frames. Before the end of that business day I had gone from accidental release to selling a product on the Internet (I love this stuff).
I made four sales on my first day for a total of $79.85 after Etsy took its fee. The stuff I was posting on social media was picking up a little bit of attention and I decided to take the next step. Up until this point there was a heavy manual component to each portrait generated, and I realized that if this project was going to have any legs it would have to be self-serve.
My idea was to create a system in which users could generate thumbnail-size previews of their image in the various styles and then order a high-resolution print if they liked a particular outcome. When I was at this stage of the project Amazon had yet to offer affordable GPU cycles, and since I had already invested in some GPUs myself, I decided to cook up my own process.
I set up an EC2 instance hosting a React client that sent images to an S3 bucket, using Loopback to mediate the whole process. Once I had a way to capture photographs and email addresses I had to figure out how to pull them down to my cluster, render them and send the results back to the user. To solve this problem I ended up building a cluster of render machines and using a Raspberry Pi as a queue manager. The basic premise is that the Raspberry Pi polls my Loopback API, and when a new image is available it checks with the cluster to see which machine is doing the least amount of work. Once the Pi decides which machine should render the image it sends that machine a command, and the machine pulls the image down directly from the S3 bucket. Once the image has been rendered the machine uploads the result back to S3 and sends the user an email with a link using AWS SES. Obviously this is a very crude implementation and not meant for production, but it was enough for a live end-to-end test. I announced my project on social media and had 174 users render images using this system on the first day.
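The scheduling idea at the heart of the queue manager can be sketched in a few lines. This is a simplified in-memory illustration, not the code that ran on the Pi: the worker names, job names and the `Worker` class are hypothetical, and the polling, S3 transfers and email steps are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Worker:
    """A render machine and the number of jobs it is currently working on."""
    name: str
    active_jobs: int = 0


def pick_least_busy(workers: List[Worker]) -> Worker:
    """Return the worker with the fewest active jobs (first one wins on a tie)."""
    if not workers:
        raise ValueError("no render machines available")
    return min(workers, key=lambda worker: worker.active_jobs)


def dispatch_pending(pending_jobs: List[str], workers: List[Worker]) -> Dict[str, str]:
    """Assign each pending job to whichever worker is least busy at that moment."""
    assignments = {}
    for job_id in pending_jobs:
        worker = pick_least_busy(workers)
        worker.active_jobs += 1  # the real system would send a render command here
        assignments[job_id] = worker.name
    return assignments


# Hypothetical cluster: one machine is already busy with a job.
cluster = [Worker("titan", active_jobs=1), Worker("gtx970", active_jobs=0)]
assignments = dispatch_pending(["a.jpg", "b.jpg", "c.jpg"], cluster)
print(assignments)  # {'a.jpg': 'gtx970', 'b.jpg': 'titan', 'c.jpg': 'gtx970'}
```

A fuller version would also decrement a machine's job count when a render finishes, and would probably weight machines by how fast their GPUs are rather than treating them as equals.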
Despite what I thought was a fairly successful first day, the number of people using the system quickly tailed off, and pressing issues with Carmanah Digital Creative pulled my attention away from the project. Since I started down this road a few different apps like Prisma have come out, and the "wow" factor for these types of images has diminished as the public gains more familiarity with them. As it stands I am not sure what the future holds for monkeyandbot.com, but I continue to be fascinated by this technology and am actively exploring different applications.