Procedural Paintings with Genetic Evolution Algorithm
Procedural paintings based on genetic algorithm made in Unity 3D using only Compute Shaders. Depending on who you ask, I guess you can also call this AI paintings.
You can find the source code as usual on my Github: https://github.com/IRCSS/Procedural-painting
If you just want to try it out for yourself, download the project from Github, follow the instruction on the Github ReadMe page and you will get your first painting in no time. The code is very well documented, and I also have a mini getting started documentation on Github.
Here is the painting in action:
From Randomness to a Painting
Over three months ago, I saw a very impressive work from Anastasia Opara which used genetic evolution algorithm to paint a picture given to it. You can view her work on her Github.
The idea got me interested, and I decided to implement something similar. Since I wanted to get better at using Compute Shaders, I decided to implement the whole thing in compute as practice. By now you have seen the results above, and here comes the break down of how it works.
Here is what is going to come in the article:
- First I need to explain a bit what genetic algorithm is. This part has little to do with painting and is more on the general concept
- How it applies to painting a picture
- First primitive results
- Improved solutions
Genetic Algorithm
Genetic algorithm is in its essence a search operation, modeled after evolution. In a search operation we look for the best answer to a problem. We obviously don’t know the answer but we have a way of measuring how fitting a given random answer is. The goal is to find the best possible answer we could. The best answer is one that has achieved the highest fitness score according to a function we define.
Daniel Shiffman has a good video series + chapters on the topic. A good resource, which is way shorter is this one which goes in depth on all the terminology which I will be using. I won’t be repeating whatever is there, in favour of writing things that are not there, so have a look at those.
In genetic algorithm terminology, an attempt at an answer is made up of a bunch of genes. The gene is the different parameters of your problem. When these genes are put together, they represent an individual attempt at the problem, or a population member. A bunch of population members put together make up the population pool. The population pool is the collection of the random attempts the algorithm is making.
Here are the general steps for a genetic evolution algorithm:
- Population is initialized with n members. Each member gets a random set of genes
- Each member is assessed by a fitness function, which determines how close the population member is to the desired answer
- Based on the fitness value of each member, a probability is assigned to a member, which determines how likely it is to become a parent for the next generation
- In a parent selection phase, based on those probabilities, for each second generation member, n number of parents are selected
- In a cross over, the genes of parents are combined together to create a new solution as a combination of previous ones.
- To not get stuck with only the combinations which were created in step one, from time to time genes are mutated and take random new values
Here are some more eye candies painted by the algorithm as a break.
Genetic Algorithm and Painting
How does the above apply to this case? Let’s first formulate the problem which the algorithm will be solving.
Problem: Given a limited number of brushes placed on the canvas, how must we position, scale, rotate and color these brushes, to create a result which is visually as close to the original image as possible?
If you see this and think, I have no clue how to mathematically do this, then genetic algorithm, or any other search/ optimization is a good solution to the problem.
You can actually solve this problem in a more analytical way. If you are interested in alternatives that run faster and produce more consistent results, you can have a look at parts of some feature detection algorithm like SIFT.
I am going to go over the list I provided in the previous section, and walk you through how this would apply to our problem.
Population Creation
First step. A population member in our case is an attempt at painting the picture. For convenience’s sake, lets say we want to paint the image with only 120 brush strokes. Then a population member is a collection of 120 brush strokes layered on top of each other, which make a painting.
A brush stroke in our case is the gene. The gene is made up of the position, rotation, scale, color and texture of the brush.
A group of population members, make up the population pool. I create them in a compute kernel on the GPU, using random numbers created from integer seeds.
To ease up the burden of the program, when the image is in black and white, I create only black and white brushes.
Our population creation is not over with just that. The genes (the brush strokes) can’t be assessed on their own for the fitness. That can only happen after they have been used to draw an image, which then gets compared to the original photo.
Using Command Buffers’ Draw command, I paint with those brushes and create an image. The brush strokes are saved in a structured buffer, which I read in the vertex shader of the draw command execution and apply them on top of quads created in the vertex shader. The shader has a bunch of tricks in it, I could write a separate blog post on it, but since I have so much ground to cover, here is how it looks like.
Fitness Function
I tried so many things and then stuck with the most basic approach. How do you compare two images with each other and determine how similar they are to each other? The easiest way, which I went for at the end is to compare the value of each pixel of the painted image with the original and take the distance between them as the error. The fitness is one minus this error.
Another approach which I tired, which didn’t turn out as expected is to compute the gradient image for both and compare the per pixel values on that. My thought here were: I rarely want to know how well a pixel is doing. I’d rather know how well pixels are doing in relation to each other. This made the fitness function too incoherent for the optimization and the results got worst.
The most important thing I learned here, specially for colored photos is, to not use RGB space. A lesson I relearn again, every time I work with procedural colors. RGB space is not perceptually uniform, which is an issue.
If you look at the color wheel made out of the HSV space, you will realize that a unit of change in the green area makes barely any noticeable difference, whereas a unit of change in the red area shifts through major hues such as red, orange and yellow.
This makes the fitness function, which calculates the distance between the RGB values per pixel, super precise in the green areas and unstable in the red.
To fix this, I implemented everything in the CIELab space. Which is designed to be perceptually uniform.
Once you have a fitness function per pixel, you need to sum it up to get one value for the entire image. I do all these steps in the fitness function compute over several kernels. This is not the best way to sum up a buffer in GPU, since it is not using the parallel power of the GPU. The typical way to do this is a parallel reduction, which I didn’t implement since it was more work.
Fitness to Probability
This stage is identical to any other genetic evolution algorithm. The goal is to convert the fitness values to something we can use to randomly pick parents for the next generation.
The easiest way, would be to use the fitness directly as probability. Mapped one to one. Next step would be to convert these probabilities to cumulative values, to be able to use it for efficient sampling. Here is a simple entry explaining this in C# terms. I summed up the probability in an additional buffer holding only cumulative probabilities in compute shader.
If you do just the above, you will have the issue that your images will take forever to converge to anything. I mean really forever. The fitness function is not aggressive enough to differentiate strongly between bad painting attempts and good ones.
To fix this issue, the first thing I tried was a power function. This is a typical strategy to artificially increase contrast between probabilities. This helps to speed up getting to the final image, but not by much.
To speed up the process further, I decided to make my fitness function context aware. The fitness value one is unattainable, it is the perfect image. The fitness value zero is so bad, that any random attempt would score higher. So why should we take these two irrelevant extremes to asses the probability of a member becoming a parent?
First I find out what the worst and best members in this generation are. Then I re normalize the fitness of all members and remap them to the range of worst to best member of the population, so that the worst gets fitness zero and the best fitness one. This way the worst member has zero chance of being picked as a parent and the best has a very high one. This is similar to the Level image processing panel in Photoshop. Which remaps the pixel values to new range, to artificially maximize contrast.
This change to the fitness function drastically increases the speed of the search. So what’s the catch? There is always a catch. More aggressive fitness functions means faster convergence to the best solution around the area you are searching. But it also means the highest probability of getting stuck in local minima, which you can’t get out of.
I think this a more general gradient descent problem. I didn’t look into this much, as I found a solution pretty quickly, but one of the things I thought about trying was to increase mutation rate, and make the fitness function more liberal as a function of the change of fitness. So if in a local minimum, start trying out the more liberal options and try to get out of the pit.
As for how I went around this problem, that comes later.
Parent Selection Cross Over and Mutation
I am grouping these together, though they are each their own compute kernel, because there is not much to be said about them. Based on the cumulative probabilities I calculated in the previous stage, I select two parents per member for the second generation, and I take half the brush strokes from one parent and combine it with the other half from the other parent to create the the second generation. Then based on the mutation probability specified in the balancing parameters, I sometimes randomly change a brush stroke in the child.
Higher mutation rate means faster convergence. Because it will try out different solutions faster. However, if the mutation rate is too high, your population will become unstable. There won’t be enough generations for the changes to settle in, and the best genes to become dominate in the population.
Very straight forward. This is a loop that repeats itself. The second generation becomes the parent of the third, third of the fourth and so forth.
Needle in Haystack
If you just implement what I described above, you wouldn't be getting images I posted. At least not in any reasonable time. Instead you would be getting this:
You can let it paint a circle, no problem. Give it a face and it might barely get the silhouette. Why is that?
In the example above the algorithm tries to find an answer to the problem: how do you place 32 brush strokes, so that it represents the image on the left as closely as possible.
Does this problem have a solution? Yes of course. You could maybe paint the left image with 5 brush strokes. What about a face?
This is in my opinion a classic issue with search algorithm. Most of the times, you are not sure if there is a solution to the problem you are trying to solve. For example, you can’t paint the scene below with 32 brush strokes, regardless of how good the rest of your algorithm is. As a matter of fact, this scene was painted at the end with a few thousand brush strokes and even then, it would have needed a few thousand more to get the details.
So we increase the number of brush strokes used right? That would greatly increase the domain of the problem which the algorithm needs to search. With every added brush stroke, you have added a new dimension to the equation of the problem. So what do we do?
We can break down the problem to smaller steps. Instead of trying to find the right parameter for everything simultaneously, we can find answers for some, save those answers and then search for rest of the solution.
In concrete terms, we could for example let the program try to paint the scene with 32 strokes. It won’t get to paint everything, but it will get some silhouettes and the general color scheme blocked out. Then we save this painting and let it try again and again on top of the result of the previous stage. On each stage, it will be only search for the best combination of 32 brush strokes and eventually it will paint the entire scene, having used thousands of brush strokes.
Each stage has a certain setting associated with it, which determines the balancing parameters of the algorithm on that stage, such as the number of brush strokes, the number of members in the population pool etc. I wrote a documentation for these on the Github wiki which you can have a look at.
This is by the way how I paint as a human. I first lay down the major gradients, block out the shapes and then slowly focus on higher level details bit by bit. This method is very common among painters as it reduces the cognitive work load, and gives you the head space to focus on specific parts and get them right.
When do we know when a stage is done and we should switch to the next one? If you remember in the previous sections I talked about being stuck in a local minimum. That is when even after a considerable amount of time, the painting is not getting any closer to the actual image. This is when I switch to the next stage, which also gets rid of my local minimum problem. When it gets stuck in a local minimum, it starts a new random walk, using the results of the previous calculation.
Trying the above, you will start getting way better results, the faces are successfully painted. However they lack detail and the time you need to get those details is still very unrealistic. Look at the example below, the first image is done purely with the method stated above, and the second one with the final solution in the repo. You can see the difference in details:
The first image is soft, and it is lacking details. The edges are disappearing. I actually like this aesthetic quite a bit, and some of the examples I have shown have been painted like this, but I like to also have a method to get the details if I want to.
The issue lays again in the needle in a haystack problem. I restrain the width of my images to 1024 pixels due to performance reasons. That makes 1 million pixels. To expect the algorithm to draw an edge in any reasonable time requires it to randomly place the correct brush color, size and orientation in one of this 1 out of a million locations. Not a great plan.
What can be done? Edge detection. If we could detect the most important edges that is perceptually defining the image, we could constrain the position search domain only to those edges (only spawn brush strokes there) and let the fitness function know to only focus on those areas and not care about the rest.
Thankfully gradient images and feature detection are well studied fields in computer vision. With the Sobel filter you can perform edge detection.
These edges are a bit too much though. It constrains our search, but it is also very noisy and too focused on tiny details, which is not good for major features such as the eye lashes/ lips etc.
To the rescue comes the Gaussian filter. If you apply it before the sobel, you will be basically getting rid of the smaller feature and focus on the bigger one.
Using the filtered mask which is created by Gaussian-Sobel filter (or hand created and passed by the user), I generate in compute shader the position domain which the search focuses on. Here is an example of how this position domain could look like at different stages and the image it creates. The algorithm will only focus on the white areas visualized in the mask image.
The implementation for this is a bit trickier, because you don’t know how many pixels are going to pass the test of your mask. My solution to that is an Append buffer and an Argument buffer meant for indirect drawing.
About Compute Shaders
I learned tons implementing this in compute shader. This was the biggest pipeline I have implemented in compute shaders, containing close to twenty kernels working after another, interacting with the graphic pipeline and the CPU.
I initially wanted to get the painting part done on a weekend, and then write a post on things I learned regarding compute shaders. However, the painting part itself got so big that this post became about that.
At any case, it is worth mention some resources that taught me a lot of new things. For example I messaged Kostas on Twitter, asking about best practice on the number of threads dispatch in a group, his response cleared some stuff up for me, hopefully it will do for you too. He also shared this link which was an amazing read, written by Sebastian Aaltonen on best use of groupshared memory and thread numbers. A lot more on groupshared memory, using it for gaussian/ sobel filter etc. But I think that is enough for this blog post.
Closing Remarks
There is still work left that can be done. Too much actually. I might come back to this one day and try out those ideas. For example, by adjusting the fitness function slightly, you can very different results. Here is a fitness function that only cares about accuracy in value.
I Hope you enjoyed reading. You can follow me on my Twitter IRCSS and get in touch if there is factually incorrect information in this article or you have any questions. Here are some more eye candies, softer “Speed paintings”. Each took around 10 minutes.
Further Readings and Resources
- Daniel Shiffmans video series on genetic algorithms: https://www.youtube.com/playlist?list=PLRqwX-V7Uu6bJM3VgzjNV5YxVxUwzALHV
- Same as above but as text for whoever prefers reading:https://natureofcode.com/book/chapter-9-the-evolution-of-code/
- Post on pseudo random number generation by Nathan Reed: http://www.reedbeta.com/blog/quick-and-easy-gpu-random-numbers-in-d3d11/
- Article by Nvidia on parallel reduction in GPU: https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
- Sampling elements from a list given their probability: https://stackoverflow.com/questions/38086513/selecting-random-item-from-list-given-probability-of-each-item
- Post by Sebastian Aaltonen on maximizing Optimizing GPU occupancy and resource for large thread groups: https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/
- Microsoft documentation on compute shader group dispatch numbers and thread numbers per group: https://docs.microsoft.com/en-us/windows/win32/api/d3d11/nf-d3d11-id3d11devicecontext-dispatch
- Kostas Anagnostou explaining best practices on the number of threads to dispatch per group: https://twitter.com/KostasAAA/status/1274359186185428994?s=20
- Nice slides on downsampling and gaussian blur: http://www.cs.toronto.edu/~fidler/slides/2015/CSC420/lecture5.pdf
- A gaussian blur implementation in Unity done in shader and graphic pipeline: https://danielilett.com/2019-05-08-tut1-3-smo-blur/
- Entry on the use of append buffers in Unity: https://docs.microsoft.com/en-us/windows/win32/direct3d12/indirect-drawing
- Microsoft documentation on the structured of indirect drawing argument buffer: https://docs.microsoft.com/en-us/windows/win32/direct3d12/indirect-drawing