Designing for Aesthetic in Video Games Art Direction

Shahriar Shahrabi
13 min read · Sep 25, 2023

Part of video game art direction is clear communication. However, over-optimizing for an immediate, unambiguous reading of your message can make the aesthetic of your game suffer and reduce the playfulness of your visuals.

Whenever I talk about art direction, I always emphasize three points: legibility, emotional resonance and visual interest. Legibility is about clear communication of the main message through your visuals. Emotional resonance is about building the right visual associations that evoke the appropriate emotions. And visual interest is about providing enough visual variety (in low-level or high-level semantics) to keep your viewers engaged.

This way of thinking about art direction is practical and systematic. The three pillars give you immediate actionable steps on how to improve the visuals of your game. Once you go through them, chances are very high that you will be happy with how your game looks.

However, it is just a mental model. It is a method for how to look at a problem, analyze it and come up with a solution. It doesn’t contain the entirety of what good art direction is for video games.

Today I would like to discuss something that my standard art direction model fails to cover, and that is the topic of aesthetic.

It is the Journey, not the Destination

Consider the design of the following STOP sign. If you can read English, it is the clearest form of communication you can possibly have. The words leave very little to your imagination: the main message of the visual is for you to stop.

Source Wikipedia

Even if you can’t speak English, the bold dominant font and the strong color scheme communicate a sense of urgency. We could also design the visual to be language independent, such as the option below:


This is less legible than the worded version, since the viewer needs some time to decipher and abstract away the meaning of the shapes, but it communicates to a broader audience.

I hope you have already noticed something about designing for legibility: this field is all about trade-offs. One dimension I just introduced is that, if you assume pre-existing knowledge in your viewer, you can communicate more effectively, at the cost of making your message understandable only to a sub-group of all the people you could address.

Your experience of looking at this STOP sign is a good summary of what legibility is about. You look at the visuals, your brain starts a process of deciphering the elements, you converge to a meaning and finally you do something with the information you received. When we optimize for legibility, we try to increase the chance that the viewer extracts exactly the meaning we intend from the image. But we also try to shorten the time it takes for the viewer to understand what they are looking at. If you are going 60 km/h, it won’t do you any good if you need 4 seconds to understand what the sign is.
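To put a rough number on that last point, here is a back-of-the-envelope sketch. The function name is my own and it assumes the car keeps a constant speed; it is only meant to show how much road goes by while the viewer is still reading.

```python
def distance_travelled_m(speed_kmh: float, seconds: float) -> float:
    """Distance in metres covered at a constant speed (an assumption)."""
    metres_per_second = speed_kmh * 1000 / 3600  # km/h -> m/s
    return metres_per_second * seconds


# The example from the text: 60 km/h and 4 seconds of reading time.
distance = distance_travelled_m(60, 4)
print(f"{distance:.1f} m")  # prints "66.7 m"
```

In other words, a sign that takes 4 seconds to read has to be understood from roughly 67 meters away, or the driver has already passed it.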

So much of the visual design we do for video games and films is like designing a STOP sign. Take something simple like character design. We put a lot of care into all the various aspects that make up a character. For example, we carefully design the silhouette.

Source Scott Flanders

With good silhouette design we aim to communicate crucial information to the viewer within seconds. Is this a bad guy? Does he move fast? Can he punch hard? Does he throw things?

When we are optimizing for legibility, our focus is on the message, and ensuring that the player converges to the meaning we desire. However, we don’t care much about how the player converges to that meaning.

Aesthetic is less about the specific message of your visual (the destination the viewer arrives at), and more about the journey, or in other words how it felt to arrive at whatever meaning the player arrives at. As a matter of fact, most of the greatest works of aesthetic are idiosyncratic.

When we optimize for legibility and put our energy into the destination, we neglect the journey. To make matters worse, the more we optimize for a faster reading of our visuals, the shorter the experience of understanding the meaning behind them becomes. This means we have less time to enjoy the aesthetic of the work. Last but not least, the more we try to take ambiguity out of our work and ensure that all viewers converge to the exact same meaning, the less we allow interpretation and a playful experience of the aesthetic in our games.

That is the summary of this post. Now let’s get to some examples.

To Be or not to Be

“To be or not to be” has to be one of the best known sentences in the English language. Shakespeare could just as easily have written “I am burdened by the intense pain of the murder of my father and the remarriage of my mother, to the point where I am asking myself if death wouldn’t be a better alternative than existence, because it would relieve me of this pain and sense of powerlessness”. But who would remember that?

What makes this sentence such an intriguing aesthetic experience is your contemplation of what it means. How it reflects your thoughts and experiences regarding pain and fear of death. Whatever meaning you extract from that line, how did you get there? What thoughts did you go through? How did it feel?

Let’s look at some visual examples. Below are posters designed by Paul Rand. While you are looking at them, pay attention to your experience of understanding what is in the image.

Paul Rand

If I were art directing this within the context of video games, my first feedback would be “I wish we could increase legibility”. The rooster is presented in the negative space. What if the viewer misses it? What if it takes too long to see? Why a rooster? Will people understand that a rooster calls when it crows, and that the poster is a call for entries? How about the IBM logo? I absolutely love this poster. The first time I saw it, I remember the joy of putting together the elements and understanding what it means. But it takes a few seconds and there is a chance the viewer might miss it altogether.

Let’s look at an example by Tanaka Ikko.

Tanaka Ikko

Again, notice how the path to understanding what the image is, is a process. It takes time and requires mental effort. Some abstractions, such as abstracting the act of stopping into the semantics of the word STOP, increase legibility. But some abstractions, like the one Tanaka Ikko has designed, increase the playful aesthetic. The key difference is that this abstraction creates an interesting viewing experience for the viewer. What does this say about the visuals of a game like Minecraft? Its abstraction leaves out so much detail, leaving you with the process of filling in the blanks.

Now for some examples from more traditional forms of art. Look at the image below.

Kako Tsuji — Bush Cricket with Falling Blossoms

What is the message of this visual? Here is my aesthetic experience of viewing this. There are falling blossoms in the scene. Within the Japanese visual tradition, the falling blossoms embody the impermanence of beauty and the cycle of death and life. Another visual element that suggests the same thing is how the bigger leaf has a decaying section. Last but not least, the cricket is also a seasonal insect like the cicada, which is a symbol of approaching death, commonly used to portray feelings of nostalgia towards fading life and beauty. As I look at this image, supported by the mellow yellow and blue color scheme, the meaning I converge to is that things die. And there is a beauty to be found in death.

But is this legible? I mentioned a bunch of cultural symbols. If you don’t know any of them, what meaning would you even converge to? Let’s art direct this video-game-style.

The cricket is a central aspect of that message. It is too small. We would make it bigger, change the color so that its silhouette is readable against the background. We would shine light on it and darken everything else, just to make sure no one would miss the point that this is an important visual element. “This is a metaphor, it means something”.

But even that can be more legible. The cricket is a more obscure visual metaphor from Japanese culture. Most people know of the cherry blossom festival, so why not use that? The tree itself is only hinted at in this image through the fallen petals. Why not frame the tree front and center? Make it big, high contrast and hard to miss. Change the color scheme from the softer blue. Maybe the petals are blood red.

But is this enough? What if people don’t have any context of Japanese culture? People understand human concepts through human representation. Why not put a dying child in front of the tree? Why even have the tree? Just put the child. Why have the child? Just write DEATH.

I overdid the above for demonstration purposes. But what I am trying to show you is an axis. On one side of the axis, the visual is optimized for the aesthetic and on the other for an immediate unambiguous communication.

Let’s quickly go through some more visual examples:

Old Plum Tree, Kano Sansetsu

There is so much to the above scene that is not drawn. How does the landscape look? What is the relationship between the different elements? All for you to figure out, and have fun doing so.

Four Sages of Mount Shang, Soga Shohaku

I love the Four Sages. The brush work is abstract, to the point where you need to think about what the strokes mean. Is that a river? How would it feel to touch that tree trunk? What is that guy doing?

Birth of Venus by Bouguereau

Realistic images can also be a visual experience. The experience of seeing the Birth of Venus in person is vastly different to viewing it on this screen. This painting is HUGE. You can’t possibly look at the whole thing in one go. As your eyes dart around, you converge to a sense of eternal “beauty”, which was the theme Bouguereau was obsessed with. But it is a fun process to discover this by taking the painting in over time.


The above panel from Tintin is a car chase scene. There is a point to this panel. Try to visualize the quintessential car chase scene you know from films. It would involve a tightly framed car, being followed by another. Why isn’t this framed that way? Why all the extra detail?

To me it is obvious why. This panel is a lot of fun. There is so much to discover. The process of figuring out what the hell is going on is enjoyable. As you skim through the details you discover little stories in the scene, which you try to fit in with what you already know about the story and what might come next.

Le Petit Nicolas

Speaking of a lot of detail, this is a technique commonly used in children’s book illustrations. The sheer amount of things to discover creates an interesting viewing experience. Look at these examples from Le Petit Nicolas. Why is that guy happy? Why is the other angry? What type of personality does that kid have?

What about Games?

Why do we rarely design visuals this way in games or, to an extent, even films? I think there are various reasons for it.

  1. Gameplay is king. And the king has a lot of visual demands for immediate, unambiguous communication.
  2. What counts as trendy and marketable visuals is partially determined by a low-attention-span, fast-paced social media ecosystem. You have 2 seconds to capture people’s attention on social media.
  3. Video games are expensive and risky investments. We can rarely afford to miscommunicate and leave some potential customers behind.

To the first point, when you are playing Mega Man X and you see an enemy, you want to know how it will attack and you want to know it fast. If your gamist setup demands that your players be excellent, you will frustrate them if your visuals communicate “Is this thing an enemy or not, that is the question!”.

That doesn’t mean it is impossible. Games like Flower, Journey or Monument Valley prioritize the aesthetic experience and reduce the challenge of the gameplay, which leaves you with the mental space to spend on the aesthetic experience. Journey doesn’t demand that you do backflips while surfing on the sand and doesn’t score you on them, so you wouldn’t mind if you don’t immediately understand the topology of what is in front of you.

Some games are also inherently about the process of understanding visual elements. A good example of this is Gorogoa. The gameplay is aligned with what an aesthetic experience is.

Gorogoa game

You could optimize Gorogoa to be more immediate and unambiguous in its visual communication, but that would make the gameplay itself dull.

You can also gamify the experience of looking at that Tintin panel. On that front Hidden Folks is a good example.

Hidden Folks

How about gamifying the aesthetic experience of understanding what visual elements mean at a higher level? Similar to how I experienced the bush cricket painting. Unpacking is a beautiful game that does just that. You build a map of what the items mean, and that tells you a lot about things that are not in the frame, such as the protagonist.


These were some examples of games that either reduce the demand of gameplay to create space for aesthetic or align the game system by gamifying the aesthetic experience itself.

Another way is to include the playful experience through stylization, similar to the Tanaka Ikko poster. Consider a game like Sea of Stars. The stylized perspective and the lack of detail in certain areas give your brain plenty of in-betweens to fill. That is one reason why pixel art, graphic styles and other stylizations appear more playful and have an easier time creating an engaging aesthetic experience.

Sea of Stars

The important point here is that whatever gap you leave for the player to figure out shouldn’t be crucial information for the gameplay. In the Sea of Stars screenshot, it is left to you to figure out how the terrace the figures are standing on relates to the rest of the landscape. But if the topology of the landscape were important information for the gameplay (let’s say the game is a platformer), then leaving it ambiguous would endlessly annoy the player.

The last thing I would like to mention is that designing the video game version of the IBM logo takes a lot of confidence in your work. What if the players don’t get it? What if they don’t manage to come up with an interesting interpretation? To design aesthetic, you need to stop treating players like idiots and trust that they will figure things out. One thing that would help you is to reduce the amount of financial risk. If a game took 200 million to make, you really can’t afford to leave behind the people “who just didn’t get it”. But if you reduce the scope of your game, you can get more daring. Maybe you CAN make a game that would only be understood by people with a certain background. And maybe that is enough to get your return on investment.

To design with the aesthetic experience in mind, stop looking at your visuals in terms of things you want to communicate to the players. Start looking at them as building blocks that construct an interesting viewing experience for the player.

If we look at our industry right now, we can find experiences ranging from mobile games that preprocess everything into chunks of unambiguous immediate communication, to games like INSIDE, which provide you with an aesthetic playground at every level. Personally, I would like to see more vibes in games.

That was it for this post. I would like to leave you with a little exercise. Look at the 3D model below. Pay attention to how you extract meaning from the scene. The title is Artist Nook. What do the visual elements tell you about the person that uses it? How would you make it more playful? Or on the other hand, how would you frame it for more immediate unambiguous communication?

Thanks for reading. As usual, you can follow me on various socials, all linked on my website: