Style Transfer – What It Is and How You Do It

Have your friends posted pictures that look like they were painted by Picasso? Maybe you’ve seen cool art effects in a music video. Both of these were likely made with style transfer. This post will give some background on how it started and how it works. If you don’t care about that, we have a guide that can show you how to do it immediately!

In September 2015, an entirely new branch of machine learning was brought into the world. Leon Gatys et al. released a paper titled "A Neural Algorithm of Artistic Style," which has since been cited over a thousand times. This paper and the many that followed brought machine learning to the world of art. As Gatys puts it in his introduction:

In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities.
Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images.

Leon Gatys et al., https://arxiv.org/pdf/1508.06576.pdf

Gatys could not have expected the snowball of art and research he had just started. Though not perfect, the algorithm he designed is simple to implement and easily extensible.

How Style Transfer Works

I'm going to go through the basics of how style transfer works. This may get a little detailed, so if you don't care, just scroll down to the cool cat image below and make your own!

The first style transfer algorithm takes advantage of VGG-19, an image-processing convolutional neural network (CNN) built for problems like labeling cats and dogs in images. A neural network like VGG-19 is extremely complex, but so are the images it processes.

In each layer of the network, the input image is broken down into more and more fundamental components. One filter in the network might specialize in detecting horizontal edges, another in vertical edges. A later layer can combine those edges to detect an eye or a nose, and a layer after that can assemble those features into an entire face.
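To make that concrete, here is a minimal sketch of pulling intermediate feature maps out of a pretrained VGG-19, assuming PyTorch and a recent torchvision; the `get_features` helper is my own scaffolding, not code from the paper:

```python
import torch
import torchvision.models as models

# Pretrained VGG-19; we only need the convolutional stack (`.features`).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the network itself stays frozen

def get_features(image, layer_indices):
    """Collect the feature maps produced at the given layer indices."""
    feats = {}
    x = image  # shape (1, 3, H, W), normalized with ImageNet statistics
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layer_indices:
            feats[i] = x
    return feats
```

Early entries in `feats` respond to edges and colors; deeper ones respond to larger structures.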

Let's Try It

To perform the style transfer, two images are fed through this VGG-19 network: the "content" and the "style". A third image is then generated by minimizing the difference between its intermediate VGG-19 activations and those of the two input images.
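At its core this is just gradient descent, not on network weights, but on the pixels of the generated image. A rough sketch, assuming PyTorch, with the hyperparameters purely illustrative and `total_loss` defined in the sections below:

```python
# `content_image` is assumed: a (1, 3, H, W) tensor, ImageNet-normalized.
generated = content_image.clone().requires_grad_(True)
optimizer = torch.optim.Adam([generated], lr=0.01)

for step in range(300):
    optimizer.zero_grad()
    loss = total_loss(generated)  # content + style terms, defined below
    loss.backward()               # gradients flow into the pixels themselves
    optimizer.step()
```

(The original paper used L-BFGS; Adam is a common, simpler substitute.) Let's see what the intermediate reconstructions look like.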

[Images: the cat photograph used as content, and the oil painting used as style]

Here are the content and style we will analyze. The cat is the content, and the oil is the style.

[Images: reconstructions from convolutional layers 1, 3, and 5 for the content, and layers 1, 3, and 5 for the style]

These images correspond to the reconstructed images that minimize the loss at layers 1, 3, and 5 for both the content and the style. Deeper layers of the network hold more abstract representations of the original image, so the deeper reconstructions differ more from their originals. Notice, though, the clear difference between the content and style reconstructions.

Why do they look different?

If we look at the cat reconstructions, we notice the cat and the stairs are unmoved. To put it technically, locality is preserved: the content loss function forces the reconstructed image to have the same features in the same places. This is exactly what we want, since the cat and the stairs should still be visible in the final result.
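In code, the content loss is about as simple as a loss gets: a mean squared error between feature maps at the same layer, compared position by position. A sketch, continuing the earlier snippet:

```python
import torch.nn.functional as F

def content_loss(gen_feat, content_feat):
    # Compared element by element at each spatial position,
    # which is exactly what preserves locality.
    return F.mse_loss(gen_feat, content_feat)
```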

However, this is not the case for the style image. We don't care about the structure of the oil painting, just its colors and textures. And indeed, the style reconstruction keeps similar colors at layer 1 and similar textures at layer 5, but the original structure is gone. The style loss function is deliberately designed to allow this nonlocality, which is what lets the oil painting's textures bend around the features of the content, like the cat and the stairs.
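The trick, from the original paper, is the Gram matrix: instead of comparing feature maps directly, the style loss compares correlations between channels, summed over every spatial position. Where a texture occurs is thrown away; only which features co-occur survives. A sketch:

```python
def gram_matrix(feat):
    # feat: (1, C, H, W) -> (C, C) channel-correlation matrix.
    _, c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)  # normalization choice varies by implementation

def style_loss(gen_feat, style_feat):
    return F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))
```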

What’s the end result?

To complete the style transfer, the two losses are summed into a single objective. Gatys found that the best results came from matching the content at layer 4 and the style at layers 1 through 5.
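Here is a sketch of that combined objective, continuing the earlier snippets. The torchvision layer indices and the weights `alpha` and `beta` are my assumptions about a reasonable setup, not values quoted from the paper; `style_image` is assumed to be prepared the same way as `content_image`:

```python
CONTENT_LAYER = 21                 # conv4_2 in torchvision's VGG-19 indexing
STYLE_LAYERS = [0, 5, 10, 19, 28]  # conv1_1 through conv5_1
alpha, beta = 1.0, 1e6             # content vs. style weight; ratio is illustrative

with torch.no_grad():              # target features are computed once, up front
    content_feats = get_features(content_image, {CONTENT_LAYER})
    style_feats = get_features(style_image, set(STYLE_LAYERS))

def total_loss(generated):
    gen = get_features(generated, {CONTENT_LAYER, *STYLE_LAYERS})
    c = content_loss(gen[CONTENT_LAYER], content_feats[CONTENT_LAYER])
    s = sum(style_loss(gen[i], style_feats[i]) for i in STYLE_LAYERS)
    return alpha * c + beta * s
```

So how does the result look?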

[Image: the style transfer result, the cat redrawn with the oil painting's style]

Wow! The result is psychedelic. The cat is clearly visible, but as if redrawn by the oil painter. Even the blank white wall has been given its own artistic flair.

Make Your Own!

The key to a good style transfer is, well, a good style. I haven't used the oil painting above much yet, but I expect it to look great on plenty of other content images. A good content image can't hurt either. Take your favorite photo of your pet, a photo with your family and friends, or a beautiful landscape, and experiment! Go ahead and try this online at NumArt and post your results! Deep Dream Generator, built on Google's DeepDream research, also offers high-fidelity style transfer (though it takes longer to run). I talk about these two websites and more in my Style Transfer Complete Tutorial.
