Computer vision

Neural Style Transfer: Everything You Need to Know [Guide]

9 min read

Sep 8, 2021

Neural Style Transfer is a technique that allows us to generate an image with the same "content" as a base image, but with the "style" of our chosen picture. Learn how it works and what its real-world applications are.

Pragati Baheti

Imagine that you could transfer the style of Picasso into your own pieces of art. Your paintings would look exactly as if Pablo Picasso himself had painted them.

In fact, if presented side by side, they would be indistinguishable; nobody could tell the difference.

Sounds pretty crazy, right?

Well, not if we are discussing Machine Learning.

You see, there's a class of Convolutional Neural Network techniques that makes this achievable with Deep Learning.

Meet Neural Style Transfer.

Here’s what we’ll cover:

  1. What is Neural Style Transfer?

  2. How does Style Transfer work?

  3. Neural Style Transfer use cases

What is Neural Style Transfer?

Neural Style Transfer is the technique of blending the style of one image into another image while keeping its content intact. Only the stylistic attributes of the image change, giving it an artistic touch.

The content image provides the layout or sketch, while the style image contributes the painting technique or colors. Neural Style Transfer is an application of Computer Vision that combines image processing techniques with Deep Convolutional Neural Networks.

Neural style transfer performed on Mona Lisa painting

Pro tip: Would you like to learn more about Neural Networks? Read The Essential Guide to Neural Network Architectures.

Neural Style Transfer deals with two input images: a content image and a style image.

This technique helps to recreate the content image in the style of the reference image. It uses Neural Networks to apply the artistic style from one image to another.

Neural style transfer opens up endless possibilities in design, content generation, and the development of creative tools.

How does Style Transfer work?

Now, let’s explore how NST works.

The aim of Neural Style Transfer is to give the Deep Learning model the ability to separate the style representation from the content representation of an image.

NST employs a pre-trained Convolutional Neural Network with added loss functions to transfer style from one image to another and synthesize a newly generated image with the features we want to add.

Style transfer works by optimizing the generated image so that its activations match those of the content image in terms of content, while matching the style image in texture, capturing the same style characteristics in the activation maps.

These two objectives are combined in a single loss formula, where we can control how much we care about style reconstruction and content reconstruction.
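
In the standard formulation, this combination is simply a weighted sum, with a content weight α and a style weight β:

$$L_{total}(p, a, x) = \alpha \, L_{content}(p, x) + \beta \, L_{style}(a, x)$$

where $p$ is the content image, $a$ is the style image, and $x$ is the generated image. Increasing α preserves more of the original content, while increasing β pushes the output further toward the reference style.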

Here are the required inputs to the model for image style transfer:

  1. A Content Image – the image to which we want to transfer a style

  2. A Style Image – the style we want to transfer to the content image

  3. An Input Image (generated) – the final blend of content and style image

Neural Style Transfer basic structure

Training a style transfer model requires two networks: a pre-trained feature extractor and a transfer network.

NST typically uses a model pre-trained on ImageNet, such as VGG, which is available in TensorFlow.

Images themselves make no sense to the model. They have to be fed in as raw pixel values, which the Convolutional Neural Network transforms into a set of features.

Thus, somewhere between the layer where the image is fed into the model and the layer that produces the output, the model serves as a complex feature extractor. All we need to leverage from the model is its intermediate layers, and then use them to describe the content and style of the input images.

The input image is transformed into representations that carry more information about the content of the image than about its detailed pixel values.

The features that we get from the higher levels of the model can be considered more related to the content of the image.

To obtain a representation of the style of a reference image, we use the correlation between different filter responses.
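
As a rough sketch, here is how one might expose those intermediate layers of VGG19 in TensorFlow/Keras. The specific layer names chosen for content and style below are a common choice, not the only possibility:

```python
import tensorflow as tf

# Typical layer choices for content and style (an assumption, not the only option).
content_layers = ["block5_conv2"]
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]

def build_feature_extractor():
    """Return a function mapping an image in [0, 1] to its VGG19 activations."""
    vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
    vgg.trainable = False
    outputs = [vgg.get_layer(name).output for name in content_layers + style_layers]
    model = tf.keras.Model(inputs=vgg.input, outputs=outputs)

    def extract(image):
        # VGG19 expects pixel values in [0, 255] with its own preprocessing.
        feats = model(tf.keras.applications.vgg19.preprocess_input(image * 255.0))
        # Split the outputs into content features and style features.
        return feats[:len(content_layers)], feats[len(content_layers):]

    return extract
```

Calling build_feature_extractor() once and reusing it for the content image, the style image, and the generated image keeps the pre-trained weights frozen throughout.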

Content loss

Content loss helps to establish similarities between the content image and the generated image.

It is intuitive that higher layers of the model focus more on the features present in the image, i.e., the overall content of the image.

Content loss is calculated as the Euclidean distance between the intermediate higher-level feature representations of the generated image (x) and the content image (p) at layer l:

$$L_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}$$

where $F^{l}$ and $P^{l}$ are the feature representations of the generated image and the content image at layer $l$.

It is natural for a model to produce different feature maps in higher layers when activated by the presence of different objects.

This helps us to deduce that images having the same content should also have similar activations in the higher layers.
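
Following the formula above, a minimal content loss in TensorFlow might look like the sketch below (the 1/2 factor is sometimes dropped or folded into the content weight):

```python
import tensorflow as tf

def content_loss(content_features, generated_features):
    """Squared Euclidean distance between feature maps at a content layer."""
    return 0.5 * tf.reduce_sum(tf.square(generated_features - content_features))
```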

Style loss

Style loss is conceptually different from Content loss.

We cannot just compare the intermediate features of the two images and get the style loss.

That's why we introduce a new concept: the Gram matrix.

The Gram matrix is a way to capture the style information in an image, as it shows the overall distribution of features in a given layer. It measures the amount of correlation between feature maps in that layer.

Style loss structure

Style loss is calculated as the distance between the Gram matrices (in other words, the style representations) of the generated image and the style reference image.

The contribution of each layer to the style information is calculated by the formula below:

$$E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}$$

where $G^{l}$ and $A^{l}$ are the Gram matrices of the generated image and the style image at layer $l$, and $N_{l}$ and $M_{l}$ are the number of feature maps and their spatial size.

Thus, the total style loss across each layer is expressed as:

$$L_{style}(a, x) = \sum_{l} w_{l} E_{l}$$

where the contribution of each layer to the style loss is weighted by a factor $w_{l}$.
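
Putting these pieces into code, here is a minimal sketch of the Gram matrix, the style loss (with the normalization constants and layer weights folded into a simple mean for brevity), and the optimization step that updates the generated image. The content_loss helper is the one sketched earlier, and alpha and beta are hypothetical weighting values:

```python
import tensorflow as tf

def gram_matrix(features):
    """Correlations between feature maps; features has shape (1, H, W, C)."""
    gram = tf.linalg.einsum("bijc,bijd->bcd", features, features)
    num_locations = tf.cast(tf.shape(features)[1] * tf.shape(features)[2], tf.float32)
    return gram / num_locations

def style_loss(style_features, generated_features):
    """Mean distance between Gram matrices across the chosen style layers."""
    per_layer = [
        tf.reduce_mean(tf.square(gram_matrix(g) - gram_matrix(s)))
        for s, g in zip(style_features, generated_features)
    ]
    return tf.add_n(per_layer) / len(per_layer)

alpha, beta = 1e4, 1e-2  # hypothetical content and style weights

def train_step(generated_image, extractor, content_targets, style_targets, optimizer):
    """One gradient step on the generated image, the only trainable variable."""
    with tf.GradientTape() as tape:
        content_feats, style_feats = extractor(generated_image)
        loss = alpha * tf.add_n([content_loss(t, f)
                                 for t, f in zip(content_targets, content_feats)])
        loss += beta * style_loss(style_targets, style_feats)
    grad = tape.gradient(loss, generated_image)
    optimizer.apply_gradients([(grad, generated_image)])
    generated_image.assign(tf.clip_by_value(generated_image, 0.0, 1.0))
```

Here generated_image would be a tf.Variable initialized from the content image, while content_targets and style_targets are the activations extracted once from the content and style images.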

Model architecture overview

The architecture of an NST model can be designed to range from applying a single style to an image all the way to allowing a mix and match of multiple styles.

Let’s have a look at the different possibilities.

Neural style transfer model architecture

Single style per model

A feed-forward neural network is trained to recreate the styled image in a single pass, with feature extraction typically handled by a VGG16 model pre-trained on ImageNet. Each transfer model is small and compact in both depth and size, and is trained to apply a single style.
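
As a toy illustration only, such a compact feed-forward transfer network might have an encode-transform-decode shape like the sketch below; real single-style models typically add residual blocks and instance normalization, which are omitted here:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_transfer_network():
    """A toy encode-transform-decode network mapping an image to a styled image."""
    return tf.keras.Sequential([
        layers.Conv2D(32, 9, padding="same", activation="relu",
                      input_shape=(None, None, 3)),
        layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(3, 9, padding="same", activation="sigmoid"),
    ])
```

Such a network is trained against the same content and style losses described above, so at inference time styling an image is a single forward pass.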

Multiple styles per model

A single transfer network can create images in a variety of styles and even blend several styles together.

The style transfer network is fed a content image and the style images, along with an additional vector indicating how much of each style should be applied to the image.

This is a good way to blend multiple styles, and it eliminates the overhead of training and storing separate models for different styles.
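
One simple way to realize this blending, sketched below, is to weight per-style losses by the entries of that vector; production multi-style networks often use conditional instance normalization instead, so treat this only as an illustration that reuses the style_loss sketch from above:

```python
import tensorflow as tf

def blended_style_loss(per_style_features, generated_features, style_weights):
    """Weighted combination of style losses, one weight per style image.

    per_style_features: one list of layer activations per style image.
    style_weights: e.g. [0.7, 0.3] to apply 70% of style A and 30% of style B.
    """
    losses = [style_loss(feats, generated_features) for feats in per_style_features]
    return tf.add_n([w * l for w, l in zip(style_weights, losses)])
```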

Arbitrary styles per model

The above two models are limited to the use-case of producing images in styles that they've already seen during training.

Arbitrary new styles cannot be accommodated by these models.

NST with an arbitrary style transfer model takes a content image and a style image and learns to extract and apply any variation of style to an image.

Style transfer optimizations and extensions

The stability of NST during training is very important, especially when blending a style across a series of frames in a video.

The style should remain consistent across successive frames to avoid flickering in color and contrast, as well as inconsistencies in the placement of objects. Two sequential frames are stylized and compared to each other so that the model learns to produce the same stylization for an object as it moves through the frame.

To ensure this, an additional loss term for "temporal coherence" is added to the training of the model.
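
A highly simplified sketch of such a temporal term could penalize differences between consecutive stylized frames; real systems first warp the previous frame with optical flow and mask occluded pixels, both of which are omitted here:

```python
import tensorflow as tf

def temporal_loss(stylized_prev, stylized_curr):
    """Penalize frame-to-frame changes in the stylized output (no flow warping)."""
    return tf.reduce_mean(tf.square(stylized_curr - stylized_prev))
```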

Additional attention should be given to preserving the colors of the content image while accounting for the strokes and style pigments taken from the reference image.

Pro tip: Style Transfer is an important extension of Computer Vision. Refresh your knowledge by reading Computer Vision: Everything You Need to Know.

Neural Style Transfer use cases

Finally, let's have a look at some of the real-world applications of Neural Style Transfer.

Photo and video editors

Style transfer is extensively used in photo and video editing software.

These deep learning approaches and production-grade style transfer models can run on devices like mobile phones, giving users the ability to stylize images and videos in real time.

Art and entertainment

Style transfer also provides new techniques that can change the way we look at and engage with art.

It makes highly valued and often expensive artistic styles reproducible for office and home decor, or for advertisements. Transfer models may help us commercialize art.

Gaming and Virtual reality

There are many cloud-powered video game streams that use image style transfer.

These models help developers provide interactive environments with customized artistic styles to users. This adds an artistic dimension to the game and helps bring out the artist inside every developer.

Much like gaming, VR apps use style transfer to tell visual stories through applications, games, films, and more.

Summary

Let's do a quick recap of everything we've learned in this guide:

  • Neural Style Transfer blends a content image with a style reference image so that the content appears painted in that specific style

  • NST employs a pre-trained Convolutional Neural Network for feature extraction and separation of content and style representations from an image

  • The NST network has two inputs: a content image and a style image. The content image is recreated as a newly generated image, which is the only trainable variable in the neural network

  • The architecture of the model performs the training using two loss terms: Content Loss and Style Loss

  • Content loss is calculated by measuring the difference between the higher-level intermediate feature maps of the content image and the generated image

  • Style loss can be measured by the degree of correlation between the responses from different filters in a given layer

  • Image style transfer is a growing topic of research and aims to boost the artistic skills of every person. It is currently being used in diverse domains like gaming, virtual reality, and photo and video editing

Read more:

An Introduction to Autoencoders: Everything You Need to Know

The Beginner's Guide to Deep Reinforcement Learning

What is Data Labeling and How to Do It Efficiently [Tutorial]

An Introductory Guide to Quality Training Data for Machine Learning

The Beginner’s Guide to Contrastive Learning

9 Reinforcement Learning Real-Life Applications

Mean Average Precision (mAP) Explained: Everything You Need to Know

A Step-by-Step Guide to Text Annotation [+Free OCR Tool]

The Essential Guide to Data Augmentation in Deep Learning

Pragati Baheti

Pragati is a software developer at Microsoft, and a deep learning enthusiast. She writes about the fundamental mathematics behind deep neural networks.

Next steps

Label videos with V7.

Rewind less, achieve more.

Try our free tier or talk to one of our experts.
