Menelaos Page: Cinematic AI: AI for Filmmaking!

Analysing cinema is a time-consuming process. In cinematography alone, there are many factors to consider, such as shot scale, shot composition, camera movement, color, lighting, etc. Whatever you shoot is in some way influenced by what you've watched. There's only so much one can watch, and even less that one can analyse thoroughly.

This is where neural networks offer ample promise. They can recognise patterns in images in ways that weren't possible until less than a decade ago, which could greatly speed up the analysis of cinema. I've developed a neural network that focuses on one fundamental element of visual grammar: shot types. It's capable of recognising 6 unique shot types, and is ~91% accurate. The pretrained model, validation dataset (the set of images used to determine its accuracy), code used to train the network, and some more code to classify your own images are freely available.

What is Visual Language, and Why Does it Matter?

When you're writing something (an email, an essay, a report, a paper, etc.), you're using the rules of grammar to put forth your point. Your choice of words, the way you construct the sentence, correct use of punctuation, and most importantly, what you have to say, all contribute towards the effectiveness of your message.

Cinema is about how ideas and emotions are expressed through a visual form. It's a visual language, and just like any written language, your choice of words (what you put in the shot/frame), the way you construct the sentence (the sequence of shots), correct use of punctuation (editing & continuity) and what you have to say (the story) are key factors in creating effective cinema. The comparison doesn't apply rigidly, but it's a good starting point for thinking about cinema as a language.

The most basic element of this language is a shot. There are many factors to consider while filming a shot. How big should the subject be? Should the camera be placed above or below the subject? How long should the shot be? Should the camera remain still or move with the subject? If it's moving, should it follow the subject, or observe it from a fixed point while turning right/left or up/down? Should the movement be smooth or jerky? There are other major visual factors, such as color and lighting, but we'll restrict our scope to these factors only. A filmmaker chooses how to construct each shot based on what they want to convey, and then juxtaposes the shots effectively to drive home the message.

Neural Networks 101

'AI' is most often a buzzword for deep learning, the field that uses neural networks to learn from data.

The key idea is that instead of explicitly specifying patterns to look for, you specify the rules by which the neural network autonomously detects patterns in data. The data could be something structured, like a database of customers' purchasing decisions, or something unstructured, like images, audio clips, medical scans, or video. Neural networks are good at tasks like predicting a customer's desired products, telling an image of a dog from one of a cat, telling the mating calls of dolphins from those of whales, telling a video of a goal being scored from one of the goalkeeper saving the day, or judging whether a tumor is benign or malignant.

With a large enough labelled dataset (say 1000 images of dogs and cats stored separately), you could use a neural network to learn patterns from these images. The network puts the image through a pile of computation, and spits out two probabilities: P(cat) and P(dog). You calculate how wrong the network was using a loss function, then use calculus (chain rule) to tweak this pile of computation to produce a lower loss (a more correct output). Neural networks are nothing but a sophisticated mechanism of optimising this function.
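
To make the "two probabilities and a loss" idea concrete, here's a minimal sketch in plain Python. It assumes the network ends in a softmax layer and is scored with cross-entropy loss, which is a common choice for classification but not necessarily what any particular model uses. The raw scores are made-up numbers for illustration.

```python
import math

def softmax(scores):
    """Turn raw network outputs into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Loss is small when the true class is given a high probability."""
    return -math.log(probs[true_index])

# Hypothetical raw outputs for one image: [cat score, dog score]
scores = [2.0, 0.5]
probs = softmax(scores)                 # [P(cat), P(dog)]

loss_if_cat = cross_entropy(probs, 0)   # the image really was a cat
loss_if_dog = cross_entropy(probs, 1)   # the image really was a dog

print("P(cat), P(dog):", probs)
print("loss if cat:", loss_if_cat, "loss if dog:", loss_if_dog)
```

The network leans towards "cat" here, so the loss is low if the image really is a cat and high if it is a dog. Training is the process of nudging the computation so that this loss shrinks across the whole dataset.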

If the network's output is far off from the truth, the loss is larger, and so the tweak made is also larger. Tweaks that are too large are bad, so you multiply the tweaking factor by a tiny number known as the learning rate. One pass through the entire dataset is known as an epoch. You'd probably run through many epochs to reach a good solution. Since the network then sees each image many times, it's a good idea to tweak the images non-invasively (such as flipping them horizontally), so that the network sees different numbers for the same image and can more robustly detect patterns. This is known as data augmentation.
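
Here's a rough sketch of those three ideas (learning rate, epochs, augmentation) in plain Python. It swaps the neural network for the simplest possible stand-in, a one-weight logistic model, so the dataset, the `learning_rate` value, the number of epochs, and the `hflip` helper are all illustrative assumptions rather than anything from the actual project.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hflip(image):
    """Flip an image (a list of pixel rows) horizontally."""
    return [list(reversed(row)) for row in image]

# Toy labelled dataset (hypothetical): one number per example, label 0 or 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

def mean_loss(w, b):
    """Average cross-entropy loss over the whole dataset."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

w, b = 0.0, 0.0
learning_rate = 0.1          # the "tiny number" that scales every tweak
initial_loss = mean_loss(w, b)

for epoch in range(50):      # one epoch = one pass over the entire dataset
    for x, y in data:
        p = sigmoid(w * x + b)
        error = p - y        # further from the truth -> larger error
        w -= learning_rate * error * x   # larger error -> larger tweak
        b -= learning_rate * error

final_loss = mean_loss(w, b)
print("loss before:", initial_loss, "loss after:", final_loss)

# Data augmentation: the same picture, but different numbers for the network.
image = [[1, 2, 3],
         [4, 5, 6]]
flipped = hflip(image)
print(flipped)
```

In a real image pipeline the flip would be applied randomly to each image as it is loaded, so the network rarely sees exactly the same pixels twice across epochs.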

Neural networks can transfer knowledge from one project to another. It's very common to take a network that's been trained on 14 million images of a thousand common objects (ImageNet), and then tweak it to adapt to your project. It works because the network has already learnt basic visual concepts like curves, edges, textures, eyes, etc., which come in handy for any visual task. This process is known as transfer learning.
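
A toy sketch of one common flavour of transfer learning: keep the pretrained layers frozen and train only a small new "head" for your own labels. Everything here is a stand-in. `FROZEN_WEIGHTS` plays the role of an ImageNet-trained network (a real one has millions of learnt weights), and the dataset and hyperparameters are hypothetical. In practice you'd load a real pretrained model from a deep learning library, and often fine-tune some of its layers as well.

```python
import math

# Stand-in for a pretrained network. These weights are never updated below.
FROZEN_WEIGHTS = [[1.0, -1.0],
                  [0.5, 0.5]]

def frozen_features(x):
    """Run the input through the frozen layer, followed by a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Small labelled dataset for the new task (hypothetical).
data = [([2.0, 0.0], 1), ([1.5, 0.5], 1),
        ([0.0, 2.0], 0), ([0.5, 1.5], 0)]

# The new head: the only part that gets trained.
head_w = [0.0, 0.0]
head_b = 0.0
learning_rate = 0.5

def predict(x):
    """Probability of class 1 for input x."""
    f = frozen_features(x)
    return sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b)

for epoch in range(200):
    for x, y in data:
        f = frozen_features(x)      # reuse what the frozen layer "knows"
        error = predict(x) - y
        head_w = [w - learning_rate * error * fi for w, fi in zip(head_w, f)]
        head_b -= learning_rate * error

print("P(class 1) for [2, 0]:", predict([2.0, 0.0]))
print("P(class 1) for [0, 2]:", predict([0.0, 2.0]))
```

Because only the head's few weights are learnt from scratch, this approach can work with far less labelled data than training a whole network would need.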

Rinse and repeat this process carefully, and you have in your hands an 'AI' solution to your problem.*

If that piqued your interest, I suggest you watch this (~19 mins) for a fairly detailed explanation of how a neural network works. If you're bursting with excitement, follow through with this course.

Neural networks burst into popularity in the past decade with the development of large datasets and the ability to leverage GPUs (graphics cards) for the heavy computation demanded by neural nets.

Rapid advances on the technical side open up opportunities to solve novel problems like the one presented in this post. Shot scale recognition is one of many possible applications to film. It's possible to recognise camera movement with 3D CNNs (convolutional neural networks); the only missing piece is the dataset. Camera angles could be detected with the same methodology as this project, but the dataset for it doesn't exist either. Cut detection (the transition from one shot to the next) has been worked on extensively, and can be adapted to film*.

*Frederic Brodbeck's Cinemetrics project does something similar, and is worth looking at.

Barry Salt's Database also does something similar, but at a higher level (summary statistics for the entire movie); a filmmaker would find individual shot analysis more useful.

Source: https://rsomani95.github.io/ai-film-1.html

Original Source: https://towardsdatascience.com/a-i-for-filmmaking-f2a2197020aa


5/11/20
