Neural Networks and Old Regularities
In my last post, I wrote about the possibility that neural networks can represent new regularities in nature. These regularities are impossible to concisely represent with the kinds of representations humans are comfortable with: chiefly rules, probabilistic statements, and metaphor. This can make their pronouncements seem eerie and magical. But neural networks (hereafter NNs) are nothing if not flexible, and can also represent old and familiar regularities. These are the type we can translate into easier-to-digest formats. And this is another avenue in which they can innovate.
To explore this, let’s talk about NNs copying famous painters.
For a variety of reasons, a lot of computer scientists are interested in teaching NNs to paint. Well, more precisely, how to generate images that apply a given artistic style to any photo. One of the early papers in this area is Gatys, Ecker, and Bethge (2015) (real “early”, right?). In figure 1, they apply the style of Van Gogh’s Starry Night to a photograph of canal houses.
In contrast to the baffling genius of AlphaGo, this is a case where we can understand what the NN is doing. Copying the style of Van Gogh is not that mysterious. People do it all the time. Here’s a lovely painting of trees by Sylvie Burr.
This is not a case where regularities are mysterious and defy explanation. The regularities that characterize Van Gogh’s style include a texturing of thick swirling lines and a moody (rather than realistic) color palette. Show us examples of his style and even someone who has never seen his work will begin to pick out commonalities.
Neural Network Representations in Images
NNs are also capable of representing those regularities. But, in an emerging theme, the way they represent those regularities is opaque. We can’t “tell” a NN what regularities characterize Van Gogh’s style. Instead, we give it lots of examples and let it rediscover these regularities on its own.
However, before we go on, we have to talk about a second set of regularities that a NN has to represent to transfer style. These are regularities in the content and perspective of images done in different styles. In figure 1, we can tell that both images correspond to the same subject matter and view. They only differ in their styles. In contrast, figure 2 and the right-hand side of figure 1 have the same style (sort of), but clearly depict different subject matter. The NN has to represent regularities in both style and content.
It does this in different ways (I am drawing on this and this for the following section). For the computer scientists, “style” is understood as a form of diffuse texture. It is the curving lines and color palette of Van Gogh, not the composition and choice of subject (in this respect, they miss a lot). When they train a NN to match the style of a painter, the “style” of the image is converted into numbers corresponding to non-localized regularities over the whole surface of the image. For example, in figure 1, it doesn’t care much about making the left-hand side of the image dark (to match the dark cypress of Starry Night). Instead, it cares about matching the thick wavy lines and color palette of the entire image. The NN’s output is evaluated by how much the numerical score of its style differs from that of the example. The NN’s weights, links, and thresholds are tweaked in pursuit of this style target.
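To make “non-localized” concrete, here is a minimal Python sketch of one way to turn style into numbers: a Gram matrix, which records how strongly pairs of feature channels fire together, averaged over every position in the image. As far as I can tell this is the statistic Gatys, Ecker, and Bethge use, but treat the details as my assumptions: the random arrays stand in for a hidden layer’s activations, and the names gram_matrix and style_loss are my own.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel co-activation, averaged over all positions.

    features has shape (channels, height, width).
    """
    channels, height, width = features.shape
    flat = features.reshape(channels, height * width)
    return flat @ flat.T / (height * width)

def style_loss(generated, example):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(generated) - gram_matrix(example)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
example = rng.normal(size=(4, 8, 8))  # stand-in for real activations

# Scramble WHERE things are, using the same permutation for every channel.
perm = rng.permutation(8 * 8)
scrambled = example.reshape(4, 64)[:, perm].reshape(4, 8, 8)

# An unrelated "texture": different activations with a larger spread.
other = 3.0 * rng.normal(size=(4, 8, 8))

scrambled_loss = style_loss(scrambled, example)
other_loss = style_loss(other, example)
```

Scrambling the layout leaves the style score essentially unchanged (up to floating-point rounding), while a genuinely different texture does not. That is the sense in which this kind of style score ignores where things are in the picture.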
So much for style. To match the subject matter of an image, the evaluation is done with respect to large-scale, localized regularities. In figure 1, it wants to see a recognizable sky on top, row houses in the middle, and water on the bottom.
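A content score, unlike a style score, has to care about where things are. Here is a minimal Python sketch, assuming content is compared position by position as a mean squared difference between feature maps (my understanding of the Gatys et al. approach). The random array is a stand-in for real hidden-layer activations, and content_loss is my own name.

```python
import numpy as np

def content_loss(generated, example):
    """Mean squared difference, compared position by position.

    Both arrays have shape (channels, height, width).
    """
    return float(np.mean((generated - example) ** 2))

rng = np.random.default_rng(0)
photo = rng.normal(size=(4, 8, 8))  # stand-in for real activations

# Flip the picture upside down: "water" on top, "sky" on the bottom.
upside_down = photo[:, ::-1, :]

same_loss = content_loss(photo, photo)
flipped_loss = content_loss(upside_down, photo)
```

The flipped image contains exactly the same values, just in different places, and the content score penalizes it anyway.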
Recognizing that two images are of the same subject and perspective, even if all their pixels are different, is closely related to the problem of image classification. Image classification problems include facial recognition (realizing two different images correspond to the same face) and labeling image content (as corresponding to, say, dogs, or cats, or “inappropriate content”). In all cases, we want to match regularities in the relative position of large chunks of the visual image. For example, if we’re identifying faces, we might be comparing the relative size of the “nose” to the “mouth” and “eyes” (although actually we don’t know what the NNs are doing).
Identifying regularities in images with different styles is closely related to the image classification problem. So computer scientists actually borrow the NN representation of these image classification problems! In a literal sense, they start with the hidden layers of NNs trained on image recognition (including the nodes, links, weights, and thresholds of the NN) and simply copy them over to the style-transfer NN. And recall, we don’t really “understand” what the NN is doing to classify images. Here, the fact that we don’t “understand” means we struggle to translate what the NN is doing into metaphors and rules. But it’s not necessary for us to do that. The representation encoded by the NN still does what we want a representation to do: it conveys information about a regularity in nature. We don’t really “understand” the NN’s internal representation of the regularities in an image, but that doesn’t stop us from redeploying that representation in a new context. Does it reliably identify image content? Great, that’s all we care about!
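Here is a toy Python sketch of what “copying over” the hidden layers can look like. Everything in it is hypothetical: the random matrices stand in for a classifier that was actually trained, and the layer names and helper functions are mine. The point is only that the borrowed layers are reused as-is, frozen, without anyone needing to interpret them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a NN already trained to classify images.
pretrained_classifier = {
    "hidden_1": rng.normal(size=(16, 12)),
    "hidden_2": rng.normal(size=(8, 16)),
    "label_head": rng.normal(size=(3, 8)),  # the "dog / cat / ..." output
}

def borrow_hidden_layers(classifier):
    """Copy only the hidden layers and mark them read-only (frozen)."""
    borrowed = {}
    for name, weights in classifier.items():
        if name.startswith("hidden"):
            frozen = weights.copy()
            frozen.setflags(write=False)
            borrowed[name] = frozen
    return borrowed

def extract_features(layers, pixels):
    """Run pixels through the borrowed layers, in order, with ReLU units."""
    activation = pixels
    for name in sorted(layers):
        activation = np.maximum(0.0, layers[name] @ activation)
    return activation

layers = borrow_hidden_layers(pretrained_classifier)
features = extract_features(layers, rng.normal(size=12))
```

The classification output layer is left behind. Only the internal representation is redeployed, and the style-transfer scores are computed on the features it produces.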
Is this really innovation?
So, NNs are capable of representing regularities in image style and content in such a way that styles can be swapped and content retained. By my definition, this is an innovation: the NN has stepped into the unknown and exploited regularities to generate something a lot more “interesting” than a random collection of pixels. But it’s fair to say these innovations are not world-changing. Indeed, they can fairly be described as derivative. An artist who only copied other artists’ styles wouldn’t be described as innovative, even if he did apply those styles to new contexts.
This is related to a critique of NNs by NYU psychologist Gary Marcus:
In general, the neural nets I tested could learn their training examples, and interpolate to a set of test examples that were in a cloud of points around those examples in n-dimensional space (which I dubbed the training space), but they could not extrapolate beyond that training space. (p. 16)
Put another way, NNs are good at combining aspects of what they are trained on (the content of pictures, the style of painters), but they are always trapped in the cage of these examples. A NN trained in the above manner won’t ever take us beyond Van Gogh.
But this is not as much of a shortcoming as it seems. The ability to usefully combine aspects of disconnected things is, in fact, one of the fundamental creative acts. Indeed, Keith Sawyer (who wrote the book on creativity) defines innovation as “a new mental combination that is expressed in the world.” (pg. 7) I’ll briefly give examples in three different domains.
- In art, borrowing and recombining ideas can be seen in super-literal forms. Think of Pride and Prejudice and Zombies, and mashup artists like Girl Talk. But it’s also there, just below the surface, in things like Star Wars.
- Earlier, I asserted all technologies are combinations of pre-existing components. The internal combustion engine is one clear example. The modern combustion engine is built from a dizzying set of components that were often pioneered elsewhere. To take two examples, crankshafts and flywheels together convert uneven back-and-forth motion of a piston into smoothly rotating energy. Crankshafts had previously been employed to transform the rotational motion in waterwheels and windmills into back-and-forth motion. And flywheels had long been used to give potter’s wheels smooth and continuous motion. (Dartnell, pg. 201–207)
- Lastly, the product of sexual reproduction is of course a new organism that draws on a mix of genes from each of its parents. Over time, this mixing, matching, and selection generates entirely new species.
The difference between the above and what NNs are doing is a difference of degree, rather than a difference of kind. Most of the innovation done by humans and nature is also bound by “the training space” of available examples.
Expanding the Training Set
The difference is that humans and nature have a vastly, vastly more diverse storehouse of training examples than NNs. It’s not possible for a NN trained to reproduce the style of Van Gogh to go beyond him, because the only examples of painting it has are those of Van Gogh. To develop a new style, it would need examples drawn from other styles of painting at a minimum. More importantly, to really generate something we’ve never seen before, the NN would need the capacity to interpolate between different styles. Is this possible?
Yes, and it’s been done. Dumoulin, Shlens, and Kudlur (2017) from Google Brain trained a single NN to transfer the styles of 32 different artists to new images. Because the same NN represented all these different styles, the NN is also capable of applying interpolations of their styles to images. Figure 3 is an example from their paper:
In this figure, the style of Starry Night has been applied to a picture of Brad Pitt’s face in the upper left corner. Head of a Clown by Georges Rouault is the upper right style, The Scream by Edvard Munch is the lower left style, and Bicentennial Print by Roy Lichtenstein is the lower right style. In between we have interpolations between the different styles. Subsequent work by Ghiasi et al. (2017) (a group that includes the same team as above) generalized these techniques to a much wider set of painting styles.
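Here is a rough Python sketch of how a grid like this can be produced. My understanding of Dumoulin, Shlens, and Kudlur is that each style boils down to a small set of per-channel scale and shift numbers applied after normalizing the NN’s features, so blending styles means taking a weighted average of those numbers. The four random “styles”, the tiny feature map, and all function names below are hypothetical stand-ins, not the paper’s code.

```python
import numpy as np

def instance_norm(features, eps=1e-5):
    """Give every channel zero mean and (almost exactly) unit spread."""
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    return (features - mean) / (std + eps)

def apply_style(features, style):
    """Rescale and shift each normalized channel with the style's numbers."""
    gamma, beta = style
    return gamma[:, None, None] * instance_norm(features) + beta[:, None, None]

def corner_weights(row, col):
    """Weights for (upper-left, upper-right, lower-left, lower-right).

    row and col run from 0 (top / left) to 1 (bottom / right).
    """
    return np.array([(1 - row) * (1 - col), (1 - row) * col,
                     row * (1 - col), row * col])

def blend(styles, weights):
    """Weighted average of the styles' scale and shift numbers."""
    gamma = sum(w * g for w, (g, _) in zip(weights, styles))
    beta = sum(w * b for w, (_, b) in zip(weights, styles))
    return gamma, beta

rng = np.random.default_rng(0)
# Four hypothetical styles, one per corner of the grid.
corner_styles = [(rng.uniform(0.5, 1.5, size=3), rng.normal(size=3))
                 for _ in range(4)]
features = rng.normal(size=(3, 8, 8))  # stand-in for the NN's features

centre_style = blend(corner_styles, corner_weights(0.5, 0.5))
stylized = apply_style(features, centre_style)
```

At the corners the blend reduces to one pure style. Everywhere else it is a style that was not in the training set, yet the same NN can still render it.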
This work shows it’s possible for NNs to develop styles that did not previously exist. Are they any good? In the small set of examples given in Figure 3, I tend to like the interpolations between Rouault and Lichtenstein more than I like pure Rouault Pitt and pure Lichtenstein Pitt. But the main point of this post is simply to show NNs can innovate, even when they are using the kinds of regularities that humans are able to understand.
Now, what I am not claiming is that NNs can match humans in our ability to combine and interpolate between different ideas. Reading these papers, it’s clear that representing the different styles of painting in a NN was a major technical challenge that took considerable work to implement. Worse, their solution to this problem cannot be applied to problems and data different from the painting-style problem, at least without considerable modification (and maybe not even then). It is going to be a long time before a single NN can combine ideas and concepts from vastly different domains like us. But on the other hand, a lot of progress has been made in just 3 years.