Why is Knowledge Transfer Hard in Neural Nets but Easy with Metaphor?

Matt Clancy
8 min read · Jul 23, 2018


Neural networks (NNs) and metaphors are both ways of representing regularities in nature. NNs pass signals about data features through a complex network and spit out a decision. Metaphors take as a given that we know something, and then assert something else is “like” that. In this post, I am thinking of NNs as a form of representation belonging to computers (even if they were initially inspired by the human brain), and metaphors as belonging to human brains.

These forms of representation have very different strengths and weaknesses.

Within some narrow domains, NNs reign supreme. They have spooky-good representations of regularities in these domains, best demonstrated by superhuman abilities to play Go and classify images. On the other hand, step outside the narrow domain and they completely fall apart. To master other games, the learning algorithms AlphaGo used to master Go would essentially have to start from scratch. It can't condense the lessons of Go down to abstract principles that apply to chess. And its algorithms might be useless for a non-game problem such as image classification.

In contrast, a typical metaphor has opposite implications: great at transferring knowledge to new domains, but of more limited value within any one domain. Anytime someone tells a parable, they are linking two very different sets of events in a way I doubt any NN could do. But metaphors are often too fuzzy and imprecise to be much help for a specific domain. For instance, Einstein’s use of metaphor in developing general relativity (see Hofstadter and Sander, chapter 8) pointed him in the right direction, but he still needed years of work to deliver the final theory.

This is surprising, because at some level, both techniques operate on the same principles.

Feature Matching

Metaphor asserts that two or more different things share important commonalities. As argued by Hofstadter and Sander, one of the most important forms of metaphorical thinking is the formation of categories. Categories assert that certain sets of features "go together." For example, "barking," "hairy," and "four legs" are features that tend to go together. We call this correlated set of features the category "dog." Categories are useful because they let us fill in the gaps when we can observe some of something's features but not all of them.

This kind of categorization via feature tabulation was actually one of the first applications of NNs. As described by Steven Pinker (How the Mind Works, pgs. 112–131), a simple auto-associator model is a NN where each node is connected to the others. These kinds of NNs easily "fill in the gaps" when given access to some but not all of the features in a category. For example, if barking, hairy, and four legs are three connected nodes, then an auto-associator is likely to activate the nodes for "hairy" and "four legs" when it observes "barking." Even better, these simple NNs are easy to train. And if such simple NNs can approximate categorization, then we would expect modern NNs with hidden layers to do it that much better.
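Here is a minimal sketch of the idea, using a Hopfield-style auto-associator with made-up binary features (a toy, not Pinker's exact model):

```python
import numpy as np

# Toy auto-associator (a Hopfield-style net) over four binary features.
# Feature order: [barks, hairy, four legs, scaly] -- hypothetical toy data.
# +1 means the feature is present, -1 means absent.
dog = np.array([1, 1, 1, -1])

# Hebbian learning: connections strengthen between features that co-occur
# in the stored pattern (no self-connections).
W = np.outer(dog, dog)
np.fill_diagonal(W, 0)

def recall(cue, steps=5):
    """Fill in unobserved features (coded 0) by repeatedly updating nodes."""
    s = cue.astype(float)
    for _ in range(steps):
        s = np.sign(W @ s)
    return s

# Observing only "barks" activates "hairy" and "four legs" (and not "scaly").
print(recall(np.array([1, 0, 0, 0])))  # -> [ 1.  1.  1. -1.]
```

Given only the "barks" node, the network settles into the full stored "dog" pattern: the category fills in the unobserved features.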

Now, as I’ve argued elsewhere, proper use of metaphor isn’t as simple as matching features. The “deep features” of a metaphor are the ones that really matter. Typically there will be only a small number of these, but if you get them right, the metaphor is useful. Get it wrong, and the metaphor leads you astray.

But this isn’t so different from NNs either. NNs implement a variety of methods to prune and condense the set of features, almost as if they too are trying to zero in on a smaller set of “deep features.”

  • Stochastic gradient descent (a major tool in training NNs) involves optimizing on a random subset of your data at each step, rather than all the data. In essence, we throw some information away each iteration (although we throw away different information each time). Now, this is partially done to speed up training, but it also seems to improve the robustness of the NN (i.e., it is less sensitive to small changes in the data set).
  • Dropout procedures involve randomly setting some parameters to zero during the optimization process. If the parameter isn’t actually close to zero, the optimization will re-discover this fact, but it turns out you get better results if you frequently ask your NN to randomly ignore some features of its data.
  • Information bottlenecks are NN layers with fewer nodes than the incoming layer. They force the NN to find a more compact way to represent its data, again, forcing it to zero in on the most important features.
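The three tricks above can be sketched in a few lines of toy numpy (all sizes, values, and weights here are made up; in a real NN the weights are learned):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Stochastic gradient descent: each step optimizes on a random subset.
data = rng.normal(size=(1000, 16))              # 1000 examples, 16 features
batch = data[rng.choice(1000, size=32, replace=False)]

# 2. Dropout: randomly zero some inputs during training, rescale the rest.
x = rng.normal(size=16)
keep_prob = 0.8
mask = rng.random(16) < keep_prob
x_dropped = np.where(mask, x / keep_prob, 0.0)

# 3. Information bottleneck: a layer with fewer units (16 -> 4) forces a
# more compact representation of the input.
W_bottleneck = rng.normal(size=(4, 16))
hidden = np.tanh(W_bottleneck @ x_dropped)
print(hidden.shape)  # (4,)
```

In each case the network is denied some of its information (a subset of examples, a random subset of features, or a narrower layer) and has to make do with a compressed representation.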

So, to summarize: using metaphor involves matching the deep features between two different situations. NNs are also trained to seek out the "deep features" of training data, the ones most robustly correlated with various outcomes. So why don't NNs transfer knowledge to new domains as well as metaphors do?

What are the Features?

It may come down to the kinds of features each picks out. As discussed in another post, the representations of NNs are difficult (impossible?) to concisely translate into forms of representation humans prefer. It's hard to describe what they're doing. So we can't directly compare the deep features a NN picks out with those we humans would select.

However, image classification NNs give us strong clues that they are picking out features very different from those we would select. There is an interesting literature on finding images that NNs classify incorrectly. In this literature, you start with some image and perturb it as little as possible to fool the NN into an incorrect classification. For example, this image from the above link is incorrectly classified as a toaster:

Figure 1. Fooling image classification neural networks (source)

How can this be? Whatever the NN thinks a toaster looks like, it's obviously different from what you or I would think. The huge gap between the deep features we identify and those identified by a NN is best illustrated by the following images from the blog of computer scientist Filip Piękniewski.

Figure 2. Filip Piękniewski trained a NN to tweak gray images until they were classified with high confidence (source)

Filip starts with gray images and trains a NN to modify pixels until a second NN gives a confident classification. The top left image is classified as a goldfish with 96% probability. The bottom right is classified as a horned viper with 98% probability. The results are kind of creepy, as they highlight the huge gulf between how "we" and NNs "see." Even though metaphor and NNs both involve zeroing in on the deep features of a problem, the features selected are really different.
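The procedure amounts to gradient ascent on the input pixels. Here is a heavily simplified sketch of that idea, with a fixed linear scorer standing in for the trained classifier (an assumption for illustration; the real experiment backpropagates a confidence score through a full deep net):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "classifier": a fixed linear score for one class, squashed
# to a (0, 1) confidence. A real experiment would use a trained deep NN.
n_pixels = 64
w = rng.normal(size=n_pixels)

def confidence(img):
    return 1 / (1 + np.exp(-(w @ img)))

img = np.full(n_pixels, 0.5)   # start from a flat gray "image"
lr = 0.1
for _ in range(200):
    # Ascend the class score: the gradient of (w @ img) w.r.t. img is w.
    img = np.clip(img + lr * w, 0.0, 1.0)  # keep pixels in a valid range

# The tweaked gray image is now classified with high confidence.
print(confidence(img) > 0.95)  # -> True
```

Nothing about this loop requires the final image to look like anything to a human; it only needs to push the classifier's score up, which is exactly why the results look so alien.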

Different Data, Different Features

[Warning: this isn’t my area but it is my blog so I’m going there anyway]

One reason figure 2 is so alien to us is that it comes from a very alien place. Compared to a human being, a NN’s training data is extremely constrained. Yes they see millions of images, and that seems like a lot. But if we see a qualitatively different image every three seconds, and we’re awake 16 hours a day, then we see a million distinct images every 52 days. And unlike most image classification NNs, we see those images in sequence, which is additional information. Add to that inputs from the rest of our senses, plus intuitions we get from being embodied in the world, plus feedback we get from social learning, plus the ability to try and physically change the world, and it starts to become obvious why we zero in on different things from NNs.
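The back-of-envelope count in that paragraph, spelled out:

```python
# One qualitatively different image every 3 seconds, 16 waking hours a day.
images_per_day = 16 * 60 * 60 / 3
days_for_a_million = 1_000_000 / images_per_day
print(round(days_for_a_million))  # -> 52
```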

In particular, NNs are (today) trained to perform very well on narrow tasks. Human beings navigate far more diverse problems, many of which are one-of-a-kind. That kind of diverse experience gives us a better framework for understanding "how the world works" on the whole, but less expertise with any one problem. When faced with a novel problem, we can use our blueprint for "how the world works" to find applicable knowledge from other domains (figure 3). And this skill of transferring knowledge across domains is one we get better at with practice, but which requires knowledge of many domains before we can even begin to practice.

Figure 3. “I gave it a cold.”

My earlier post on the use of metaphor in alchemy and chemistry illustrates how a better blueprint for "how the world works" can dramatically improve feature selection. Prior to 1550, alchemists used metaphor extensively to guide their efforts, but it mostly led them astray. They chose metaphors on the basis of theological and symbolic similarities, rather than underlying interactions and processes. This isn't a bad idea, if you think the world is run by supernatural entities with a penchant for communicating revelations and other hidden knowledge to mankind. But a better understanding of "how the world works" (i.e., according to impersonal laws) allowed later chemists to choose more fruitful metaphors than the alchemists.

When I see something like Figure 2, I see an intelligence that hasn’t learned how the world “really is.” Animals and physical objects are clumps of matter, not diffuse color patterns, no matter how much those color patterns align with previously seen pixel combinations. But I can see how it would be harder to know that if you hadn’t handled animals, seen them from different angles, and been embodied in physical space.

So I think one reason human metaphor transfers knowledge so well is that it draws on much more diverse training data. We pick deep features with an eye on "how the world works." So why don't AI companies just give their NNs more diverse training data? One reason is that important parts of a NN's structure still have to be hand-tuned to the kind of training data it receives. You can't just turn an image-classification architecture loose on the game of Go and expect comparable results. There seems to be a big role for the architecture of NNs.

Whatever the "right" architecture is for the diverse training data humans encounter, evolution seems to have found it. But it took a long time: evolution worked on the problem for hundreds of millions of years, in parallel over billions of life forms. By contrast, AlphaGo Zero played 21 million games of Go to train itself. At one hour per game, that works out to a bit under 2,400 years if the games were played at human speed, one at a time.
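The arithmetic behind that estimate:

```python
# 21 million self-play games at one hour per game, played one at a time.
games = 21_000_000
years = games / (24 * 365)
print(round(years))  # -> 2397
```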

In a sense, I think this makes NNs more impressive — look how much they've done with the equivalent of a paltry few thousand years of evolution! But I also think it provides a warning that matching broadly human performance might be a lot harder than recent advances have suggested.