
Photo shown under a Creative Commons license from MethoxyRoxy.
Everywhere I look, it seems as if I find the work of cognitive scientist Jeff Elman.
Not his work, actually, but those things that his work explains.
I awoke too early this morning to rouse the kids and get them ready for school. In those semi-conscious moments before waking them, I scrolled through the new items on Facebook.
There I saw this status update by an old college friend, “Patrick wants to know why I can remember lyrics from a song I knew 20 years ago but can’t for the life of me remember what I just walked into the kitchen to get?”
Even in my dawn-hating haze, I thought, “Elman net.”
It’s a common thought, too. So many things make sense when viewed through Elman’s neural network structure.
I try not to reify the Elman net. I try to be a dispassionate scientist and not force the data into a hypothesis.
But his network is that darned elegant.
Parallel processing
This is not a short story, you see. It features many players, decades of research, and intricate concepts.
But it’s an amazing story.
The human brain represents the most complicated processing device in the known universe.
By most estimates, the human brain holds about 100 billion neurons.
Yet to focus on neurons is to miss the point.
Magnificent though they are, neurons are merely simple binary devices. They are either on or off. An action potential either cascades down their length (nodes of Ranvier, anyone?), or it doesn’t.
The salient fact is that a neuron holds no information. Neurons merely represent electro-chemical instantiations of the 1 and 0 binary code of the computer on which I type.
Billions and billions
Instead, everything you know rests in the connections among neurons, called synapses. And these are quite plentiful. Most estimate that an average adult human has more than 100 trillion synapses.
With so many, any single synapse is unimportant. But great insight came when we discovered that information is stored in the patterns of connectivity.
The brain stores information by altering the connections among neurons, much as traffic patterns in a metropolitan area are better understood through its intersections than through isolated stretches of highway.
Slow moving vehicle
Although neurons number in the billions and synapses in the trillions, one remarkable limitation applies to these information-processing devices: they’re slow.
Just ask an electrician to design an electrical system submerged in water, and you’ll understand the challenge of wiring a brain bathed in cerebrospinal fluid. Conditions are, shall we say, suboptimal.
In part as a solution to this problem, evolution engineered a chemical handoff: the signal of an action potential crosses the synapse by way of neurotransmitters.
Problem solved, but electricity moves near the speed of light. Action potentials lag behind, and the chemical transfer of neurotransmitters is far, far slower.
Without a clever solution, we’d be remarkably slow, never realizing a tree is falling until our brain was too flat to process anything.
The solution is a massively parallel system. A “thought,” as it were, does not travel down a single neuron. Instead patterns of signals cascade across networks.
Think of it this way: if you want to get 100 marbles somewhere in a hurry, it makes no sense to push them one at a time through a narrow pipeline. Instead, push 100 marbles at once through 100 narrow pipelines. They arrive together, far more quickly. Such is the difference between serial (one at a time) and parallel processing.
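To put rough numbers on the marbles, here is a minimal sketch in Python. The function names and the two-second travel time are my own illustrative assumptions, not measurements of anything.

```python
def serial_time(n_marbles, seconds_per_marble):
    """One pipeline: each marble must wait for the one ahead of it."""
    return n_marbles * seconds_per_marble


def parallel_time(n_marbles, n_pipes, seconds_per_marble):
    """Many pipelines: marbles travel side by side, one batch per trip."""
    batches = -(-n_marbles // n_pipes)  # ceiling division
    return batches * seconds_per_marble


# Hypothetical figure: each marble takes 2 seconds to clear a pipe.
print(serial_time(100, 2))         # 100 marbles, one pipe
print(parallel_time(100, 100, 2))  # 100 marbles, 100 pipes
```

Under those assumptions the single pipe needs 200 seconds while the hundred pipes need 2. No individual marble moves any faster; only the arrangement changed.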
Building models
Once we understood this, cognitive scientists began to model the process, most famously David Rumelhart and James McClelland in their groundbreaking two-volume Parallel Distributed Processing (1986).
Without going into detail, these neural network models can solve amazingly complex problems. But in their original versions, they have no concept of time. And time matters. A lot!
People began to try to build time into these models in a meaningful way, including the ironically named Michael Jordan (not that one).
You see, simple neural networks can learn a lot, and they can learn connections between pieces of information that most other types of learning simply cannot learn.
But it appeared that, to that point, they could not learn everyday things such as grammar. Cognitive science and linguistics share a tight bond, and if neural networks were to explain human cognition, then they had to make a reasonable prediction about completing the sentence, “John walked through the ___________.”
You want to say, “door,” most likely, and you would have been surprised if I had said “octopus,” and you likely would have been even more surprised if I had said “walked,” for the first is highly unlikely, and the second is not grammatical.
But you should have been far less surprised if I had said, “red,” as it is common for modifiers to precede nouns in English.
“Door” was most likely. “Archway” would have been OK. “Red” would be unexpected unless you knew the door about which I spoke. But “octopus” and “walked” are out.
How do you model this?
Finding structure in time
Enter Jeffrey Elman and his brilliant 1990 article, Finding Structure in Time, from the journal Cognitive Science.
Without delving deeply into the architecture of neural networks, Elman made a very elegant change to the simple feed-forward neural network.
You see, the latter takes some representation of the world (not unlike how your retinas turn light into neural signals) and processes it into some kind of “output.”
So, for example, imagine seeing the letter “A” and knowing it is an “A.” In this case, light has been turned into an output of letter recognition.
Simple neural networks excel at this task, even learning to recognize the letter “A” through all the various different fonts, many of which are not A-like at all.
But what comes next? What is the most probable letter to come after an A? Or more importantly, if you’re reading along a string of type such as this, how do you know when an “A” signals the end of a word? In English, words don’t typically end in “A.” But how do you actually know this?
Elman had a solution. You see, between the “input” and “output,” a simple neural network must have an internal representation. That is, it must be allowed to transform the “input” into some other, hidden, form in order to best produce the output.
So, to solve any meaningful problem, neural nets have hidden layers. There, math drives a seemingly magical process of turning the world into some other unique code needed to “understand” it.
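For the curious, here is a bare-bones sketch in Python of that input, hidden layer, output arrangement. The layer sizes, the random untrained weights, and the tanh squashing function are all assumptions chosen for illustration; nothing here is taken from a particular published model. It makes one point only: a network like this has no sense of time, so the same input gives the same output no matter what it saw a moment ago.

```python
import math
import random


def layer(inputs, weights):
    """One fully connected layer followed by a tanh squashing function."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def feed_forward(inputs, w_hidden, w_out):
    hidden = layer(inputs, w_hidden)  # the internal, "hidden" representation
    return layer(hidden, w_out)       # the output built from that representation


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(1)
N_IN, N_HIDDEN, N_OUT = 4, 3, 2
W_HIDDEN = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
W_OUT = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]

pattern_a = [1.0, 0.0, 0.0, 1.0]
pattern_b = [0.0, 1.0, 1.0, 0.0]

first = feed_forward(pattern_a, W_HIDDEN, W_OUT)
feed_forward(pattern_b, W_HIDDEN, W_OUT)  # something else happens in between
second = feed_forward(pattern_a, W_HIDDEN, W_OUT)

print(first == second)  # the intervening input left no trace
```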
Elman’s solution was elegant. Give the neural network access to its internal representation from the immediately preceding timestep.
This is like a short-term memory, but it is exceedingly short. One moment in time. So, for instance, if the network were processing this very sentence, it would have access to the “S” in “So” when it was processing the “o.”
But not quite. It would have access to its own internal representation of the “S” in “So” when processing the “o.”
One solitary moment in time.
But the use of these internal representations has dramatic results. Consider the processing of “The quick red fox jumps over the lazy brown dog.”
When you start, you have nothing. Your mind is the proverbial blank.
Now: T Before: (nada)
Now: h Before: (T)
Now: e Before: h plus my internal representation of T
Now: (space) Before: e plus my internal representation of h (which included my internal representation of T)
So what happens? You build a short-term memory on the fly. The memory itself never reaches back more than one moment in time, but the information folded into it stretches back much further.
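Here is a minimal sketch of that idea in Python. It assumes random, untrained weights, eight hidden units, and a tanh squashing function, all my own illustrative choices rather than the setup of Elman's paper. At each step the network sees only the current letter plus its own hidden state from one moment earlier.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Small random weight matrix as a list of rows (untrained, illustrative)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def one_hot(ch, alphabet):
    """Encode a character as a vector with a single 1.0 at its position."""
    return [1.0 if ch == a else 0.0 for a in alphabet]


def elman_step(x, context, w_in, w_ctx):
    """New hidden state from the current input plus the previous hidden state."""
    hidden = []
    for j in range(len(w_in)):
        total = sum(w * xi for w, xi in zip(w_in[j], x))
        total += sum(w * ci for w, ci in zip(w_ctx[j], context))
        hidden.append(math.tanh(total))
    return hidden


def run(text, alphabet, w_in, w_ctx):
    """Feed text one character at a time; return the final hidden state."""
    context = [0.0] * len(w_in)  # "Before: (nada)"
    for ch in text:
        # Only the state from the immediately preceding step is carried along.
        context = elman_step(one_hot(ch, alphabet), context, w_in, w_ctx)
    return context


# Hypothetical alphabet, sizes, and seed chosen for this sketch.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
HIDDEN = 8
rng = random.Random(0)
W_IN = make_weights(HIDDEN, len(ALPHABET), rng)
W_CTX = make_weights(HIDDEN, HIDDEN, rng)

state_the = run("the ", ALPHABET, W_IN, W_CTX)
state_she = run("she ", ALPHABET, W_IN, W_CTX)

# The last three characters are identical; only the first letter differs.
differs = any(abs(a - b) > 1e-9 for a, b in zip(state_the, state_she))
print(differs)
```

If the final states differ, as they should with these weights, then the first letter is still leaving its mark three steps later, even though the network never holds more than one moment of its own past.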
Back to the Cabaret
Although it is hard to do it justice here, these networks are so incredibly powerful. They explain so much.
That is because, at each moment in time, a network that has “learned” about the world expects certain things to come next, and it “knows” when the next thing is out of place.
For instance, many words in English end in “-ng.” Walking, running, typing. We have no problem pronouncing it.
But have you ever watched a native speaker of English try to pronounce Vietnamese names, such as “Nguyen”?
It’s funny really. Their faces usually contort trying to figure out how to begin.
You see, in English, words never begin with “Ng.” Ever. So the Elman net inside your head doesn’t know how to begin without the rest of the word to get it started.
It’s the same sound, largely. You say it dozens if not hundreds of times a day. But time matters. And “-ng” comes later rather than sooner.
Patrick’s question at last
In many ways, music is special. But it’s highly ordered. That Scorpions tune that Patrick hears in his head has played in his ear hundreds of times.
And every time, there was a very rigid structure to it. It proceeds in the same order, and with small exceptions for live versions, etc., it’s always the same. Singing one line cues the next. One note cues the next.
It’s a well-rehearsed pattern, and Elman nets are exceptionally well versed at this, pardon the pun.
So the song is well stored, and any cue to it is highly predictive of what comes next. Once you get it started, it runs itself in memory.
Conversely, however, I am willing to bet that Patrick would flounder and stumble trying to reproduce the words to that song backward.
What about the kitchen? Well, Patrick has likely walked into the kitchen hundreds of times, too. Usually along a similar route.
This, too, gets stored as a memory in time. I’m guessing he never abruptly turns and smacks into the wall when alcohol is not involved.
It’s an automatic process because it is so well learned. But unlike the song, something different comes at the end of the loop.
He does something different almost every time he walks into the kitchen. So once you get there, the loop ends, and it has no idea what comes next.
En route, he engages in a well-rehearsed automatic process. It’s so easy that it frees his mind to think of other things. The mind wanders to his plans for the day, and the immediate goal slips out of mind because it is not needed for the automatic process of walking. When he gets there, short-term memory has been cleared, and the long-term loop is trying to guess the correct next step from the thousands of past results, none of which is accurate.
Think of it this way: you’re pretty good at predicting that next word in the Scorpions tune, but you’re pretty lousy at predicting the next song on the radio.