Participate in building a complete model of being human because science is a community effort. The uncharted territory to explore is the holographic mind. The resources that I am using to create a coherent model are the recently developed mathematics of fuzzy systems, fractals, self organizing systems, cellular automata, neural nets. "Play" is the first experience of being self organizing and creative.
The brain is a biological neural network in which modern scientific research has discovered structural properties that are similar to a hologram.
So how did the human species evolve culture, and all the power of group learning passed down for generations outside of biological constraints? By the development of language as meta maps that allow actual brain sharing, even though each individual is still constrained by self organization to form their own implementation of these identical maps.
This is a linking of the inhibitory computational properties of the neocortex to form meta structures that have fractal properties, especially those of a Cantor dust, that can interface with "white light" holograms to separate the frequencies as a diffraction grating does. This was pointed to by the Taoists' pictorial representation of the I Ching and can be constructed from an information connectivity with the real local cosmos!
It is the purpose of this chapter to explain these statements, and of later chapters to show the consequences and implications for religion and science. There are many pieces to this jigsaw puzzle or cards to this deck. I will not presume to teach the detailed nature of these immense subjects, but will explain how they fit together and use the material introduced here in later chapters on religion and my model of reality. I will provide what I hope is a guide thru this forest of new research into the new view, as given by science, of reality as complex self organizing and self regulating systems in which chaotic processes are the stuff from which life emerges.
AND NEURAL NETS:
This self or person who is conscious is made of the same holograms, but has the fiction or illusion of viewing something separate: consciousness "extracts" itself from itself! This means that all the procedures used by all life forms throughout evolutionary time are also in a holographic form and are all present in our consciousness: all time is now! The development of the mathematics of neural networks opens up our understanding of how networks that developed at different evolutionary times, and that exist "side by side" in the holographic mind, conflict and interact. Humans are not the end product of evolution in the way the space shuttle is the end of development in transportation, but the sum and content of all of evolution, as if the chariot, horse and buggy, clipper ship and steam engine were all integrated into the operation of the space shuttle. [At least then it could land on water!] This is also the paradigm of Astrology: humans receive instructions from all our mammal, reptilian, fish and insect ancestors! So what is it to be human that is separate from those messages? It is our sharing and opening and trusting and inter-depending thru human communication. This is our self realization and spiritual resource.
I have briefly introduced neural networks
in the opening of this chapter. I have used this theoretical work
as a guide in understanding and shaping my study of the technical
biology of the holographic brain, and subsequently, of human behavior.
This follows a well-known principle of the scientific method that
will be discussed in subsequent chapters: completability. Present
knowledge cannot contradict known facts in other fields unless
it replaces them, but must be completable given the entire context
of knowledge. This applies to neural networks and biology: it
is known that supervision and learning from behavior modification
or stimulus response automatic mechanisms are biologically impossible.
We know from human behavior that competition and beauty and creativity
exist. So I am looking for neural network structures that fit
these facts: that are completable within these contexts. And sure
enough, there are whole classes of competitive all-or-nothing
neural networks, and another class that offers adaptive resonance.
Both of these types are self organizing. So I will start with
the details about them and the characteristics of the computational
spaces that emerge from these networks.
Here is some material that first caught my interest in the literature of neural networks. I am briefly reporting the findings. [The complete text hopefully can be found on the WWW or in your library.]
Comments on excerpts from "Naturally Intelligent Systems"
by Maureen Caudill and Charles Butler
" AI [artificial intelligence] attempts to capture intelligent behavior without regard to the underlying mechanisms producing the behavior. This approach involves describing behaviors, usually with rules and symbols. In contrast, neural networks do not describe behaviors; they imitate them."
[This is a metaprocess and again shows the difference between "thinking" and the underlying primitives of language which are the neural networks that resonate as the meaning of the words.
Hypercubes are equivalent to The I Ching and fractal Cantor dusts
as well as diffraction gratings. They can be fuzzy by having different
dimensional hypercubes or I Ching patterns embedded within each
other. It all ties together!
[That human minds have the following qualities
in far greater development is clear. Understand that living intelligent
systems are not limited by the need to do numeric calculations
or store separate memories in holographic and fractal spaces.
But it is obvious that we use these kinds of processes as competition,
resonance, as in worship and recognition of beauty, filtering
and learning rules at many levels of resolution. In the next chapter
I will make the case that life can make these things available
as self organizing only when we connect to the cosmos as sources
for the components of these systems!]
"An autoassociative memory is one where each data item is associated with itself. In this case, a data pattern is recalled by providing part of a data item or a garbled version of the whole item. A heteroassociative memory is one in which two different data items, say A and B, are associated with each other. A can be used to recall B, and B can be used to recall A. ... Consider the case of a system used to "clean up" stereotyped pictures. We might, for example, want to give the network garbled or incomplete versions of the letters of the alphabet and receive back complete, legible versions. To do this, we would store "clean" examples of the letters A, B, and so on. Afterward, when we input a noisy or garbled F, for instance; we would get back the original, legible F pattern. As long as there is not so much garbling of the input patterns that the wrong associations are made, an autoassociative memory will clean up input patterns presented to it and return correct versions."
[ Obviously, (I hope) culture uses this a
lot to keep the integrity of its meanings. But if the neural networks
as holoprocesses are tied to coherent sources in the cosmos, there
will be "drift" where the meaning is not recognizable
because the cosmos "moves". Thus trying to establish
"absolute unmoving final" instructions for living or
knowing "truth" does not work.]
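The letter clean-up described in the excerpt above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the book: the 3x3 "T" and "C" patterns, the outer-product (Hebbian) storage rule, and the names store and recall are all assumptions chosen to keep the example small.

```python
# Two 3x3 "letters" flattened row by row; +1 is ink, -1 is blank (illustrative patterns).
T = [1, 1, 1,
     -1, 1, -1,
     -1, 1, -1]
C = [1, 1, 1,
     1, -1, -1,
     1, 1, 1]


def store(patterns):
    """Build a weight matrix by summing outer products (no self-connections)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, pattern, max_steps=10):
    """Update every element from the weighted sum of the others until stable."""
    state = list(pattern)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(weights):
            total = sum(w * s for w, s in zip(row, state))
            # A zero sum leaves the element unchanged.
            new_state.append(1 if total > 0 else -1 if total < 0 else state[i])
        if new_state == state:
            break
        state = new_state
    return state


memory = store([T, C])
garbled_t = [-1] + T[1:]          # top-left cell flipped
garbled_c = C[:4] + [1] + C[5:]   # centre cell flipped
print(recall(memory, garbled_t) == T)
print(recall(memory, garbled_c) == C)
```

With only two stored letters and one flipped cell each, the memory returns the clean letter. Heavier garbling, or many similar letters, would be expected to produce the wrong associations the excerpt warns about.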
[This model is used in my model to represent
how personality fits into a social structure. Although we think
we are unique, the more we suppress our feelings and other areas
of our "animal" intelligence, the more we participate
in a cultural energy surface common to or shared by most of those
in a language group. The more we are "consistent" and
obey the rules of culture, the more we are minima or solutions
on a surface that may be like a lake with many holes in the bottom
producing whirlpools in which we are caught. This kind of behavior
is more pronounced during school ages and is called peer pressure,
"Probably the most useful construct in the study of crossbar networks is the energy surface."
[As a dimensional construct that can be in 3 dimensions or slices of higher dimensions, it is embedded in higher dimensions, or exists as a Möbius strip within a twisted dimension of broken symmetry.]
"The concept provides a useful physical analogy to the way
a crossbar network stores information. Imagine that we have a
supply of some soft plastic substance, say modeling clay, that
we can dent by pressing it firmly with a finger. Let's suppose
we want to associate the size and weight of several spheres made
from different materials like lead, cotton, and wood. We spread
the modeling clay in a rectangle on a large table so that it is
an even 3 inches thick all over. We then label two adjacent sides
of the rectangle with the ranges of the numbers we want to use.
Along the side nearest us, we tape a scale with numbers ranging
from zero to the maximum diameter we expect to encounter; along
the side to our left, we tape another scale with numbers ranging
from zero to some maximum expected weight.
For each sphere, we press our finger into
the clay at the spot corresponding to the measured values of the
diameter and weight, leaving a conical dent. As a reminder, we
can place a slip of paper containing the name of the material
at the bottom of each dent. ...
This is a slightly simplified model of the
way an associative memory stores information. When we have made
dents for all samples, we find that spheres that are similar in
size and weight are associated. In neural network terms, memories
of similar spheres are stored near each other. Given a new sphere
of unknown makeup, we can easily find which of our example materials
it most closely resembles in these two combined properties. If
we place a marble on the surface of the clay at the spot corresponding
to the size and weight of the unknown sphere, it will roll to
the bottom of the nearest dent. Readers familiar with introductory
physics recognize that the ball minimizes its potential energy
by moving to the lowest accessible position on the surface; it
seeks the nearest potential energy minimum.
If we look a little more closely at the characteristics of the clay surface, we see that there are two kinds of spots on it. If we release a marble at a place where the surface is level, it does not move. If we release the marble on a slope, it moves into the nearest dent that is downhill from it, as if the dent attracted the ball. By placing the marble at many different places around a dent, we can map out the region of influence of that dent, its "basin of attraction."
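As an aside from the excerpt, the marble-and-clay picture can be sketched in Python. Everything specific here is an assumption made for illustration: the dent positions are made-up diameter and weight values on 0 to 10 scales, the dents are smooth Gaussian wells rather than the conical finger dents of the text, and the names slope, roll and settle are hypothetical.

```python
import math

# Dent positions as (diameter, weight) on made-up 0-10 scales.
DENTS = {"lead": (2.0, 9.0), "wood": (5.0, 4.0), "cotton": (8.0, 1.0)}
WIDTH = 1.0  # how far each dent's slope reaches


def slope(x, y):
    """Gradient of a surface built from one Gaussian-shaped dent per sample."""
    gx = gy = 0.0
    for cx, cy in DENTS.values():
        depth = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * WIDTH ** 2))
        gx += depth * (x - cx) / WIDTH ** 2
        gy += depth * (y - cy) / WIDTH ** 2
    return gx, gy


def roll(x, y, step=0.2, max_steps=2000):
    """Move the marble downhill until the surface under it is level."""
    for _ in range(max_steps):
        gx, gy = slope(x, y)
        if math.hypot(gx, gy) < 1e-6:
            break
        x, y = x - step * gx, y - step * gy
    return x, y


def settle(x, y, tolerance=0.1):
    """Name of the dent the marble ends in, or None if it never reached one."""
    fx, fy = roll(x, y)
    for name, (cx, cy) in DENTS.items():
        if math.hypot(fx - cx, fy - cy) < tolerance:
            return name
    return None


print(settle(4.5, 5.0))   # released on the slope of the wood dent
print(settle(9.5, 9.5))   # released on level clay, far from every dent
```

A marble released inside a basin of attraction ends at that dent's label, while one released on level clay stays where it was put, which are the two kinds of spots the passage distinguishes.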
Crossbar networks have their own "energy surfaces" that
are comparable to the clay surface of our example. Mathematically
the crossbar's energy is analogous to the potential energy of
the ball on the clay surface, but in the network, each energy
value corresponds to a state of the network, that is, to a unique
set of synapse weights and neurode activations. Researchers working
with crossbar networks often talk about "sculpting the energy
surface." They mean that they store patterns in the weight
matrix so that appropriate "dents" are created in the
energy surface. Just as with our clay surface, these dents have
basins of attraction. Any input pattern that causes the state
of the system to fall within the basin of attraction of a dent
will also cause the system to recall the memory associated with
"Another complicating factor is the
problem of storing patterns that are very similar to each other.
Crossbar associative memories produce the
fewest recall problems when the patterns stored are orthogonal.
In essence, two patterns are orthogonal if they do not overlap,
that is, if they are completely distinct. Orthogonal three dimensional
vectors, for instance, are all perpendicular to each other in
space; they do not overlap. Of course, real-world problems are
rarely orthogonal. Most of the time, we cannot be sure that the
data patterns we must store are sufficiently different from each
other to enable a crossbar to record them reliably. ..."
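The effect of overlap on recall can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions, not the book's procedure: patterns are short lists of +1 and -1, storage is the plain outer-product rule with self-connections left in, and overlap and crossbar_input are hypothetical helper names.

```python
def overlap(a, b):
    """Dot product of two +1/-1 patterns; zero means they are orthogonal."""
    return sum(x * y for x, y in zip(a, b))


def crossbar_input(stored, probe):
    """Summed input each output element receives when the probe is presented."""
    total = [0] * len(probe)
    for pattern in stored:
        match = overlap(pattern, probe)
        total = [t + match * x for t, x in zip(total, pattern)]
    return total


A = [1, 1, -1, -1]
B = [1, -1, 1, -1]   # orthogonal to A
C = [1, 1, 1, -1]    # overlaps A

print(overlap(A, B), overlap(A, C))
print(crossbar_input([A, B], A))   # a clean multiple of A
print(crossbar_input([A, C], A))   # A plus crosstalk from C
```

When the stored patterns are orthogonal, presenting A yields an exact multiple of A. When they overlap, the second pattern leaks into the sum and shrinks the margin on the elements where the two disagree.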
[This problem of orthogonal separation fits
very well into my model of connection with the cosmos. In this
case it is the functions that derive answers that are separable
or too indistinguishable. These functions exist on different scales
or levels of resolution that are correlated with the planets and
"There is yet another problem associated with storage and recall by crossbar associative memories. We can best discuss this problem using the energy surface analogy. When we "sculpt the energy surface" by placing the energy dents or wells where we want to create memories, we invariably and
unavoidably end up adding extra energy wells we don't want. It is as though we have to walk across our clay surface to get to the place where we want to make a dent, and the process of walking across the clay leaves dents where we don't want them or smoothes out dents where we have stored data.
These extra energy wells, called "spurious
minima" by researchers, cause crossbars in general, and BAMs
in particular, sometimes to generate output patterns that have
nothing whatever to do with any of the input patterns stored.
When this happens, the imaginary marble placed on the surface
has rolled into one of those extra energy wells rather than into
one of the wells we deliberately produced. In the terminology
of the computer world, it's "good stuff in, garbage out."
[Again this correlates with cultural behavior
as "sin", error and negativity. Thus as we adjust the
"distance" between our functions, which are other neural
networks on other fractal scales, we undo these false solutions
or reveal true solutions that have been "filled in",
but create other errors at other places. We need to wake up to being
self correcting and self regulating, and stop trying to fix
something that is what it is and is not really broken, but is
just a cultural characteristic. This built-in production of error
functions is more obvious in times like now in the inner cities
and culture in general that is making many "jobs", lifestyles,
and family situations "errors". This will be discussed
in later chapters.]
"... The idea behind an adaptive signal filter is simply to make a system that can adjust the way it filters noise from a signal. The filter ideally will adapt to the types of noise presented to it and learn to filter the signal, removing the noise and thus enhancing the signal. ... The adaline is of great importance in our study of neural networks because it is the first network we will look at that
learns through an iterative procedure.
Such learning procedures are more typical
of neural networks than the kind of single-pass algorithm we discovered
in the crossbar networks. The adaline's iterative learning procedure
is more similar to some types of animal learning than the crossbar
because new patterns are not instantly stored; instead they must
be presented a number of times before learning is complete."
[Iteration and recursion bring fractals into the picture.]
"The adaline also introduces a learning law that is one of the most important in the field of neural networks, the delta rule."
[Does this "rule" work with self organization? Is the
training pattern done by other holograms or CA?]
" One of the oldest neural networks,
the adaline has been around for more than a quarter-century. In
its simplest form, this network consists of a single neurode along
with its associated input interconnects and synapses. That single
neurode can learn to sort complex input patterns into two classes.
The adaline ... forms a weighted sum of all inputs, applies a threshold, and in this case outputs a +1 or -1 signal as appropriate. It has one input and one modifiable synapse for every element in the expected input pattern. In addition, it has an extra input. We use this extra "mentor" input in the training process to tell the neurode what it is supposed to output for the current input pattern. We leave the weight of the mentor input at a constant value of 1.0. It does not contribute to the summed input unless the adaline is being taught, but when it is in use, we want the mentor signal to overwhelm the combined effect of all other inputs."
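The single-neurode adaline and its iterative learning can be sketched as follows. This is a minimal Python illustration under assumptions of my own: the mentor signal is modeled simply as the target value used to compute the error, a bias term is added, the task is a two-input AND with +1 and -1 values, and the learning rate and number of passes are arbitrary choices.

```python
def net_input(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias term."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


def output(weights, bias, inputs):
    """Threshold the weighted sum to +1 or -1."""
    return 1 if net_input(weights, bias, inputs) >= 0 else -1


def train(samples, rate=0.05, epochs=100):
    """Delta rule: nudge each weight in proportion to the error and its input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, mentor in samples:
            error = mentor - net_input(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Sort four input patterns into two classes: +1 only when both inputs are +1.
SAMPLES = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
weights, bias = train(SAMPLES)
for inputs, mentor in SAMPLES:
    print(inputs, output(weights, bias, inputs), mentor)
```

Each pattern has to be presented many times before the outputs match the mentor values, which is the contrast with single-pass crossbar storage drawn in the excerpt.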
[There were many steps in the development of neural nets, but
I will skip the history and only include material pertinent to
my model and real brain / mind problems. I find this book very
well written even for non-mathematicians like me.]
Filter Associative Memories
"In the previous chapters we have seen
two kinds of associative memories, each of which can learn the
correct associations only when provided with the right response
during training. There are times when such a training technique
is adequate; however, there are also times when we need a system
that can learn by itself. A neural network with such a capability
is called a self-organizing system because during training the
network changes the weights on its interconnects to learn appropriate
associations, even though no right answers are provided. One of
the simplest networks with this characteristic is the competitive
filter associative memory, so called because the neurodes in the
network compete for the privilege of learning.
Why is self-organization so important?
In the early 1960s, researchers had naive
notions about the prerequisites for constructing intelligent systems.
Some expected that they could just randomly interconnect huge
numbers of neurodes and then turn them on and feed them data to
create automatically an intelligent mechanical brain. This is
nonsense. As we now know, our brains, even the brains of lizards
and snails, are highly organized and structured devices. Mere
random interconnection of elements will not work. And yet one
observation underlying those naive notions is certainly valid:
our brains can learn without being given the correct answer to
Certainly we sometimes need a tutor during
learning; this is one of the reasons we go to school. And learning
from a teacher or a book is often a more efficient means of mastering
cognitive tasks than simple discovery. But how did you learn to
move your arm or hand? How did you learn to walk? How did you
learn to focus your eyes and interpret visual stimuli to gain
an understanding of the physical reality around you? This kind
of learning clearly occurs in all of us, and yet there is no teacher
to tell us how to do it. Such learning is not taught in any traditional
sense. How does it happen?
Some of the most exciting research in neural
networks addresses this question: how is it possible for a neural
network (such as the brain) to
learn spontaneously, without benefit of a tutor?
In early days, many people postulated a little
man living inside the brain, called a homunculus. The idea was
that this little man acted as the decision maker/tutor/pilot for
learning. The reason for this invention was simply that no one
could envision a mechanism for learning that did not require some
kind of tutor to be available. Of course, this explanation is
not very helpful in the long run, because that means we still
have to explain how the little man knows what to do. (Does the
homunculus have a mini-homunculus resident in its brain for example?
If not, how does the homunculus learn what to do?)
In any event, we clearly need a learning system
that does not rely on predigested lessons with answers.
Self-organization and self organizing systems
have thus taken on an important role in the
search for biologically reasonable systems. Research into self
organization has generally been concentrated on two specific kinds
of networks, one relatively simple and one highly complex. In
this chapter we will address the simpler kind, which has been
intensively developed and investigated by Teuvo Kohonen. This
Kohonen network, as it is often called, is the competitive filter
associative memory, and we use these terms, as well as the descriptive
phrase Kohonen feature map, interchangeably in this book.
A Self-Organizing Architecture
The competitive filter network is exquisitely
simple in concept yet has some remarkable properties. In its usual
form, this network consists of three layers. The input layer consists
only of fan-out neurodes, which distribute the input pattern to
each neurode in the middle, competitive layer. The output layer
similarly receives the complete pattern of activity generated
in the middle-layer neurodes and processes it in some manner appropriate
to each particular application. Both of these layers are garden-variety
neurode layers, with little to distinguish them from other networks.
The interesting layer is the middle, competitive layer,
and we will concentrate on its operation.
Neurodes in this layer have connections to the input and output
layers and also strong connections to other neurodes within the
layer. We have not seen such intralayer connections before in
this book. Since they are central to competitive learning, it
is important that we understand their function before discussing
how the network learns.
We have previously considered one way of introducing
competition among the neurodes of a neural network. The crossbar
associative network, when implemented in hardware, uses feedback
competition to ensure that the correct neurodes become active.
In that system, the output pattern is fed back to the input during
The Kohonen network uses a different sort
of competition, called "lateral inhibition" or "lateral
competition." In this scheme, the neurodes in the competitive
layer have many connections to each other, as well as the usual
connections from the input layer and to the output layer. The
strengths of these intralayer connections are fixed rather than
modifiable, and are generally arranged so that a given neurode
is linked to nearby neurodes by excitatory connections and to
neurodes farther away by inhibitory connections. In other words,
when any given neurode fires, the excitatory connections to its
immediate neighbors tend to help them fire as well, and the inhibitory
connections to neurodes farther away try to keep those neurodes
from firing. All neurodes in the layer receive a complex mixture
of excitatory and inhibitory signals from input-layer neurodes
and from other competitive-layer neurodes. If properly designed,
however, the layer's activity will quickly stabilize so that only
a single neurode has a strong output; all others are suppressed.
This kind of connection scheme is also sometimes called an on-center,
off-surround architecture, a term used for biological structures
that operate in the same way.
In lateral inhibition, an input is presented
to all the neurodes in the competitive layer. Some of these are
sufficiently excited that they try to generate output signals.
These output signals are sent to the other neurodes in the layer
through the intralayer connections, where they try to squash the
receiver's output (an inhibitory connection) or try to assist
it in firing (an excitatory connection). The result is that some
of these receiving neurodes that were on the verge of firing have
their activity suppressed. This strengthens the remaining neurodes'
outputs since the suppressed neurodes are no longer inhibiting
their neighbors. Eventually one neurode's output will prove to
be the strongest of all; that one neurode transmits a signal to
the output layer for further processing. All other neurodes have
their output suppressed in this winner-take-all scheme. A very
real competition has occurred, with the strongest neurode winning
the competition and thus winning the right to output to the next
Several variations on this are possible. The number of neurodes that are excited and suppressed by the intralayer connections can vary, as can the values of the fixed excitatory and inhibitory weights. It is not necessary, for example, for all of these fixed weights to have the same value. Lateral inhibition has a number of subtleties of this sort that can make it reasonably complex to implement, but that are unimportant here. The point is that by using this scheme, we can enforce a system whereby the neurode with the strongest response to the input pattern is the single winner. Furthermore, we have a mechanism that makes this scheme work without having to call upon some outside mediator to decide upon a winner arbitrarily. The need for the homunculus has disappeared."
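The winner-take-all outcome of lateral inhibition can be sketched in Python. This is a deliberately simplified stand-in for the scheme in the excerpt: every neurode inhibits every other with the same fixed weight, the excitatory links to near neighbors are left out, and the function name and the inhibition value are assumptions for illustration.

```python
def winner_take_all(activations, inhibition=0.2, max_steps=100):
    """Let each neurode suppress the others until at most one stays active."""
    state = list(activations)
    for _ in range(max_steps):
        if sum(1 for a in state if a > 0) <= 1:
            break
        total = sum(state)
        # Each neurode loses a fixed fraction of the summed activity of the others.
        state = [max(0.0, a - inhibition * (total - a)) for a in state]
    return state


initial = [0.2, 0.9, 0.5, 0.7]
final = winner_take_all(initial)
print([i for i, a in enumerate(final) if a > 0])
```

No outside judge picks the winner here: the strongest initial response survives because it suppresses the others faster than they suppress it. The inhibition weight is kept below 1/(n-1) for n neurodes so that the strongest one cannot be extinguished together with the rest, and exact ties are cut off only by the step limit.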
[Within the holographic mind there is a distribution of centers of information, where each area contains the whole, yet from a different viewpoint or location in phase space. This can be pictured as if the content of mind is within a room whose walls are holograms. We look at the contents from one direction, which gives us a selective view: we can best see the opposite wall and space, but can't see what is directly on or near the wall where we are looking from. Using this metaphor, "animal" mind would have its viewpoint set by its species, and our animal mind or midbrain viewpoint would be set at birth. This implies that humans as a group can have multiple viewpoints, which is the paradigm implicit in Astrology. Further, the human brain or neocortex can take many positions, including a view from the ceiling which sees the entire view.]
The network literally organizes
itself based only on the input patterns presented to it, so that
it models the distribution of input patterns.
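How a competitive layer can organize itself from input patterns alone can be sketched in Python. The sketch assumes a bare winner-only learning rule with no neighborhood updates, two-dimensional unit-length patterns in two made-up clusters, and hand-picked starting weights. The names normalize, winner, train and unit are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def winner(weights, pattern):
    """Index of the weight vector with the largest dot product with the pattern."""
    scores = [sum(w * x for w, x in zip(vec, pattern)) for vec in weights]
    return scores.index(max(scores))


def train(weights, patterns, nudge=0.2, epochs=20):
    """Move only the winning weight vector a small step toward each pattern."""
    weights = [normalize(w) for w in weights]
    for _ in range(epochs):
        for pattern in patterns:
            k = winner(weights, pattern)
            moved = [w + nudge * (x - w) for w, x in zip(weights[k], pattern)]
            weights[k] = normalize(moved)
    return weights


def unit(angle):
    return [math.cos(angle), math.sin(angle)]


# Two clusters of patterns, and two neurodes that start between them.
PATTERNS = [unit(a) for a in (0.1, 0.2, 0.3, 1.3, 1.4, 1.5)]
trained = train([unit(0.7), unit(0.9)], PATTERNS)
print([round(math.atan2(w[1], w[0]), 2) for w in trained])
```

No right answers are supplied anywhere in the loop. Each weight vector drifts toward the cluster of patterns it happens to win, so the trained layer ends up as a coarse model of where the inputs lie.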
"If we use a training set that is too small, we will not get a valid model of the input distribution, just as too small a population sample will give invalid survey results. Similarly we also need to choose carefully the input patterns we use for training to ensure that they are representative of the actual input pattern distribution."
[In humans, being exposed to a confined, reduced environment is abuse by deprivation: the neural nets do not form or self organize correctly. Yet when they are exposed, unusual results can take place, as illustrated by the story of the Buddha. He was isolated from suffering until, as an adult, he came to his own conclusions. There is evidence of retarded brain development in language areas when infants' movements, like crawling, are constricted. Science itself uses a method of control by input restriction, which tests for relevance and only allows input that has a causative relationship. This method has had great successes in the material sciences, but in applying it to humans and animals before self organization was even discovered, incorrect conclusions were drawn. Since neural nets create their own models, there is no exterior cause to be found, and humans are not the effect of anything except insufficient or excess input - output.
The problem is of random distribution of levels of resolution in presentation of the number and type of patterns to human minds. This ensures variety and successful matching because every possibility is tried. The connection with the cosmos gives this random distribution of patterns and levels of resolution. Thus every individual is useful as it is and standardization thru human engineering defeats the structure of mind.
The connection with the cosmos directly applies to the scope -
width of the inner model vectors which have different levels of
resolution. Thus a data set can have different internal representational
sizes as well as different sizes in the input, inner representation
and output. For instance, the input vector may have two very broad
vectors and the representational map have 12 or 90 vectors or
vice versa. There may also be fuzzy or merged vectors. Emotions
like fear and love fall in this category. New levels of resolution
may be established as happened during the 16th, 19th,
and 20th centuries. When applied to work context, it
implies a great increase in specialties and with the personality
an increase in fragmentation. It seems that these networks in
life forms other than human connect an entire ecosystem but that
humans in the neocortex established networks of networks or meta
structures, embodied in language, which is what I understand Astrology
and other "spiritual" systems modeled for thousands
of years. Now it is as if the fragments of human consciousness
have taken the size and intelligence of insects.]
"Because of the need to use many input patterns and because we keep the nudges small so the weight vectors stay normalized, we can expect the training of this network to be fairly slow. In fact, this network needs a great deal of careful thought when designing it for an application. We need to be concerned with exactly how to normalize the input and weight vectors. The simple normalization procedure discussed earlier may not be the most appropriate method for a given problem. We need also to consider how we should initialize the weight vectors before training. The simplest, and most obvious, solution is to place them randomly; however, this may not be the best solution."
[ Astrologically, this is random fractal, and is the solution that is used by life forms. Life uses solutions that work and by natural selection hopefully evolves or develops into "best solutions". But the major way things get stuck in a "working but not best" solution is that the structure of the networks is based on resonance, which makes change difficult in the operation of life. It is almost as if resonance must be defeated before progress can be made. Hence the revolution in 500 BC of focusing on the world as suffering and labeling its cause as forms of resonance called attachment and desire! This was further amplified in the Roman Christian and Moslem ideologies, and brought to a peak in the Protestant and Puritan anti-enjoyment attitude. Why should we not see that withdrawal of nurturing can lead to change? This direction also "suffers" from stuckness in resonance, as evidenced by the first half of the 20th century and the competition of Fascism and Communism to be harshest in their removal of competition and the resonance provided by religion! In a world where the modern, the new, and change are resonated, there is little room for tradition, especially in love and marriage. Thus the benefits of stability and being able to find ground states are replaced by agitation and dissatisfaction: suffering.]
"We have to assure ourselves, for example, that every neurode in the competitive layer is initialized so that it will have an opportunity to be the winner or at least to be the neighbor of a winner. Otherwise some of the neurodes may never participate in the training.
[So much or maybe all of what society does that we label good or bad has its origin in brain structure and is only elaborated in a projected form.]
These are called "dead vectors" when they occur, for obvious reasons. Appropriate initialization procedures can be quite tricky to implement. Although the network designer does not need to have a detailed mathematical form for the distribution of input vectors, he or she must have an understanding of the characteristics of the input data. Only with such an understanding can proper normalization and weight randomization be defined. And of course the training set must be carefully selected so that it accurately portrays the statistical characteristics of the overall data set. This in itself is nontrivial in many cases."
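A dead vector is easy to detect by counting wins, as in this Python sketch. The weight vectors and input patterns are made up for illustration, and count_wins is a hypothetical helper. The third neurode is deliberately initialized pointing away from all of the data.

```python
def count_wins(weights, patterns):
    """Count how often each neurode has the largest dot product with a pattern."""
    wins = [0] * len(weights)
    for pattern in patterns:
        scores = [sum(w * x for w, x in zip(vec, pattern)) for vec in weights]
        wins[scores.index(max(scores))] += 1
    return wins


WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
PATTERNS = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]]

wins = count_wins(WEIGHTS, PATTERNS)
dead = [i for i, n in enumerate(wins) if n == 0]
print(wins, dead)
```

Under a winner-only learning rule, a neurode that never wins is never adjusted, so a check like this on the initial weights is one simple way to see whether the initialization gives every neurode a chance.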
[Normalization is one of the major features
of connection with the cosmos where the intersections of the planes
of the orbits provide "built-in" normalization. But
there are 2 symmetric nodes and they drift! It will be my contention
that this drift is also reflected in cultural drift which is mislabeled
as history and progress.]
"There is still another important point about these networks: they are particularly useful in modeling the statistical characteristics of the input data, but the statistical models they create are only as accurate as the network size permits. A competitive layer of 100 neurodes produces a statistical model that is 10 times as detailed as one produced by a layer with only 10 neurodes but only a tenth as detailed as one from a layer with 1000 neurodes will be. The network will do its best to model the input data correctly, but the more neurodes it has available, the less area each weight vector must cover, and the more accurate the final trained network. For a perfect data set model, there would be one weight vector, or neurode, available for each possible input vector. A moment's thought reveals that this arrangement is not feasible. It is equivalent to saying that the ideal model of the input data set is the input data set itself. If we want a network capable of telling us something nontrivial about the data, we must use networks having fewer neurodes than the number of possible input vectors. For this reason, a Kohonen network will never be perfectly accurate."
[Here is the source of the problems of politics and the democratic solution versus the single ruler with divine resonance. The structure of divine resonance is that only one person is allowed to be fully connected with their nonverbal self and whole: therefore holy! The nonverbal self as a neural network still follows the structure of "Data in - rules out", but is not inhibited by resonance with the current rules, which to our inner self is anything within hundreds (or thousands) of years, which is the time necessary for cultural rules to be embodied in the preverbal "divine resonance"!]
"On the other hand, there are some interesting
possibilities for this kind of network. For example, suppose we
train a network with a collection of input patterns and after
training find that the weight vectors are clustered. We can then
replace these clusters of weight vectors with single supervectors
that serve to represent that cluster. As long as we keep the correct
ratios of weight vectors in each cluster, we can use as few replacement
supervectors as we like. For example, an initial set of weight
vectors might have 100 vectors in one cluster and 200 in another.
We can replace these with a single weight vector pointing to the
average location of the first cluster and two vectors pointing
to the average location of the second cluster. Since each weight
vector corresponds to a single neurode in the competitive layer,
this represents a dramatic reduction in the size of the network
needed for this application.
When we have done this clustering replacement of the network, we have a smaller, more efficient network that effectively performs a data compression on the original patterns. Furthermore, we are guaranteed that this data compression scheme is statistically meaningful relative to the input data patterns. These clusters correspond to feature vectors of the input data set, and the scheme that produces them is sometimes called vector quantization."
[This discussion relates to levels of resolution and language
as supervectors. There must be a level of resolution fine enough
to allow the emergence of supervectors, which I believe happens
in the neocortex where there is a 20,000 to 1 ramification of
connections between the midbrain and the neocortex. So how many
words are needed to represent a single feeling state of the midbrain?]
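The cluster replacement described in the quoted passage can be sketched in a few lines. This is only an illustration, assuming the weight vectors have already been grouped into clusters; the function names and the `vectors_per_supervector` parameter are hypothetical and do not come from the book.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def compress_clusters(clusters, vectors_per_supervector):
    """Replace each cluster with copies of its average vector.

    clusters: list of clusters, each a list of weight vectors (assumed
              to have been grouped already by some clustering step).
    vectors_per_supervector: how many original vectors one supervector
              stands for (a hypothetical tuning parameter).
    """
    supervectors = []
    for members in clusters:
        centroid = mean_vector(members)
        # Preserve the ratio between clusters: larger clusters get more copies.
        copies = max(1, round(len(members) / vectors_per_supervector))
        supervectors.extend([list(centroid) for _ in range(copies)])
    return supervectors

# The example from the text: 100 vectors in one cluster, 200 in another.
cluster_a = [[0.0, 1.0]] * 100
cluster_b = [[1.0, 0.0]] * 200
supervectors = compress_clusters([cluster_a, cluster_b], vectors_per_supervector=100)
print(len(supervectors))  # 3
```

Three supervectors now stand in for 300 weight vectors, and the 1:2 ratio between the clusters is kept.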
As new input patterns are presented, we can relate the new inputs to the old by specifying how far away the new ones are from the nearest feature vector. If we have stored these original feature vectors somewhere, we can store the new inputs by simply saving the differences between the input and the stored feature vectors. This may not sound difficult, but for vectors with many elements, such as might be found in digital images, transmitting only the differences between the current image and some standard feature image can result in enormous efficiency improvements.
[An excellent description of social language
processes and of cultural rules integrated into the preverbal intelligence.
But these "enormous efficiency improvements" do not
include "drift", and thus become "enormous efficiency
inhibitors"! Thus the scientific method and the understanding
of the shift of paradigms is coming to grips with this built-in
problem. But they have it operating on a highly impersonal, non-biological
plane of trying to isolate "Truth".]
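The idea of storing a new input as its difference from the nearest feature vector can be sketched as follows. The codebook values and function names are illustrative assumptions, not anything specified in the quoted text.

```python
def nearest_feature(vector, feature_vectors):
    """Return the index of the stored feature vector closest to `vector`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(feature_vectors)),
               key=lambda i: squared_distance(vector, feature_vectors[i]))

def encode(vector, feature_vectors):
    """Store a vector as (index of nearest feature vector, differences)."""
    index = nearest_feature(vector, feature_vectors)
    differences = [x - f for x, f in zip(vector, feature_vectors[index])]
    return index, differences

def decode(index, differences, feature_vectors):
    """Rebuild the original vector from the stored index and differences."""
    return [f + d for f, d in zip(feature_vectors[index], differences)]

feature_vectors = [[0, 0, 0, 0], [10, 10, 10, 10]]  # illustrative codebook
new_input = [10, 11, 10, 9]
index, differences = encode(new_input, feature_vectors)
print(index, differences)                            # 1 [0, 1, 0, -1]
print(decode(index, differences, feature_vectors))   # [10, 11, 10, 9]
```

The differences are small numbers clustered around zero, which is why they are cheaper to store or transmit than the full vector.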
The one application of these networks that
best illustrates their usefulness is the topology-preserving map,
studied extensively by Teuvo Kohonen. The easiest way to understand
what a topology-preserving map means is to consider an example of
how to create one. Imagine a sheet of paper and a robot arm, with
a pencil in the robot's hand. Let's assume that we can move the
hand to any location on the sheet of paper and have the robot
make a dot. Suppose we connect sensors to the arm and hand that
report back the position of the robot's arm as we make dots on
the paper. When we are done making dots on the paper (and we must
make many, many dots to make a valid statistical set), we have
a pattern on the paper giving the distribution of locations where
we placed the pencil point. Places that are very dark were visited
many times and thus had a high probability of occurrence. Places
that are still blank or contain few dots had a very low probability
of being visited. The coordinates of each dot define the input
vector for that dot. We use these vectors as input data to a Kohonen
network. The competitive layer of the network is laid out as a
two-dimensional grid, with connections between neighbors in rows
and columns. Imagine a grid like that found on ordinary graph
paper to understand the connections between the competitive layer
neurodes. Suppose we make 2000 dots on the surface of the paper,
feeding each dot's coordinates into the Kohonen network as training
data. Let's stop every 100 dots and make a note of the network's
weight vectors, for a snapshot of the state of the network at
these times. Now we take the snapshots and make a series of plots
of the weight vectors in the network. As we plot the positions,
we draw a line between the weight vectors of nearest-neighbor
neurodes, defining neighbor as a neurode that is only one column
or one row away in the layer's grid. The plot we are making connects
the weight vectors of neurodes that are physically positioned
next to each other in the grid. It should be clear that there
is no particular reason that neighboring neurodes in the grid
should have weight vectors that point anywhere near each other.
Remember that initialization of the network deliberately scrambled
the weight vectors before we began, so we would expect the chart
we make to be a jumbled tangle of connecting lines. Figure 7.3a
shows that initially we have just such a tangled mess of lines.
(In the figure, we initially forced all the weight vectors to
be randomly located within the upper right quadrant.)
What will the snapshots of the network look like over time? The other sections of figure 7.3 show the weight vector chart after 100, 200, and 2000 data points have been passed through the network. In this case, about half of the input patterns came from points in the upper right quadrant of the circle, and the remaining input patterns were about evenly divided between the upper left and lower right quadrants. Notice that as the number of input patterns increases, there are fewer and fewer lines crossing the center of the circle and that the edges of the plot become closer and closer to an actual circle. This indicates that the physical ordering of the weight vectors over time becomes organized according to the characteristics of the input data. In other words, if a neurode's neighbor has a weight vector pointing in a particular direction, the neurode itself very likely has its weight vector point in a similar direction. The jumbled mass of lines is gone, replaced with an orderly mesh. It is as though the weight vectors form a stretchy fishnet that begins as a crumpled, tangled ball and tries to conform itself to the shape of the input pattern distribution, with more mesh intersections where input patterns are more likely and fewer mesh intersections where they are less likely. It turns out that no matter what the input pattern distribution is,
the network will organize itself so that the weight vector fishnet stretches and twists so that it makes a reasonably good mapping of the input pattern distribution.
[This is another reason why those life forms
that successfully modeled and tracked the cosmos for use in initialization,
and normalization as well as resonance became successful. The
use of cycles of the cosmos self organizes into models of the
cosmos that track the cosmos!]
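The robot-arm experiment can be imitated with a small self-organizing map. This is a rough sketch under stated assumptions: the dots are drawn uniformly from a unit circle rather than with the quadrant bias of the book's figure, and the 6 by 6 grid, the shrinking learning rate, and the Gaussian neighborhood are illustrative choices, not values from the book.

```python
import math
import random

def sample_disk(rng):
    """Draw one point uniformly from the unit disk (an assumed input distribution)."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            return (x, y)

def winner(weights, point):
    """Grid position of the neurode whose weight vector is closest to `point`."""
    return min(weights, key=lambda pos: math.dist(weights[pos], point))

def train_som(rows=6, cols=6, steps=2000, seed=0):
    rng = random.Random(seed)
    # Scrambled start: every weight vector lies in the upper right quadrant.
    weights = {(r, c): [rng.random(), rng.random()]
               for r in range(rows) for c in range(cols)}
    initial = {pos: list(w) for pos, w in weights.items()}
    for t in range(steps):
        progress = t / steps
        rate = 0.5 * (1.0 - progress) + 0.02    # learning rate shrinks over time
        radius = 3.0 * (1.0 - progress) + 0.5   # neighborhood shrinks over time
        point = sample_disk(rng)
        win = winner(weights, point)
        for pos, w in weights.items():
            grid_dist_sq = (pos[0] - win[0]) ** 2 + (pos[1] - win[1]) ** 2
            pull = rate * math.exp(-grid_dist_sq / (2.0 * radius ** 2))
            # The winner moves most; its grid neighbors are dragged along with it.
            w[0] += pull * (point[0] - w[0])
            w[1] += pull * (point[1] - w[1])
    return initial, weights

def quantization_error(weights, points):
    """Average distance from each point to its nearest weight vector."""
    return sum(math.dist(weights[winner(weights, p)], p) for p in points) / len(points)

initial, trained = train_som()
test_rng = random.Random(1)
test_points = [sample_disk(test_rng) for _ in range(500)]
print(quantization_error(initial, test_points))
print(quantization_error(trained, test_points))
```

The second number printed should be smaller than the first: the "fishnet" that started crumpled in one quadrant has spread out to cover the region where the dots actually fall.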
Furthermore, we can experiment and connect the neurodes in the competitive layer in a simple linear array instead of a grid, with each one connected only to the neurodes before and after it in the line. If we do this, the weight vectors behave as if they were a ball of twine, and their distribution after training becomes like a string twisting along the input vector pattern distribution. These plots are topology-preserving maps because the topology, or shape, of the distribution of the input patterns in coordinate space is preserved in the physical organization of the weight vectors. Topology-preserving maps do not necessarily have to map physical locations. They can map frequencies, for example. A common name for a topology-preserving map when the input data corresponds to sound frequencies is a tonotopic map. In this case, the map represents an ascending or descending set of frequencies, and the neurodes are sensitive to a graduated scale of frequencies. In other words, the weight vectors of neighboring neurodes point to neighboring frequency inputs.
Because the robot arm example is truly a plot
of spatial distributions, we call it a geotopic map. This may
sound like a somewhat wild-eyed, and perhaps even useless, trait,
but in fact topology preserving maps exist in animals. It is known,
for instance, that certain structures in the brain that form part
of the auditory system are physically organized by the acoustic
pitch or frequencies they respond to. Quite literally, tonotopic
maps exist in the brain for sound inputs. In addition, there appear
to be other such abstract maps existing in the brain for such
purposes as geographic-location mapping, such as retinotopic maps
in vision and somatotopic maps in the sense of touch. For example,
rats that have been trained in a maze have certain spatially ordered
brain cells that fire when they are in a particular location in
the maze. Such spatial ordering cannot possibly have existed in
the animal before training unless we argue that it is that rat's
destiny to learn that particular maze. Some mechanism must exist
that allows the physical structure of the brain to modify during
learning so that the neurons order themselves according to the
layout of the maze. While competitive learning may not be exactly
correct as a mechanism for this process, it certainly offers an
elegant, simple model of how this might occur.
Why does the competitive filter network preserve input data topology?
The fishnet analogy is quite apropos. As Kohonen has described, there are two forces working on the weight vectors. First, the vectors are trying to model the probability distribution function of the input data. Second, their interconnections are also trying to form into a continuous surface because of the synaptic links between each neurode and its neighbors. These different forces establish the model of the input data that we have seen. In other words, when each winning neurode adjusts its weight vector in the direction of an input vector, it pulls its neighbor's weight vectors along with it. Therefore, after training, the weight vectors have formed a more or less continuous surface. Finally, the continuity of the maps means that the trained network has the ability to generalize from its specific experiences to process novel data patterns.
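One common textbook way to write the two forces as a single update is given here. The symbols are assumptions introduced for illustration and do not appear in the quoted passage: $\mathbf{x}$ is the input vector, $\mathbf{w}_j$ the weight vector of neurode $j$, $c$ the winning neurode, $d(j,c)$ the distance between $j$ and $c$ in the grid, $\alpha$ the learning rate, and $\sigma$ the neighborhood width. The Gaussian neighborhood is one possible choice among several.

```latex
\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \alpha(t)\, h_{jc}(t)\, \bigl[\mathbf{x}(t) - \mathbf{w}_j(t)\bigr],
\qquad
h_{jc}(t) = \exp\!\left(-\frac{d(j,c)^2}{2\,\sigma(t)^2}\right)
```

The bracketed term pulls weight vectors toward the input data, which models the probability distribution. The factor $h_{jc}$ makes grid neighbors of the winner move with it, which produces the continuous surface.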
... Let's now examine more complex learning
systems, ones that are even more directly modeled after biological
systems. In the next part we explore some of these biologically
based learning systems.
Application: The Voice Typewriter
... accuracy is only marginally adequate when working in the large vocabulary, speaker-independent mode.
[This book was written in 1990 and much has
happened in this field since!]
[Learning is of grave concern to our present
culture. We have implemented systems in schools based on classical
conditioning and behavior modification research. This research
started with birds and animals without an understanding of even
what the possible differences are between humans with a shared
mind that can be supervised, and animals with only self organization.
In effect science imposed an unexamined bias onto animal intelligence.
The structure of this bias emerged from European child rearing
practices, which are big on reward and punishment that do not exist
in the animal kingdom. Since the learning of rational knowledge
is a subset of, or of lower dimension than, the holographic mind and holoprocesses,
humans have projected this "lower intelligence" onto
animals, children and women. So I will use much material from
the learning chapters of this book and hope to correlate this
with social and personal problems around learning and mental health.
The correlation for the cosmos of all this material is still hypothetical
but fruitful in opening to the possible ways life has connected
itself to the cosmos. By studying this one will not predict any
Fate or Future, but open to the way cultural practices have set
humans up to believe in fate and predestination by mere association
of supposed laws of the influence of the "Stars" as
in "It is written in one's stars". Our neural networks
like structures and rules and can set up shared expectations as
if they were laws, when in fact they are self-fulfilling prophecies.
Self-fulfilling and self-referential expectations are like computer
viruses that eat away our natural defense of testing and trying
new alternatives and being self organizing.]
A learned blockhead is a greater blockhead
than an ignorant one. Ben Franklin
Learning without thought is useless; thought
without Learning is dangerous. Confucius
Neural networks are trained, not programmed;
they learn. We have already seen two distinctly different types
of learning in the adaline and the Kohonen feature map. The subject
of learning and memory in artificial systems is so important,
however, that we need to consider it in a more structured manner.
We will start this introduction to learning
in artificial systems by looking at the ways animals and people
learn and the kinds of memory that have been identified in humans
by psychologists. We will then be in a position to relate these
to the ways neural networks learn and remember. Finally, we will
adopt a more operational view and discuss the major methods for
training artificial systems.
[This material is covered by many sources from traditional academic experimental psychology. I do not agree with the epistemology or ontology of this direction. I conceptualize the results of these areas of human endeavor as projections that have more application and support for the politics of fascism and other forms of repression which prove that intelligence is instrumental and mechanical. This neocortex inhibition projection asserts that intelligence is not part of a mechanical universe, but something to be manipulated. We are seeing the consequences of this manipulation in the violence of our present school systems. Every person that I have met and questioned who is attending school has reported and confirmed the awareness of these major abuses to their true nature!
I will skip to the material on instar and
outstar which can begin to model holographic processes. The material
input can be transformed into light and dark patterns which essentially
store information as interference patterns or convolution spaces.]
Types of Learning
Learning has taken place, in an animal or in a neural network, when there is some lasting behavioral change or increase in knowledge resulting from an experience. For our purposes, we can break learning in animals and humans into three broad classes: instrumental conditioning, classical conditioning, and observational learning.
[I have presented much material that I do
not accept as an accurate model of how the holoprocesses operate,
but do accept as "rational cultural" processes that
can be changed and reframed. This includes assumptions about the
need to change synaptic weights in order for learning to take
place as well as classical conditioning and behavior modification.
With state of the art theories I can agree that science doesn't
"know any better", but I am using a different model
of shared holomind, fractals etc. which sees the use of neural
net connectivity applied to holoprocesses.]
Learning models in neural networks are rules or procedures that tell a neurode how to modify its synaptic weights in response to stimuli.
Training a Neural Network
Those seeking a new neural network design often adopt the ideas of biologists or psychologists.
The neohebbian model accounts for the fact that biological systems not only learn but also forget.
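Learning combined with forgetting can be sketched as a Hebbian growth term minus a decay term. This is a simplified illustration; the exact form of the rule and the two rate values are assumptions, not taken from the book.

```python
def neohebbian_step(weight, pre, post, learning_rate=0.1, forgetting_rate=0.05):
    """One weight update: Hebbian growth minus a decay ("forgetting") term."""
    growth = learning_rate * pre * post   # strengthens when both neurodes are active
    decay = forgetting_rate * weight      # weakens in proportion to the current weight
    return weight + growth - decay

weight = 0.0
for _ in range(50):                       # paired activity: the weight grows
    weight = neohebbian_step(weight, pre=1.0, post=1.0)
learned = weight
for _ in range(50):                       # no activity: the weight fades
    weight = neohebbian_step(weight, pre=0.0, post=0.0)
print(learned, weight)
```

With these assumed rates the weight climbs toward a ceiling of 2.0 while the two neurodes fire together, then decays back toward zero once the activity stops.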
Differential Hebbian Learning
In our discussion of simple hebbian learning,
we had to introduce two features found in biological systems that
are necessary for proper operation of a neural network: the possibility
of both decreasing and increasing weights during learning and
the presence of inhibitory as well as excitatory synapses. ...
In mathematical terms, the expression "rate of change"
refers to the derivative of a neurode's output with respect to
[I am not concerned with models of Classical
conditioning because I surmise that it is a political use of science
to justify what we now know to be dysfunctional social practices
supported by Christian traditions and models of the universe based
on Heaven and Hell. But as is often done in science, new discoveries
may be developed that are misapplied. This does not change the
mathematical correctness of the model. I see these models of animal
behavior working in a "schizoid" context of a laboratory
setting or in a human environment which is invested in splitting
off our "animal" intelligence and thus our connection
with holoprocess in favor of cultural values producing rational
behavior devoid of natural fractal complexity and self organization.
In fact, in the myth of the Garden of Eden, self organization
is represented as rebellion against God inspired by the reptilian
Let's begin our discussion of the outstar
by looking at the neurode from a new, geometric perspective. We
describe here only those minimal characteristics needed to understand
outstar learning. In Grossberg's work the concepts of instar and
outstar imply much more than the simple physical structure we
outline. We know that each neurode in a neural network receives
input from hundreds or thousands of other neurodes. Thus, each
neurode is at the focus of a vast array of other neurodes feeding
signals to it. In three dimensions, this construct resembles a
many-pointed star with all its radii directed inward. Stephen
Grossberg terms this an "instar." From another, equally
valid point of view, each neurode is a hub from which signals
fan out to a vast array of other neurodes, since each neurode
sends its output to hundreds or thousands of others. Grossberg,
reasonably enough, calls this an "outstar."
Every neurode in any neural network is, at
the same time, both the focus of an instar and the hub of an outstar.
Thus, a neural network can be viewed as a highly complex, interwoven
mesh of these structures, with the inwardly feeding inputs of
each instar arising from the outwardly directed signals of other
outstars. In a properly designed network, this complicated arrangement
does not result in chaos. In fact, it is precisely this complex
mesh in which the ever-changing activity takes place that generates
the behavior characteristic of neural networks.
So far, we have not mentioned the synapses
that we know lie at the end of every interconnect in both the
instar and outstar. In an instar, the synapses form a tight cluster
about the input end of the focus neurode. In an outstar, there
is a synapse where each interconnect terminates at one of the
outer, or "border," neurodes. If we could in some way
make the weights on these synapses visible during learning, we
would see a beehive of activity, with some weights tending upward,
others tending downward, and yet others staying nearly constant.
We can use this instar-outstar concept to
understand how a neural network can learn complex patterns. First
let's consider how a network of instars and outstars might learn
a static spatial pattern, one that does not change in time.
Let's build a small network that consists
of a single neurode, acting as an outstar, connected to an array
of neurodes that act as instars. For this network, we need to
use only two inputs on each instar neurode: one from the outstar
neurode that has an adjustable weight and a training input that
has a fixed synapse weight of 1.0.
Imagine that we cluster the instar neurodes
together into a two-dimensional grid, similar to the pixel grid
that makes up the image on a computer monitor. (A pixel is the
smallest element of light or dark a monitor can display.) So we
can more easily visualize the operation of the outstar, we assume
that we have arranged a way to make the output of each neurode
visible. We see a tiny spot of light proportional to the output
of each grid neurode. Thus whenever a grid neurode has an output
near 1, we get a very bright spot on the grid at that neurode's
position, and whenever a neurode has an output near zero, little
light is emitted. In between these limits, the light output varies
with the neurode's output. If we use enough neurodes in the instar
grid, it will be able to produce an image much like that appearing
on a computer screen. Finally, we will place a threshold on each
of the incoming signals from the outstar neurode so that only
stimuli that are at least as strong as the threshold value will
be perceived by the grid neurodes; any smaller stimuli will be
ignored. This threshold will suppress random noise firings in
[Instead of arriving at a picture that duplicates the input in space, an interference pattern is constructed that is independent of space; otherwise we are starting with elements that are dependent on exact size, distance and other spatial characteristics that are not relevant.]
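The small network in the book's description, a single outstar neurode feeding a grid of instar neurodes, can be sketched as follows. The update rule is a simplified outstar rule assumed for illustration, and the rate, threshold, and 3 by 3 pattern are hypothetical values.

```python
def train_outstar(pattern, epochs=30, rate=0.2):
    """Learn one static pattern on the weights from a single outstar neurode."""
    weights = [0.0] * len(pattern)
    outstar_output = 1.0                  # the outstar fires during training
    for _ in range(epochs):
        for i, target in enumerate(pattern):
            # Each weight drifts toward the activity its grid neurode shows
            # while the training input holds the pattern on the grid.
            weights[i] += rate * outstar_output * (target - weights[i])
    return weights

def recall(weights, outstar_output=1.0, threshold=0.1):
    """Grid outputs when only the outstar fires (no training input)."""
    signals = [outstar_output * w for w in weights]
    # Signals weaker than the threshold are ignored by the grid neurodes.
    return [s if s >= threshold else 0.0 for s in signals]

# A 3x3 "plus" shape, flattened row by row; 1.0 is bright, 0.0 is dark.
pattern = [0.0, 1.0, 0.0,
           1.0, 1.0, 1.0,
           0.0, 1.0, 0.0]
weights = train_outstar(pattern)
image = recall(weights)
print([round(v) for v in image])  # [0, 1, 0, 1, 1, 1, 0, 1, 0]
```

After training, firing the outstar alone is enough to light up the stored pattern on the grid; with the outstar silent, the grid stays dark.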
[Drive reinforcement is one of the best examples
of total dysfunction that creates a belief and grasping of cause
and effect as unalterable real processes. Yet it is stated that
the process I am investigating is how living systems generate
various networks that can produce subsets of holoprocesses. We
produce hierarchies of processing the results of other processes
until we have lost sight of our original holoprocess and shared
brain (One Mind). This is a theme over and over again in religions
and spiritual teachers of which I am a modern example. My spiritual
insight emerges from and is informed directly by the holoprocess
and millions of years of animal intelligence which has changed
little over the last mere 5 thousand years of recorded spiritual
leaders. The difference is that I have the resources of science,
but the task of reframing both modern science and religious traditions
is the same as other "reformers". Now reform is built
into the system and expected and is reframed as progress. Isn't
this one of the spiritual blessings of the age we live in?]
Even the drive-reinforcement network, however, still looks at the world only one step at a time. If we truly want naturally intelligent systems, we must do more than process a series of single moments; we must deal with continuity of actions and events. We must be able to handle patterns that change in time rather than just patterns that are static; in the next chapter we look at some ways neural networks can handle such sequences of patterns.
[The following excerpt is concerned with learning
sequences and, in its simple goal, with learning alphabets. I see
this as working with fractals and dimensions of fractals by making
dependencies that need the presence of predefined "moves"
before continuing the calculation. Thus I see this as a process
of computational iterated procedures rather than one applied to learning
data. It is like a combination lock: it doesn't open until all
conditions are met. When applying this to the connection with
the cosmos, I hypothesize a topology of time: the next state of
a computation may be completed in the next moment or in a month
or year. And it may be completed without the "question"
having been asked or alerting the "self" [beyond awareness]
that anything happened so that the answer seems to appear suddenly
in a totally unrelated context, like a dream or preparing food.
Many such occurrences are recorded in Buddhist literature, and
in fact the Zen training such as Koan develops just such awareness.]
pg. 141-149 There is nothing permanent except
Learning Sequences of Patterns
The Music Box Associative Memory
... The "music box" name for this
technique derives from its similarity to the frozen sequence of
notes that a music box plays. The sequence can be repeated indefinitely,
and there is no variation or alteration; each replay is exactly
the same as the one before. This style of associative memory operation
is also called the "tape recorder" mode, for obvious
We can make a system that recalls sequences of patterns with a much simpler layout, however. In fact, Stephen Grossberg has shown that a single outstar neurode can initiate and sustain a lengthy spatiotemporal sequence. ...
The Outstar Avalanche
We can use the outstar learning model to build a network that can learn pattern sequences by making one basic change to the model: we must use neurodes that are slow to lose their activity once they become activated.
... For the network, we use an input, middle,
and output layer. The input layer provides fan-out. It transmits
every element of the initiating or "trigger" pattern
to every middle-layer neurode and starts the replay of the sequence
of letters. Thus, each input neurode is the hub of an outstar.
The output layer comprises the grid that displays
the appropriate letters. Its purpose is thus the same as the corresponding
layer of the static outstar network. Its structure is also the
same: each grid neurode is the focus of an instar beginning on
neurodes in the previous layer. In the static network, there was
only a single neurode in the previous layer; in this case, however,
there are many. Also as in the previous network, these output
neurodes each have an additional external input used during training
to impress the pattern for each letter on the output grid.
The middle layer is the most complex. These
neurodes receive inputs from the input layer and from some number
of other neurodes within the middle layer itself. We will be a
bit vague about how many other neurodes each middle layer unit
connects to, but it should be some small number, say between 1
and 10. These intralayer connections are essential for the correct
operation of the network.
Now we need only two more things before describing the way a trained system works. First, although the operation of the avalanche network is continuous, it will be helpful in our description if we break time into short intervals as with the crossbar network. Second, we need to indicate the size of the activation decay constant of the input- and middle-layer neurodes. ...
Operation of the trained avalanche is exquisitely
simple in concept. The neurodes in the middle, avalanche layer
are trained to fire only if they receive a stimulus from the currently
active neurode and if the previously firing neurodes in the sequence
are still at least partially active. Each succeeding neurode can
thus be triggered only if the correct combination of stimuli is
received at the correct time. For example, if the correct sequence
of neurode firings in the middle layer is 1, then 5, then 3, then
6, the network is set so that neurode 3 will not fire unless it
sees stimulating activity from neurode 5 and at least partial
activity from neurode 1. Neurode 6 will not fire unless it sees
stimulating activity from neurode 3 and partial activity from
neurode 5, and so on. The intralayer connections enforce the temporal
relationships between the avalanche layer neurodes.
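The firing condition in the 1, 5, 3, 6 example can be sketched as a simulation in discrete time steps. It is a deliberate simplification: the first two neurodes are fired by hand instead of by an input layer, and the `decay` and `partial` values are assumed for illustration.

```python
def run_avalanche(sequence, start_order, steps=10, decay=0.6, partial=0.2):
    """Replay a stored firing order in discrete time steps.

    sequence:    trained firing order of middle-layer neurodes, e.g. [1, 5, 3, 6].
    start_order: neurodes fired externally, one per step, to start the replay.
    decay:       fraction of activity a neurode keeps from one step to the next.
    partial:     minimum leftover activity that counts as "partially active".
    """
    activity = {n: 0.0 for n in sequence}
    fired = []
    for step in range(steps):
        just_fired = fired[-1] if fired else None
        for n in activity:                # earlier activity fades gradually
            activity[n] *= decay
        next_neurode = None
        if step < len(start_order):
            next_neurode = start_order[step]
        else:
            for i in range(2, len(sequence)):
                current, previous = sequence[i - 1], sequence[i - 2]
                # Fire only if the right neurode just fired AND the one
                # before it is still at least partially active.
                if just_fired == current and activity[previous] >= partial:
                    next_neurode = sequence[i]
        if next_neurode is None:
            break                         # nothing qualifies: the replay dies out
        activity[next_neurode] = 1.0
        fired.append(next_neurode)
    return fired

print(run_avalanche([1, 5, 3, 6], start_order=[1, 5]))             # [1, 5, 3, 6]
print(run_avalanche([1, 5, 3, 6], start_order=[5, 1]))             # [5, 1]
print(run_avalanche([1, 5, 3, 6], start_order=[1, 5], decay=0.3))  # [1, 5]
```

The second call shows that firing the neurodes in the wrong order goes nowhere. The third shows why the neurodes must be slow to lose their activity: if neurode 1 fades too quickly, neurode 3 never sees the partial activity it requires.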
There are a number of points to note about
the operation of this network. For instance, if the middle-layer
neurodes are excited in the wrong order or accidentally stimulated
with noise, any resulting spurious activity in the layer soon
dies out, and the process continues as if nothing happened. Also,
for complex patterns, we can require more than one neurode to
be active for the pattern to continue. We can also arrange for
operator interaction. At any point in the process, for instance,
we can require a reinforcing command from the input layer in order
for the recollection to continue; thus, for instance, a single
input prod will not necessarily cause the network to run through
the entire alphabet. Finally, we can store many sequences with
a relatively small number of neurodes since it is their temporal
relationship during stimulation that determines whether or not
[Time dimension fractals: The length or scale
of the line is the duration and the interruptions and direction
are the actions? Here are I Ching ground states and transitions?]
These are fundamental properties of voluntary behavior; we can stop and start such behaviors at will.
Recognizing Sequences of Patterns
on to self organization or as it is called "Buddha Mind".
I "cannot" resist, actually I don't want to, including
an excerpt from a very well known Buddhist teacher of hundreds
of years ago: Naropa. "Tilopa sang this song of his oral
instructions in which the meaning of supreme goal-realization
Naropa, you are a worthy vessel: In the lamasery
of Pullahari In the spacious sphere of radiant light, ineffable,
The little bird of mind as transference has risen high by its
wings of coincidence. Dismiss the craving of belief in an ego.
In the lamasery of non-dual transcending awareness,
In the offering pit of the apparitional body By the fire of awareness
deriving from the bliss and heat of mystic warmth The fuel of
evil tendencies of normal forms of body, speech, and mind has
been consumed; The fuel of dream tendencies has been burnt up.
Dismiss the craving for the duality of this and that.
In the lamasery of the ineffable, The sharp
knife of intuitive understanding Of Great Bliss, of Mahamudra,
has cut the rope of jealousy in the intermediate state. Dismiss
the craving that causes all attachment.
Walk the hidden path of the Wish-Fulfilling
Gem Leading to the realm of the heavenly tree, the changeless.
Untie the tongues of mutes. Stop the stream of Samsara, of belief
in an ego. Recognize your very nature as a mother knows her child.
This is transcendent awareness cognizant in itself, Beyond the path of speech, the object of no thought. I, Tilopa, have nothing at which to point. Know this as pointing in itself to itself.
Do not imagine, think, deliberate, Meditate,
act, but be at rest. With an object do not be concerned. Spirituality,
self-existing, radiant, in which there is no memory to upset you
cannot be called a thing.
Naropa then said that action which is free from all bias had been fully understood.
Naropa had imbibed all the qualities that
were to be found in the treasure-house of Tilopa's mind. He had
realized the twelfth spiritual level and he expressed his intuitive
understanding in the words:
One need not ask when one has seen the actuality, The mind beyond all thought, ineffable, unveiled; This yoga, immaculate and self-risen, in itself is free. Through the Guru's grace highest realization has been won, One's own and others' interests fulfilled. Thus it is."
I will leave it to the reader to mull this
over until later chapters where I can fully explain what is being
discussed. Samsara is the word used to point to our so called
consciousness of reality conditioned by our cultural and personal
bias. It is not an "evil" state, but is awareness and
programs of daily life. It is what you are experiencing at this
very moment as visually reading, the visual and other sensual
surroundings as well as your internal voices. But since it is
derived from or a subset of a holoprocess it is also nirvana.]
Who learns by Finding Out has sevenfold The
Skill of him who learns by Being Told. --Guiterman
We have discussed several important learning
models in this part. Let's now step back and take one last, broader
look at autonomous learning systems. We will distinguish autonomous
learning from the more general unsupervised learning of, say,
the competitive filter associative memory by the following characteristic:
an ordinary unsupervised learning system learns every input pattern,
whether or not it is important; the only way to prevent an input
pattern from being learned is to temporarily disable (turn off) learning.
An autonomous system, on the other hand, can learn selectively;
it learns only "important" input patterns. As a result,
learning can be enabled (left on) at all times.
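To make the contrast concrete, here is a minimal sketch of ordinary unsupervised competitive learning, assuming a simple winner-take-all rule with Euclidean distance. The function name `competitive_step`, the `learning_enabled` flag, and the learning rate are illustrative assumptions, not taken from the text. The point it shows is only this: such a learner moves its weights toward every input it sees, so the one way to protect it from an unimportant pattern is to switch learning off by hand.

```python
import numpy as np

def competitive_step(weights, x, rate=0.5, learning_enabled=True):
    """One winner-take-all step (hypothetical helper).

    Finds the unit whose weight vector is closest to x and, if learning
    is enabled, moves that unit's weights part of the way toward x.
    """
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    if learning_enabled:
        weights[winner] += rate * (x - weights[winner])
    return winner

# Two units that have already settled on two clusters.
weights = np.array([[0.0, 0.0], [1.0, 1.0]])
outlier = np.array([0.2, 0.9])  # a pattern we would rather not learn

before = weights.copy()

# With learning switched off, the outlier is classified but not learned.
competitive_step(weights, outlier, learning_enabled=False)
unchanged_when_off = np.array_equal(weights, before)

# With learning switched on, the winning unit is dragged toward the outlier.
competitive_step(weights, outlier, learning_enabled=True)
changed_when_on = not np.array_equal(weights, before)
```

An autonomous system, in the sense described above, would need no such external switch; it would decide for itself whether the outlier deserves to be learned.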
Characteristics of Autonomous Learning
The competitive filter associative memory
is capable of ordinary unsupervised learning; for example, it
can learn the statistical properties of its input data set without
a tutor. But we must provide the network with a carefully controlled
schooling experience for it to learn correctly. For instance,
we must arrange for the learning data set to be a balanced and
rich representation of the real-world data we expect the network
to experience in operation. What are the characteristics
we would like to build into a system capable of truly autonomous learning?
We suggest the following, based on a list originally set forth by Gail Carpenter
and Stephen Grossberg.
1. The system functions as an autonomous associative
memory; it organizes its knowledge into associated categories
with no help from us and reliably retrieves related information
from partial or even garbled input cues.
2. It responds rapidly when asked to recall
information or to recognize some input pattern it has already
learned. This means that the system utilizes parallel architecture
and parallel search techniques.
3. Since it must function in the real world,
the system learns and recalls arbitrarily complex input patterns.
Further, it places no mathematical restrictions on the form of
those patterns. A mathematician would say that the input does
not need to be orthogonal or linearly separable, for instance.
4. The system learns constantly but learns
only significant information, and it does not have to be told
what information is significant.
5. New knowledge does not cover or destroy
information that the system has already learned.
6. It automatically learns more detail in
a particular associative category if feedback information indicates
that this is necessary. The autonomous system may suddenly begin
treating as significant some input that it had previously been
ignoring, or vice versa.
7. It reorganizes its associative categories
if new knowledge indicates that its present ones are inefficient.
8. It can generalize from specific examples
to general categories.
9. Its storage capacity is essentially unlimited.
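A few of the requirements above (recall from a garbled cue, learning left on at all times, and new knowledge not destroying old) can be illustrated with a toy categorizer. The sketch below is loosely modeled on Carpenter and Grossberg's adaptive resonance theory for binary patterns (ART1); the text does not name that design at this point, so treat the choice as an assumption. The class name `TinyART`, the `vigilance` value, and the simplified search order are all hypothetical, and this is far from a full ART network.

```python
import numpy as np

class TinyART:
    """Simplified ART1-style categorizer for binary vectors (illustrative only)."""

    def __init__(self, vigilance=0.7):
        self.vigilance = vigilance  # how closely an input must match a category
        self.prototypes = []        # one binary prototype per learned category

    def present(self, x):
        """Show one pattern with learning always on; return its category index."""
        x = np.asarray(x, dtype=bool)
        # Search existing categories, largest overlap with the input first.
        order = sorted(
            range(len(self.prototypes)),
            key=lambda j: -np.logical_and(x, self.prototypes[j]).sum(),
        )
        for j in order:
            overlap = np.logical_and(x, self.prototypes[j])
            # Match score: fraction of the input's active bits the prototype shares.
            if overlap.sum() / max(x.sum(), 1) >= self.vigilance:
                self.prototypes[j] = overlap  # refine the matching category only
                return j
        # Nothing matched well enough: open a new category, leaving old ones alone.
        self.prototypes.append(x.copy())
        return len(self.prototypes) - 1

net = TinyART(vigilance=0.7)
a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1]
garbled_a = [1, 1, 1, 1, 1, 0, 0, 0]  # pattern a with one spurious bit

cat_a = net.present(a)                # first pattern founds category 0
cat_b = net.present(b)                # a very different pattern founds category 1
cat_garbled = net.present(garbled_a)  # garbled cue is still recognized as category 0
a_preserved = np.array_equal(net.prototypes[0], np.array(a, dtype=bool))
```

Here learning is never switched off, yet presenting `b` does not overwrite what was learned from `a`, and the garbled cue falls into the existing category instead of creating a new one. Raising `vigilance` would make the toy system split inputs into finer categories.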
These are ambitious requirements, but some
of the neural network designs we will describe display almost
all of them. Let's explore each requirement in more detail to
see what it might mean in the operation of an autonomous system.
Neural net theories are continued here!