The "trick" of the CAM-6 was to encode the rules as a lookup table accessed
by an "address" formed from the states of the neighborhood. For example, a
one-bit cell state plus eight one-bit neighbors forms a 9-bit address, giving
512 possible table entries. There was also a "rule compiler" that built the
transition table and other computations from a special programming language.
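As a hypothetical sketch of the idea (not the CAM-6's actual rule compiler or
its language), here is how a rule such as Conway's Life could be "compiled"
into a 512-entry transition table indexed by the 9 neighborhood bits:

```python
# Hypothetical sketch: "compiling" a rule into a 512-entry lookup table.
# Bit 8 of the address is the center cell; bits 0-7 are its eight neighbors.
def compile_rule(rule):
    """Build the transition table by evaluating `rule` for every
    possible 9-bit neighborhood address."""
    table = [0] * 512
    for address in range(512):
        center = (address >> 8) & 1
        live_neighbors = bin(address & 0xFF).count("1")
        table[address] = rule(center, live_neighbors)
    return table

def life(center, live_neighbors):
    # Conway's Life: birth on 3 live neighbors, survival on 2 or 3.
    return 1 if live_neighbors == 3 or (center == 1 and live_neighbors == 2) else 0

table = compile_rule(life)
```

Once the table is built, one update step is a single table index per cell,
which is what made the hardware so fast.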
You might also want to look at [HTA92], which describes a 3D automata engine
in which computation is performed through reconfiguration (at a very low
level) of the hardware.
Are there any simulators for CAM?
Contributions by: Don Hopkins <hopkins@turing.ac.uk>
I (Don Hopkins) have written a recreational CAM-6 simulator (Toffoli &
Margolus's Cellular Automata Machine) and ported it to HyperLook (the user
interface development system I'm working on at Turing). It displays animated
cellular automata that you can edit in real time with the mouse! And it comes
with a free Lava Lamp!
The Cellular Automata Machine simulator (a SPARC binary with a bunch of
PostScript and data files) and the HyperLook runtime system are now available
for anonymous ftp! HyperLook and the CAM simulator run under OpenWindows Version
3 on color SPARC workstations.
See also CAM-PC and CAMEX.
What about running CA's on parallel or distributed machines?
Contributions by: Don Hopkins <hopkins@turing.ac.uk>
Yes, CAs are pretty trivial to run on massively parallel computers. If you
have data-parallel software (e.g. CM-Fortran or C* on the Connection Machine),
you just create a virtual processor for each cell and have each cell fetch its
neighbors and do a table lookup or computation.
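Here is a minimal serial sketch of that per-cell update (Conway's Life as the
example rule, and the function name is mine); on a data-parallel machine each
cell's body would run in its own virtual processor instead of these two loops:

```python
# Minimal serial sketch of the per-cell update: each cell fetches its eight
# neighbors (with toroidal wrap-around) and applies the rule -- Life here.
def step(grid):
    h, w = len(grid), len(grid[0])
    new = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Fetch the eight neighbors, wrapping at the edges.
            n = sum(grid[(y + dy) % h][(x + dx) % w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
            new[y][x] = 1 if n == 3 or (grid[y][x] and n == 2) else 0
    return new
```

A blinker (three live cells in a row) oscillates with period two under this
update, which makes a quick sanity check.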
If you are not using data-parallel software, but instead are using
message-passing, it is still pretty simple. I typically partition the 2-D
array into a set of ``stripes''. E.g. on a 128-node machine, if I want to run
a 1024x1024 array, I give each processor a 1024x8 patch of cells (plus one
extra boundary row at the top and one at the bottom, so actually 1024x10 with
some redundancy). Each processor updates its local array, and then exchanges
its top and bottom rows with its neighbors. So you alternate between a
computation step, where you loop over your patch of cells (lots of work if you
have a big patch), and a communication step of 2 sends and 2 receives
(hopefully pretty quick). Imagine the processors connected as a ring; you
don't need any more connectivity than that (although it's good to have some
nice way to dump data out to disk or to a machine for analysis).
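The layout can be sketched in serial Python, with the ring of processors
simulated as a list of stripes and the two sends/receives replaced by copying
each ring neighbor's edge row into the ghost rows (Life on a toroidal grid is
the example rule; the function names are mine, not from any message-passing
library):

```python
# Serial sketch of the striped message-passing layout: each "processor" owns
# a stripe of rows plus one ghost row above and below, exchanges edge rows
# with its ring neighbors, then updates its own rows independently.
def life_row(above, row, below):
    # One Life step for a single row, given the rows above and below
    # (toroidal wrap-around in the horizontal direction).
    w = len(row)
    out = [0] * w
    for x in range(w):
        n = sum(r[(x + dx) % w]
                for r in (above, row, below)
                for dx in (-1, 0, 1)) - row[x]
        out[x] = 1 if n == 3 or (row[x] and n == 2) else 0
    return out

def striped_step(grid, nprocs):
    h = len(grid)
    rows_per = h // nprocs                       # assume nprocs divides h evenly
    stripes = [grid[p * rows_per:(p + 1) * rows_per] for p in range(nprocs)]
    new = []
    for p, stripe in enumerate(stripes):
        # "Exchange": receive the bottom row of the stripe above and the
        # top row of the stripe below (the ring wraps around).
        ghost_top = stripes[(p - 1) % nprocs][-1]
        ghost_bot = stripes[(p + 1) % nprocs][0]
        padded = [ghost_top] + stripe + [ghost_bot]
        # Compute: update only the rows this processor owns.
        new.extend(life_row(padded[i - 1], padded[i], padded[i + 1])
                   for i in range(1, len(padded) - 1))
    return new
```

Because the ghost rows are exact copies of the neighbors' edge rows, the
striped update is bit-identical to a global update of the whole array.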
Partitioning the array into stripes minimizes the ``surface area'' of the
cell patches, and so minimizes the communication you have to do (if you
partitioned it as a ``checkerboard,'' you'd have more data to exchange with more
neighbors). It also makes your inner loops more efficient, because you have
really wide rows to loop over, instead of a bunch of short rows. It also makes
each boundary a contiguous block of memory, so it's easy to send to its
neighbor.
If you have a high overhead for sending, you may want to consider keeping two
boundary rows on each side and doing a little bit of redundant computation, so
that you only communicate every other step.
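A sketch of that trick, assuming two ghost rows per side (helper names are
mine): after each local step one more boundary row goes stale, so with two
ghost rows a processor can take two steps before stale data reaches its own
rows; the recomputation of the inner ghost rows is the "little bit of
redundant computation" mentioned above.

```python
# Sketch of the double-boundary-row trick: with TWO ghost rows on each side
# of a stripe, two local update steps can be taken before any exchange.
def life_row(above, row, below):
    # One Life step for a single row (toroidal wrap-around horizontally).
    w = len(row)
    out = [0] * w
    for x in range(w):
        n = sum(r[(x + dx) % w]
                for r in (above, row, below)
                for dx in (-1, 0, 1)) - row[x]
        out[x] = 1 if n == 3 or (row[x] and n == 2) else 0
    return out

def two_local_steps(patch):
    # `patch` is the owned rows plus two ghost rows at each end. The
    # outermost rows are never updated; they go stale one row per step,
    # which is exactly what the extra ghost row absorbs.
    for _ in range(2):
        patch = ([patch[0]] +
                 [life_row(patch[i - 1], patch[i], patch[i + 1])
                  for i in range(1, len(patch) - 1)] +
                 [patch[-1]])
    return patch[2:-2]  # the rows this processor actually owns
```

After two local steps the owned rows still match two steps of a global
toroidal update exactly.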
In fact, I implemented a CA simulation using a network of Sun workstations
using the above layout, and BSD sockets. Using 16 Suns, I think I had CA code
running about 10-12 times faster than a single Sun. This was a few years ago on
Sun 3/50's at RPI. I had grand visions of turning the whole campus into a
monster CA simulation environment, but shortly after that, I got access to a
CM-2 and forgot about the Suns. :-) Actually, there's no need to stay on Suns
only - you could have some other machines in there as well, as long as they can
do socket communication to exchange data with the neighbors.
N. M. Allinson and M. J. Sales. CART - A cellular automata research tool.
Microprocessors and Microsystems, 16(8):4093, 1992. The authors
discuss the principles of cellular automata and present a detailed design of
an expandable machine together with an example maze-solving program.
[HTA92] N. Howard, R. Taylor, and N. Allinson. The design and implementation
of a massively-parallel fuzzy architecture. Proc. IEEE, pages 545-552,
March 1992.