PGENESIS Reference Manual
Other documentation that may be of interest:
- Latest serial GENESIS Reference Manual
- Shell script used to run PGENESIS
- Parallel routines: paron, raddmsg, rshowmsg, rvolumeconnect, rvolumedelay, rvolumeweight, @, async, waiton, barrier, barrierall
- Parallel I/O Issues
- Scheduler considerations
- Pitfalls and holes
Startup
To use any of the capabilities of the parallel library, one must first start it up. This will also spawn the requested number of worker nodes on architectures that support process-spawning.
The paron command starts up the parallel library.
There are several commands for obtaining configuration information:
- mynode
- number of this node in this zone
- nnodes
- number of nodes in this zone
- myzone
- number of this node's zone
- nzones
- number of zones
- ntotalnodes
- number of nodes in all zones
- mytotalnode
- unique number over all zones for this node
- mypvmid
- task identifier used by PVM for this node
- npvmcpu
- number of cpus used by PVM in the parallel machine
The ability to run parallel threads can be turned on or off (default is on).
- threadson
- Re-enables parallelism.
- threadsoff
- Disables parallelism.
- clearthreads
- Clears all pending parallel setup commands or remote procedure calls
- clearthread
- Clears at most one pending parallel setup command or remote procedure call
Adding Messages
It is possible to create arbitrary messages between elements on different nodes using the raddmsg command:
- raddmsg
- Adds a message between the listed source elements and the listed destination elements (which may be designated to be on other nodes by means of the '@' notation).
The following routine displays inter-node messages correctly (and suppresses the display of the postmaster messages used to implement the inter-node messages).
- rshowmsg
- Displays messages, including inter-node messages, while hiding the postmaster messages that implement them.
Synaptic Connections
There are several routines which allow one to set up multiple synaptic connections across nodes. They are analogues of the regular GENESIS routines for setting up synapses.
- rvolumeconnect
- Connects one group of elements in a volume to another, using source and destination element lists and masks.
- rvolumedelay
- Sets delays of a group of synapses receiving input from a list of presynaptic elements in a volume.
- rvolumeweight
- Sets weights of a group of synapses receiving input from a list of presynaptic elements in a volume.
Remote Command Execution and Synchronization
- command@nodelist
- Executes command on specified nodes synchronously (i.e., does not return until remote commands have completed and returned result)
- async command@nodelist
- Executes command on specified nodes asynchronously (i.e., returns without waiting for result)
- waiton
- Wait for completion of a specified async command, or wait for completion of all async commands.
- barrier
- Wait for all nodes in my zone to reach this point.
- barrierall
- Wait for all nodes in all zones to reach this point.
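The difference between synchronous and asynchronous remote execution can be illustrated outside PGENESIS. The following Python sketch is an analogy only, not PGENESIS code: a thread pool stands in for the remote nodes, and the helper names run_sync, run_async, wait_on, and remote_command are hypothetical stand-ins for command@nodelist, async command@nodelist, waiton, and the command being executed.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Analogy only: worker threads stand in for remote nodes.
pool = ThreadPoolExecutor(max_workers=4)

def remote_command(node):
    """Hypothetical command executed 'on' a node; returns a result."""
    return node * 10

def run_async(nodes):
    """Like 'async command@nodelist': returns at once with pending handles."""
    return [pool.submit(remote_command, n) for n in nodes]

def wait_on(futures):
    """Like 'waiton': block until the given async commands have completed."""
    wait(futures)
    return [f.result() for f in futures]

def run_sync(nodes):
    """Like 'command@nodelist': does not return until all results are back."""
    return wait_on(run_async(nodes))

sync_results = run_sync([0, 1, 2])   # blocks until every node has answered
pending = run_async([1, 3])          # caller is free to do other work here
async_results = wait_on(pending)     # collect the results when needed
pool.shutdown()
```

In this model the synchronous form is simply the asynchronous form followed immediately by a wait, which is why the asynchronous form allows more overlap but leaves the caller responsible for collecting results.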
Scheduler Considerations
If you use a custom .psimrc file, it should include a pschedule command so that the PGENESIS-specific scheduling policy is used. If you intend to use a custom scheduling policy instead, then that policy should include the line:
addtask Simulate /##[CLASS=postmaster] -action PROCESS
before any other PROCESS actions. This is needed so that the postmaster objects can perform their message transfers before any other process actions modify the simulation state for that simulation step.
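The reason for the ordering can be seen in a small model. The Python sketch below is an analogy under stated assumptions, not the GENESIS scheduler: a schedule is modeled as an ordered list of tasks, the state is a single hypothetical value v, and the task names compartment and channel are invented for illustration.

```python
# Analogy only: a schedule is an ordered list of (task_name, action) pairs.
def build_schedule(custom_tasks):
    """Place the postmaster PROCESS task ahead of all other PROCESS tasks."""
    schedule = [("postmaster", "PROCESS")]
    schedule.extend(custom_tasks)
    return schedule

def run_step(schedule, state):
    """Run one simulation step, executing tasks in schedule order."""
    log = []
    for name, action in schedule:
        if name == "postmaster":
            # The transfer sees the state before this step's other
            # PROCESS actions have modified it.
            log.append(("transfer", state["v"]))
        else:
            state["v"] += 1  # other PROCESS actions modify the state
            log.append((name, state["v"]))
    return log

state = {"v": 0}
schedule = build_schedule([("compartment", "PROCESS"), ("channel", "PROCESS")])
log = run_step(schedule, state)
```

Because the postmaster entry comes first, the value it transfers is the one from before the step's updates; placing it later in the list would transfer a partially updated state.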
Pitfalls: Unsupported and Dangerous Operations
It is extremely easy to reach deadlock in parallel programs. One way to reduce the chances of this is frequent use of barriers and sparing use of asynchronous commands. However, barriers are expensive to execute and can reduce parallelism, so they should be placed judiciously in scripts.
The serial GENESIS stop command should be used only with extreme care in zones containing more than one node. PGENESIS executes an implicit barrier before performing a simulation step. If any node enters the barrier, then all nodes must; otherwise, deadlock will result. It is very difficult to satisfy this requirement when the stop command is issued.
Step commands must be issued with care. Because the step command executes an implicit barrier, issuing it in any other way than the following can result in deadlock. The two safe methods to issue step commands are:
- Step commands are issued exclusively locally (i.e., no use of the @ operator with step).
- Remote simulation step commands (e.g., step@all) are issued by at most one node in a zone.
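The deadlock caused by an implicit barrier that not every node enters can be reproduced in miniature. The Python sketch below is an analogy, not PGENESIS code: threads stand in for the nodes of a zone, and a timeout (an assumption of the sketch) stands in for the indefinite hang that would occur in a real run. The function name run_nodes is hypothetical.

```python
import threading

def run_nodes(n_nodes, n_entering, timeout=0.5):
    """Model a zone of n_nodes in which only n_entering reach the barrier.

    Returns how many nodes got past the barrier. The timeout turns what
    would be a permanent hang into a detectable failure.
    """
    barrier = threading.Barrier(n_nodes)
    passed = []
    lock = threading.Lock()

    def node(i):
        try:
            barrier.wait(timeout=timeout)
            with lock:
                passed.append(i)
        except threading.BrokenBarrierError:
            pass  # without the timeout this node would wait forever

    threads = [threading.Thread(target=node, args=(i,))
               for i in range(n_entering)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(passed)

all_step = run_nodes(3, 3)    # every node steps: all pass the barrier
one_stops = run_nodes(3, 2)   # one node never steps: no node passes
```

When one of the three nodes never reaches the barrier, none of the others can proceed. This is the situation that a stop command, or step commands issued inconsistently across nodes, can create.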
Holes in the documentation
- Description of the user-accessible fields of the postmaster, including sync_before_step