Particle Re-distribution for Load Balancing on the Cray T3D


Xin Yuan, Bohr He and Rami Melhem
Computer Science Department
University of Pittsburgh

PROCEDURE NAME AND PARAMETERS:

subroutine bisect(xmin, xmax, numbin, extbin, npartpemax, nattri, nattrf, x, attri, attrf, npartpe, nghostpe, xminpe, xmaxpe, binminpe, binmaxpe, bal_flag, bound_flag, usercut_flag, cutdimension, iwork, rwork)

INTRODUCTION

bisect() distributes particles among p PEs according to different criteria (described below) by using nested bi-section, bin sorting and domain extention. Particles are specified by their positions within a rectangular parallelepiped physical domain, and each particle has a certain number of attributes. One of these attributes may be a "weight" associated with that particle. The routine imposes on the domain a 3-dimensional grid of bins whose size is specified by the user. It, then, bisects the domain at bin boundaries into parallelepiped subdomains. Each cut doubles the number of subdomains. After log p cuts, the original physical domain is decomposed into p parallelepipled subdomains with the number of particles or the sum of weights of the particles in each subdomain being balanced. Each subdomain is assigned to one PE along with the particles within the subdomain, and the particles within a user specified extention of the subdomain.

HIGH LEVEL I/O SPECIFICATION

Inputs

Each PE has npartpe particles, where npartpe may differ from one PE to the next. The positions of the particles in the domain are stored in a local array x(npartpemax,3) along with other integer and floating point attributes that are stored in local arrays attri(npartpemax,nattri) and attrf(npartpemax,nattrf), respectively.

Outputs

The physical space is divided up into subdomains each of which belongs to one processor. Thus aside from the external boundaries associated with the boundaries of the physical domain, there are internal boundaries associated with the subdomain assigned to a PE. The positions and attributes of the particles in that subdomain (corresponding to a single PE), are stored in the arrays x, attri and attrf. Moreover, by assigning a non-zero value to the input parameter extbin, the arrays x, attri and attrf will contain, in addition to the particles in the subdomain, replicated copies of the particles from the original domain that lie within extbin bins outside the subdomain boundaries. These replicated particles, called "ghost particles", may be needed to account for the interaction between subdomains. The number of ghost particles is returned in nghostpe. This provides the user with enough flexibility to avoid future off-processor communication to access these ghost particles.

INTERFACE

The following parameters are passed to bisect. The calling procedure has to set any variable marked *in* or *inout* to appropriate values before calling bisect(). The variables marked *out* and *inout* are set or modified, by bisect(). The variables marked *work* are work arrays used internally by bisect().

Problem domain definition

xmin(3), xmax(3)
domain boundaries for the entire physical problem.*in*
numbin(3)
number of bins in each dimension. *in*

Particle parameters

npartpe
number of particles in the local array x at the calling PE. *inout*
nghostpe
number of ghosts in x following the regular particles. *out*
npartpemax
maximum number of particles allowed per PE. The subroutine writes out warnings if, at any point of the bisection step, npartpe + nghostpe exceeds npartpemax. *in*
nattri
number of integer attributes per particle. Can be set by the user at run time.*in*
nattrf
number of floating point attributes per particle. Can be set by the user at run time.*in*

Particle arrays - modified by bisect()

x(npartpemax,3)
array containing the positions of the particles. *inout*
attri(npartpemax,nattri)
array containing integer attributes. *inout*
attrf(npartpemax,nattrf)
array containing floating point attributes. *inout*

When Bisect returns, the first npartpe positions of the arrays -- x, attri and attrf on a given processor contain information about the particles assigned to that processor. The next nghostpe positions contain information about the ghosts duplicated on that processor.

Distribution specification

bal_flag
specifies the criteria used to obtain the subdomains *in*
0
balance the number of of particles per subdomain
1
uniform bisection (all subdomains have roughly the same volume - always bisect into equal halves). The particles are swapped between processors so that each processor has the particles that correspond to the local partition of a shared array that is distributed in
(:block(numbin(1)), :block(numbin(2)),:block(numbin(3)))
fashion across the PEs.
2
balance a weighted sum of the particles per subdomain. The weight of each particle, i, is taken from the first array in attrf(). That is attrf(i,1). In typical use, the weight of a particle will be chosen to be proportional to the execution time needed to process that particle.
extbin
number of ghost bins used to extend the subdomains at cut boundaries. It is the same in each bisected direction. *in*
bound_flag
specifies the treatment of ghost particles at the domain boundaries. *in*
0
no outside boundary conditions. Only particles close to interior boundaries, edges or corners are replicated.
1
periodic boundary conditions. Particles close to interior boundaries, edges or corners are still replicated as before. Furthermore, a particle within extbin bins of a boundary in the physical domain is replicated as a ghost in the processor which owns the opposite boundary. The coordinates (position) of the replicated ghost particle are kept equal to the coordinates of the original particle.
2
same as in bound_flag = 1, except that The coordinates (position) of the replicated ghost particle are changed to reflect its new position in the extended domain.
usercut_flag
specifies if the cuts are performed along dimensions 1, 2 and 3 in a round robin fashion or as specified by the user in the array cutdimension(). *in*
0
cuts performed automatically in round robin
1
cuts performed along user specified dimensions
cutdimension(20)
user specified cut dimensions when usercut_flag is set to 1. The dimension of 20 implies that at most 20 bisection steps can be made, and thus at most one million PE can be used *in*

Subdomain specification

xminpe (3)
Floating point variables that give the lower boundaries of the subdomain owned by the current processor. This represents the "inner" subdomain, i.e. without the ghost zones on the boundaries. *out*
xmaxpe (3)
Similar to "xminpe" above. The upper boundaries of the inner subdomain. *out*
binminpe (3)
Integer variables that give the lower bin number in each dimension of the subdomain owned by the current processor. *out*
binmaxpe (3)
Similarly for the upper bin number. *out*

Work arrays

rwork(npartpemax,2)
A floating point work array. *work*
iwork(npartpemax,3)
An integer work array. *work*

Particle Re-distribution for Load Balancing on the Cray T3D / Xin Yuan, Bohr He and Rami Melhem, Computer Science Department, University of Pittsburgh / melhem@cs.pitt.edu