PaRSI
Parallel Computation of Rotor-Stator Interaction

Paul Cizmas, Westinghouse Electric Corporation, and Ravishankar Subramanya, Pittsburgh Supercomputing Center


The paper "Parallel Computation of Rotor-Stator Interaction", which explains this work in detail, is available in PostScript format. (Please note that the file is gzip-compressed.)

Overview

We have implemented a parallel approach to simulating unsteady flows in turbomachinery. Simulating the unsteady flow is well suited to modeling rotor-stator interaction because it captures the nonlinearities of the flow. A time-marching approach is adopted, with the full Navier-Stokes equations solved at each time step. The simulation results support design optimization of the turbine geometry.

Motivation

Sequential codes are too slow and expensive. For simulation to have a significant impact on the design process, the designer needs the flexibility to experiment with different blade geometries and to get results back in a matter of days rather than weeks. Simulating the rotor-stator interaction for a typical geometry with a sequential code may take a month or more on a Cray C90. Parallelization reduces the turnaround by an order of magnitude.

Method

The code was parallelized using a data-parallel paradigm. MPI was used for communication to ensure portability across parallel platforms. Each processor was allocated one blade section (2 grids), with the inlet and outlet sections allocated to separate PEs. Communication in the code synchronizes the boundary conditions at each time step.
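The allocation described above can be illustrated with a small sketch. It is not taken from the PaRSI source: the `Assignment` record, the `allocate_pes` function, and the example layout of 3, 2 and 3 blade sections per row are hypothetical, and the sketch assumes that the inlet and the outlet each occupy exactly one PE.

```python
from dataclasses import dataclass
from typing import List, Optional

GRIDS_PER_BLADE_SECTION = 2  # from the text: one blade section = 2 grids


@dataclass(frozen=True)
class Assignment:
    pe: int              # processing element (MPI rank) index
    role: str            # "inlet", "blade", or "outlet"
    row: Optional[int]   # blade row index; None for inlet and outlet


def allocate_pes(sections_per_row: List[int]) -> List[Assignment]:
    """Assign one PE per blade section, plus one PE each for inlet and outlet.

    Assumption: inlet and outlet take exactly one PE each.
    """
    assignments = [Assignment(pe=0, role="inlet", row=None)]
    pe = 1
    for row, n_sections in enumerate(sections_per_row):
        for _ in range(n_sections):
            assignments.append(Assignment(pe=pe, role="blade", row=row))
            pe += 1
    assignments.append(Assignment(pe=pe, role="outlet", row=None))
    return assignments


# Hypothetical 3-row layout that happens to need 10 PEs in total.
layout = allocate_pes([3, 2, 3])
for a in layout:
    print(a)

n_blade_grids = GRIDS_PER_BLADE_SECTION * sum(a.role == "blade" for a in layout)
print("PEs:", len(layout), "blade grids:", n_blade_grids)
```

Under these assumptions the PE count is the number of blade sections plus two, so the number of PEs needed grows with the number of blade sections in the configuration.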

Performance

Turnaround times are greatly reduced for the parallel code. For a 10 PE job on the SGI-Challenge, turnaround time is reduced by a factor of about 15 relative to a single PE. The superlinear speedup is attributable to increased cache availability. The code runs on the SGI-Challenge and the T3E. Timing results for a 3-row, 10 PE run are given below.

Machine          NPES   time/(iter*grid pt)
SGI-Challenge       1   397.  E-6
C-90                1    33.5 E-6
SGI-Challenge      10    25.5 E-6
T3E-900            10    25.0 E-6
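The speedup figures implied by the timing table can be checked with a few lines of arithmetic. The sketch below copies the four timings from the table. The names `TIMINGS`, `speedup` and `efficiency` are illustrative only, and parallel efficiency is taken here as speedup divided by the ratio of PE counts, a definition the text above does not state.

```python
# Time per iteration per grid point, copied from the timing table.
# Keys are (machine, number of PEs).
TIMINGS = {
    ("SGI-Challenge", 1): 397e-6,
    ("C-90", 1): 33.5e-6,
    ("SGI-Challenge", 10): 25.5e-6,
    ("T3E-900", 10): 25.0e-6,
}


def speedup(baseline, parallel):
    """Ratio of the baseline time to the parallel time."""
    return TIMINGS[baseline] / TIMINGS[parallel]


def efficiency(baseline, parallel):
    """Speedup divided by the ratio of PE counts (assumed definition)."""
    return speedup(baseline, parallel) / (parallel[1] / baseline[1])


sgi_1, sgi_10 = ("SGI-Challenge", 1), ("SGI-Challenge", 10)
c90_1, t3e_10 = ("C-90", 1), ("T3E-900", 10)

print(f"SGI-Challenge, 1 -> 10 PEs: speedup {speedup(sgi_1, sgi_10):.1f}, "
      f"efficiency {efficiency(sgi_1, sgi_10):.2f}")
print(f"C-90 (1 PE) vs SGI-Challenge (10 PEs): {speedup(c90_1, sgi_10):.2f}")
print(f"C-90 (1 PE) vs T3E-900 (10 PEs): {speedup(c90_1, t3e_10):.2f}")
```

An efficiency above 1 on the SGI-Challenge corresponds to the superlinear speedup noted above. The comparisons against the C-90 are cross-machine ratios, so they show relative turnaround and are not parallel speedup in the strict sense.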


The scalability of the code on the Challenge is limited by the number of processors available. Real-world configurations require a larger number of PEs. For a 15 PE run on the T3E, the following timings were recorded:

Machine     Clock (MHz)   Time (s)   Speedup
T3E-600     300           34         1.000
T3E-900     450           27         1.259
T3E-1200    600           20         1.700
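The speedup column is the T3E-600 time divided by each machine's time. As a rough sketch, the same numbers can be set against the clock ratio to see how much of each clock increase shows up as reduced run time. The "fraction of clock gain realized" metric and the names `RUNS` and `clock_scaling` are introduced here for illustration and do not appear in the original results.

```python
# (machine, clock in MHz, wall time in seconds) for the 15 PE run,
# copied from the table.
RUNS = [
    ("T3E-600", 300, 34.0),
    ("T3E-900", 450, 27.0),
    ("T3E-1200", 600, 20.0),
]


def clock_scaling(runs):
    """Return (machine, speedup, clock ratio, speedup / clock ratio).

    All ratios are relative to the first entry in `runs`.
    """
    _, base_clock, base_time = runs[0]
    rows = []
    for name, clock, time_s in runs:
        speedup = base_time / time_s
        clock_ratio = clock / base_clock
        rows.append((name, speedup, clock_ratio, speedup / clock_ratio))
    return rows


for name, speedup, clock_ratio, realized in clock_scaling(RUNS):
    print(f"{name:9s} speedup {speedup:.3f}  clock ratio {clock_ratio:.2f}  "
          f"fraction realized {realized:.2f}")
```

In this reading the run time improves more slowly than the clock rate. One possible explanation is that part of the run time (memory access or communication, for example) does not scale with processor clock, but the table alone does not establish this.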

Animations

Two animations are available:

Authors

Paul Cizmas
cizmas@reynolds.pgh.wec.com
Westinghouse Science & Technology Center
Pittsburgh PA 15235

Ravi Subramanya
ravi@psc.edu
Pittsburgh Supercomputing Center
Pittsburgh PA 15213