PaRSI: Parallel Computation of Rotor-Stator Interaction
Paul Cizmas, Westinghouse Electric Corporation, and Ravishankar Subramanya, Pittsburgh Supercomputing Center
The paper "Parallel Computation of Rotor-Stator Interaction", which explains this work in detail, is available in PostScript format. (Please note that the file is gzip-compressed.)
Overview
We have implemented a parallel approach to simulating unsteady flows in turbomachinery. An unsteady simulation is well suited to modeling rotor-stator interaction because it captures the flow nonlinearities. A time-marching approach is adopted, with the full Navier-Stokes equations solved at each time step. The simulation results support design optimization of the turbine geometry.
Motivation
Sequential codes are too slow and expensive. For the simulation to have a significant impact on the design process, the designer should be able to experiment with different blade geometries and get results back in a matter of days rather than weeks. Simulating the rotor-stator interaction for a typical geometry with a sequential code may take a month (or more) on a Cray C90. Parallelization reduces the turnaround time by an order of magnitude.
Method
A data-parallel paradigm was chosen to parallelize the code. MPI was used for communication to ensure portability across parallel platforms. Each processor (PE) was allocated one blade section (2 grids), with the inlet and outlet sections allocated on separate PEs. Communication in the code is used to synchronize boundary conditions between PEs at each time step.
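The allocation described above can be sketched as a simple rank map. This is an illustration only: the function name `assign_pes`, the rank ordering, and the blade counts per row are assumptions and are not taken from the PaRSI code.

```python
def assign_pes(blades_per_row):
    """Return a list where index = PE rank and value = the work on that PE.

    Assumed ordering (hypothetical): inlet first, then blade sections row by
    row, then outlet. Each blade section carries 2 grids, as in the text.
    """
    layout = [{"role": "inlet"}]
    for row, n_blades in enumerate(blades_per_row):
        for blade in range(n_blades):
            layout.append({"role": "blade", "row": row, "blade": blade, "grids": 2})
    layout.append({"role": "outlet"})
    return layout


# Hypothetical 3-row case: 8 blade sections + inlet + outlet = 10 PEs.
layout = assign_pes([3, 3, 2])
for rank, work in enumerate(layout):
    print(rank, work)
```

With a map like this, each rank would know which grids it owns, and the boundary data exchanged after every time step would go to the ranks holding the adjacent grids.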
Performance
Turnaround times are greatly reduced for the parallel code. For a 10 PE job, the turnaround time is reduced by a factor of about 15. The superlinear speedup is attributable to increased cache availability. The code runs on the SGI-Challenge and the T3E. Timing results for a 3-row, 10 PE run:
| Machine | NPES | Time/(iter*grid pt) |
| SGI-Challenge | 1 | 397 E-6 |
| C90 | 1 | 33.5 E-6 |
| SGI-Challenge | 10 | 25.5 E-6 |
| T3E-900 | 10 | 25.0 E-6 |
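As a check on the figures above, the speedup and parallel efficiency on the SGI-Challenge can be recomputed from the table. The snippet below is a small sketch; the helper names `speedup` and `efficiency` are illustrative, and the only data used are the four table entries.

```python
# Time per iteration per grid point, copied from the table above.
timings = {
    ("SGI-Challenge", 1): 397.0e-6,
    ("C90", 1): 33.5e-6,
    ("SGI-Challenge", 10): 25.5e-6,
    ("T3E-900", 10): 25.0e-6,
}


def speedup(t_reference, t_parallel):
    """Ratio of the reference time to the parallel time."""
    return t_reference / t_parallel


def efficiency(t_reference, t_parallel, npes):
    """Speedup divided by the number of PEs; above 1.0 means superlinear."""
    return speedup(t_reference, t_parallel) / npes


t1 = timings[("SGI-Challenge", 1)]
t10 = timings[("SGI-Challenge", 10)]
s = speedup(t1, t10)
e = efficiency(t1, t10, 10)
vs_c90 = speedup(timings[("C90", 1)], t10)

print(f"SGI-Challenge, 10 PEs vs 1 PE: speedup {s:.2f}, efficiency {e:.2f}")
print(f"SGI-Challenge, 10 PEs vs C90, 1 PE: ratio {vs_c90:.2f}")
```

An efficiency above 1.0 is what makes the speedup superlinear: ten PEs deliver more than ten times the single-PE rate on the same machine.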
The scalability of the code on the Challenge is limited by the number of processors available. Real-world configurations require a larger number of PEs. For a 15 PE run on the T3E, the following timings were recorded:
| Machine | Clock(MHz) | Time(s) | Speedup |
| T3E-600 | 300 | 34 | 1.000 |
| T3E-900 | 450 | 27 | 1.259 |
| T3E-1200 | 600 | 20 | 1.700 |
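The speedup column can be reproduced from the recorded times and compared with the clock ratio. This is a short sketch using only the numbers in the table; the "fraction of clock gain" column is an added illustrative quantity, not one reported by the authors.

```python
# (machine, clock in MHz, time in s) copied from the table above.
runs = [("T3E-600", 300, 34.0), ("T3E-900", 450, 27.0), ("T3E-1200", 600, 20.0)]

base_name, base_clock, base_time = runs[0]  # T3E-600 is the baseline

rows = []
for name, clock, time_s in runs:
    speedup = base_time / time_s        # relative to the T3E-600 run
    clock_ratio = clock / base_clock    # how much faster the clock is
    fraction = speedup / clock_ratio    # share of the clock gain realized
    rows.append((name, round(speedup, 3), clock_ratio, round(fraction, 3)))

for name, speedup, clock_ratio, fraction in rows:
    print(f"{name:9s} speedup {speedup:.3f}  clock ratio {clock_ratio:.2f}  "
          f"fraction of clock gain {fraction:.3f}")
```

The speedup grows more slowly than the clock rate (about 1.26 for a 1.5x clock and 1.7 for a 2x clock), which suggests that part of the run time does not scale with processor clock; the source does not state the cause.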
Animations
Two animations are available:
- Four rows of a 2-D section of a turbine and the associated entropy
- Three rows of a 2-D section of a turbine and the associated Mach number
Authors
| Name | Email | Affiliation | Address |
| Paul Cizmas | cizmas@reynolds.pgh.wec.com | Westinghouse Science & Technology Center | Pittsburgh PA 15235 |
| Ravi Subramanya | ravi@psc.edu | Pittsburgh Supercomputing Center | Pittsburgh PA 15213 |