Part 1: Analytic Case

2D problem

The first problem was defined over a square with one boundary at 100° and the others at 0°. The inital state was essentially a linear interpolation across the grid. SOR iteration was used until the residual (step size per iteration) fell below a threshold. The temperature field was then loaded into Matlab where it could be compared with the Fourier series solution. This was done by taking the error = T_fourier - T_sor and hence finding the mean squared error (element-wise).

The following figure shows the temperature field on a 256x256 grid after running SOR to reasonable convergence: a mean squared error of 0.014 after 17500 iterations.

Matlab was used to visualise the error surface, eg the following 64x64 grid. Clearly this is dominated by oscillations on the hot edge, which is a truncation error in the Fourier solution (the Gibbs phenomenon).

The convergence process on a 100x100 grid is shown in the following animation. Each frame is a snapshot of the error surface with increasing iterations. It can be seen that the higher-frequency error near the hot boundary is reduced faster than the lower-frequency error at the cold end.

Residuals were computed during the computation as simply the relaxation step size: the sum of absolute values of grid cell changes. This gives an indication of the error, and was used as one stopping criterium. By comparing residuals with error for various grid sizes, it seems that the same residual corresponds to a similar mean squared error over several grid sizes:

grid iterations final residual mean squared error CPU time (secs)
64x64 1,040 10-8 0.021 0
256x256 17,500 10-8 0.014 32
512x512 70,300 10-8 0.025 504 (8:24)

These errors are within the truncation error of the Fourier series solution using 150 terms. This was verified by observing that the SOR solution was closer as the number of terms was increased.

3D problem

The three dimensional problem was directly analogous: a cube with one face at 100° and the others at 0°. The initial condition was again a linear ramping between the boundaries, forming a layered pyramid, as in the following slice plot.

The temperature volume was visualised using slices and isosurfaces (surfaces where the temperature is constant). These can also be combined effectively as in the following example. On a 64x64x64 grid, a slice was taken perpendicular to the hot boundary, through the midpoint of that face. Translucent isosurfaces were then overlaid for 20° and 80°, and viewed in profile.

Residuals were again compared with the error (derived from the Fourier solution) for various grid sizes. Grids of length greater than 128 were resampled down to that level for calculating the error.

grid iterations final residual mean squared error CPU time (secs)
64x64x64 1,160 10-8 0.040 14
128x128x128 5,170 10-8 0.071 485 (8:05)
256x256x256 10,000 3.5 * 10-2 1.67 9120 (2:32:)
512x512x512 1,000 2.6 * 10+5 6840 (1:54:)

A graph of residuals changing with number of iterations is shown below, for several 3D grid sizes: Red = 2563, Green = 1283, Blue = 643. The exponential decay of residuals is apparent, but the time scale increases dramatically with grid size. The early peak in residuals is due to ramping up of the over-relaxation coefficient -- this increases step sizes by definition, but leads to faster convergence.

We would like to know the structure of error in the 3D solution. This slice plot shows error on a 20x20x20 grid after 50 iterations. The dominant values are adjacent to the hot boundary, probably because the initial error there was large.

The same error was visualised with an isosurface representing 0 error. On one side there is an underestimation (towards centre of hot face), and on the other side the temperature is overestimated. It also appears that two other faces have a change of error sign, though this is probably trivial.

Parallelisation

Red-black updating was used for a consistent parallel scheme. One dimension was partitioned evenly between the processors, and the boundaries of each portion were exchanged with the adjacent nodes twice on each time step (after red update, and black update). The speedups on a 512 grid were as follows:

CPUs parallel time serial time speedup
4 0:31:33 01:54:00 3.61
8 0:17:13 01:54:00 6.62

Code

Fortran 90 serial code:

sor.f90 Main program.
iterate.f90 Code for doing 2D and 3D SOR.
sorglobals.f90 Variable declarations.
sorinfin.f90 Configuration setup, initial conditions, output.

Fortran 90 parallel (MPI) code:

mpi_sor.f90 Main program.
mpi_iterate.f90 2D and 3D SOR code with distributed workload, using red-black updating.
mpi_sorglobals.f90 Variable declarations.
mpi_sorinfin.f90 Configuration setup, initial conditions, output.



Felix Andrews