The linear systems that need to be solved in Elmer/Ice are of the form

| A B^T | |v| |f| | B C | |p| = |g| (1)

where the v-component of the solution vector represents the velocities, and the p-component the pressure. The matrix A is an elasticity-like operator, C a stabilization term, with B the discrete version of the operator -div.

While direct solvers are a robust way of solving (1), their computational cost in 3D grows with the square of the number of unknowns, and their memory usage becomes excessive for large 3D applications. Iterative methods are therefore needed. A new module called `ParStokes' is therefore available in Elmer now, which is dedicated to this purpose.

The general strategy for the iterative solution of (1) is to use a Krylov-subspace method (called the 'outer iteration'), combined with a preconditioner P. The approach followed in the ParStokes module is to use GCR (the generalized conjugate residual method) as outer iteration, and a preconditioner of the block upper triangular form

| A B^T | P = | 0 M |. (2)

Here M denotes the mass matrix, scaled by the element-wise fluidity of the current approximation of the solution. It can be shown that, under certain regularity assumptions, this preconditioner leads to a convergence rate of the outer iteration which is independent of the mesh-size h. However, the application of the inverse of P still requires the exact solution of linear systems with A and M. These operations would still be too expensive with direct solvers for large 3D problems, so further approximations are made. Note that the inverse of P has to be applied in each outer iteration, so the approximate solution of systems with A and M should be very cheap compared to the exact solution of a system (1).

The ParStokes module is a dedicated solver for ice applications and is not as flexible as the NavierStokes module. Certain choices such as the outer iterative solver and the use of high-order bubble functions for stabilization are hard-coded.

A sif-file for the ParStokes module should have (at least) three Solver sections, one for the coupled problem, one for the velocity system with A, and one for the pressure system with M. An example could look like this:

Solver 1 Equation = "Velocity Preconditioning" Procedure = "DummyRoutine" "DummyRoutine" Variable = -dofs 3 "V" Variable Output = False Exec Solver = "before simulation" Element = "p:1 b:4" Bubbles in Global System = False Linear System Symmetric = True Linear System Scaling = True Linear System Row Equilibration = Logical False Linear System Solver = Iterative Linear System Max Iterations = 250 Linear System Preconditioning = ILU0 Linear System Convergence Tolerance = 1.0e-6 Linear System Abort Not Converged = False Skip Compute Nonlinear Change = Logical True Back Rotate N-T Solution = Logical False Linear System Timing = True End

Solver 2 Equation = "Pressure Preconditioning" Procedure = "DummyRoutine" "DummyRoutine" Variable = -dofs 1 "P" Variable Output = False Exec Solver = "before simulation" Element = "p:1 b:4" Bubbles in Global System = False Linear System Symmetric = True Linear System Scaling = True Linear System Solver = iterative Linear System Iterative Method = CG Linear System Max Iterations = 1000 Linear System Convergence Tolerance = 1.0e-6 Linear System Preconditioning = Diagonal Linear System Residual Output = 10 Skip Compute Nonlinear Change = Logical True Back Rotate N-T Solution = Logical False Linear System Timing = True End

Solver 3 Equation = "Stokes" Procedure = "ParStokes" "StokesSolver" Element = "p:1 b:4" Bubbles in Global System = False Variable = FlowVar Variable Dofs = 4 Convective = Logical False Block Diagonal A = Logical True Use Velocity Laplacian = Logical True !----------------------------------------------- ! Keywords related to the block preconditioning !----------------------------------------------- Block Preconditioning = Logical True Linear System Scaling = Logical True Linear System Row Equilibration = Logical True Linear System Solver = "Iterative" Linear System GCR Restart = Integer 200 Linear System Max Iterations = 200 Linear System Convergence Tolerance = 1.0e-6 Nonlinear System Max Iterations = 100 Nonlinear System Convergence Tolerance = 1.0e-5 Nonlinear System Newton After Tolerance = 1.0e-3 End

a) The ParStokes module is not compiled automatically. The source code is in trunk/fem/src/modules/ParStokes.src. You should copy the code to your directory (possibly rename the suffix to .f90, as some Fortran compilers are picky about this) and then simply compile it:

elmerf90 ParStokes.f90 -o ParStokes.so

Place the resulting shared object into a directory that is either under the $LD_LIBRARY_PATH or - if you can access it - into $ELMER_HOME/share/elmersolver/lib

b) The entry 'Equation' in the first two solvers must have the names given above so that the ParStokes module finds them. The function DummyRoutine is an empty subroutine that you have to supply, it may look like this:

SUBROUTINE DummyRoutine( Model,Solver,dt,TransientSimulation ) USE DefUtils USE SolverUtils USE ElementUtils IMPLICIT NONE TYPE(Solver_t) :: Solver TYPE(Model_t) :: Model REAL(KIND=dp) :: dt LOGICAL :: TransientSimulation END SUBROUTINE DummyRoutine

and is compiled as usual, by

elmerf90 -o DummyRoutine.so DummyRoutine.f90

Same here: Place the resulting shared object into a directory that is either under the $LD_LIBRARY_PATH or - if you can access it - into $ELMER_HOME/share/elmersolver/lib

c) We propose to use linear elements for all variables here, augmented with four bubble functions per velocity component to achieve stability. While the high order stabilization makes the assembly somewhat costly, it is important for achieving a stable solution. The corresponding lines in all three solver sections are

Element = "p:1 b:4" Bubbles in Global System = False

d) The scaling of the linear systems is essential in order to get appropriate stopping criteria for the nested iterations. For the outer iteration (Solver 3 above), one should generally use

Linear System Scaling = Logical True Linear System Row Equilibration = Logical True

For the systems with A and M it is possible to preserve symmetry by setting

Linear System Scaling = Logical True Linear System Row Equilibration = Logical False

In that case CG can be used instead of BiCGStab.

note: while the matrices A and M are in principle symmetric, the symmetry may be destroyed by boundary conditions. The flag

Linear System Symmetric = True

will yield a symmetric system matrix for the most common boundary conditions, but in case CG fails to converge it may be worthwhile to try BiCGStab instead to see if a loss of symmetry causes the problems.

e) It is possible to replace A in the preconditioner by the block diagonal matrix

| A_{11} 0 0 | A_b= | 0 A_{22} 0 | | 0 0 A_{33} |

where couplings between the velocity components are neglected. This is selected by setting

Block Diagonal A = Logical True

To make the systems with A even cheaper, one can in addition set

Use Velocity Laplacian = Logical True

in which case the diagonal blocks $A_{ii}$ are replaced by scaled Laplace operators. The price you pay for these approximations is an increase in the number of outer iterations. The gain is that the systems are easier to solve, so the number of inner iterations in each outer iteration decreases.

f) The variables in the solver sections for the preconditioner (Solver 1 and 2) are not identified as being the same as in the coupled system (Solver 3), so boundary conditions have to be assigned to all the components twice. For instance, to specify a no-slip condition at the bedrock the according section in the sif-file might look like this:

! bedrock:

Boundary Condition 2 Name = "bedrock" Target Boundaries(1) = 5 FlowVar 1 = Real 0.0 FlowVar 2 = Real 0.0 FlowVar 3 = Real 0.0 V 1 = Real 0.0 V 2 = Real 0.0 V 3 = Real 0.0 End

g) The keyword *Flow Model = Stokes* is not necessary, since ParStokes does only solve the Stokes (and not Navier-Stokes) equation. Also, 2 Picard iterations are always performed, independently from what is specified in the sif-file. The line “Reset Newton…” is not necessary.

Note: At least currently (Nov 2014) it seems not to be possible to use ParStokes in a restart from a run in which ParStokes was not used.

The solution of systems with M is relatively easy in practice, we recommend using a CG iterative solver with diagonal preconditioning or ILU0, and solving the system to relatively high accuracy:

Solver 2 Equation = "Pressure Preconditioning" Procedure = "DummyRoutine" "DummyRoutine" Variable = -dofs 1 "P" Variable Output = False Exec Solver = "before simulation" Element = "p:1 b:4" Bubbles in Global System = False Linear System Symmetric = True Linear System Scaling = True Linear System Solver = iterative Linear System Iterative Method = CG Linear System Max Iterations = 1000 Linear System Convergence Tolerance = 1.0e-6 Linear System Preconditioning = Diagonal Linear System Residual Output = 10 Skip Compute Nonlinear Change = Logical True Back Rotate N-T Solution = Logical False Linear System Timing = True End

The systems with A are considerably harder to solve, but the accuracy to which they are solved is not so critical for the overall convergence of the outer iteration. The matrix A is related to a linear elasticity problem, and for such systems the standard ILU preconditioning in Elmer tends to lack robustness in parallel. This is related to the large null space (rigid body modes) of the matrices on each processor partition. For moderately sized problems (say, 16 partitions), ILU0 or ILU1 are likely to give good results. For large-scale applications we advocate using one of the parallel linear algebra packages interfaced by Elmer, HYPRE or Trilinos. Below we suggest two alternative solvers for parallel computations.

A preconditioner for A that does not suffer from the singularity of the partition-wise matrices is the parallel ILU solver Euclid, available through the HYPRE library. To use this variant, set in the according solver section

Linear System Use HYPRE = Logical True

The parameters

Linear System Iterative Method Linear System Preconditioning Linear System Convergence Tolerance Linear System Max Iterations

are used to determine the HYPRE strategy (BiCGStab and CG are implemented as solvers, and ILU0, ILU1, ILU2 etc should be used as preconditioner) This solver is likely to be slower for small numbers of partitions (np) but more robust wrt. np and more scalable (up to a few hundred processors).

In order to achieve good parallel performance for large-scale applications we propose a rather coarse approximation of the 'inverse of A' operation by a single multigrid cycle. Elmer should be compiled with support for Trilinos. Then set the parameters

Linear System Use Trilinos = Logical True Trilinos Parameter File = String input.xml

The parameter file input.xml contains settings for the smoothed aggregation solver ML. We are currently experimenting with this solver and will publish suggestions on their choice for achieving robustness later on. Some simple examples of using Trilinos with Elmer (with xml input files) can be found in the source code under

elmerfem/fem/examples/TrilinosSolvers