SPARSE BATCHED – KokkosKernels sparse batched functor-level interfaces

cg

template<typename MemberType, typename ArgMode> struct CG

crsmatrix

template<class ValuesViewType, class IntViewType> class CrsMatrix

Template Parameters:

ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntViewType – Input type for row offset array and column-index array, needs to be a 1D view

Public Functions

template<typename ArgTrans, typename ArgMode, typename MemberType, typename XViewType, typename YViewType> inline void apply(const MemberType &member, const XViewType &X, const YViewType &Y, MagnitudeType alpha = Kokkos::ArithTraits<MagnitudeType>::one(), MagnitudeType beta = Kokkos::ArithTraits<MagnitudeType>::zero()) const

Version of apply that uses constant coefficients alpha and beta.

y_l <- alpha * A_l * x_l + beta * y_l for all l = 1, …, N where:

N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha is a scaling factor for x_1, …, x_N,
beta is a scaling factor for y_1, …, y_N.

Template Parameters:

MemberType – Input type for the TeamPolicy member
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
ArgTrans – Argument for transpose or notranspose
ArgMode – Argument for the parallelism used in the apply

Parameters:

member – [in]: TeamPolicy member
alpha – [in]: input coefficient for X (default value 1.)
X – [in]: Input vector X, a rank 2 view
beta – [in]: input coefficient for Y (default value 0.)
Y – [in/out]: Output vector Y, a rank 2 view
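
As a plain C++ illustration of the operation above (not the KokkosKernels implementation), the following sketch applies y_l <- alpha * A_l * x_l + beta * y_l to every system using one shared sparsity pattern. The BatchedCrs struct, the batched_apply function, and the values[l][k] layout are assumptions made for this example; nested std::vector stands in for Kokkos views, and only the non-transposed case is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a batched CRS matrix: one shared sparsity
// pattern (row_ptr, col_idx) and one array of values per system.
struct BatchedCrs {
  std::vector<int> row_ptr;                 // size n_rows + 1
  std::vector<int> col_idx;                 // size nnz
  std::vector<std::vector<double>> values;  // values[l][k]: nonzero k of A_l
};

// Computes y_l <- alpha * A_l * x_l + beta * y_l for every system l.
// In this sketch, beta == 0 overwrites Y without reading it.
void batched_apply(const BatchedCrs& A,
                   const std::vector<std::vector<double>>& X,
                   std::vector<std::vector<double>>& Y,
                   double alpha = 1.0, double beta = 0.0) {
  const std::size_t n_systems = A.values.size();
  const std::size_t n_rows = A.row_ptr.size() - 1;
  assert(X.size() == n_systems && Y.size() == n_systems);
  for (std::size_t l = 0; l < n_systems; ++l) {
    for (std::size_t i = 0; i < n_rows; ++i) {
      double sum = 0.0;
      // The same row_ptr/col_idx entries are reused for every system l.
      for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
        sum += A.values[l][k] * X[l][A.col_idx[k]];
      }
      Y[l][i] = (beta == 0.0) ? alpha * sum : alpha * sum + beta * Y[l][i];
    }
  }
}
```

Calling batched_apply(A, X, Y) with the default coefficients reduces to a plain batched product y_l <- A_l * x_l.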

gmres

template<typename MemberType, typename ArgMode> struct GMRES

identity

class Identity: Batched Identity Operator:

jacobiprec

template<class ValuesViewType> class JacobiPrec

Batched Jacobi Preconditioner:

Template Parameters:

ValuesViewType – Input type for the values of the diagonal
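
For orientation, a Jacobi preconditioner in its textbook form scales each entry of the input by the inverse of the matching diagonal entry. The sketch below shows that operation for a batch of systems in plain C++. The function batched_jacobi_apply and the diag[l][i] layout are hypothetical, and whether the class stores the diagonal or its inverse is not stated here, so this is an illustration of the idea rather than the class's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the JacobiPrec class: applies y_l = D_l^{-1} x_l,
// where diag[l][i] is the i-th diagonal entry of system l.
void batched_jacobi_apply(const std::vector<std::vector<double>>& diag,
                          const std::vector<std::vector<double>>& X,
                          std::vector<std::vector<double>>& Y) {
  assert(diag.size() == X.size() && X.size() == Y.size());
  for (std::size_t l = 0; l < diag.size(); ++l) {
    for (std::size_t i = 0; i < diag[l].size(); ++i) {
      assert(diag[l][i] != 0.0);  // Jacobi needs a nonzero diagonal
      Y[l][i] = X[l][i] / diag[l][i];
    }
  }
}
```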

krylovhandle

template<class NormViewType, class IntViewType, class ViewType3D> class KrylovHandle

KrylovHandle.

The handle is used to pass information between the Krylov solver and the calling code.

The handle holds several views as data members; their required sizes depend on the Krylov solver used.

In the case of the Batched GMRES, the size should be as follows:

Arnoldi_view is a 3D view of size batched_size x max_iteration x (n_rows + max_iteration + 3);
tmp_view is NOT used for the team/teamvector GMRES; it is used for the serial GMRES and its size is batched_size x (n_rows + max_iteration + 3);
residual_norms is an optional batched_size x (max_iteration + 2) view used to store the convergence history;
iteration_numbers is a 1D view of length batched_size;
first_index and last_index are 1D views of length n_teams.
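
The extents listed above can be collected by a small helper, sketched here in plain C++. The GmresHandleExtents struct and the gmres_handle_extents function are hypothetical names introduced for this example; they only restate the sizes from the list and do not allocate any Kokkos views.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record of the view extents required by the batched GMRES.
struct GmresHandleExtents {
  std::size_t arnoldi[3];         // batched_size x max_iteration x (n_rows + max_iteration + 3)
  std::size_t tmp[2];             // batched_size x (n_rows + max_iteration + 3), serial GMRES only
  std::size_t residual_norms[2];  // batched_size x (max_iteration + 2), optional
  std::size_t iteration_numbers;  // batched_size
  std::size_t first_last_index;   // n_teams, for both first_index and last_index
};

GmresHandleExtents gmres_handle_extents(std::size_t batched_size,
                                        std::size_t n_rows,
                                        std::size_t max_iteration,
                                        std::size_t n_teams) {
  assert(batched_size > 0 && n_rows > 0 && max_iteration > 0 && n_teams > 0);
  const std::size_t width = n_rows + max_iteration + 3;
  GmresHandleExtents e{};
  e.arnoldi[0] = batched_size;
  e.arnoldi[1] = max_iteration;
  e.arnoldi[2] = width;
  e.tmp[0] = batched_size;
  e.tmp[1] = width;
  e.residual_norms[0] = batched_size;
  e.residual_norms[1] = max_iteration + 2;
  e.iteration_numbers = batched_size;
  e.first_last_index = n_teams;
  return e;
}
```

For example, 100 systems with 50 rows each and at most 20 iterations need an Arnoldi view of 100 x 20 x 73.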

Template Parameters:

NormViewType – type of the view used to store the convergence history
IntViewType – type of the view used to store the number of iterations per system
ViewType3D – type of the 3D temporary views

Public Functions

inline int get_number_of_systems_per_team()

get_number_of_systems_per_team Get the number of systems per team.

inline int get_number_of_teams()

get_number_of_teams Get the number of teams.

inline void reset()

reset Reset the iteration numbers to the default value of -1 and the residual norms if monitored. (Useful when multiple consecutive solvers use the same handle.)

inline void synchronise_host()

synchronise_host Synchronise host and device.

inline bool is_converged() const

is_converged Test if all the systems have converged.

inline bool is_converged_host()

is_converged_host Test if all the systems have converged (host).

inline bool is_converged(int batched_id) const

is_converged Test if one particular system has converged.

Parameters:

batched_id – [in]: Global batched ID

inline bool is_converged_host(int batched_id)

is_converged_host Test if one particular system has converged (host).

Parameters:

batched_id – [in]: Global batched ID

inline void set_tolerance(norm_type _tolerance)

set_tolerance Set the tolerance of the batched Krylov solver

Parameters:

_tolerance – [in]: New tolerance

inline norm_type get_tolerance() const

get_tolerance Get the tolerance of the batched Krylov solver

inline void set_max_tolerance(norm_type _max_tolerance)

set_max_tolerance Set the maximal tolerance of the batched Krylov solver

Parameters:

_max_tolerance – [in]: New tolerance

inline norm_type get_max_tolerance() const

get_max_tolerance Get the maximal tolerance of the batched Krylov solver

inline void set_max_iteration(int _max_iteration)

set_max_iteration Set the maximum number of iterations of the batched Krylov solver

Parameters:

_max_iteration – [in]: New maximum number of iterations

inline int get_max_iteration() const

get_max_iteration Get the maximum number of iterations of the batched Krylov solver

inline norm_type get_norm(int batched_id, int iteration_id) const

get_norm Get the norm of one system at a given iteration

Parameters:

batched_id – [in]: Global batched ID
iteration_id – [in]: Iteration ID

inline norm_type get_norm_host(int batched_id, int iteration_id)

get_norm_host Get the norm of one system at a given iteration (host)

Parameters:

batched_id – [in]: Global batched ID
iteration_id – [in]: Iteration ID

inline norm_type get_last_norm(int batched_id) const

get_last_norm Get the last norm of one system

Parameters:

batched_id – [in]: Global batched ID

inline norm_type get_last_norm_host(int batched_id)

get_last_norm_host Get the last norm of one system (host)

Parameters:

batched_id – [in]: Global batched ID

inline int get_iteration(int batched_id) const

get_iteration Get the number of iterations after convergence for one system

Parameters:

batched_id – [in]: Global batched ID

inline int get_iteration_host(int batched_id)

get_iteration_host Get the number of iterations after convergence for one system (host)

Parameters:

batched_id – [in]: Global batched ID

inline void set_ortho_strategy(int _ortho_strategy)

set_ortho_strategy Set the orthogonalization strategy: either classical Gram-Schmidt (_ortho_strategy=0) or modified Gram-Schmidt (_ortho_strategy=1)

Parameters:

_ortho_strategy – [in]: orthogonalization strategy to use

inline int get_ortho_strategy() const

get_ortho_strategy Get the orthogonalization strategy: either classical Gram-Schmidt (_ortho_strategy=0) or modified Gram-Schmidt (_ortho_strategy=1)
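
To make the difference between the two strategies concrete, here is a minimal plain C++ sketch of one orthogonalization step against an orthonormal basis. The orthogonalize function, the Vec alias, and the dot helper are hypothetical names for this example; the sketch follows the textbook definitions of classical and modified Gram-Schmidt and is not taken from the solver's source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Orthogonalizes w in place against the orthonormal vectors in V and returns
// the projection coefficients. ortho_strategy uses the convention above:
// 0 = classical Gram-Schmidt, 1 = modified Gram-Schmidt.
Vec orthogonalize(const std::vector<Vec>& V, Vec& w, int ortho_strategy) {
  assert(ortho_strategy == 0 || ortho_strategy == 1);
  Vec h(V.size(), 0.0);
  if (ortho_strategy == 0) {
    // Classical: every coefficient is computed from the original w,
    // then all projections are subtracted.
    for (std::size_t j = 0; j < V.size(); ++j) h[j] = dot(V[j], w);
    for (std::size_t j = 0; j < V.size(); ++j)
      for (std::size_t i = 0; i < w.size(); ++i) w[i] -= h[j] * V[j][i];
  } else {
    // Modified: each coefficient is computed from the partially
    // orthogonalized w and subtracted immediately.
    for (std::size_t j = 0; j < V.size(); ++j) {
      h[j] = dot(V[j], w);
      for (std::size_t i = 0; i < w.size(); ++i) w[i] -= h[j] * V[j][i];
    }
  }
  return h;
}
```

In exact arithmetic both branches give the same result. The classical form computes all dot products independently, while the modified form is generally the more robust one in floating point.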

inline void set_scratch_pad_level(int _scratch_pad_level)

set_scratch_pad_level Set the scratch pad level used to store temporary variables.

Parameters:

_scratch_pad_level – [in]: used level

inline int get_scratch_pad_level() const

get_scratch_pad_level Get the scratch pad level used to store temporary variables.

inline void set_compute_last_residual(bool _compute_last_residual)

set_compute_last_residual Select whether the last residual is explicitly computed.

Parameters:

_compute_last_residual – [in]: boolean that specifies whether the last residual is computed explicitly

inline bool get_compute_last_residual() const

get_compute_last_residual Return whether the last residual has to be computed explicitly.

spmv

template<typename ArgTrans = Trans::NoTranspose> struct SerialSpmv

Serial Batched SPMV: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:

N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.

The matrices are represented using a Compressed Row Storage (CRS) format and the shared sparsity pattern is reused from one matrix to the others.

Concretely, instead of providing an array of N matrices to the batched SPMV kernel, the user provides one row offset array (1D view), one column-index array (1D view), and one value array (2D view, one dimension for the non-zero indices and one for the matrix indices).

No nested parallel_for is used inside of the function.

Template Parameters:

ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)

Parameters:

alpha – [in]: input coefficient for X, a rank 1 view
values – [in]: values of the batched crs matrix, a rank 2 view
row_ptr – [in]: row offset array of the batched crs matrix, a rank 1 view
colIndices – [in]: column-index array of the batched crs matrix, a rank 1 view
X – [in]: Input vector X, a rank 2 view
beta – [in]: input coefficient for Y (if dobeta != 0), a rank 1 view
Y – [in/out]: Output vector Y, a rank 2 view
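
The serial variant with per-system coefficients can be pictured with the following plain C++ reference loop. It is a sketch under stated assumptions, not the SerialSpmv functor: serial_batched_spmv, the Matrix2D alias, and the values[l][k] layout are hypothetical, nested std::vector stands in for Kokkos views, and only the non-transposed case is covered. The argument order mirrors the parameter list above, and dobeta is a compile-time integer as described.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix2D = std::vector<std::vector<double>>;

// Hypothetical serial reference for
//   y_l <- alpha_l * A_l * x_l + beta_l * y_l,  l = 0, ..., N-1.
// values[l][k] is nonzero k of A_l; row_ptr and col_idx are shared by all l.
// With dobeta == 0 the beta_l * y_l term is skipped and beta is never read.
template <int dobeta>
void serial_batched_spmv(const std::vector<double>& alpha,
                         const Matrix2D& values,
                         const std::vector<int>& row_ptr,
                         const std::vector<int>& col_idx,
                         const Matrix2D& X,
                         const std::vector<double>& beta,
                         Matrix2D& Y) {
  const std::size_t n_systems = values.size();
  const std::size_t n_rows = row_ptr.size() - 1;
  assert(alpha.size() == n_systems);
  assert(X.size() == n_systems && Y.size() == n_systems);
  assert(dobeta == 0 || beta.size() == n_systems);
  for (std::size_t l = 0; l < n_systems; ++l) {
    for (std::size_t i = 0; i < n_rows; ++i) {
      double sum = 0.0;
      for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
        sum += values[l][k] * X[l][col_idx[k]];
      }
      if (dobeta == 0) {
        Y[l][i] = alpha[l] * sum;
      } else {
        Y[l][i] = alpha[l] * sum + beta[l] * Y[l][i];
      }
    }
  }
}
```

Because the loops are plain and sequential, this matches the "no nested parallel_for" character of the serial interface; the team variants distribute the same row loop over threads.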

template<typename MemberType, typename ArgTrans = Trans::NoTranspose> struct TeamSpmv

Team Batched SPMV: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:

N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.

The matrices are represented using a Compressed Row Storage (CRS) format and the shared sparsity pattern is reused from one matrix to the others.

Concretely, instead of providing an array of N matrices to the batched SPMV kernel, the user provides one row offset array (1D view), one column-index array (1D view), and one value array (2D view, one dimension for the non-zero indices and one for the matrix indices).

A nested parallel_for with TeamThreadRange is used.

Template Parameters:

ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)

Parameters:

member – [in]: TeamPolicy member
alpha – [in]: input coefficient for X, a rank 1 view
values – [in]: values of the batched crs matrix, a rank 2 view
row_ptr – [in]: row offset array of the batched crs matrix, a rank 1 view
colIndices – [in]: column-index array of the batched crs matrix, a rank 1 view
X – [in]: Input vector X, a rank 2 view
beta – [in]: input coefficient for Y (if dobeta != 0), a rank 1 view
Y – [in/out]: Output vector Y, a rank 2 view

template<typename MemberType, typename ArgTrans = Trans::NoTranspose, unsigned N_team = 1> struct TeamVectorSpmv

TeamVector Batched SPMV: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:

N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.

The matrices are represented using a Compressed Row Storage (CRS) format and the shared sparsity pattern is reused from one matrix to the others.

Concretely, instead of providing an array of N matrices to the batched SPMV kernel, the user provides one row offset array (1D view), one column-index array (1D view), and one value array (2D view, one dimension for the non-zero indices and one for the matrix indices).

Two nested parallel_for with both TeamThreadRange and ThreadVectorRange (or one with TeamVectorRange) are used inside.

Template Parameters:

ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)

Parameters:

member – [in]: TeamPolicy member
alpha – [in]: input coefficient for X, a rank 1 view
values – [in]: values of the batched crs matrix, a rank 2 view
row_ptr – [in]: row offset array of the batched crs matrix, a rank 1 view
colIndices – [in]: column-index array of the batched crs matrix, a rank 1 view
X – [in]: Input vector X, a rank 2 view
beta – [in]: input coefficient for Y (if dobeta != 0), a rank 1 view
Y – [in/out]: Output vector Y, a rank 2 view

template<typename MemberType, typename ArgTrans, typename ArgMode> struct Spmv

Batched SPMV, selective interface: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:

N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.

Template Parameters:

ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)

Parameters:

member – [in]: TeamPolicy member
alpha – [in]: input coefficient for X, a rank 1 view
values – [in]: values of the batched crs matrix, a rank 2 view
row_ptr – [in]: row offset array of the batched crs matrix, a rank 1 view
colIndices – [in]: column-index array of the batched crs matrix, a rank 1 view
X – [in]: Input vector X, a rank 2 view
beta – [in]: input coefficient for Y (if dobeta != 0), a rank 1 view
Y – [in/out]: Output vector Y, a rank 2 view