SPARSE BATCHED – KokkosKernels sparse batched functor-level interfaces
cg
-
template<typename MemberType, typename ArgMode>
struct CG
crsmatrix
-
template<class ValuesViewType, class IntViewType>
class CrsMatrix
Batched CrsMatrix:
- Template Parameters:
ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntViewType – Input type for row offset array and column-index array, needs to be a 1D view
Public Functions
-
template<typename ArgTrans, typename ArgMode, typename MemberType, typename XViewType, typename YViewType>
inline void apply(const MemberType &member, const XViewType &X, const YViewType &Y, MagnitudeType alpha = Kokkos::ArithTraits<MagnitudeType>::one(), MagnitudeType beta = Kokkos::ArithTraits<MagnitudeType>::zero()) const
apply version that uses constant coefficients alpha and beta:
y_l <- alpha * A_l * x_l + beta * y_l for all l = 1, …, N where:
N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha is a scaling factor for x_1, …, x_N,
beta is a scaling factor for y_1, …, y_N.
- Template Parameters:
MemberType – Input type for the TeamPolicy member
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
ArgTrans – Argument for transpose or notranspose
ArgMode – Argument for the parallelism used in the apply
- Parameters:
member – [in]: TeamPolicy member
alpha – [in]: input coefficient for X (default value 1.)
X – [in]: Input vector X, a rank 2 view
beta – [in]: input coefficient for Y (default value 0.)
Y – [in/out]: Output vector Y, a rank 2 view
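The update rule above can be illustrated with a plain C++ reference loop. This is only a sketch of the arithmetic, not the KokkosKernels implementation: the `BatchedCrs` struct and `apply_reference` function are hypothetical names, `std::vector` stands in for Kokkos views, the `values[l][k]` layout (matrix index first) is an assumption, and only the no-transpose case is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ container for a batch of CRS matrices that share one
// sparsity pattern. It is NOT the KokkosKernels CrsMatrix class.
struct BatchedCrs {
  std::vector<int> row_ptr;                 // size n_rows + 1, shared by all matrices
  std::vector<int> col_idx;                 // size nnz, shared by all matrices
  std::vector<std::vector<double>> values;  // values[l][k]: k-th non-zero of A_l (assumed layout)
};

// Reference loop for y_l <- alpha * A_l * x_l + beta * y_l (no transpose),
// with the same scalar alpha and beta for every system l.
void apply_reference(const BatchedCrs& A,
                     const std::vector<std::vector<double>>& x,
                     std::vector<std::vector<double>>& y,
                     double alpha = 1.0, double beta = 0.0) {
  assert(!A.row_ptr.empty());
  assert(x.size() == A.values.size() && y.size() == A.values.size());
  const std::size_t n_rows = A.row_ptr.size() - 1;
  for (std::size_t l = 0; l < A.values.size(); ++l) {  // loop over the N systems
    for (std::size_t i = 0; i < n_rows; ++i) {         // loop over the rows
      double sum = 0.0;
      // The row offsets and column indices are read once and reused for every l.
      for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
        sum += A.values[l][k] * x[l][A.col_idx[k]];
      y[l][i] = alpha * sum + beta * y[l][i];
    }
  }
}
```

In the sketch the default arguments mirror the documented defaults (alpha = 1, beta = 0), so calling `apply_reference(A, x, y)` computes y_l <- A_l * x_l.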
gmres
-
template<typename MemberType, typename ArgMode>
struct GMRES
identity
jacobiprec
-
template<class ValuesViewType>
class JacobiPrec
Batched Jacobi Preconditioner:
- Template Parameters:
ValuesViewType – Input type for the values of the diagonal
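As a rough illustration of what a batched Jacobi (diagonal) preconditioner does, the following plain C++ sketch scales each residual entry by the matching diagonal entry of its own system. The function name `jacobi_apply` is hypothetical, `std::vector` stands in for Kokkos views, and the sketch assumes the stored values are the diagonal entries themselves rather than their inverses; the class documented here may store them differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a batched Jacobi (diagonal) preconditioner: z_l[i] = r_l[i] / d_l[i].
// diag[l][i] is ASSUMED to hold the i-th diagonal entry of A_l (not its inverse).
std::vector<std::vector<double>> jacobi_apply(
    const std::vector<std::vector<double>>& diag,
    const std::vector<std::vector<double>>& r) {
  assert(diag.size() == r.size());
  std::vector<std::vector<double>> z(r.size());
  for (std::size_t l = 0; l < r.size(); ++l) {  // one independent system per l
    assert(diag[l].size() == r[l].size());
    z[l].resize(r[l].size());
    for (std::size_t i = 0; i < r[l].size(); ++i)
      z[l][i] = r[l][i] / diag[l][i];
  }
  return z;
}
```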
krylovhandle
-
template<class NormViewType, class IntViewType, class ViewType3D>
class KrylovHandle
The handle is used to pass information between the Krylov solver and the calling code.
The handle has several views as data members; their required sizes depend on the Krylov solver used.
In the case of the Batched GMRES, the size should be as follows:
Arnoldi_view is a 3D view of size batched_size x max_iteration x (n_rows + max_iteration + 3);
tmp_view is NOT used for the team/teamvector GMRES; it is used for the serial GMRES and its size is batched_size x (n_rows + max_iteration + 3);
residual_norms is an optional view of size batched_size x (max_iteration + 2) used to store the convergence history;
iteration_numbers is a 1D view of length batched_size;
first_index and last_index are 1D views of length n_teams.
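The size rules listed above for the Batched GMRES can be collected in a small helper. This is a sketch in plain C++ with no Kokkos dependency; `GmresHandleExtents` and `gmres_handle_extents` are hypothetical names that only restate the documented extents and are not part of KokkosKernels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical record of the view extents listed above for the Batched GMRES.
struct GmresHandleExtents {
  std::array<std::size_t, 3> arnoldi_view;    // batched_size x max_iteration x width
  std::array<std::size_t, 2> tmp_view;        // only needed by the serial GMRES
  std::array<std::size_t, 2> residual_norms;  // optional convergence history
  std::size_t iteration_numbers;              // length of the 1D view
  std::size_t first_index;                    // length of the 1D view
  std::size_t last_index;                     // length of the 1D view
};

GmresHandleExtents gmres_handle_extents(std::size_t batched_size,
                                        std::size_t n_rows,
                                        std::size_t max_iteration,
                                        std::size_t n_teams) {
  assert(max_iteration > 0);
  // Last extent shared by Arnoldi_view and tmp_view.
  const std::size_t width = n_rows + max_iteration + 3;
  GmresHandleExtents e;
  e.arnoldi_view = {batched_size, max_iteration, width};
  e.tmp_view = {batched_size, width};
  e.residual_norms = {batched_size, max_iteration + 2};
  e.iteration_numbers = batched_size;
  e.first_index = n_teams;
  e.last_index = n_teams;
  return e;
}
```

For example, with batched_size = 100, n_rows = 50, max_iteration = 10 and n_teams = 4, the last extent of Arnoldi_view is 50 + 10 + 3 = 63.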
- Template Parameters:
NormViewType – type of the view used to store the convergence history
IntViewType – type of the view used to store the number of iterations per system
ViewType3D – type of the 3D temporary views
Public Functions
-
inline int get_number_of_systems_per_team()
get_number_of_systems_per_team Get the number of systems per team.
-
inline int get_number_of_teams()
get_number_of_teams Get the number of teams.
-
inline void reset()
reset Reset the iteration numbers to the default value of -1 and the residual norms if monitored. (Useful when multiple consecutive solvers use the same handle.)
-
inline void synchronise_host()
synchronise_host Synchronise host and device.
-
inline bool is_converged() const
is_converged Test if all the systems have converged.
-
inline bool is_converged_host()
is_converged_host Test if all the systems have converged (host).
-
inline bool is_converged(int batched_id) const
is_converged Test if one particular system has converged.
- Parameters:
batched_id – [in]: Global batched ID
-
inline bool is_converged_host(int batched_id)
is_converged_host Test if one particular system has converged (host).
- Parameters:
batched_id – [in]: Global batched ID
-
inline void set_tolerance(norm_type _tolerance)
set_tolerance Set the tolerance of the batched Krylov solver
- Parameters:
_tolerance – [in]: New tolerance
-
inline norm_type get_tolerance() const
get_tolerance Get the tolerance of the batched Krylov solver
-
inline void set_max_tolerance(norm_type _max_tolerance)
set_max_tolerance Set the maximal tolerance of the batched Krylov solver
- Parameters:
_max_tolerance – [in]: New maximal tolerance
-
inline norm_type get_max_tolerance() const
get_max_tolerance Get the maximal tolerance of the batched Krylov solver
-
inline void set_max_iteration(int _max_iteration)
set_max_iteration Set the maximum number of iterations of the batched Krylov solver
- Parameters:
_max_iteration – [in]: New maximum number of iterations
-
inline int get_max_iteration() const
get_max_iteration Get the maximum number of iterations of the batched Krylov solver
-
inline norm_type get_norm(int batched_id, int iteration_id) const
get_norm Get the norm of one system at a given iteration
- Parameters:
batched_id – [in]: Global batched ID
iteration_id – [in]: Iteration ID
-
inline norm_type get_norm_host(int batched_id, int iteration_id)
get_norm_host Get the norm of one system at a given iteration (host)
- Parameters:
batched_id – [in]: Global batched ID
iteration_id – [in]: Iteration ID
-
inline norm_type get_last_norm(int batched_id) const
get_last_norm Get the last norm of one system
- Parameters:
batched_id – [in]: Global batched ID
-
inline norm_type get_last_norm_host(int batched_id)
get_last_norm_host Get the last norm of one system (host)
- Parameters:
batched_id – [in]: Global batched ID
-
inline int get_iteration(int batched_id) const
get_iteration Get the number of iterations after convergence for one system
- Parameters:
batched_id – [in]: Global batched ID
-
inline int get_iteration_host(int batched_id)
get_iteration_host Get the number of iterations after convergence for one system (host)
- Parameters:
batched_id – [in]: Global batched ID
-
inline void set_ortho_strategy(int _ortho_strategy)
set_ortho_strategy Set the orthogonalization strategy to use: either classical Gram-Schmidt (_ortho_strategy=0) or modified Gram-Schmidt (_ortho_strategy=1)
- Parameters:
_ortho_strategy – [in]: used orthogonalization strategy
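The difference between the two strategies can be sketched on plain `std::vector` data. The code below is an illustration of textbook classical and modified Gram-Schmidt, not the KokkosKernels implementation; `dot_product` and `orthogonalize` are hypothetical names, and the integer argument only mirrors the documented convention (0 for classical, 1 for modified).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

double dot_product(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Orthogonalize w against the orthonormal vectors in basis.
// strategy == 0: classical GS, every projection coefficient uses the original w.
// strategy == 1: modified GS, every coefficient uses the partially updated w.
Vec orthogonalize(const std::vector<Vec>& basis, Vec w, int strategy) {
  assert(strategy == 0 || strategy == 1);
  const Vec w0 = w;  // untouched copy, only read by the classical variant
  for (const Vec& v : basis) {
    const double h = dot_product(v, strategy == 0 ? w0 : w);
    for (std::size_t i = 0; i < w.size(); ++i) w[i] -= h * v[i];
  }
  return w;
}
```

In exact arithmetic both variants return the same vector. The classical variant computes all coefficients from the same input, so they are independent of each other, while the modified variant is generally the more robust one in floating-point arithmetic.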
-
inline int get_ortho_strategy() const
get_ortho_strategy Get the orthogonalization strategy in use: either classical Gram-Schmidt (_ortho_strategy=0) or modified Gram-Schmidt (_ortho_strategy=1)
-
inline void set_scratch_pad_level(int _scratch_pad_level)
set_scratch_pad_level Set the scratch pad level used to store temporary variables.
- Parameters:
_scratch_pad_level – [in]: used level
-
inline int get_scratch_pad_level() const
get_scratch_pad_level Get the scratch pad level used to store temporary variables.
-
inline void set_compute_last_residual(bool _compute_last_residual)
set_compute_last_residual Select if the last residual is explicitly computed.
- Parameters:
_compute_last_residual – [in]: boolean that specifies if we compute the last residual explicitly
-
inline bool get_compute_last_residual() const
get_compute_last_residual Return whether the last residual is computed explicitly.
spmv
-
template<typename ArgTrans = Trans::NoTranspose>
struct SerialSpmv
Serial Batched SPMV: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:
N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.
The matrices are represented using a Compressed Row Storage (CRS) format and the shared sparsity pattern is reused from one matrix to the others.
Concretely, instead of providing an array of N matrices to the batched SPMV kernel, the user provides one row offset array (1D view), one column-index array (1D view), and one value array (2D view, one dimension for the non-zero indices and one for the matrix indices).
No nested parallel_for is used inside of the function.
- Template Parameters:
ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)
- Param alpha:
[in]: input coefficient for X, a rank 1 view
- Param values:
[in]: values of the batched crs matrix, a rank 2 view
- Param row_ptr:
[in]: row offset array of the batched crs matrix, a rank 1 view
- Param colIndices:
[in]: column-index array of the batched crs matrix, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param beta:
[in]: input coefficient for Y (if dobeta != 0), a rank 1 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
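The role of the per-system coefficients and of `dobeta` can be shown with a serial reference loop in plain C++. This is a sketch under stated assumptions, not the KokkosKernels kernel: `batched_spmv_reference` and `Matrix2D` are hypothetical names, `std::vector` replaces Kokkos views, the matrix index is assumed to be the first index of `values`, `X` and `Y`, and only the no-transpose case is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix2D = std::vector<std::vector<double>>;  // [matrix index][entry], assumed layout

// Serial reference for y_l <- alpha_l * A_l * x_l + beta_l * y_l.
// When dobeta == 0 the term beta_l * y_l is skipped and Y is overwritten.
template <int dobeta>
void batched_spmv_reference(const std::vector<double>& alpha,
                            const Matrix2D& values,
                            const std::vector<int>& row_ptr,
                            const std::vector<int>& col_idx,
                            const Matrix2D& X,
                            const std::vector<double>& beta,
                            Matrix2D& Y) {
  assert(!row_ptr.empty());
  assert(alpha.size() == values.size() && beta.size() == values.size());
  const std::size_t n_rows = row_ptr.size() - 1;
  for (std::size_t l = 0; l < values.size(); ++l) {
    for (std::size_t i = 0; i < n_rows; ++i) {
      double sum = 0.0;
      // One row offset array and one column-index array serve all N matrices.
      for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        sum += values[l][k] * X[l][col_idx[k]];
      if (dobeta == 0)
        Y[l][i] = alpha[l] * sum;
      else
        Y[l][i] = alpha[l] * sum + beta[l] * Y[l][i];
    }
  }
}
```

Making `dobeta` a compile-time value in the sketch lets the compiler drop the unused branch, and with `dobeta == 0` the previous content of `Y` is never read.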
-
template<typename MemberType, typename ArgTrans = Trans::NoTranspose>
struct TeamSpmv
Team Batched SPMV: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:
N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.
The matrices are represented using a Compressed Row Storage (CRS) format and the shared sparsity pattern is reused from one matrix to the others.
Concretely, instead of providing an array of N matrices to the batched SPMV kernel, the user provides one row offset array (1D view), one column-index array (1D view), and one value array (2D view, one dimension for the non-zero indices and one for the matrix indices).
A nested parallel_for with TeamThreadRange is used.
- Template Parameters:
ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)
- Param member:
[in]: TeamPolicy member
- Param alpha:
[in]: input coefficient for X, a rank 1 view
- Param values:
[in]: values of the batched crs matrix, a rank 2 view
- Param row_ptr:
[in]: row offset array of the batched crs matrix, a rank 1 view
- Param colIndices:
[in]: column-index array of the batched crs matrix, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param beta:
[in]: input coefficient for Y (if dobeta != 0), a rank 1 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
-
template<typename MemberType, typename ArgTrans = Trans::NoTranspose, unsigned N_team = 1>
struct TeamVectorSpmv
TeamVector Batched SPMV: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:
N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.
The matrices are represented using a Compressed Row Storage (CRS) format and the shared sparsity pattern is reused from one matrix to the others.
Concretely, instead of providing an array of N matrices to the batched SPMV kernel, the user provides one row offset array (1D view), one column-index array (1D view), and one value array (2D view, one dimension for the non-zero indices and one for the matrix indices).
Two nested parallel_for with both TeamThreadRange and ThreadVectorRange (or one with TeamVectorRange) are used inside.
- Template Parameters:
ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)
- Param member:
[in]: TeamPolicy member
- Param alpha:
[in]: input coefficient for X, a rank 1 view
- Param values:
[in]: values of the batched crs matrix, a rank 2 view
- Param row_ptr:
[in]: row offset array of the batched crs matrix, a rank 1 view
- Param colIndices:
[in]: column-index array of the batched crs matrix, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param beta:
[in]: input coefficient for Y (if dobeta != 0), a rank 1 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
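The remark about using either two nested ranges or one combined range can be pictured with serial loops: a pair of nested loops and a single flat loop with index decomposition visit the same (matrix, row) pairs. The sketch below is plain serial C++ with hypothetical function names. Which index a given Kokkos range actually iterates over inside TeamVectorSpmv is not stated above, so the mapping chosen here is an assumption.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using IndexPair = std::pair<int, int>;  // (matrix index, row index)

// Two nested loops: outer over matrices, inner over rows.
std::vector<IndexPair> visit_nested(int n_matrices, int n_rows) {
  assert(n_matrices >= 0 && n_rows > 0);
  std::vector<IndexPair> visited;
  for (int l = 0; l < n_matrices; ++l)
    for (int i = 0; i < n_rows; ++i) visited.emplace_back(l, i);
  return visited;
}

// One flat loop over n_matrices * n_rows work items, decomposed into
// a matrix index (quotient) and a row index (remainder).
std::vector<IndexPair> visit_flat(int n_matrices, int n_rows) {
  assert(n_matrices >= 0 && n_rows > 0);
  std::vector<IndexPair> visited;
  for (int t = 0; t < n_matrices * n_rows; ++t)
    visited.emplace_back(t / n_rows, t % n_rows);
  return visited;
}
```

Because every (matrix, row) pair writes a distinct entry of Y, the work items are independent and either loop structure can be distributed over the threads and vector lanes of a team.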
-
template<typename MemberType, typename ArgTrans, typename ArgMode>
struct Spmv
Batched SPMV, selective interface: y_l <- alpha_l * A_l * x_l + beta_l * y_l for all l = 1, …, N where:
N is the number of matrices,
A_1, …, A_N are N sparse matrices which share the same sparsity pattern,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N,
beta_1, …, beta_N are N scaling factors for y_1, …, y_N.
- Template Parameters:
ValuesViewType – Input type for the values of the batched crs matrix, needs to be a 2D view
IntView – Input type for row offset array and column-index array, needs to be a 1D view
xViewType – Input type for X, needs to be a 2D view
yViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
betaViewType – Input type for beta, needs to be a 1D view
dobeta – Int which specifies if beta_l * y_l is used or not (if dobeta == 0, beta_l * y_l is not added to the result of alpha_l * A_l * x_l)
- Param member:
[in]: TeamPolicy member
- Param alpha:
[in]: input coefficient for X, a rank 1 view
- Param values:
[in]: values of the batched crs matrix, a rank 2 view
- Param row_ptr:
[in]: row offset array of the batched crs matrix, a rank 1 view
- Param colIndices:
[in]: column-index array of the batched crs matrix, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param beta:
[in]: input coefficient for Y (if dobeta != 0), a rank 1 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view