BATCHED – KokkosKernels batched functor-level interfaces
innerlu
CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerLU_Internal.hpp
applypivot
qr_withcolumnpivoting
addradial
-
struct SerialAddRadial
Serial AddRadial. Adds a tiny value to each diagonal entry so that the absolute values of the diagonal entries become larger.
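The effect described above can be illustrated with a small standalone sketch. It assumes that the tiny value is applied with the sign of each diagonal entry, which is one way to guarantee that the magnitude grows; the function name `add_radial_reference` and the row-major `std::vector` storage are hypothetical and are not part of the KokkosKernels interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not the KokkosKernels API).
// `a` is an m x m matrix stored row-major in a flat vector.
void add_radial_reference(std::vector<double>& a, std::size_t m, double tiny) {
  assert(a.size() == m * m);
  const double abs_tiny = std::fabs(tiny);
  for (std::size_t i = 0; i < m; ++i) {
    double& d = a[i * m + i];
    // Assumption: push the diagonal away from zero, following its sign.
    d += (d < 0.0) ? -abs_tiny : abs_tiny;
  }
}
```

Off-diagonal entries are left unchanged; only the m diagonal entries are shifted.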
-
template<typename MemberType>
struct TeamAddRadial Team AddRadial
householder
-
template<typename ArgSide>
struct SerialHouseholder Serial Householder
-
template<typename MemberType, typename ArgSide>
struct TeamVectorHouseholder TeamVector Householder
set
-
struct SerialSet
Serial Set
-
template<typename MemberType>
struct TeamSet Team Set
-
template<typename MemberType>
struct TeamVectorSet TeamVector Set
scale
-
struct SerialScale
Serial Scale
-
template<typename MemberType>
struct TeamScale Team Scale
-
template<typename MemberType>
struct TeamVectorScale TeamVector Scale
setidentity
-
struct SerialSetIdentity
Serial SetIdentity
-
template<typename MemberType>
struct TeamSetIdentity Team SetIdentity
-
template<typename MemberType, typename ArgMode>
struct SetIdentity Selective Interface
applyhouseholder
-
template<typename ArgSide>
struct SerialApplyHouseholder Serial ApplyHouseholder
-
template<typename MemberType, typename ArgSide>
struct TeamVectorApplyHouseholder
innermultipledotproduct
CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerMultipleDotProduct_Internal.hpp
lu
-
template<typename ArgAlgo>
struct SerialLU
-
template<typename MemberType, typename ArgAlgo>
struct TeamLU
-
template<typename MemberType, typename ArgMode, typename ArgAlgo>
struct LU Selective Interface
solveutv
-
template<typename MemberType, typename ArgAlgo>
struct TeamVectorSolveUTV TeamVector Solve UTV. Given the factorization UTV = A P^T, it solves A X = B.
input:
matrix_rank is the rank computed during the UTV factorization
U is an m x m real matrix (only m x matrix_rank is used)
T is an m x m real matrix (only matrix_rank x matrix_rank is used)
V is an m x m real matrix (only matrix_rank x m is used)
p is an integer vector of length m holding the pivot indices
X is the solution matrix (or vector)
B is the right-hand side matrix (or vector)
w is a contiguous real workspace vector of size B.span()
output:
B is overwritten with the solution
When A has full rank, i.e., matrix_rank == m, UTV computes only a QR factorization with column pivoting, where Q is stored in U and R is stored in T.
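The solve can be written out directly from the factorization. The following is a sketch under stated assumptions: P is a permutation matrix (so P^T P = I), the used part of U has orthonormal columns, the used part of V has orthonormal rows, and the used block of T is triangular and invertible. The order of operations inside the library may differ.

```latex
\begin{aligned}
U T V = A P^{T} \quad &\Longrightarrow \quad A = U T V P, \\
\text{rank-deficient case:} \quad & X = P^{T} V^{T} T^{-1} U^{T} B, \\
\text{full-rank case } (A P^{T} = Q R): \quad & X = P^{T} R^{-1} Q^{T} B .
\end{aligned}
```

Read right to left, the first formula is a product with U^T, a triangular solve with T, a product with V^T, and finally an application of the pivots. Under the assumptions above, in the rank-deficient case this X is the minimum-norm least-squares solution rather than an exact solution for every B.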
utv
-
template<typename MemberType, typename ArgAlgo>
struct TeamVectorUTV TeamVector UTV. For a given A, it performs the UTV factorization, i.e., UTV = A P^T.
input:
A is an m x m real matrix
p is an integer vector of length m
U is an m x m real matrix
V is an m x m real matrix
w is a contiguous real workspace vector of size 3*m
output:
A is overwritten with a lower triangular matrix_rank x matrix_rank real matrix
P^T contains pivot indices (note that these are different from permutation indices)
U is the left orthogonal matrix, m x matrix_rank
V is the right orthogonal matrix, matrix_rank x m
When A has full rank, i.e., matrix_rank == m, this computes only a QR factorization with column pivoting:
output:
A is overwritten with an upper triangular matrix
P^T contains pivot indices (note that these are different from permutation indices)
U is an orthogonal matrix, m x m
V is not touched
For the solution of a rank-deficient problem, it is recommended to use SolveUTV.
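The distinction between pivot indices and permutation indices can be made concrete with a short sketch. It assumes a LAPACK-like convention in which step i swaps entry i with entry piv[i] (0-based, absolute positions); the convention used by the library may differ (for example, offsets relative to i), and the function name `pivots_to_permutation` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: replay a sequence of swaps to obtain the permutation
// vector it represents. Assumed convention: step i swaps entries i and piv[i].
std::vector<std::size_t> pivots_to_permutation(const std::vector<std::size_t>& piv) {
  std::vector<std::size_t> perm(piv.size());
  std::iota(perm.begin(), perm.end(), std::size_t{0});  // identity permutation
  for (std::size_t i = 0; i < piv.size(); ++i) {
    assert(piv[i] < piv.size());
    std::swap(perm[i], perm[piv[i]]);
  }
  return perm;
}
```

A pivot vector records swaps to be applied in order, so the same index may appear several times, whereas a permutation vector lists every index exactly once.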
inverselu
CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InverseLU_Internal.hpp
svd
-
struct SerialSVD
eigendecomposition
-
struct SerialEigendecomposition
Given a general nonsymmetric matrix A (m x m), it performs eigendecomposition of the matrix.
- Param member:
[in]: Team interface only has this argument. Partial specialization can be applied for a different type of team member.
- Param A:
[in/out]: Real general nonsymmetric rank 2 view A (m x m). A is first reduced to upper Hessenberg form; then the Francis double-shift QR algorithm is applied to compute its Schur form. On exit, A stores the quasi upper triangular matrix of the Schur decomposition.
- Param er, ei:
[out]: Real and imaginary parts of the eigenvalues, which form er(m) + ei(m) i. A complex eigenpair is stored consecutively as a+bi and a-bi.
- Param UL, UR:
[out]: Left/right eigenvectors, stored in (m x m) matrices. If a zero-span view is provided, the corresponding eigenvectors are not computed. However, UL and UR cannot both have zero span. If only eigenvalues are requested, use the Eigenvalue interface, which simplifies the computation.
- Param W:
[out]: 1D contiguous workspace. The minimum size is (2*m*m+5*m), where m is the dimension of matrix A.
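Two details of the parameter description lend themselves to a small illustration: the minimum workspace size and the split storage of eigenvalues in er and ei. The sketch below uses plain standard-library types; the function names `min_workspace_size` and `assemble_eigenvalues` are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Minimum workspace length stated in the documentation: 2*m*m + 5*m.
std::size_t min_workspace_size(std::size_t m) { return 2 * m * m + 5 * m; }

// Combine the separately stored real and imaginary parts into complex values.
// A complex pair occupies two consecutive slots as a+bi followed by a-bi.
std::vector<std::complex<double>> assemble_eigenvalues(const std::vector<double>& er,
                                                       const std::vector<double>& ei) {
  assert(er.size() == ei.size());
  std::vector<std::complex<double>> eig;
  eig.reserve(er.size());
  for (std::size_t k = 0; k < er.size(); ++k) {
    eig.emplace_back(er[k], ei[k]);
  }
  return eig;
}
```

For m = 3 the workspace needs at least 2*9 + 15 = 33 entries.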
-
template<typename MemberType>
struct TeamVectorEigendecomposition
trtri
-
template<typename ArgUplo, typename ArgDiag, typename ArgAlgo>
struct SerialTrtri
qr
-
template<typename MemberType, typename ArgMode, typename ArgAlgo>
struct QR Selective Interface
trmm
-
template<typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo>
struct SerialTrmm
trsm
-
template<typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo>
struct SerialTrsm
-
template<typename MemberType, typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo>
struct TeamTrsm
-
template<typename MemberType, typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo>
struct TeamVectorTrsm
-
template<typename MemberType, typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgMode, typename ArgAlgo>
struct Trsm Selective Interface
innergemmfixa
CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerGemmFixA_Internal.hpp
innergemmfixb
CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerGemmFixB_Internal.hpp
innergemmfixc
CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerGemmFixC_Internal.hpp
applyq
-
template<typename MemberType, typename ArgSide, typename ArgTrans, typename ArgAlgo>
struct TeamApplyQ Team ApplyQ
-
template<typename MemberType, typename ArgSide, typename ArgTrans, typename ArgAlgo>
struct TeamVectorApplyQ TeamVector ApplyQ
-
template<typename MemberType, typename ArgSide, typename ArgTrans, typename ArgMode, typename ArgAlgo>
struct ApplyQ Selective Interface
copy
-
template<typename MemberType, typename ArgTrans = Trans::NoTranspose, int rank = 2>
struct TeamCopy Team Copy
-
template<typename MemberType, typename ArgTrans = Trans::NoTranspose, int rank = 2>
struct TeamVectorCopy TeamVector Copy
-
template<typename MemberType, typename ArgTrans, typename ArgMode, int rank = 2>
struct Copy Selective Interface
innertrsm
CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerTrsm_Internal.hpp
solvelu
-
template<typename ArgTrans, typename ArgAlgo>
struct SerialSolveLU
-
template<typename MemberType, typename ArgTrans, typename ArgAlgo>
struct TeamSolveLU
-
template<typename MemberType, typename ArgTrans, typename ArgMode, typename ArgAlgo>
struct SolveLU Selective Interface
xpay
-
struct SerialXpay
Serial Batched XPAY: y_l <- x_l + alpha_l * y_l for all l = 1, …, N where:
N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for y_1, …, y_N.
No nested parallel_for is used inside of the function.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param alpha:
[in]: input coefficient for Y, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
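The update rule above can be checked against a plain sequential reference. This is a minimal sketch that assumes nested `std::vector` storage in place of Kokkos views, with one inner vector per batch entry l; the name `xpay_reference` and the `Matrix` alias are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank 2 view: one inner vector per batch entry.
using Matrix = std::vector<std::vector<double>>;

// Reference semantics of batched XPAY: y_l <- x_l + alpha_l * y_l.
// Plain sequential loops; no Kokkos types or parallel dispatch are used.
void xpay_reference(const std::vector<double>& alpha, const Matrix& x, Matrix& y) {
  assert(x.size() == y.size() && alpha.size() == y.size());
  for (std::size_t l = 0; l < y.size(); ++l) {
    assert(x[l].size() == y[l].size());
    for (std::size_t i = 0; i < y[l].size(); ++i) {
      y[l][i] = x[l][i] + alpha[l] * y[l][i];
    }
  }
}
```

Note that alpha scales the existing contents of y, so y is both read and written.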
-
template<typename MemberType>
struct TeamXpay Team Batched XPAY: y_l <- x_l + alpha_l * y_l for all l = 1, …, N where:
N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for y_1, …, y_N.
A nested parallel_for with TeamThreadRange is used.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param member:
[in]: TeamPolicy member
- Param alpha:
[in]: input coefficient for Y, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
-
template<typename MemberType>
struct TeamVectorXpay TeamVector Batched XPAY: y_l <- x_l + alpha_l * y_l for all l = 1, …, N where:
N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for y_1, …, y_N.
Two nested parallel_for with both TeamThreadRange and ThreadVectorRange (or one with TeamVectorRange) are used inside.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param member:
[in]: TeamPolicy member
- Param alpha:
[in]: input coefficient for Y, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
axpy
-
struct SerialAxpy
Serial Batched AXPY: y_l <- alpha_l * x_l + y_l for all l = 1, …, N where:
N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N.
No nested parallel_for is used inside of the function.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param alpha:
[in]: input coefficient for X, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
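For comparison with XPAY, here is a sequential reference for the AXPY update, in which alpha scales x instead of y. As before, this is only a sketch: nested `std::vector` storage stands in for Kokkos views, and the name `axpy_reference` and the `Matrix` alias are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank 2 view: one inner vector per batch entry.
using Matrix = std::vector<std::vector<double>>;

// Reference semantics of batched AXPY: y_l <- alpha_l * x_l + y_l.
void axpy_reference(const std::vector<double>& alpha, const Matrix& x, Matrix& y) {
  assert(x.size() == y.size() && alpha.size() == y.size());
  for (std::size_t l = 0; l < y.size(); ++l) {
    assert(x[l].size() == y[l].size());
    for (std::size_t i = 0; i < y[l].size(); ++i) {
      y[l][i] += alpha[l] * x[l][i];
    }
  }
}
```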
-
template<typename MemberType>
struct TeamAxpy Team Batched AXPY: y_l <- alpha_l * x_l + y_l for all l = 1, …, N where:
N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N.
A nested parallel_for with TeamThreadRange is used.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param member:
[in]: TeamPolicy member
- Param alpha:
[in]: input coefficient for X, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
-
template<typename MemberType>
struct TeamVectorAxpy TeamVector Batched AXPY: y_l <- alpha_l * x_l + y_l for all l = 1, …, N where:
N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N.
Two nested parallel_for with both TeamThreadRange and ThreadVectorRange (or one with TeamVectorRange) are used inside.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param member:
[in]: TeamPolicy member
- Param alpha:
[in]: input coefficient for X, a rank 1 view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in/out]: Output vector Y, a rank 2 view
gemv
-
template<typename MemberType, typename ArgTrans, typename ArgAlgo>
struct TeamVectorGemv TeamVector Gemv
-
template<typename MemberType, typename ArgTrans, typename ArgMode, typename ArgAlgo>
struct Gemv Selective Interface
dot
-
template<typename ArgTrans = Trans::NoTranspose>
struct SerialDot Serial Batched DOT:
Depending on the ArgTrans template, the dot product is row-based (ArgTrans == Trans::NoTranspose):
dot_l <- (x_l:, y_l:) for all l = 1, …, N where:
N is the first dimension of X.
Or column-based: dot_l <- (x_:l, y_:l) for all l = 1, …, n where:
n is the second dimension of X.
No nested parallel_for is used inside of the function.
- Template Parameters:
ArgTrans – type of dot product (Trans::NoTranspose by default)
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in]: Input vector Y, a rank 2 view
- Param dot:
[out]: Computed dot product, a rank 1 view
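The difference between the row-based and column-based variants is easiest to see in a sequential reference. The sketch below assumes nested `std::vector` storage indexed as x[row][column]; the `DotMode` tag only mirrors the role of ArgTrans and is hypothetical, as is the name `dot_reference`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank 2 view, indexed as m[row][column].
using Matrix = std::vector<std::vector<double>>;

// Hypothetical tag playing the role of ArgTrans; not the library's Trans type.
enum class DotMode { RowBased, ColumnBased };

// Row-based: one result per row. Column-based: one result per column.
template <DotMode Mode = DotMode::RowBased>
std::vector<double> dot_reference(const Matrix& x, const Matrix& y) {
  assert(x.size() == y.size());
  const std::size_t rows = x.size();
  const std::size_t cols = rows ? x[0].size() : 0;
  std::vector<double> dot(Mode == DotMode::RowBased ? rows : cols, 0.0);
  for (std::size_t i = 0; i < rows; ++i) {
    assert(x[i].size() == cols && y[i].size() == cols);
    for (std::size_t j = 0; j < cols; ++j) {
      dot[Mode == DotMode::RowBased ? i : j] += x[i][j] * y[i][j];
    }
  }
  return dot;
}
```

With a 2 x 3 input, the row-based variant returns 2 values and the column-based variant returns 3, which is why the length of the output view depends on the mode.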
-
template<typename MemberType, typename ArgTrans = Trans::NoTranspose>
struct TeamDot Team Batched DOT:
Depending on the ArgTrans template, the dot product is row-based (ArgTrans == Trans::NoTranspose):
dot_l <- (x_l:, y_l:) for all l = 1, …, N where:
N is the first dimension of X.
Or column-based: dot_l <- (x_:l, y_:l) for all l = 1, …, n where:
n is the second dimension of X.
A nested parallel_for with TeamThreadRange is used.
- Template Parameters:
ArgTrans – type of dot product (Trans::NoTranspose by default)
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in]: Input vector Y, a rank 2 view
- Param dot:
[out]: Computed dot product, a rank 1 view
-
template<typename MemberType, typename ArgTrans = Trans::NoTranspose>
struct TeamVectorDot TeamVector Batched DOT:
Depending on the ArgTrans template, the dot product is row-based (ArgTrans == Trans::NoTranspose):
dot_l <- (x_l:, y_l:) for all l = 1, …, N where:
N is the first dimension of X.
Or column-based: dot_l <- (x_:l, y_:l) for all l = 1, …, n where:
n is the second dimension of X.
Two nested parallel_for with both TeamThreadRange and ThreadVectorRange (or one with TeamVectorRange) are used inside.
- Template Parameters:
ArgTrans – type of dot product (Trans::NoTranspose by default)
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in]: Input vector Y, a rank 2 view
- Param dot:
[out]: Computed dot product, a rank 1 view
hadamardproduct
-
struct SerialHadamardProduct
Serial Batched Hadamard Product: v_ij <- x_ij * y_ij for all i = 1, …, n and j = 1, …, N where:
n is the number of rows,
N is the number of vectors.
No nested parallel_for is used inside of the function.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
VViewType – Input type for V, needs to be a 2D view
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in]: Input vector Y, a rank 2 view
- Param V:
[out]: Output vector V, a rank 2 view
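The element-wise product above has a straightforward sequential reference. This sketch assumes nested `std::vector` storage and a pre-sized output; the name `hadamard_reference` and the `Matrix` alias are hypothetical and not part of the library interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank 2 view.
using Matrix = std::vector<std::vector<double>>;

// Reference semantics of the batched Hadamard product: v_ij <- x_ij * y_ij.
// The output v must already have the same shape as x and y.
void hadamard_reference(const Matrix& x, const Matrix& y, Matrix& v) {
  assert(x.size() == y.size() && x.size() == v.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(x[i].size() == y[i].size() && x[i].size() == v[i].size());
    for (std::size_t j = 0; j < x[i].size(); ++j) {
      v[i][j] = x[i][j] * y[i][j];
    }
  }
}
```

Unlike AXPY and XPAY, the result goes to a separate output V, so neither input is modified.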
-
template<typename MemberType>
struct TeamHadamardProduct Team Batched Hadamard Product: v_ij <- x_ij * y_ij for all i = 1, …, n and j = 1, …, N where:
n is the number of rows,
N is the number of vectors.
A nested parallel_for with TeamThreadRange is used.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
VViewType – Input type for V, needs to be a 2D view
- Param member:
[in]: TeamPolicy member
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in]: Input vector Y, a rank 2 view
- Param V:
[out]: Output vector V, a rank 2 view
-
template<typename MemberType>
struct TeamVectorHadamardProduct TeamVector Batched Hadamard Product: v_ij <- x_ij * y_ij for all i = 1, …, n and j = 1, …, N where:
n is the number of rows,
N is the number of vectors.
Two nested parallel_for with both TeamThreadRange and ThreadVectorRange (or one with TeamVectorRange) are used inside.
- Template Parameters:
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
VViewType – Input type for V, needs to be a 2D view
- Param member:
[in]: TeamPolicy member
- Param X:
[in]: Input vector X, a rank 2 view
- Param Y:
[in]: Input vector Y, a rank 2 view
- Param V:
[out]: Output vector V, a rank 2 view
-
template<typename MemberType, typename ArgMode>
struct HadamardProduct
vector
CodeCleanup-TODO: Move Decl file to dense/impl/
trsv
-
template<typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo>
struct SerialTrsv Serial Trsv
-
template<typename MemberType, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo>
struct TeamTrsv Team Trsv
-
template<typename MemberType, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo>
struct TeamVectorTrsv TeamVector Trsv
-
template<typename MemberType, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgMode, typename ArgAlgo>
struct Trsv Selective Interface
gemm
-
template<typename MemberType, typename ArgTransA, typename ArgTransB, typename ArgAlgo>
struct TeamGemm Team Gemm
-
template<typename MemberType, typename ArgTransA, typename ArgTransB, typename ArgAlgo>
struct TeamVectorGemm TeamVector Gemm
-
template<typename MemberType, typename ArgTransA, typename ArgTransB, typename ArgMode, typename ArgAlgo>
struct Gemm Selective Interface