BATCHED – KokkosKernels batched functor-level interfaces

innerlu

CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerLU_Internal.hpp

applypivot

template<typename MemberType, typename ArgSide, typename ArgDirect> struct TeamVectorApplyPivot: TeamVector ApplyPivot

qr_withcolumnpivoting

template<typename MemberType, typename ArgAlgo> struct TeamVectorQR_WithColumnPivoting: TeamVector QR

addradial

struct SerialAddRadial: This add tiny values on diagonals so the absolute values of diagonals become larger Serial AddRadial

template<typename MemberType> struct TeamAddRadial: Team AddRadial

householder

template<typename ArgSide> struct SerialHouseholder: Serial Householder

template<typename MemberType, typename ArgSide> struct TeamVectorHouseholder: TeamVector Householder

set

struct SerialSet: Serial Set

template<typename MemberType> struct TeamSet: Team Set

template<typename MemberType> struct TeamVectorSet: TeamVector Set

scale

struct SerialScale: Serial Scale

template<typename MemberType> struct TeamScale: Team Scale

template<typename MemberType> struct TeamVectorScale: TeamVector Scale

setidentity

struct SerialSetIdentity: Serial SetIdentity

template<typename MemberType> struct TeamSetIdentity: Team SetIdentity

template<typename MemberType, typename ArgMode> struct SetIdentity: Selective Interface
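Several structs in this document are marked "Selective Interface": one template takes an extra ArgMode parameter and forwards to the Serial, Team, or TeamVector variant. A minimal sketch of that compile-time dispatch in plain C++17 follows. The tag types ModeSerial, ModeTeam, ModeTeamVector and the stub structs are hypothetical stand-ins (the stubs only report which path ran), and the MemberType parameter is omitted for brevity.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical mode tags standing in for the library's ArgMode types.
struct ModeSerial {};
struct ModeTeam {};
struct ModeTeamVector {};

// Stub variants: each only reports which implementation was selected.
struct SerialSetIdentityStub { static std::string invoke() { return "serial"; } };
struct TeamSetIdentityStub { static std::string invoke() { return "team"; } };
struct TeamVectorSetIdentityStub { static std::string invoke() { return "teamvector"; } };

// Selective interface: ArgMode picks the variant at compile time,
// so there is no runtime branching in the generated code.
template <typename ArgMode>
struct SetIdentitySketch {
  static std::string invoke() {
    if constexpr (std::is_same_v<ArgMode, ModeSerial>) {
      return SerialSetIdentityStub::invoke();
    } else if constexpr (std::is_same_v<ArgMode, ModeTeam>) {
      return TeamSetIdentityStub::invoke();
    } else {
      static_assert(std::is_same_v<ArgMode, ModeTeamVector>, "unknown ArgMode");
      return TeamVectorSetIdentityStub::invoke();
    }
  }
};
```

An unsupported tag fails at compile time through the static_assert instead of silently falling through to one of the variants.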

applyhouseholder

template<typename ArgSide> struct SerialApplyHouseholder: Serial ApplyHouseholder

template<typename MemberType, typename ArgSide> struct TeamVectorApplyHouseholder

innermultipledotproduct

CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerMultipleDotProduct_Internal.hpp

lu

template<typename ArgAlgo> struct SerialLU

template<typename MemberType, typename ArgAlgo> struct TeamLU

template<typename MemberType, typename ArgMode, typename ArgAlgo> struct LU: Selective Interface
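A minimal in-place LU sketch in Doolittle form: on return, the unit-lower-triangular L is packed strictly below the diagonal and U sits on and above it. The sketch assumes no pivoting. The declarations above show no pivot parameter, but that is an assumption about the library, not a statement from it. Row-major std::vector storage and the name lu_inplace are hypothetical, and ArgAlgo (for example blocked vs. unblocked) is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical unpivoted LU of a row-major m x m matrix, in place.
// Returns false if a zero pivot is met (no row exchange is attempted).
bool lu_inplace(std::vector<double>& a, int m) {
  for (int k = 0; k < m; ++k) {
    const double pivot = a[k * m + k];
    if (pivot == 0.0) return false;
    for (int i = k + 1; i < m; ++i) {
      const double l = a[i * m + k] / pivot;  // multiplier l_ik, stored in place
      a[i * m + k] = l;
      for (int j = k + 1; j < m; ++j) a[i * m + j] -= l * a[k * m + j];  // update trailing row
    }
  }
  return true;
}
```

Without pivoting, a nonsingular matrix such as [[0, 1], [1, 0]] fails at the first step.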

solveutv

template<typename MemberType, typename ArgAlgo> struct TeamVectorSolveUTV: TeamVector Solve UTV

Given the factorization U T V = A P^T, it solves A X = B.

input:
- matrix_rank is computed during the UTV factorization
- U is an m x m real matrix (only its m x matrix_rank part is used)
- T is an m x m real matrix (only its matrix_rank x matrix_rank part is used)
- V is an m x m real matrix (only its matrix_rank x m part is used)
- p is an m integer vector containing the pivot indices
- X is the solution matrix (or vector)
- B is the right-hand side matrix (or vector)
- w is a contiguous real workspace vector of length B.span()
output:
- B is overwritten with the solutions

When A has full rank, i.e., matrix_rank == m, UTV computes only a QR factorization with column pivoting, where Q is stored in U and R is stored in T.
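In the full-rank case (U holds Q, T holds R), the solve reduces to a few steps. From A P^T = Q R we get A = Q R P, so R (P x) = Q^T b. Back-substitute for y = P x, then undo the column permutation. The sketch below handles one right-hand side and assumes row-major m x m storage. It uses a permutation vector perm with (A P^T)[:, j] = A[:, perm[j]]. This perm is deliberately not the pivot-index format the UTV routines store, and solve_qrcp is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical full-rank solve of A x = b given A P^T = Q R (row-major, m x m),
// where perm[j] is the column of A that became column j of A P^T.
std::vector<double> solve_qrcp(const std::vector<double>& q, const std::vector<double>& r,
                               const std::vector<int>& perm, const std::vector<double>& b,
                               int m) {
  // c = Q^T b
  std::vector<double> c(m, 0.0);
  for (int i = 0; i < m; ++i)
    for (int k = 0; k < m; ++k) c[i] += q[k * m + i] * b[k];
  // Back substitution R y = c, where y = P x
  std::vector<double> y(m, 0.0);
  for (int i = m - 1; i >= 0; --i) {
    double s = c[i];
    for (int j = i + 1; j < m; ++j) s -= r[i * m + j] * y[j];
    y[i] = s / r[i * m + i];
  }
  // Undo the column permutation: x = P^T y, i.e. x[perm[j]] = y[j]
  std::vector<double> x(m, 0.0);
  for (int j = 0; j < m; ++j) x[perm[j]] = y[j];
  return x;
}
```

For example, with Q = [[0, 1], [1, 0]], R = [[2, 1], [0, 4]] and perm = {1, 0}, the matrix is A = [[4, 0], [1, 2]]. Solving for b = {4, 5} recovers x = {1, 2}.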

utv

template<typename MemberType, typename ArgAlgo> struct TeamVectorUTV: TeamVector UTV

Given A, it performs a UTV factorization, i.e., U T V = A P^T.

input:
- A is an m x m real matrix
- p is an m integer vector
- U is an m x m real matrix
- V is an m x m real matrix
- w is a contiguous real workspace vector of length 3*m
output:
- A is overwritten with a lower triangular matrix_rank x matrix_rank real matrix
- p holds P^T as pivot indices (note that these differ from permutation indices)
- U is a left orthogonal matrix, m x matrix_rank
- V is a right orthogonal matrix, matrix_rank x m

When A has full rank, i.e., matrix_rank == m, this computes only a QR factorization with column pivoting.

output:
- A is overwritten with an upper triangular matrix
- p holds P^T as pivot indices (note that these differ from permutation indices)
- U is an m x m orthogonal matrix
- V is not touched

For the solution of a rank-deficient problem, it is recommended to use SolveUTV.
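The distinction between pivot indices and permutation indices is easy to trip over. The sketch assumes the common LAPACK-style convention: pivot entry piv[i] means "swap position i with position piv[i]", applied in order. A permutation vector, by contrast, lists the final ordering directly. The exact KokkosKernels encoding (for example absolute positions vs. offsets) is not stated in this section, so treat this as an illustration only. The name pivots_to_permutation is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical conversion under LAPACK-style sequential swaps: apply
// i <-> piv[i] for i = 0, 1, ... to the identity ordering. Afterwards perm[k]
// is the original index now at position k. Entries of piv must lie in [0, n).
std::vector<int> pivots_to_permutation(const std::vector<int>& piv) {
  const int n = static_cast<int>(piv.size());
  std::vector<int> perm(n);
  for (int k = 0; k < n; ++k) perm[k] = k;
  for (int i = 0; i < n; ++i) std::swap(perm[i], perm[piv[i]]);
  return perm;
}
```

Under this convention, piv = {2, 2, 2} is not a permutation at all, since it repeats 2. It still encodes the permutation {2, 0, 1}, which is why the two formats must not be mixed.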

inverselu

CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InverseLU_Internal.hpp

eigendecomposition

struct SerialEigendecomposition

Given a general nonsymmetric matrix A (m x m), it computes the eigendecomposition of the matrix.

Parameters:

[in] member: Only the Team interface has this argument. Partial specialization can be applied for a different type of team member.

[in/out] A: Real general nonsymmetric rank 2 view A(m x m). A is first reduced to upper Hessenberg form. Then the Francis double-shift QR algorithm is applied to compute its Schur form. On exit, A stores the quasi upper triangular matrix of the Schur decomposition.

[out] er, [out] ei: Real and imaginary parts of the eigenvalues, which form er(m) + ei(m) i. For a complex eigenpair, a+bi and a-bi are stored consecutively.

[out] UL, [out] UR: Left and right eigenvectors, stored in (m x m) matrices. If a zero-span view is provided, the corresponding eigenvectors are not computed. However, UL and UR cannot both have zero span. If only eigenvalues are requested, use the Eigenvalue interface, which simplifies the computation.

[out] W: 1D contiguous workspace. The minimum size is (2*m*m+5*m), where m is the dimension of matrix A.
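Two details above are easy to get wrong in calling code. First, the workspace W must hold at least 2*m*m + 5*m scalars. Second, a complex conjugate pair occupies two consecutive slots of er/ei. A minimal standalone sketch of both follows, with hypothetical helper names and plain std::vector standing in for views.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Minimum workspace length stated above for an m x m matrix.
int eigendecomposition_min_workspace(int m) { return 2 * m * m + 5 * m; }

// Hypothetical helper: combine er/ei into eigenvalues er[k] + ei[k] i.
// A conjugate pair a+bi, a-bi appears at consecutive positions k, k+1.
std::vector<std::complex<double>> to_complex(const std::vector<double>& er,
                                             const std::vector<double>& ei) {
  std::vector<std::complex<double>> lambda;
  for (std::size_t k = 0; k < er.size(); ++k) lambda.emplace_back(er[k], ei[k]);
  return lambda;
}
```

For m = 10 the workspace must hold at least 250 entries.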

template<typename MemberType> struct TeamVectorEigendecomposition

trtri

template<typename ArgUplo, typename ArgDiag, typename ArgAlgo> struct SerialTrtri

qr

template<typename ArgAlgo> struct SerialQR: Serial QR

template<typename MemberType, typename ArgAlgo> struct TeamQR: Team QR

template<typename MemberType, typename ArgAlgo> struct TeamVectorQR: TeamVector QR

template<typename MemberType, typename ArgMode, typename ArgAlgo> struct QR: Selective Interface

trmm

template<typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo> struct SerialTrmm

trsm

template<typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo> struct SerialTrsm

template<typename MemberType, typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo> struct TeamTrsm

template<typename MemberType, typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo> struct TeamVectorTrsm

template<typename MemberType, typename ArgSide, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgMode, typename ArgAlgo> struct Trsm: Selective Interface

innergemmfixa

CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerGemmFixA_Internal.hpp

innergemmfixb

CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerGemmFixB_Internal.hpp

innergemmfixc

CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerGemmFixC_Internal.hpp

applyq

template<typename ArgSide, typename ArgTrans, typename ArgAlgo> struct SerialApplyQ: Serial ApplyQ

template<typename MemberType, typename ArgSide, typename ArgTrans, typename ArgAlgo> struct TeamApplyQ: Team ApplyQ

template<typename MemberType, typename ArgSide, typename ArgTrans, typename ArgAlgo> struct TeamVectorApplyQ: TeamVector ApplyQ

template<typename MemberType, typename ArgSide, typename ArgTrans, typename ArgMode, typename ArgAlgo> struct ApplyQ: Selective Interface

copy

template<typename ArgTrans = Trans::NoTranspose, int rank = 2> struct SerialCopy: Serial Copy

template<typename MemberType, typename ArgTrans = Trans::NoTranspose, int rank = 2> struct TeamCopy: Team Copy

template<typename MemberType, typename ArgTrans = Trans::NoTranspose, int rank = 2> struct TeamVectorCopy: TeamVector Copy

template<typename MemberType, typename ArgTrans, typename ArgMode, int rank = 2> struct Copy: Selective Interface
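The Copy structs default ArgTrans to Trans::NoTranspose. Read in the usual way, a transpose argument turns B <- A into B <- A^T. The rank-2 sketch below shows both behaviors, with a bool standing in for the ArgTrans tag. It uses row-major storage; the name copy_matrix is hypothetical, and conjugate transposition is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical rank-2 copy: B <- A (transpose == false) or B <- A^T (true).
// A is m x n row-major; B is m x n (no transpose) or n x m (transpose), row-major.
void copy_matrix(const std::vector<double>& a, int m, int n,
                 std::vector<double>& b, bool transpose) {
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      if (transpose) b[j * m + i] = a[i * n + j];  // B(j, i) = A(i, j)
      else           b[i * n + j] = a[i * n + j];  // B(i, j) = A(i, j)
    }
}
```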

innertrsm

CodeCleanup-TODO: Move Decl file to dense/impl/KokkosBatched_InnerTrsm_Internal.hpp

solvelu

template<typename ArgTrans, typename ArgAlgo> struct SerialSolveLU

template<typename MemberType, typename ArgTrans, typename ArgAlgo> struct TeamSolveLU

template<typename MemberType, typename ArgTrans, typename ArgMode, typename ArgAlgo> struct SolveLU: Selective Interface

xpay

struct SerialXpay

Serial Batched XPAY: y_l <- x_l + alpha_l * y_l for all l = 1, …, N where:

N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for y_1, …, y_N.

No nested parallel_for is used inside of the function.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view

Param alpha:

[in]: input coefficient for Y, a rank 1 view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in/out]: Output vector Y, a rank 2 view
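A minimal serial reference sketch of the XPAY formula above. It assumes vector l is row l of a row-major N x n array; that layout, and the name xpay_reference, are assumptions. The sketch is useful as a correctness oracle when testing a parallel variant.

```cpp
#include <cassert>
#include <vector>

// Reference XPAY: y_l <- x_l + alpha_l * y_l for each of the N vectors.
// x and y are N x n row-major; alpha holds one scaling factor per vector.
void xpay_reference(const std::vector<double>& alpha, const std::vector<double>& x,
                    std::vector<double>& y, int N, int n) {
  for (int l = 0; l < N; ++l)
    for (int k = 0; k < n; ++k)
      y[l * n + k] = x[l * n + k] + alpha[l] * y[l * n + k];
}
```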

template<typename MemberType> struct TeamXpay

Team Batched XPAY: y_l <- x_l + alpha_l * y_l for all l = 1, …, N where:

N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for y_1, …, y_N.

A nested parallel_for with TeamThreadRange is used.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view

Param member:

[in]: TeamPolicy member

Param alpha:

[in]: input coefficient for Y, a rank 1 view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in/out]: Output vector Y, a rank 2 view

template<typename MemberType> struct TeamVectorXpay

TeamVector Batched XPAY: y_l <- x_l + alpha_l * y_l for all l = 1, …, N where:

N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for y_1, …, y_N.

Two nested parallel_for loops, using TeamThreadRange and ThreadVectorRange respectively (or a single one using TeamVectorRange), are used inside.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view

Param member:

[in]: TeamPolicy member

Param alpha:

[in]: input coefficient for Y, a rank 1 view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in/out]: Output vector Y, a rank 2 view

axpy

struct SerialAxpy

Serial Batched AXPY: y_l <- alpha_l * x_l + y_l for all l = 1, …, N where:

N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N.

No nested parallel_for is used inside of the function.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view

Param alpha:

[in]: input coefficient for X, a rank 1 view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in/out]: Output vector Y, a rank 2 view
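A minimal serial reference sketch of the AXPY formula above. Here the per-vector factor alpha_l scales x_l, and y_l is accumulated into. It assumes vector l is row l of a row-major N x n array; the layout and the name axpy_reference are assumptions.

```cpp
#include <cassert>
#include <vector>

// Reference AXPY: y_l <- alpha_l * x_l + y_l for each of the N vectors.
// x and y are N x n row-major; alpha holds one scaling factor per vector.
void axpy_reference(const std::vector<double>& alpha, const std::vector<double>& x,
                    std::vector<double>& y, int N, int n) {
  for (int l = 0; l < N; ++l)
    for (int k = 0; k < n; ++k)
      y[l * n + k] += alpha[l] * x[l * n + k];
}
```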

template<typename MemberType> struct TeamAxpy

Team Batched AXPY: y_l <- alpha_l * x_l + y_l for all l = 1, …, N where:

N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N.

A nested parallel_for with TeamThreadRange is used.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view

Param member:

[in]: TeamPolicy member

Param alpha:

[in]: input coefficient for X, a rank 1 view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in/out]: Output vector Y, a rank 2 view

template<typename MemberType> struct TeamVectorAxpy

TeamVector Batched AXPY: y_l <- alpha_l * x_l + y_l for all l = 1, …, N where:

N is the number of vectors,
x_1, …, x_N are the N input vectors,
y_1, …, y_N are the N output vectors,
alpha_1, …, alpha_N are N scaling factors for x_1, …, x_N.

Two nested parallel_for loops, using TeamThreadRange and ThreadVectorRange respectively (or a single one using TeamVectorRange), are used inside.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
alphaViewType – Input type for alpha, needs to be a 1D view

Param member:

[in]: TeamPolicy member

Param alpha:

[in]: input coefficient for X, a rank 1 view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in/out]: Output vector Y, a rank 2 view

gemv

template<typename ArgTrans, typename ArgAlgo> struct SerialGemv: Serial Gemv

template<typename MemberType, typename ArgTrans, typename ArgAlgo> struct TeamGemv: Team Gemv

template<typename MemberType, typename ArgTrans, typename ArgAlgo> struct TeamVectorGemv: TeamVector Gemv

template<typename MemberType, typename ArgTrans, typename ArgMode, typename ArgAlgo> struct Gemv: Selective Interface

dot

template<typename ArgTrans = Trans::NoTranspose> struct SerialDot

Serial Batched DOT:

Depending on the ArgTrans template, the dot product is row-based (ArgTrans == Trans::NoTranspose):

dot_l <- (x_l:, y_l:) for all l = 1, …, N where:

N is the first dimension of X.

Or column-based (ArgTrans == Trans::Transpose): dot_l <- (x_:l, y_:l) for all l = 1, …, n where:

n is the second dimension of X.

No nested parallel_for is used inside of the function.

Template Parameters:

ArgTrans – type of dot product (Trans::NoTranspose by default)
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
dot view type – Output type for dot, needs to be a 1D view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in]: Input vector Y, a rank 2 view

Param dot:

[out]: Computed dot product, a rank 1 view
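A serial reference sketch of both DOT modes on row-major N x n arrays. A bool stands in for ArgTrans, and the sketch assumes the column-based mode corresponds to Trans::Transpose. In the row-based case the output has one entry per row of X; in the column-based case it has one entry per column. The name dot_reference is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Reference batched dot on row-major N x n arrays x and y.
// row_based == true  (NoTranspose): dot[l] = sum_k x(l, k) * y(l, k), l < N.
// row_based == false (Transpose):   dot[l] = sum_k x(k, l) * y(k, l), l < n.
std::vector<double> dot_reference(const std::vector<double>& x, const std::vector<double>& y,
                                  int N, int n, bool row_based) {
  std::vector<double> dot(row_based ? N : n, 0.0);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < n; ++j)
      dot[row_based ? i : j] += x[i * n + j] * y[i * n + j];
  return dot;
}
```

With a 2 x 3 X, the row-based mode returns 2 values and the column-based mode returns 3.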

template<typename MemberType, typename ArgTrans = Trans::NoTranspose> struct TeamDot

Team Batched DOT:

Depending on the ArgTrans template, the dot product is row-based (ArgTrans == Trans::NoTranspose):

dot_l <- (x_l:, y_l:) for all l = 1, …, N where:

N is the first dimension of X.

Or column-based (ArgTrans == Trans::Transpose): dot_l <- (x_:l, y_:l) for all l = 1, …, n where:

n is the second dimension of X.

A nested parallel_for with TeamThreadRange is used.

Template Parameters:

ArgTrans – type of dot product (Trans::NoTranspose by default)
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
dot view type – Output type for dot, needs to be a 1D view

Param member:

[in]: TeamPolicy member

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in]: Input vector Y, a rank 2 view

Param dot:

[out]: Computed dot product, a rank 1 view

template<typename MemberType, typename ArgTrans = Trans::NoTranspose> struct TeamVectorDot

TeamVector Batched DOT:

Depending on the ArgTrans template, the dot product is row-based (ArgTrans == Trans::NoTranspose):

dot_l <- (x_l:, y_l:) for all l = 1, …, N where:

N is the first dimension of X.

Or column-based (ArgTrans == Trans::Transpose): dot_l <- (x_:l, y_:l) for all l = 1, …, n where:

n is the second dimension of X.

Two nested parallel_for loops, using TeamThreadRange and ThreadVectorRange respectively (or a single one using TeamVectorRange), are used inside.

Template Parameters:

ArgTrans – type of dot product (Trans::NoTranspose by default)
XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
dot view type – Output type for dot, needs to be a 1D view

Param member:

[in]: TeamPolicy member

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in]: Input vector Y, a rank 2 view

Param dot:

[out]: Computed dot product, a rank 1 view

hadamardproduct

struct SerialHadamardProduct

Serial Batched Hadamard Product: v_ij <- x_ij * y_ij for all i = 1, …, n and j = 1, …, N where:

n is the number of rows,
N is the number of vectors.

No nested parallel_for is used inside of the function.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
VViewType – Input type for V, needs to be a 2D view

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in]: Input vector Y, a rank 2 view

Param V:

[out]: Output vector V, a rank 2 view
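Because the Hadamard product is purely element-wise, a reference version only needs X, Y, and V to have the same shape and the same layout. The sketch below stores each n x N array contiguously in a std::vector and rejects mismatched sizes. The name hadamard_reference is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference Hadamard product: v_ij <- x_ij * y_ij over all entries.
// Returns false (and leaves v untouched) if the sizes disagree.
bool hadamard_reference(const std::vector<double>& x, const std::vector<double>& y,
                        std::vector<double>& v) {
  if (x.size() != y.size() || x.size() != v.size()) return false;
  for (std::size_t k = 0; k < v.size(); ++k) v[k] = x[k] * y[k];
  return true;
}
```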

template<typename MemberType> struct TeamHadamardProduct

Team Batched Hadamard Product: v_ij <- x_ij * y_ij for all i = 1, …, n and j = 1, …, N where:

n is the number of rows,
N is the number of vectors.

A nested parallel_for with TeamThreadRange is used.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
VViewType – Input type for V, needs to be a 2D view

Param member:

[in]: TeamPolicy member

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in]: Input vector Y, a rank 2 view

Param V:

[out]: Output vector V, a rank 2 view

template<typename MemberType> struct TeamVectorHadamardProduct

TeamVector Batched Hadamard Product: v_ij <- x_ij * y_ij for all i = 1, …, n and j = 1, …, N where:

n is the number of rows,
N is the number of vectors.

Two nested parallel_for loops, using TeamThreadRange and ThreadVectorRange respectively (or a single one using TeamVectorRange), are used inside.

Template Parameters:

XViewType – Input type for X, needs to be a 2D view
YViewType – Input type for Y, needs to be a 2D view
VViewType – Input type for V, needs to be a 2D view

Param member:

[in]: TeamPolicy member

Param X:

[in]: Input vector X, a rank 2 view

Param Y:

[in]: Input vector Y, a rank 2 view

Param V:

[out]: Output vector V, a rank 2 view

template<typename MemberType, typename ArgMode> struct HadamardProduct

vector

CodeCleanup-TODO: Move Decl file to dense/impl/

trsv

template<typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo> struct SerialTrsv: Serial Trsv

template<typename MemberType, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo> struct TeamTrsv: Team Trsv

template<typename MemberType, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgAlgo> struct TeamVectorTrsv: TeamVector Trsv

template<typename MemberType, typename ArgUplo, typename ArgTrans, typename ArgDiag, typename ArgMode, typename ArgAlgo> struct Trsv: Selective Interface
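The sketch below covers one combination of the Trsv template arguments: lower triangular, no transpose (forward substitution). A bool stands in for ArgDiag's unit/non-unit choice. Any scaling of the right-hand side and the library's exact invoke signature are not stated in this section, so they are omitted. The name trsv_lower is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Solve L x = b in place (b is overwritten with x). L is n x n row-major,
// lower triangular; if unit_diag is true the diagonal is taken as 1 and never read.
void trsv_lower(const std::vector<double>& l, std::vector<double>& b, int n, bool unit_diag) {
  for (int i = 0; i < n; ++i) {
    double s = b[i];
    for (int j = 0; j < i; ++j) s -= l[i * n + j] * b[j];  // subtract solved terms
    b[i] = unit_diag ? s : s / l[i * n + i];
  }
}
```

The upper-triangular case runs the same loop backward from i = n - 1.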

gemm

template<typename ArgTransA, typename ArgTransB, typename ArgAlgo> struct SerialGemm: Serial Gemm

template<typename MemberType, typename ArgTransA, typename ArgTransB, typename ArgAlgo> struct TeamGemm: Team Gemm

template<typename MemberType, typename ArgTransA, typename ArgTransB, typename ArgAlgo> struct TeamVectorGemm: TeamVector Gemm

template<typename MemberType, typename ArgTransA, typename ArgTransB, typename ArgMode, typename ArgAlgo> struct Gemm: Selective Interface
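A reference sketch of what the Gemm variants compute, expressed in conventional BLAS terms as C <- alpha * op(A) * op(B) + beta * C. Bools stand in for ArgTransA and ArgTransB. The alpha/beta form is the standard BLAS convention and is assumed here rather than taken from this section. The name gemm_reference is hypothetical, and ArgAlgo (blocking) is not modeled.

```cpp
#include <cassert>
#include <vector>

// C (m x n) <- alpha * op(A) * op(B) + beta * C, all row-major.
// op(A) is m x k: A is stored m x k, or k x m when trans_a is true.
// op(B) is k x n: B is stored k x n, or n x k when trans_b is true.
void gemm_reference(bool trans_a, bool trans_b, int m, int n, int k, double alpha,
                    const std::vector<double>& a, const std::vector<double>& b,
                    double beta, std::vector<double>& c) {
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      double s = 0.0;
      for (int p = 0; p < k; ++p) {
        const double aip = trans_a ? a[p * m + i] : a[i * k + p];  // op(A)(i, p)
        const double bpj = trans_b ? b[j * k + p] : b[p * n + j];  // op(B)(p, j)
        s += aip * bpj;
      }
      c[i * n + j] = alpha * s + beta * c[i * n + j];
    }
}
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] with alpha = 1 and beta = 0, this gives [[19, 22], [43, 50]].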
