Undocumented Intrinsics
The following intrinsics are undocumented: neither the compiler nor the assembler supports them. To use them, you have to emit the
instruction encoding with the .word directive in assembly.
__m256 __lasx_xvfscaleb_s (__m256 a, __m256i b)
Synopsis
__m256 __lasx_xvfscaleb_s (__m256 a, __m256i b)
#include <lasxintrin.h>
Instruction: xvfscaleb.s xr, xr, xr
CPU Flags: LASX
Description
Compute the IEEE 754 scaleB of the single-precision floating-point elements in a by the integer elements in b, i.e. a * 2^b for each element. Currently undocumented.
Operation
for (int i = 0; i < 8; i++) {
  dst.fp32[i] = __builtin_scalbnf(a.fp32[i], b.word[i]);
}
Tested on a real machine.
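Because the compiler does not accept this intrinsic, a portable scalar reference can be handy for checking results on other machines. The following is a minimal sketch, not the hardware's definition: the helper name `xvfscaleb_s_ref` is hypothetical, plain arrays stand in for the `__m256` and `__m256i` operands, and `scalbnf` from `<math.h>` is assumed to be an adequate stand-in for IEEE 754 scaleB (exception flags and NaN payloads are not modelled).

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar reference for xvfscaleb.s: dst[i] = a[i] * 2^b[i].
   Plain arrays stand in for the __m256 / __m256i operands. */
static void xvfscaleb_s_ref(float dst[8], const float a[8], const int32_t b[8]) {
    for (int i = 0; i < 8; i++) {
        dst[i] = scalbnf(a[i], b[i]);
    }
}
```

For example, an element pair of 1.5f and 1 yields 3.0f, and 8.0f with -3 yields 1.0f.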
Latency and Throughput
CPU | Latency | Throughput (IPC) |
---|---|---|
3C6000 | 4 | 2 |
__m256d __lasx_xvfscaleb_d (__m256d a, __m256i b)
Synopsis
__m256d __lasx_xvfscaleb_d (__m256d a, __m256i b)
#include <lasxintrin.h>
Instruction: xvfscaleb.d xr, xr, xr
CPU Flags: LASX
Description
Compute the IEEE 754 scaleB of the double-precision floating-point elements in a by the integer elements in b, i.e. a * 2^b for each element. Currently undocumented.
Operation
for (int i = 0; i < 4; i++) {
  dst.fp64[i] = __builtin_scalbln(a.fp64[i], b.dword[i]);
}
Tested on a real machine.
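In the double-precision form the exponent elements are 64 bits wide, so a portable emulation has to avoid silently truncating them to `int`. The sketch below is one way to handle that, under stated assumptions: the helper name `xvfscaleb_d_ref` is hypothetical, plain arrays replace the vector types, and out-of-range exponents are assumed to saturate to infinity or zero as IEEE 754 scaleB prescribes (the source does not spell out the hardware behaviour for such inputs).

```c
#include <limits.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar reference for xvfscaleb.d: dst[i] = a[i] * 2^b[i].
   scalbn takes an int exponent, so the 64-bit element is clamped first;
   any exponent that large already saturates to infinity or zero. */
static void xvfscaleb_d_ref(double dst[4], const double a[4], const int64_t b[4]) {
    for (int i = 0; i < 4; i++) {
        int64_t e = b[i];
        if (e > INT_MAX) e = INT_MAX;
        if (e < INT_MIN) e = INT_MIN;
        dst[i] = scalbn(a[i], (int)e);
    }
}
```

Clamping rather than casting matters: on platforms with a 32-bit `int`, an exponent of 2^32 typically wraps to 0 when cast directly, which would return the input unchanged instead of infinity.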
Latency and Throughput
CPU | Latency | Throughput (IPC) |
---|---|---|
3C6000 | 4 | 2 |
__m256i __lasx_xvmepatmsk_v (int mode, int uimm5)
Synopsis
__m256i __lasx_xvmepatmsk_v (int mode, int uimm5)
#include <lasxintrin.h>
Instruction: xvmepatmsk.v xr, mode, uimm5
CPU Flags: LASX
Description
Compute a pattern according to mode, then add uimm5 to each element.
Operation
if (mode == 0b00) {
  for (int i = 0; i < 16; i++) {
    dst.byte[i + 16] = dst.byte[i] =
        uimm5 + (i % 4); // [0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3]
  }
} else if (mode == 0b01) {
  for (int i = 0; i < 16; i++) {
    dst.byte[i + 16] = dst.byte[i] =
        uimm5 + (i / 4) + (i % 4); // [0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6]
  }
} else if (mode == 0b10) {
  for (int i = 0; i < 16; i++) {
    dst.byte[i + 16] = dst.byte[i] =
        uimm5 + (i / 4) + (i % 4) + 4; // [4 5 6 7 5 6 7 8 6 7 8 9 7 8 9 10]
  }
} else if (mode == 0b11) {
  for (int i = 0; i < 16; i++) {
    dst.byte[i + 16] = dst.byte[i] =
        uimm5 + i; // [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
  }
} else {
  // illegal instruction
}
Tested on a real machine.
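The four patterns above can be reproduced in plain C to precompute or verify the masks without LASX hardware. This is a sketch that simply follows the pseudocode: the helper name `xvmepatmsk_v_ref` and its return convention are hypothetical, and a 32-byte array stands in for the `__m256i` result.

```c
#include <stdint.h>

/* Hypothetical scalar reference for xvmepatmsk.v, following the pseudocode
   above. Returns 0 on success, -1 for a mode outside 0..3. */
static int xvmepatmsk_v_ref(uint8_t dst[32], int mode, int uimm5) {
    if (mode < 0 || mode > 3) return -1;
    for (int i = 0; i < 16; i++) {
        int base;
        switch (mode) {
        case 0:  base = i % 4;             break;
        case 1:  base = i / 4 + i % 4;     break;
        case 2:  base = i / 4 + i % 4 + 4; break;
        default: base = i;                 break; /* mode 3 */
        }
        /* Both 128-bit halves receive the same 16-byte pattern. */
        dst[i] = dst[i + 16] = (uint8_t)(uimm5 + base);
    }
    return 0;
}
```

With mode 1 and uimm5 = 2, the first four bytes are 2 3 4 5 and the last byte of each half is 8.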
Latency and Throughput
CPU | Latency | Throughput (IPC) |
---|---|---|
3C6000 | N/A | 4 |