Branch

int __lasx_xbz_v (__m256i a)

Synopsis

int __lasx_xbz_v (__m256i a)
#include <lasxintrin.h>
Instruction: xvseteqz.v fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if the whole vector a is equal to zero, i.e. all 256 bits are zero.

Operation

dst = a.qword[0] == 0 && a.qword[1] == 0;

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2
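
As a rough illustration of the operation above, the sketch below checks whether a 32-byte block is entirely zero. The helper names `model_xbz_v` and `block_is_zero` are hypothetical and not part of the intrinsics API. The scalar model mirrors the pseudocode and is used when LASX is unavailable; the `__loongarch_asx` guard assumes a GCC-compatible compiler invoked with `-mlasx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__loongarch_asx)
#include <lasxintrin.h>
#endif

/* Hypothetical scalar model of __lasx_xbz_v: 1 if all 256 bits are zero. */
static int model_xbz_v(const uint8_t bytes[32]) {
    uint64_t lanes[4];
    memcpy(lanes, bytes, sizeof lanes);   /* view the block as four 64-bit lanes */
    return (lanes[0] | lanes[1] | lanes[2] | lanes[3]) == 0;
}

/* Hypothetical helper: 1 if the 32-byte block at p is all zero. */
static int block_is_zero(const uint8_t *p) {
    assert(p != NULL);
#if defined(__loongarch_asx)
    __m256i v = __lasx_xvld(p, 0);        /* 256-bit load from p + 0 */
    return __lasx_xbz_v(v);
#else
    return model_xbz_v(p);                /* portable fallback */
#endif
}
```

Written as `if (block_is_zero(p)) { ... }`, the intrinsic path is expected to compile to the set-and-branch pair listed in the synopsis, although the exact code generated depends on the compiler.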

int __lasx_xbnz_v (__m256i a)

Synopsis

int __lasx_xbnz_v (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetnez.v fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if the vector a is non-zero, i.e. at least one of its 256 bits is set.

Operation

dst = a.qword[0] != 0 || a.qword[1] != 0;

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2

int __lasx_xbz_b (__m256i a)

Synopsis

int __lasx_xbz_b (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetanyeqz.b fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if any 8-bit element in a is equal to zero.

Operation

dst = 0;
for (int i = 0; i < 32; i++) {
  if (a.byte[i] == 0) {
    dst = 1;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2
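
One plausible use of the any-zero-byte test is scanning for a string terminator 32 bytes at a time. The sketch below is an assumption-laden illustration, not a tuned implementation: `model_xbz_b` and `padded_strlen` are hypothetical names, the buffer size is assumed to be a multiple of 32 so whole-chunk loads stay in bounds, and the `__loongarch_asx` guard assumes a GCC-compatible compiler invoked with `-mlasx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__loongarch_asx)
#include <lasxintrin.h>
#endif

/* Hypothetical scalar model of __lasx_xbz_b: 1 if any of the 32 bytes is 0. */
static int model_xbz_b(const uint8_t bytes[32]) {
    int dst = 0;
    for (int i = 0; i < 32; i++) {
        if (bytes[i] == 0) {
            dst = 1;
        }
    }
    return dst;
}

/* Hypothetical helper: index of the first zero byte in buf, or buf_size if
 * there is none. buf_size must be a multiple of 32. */
static size_t padded_strlen(const uint8_t *buf, size_t buf_size) {
    assert(buf != NULL && buf_size % 32 == 0);
    for (size_t off = 0; off < buf_size; off += 32) {
#if defined(__loongarch_asx)
        int has_nul = __lasx_xbz_b(__lasx_xvld(buf + off, 0));
#else
        int has_nul = model_xbz_b(buf + off);
#endif
        if (!has_nul) {
            continue;                     /* no terminator in this chunk */
        }
        size_t i = off;
        while (buf[i] != 0) {             /* locate the exact byte */
            i++;
        }
        return i;
    }
    return buf_size;                      /* no terminator found */
}
```

The vector test only answers whether a chunk contains a zero byte; finding its position still needs a second step, done here with a plain byte loop.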

int __lasx_xbz_h (__m256i a)

Synopsis

int __lasx_xbz_h (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetanyeqz.h fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if any 16-bit element in a is equal to zero.

Operation

dst = 0;
for (int i = 0; i < 16; i++) {
  if (a.half[i] == 0) {
    dst = 1;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2

int __lasx_xbz_w (__m256i a)

Synopsis

int __lasx_xbz_w (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetanyeqz.w fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if any 32-bit element in a is equal to zero.

Operation

dst = 0;
for (int i = 0; i < 8; i++) {
  if (a.word[i] == 0) {
    dst = 1;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2

int __lasx_xbz_d (__m256i a)

Synopsis

int __lasx_xbz_d (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetanyeqz.d fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if any 64-bit element in a is equal to zero.

Operation

dst = 0;
for (int i = 0; i < 4; i++) {
  if (a.dword[i] == 0) {
    dst = 1;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2
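
The element width changes the answer: a zero byte does not imply a zero 16-, 32- or 64-bit element, because a wider element is zero only when all of its bytes are zero. The following scalar model is a sketch of the four any-zero variants in one function; `model_xbz` and its `elem_bytes` parameter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model covering __lasx_xbz_{b,h,w,d}: returns 1 if any
 * element of elem_bytes (1, 2, 4 or 8) bytes in the 32-byte vector is zero. */
static int model_xbz(const uint8_t bytes[32], int elem_bytes) {
    assert(elem_bytes == 1 || elem_bytes == 2 ||
           elem_bytes == 4 || elem_bytes == 8);
    for (int i = 0; i < 32; i += elem_bytes) {
        uint64_t elem = 0;
        /* Copy one element; the bytes of elem not written stay zero. */
        memcpy(&elem, bytes + i, (size_t)elem_bytes);
        if (elem == 0) {
            return 1;
        }
    }
    return 0;
}
```

For a vector filled with 1s except for a single zero at byte 5, `model_xbz(v, 1)` returns 1 while `model_xbz(v, 2)`, `model_xbz(v, 4)` and `model_xbz(v, 8)` all return 0.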

int __lasx_xbnz_b (__m256i a)

Synopsis

int __lasx_xbnz_b (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetallnez.b fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if all 8-bit elements in a are non-zero.

Operation

dst = 1;
for (int i = 0; i < 32; i++) {
  if (a.byte[i] == 0) {
    dst = 0;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2
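
The element-wise `xbnz` variants are easy to confuse with `__lasx_xbnz_v`: the `_v` form is true when any bit is set, whereas `__lasx_xbnz_b` is true only when every byte is non-zero. The sketch below models both in scalar code to make the difference concrete. The names `model_xbnz_v`, `model_xbnz_b` and `all_bytes_nonzero` are hypothetical, and the `__loongarch_asx` guard assumes a GCC-compatible compiler invoked with `-mlasx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__loongarch_asx)
#include <lasxintrin.h>
#endif

/* Hypothetical scalar model of __lasx_xbnz_v: 1 if any bit is set. */
static int model_xbnz_v(const uint8_t bytes[32]) {
    int dst = 0;
    for (int i = 0; i < 32; i++) {
        if (bytes[i] != 0) {
            dst = 1;
        }
    }
    return dst;
}

/* Hypothetical scalar model of __lasx_xbnz_b: 1 only if every byte is non-zero. */
static int model_xbnz_b(const uint8_t bytes[32]) {
    int dst = 1;
    for (int i = 0; i < 32; i++) {
        if (bytes[i] == 0) {
            dst = 0;
        }
    }
    return dst;
}

/* Hypothetical helper: 1 if none of the 32 bytes at p is zero. */
static int all_bytes_nonzero(const uint8_t *p) {
    assert(p != NULL);
#if defined(__loongarch_asx)
    return __lasx_xbnz_b(__lasx_xvld(p, 0));
#else
    return model_xbnz_b(p);               /* portable fallback */
#endif
}
```

A vector with exactly one zero byte therefore gives 1 for the `_v` model and 0 for the `_b` model. Per the pseudocode, `__lasx_xbnz_b` is the logical negation of `__lasx_xbz_b`.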

int __lasx_xbnz_h (__m256i a)

Synopsis

int __lasx_xbnz_h (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetallnez.h fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if all 16-bit elements in a are non-zero.

Operation

dst = 1;
for (int i = 0; i < 16; i++) {
  if (a.half[i] == 0) {
    dst = 0;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2

int __lasx_xbnz_w (__m256i a)

Synopsis

int __lasx_xbnz_w (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetallnez.w fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if all 32-bit elements in a are non-zero.

Operation

dst = 1;
for (int i = 0; i < 8; i++) {
  if (a.word[i] == 0) {
    dst = 0;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2

int __lasx_xbnz_d (__m256i a)

Synopsis

int __lasx_xbnz_d (__m256i a)
#include <lasxintrin.h>
Instruction: xvsetallnez.d fcc, xr; bcnez
CPU Flags: LASX

Description

Expected to be used in branches: branch if all 64-bit elements in a are non-zero.

Operation

dst = 1;
for (int i = 0; i < 4; i++) {
  if (a.dword[i] == 0) {
    dst = 0;
  }
}

Tested on a real machine.

Latency and Throughput

CPU Latency Throughput (CPI)
3A6000 N/A 2
3C5000 N/A 2