LoongArch SIMD Basics
Registers and Types
LoongArch SIMD has two vector extensions:
- LSX (LoongArch SIMD eXtension): 128-bit vectors in registers v0-v31
- LASX (LoongArch Advanced SIMD eXtension): 256-bit vectors in registers x0-x31
The C/C++ intrinsic types are:
| Width | Integer | Float | Double |
|---|---|---|---|
| 128-bit (LSX) | __m128i | __m128 | __m128d |
| 256-bit (LASX) | __m256i | __m256 | __m256d |
In hardware, the lower bits of FP/LSX/LASX registers are shared:
- FP register f0, LSX register v0, and LASX register x0 share the lowest 64 bits
- LSX register v0 and LASX register x0 share the lowest 128 bits
Element Type Suffixes
Instruction mnemonics and intrinsics use consistent suffixes to indicate element width and signedness:
| Suffix | Meaning |
|---|---|
| b | 8-bit byte (signed) |
| bu | 8-bit byte (unsigned) |
| h | 16-bit halfword (signed) |
| hu | 16-bit halfword (unsigned) |
| w | 32-bit word (signed) |
| wu | 32-bit word (unsigned) |
| d | 64-bit doubleword (signed) |
| du | 64-bit doubleword (unsigned) |
| q | 128-bit quadword (signed) |
| qu | 128-bit quadword (unsigned) |
| s | single-precision float |
| d | double-precision float |
For example, vadd_b (instruction vadd.b, intrinsic __lsx_vadd_b) adds 16 signed bytes, while vadd_w (instruction vadd.w, intrinsic __lsx_vadd_w) adds 4 signed 32-bit integers.
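To make the effect of the suffix concrete, here is a minimal sketch that adds the same 128 bits once as 16 byte lanes and once as 4 word lanes. The function names add_bytes16 and add_words4 are hypothetical and chosen for illustration. The LSX path assumes the compiler ships lsxintrin.h and is invoked with -mlsx; otherwise a scalar fallback models the same lane-wise wrapping behaviour.

```c
#include <stdint.h>

#if defined(__loongarch_sx)
#include <lsxintrin.h>
#endif

/* Add two 128-bit blocks as 16 independent 8-bit lanes (the "b" suffix).
 * add_bytes16 is a hypothetical helper name. */
void add_bytes16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
#if defined(__loongarch_sx)
    __m128i va = __lsx_vld(a, 0);
    __m128i vb = __lsx_vld(b, 0);
    __lsx_vst(__lsx_vadd_b(va, vb), out, 0);
#else
    /* Scalar model: each lane wraps modulo 2^8, no carry into its neighbour. */
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
#endif
}

/* Add two 128-bit blocks as 4 independent 32-bit lanes (the "w" suffix).
 * add_words4 is a hypothetical helper name. */
void add_words4(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
#if defined(__loongarch_sx)
    __m128i va = __lsx_vld(a, 0);
    __m128i vb = __lsx_vld(b, 0);
    __lsx_vst(__lsx_vadd_w(va, vb), out, 0);
#else
    /* Scalar model: each lane wraps modulo 2^32. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

With every byte of one operand set to 0xFF and every byte of the other set to 0x01, the byte-wise add yields sixteen 0x00 bytes, while the word-wise add yields 0x01010100 in each lane, because carries propagate across byte boundaries inside a 32-bit lane.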
Instruction Prefix Convention
- LSX instructions use the v prefix (e.g., vadd, vshuf)
- LASX instructions use the xv prefix (e.g., xvadd, xvshuf)
This pattern holds across all categories: arithmetic, logical, memory, shuffle, etc.
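The prefix convention carries over to the intrinsic names: the LSX intrinsics begin with __lsx_v and the LASX intrinsics with __lasx_xv. The sketch below writes the same element-wise addition three ways, assuming lsxintrin.h and lasxintrin.h are available when the matching extension is enabled. The function name add_i32 is hypothetical, and the scalar loop doubles as the tail handler and as the fallback on targets without SIMD.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__loongarch_asx)
#include <lasxintrin.h>
#elif defined(__loongarch_sx)
#include <lsxintrin.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n), with wrapping 32-bit addition.
 * add_i32 is a hypothetical helper name. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;

#if defined(__loongarch_asx)
    /* LASX: "xv" prefix, 256-bit vectors, 8 words per iteration. */
    for (; i + 8 <= n; i += 8) {
        __m256i va = __lasx_xvld(a + i, 0);
        __m256i vb = __lasx_xvld(b + i, 0);
        __lasx_xvst(__lasx_xvadd_w(va, vb), out + i, 0);
    }
#elif defined(__loongarch_sx)
    /* LSX: "v" prefix, 128-bit vectors, 4 words per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = __lsx_vld(a + i, 0);
        __m128i vb = __lsx_vld(b + i, 0);
        __lsx_vst(__lsx_vadd_w(va, vb), out + i, 0);
    }
#endif

    /* Scalar tail (and full fallback when no SIMD extension is enabled).
     * The unsigned casts avoid signed-overflow undefined behaviour. */
    for (; i < n; i++)
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
}
```

Apart from the prefix, the vector type, and the lane count per iteration, the two SIMD loops are identical, which is what makes it practical to maintain LSX and LASX variants of a kernel side by side.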
Compiler Macros
GCC and Clang define feature macros for LoongArch SIMD when the corresponding target options are enabled:
- __loongarch_sx: LSX is enabled, for example with -mlsx.
- __loongarch_asx: LASX is enabled, for example with -mlasx.
- __loongarch_simd_width: the enabled SIMD vector width in bits. It is 128 for LSX and 256 for LASX.
GCC also defines __loongarch_simd when some LoongArch SIMD extension is enabled. Clang does not define this macro, so portable code should prefer __loongarch_sx, __loongarch_asx or __loongarch_simd_width.
The __loongarch_asx_sx_conv macro is defined when the compiler provides the LASX 128-bit lane helper intrinsics such as __lasx_cast_128, __lasx_concat_128, __lasx_extract_128_lo and __lasx_insert_128_lo. This macro is available in GCC 16 and Clang 22 and later.
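As a rough illustration of how these macros might be used together, the following sketch reports the SIMD configuration chosen at compile time. The function names simd_width_bits, simd_extension_name, and has_lasx_lane_helpers are hypothetical. Only the macros described above are tested, and no lane helper intrinsic is actually called, so the code compiles on any target.

```c
#include <stdbool.h>

/* Width in bits of the enabled LoongArch SIMD extension, or 0 if none.
 * simd_width_bits is a hypothetical helper name. */
int simd_width_bits(void)
{
#if defined(__loongarch_simd_width)
    return __loongarch_simd_width;
#elif defined(__loongarch_asx)
    return 256;
#elif defined(__loongarch_sx)
    return 128;
#else
    return 0;
#endif
}

/* Name of the widest enabled extension. The check uses __loongarch_asx and
 * __loongarch_sx rather than __loongarch_simd, which Clang does not define. */
const char *simd_extension_name(void)
{
#if defined(__loongarch_asx)
    return "LASX";
#elif defined(__loongarch_sx)
    return "LSX";
#else
    return "none";
#endif
}

/* True when the compiler advertises the LASX 128-bit lane helper intrinsics. */
bool has_lasx_lane_helpers(void)
{
#if defined(__loongarch_asx_sx_conv)
    return true;
#else
    return false;
#endif
}
```

Code that depends on the lane helpers can guard those call sites with the same __loongarch_asx_sx_conv check and keep an alternative path for older compilers.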