LBT Integer
addu12i.w
Synopsis
Instruction: addu12i.w rd, rj, imm
CPU Flags: LBT
Description
Sign-extend the 5-bit immediate, shift it left by 12 bits, and add it to the low 32 bits of rj. The 32-bit result is sign-extended to 64 bits and stored in rd.
Operation
int64_t offset = (int64_t)sext(imm & 0x1f, 5) << 12;
dst = sext((uint32_t)a + (uint32_t)offset, 32);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 1 | 4 |
| 3C6000 | LA664 | 1 | 4 |
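
The Operation text above can be turned into a small C model for checking expected results on a host machine. This is only a sketch: the names `addu12i_w` and `sext` and the choice to pass the register value as a `uint64_t` are assumptions made for illustration, not part of any toolchain. The offset is formed with a multiplication by 4096 so that a negative value is never left-shifted in C.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Hypothetical software model of addu12i.w; rj is the source register value. */
static int64_t addu12i_w(uint64_t rj, unsigned imm) {
    /* Immediate values 16..31 encode -16..-1; scale by 4096 (1 << 12). */
    int64_t offset = sext(imm & 0x1f, 5) * 4096;
    /* 32-bit wrapping add, then sign-extend the 32-bit sum. */
    return sext((uint32_t)rj + (uint32_t)offset, 32);
}
```

In this model the upper 32 bits of `rj` never influence the result, and a sum that sets bit 31 comes back as a negative 64-bit value.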
addu12i.d
Synopsis
Instruction: addu12i.d rd, rj, imm
CPU Flags: LBT
Description
Sign-extend the 5-bit immediate, shift it left by 12 bits, add it to the 64-bit value in rj, and store the result in rd.
Operation
int64_t offset = (int64_t)sext(imm & 0x1f, 5) << 12;
dst = a + (uint64_t)offset;
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 1 | 4 |
| 3C6000 | LA664 | 1 | 4 |
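
As a rough illustration of the 64-bit variant, the sketch below models the Operation text in C. The function names `addu12i_d` and `sext` are hypothetical helpers introduced here. The reachable offsets are multiples of 4096 from -65536 (imm = 16) to 61440 (imm = 15), and the addition wraps modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Hypothetical software model of addu12i.d; rj is the source register value. */
static uint64_t addu12i_d(uint64_t rj, unsigned imm) {
    /* Multiply instead of shifting so a negative offset is well defined in C. */
    int64_t offset = sext(imm & 0x1f, 5) * 4096;
    return rj + (uint64_t)offset;      /* unsigned add wraps modulo 2^64 */
}
```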
adc.b
Synopsis
Instruction: adc.b rd, rj, rk
CPU Flags: LBT
Description
Add 8-bit values in rj and rk with carry (CF in EFLAGS), sign-extend the result to 64-bit and store in rd.
Operation
uint8_t lhs = a;
uint8_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint8_t r = lhs + rhs + cf;
dst = sext(r, 8);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
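
A minimal C sketch of the operation above, useful for checking edge cases such as a sum that overflows 8 bits. The names `adc_b` and `sext`, and passing CF as an explicit argument, are assumptions for illustration. The model only covers the value written to rd, which is all the Operation text describes.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Hypothetical software model of adc.b; cf stands in for EFLAGS.CF (0 or 1). */
static int64_t adc_b(uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    /* Only the low 8 bits of each operand take part; the sum wraps modulo 256. */
    uint8_t r = (uint8_t)((uint8_t)rj + (uint8_t)rk + cf);
    return sext(r, 8);
}
```

For example, `adc_b(0x7f, 0, 1)` yields -128 in this model because the 8-bit sum 0x80 is sign-extended.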
adc.h
Synopsis
Instruction: adc.h rd, rj, rk
CPU Flags: LBT
Description
Add 16-bit values in rj and rk with carry (CF in EFLAGS), sign-extend the result to 64-bit and store in rd.
Operation
uint16_t lhs = a;
uint16_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint16_t r = lhs + rhs + cf;
dst = sext(r, 16);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
adc.w
Synopsis
Instruction: adc.w rd, rj, rk
CPU Flags: LBT
Description
Add 32-bit values in rj and rk with carry (CF in EFLAGS), sign-extend the result to 64-bit and store in rd.
Operation
uint32_t lhs = a;
uint32_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint32_t r = lhs + rhs + cf;
dst = sext(r, 32);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
adc.d
Synopsis
Instruction: adc.d rd, rj, rk
CPU Flags: LBT
Description
Add the 64-bit values in rj and rk plus the carry (CF in EFLAGS) and store the 64-bit result in rd.
Operation
uint64_t lhs = a;
uint64_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint64_t r = lhs + rhs + cf;
dst = r;
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
sbc.b
Synopsis
Instruction: sbc.b rd, rj, rk
CPU Flags: LBT
Description
Subtract the 8-bit value in rk and the borrow (CF in EFLAGS) from the 8-bit value in rj, sign-extend the result to 64 bits, and store it in rd.
Operation
uint8_t lhs = a;
uint8_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint8_t r = lhs - rhs - cf;
dst = sext(r, 8);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
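
The operand order matters for subtraction: the Operation text computes `lhs - rhs - cf` with `lhs` taken from rj. The sketch below models that in C. The names `sbc_b` and `sext` and the explicit `cf` argument are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Hypothetical software model of sbc.b; cf stands in for EFLAGS.CF (0 or 1). */
static int64_t sbc_b(uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    /* rj - rk - cf on the low 8 bits; the difference wraps modulo 256. */
    uint8_t r = (uint8_t)((uint8_t)rj - (uint8_t)rk - cf);
    return sext(r, 8);
}
```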
sbc.h
Synopsis
Instruction: sbc.h rd, rj, rk
CPU Flags: LBT
Description
Subtract the 16-bit value in rk and the borrow (CF in EFLAGS) from the 16-bit value in rj, sign-extend the result to 64 bits, and store it in rd.
Operation
uint16_t lhs = a;
uint16_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint16_t r = lhs - rhs - cf;
dst = sext(r, 16);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
sbc.w
Synopsis
Instruction: sbc.w rd, rj, rk
CPU Flags: LBT
Description
Subtract the 32-bit value in rk and the borrow (CF in EFLAGS) from the 32-bit value in rj, sign-extend the result to 64 bits, and store it in rd.
Operation
uint32_t lhs = a;
uint32_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint32_t r = lhs - rhs - cf;
dst = sext(r, 32);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
sbc.d
Synopsis
Instruction: sbc.d rd, rj, rk
CPU Flags: LBT
Description
Subtract the 64-bit value in rk and the borrow (CF in EFLAGS) from the 64-bit value in rj and store the 64-bit result in rd.
Operation
uint64_t lhs = a;
uint64_t rhs = b;
uint8_t cf = EFLAGS.CF;
uint64_t r = lhs - rhs - cf;
dst = r;
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
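
The adc and sbc entries differ only in operand width, so one width-parameterised model can cover all eight. The sketch below relies on the fact that the low N bits of a 64-bit wrapping sum or difference equal the N-bit wrapping result. The names `adc_model`, `sbc_model` and `sext` are hypothetical, and `bits` is assumed to be 8, 16, 32 or 64.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Hypothetical model of adc.{b,h,w,d}: add in 64 bits, then truncate and sign-extend. */
static int64_t adc_model(unsigned bits, uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    return sext(rj + rk + cf, bits);
}

/* Hypothetical model of sbc.{b,h,w,d}: rj - rk - cf, then truncate and sign-extend. */
static int64_t sbc_model(unsigned bits, uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    return sext(rj - rk - cf, bits);
}
```

With `bits` set to 64, `sext` is the identity, which matches the plain `dst = r` in the .d entries.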
rotr.b
Synopsis
Instruction: rotr.b rd, rj, rk
CPU Flags: LBT
Description
Rotate the 8-bit value in rj right by the amount in rk, taken modulo 8. The 8-bit result is sign-extended to 64 bits and stored in rd.
Operation
uint8_t v = (uint8_t)a;
unsigned c = (unsigned)b % 8;
uint8_t r = c == 0 ? v : (uint8_t)((v >> c) | (v << (8 - c)));
dst = sext(r, 8);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
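
A short C sketch of the rotation above. Instead of special-casing a count of zero as the Operation text does, it reduces the left-shift amount modulo 8, which gives the same result. The names `rotr_b` and `sext` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Hypothetical software model of rotr.b. */
static int64_t rotr_b(uint64_t rj, uint64_t rk) {
    uint8_t v = (uint8_t)rj;
    unsigned c = (unsigned)rk % 8;                      /* count modulo 8 */
    /* For c == 0 both shift amounts are 0, so r == v. */
    uint8_t r = (uint8_t)((v >> c) | (v << ((8 - c) % 8)));
    return sext(r, 8);
}
```

Note that a rotated byte with bit 7 set is returned as a negative value, for example `rotr_b(0x01, 1)` gives -128 in this model.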
rotr.h
Synopsis
Instruction: rotr.h rd, rj, rk
CPU Flags: LBT
Description
Rotate the 16-bit value in rj right by the amount in rk, taken modulo 16. The 16-bit result is sign-extended to 64 bits and stored in rd.
Operation
uint16_t v = (uint16_t)a;
unsigned c = (unsigned)b % 16;
uint16_t r = c == 0 ? v : (uint16_t)((v >> c) | (v << (16 - c)));
dst = sext(r, 16);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rotr.w
Synopsis
Instruction: rotr.w rd, rj, rk
CPU Flags: LBT
Description
Rotate the 32-bit value in rj right by the amount in rk, taken modulo 32. The 32-bit result is sign-extended to 64 bits and stored in rd.
Operation
uint32_t v = (uint32_t)a;
unsigned c = (unsigned)b % 32;
uint32_t r = c == 0 ? v : (uint32_t)((v >> c) | (v << (32 - c)));
dst = sext(r, 32);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 1 | 4 |
| 3C6000 | LA664 | 1 | 4 |
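
For the 32-bit case, a shift by the full width would be undefined in C, which is why the Operation text tests for a zero count. The sketch below shows a common branch-free alternative that masks the left-shift amount instead. The names `rotr_w` and `sext` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Hypothetical software model of rotr.w. */
static int64_t rotr_w(uint64_t rj, uint64_t rk) {
    uint32_t v = (uint32_t)rj;                  /* upper 32 bits of rj are ignored */
    unsigned c = (unsigned)rk & 31;             /* count modulo 32 */
    /* (32 - c) & 31 is 0 when c == 0, so neither shift reaches the full width. */
    uint32_t r = (v >> c) | (v << ((32 - c) & 31));
    return sext(r, 32);
}
```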
rotr.d
Synopsis
Instruction: rotr.d rd, rj, rk
CPU Flags: LBT
Description
Rotate the 64-bit value in rj right by the amount in rk, taken modulo 64, and store the result in rd.
Operation
uint64_t v = (uint64_t)a;
unsigned c = (unsigned)b % 64;
uint64_t r = c == 0 ? v : (uint64_t)((v >> c) | (v << (64 - c)));
dst = r;
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 1 | 4 |
| 3C6000 | LA664 | 1 | 4 |
rotri.b
Synopsis
Instruction: rotri.b rd, rj, imm
CPU Flags: LBT
Description
Rotate the 8-bit value in rj right by the immediate imm, taken modulo 8. The 8-bit result is sign-extended to 64 bits and stored in rd.
Operation
uint8_t v = (uint8_t)a;
unsigned c = imm % 8;
uint8_t r = c == 0 ? v : (uint8_t)((v >> c) | (v << (8 - c)));
dst = sext(r, 8);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rotri.h
Synopsis
Instruction: rotri.h rd, rj, imm
CPU Flags: LBT
Description
Rotate the 16-bit value in rj right by the immediate imm, taken modulo 16. The 16-bit result is sign-extended to 64 bits and stored in rd.
Operation
uint16_t v = (uint16_t)a;
unsigned c = imm % 16;
uint16_t r = c == 0 ? v : (uint16_t)((v >> c) | (v << (16 - c)));
dst = sext(r, 16);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rotri.w
Synopsis
Instruction: rotri.w rd, rj, imm
CPU Flags: LBT
Description
Rotate the 32-bit value in rj right by the immediate imm, taken modulo 32. The 32-bit result is sign-extended to 64 bits and stored in rd.
Operation
uint32_t v = (uint32_t)a;
unsigned c = imm % 32;
uint32_t r = c == 0 ? v : (uint32_t)((v >> c) | (v << (32 - c)));
dst = sext(r, 32);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 1 | 4 |
| 3C6000 | LA664 | 1 | 4 |
rotri.d
Synopsis
Instruction: rotri.d rd, rj, imm
CPU Flags: LBT
Description
Rotate the 64-bit value in rj right by the immediate imm, taken modulo 64, and store the result in rd.
Operation
uint64_t v = (uint64_t)a;
unsigned c = imm % 64;
uint64_t r = c == 0 ? v : (uint64_t)((v >> c) | (v << (64 - c)));
dst = r;
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 1 | 4 |
| 3C6000 | LA664 | 1 | 4 |
rcr.b
Synopsis
Instruction: rcr.b rd, rj, rk
CPU Flags: LBT
Description
Rotate the 8-bit value in rj and CF (in EFLAGS) together as a 9-bit ring right by the amount specified in rk. The low 8 bits of the rotated ring are sign-extended to 64 bits and written to rd.
Operation
uint8_t v = (uint8_t)a;
// Mask count to 5 bits (matching x86 CL masking), then modulo (bits+1)
// for the 9-bit [CF, v] ring rotation.
unsigned c = ((unsigned)b & 0x1F) % (8 + 1);
// Rotate [CF, v] right by c, write low 8 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (8 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo & ((1ULL << 8) - 1);
}
dst = sext(r, 8);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
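
The bit-by-bit loop in the Operation text is equivalent to a single rotation of the 9-bit value [CF, v]. The sketch below places a transcription of the loop next to a closed form so the two can be compared. The names `rcr_b_loop`, `rcr_b` and `sext`, and the explicit `cf` argument, are assumptions for illustration. Only the value written to rd is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 64 bits, like sext() in the Operation text. */
static int64_t sext(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return (int64_t)x;
    uint64_t sign = 1ULL << (bits - 1);
    x &= (sign << 1) - 1;              /* keep only the low `bits` bits */
    return (int64_t)((x ^ sign) - sign);
}

/* Loop form, following the Operation text: rotate [CF, v] right one bit per step. */
static int64_t rcr_b_loop(uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    uint64_t lo = (uint8_t)rj, hi = cf;
    unsigned c = ((unsigned)rk & 0x1f) % 9;
    for (unsigned i = 0; i < c; i++) {
        uint64_t new_lo = (lo >> 1) | (hi << 7);  /* old CF enters at bit 7 */
        hi = lo & 1;                              /* old bit 0 becomes the new CF */
        lo = new_lo;
    }
    return sext(lo, 8);
}

/* Closed form (hypothetical): one rotation of the 9-bit ring. */
static int64_t rcr_b(uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    unsigned ring = (cf << 8) | (uint8_t)rj;      /* bit 8 = CF, bits 7..0 = v */
    unsigned c = ((unsigned)rk & 0x1f) % 9;
    /* For c == 0 the left shift moves everything above bit 8, where the mask drops it. */
    unsigned rot = ((ring >> c) | (ring << (9 - c))) & 0x1ff;
    return sext(rot, 8);                          /* low 8 bits go to rd */
}
```

For instance, with v = 0x81, CF = 1 and a count of 4, both forms give 0x38.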
rcr.h
Synopsis
Instruction: rcr.h rd, rj, rk
CPU Flags: LBT
Description
Rotate the 16-bit value in rj and CF (in EFLAGS) together as a 17-bit ring right by the amount specified in rk. The low 16 bits of the rotated ring are sign-extended to 64 bits and written to rd.
Operation
uint16_t v = (uint16_t)a;
// Mask count to 5 bits, then reduce modulo 17 (the size of the [CF, v] ring).
unsigned c = ((unsigned)b & 0x1F) % (16 + 1);
// Rotate [CF, v] right by c, write low 16 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (16 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo & ((1ULL << 16) - 1);
}
dst = sext(r, 16);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rcr.w
Synopsis
Instruction: rcr.w rd, rj, rk
CPU Flags: LBT
Description
Rotate the 32-bit value in rj and CF (in EFLAGS) together as a 33-bit ring right by the amount specified in rk. The low 32 bits of the rotated ring are sign-extended to 64 bits and written to rd.
Operation
uint32_t v = (uint32_t)a;
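// Mask count to 5 bits (0..31); this is already below the ring size of 33,
// so no further modulo is needed.
unsigned c = (unsigned)b & 0x1f;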
// Rotate [CF, v] right by c, write low 32 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (32 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo & ((1ULL << 32) - 1);
}
dst = sext(r, 32);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rcr.d
Synopsis
Instruction: rcr.d rd, rj, rk
CPU Flags: LBT
Description
Rotate the 64-bit value in rj and CF (in EFLAGS) together as a 65-bit ring right by the amount specified in rk. The low 64 bits of the rotated ring are written to rd.
Operation
uint64_t v = (uint64_t)a;
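// Mask count to 6 bits (0..63); this is already below the ring size of 65,
// so no further modulo is needed.
unsigned c = (unsigned)b & 0x3f;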
// Rotate [CF, v] right by c, write low 64 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (64 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo;
}
dst = r;
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
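
A 65-bit ring does not fit in one C integer, but the low 64 bits of the rotated ring can still be written without a loop. For a count c in 1..63, bit i of the result comes from v[i + c] when i + c is at most 63, from CF when i + c equals 64, and from v[i + c - 65] otherwise. The sketch below derives a closed form from that rule and keeps a transcription of the loop for comparison. The names `rcr_d_loop` and `rcr_d` and the explicit `cf` argument are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form, following the Operation text: rotate [CF, v] right one bit per step. */
static uint64_t rcr_d_loop(uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    uint64_t lo = rj, hi = cf;
    unsigned c = (unsigned)rk & 0x3f;
    for (unsigned i = 0; i < c; i++) {
        uint64_t new_lo = (lo >> 1) | (hi << 63);  /* old CF enters at bit 63 */
        hi = lo & 1;                               /* old bit 0 becomes the new CF */
        lo = new_lo;
    }
    return lo;
}

/* Closed form (hypothetical): low 64 bits of the 65-bit ring rotated right by c. */
static uint64_t rcr_d(uint64_t rj, uint64_t rk, unsigned cf) {
    assert(cf <= 1);
    unsigned c = (unsigned)rk & 0x3f;
    if (c == 0) return rj;
    /* Bits shifted down from v, plus the old CF landing at bit 64 - c. */
    uint64_t r = (rj >> c) | ((uint64_t)cf << (64 - c));
    /* Low bits of v that wrapped past CF; for c == 1 there are none, */
    /* and skipping the term avoids a shift by 64. */
    if (c >= 2) r |= rj << (65 - c);
    return r;
}
```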
rcri.b
Synopsis
Instruction: rcri.b rd, rj, imm
CPU Flags: LBT
Description
Rotate the 8-bit value in rj and CF (in EFLAGS) together as a 9-bit ring right by the immediate imm. The low 8 bits of the rotated ring are sign-extended to 64 bits and written to rd.
Operation
uint8_t v = (uint8_t)a;
unsigned c = (unsigned)imm % (8 + 1);
// Rotate [CF, v] right by c, write low 8 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (8 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo & ((1ULL << 8) - 1);
}
dst = sext(r, 8);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rcri.h
Synopsis
Instruction: rcri.h rd, rj, imm
CPU Flags: LBT
Description
Rotate the 16-bit value in rj and CF (in EFLAGS) together as a 17-bit ring right by the immediate imm. The low 16 bits of the rotated ring are sign-extended to 64 bits and written to rd.
Operation
uint16_t v = (uint16_t)a;
unsigned c = (unsigned)imm % (16 + 1);
// Rotate [CF, v] right by c, write low 16 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (16 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo & ((1ULL << 16) - 1);
}
dst = sext(r, 16);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rcri.w
Synopsis
Instruction: rcri.w rd, rj, imm
CPU Flags: LBT
Description
Rotate the 32-bit value in rj and CF (in EFLAGS) together as a 33-bit ring right by the immediate imm. The low 32 bits of the rotated ring are sign-extended to 64 bits and written to rd.
Operation
uint32_t v = (uint32_t)a;
unsigned c = (unsigned)imm % (32 + 1);
// Rotate [CF, v] right by c, write low 32 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (32 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo & ((1ULL << 32) - 1);
}
dst = sext(r, 32);
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |
rcri.d
Synopsis
Instruction: rcri.d rd, rj, imm
CPU Flags: LBT
Description
Rotate the 64-bit value in rj and CF (in EFLAGS) together as a 65-bit ring right by the immediate imm. The low 64 bits of the rotated ring are written to rd.
Operation
uint64_t v = (uint64_t)a;
unsigned c = (unsigned)imm % (64 + 1);
// Rotate [CF, v] right by c, write low 64 bits to rd
uint64_t ring_hi = EFLAGS.CF;
uint64_t ring_lo = v;
uint64_t r;
if (c == 0) {
r = v;
} else {
uint64_t tmp_hi = ring_hi, tmp_lo = ring_lo;
for (unsigned i = 0; i < c; i++) {
uint64_t new_lo = (tmp_lo >> 1) | (tmp_hi << (64 - 1));
uint64_t new_hi = tmp_lo & 1;
tmp_lo = new_lo;
tmp_hi = new_hi;
}
r = tmp_lo;
}
dst = r;
Tested on real machine.
Latency and Throughput
| CPU | µarch | Latency | Throughput (IPC) |
|---|---|---|---|
| 3C5000 | LA464 | 2 | 1 |
| 3C6000 | LA664 | 2 | 1 |