x86 Integer

x86adc.b

Synopsis

Instruction: x86adc.b rj, rk
CPU Flags: LBT

Description

x86-style add with carry: add the 8-bit values in rj and rk plus the carry flag CF (from EFLAGS). Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t lhs = (uint8_t)a;
uint8_t rhs = (uint8_t)b;
uint8_t carry_in = EFLAGS.CF;
uint8_t result = lhs + rhs + carry_in;
EFLAGS.CF = (uint64_t)lhs + rhs + carry_in > UINT8_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x80) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int8_t)result < 0;

Tested on real machine.
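
The operation above calls parity_even without defining it. The following is a minimal software sketch, assuming parity_even means "the byte has an even number of set bits" (the x86 PF convention) and that a and b stand for the values of rj and rk. The flags_t struct and the model_x86adc_b name are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag container; field names mirror the EFLAGS bits. */
typedef struct {
    bool cf, af, of, pf, zf, sf;
} flags_t;

/* Assumed definition: true when v has an even number of set bits. */
static bool parity_even(uint8_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1) == 0;
}

/* Software model of the x86adc.b flag update (a = rj, b = rk assumed). */
static flags_t model_x86adc_b(uint64_t a, uint64_t b, bool carry_in) {
    uint8_t lhs = (uint8_t)a;
    uint8_t rhs = (uint8_t)b;
    unsigned wide = (unsigned)lhs + rhs + carry_in; /* exact 9-bit sum */
    uint8_t result = (uint8_t)wide;
    flags_t f;
    f.cf = wide > UINT8_MAX;
    f.af = ((lhs ^ rhs ^ result) & 0x10) != 0;
    f.of = (~(lhs ^ rhs) & (lhs ^ result) & 0x80) != 0;
    f.pf = parity_even(result);
    f.zf = result == 0;
    f.sf = (int8_t)result < 0;
    return f;
}
```

For instance, 0xff + 0x00 with CF set wraps to zero and raises CF, AF, ZF, and PF, while 0x7f + 0x00 with CF set crosses the signed boundary and raises OF and SF.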

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86adc.h

Synopsis

Instruction: x86adc.h rj, rk
CPU Flags: LBT

Description

x86-style add with carry: add the 16-bit values in rj and rk plus the carry flag CF (from EFLAGS). Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t lhs = (uint16_t)a;
uint16_t rhs = (uint16_t)b;
uint8_t carry_in = EFLAGS.CF;
uint16_t result = lhs + rhs + carry_in;
EFLAGS.CF = (uint64_t)lhs + rhs + carry_in > UINT16_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x8000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int16_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86adc.w

Synopsis

Instruction: x86adc.w rj, rk
CPU Flags: LBT

Description

x86-style add with carry: add the 32-bit values in rj and rk plus the carry flag CF (from EFLAGS). Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
uint8_t carry_in = EFLAGS.CF;
uint32_t result = lhs + rhs + carry_in;
EFLAGS.CF = (uint64_t)lhs + rhs + carry_in > UINT32_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x80000000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int32_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86adc.d

Synopsis

Instruction: x86adc.d rj, rk
CPU Flags: LBT

Description

x86-style add with carry: add the 64-bit values in rj and rk plus the carry flag CF (from EFLAGS). Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
uint8_t carry_in = EFLAGS.CF;
uint64_t result = lhs + rhs + carry_in;
EFLAGS.CF = (unsigned __int128)lhs + rhs + carry_in > UINT64_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x8000000000000000ULL) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int64_t)result < 0;

Tested on real machine.
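
Standard C has no integer type wide enough to hold the full 65-bit sum of the 64-bit variant, so the carry out has to be derived another way. The following is a minimal sketch of one portable approach, not taken from the source; the names carry_out_64, u128_t, and add_128 are hypothetical. It also shows the usual two-limb pattern in which the low limb is a plain add and the high limb consumes CF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit limbs. */
typedef struct {
    uint64_t lo, hi;
} u128_t;

/* Carry out of lhs + rhs + carry_in, computed without a wider type. */
static bool carry_out_64(uint64_t lhs, uint64_t rhs, bool carry_in) {
    uint64_t partial = lhs + rhs;                  /* may wrap */
    bool c1 = partial < lhs;                       /* carry from lhs + rhs */
    bool c2 = (uint64_t)(partial + carry_in) < partial; /* carry from +CF */
    return c1 || c2;
}

/* Two-limb add: low limb like x86add.d, high limb like x86adc.d. */
static u128_t add_128(u128_t x, u128_t y, bool *carry_out) {
    u128_t r;
    bool cf = carry_out_64(x.lo, y.lo, false);
    r.lo = x.lo + y.lo;
    *carry_out = carry_out_64(x.hi, y.hi, cf);
    r.hi = x.hi + y.hi + cf;
    return r;
}
```

The two partial checks cover the corner case where rhs is UINT64_MAX and the incoming carry is set, in which rhs + carry_in alone would already wrap.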

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86add.b

Synopsis

Instruction: x86add.b rj, rk
CPU Flags: LBT

Description

x86-style add: add 8-bit values in rj and rk. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t lhs = (uint8_t)a;
uint8_t rhs = (uint8_t)b;
uint8_t result = lhs + rhs;
EFLAGS.CF = (uint64_t)lhs + (uint64_t)rhs > UINT8_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x80) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int8_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86add.h

Synopsis

Instruction: x86add.h rj, rk
CPU Flags: LBT

Description

x86-style add: add 16-bit values in rj and rk. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t lhs = (uint16_t)a;
uint16_t rhs = (uint16_t)b;
uint16_t result = lhs + rhs;
EFLAGS.CF = (uint64_t)lhs + (uint64_t)rhs > UINT16_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x8000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int16_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86add.w

Synopsis

Instruction: x86add.w rj, rk
CPU Flags: LBT

Description

x86-style add: add 32-bit values in rj and rk. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
uint32_t result = lhs + rhs;
EFLAGS.CF = (uint64_t)lhs + (uint64_t)rhs > UINT32_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x80000000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int32_t)result < 0;

Tested on real machine.
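
The OF expression above says that overflow occurs when both operands have the same sign and the result has the opposite sign. As a sanity check, here is a small sketch that compares that expression with a range test done in 64-bit arithmetic. The function names of_add32 and of_add32_wide are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* OF as written in the operation: operands agree in sign, result does not. */
static bool of_add32(uint32_t lhs, uint32_t rhs) {
    uint32_t result = lhs + rhs;
    return (~(lhs ^ rhs) & (lhs ^ result) & 0x80000000u) != 0;
}

/* Reference: do the signed addition in 64 bits and test the 32-bit range. */
static bool of_add32_wide(uint32_t lhs, uint32_t rhs) {
    int64_t sum = (int64_t)(int32_t)lhs + (int64_t)(int32_t)rhs;
    return sum < INT32_MIN || sum > INT32_MAX;
}
```

Both functions should agree for every input; 0x7fffffff + 1 and 0x80000000 + 0x80000000 overflow, while 0xffffffff + 1 (that is, -1 + 1) does not.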

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86add.d

Synopsis

Instruction: x86add.d rj, rk
CPU Flags: LBT

Description

x86-style add: add 64-bit values in rj and rk. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
uint64_t result = lhs + rhs;
EFLAGS.CF = lhs > UINT64_MAX - rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((~(lhs ^ rhs)) & (lhs ^ result) & 0x8000000000000000ULL) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int64_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86add.wu

Synopsis

Instruction: x86add.wu rj, rk
CPU Flags: LBT

Description

x86-style add: add the unsigned 32-bit values in rj and rk. Update EFLAGS (CF, AF, PF, ZF, SF) according to the result; OF is always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
uint32_t result = lhs + rhs;
EFLAGS.CF = (uint64_t)lhs + (uint64_t)rhs > UINT32_MAX;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int32_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86add.du

Synopsis

Instruction: x86add.du rj, rk
CPU Flags: LBT

Description

x86-style add: add the unsigned 64-bit values in rj and rk. Update EFLAGS (CF, AF, PF, ZF, SF) according to the result; OF is always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
uint64_t result = lhs + rhs;
EFLAGS.CF = lhs > UINT64_MAX - rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int64_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86inc.b

Synopsis

Instruction: x86inc.b rj
CPU Flags: LBT

Description

x86-style increment: add 1 to the 8-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t v = (uint8_t)a;
uint8_t r = v + 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int8_t)r < 0;
EFLAGS.OF = v == (uint8_t)INT8_MAX;

Tested on real machine.
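
To make the "Preserve CF" rule concrete, the sketch below models the flag update in software by copying the incoming flags first and then overwriting everything except CF. It assumes a is the value of rj and that parity_even reports an even number of set bits in the byte. The flags_t type and model_x86inc_b name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag container; field names mirror the EFLAGS bits. */
typedef struct {
    bool cf, af, of, pf, zf, sf;
} flags_t;

/* Assumed definition: true when v has an even number of set bits. */
static bool parity_even(uint8_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1) == 0;
}

/* Software model of x86inc.b: every flag except CF is recomputed. */
static flags_t model_x86inc_b(uint64_t a, flags_t in) {
    uint8_t v = (uint8_t)a;
    uint8_t r = (uint8_t)(v + 1);
    flags_t out = in;                 /* CF carries over unchanged */
    out.pf = parity_even(r);
    out.af = ((v ^ 1 ^ r) & 0x10) != 0;
    out.zf = r == 0;
    out.sf = (int8_t)r < 0;
    out.of = v == (uint8_t)INT8_MAX;
    return out;
}
```

Incrementing 0xff wraps to zero but leaves a clear CF clear, which is the main behavioural difference from adding 1 with x86add.b.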

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86inc.h

Synopsis

Instruction: x86inc.h rj
CPU Flags: LBT

Description

x86-style increment: add 1 to the 16-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t v = (uint16_t)a;
uint16_t r = v + 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int16_t)r < 0;
EFLAGS.OF = v == (uint16_t)INT16_MAX;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86inc.w

Synopsis

Instruction: x86inc.w rj
CPU Flags: LBT

Description

x86-style increment: add 1 to the 32-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t v = (uint32_t)a;
uint32_t r = v + 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int32_t)r < 0;
EFLAGS.OF = v == (uint32_t)INT32_MAX;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86inc.d

Synopsis

Instruction: x86inc.d rj
CPU Flags: LBT

Description

x86-style increment: add 1 to the 64-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t v = (uint64_t)a;
uint64_t r = v + 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int64_t)r < 0;
EFLAGS.OF = v == (uint64_t)INT64_MAX;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sbc.b

Synopsis

Instruction: x86sbc.b rj, rk
CPU Flags: LBT

Description

x86-style subtract with borrow: subtract the 8-bit value in rk and the borrow CF (from EFLAGS) from the 8-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t lhs = (uint8_t)a;
uint8_t rhs = (uint8_t)b;
uint8_t carry_in = EFLAGS.CF;
uint64_t subtrahend = (uint64_t)rhs + carry_in;
uint8_t result = (uint8_t)(lhs - subtrahend);
EFLAGS.CF = lhs < subtrahend;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x80) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int8_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sbc.h

Synopsis

Instruction: x86sbc.h rj, rk
CPU Flags: LBT

Description

x86-style subtract with borrow: subtract the 16-bit value in rk and the borrow CF (from EFLAGS) from the 16-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t lhs = (uint16_t)a;
uint16_t rhs = (uint16_t)b;
uint8_t carry_in = EFLAGS.CF;
uint64_t subtrahend = (uint64_t)rhs + carry_in;
uint16_t result = (uint16_t)(lhs - subtrahend);
EFLAGS.CF = lhs < subtrahend;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x8000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int16_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sbc.w

Synopsis

Instruction: x86sbc.w rj, rk
CPU Flags: LBT

Description

x86-style subtract with borrow: subtract the 32-bit value in rk and the borrow CF (from EFLAGS) from the 32-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
uint8_t carry_in = EFLAGS.CF;
uint64_t subtrahend = (uint64_t)rhs + carry_in;
uint32_t result = (uint32_t)(lhs - subtrahend);
EFLAGS.CF = lhs < subtrahend;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x80000000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int32_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sbc.d

Synopsis

Instruction: x86sbc.d rj, rk
CPU Flags: LBT

Description

x86-style subtract with borrow: subtract the 64-bit value in rk and the borrow CF (from EFLAGS) from the 64-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
uint8_t carry_in = EFLAGS.CF;
unsigned __int128 subtrahend = (unsigned __int128)rhs + carry_in;
uint64_t result = (uint64_t)(lhs - subtrahend);
EFLAGS.CF = lhs < subtrahend;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x8000000000000000ULL) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int64_t)result < 0;

Tested on real machine.
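
In the 64-bit variant, rhs plus the incoming borrow can itself exceed 64 bits, so a portable model needs care when deriving the borrow out. The sketch below is one way to do it without a wider type, together with the two-limb subtraction pattern that subtract-with-borrow exists to support. The names borrow_out_64, u128_t, and sub_128 are hypothetical and not part of the source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit limbs. */
typedef struct {
    uint64_t lo, hi;
} u128_t;

/* Borrow out of lhs - rhs - borrow_in, computed without a wider type. */
static bool borrow_out_64(uint64_t lhs, uint64_t rhs, bool borrow_in) {
    return lhs < rhs || (borrow_in && lhs == rhs);
}

/* Two-limb subtract: low limb like x86sub.d, high limb like x86sbc.d. */
static u128_t sub_128(u128_t x, u128_t y, bool *borrow_out) {
    u128_t r;
    bool cf = borrow_out_64(x.lo, y.lo, false);
    r.lo = x.lo - y.lo;
    *borrow_out = borrow_out_64(x.hi, y.hi, cf);
    r.hi = x.hi - y.hi - cf;
    return r;
}
```

Subtracting 1 from the two-limb value 2^64 (hi = 1, lo = 0) borrows from the high limb and yields hi = 0, lo = UINT64_MAX with no final borrow.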

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sub.b

Synopsis

Instruction: x86sub.b rj, rk
CPU Flags: LBT

Description

x86-style subtract: subtract the 8-bit value in rk from the 8-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t lhs = (uint8_t)a;
uint8_t rhs = (uint8_t)b;
uint8_t result = lhs - rhs;
EFLAGS.CF = lhs < rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x80) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int8_t)result < 0;

Tested on real machine.
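
Because only the flags are written, this instruction behaves like an x86 CMP, and the usual x86 condition codes can be read off the result: "below" (unsigned less-than) is CF, and "less" (signed less-than) is SF != OF. The sketch below models that in software, assuming a and b are the values of rj and rk. The cmp_flags_t type and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of EFLAGS needed for comparisons. */
typedef struct {
    bool cf, of, zf, sf;
} cmp_flags_t;

/* Software model of the x86sub.b flag update (PF and AF omitted). */
static cmp_flags_t model_x86sub_b(uint64_t a, uint64_t b) {
    uint8_t lhs = (uint8_t)a;
    uint8_t rhs = (uint8_t)b;
    uint8_t result = (uint8_t)(lhs - rhs);
    cmp_flags_t f;
    f.cf = lhs < rhs;
    f.of = ((lhs ^ rhs) & (lhs ^ result) & 0x80) != 0;
    f.zf = result == 0;
    f.sf = (int8_t)result < 0;
    return f;
}

/* x86 condition B (below): unsigned lhs < rhs. */
static bool unsigned_below(cmp_flags_t f) { return f.cf; }

/* x86 condition L (less): signed lhs < rhs. */
static bool signed_less(cmp_flags_t f) { return f.sf != f.of; }
```

Comparing 0x80 with 0x01 shows the difference: as unsigned values 128 is not below 1, but as signed values -128 is less than 1.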

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sub.h

Synopsis

Instruction: x86sub.h rj, rk
CPU Flags: LBT

Description

x86-style subtract: subtract the 16-bit value in rk from the 16-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t lhs = (uint16_t)a;
uint16_t rhs = (uint16_t)b;
uint16_t result = lhs - rhs;
EFLAGS.CF = lhs < rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x8000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int16_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sub.w

Synopsis

Instruction: x86sub.w rj, rk
CPU Flags: LBT

Description

x86-style subtract: subtract the 32-bit value in rk from the 32-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
uint32_t result = lhs - rhs;
EFLAGS.CF = lhs < rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x80000000) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int32_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sub.d

Synopsis

Instruction: x86sub.d rj, rk
CPU Flags: LBT

Description

x86-style subtract: subtract the 64-bit value in rk from the 64-bit value in rj. Update EFLAGS (CF, AF, OF, PF, ZF, SF) according to the result. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
uint64_t result = lhs - rhs;
EFLAGS.CF = lhs < rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = ((lhs ^ rhs) & (lhs ^ result) & 0x8000000000000000ULL) != 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int64_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sub.wu

Synopsis

Instruction: x86sub.wu rj, rk
CPU Flags: LBT

Description

x86-style subtract: subtract the unsigned 32-bit value in rk from the unsigned 32-bit value in rj. Update EFLAGS (CF, AF, PF, ZF, SF) according to the result; OF is always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
uint32_t result = lhs - rhs;
EFLAGS.CF = lhs < rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int32_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86sub.du

Synopsis

Instruction: x86sub.du rj, rk
CPU Flags: LBT

Description

x86-style subtract: subtract the unsigned 64-bit value in rk from the unsigned 64-bit value in rj. Update EFLAGS (CF, AF, PF, ZF, SF) according to the result; OF is always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
uint64_t result = lhs - rhs;
EFLAGS.CF = lhs < rhs;
EFLAGS.AF = ((lhs ^ rhs ^ result) & 0x10) != 0;
EFLAGS.OF = 0;
EFLAGS.PF = parity_even((uint8_t)result);
EFLAGS.ZF = result == 0;
EFLAGS.SF = (int64_t)result < 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86dec.b

Synopsis

Instruction: x86dec.b rj
CPU Flags: LBT

Description

x86-style decrement: subtract 1 from the 8-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t v = (uint8_t)a;
uint8_t r = v - 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int8_t)r < 0;
EFLAGS.OF = v == (uint8_t)INT8_MIN;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86dec.h

Synopsis

Instruction: x86dec.h rj
CPU Flags: LBT

Description

x86-style decrement: subtract 1 from the 16-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t v = (uint16_t)a;
uint16_t r = v - 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int16_t)r < 0;
EFLAGS.OF = v == (uint16_t)INT16_MIN;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86dec.w

Synopsis

Instruction: x86dec.w rj
CPU Flags: LBT

Description

x86-style decrement: subtract 1 from the 32-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t v = (uint32_t)a;
uint32_t r = v - 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int32_t)r < 0;
EFLAGS.OF = v == (uint32_t)INT32_MIN;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86dec.d

Synopsis

Instruction: x86dec.d rj
CPU Flags: LBT

Description

x86-style decrement: subtract 1 from the 64-bit value in rj. Update EFLAGS (AF, OF, PF, ZF, SF). Preserve CF. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t v = (uint64_t)a;
uint64_t r = v - 1;
// CF preserved from input EFLAGS.CF
EFLAGS.PF = parity_even((uint8_t)r);
EFLAGS.AF = ((v ^ 1 ^ r) & 0x10) != 0;
EFLAGS.ZF = r == 0;
EFLAGS.SF = (int64_t)r < 0;
EFLAGS.OF = v == (uint64_t)INT64_MIN;

Tested on real machine.
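
The four x86dec variants differ only in operand width, so they can be modelled by a single function parameterised on the bit count. This is a sketch under that assumption; the dec_flags_t type and the model_x86dec name are hypothetical, PF is left out for brevity, and CF is not represented because the instruction does not touch it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the flags written by x86dec (PF omitted). */
typedef struct {
    bool af, of, zf, sf;
} dec_flags_t;

/* Width-generic model; bits must be 8, 16, 32, or 64. */
static dec_flags_t model_x86dec(uint64_t a, unsigned bits) {
    uint64_t mask = bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
    uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t v = a & mask;            /* truncate the operand to the width */
    uint64_t r = (v - 1) & mask;      /* wrapped result */
    dec_flags_t f;
    f.af = ((v ^ 1 ^ r) & 0x10) != 0;
    f.zf = r == 0;
    f.sf = (r & sign) != 0;
    f.of = v == sign;                 /* only the most negative value overflows */
    return f;
}
```

Bits of the register above the operand width are ignored, so an input of 0x1080 with bits = 8 behaves exactly like 0x80 and sets OF.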

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 2 0.5(1/2)
3C6000 LA664 2 0.5(1/2)

x86mul.b

Synopsis

Instruction: x86mul.b rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the signed 8-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 8 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t lhs = (uint8_t)a;
uint8_t rhs = (uint8_t)b;
__int128 product = (__int128)(int8_t)lhs * (__int128)(int8_t)rhs;
bool overflow = product < (__int128)INT8_MIN || product > (__int128)INT8_MAX;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.
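
A few boundary cases help pin down what "does not fit" means for the signed 8-bit product. The sketch below reproduces only the overflow predicate, assuming a and b are the values of rj and rk; the function name x86mul_b_overflow is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the signed product of the low bytes of a and b leaves int8_t. */
static bool x86mul_b_overflow(uint64_t a, uint64_t b) {
    int lhs = (int8_t)(uint8_t)a;   /* sign-extend the low byte */
    int rhs = (int8_t)(uint8_t)b;
    int product = lhs * rhs;        /* magnitude at most 128 * 128 */
    return product < INT8_MIN || product > INT8_MAX;
}
```

Under this model 16 * 8 = 128 overflows while 16 * 7 = 112 does not, and -128 * -1 overflows while -128 * 1 does not.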

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1

x86mul.h

Synopsis

Instruction: x86mul.h rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the signed 16-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 16 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t lhs = (uint16_t)a;
uint16_t rhs = (uint16_t)b;
__int128 product = (__int128)(int16_t)lhs * (__int128)(int16_t)rhs;
bool overflow = product < (__int128)INT16_MIN || product > (__int128)INT16_MAX;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1

x86mul.w

Synopsis

Instruction: x86mul.w rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the signed 32-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 32 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
__int128 product = (__int128)(int32_t)lhs * (__int128)(int32_t)rhs;
bool overflow = product < (__int128)INT32_MIN || product > (__int128)INT32_MAX;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1

x86mul.d

Synopsis

Instruction: x86mul.d rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the signed 64-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 64 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
__int128 product = (__int128)(int64_t)lhs * (__int128)(int64_t)rhs;
bool overflow = product < (__int128)INT64_MIN || product > (__int128)INT64_MAX;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.
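
The pseudocode above relies on the __int128 compiler extension. As a cross-check, the sketch below places that widening approach next to the GCC/Clang builtin __builtin_mul_overflow, which reports whether a multiplication wrapped. Both are compiler-specific, and the wrapper names mul64_overflow_wide and mul64_overflow_builtin are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: widen to 128 bits, as the operation above does. */
static bool mul64_overflow_wide(int64_t lhs, int64_t rhs) {
    __int128 product = (__int128)lhs * (__int128)rhs;
    return product < (__int128)INT64_MIN || product > (__int128)INT64_MAX;
}

/* Alternative: the builtin returns true when the 64-bit result wrapped. */
static bool mul64_overflow_builtin(int64_t lhs, int64_t rhs) {
    int64_t discarded;
    return __builtin_mul_overflow(lhs, rhs, &discarded);
}
```

The two predicates are expected to agree on all inputs, including the classic corner case INT64_MIN * -1.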

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1

x86mul.bu

Synopsis

Instruction: x86mul.bu rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the unsigned 8-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 8 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint8_t lhs = (uint8_t)a;
uint8_t rhs = (uint8_t)b;
unsigned __int128 product =
    (unsigned __int128)(uint8_t)lhs * (unsigned __int128)(uint8_t)rhs;
bool overflow = (product >> 8) != 0;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.
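
The same bit patterns can overflow under one interpretation and not the other, which is why separate signed and unsigned variants exist. The sketch below contrasts the two 8-bit overflow predicates, assuming a and b are the values of rj and rk; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned rule: overflow when the product has bits above bit 7. */
static bool mul8_overflow_unsigned(uint64_t a, uint64_t b) {
    unsigned product = (unsigned)(uint8_t)a * (unsigned)(uint8_t)b;
    return (product >> 8) != 0;
}

/* Signed rule: overflow when the product leaves the int8_t range. */
static bool mul8_overflow_signed(uint64_t a, uint64_t b) {
    int product = (int)(int8_t)(uint8_t)a * (int)(int8_t)(uint8_t)b;
    return product < INT8_MIN || product > INT8_MAX;
}
```

With both operands 0xff, the unsigned product 255 * 255 overflows but the signed product -1 * -1 = 1 does not; with 16 and 8, the product 128 fits in an unsigned byte but not in a signed one.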

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1

x86mul.hu

Synopsis

Instruction: x86mul.hu rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the unsigned 16-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 16 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint16_t lhs = (uint16_t)a;
uint16_t rhs = (uint16_t)b;
unsigned __int128 product =
    (unsigned __int128)(uint16_t)lhs * (unsigned __int128)(uint16_t)rhs;
bool overflow = (product >> 16) != 0;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1

x86mul.wu

Synopsis

Instruction: x86mul.wu rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the unsigned 32-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 32 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint32_t lhs = (uint32_t)a;
uint32_t rhs = (uint32_t)b;
unsigned __int128 product =
    (unsigned __int128)(uint32_t)lhs * (unsigned __int128)(uint32_t)rhs;
bool overflow = (product >> 32) != 0;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1

x86mul.du

Synopsis

Instruction: x86mul.du rj, rk
CPU Flags: LBT

Description

x86-style multiply: multiply the unsigned 64-bit values in rj and rk. Set CF and OF if the product overflows (does not fit in 64 bits) and clear them otherwise; SF, ZF, AF, and PF are always cleared. Only EFLAGS (LBT4) is updated; the GPR is not modified.

Operation

uint64_t lhs = (uint64_t)a;
uint64_t rhs = (uint64_t)b;
unsigned __int128 product =
    (unsigned __int128)(uint64_t)lhs * (unsigned __int128)(uint64_t)rhs;
bool overflow = (product >> 64) != 0;
EFLAGS.CF = overflow;
EFLAGS.OF = overflow;
EFLAGS.SF = 0;
EFLAGS.ZF = 0;
EFLAGS.AF = 0;
EFLAGS.PF = 0;

Tested on real machine.
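
Where a 128-bit type is not available, the unsigned overflow condition can be tested with a division instead of a widening multiply. This is a portable sketch of the same predicate, not the method used in the source; the function name mul_du_overflow is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* lhs * rhs exceeds UINT64_MAX exactly when rhs > floor(UINT64_MAX / lhs). */
static bool mul_du_overflow(uint64_t lhs, uint64_t rhs) {
    return lhs != 0 && rhs > UINT64_MAX / lhs;
}
```

For example, 2^32 * 2^32 = 2^64 overflows, while 2^32 * (2^32 - 1) still fits in 64 bits.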

Latency and Throughput

CPU µarch Latency Throughput (IPC)
3C5000 LA464 1.17 0.86(1/1.17)
3C6000 LA664 1 1