Instruction Set Comparison
- x86 is CISC (Complex Instruction Set Computer): some instructions do a lot in one go (e.g., load from memory, multiply, and add to a register in a single instruction).
- ARM is RISC (Reduced Instruction Set Computer): instructions are simpler and usually do one thing at a time, and you chain them together to get the same effect.
- At the macro level (run a web browser, control a drone, process an image), both architectures are fully capable.
- At the micro level (per-instruction), the sequences differ: sometimes ARM needs more steps, and sometimes it’s faster because each step is simpler to decode and easier to run in parallel.
- ARM has a simpler, fixed-width instruction encoding and has traditionally been tuned for low power rather than peak performance.
- x86 has a more complex, variable-length instruction set and has traditionally been tuned for peak performance at a higher power budget.
Operation | x86‑64 | ARM64 | Notes |
---|---|---|---|
Move immediate → reg | `mov rax, 5` | `mov x0, #5` | ARM also has `movz`/`movn`/`movk` for 16‑bit chunk moves. |
Move reg → reg | `mov rbx, rax` | `mov x1, x0` | Same idea; different register names. |
Load from memory | `mov rax, [rbx]` | `ldr x0, [x1]` | ARM is load/store: memory access is via `ldr`/`str`. |
Load with complex addr | `mov rax, [rbx+rcx*4+16]` | `add x3, x1, x2, lsl #2` then `ldr x0, [x3, #16]` | x86 has richer single‑instr addressing; ARM offers base+register or base+immediate, so it often composes. |
Store to memory | `mov [rbp-8], rax` | `str x0, [x29, #-8]` | Frame pointers: `rbp` vs `x29`. |
Add | `add rax, rbx` | `add x0, x0, x1` | ARM's 3‑operand form can write to a separate destination and keep both sources. |
Subtract | `sub rax, rbx` | `sub x0, x0, x1` | x86 `sub` always updates RFLAGS; ARM updates NZCV only with the flag‑setting form `subs`. |
Multiply (int) | `imul rax, rbx` | `mul x0, x0, x1` | x86 has many `imul` forms; ARM has separate widening variants (`smull`, etc.). |
Divide (int) | `cqo` ; `idiv rbx` | `sdiv x0, x0, x1` | x86 uses an implicit dividend in `rdx:rax`; ARM uses 3‑operand `sdiv`/`udiv`. |
Bitwise AND/OR/XOR | `and rax, rbx` | `and x0, x0, x1` | OR: `or` (x86) vs `orr` (ARM). |
Shifts | `shl rax, 3` | `lsl x0, x0, #3` | Arithmetic right: `sar` (x86) vs `asr` (ARM). |
Compare & branch | `cmp rax, rbx` + `je label` | `cmp x0, x1` + `b.eq label` | ARM branches use condition codes on `b.<cond>`. |
Conditional move/select | `cmovz rax, rbx` | `csel x0, x1, x2, eq` | ARM uses `csel` to pick between two regs. |
Call & return | `call func` / `ret` | `bl func` / `ret` | `bl` writes return addr to `lr`/`x30`. |
Push/Pop | `push rax` / `pop rax` | `stp x29, x30, [sp, #-16]!` / `ldp x29, x30, [sp], #16` | ARM uses paired stores/loads; no single‑instr push/pop. |
Load effective address | `lea rax, [rbx+8]` | `add x0, x1, #8` or `adr`/`adrp x0, label` | `lea` does address calc; ARM composes with `add`/`adr`. |
Function args (ABI) | SysV: `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` | AAPCS64: `x0`–`x7` | Extra args spill to stack; caller/callee‑saved sets differ. |
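To connect the table to source code, here is a small C sketch with three functions that line up with the "Load with complex addr", "Divide (int)", and "Conditional move/select" rows. The function names (`load_elem`, `div_rem`, `pick_if_equal`) are made up for illustration, and the instructions named in the comments are what GCC or Clang commonly emit at `-O2`, not a guarantee; your compiler may choose differently.

```c
#include <stdint.h>

/* Hypothetical helper for the "Load with complex addr" row.
 * Address = base + idx*4 + 16 bytes.
 * x86-64 can usually fold this into one mov with [rdi+rsi*4+16];
 * ARM64 usually needs an add (with lsl #2) followed by ldr [.., #16]. */
int32_t load_elem(const int32_t *base, int64_t idx) {
    return base[idx + 4];
}

/* Hypothetical helper for the "Divide (int)" row.
 * x86-64: cqo + idiv leaves the quotient in rax and the remainder in rdx.
 * ARM64: sdiv yields only the quotient; the remainder is usually rebuilt
 * with msub (a - q*b).
 * Callers must avoid b == 0 and INT64_MIN / -1 (undefined behavior in C). */
int64_t div_rem(int64_t a, int64_t b, int64_t *rem) {
    *rem = a % b;
    return a / b;
}

/* Hypothetical helper for the "Conditional move/select" row.
 * Often becomes cmp + cmov on x86-64 and cmp + csel on ARM64,
 * with no branch in either case. */
int64_t pick_if_equal(int64_t a, int64_t b, int64_t x, int64_t y) {
    return (a == b) ? x : y;
}
```

Compiling this file with `-O2 -S` on each target is a quick way to see which of the table's instructions your toolchain actually picks.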
Why aren’t we “all‑ARM” yet?
What ARM does well
- Performance per watt: excellent efficiency → laptops, mobile, dense servers.
- Integration: easy to build SoC designs (CPU + GPU + IO on one die).
- Modern ISA design: clean load/store model; strong compiler support; NEON/SVE vectors.
What slows a full switch
- Software & ABI compatibility: mountains of x86‑only binaries, drivers, plugins, and legacy line‑of‑business apps.
- Tooling & ecosystem inertia: build systems, CI images, container bases, and ops runbooks are all tuned for x86.
- Certain niche perf stacks: some HPC/finance/media stacks lean on AVX/AVX‑512 and hand‑tuned x86 code paths.
- Vendor/platform lock‑in worries: orgs hesitate to revalidate everything on a new arch without a compelling TCO win.
But the trend is real
- Mobile is almost entirely ARM, Macs moved, and major clouds offer ARM instances. New Windows‑on‑ARM devices and better emulation/translation reduce friction. It’s a stepwise migration, not a flip.
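To make the "hand‑tuned x86 code paths" point concrete, here is a minimal sketch of what architecture-specific SIMD code tends to look like in C. It is an illustration, not taken from any particular codebase: `sum_i32` is a hypothetical name, and SSE2 stands in for AVX/AVX‑512 on the x86 side because it is available on every x86‑64 CPU. Each path has to be written, tested, and tuned separately, which is part of why a port takes real effort.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (x86) */
#elif defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON intrinsics (ARM) */
#endif

/* Hypothetical helper: sum n 32-bit integers, four lanes at a time
 * where a SIMD path exists. Assumes the total fits in int32_t. */
int32_t sum_i32(const int32_t *a, size_t n) {
    size_t i = 0;
    int32_t total = 0;
#if defined(__SSE2__)
    /* x86 path: 128-bit integer adds via SSE2. */
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(a + i)));
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#elif defined(__ARM_NEON)
    /* ARM path: the same idea, spelled with NEON intrinsics. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4)
        acc = vaddq_s32(acc, vld1q_s32(a + i));
    int32_t lanes[4];
    vst1q_s32(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail; on targets with neither extension this is the whole loop. */
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

The scalar fallback keeps the function portable, but the fast paths share no code, so a codebase with many such kernels carries one hand-written variant per architecture.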
The same C program on x86‑64 vs ARM64
Here’s a tiny C function and typical (simplified) compiler output at `-O2`. The logic is identical; the instruction “spelling” differs.
C (source)
// sum.c
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
x86‑64 System V ABI (Linux/macOS) – typical -O2 style
# rdi = a (pointer), esi = n
sum:
    xor   eax, eax                          # s = 0 (return reg)
    xor   edx, edx                          # i = 0
    test  esi, esi
    jle   .Ldone                            # skip the loop when n <= 0
.Lloop:
    add   eax, DWORD PTR [rdi + rdx*4]      # s += a[i]
    inc   edx                               # i++
    cmp   edx, esi
    jl    .Lloop
.Ldone:
    ret
ARM64/AArch64 (AAPCS64) – typical -O2 style
// x0 = a (pointer), w1 = n
sum:
    mov   w2, #0                    // i = 0
    mov   w0, #0                    // s = 0 (return reg)
    cmp   w1, #0
    b.le  .Ldone                    // skip the loop when n <= 0
.Lloop:
    ldr   w3, [x0, x2, lsl #2]      // load a[i]
    add   w0, w0, w3                // s += a[i]
    add   w2, w2, #1                // i++
    cmp   w2, w1
    b.lt  .Lloop
.Ldone:
    ret
What to notice
- Different register naming and calling conventions (x86: `rdi`/`esi`/`eax`; ARM: `x0`/`w1`/`w0`).
- x86 uses a rich memory operand inside `add`; ARM uses explicit `ldr` then `add` (classic load/store).
- Same macro behavior, different instruction sequences at the micro level.
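One way to check the "same macro behavior" point yourself is to build the same file for both targets and confirm the result matches. The sketch below repeats `sum` from above and adds a hypothetical helper, `target_arch`, that is not part of the original example; it relies on the predefined macros `__x86_64__` and `__aarch64__` that GCC and Clang set (with `_M_X64` and `_M_ARM64` as the MSVC equivalents).

```c
#include <stddef.h>

/* Same function as in sum.c above. */
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical helper: report which ISA this file was compiled for,
 * based on compiler-predefined macros. */
const char *target_arch(void) {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#else
    return "other";
#endif
}
```

With GCC, `gcc -O2 -S sum.c` writes the generated assembly to `sum.s` (add `-masm=intel` on x86 to get the Intel syntax used above). A cross compiler such as `aarch64-linux-gnu-gcc`, if you have one installed, does the same for ARM64. Exact output varies by compiler and version, so expect the listing to differ in detail from the simplified ones shown here.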