- x86 is a CISC (Complex Instruction Set Computer) architecture: some instructions do a lot in one go (e.g., a fused multiply-add that loads one operand from memory, multiplies, and adds into a register in a single instruction).
- ARM is a RISC (Reduced Instruction Set Computer) architecture: it uses simpler instructions that usually do one thing at a time, and you chain them together to get the same effect.
- At the macro level (run a web browser, control a drone, process an image), both architectures are fully capable.
- At the micro level (per instruction), the sequences differ: sometimes ARM needs more instructions, and sometimes it is faster because simpler, fixed-length instructions are easier to decode and execute in parallel.
- ARM has a simpler instruction set, and ARM chips have traditionally been designed for low power draw rather than peak performance.
- x86 has a more complex instruction set, and x86 chips have traditionally been designed for peak performance. In both cases, speed and power depend more on the specific chip design than on the instruction set itself.

| Operation | x86‑64 | ARM64 | Notes |
|---|---|---|---|
| Move immediate → reg | mov rax, 5 | mov x0, #5 | ARM also has movz/movn/movk for 16‑bit chunk moves. |
| Move reg → reg | mov rbx, rax | mov x1, x0 | Same idea; different register names. |
| Load from memory | mov rax, [rbx] | ldr x0, [x1] | ARM is load/store: memory access is via ldr/str. |
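| Load with complex addr | mov rax, [rbx+rcx*4+16] | add x3, x1, x2, lsl #2 then ldr x0, [x3, #16] | x86 folds base + index*scale + offset into one operand; ARM allows register‑offset or immediate‑offset addressing, not both at once, so it computes part of the address first. |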
| Store to memory | mov [rbp-8], rax | str x0, [x29, #-8] | Frame pointers: rbp vs x29. |
| Add | add rax, rbx | add x0, x0, x1 | ARM 3‑operand form keeps sources. |
| Subtract | sub rax, rbx | sub x0, x0, x1 | x86 sub always updates RFLAGS; ARM sub leaves NZCV untouched unless you use the flag‑setting form subs. |
| Multiply (int) | imul rax, rbx | mul x0, x0, x1 | x86 has many imul forms; ARM has separate widening variants (smull, etc.). |
| Divide (int) | cqo ; idiv rbx | sdiv x0, x0, x1 | x86 uses implicit dividend in rax/rdx; ARM uses 3‑operand sdiv/udiv. |
| Bitwise AND/OR/XOR | and rax, rbx | and x0, x0, x1 | OR: or (x86) vs orr (ARM). |
| Shifts | shl rax, 3 | lsl x0, x0, #3 | Arithmetic right: sar (x86) vs asr (ARM). |
| Compare & branch | cmp rax, rbx + je label | cmp x0, x1 + b.eq label | ARM branches use condition codes on b.<cond>. |
| Conditional move/select | cmovz rax, rbx | csel x0, x1, x2, eq | ARM uses csel to pick between two regs. |
| Call & return | call func / ret | bl func / ret | bl writes return addr to lr/x30. |
| Push/Pop | push rax / pop rax | stp x29, x30, [sp, #-16]! / ldp x29, x30, [sp], #16 | ARM uses paired stores/loads; no single‑instr push/pop. |
| Load effective address | lea rax, [rbx+8] | add x0, x1, #8 or adr/adrp x0, label | lea does address calc; ARM composes with add/adr. |
| Function args (ABI) | SysV: rdi,rsi,rdx,rcx,r8,r9 | AAPCS64: x0–x7 | Extra args spill to stack; caller/callee‑saved sets differ. |
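
To connect a few rows of the table to source code, here is a minimal C sketch. The function names (`div_mod`, `max_i64`, `load_scaled`, `sum8`) are made up for illustration, and the instruction sequences in the comments are what a compiler might plausibly emit at -O2, not guaranteed output; actual code depends on the compiler and its version.

```c
#include <assert.h>
#include <stdint.h>

/* Row "Divide (int)": quotient and remainder.
 * x86-64: a single idiv leaves the quotient in rax and the remainder in rdx.
 * AArch64: sdiv yields only the quotient; the remainder is commonly
 * recovered with msub (r = a - q * b). */
void div_mod(int64_t a, int64_t b, int64_t *q, int64_t *r) {
    assert(b != 0);              /* division by zero is undefined in C */
    *q = a / b;
    *r = a % b;
}

/* Row "Conditional move/select": pick one of two values without a branch.
 * Compilers commonly lower this to cmp + cmov (x86-64) or cmp + csel (AArch64). */
int64_t max_i64(int64_t a, int64_t b) {
    return a > b ? a : b;
}

/* Row "Load with complex addr": address = base + index*4 + 16 bytes.
 * x86-64 can fold it into one operand, e.g. mov eax, [rdi+rsi*4+16].
 * AArch64 typically forms part of the address first, e.g.
 *   add x2, x0, x1, lsl #2   then   ldr w0, [x2, #16]. */
int32_t load_scaled(const int32_t *base, int64_t index) {
    return base[index + 4];      /* 4 elements * 4 bytes = 16-byte offset */
}

/* Row "Function args (ABI)": eight integer arguments.
 * SysV x86-64 passes the first six in rdi, rsi, rdx, rcx, r8, r9 and the
 * remaining two on the stack; AAPCS64 passes all eight in x0-x7. */
int64_t sum8(int64_t a, int64_t b, int64_t c, int64_t d,
             int64_t e, int64_t f, int64_t g, int64_t h) {
    return a + b + c + d + e + f + g + h;
}
```

The C source is identical for both targets; only the compiler back end decides which of the sequences above (or some other one) to emit.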

Why Aren’t We “All‑ARM” Yet?
What ARM does well
- Performance per watt: excellent efficiency → laptops, mobile, dense servers.
- Integration: easy to build SoC designs (CPU + GPU + IO on one die).
- Modern ISA design: clean load/store model; strong compiler support; NEON/SVE vectors.
What slows a full switch
- Software & ABI compatibility: mountains of x86‑only binaries, drivers, plugins, and legacy line‑of‑business apps.
- Tooling & ecosystem inertia: build systems, CI images, container bases, ops runbooks—all tuned for x86.
- Certain niche perf stacks: some HPC/finance/media stacks lean on AVX/AVX‑512 and hand‑tuned x86 code paths.
- Vendor/platform lock‑in worries: orgs hesitate to revalidate everything on a new arch without a compelling TCO win.
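
One reason hand‑tuned code paths are sticky: portable projects usually select them at compile time with the compiler's predefined architecture macros, and every tuned path needs a counterpart (or a fallback) on the new architecture. The sketch below shows the pattern. The macros `__x86_64__`, `_M_X64`, `__aarch64__`, and `_M_ARM64` are real predefined macros in GCC/Clang and MSVC; the names `TARGET_ARCH_NAME`, `target_arch_name`, and `dot_product` are hypothetical, and the tuned paths themselves are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a label for the architecture this file is being compiled for. */
#if defined(__x86_64__) || defined(_M_X64)
#  define TARGET_ARCH_NAME "x86-64"
#elif defined(__aarch64__) || defined(_M_ARM64)
#  define TARGET_ARCH_NAME "arm64"
#else
#  define TARGET_ARCH_NAME "other"
#endif

const char *target_arch_name(void) {
    return TARGET_ARCH_NAME;
}

/* Dot product with slots for per-architecture tuned paths.
 * In this sketch every target falls through to the portable loop. */
int dot_product(const int *a, const int *b, int n) {
    assert(n <= 0 || (a != NULL && b != NULL));
#if defined(__x86_64__) || defined(_M_X64)
    /* An AVX2 / AVX-512 implementation would go here (omitted). */
#elif defined(__aarch64__) || defined(_M_ARM64)
    /* A NEON / SVE implementation would go here (omitted). */
#endif
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

A codebase with only the x86 branch filled in still builds on ARM64, but it silently runs the slow portable loop there, which is part of why revalidation takes real effort.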
But the trend is real
- Mobile is almost entirely ARM, Macs have moved to ARM‑based Apple silicon, and major clouds offer ARM instances. New Windows‑on‑ARM devices and better emulation/translation reduce friction. It’s a stepwise migration, not a flip.

The Same C Program on x86‑64 vs ARM64
Here’s a tiny C function and simplified, hand‑tidied assembly in the style of scalar -O2 compiler output (real compilers may unroll or vectorize this loop). The logic is identical; the instruction “spelling” differs.
C (source)

    // sum.c
    int sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

x86‑64 System V ABI (Linux/macOS) – Typical -O2 Style

    # rdi = a (pointer), esi = n
    sum:
        xor eax, eax                        # s = 0 (return reg)
        xor edx, edx                        # i = 0
        test esi, esi
        jle .Ldone
    .Lloop:
        add eax, DWORD PTR [rdi + rdx*4]    # s += a[i]
        inc edx                             # i++
        cmp edx, esi
        jl .Lloop
    .Ldone:
        ret

ARM64/AArch64 (AAPCS64) – Typical -O2 Style

    // x0 = a (pointer), w1 = n
    sum:
        mov w2, #0                  // i = 0 (writing w2 also zeroes the upper half of x2)
        mov w0, #0                  // s = 0 (return reg)
        cmp w1, #0
        b.le .Ldone                 // skip the loop when n <= 0
    .Lloop:
        ldr w3, [x0, x2, lsl #2]    // load a[i]
        add w0, w0, w3              // s += a[i]
        add w2, w2, #1              // i++
        cmp w2, w1
        b.lt .Lloop
    .Ldone:
        ret

What to notice
- Different register naming and calling conventions (x86: `rdi`/`esi`/`eax`; ARM: `x0`/`w1`/`w0`).
- x86 uses a rich memory operand inside `add`; ARM uses an explicit `ldr` then `add` (classic load/store).
- Same macro behavior, different micro-instruction sequences.
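
To check that both builds really behave the same, a small self-check like the sketch below can be compiled and run unchanged on each architecture. `check_sum` is a hypothetical helper, not part of the original `sum.c`, and `sum` is repeated here only so the file compiles on its own.

```c
#include <assert.h>

/* Same function as sum.c, repeated so this file is self-contained. */
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical helper: asserts a few results that must hold on any target. */
void check_sum(void) {
    const int data[] = {3, -1, 4, 1, -5, 9};
    const int count = (int)(sizeof data / sizeof data[0]);

    assert(sum(data, count) == 11);  /* 3 - 1 + 4 + 1 - 5 + 9 */
    assert(sum(data, 1) == 3);       /* single element */
    assert(sum(data, 0) == 0);       /* empty range: loop body never runs */
    assert(sum(data, -3) == 0);      /* negative n: loop body never runs */
}
```

To look at the assembly your own toolchain produces, GCC and Clang both accept `-O2 -S` (for example `gcc -O2 -S sum.c`); for ARM64 output on an x86 machine, a cross-compiler such as `aarch64-linux-gnu-gcc` is one option, assuming it is installed. Expect the real output to differ in detail from the simplified listings above.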