Instruction Set Comparison

x86 is CISC (Complex Instruction Set Computer): many of its instructions do several things at once, for example reading a value from memory and adding it to a register in a single instruction (`add eax, [rdi]`).
ARM is RISC (Reduced Instruction Set Computer): it uses simpler instructions that usually do one thing at a time, and you chain them together to get the same effect.
At the macro level (run a web browser, control a drone, process an image), both architectures are fully capable.
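At the micro level (per instruction), the sequences differ. Sometimes ARM needs more instructions for the same work, but its fixed-length 32-bit instructions are simpler to decode, which helps designers build wide cores that execute many of them in parallel. Modern x86 cores narrow this gap by splitting complex instructions into simpler internal micro-ops.
Historically, ARM designs prioritised low power and x86 designs prioritised peak performance. That split is a property of particular chips rather than of the instruction sets themselves: recent ARM processors such as Apple's M-series and AWS Graviton compete with x86 on performance, while x86 vendors also ship low-power parts.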
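To make the CISC/RISC contrast concrete, here is a minimal sketch of a hypothetical C helper that updates a value in memory. On x86-64, a compiler can typically fold the read, the add, and the write-back into one instruction with a memory destination. AArch64 has to spell the same work as a load, an add, and a store. The second function writes those RISC-style steps out by hand. The function names are made up for illustration, and the assembly in the comments shows typical code generation with optimisation enabled, not guaranteed output.

```c
#include <stdint.h>

/* CISC-friendly form: x86-64 compilers commonly emit one
   read-modify-write instruction, e.g. `add DWORD PTR [rdi], esi`. */
void bump(int32_t *counter, int32_t delta) {
    *counter += delta;
}

/* The same work spelled out in RISC style, one step per statement,
   roughly mirroring an AArch64 ldr / add / str sequence. */
void bump_load_store(int32_t *counter, int32_t delta) {
    int32_t tmp = *counter;   /* ldr w2, [x0]     */
    tmp = tmp + delta;        /* add w2, w2, w1   */
    *counter = tmp;           /* str w2, [x0]     */
}
```

Both functions have identical observable behavior. Only the instruction count a compiler needs differs between the two architectures.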

Operation	x86‑64	ARM64	Notes
Move immediate → reg	`mov rax, 5`	`mov x0, #5`	ARM also has `movz/movn/movk` for 16‑bit chunk moves.
Move reg → reg	`mov rbx, rax`	`mov x1, x0`	Same idea; different register names.
Load from memory	`mov rax, [rbx]`	`ldr x0, [x1]`	ARM is load/store: memory access is via `ldr/str`.
Load with complex addr	`mov rax, [rbx+rcx*4+16]`	`add x2, x1, x2, lsl #2` then `ldr x0, [x2, #16]`	x86 folds base + index*scale + displacement into one operand; ARM64 offers base + (shifted) register or base + immediate, so this takes two instructions.
Store to memory	`mov [rbp-8], rax`	`str x0, [x29, #-8]`	Frame pointers: `rbp` vs `x29`.
Add	`add rax, rbx`	`add x0, x0, x1`	ARM 3‑operand form keeps sources.
Subtract	`sub rax, rbx`	`sub x0, x0, x1`	x86 `sub` always updates RFLAGS; ARM `sub` leaves the NZCV flags untouched unless you use the flag-setting `subs` (which `cmp` is an alias of).
Multiply (int)	`imul rax, rbx`	`mul x0, x0, x1`	x86 has many `imul` forms; ARM has separate widening variants (`smull`, etc.).
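Divide (int)	`cqo ; idiv rbx`	`sdiv x0, x0, x1`	x86 takes the dividend implicitly from `rdx:rax` and leaves the quotient in `rax` and the remainder in `rdx`; ARM's 3‑operand `sdiv/udiv` return only the quotient (get the remainder with `msub`), and dividing by zero yields 0 instead of raising an exception as on x86.
Bitwise AND/OR/XOR	`and rax, rbx`	`and x0, x0, x1`	OR: `or` (x86) vs `orr` (ARM); XOR: `xor` (x86) vs `eor` (ARM).
Shifts	`shl rax, 3`	`lsl x0, x0, #3`	Logical right: `shr` (x86) vs `lsr` (ARM); arithmetic right: `sar` (x86) vs `asr` (ARM).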
Compare & branch	`cmp rax, rbx` + `je label`	`cmp x0, x1` + `b.eq label`	ARM branches use condition codes on `b.<cond>`.
Conditional move/select	`cmovz rax, rbx`	`csel x0, x1, x0, eq`	ARM `csel` picks between two registers; passing `x0` as the second source keeps its old value, matching `cmov` semantics.
Call & return	`call func` / `ret`	`bl func` / `ret`	`bl` writes return addr to `lr`/`x30`.
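Push/Pop	`push rax` / `pop rax`	`str x0, [sp, #-16]!` / `ldr x0, [sp], #16`	ARM has no dedicated push/pop; pre/post-indexed stores and loads do the job. Because `sp` must stay 16-byte aligned, prologues usually save pairs: `stp x29, x30, [sp, #-16]!` / `ldp x29, x30, [sp], #16`.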
Load effective address	`lea rax, [rbx+8]`	`add x0, x1, #8` or `adr/adrp x0, label`	`lea` does address calc; ARM composes with `add`/`adr`.
Function args (ABI)	SysV: `rdi,rsi,rdx,rcx,r8,r9`	AAPCS64: `x0–x7`	Extra args spill to stack; caller/callee‑saved sets differ.
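Several rows of the table correspond to everyday C. The sketch below groups a few of them: a quotient/remainder pair (`idiv` vs `sdiv` + `msub`), a ternary that compilers commonly turn into `cmov`/`csel`, logical versus arithmetic right shifts, and a nine-argument call that runs out of argument registers on both ABIs. The function and type names are hypothetical. The instruction comments describe typical optimised code generation rather than a guarantee, so inspecting your own compiler's output (for example with `-S`) is the reliable check.

```c
#include <stdint.h>

/* Quotient and remainder. x86-64 `idiv` produces both (rax, rdx);
   AArch64 `sdiv` gives only the quotient, so compilers typically add
   `msub` (r = a - q*b). Caller must ensure b != 0 and not INT64_MIN / -1. */
typedef struct { int64_t q, r; } divmod_t;

divmod_t divmod(int64_t a, int64_t b) {
    divmod_t out;
    out.q = a / b;            /* sdiv x2, x0, x1                      */
    out.r = a - out.q * b;    /* msub x3, x2, x1, x0 (equals a % b)   */
    return out;
}

/* Branch-free select: commonly compiled to cmp + cmov on x86-64
   and cmp + csel on AArch64. */
int64_t min64(int64_t a, int64_t b) {
    return (a < b) ? a : b;
}

/* Right shifts (n must be < 64). Unsigned values use a logical shift
   (shr / lsr). For negative signed values the C standard leaves the result
   implementation-defined; GCC and Clang use an arithmetic shift (sar / asr). */
uint64_t shift_right_logical(uint64_t x, unsigned n) { return x >> n; }
int64_t  shift_right_arith(int64_t x, unsigned n)    { return x >> n; }

/* Nine integer arguments: System V passes a..f in rdi, rsi, rdx, rcx, r8, r9
   and g, h, i on the stack; AAPCS64 passes a..h in x0-x7 and only i on the stack. */
int64_t sum9(int64_t a, int64_t b, int64_t c, int64_t d, int64_t e,
             int64_t f, int64_t g, int64_t h, int64_t i) {
    return a + b + c + d + e + f + g + h + i;
}
```

C division truncates toward zero, so `a - q*b` always matches `a % b`. That is the same identity ARM's `msub` relies on.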

Why aren’t we “all‑ARM” yet?

What ARM does well
- Performance per watt: excellent efficiency → laptops, mobile, dense servers.
- Integration: easy to build SoC designs (CPU + GPU + I/O on one die).
- Modern ISA design: clean load/store model; strong compiler support; NEON/SVE vectors.

What slows a full switch
- Software and ABI compatibility: mountains of x86‑only binaries, drivers, plugins, and legacy line‑of‑business apps.
- Tooling and ecosystem inertia: build systems, CI images, container bases, ops runbooks, all tuned for x86.
- Certain niche performance stacks: some HPC/finance/media stacks lean on AVX/AVX‑512 and hand‑tuned x86 code paths.
- Vendor/platform lock‑in worries: organisations hesitate to revalidate everything on a new architecture without a compelling TCO win.
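The point about hand-tuned code paths is easier to see in code. Below is a minimal sketch, assuming GCC or Clang (which predefine `__x86_64__` and `__aarch64__`), of one array sum with an SSE2 path, a NEON path, and a portable scalar fallback. `sum_i32` and `sum_path` are hypothetical names. SSE2 is part of every x86-64 CPU and NEON is enabled by default on mainstream AArch64 targets, so no runtime detection is needed here. An AVX or AVX-512 path would need runtime CPU checks, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>      /* SSE2 intrinsics */
#define SUM_PATH "x86-64 SSE2"
#elif defined(__aarch64__)
#include <arm_neon.h>       /* NEON (Advanced SIMD) intrinsics */
#define SUM_PATH "AArch64 NEON"
#else
#define SUM_PATH "portable scalar"
#endif

/* Reports which code path this build compiled in. */
const char *sum_path(void) { return SUM_PATH; }

int32_t sum_i32(const int32_t *a, size_t n) {
    size_t i = 0;
    int32_t s = 0;
#if defined(__x86_64__)
    __m128i acc = _mm_setzero_si128();                 /* 4 x int32 lanes */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(a + i)));
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);           /* spill lanes, add them */
    s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#elif defined(__aarch64__)
    int32x4_t acc = vdupq_n_s32(0);                    /* 4 x int32 lanes */
    for (; i + 4 <= n; i += 4)
        acc = vaddq_s32(acc, vld1q_s32(a + i));
    s = vaddvq_s32(acc);                               /* AArch64 horizontal add */
#endif
    for (; i < n; i++)      /* scalar tail (or the whole array on other targets) */
        s += a[i];
    return s;
}
```

Each vector path has to be written, tested, and tuned separately. Multiplied across a large codebase, that duplicated effort is a real part of the migration cost.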

But the trend is real
- Virtually all smartphones run on ARM, Apple has moved the Mac line to its own ARM-based chips, and major clouds offer ARM instances.
- New Windows‑on‑ARM devices and better emulation/translation reduce friction.
- It’s a stepwise migration, not a flip.

The same C program on x86‑64 vs ARM64

Here’s a tiny C function and typical, hand-simplified assembly in the style a compiler produces at `-O2` (real compilers may unroll or vectorise this loop). The logic is identical; the instruction “spelling” differs.

C (source)

// sum.c 
int sum(const int *a, int n) {     
    int s = 0;     
    for (int i = 0; i < n; i++) 
        s += a[i];     
    return s; 
}

x86‑64 System V ABI (Linux/macOS) – typical `-O2` style

# rdi = a (pointer), esi = n 
sum:
    xor     eax, eax            # s = 0   (return reg)     
    xor     edx, edx            # i = 0     
    test    esi, esi     
    jle     .Ldone 
.Lloop:     
    add     eax, DWORD PTR [rdi + rdx*4]   # s += a[i]     
    inc     edx                              # i++     
    cmp     edx, esi     
    jl      .Lloop 
.Ldone:     
    ret

ARM64/AArch64 (AAPCS64) – typical `-O2` style

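// x0 = a (pointer), w1 = n
sum:
    mov     x2, x0                 // keep a in x2: w0 is also the return register
    mov     w0, #0                 // s = 0   (return reg)
    cmp     w1, #0
    b.le    .Ldone                 // n <= 0: return 0 without touching a[0]
    mov     w3, #0                 // i = 0
.Lloop:
    ldr     w4, [x2, w3, uxtw #2]  // load a[i]
    add     w0, w0, w4             // s += a[i]
    add     w3, w3, #1             // i++
    cmp     w3, w1
    b.lt    .Lloop
.Ldone:
    ret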

What to notice
- Different register naming and calling conventions (x86: `rdi`/`esi`/`eax`; ARM: `x0`/`w1`/`w0`).
- x86 uses a rich memory operand inside `add`; ARM uses an explicit `ldr` then `add` (classic load/store).
- Same macro behavior, different instruction sequences.
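Hand-simplified listings are easy to get subtly wrong, especially at the loop guard. The condition `i < n` must be false for every `n <= 0`, so a guard that only tests `n == 0` (for example a bare `cbz`) would still run the loop once for negative `n`. One way to catch this is to compare any implementation against the C source. The sketch below assumes the assembly version is linked in under a different symbol name so it does not clash with the C one. `sum_ref` and `sum_matches_reference` are hypothetical helpers, not part of the original program.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reference implementation: the C function from sum.c, renamed. */
int sum_ref(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Runs any implementation with the same signature against sum_ref,
   including n == 0 and negative n, where the result must be 0. */
bool sum_matches_reference(int (*impl)(const int *, int)) {
    static const int data[] = {3, -1, 4, 1, -5, 9, 2, -6};
    const int len = (int)(sizeof data / sizeof data[0]);
    for (int n = -3; n <= len; n++) {
        if (impl(data, n) != sum_ref(data, n))
            return false;   /* mismatch for this n */
    }
    return true;
}
```

A listing with only a `cbz` guard would fail at `n = -3`. It would return `data[0]` (3) instead of 0.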
