400 Software Development


Instruction Set Comparison

  • x86 is CISC (Complex Instruction Set Computer): some instructions do a lot in one go (e.g., a fused multiply-add such as vfmadd231sd with a memory operand loads from memory, multiplies, and adds into a register in a single instruction).
  • ARM is RISC (Reduced Instruction Set Computer): it uses simpler instructions that usually do one thing at a time, and you chain them together to get the same effect.
  • At the macro level (run a web browser, control a drone, process an image), both architectures are fully capable.
  • At the micro level (per-instruction), the sequences differ — sometimes ARM needs more steps, sometimes it’s faster because each step is simpler and can be highly parallelised.

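  • ARM has a simpler instruction set, and ARM designs have historically prioritised low power use over peak performance.

  • x86 has a more complex instruction set, and x86 designs have historically prioritised peak performance over power use.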

| Operation | x86-64 | ARM64 | Notes |
| --- | --- | --- | --- |
| Move immediate → reg | mov rax, 5 | mov x0, #5 | ARM also has movz/movn/movk for 16-bit chunk moves. |
| Move reg → reg | mov rbx, rax | mov x1, x0 | Same idea; different register names. |
| Load from memory | mov rax, [rbx] | ldr x0, [x1] | ARM is load/store: memory access is via ldr/str. |
| Load with complex addr | mov rax, [rbx+rcx*4+16] | add x3, x1, x2, lsl #2 then ldr x0, [x3, #16] | x86 has richer single-instruction addressing; ARM offers base+register or base+immediate, so it often composes the address first. |
| Store to memory | mov [rbp-8], rax | str x0, [x29, #-8] | Frame pointers: rbp vs x29. |
| Add | add rax, rbx | add x0, x0, x1 | ARM 3-operand form keeps sources. |
| Subtract | sub rax, rbx | sub x0, x0, x1 | x86 sub always updates RFLAGS; ARM updates NZCV only with the flag-setting form subs. |
| Multiply (int) | imul rax, rbx | mul x0, x0, x1 | x86 has many imul forms; ARM has separate widening variants (smull, etc.). |
| Divide (int) | cqo ; idiv rbx | sdiv x0, x0, x1 | x86 uses implicit dividend in rax/rdx; ARM uses 3-operand sdiv/udiv. |
| Bitwise AND/OR/XOR | and rax, rbx | and x0, x0, x1 | OR: or (x86) vs orr (ARM). XOR: xor (x86) vs eor (ARM). |
| Shifts | shl rax, 3 | lsl x0, x0, #3 | Arithmetic right: sar (x86) vs asr (ARM). |
| Compare & branch | cmp rax, rbx + je label | cmp x0, x1 + b.eq label | ARM branches use condition codes on b.<cond>. |
| Conditional move/select | cmovz rax, rbx | csel x0, x1, x2, eq | ARM uses csel to pick between two regs. |
| Call & return | call func / ret | bl func / ret | bl writes return addr to lr/x30. |
| Push/Pop | push rax / pop rax | stp x29, x30, [sp, #-16]! / ldp x29, x30, [sp], #16 | ARM uses paired stores/loads; no single-instruction push/pop. |
| Load effective address | lea rax, [rbx+8] | add x0, x1, #8 or adr/adrp x0, label | lea does address calc; ARM composes with add/adr. |
| Function args (ABI) | SysV: rdi, rsi, rdx, rcx, r8, r9 | AAPCS64: x0–x7 | Extra args spill to stack; caller/callee-saved sets differ. |
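
To see a few rows of the table in practice, one option is to compile some one-line C functions at -O2 on each target and compare the generated assembly. The sketch below is illustrative only: the function names are made up for this example, and the instructions named in the comments are what compilers commonly emit, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Load with complex address: base[i + 2] is an 8-byte load from
 * base + i*8 + 16. x86-64 can usually fold this into a single mov with a
 * [base + index*8 + 16] operand; ARM64 usually forms part of the address
 * with an add first, then issues an ldr. */
int64_t load_scaled(const int64_t *base, int64_t i) {
    return base[i + 2];
}

/* Conditional select: commonly cmp + cmov on x86-64 and cmp + csel on ARM64. */
int64_t select_if_equal(int64_t a, int64_t b, int64_t x, int64_t y) {
    return (a == b) ? x : y;
}

/* Signed divide: commonly cqo + idiv on x86-64 (dividend in rdx:rax) and a
 * single three-operand sdiv on ARM64. The caller must pass b != 0. */
int64_t divide(int64_t a, int64_t b) {
    assert(b != 0);
    return a / b;
}
```

With GCC or Clang, `cc -O2 -S file.c` writes the assembly to file.s; the exact output varies by compiler and version.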

Why aren’t we “all‑ARM” yet?

What ARM does well

  • Performance per watt: excellent efficiency → laptops, mobile, dense servers.
  • Integration: easy to build SoC designs (CPU + GPU + IO on one die).
  • Modern ISA design: clean load/store model; strong compiler support; NEON/SVE vectors.

What slows a full switch

  • Software & ABI compatibility: mountains of x86-only binaries, drivers, plugins, and legacy line-of-business apps.
  • Tooling & ecosystem inertia: build systems, CI images, container bases, and ops runbooks are all tuned for x86.
  • Certain niche perf stacks: some HPC/finance/media stacks lean on AVX/AVX-512 and hand-tuned x86 code paths.
  • Vendor/platform lock-in worries: orgs hesitate to revalidate everything on a new arch without a compelling TCO win.

But the trend is real

  • Phones and tablets are almost entirely ARM, Macs have moved to ARM-based Apple silicon, and major clouds offer ARM instances.
  • New Windows-on-ARM devices and better emulation/translation reduce friction.
  • It’s a stepwise migration, not a flip.
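
The "hand-tuned x86 code paths" point is easier to picture with a small example. The sketch below is hypothetical (sum4 is not taken from any real codebase): it adds four 32-bit integers using SSE2 intrinsics on x86-64, NEON intrinsics on ARM64, and plain C elsewhere. Code that ships only the first branch is the kind that has to be ported and revalidated before it can run natively on ARM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2, part of the x86-64 baseline */

int32_t sum4(const int32_t *p) {
    assert(p != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)p);   /* v = [p0 p1 p2 p3] */
    /* Swap the two halves and add: lane 0 = p0+p2, lane 1 = p1+p3. */
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    /* Swap neighbouring lanes and add: lane 0 now holds the full sum. */
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(v);                        /* extract lane 0 */
}

#elif defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>    /* NEON (Advanced SIMD), part of the AArch64 baseline */

int32_t sum4(const int32_t *p) {
    assert(p != NULL);
    return vaddvq_s32(vld1q_s32(p));   /* load 4 lanes, then add across the vector */
}

#else

/* Portable fallback; assumes the sum fits in int32_t. */
int32_t sum4(const int32_t *p) {
    assert(p != NULL);
    return p[0] + p[1] + p[2] + p[3];
}

#endif
```

Keeping a portable fallback next to each tuned path is one common way to reduce this friction, at the cost of maintaining and testing several variants.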

The same C program on x86‑64 vs ARM64

Here’s a tiny C function and typical (simplified) compiler output at -O2. The logic is identical; the instruction “spelling” differs.

C (source)

// sum.c 
int sum(const int *a, int n) {     
    int s = 0;     
    for (int i = 0; i < n; i++) 
        s += a[i];     
    return s; 
}

x86‑64 System V ABI (Linux/macOS) – typical -O2 style

# rdi = a (pointer), esi = n 
sum:
    xor     eax, eax            # s = 0   (return reg)     
    xor     edx, edx            # i = 0     
    test    esi, esi     
    jle     .Ldone 
.Lloop:     
    add     eax, DWORD PTR [rdi + rdx*4]   # s += a[i]     
    inc     edx                              # i++     
    cmp     edx, esi     
    jl      .Lloop 
.Ldone:     
    ret

ARM64/AArch64 (AAPCS64) – typical -O2 style

// x0 = a (pointer), w1 = n
sum:
    mov     w2, #0              // i = 0
    mov     w0, #0              // s = 0   (return reg)
    cmp     w1, #0
    b.le    .Ldone              // skip the loop when n <= 0
.Lloop:
    ldr     w3, [x0, x2, lsl #2]   // load a[i]
    add     w0, w0, w3             // s += a[i]
    add     w2, w2, #1             // i++
    cmp     w2, w1
    b.lt    .Lloop
.Ldone:
    ret

What to notice

  • Different register naming and calling conventions (x86: rdi/esi/eax; ARM: x0/w1/w0).
  • x86 uses a rich memory operand inside add; ARM uses an explicit ldr then add (classic load/store).
  • Same macro behavior, different instruction-level sequences.
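
Because both listings are meant to compute the same result, a quick way to gain confidence in a port is to run one set of checks on both machines. Below is a minimal sketch in C: sum is copied from the source above, while check_sum and its test values are assumptions made for this example. It includes the n = 0 and n < 0 cases, where the loop body must not run at all.

```c
#include <assert.h>
#include <stddef.h>

int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical helper: aborts via assert on the first mismatch. */
void check_sum(void) {
    static const int data[] = {3, -1, 4, 1, -5, 9};

    assert(sum(data, 6) == 11);   /* whole array: 3 - 1 + 4 + 1 - 5 + 9 */
    assert(sum(data, 1) == 3);    /* single element */
    assert(sum(data, 0) == 0);    /* empty range: loop never runs */
    assert(sum(data, -2) == 0);   /* negative count: loop never runs */
    assert(sum(NULL, 0) == 0);    /* pointer is not dereferenced when n == 0 */
}
```

Calling check_sum() from main on an x86-64 build and on an ARM64 build should pass on both with no change to the source, which is the macro-level equivalence in action.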
