Understanding Binary Arithmetic — Level 3: Hardware & Optimization

Dive deep into hardware-level binary operations. Learn how CPUs perform addition, understand bitwise manipulation, and explore optimization strategies.

Author

Mr. Oz

Date

Read

12 mins

Level 3

[Figure: Technical schematic showing CPU ALU and binary circuit design]


We've seen how to implement binary addition in software. Now let's understand what happens at the hardware level. How does a CPU actually add two numbers? What are the primitive operations that make all computation possible?

The ALU: The CPU's Calculator

The Arithmetic Logic Unit (ALU) is the part of the CPU that performs mathematical operations. At its heart, it's a collection of logic gates that implement binary arithmetic.

Basic Logic Gates

All binary operations reduce to four fundamental gates:

  • AND: Output 1 only if both inputs are 1
  • OR: Output 1 if at least one input is 1
  • XOR: Output 1 if inputs are different (exclusive or)
  • NOT: Invert the input (0→1, 1→0)

AND: 0&0=0 0&1=0 1&0=0 1&1=1
OR: 0|0=0 0|1=1 1|0=1 1|1=1
XOR: 0^0=0 0^1=1 1^0=1 1^1=0
NOT: ~0=1 ~1=0

The Half Adder

A half adder adds two single bits and produces a sum and carry:

  • Sum = A XOR B (different inputs → 1)
  • Carry = A AND B (both 1 → carry)

A | B | Sum | Carry
--+---+-----+------
0 | 0 |  0  |   0
0 | 1 |  1  |   0
1 | 0 |  1  |   0
1 | 1 |  0  |   1

Two gates, two inputs, two outputs. This is the atomic unit of addition!
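The two-gate structure can be sketched directly in C, modeling each bit as an int (an illustrative sketch, not hardware description code):

```c
/* Half adder: two input bits, a sum bit and a carry bit out. */
void half_adder(int a, int b, int *sum, int *carry) {
    *sum   = a ^ b;  /* XOR: 1 when the inputs differ */
    *carry = a & b;  /* AND: 1 only when both inputs are 1 */
}
```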

The Full Adder

A full adder adds two bits AND a carry-in from the previous position:

  • Sum = A XOR B XOR Carry_in
  • Carry_out = (A AND B) OR (Carry_in AND (A XOR B))

A | B | Cin | Sum | Cout
--+---+-----+-----+-----
0 | 0 |  0  |  0  |  0
0 | 0 |  1  |  1  |  0
0 | 1 |  0  |  1  |  0
0 | 1 |  1  |  0  |  1
1 | 0 |  0  |  1  |  0
1 | 0 |  1  |  0  |  1
1 | 1 |  0  |  0  |  1
1 | 1 |  1  |  1  |  1

Notice: when all three are 1, sum is 1 and carry is 1. That's 1+1+1=3, which is binary 11!
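The two formulas above translate line for line into C (again a sketch with one int per bit):

```c
/* Full adder: two input bits plus carry-in, sum and carry-out. */
void full_adder(int a, int b, int cin, int *sum, int *cout) {
    int p = a ^ b;                /* partial sum: A XOR B */
    *sum  = p ^ cin;              /* A XOR B XOR Cin */
    *cout = (a & b) | (cin & p);  /* carry generated, or propagated */
}
```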

Ripple Carry Adder

To add multi-bit numbers, we chain full adders together. Each adder's carry-out connects to the next adder's carry-in:

A[3] B[3]      A[2] B[2]      A[1] B[1]      A[0] B[0]
  ↓   ↓          ↓   ↓          ↓   ↓          ↓   ↓
[Full Adder] ← [Full Adder] ← [Full Adder] ← [Half Adder]
  ↓    Cout      ↓    Cout      ↓    Cout      ↓   (no Cin)
 S[3]           S[2]           S[1]           S[0]

The problem: The carry must "ripple" through all stages. In the worst case (adding 111...1 + 1), the carry propagates through every adder. This is slow for large numbers!
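The chain is easy to see in software. Here is a minimal 4-bit ripple-carry adder (my own illustrative sketch; bit arrays with index 0 as the least significant bit, which is an assumption of this example):

```c
/* 4-bit ripple-carry adder: the carry variable "ripples" through
   the loop exactly as it ripples through the chained full adders. */
int ripple_add4(const int a[4], const int b[4], int s[4]) {
    int carry = 0;
    for (int i = 0; i < 4; i++) {
        int p = a[i] ^ b[i];               /* propagate */
        s[i]  = p ^ carry;                 /* sum bit */
        carry = (a[i] & b[i]) | (carry & p);
    }
    return carry;  /* final carry-out */
}
```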

Bitwise Operations in Software

Programming languages expose these hardware operations through bitwise operators:

Adding with XOR and AND

// Add without the + operator (demonstrates the hardware principle)
int add(int a, int b) {
    while (b != 0) {
        int sum = a ^ b;           // XOR: add without carry
        int carry = (a & b) << 1;  // AND + shift: find and move carry
        a = sum;
        b = carry;
    }
    return a;
}

(Strictly portable C would use unsigned int here, since left-shifting a negative value is undefined behavior; the logic is the same.)

How it works:

  • a ^ b gives the sum ignoring carries (0+0=0, 0+1=1, 1+0=1, 1+1=0)
  • a & b finds where both are 1 (where we need to carry)
  • << 1 shifts the carry to the next position
  • Loop until no more carries
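Tracing add(5, 3) makes the loop concrete (4-bit values shown for readability):

a=0101, b=0011 → sum=0110, carry=0010
a=0110, b=0010 → sum=0100, carry=0100
a=0100, b=0100 → sum=0000, carry=1000
a=0000, b=1000 → sum=1000, carry=0000 → return 1000 (8)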

Optimization: Carry Lookahead Adder

Modern CPUs use carry lookahead to compute carries in parallel rather than waiting for them to ripple:

  • Generate (G): A AND B — this position generates a carry
  • Propagate (P): A XOR B — this position propagates an incoming carry

With these signals, we can compute whether position i will have a carry without waiting for previous positions to compute theirs. This reduces addition time from O(n) to O(log n) gate delays.
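The key idea is that each carry has a closed-form expression in the G and P signals: c1 = G0 | P0·c0, c2 = G1 | P1·G0 | P1·P0·c0, and so on. A minimal 4-bit sketch (illustrative only; real hardware evaluates these product terms as parallel gate networks, not sequential C code):

```c
/* 4-bit carry lookahead: every carry is computed directly from the
   generate/propagate signals, not from the previous carry. */
void lookahead_carries(const int a[4], const int b[4], int c0, int c[5]) {
    int G[4], P[4];
    for (int i = 0; i < 4; i++) {
        G[i] = a[i] & b[i];  /* generate: this position creates a carry */
        P[i] = a[i] ^ b[i];  /* propagate: this position passes one on */
    }
    c[0] = c0;
    c[1] = G[0] | (P[0] & c[0]);
    c[2] = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c[0]);
    c[3] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
                | (P[2] & P[1] & P[0] & c[0]);
    c[4] = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
                | (P[3] & P[2] & P[1] & G[0])
                | (P[3] & P[2] & P[1] & P[0] & c[0]);
}
```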

Expert insight: Intel and AMD CPUs use hybrid approaches: carry lookahead for most bits, with carry select for the final stages. This achieves single-cycle addition for 64-bit numbers.

Practical Bitwise Tricks

1. Check if a number is even/odd

bool isEven = (n & 1) == 0;  // Check least significant bit

2. Multiply/divide by powers of 2

int doubled = n << 1;   // Multiply by 2
int halved  = n >> 1;   // Divide by 2 (integer; for negative n, >> in C
                        // is implementation-defined, usually arithmetic)
int times8  = n << 3;   // Multiply by 8

3. Swap without temporary

a ^= b;  // a = a XOR b
b ^= a;  // b = b XOR (a XOR b) = original a
a ^= b;  // a = (a XOR b) XOR original a = original b

4. Count set bits (population count)

// Brian Kernighan's algorithm
int countBits(int n) {
    int count = 0;
    while (n) {
        n &= (n - 1);  // Clear the least significant set bit
        count++;
    }
    return count;
}

Two's Complement: Negative Numbers

Computers represent negative numbers using two's complement:

  1. Flip all bits (NOT operation)
  2. Add 1

Example: -5 in 8-bit two's complement
5 = 00000101
~5 = 11111010 (flip bits)
-5 = 11111011 (add 1)

This representation makes subtraction identical to addition: a - b = a + (-b). The same adder circuit works for both!
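The identity is easy to check in code. A minimal sketch using uint8_t to fix the width at 8 bits, matching the example above:

```c
#include <stdint.h>

/* Subtraction reuses addition: a - b == a + (~b + 1).
   uint8_t keeps everything in 8 bits, so wraparound is well-defined. */
uint8_t sub8(uint8_t a, uint8_t b) {
    uint8_t neg_b = (uint8_t)(~b + 1);  /* two's complement of b */
    return (uint8_t)(a + neg_b);
}
```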

Overflow Detection

When adding signed numbers, overflow occurs if:

  • Two positive numbers produce a negative result
  • Two negative numbers produce a positive result

// Detect overflow in addition (sum is the already-computed result;
// note that in C, signed overflow during a + b itself is undefined behavior)
bool overflow = (a > 0 && b > 0 && sum < 0) ||
                (a < 0 && b < 0 && sum > 0);

Modern CPUs have an overflow flag in the status register that's automatically set.
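A common branch-free variant (not from the check above, but a well-known idiom) tests sign bits directly: overflow occurred exactly when a and b agree in sign and the sum disagrees. Doing the arithmetic in unsigned also sidesteps C's undefined behavior on signed overflow:

```c
#include <stdint.h>

/* Branch-free signed-overflow check for 32-bit addition. */
int add_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;  /* wraps modulo 2^32, well-defined */
    /* Sign bit of the result is set iff both operands differ from sum
       in their sign bit, i.e. a and b shared a sign that sum lost. */
    return (int32_t)((ua ^ sum) & (ub ^ sum)) < 0;
}
```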

Real-World Performance

Addition is one of the fastest CPU operations:

Operation            | Typical Latency | Throughput
---------------------+-----------------+-------------------
Integer Add (64-bit) | 1 cycle         | 4 per cycle
Integer Multiply     | 3-4 cycles      | 1 per cycle
Integer Divide       | 10-60 cycles    | 1 per 10-60 cycles
Bitwise AND/OR/XOR   | 1 cycle         | 4 per cycle

Addition and bitwise operations are essentially free compared to multiplication and division!

SIMD: Adding Multiple Numbers at Once

Modern CPUs support SIMD (Single Instruction, Multiple Data) operations that add multiple numbers simultaneously:

// AVX2: Add eight 32-bit integers in one instruction
__m256i a = _mm256_loadu_si256((__m256i*)array_a);
__m256i b = _mm256_loadu_si256((__m256i*)array_b);
__m256i sum = _mm256_add_epi32(a, b);  // 8 additions at once!
_mm256_storeu_si256((__m256i*)result, sum);

This is why modern computers are so fast at operations like image processing — they perform 8, 16, or even 32 additions per clock cycle!


Key Takeaways

  • Binary addition is built from logic gates: XOR for sum, AND for carry
  • Full adders chain together for multi-bit addition, with carries rippling through
  • Carry lookahead enables parallel carry computation for faster addition
  • Bitwise operations in software directly map to hardware gates
  • Two's complement makes subtraction work with the same adder circuit
  • SIMD allows adding multiple numbers in a single CPU instruction
