Understanding Hash Tables — Hardware-Level Deep Dive

Memory layout, CPU cache performance, collision resolution internals, and when to use hash tables vs alternatives.

Author: Mr. Oz
Read: 12 mins
Level 3

[Figure: Memory layout visualization showing CPU cache lines and hash table bucket distribution]

At the hardware level, hash tables are a fascinating interplay between memory layout, CPU cache behavior, and algorithmic efficiency. Let's look under the hood.

Memory Layout & Cache Performance

Modern CPUs have multi-level caches (L1, L2, L3). Hash tables can be cache-friendly or cache-hostile depending on implementation:

  • Chaining: Each bucket is a pointer to a linked list — causes cache misses when traversing chains
  • Open addressing: Stores data directly in the array — better cache locality!

Cache line insight: When you access memory, the CPU loads a whole "cache line" (typically 64 bytes). Open addressing benefits from this as consecutive buckets share cache lines.

// Open addressing - better cache locality
struct Entry {
    Key key;
    Value value;
    bool occupied;
};
Entry table[SIZE];  // Contiguous memory!
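
To make the probing concrete, here is a minimal linear-probing lookup in Java (an illustrative sketch: ProbingTable and its method names are mine, the table is fixed-size, and deletion/tombstone handling is omitted). Keys and values sit in parallel primitive arrays, so a probe sequence walks contiguous memory:

// Minimal linear-probing table (sketch; assumes the table never fills
// up and omits deletion/tombstone handling)
final class ProbingTable {
    private final long[] keys;
    private final long[] values;
    private final boolean[] occupied;   // parallel arrays: contiguous, cache-friendly

    ProbingTable(int capacity) {
        keys = new long[capacity];
        values = new long[capacity];
        occupied = new boolean[capacity];
    }

    void put(long key, long value) {
        int i = index(key);
        while (occupied[i] && keys[i] != key) {
            i = (i + 1) % keys.length;  // collision: probe the next slot
        }
        keys[i] = key;
        values[i] = value;
        occupied[i] = true;
    }

    Long get(long key) {
        int i = index(key);
        while (occupied[i]) {           // an empty slot terminates the probe
            if (keys[i] == key) return values[i];
            i = (i + 1) % keys.length;
        }
        return null;                    // absent
    }

    private int index(long key) {
        return (Long.hashCode(key) & 0x7fffffff) % keys.length;
    }
}

Consecutive probes touch adjacent array slots, so after the first cache miss the next few probes are often served from the same cache line.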

Java's HashMap Internals

Java's HashMap is a sophisticated implementation that evolved over time:

  • Prior to Java 8: Simple chaining with linked lists
  • Java 8+: Converts a bucket's linked list to a red-black tree once it reaches 8 entries (TREEIFY_THRESHOLD), provided the table has at least 64 buckets (MIN_TREEIFY_CAPACITY); smaller tables are resized instead
  • Why? O(log n) tree lookup beats O(n) list traversal for long chains

The tree-ification threshold is carefully tuned: converting to a tree has overhead, so it only happens when beneficial. A simplified sketch of the decision follows.
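
The constants below are the real OpenJDK values; onBinGrew, resize, and treeifyBin are placeholders for the actual internals, so treat this as a paraphrase rather than the real source:

// Paraphrased HashMap treeification logic (simplified sketch)
final class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;      // bin length that triggers conversion
    static final int MIN_TREEIFY_CAPACITY = 64;  // below this, resize instead

    void onBinGrew(int binCount, int tableCapacity) {
        if (binCount >= TREEIFY_THRESHOLD) {
            if (tableCapacity < MIN_TREEIFY_CAPACITY) {
                resize();       // small table: spreading entries out is cheaper
            } else {
                treeifyBin();   // convert this bucket's list to a red-black tree
            }
        }
    }

    void resize()     { /* grow the bucket array and redistribute entries */ }
    void treeifyBin() { /* convert the bucket's linked list to a tree */ }
}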

Resizing and Rehashing

When a hash table resizes, ALL entries must be rehashed to new bucket positions. This is expensive!

  • Resizing cost: O(n) where n is the number of entries
  • Growth factor: Typically 2x (Java uses 2x, some use 1.5x)
  • Amortized analysis: With 2x growth, the copies performed across all resizes up to n entries total 1 + 2 + 4 + ... + n/2 < n, so insertion stays O(1) amortized

Pro tip: If you know the final size up front, initialize with enough capacity to avoid resizes, where the API allows it!

# Python: CPython dicts expose no initial-capacity parameter;
# they start small and resize automatically as they grow
d = {}
d = {k: v for k, v in data}  # concise, but still resizes internally

// Java: pre-allocate the bucket array
Map<String, Integer> map = new HashMap<>(1000);  // initial capacity, not entry count
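
One subtlety with the Java snippet: the constructor argument is the initial bucket-array capacity, not the number of entries the map holds before resizing. With the default 0.75 load factor, a capacity of 1000 triggers a resize at 750 entries. A small sketch that sizes for an expected entry count (withExpectedSize is my helper name, not a JDK method):

// Size the bucket array so `expectedSize` entries fit without resizing
// (HashMap resizes once size > capacity * loadFactor, default 0.75)
static <K, V> HashMap<K, V> withExpectedSize(int expectedSize) {
    int capacity = (int) Math.ceil(expectedSize / 0.75);
    return new HashMap<>(capacity);
}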

Performance Benchmarks

Representative real-world figures for a table of 1 million entries (time per lookup):

Data Structure                  Lookup     Memory
Hash Table (avg case)           ~50 ns     ~32 MB
Hash Table (worst case)         ~50 ms     ~32 MB
Sorted Array (binary search)    ~500 ns    ~8 MB
Balanced Tree (std::map)        ~200 ns    ~40 MB

Note: Worst-case hash table performance occurs when a pathological hash function (or adversarial input) piles keys into a few buckets.

When Hash Tables AREN'T the Answer

Hash tables excel at lookups, but consider alternatives when:

  • Need ordered traversal: Use balanced tree (TreeMap, std::map)
  • Memory is constrained: Arrays use 2-4x less memory
  • Keys are in a small range: A direct-address table (plain array) is faster; see the sketch after this list
  • Range queries: [min, max] lookups — trees are better
  • Prefix searches: Tries (prefix trees) excel here
  • Cache-critical hot loops: Sorted arrays with binary search
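
For the small-key-range case, a direct-address table skips hashing entirely: the key itself is the array index (minimal sketch; DirectAddressTable is an illustrative name, and it assumes non-negative keys below a known bound):

// Direct-address table: the key is the index, so a lookup is a single
// array access with no hash function and no collisions
final class DirectAddressTable {
    private final String[] slots = new String[256];  // key range [0, 255]

    void put(int key, String value) { slots[key] = value; }
    String get(int key)             { return slots[key]; }  // null if absent
}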

Real-World Use Cases

1. Database Indexes

PostgreSQL supports hash indexes for equality lookups (MySQL's MEMORY engine does too), but B-tree indexes are far more common because they also handle range queries.

2. Memoization & Caching

Python's `functools.lru_cache` uses a hash table to cache function results. Redis and Memcached are essentially distributed hash tables.
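
The same memoization pattern in Java is a one-liner over a hash map (sketch; memoize is my helper, not a standard API):

// Memoize an expensive pure function with a hash table
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class Memo {
    static <K, V> Function<K, V> memoize(Function<K, V> f) {
        Map<K, V> cache = new HashMap<>();
        return key -> cache.computeIfAbsent(key, f);  // compute once, then reuse
    }
}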

3. Symbol Tables

Compilers use hash tables to track variable names, function names, and scope information during parsing.

4. Network Routing

Routing tables map IP addresses to next-hop destinations. Fast lookups are critical for line-rate packet processing.

Advanced: Hash Function Quality

The quality of your hash function directly impacts performance. A poor hash function causes clustering.

Metrics for hash function quality:

  • Avalanche effect: Flipping one input bit should flip ~50% of output bits; an empirical check is sketched after this list
  • Uniform distribution: Keys spread evenly across all buckets
  • Speed: Should compute quickly (but not at the expense of quality)
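
A quick empirical avalanche check (illustrative sketch; it deliberately tests Java's weak built-in Long.hashCode, so expect a result far below the ideal ~16 of 32 bits):

// Flip each input bit and count how many output bits change;
// a good 32-bit hash averages ~16 flipped output bits
import java.util.Random;

public class AvalancheCheck {
    public static void main(String[] args) {
        Random rng = new Random(42);
        long trials = 0, flipped = 0;
        for (int t = 0; t < 10_000; t++) {
            long x = rng.nextLong();
            int h = Long.hashCode(x);
            for (int bit = 0; bit < 64; bit++) {
                int h2 = Long.hashCode(x ^ (1L << bit));  // one-bit input change
                flipped += Integer.bitCount(h ^ h2);      // output bits that moved
                trials++;
            }
        }
        System.out.printf("avg output bits flipped: %.2f / 32%n",
                          (double) flipped / trials);
    }
}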

Industry standard: SipHash, xxHash, and CityHash are modern, high-quality hash functions used in production systems.

Lock-Free Hash Tables

For concurrent access, traditional locking can be a bottleneck. Lock-free hash tables use atomic operations:

  • Read-copy-update (RCU): Readers proceed without locking
  • Compare-and-swap (CAS): Atomic updates to bucket pointers; sketched below
  • Partitioning: Sharding the table across multiple locks (fine-grained locking rather than strictly lock-free, but often good enough)

Real-world: Java's ConcurrentHashMap used striped locks (segments) before Java 8; since Java 8 it uses CAS for empty bins plus a per-bucket lock on the first node.
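
Here is the CAS pattern from the list above in miniature (a sketch of lock-free insertion at a bucket's chain head; LockFreeBuckets is an illustrative name, and a real table also needs lock-free resizing, which is considerably harder):

// Lock-free insert: publish a new chain head only if no other
// thread changed the bucket since we read it
import java.util.concurrent.atomic.AtomicReferenceArray;

final class LockFreeBuckets<K, V> {
    static final class Node<K, V> {
        final K key; final V value; final Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final AtomicReferenceArray<Node<K, V>> buckets =
            new AtomicReferenceArray<>(64);

    void insert(K key, V value) {
        int i = (key.hashCode() & 0x7fffffff) % buckets.length();
        while (true) {
            Node<K, V> head = buckets.get(i);
            Node<K, V> node = new Node<>(key, value, head);
            if (buckets.compareAndSet(i, head, node)) return;  // success
            // CAS failed: another thread won the race; retry with the new head
        }
    }
}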

Expert Recommendations

  • Profile before optimizing: Built-in hash tables are usually fast enough
  • Pre-allocate when possible: Avoid expensive resize operations
  • Use appropriate key types: Immutable, hashable keys prevent bugs
  • Consider alternatives: Trees, arrays, or tries may be better for specific use cases
  • Watch memory usage: Hash tables have significant memory overhead

Key Takeaways

  • Cache locality matters — open addressing has better cache behavior than chaining
  • Resizing is expensive — O(n) rehashing of all entries
  • Java 8+ uses trees for large buckets to maintain O(log n) worst-case performance
  • Alternatives exist: Trees, arrays, and tries each have their strengths
  • Concurrent access requires careful consideration of locking strategies
  • Production tip: Use built-in implementations unless you have a proven need for a custom one