Understanding Hash Tables — Implementation Deep Dive

Hash functions, collision resolution strategies, common operations, and production-ready implementation.

Author: Mr. Oz
Read: 8 mins
Level 2

[Figure: hash function computation and collision resolution strategies]


Now that we understand the magical catalog analogy, let's dive into the actual implementation. We'll explore hash functions, collision resolution, and how to build production-ready hash tables.

The Hash Function

The hash function is the heart of a hash table. It takes a key and converts it into a bucket index.

A good hash function should:

  • Be deterministic: Same key always produces the same hash
  • Distribute evenly: Spread keys across all buckets
  • Be fast: Compute the hash quickly
  • Minimize collisions: Different keys should hash to different values

# Simple hash function for strings
def simple_hash(key, table_size):
    hash_value = 0
    for char in key:
        hash_value = (hash_value * 31 + ord(char)) % table_size
    return hash_value

# Python's built-in hash() then modulo
def better_hash(key, table_size):
    return hash(key) % table_size

// Java example: floorMod avoids a negative index when hashCode() is negative
int hash = Math.floorMod(key.hashCode(), tableSize);
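
The properties above are easy to sanity-check. A quick test of the simple_hash sketch (the table size of 8 is an arbitrary choice for illustration):

```python
def simple_hash(key, table_size):
    # Polynomial rolling hash: multiply by 31 and add each character code
    hash_value = 0
    for char in key:
        hash_value = (hash_value * 31 + ord(char)) % table_size
    return hash_value

# Deterministic: the same key always maps to the same bucket
assert simple_hash("Alice", 8) == simple_hash("Alice", 8)

# The result is always a valid bucket index
for name in ["Alice", "Bob", "Charlie"]:
    assert 0 <= simple_hash(name, 8) < 8
```

Note that even distribution is a statistical property, not a guarantee: with 8 buckets, some of these keys may well collide.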

Collision Resolution

When two keys hash to the same bucket, we have a collision. Here's how we handle it:

1. Chaining (Linked List)

Each bucket contains a linked list of all key-value pairs that hash to that location.

# Visual representation
Bucket 0: []
Bucket 1: [("Alice", "555-1234")] → [("Charlie", "555-9999")]
Bucket 2: [("Bob", "555-5678")]
Bucket 3: []
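
The diagram above maps directly onto a Python structure: one list per bucket, where each bucket holds the (key, value) pairs that hash to it. (Bucket positions here mirror the diagram rather than any particular hash function.)

```python
# One list per bucket; collisions simply share a bucket
table_size = 4
buckets = [[] for _ in range(table_size)]

buckets[1].append(("Alice", "555-1234"))
buckets[1].append(("Charlie", "555-9999"))  # collides with Alice in bucket 1
buckets[2].append(("Bob", "555-5678"))
```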

2. Open Addressing (Probing)

When a collision occurs, find the next empty bucket using a probing strategy.

  • Linear Probing: Check index, index + 1, index + 2, index + 3, ...
  • Quadratic Probing: Check index, index + 1, index + 4, index + 9, ... (offsets 1², 2², 3², ...)
  • Double Hashing: Use a second hash function to determine the step size

# Linear probing example (sketch: assumes the table never fills completely)
def put(key, value):
    index = hash(key) % size
    while table[index] is not None:
        if table[index].key == key:
            table[index].value = value  # Update existing
            return
        index = (index + 1) % size  # Move to next bucket
    table[index] = KeyValue(key, value)
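
The matching lookup follows the same probe sequence until it finds the key or hits an empty slot. A self-contained sketch of both operations (the table size and KeyValue class are illustrative assumptions):

```python
size = 8
table = [None] * size

class KeyValue:
    def __init__(self, key, value):
        self.key = key
        self.value = value

def put(key, value):
    # Sketch: assumes the table never fills completely
    index = hash(key) % size
    while table[index] is not None:
        if table[index].key == key:
            table[index].value = value  # update existing key
            return
        index = (index + 1) % size      # probe the next bucket
    table[index] = KeyValue(key, value)

def get(key):
    index = hash(key) % size
    while table[index] is not None:
        if table[index].key == key:
            return table[index].value
        index = (index + 1) % size
    return None  # hit an empty slot: key is absent

put("Alice", "555-1234")
put("Bob", "555-5678")
```

One subtlety this sketch skips: deletion under open addressing can't simply empty a slot, or lookups would stop probing too early; real implementations mark deleted slots with a "tombstone".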

Common Operations

1. Insert/Update (Put)

def put(key, value):
    index = hash(key) % size
    # With chaining
    for entry in buckets[index]:
        if entry.key == key:
            entry.value = value  # Update
            return
    buckets[index].append(KeyValue(key, value))  # Insert

2. Lookup (Get)

def get(key):
    index = hash(key) % size
    for entry in buckets[index]:
        if entry.key == key:
            return entry.value
    return None  # Not found

3. Delete

def delete(key):
    index = hash(key) % size
    for i, entry in enumerate(buckets[index]):
        if entry.key == key:
            buckets[index].pop(i)  # Remove from chain
            return True
    return False  # Not found
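
The three operations above fit naturally into one small class. A minimal chaining sketch, using plain (key, value) tuples in place of the KeyValue object:

```python
class HashTable:
    """Minimal chaining hash table: put, get, and delete."""

    def __init__(self, size=8):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update existing key
                return
        bucket.append((key, value))       # insert new key

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None  # not found

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket.pop(i)  # remove from chain
                return True
        return False  # not found
```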

Load Factor

The load factor is the ratio of stored entries to total buckets:

load_factor = number_of_entries / number_of_buckets

When load factor exceeds a threshold (typically 0.7 or 0.75), it's time to resize the hash table!

Why resize? As the table fills up, collisions increase and performance degrades. Resizing rehashes every entry into a larger table.
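
Rehashing is necessary because each entry's index depends on the bucket count. A resize sketch for the chaining layout (the doubling-at-0.75 policy is a common convention, not the only choice):

```python
def resize(buckets, new_size):
    # Rehash every entry: old indices are invalid once the modulus changes
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for key, value in bucket:
            new_buckets[hash(key) % new_size].append((key, value))
    return new_buckets

def maybe_resize(buckets, num_entries):
    # Typical trigger: double the bucket count past a 0.75 load factor
    if num_entries / len(buckets) > 0.75:
        return resize(buckets, len(buckets) * 2)
    return buckets

buckets = [[] for _ in range(2)]
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    buckets[hash(k) % 2].append((k, v))
```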

Built-in Implementations

Most languages provide optimized hash table implementations:

  • Python: dict - Open addressing with a perturbation-based probe sequence
  • Java: HashMap - Chaining, converting long buckets to red-black trees
  • C++: std::unordered_map - Chaining implementation
  • JavaScript: Map and Object - Hash table semantics
  • Go: map - Built-in hash table

Time & Space Complexity Reference

Operation   Average   Worst
---------   -------   -----
Insert      O(1)      O(n)
Lookup      O(1)      O(n)
Delete      O(1)      O(n)
Space       O(n)      O(n)

The worst case occurs when every key hashes to the same bucket, degenerating the structure into a linked list. A good hash function makes this vanishingly unlikely.

Common Pitfalls to Avoid

  • Using mutable keys: Never use mutable objects (like Python lists) as keys
    Wrong: hash_map[[1, 2]] = "value"
  • Assuming order: Classic hash tables don't guarantee iteration order (Java offers LinkedHashMap; Python's dict preserves insertion order since 3.7)
  • Poor hash functions: Can cause clustering and performance degradation
  • Ignoring load factor: Not resizing causes performance to degrade over time
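
The mutable-key pitfall is one Python enforces for you: lists are unhashable, so the "wrong" line above fails immediately, and a tuple of immutable values is the idiomatic fix.

```python
phone_book = {}

# Mutable objects like lists are unhashable, so Python rejects them as keys
try:
    phone_book[[1, 2]] = "value"
except TypeError as exc:
    error = str(exc)  # "unhashable type: 'list'"

# A tuple of immutable values works as a key
phone_book[(1, 2)] = "value"
```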

Production Tips

  • Use built-in implementations: They're highly optimized and battle-tested
  • Choose initial capacity wisely: If you know the approximate size, pre-allocate
  • Consider immutable keys: Strings and integers make great hash keys
  • Profile before optimizing: Built-in hash tables are usually fast enough
  • Handle missing keys gracefully: Use get() with default values
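
The last tip in Python terms: dict.get takes a fallback value, and collections.defaultdict supplies one automatically, which is handy for counters and grouping.

```python
from collections import defaultdict

phone_book = {"Alice": "555-1234"}

# dict.get returns a default instead of raising KeyError on a missing key
number = phone_book.get("Bob", "unknown")

# defaultdict calls its factory (here int, i.e. 0) for missing keys
counts = defaultdict(int)
for word in ["apple", "banana", "apple"]:
    counts[word] += 1
```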

Key Takeaways

  • Hash functions convert keys to bucket indices deterministically
  • Collisions are inevitable; handle them with chaining or probing
  • Chaining uses linked lists; probing finds empty buckets
  • Load factor indicates when to resize (typically at 0.75)
  • Average O(1) for insert, lookup, and delete operations
  • Use built-in implementations for production code