Steven's Knowledge

Hash Maps

The most versatile interview data structure — internals, patterns, and when they fail

Hash Maps

If you could only bring one data structure to an interview, bring the hash map. It turns O(n) lookups into O(1), and that single property solves an astonishing number of problems. Most "optimize from brute force" interview answers boil down to "use a hash map."

How Hash Maps Work

A hash map stores key-value pairs in an array of "buckets." The process:

  1. Hash the key -- convert the key to an integer (the hash code)
  2. Map to a bucket -- hash code % array size = bucket index
  3. Handle collisions -- when two keys map to the same bucket

Collision Resolution

Chaining (most common): Each bucket holds a linked list. Collisions just append to the list.

bucket[0] -> (key1, val1) -> (key5, val5)
bucket[1] -> (key2, val2)
bucket[2] -> null
bucket[3] -> (key3, val3) -> (key7, val7) -> (key9, val9)

Open addressing: On collision, probe other buckets (linear probing, quadratic probing, double hashing). Used by Python's dict.

Performance

OperationAverageWorst case
GetO(1)O(n)
SetO(1)O(n)
DeleteO(1)O(n)
Has keyO(1)O(n)

Worst case happens when all keys hash to the same bucket. In practice with a decent hash function, this essentially never happens.

Load factor: When the ratio of entries to buckets exceeds a threshold (typically 0.75), the map resizes -- doubles the bucket array and rehashes everything. This is O(n) but amortized O(1) per insertion.

Pattern 1: Frequency Counting

Count occurrences of each element.

function topKFrequent(nums: number[], k: number): number[] {
  // Step 1: Count frequencies
  const freq = new Map<number, number>();
  for (const num of nums) {
    freq.set(num, (freq.get(num) ?? 0) + 1);
  }

  // Step 2: Bucket sort by frequency
  const buckets: number[][] = Array.from({ length: nums.length + 1 }, () => []);
  for (const [num, count] of freq) {
    buckets[count].push(num);
  }

  // Step 3: Collect top K from highest frequency
  const result: number[] = [];
  for (let i = buckets.length - 1; i >= 0 && result.length < k; i--) {
    result.push(...buckets[i]);
  }

  return result.slice(0, k);
}
// Time: O(n), Space: O(n)

Character Frequency

function firstUniqChar(s: string): number {
  const count = new Map<string, number>();
  for (const c of s) {
    count.set(c, (count.get(c) ?? 0) + 1);
  }
  for (let i = 0; i < s.length; i++) {
    if (count.get(s[i]) === 1) return i;
  }
  return -1;
}

Pattern 2: Two Sum (Complement Lookup)

The canonical hash map problem. For each element, check if its complement exists.

function twoSum(nums: number[], target: number): [number, number] {
  const seen = new Map<number, number>(); // value -> index

  for (let i = 0; i < nums.length; i++) {
    const complement = target - nums[i];
    if (seen.has(complement)) {
      return [seen.get(complement)!, i];
    }
    seen.set(nums[i], i);
  }

  throw new Error('No solution');
}
// Time: O(n), Space: O(n)

This pattern generalizes: whenever you need to find a pair that satisfies some relationship, hash the first element and look up the second.

Pattern 3: Group By

Group elements by a computed key.

function groupAnagrams(strs: string[]): string[][] {
  const groups = new Map<string, string[]>();

  for (const str of strs) {
    // Sort characters as the group key
    const key = [...str].sort().join('');
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(str);
  }

  return [...groups.values()];
}
// Time: O(n * k log k) where k = max string length

Optimization: Instead of sorting, use a character count as the key:

function groupAnagramsFast(strs: string[]): string[][] {
  const groups = new Map<string, string[]>();

  for (const str of strs) {
    const count = new Array(26).fill(0);
    for (const c of str) {
      count[c.charCodeAt(0) - 'a'.charCodeAt(0)]++;
    }
    const key = count.join(',');
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(str);
  }

  return [...groups.values()];
}
// Time: O(n * k) -- no sorting needed

Pattern 4: Seen / Visited Tracking

Track which elements you have encountered.

function containsDuplicate(nums: number[]): boolean {
  const seen = new Set<number>();
  for (const num of nums) {
    if (seen.has(num)) return true;
    seen.add(num);
  }
  return false;
}

function longestConsecutive(nums: number[]): number {
  const numSet = new Set(nums);
  let longest = 0;

  for (const num of numSet) {
    // Only start counting from the beginning of a sequence
    if (!numSet.has(num - 1)) {
      let length = 1;
      let current = num;
      while (numSet.has(current + 1)) {
        current++;
        length++;
      }
      longest = Math.max(longest, length);
    }
  }

  return longest;
}
// Time: O(n), Space: O(n)

LRU Cache

A classic design problem that combines a hash map with a doubly linked list.

class LRUCache {
  private capacity: number;
  private cache: Map<number, number>;

  constructor(capacity: number) {
    this.capacity = capacity;
    // Map preserves insertion order in JS/TS
    this.cache = new Map();
  }

  get(key: number): number {
    if (!this.cache.has(key)) return -1;
    // Move to end (most recently used)
    const val = this.cache.get(key)!;
    this.cache.delete(key);
    this.cache.set(key, val);
    return val;
  }

  put(key: number, value: number): void {
    if (this.cache.has(key)) {
      this.cache.delete(key);
    }
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // Delete least recently used (first inserted)
      const firstKey = this.cache.keys().next().value!;
      this.cache.delete(firstKey);
    }
  }
}
// All operations: O(1)

Note: This uses the fact that JavaScript Maps preserve insertion order. In other languages, you would use a doubly linked list + hash map explicitly.

When Hash Maps Fail

Hash maps are not always the answer:

SituationBetter alternative
Need sorted orderBalanced BST, sorted array
Need range queriesBalanced BST, segment tree
Need min/max quicklyHeap
Keys are small integers (0..n)Plain array -- faster constant
Memory is extremely tightSorted array + binary search
Need persistent/immutable versionTrie, immutable tree

Hash map overhead: Each entry has overhead for the hash, pointers, and load factor slack. For small integer keys where the range is known, a plain array is both faster and more memory-efficient.

Tips for Interviews

  1. Default to hash map when you need O(1) lookup. It is almost always the right instinct.
  2. Consider the key carefully. The grouping key determines the pattern. Anagram -> sorted string. Coordinate -> "x,y" string.
  3. Watch out for hash collisions in custom keys. If you use a string as a compound key, make sure the encoding is unambiguous.
  4. Sets are hash maps without values. Use Set for membership testing, Map when you need associated data.
  5. Frequency counting + bucket sort gives O(n) top-K without a heap. Useful when the frequency range is bounded by n.

On this page