Interview Questions 💜
- The concept of a hash table
- Is the time complexity always O(1)?
- How a hash table stores data
- Disadvantages of hash tables
- Hash collisions
- Explain what open addressing is
- Why use a hash table?
- Types of hash functions
- How to implement a hash table in JavaScript
- HashMap vs Hashtable
- Buckets
- When hash collisions occur
- Example use cases of hash tables
- Implementation methods
What is a Hash Table?
Data structure that stores elements in key-value pairs, where
- Key - a unique integer (or property name, in JavaScript objects) that is used for indexing the values
- Value - data that is associated with the key
- Bucket - array cell/slot where nodes of key-value pairs are stored


For fast data storage and retrieval
- With linear search (e.g. linked lists) and binary search (e.g. BSTs), data lookups have a time complexity of O(n) and O(log n) respectively. For large datasets, these costs can become significant
- Hashing allows lookups to occur in constant time (O(1))
Actions and their time complexity
insert ⇒ O(1)
- dependent on the hash function (but it is assumed that most programming languages use a hash function that is O(1))
- dependant on collision resolution
lookup (finding a value based on a key) ⇒ O(1)
- same as insert: the given key is hashed, directing us exactly to the address where the value is stored
- however, depends on whether there are any collisions
delete ⇒ O(1)
- same thing because we use the key
- also, since hash tables are unordered, we don’t have to shift indexes like we would with arrays
search (check if certain key exists) ⇒ O(1)
- same thing as lookup
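As a quick illustration, JavaScript's built-in `Map` (itself backed by a hash table) exposes all four of these operations, each running in amortized constant time on average:

```javascript
const table = new Map();

table.set('apples', 500);           // insert  ⇒ O(1)
const price = table.get('apples');  // lookup  ⇒ O(1), price === 500
const exists = table.has('apples'); // search  ⇒ O(1), exists === true
table.delete('apples');             // delete  ⇒ O(1)
```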
Hash functions (해시 함수)
A hash function is used for mapping each element of a dataset to indexes in the table. The function generates a value of fixed length for each input that it gets. There are many types of hash functions (e.g. MD5, SHA-1, SHA-256), and we can assume that the hash functions in our programming languages use a type that has a time complexity of O(1).
A hash function for a hash table will generate a hash output, which is then used as an index in the hash table, and an index space (or an address space) is allotted for that index.
Some key aspects of hash functions:
- One-way function (일방적) - it’s practically impossible to recover the input from the output (the original key cannot be retrieved from its hash)
- Idempotent (멱등성) - the same input will always give the same output. It’s not a random gibberish generator: if you keep inputting the same value, its output stays the same until you change the input. A good hash function also tries to avoid producing the same hash for different keys, though collisions can never be ruled out entirely.
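A toy hash sketch (hypothetical, non-cryptographic) makes both properties visible — the same key always hashes to the same index, yet different keys can still collide:

```javascript
// Toy hash for illustration only: sums character codes modulo the table size.
function toyHash(key, size) {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash + key.charCodeAt(i)) % size;
  }
  return hash;
}

// Idempotent: hashing the same input twice gives the same output.
const a = toyHash('grapes', 10);
const b = toyHash('grapes', 10); // a === b

// But different keys can still produce the same hash (a collision):
// 'ab' (97 + 98) and 'ba' (98 + 97) sum to the same value.
const c = toyHash('ab', 10);
const d = toyHash('ba', 10); // c === d
```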
Hash Collisions (해시 충돌)
A collision occurs when two keys get mapped to the same index/memory space.

Remember, our computer has limited space, and when we create a hash table, the computer decides how much space to allocate. It doesn’t allocate all available memory to the hash table; just a portion of it.
Within this limited space, the hash function maps each key to a slot in memory where the data is stored. Although hash functions are designed to distribute data evenly across the available slots, there is still a possibility that two keys will result in the same index. This situation, where a newly inserted key maps to an already occupied slot in the hash table, is called a hash collision, and there are multiple ways of handling collisions.
Load Factor & Capacity
The load factor is the average number of key-value pairs per bucket.
When the load factor reaches a given limit, rehashing kicks in. Since rehashing increases the number of buckets, it reduces the load factor.
However, there is a tradeoff between time and space costs that we must consider when configuring the load factor limit.

Statistically speaking, a hash table performs best when it operates at a load factor between 0.25 and 0.75.
The capacity is the maximum number of key-value pairs for the given load factor limit and current bucket count. Since rehashing increases the number of buckets, it increases the capacity.
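As a concrete sketch (hypothetical numbers: 16 buckets and a 0.75 load factor limit), the load factor and capacity fall out of simple arithmetic:

```javascript
const bucketCount = 16;
const loadFactorLimit = 0.75;
let entryCount = 12;

// load factor = average number of entries per bucket
const loadFactor = entryCount / bucketCount;    // 0.75
// capacity = max entries before rehashing kicks in
const capacity = bucketCount * loadFactorLimit; // 12

// One more insert pushes the load factor past the limit, triggering a rehash
// (e.g. doubling the bucket count to 32 halves the load factor).
entryCount += 1;
const needsRehash = entryCount / bucketCount > loadFactorLimit; // true
```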
Here is an example of a hash table configured with a load factor limit of 4 (figure not included):

Methods to resolve collisions (해시 충돌 해결)
Separate Chaining (분리 연결법)

Data structures used for Separate Chaining:
- Linked List
- Self Balancing Binary Search Tree (AVL tree, Red-Black Tree)
Rehashing (Resize)
As the lengths of the linked lists grow, the average lookup time increases (O(n)). Therefore, to keep linked lists short and lookups fast, the number of buckets must be increased (and nodes redistributed). This process is called rehashing.

Open Addressing (개방 주소법)
Tries to find another open slot to hold the item that caused the collision.

Linear Probing
- Systematically visiting each slot one at a time to find the next open slot (adding +1)
- Creates a tendency for clustering
- If many collisions occur at the same hash slot, the surrounding slots will be filled due to linear probing. Therefore, this will have an impact on items that are being inserted later (inducing collisions into other collisions)
- Therefore, more prone to having time complexity of O(n) if many collisions
```javascript
linearProbing(pos) {
  return (pos + 1) % tableSize;
}

newHashValue = linearProbing(oldHashValue);
```
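Putting that step function to work, here is a minimal insertion sketch (with a hypothetical fixed-size table) that keeps probing +1 until an open slot turns up:

```javascript
const tableSize = 7;
const slots = new Array(tableSize).fill(null);

// Insert by linear probing: step forward one slot at a time, wrapping around,
// until an empty slot is found. (Assumes the table is never completely full.)
function insertWithLinearProbing(hash, key, value) {
  let pos = hash % tableSize;
  while (slots[pos] !== null) {
    pos = (pos + 1) % tableSize;
  }
  slots[pos] = [key, value];
  return pos;
}

const first = insertWithLinearProbing(3, 'a', 1);  // slot 3 is free → 3
const second = insertWithLinearProbing(3, 'b', 2); // collision → probes to 4
```

Note how the second insert lands right next to the first — this is exactly the clustering tendency described above.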
Quadratic Probing
- Similar to linear probing but instead of adding +1, adds a successive value of an arbitrary quadratic polynomial until an open slot is found. This is to lessen the problem of clustering.

```javascript
quadraticProbing(pos, attempt) {
  // on the i-th probe attempt, jump by i * i instead of a fixed +1
  return (pos + attempt * attempt) % tableSize;
}
```
Open Addressing vs. Separate Chaining
- Both approaches are O(n) in the worst case.
- However, open addressing stores data in contiguous memory, so it has better cache efficiency than separate chaining. Therefore, when the number of entries is small enough, open addressing performs better than separate chaining.
- One more difference: open addressing keeps occupying its buckets, whereas separate chaining can postpone table expansion (rehashing) longer.
- The purpose of a supplementary hash function (보조 해시 함수) is to transform the key’s hash value to reduce the chance of collisions. It is used together with separate chaining and helps reduce the cases that approach the worst case.
Arrays vs. Sets and Objects vs. Maps
Arrays

- Use when you need an ordered list of values (which might contain duplicates)
- Use when you need to manipulate data
Sets

- Use when you need to work with unique values
- Use when high performance is really important (operations like searching for an item or deleting an item from a set can be up to 10x faster than in arrays)
- Use to remove duplicates from arrays
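For example, deduplicating an array is a one-liner with a Set:

```javascript
const orders = ['apple', 'banana', 'apple', 'grape', 'banana'];

// A Set keeps only unique values, preserving first-seen order.
const unique = [...new Set(orders)]; // ['apple', 'banana', 'grape']
```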
Objects

- More “traditional” key/value store
- Easier to write and access values with . and [] operators
- Use when you need to include functions (methods) and use the `this` keyword to access properties of the same object
- Use when working with JSON (can convert to map)
Maps

- Better performance
- Keys can have any data type
- Maintains insert order
- Easy to iterate
- Easy to compute size of map
- Use when you simply need to map keys to values
- Use when you need keys that are NOT strings
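A quick sketch of those points — a Map happily uses an object as a key, keeps insertion order, and reports its size directly (the object key here is hypothetical):

```javascript
const clicks = new Map();
const submitButton = { id: 'submit' }; // hypothetical object used as a key

clicks.set(submitButton, 0); // non-string key
clicks.set('page-loads', 1);

// Update the count stored under the object key.
clicks.set(submitButton, clicks.get(submitButton) + 1);

const size = clicks.size;               // 2 — easy to compute
const firstKey = [...clicks.keys()][0]; // submitButton — insertion order kept
```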
Implementing Hash Tables (해시 테이블 구현)
Simple hash function
```javascript
_hash(key) {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash + key.charCodeAt(i) * i) % this.data.length;
  }
  return hash;
}
```
Simple hash table implementation (with separate chaining)
```javascript
class HashTable {
  constructor(size) {
    this.limit = size; // initial number of buckets
    this.data = new Array(this.limit);
    this.count = 0; // how many key-value pairs are currently stored
  }

  _hash(key) {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (hash + key.charCodeAt(i) * i) % this.data.length;
    }
    return hash;
  }

  _resize(limit) {
    const oldData = this.data;
    this.limit = limit;
    this.data = new Array(this.limit);
    this.count = 0;
    // iterate through oldData and re-insert every entry into the new table
    for (let bucket of oldData) {
      if (bucket) {
        for (let entry of bucket) {
          this.insert(entry[0], entry[1]);
        }
      }
    }
  }

  insert(key, value) {
    const index = this._hash(key);
    if (!this.data[index]) {
      this.data[index] = [];
    }
    this.data[index].push([key, value]);
    this.count++;
    // grow the table when the load factor exceeds 0.75
    if (this.count / this.limit > 0.75) {
      this._resize(this.limit * 2);
    }
  }

  lookup(key) {
    const index = this._hash(key);
    const currentBucket = this.data[index];
    if (!currentBucket) return console.error('Item does not exist');
    // scan the chain for the matching key (handles collisions)
    for (const entry of currentBucket) {
      if (entry[0] === key) return entry[1];
    }
  }

  // returns an array of all keys in the hash table
  keys() {
    const keysArr = [];
    for (let bucket of this.data) {
      if (bucket) {
        // iterate over the chain in each bucket
        for (let entry of bucket) {
          keysArr.push(entry[0]);
        }
      }
    }
    return keysArr;
  }
}

const myHashTable = new HashTable(100);
myHashTable.insert('grapes', 10000);
myHashTable.insert('grape', 12345);
myHashTable.insert('apples', 500);
myHashTable.insert('banana', 2000);
console.log(myHashTable);
console.log(myHashTable.lookup('grape')); // 12345
```
Biggest downfall of hash tables
- In `keys()`, we have to loop over every bucket of the table and then over each bucket’s chain (a nested loop) to collect the existing keys, because the table is unordered and we must account for collisions
- This is why iterating over an object’s keys with `for...in` in JavaScript is relatively slow
Exercise: Find First Recurring Character in Array
Test Case

| array | return |
| --- | --- |
| [2, 5, 1, 2, 3, 5, 1, 2, 4] | 2 |
| [2, 1, 1, 2, 3, 5, 1, 2, 4] | 1 |
| [2, 3, 4, 5] | undefined |
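One possible solution — a sketch that uses a Set to remember values already seen, giving O(n) time and O(n) extra space:

```javascript
function firstRecurringCharacter(arr) {
  const seen = new Set();
  for (const item of arr) {
    if (seen.has(item)) return item; // first value that appears twice
    seen.add(item);
  }
  return undefined; // no value recurs
}

firstRecurringCharacter([2, 5, 1, 2, 3, 5, 1, 2, 4]); // 2
firstRecurringCharacter([2, 1, 1, 2, 3, 5, 1, 2, 4]); // 1
firstRecurringCharacter([2, 3, 4, 5]);                // undefined
```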