# CS2040 review

## Hashing

Hashing is an algorithm that maps larger data sets of keys to a smaller data set of fixed length.

1. Direct addressing table
• keys must be non-negative integers
• range of keys must be small
• keys must be dense, otherwise lots of space is wasted
1. Hash table

### Hash functions

What makes a good hash function?

• fast to compute
• scatter keys evenly through the hash table
• fewer collisions
• save space

Perfect hash function

• One-to-one mapping between keys and hashed values (NO COLLISION)
• Minimal perfect hash function: perfect hash function + the hash table’s size is equal to the number of keys

Uniform hash function

• distributes keys evenly in the hash table

Divison method

where $m$ is the size of hash table. Pick $m$ to be a prime number close to a power of two

Multiplication method

• Multiply k by A, extract the fraction part $f$
• Multiply $f$ by the table size $m$, take the floor

A can be $\sqrt{5}-1 \over 2$ which is roughly 0.618.

Hashing of strings

@Override
public int hashCode() {
int sum = 0;
char[] t = this.toCharArray();
for(char i: t) {
sum = sum*31 + i;
}
return sum % m;    // m is the table size
}


### Collision resolution

What makes a good collision resolution method

• Always find an empty slot if it exists
• Minimize (primary) clustering
• No secondary clustering (give different probe sequences for the same initial probes)
• Fast

#### Seperate chaining

Use linked list to store collided keys.

Reconstructing hash table

To keep the load factor of this method bounded (i.e. smaller than a constant), we need to increase m. In such case, we need to rehash all keys into bigger tables.

#### Linear probing

Insert

When the slot is occupied, we look for the next slot until find an empty slot. (wrap around when reached the end of array)

Find

First find the slot at hash(x), if not found, look at the next slot until find an empty slot then return not found.

Delete (hard)

First use find to find the slot to be deleted. Label it as deleted so that it doesn’t affect find.

Change find such that it treats slots labeled as deleted the same as an occupied slot. Change insert such that it will treat deleted slots as an empty slot and overwrite it.

Problem

Create consecutive occupied slots (primary clustering)

Modified linear probing

New probe sequence

Given that gcd(m,d)=0

• hash(key)
• (hash(key) + 1*d) % m
• (hash(key) + 2*d) % m
• (hash(key) + 3*d) % m

Probe sequence

• hash(key)
• ( hash(key) + 1 ) % m
• ( hash(key) + 4 ) % m
• ( hash(key) + 9 ) % m

Theorem of QP

If $\alpha$ < 0.5, and m is prime, then we can always find an empty slot. (m is the table size and $\alpha$ is the load factor)

Problem with QP

If two keys have the same initial position, their probe sequences are the same. (Secondary clustering)

#### Double hashing

Probe sequence

• hash(key)
• ( hash(key) + 1*hash2(key) ) % m
• ( hash(key) + 2*hash2(key) ) % m
• ( hash(key) + 3*hash2(key) ) % m

Secondary hash function must not evaluate to 0

### Java HashMap class

HashMap<String, Integer> map = new HashMap<>(50);
// Constructors for HashMap includes
// () which initializes it with initial capacity 11 and load factor of 0.75
// (int initialCapacity) with load factor 0.75
// (int initialCapacity, float loadFactor)

map.put("Leo", 26);
map.put("Emily", 47);

map.containsKey("Leo"); // True
map.containsKey(26); // False
map.containsValue("Leo"); // False
map.containsValue(26); // True

map.get("Emily"); // returns 47

map.clear(); // make the hashmap empty


## Heap (PriorityQueue)

### ADT: Priority Queue

• Enqueue(x)
• Put a new item xin the priority queue PQ (in some order)
• Dequeue()
• Return an item ythat has the highest priority(key) in the PQ – If there are more than one item with highest priority, return the one that is inserted first (FIFO)

### Binary Tree

Complete binary tree

• Every level except for the last level is full
• the last level is filled from left to right
• If last level is full, it is called perfect binary tree

Height: level-1 or the greatest number of edges from the root to the deepest leaf. Notice it is one smaller than level.

Store a complete binary tree

Starting from index 1 (1-based)

• parent(i) = floor(i/2); except for the root (which has no parent)
• left(i) = 2i (no left child when 2i > heapsize)
• right(i) = 2i + 1 (no right child when 2i+1 > heapsize)

Binary heap property

• Max heap: A[parent(i)] >= A[i]
• Min heap: A[parent(i)] <= A[i]

Insert

insert(v) {
heapsize++;
A[heapsize] = v;
shiftup(heapsize);
}


ShiftUp

shiftup(idx) {
while idx > 1 && A[parent(idx)] < A[idx]:
swap(A[idx], A[parent(idx)])
idx = parent(idx)
}


ExtractMax

extractmax() {
max = A[1];
A[1] = A[heapsize];
heapsize--;
shiftdown(1);
return max;
}


ShiftDown

shiftdown(idx) {
while(idx<=heapsize) {
int max=A[idx], maxidx=idx;
if(left(maxidx) <= heapsize && A[left(maxidx)] > max) {
max=A[left(maxidx)];
maxidx=left(maxidx);
}
if(right(maxidx) <= heapsize && A[right(maxidx)] > max) {
max=A[right(maxidx)];
maxidx=right(maxidx);
}
if(maxidx != idx) {
swap(A[maxidx], A[idx]);
idx = maxidx;
} else break;
}
}


Create Heap (O(n) version)

createheap(int[] arr) {
int[] A = new int[arr.length];
for(int i=0;i<arr.length;i++)
A[i] = arr[i];
for(int i=parent(arr.length-1);i>=1;i--)
shiftdown(i);
}


## BST (Binary Search Tree)

• For every vertex x and y
• y.key < x.key if y is in left subtree of x
• y.key ≥ x.key if y is in right subtree of x
• For simplicity, we assume that the keys are unique so that we can change ≥ to >