# CS2040 review

## Hashing

Hashing is an algorithm that maps larger data sets of **keys** to a smaller data set of fixed length.

- Direct addressing table

- keys must be non-negative integers
- range of keys must be small
- keys must be dense, otherwise lots of space is wasted

- Hash table

### Hash functions

*What makes a good hash function?*

- fast to compute
- scatter keys evenly through the hash table
- fewer collisions
- save space

**Perfect hash function**

- One-to-one mapping between keys and hashed values (
**NO COLLISION**) *Minimal perfect hash function*: perfect hash function + the hash table’s size is equal to the number of keys

**Uniform hash function**

- distributes keys evenly in the hash table

**Divison method**

where $m$ is the size of hash table.
*Pick $m$ to be a prime number close to a power of two*

**Multiplication method**

- Multiply k by A, extract the fraction part $f$
- Multiply $f$ by the table size $m$, take the floor

A can be $\sqrt{5}-1 \over 2$ which is roughly 0.618.

**Hashing of strings**

```
@Override
public int hashCode() {
int sum = 0;
char[] t = this.toCharArray();
for(char i: t) {
sum = sum*31 + i;
}
return sum % m; // m is the table size
}
```

### Collision resolution

**What makes a good collision resolution method**

**Always**find an empty slot if it exists- Minimize (primary) clustering
- No secondary clustering (give different probe sequences for the same initial probes)
- Fast

#### Load factor

#### Seperate chaining

Use linked list to store collided keys.

*Reconstructing hash table*

To keep the load factor of this method bounded (i.e. smaller than a constant), we need to increase `m`

. In such case, we need to **rehash** all keys into bigger tables.

#### Linear probing

**Insert**

When the slot is occupied, we look for the next slot until find an empty slot. (wrap around when reached the end of array)

**Find**

First find the slot at hash(x), if not found, look at the next slot until find an empty slot then return not found.

**Delete (hard)**

First use `find`

to find the slot to be deleted. Label it as `deleted`

so that it doesn’t affect `find`

.

Change `find`

such that it treats slots labeled as `deleted`

the same as an occupied slot. Change `insert`

such that it will treat `deleted`

slots as an empty slot and overwrite it.

**Problem**

Create consecutive occupied slots (primary clustering)

**Modified linear probing**

*New probe sequence*

Given that **gcd(m,d)=0**

- hash(key)
- (hash(key) + 1*d) % m
- (hash(key) + 2*d) % m
- (hash(key) + 3*d) % m

#### Quadratic probing

*Probe sequence*

- hash(key)
- ( hash(key) + 1 ) % m
- ( hash(key) + 4 ) % m
- ( hash(key) + 9 ) % m

**Theorem of QP**

If $\alpha$ < 0.5, and m is prime, then we can always find an empty slot. (m is the table size and $\alpha$ is the load factor)

**Problem with QP**

If two keys have the same initial position, their probe sequences are the same. (Secondary clustering)

#### Double hashing

*Probe sequence*

- hash(key)
- ( hash(key) + 1*hash2(key) ) % m
- ( hash(key) + 2*hash2(key) ) % m
- ( hash(key) + 3*hash2(key) ) % m

**Secondary hash function must not evaluate to 0**

### Java HashMap class

```
HashMap<String, Integer> map = new HashMap<>(50);
// Constructors for HashMap includes
// () which initializes it with initial capacity 11 and load factor of 0.75
// (int initialCapacity) with load factor 0.75
// (int initialCapacity, float loadFactor)
map.put("Leo", 26);
map.put("Emily", 47);
map.containsKey("Leo"); // True
map.containsKey(26); // False
map.containsValue("Leo"); // False
map.containsValue(26); // True
map.get("Emily"); // returns 47
map.clear(); // make the hashmap empty
```

## Heap (PriorityQueue)

### ADT: Priority Queue

- Enqueue(x)
- Put a new item xin the priority queue PQ (in some order)

- Dequeue()
- Return an item ythat has the highest priority(key) in the PQ – If there are more than one item with highest priority, return the one that is inserted first (FIFO)

### Binary Tree

**Complete binary tree**

- Every level except for the last level is full
- the last level is filled from left to right
- If last level is full, it is called
*perfect binary tree*

**Height**: level-1 or the greatest number of edges from the root to the deepest leaf. **Notice** it is one smaller than level.

*Store a complete binary tree*

Starting from index 1 (1-based)

- parent(i) = floor(i/2); except for the root (which has no parent)
- left(i) = 2
*i (no left child when 2*i > heapsize) - right(i) = 2
*i + 1 (no right child when 2*i+1 > heapsize)

**Binary heap property**

- Max heap: A[parent(i)] >= A[i]
- Min heap: A[parent(i)] <= A[i]

*Insert*

```
insert(v) {
heapsize++;
A[heapsize] = v;
shiftup(heapsize);
}
```

*ShiftUp*

```
shiftup(idx) {
while idx > 1 && A[parent(idx)] < A[idx]:
swap(A[idx], A[parent(idx)])
idx = parent(idx)
}
```

*ExtractMax*

```
extractmax() {
max = A[1];
A[1] = A[heapsize];
heapsize--;
shiftdown(1);
return max;
}
```

*ShiftDown*

```
shiftdown(idx) {
while(idx<=heapsize) {
int max=A[idx], maxidx=idx;
if(left(maxidx) <= heapsize && A[left(maxidx)] > max) {
max=A[left(maxidx)];
maxidx=left(maxidx);
}
if(right(maxidx) <= heapsize && A[right(maxidx)] > max) {
max=A[right(maxidx)];
maxidx=right(maxidx);
}
if(maxidx != idx) {
swap(A[maxidx], A[idx]);
idx = maxidx;
} else break;
}
}
```

*Create Heap (O(n) version)*

```
createheap(int[] arr) {
int[] A = new int[arr.length];
for(int i=0;i<arr.length;i++)
A[i] = arr[i];
for(int i=parent(arr.length-1);i>=1;i--)
shiftdown(i);
}
```

## BST (Binary Search Tree)

- For every vertex x and y
- y.key < x.key if y is in left subtree of x
- y.key ≥ x.key if y is in right subtree of x

- For simplicity, we assume that the keys are unique so that we can change ≥ to >