Section 1 - Introduction
- Binary Data System
- How to convert between binary and decimal (a quick sketch below)
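For illustration, a minimal sketch using Python's built-in conversions; the by-hand rule is that each binary digit is a power of 2:

```python
# decimal -> binary and back, using Python built-ins
print(bin(13))         # '0b1101'
print(int('1101', 2))  # 13

# by hand: 1101 = 1*8 + 1*4 + 0*2 + 1*1 = 13
```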
Section 2 - Analyzing Algorithms
- Time Complexity
- The standard way of analyzing and comparing different algorithms, e.g. Big O Notation.
- Math Refresher
- Logarithmic Functions
log_x(Y) = A means x^A = Y, e.g. log_2(8) = 3 because 2^3 = 8
- Factorial Functions
3! = 1*2*3 = 6
- Algebraic Expression
nlog2(n)
- n Notation
- An analysis of how the algorithm will scale with more and more data. It’s represented as a function of n.
- Look for large patterns
- Don't care about constant multiples: e.g. sqrt(n) and 3*sqrt(n) scale the same way
- Keep only the largest term in an expression: e.g. n*log(n) + sqrt(n) -> n*log(n)
- Typical growth rates, from fastest-growing to slowest: n!, 2^n, n^2, n*log(n), n, sqrt(n), log(n), 1
- n Notation example:
- Every cycle of a program takes 0.001 s. How much faster will an n*log(n) algorithm be than an n^2 one if 1,000 (and then 25,000) pieces of data are input?

```
n*log(n): 1000*log2(1000) ≈ 9,960 cycles * 0.001 s ≈ 9.96 seconds
n^2:      1000^2 = 1,000,000 cycles * 0.001 s = 1,000 seconds ≈ 16 minutes
n*log(n): 25,000*log2(25,000) ≈ 365,000 cycles * 0.001 s ≈ 6 minutes 5 seconds
n^2:      25,000^2 = 625,000,000 cycles * 0.001 s ≈ 7 days 5 hours
```
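A quick check of these figures (assuming a base-2 logarithm and the 0.001 s cycle time from the example):

```python
import math

CYCLE = 0.001  # seconds per cycle, as stated above

for n in (1_000, 25_000):
    nlogn = n * math.log2(n) * CYCLE
    nsq = n ** 2 * CYCLE
    print(f"n={n}: n*log(n) -> {nlogn:,.1f} s, n^2 -> {nsq:,.1f} s")
```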
- Big O Notation
- O(nlogn) means that, at worst, the running time grows like n*log(n). Because it describes the worst case, it gives us a bound we can use to evaluate how fast the program will run. e.g.
- Load: O(n^2)
- Execute Step 1: O(nlogn)
- Execute Step 2: O(n)
- Save: O(1)
The overall time complexity is the biggest term, O(n^2), since the worst case dominates.
Big O Real-world Example
```
for each data "N" in data list {
    // check if data is in the list
    for each data "W" in data list {
        if N == W, print True
    }
}
```
Time complexity is: n * n = n^2
Section 3: Arrays
- Fixed Arrays
- Collection of data together with fixed size
- Run time
- Insert(Rand) - O(n)
- Insert(Front) - O(n)
- Insert(Back) - O(1)
- Delete(Front) - O(n)
- Delete(Back) - O(1)
- Search(Unsorted) - O(n)
- Search(Sorted) - O(logn) – binary search, repeatedly probing the (L+R)/2 midpoint (sketch below)
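A minimal binary search sketch over a sorted Python list (the function name binarySearch is our own choice):

```python
def binarySearch(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2      # the (L+R)/2 midpoint
        if arr[mid] == target:
            return mid                 # found: return the index
        elif arr[mid] < target:
            left = mid + 1             # discard the left half
        else:
            right = mid - 1            # discard the right half
    return -1                          # not found

print(binarySearch([3, 7, 10, 15, 22], 15))  # 3
```

Each probe halves the remaining range, which is where the O(logn) comes from.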
- Circular Array
Indexing wraps around via the modulo operator (index % length), so there is never an out-of-bounds issue.
```python
def printCircularArray(array, size, ind):
    # print from the ind-th index to the (size + ind - 1)-th index, wrapping around
    i = ind
    while i < size + ind:
        print(array[i % size], end=" ")
        i = i + 1

a = ['A', 'B', 'C', 'D', 'E', 'F']
n = len(a)
printCircularArray(a, n, 3)
```
- Dynamic Arrays
- Why double the array when allocating a new array instead of adding one extra slot every time?
- Amortization: the occasional O(n) copy is spread over many cheap appends
- As the data grows, the amortized cost per append works out to O(1) rather than O(n), with the dynamic array as the classic example (see the sketch below)
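A minimal sketch of the doubling strategy (the class and method names here are our own illustrative choices, not a real library API):

```python
class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def append(self, item):
        if self.size == self.capacity:
            self._resize(2 * self.capacity)  # O(n) copy, but it happens rarely
        self.data[self.size] = item          # the common case: O(1)
        self.size += 1

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self.size):           # copy every element: the O(n) step
            new_data[i] = self.data[i]
        self.data = new_data
        self.capacity = new_capacity
```

Doubling means an element is only copied when the size crosses a power of two, so n appends cost O(n) in total, i.e. O(1) amortized per append; growing by one slot each time would force a copy on every append, for O(n) per append.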
Section 4: Linked Lists
- Nodes
- Composed of up to 3 parts: the data itself (or its address), a Prev pointer, and a Next pointer
- Linked List
- A bunch of Nodes
- Singly Linked List: one-way links; a node pointing to NULL marks the end of the list.
- No need to allocate space ahead of time, which can save space compared with arrays, e.g. on a satellite where memory is scarce.
- Run Time
- Insert(Rand): O(n)
- Insert(Front): O(1)
- Insert(Back): O(n)
- Delete(Front): O(1)
- Delete(Back): O(n)
- Search(Unsorted): O(n)
- Search(Sorted): O(n)
- Access Time: O(n)
Code Example
```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self, head=None):
        self.head = head

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def add(self, new_data):
        # run time to add at the front is O(1), versus O(n) to add at the end
        new_node = Node(new_data)
        new_node.next = self.head
        self.head = new_node
```
Run the code as below:

```python
a = Node("a")
l = LinkedList(a)
l.add("b")
l.add("c")
for node in l:
    print(node.data)
```
- Doubly Linked List
- Add a Prev pointer
- The benefit over a singly linked list: deleting a node from a singly linked list costs Search O(n) plus Delete O(n) (you must re-walk the list to find the predecessor), about O(2n) total. With a Prev pointer the predecessor is at hand, improving the total to O(n).
- Tail Pointer
- Keeps track of the last node
- Optimizes Insert(Back) and Delete(Back) from O(n) to O(1) (see the sketch below)
- Example
- Web page history in web browser
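A minimal doubly linked list sketch with a tail pointer, showing the O(1) back operations (the class and method names here are our own choices):

```python
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None   # tail pointer: tracks the last node

    def insert_back(self, data):   # O(1) thanks to the tail pointer
        node = DNode(data)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node

    def delete_back(self):         # O(1): the prev pointer gives the new tail
        if self.tail is None:
            return None
        node = self.tail
        self.tail = node.prev
        if self.tail is None:
            self.head = None
        else:
            self.tail.next = None
        return node.data
```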
Section 5: Stacks and Queues
- Stacks
- Think about trays in the cafeteria
- Use Push() to put an element on top and Pop() to take the top element off. This is called LIFO (Last In, First Out; i.e. the most recently pushed element comes out first)
- Trying to Pop from an empty stack will cause an error
Examples
```python
def createStack():
    return []

def isEmpty(stack):
    return len(stack) == 0

def push(stack, item):
    stack.append(item)
    print(item + " pushed to stack")

def pop(stack):
    if isEmpty(stack):
        return "Null"   # popping an empty stack
    return stack.pop()

stack = createStack()
push(stack, str(10))
print(pop(stack) + " from stack")
print(pop(stack))
```
```python
stacks = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
print(stacks)
stacks.pop()             # removes 'banana' from the top
stacks.pop()             # removes 'apple'
stacks.append('coconut')
print(stacks)
print(type(stacks))
```
- Queues
- FIFO (First In, First Out).
- Implement with a circular array, or a doubly linked list with a tail pointer
E.g. how a CPU schedules its tasks.
- Code Snippet
```python
from collections import deque

queue = deque(["Eric", "John", "Michael"])
queue.append("Smith")   # enqueue at the right end
queue.append("Tony")
print(queue)
queue.pop()             # removes "Tony" from the right end
print(queue)
queue.popleft()         # removes "Eric" from the left end: the FIFO removal
print(queue)
```
```python
class Queue:
    def __init__(self, capacity):
        self.front = self.size = 0
        self.rear = capacity - 1
        self.Q = [None] * capacity
        self.capacity = capacity

    def isFull(self):
        return self.size == self.capacity

    def enQueue(self, item):
        if self.isFull():
            print("Full")
            return
        self.rear = (self.rear + 1) % self.capacity   # wrap around the circular array
        self.Q[self.rear] = item
        self.size = self.size + 1
        print("%s enqueued to queue" % str(item))
```
- Stack and Queue Real World Example
- Undo and Redo
- Maze
Case learning
```
We have a stack and a queue, [-, -, 4, 3] and [-, -, 2, 1] respectively.
Popping off the stack enqueues the popped value onto the queue, which is
implemented as a circular array (so pop(x) => both pop off the stack and
enqueue(x)). The front of the queue is currently on 2 and the back on 1;
we enqueue onto the front and remove from the back.

What will the stack and queue look like after:
pop(), push(4), pop(), dequeue(), push(5), push(6), pop(), dequeue(), push(4), pop(), dequeue()

a. Stack: [-,-,5,3]  Queue: [4,-,4,6]
b. Stack: [-,-,5,3]  Queue: [-,-,4,6]
c. Stack: [-,-,-,5]  Queue: [4,-,-,6]
d. Stack: [-,-,-,-]  Queue: [4,-,4,6]
```
Sorting Algorithms
Visualgo.net shows all of the sorting algorithms as animated visualizations. A very useful resource.
- Bubble Sort
- Bubble Sort is arguably the simplest sorting algorithm. It works by repeatedly swapping adjacent elements that are in the wrong order.
- Easy to implement, but an inefficient algorithm
- Run Time:
- O(n^2)
```
do
    swapped = false
    for i = 1 to indexOfLastUnsortedElement - 1
        if leftElement > rightElement
            swap(leftElement, rightElement)
            swapped = true
while swapped
```
- Code Sample
```python
def bubbleSort(arr):
    n = len(arr)
    for i in range(0, n-1):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:   # adjacent pair out of order
                arr[j], arr[j+1] = arr[j+1], arr[j]
```
- Selection Sort
- Similar to bubble sort, but instead of swapping neighbors it repeatedly finds the minimum element in the unsorted portion and moves it to the end of the sorted portion
- Run Time: O(n^2); even the best case is O(n^2)
- One of the least efficient common sorting algorithms, since it always performs O(n^2) comparisons
Code Sample
```python
def selectionSort(arr):
    n = len(arr)
    for i in range(0, n-1):
        min_index = i
        for j in range(i+1, n):     # scan the unsorted portion for the minimum
            if arr[min_index] > arr[j]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
```
- Insertion Sort
- Builds the sorted portion one element at a time: it takes the next element from the unsorted portion and inserts it into its correct position within the sorted portion (a sketch follows below)
- Run time: O(n^2) (identical to bubble sort), though it runs in O(n) on already-sorted input
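Since this sort has no code sample of its own above, here is a minimal insertion sort sketch (the function name insertionSort is our own choice):

```python
def insertionSort(arr):
    for i in range(1, len(arr)):
        key = arr[i]                 # next element from the unsorted portion
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]      # shift larger sorted elements one slot right
            j -= 1
        arr[j + 1] = key             # drop the element into its correct position
```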
Recursion Sample
```c
// sums n + (n-3) + (n-6) + ... into the accumulator x
int sumBy3(int n, int x) {
    print("We are at: " + n)
    if (n <= 1)   // base case
        return x;
    else
        return sumBy3(n-3, x+n)
}
```
- Quick Sort
- Quick Sort works by picking a “pivot” point. All numbers less than the pivot go to its left, and all numbers greater go to its right. It then reapplies this algorithm to each side of the pivot
- Run time: the average case is O(nlogn); the worst case is O(n^2)
- “A Complete Overview of Quicksort” is worth working through; it explains the source code clearly
Example
```python
def quickSort(arr, low, high):
    if low < high:   # base case: sub-arrays of size <= 1 are already sorted
        # partition the array around a pivot
        partitionIndex = partition(arr, low, high)
        quickSort(arr, low, partitionIndex - 1)
        quickSort(arr, partitionIndex + 1, high)

def partition(arr, low, high):
    i = low - 1           # index of smaller element
    pivot = arr[high]     # pivot
    for j in range(low, high):
        # if the current element is smaller than the pivot
        if arr[j] < pivot:
            i = i + 1     # increment index of smaller element
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i + 1

arr = [55, 44, 10, 30, 15, 35]
quickSort(arr, 0, len(arr)-1)
print(arr)
```
- Merge Sort
- Merge Sort is the fastest algorithm we are going to analyze. (Fastest on average; quicksort is technically faster in practice, but has the problem of the O(n^2) worst case.) It divides the data up as quicksort does, except that it doesn't require selecting a pivot point to get the nlogn run time
- Recommended: “Merge Sort - Data Structure & algorithms Tutorial Python”, which explains merge sort in Python clearly
Run Time: O(nlogn)
```
mergeSort(arr[], l, r)
if l < r
    1. Find the middle point: m = (l + r) / 2
    2. Call mergeSort recursively for the first half: mergeSort(arr, l, m)
    3. Call mergeSort recursively for the second half: mergeSort(arr, m+1, r)
    4. Merge the two halves sorted in steps 2 and 3: merge(arr, l, m, r)
```
Code Sample
```python
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr)//2
    left = arr[:mid]
    right = arr[mid:]
    left = merge_sort(left)
    right = merge_sort(right)
    return merge_two_sorted_lists(left, right)

def merge_two_sorted_lists(a, b):
    sorted_list = []
    len_a = len(a)
    len_b = len(b)
    i = j = 0  # i is the index into a, j is the index into b
    while i < len_a and j < len_b:
        if a[i] <= b[j]:
            sorted_list.append(a[i])
            i += 1
        else:
            sorted_list.append(b[j])
            j += 1
    while i < len_a:
        sorted_list.append(a[i])
        i += 1
    while j < len_b:
        sorted_list.append(b[j])
        j += 1
    return sorted_list

if __name__ == '__main__':
    arr = [5, 8, 23, 56, 89, 100, 7, 9, 52, 45]
    sorted_arr = merge_sort(arr)
    print(sorted_arr)
```
Optimized code sample that saves memory by merging back into the original array in place:
```python
def merge_sort(arr):
    if len(arr) <= 1:
        return
    mid = len(arr)//2
    left = arr[:mid]
    right = arr[mid:]
    merge_sort(left)    # sorts the copies in place; no return value needed
    merge_sort(right)
    merge_two_sorted_lists(left, right, arr)

def merge_two_sorted_lists(a, b, arr):
    len_a = len(a)
    len_b = len(b)
    i = j = k = 0  # i indexes a, j indexes b, k points to the position in the merged array
    while i < len_a and j < len_b:
        if a[i] <= b[j]:
            arr[k] = a[i]
            i += 1
        else:
            arr[k] = b[j]
            j += 1
        k += 1
    while i < len_a:
        arr[k] = a[i]
        i += 1
        k += 1
    while j < len_b:
        arr[k] = b[j]
        j += 1
        k += 1

if __name__ == '__main__':
    arr = [5, 8, 23, 56, 89, 100, 7, 9, 52, 45]
    merge_sort(arr)
    print(arr)
```
- Stable and Non-Stable
- Sometimes the order of data within data structures is important. Certain sorting algorithms preserve this order, while others don't. If a sorting algorithm keeps equal elements in their original relative order, we call it stable; if it doesn't, we call it unstable (see the demo below).
- Selection Sort and Quick Sort are NOT stable (Quick Sort loses order when picking the pivot; Selection Sort loses it when it swaps)
- Bubble, Insertion, and Merge Sort are stable
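A quick demonstration using Python's built-in sorted(), which is stable (it uses Timsort, a merge-sort hybrid); the entries with equal keys keep their original relative order:

```python
records = [("apple", 3), ("pear", 1), ("banana", 3), ("kiwi", 1)]
print(sorted(records, key=lambda r: r[1]))
# [('pear', 1), ('kiwi', 1), ('apple', 3), ('banana', 3)]
```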
- Trees
- Binary Search Tree (BST)
- Each node can only have at most two children
- All right children must be greater than the parent
- All left children must be less than or equal to the parent (by convention, the “equal to” can go on either the right or the left)
- Trees are some of the most used data structures in computer science. They link information together in a relational way. You can use them to sort, to store information, to make inferences, and so much more. They are even used in databases to help index (find) the data in the database.
- Parents are the nodes that are directly above another node, while children are ones that are directly below
- Root
- Children
- Parent
- Sibling (nodes sharing the same parent)
- Leaf (a node with no children)
```python
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key

def insert(root, key):
    if root is None:
        return Node(key)
    elif root.val == key:
        return root    # key already present: no duplicates inserted
    elif root.val < key:
        root.right = insert(root.right, key)
    else:
        root.left = insert(root.left, key)
    return root

def search(root, key):
    if root is None or root.val == key:
        return root
    if root.val < key:
        return search(root.right, key)
    return search(root.left, key)
```
- Tree Traversals
- InOrder: Left, Root, Right
- PreOrder: Root, Left, Right
- PostOrder: Left, Right, Root
- Tree Example:
InOrder: L, Ro, R: 5, 6, 7, 8, 10, 13, 20, 23, 24, 25
PreOrder: Ro, L, R: 20, 10, 5, 7, 6, 8, 13, 25, 23, 24
PostOrder: L, R, Ro: 6, 8, 7, 5, 13, 10, 24, 23, 25, 20
LevelOrder: 20, 10, 25, 5, 13, 23, 7, 24, 6, 8
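A sketch of these four traversals in Python, reusing the Node class from the BST sample above (level order uses a queue):

```python
from collections import deque

def inorder(root):        # Left, Root, Right
    if root:
        inorder(root.left)
        print(root.val, end=" ")
        inorder(root.right)

def preorder(root):       # Root, Left, Right
    if root:
        print(root.val, end=" ")
        preorder(root.left)
        preorder(root.right)

def postorder(root):      # Left, Right, Root
    if root:
        postorder(root.left)
        postorder(root.right)
        print(root.val, end=" ")

def levelorder(root):     # visit level by level, left to right
    q = deque([root])
    while q:
        node = q.popleft()
        if node:
            print(node.val, end=" ")
            q.append(node.left)
            q.append(node.right)
```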
- Tree Real World Examples
- Tree Directory
- Database Indexing
- Decision Trees
- Binary Search Tree (BST)
- Heap
- A subdivision of trees that follows a few “heap property” rules. The heap property is a rule relating parent and child nodes within a tree: a parent node must always be either greater than or less than its children. If it must be greater, it is a max heap; if less, a min heap.
- The heap property is a level-by-level relationship: every parent relates the same way to its children (either greater, for a max heap, or less than, for a min heap)
- A complete tree is a tree in which every level, except possibly the last, is filled, and all nodes are as far left as possible.
- Run Time:
- Insert: O(logn)
- Check Max: O(1)
- Remove Max: O(logn) (swap the root with the last node, remove that node, then sift the new root down to restore the heap property)
- Heap Real World Example
- Priority queue
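For example, Python's heapq module implements a priority queue as a binary min heap (the smallest item sits on top; negate priorities to simulate a max heap):

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, "write report"))   # push: O(log n)
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (3, "reply to email"))

print(tasks[0])                    # check min: O(1) -> (1, 'fix outage')
while tasks:
    print(heapq.heappop(tasks))    # remove min: O(log n), in priority order
```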
- Graphs
- Directed Graph: the edges have a direction, e.g. following someone on Twitter
- Undirected Graph: No direction
- Weighted Graph: the edges carry weights, so the shortest path is not necessarily the one with the fewest segments
- Graph Terminology
- Vertex (plural vertices): a point on the graph; the set of all elements in the graph, e.g. {A, B, C, D, E}
- Edge: a connection between two vertices, e.g. {(A,B), (A,C), (B,C), (C,D), (A,E)}
- Cyclic Graph: a directed graph that has at least one cycle; traversals need extra bookkeeping to avoid looping forever
- Acyclic Graph: a directed graph that doesn't have any cycles
- Connected Graph: a graph in which every vertex is connected to every other by some path
- Disconnected Graph: a graph in which one or more vertices lack an edge connecting them to the rest
- Adjacency: when two vertices are connected directly to one another
- Depth First Search
- Breadth First Search
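A minimal sketch of both searches over the vertex and edge sets above, stored as an adjacency-list dict (the layout of the dict is our own illustration):

```python
from collections import deque

graph = {
    'A': ['B', 'C', 'E'],
    'B': ['A', 'C'],
    'C': ['A', 'B', 'D'],
    'D': ['C'],
    'E': ['A'],
}

def bfs(start):                    # breadth first: explore level by level
    visited, q = {start}, deque([start])
    while q:
        v = q.popleft()
        print(v, end=" ")
        for w in graph[v]:
            if w not in visited:   # visited set keeps cycles from looping forever
                visited.add(w)
                q.append(w)

def dfs(v, visited=None):          # depth first: follow one path as far as it goes
    if visited is None:
        visited = set()
    visited.add(v)
    print(v, end=" ")
    for w in graph[v]:
        if w not in visited:
            dfs(w, visited)

bfs('A')   # A B C E D
print()
dfs('A')   # A B C D E
```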
- Useful Resource
- B Trees and B+ Trees by Abdul Bari builds up from disk structure (tracks, sectors, blocks, and offsets) to trees. Highly recommended.
- M-way Search Tree: a node with 2 keys has 3 children, making a 3-way search tree; an extension of the binary search tree
- A B Tree is just an M-way search tree with some added rules. In databases, it serves as multilevel indexing
- Each node must have at least ⌈m/2⌉ children before a new node is added
- The root can have a minimum of 2 children
- All leaves are at the same level
- The tree is built bottom-up (full nodes split and push a key upward)
- B+ Tree
- All keys are present at the leaf nodes
- Non-leaf nodes may contain duplicates of keys that also appear in the leaves
- Record pointers are stored only at the leaf nodes