Algorithms Part1 Week1 Notes

Steps to developing a usable algorithm

Model the problem
Find algorithm
Fast enough? Fits in memory?
If not, find a way to address the problem
Iterate until satisfied

Union Find

Modeling the connections

reflexive, symmetric, ransitive
Connected components represent object connective
Find query: check if two obj are in the same connected components
Union command: replace components containing two objs with theire union

U-F data type

UF(int N) — constructer;
void union(int p, int q);
boolean connected(int p, intq);

Quick Find (eager approach)

use array id[], size N, store id to determine if they are connected. if id[p]==id[q] return ture.

union(p,q) change id[p] and all those id[]==id[p] to id[q].

initialize : O(N);
union : O(N);
find : O(1);

problem : N union commands and N objects, union could be N^2. Quadratic! Too slow for huge problems.

Quadratic algorithms do not scale
mans: 10 times fast computer, 10 times slow

Quick Union (lazy approach)

use array id[], size N, store its parent, like a tree. id[i] is index of parent of i. root of i is id[…id[i]…], keep going until i == id[i]. if two objs have the same root, they are connected.

union(p,q) change root of p to q’s root. use while loop to find the root.

initialize : O(N);
union : O(N);
find : O(N); worst case

Quick Union Improvements

weighting

put smaller tree under larger tree. Linking root of smaller tree to root of larger tree.

maintain extra array sz[i] to count number of objects in the tree rooted at i. update the size array after each union.

Now:

union takes constant time, given roots.
find takes time proportional to depth of p and q. Depth of any node x is at most lgN
initialize: O(N); union: O(lgN); find(lgN)

path compression

Another way: Just after computing the root of p, set id[] of each examined to that root. 在寻找root的时候，把走过的每个object都连到root上。

Java implementation：
add one line in the root function while loop id[u] = id[id[i]]

In theory, weighted quick-union with path compression is not quite linear, however in practice, wqupc is linear.

Union-Find Application

percolation
Social network connectivity. Given a social network containing n members and a log file containing m timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend … of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be mlogn or better and use extra space proportional to n.

/**
 * Solution:
 *
 * Use a union-find data structure with each site representing a social network
 * member. Add unions between sites in time order of friendships being formed.
 * After each union is added, check the number of connected components
 * within the union-find data structure. If only one, all members are connected.
 *
 * Must keep track of number of unique components. Decreases when a union occurs
 * between different components.
 */
/**
 * Determine when all members of a social network are connected.
 */
class SocialNetworkConnectivity {
  private UnionFindUF uf;
  private int numComponents;
  public SocialNetworkConnectivity(int N) {
    uf = new QuickFindUF(N);
    numComponents = N;
  }
  public void addFriendship(int p1, int p2) {
    if (!uf.connected(p1, p2)) {
      --numComponents;
    }
    uf.union(p1,p2);
  }
  public boolean fullyConnected() {
    return this.numComponents == 1;
  }
  public static void main(String[] args) {
    // initialize social network data structure with N sites
    while (!f.isEmpty()) {
      // read timestamp and relationship
      // union relationship
      // check if members fully connected
    }
  }
}

Union-find with specific canonical element. Add a method 𝚏𝚒𝚗𝚍() to the union-find data type so that 𝚏𝚒𝚗𝚍(𝚒) returns the largest element in the connected component containing i. The operations, 𝚞𝚗𝚒𝚘𝚗(), 𝚌𝚘𝚗𝚗𝚎𝚌𝚝𝚎𝚍(), and 𝚏𝚒𝚗𝚍() should all take logarithmic time or better. For example, if one of the connected components is {1,2,6,9}, then the 𝚏𝚒𝚗𝚍() method should return 9 for each of the four elements in the connected components.

public class UFWithFindLargest {
    private int id[]; // id[i] = parent of node i
    private int sz[]; // sz[i] = size of node i
    private int large[]; // large[i] = largest element in component
    
    public UFWithFindLargest(int N) {
        id = new int[N];
        sz = new int[N];
        large = new int[N];
        for (int i = 0; i < N; ++i) {
            id[i] = i;
            sz[i] = 1;
            large[i] = i;
        }
    }
    
    private int root(int p) {
        while (id[p] != p) {
            id[p] = id[id[p]]; // path compression
            p = id[p];
        }
        return p;
    }
    
    // return the largest element in the connected component containing p
    public int find(int p) { 
        return large[root(p)];
    }
    
    public void union(int p, int q) {
        int rootp = root(p);
        int rootq = root(q);
        if (rootp == rootq) return;
        int largestP = large[rootp];
        int largestQ = large[rootq];
        if (sz[rootp] < sz[rootq]) {
            id[rootp] = rootq;
            sz[rootq] += sz[rootp];
            
            if (largestP > largestQ)
                large[rootq] = largestP;
        } else {
            id[rootq] = rootp;
            sz[rootp] += sz[rootq];
            
            if (largestQ > largestP)
                large[rootp] = largestQ;
        }
    }
    
    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }
    
    public static void main(String[] args) {
        UFWithFindLargest uf = new UFWithFindLargest(10);
        uf.union(0, 2);
        uf.union(8, 4);
        StdOut.println(uf.find(0) == 2);
        StdOut.println(uf.find(4) == 8);
        uf.union(0, 4);
        StdOut.println(uf.find(0) == 8);
        StdOut.println(uf.find(2) == 8);
        uf.union(0, 6);
        StdOut.println(uf.find(6) == 8);
        uf.union(1, 9);
        uf.union(1, 2);
        StdOut.println(uf.find(4) == 9);
    }   
}

Successor with delete. Given a set of n integers S={0,1,…,n−1} and a sequence of requests of the following form: 1. Remove x from S; 2. Find the successor of x: the smallest y in S such that y≥x. Design a data type so that all operations (except construction) take logarithmic time or better in the worst case.

/*
  Solution:
  Use union find data structure to mark successor of each integer. When an
  integer is removed, the union find data structure connects to the next
  integer in the list (which may be farther up the list). Must be careful with
  optimizations since connected component id is the successor integer.
*/
import edu.princeton.cs.algs4.QuickFindUF;
class SuccessorWithDelete {
  private QuickFindUF uf;
  public SuccessorWithDelete(int N) {
    uf = new QuickFindUF(N);
  }
  public void remove(int x) {
    uf.union(x, x+1);
  }
  public int successor(int x) {
    return uf.find(x);
  }
  public static void main(String[] args) {
    int N = 50;
    SuccessorWithDelete swd = new SuccessorWithDelete(50);
    System.out.println(swd.successor(10));
    swd.remove(11);
    swd.remove(13);
    swd.remove(12);
    swd.remove(10);
    swd.remove(9);
    swd.remove(15);
    System.out.println(swd.successor(8));
    System.out.println(swd.successor(9));
    System.out.println(swd.successor(10));
  }
}

小结：UF我从来没用过，很新的知识点。

Analysis Of Algorithms

遇到的问题：

3Sum brute force algorithm time complexity is：n(n-1)(n-2)/6.

Calculate process:

The 3rd nested loop is the important one. It will loop through n-2 times. Then the 2nd loop increments.

At this point the 3rd loop loops through n-3 times.

We keep adding these till we get.
[(n-2) + (n-3) + ... + 2 + 1 + 0]

Then the 1st loop increments, so when we calculate the 3rd loop we begin at n-3 this time.
So we get:
[(n-3) + ... + 2 + 1 + 0]

Adding them all together we get:

[(n-2) + (n-3) + ... + 2 + 1 + 0] +
[(n-3) + ... + 2 + 1 + 0]         +
[(n-4) + ... + 2 + 1 + 0]         +
.
.                                     <- (n-2 times)
.
[2 + 1 + 0]                       +
[1 + 0]                           +
[0]

We can rewrite this as:

`(n-2)(1) + (n-3)(2) + (n-4)(3) + ... + (3)(n-4) + (2)(n-3) + (1)(n-2)`

Which in maths notation we can write like this:

![](https://lh3.googleusercontent.com/-AT9724QIv7c/WZJA-VTGviI/AAAAAAAAAIQ/aYyKOBtVGd8N4NZauOLBLD1cfGfw8dRZgCHMYCw/I/15027571132327.png)

Make sure you look up the additive properties of Summations. 

We have:

(n-1) * (n-2)(n-1)/2 - (n-1)(n-2)(2n-3)/6 = n(n-2)(n-1)/6

为什么循环中每次乘以2或除以2时间复杂度是lgn？

For example，for(i=1;i<n;i*2) 这个循环总共run了k次，那么最后的i=2^k，那么2^k >= n，得到 k >= lgn

除以2同理。
Cost model. 之前学过，找出最能决定big O的一项
harmonic numbers：Hn 1+1/2+1/3+…+1/n ～ lnN
triangular sum：1+2+…+n ～ n^2/2
geometric sum：1+2+4+8+…+N = 2N – 1 ~ 2N when N = 2^n
Stirling’s approximation：lgN! = lg1+lg2+lg3+…+lgN ~ NlgN
binomial coefﬁcients: (上N下k)~N^k/k! when k is a small constant
exponential: (1 – 1/x) x ~ 1/e
表示算法增长的方程只有几个：1 (constant), logN (logarithmic), N (linear), NlogN (linearithmic), N^2 (quadratic), N^3 (cubic), and 2^N (exponential)
设计一个算法，先找出最笨的算法，找到一个upper bound，然后prove这个问题有一个lower bound，如果upper bound和lower bound存在差距，那么算法就有优化的空间。
Big O，Big Theta，Big Omega的区别。Theta 表示classify algorithms，比如Theta(N^2) 就是10N^2或10N^2+8N+3NlogN。big O 表示upper bound，O(N^2)可以表示任何小于等于N^2的，比如10N^2。Big Omega表示 lower bound。
Tilde notation表示最重要的那一个term，是approximate model。10N^2+8N+3NlogN的tilde 就是10N^2
boolean 1 byte，byte 1 byte，char 2 byte，int 4 byte，float 4 byte，long 8 byte，double 8 byte；Char[] 2N+24 byte, int[] 8N+24 byte, double[] 8N+24 byte; char[][] 2MN byte, int[][] 4MN byte, double[][] 8MN byte
关于object的内存使用。一个object ：object overhead 16byte + reference（一个reference是8byte）+ padding（8的倍数，4，8，16……）+ object的attributes 占用的内存。如果array或其他数据结构里面存的是指针，那么recursively add memory for referenced object。

Stacks And Queues

一些细节问题：

stack linked list implementation 更快，Every operat

ion takes constant time in the worst case。但是需要更多space 和 time 来处理links

stack resizing array implementation 慢一点，push pop的worst case是n，但是占用的space 少。
compile-time error 比 run-time error 好，因为在compile 的时候就能发现错误。
good code has zero casts。但是java 不允许 generic type的array : Item[Capacity] 是不允许的。所以我们定义成这样 (Item[]) new Object[Capacity]
Iterable 有 Iterator() method; Iterator 有 next(); hasNext() method.