Steps to developing a usable algorithm
- Model the problem
- Find algorithm
- Fast enough? Fits in memory?
- If not, find a way to address the problem
- Iterate until satisfied
Union Find
Modeling the connections
- reflexive, symmetric, ransitive
- Connected components represent object connective
- Find query: check if two obj are in the same connected components
- Union command: replace components containing two objs with theire union
U-F data type
UF(int N) — constructer;
void union(int p, int q);
boolean connected(int p, intq);
Quick Find (eager approach)
use array id[], size N, store id to determine if they are connected. if id[p]==id[q] return ture.
union(p,q) change id[p] and all those id[]==id[p] to id[q].
initialize : O(N);
union : O(N);
find : O(1);
problem : N union commands and N objects, union could be N^2. Quadratic! Too slow for huge problems.
Quadratic algorithms do not scale
mans: 10 times fast computer, 10 times slow
Quick Union (lazy approach)
use array id[], size N, store its parent, like a tree. id[i] is index of parent of i. root of i is id[…id[i]…], keep going until i == id[i]. if two objs have the same root, they are connected.
union(p,q) change root of p to q’s root. use while loop to find the root.
initialize : O(N);
union : O(N);
find : O(N); worst case
Quick Union Improvements
weighting
put smaller tree under larger tree. Linking root of smaller tree to root of larger tree.
maintain extra array sz[i] to count number of objects in the tree rooted at i. update the size array after each union.
Now:
- union takes constant time, given roots.
- find takes time proportional to depth of p and q. Depth of any node x is at most lgN
- initialize: O(N); union: O(lgN); find(lgN)
path compression
Another way: Just after computing the root of p, set id[] of each examined to that root. 在寻找root的时候,把走过的每个object都连到root上。
Java implementation:
add one line in the root function while loopid[u] = id[id[i]]
In theory, weighted quick-union with path compression is not quite linear, however in practice, wqupc is linear.
Union-Find Application
- percolation
- Social network connectivity. Given a social network containing n members and a log file containing m timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend … of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be mlogn or better and use extra space proportional to n.
|
|
- Union-find with specific canonical element. Add a method 𝚏𝚒𝚗𝚍() to the union-find data type so that 𝚏𝚒𝚗𝚍(𝚒) returns the largest element in the connected component containing i. The operations, 𝚞𝚗𝚒𝚘𝚗(), 𝚌𝚘𝚗𝚗𝚎𝚌𝚝𝚎𝚍(), and 𝚏𝚒𝚗𝚍() should all take logarithmic time or better. For example, if one of the connected components is {1,2,6,9}, then the 𝚏𝚒𝚗𝚍() method should return 9 for each of the four elements in the connected components.
|
|
- Successor with delete. Given a set of n integers S={0,1,…,n−1} and a sequence of requests of the following form: 1. Remove x from S; 2. Find the successor of x: the smallest y in S such that y≥x. Design a data type so that all operations (except construction) take logarithmic time or better in the worst case.
|
|
小结:UF我从来没用过,很新的知识点。
Analysis Of Algorithms
遇到的问题:
3Sum brute force algorithm time complexity is:n(n-1)(n-2)/6.
Calculate process:
The 3rd nested loop is the important one. It will loop through
n-2times. Then the 2nd loop increments.At this point the 3rd loop loops through
n-3times.We keep adding these till we get.
[(n-2) + (n-3) + ... + 2 + 1 + 0]Then the 1st loop increments, so when we calculate the 3rd loop we begin at
n-3this time.
So we get:
[(n-3) + ... + 2 + 1 + 0]Adding them all together we get:
123456789[(n-2) + (n-3) + ... + 2 + 1 + 0] +[(n-3) + ... + 2 + 1 + 0] +[(n-4) + ... + 2 + 1 + 0] +.. <- (n-2 times).[2 + 1 + 0] +[1 + 0] +[0]
We can rewrite this as:
`(n-2)(1) + (n-3)(2) + (n-4)(3) + ... + (3)(n-4) + (2)(n-3) + (1)(n-2)`
Which in maths notation we can write like this:

Make sure you look up the additive properties of Summations.
We have:
(n-1) * (n-2)(n-1)/2 - (n-1)(n-2)(2n-3)/6 = n(n-2)(n-1)/6
为什么循环中每次乘以2或除以2时间复杂度是lgn?
For example,
for(i=1;i<n;i*2)这个循环总共run了k次,那么最后的i=2^k,那么2^k >= n,得到k >= lgn除以2同理。
Cost model. 之前学过,找出最能决定big O的一项
- harmonic numbers:Hn 1+1/2+1/3+…+1/n ~ lnN
- triangular sum:1+2+…+n ~ n^2/2
- geometric sum:1+2+4+8+…+N = 2N – 1 ~ 2N when N = 2^n
- Stirling’s approximation:lgN! = lg1+lg2+lg3+…+lgN ~ NlgN
- binomial coefficients: (上N下k)~N^k/k! when k is a small constant
- exponential: (1 – 1/x) x ~ 1/e
- 表示算法增长的方程只有几个:1 (constant), logN (logarithmic), N (linear), NlogN (linearithmic), N^2 (quadratic), N^3 (cubic), and 2^N (exponential)
- 设计一个算法,先找出最笨的算法,找到一个upper bound,然后prove这个问题有一个lower bound,如果upper bound和lower bound存在差距,那么算法就有优化的空间。
- Big O,Big Theta,Big Omega的区别。Theta 表示classify algorithms,比如Theta(N^2) 就是10N^2或10N^2+8N+3NlogN。big O 表示upper bound,O(N^2)可以表示任何小于等于N^2的,比如10N^2。Big Omega表示 lower bound。
- Tilde notation表示最重要的那一个term,是approximate model。10N^2+8N+3NlogN的tilde 就是10N^2
- boolean 1 byte,byte 1 byte,char 2 byte,int 4 byte,float 4 byte,long 8 byte,double 8 byte;Char[] 2N+24 byte, int[] 8N+24 byte, double[] 8N+24 byte; char[][] 2MN byte, int[][] 4MN byte, double[][] 8MN byte
- 关于object的内存使用。一个object :object overhead 16byte + reference(一个reference是8byte)+ padding(8的倍数,4,8,16……)+ object的attributes 占用的内存。如果array或其他数据结构里面存的是指针,那么recursively add memory for referenced object。
Stacks And Queues
一些细节问题:
- stack linked list implementation 更快,Every operat
ion takes constant time in the worst case。但是需要更多space 和 time 来处理links
- stack resizing array implementation 慢一点,push pop的worst case是n,但是占用的space 少。
- compile-time error 比 run-time error 好,因为在compile 的时候就能发现错误。
- good code has zero casts。但是java 不允许 generic type的
array:Item[Capacity]是不允许的。所以我们定义成这样(Item[]) new Object[Capacity] Iterable有Iterator()method;Iterator有next(); hasNext()method.