@smilence 2016-09-02T20:02:18.000000Z 字数 5071 阅读 2255

第四部分：数组和字符串

算法面试指南

4.1 知识解析

array, hashtable和字符串，从结构形式来说，这三个是一种东西。array是特殊的hashtable，key是连续的index，例如当key是char的时候就可以一一对应。比如有一些字符，我们需要做一些预处理来检索一些信息（每个字符出现的数量）。我们应该用hashtable还是array呢？（分布是密集还是稀疏）

Open hashing (Separate Chaining) & Closed hashing (Open Addressing)
- http://stackoverflow.com/questions/9124331/meaning-of-open-hashing-and-closed-hashing
- https://en.wikipedia.org/wiki/Hash_table#Collision_resolution
特别地，Hash table还可以用Balanced BST来实现，如std::map, lookup的时间由 $O(1)$ 增长为 $O(log(n))$ ，但可能节省space cost ,且可以按照key的值有序输出pairs。
使用hashtable时，key没有预设hash function情况下的语言细节：

对于C++，
map<Key,Val,key_comp> table ，对于严格弱序判断的不同定义，可以通过重载<操作符或者定义functor class实现。
注意对于map无需重载==运算符，因为 A==B 等效于 !(A<B)&&!(B<A)。

对于unordered_map<Key,Val>则更加复杂，需要定义：
(1) A hash function, 用于确定Index即hash值；(2) 重载==操作符，用于区分hash值相同的不同key。
详见：http://stackoverflow.com/questions/17016175/c-unordered-map-using-a-custom-class-type-as-the-key

对于JAVA，需要对HashMap重定义equals()与hashCode()。

e.g Find a line which passes the most number of points.
使用hashtable分类的问题，往往也可以用排序来解决，取决于对时间复杂度和空间复杂度的要求。sorting（排序）即分类。
string的语言细节 ( Java: StringBuilder, stringBuffer与String )
reverse string or array （经常用到的piece of code）
(1) 用头尾两个指针，相向遍历，swap元素（不适用于数据流）
(2) 顺序遍历元素，push入stack，再依次pop。(或者递归)
String matching (推荐使用Rabin-Karp算法)
将每一个匹配子串映射为一个hash值。例如，将子串看做一个多进制数，计算它的值与目标字符串的hash值比较，如果相同，再比较两个字符串是否相同。顺序shift过程中，可以增量计算hash值（扣除最高位，增加最低位）。因此能在average case下做到O(m+n)。
2-D array, 总是推荐用collection(vector of vector) 而不是C-style

4.2 数组与哈希表问题

1.统计一个元素集中元素出现的次数，用Hash Table，即std::unordered_map或std::map。只统计元素出现与否的问题，可以用bitvector即bitset，二进制数来记录。

e.g. 1 Implement an algorithm to determine if a string has all unique characters. (也可以用排序)
e.g. 2 Write a method to decide if one string is a permutation of the other. (也可以用sorting，不仅把相等的放在一起，更会有序地把相近地放到一起）
e.g. 3 Given a newspaper and message as two strings, check if the message can be composed using letters in the newspaper.(Google interview)
（只需要找到match_cnt == count(message)就可以了）

2. 处理vector，array或string问题的几点细节：

a.数组或string内前驱subproblem推出problem的问题，总是以 $[0,i]$ 为范围，检索前驱节点 $j$ 求解当前节点。如果只是检索特定的value，则可以用将当前value作为key建立hash table（value取决于你需要什么前驱子问题的信息），使得后驱节点每次检索的复杂度降低到 $O(1)$ 。实际上类似于dynamic programming的思维方式，记录前驱节点的信息，用空间换取重新扫描需要的时间。

e.g.1 Given an array of integers, find two numbers such that they add up to a specific number.(Leetcode: 2Sum) （也可以用排序）

sum = 6
[1, 3, 5, 7, 6]

e.g.2 Given an unsorted array of integers, find the length of the longest consecutive elements sequence. (Leetcode: Longest Consecutive Sequence)

Given [6, 4, 8, 1, 3, 2，7],
The longest consecutive elements sequence is [1, 2, 3, 4]. Return its length: 4.

一个好的习惯是walkthrough这个test case观察自己的思维过程。
1. 前驱子问题的解 + 当前节点这个增量 => 新的子问题的解
2. 需要什么前驱子问题的信息？上界和下界

struct Bound{
    int high;
    int low;
    Bound(int h = 0 , int l = 0){
        high = h;
        low = l;
    }
};
class Solution {
public:
    int longestConsecutive(vector<int> &num) {
        // Note: The Solution object is instantiated only once and is reused by each test case.
        unordered_map<int,Bound> table;
        int local;
        int max_len = 0;
        for(int i = 0; i < num.size(); i++){
            if(table.count(num[i])) continue;
            local = num[i];
            int low = local, high = local;
            if(table.count(local-1))
                low = table[local-1].low;
            if(table.count(local+1))
                high = table[local+1].high;
            table[low].high = table[local].high = high; // 试一下如果第一条不赋值会怎么样 [1,2,0,1]
            table[high].low = table[local].low = low; // 再想一想，为什么我们只需要维护low和high这两条，而不用考虑这个范围内的value [1,2,3,4，,3]
            if(high - low + 1 > max_len)
                max_len = high - low + 1;
        }
        return max_len;
    }
};

b.求最值或总和的问题，总是可以在update的过程中不断记录更新，而不用再次遍历。
也是一种从subproblem -> problem, 通过记录前驱子问题的解，而不需要再次遍历（只不过它只需要记录一个前驱子问题就可以了），就像上一个问题我们不需要再把这些bound重新遍历一遍

c.处理两个不等长的数组或string，总是可以以while( aIndex < aSize && bIndex < bSize )作为结束条件，再处理可能剩下的元素。

e.g. 1 Merge two sorted arrays into a new array

3.需要用数字或者bitset来映射/hash的情境， 可以利用进制数的正交性，即：数字的高位和低位是互相正交、独立的。或者反过来说，一个数用m进制来表示，只有唯一一种可能。

e.g. 1 Design a hash function that is suitable for words in a dictionary.

hash对象的特征是什么? 1. word不会很长 2.每一位的可能性非常有限且离散

e.g. 2 A hash function for the state of a chess game.

常识：对于一次transition，能够简单地转换hash ( $1+2*6 = 13$ ， $13^{64}$ )

e.g. 3 Implement a method rand7() given rand5(). (课后思考)

4.3 字符串问题

1.对于要求in-place的删除或修改，可以用两个int变量分别记录新index与原index，观察：如果改动后size增大，则先resize(int)，再倒序赋值；反之，则先顺序赋值，再resize.

e.g. 1 Given input "Mr John Smith", output "Mr%20John%20Smith"
e.g. 2 Given input "Mr%20John%20Smith" , output "Mr John Smith"
e.g. 3 Given two sorted arrays, merge B into A in sorted order.

2.对于简单修改长字符串的问题，通常可以顺序或倒序一次遍历，在O(n)内解决。对于分段处理的情况，则使用两个int变量记录段首、段尾Index来处理（slicing window）。可以利用C++ string类的 += 操作符来增量输出结果。

e.g.1 Given input -> "I have 36 books, 40 pens2."; output -> "I evah 36 skoob, 40 2snep."

用begin和end去标记window的初始和终止点，这两个总是核心变量；建议用 while(end < str.size()) 作为循环形式而不是 for loop 因为begin和end每次的增量未必是1.

e.g.2 Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n). (Leetcode: Minimum Window Substring)
典型的将复杂问题划归为简单问题的思维过程：4.2中的newspaper -> message 问题以及最值问题的结合，总体上又是用分段策略

string minWindow(string S, string T) {
        if(S.size() < T.size()) return string();
        unordered_map<char,int> table;
        int begin = 0, end = 0;
        int min_window = S.size()+1, min_begin;
        for(auto it = T.begin(); it != T.end(); ++it)
            ++table[*it];
        int cnt = 0;
        for(int end = 0; end < S.size(); end++){
            if( table.count(S[end]) == 0 ) continue;
            if( table[ S[end] ] > 0)
                cnt++;
            --table[ S[end] ];
            if( cnt == T.size() ){
                while( table.count(S[begin]) == 0 || table[ S[begin] ] < 0){         // 注意时间复杂度仍然是O(n)
                    if(table.count(S[begin]) > 0 )
                        ++table[ S[begin] ];
                    // 处理字符串至少固定一端
                    begin++;
                }
                // 最值的追踪
                if(end-begin+1 < min_window){
                    min_begin = begin;
                    min_window = end - begin +1;
                }
            }
        }
        if(min_window > S.size())
            return "";
        else
            return S.substr(min_begin, min_window);
    }

第四部分：数组和字符串

4.1 知识解析

4.2 数组与哈希表问题

4.3 字符串问题

内容目录