Here it is a set of spelling related problems.
- Given two strings s, t such that |s|=n, |t|=m, compute the edit distance(s, t). You may assume different costs for insert, delete, and substitution. You can make this in O(mn) and space O(min { m, n } )
- Given a set of words S = { w1, ... wk } an accettable bidirectional mutation (wi <-> wj) for the word wi is a word wj, such that edit_distance(wi, wj) < T. Suppose you can access a large query log Q of queries previously submitted to a search engine. Suppose that a user rewrites the word wi into the word wj within a distance D of queries submitted during her search session. In this case, you call the mutation (wi -> wj) real. Note that this mutation is in one direction only. A real accettable mutation is an accettable mutation which is both real and acceptable. A frequent real mutation is a mutation which appears in at least F different users' sessions. Given a configuration (Q, T, D, F) find all the frequent, real and acceptables word mutations. What is the optimal complexity in time and space?
- Find the diameter of the graph G formed considering all the frequent, real, acceptable word mutations (wi -> wj).
- A sentence is an order sequence of words. For instance, "britney spears" is different from "spear britnery". How can you deal with mispelling in case of sentences?
No comments:
Post a Comment