Problem · String

Longest Dictionary Tokenization

MediumGoogleFULLTIMEPHONE SCREEN
See Google hiring insights

You are given a text string and a dictionary of token-to-id mappings. Starting from the beginning of text, repeatedly choose the longest dictionary token that matches the current position and output its id. If no dictionary token matches, output the current character itself and advance by one character.

Return the sequence of emitted ids and literal characters.

Examples
01 · Example 1
text = "applepie"
dictionary = [["app","B"],["apple","A"],["pie","P"]]
return = ["A","P"]

apple is preferred over app because it is the longest match at index 0.

02 · Example 2
text = "xabcd"
dictionary = [["ab","1"],["abc","2"],["bc","3"]]
return = ["x","2","d"]

The first character has no match, then abc is the longest token starting at index 1.

Constraints

If multiple dictionary entries have the same token, use the first mapping provided. The tokenization is greedy and scans left to right.

More Google problems
drafts saved locally
public String[] encodeWithDictionary(String text, String[][] dictionary) {
    // write your code here
}
text"applepie"
dictionary[["app","B"],["apple","A"],["pie","P"]]
expected["A", "P"]
sign in to submit