FastPrepFastPrep
Problem Brief

Get Maximum Information Gain

NEW GRADOA
See Amazon online assessment and hiring insights

Data analysts at Amazon are analyzing a data set of n strings in the array dataSet[], each consisting of lowercase English letters. Each character in a string corresponds to a particular feature.

The information gain obtained by training a model with two strings, dataSet[i] and dataSet[j], is the difference between the lengths of the strings i.e., |len(dataSet[i]) - len(dataSet[j])|. To avoid too many overlapping features, two strings can be selected only if the number of common features between them does not exceed a given threshold, max_common_features. The number of common features here is equal to the number of common characters between the two strings. For example, "abc" and "bcd" have 2 common features 'b' and 'c'. While "aa" and "aaa" have two common features, the "a" two 'a' characters.

Given dataSet and max_common_features, determine the maximum information gain possible.

Function Description

Complete the function getMaxInformationGain in the editor.

getMaxInformationGain takes the following arguments:

  1. String[] dataSet: the strings of features
  2. int max_common_features: the maximum number of common features allowed between data points

Returns

int: the maximum possible information gain

𓇼 ⋆.˚ 𓆝 𓆡⋆.˚ 𓇼Forever thankful chizzy_elect 🍀

1Example 1

Input
dataSet = ["abofh", "ab", "mo"], max_common_features = 1
Output
3
Explanation
It is optimal to choose the strings "abofh" and "mo". Their number of common features is 1 ('o') and the information gain is |5 - 2| = 3.

2Example 2

Input
dataSet = ["a", "bcdef"], max_common_features = 1
Output
4
Explanation
The two strings can be chosen. They do not share any common features and their difference in length is 4.

Constraints

Limits and guarantees your solution can rely on.

  • 2 ≤ n ≤ 1000
  • 1 ≤ len(dataSet[i]) ≤ 1000
  • 1 ≤ max_common_features ≤ 1000
public int getMaxInformationGain(String[] dataSet, int max_common_features) {
  // write your code here
}
Input

dataSet

["abofh", "ab", "mo"]

max_common_features

1

Output

3

Sign in to submit your solution.