Problem · Union Find

Record Linkage Part 3 - Full Connected Component

MediumStripePHONE SCREEN
See Stripe hiring insights

This continues the record-linkage problem. You are given user records and a weighted similarity rule, and must return the full set of records transitively connected to a target.

Inputs: rows (a String[] of "id,name,email,company"), weights (a String[] of "field,weight" over {name, email, company}, summing to 1), threshold, and targetId.

Two records are linked when their weighted field-equality similarity is >= threshold (each matching field contributes its weight). Linkage is transitive: if A links to B and B links to C, then A, B, and C are all in the same group even if A and C are not directly linked.

Return the ids of every record in the same connected component as targetId (i.e. reachable through any chain of links), excluding the target itself, as an int[] sorted in ascending order.

Examples
01 · Example 1
rows = ["1,Alice,alice@gmail.com,Stripe", "2,Alicia,alice@gmail.com,Stripe", "3,Alice,alice@yahoo.com,Google", "4,Bob,bob@gmail.com,Stripe"]
weights = ["name,0.2", "email,0.5", "company,0.3"]
threshold = 0.5
targetId = 1
return = [2]
Record 1 links directly to record 2 (similarity 0.8). Record 2 does not link to record 3 (0) or record 4 (0.3 < 0.5), and record 1 links to neither 3 nor 4. So the component of record 1 is {1, 2}; excluding the target, the answer is [2].
02 · Example 2
rows = ["1,Alice,alice@gmail.com,Stripe", "2,Alicia,alice@gmail.com,Stripe", "3,Bob,bob@yahoo.com,Google", "5,Alicia,carol@outlook.com,Stripe"]
weights = ["name,0.2", "email,0.5", "company,0.3"]
threshold = 0.5
targetId = 1
return = [2, 5]
Record 1 links to record 2 (email + company match = 0.8). Record 2 links to record 5 (name 'Alicia' + company 'Stripe' match = 0.2 + 0.3 = 0.5 >= threshold), but record 5 does NOT link directly to record 1 (only company matches = 0.3 < 0.5). Transitivity pulls record 5 into record 1's component via record 2. Record 3 links to nobody. Component of 1 = {1, 2, 5}; excluding the target, the answer is [2, 5].
Constraints
  • Each row is "id,name,email,company" with an integer id.
  • Each weight entry is "field,weight" over {name, email, company}; weights sum to 1.
  • Similarity = sum of weights of matching fields; records are linked when similarity >= threshold.
  • Linkage is transitive (connected components over the similarity graph).
  • Return the ascending-sorted int[] of ids in the target's component, excluding the target.
More Stripe problems
drafts saved locally
public int[] findLinkedComponent(String[] rows, String[] weights, float threshold, int targetId) {
  // write your code here
}
rows["1,Alice,alice@gmail.com,Stripe", "2,Alicia,alice@gmail.com,Stripe", "3,Alice,alice@yahoo.com,Google", "4,Bob,bob@gmail.com,Stripe"]
weights["name,0.2", "email,0.5", "company,0.3"]
threshold0.5
targetId1
expected[2]
sign in to submit