Using trie solving a real world algorithm problem

📅2023-07-12🧐62

The problem

I saw a message in a tech group for 30 minutes ago:

a-real-world-algorithm-example-trie

Here is the inputs and the expected output:

full name list:

[   
    {
        "name": "John Doe"
    },
    {
        "name": "Jane Frances Dough"
    }
]

janky list

[   
    {
        "name": "doe@email.com",
        "ID": 1
    },
    {
        "name": "J Dough"
        "ID": 2
    },
    {
        "name": "Bob Barker"
        "ID": 3
    },
    {
        "name": "Barker"
        "ID": 4
    }
]

expected output:

[   
    {
        "name": "doe@email.com",
        "ID": 1
    },
    {
        "name": "J Dough"
        "ID": 2
    }
]
Solution

Trie is a good apporach in this case, we can use it to reduce the time complexity, and to use a least space as small as possible.

Here is how a Trie works:

https://en.wikipedia.org/wiki/Trie

The time complexity in this case is O(n) which n is the length of the janky list. And all the ideas are in the comment.

def search(arr1, arr2):
    initials = {} # a trie to save all possible combinations of all names
    """
    like this:
    "j": {
        "jane": { # her full first name
            "dough": {} # her full last name
        }
        "dough": {} # her full last name
    }
    
    two type of search can match the dict above:
    a["j"]["jane"]["dough"] # True
    a["j"]["dough"] # True
    
    if you like, also can save middle name as well
    """
    
    ln_emails = set() # last name as email username
    
    # gathered
    for item in arr2:
        names = item["name"].split()
        f_init = names[0][0].lower()
        f_full = names[0].lower()
        l_full = names[-1].lower()
        
        # add to initials trie
        initials[f_init] = {
            f_full: { 
                l_full:{}
            },
            l_full: {}
        }
        
        # add to email set
        ln_emails.add(l_full)
 
    result = []
     
    # janky
    for item in arr1:
        # Case 1: email , query time O(1)
        email_name_right = item["name"].find("@")
        if email_name_right >= 0:
            if item["name"][:email_name_right] in ln_emails:
                result.append(item)
            continue
        
        # Case 2: not an email, query time O(1)
        names = item["name"].split()
        f_init = names[0][0].lower()
        f_full = names[0].lower()
        l_full = names[-1].lower()
        
        if f_init in initials:
            found = False
            if f_full in initials[f_init]:
                if l_full in initials[f_init][f_full] or not len(initials[f_init][f_full]):
                    result.append(item)
                    found = True
            
            if not found and l_full in initials[f_init]:
                result.append(item)
    
    return result

Test

janky = [   
    {
        "name": "doe@email.com",
        "ID": 1
    },
    {
        "name": "J Dough",
        "ID": 2
    },
    {
        "name": "Bob Barker",
        "ID": 3
    },
    {
        "name": "Barker",
        "ID": 4
    }
]

gathered1 = [   
    {
        "name": "John Doe"
    },
    {
        "name": "Jane Frances Dough"
    }
]

gathered2 = [   
    {
        "name": "Bob Barker"
    },
    {
        "name": "Barker"
    }
]

print("test1", search(janky, gathered1))
print("test2", search(janky, gathered2))

"""
output:
Finished in 58 ms
test1 [{'name': 'doe@email.com', 'ID': 1}, {'name': 'J Dough', 'ID': 2}]
test2 [{'name': 'Bob Barker', 'ID': 3}, {'name': 'Barker', 'ID': 4}]
"""