Compare two non-matching lists and identify the row with maximum matching elements


























Background



I have two lists (of lists), each created by reading data from one of two address tables.



The first element in each row is the unique identifier of that row, and the remaining elements are used for address comparison.



Each list looks something like this:



List 1 (cli add)



['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA']
['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA']
['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']


List 2 (struct add)



['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA']
['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA']
['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA']
['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']


Task Goal
My task is to compare the addresses in the test list (list 1) against the other list (list 2) and flag all matching and non-matching records.



I am looping through each row in list 1 and comparing it with each row of list 2, element-wise. If all the elements of a list 1 row are found in any row of list 2, I mark that record as 'matching' and retain the row from list 2. I have been able to identify the completely matching records this way.
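The complete-match rule above (every address element of a list 1 row appears somewhere in one list 2 row, regardless of position) can be sketched as a set subset test. This is a minimal sketch, assuming element order and repeated elements don't matter; `is_full_match` is a hypothetical helper name, and unlike the loop further down it also leaves out the list 2 id.

```python
def is_full_match(cli_row, struct_row):
    """Return True if every address element of cli_row (id excluded)
    appears somewhere in struct_row (id excluded), regardless of position."""
    return set(cli_row[1:]) <= set(struct_row[1:])


cli_row = ['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA']
struct_row = ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA']
print(is_full_match(cli_row, struct_row))    # False: 'BR' is not in struct_row
print(is_full_match(struct_row, struct_row))  # True
```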



Problem Point
The real challenge is the non-matching rows. For each non-matching record in list 1, I want to identify the most closely matching row in list 2. For example, if a row from list 1 has matching elements in three rows of list 2, I want to pick the list 2 row with the highest number of matching elements.
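Picking the most closely matching list 2 row is essentially an argmax over the match count. A minimal sketch using the built-in `max` with a `key` function, assuming the id is the first element of every row, that a "match" means a shared distinct element, and that list 2 is not empty; `best_match` is a hypothetical helper name.

```python
def best_match(cli_row, struct_rows):
    """Return (best_row, match_count) for the list 2 row sharing the most
    address elements with cli_row. Ids (first elements) are ignored.
    On a tie, max() returns the first row it saw with the highest count.
    Raises ValueError if struct_rows is empty."""
    cli_items = set(cli_row[1:])
    best_row = max(struct_rows, key=lambda row: len(cli_items & set(row[1:])))
    return best_row, len(cli_items & set(best_row[1:]))


struct_adds = [
    ['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
    ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
    ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
    ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA'],
]
row, count = best_match(['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'], struct_adds)
print(row[0], count)  # 7H0001 3
```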



Expected Outcome
In the data shared above, the second row of list 1 (id 543) has a complete match, but the other two records don't.



I want to create a list of the non-matching records that also captures their non-matching elements, in this form:



[[comparison record from list 1],[Most matching record from list 2],[non-matching elements from list 1]]


So for the data above, I need a new list (of lists) that looks something like this:



[['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],['BR']]
[['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],['70','RD','M6C2X8', 'YK']]
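The whole expected outcome (skip complete matches, then report each remaining list 1 row with its best list 2 row and the missing elements) can be sketched in one pass. This is a minimal sketch, assuming ids are the first element of every row, matches are counted as shared distinct elements, list 2 is not empty, and ties go to the first list 2 row; `unmatched_report` is a hypothetical helper name, and reporting an empty row when nothing is shared is an assumption of this sketch.

```python
def unmatched_report(cli_rows, struct_rows):
    """Return [[list 1 row], [most matching list 2 row], [non-matching elements]]
    for every list 1 row with no complete match in list 2 (ids ignored)."""
    # Precompute each list 2 row's address elements (id excluded) once.
    struct_sets = [set(row[1:]) for row in struct_rows]
    report = []
    for cli_row in cli_rows:
        cli_items = set(cli_row[1:])
        # Complete match: every element appears in a single list 2 row.
        if any(cli_items <= s for s in struct_sets):
            continue
        # Index of the list 2 row sharing the most elements; max() keeps the
        # first one on ties. Assumes struct_rows is not empty.
        best = max(range(len(struct_rows)),
                   key=lambda i: len(cli_items & struct_sets[i]))
        overlap = len(cli_items & struct_sets[best])
        # List 1 elements missing from the best row, in their original order.
        missing = [item for item in cli_row[1:] if item not in struct_sets[best]]
        # Report an empty row when nothing at all is shared.
        report.append([cli_row, struct_rows[best] if overlap else [], missing])
    return report


cli_add_fnl = [['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
               ['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
               ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']]
struct_add_fnl = [['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
                  ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
                  ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
                  ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']]

for entry in unmatched_report(cli_add_fnl, struct_add_fnl):
    print(entry)
```

With the sample data exactly as posted, id 543 is reported too (with ['BR']), because list 2 row 7H0044 contains 'BC' rather than 'BR'; the answer's output shows the same thing.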


The code below gets me part of the way there, but I can't find a way to keep only the highest-matching rows.



Code Snippet
Here is how I found the matching and non-matching records (list 1 is referred to as cli_add_fnl and list 2 as struct_add_fnl). I have also figured out how to list the unmatched elements and count the matching elements. I just need a way to keep only the row with the maximum count for each row of list 1.



### Step 4 - Identifying the matching and non-matching addresses ###
# cli_add_fnl (list 1) and struct_add_fnl (list 2) are the lists of lists shown above
validated_addresses_all = []
invalid_addresses_all = []

for cli_add in cli_add_fnl:
    comparison_cli_add = cli_add.copy()

    # removing the id column from comparison
    comparison_cli_add.pop(0)

    for struct_add in struct_add_fnl:
        matching_elements = [address_element for address_element in comparison_cli_add if address_element in struct_add]
        nonmatching_elements = [address_element for address_element in comparison_cli_add if address_element not in struct_add]

        # capture the matching records
        if matching_elements == comparison_cli_add:
            validated_addresses_all.append(cli_add)
        else:
            invalid_addresses_all.append(cli_add)
            invalid_addresses_all.append(struct_add)
            invalid_addresses_all.append(len(set(comparison_cli_add) & set(struct_add)))
            invalid_addresses_all.append(nonmatching_elements)

# remove the duplicate entries
fnl_validated_addresses = []
for add in validated_addresses_all:
    if add not in fnl_validated_addresses:
        fnl_validated_addresses.append(add)
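Because the loop above appends four separate items per comparison (list 1 row, list 2 row, match count, non-matching elements), one way to keep only the best row per list 1 record is to walk that flat list in steps of four and remember the highest count seen for each id. A minimal sketch, assuming exactly that four-item layout and Python 3.7+ dict ordering; `best_per_record` and its `validated_ids` parameter are hypothetical names.

```python
def best_per_record(invalid_addresses_all, validated_ids=()):
    """Collapse the flat [cli_add, struct_add, count, nonmatching, ...] list
    into one [cli_add, best struct_add, nonmatching] entry per list 1 id."""
    best = {}  # list 1 id -> (cli_add, struct_add, count, nonmatching)
    for i in range(0, len(invalid_addresses_all), 4):
        cli_add, struct_add, count, nonmatching = invalid_addresses_all[i:i + 4]
        cli_id = cli_add[0]
        if cli_id in validated_ids:
            continue  # this record already has a complete match somewhere
        # Strictly greater keeps the first list 2 row on ties.
        if cli_id not in best or count > best[cli_id][2]:
            best[cli_id] = (cli_add, struct_add, count, nonmatching)
    # dicts keep insertion order, so the list 1 order is preserved
    return [[cli_add, struct_add, nonmatching]
            for cli_add, struct_add, _, nonmatching in best.values()]
```

After the loop, something like `best_per_record(invalid_addresses_all, {add[0] for add in fnl_validated_addresses})` would drop records that already matched completely against another row.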

































  • You should look into using sets: docs.python.org/3.7/library/…
    – Meow
    Nov 22 at 18:22












  • If any of the answers has helped you, please accept them as answers to help close the question. Thanks!
    – BernardL
    Nov 22 at 21:04
















python python-3.x














edited Nov 22 at 18:22

























asked Nov 22 at 17:15









Sushant Vasishta

948
















1 Answer














This is one way to do it. It ignores element position and the first item (the id), compares the values in each row of adds against each row of struct_adds, and keeps a running count of the highest number of matches. Whenever a row has more matches than the current best, the counter is updated and that row's index is recorded; if no row matches at all, the example below does nothing. The items in add that are missing from the best-matching row are then collected as the differences.



Each result is then appended to a list.



adds = [['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
        ['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
        ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']]

struct_adds = [['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
               ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
               ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
               ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']]

results = []

for add in adds:
    match_count = 0
    match_index = 0
    for idx, struct_add in enumerate(struct_adds):
        # one True/False per address element of add (id excluded), position ignored
        matches = [add_item in struct_add[1:] for add_item in add[1:]]
        if matches.count(True) > match_count:
            match_count = matches.count(True)
            match_index = idx
    if match_count == 0:
        pass  # no matches
    else:
        highest_match = struct_adds[match_index]
        # address elements of add missing from the best row (ids excluded)
        differences = [i for i in add[1:] if i not in highest_match[1:]]
        results.append([add, highest_match, differences])


Or, if you want to use set operations, which should be more efficient as suggested in the comments, you can replace the for block with:



for add in adds:
    match_count = 0
    match_index = 0
    add_items = set(add[1:])
    for idx, struct_add in enumerate(struct_adds):
        matches = add_items & set(struct_add[1:])
        if len(matches) > match_count:
            match_count = len(matches)
            match_index = idx
    if match_count == 0:
        pass  # no matches
    else:
        highest_match = struct_adds[match_index]
        highest_items = set(highest_match[1:])
        # a comprehension keeps add's element order (a plain set difference would not)
        differences = [i for i in add[1:] if i not in highest_items]
        results.append([add, highest_match, differences])


Both yield the same results:



results
>>
[[['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
  ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
  ['BR']],
 [['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
  ['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
  ['BR']],
 [['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],
  ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
  ['70', 'RD', 'M6C2X8', 'YK']]]


I should also add that, to keep things simple, this example takes the first highest match when several rows tie. That is handled by the if clause, which requires the new count of matches to be strictly greater than the current best count.
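If ties matter, a variant can keep every struct_adds row that reaches the highest count instead of only the first. This is a minimal sketch, not part of the answer above; `all_best_matches` is a hypothetical helper, it counts shared distinct elements with sets (ids ignored), and the sample rows in the usage lines are made up for illustration.

```python
def all_best_matches(add, struct_adds):
    """Return (highest_count, rows) where rows are all struct_adds tied
    for the highest number of shared address elements (ids ignored)."""
    add_items = set(add[1:])
    counts = [len(add_items & set(row[1:])) for row in struct_adds]
    highest = max(counts, default=0)  # default covers an empty struct_adds
    if highest == 0:
        return 0, []  # nothing in common with any row
    return highest, [row for row, c in zip(struct_adds, counts) if c == highest]


# made-up rows: A1 and A2 tie on two shared elements
struct_adds = [['A1', 'RUSHTON', 'ON'], ['A2', 'RUSHTON', 'ON'], ['A3', 'BC']]
count, rows = all_best_matches(['28', 'RUSHTON', 'ON', 'CANADA'], struct_adds)
print(count, [r[0] for r in rows])  # 2 ['A1', 'A2']
```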






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53435703%2fcompare-two-non-matching-lists-and-identify-the-row-with-maximum-matching-elemen%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    This is one way to do it with ignoring position and the first item by comparing the values that are in adds and struct_adds and internally keeping a counter of the highest matches. As long there is a match it will update the counter and gets the index of the highest match else in the example below, it does nothing. Differences from item in add and the highest matches are then compared.



    The results are then appended accordingly to a list.



    adds = [['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
    ['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
    ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']]

    struct_adds = [ ['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
    ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
    ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
    ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']]

    results =

    for add in adds:
    match_count = 0
    match_index = 0
    for idx,struct_add in enumerate(struct_adds):
    matches = [add_item in struct_add[1:] for add_item in add[1:]]
    if matches.count(True) > match_count:
    match_count = matches.count(True)
    match_index = idx
    if match_count == 0:
    pass # no matches
    else:
    highest_match = struct_adds[match_index]
    differences = [i for i in add[1:] if i not in highest_match]
    results.append([add,highest_match,differences])


    Or if you want to use set operations, which should be more effecient as suggested in the comments you can replace the for block with:



    for add in adds:
    match_count = 0
    match_index = 0
    for idx,struct_add in enumerate(struct_adds):
    matches = set(add[1:]) & set(struct_add[1:])
    if len(matches) > match_count:
    match_count = len(matches)
    match_index = idx
    if match_count == 0:
    pass # no matches
    else:
    highest_match = struct_adds[match_index]
    differences = list(set(add[1:]) - set(highest_match[1:]))
    results.append([add,highest_match,differences])


    Both yields the same results:



    results
    >>
    [[['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
    ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
    ['BR']],
    [['543',
    '234',
    '654',
    'BELMONT',
    'AVENUE',
    'V8S3T4',
    'VICTORIA',
    'BR',
    'CANADA'],
    ['7H0044',
    '234',
    '654',
    'BELMONT',
    'AVENUE',
    'V8S3T4',
    'VICTORIA',
    'BC',
    'CANADA'],
    ['BR']],
    [['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],
    ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
    ['70', 'RD', 'M6C2X8', 'YK']]]


    I should also add that in this example and also not to further complicate things, it will take the first highest match. This part is managed in the if clause comparing the count of true matches must be more than the current count of matches.






    share|improve this answer




























      1














      This is one way to do it with ignoring position and the first item by comparing the values that are in adds and struct_adds and internally keeping a counter of the highest matches. As long there is a match it will update the counter and gets the index of the highest match else in the example below, it does nothing. Differences from item in add and the highest matches are then compared.



      The results are then appended accordingly to a list.



      adds = [['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
      ['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
      ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']]

      struct_adds = [ ['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
      ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
      ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
      ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']]

      results =

      for add in adds:
      match_count = 0
      match_index = 0
      for idx,struct_add in enumerate(struct_adds):
      matches = [add_item in struct_add[1:] for add_item in add[1:]]
      if matches.count(True) > match_count:
      match_count = matches.count(True)
      match_index = idx
      if match_count == 0:
      pass # no matches
      else:
      highest_match = struct_adds[match_index]
      differences = [i for i in add[1:] if i not in highest_match]
      results.append([add,highest_match,differences])


      Or if you want to use set operations, which should be more effecient as suggested in the comments you can replace the for block with:



      for add in adds:
      match_count = 0
      match_index = 0
      for idx,struct_add in enumerate(struct_adds):
      matches = set(add[1:]) & set(struct_add[1:])
      if len(matches) > match_count:
      match_count = len(matches)
      match_index = idx
      if match_count == 0:
      pass # no matches
      else:
      highest_match = struct_adds[match_index]
      differences = list(set(add[1:]) - set(highest_match[1:]))
      results.append([add,highest_match,differences])


      Both yields the same results:



      results
      >>
      [[['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
      ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
      ['BR']],
      [['543',
      '234',
      '654',
      'BELMONT',
      'AVENUE',
      'V8S3T4',
      'VICTORIA',
      'BR',
      'CANADA'],
      ['7H0044',
      '234',
      '654',
      'BELMONT',
      'AVENUE',
      'V8S3T4',
      'VICTORIA',
      'BC',
      'CANADA'],
      ['BR']],
      [['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],
      ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
      ['70', 'RD', 'M6C2X8', 'YK']]]


      I should also add that in this example and also not to further complicate things, it will take the first highest match. This part is managed in the if clause comparing the count of true matches must be more than the current count of matches.






      share|improve this answer


























        1












        1








        1






        This is one way to do it with ignoring position and the first item by comparing the values that are in adds and struct_adds and internally keeping a counter of the highest matches. As long there is a match it will update the counter and gets the index of the highest match else in the example below, it does nothing. Differences from item in add and the highest matches are then compared.



        The results are then appended accordingly to a list.



        adds = [['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
        ['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
        ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']]

        struct_adds = [ ['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
        ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
        ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
        ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']]

        results =

        for add in adds:
        match_count = 0
        match_index = 0
        for idx,struct_add in enumerate(struct_adds):
        matches = [add_item in struct_add[1:] for add_item in add[1:]]
        if matches.count(True) > match_count:
        match_count = matches.count(True)
        match_index = idx
        if match_count == 0:
        pass # no matches
        else:
        highest_match = struct_adds[match_index]
        differences = [i for i in add[1:] if i not in highest_match]
        results.append([add,highest_match,differences])


        Or if you want to use set operations, which should be more effecient as suggested in the comments you can replace the for block with:



        for add in adds:
        match_count = 0
        match_index = 0
        for idx,struct_add in enumerate(struct_adds):
        matches = set(add[1:]) & set(struct_add[1:])
        if len(matches) > match_count:
        match_count = len(matches)
        match_index = idx
        if match_count == 0:
        pass # no matches
        else:
        highest_match = struct_adds[match_index]
        differences = list(set(add[1:]) - set(highest_match[1:]))
        results.append([add,highest_match,differences])


        Both yields the same results:



        results
        >>
        [[['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
        ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
        ['BR']],
        [['543',
        '234',
        '654',
        'BELMONT',
        'AVENUE',
        'V8S3T4',
        'VICTORIA',
        'BR',
        'CANADA'],
        ['7H0044',
        '234',
        '654',
        'BELMONT',
        'AVENUE',
        'V8S3T4',
        'VICTORIA',
        'BC',
        'CANADA'],
        ['BR']],
        [['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],
        ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
        ['70', 'RD', 'M6C2X8', 'YK']]]


        I should also add that in this example and also not to further complicate things, it will take the first highest match. This part is managed in the if clause comparing the count of true matches must be more than the current count of matches.






        edited Nov 22 at 18:34

























        answered Nov 22 at 18:22









        BernardL
