Python: merge lists of tuples based on their values
I'm trying to figure out a method to merge two lists in Python in order to accomplish this:

    list_a = [(item_1, attribute_x), (item_2, attribute_y), (item_3, attribute_z)]
    list_b = [(item_1, attribute_n), (item_3, attribute_p)]

As a result:

    list_result = [(item_1, attribute_x, attribute_n), (item_2, attribute_y, False), (item_3, attribute_z, attribute_p)]

Any ideas?
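Here is an interesting way to solve the problem, a robust function that returns a generator:

    def combine_item_pairs(l1, l2):
        # Start with every key from l1, with False as the placeholder for l2's value.
        d = {k: [v, False] for k, v in l1}
        for key, value in l2:
            if key in d:
                d[key][1] = value
            else:
                # Key appears only in l2: False marks the missing l1 value.
                d[key] = [False, value]
        # Python 2: iteritems(); on Python 3 use items() instead.
        return (tuple([key] + value) for key, value in d.iteritems())

Using it:

    >>> list(combine_item_pairs(list_a, list_b))
    [('item_2', 'attribute_y', False), ('item_3', 'attribute_z', 'attribute_p'), ('item_1', 'attribute_x', 'attribute_n')]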
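For readers on Python 3, the same dictionary-based idea can be sketched as below. This is an adaptation under stated assumptions, not the answer's original code: the function name `combine_item_pairs_py3` is hypothetical, `iteritems()` is replaced by `items()`, and because dicts preserve insertion order from Python 3.7 onward, the tuples come back in first-seen order rather than the arbitrary order shown above.

```python
def combine_item_pairs_py3(l1, l2):
    """Python 3 adaptation (hypothetical name) of the dict-based merge."""
    # Every key from l1 starts with False standing in for the l2 value.
    d = {k: [v, False] for k, v in l1}
    for key, value in l2:
        if key in d:
            d[key][1] = value
        else:
            # Key only present in l2: False marks the missing l1 value.
            d[key] = [False, value]
    return (tuple([key] + value) for key, value in d.items())


list_a = [('item_1', 'attribute_x'), ('item_2', 'attribute_y'), ('item_3', 'attribute_z')]
list_b = [('item_1', 'attribute_n'), ('item_3', 'attribute_p')]

merged = list(combine_item_pairs_py3(list_a, list_b))
print(merged)
```

Because the position of False tells you which list lacked the key, this variant keeps the information that the sorted variant gives up.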
Here is a bonus solution (same interface, more efficient):

    from itertools import groupby
    from operator import itemgetter

    def combine_item_pairs(l1, l2):
        # Sort the combined pairs so groupby sees equal keys next to each other,
        # then pad each group with False and trim to (key, value, value).
        return (tuple(([k] + [itemgetter(1)(i) for i in g] + [False])[:3])
                for k, g in groupby(sorted(l1 + l2), key=itemgetter(0)))

Results:

    >>> list(combine_item_pairs(list_a, list_b))
    [('item_1', 'attribute_n', 'attribute_x'), ('item_2', 'attribute_y', False), ('item_3', 'attribute_p', 'attribute_z')]
Note: the efficiency of this solution is diminished if the lists require sorting, or if a lot of values are absent. Also, absences are reflected as a False value in the last item of the tuple, so there is no way of knowing which list was missing the item (that's the price of efficiency). It should be used for large data, when it is less important to know which list was missing an item.
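The ambiguity described in the note can be seen in a small sketch. It uses the same `groupby` technique under the hypothetical name `combine_sorted` and feeds it a key that exists in only one list at a time; both calls produce an identical tuple, so the result alone cannot say which list the value came from.

```python
from itertools import groupby
from operator import itemgetter


def combine_sorted(l1, l2):
    # Same sort-then-group technique: pad each group with False, trim to 3 items.
    return (tuple(([k] + [itemgetter(1)(i) for i in g] + [False])[:3])
            for k, g in groupby(sorted(l1 + l2), key=itemgetter(0)))


# The key exists only in the first list...
only_in_a = list(combine_sorted([('item_2', 'attribute_y')], []))
# ...or only in the second list.
only_in_b = list(combine_sorted([], [('item_2', 'attribute_y')]))

print(only_in_a)
print(only_in_b)
```

When the source list matters, the dictionary-based function is the safer choice, since it puts False in the slot of whichever list lacked the key.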
Edit: timers:

    a = [('item_1', 'attribute_x'), ('item_2', 'attribute_y'), ('item_3', 'attribute_z')]
    b = [('item_1', 'attribute_n'), ('item_3', 'attribute_p')]

    def inbar(l1, l2):
        d = {k: [v, False] for k, v in l1}
        for key, value in l2:
            if key in d:
                d[key][1] = value
            else:
                d[key] = [False, value]
        return (tuple([key] + value) for key, value in d.iteritems())

    def solus(l1, l2):
        dict_a, dict_b = dict(l1), dict(l2)
        items = sorted({i for i, _ in l1 + l2})
        return [(i, dict_a.get(i, False), dict_b.get(i, False)) for i in items]

    import timeit

    # Running each timer 3 times to be sure (Python 2 syntax).
    print timeit.Timer('inbar(a, b)', 'from __main__ import a, b, inbar').repeat()
    # [2.2363221572247483, 2.1427426716407836, 2.1545361420851963]
    # [2.2058199808040575, 2.137495707329387, 2.178640404817184]
    # [2.4588094406466743, 2.4221991975274215, 2.3586636366037856]

    print timeit.Timer('solus(a, b)', 'from __main__ import a, b, solus').repeat()
    # [5.841498824468664, 5.951693880486182, 5.866254325691159]
    # [5.843569212526087, 5.919173415087307, 6.027018876010061]
    # [6.41402184345621, 6.229860036924308, 6.562849100520403]
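One caveat about the comparison: `inbar` returns a generator, so timing `inbar(a, b)` measures building the dictionary but not producing the tuples, while `solus` builds its full list. A like-for-like measurement could be sketched as follows for Python 3. The `list(...)` wrapper, the `items()` call, and the `number=10000` setting (chosen only to keep the run short) are assumptions of this sketch, and no timings are quoted because they depend on the machine.

```python
import timeit

a = [('item_1', 'attribute_x'), ('item_2', 'attribute_y'), ('item_3', 'attribute_z')]
b = [('item_1', 'attribute_n'), ('item_3', 'attribute_p')]


def inbar(l1, l2):
    d = {k: [v, False] for k, v in l1}
    for key, value in l2:
        if key in d:
            d[key][1] = value
        else:
            d[key] = [False, value]
    return (tuple([key] + value) for key, value in d.items())  # Python 3: items()


def solus(l1, l2):
    dict_a, dict_b = dict(l1), dict(l2)
    items = sorted({i for i, _ in l1 + l2})
    return [(i, dict_a.get(i, False), dict_b.get(i, False)) for i in items]


# Wrap inbar in list() so the generator is consumed and both calls do the full work.
inbar_times = timeit.repeat(lambda: list(inbar(a, b)), repeat=3, number=10000)
solus_times = timeit.repeat(lambda: solus(a, b), repeat=3, number=10000)

print(inbar_times)
print(solus_times)
```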