Finding values and duplicates in a large JSON dataset with Python


I have a huge data set (b) of JSON objects. I have a smaller data set (a) of JSON objects as well. What is the fastest way to see if every element in a is within b? And how do I check if there are duplicates of the elements of a in b?

What I had in mind was creating a dictionary from data set b, with the key and the value of each pair being the same JSON value. That would allow fast lookups. I would do the same thing for set a, but set the value for each key in set a to an empty list.

Each key in set a would then be looked up in set b, and any match appended to its respective list. Once that is done, the length of each list would determine whether the value was not found, matched, or duplicated.

If the length of the list for a key in set a is:

0  --> none found in b
1  --> 1 found in b
>1 --> more than 1 found in b (duplicates found)

I don't think standard dictionaries support duplicates, and I'm not sure which data structure to use that supports duplicate key, value pairs.
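The plan described above could be sketched roughly as follows. This is only an illustration, not something tested against the real data: the names `canonical_key` and `group_matches` are made up, the sample data is hypothetical, and it assumes every object can be serialized with `json.dumps`, using `sort_keys=True` so that two objects with the same content give the same string. A `collections.defaultdict(list)` sidesteps the duplicate-key problem by keeping a list of positions per key instead of a single value.

```python
import json
from collections import defaultdict


def canonical_key(obj):
    # Assumption: objects are JSON-serializable; sort_keys makes the
    # string independent of the key order inside the object.
    return json.dumps(obj, sort_keys=True)


def group_matches(a, b):
    # Map each distinct object in b to the list of positions where it occurs.
    positions = defaultdict(list)
    for index, obj in enumerate(b):
        positions[canonical_key(obj)].append(index)
    # For each object in a, look up its list (empty if it never occurs in b).
    return [(obj, positions.get(canonical_key(obj), [])) for obj in a]


# Hypothetical sample data, for illustration only.
a = [{"id": 1}, {"id": 2}, {"id": 3}]
b = [{"id": 2}, {"id": 3}, {"id": 3}, {"id": 4}]

for obj, found in group_matches(a, b):
    print(obj, len(found))
```

The length of each list then maps directly onto the three cases: 0 for not found, 1 for a single match, and more than 1 for duplicates.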

Well, here's my best guess. It uses dicts instead of JSON objects, so you'll want to double check that the comparison works in your case. It uses a generator to make tuples of (object, count). If there are 0 instances of an obj from a in b, False gets inserted. Then it checks to see if False is there, to satisfy:

see if every element in a is within b

I don't know if it's the fastest way, but it came off the top of my head. a and b are lists of dicts, in this case pairs of {'a':'a'} for testing. Play with the slices at the end to see if it meets your requirements.

List a contains every other letter, and list b contains the letters d-z.

    import string

    a = [{x: x} for x in list(string.ascii_lowercase)[0:26:2]]  # every other letter
    b = [{x: x} for x in list(string.ascii_lowercase)[3:26]]  # the letters d-z

    def compare(a, b):
        for obj in a:
            c = b.count(obj)  # number of times obj appears in b
            if c == 0:
                yield False  # marker: not every element of a is in b
            yield (obj, c)

    findings = [res for res in compare(a, b)]
    print(findings)

    if False in findings:
        print("not every element of a is in b")
    else:
        for res in findings:
            obj, num = res
            print("object %s found %d times" % (str(obj), num))

If you don't need the counts, you can change the compare function to:

    def compare2(a, b):
        for obj in a:
            if b.count(obj) == 0:
                return False  # stop at the first element of a missing from b
        return True
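One caveat: `b.count(obj)` scans all of b for every object in a, so the work grows with len(a) * len(b), which may be slow when b is huge. A possible alternative, sketched here under the assumption that each object can be turned into a hashable key with `json.dumps(obj, sort_keys=True)`, counts everything in b once with `collections.Counter` and then does dictionary lookups. The name `compare_counts` is just for illustration.

```python
import json
import string
from collections import Counter


def compare_counts(a, b):
    # Count every object in b once; the key is its canonical JSON string
    # (assumes the objects are JSON-serializable).
    counts = Counter(json.dumps(obj, sort_keys=True) for obj in b)
    # A Counter returns 0 for keys it has never seen.
    return [(obj, counts[json.dumps(obj, sort_keys=True)]) for obj in a]


# Same test data as before: every other letter vs. the letters d-z.
a = [{x: x} for x in string.ascii_lowercase[0:26:2]]
b = [{x: x} for x in string.ascii_lowercase[3:26]]

findings = compare_counts(a, b)
all_found = all(count > 0 for _, count in findings)
duplicates = [obj for obj, count in findings if count > 1]
print(all_found, duplicates)
```

With this data `all_found` comes out False, because 'a' and 'c' are missing from b, and `duplicates` is empty. Whether this is actually faster on your data is worth measuring rather than assuming.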
