hadoop - Apache Pig: filter based on tuple member content
I'm learning Apache Pig and have run into a problem getting the result I want. I have this object (after doing a GROUP BY):
mlset_1: {group: chararray, mlset: {(key: chararray, text: chararray)}}
I'd like to generate a key only when one pattern (pattern_a) appears in the text field and another pattern (pattern_b) does not appear in the text field for that key.
I know I can use mlset.text to get the text values for a specific key, but I still have the same issue: how do I filter on the list of items for each key?
Here's an example:
(key_a,{(key_a,start),(key_a,stop),(key_a,unknown),(key_a,whatever)})
(key_b,{(key_b,stop),(key_b,whatever)})
(key_c,{(key_c,start),(key_c,stop),(key_c,whatever)})
I'd like the keys of the lines where "start" appears and "unknown" does not appear. In this example, key_c is the only result.
Thanks in advance!
Here's some code that might help you out. The solution is a nested FOREACH:
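c = FOREACH mlset_1 {
    -- rows in this group whose text equals pattern_a
    f1 = FILTER mlset BY (text == 'pattern_a');
    -- rows in this group whose text equals pattern_b
    f2 = FILTER mlset BY (text == 'pattern_b');
    GENERATE group, COUNT(f1) AS cnt1, COUNT(f2) AS cnt2;
};
-- keep groups where pattern_a occurs at least once and pattern_b never occurs
d = FILTER c BY (cnt1 > 0 AND cnt2 == 0);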
You'll have to adapt the comparison in the nested filter to your own patterns.
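As one possible adaptation, here is a minimal sketch applied to the example from the question, using MATCHES instead of equality so that the pattern can occur anywhere in the text. The file name 'input.tsv', its tab-separated layout, and the aliases has_a, has_b, counted, matched and result are assumptions for illustration and do not come from the original post. Note that MATCHES has to match the whole string, hence the surrounding '.*'.

```pig
-- Hypothetical input: one (key, text) pair per line, tab-separated
mlset = LOAD 'input.tsv' USING PigStorage('\t') AS (key:chararray, text:chararray);
mlset_1 = GROUP mlset BY key;

counted = FOREACH mlset_1 {
    -- rows of this key whose text contains "start"
    has_a = FILTER mlset BY text MATCHES '.*start.*';
    -- rows of this key whose text contains "unknown"
    has_b = FILTER mlset BY text MATCHES '.*unknown.*';
    GENERATE group AS key, COUNT(has_a) AS cnt_a, COUNT(has_b) AS cnt_b;
};

-- keep keys where "start" occurs at least once and "unknown" never occurs
matched = FILTER counted BY (cnt_a > 0 AND cnt_b == 0);
result = FOREACH matched GENERATE key;
DUMP result;
```

With the three example keys above, only key_c should pass the final filter: key_a contains "unknown" and key_b never contains "start".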