python 2.7 - FIltering Pandas Dataframe using vectorization -
i have data frame x rows , y colums, called df. have datafame df2 less x rows , y-1 colums. want filter df rows identical rows of df2 column 1 y-1. there way in vectorized fashion without iterating through rows of df2?
here code sample df:
import pandas import numpy.random rd dates = pandas.date_range('1/1/2000', periods=8) df = pandas.dataframe(rd.randn(8, 5), index=dates, columns=['call/put', 'expiration', 'strike', 'ask', 'bid']) df.iloc[2,4]=0 df.iloc[2,3]=0 df.iloc[3,4]=0 df.iloc[3,3]=0 df.iloc[2,2]=0.5 df=df.append(df.iloc[2:3]) df.iloc[8:9,3:5]=1 df.iloc[8:9,2:3]=0.6 df=df.append(df.iloc[8:9]) df.iloc[9,2]=0.4
df2 calculated follows:
df4=df[(df["ask"]==0) & (df["bid"]==0)]
now want filter df rows in df2 except column strike, should have value of 0.4. filter process should without iteration, because real world df large.
you try merge on both dataframes, should return (set) intersection of both.
pandas.merge (df,df2,on=['call/put','expiration','strike','ask'],left_index=true,right_index=true) call/put expiration strike ask bid_x bid_y 2000-01-03 0.614738 -0.363933 0.500000 0 0 0 2000-01-03 0.614738 -0.363933 0.600000 1 1 0 2000-01-03 0.614738 -0.363933 0.400000 1 1 0 2000-01-04 1.077427 -1.046127 0.025931 0 0 0
i renamed df4 df2 - dataframe returned above should complete list of records df match records in "whitelist" contained within df2, based on columns listed in statement above.
a different statement, drops 'strike' , adds 'bid' columns matched on , returns:
pandas.merge (df,df2,on=['call/put','expiration','ask','bid'],left_index=true,right_index=true,how='inner') call/put expiration strike_x ask bid strike_y 2000-01-03 0.614738 -0.363933 0.500000 0 0 0.500000 2000-01-03 0.614738 -0.363933 0.600000 1 1 0.500000 2000-01-03 0.614738 -0.363933 0.400000 1 1 0.500000 2000-01-04 1.077427 -1.046127 0.025931 0 0 0.025931
that's still not quite right - think it's because of index=true parts. force it, can convert date-indices regular columns, , include them match columns.
e.g.
df['date'] = df.index df2['date'] = df2.index
and
pandas.merge (df,df2,on=['call/put','expiration','ask','bid','date'],how='inner')
returns:
call/put expiration strike_x ask bid date strike_y 0 0.367269 -0.616125 0.50000 0 0 2000-01-03 00:00:00 0.50000 1 -0.508974 0.281017 0.65791 0 0 2000-01-04 00:00:00 0.65791
which think more closely matches you're looking for.
Comments
Post a Comment