python 2.7 - FIltering Pandas Dataframe using vectorization -


i have data frame x rows , y colums, called df. have datafame df2 less x rows , y-1 colums. want filter df rows identical rows of df2 column 1 y-1. there way in vectorized fashion without iterating through rows of df2?

here code sample df:

import pandas import numpy.random rd dates = pandas.date_range('1/1/2000', periods=8) df = pandas.dataframe(rd.randn(8, 5), index=dates, columns=['call/put', 'expiration', 'strike', 'ask', 'bid']) df.iloc[2,4]=0 df.iloc[2,3]=0 df.iloc[3,4]=0 df.iloc[3,3]=0 df.iloc[2,2]=0.5 df=df.append(df.iloc[2:3]) df.iloc[8:9,3:5]=1 df.iloc[8:9,2:3]=0.6 df=df.append(df.iloc[8:9]) df.iloc[9,2]=0.4 

df2 calculated follows:

df4=df[(df["ask"]==0) & (df["bid"]==0)] 

now want filter df rows in df2 except column strike, should have value of 0.4. filter process should without iteration, because real world df large.

you try merge on both dataframes, should return (set) intersection of both.

pandas.merge (df,df2,on=['call/put','expiration','strike','ask'],left_index=true,right_index=true)               call/put  expiration    strike  ask  bid_x  bid_y 2000-01-03  0.614738   -0.363933  0.500000    0      0      0 2000-01-03  0.614738   -0.363933  0.600000    1      1      0 2000-01-03  0.614738   -0.363933  0.400000    1      1      0 2000-01-04  1.077427   -1.046127  0.025931    0      0      0 

i renamed df4 df2 - dataframe returned above should complete list of records df match records in "whitelist" contained within df2, based on columns listed in statement above.

a different statement, drops 'strike' , adds 'bid' columns matched on , returns:

pandas.merge (df,df2,on=['call/put','expiration','ask','bid'],left_index=true,right_index=true,how='inner')             call/put  expiration  strike_x  ask  bid  strike_y 2000-01-03  0.614738   -0.363933  0.500000    0    0  0.500000 2000-01-03  0.614738   -0.363933  0.600000    1    1  0.500000 2000-01-03  0.614738   -0.363933  0.400000    1    1  0.500000 2000-01-04  1.077427   -1.046127  0.025931    0    0  0.025931 

that's still not quite right - think it's because of index=true parts. force it, can convert date-indices regular columns, , include them match columns.

e.g.

df['date'] = df.index df2['date'] = df2.index 

and

pandas.merge (df,df2,on=['call/put','expiration','ask','bid','date'],how='inner') 

returns:

    call/put  expiration  strike_x  ask  bid                date  strike_y  0  0.367269   -0.616125   0.50000    0    0 2000-01-03 00:00:00   0.50000  1 -0.508974    0.281017   0.65791    0    0 2000-01-04 00:00:00   0.65791 

which think more closely matches you're looking for.


Comments

Popular posts from this blog

c++ - QTextObjectInterface with Qml TextEdit (QQuickTextEdit) -

javascript - angular ng-required radio button not toggling required off in firefox 33, OK in chrome -

xcode - Swift Playground - Files are not readable -