numpy - Python Pandas drop columns based on max value of column
I'm just getting going with pandas as a tool for munging two-dimensional arrays of data. It's super overwhelming, even after reading the docs. I can't figure out how to do anything, if that makes sense.
My dataframe (simplified):
date        stock1  stock2  stock3
2014.10.10  74.75   NaN     NaN
2014.9.9    NaN     100.95  NaN
2010.8.8    NaN     NaN     120.45
So each column has only one value.
I want to remove the columns whose max value is less than x. For example, if x = 80, I want the new dataframe to be:
date        stock2  stock3
2014.10.10  NaN     NaN
2014.9.9    100.95  NaN
2010.8.8    NaN     120.45
How can this be achieved? I've looked at DataFrame.max(), which gives me a Series. Can I use that, or do I need a lambda function somehow in select()?
Use df.max() and index with it.
In [19]: import numpy as np
    ...: from pandas import DataFrame

In [23]: df = DataFrame(np.random.randn(3,3), columns=['a','b','c'])

In [36]: df
Out[36]:
          a         b         c
0 -0.928912  0.220573  1.948065
1 -0.310504  0.847638 -0.541496
2 -0.743000 -1.099226 -1.183567

In [24]: df.max()
Out[24]:
a   -0.310504
b    0.847638
c    1.948065
dtype: float64
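Because the question's columns are mostly NaN, it may help to check what df.max() does with them. The sketch below rebuilds the simplified dataframe from the question; treating date as the index (rather than a column) is an assumption so that every remaining column is numeric. By default DataFrame.max() skips NaN (skipna=True), so each column reduces to its single real value.

```python
import numpy as np
import pandas as pd

# Rebuild the question's simplified dataframe.
# Assumption: "date" is the index, so all remaining columns are numeric.
df = pd.DataFrame(
    {
        "stock1": [74.75, np.nan, np.nan],
        "stock2": [np.nan, 100.95, np.nan],
        "stock3": [np.nan, np.nan, 120.45],
    },
    index=pd.Index(["2014.10.10", "2014.9.9", "2010.8.8"], name="date"),
)

# max() reduces each column to one value; NaN entries are skipped by default,
# so the result is a Series indexed by column name.
col_max = df.max()
print(col_max)
```

The printed Series should list 74.75, 100.95 and 120.45 against stock1, stock2 and stock3.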
Next, make a boolean expression out of this:
In [31]: df.max() > 0
Out[31]:
a    False
b     True
c     True
dtype: bool
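One edge case worth knowing about the boolean expression: a column that is entirely NaN has a max of NaN, and any comparison with NaN evaluates to False, so such a column ends up on the dropped side of the mask. A small sketch with a hypothetical frame (the column names "a" and "empty" are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "empty" contains only NaN.
df = pd.DataFrame({"a": [1.0, 5.0], "empty": [np.nan, np.nan]})

col_max = df.max()   # "empty" reduces to NaN
mask = col_max > 0   # NaN > 0 is False, so "empty" is excluded
print(mask)
```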
Next, you can use that to index df.columns (this is called boolean indexing):
In [34]: df.columns[df.max() > 0]
Out[34]: Index([u'b', u'c'], dtype='object')
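As a side note, the boolean Series can also be passed straight to .loc as the column selector, which skips the intermediate df.columns lookup. The sketch below (with a seeded generator only so the run is repeatable) checks that both spellings select the same columns; neither form is claimed to be the one the answer intends, they are simply equivalent here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the run is repeatable
df = pd.DataFrame(rng.standard_normal((3, 3)), columns=["a", "b", "c"])

mask = df.max() > 0  # boolean Series indexed by column label

# Equivalent selections: index the column labels, or hand the mask to .loc.
via_columns = df[df.columns[mask]]
via_loc = df.loc[:, mask]

print(via_loc)
```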
which you can then pass to df:
In [35]: df[df.columns[df.max() > 0]]
Out[35]:
          b         c
0  0.220573  1.948065
1  0.847638 -0.541496
2 -1.099226 -1.183567
Of course, instead of 0, use whatever value you want as the cutoff for dropping.
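Putting the steps together for the question's own data with x = 80, a minimal sketch might look like this. The helper name drop_low_max_columns is hypothetical, and it again assumes date is the index; with date as a string column, df.max() >= x would try to compare strings with a number. It uses >= rather than > because the question asks to remove columns whose max is strictly less than x.

```python
import numpy as np
import pandas as pd


def drop_low_max_columns(df: pd.DataFrame, x: float) -> pd.DataFrame:
    """Hypothetical helper: keep only the columns whose max is at least x.

    NaN is skipped when computing each column's max; an all-NaN column has
    a max of NaN, and NaN >= x is False, so it is dropped as well.
    """
    return df.loc[:, df.max() >= x]


# The question's simplified dataframe (assumption: "date" is the index).
df = pd.DataFrame(
    {
        "stock1": [74.75, np.nan, np.nan],
        "stock2": [np.nan, 100.95, np.nan],
        "stock3": [np.nan, np.nan, 120.45],
    },
    index=pd.Index(["2014.10.10", "2014.9.9", "2010.8.8"], name="date"),
)

result = drop_low_max_columns(df, 80)
print(result)
```

With x = 80, stock1 (max 74.75) is dropped, while stock2 and stock3 are kept with all three date rows, matching the desired dataframe in the question.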