numpy - Python Pandas drop columns based on max value of column -

- July 15, 2010

I'm getting started with pandas as a tool for munging two-dimensional arrays of data. It's super overwhelming, even after reading the docs. There is so much you can do that I can't figure out how to do anything, if that makes sense.

My DataFrame (simplified):

    date        stock1  stock2  stock3
    2014.10.10  74.75   NaN     NaN
    2014.9.9    NaN     100.95  NaN
    2010.8.8    NaN     NaN     120.45

So each column has only one value.

I want to remove the columns whose max value is less than x. For example, if x = 80, I want the new DataFrame to be:

    date        stock2  stock3
    2014.10.10  NaN     NaN
    2014.9.9    100.95  NaN
    2010.8.8    NaN     120.45

How can this be achieved? I've looked at DataFrame.max(), but it gives me a Series. Can I use that, or do I need a lambda function somehow in select()?

Use df.max() to index with.

    In [18]: import numpy as np

    In [19]: from pandas import DataFrame

    In [23]: df = DataFrame(np.random.randn(3,3), columns=['a','b','c'])

    In [36]: df
    Out[36]:
              a         b         c
    0 -0.928912  0.220573  1.948065
    1 -0.310504  0.847638 -0.541496
    2 -0.743000 -1.099226 -1.183567

    In [24]: df.max()
    Out[24]:
    a   -0.310504
    b    0.847638
    c    1.948065
    dtype: float64

Next, make a boolean expression out of this:

    In [31]: df.max() > 0
    Out[31]:
    a    False
    b     True
    c     True
    dtype: bool

Next, you can use it to index df.columns (this is called boolean indexing):

    In [34]: df.columns[df.max() > 0]
    Out[34]: Index([u'b', u'c'], dtype='object')

which you can then pass to df:

    In [35]: df[df.columns[df.max() > 0]]
    Out[35]:
              b         c
    0  0.220573  1.948065
    1  0.847638 -0.541496
    2 -1.099226 -1.183567
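The same selection can probably be written in one step by passing the boolean Series to .loc as a column mask. This is a minimal sketch, not part of the original answer: the values are hard-coded from the session above so the result is reproducible, and the names mask and result are only illustrative.

```python
import pandas as pd

# Values copied from the session above so the result is reproducible.
df = pd.DataFrame({
    "a": [-0.928912, -0.310504, -0.743000],
    "b": [0.220573, 0.847638, -1.099226],
    "c": [1.948065, -0.541496, -1.183567],
})

mask = df.max() > 0        # boolean Series indexed by column name
result = df.loc[:, mask]   # keep every row, but only columns where mask is True

print(result)
```

Using .loc with a mask avoids the intermediate df.columns[...] lookup, but both forms should select the same columns.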

Of course, instead of 0, use whatever value you want as the cutoff for dropping.
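Applied to the DataFrame from the question with a cutoff of 80, the idea might look like the sketch below. It assumes date is an ordinary column of strings rather than the index, and drop_columns_below is a hypothetical helper name, not a pandas function. DataFrame.max() skips NaN by default, so each stock column's maximum is simply its single value.

```python
import numpy as np
import pandas as pd


def drop_columns_below(frame, cutoff):
    """Return frame without the numeric columns whose maximum is below cutoff.

    Hypothetical helper for illustration; non-numeric columns are left alone.
    """
    col_max = frame.max(numeric_only=True)     # NaN values are skipped by default
    to_drop = col_max[col_max < cutoff].index  # labels of columns to remove
    return frame.drop(columns=to_drop)


df = pd.DataFrame({
    "date": ["2014.10.10", "2014.9.9", "2010.8.8"],
    "stock1": [74.75, np.nan, np.nan],
    "stock2": [np.nan, 100.95, np.nan],
    "stock3": [np.nan, np.nan, 120.45],
})

new_df = drop_columns_below(df, 80)
print(new_df)
```

Here stock1 is removed because its maximum (74.75) is below 80, while date, stock2 and stock3 are kept.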
