pandas - Efficient way to add to a series without duplicates

- January 15, 2013

I need to add to a DataFrame (or a Series, if that's more efficient) quite often, while making sure the additions don't create duplicates. As the DataFrame grows, concatenating and then calling drop_duplicates seems inefficient, because the whole dataset has to be checked for duplicates on each addition.

The data has 2 columns, so I'm guessing that turning one of them into an index might speed things up (or turning both columns into a hierarchical index). Does pandas have a way of disallowing duplicate indexes?

Here is a sample of the problem:

    print(accumulating_result)
      c1  c2
    0  a  x1
    1  b  x2
    2  b  x3
    3  c  x4

    print(new)
      c1  c2
    0  b  x3
    1  c  x4
    2  c  x5

After adding new to accumulating_result, I want to get:

    print(accumulating_result)
      c1  c2
    0  a  x1
    1  b  x2
    2  b  x3
    3  c  x4
    4  c  x5

For what it's worth, every entry in column c2 is unique.

Any ideas?

You can use combine_first():

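    import io
    import pandas as pd

    data1 = """  c1  c2
    0  a  x1
    1  b  x2
    2  b  x3
    3  c  x4"""

    data2 = """  c1  c2
    0  x  x3
    1  y  x4
    2  z  x5"""

    # Parse the whitespace-separated text; the unnamed first column becomes the index.
    df1 = pd.read_csv(io.StringIO(data1), sep=r"\s+")
    df2 = pd.read_csv(io.StringIO(data2), sep=r"\s+")

    # Use the unique column c2 as the index so rows are matched by key.
    df1.set_index("c2", inplace=True)
    df2.set_index("c2", inplace=True)

    # Keep df1's values where the keys overlap and take new keys from df2.
    df1.combine_first(df2)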

The output:

       c1
    c2
    x1  a
    x2  b
    x3  b
    x4  c
    x5  z
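On the question of disallowing duplicate indexes, one possible alternative (not part of the answer above) is to filter the incoming rows against the existing index and then concatenate with verify_integrity=True, which makes pd.concat raise a ValueError if any index label would be duplicated. The sketch below assumes c2 is used as the index and that each incoming batch has no repeated c2 values of its own; the helper name add_without_duplicates is made up for illustration.

```python
import pandas as pd

# Data from the question, indexed by the unique column c2.
accumulating_result = pd.DataFrame(
    {"c1": ["a", "b", "b", "c"]},
    index=pd.Index(["x1", "x2", "x3", "x4"], name="c2"),
)
new = pd.DataFrame(
    {"c1": ["b", "c", "c"]},
    index=pd.Index(["x3", "x4", "x5"], name="c2"),
)


def add_without_duplicates(acc, batch):
    """Hypothetical helper: append only rows whose c2 is not already in acc."""
    # Keep the rows of batch whose index label is absent from acc.
    fresh = batch[~batch.index.isin(acc.index)]
    # verify_integrity=True raises ValueError if any label would be duplicated.
    return pd.concat([acc, fresh], verify_integrity=True)


accumulating_result = add_without_duplicates(accumulating_result, new)
print(accumulating_result)
```

This still builds a new DataFrame on every call, so it does not avoid the copying cost; it only limits the duplicate check to an index lookup.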

But combine_first() copies the data every time. Using HDF5 or a database may be a better fit.
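As a rough illustration of the database suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the table name result, and the helper add_rows are assumptions made for the example, not something the answer specifies. Declaring c2 as the primary key lets the database reject duplicates, and SQLite's INSERT OR IGNORE skips rows whose key already exists.

```python
import sqlite3

# In-memory database for the example; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
# c2 is unique in the question, so it serves as the primary key.
conn.execute("CREATE TABLE result (c1 TEXT, c2 TEXT PRIMARY KEY)")


def add_rows(conn, rows):
    """Hypothetical helper: insert (c1, c2) pairs, skipping existing c2 keys."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO result (c1, c2) VALUES (?, ?)", rows
        )


# First the existing data, then the new batch from the question.
add_rows(conn, [("a", "x1"), ("b", "x2"), ("b", "x3"), ("c", "x4")])
add_rows(conn, [("b", "x3"), ("c", "x4"), ("c", "x5")])

stored = conn.execute("SELECT c1, c2 FROM result ORDER BY c2").fetchall()
print(stored)
```

Each addition touches only the new rows, and the accumulated table can be loaded into a DataFrame when needed, for example with pd.read_sql_query.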
