For pandas DataFrames in Python, many methods have an inplace parameter that purportedly lets you modify the original object directly rather than create a copy of it*.

[*Edited to add: however, this proves to not be the case as pointed out by @juanpa.arrivillaga. inplace=True DOES copy data and merely updates a pointer associated with the modified object, so has few advantages over a manual re-assignment to the name of the original object.]

The examples I have seen online of inplace=True never use method chaining. This comment in a related SO thread may explain why:

you can't method chain and operate in-place. in-place ops return None and break the chain
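
A minimal sketch of what the quoted comment describes, assuming a pandas version in which dropna(inplace=True) returns None (the behaviour at the time of this thread). The small frame and the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [2.0, 1.0, np.nan]})

# An in-place call mutates df and hands back None rather than a DataFrame.
result = df.dropna(inplace=True)

# Chaining off that return value therefore fails with AttributeError.
error = None
try:
    df.dropna(inplace=True).sort_values("value")
except AttributeError as exc:
    error = exc

print(result, type(error).__name__)
```

So an in-place call can only ever sit at the very end of a chain, which is the case the next paragraph asks about.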

But, would "inplace chaining" work if you put an inplace=True in the last entry in the chain? [Edited to add: no] Or would that be equivalent to trying to change a copy created in an earlier link in the chain, which, as it is no longer your original object, is "lost" after the chain statement is complete? [Edited to add: yes; see answer here]

Working with large data objects would seem to rule out chaining unless it can be done in-place, at least if the goal is low memory overhead and high computational speed. Is there an alternative implementation of pandas or, e.g., an equivalent of R's data.table available in Python that might suit my needs? Or are my only options to not chain (and compute quickly) or to chain but make redundant copies of the data, at least transiently?

  • If the second-to-last entry in chain returns a copy and the last entry modifies it inplace but doesn't return it, it is lost. Commented Jun 13, 2023 at 16:01
  • " Or would that be equivalent to trying to change a copy created in an earlier link in the chain, which, as it is no longer your original object, is "lost" after the chain statement is complete?" yes, yes that is what would happen. Did you try it? Commented Jun 13, 2023 at 16:04
  • Also note, inplace does not generally help you actually conserve memory. See this claim by a core contributor. The keyword argument is being deprecated in Pandas v2 (instead, there will be new COW improvements) Commented Jun 13, 2023 at 16:06
  • Frankly, it seems rather strange to me to abandon the whole pandas project because you cannot use method chaining, which is essentially syntactic sugar. Commented Jun 13, 2023 at 16:12
  • @juanpa.arrivillaga your mention that inplace does not actually do what the name suggests (changing an object without copying or pointer magic), led me down a welcome rabbit hole. As there don't appear to be any advantages to using inplace and limited advantages of chaining, I may go back to "plain vanilla" pandas. I hope that, down the road, someone does develop an intuitive, fast library for big data in python. I will edit my post with a note about inplace. Commented Jun 14, 2023 at 13:45

2 Answers

Let's try it.

import pandas as pd
import numpy as np

df = pd.DataFrame({'value': [2, 2, 1, 1, 3, 4, 5, np.nan]})

# Only the last call is in-place; it acts on the intermediate result, not on df.
df.sort_values('value').drop_duplicates().dropna(inplace=True)

Expected df, if the chain had modified it in place:

   value
2    1.0
0    2.0
4    3.0
5    4.0
6    5.0

Actual df after running the chain:

   value
0    2.0
1    2.0
2    1.0
3    1.0
4    3.0
5    4.0
6    5.0
7    NaN

Answer: No, inplace=True at the end of the chain does not modify the original dataframe.
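
For comparison, here is a sketch of the re-assignment form mentioned in the question's edit: drop inplace entirely and bind the result of the chain back to the same name. It reuses the example frame from above. Only the values are shown, since the order of tied rows after sorting is not something this sketch relies on:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [2, 2, 1, 1, 3, 4, 5, np.nan]})

# Every call returns a new DataFrame; the name df is rebound to the last one.
df = df.sort_values("value").drop_duplicates().dropna()

print(df["value"].tolist())
```

The intermediate frames are dropped as soon as the statement finishes, so no extra reference to them is kept around.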

4 Comments

It's more relevant that drop_duplicates does not use inplace=True than it is that dropna does. df.dropna is simply never called, which is why df is not modified.
@chepner sorry, I don't understand, why is dropna never called?
dropna is called, but not on df; it's called on whatever is returned by df.sort_values('value').drop_duplicates. (I'm distinguishing between the bound method and the unbound method.)
To be clearer, chaining is just a way to avoid a temporary variable. The chained form is equivalent to t1 = df.sort_values('value'); t2 = t1.drop_duplicates(); t2.dropna(inplace=True), except no reference to t2 is retained in the chained version.
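
The temporary-variable form from the last comment can be run directly to see which object is actually modified. This is a sketch using the answer's example frame; t1 and t2 are the names used in the comment, and depending on the pandas version the in-place call on t2 may emit a warning about modifying a copy:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [2, 2, 1, 1, 3, 4, 5, np.nan]})

t1 = df.sort_values("value")   # new DataFrame; df is untouched
t2 = t1.drop_duplicates()      # another new DataFrame
t2.dropna(inplace=True)        # modifies t2 only

# df still has all 8 rows including the NaN; t2 holds the cleaned data.
print(len(df), len(t1), len(t2))
```

In the chained version the object playing the role of t2 has no name, so the cleaned result is discarded once the statement finishes.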

If you really want it, you can create a wrapper like so (untested):

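class Wrap:
    """Forward method calls to the wrapped DataFrame with inplace=True."""

    def __init__(self, df):
        self.df = df

    def __getattr__(self, name):
        # Only reached for names not defined on Wrap itself, e.g. "dropna".
        m = getattr(self.df, name)

        def f(*args, **kwargs):
            # Assumes the method accepts inplace; others will raise TypeError.
            m(*args, **kwargs, inplace=True)
            return self  # hand back the wrapper so calls can be chained

        return f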

# Usage:

Wrap(df).func1(*params1).func2(*params2)
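
As a concrete, hedged illustration of the usage line, the sketch below repeats the wrapper so that it runs on its own and applies it to the example frame from the other answer. The three methods chosen all accept inplace; this is only a demonstration of the idea, not a general-purpose proxy:

```python
import numpy as np
import pandas as pd


class Wrap:
    def __init__(self, df):
        self.df = df

    def __getattr__(self, name):
        method = getattr(self.df, name)

        def call_inplace(*args, **kwargs):
            # Force inplace=True, then return the wrapper to keep the chain going.
            method(*args, **kwargs, inplace=True)
            return self

        return call_inplace


df = pd.DataFrame({"value": [2, 2, 1, 1, 3, 4, 5, np.nan]})

# Each step mutates the same df object held by the wrapper.
Wrap(df).sort_values("value").drop_duplicates().dropna()

print(df["value"].tolist())
```

As noted at the top of the question, inplace=True does not avoid copying internally, so this buys the chaining syntax rather than any memory savings.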

3 Comments

DataFrames are certainly more complex objects than this naive wrapper can handle. It does not forward any operator like __getitem__, for example.
@jsbueno It was only meant to show basically how this could be done if one really really wants it.
@MichaelButscher thank you for the wrapper suggestion. I assumed there was a way but didn't know how off the top of my head. I appreciate that you took the time to explain this method. However, implementing this strategy is beyond the scope of what I currently wish to do.
