Creating variables dynamically is not a good idea, but you can easily take advantage of mutable objects like dictionaries to hold intermediate results.
You can add a new DataFrame method to do this seamlessly:
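import pandas as pd
from pandas.core.base import PandasObject

### this only needs to be done once per session
def to_name(df, dic, name, copy=False):
    # store the intermediate frame (or a copy of it) under `name`
    dic[name] = df.copy() if copy else df
    # return the frame so the method chain can continue
    return df

PandasObject.to_name = to_name
###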
tmp = {}

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
        .assign(something_else=100)
        .div(2)
        .to_name(tmp, 'after_div2', copy=True)
        .div(10)
     )

print(tmp['after_div2'])
print(df)
Output:
# tmp['after_div2']
     0    1    2  something_else
0  1.0  2.0  3.0            50.0
1  4.0  5.0  6.0            50.0
2  7.0  8.0  9.0            50.0

# df
     0    1    2  something_else
0  0.1  0.2  0.3             5.0
1  0.4  0.5  0.6             5.0
2  0.7  0.8  0.9             5.0
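The copy flag matters because storing a DataFrame without copying only stores a reference to the same object. Here is a minimal sketch of the difference, calling the to_name helper as a plain function; the dictionary keys 'alias' and 'snapshot' and the column name 'a' are made up for illustration:

```python
import pandas as pd

def to_name(df, dic, name, copy=False):
    dic[name] = df.copy() if copy else df
    return df

tmp = {}
df = pd.DataFrame({'a': [1, 2, 3]})

to_name(df, tmp, 'alias')                 # stores a reference to df itself
to_name(df, tmp, 'snapshot', copy=True)   # stores an independent copy

# Mutate the original frame in place
df.loc[0, 'a'] = 99

print(tmp['alias'] is df)           # the alias is the very same object
print(tmp['alias'].loc[0, 'a'])     # so it reflects the in-place change
print(tmp['snapshot'].loc[0, 'a'])  # the copy keeps the original value
```

If the stored frame is only inspected and nothing is modified in place afterwards, copy=False is usually enough and avoids duplicating the data.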
If you don't want to monkey patch the DataFrame objects, use pipe:
def to_name(df, dic, name, copy=False):
    dic[name] = df.copy() if copy else df
    return df

tmp = {}

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
        .assign(something_else=100)
        .div(2)
        .pipe(to_name, tmp, 'after_div2')
        .div(10)
        .pipe(lambda df: print('\nQuick alternative:', df, sep='\n') or df)
     )

print(tmp['after_div2'])
Printing
Along the same lines, you can also add a chainable print method, or again use a lambda in pipe:
from pandas.core.base import PandasObject

### this only needs to be done once per session
def df_print(df, *args):
    if args:
        print(*args)
    print(df)
    return df

PandasObject.print = df_print
###

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
        .print()
        .assign(something_else=100)
        .div(2)
        .print('\nAfter 2:')
        .div(10)
        .pipe(lambda df: print('\nQuick alternative:', df, sep='\n') or df)
     )
Output:
    0   1   2
0   2   4   6
1   8  10  12
2  14  16  18

After 2:
     0    1    2  something_else
0  1.0  2.0  3.0            50.0
1  4.0  5.0  6.0            50.0
2  7.0  8.0  9.0            50.0

Quick alternative:
     0    1    2  something_else
0  0.1  0.2  0.3             5.0
1  0.4  0.5  0.6             5.0
2  0.7  0.8  0.9             5.0
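The lambda works because print() always returns None, so the expression print(...) or df evaluates to df, and Python never has to evaluate the truth value of the DataFrame itself. The same idea can be written as a small named function for reuse with pipe; the name show and the column name 'a' below are hypothetical, chosen only for this sketch:

```python
import pandas as pd

def show(df, label=''):
    """Print df under an optional label and hand it back unchanged."""
    # print() returns None, so `None or df` evaluates to df
    return print(label, df, sep='\n') or df

df = pd.DataFrame({'a': [1, 2]})

result = (df
          .pipe(show, 'Before doubling:')
          .mul(2)
          .pipe(show, 'After doubling:')
         )
```

This keeps the chain free of monkey patching while still giving the print step a readable name.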
As a module
You could also create a module:
pandas_debug.py
from pandas.core.base import PandasObject

def df_print(df, *args):
    if args:
        print(*args)
    print(df)
    return df

PandasObject.print = df_print

def to_name(df, dic, name, copy=False):
    dic[name] = df.copy() if copy else df
    return df

PandasObject.to_name = to_name
Then in your code:
import pandas as pd
import pandas_debug

tmp = {}

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
        .assign(something_else=100)
        .div(2)
        .to_name(tmp, 'after_div2')
        .div(10)
        .print()
     )
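As a different technique from the monkey patching above, pandas also provides pandas.api.extensions.register_dataframe_accessor, which attaches custom methods under a namespace of your choice. The following is only a sketch of how the same two helpers might look as an accessor; the namespace debug and the class name DebugAccessor are assumptions made for this example, not something pandas ships:

```python
import pandas as pd

@pd.api.extensions.register_dataframe_accessor('debug')  # 'debug' is a made-up namespace
class DebugAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is used on
        self._obj = pandas_obj

    def to_name(self, dic, name, copy=False):
        # store the frame (or a copy) and return it to keep chaining
        dic[name] = self._obj.copy() if copy else self._obj
        return self._obj

    def print(self, *args):
        # inside the method body, print still refers to the builtin
        if args:
            print(*args)
        print(self._obj)
        return self._obj

tmp = {}

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
        .div(2)
        .debug.to_name(tmp, 'after_div2', copy=True)
        .div(10)
        .debug.print('\nFinal:')
     )
```

The trade-off is a slightly longer call (.debug.to_name instead of .to_name) in exchange for not touching PandasObject directly.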
.copy_to_new_variable(df_imag) instead of the := operator. Thank you for your thoughts.
.copy_to_new_variable(df_imag) would be syntactic sugar for df_imag :=. But pandas [df.copy()](pandas.pydata.org/docs/reference/api/pandas.DataFrame.copy.html) intentionally doesn't allow you to use an assignment target on the RHS; they really don't want you putting assignments with side effects in a pipeline. Why do you want to do this in production? That sort of code will break lots of things, like optimization (e.g. numba). By the way, do you want the copy to be a deep copy or a shallow copy? Is your dataframe ints, floats, strings, arbitrary objects...?
df_imag = df does not copy the data frame.