I'm currently optimising my code and I have found a bottleneck. I have a DataFrame df with a column 'Numbers' containing integers from 1 to 100. I would like to map those numbers using a dictionary. I know that I can use the .map() or .replace() functions, but both seem slow and neither takes into account that the numbers in 'Numbers' are the index of my dictionary (which is a Series). That is, I would like to perform the following:
dict_simple = []
for i in range(100):
    dict_simple.append('a' + str(i))
# Each value in 'Numbers' is used as a position in dict_simple
df['Numbers_with_a'] = df['Numbers'].apply(lambda x: dict_simple[x])
Unfortunately, the apply function is also very slow. Is there a faster way to do this? The DataFrame has 50M+ records.
I have tried the .map(), .replace() and .apply() functions from pandas, but the performance is very poor. I would like to reduce the computation time.
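Because each value in 'Numbers' is just a position in dict_simple, the lookup can be done as one vectorized gather instead of a per-row Python call. The sketch below shows two such options under stated assumptions: the test data is hypothetical (random integers 0..99 standing in for df['Numbers']), and the values are treated as 0-based positions, as the apply code implies. If the real column holds 1..100, subtract 1 first. Option 2 swaps in pandas.Categorical.from_codes, which keeps the integers as codes and stores the 100 strings only once.

```python
import numpy as np
import pandas as pd

# Lookup table built as in the question: position i -> 'a' + str(i)
dict_simple = ['a' + str(i) for i in range(100)]

# Hypothetical test data: random integers 0..99 standing in for df['Numbers'].
# Assumption: values are 0-based positions; if they are 1..100, subtract 1 first.
rng = np.random.default_rng(0)
df = pd.DataFrame({'Numbers': rng.integers(0, 100, size=1_000_000)})

# Option 1: NumPy integer-array indexing; every value is used as a position
# into the lookup array in a single vectorized operation.
lookup = np.asarray(dict_simple, dtype=object)
df['Numbers_with_a'] = lookup[df['Numbers'].to_numpy()]

# Option 2: a Categorical built directly from the integer codes; no per-row
# string objects are created, only the 100 category labels.
df['Numbers_cat'] = pd.Categorical.from_codes(
    df['Numbers'].to_numpy(), categories=dict_simple
)
```

Option 1 produces the same object-dtype strings as the apply version. Option 2 may be preferable at 50M+ rows if the strings are only needed for grouping, comparison or display, since it should use far less memory; actual speed-ups will depend on the machine and pandas version.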
dict_simple is a list... so this is confusing. Are you just trying to map those 100 possible integers to strings? And is it 1 to 100 inclusive, not 0 to 99? Try: mapper = pd.Series('a', index=range(100)) + pd.Series(range(100), dtype=str); seq = pd.Series(rng.choice(100, size=50_000_000)); seq.map(mapper). It runs in 2 sec on my quite old machine. Is that not fast enough?
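The one-liner in the comment can be expanded into a self-contained script as follows. It is a sketch: rng is left undefined in the comment, so a NumPy Generator from np.random.default_rng is assumed, and the size is reduced to 5M rows to keep the example quick.

```python
import numpy as np
import pandas as pd

# mapper: a Series whose index is 0..99 and whose values are 'a0'..'a99',
# built as in the comment by adding 'a' to the stringified numbers.
mapper = pd.Series('a', index=range(100)) + pd.Series(range(100), dtype=str)

# Assumption: rng is a NumPy Generator (not defined in the comment).
rng = np.random.default_rng(42)

# 5M random integers in 0..99 instead of 50M, to keep the run short.
seq = pd.Series(rng.choice(100, size=5_000_000))

# Series.map with a Series argument looks each value up in mapper's index.
result = seq.map(mapper)
```

Passing a Series (rather than a lambda) to map lets pandas do the index lookup internally instead of calling Python code once per row, which is why it is much faster than apply.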