157

How to covert a DataFrame column containing strings and NaN values to floats. And there is another column whose values are strings and floats; how to convert this entire column to floats.

1
  • 12
    DO NOT USE convert_objects. It is deprecated. Use to_numeric or astype instead Commented Nov 6, 2017 at 16:33

7 Answers 7

83

NOTE: pd.convert_objects has now been deprecated. You should use pd.Series.astype(float) or pd.to_numeric as described in other answers.

This is available in 0.11. Forces conversion (or set's to nan) This will work even when astype will fail; its also series by series so it won't convert say a complete string column

In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))

In [11]: df
Out[11]: 
     A    B
0  1.0  1.0
1    1  foo

In [12]: df.dtypes
Out[12]: 
A    object
B    object
dtype: object

In [13]: df.convert_objects(convert_numeric=True)
Out[13]: 
   A   B
0  1   1
1  1 NaN

In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]: 
A    float64
B    float64
dtype: object
Sign up to request clarification or add additional context in comments.

7 Comments

Please note that this does not work for columns (at leadt multiindex), works just for values in the dataframe
I had to use set_levels to convert string to float
df['ColumnName'] = df['ColumnName'].convert_objects(convert_numeric=True) You can convert just a single column.
this is now pd.to_numeric(col) in newer versions
convert_objects is deprecated in newer pandas. Use the data-type specific converters pd.to_numeric.
|
79

You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to specify how they should be converted, but you can use the .fillna method to do it.

Example:

In [12]: df
Out[12]: 
     a    b
0  0.1  0.2
1  NaN  0.3
2  0.4  0.5

In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)

In [14]: df.a = df.a.astype(float).fillna(0.0)

In [15]: df
Out[15]: 
     a    b
0  0.1  0.2
1  0.0  0.3
2  0.4  0.5

In [16]: df.a.values
Out[16]: array([ 0.1,  0. ,  0.4])

Comments

64

In a newer version of pandas (0.17 and up), you can use to_numeric function. It allows you to convert the whole dataframe or just individual columns. It also gives you an ability to select how to treat stuff that can't be converted to numeric values:

import pandas as pd
s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)
s = pd.Series(['apple', '1.0', '2', -3])
pd.to_numeric(s, errors='ignore')
pd.to_numeric(s, errors='coerce')

1 Comment

To apply pd.to_numeric to a DataFrame, one can use df.apply(pd.to_numeric) as explained in detail in this answer.
37
df['MyColumnName'] = df['MyColumnName'].astype('float64') 

4 Comments

This does not work when converting from a String to a Float: ValueError: could not convert string to float: 'date'
@Jack do you know the workaround here? I'm running into this exact issue converting string to float.
@Hatt i am facing the same issue. did you find the solution for it?
@Jack I'm not sure but you seem to mix up date format and float. # convert to datetime df['date'] = pd.to_datetime(df['date'])
14

you have to replace empty strings ('') with np.nan before converting to float. ie:

df['a']=df.a.replace('',np.nan).astype(float)

Comments

4
import pandas as pd
df['a'] = pd.to_numeric(df['a'])

1 Comment

Remember that Stack Overflow isn't just intended to solve the immediate problem, but also to help future readers find solutions to similar problems, which requires understanding the underlying code. This is especially important for members of our community who are beginners, and not familiar with the syntax. Given that, can you edit your answer to include an explanation of what you're doing and why you believe it is the best approach?
1

Here is an example

                            GHI             Temp  Power Day_Type
2016-03-15 06:00:00 -7.99999952505459e-7    18.3    0   NaN
2016-03-15 06:01:00 -7.99999952505459e-7    18.2    0   NaN
2016-03-15 06:02:00 -7.99999952505459e-7    18.3    0   NaN
2016-03-15 06:03:00 -7.99999952505459e-7    18.3    0   NaN
2016-03-15 06:04:00 -7.99999952505459e-7    18.3    0   NaN

but if this is all string values...as was in my case... Convert the desired columns to floats:

df_inv_29['GHI'] = df_inv_29.GHI.astype(float)
df_inv_29['Temp'] = df_inv_29.Temp.astype(float)
df_inv_29['Power'] = df_inv_29.Power.astype(float)

Your dataframe will now have float values :-)

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.