-1

I have downloaded data from https://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/ . As you can see, it is in .data format. How can I read it as a pandas DataFrame in Python?

I tried this, but it doesn't work:

with open("arrhythmia.data", "r") as f:
    arryth_df = pd.DataFrame(f.read())

It says ValueError: DataFrame constructor not properly called!

4
  • @jezrael yeah, yours is right, though the column names part is still unclear. Commented Dec 16, 2020 at 13:04
  • I checked arrhythmia.names, but it contains only a description. It seems the columns would have to be renamed, e.g. df = df.rename(columns={0: 'col1', 1: 'col2'}) Commented Dec 16, 2020 at 13:05
  • If you can create a list of all the column names, like names = ['col1', 'col2', 'col3', ...], you can use df = pd.read_csv(url, header=None, names=names) Commented Dec 16, 2020 at 13:06
  • I think this Q/A covers it; only my list is called names, so instead of names=colnames it is names=names here. Commented Dec 16, 2020 at 13:13
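
The names= suggestion from the comments can be sketched as follows. This is only an illustration: the two sample rows are made up to mimic the comma-separated, header-less layout, and the names 'col1' to 'col4' are placeholders, since the real attribute names would have to be typed out by hand from arrhythmia.names.

```python
from io import StringIO

import pandas as pd

# Made-up sample in the same header-less, comma-separated layout
# (illustrative values, not rows copied from arrhythmia.data).
sample = StringIO("75,0,190,80\n56,1,165,64\n")

# Placeholder names; the real list would need one entry per column of the file.
names = ['col1', 'col2', 'col3', 'col4']

# header=None says the file has no header row; names= supplies the labels.
df = pd.read_csv(sample, header=None, names=names)
print(df.columns.tolist())
```

For the real file, the same call would take the URL or local path in place of the StringIO object, with names as long as the number of columns.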

2 Answers

0

You can pass the URL of the file directly to read_csv, because the .data file here is in CSV format. It has no header row, so add header=None:

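import pandas as pd

# optional: show all columns when printing
pd.options.display.max_columns = None

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/arrhythmia.data'
# header=None: the file has no header row, so columns are numbered 0, 1, 2, ...
df = pd.read_csv(url, header=None)
print(df.head())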

  0    1    2    3    4    5    6    7    8    9   10   11  12  13  14   15   \
0   75    0  190   80   91  193  371  174  121  -16  13   64  -2   ?  63    0   
1   56    1  165   64   81  174  401  149   39   25  37  -17  31   ?  53    0   
2   54    0  172   95  138  163  386  185  102   96  34   70  66  23  75    0   
3   55    0  175   94  100  202  380  179  143   28  11   -5  20   ?  71    0   
4   75    0  190   80   88  181  360  177  103  -16  13   61   3   ?   ?    0   

   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   \
0   52   44    0    0   32    0    0    0    0    0    0    0   44   20   36   
1   48    0    0    0   24    0    0    0    0    0    0    0   64    0    0   
2   40   80    0    0   24    0    0    0    0    0    0   20   56   52    0   
3   72   20    0    0   48    0    0    0    0    0    0    0   64   36    0   
4   48   40    0    0   28    0    0    0    0    0    0    0   40   24    0   
...
...
...

If you also want to convert ? to missing values (NaN), add the na_values='?' parameter:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/arrhythmia.data'
df = pd.read_csv(url, header=None, na_values='?')
print(df.head())

   0    1    2    3    4    5    6    7    8    9     10    11    12    13   \
0   75    0  190   80   91  193  371  174  121  -16  13.0  64.0  -2.0   NaN   
1   56    1  165   64   81  174  401  149   39   25  37.0 -17.0  31.0   NaN   
2   54    0  172   95  138  163  386  185  102   96  34.0  70.0  66.0  23.0   
3   55    0  175   94  100  202  380  179  143   28  11.0  -5.0  20.0   NaN   
4   75    0  190   80   88  181  360  177  103  -16  13.0  61.0   3.0   NaN   

    14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   \
0  63.0    0   52   44    0    0   32    0    0    0    0    0    0    0   44   
1  53.0    0   48    0    0    0   24    0    0    0    0    0    0    0   64   
2  75.0    0   40   80    0    0   24    0    0    0    0    0    0   20   56   
3  71.0    0   72   20    0    0   48    0    0    0    0    0    0    0   64   
4   NaN    0   48   40    0    0   28    0    0    0    0    0    0    0   40  
...
...
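
The effect of na_values='?' on column types can be checked on a small in-memory sample. This is a minimal sketch with made-up rows (not taken from the real file); it only shows that columns containing ? become float columns with NaN, while complete integer columns stay integer.

```python
from io import StringIO

import pandas as pd

# Made-up rows with '?' marking missing values, as in arrhythmia.data.
sample = StringIO("75,0,?,63\n56,1,23,?\n")

# Treat '?' as NaN while parsing.
df = pd.read_csv(sample, header=None, na_values='?')

# Count the missing values and inspect the resulting column types.
n_missing = int(df.isna().sum().sum())
print(n_missing)
print(df.dtypes)
```

Without na_values='?', the columns containing ? would be read as strings (object dtype) instead of numbers.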

3 Comments

This is the right answer. Here is a complement that I was about to write in my answer: "A .data file could be in any format. The file extension does not dictate the format of the file. I opened the file in Notepad and noticed that the values in it are comma-separated. There is a built-in reader function in pandas that takes care of loading comma-separated files, pd.read_csv."
And there are no column names? I think they are in arrhythmia.names. Is it possible to add them as column names?
@french_fries - Is it possible to define the column names as a list?
0

Do it this way with StringIO:

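from io import StringIO
import pandas as pd

with open("arrhythmia.data", "r") as f:
    # wrap the file contents in a file-like object that read_csv can parse
    data = StringIO(f.read())
    # header=None: otherwise the first data row would be used as column names
    arryth_df = pd.read_csv(data, header=None)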
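
As a side note, read_csv also accepts a file path directly, so the open/StringIO step is optional for a local file. The sketch below assumes nothing about the real dataset: it writes a tiny made-up file (the name sample.data is hypothetical) to a temporary directory and reads it back by path.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a local .data file with comma-separated values.
    path = os.path.join(tmp, "sample.data")
    with open(path, "w") as f:
        f.write("75,0,190\n56,1,165\n")

    # Pass the path itself; header=None keeps the first row as data.
    df_local = pd.read_csv(path, header=None)

print(df_local.shape)
```

With the downloaded file in the working directory, the equivalent call would be pd.read_csv("arrhythmia.data", header=None).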

1 Comment

And there are no column names? I think they are in arrhythmia.names. Is it possible to add them as column names?
