0

I have a program where the user can copy an excel table and paste it on the program itself for data analysis. The only value allowed for the table is float. The problem is that there are 4 different ways to write a float, those being:

x = 123456.789
y = 123456,789
a = 123.456,789
b = 123,456.789

and I need the code to interpret all 4 as: 123456.789

What would be the best way to do that, such that it would work on windows, linux and mac?

Here is the full function:

# Copies csv from clipboard to a pandas dataframe
clipboard_dataframe = pd.read_clipboard()
# Get row and column count
paste_rows = len(clipboard_dataframe.index)
paste_columns = len(clipboard_dataframe.columns)
# Create the table to show the values
paste_janela = QTableWidget(paste_rows, paste_columns)
# Define the labels for headers and indexes
paste_janela.setHorizontalHeaderLabels(clipboard_dataframe.columns)
paste_janela.setVerticalHeaderLabels(clipboard_dataframe.index)
# Populate the table with the proper values
for x in range(paste_janela.rowCount()):
    for y in range(paste_janela.columnCount()):
        # Error handling in case the cell value isn't a float
        if not isinstance(clipboard_dataframe.iat[x,y], numbers.Number):
            error_message = QMessageBox.critical(self.janela_principal, 
                                                    "Erro de importação de dados", 
                                                    "Houe um erro na importação de dados de tabela. \nOs únicos valores aceitos são números reais")
            raise ValueError("Valor inválido foi encontrado na tabela. Só é aceito números reais nas tabelas")
        table_value = str(clipboard_dataframe.iat[x,y])
        table_item = QTableWidgetItem(table_value)
        paste_janela.setItem(x, y, table_item)
# Pass the table to the MDI Area, turning it into a subwindow in the process
self.sandbox_mdiarea.addSubWindow(paste_janela)
# Needed to load the window, otherwise it will be hidden as default
paste_janela.show()
3
  • 2
    My first thought is to look into locale-aware parsing, rather than trying to implement it yourself. Start with locale in the stdlib, notably: locale.atoi() Commented Nov 14, 2024 at 19:18
  • Keep the last comma (",") or period (".") and convert it to a period (".") while removing the preceding commas or periods, and combine the numbers. Commented Nov 15, 2024 at 2:21
  • Actually there are more than four possible ways to represent a floating point number. And the locale module is poorly suited to handle all the variations since locale data differs across libc implementations. Commented Nov 18, 2024 at 22:56

1 Answer 1

0

Here is a solution that works for your test cases, plus a couple more I came up with. The biggest problem is a number like 123,456 where its not clear if it should be 123456 or 123.456.

The basic solution is to check if it has a decimal or a period or if it has both, then adjust the string to fit the built-in float function. Assuming a decimal divider is always provided, here is a function which meets your requirements I believe:

def read_float(floatstr: str) -> float:
    period_count = floatstr.count('.')
    comma_count = floatstr.count(',')

    if period_count > 1 and comma_count > 1:
        raise ValueError(f'Unable to read float with multiple of both period and comma: {floatstr}')

    if period_count == 0 and comma_count == 0:
        return float(floatstr)
    if period_count == 0:
        return float(floatstr.replace(',', '.'))
    if comma_count == 0:
        return float(floatstr)

    period_first = floatstr.find('.')
    comma_first = floatstr.find(',')

    if period_first < comma_first:
        return float(floatstr.replace('.', '').replace(',', '.'))
    return float(floatstr.replace(',', ''))


def _main():
    import numpy as np

    test_points = [
        ('123', 123),
        ('1.25', 1.25),
        ('1234.56', 1234.56),
        ('1,234.56', 1234.56),
        ('1,234,567.89', 1234567.89),
        ('1234567.89', 1234567.89),
        ('1,25', 1.25),
        ('1234,56', 1234.56),
        ('1.234,56', 1234.56),
        ('1.234.567,89', 1234567.89),
        ('1234567,89', 1234567.89),
        ('123,456', 123.456)  # this is problematic - no way to tell if 123456 or 123.456
    ]

    for teststr, testval in test_points:
        output = read_float(teststr)
        print(f'{teststr:>12} - {"PASS" if np.equal(testval, output) else "FAIL"}')


if __name__ == '__main__':
    _main()

Which gives the following output:

         123 - PASS
        1.25 - PASS
     1234.56 - PASS
    1,234.56 - PASS
1,234,567.89 - PASS
  1234567.89 - PASS
        1,25 - PASS
     1234,56 - PASS
    1.234,56 - PASS
1.234.567,89 - PASS
  1234567,89 - PASS
     123,456 - PASS

I'm sure a simpler solution exists using some localization library, but I was unable to find it.

Let me know if you have any questions. If you have a test point this function doesn't work for let me know and I'll see about updating it.

Sign up to request clarification or add additional context in comments.

1 Comment

Good catch of 123,456. Totally forgot about it. It is assumed that no thousandth separator is given by default, leading it to evalute 123,456 as 123.456 instead of 123456.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.