
My script doesn't work. All 10 processes take the first list item and then stop, so the output contains the first list entry 10 times. How can I fix this? Is the error in the loop, or do I need a queue for this?

import finanzen_fundamentals.stocks as ff
import mysql.connector
import pandas as pd
import multiprocessing
import time

results = []

def get_list():
    try:
        mydb = mysql.connector.connect( host="localhost", user="changed", password="changed", database="stockdata")
        mycursor = mydb.cursor()
        mycursor.execute("select * from url_name")
        record = mycursor.fetchall()
        return record
    except Exception as e:
        return str(e)

def create_json(record):
    for row in record:
        try:
            df = ff.get_current_value_lxml(str(row[2])[:-1], exchange = "FSE")
            print('Name:' + row[0] + ' WKN:' + df['wkn'].values[0] + ' Preis:' + str(df['price'].values[0]) + ' Currency:' + df['currency'].values[0] + ' Zeit:' + df['time'].values[0])
            result = [[row[0], df['wkn'].values[0], df['price'].values[0], df['currency'].values[0], df['time'].values[0]]]
            return result
        except Exception as e:
            print(str(e))

def collect_results(result):
    results.extend(result)

if __name__ == '__main__':
    record = get_list()
    start_time = time.time()
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    for i in range(10):
        pool.apply_async(create_json, args=(record, ), callback=collect_results)
    pool.close()
    pool.join()

    df_out = pd.DataFrame(results, columns=['Name', 'WKN', 'Preis', 'Currency', 'Zeit'])
    print(df_out)

Output:

                      Name     WKN  Preis Currency        Zeit
0  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
1  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
2  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
3  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
4  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
5  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
6  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
7  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
8  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020
9  21VIANET GRP ADR A/6 O.  A1H9DT   20.0      EUR  23.10.2020

1 Answer


You got the loop structure wrong. Inside create_json you loop over the rows of record, but you always call it with the same original record list and return on the first iteration. So every worker just processes the first row. You need to change the worker function to operate on a single row:

def create_json(row):
    try:
        df = ff.get_current_value_lxml(str(row[2])[:-1], exchange = "FSE")
        print('Name:' + row[0] + ' WKN:' + df['wkn'].values[0] + ' Preis:' + str(df['price'].values[0]) + ' Currency:' + df['currency'].values[0] + ' Zeit:' + df['time'].values[0])
        result = [[row[0], df['wkn'].values[0], df['price'].values[0], df['currency'].values[0], df['time'].values[0]]]
        return result
    except Exception as e:
        print(str(e))

Then, in the main code, call it once per row:

if __name__ == '__main__':
    ...
    for row in record:
        pool.apply_async(create_json, args=(row, ), callback=collect_results)
    ...
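
For completeness, here is what the assembled main block would look like, a sketch reusing the pool setup, results callback, and DataFrame output from the question as-is:

if __name__ == '__main__':
    record = get_list()
    start_time = time.time()
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    # submit one task per row instead of one task per loop index
    for row in record:
        pool.apply_async(create_json, args=(row, ), callback=collect_results)
    pool.close()
    pool.join()

    df_out = pd.DataFrame(results, columns=['Name', 'WKN', 'Preis', 'Currency', 'Zeit'])
    print(df_out)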

Note that in this case, instead of looping and calling apply_async, you can just use map. It already returns a list of the results, so you don't even need the callback anymore. Something like:

def create_json(row):
    try:
        df = ff.get_current_value_lxml(str(row[2])[:-1], exchange = "FSE")
        print('Name:' + row[0] + ' WKN:' + df['wkn'].values[0] + ' Preis:' + str(df['price'].values[0]) + ' Currency:' + df['currency'].values[0] + ' Zeit:' + df['time'].values[0])
        result = [row[0], df['wkn'].values[0], df['price'].values[0], df['currency'].values[0], df['time'].values[0]]
        # NOTE THAT NOW IT'S A 1-D LIST!
        return result
    except Exception as e:
        print(str(e))

if __name__ == '__main__':
    record = get_list()
    start_time = time.time()
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        # rows whose fetch raised an exception return None; drop them so
        # the DataFrame constructor doesn't trip over them
        results = [r for r in pool.map(create_json, record) if r is not None]

    df_out = pd.DataFrame(results, columns=['Name', 'WKN', 'Preis', 'Currency', 'Zeit'])
    print(df_out)
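
If the order of the rows doesn't matter, pool.imap_unordered is a variation worth trying: it yields each result as soon as its worker finishes instead of waiting for the whole batch. A sketch, where the chunksize of 8 is an assumption to tune:

if __name__ == '__main__':
    record = get_list()
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        # yields results in completion order; chunksize batches rows per
        # worker task to cut down on inter-process communication overhead
        results = [r for r in pool.imap_unordered(create_json, record, chunksize=8)
                   if r is not None]

    df_out = pd.DataFrame(results, columns=['Name', 'WKN', 'Preis', 'Currency', 'Zeit'])
    print(df_out)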

2 Comments

Your answer works pretty well, but it isn't as fast as expected. Do you have any recommendations to improve the speed? (Better hardware, of course, but first on the software side?) Here is my updated method:

def on_call():
    record = get_list()
    start_time = time.time()
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    for row in record:
        pool.apply_async(create_json, args=(row, ), callback=collect_results)
    pool.close()
    pool.join()
Did you try the map version I provided? It might be faster because the loop is internal rather than an explicit for. You might also try a thread pool by changing the import to multiprocessing.dummy, but I'm not sure whether that will help because of the GIL.
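
For reference, the thread-pool variant mentioned in the last comment only needs the import swapped: multiprocessing.dummy exposes a Pool with the identical API, backed by threads. Since create_json spends most of its time waiting on the network, the GIL is released during those waits, so threads may actually help here. A sketch, not benchmarked; the worker count of 16 is an assumption:

from multiprocessing.dummy import Pool as ThreadPool  # same Pool API, thread-backed

if __name__ == '__main__':
    record = get_list()
    # for I/O-bound work the thread count can exceed the CPU count
    with ThreadPool(16) as pool:
        results = [r for r in pool.map(create_json, record) if r is not None]

    df_out = pd.DataFrame(results, columns=['Name', 'WKN', 'Preis', 'Currency', 'Zeit'])
    print(df_out)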
