Python Script Stuck in a For Loop

I have this code that iterates through several links. For each one, it retrieves the HTML response and calls response_html.find('relative-time').

import pandas as pd
import requests_html

def main():
    df_links = pd.read_csv('links2.csv', index_col=0)

    session = requests_html.HTMLSession()

    try:
        for i in range(0, len(df_links.index)):
            url = df_links.iloc[i]['hyperlink']
            print(f"[{i}/{len(df_links.index)}]: {url}", flush=True)
            response = session.get(url)
            status_code = response.status_code
            if status_code == 200:
                response_html = response.html
                dateList = response_html.find('relative-time')
    except Exception as e:
        print("Something went wrong...", flush=True)

if __name__ == "__main__":
    main()

However, with 15,000 links, the code mysteriously stops halfway through execution and gets stuck in the middle of the for loop. What could be causing this?

I've asked friends to run it, and the same thing happened to them.

For testing purposes, the repository with the CSV is available here: https://github.com/carloseduardobanjar/nvd-linked-content-crawler-bug

When I comment out the line dateList = response_html.find('relative-time'), the code runs smoothly until completion. It seems the issue lies within that line.

PS: I know the code may seem nonsensical, but it's just an example to illustrate the problem.

edited May 3, 2024 at 16:26

asked Apr 26, 2024 at 22:05

Carlos Eduardo de Schuller Ban

Does it always stop at the same URL? Can you run that URL by itself?

– Tim Roberts
Commented Apr 27, 2024 at 17:22

"I know the code may seem nonsensical, but it's just an example." I'm guessing the true problem is somewhere in the code you removed in order to make the example.

– John Gordon
Commented Apr 27, 2024 at 17:31

@TimRoberts Yes, it always stops at the same URL, but if I start running from that URL, it works.

– Carlos Eduardo de Schuller Ban
Commented Apr 29, 2024 at 22:31

@JohnGordon I tested the example code and reproduced the error with it.

– Carlos Eduardo de Schuller Ban
Commented Apr 29, 2024 at 22:34

One thing I noticed is that the response object is never closed. The get function of a Session in requests-html returns a requests.Response, which has a close function. The Session also has a close function, so you could try closing and recreating it periodically.

– Miguel Angelo
Commented May 1, 2024 at 0:40
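
The suggestion in the comment above could be sketched roughly as follows. This is an untested idea, not a confirmed fix for the hang. The names crawl, session_factory and recycle_every are hypothetical and do not come from the question; session_factory is assumed to be something like requests_html.HTMLSession, and the interval of 500 is an arbitrary choice.

```python
from contextlib import closing


def crawl(urls, session_factory, recycle_every=500):
    """Fetch each URL, closing every response and periodically
    recreating the session (hypothetical helper, see assumptions above).

    Returns a dict mapping each successfully fetched URL to the number
    of 'relative-time' elements found on that page.
    """
    counts = {}
    session = session_factory()
    try:
        for i, url in enumerate(urls):
            # Every `recycle_every` requests, drop the old session and start fresh.
            if i > 0 and i % recycle_every == 0:
                session.close()
                session = session_factory()

            # closing() calls response.close() even if find() raises.
            with closing(session.get(url)) as response:
                if response.status_code == 200:
                    counts[url] = len(response.html.find('relative-time'))
    finally:
        # Always release the last session, whether or not the loop completed.
        session.close()
    return counts
```

With the question's data this might be called as crawl(df_links['hyperlink'], requests_html.HTMLSession). Passing the session class in as a parameter keeps the loop independent of requests_html, which makes it easier to try with a different session type or a smaller recycle interval.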
