I have a program that scrapes a website and downloads files when it finds them. It often runs fine, but at other times it terminates before it has finished searching the sequence. It never quits while downloading, only while searching. My current guess is a socket error, but I'm not sure. I've added HTTPError checking, and no HTTP error code is ever displayed. I'm now trying to add socket error checking, and it gets even stranger: before I added it the program ran, but once the socket error check is in, the program won't run at all. IDLE just comes up showing the cursor, as if it is ready and waiting for the next command. Without the socket error check, IDLE indicates that the program is running until the tkinter window shuts down (i.e. the program terminates unexpectedly). When the tkinter window shuts down, IDLE doesn't show any error.

What do I have to do to find out why this program sometimes terminates early, and how can I trap the failure so that it doesn't terminate and instead retries the same web address? I think I have the retrying taken care of, but my socket error handling isn't correct, if it's even a socket error problem.

#!/usr/bin/python3.4

import urllib.request
import os
from tkinter import *
import time
import urllib.error
import errno

root = Tk()
root.title("photodownloader")
root.geometry("200x200")

app = Frame(root)
app.grid()

os.chdir('/home/someone/somewhere/')
Fileupdate = 10000
Filecount = 19999
while Fileupdate <= Filecount:
        try:
                root.title(Fileupdate)
                url = 'http://www.webpage.com/photos/'+str(Fileupdate)+'.jpg'
                a = urllib.request.urlopen(url)
                urllib.request.urlretrieve(url, str(Fileupdate)+'.jpg')
        except urllib.error.HTTPError as err:
                if err.code == 404:
                        Fileupdate = Fileupdate + 1
                        root.update_idletasks()
                        continue
                else:
                        print(err.code)
                        Fileupdate = Fileupdate + 1
                        root.update_idletasks()
                        continue
        except socket.error, v:
                print(Fileupdate, v[0])
                continue

        Fileupdate = Fileupdate+1
        root.update_idletasks()
  • Please also share the error message with stacktrace. Without the error, how may we help you? Commented Aug 23, 2016 at 21:04
  • @MoinuddinQuadri "If the tkinter window shuts downs it doesn't give any error on IDLE." there was no error to be had. Commented Aug 23, 2016 at 21:06
  • I'd assume this problem is caused by tkinter never starting its mainloop(). One thing to try is to move the code inside your while loop into a function and call that function periodically with the root.after method. Commented Aug 23, 2016 at 21:10
  • @Tadhg McDonald-Jensen Do you mean the root code, err root = Tk(), root.title, root.geometry? I have had the root.mainloop() as the last line of the program, just commented out for whatever reason for ages now. Commented Aug 23, 2016 at 21:20

1 Answer

I think the problem is caused by tkinter not being given the chance to start its main event loop, which happens when you call root.mainloop(). I'd recommend turning the code you currently have in a while loop into a function that is called periodically with the root.after() method. I have included a potential change below to test whether this fixes the issue.

Note that the lines:

                    Fileupdate = Fileupdate + 1
                    root.update_idletasks()
                    continue

in the except branches are redundant: without them (and without the continue), execution would fall through to the same two statements at the end of the loop body anyway. Part of modifying the code to work in a function was simply getting rid of those parts. Here is the code I'd like you to try running, starting from the original while statement:

#-while Fileupdate <= Filecount:
def UPDATE_SOCKET():
    global Fileupdate #allow the global variable to be changed
    if Fileupdate <= Filecount:
#/+
        try:
                root.title(Fileupdate)
                url = 'http://www.webpage.com/photos/'+str(Fileupdate)+'.jpg'
                a = urllib.request.urlopen(url)
                urllib.request.urlretrieve(url, str(Fileupdate)+'.jpg')
        except urllib.error.HTTPError as err:
                #<large redundant section removed>
                print("error code",err.code)
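        except OSError as v: #socket.error is an alias of OSError in Python 3.3+, so no socket import is needed
                print("socket error",Fileupdate, v) #exceptions are not subscriptable in Python 3, so print it whole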
#-                continue
                root.after(500, UPDATE_SOCKET)
                return
#/+


        Fileupdate = Fileupdate+1
#-        root.update_idletasks()
        root.after(100, UPDATE_SOCKET) #after 100 milliseconds call the function again
        #change 100 to a smaller time to call this more frequently.


root.after(0,UPDATE_SOCKET) #when it has a chance, call the update for first time

root.mainloop() #enter main event loop
#/+

Removed lines are marked with #-, and each chunk that replaces them ends with #/+.
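
As a hedged aside on the socket check itself: `except socket.error, v:` is Python 2 syntax. Under Python 3 it is a SyntaxError, which would explain why the script never starts once that clause is added. In Python 3.3 and later `socket.error` is an alias of `OSError`, and `urllib.error.URLError` is a subclass of it, so a single `except OSError as err:` placed after the `HTTPError` branch covers both. The sketch below shows that ordering with a bounded retry. The names `fetch_with_retry`, `max_tries`, and the injectable `fetch` argument are hypothetical, introduced only for illustration, and treating every HTTP error as not worth retrying is an assumption.

```python
import urllib.error
import urllib.request


def fetch_with_retry(url, filename, fetch=urllib.request.urlretrieve, max_tries=3):
    """Try to download url to filename and return a short status string.

    fetch is a hypothetical injection point so the retry logic can be
    exercised without touching the network.
    """
    for attempt in range(1, max_tries + 1):
        try:
            fetch(url, filename)
            return "ok"
        except urllib.error.HTTPError as err:
            # The server answered with an error status (404 and so on).
            # Assumption: retrying the same URL will not help, so report it.
            return "http %d" % err.code
        except OSError as err:
            # Covers socket errors and urllib.error.URLError, which are both
            # OSError in Python 3.3+. Python 3 requires "as", not a comma.
            print("attempt", attempt, "failed for", url, "-", err)
    return "gave up"
```

Inside the `root.after` callback, the status string could decide whether to move on to the next number or schedule the same one again.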

3 Comments

I'll test it over the next couple of days to confirm whether it keeps working. It does look like it solved the problem, but I can't tell for sure from one run of the program. Before, it sometimes worked fine and other times it would choke. I did have to correct my own mistake once I realized why you removed the Fileupdate line from the except HTTPError branch. I had left it in thinking I would need it, and instead it made everything increase by two instead of one.
Testing is great. At some point you will probably feel confident about it, and I'd expect you to accept this then and only then. I can vouch for this working more consistently than implementing your own loop that occasionally calls update_idletasks, since that doesn't do everything required to keep the application running smoothly. Also, I wouldn't call the code in except HTTPError a mistake, just extra (repetitive) noise.
McDonalds-Jensen One thing I am noticing right now is that it is locking up the program. It doesn't give me any kind of error, it just stops executing partway through. One other thing I never used to have problems with was the program not downloading the entire file. Now it will at times download only a small sliver of the file (other times it downloads the whole file). Generally, whenever it downloads only part of a file, it locks right up and won't do anything else.
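
One possible, unconfirmed explanation for the partial files and lock-ups is that `urlretrieve` takes no timeout argument, so a stalled connection can block the Tk callback, and with it the whole window, indefinitely. A minimal sketch, assuming that is the cause: open the URL with an explicit `timeout`, stream it to a temporary file, and rename it into place only once the copy finishes. The function name `download`, the 30-second value, and the `.part` suffix are illustrative choices, not anything from the original program.

```python
import os
import shutil
import urllib.request


def download(url, filename, timeout=30):
    """Download url to filename, removing the partial file if an error occurs."""
    partial = filename + ".part"
    try:
        # The timeout applies to each blocking socket operation,
        # not to the download as a whole.
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        # The final name appears only after the copy has completed.
        os.replace(partial, filename)
    except Exception:
        # Clean up the partial file, then let the caller decide what to do.
        if os.path.exists(partial):
            os.remove(partial)
        raise
```

A timeout surfaces as an exception derived from `OSError`, so the existing socket error branch could catch it and reschedule the same address.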
