I am trying to use requests.get() to download files from the assist.org website for a research project. Specifically, the website has a box for articulation agreements. While it would be awesome to come up with a way to go through all the drop-down menus (Academic Year, Institution, Agreements with Other Institutions), view the agreement for each combination, and download them, I need help with an even simpler step.
Clicking through and finding a link, the reports in the articulation agreements are stored in the URL format https://assist.org/transfer/report/XXXXXXX, where the X's are digits. Here is an example.
Clicking the link in my browser (Safari) opens the PDF and I can click the download button. But the following sample Python code gives me only a corrupt .pdf file. I am not that well acquainted with HTML and websites etc., so I am not quite sure how to adjust the code to get the PDF file from the above link.
import requests


def download_file(file_number):
    url = f"https://assist.org/transfer/report/{file_number}"
    response = requests.get(url)
    if response.status_code == 200:
        with open(f"report_{file_number}.pdf", "wb") as file:
            file.write(response.content)
        print(f"File 'report_{file_number}.pdf' downloaded successfully!")
    else:
        print(f"Failed to download the file. HTTP status code: {response.status_code}")


file_number = "26917146"
download_file(file_number)
I tried the above piece of code, and all I got was a file that nominally has the extension .pdf, but it fails to open in Preview on macOS.
I have also looked in the source code for the website but cannot find any references to a .pdf file...
Furthermore, contacting the people behind the webpage doesn't help much, as they cannot readily send all the PDF files yet (they are doing some restructuring).
curl your example URL. Since you're on a Mac, try the file command-line command, e.g. file report_26917146.pdf, to see what you're getting - likely not a PDF.
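The same check can be done from Python before anything is written to disk. The following is only a rough sketch: the helper names looks_like_pdf and describe_payload are made up for illustration, and the test is a simple heuristic based on the fact that PDF files begin with the signature %PDF-. It says nothing about where assist.org actually serves the PDF from.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic: a PDF payload starts with the signature b'%PDF-'."""
    # Tolerate a UTF-8 byte-order mark or stray whitespace before the signature.
    return data.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"%PDF-")


def describe_payload(data: bytes, content_type: str = "") -> str:
    """Roughly classify a downloaded payload as 'PDF', 'HTML', or 'unknown'.

    Hypothetical helper for illustration; content_type is the value of the
    Content-Type response header, if one is available.
    """
    if looks_like_pdf(data):
        return "PDF"
    # Look only at the start of the body for typical HTML openers.
    head = data[:512].lstrip().lower()
    if (
        head.startswith(b"<!doctype html")
        or head.startswith(b"<html")
        or "text/html" in content_type.lower()
    ):
        return "HTML"
    return "unknown"
```

With the code from the question, this could be called as describe_payload(response.content, response.headers.get("Content-Type", "")) right after requests.get(url). If it reports HTML, the URL is most likely returning a web page that displays the report in the browser rather than the PDF itself, which would explain why Preview cannot open the saved file.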