I am just starting out in Python. I am trying to scrape the ranked list of A24 films from IMDb, but my script does not print the whole list. Here is the code:

from bs4 import BeautifulSoup
import requests


try:
    source =requests.get('https://www.imdb.com/list/ls024372673/')
    source.raise_for_status()  

    soup=BeautifulSoup(source.text,'html.parser')
    movies=soup.find('div',class_="lister-list").find_all('div')
   
    for movie in movies :
        name= movie.find('h3',class_="lister-item-header").a.text

        rank= movie.find('span',class_="lister-item-index unbold text-primary").text
        
        year= movie.find('span',class_="lister-item-year text-muted unbold").text

        star= movie.find('span',class_="ipl-rating-star__rating").text
        
        metascore= movie.find('div',class_="inline-block ratings-metascore").span.text

        score=movie.find('div',class_="list-description").text

        genre=movie.find('span',class_="genre").text
        
        runtime=movie.find('span',class_="runtime").text

        about=movie.find('p',class_="").text
       
        elements = movie.findAll('span', attrs = {'name':'nv'})
        votes = elements[0]['data-value']
        gross = elements[1]['data-value']

    print(name,rank,year,star,metascore,score,genre,runtime,about,votes,gross)
except Exception as e:
         print(e) 
  • Did you mean to indent print(name, rank, ...) so that it is inside the for movie in movies: loop? (Commented Jun 25, 2022 at 14:45)
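
To illustrate the point raised in the comment, here is a minimal sketch that needs no network access. The titles are hypothetical placeholders, not values from the IMDb page. It shows that a variable read after the loop holds only the value from the final iteration, while work done inside the loop body runs once per item.

```python
# Hypothetical stand-in data; no request is made to IMDb.
movies = ["Film A", "Film B", "Film C"]

# Reading `name` after the loop: only the last assignment survives.
for movie in movies:
    name = movie
seen_after_loop = [name]

# Using `name` inside the loop body: one entry per movie.
seen_inside_loop = []
for movie in movies:
    name = movie
    seen_inside_loop.append(name)

print(seen_after_loop)
print(seen_inside_loop)
```

In the question's script, the same effect means a print() placed at the indentation level of the for statement can show at most one movie.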

2 Answers

You should check what actually happens inside your try/except block and guard against missing elements, for example with a conditional expression:

'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
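
The conditional expression above calls find() twice for the same element. One possible way to avoid the repetition is a small helper; the name safe_text is hypothetical and not part of Beautiful Soup, and the HTML snippet is an assumed, simplified stand-in for one list item.

```python
from bs4 import BeautifulSoup


def safe_text(parent, name, class_):
    """Return the stripped text of the first match, or None if there is no match."""
    tag = parent.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None


# Assumed, simplified markup for a single list item (not copied from IMDb).
html = '<div class="lister-item"><span class="genre"> Drama </span></div>'
movie = BeautifulSoup(html, "html.parser").find("div", class_="lister-item")

genre = safe_text(movie, "span", "genre")             # element exists
score = safe_text(movie, "div", "list-description")   # element is missing
print(genre, score)
```

With a helper like this, each dictionary entry stays on one line and a missing element yields None instead of an exception.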
Example

You could also store the results in a more structured way, such as a list of dicts:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get('https://www.imdb.com/list/ls024372673/', headers=headers)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(page.content, 'html.parser')

data = []
for movie in soup.select('.lister-item'):
    # Look up the votes/gross spans first: a dict literal cannot refer to its own keys
    elements = movie.find_all('span', attrs={'name': 'nv'})
    data.append({
        'name': movie.find('h3',class_="lister-item-header").a.text,
        'rank': movie.find('span',class_="lister-item-index unbold text-primary").text,
        'year': movie.find('span',class_="lister-item-year text-muted unbold").text,
        'star': movie.find('span',class_="ipl-rating-star__rating").text,
        'metascore': movie.find('div',class_="inline-block ratings-metascore").span.text,
        'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
        'genre': movie.find('span',class_="genre").text.strip(),
        'runtime': movie.find('span',class_="runtime").text,
        'about': movie.find('p',class_="").text,
        # Guard the indexing in case an item has fewer than two such spans
        'votes': elements[0]['data-value'] if len(elements) > 0 else None,
        'gross': elements[1]['data-value'] if len(elements) > 1 else None,
    })
print(data)

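movies is not the list of movies you expect. soup.find() returns only the first matching element, here the single lister-list container, and calling .find_all('div') on that container returns every nested div rather than one element per movie.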

Because you search for all the div elements inside the element with class="lister-list", you do not get a clean list of movies. Instead, search for all the elements with class="lister-item-content":
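
The difference between the two searches can be seen on a small inline document. This is only a sketch: the markup below is an assumed, heavily simplified imitation of the list page, and the titles are placeholders.

```python
from bs4 import BeautifulSoup

# Assumed, simplified structure: one container, two items, two divs per item.
html = """
<div class="lister-list">
  <div class="lister-item">
    <div class="lister-item-image"></div>
    <div class="lister-item-content"><h3 class="lister-item-header"><a>Film A</a></h3></div>
  </div>
  <div class="lister-item">
    <div class="lister-item-image"></div>
    <div class="lister-item-content"><h3 class="lister-item-header"><a>Film B</a></h3></div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The question's approach: every div nested anywhere inside the container.
every_div = soup.find("div", class_="lister-list").find_all("div")

# The suggested approach: exactly one element per movie.
per_movie = soup.find_all("div", class_="lister-item-content")
titles = [m.find("h3", class_="lister-item-header").a.text for m in per_movie]

print(len(every_div), len(per_movie), titles)
```

In the first result, some of the nested divs contain no h3 at all, so movie.find('h3', ...) returns None for them and the loop fails on the .a access.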

import requests
from bs4 import BeautifulSoup

source = requests.get("https://www.imdb.com/list/ls024372673/")
source.raise_for_status()

soup = BeautifulSoup(source.text, "html.parser")
# One element per movie, instead of every div nested inside the list container
movies = soup.find_all("div", class_="lister-item-content")

for movie in movies:
    name      = (movie.find("h3", class_="lister-item-header").find("a").text).strip()
    rank      = (movie.find("span", class_="lister-item-index unbold text-primary").text).strip()
    year      = (movie.find("span", class_="lister-item-year text-muted unbold").text).strip()
    stars     = (movie.find("span", class_="ipl-rating-star__rating").text).strip()
    metascore = (movie.find("div", class_="inline-block ratings-metascore").find("span").text).strip()
    # score   = movie.find("div", class_="list-description").text  # this class does not exist inside movie
    genre     = (movie.find("span", class_="genre").text).strip()
    runtime   = (movie.find("span", class_="runtime").text).strip()
    about     = (movie.find("p", class_="").text).strip()

    elements = movie.find_all("span", attrs={"name": "nv"})
    votes    = elements[0]['data-value']
    gross    = elements[1]['data-value']

    # Print inside the loop so that every movie is shown, not only the last one
    print(name, rank, year, stars, metascore, genre, runtime, about, votes, gross)

Another problem is the score variable. There is no div with class="list-description" inside your movie element, so find() returns None, and accessing .text on it raises an AttributeError ('NoneType' object has no attribute 'text'). I have also added .strip() to remove the surrounding whitespace.
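
Both points can be reproduced offline. The snippet below is a sketch on assumed, simplified markup (not copied from IMDb): it triggers the error on a missing element and shows what .strip() removes from the genre text.

```python
from bs4 import BeautifulSoup

# Assumed markup: a genre span padded with whitespace, and no list-description div.
html = '<div class="lister-item-content"><span class="genre">\nDrama, Horror            </span></div>'
movie = BeautifulSoup(html, "html.parser").div

missing = movie.find("div", class_="list-description")  # no match, so this is None
try:
    missing.text
    error = None
except AttributeError as exc:
    error = type(exc).__name__

raw_genre = movie.find("span", class_="genre").text
clean_genre = raw_genre.strip()

print(error)
print(repr(raw_genre), repr(clean_genre))
```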

Edit: I agree with HedgeHog. His example is a good fit for this type of code structure. Just remember to add .strip().
