Can't scrape an image url from Zara

Question

I am trying to scrape an image URL from Zara, but the only thing I get back is the URL of the transparent background.

This is the link I'm trying to scrape: https://static.zara.net/photos///2022/V/0/1/p/9598/176/406/2/w/850/9598176406_1_1_1.jpg?ts=1640187784252

This is the link I keep getting: https://static.zara.net/stdstatic/1.249.0-b.13/images/transparent-background.png

Any ideas? This is my code. Thank you in advance! Note: I used extract() for the image instead of extract_first() to see whether there were several links, but they are all the same.

import scrapy
from scrapy.linkextractors import LinkExtractor

from Zara.items import Producto


class ZaraSpider(scrapy.Spider):
    name = 'zara'
    allowed_domains = ['zara.com']
    start_urls = [
        'https://www.zara.com/es/es/jersey-punto-cuello-subido-p09598176.html'
    ]

    def parse(self, response):
        producto = Producto()

        # Extract the links
        links = LinkExtractor(
            allow_domains=['zara.com'],
            restrict_xpaths=["//a"],
            allow="/es/es/"
        ).extract_links(response)

        outlinks = []  # List of all the links
        for link in links:
            url = link.url
            outlinks.append(url)  # Add the link to the list
            yield scrapy.Request(url, callback=self.parse)  # Issue the request

        product = response.xpath('//meta[@content="product"]').extract()
        if product:
            # Extract the URL, the product name, the description and the price
            producto['url'] = response.request.url
            producto['nombre'] = response.xpath('//h1[@class="product-detail-info__name"]/text()').extract_first()
            producto['precio'] = response.xpath('//span[@class="price__amount-current"]/text()').extract_first()
            producto['descripcion'] = response.xpath('//div[@class="expandable-text__inner-content"]//text()').extract_first()

            producto['imagen'] = response.xpath('//img[@class="media-image__image media__wrapper--media"]/@src').extract()
            #producto['links'] = outlinks

        yield producto

What is the start_urls list?

– SuperUser, Commented Dec 28, 2021 at 19:01
I edited the question; that is the full code now. Thank you!

– Emilia Pérez Martín, Commented Dec 29, 2021 at 9:28

SuperUser · Accepted Answer · 2021-12-29 21:48:44Z

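The problem is that the image's src is filled in by JavaScript, so the HTML that Scrapy downloads only contains the transparent placeholder. Request the page with scrapy shell and inspect the response: you'll see that the image URL you want is available elsewhere in the page, for example in the og:image meta tag.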

import scrapy
from scrapy.linkextractors import LinkExtractor
# from Zara.items import Producto


class Producto(scrapy.Item):
    url = scrapy.Field()
    nombre = scrapy.Field()
    precio = scrapy.Field()
    descripcion = scrapy.Field()
    imagen = scrapy.Field()
    links = scrapy.Field()


class ZaraSpider(scrapy.Spider):
    name = 'zara'
    allowed_domains = ['zara.com']
    start_urls = [
        'https://www.zara.com/es/es/jersey-punto-cuello-subido-p09598176.html'
    ]

    def parse(self, response):
        producto = Producto()
    
        # Extract the links
        links = LinkExtractor(
            allow_domains=['zara.com'],
            restrict_xpaths=["//a"],
            allow="/es/es/"
        ).extract_links(response)

        outlinks = []   # List of all the links
        for link in links:
            url = link.url
            outlinks.append(url)    # Add the link to the list
            yield scrapy.Request(url, callback=self.parse)  # Issue the request

        product = response.xpath('//meta[@content="product"]').get()
        if product:
            # Extract the URL, the product name, the description and the price
            producto['url'] = response.request.url
            producto['nombre'] = response.xpath('//h1[@class="product-detail-info__name"]/text()').get()
            producto['precio'] = response.xpath('//span[@class="price__amount-current"]/text()').get()
            producto['descripcion'] = response.xpath('//div[@class="expandable-text__inner-content"]//text()').get()
            producto['imagen'] = response.xpath('//meta[@property="og:image"]/@content').get()
            #producto['links'] = outlinks
    
            yield producto

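By the way, take a look at CrawlSpider: it follows links through rules, so you don't have to extract and request them yourself in parse.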
