
T-leke commented Sep 29, 2025

This PR adds unit tests for the MemoryUsage extension to cover:

  • get_virtual_size() behavior (platform-aware)
  • update() sets the memusage/max stat
  • _check_limit() sets memusage/limit_reached, sends a notification and closes the spider
  • _check_warning() sets memusage/warning_reached and warns only once
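The platform-aware part of get_virtual_size() exists because ru_maxrss is reported in kilobytes on Linux but in bytes on macOS. A minimal, Scrapy-free sketch of that logic follows; virtual_size() and fake_resource() are hypothetical helpers, and passing the resource module and platform string as parameters (instead of reading sys.platform) is an assumption made so a fake can be injected without patching.

```python
from types import SimpleNamespace


def virtual_size(resource_mod, platform: str) -> int:
    """Hypothetical helper mirroring the described get_virtual_size() logic.

    resource_mod is anything with RUSAGE_SELF and getrusage(), so a fake can
    stand in for the stdlib resource module (which is Unix-only).
    """
    size = resource_mod.getrusage(resource_mod.RUSAGE_SELF).ru_maxrss
    if platform != "darwin":
        # Linux reports ru_maxrss in kilobytes; macOS already reports bytes.
        size *= 1024
    return size


def fake_resource(maxrss: int) -> SimpleNamespace:
    # Provides just the two attributes virtual_size() touches.
    usage = SimpleNamespace(ru_maxrss=maxrss)
    return SimpleNamespace(RUSAGE_SELF=0, getrusage=lambda who: usage)


fake = fake_resource(2048)
print(virtual_size(fake, "linux"))   # 2097152
print(virtual_size(fake, "darwin"))  # 2048
```

The same fake works for the real extension if it is assigned to the instance's resource attribute, since MemoryUsage.__init__ stores the imported module there.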

Implementation notes:

  • Tests use small fakes for the crawler, stats and engine so that no Twisted LoopingCalls are started.
  • Mail sending is mocked by patching MailSender.from_crawler to return a dummy object, so no real e-mails are sent.
  • get_virtual_size() behavior is tested by assigning a fake resource module to the instance (and by mocking sys.platform in the memusage module).
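The fakes from the first note could look roughly like this sketch. FakeSettings, FakeStats and make_fake_crawler are hypothetical names (the PR's own DummySettings and make_crawler are not shown in this thread). The settings fake implements only the getters MemoryUsage.__init__ is seen calling in the traceback; get_value, set_value and max_value mirror methods of Scrapy's stats collector. An engine fake is left out on purpose, since it has to match whichever close method the installed Scrapy version calls.

```python
from types import SimpleNamespace


class FakeSettings(dict):
    """Hypothetical settings fake exposing the getters MemoryUsage.__init__ calls.

    It expects real Python values; unlike Scrapy's Settings it does not
    parse strings such as "False" or "0" for booleans.
    """

    def getbool(self, name, default=False):
        return bool(self.get(name, default))

    def getint(self, name, default=0):
        return int(self.get(name, default))

    def getfloat(self, name, default=0.0):
        return float(self.get(name, default))

    def getlist(self, name, default=None):
        value = self.get(name, default if default is not None else [])
        # Comma-separated strings are split, e.g. "a@example.com,b@example.com".
        return value.split(",") if isinstance(value, str) else list(value)


class FakeStats:
    """Dict-backed stand-in for the crawler's stats collector."""

    def __init__(self):
        self.values = {}

    def get_value(self, key, default=None):
        return self.values.get(key, default)

    def set_value(self, key, value):
        self.values[key] = value

    def max_value(self, key, value):
        # Keep the largest value seen so far, as a "memusage/max" stat needs.
        self.values[key] = max(self.values.get(key, value), value)


def make_fake_crawler(settings):
    # Hypothetical factory: settings and stats only, no Twisted reactor.
    return SimpleNamespace(settings=FakeSettings(settings), stats=FakeStats())


crawler = make_fake_crawler({"MEMUSAGE_ENABLED": True, "MEMUSAGE_LIMIT_MB": 2})
crawler.stats.max_value("memusage/max", 300)
crawler.stats.max_value("memusage/max", 100)
print(crawler.stats.get_value("memusage/max"))  # 300
print(crawler.settings.getint("MEMUSAGE_LIMIT_MB") * 1024 * 1024)  # 2097152
```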

Resolves: #7002

How to run locally:

  1. Create a virtualenv and install the test dependencies: pip install pytest
  2. Run: pytest tests/test_extension_memusage.py -q

Please let me know if you'd prefer these tests as integration tests in a different file/structure or if you'd like me to also add a minimal CI matrix target for the new tests.

codecov bot commented Sep 29, 2025

❌ 4 Tests Failed:

Tests completed: 3681 | Failed: 4 | Passed: 3677 | Skipped: 216
Failed tests, shortest run time first:
tests/test_extension_memusage.py::test_check_limit_triggers_mail_and_close
Stack trace (0.001s run time):
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f29267ac080>

    def test_check_limit_triggers_mail_and_close(monkeypatch):
        # prepare dummy mailer and patch MailSender.from_crawler to return it
        dummy_mail = DummyMail()
        monkeypatch.setattr(memusage, "MailSender", SimpleNamespace)
        # monkeypatch the from_crawler factory to return our dummy mail
>       monkeypatch.setattr(memusage.MailSender, "from_crawler", staticmethod(lambda c: dummy_mail))
E       AttributeError: <class 'types.SimpleNamespace'> has no attribute 'from_crawler'

tests/test_extension_memusage.py:138: AttributeError
tests/test_extension_memusage.py::test_check_warning_only_once
Stack trace (0.001s run time):
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f29267af890>

    def test_check_warning_only_once(monkeypatch):
        monkeypatch.setattr(memusage, "MailSender", SimpleNamespace)
>       monkeypatch.setattr(memusage.MailSender, "from_crawler", staticmethod(lambda c: DummyMail()))
E       AttributeError: <class 'types.SimpleNamespace'> has no attribute 'from_crawler'

tests/test_extension_memusage.py:171: AttributeError
tests/test_extension_memusage.py::test_update_sets_max
Stack trace (0.001s run time):
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f2926729e50>

    def test_update_sets_max(monkeypatch):
        monkeypatch.setattr(memusage, "MailSender", SimpleNamespace)
        crawler = make_crawler({
            "MEMUSAGE_ENABLED": True,
            "MEMUSAGE_LIMIT_MB": 0,
            "MEMUSAGE_WARNING_MB": 0,
            "MEMUSAGE_CHECK_INTERVAL_SECONDS": 1,
            "BOT_NAME": "tests",
        })
>       mu = memusage.MemoryUsage(crawler)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests/test_extension_memusage.py:124: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <scrapy.extensions.memusage.MemoryUsage object at 0x7f292672b320>
crawler = namespace(settings=<tests.test_extension_memusage.DummySettings object at 0x7f292672b230>, stats=<tests.test_extension...ge.DummySignals object at 0x7f292672b290>, engine=<tests.test_extension_memusage.DummyEngine object at 0x7f292672b2c0>)

    def __init__(self, crawler: Crawler):
        if not crawler.settings.getbool("MEMUSAGE_ENABLED"):
            raise NotConfigured
        try:
            # stdlib's resource module is only available on unix platforms.
            self.resource = import_module("resource")
        except ImportError:
            raise NotConfigured
    
        self.crawler: Crawler = crawler
        self.warned: bool = False
        self.notify_mails: list[str] = crawler.settings.getlist("MEMUSAGE_NOTIFY_MAIL")
        self.limit: int = crawler.settings.getint("MEMUSAGE_LIMIT_MB") * 1024 * 1024
        self.warning: int = crawler.settings.getint("MEMUSAGE_WARNING_MB") * 1024 * 1024
        self.check_interval: float = crawler.settings.getfloat(
            "MEMUSAGE_CHECK_INTERVAL_SECONDS"
        )
>       self.mail: MailSender = MailSender.from_crawler(crawler)
                                ^^^^^^^^^^^^^^^^^^^^^^^
E       AttributeError: type object 'types.SimpleNamespace' has no attribute 'from_crawler'

scrapy/extensions/memusage.py:53: AttributeError
tests/test_extension_memusage.py::test_get_virtual_size_linux
Stack trace (0.002s run time):
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f292672a4b0>

    def test_get_virtual_size_linux(monkeypatch):
        """get_virtual_size should use resource.getrusage().ru_maxrss and multiply by 1024 on non-darwin."""
        # arrange
        monkeypatch.setattr(memusage, "MailSender", SimpleNamespace)  # avoid real mail construction
        crawler = make_crawler({
            "MEMUSAGE_ENABLED": True,
            "MEMUSAGE_LIMIT_MB": 0,
            "MEMUSAGE_WARNING_MB": 0,
            "MEMUSAGE_CHECK_INTERVAL_SECONDS": 1,
            "BOT_NAME": "tests",
        })
        # create instance
>       mu = memusage.MemoryUsage(crawler)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests/test_extension_memusage.py:94: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <scrapy.extensions.memusage.MemoryUsage object at 0x7f292672a660>
crawler = namespace(settings=<tests.test_extension_memusage.DummySettings object at 0x7f292672a180>, stats=<tests.test_extension...ge.DummySignals object at 0x7f292672a5a0>, engine=<tests.test_extension_memusage.DummyEngine object at 0x7f292672a5d0>)

    def __init__(self, crawler: Crawler):
        if not crawler.settings.getbool("MEMUSAGE_ENABLED"):
            raise NotConfigured
        try:
            # stdlib's resource module is only available on unix platforms.
            self.resource = import_module("resource")
        except ImportError:
            raise NotConfigured
    
        self.crawler: Crawler = crawler
        self.warned: bool = False
        self.notify_mails: list[str] = crawler.settings.getlist("MEMUSAGE_NOTIFY_MAIL")
        self.limit: int = crawler.settings.getint("MEMUSAGE_LIMIT_MB") * 1024 * 1024
        self.warning: int = crawler.settings.getint("MEMUSAGE_WARNING_MB") * 1024 * 1024
        self.check_interval: float = crawler.settings.getfloat(
            "MEMUSAGE_CHECK_INTERVAL_SECONDS"
        )
>       self.mail: MailSender = MailSender.from_crawler(crawler)
                                ^^^^^^^^^^^^^^^^^^^^^^^
E       AttributeError: type object 'types.SimpleNamespace' has no attribute 'from_crawler'

scrapy/extensions/memusage.py:53: AttributeError

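All four failures share one cause. monkeypatch.setattr(memusage, "MailSender", SimpleNamespace) replaces MailSender in the memusage module with the types.SimpleNamespace class itself, which has no from_crawler attribute. In test_update_sets_max and test_get_virtual_size_linux, MemoryUsage.__init__ then fails at MailSender.from_crawler(crawler). In the other two tests, the follow-up monkeypatch.setattr(memusage.MailSender, "from_crawler", ...) fails first, because monkeypatch.setattr raises AttributeError by default when the target attribute does not exist; passing raising=False would not help, since a built-in type such as SimpleNamespace rejects new attributes. The stdlib-only sketch below reproduces this with hypothetical stand-in MailSender and MemoryUsage classes (shaped like the traceback, not Scrapy's real code) and uses unittest.mock.patch.object in place of pytest's monkeypatch.

```python
from types import SimpleNamespace
from unittest import mock


# Hypothetical stand-ins shaped like the traceback: the constructor looks up
# MailSender and calls its from_crawler() factory.
class MailSender:
    @classmethod
    def from_crawler(cls, crawler):
        raise RuntimeError("would build a real SMTP-backed sender")


class MemoryUsage:
    def __init__(self, crawler):
        self.mail = MailSender.from_crawler(crawler)


class RecordingMail:
    """Collects send() calls instead of delivering e-mail."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject, body, **kwargs):
        self.sent.append((to, subject, body))


# The original approach: the SimpleNamespace *class* has no from_crawler,
# and as a built-in type it also rejects new attributes.
assert not hasattr(SimpleNamespace, "from_crawler")
try:
    SimpleNamespace.from_crawler = staticmethod(lambda crawler: None)
except TypeError:
    print("cannot add from_crawler to types.SimpleNamespace")

dummy_mail = RecordingMail()
crawler = SimpleNamespace()  # the toy constructor never reads it

# Fix 1: patch only the factory on the real class; it is restored on exit.
with mock.patch.object(MailSender, "from_crawler", lambda crawler: dummy_mail):
    assert MemoryUsage(crawler).mail is dummy_mail

# Fix 2: an *instance* of SimpleNamespace can carry a from_crawler attribute.
fake_sender = SimpleNamespace(from_crawler=lambda crawler: dummy_mail)
assert fake_sender.from_crawler(crawler) is dummy_mail
```

In the test file, the pytest equivalents would be to drop the SimpleNamespace line and call monkeypatch.setattr(memusage.MailSender, "from_crawler", lambda crawler: dummy_mail), or to use monkeypatch.setattr(memusage, "MailSender", SimpleNamespace(from_crawler=lambda crawler: dummy_mail)). The first is the narrower change and matches the approach described in the PR notes.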

wRAR (Member) commented Sep 29, 2025

Can you please explain why your tests fail if you ran them before submitting?

wRAR marked this pull request as draft on October 1, 2025 11:23

Development

Successfully merging this pull request may close these issues.

Cover scrapy.extensions.memusage.MemoryUsage with tests

2 participants