Sessions are not created in advance if `BaseProductPage` is used. #244

Nykakin · 2025-01-27T23:43:42Z

Consider this simplified session config:

@session_config("toscrape.com")
class ToScrapeComLocationSessionConfig(SessionConfig):
    def params(self, request):
        logger.debug(">>>>>>>>>>>>>>>>>>>>>>>>>>> Create session <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
        return {
            "url": "https://toscrape.com",
            "browserHtml": True,
            # ...
        }

    def check(self, response, request):
        return True

And a simplified page object:

@handle_urls("toscrape.com", instead_of=ProductPage)
@attrs.define
class ToScrapeComProductPage(ProductPage):
    downloader: HttpClient

    @field
    async def name(self):
        page_data = await self.get_page_data()
        return page_data.get("name")

    async def get_page_data(self):
        logger.debug(">>>>>>>>>>>>>>>>>>>>>>>>>>> Get data from graphql <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
        response = await self.downloader.post(
            url="https://toscrape.com/graphql",
            # ...
        )
        return {"name": "test name"}

When I crawl, everything works as expected: the session is created first, and then the GraphQL query is sent within that session:

2025-01-28 00:23:54 [majornetlocs.page_objects.toscrape.com.sessions] DEBUG: >>>>>>>>>>>>>>>>>>>>>>>>>>> Create session <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2025-01-28 00:23:54 [zyte_api._retry] DEBUG: Starting call to 'zyte_api._async.AsyncZyteAPI.get.<locals>.request', this is the 1st time calling it.
2025-01-28 00:23:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://toscrape.com> (referer: None) ['zyte-api']
2025-01-28 00:23:58 [zyte_api._retry] DEBUG: Starting call to 'zyte_api._async.AsyncZyteAPI.get.<locals>.request', this is the 1st time calling it.
2025-01-28 00:24:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://toscrape.com/p/316979607> (referer: None) ['zyte-api']
2025-01-28 00:24:01 [majornetlocs.page_objects.toscrape.com.products] DEBUG: >>>>>>>>>>>>>>>>>>>>>>>>>>> Get data from graphql <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

So far so good. The problem is that my real ToScrapeComProductPage class doesn't inherit from ProductPage but from BaseProductPage. This is because we don't need to retrieve the HTML; we get everything we need from the GraphQL request:

@handle_urls("toscrape.com", instead_of=ProductPage)
@attrs.define
class ToScrapeComProductPage(BaseProductPage):
    downloader: HttpClient

In this case the input product URL is not actually downloaded, so no session is created for it. Only sending the GraphQL request triggers session creation. As a result, the ordering is incorrect:

2025-01-28 00:26:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://toscrape.com/p/12345> (referer: None)
2025-01-28 00:26:43 [majornetlocs.page_objects.toscrape.com.products] DEBUG: >>>>>>>>>>>>>>>>>>>>>>>>>>> Get data from graphql <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2025-01-28 00:26:44 [majornetlocs.page_objects.toscrape.com.sessions] DEBUG: >>>>>>>>>>>>>>>>>>>>>>>>>>> Create session <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

This is problematic because inside get_page_data the session must already be initialized in order to send a proper request.
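
To make the two orderings concrete, here is a minimal toy model in plain asyncio. It is only a sketch of the behavior seen in the logs above, not scrapy-zyte-api code: `ToySessionManager`, `ensure_session` and `crawl` are hypothetical names, and the assumption is that a session is created lazily by the first request that actually goes out.

```python
import asyncio


class ToySessionManager:
    """Hypothetical stand-in for session handling; not scrapy-zyte-api code."""

    def __init__(self, events):
        self.events = events
        self.session = None

    async def ensure_session(self):
        # Assumption: the session is created lazily, the first time an
        # outgoing request needs it.
        if self.session is None:
            self.events.append("create session")
            self.session = {"id": "toy-session"}
        return self.session


async def crawl(fetch_input_page):
    events = []
    sessions = ToySessionManager(events)

    if fetch_input_page:
        # ProductPage-like flow: the product URL is really downloaded,
        # which creates the session before the page object runs.
        await sessions.ensure_session()
        events.append("fetch product page")

    # Page object code (get_page_data in the issue) starts here.
    events.append("get data from graphql")
    session_seen_by_page_object = sessions.session

    # Sending the POST is what triggers session creation if none exists yet.
    await sessions.ensure_session()
    events.append("post graphql")
    return events, session_seen_by_page_object


if __name__ == "__main__":
    for fetch in (True, False):
        events, seen = asyncio.run(crawl(fetch_input_page=fetch))
        print(fetch, events, seen)
```

With `fetch_input_page=False`, the page object code starts with no session available, which mirrors the second log excerpt.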

Therefore, I'd like to request that the session be created earlier when BaseProductPage is used.


Gallaecio linked a pull request Jan 29, 2025 that will close this issue

Add SessionConfig.process_request #246

Open
