Complex Websites
While Infy was designed to work with as many websites as possible, some sites use techniques that make them complex (or even impossible!) to work with. The following is a list of these types of websites:
- AJAX Sites and SPAs (Single-page Applications)
- JavaScript Links
- Forms
- Sites that require Puppeteering
Important: All of the strategies presented here require you to use the Click Element action and the new AJAX append mode.
AJAX sites and SPAs are the most common type you'll want to understand. Some sites are not designed to be viewed as multiple pages, but rather as AJAX websites or Single-page Applications (SPAs). These sites typically don't feature links, which makes it difficult to append pages for them.
So, how can you tell if you're on an AJAX site or SPA? Here are three big giveaways:
- The entire page (tab) isn't reloading as you navigate from page to page (in other words, only part of the page is being updated)
- The address bar isn't updating as you navigate from page to page (however, many AJAX sites do update the address bar, and there is a small possibility it may actually be a Form site even if it doesn't update it)
- The site isn't using links for its pagination/next page button (inspect the HTML content and see if it's actually using any a elements for links)
https://www.pixiv.net/tags/オリジナル/illustrations?mode=safe
You can follow this example website more in depth here.
It's difficult to give just one example website because this type encompasses a lot of sites, such as:
- Many modern and newer websites
- Manga or Comic Readers
- Novel Readers
However, if you inspect a site's Next button and it doesn't look like a link (an a element), the site might be an SPA, like the following:
<button>Next</button>
As the above example shows, this site is just using a button and not an a with an href that points to a URL.
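The check described above can be sketched as a small JavaScript helper to run in the DevTools console. This is only an illustration and not part of Infy: the function name isRealLink is hypothetical, and it accepts any element-like object that exposes tagName and getAttribute.

```javascript
// Hypothetical helper (not part of Infy): returns true only when the element
// is an <a> whose href points to a normal URL.
function isRealLink(el) {
  if (!el || String(el.tagName).toUpperCase() !== "A") {
    return false; // e.g. a <button> is not a link
  }
  const href = (el.getAttribute("href") || "").trim();
  // Empty hrefs, "#" placeholders, and javascript: URLs don't give a next page URL
  if (href === "" || href.startsWith("#") || href.toLowerCase().startsWith("javascript:")) {
    return false;
  }
  return true;
}
```

In DevTools, you could select the Next button in the Elements panel and run isRealLink($0) in the console. A result of false suggests the site may need the Click Element action and the AJAX append mode.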
Infy should be compatible with many of these sites using the AJAX append mode. SPAs and AJAX sites work by having you click a button and then by replacing the elements on the current page with the next page's elements. Try using Infy's Click Element action in conjunction with the new AJAX append mode.
Some sites have a next link that doesn't point to a normal URL, but rather a javascript: protocol URL. Alternatively, the next link may use an onclick event handler, either declared inline in its HTML or added separately in a script. The JavaScript code usually calls a function that navigates you to the next link. This style of link is generally considered bad web design practice and dated by today's standards.
Example 1
<a href="javascript:navigateToNextURL('foo', 2);">Next</a>
Example 2
<a href="#" onclick="self.location='/foo/2'; return false;">Next</a>
You should be able to make most of these sites work by using the new AJAX append mode. Alternatively, you can try making it work in other ways. For example, try clicking the Next link and see what the final URL is in your browser's address bar; if it's just doing something simple like incrementing a page number, you can try using the Increment URL action.
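To illustrate what "incrementing a page number" means for a URL like the /foo/2 one in Example 2, here is a minimal sketch. It is not Infy's implementation of the Increment URL action: the function name incrementLastNumber is hypothetical, and it assumes the page number is the last number in the URL.

```javascript
// Hypothetical helper (not Infy's code): increments the last number in a URL.
// Assumes the page number is the final run of digits; leading zeros are not preserved.
function incrementLastNumber(url, step = 1) {
  // Match a run of digits that has no other digit anywhere after it
  const match = /(\d+)(?!.*\d)/.exec(url);
  if (!match) {
    return url; // nothing to increment
  }
  const next = String(Number(match[1]) + step);
  return url.slice(0, match.index) + next + url.slice(match.index + match[1].length);
}

console.log(incrementLastNumber("https://www.example.com/foo/2"));
// prints "https://www.example.com/foo/3"
```

If the URLs produced this way load the same pages as clicking the Next link, the Increment URL action is likely a workable alternative for that site.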
Some sites use forms as their next buttons, not links. The form usually uses the post method, so it doesn't update the URL in your address bar with the inputs it used to find the next page.
<form method="post" action="/search">
<input type="hidden" name="query" value="foo">
<input type="hidden" name="page" value="2">
<button type="submit">Next</button>
</form>
You should be able to make most of these sites work by using the Click Element action and the AJAX Iframe append mode. In the UI Window, open the Scripts dialog, check the Mirror the page (Forms/SPAs) setting, and select the By Importing option. This will clone the page (and its form), allowing the AJAX Iframe to start on the same page you're on before it starts clicking the form's submit button. Then proceed as normal to fill in the Click Element (the form's button/input) and AJAX Iframe settings.
Alternatively, as a last resort, you may be able to make it work by using the Increment URL action to construct a URL that combines the form's action with the inputs the form is expecting. Try using DevTools to inspect the website and look at the HTML of the form. For example, if the form's action is /search and you can see the inputs inside the form's HTML are query and page, you could construct a URL like https://www.example.com/search?query=foo&page=2 and have it keep incrementing the page number. This may not work if the website requires the form to be submitted via the post method, but it is worth a try.
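The URL construction described above can be sketched with the standard URL API. The function name buildFormUrl is hypothetical and not part of Infy; the origin, action, and input values are the ones from the example form.

```javascript
// Hypothetical helper: combines a form's action and its inputs into a GET-style URL.
function buildFormUrl(origin, action, inputs) {
  const url = new URL(action, origin); // resolve the action against the site's origin
  for (const [name, value] of Object.entries(inputs)) {
    url.searchParams.set(name, String(value)); // one query parameter per form input
  }
  return url.toString();
}

console.log(buildFormUrl("https://www.example.com", "/search", { query: "foo", page: 2 }));
// prints "https://www.example.com/search?query=foo&page=2"
```

You could paste the resulting URL into the address bar first to see whether the site accepts the inputs as query parameters before setting up the Increment URL action.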
Sites that require puppeteering are technically a sub-category of SPAs, but they warrant their own category because they require some special handling. Some sites are not only SPAs, but also hide their page content in a way that requires some form of manual clicking (puppeteering) to load the content you want. These sites require the AJAX Iframe append mode to mirror the page, as well as a puppeteering script.
For example, consider sites that load their page content (the part you want infinite scrolling on) in:
- Dialogs or Popups
- Collapsible elements (like Accordions)
When a site does this, Infy may not be able to see the content in the AJAX Iframe initially. This would require a form of Puppeteering to mimic the clicks you performed to get the AJAX Iframe in the same state you're currently seeing the page in. Luckily, there's an option that does just that in Infy's Scripts dialog.
There's an example website in the InfyScroll Database you can look at. This specific example site has the part we want infinite scrolling in its Reviews popup, which is hidden inside an accordion. (Note that it's possible that this site may no longer be working at the point in time you're reading this.)
- Example URL: https://www.nike.com/t/lebron-xx-four-horsemen-basketball-shoes-ct1qVm/DJ5423-002
- Database Item: http://wedata.net/items/86120
First, make sure you are using the AJAX Iframe append mode. Then, open the Scripts dialog (bottom right corner) and:
- Check the Mirror the page (Forms/SPAs) setting and select the By Puppeteering option. You can then write a puppeteer-like script that contains the elements to click on.
- Also, you'll almost always want to enable the Watch for changes on the page to help enable late activation (AJAX/SPA) setting, so make sure to check it.
Currently, Infy only supports scripts that click elements, and the paths must be CSS Selectors (XPath is not allowed here). So each line in the Puppeteer script must look something like this:
await page.click("<selector>");
Here's an example script for the above example website that clicks on its reviews accordion and then clicks a button opened by the accordion that opens the reviews in a dialog (a more complex case).
await page.click("[data-test='reviewsAccordionClick'] > summary");
await page.click("button[data-test='more-reviews']");
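Because only click lines are supported, it can help to check a script's format before pasting it into the Scripts dialog. The following is a minimal sketch and not Infy's actual parser: the function name extractClickSelectors and the exact pattern it accepts are assumptions based on the line format shown above.

```javascript
// Hypothetical validator (not Infy's parser): returns the CSS selectors from a
// script made only of `await page.click("<selector>");` lines, and throws on
// any other kind of line.
function extractClickSelectors(script) {
  // The quote that opens the selector must also close it (backreference \1)
  const linePattern = /^await\s+page\.click\((["'])(.+)\1\);?$/;
  const selectors = [];
  for (const rawLine of script.split("\n")) {
    const line = rawLine.trim();
    if (line === "") {
      continue; // ignore blank lines
    }
    const match = linePattern.exec(line);
    if (!match) {
      throw new Error(`Unsupported line: ${line}`);
    }
    selectors.push(match[2]);
  }
  return selectors;
}
```

For the example script above, this would return the two selectors in click order. You could then try each one with document.querySelector in the DevTools console to confirm it finds an element.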
If puppeteering isn't working, you can try using the AJAX Native append mode as a last resort.