Jordan Plays With Playwright

Demo code here

Much to my surprise, Playwright has entered the scene. I follow Andrey Lushnikov on Twitter, and on January 22nd he posted this tweet:

Folks! I'm happy to share what we've been working on:
📣 https://t.co/ABrJpvbwSy
Playwright is like Puppeteer, but cross-browser. pic.twitter.com/PiMjqwr7uF
Andrey Lushnikov (@aslushnikov), January 22, 2020

It turns out that the whole Puppeteer team has moved over to Microsoft to build Playwright. As far as I can tell, Playwright uses almost exactly the same API as Puppeteer. One big drawback for a TypeScript guy like me is that there isn’t a type definition file for it yet, like there is for Puppeteer. Maybe it’s time for me to learn how to create a definition file.

Check out the documentation for Playwright here.

For learning to web scrape with puppeteer, check here.

Different devices

Playwright and Puppeteer were both largely built for automated web testing, and they do a great job with it. I mostly use them for web scraping and automating tedious tasks, but a large part of each tool is there to help with testing.

One of the opening examples in the documentation shows how easy it is to test with different devices. Look how the code works:

    // Runs inside an async function; latitude and longitude are defined beforehand
    const pixel2 = devices['Pixel 2'];
    const browser = await chromium.launch({ headless: false });
    const context = await browser.newContext({
        viewport: pixel2.viewport,
        userAgent: pixel2.userAgent,
        geolocation: { longitude: longitude, latitude: latitude },
        permissions: { 'https://www.google.com': ['geolocation'] }
    });
    const page = await context.newPage();
    await page.goto('https://maps.google.com');
    await page.click('text="Your location"');
    await page.waitForRequest(/.*pwa\/net.js.*/);
    await page.screenshot({ path: `${longitude}, ${latitude}-android.png` });
    await browser.close();

devices is imported from Playwright ( const { chromium, devices } = require('playwright'); ), and pixel2 is just one entry in it. From there you can use all the stats that come with that device. Pretty amazing and very simple.

I wanted to mess around a little bit with the geolocation options since I’d never used them with Puppeteer. I built a function that generates a random longitude and latitude, then hit Google Maps from each of these random positions to see whether it would affect Google blocking me. After 20 attempts Google hadn’t flagged anything. In this example I just loop five times.

    const { chromium, devices, firefox } = require('playwright');

    async function tryDevices() {
        // Loop five times with random locations
        for (let i = 0; i < 5; i++) {
            // Latitude runs from -90 to 90, longitude from -180 to 180
            const latitude = getRandomInRange(-90, 90, 3);
            const longitude = getRandomInRange(-180, 180, 3);
            const pixel2 = devices['Pixel 2'];
            const browser = await chromium.launch({ headless: false });
            const context = await browser.newContext({
                viewport: pixel2.viewport,
                userAgent: pixel2.userAgent,
                geolocation: { longitude: longitude, latitude: latitude },
                permissions: { 'https://www.google.com': ['geolocation'] }
            });
            const page = await context.newPage();
            await page.goto('https://maps.google.com');
            await page.click('text="Your location"');
            await page.waitForRequest(/.*pwa\/net.js.*/);
            await page.screenshot({ path: `${longitude}, ${latitude}-android.png` });
            await browser.close();
        }
    }

    // Returns a random number between from and to, rounded to `fixed` decimal places
    function getRandomInRange(from, to, fixed) {
        return (Math.random() * (to - from) + from).toFixed(fixed) * 1;
    }
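The loop above launches a brand new browser for every location. One possible variation, sketched here, launches the browser once and opens a fresh context per location instead. The helper names `randomCoordinates` and `visitRandomLocations` are made up for this illustration, and the browser type and device descriptor are passed in as arguments (for example `chromium` and `devices['Pixel 2']`) so the sketch doesn’t depend on how Playwright is imported. Note that latitude runs from -90 to 90 while longitude runs from -180 to 180. The permissions option is left out, so it would need to be added as in the loop above before a site can actually read the location.

```javascript
// Hypothetical helper: returns a random point with valid ranges
// (latitude -90..90, longitude -180..180), rounded to `fixed` decimals.
function randomCoordinates(fixed = 3) {
  const inRange = (from, to) =>
    Number((Math.random() * (to - from) + from).toFixed(fixed));
  return { latitude: inRange(-90, 90), longitude: inRange(-180, 180) };
}

// Hypothetical helper: launches `browserType` once, then runs `visit`
// in a fresh context for each of `count` random locations.
async function visitRandomLocations(browserType, device, count, visit) {
  const browser = await browserType.launch({ headless: false });
  try {
    for (let i = 0; i < count; i++) {
      const coords = randomCoordinates();
      const context = await browser.newContext({
        viewport: device.viewport,
        userAgent: device.userAgent,
        geolocation: coords,
        // Assumption: add the geolocation permission here, as in the loop above.
      });
      try {
        const page = await context.newPage();
        await visit(page, coords);
      } finally {
        await context.close(); // always release the context
      }
    }
  } finally {
    await browser.close(); // close the browser even if a visit throws
  }
}

// Usage (assumes Playwright is installed):
// const { chromium, devices } = require('playwright');
// visitRandomLocations(chromium, devices['Pixel 2'], 5, async (page, coords) => {
//   await page.goto('https://maps.google.com');
//   await page.screenshot({ path: `${coords.longitude}, ${coords.latitude}-android.png` });
// });
```

Contexts are isolated from each other, so each location still starts with a clean session, just without the cost of a full browser launch each time.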

I also learned that there is a lot of ocean on Earth. Surprise.

Varying the geolocation could possibly be a neat trick, but I still think puppeteer stealth and the items I discussed in the how to avoid being blocked with puppeteer post are better for simply avoiding being blocked.

Different browsers

Unlike Puppeteer, Playwright lets you launch a specific browser, either from a browser type imported directly or from a property on the playwright object. As we saw above with the different devices, we call the launch function directly on a browser type with const browser = await chromium.launch({ headless: false }); . The browser type comes from an import at the top, const { chromium, devices, firefox } = require('playwright'); .

The docs also show it’s simple to just loop through the available browsers like so:

    const playwright = require('playwright');

    for (const browserType of ['chromium', 'firefox', 'webkit']) {
        const browser = await playwright[browserType].launch({ headless: false });
        // do your stuff here
    }
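To make the “do your stuff here” part a bit more concrete, here is a minimal sketch that runs the same task in every browser and always closes the browser afterwards. `runInEachBrowser` is a made-up helper name, and passing the Playwright module in as an argument is my own choice for illustration; neither comes from the docs.

```javascript
// Hypothetical helper: runs `task(page, browserType)` once per browser
// and collects whatever each task returns, keyed by browser name.
async function runInEachBrowser(
  playwright,
  task,
  browserTypes = ['chromium', 'firefox', 'webkit']
) {
  const results = {};
  for (const browserType of browserTypes) {
    const browser = await playwright[browserType].launch({ headless: false });
    try {
      const context = await browser.newContext();
      const page = await context.newPage();
      results[browserType] = await task(page, browserType);
    } finally {
      await browser.close(); // also closes the contexts and pages it owns
    }
  }
  return results;
}

// Usage (assumes Playwright is installed):
// const playwright = require('playwright');
// runInEachBrowser(playwright, async (page, browserType) => {
//   await page.goto('https://example.com');
//   await page.screenshot({ path: `example-${browserType}.png` });
//   return page.title();
// }).then(console.log);
```

The try/finally is there so a failure in one browser doesn’t leave a window hanging around while you debug.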

Conclusion

At this point, Playwright looks to be superior to Puppeteer. Handling multiple browsers easily is clearly a major goal for the team, and it’s awesome, but it’s probably not that impactful for web scraping.

The more important point, however, is that with the whole amazing team that created Puppeteer in the first place now working on Playwright, this is where the updates will be. In fact, I found a cool feature that wasn’t even explicitly mentioned: the ability to select based on text content. I searched high and low and couldn’t find any way to do it like this in Puppeteer, so I’m fairly certain it’s specific to Playwright.

This is how I would have done it before, in a case where I had a list of header items with the same selectors and only wanted to select the one that said pricing.

    // Search through content and find pricing
    const headerElementHandles = await page.$$('.hometop-btn .mat-button-wrapper');
    for (const elementHandle of headerElementHandles) {
        const text: string | null = await elementHandle.$eval('strong', element => element.textContent);
        console.log('text', text);
        if (text && text.toLocaleLowerCase().includes('pricing')) {
            await elementHandle.click();
            // Stop after the first match; the click may navigate away
            break;
        }
    }

I’d just get the list of all of them and then loop through them and click the one that has the text content I’m looking for.

And…with this new playwright way?

    // Click based on text content
    await page.click('text="Pricing"');

That’s it. A lot simpler. Love it. Good job, playwright team!

