Sunday, 20 September 2020

puppeteer.js inside loop performance question, reusing browser instances?

Hi everybody, I'm making a scraping app with headless puppeteer.js. I have an array of URLs, all from a single domain. For each URL I launch a browser and check whether I'm logged in. If not, I log in and write the cookies to disk for the next iteration of the loop (so I don't have to log in every time). After that I do some classic scraping, close the page, close the browser, and move on to the next iteration of the loop, where the same thing happens again.

I was wondering whether I could improve the performance and overall quality of this. I think opening and closing a Chrome instance on every pass through the loop is pretty CPU-intensive, and I don't want memory leaks either. Can I change my code to reuse browser or page instances?

The loop:

```
async scrapeForResults() {
  let urls = [ ... ];
  for (let i = 0; i < urls.length; i++) {
    let response = await this.scrapWebForData(urls[i]);
    console.log('Contact info scraped - - - - - - - -');
  }
}
```

The puppeteer.js part, which I run for each URL in my array (all URLs are from the same domain):

```
// At the top of the file:
const puppeteer = require('puppeteer');
const fs = require('fs');

async scrapWebForData(url) {
  // A new Chromium process is launched for every URL.
  let browser = await puppeteer.launch({ headless: true });
  const context = browser.defaultBrowserContext();
  await context.overridePermissions("https://www.facebook.com", []);
  let page = await browser.newPage();
  page.setDefaultNavigationTimeout(100000);
  await page.setViewport({ width: 1365, height: 623 });
  // `cookies` (read from ./cookies.json) and `facebook` (a URL) are defined elsewhere.
  if (!Object.keys(cookies).length) {
    // Wait for the login to finish before reading the session cookies.
    await this.handleLoginCookies(page);
    let currentCookies = await page.cookies();
    fs.writeFileSync('./cookies.json', JSON.stringify(currentCookies));
  } else {
    await page.setCookie(...cookies);
    await page.goto(facebook, { waitUntil: "networkidle2" });
    //DO ACTION HERE
  }
  await page.close();
  await browser.close();
}
```
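A minimal sketch of the reuse the question asks about, under stated assumptions: Chromium is launched once, the browser-wide setup (`overridePermissions`) runs once, and each URL gets a fresh tab that is closed in a `finally` block. The `launch` argument is injected (for example `() => puppeteer.launch({ headless: true })`) so the sketch does not hard-code Puppeteer, `scrapePage` is a hypothetical callback holding the per-URL scraping logic, and `cookies` is assumed to be an array such as the one `page.cookies()` returns. The login step is not shown.

```javascript
// Sketch: launch Chromium once and reuse it for every URL.
// `launch` is injected, e.g. () => puppeteer.launch({ headless: true }).
// `scrapePage(page, url)` is a hypothetical callback with the scraping logic.
async function scrapeAllWithOneBrowser(urls, launch, scrapePage, cookies = []) {
  const browser = await launch();
  const results = [];
  try {
    // One-time, browser-wide setup (previously repeated for every URL).
    const context = browser.defaultBrowserContext();
    await context.overridePermissions('https://www.facebook.com', []);

    let cookiesRestored = false;
    for (const url of urls) {
      const page = await browser.newPage();
      try {
        page.setDefaultNavigationTimeout(100000);
        await page.setViewport({ width: 1365, height: 623 });
        if (!cookiesRestored && cookies.length) {
          // Pages opened with browser.newPage() share the default context's
          // cookie jar, so restoring the saved cookies once is enough.
          await page.setCookie(...cookies);
          cookiesRestored = true;
        }
        await page.goto(url, { waitUntil: 'networkidle2' });
        results.push(await scrapePage(page, url));
      } finally {
        // Close the tab even if scraping throws, so pages do not accumulate.
        await page.close();
      }
    }
  } finally {
    // Close the single browser once, after the whole loop (or on error).
    await browser.close();
  }
  return results;
}
```

With Puppeteer installed, a call might look like `scrapeAllWithOneBrowser(urls, () => puppeteer.launch({ headless: true }), scrapeOne, cookies)`. Opening a new tab per URL is much cheaper than launching a new browser and avoids state leaking between pages; if that is still too slow, the `browser.newPage()` call could be hoisted out of the loop so a single page is reused for every `goto`.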
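The cookie handling described in the question (log in once, write the cookies to `./cookies.json`, reuse them on later runs) can be sketched as two small helpers. This is an illustration under assumptions: `loadCookies` and `saveCookies` are hypothetical names, the file is read once at start-up rather than on every iteration, and `saveCookies` is meant to be awaited only after the login has completed, since `page.cookies()` returns the cookies for the page's current URL at the moment it is called.

```javascript
const fs = require('fs');

// Hypothetical helper: read saved cookies once at start-up.
// Returns an empty array when no cookie file exists yet.
function loadCookies(path = './cookies.json') {
  if (!fs.existsSync(path)) {
    return [];
  }
  const parsed = JSON.parse(fs.readFileSync(path, 'utf8'));
  return Array.isArray(parsed) ? parsed : [];
}

// Hypothetical helper: persist the session cookies after logging in.
// `page` is a Puppeteer Page; page.cookies() resolves to the cookies
// for the page's current URL.
async function saveCookies(page, path = './cookies.json') {
  const cookies = await page.cookies();
  fs.writeFileSync(path, JSON.stringify(cookies, null, 2));
  return cookies;
}
```

Because an empty array is returned when the file is missing, a check like `cookies.length === 0` can replace `Object.keys(cookies).length` to decide whether a login is needed.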

Submitted September 21, 2020 at 01:39AM by Gabotron_ES
