Saturday, 20 October 2018

Sequencing asynchronous function calls in a foreach loop

I recently created a web scraper for a local real estate investor. It scrapes addresses from a single website that displays information about all counties in my state. The investor needs addresses from multiple counties, so the scraper was built to take one argument, the county, and scrape everything for that county.

The idea was to create an array of counties that I could pass to a forEach loop, calling the scraper as a function. Here's a sample:

    const ws = require("webScraper");

    const counties = ["county1", "county2", "county3"];

    counties.forEach(county => {
      ws.scrapePage(county);
    });

The information being scraped gets inserted into a CSV file, but I noticed that the counties are all out of order because the scrapePage() function runs multiple times in parallel. I'd like it to wait until one county is finished before passing in the next county. Does anyone know how to accomplish this?

***Additional Information***

The site I'm scraping is built with React, so I needed a headless browser (nightmare.js) to navigate the site. That said, the webScraper module that holds the scrapePage() function uses promises. I just need to make sure that scrapePage() runs sequentially so I don't hit the server too hard. How do I make the forEach finish the entire scrapePage() call for the first element in the counties array before calling scrapePage() again with the next element?
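One possible approach, sketched under the assumption that scrapePage() returns a promise that resolves once a county is fully scraped: forEach ignores whatever its callback returns, so it never waits on those promises, whereas a for...of loop inside an async function can await each call before starting the next. The scrapePage() below is a hypothetical stand-in with a timer (it is not the real webScraper module), and the helper names scrapeSequentially and scrapeWithReduce are made up for illustration.

```javascript
// Hypothetical stand-in for ws.scrapePage(): resolves after a short delay and
// records the county in `log` when it finishes. The real function is only
// assumed to return a promise, as described in the post.
function scrapePage(county, log) {
  return new Promise(resolve => {
    // Make the first county the slowest so out-of-order completion would show.
    const delay = county === "county1" ? 30 : 5;
    setTimeout(() => {
      log.push(county);
      resolve(county);
    }, delay);
  });
}

// Sequential version: await each call before starting the next one.
async function scrapeSequentially(counties, log) {
  for (const county of counties) {
    await scrapePage(county, log);
  }
  return log;
}

// Alternative without async/await: build a promise chain with reduce, so each
// call starts only after the previous promise has resolved.
function scrapeWithReduce(counties, log) {
  return counties
    .reduce(
      (chain, county) => chain.then(() => scrapePage(county, log)),
      Promise.resolve()
    )
    .then(() => log);
}
```

In the original script this would presumably mean replacing the stub with ws.scrapePage(county) and calling something like scrapeSequentially(counties) once. Either version only helps if scrapePage() really does return its promise; if it starts the nightmare.js work without returning it, there is nothing for the loop to wait on.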

Submitted October 21, 2018 at 06:36AM by androidengel
