Sunday, 12 August 2018

Scraping multiple links with Cheerio gives an empty array or an array of nulls

I am writing a node.js application that scrapes a web page to get a bunch of anchor tags (links), close to 50-60 of them. It then goes through each of those links, scrapes that page, and looks for a matching keyword inside a div. If the keyword is found, it pushes the link and page name into an array, which then has to be displayed to the user.

I am using request-promise to make the requests.

    router.post('/', function (req, res) {
        var medicineObjArray = [];
        rp({
            uri: 'https://ift.tt/2MnQyqm',
            transform: function (body) {
                return cheerio.load(body);
            }
        }).then(function ($) {
            $('.remedy_list a').each((index, elem) => {
                var text = $(elem).text();
                var link = $(elem).attr('href');
                if (text != '' && link != undefined) {
                    medicineObjArray.push({
                        pageName: text,
                        link: link
                    });
                }
            });
            var promises = medicineObjArray.map(function (item, index) {
                rp({
                    uri: item.link,
                    method: 'GET',
                    transform: function (body) {
                        return cheerio.load(body);
                    }
                }).then(function ($) {
                    if ($('.content:contains("paralysis")').length > 0) {
                        var med = {
                            medicine: item.medicine,
                            link: item.link
                        };
                        return med;
                    } else {
                        return;
                    }
                }).catch(function (err) {
                    console.log('--------- ERROR getting the page data: ' + err);
                });
            });
            return Promise.all(promises);
        }).then((data) => {
            res.send(data); // want to send this json array to my ajax call or just print on the page
        }).catch(function (err) {
            console.log('--------- ERROR Cheerio chocked');
        });
    });

However, the page loads with empty data, and sometimes I get an array of null values. What am I missing here? I thought Promise.all() waits until all the promises are done and then returns them all, but in my case that's not happening.

I even tried removing the last then and adding it just after the Promise.all():

    return Promise.all(promises).then((data) => {
        res.send(data);
    });

But it didn't work either.

Any help/suggestion will be appreciated. Thanks :)
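A likely cause, offered as a sketch rather than a confirmed diagnosis: the callback passed to map() calls rp(...) but never returns it, so promises ends up as an array of undefined. Promise.all() has nothing to wait for in that case and resolves immediately, and undefined entries turn into null when the array is serialized as JSON. The object being built also reads item.medicine, while the items pushed earlier only have pageName and link. The version below returns the promise from map() and filters out pages that did not match. The names findMatchingPages, fetchPage and containsKeyword are hypothetical; they stand in for the rp(...) call with the cheerio transform and for the $('.content:contains("paralysis")') check so that the control flow can be read on its own.

```javascript
// Sketch only: `fetchPage` and `containsKeyword` are hypothetical stand-ins
// for the rp({ uri, transform }) call and for the
// $('.content:contains("paralysis")').length > 0 check in the question.
function findMatchingPages(items, fetchPage, containsKeyword) {
  const promises = items.map(function (item) {
    // The `return` is the important part: without it, map() produces an
    // array of undefined and Promise.all() resolves right away.
    return fetchPage(item.link)
      .then(function (page) {
        if (containsKeyword(page)) {
          // The items carry pageName and link; there is no `medicine` key.
          return { medicine: item.pageName, link: item.link };
        }
        return null; // no match on this page
      })
      .catch(function (err) {
        console.log('--------- ERROR getting the page data: ' + err);
        return null; // one failed page should not reject the whole batch
      });
  });

  // Wait for every page, then drop the non-matching and failed ones.
  return Promise.all(promises).then(function (results) {
    return results.filter(function (result) {
      return result !== null;
    });
  });
}
```

Inside the route handler, the first then could return findMatchingPages(medicineObjArray, ...) with a fetchPage that wraps rp and a containsKeyword that runs the cheerio selector, leaving the existing res.send(data) in the following then unchanged.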

Submitted August 13, 2018 at 05:24AM by codeinprogress
