Monday, 17 February 2020

Which library to use to parse multiple HTML pages?

I am trying to parse the HTML responses I get from URLs collected from news articles in RSS feeds and extract the paragraphs (the content), but I don't know which library to use. I have tried nightmare, cheerio, and axios, and I have read about JSDOM, but I couldn't find much information on it and didn't understand it. These libraries are fine if you know the HTML format of the webpages in advance. Most of the time the formats differ from site to site, so the selectors I wrote for one site match nothing on another and my cheerio function returns an empty array. Which library could I use for web scraping? Ideally one that will let me scrape from multiple different websites.
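To make the problem concrete, here is a minimal sketch of a site-agnostic approach: instead of selecting on site-specific classes or ids, collect every `<p>` element and keep only the ones long enough to look like article text. This is not an answer from any of the libraries above. `extractParagraphs`, `fetchParagraphs`, and the 40-character threshold are hypothetical names and values chosen for illustration, and the regex matching is only there to keep the example dependency-free. With cheerio the same idea would be a loop over `$('p')`. `fetchParagraphs` assumes a Node version with a global `fetch` (18 or later).

```javascript
// Sketch only: regex-based tag matching is fragile on malformed HTML.
// A real parser such as cheerio or jsdom would be more robust.

const ENTITIES = {
  '&amp;': '&', '&lt;': '<', '&gt;': '>',
  '&quot;': '"', '&#39;': "'", '&nbsp;': ' ',
};

// Decode a handful of common entities in a single pass.
function decodeEntities(text) {
  return text.replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, (m) => ENTITIES[m]);
}

// Hypothetical helper: return the text of every <p> that is at least
// minLength characters long, regardless of the page's layout.
function extractParagraphs(html, minLength = 40) {
  // Drop script/style blocks so their contents are never treated as text.
  const cleaned = html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, '');
  const paragraphs = [];
  const pattern = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let match;
  while ((match = pattern.exec(cleaned)) !== null) {
    const text = decodeEntities(match[1].replace(/<[^>]+>/g, ''))
      .replace(/\s+/g, ' ')
      .trim();
    // Short paragraphs are usually navigation, bylines, or captions.
    if (text.length >= minLength) paragraphs.push(text);
  }
  return paragraphs;
}

// Hypothetical wrapper: download one article URL and extract its paragraphs.
// Assumes a global fetch, which Node provides from version 18.
async function fetchParagraphs(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return extractParagraphs(await response.text());
}
```

The length filter is a crude heuristic and will still let through long footers or cookie notices on some sites, so the threshold would need tuning against the feeds actually being scraped.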

Submitted February 17, 2020 at 11:02AM by pappermanfan
