Wednesday 2 December 2015

Web scraper starter project

I've been working on a few web scrapers lately and have found a structure that has served me well, so I've decided to create and share a scraper starter repo that can be cloned and modified for each new project.

There are loads of tutorials out there on how to build a web scraper, but if you're past that, are familiar with Node or JavaScript, and want a modular structure to scrape multiple sources for similar data, this is for you.

It uses Request for making the HTTP calls to each website, Cheerio for parsing the returned HTML, and MongoDB for saving the scraped data. I'm new to MongoDB and was surprised that their examples were a mess of callbacks, so I pieced together the best structure I could for my limited use. If you know of good patterns for using Mongo, I'd love to hear about them.
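
To give a rough idea of what "a modular structure to scrape multiple sources" can look like, here is a minimal sketch. It is not the code from the repo. The source names, URLs, and the `runAll`, `fetchHtml`, and `save` names are all hypothetical. To keep it dependency-free, the fetch and save steps are passed in as functions (standing in for Request and MongoDB), and the parsers use a regex where the real project would use Cheerio.

```javascript
// Hypothetical layout: every source has the same shape (name, url, parse),
// so the runner never needs to know anything site-specific.
const sources = [
  {
    name: 'example-a',                 // hypothetical source name
    url: 'http://a.example/items',     // hypothetical URL
    // Stand-in parser: a regex instead of Cheerio, only to avoid dependencies.
    parse: (html) =>
      Array.from(html.matchAll(/<h2>(.*?)<\/h2>/g), (m) => ({ title: m[1] })),
  },
  {
    name: 'example-b',
    url: 'http://b.example/list',
    parse: (html) =>
      Array.from(html.matchAll(/<li class="item">(.*?)<\/li>/g), (m) => ({ title: m[1] })),
  },
];

// Runs each source through the same fetch -> parse -> save pipeline.
// fetchHtml(url) and save(items) are injected and must return promises.
// A failure in one source is recorded and does not stop the others.
async function runAll(sources, fetchHtml, save) {
  const summary = [];
  for (const source of sources) {
    try {
      const html = await fetchHtml(source.url);
      // Tag each item with its source so similar data stays traceable.
      const items = source.parse(html).map((item) => ({ ...item, source: source.name }));
      await save(items);
      summary.push({ source: source.name, count: items.length });
    } catch (err) {
      summary.push({ source: source.name, error: err.message });
    }
  }
  return summary;
}
```

Adding a new site then means adding one more object to `sources`. Because the fetch and save steps are plain promise-returning functions, the pipeline reads top to bottom with `async`/`await` rather than nested callbacks, whatever HTTP client or database sits behind them.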

Submitted December 02, 2015 at 05:44PM by spetsnaz8
