Hello all. I've delved into the world of opensource ecosystems in the past 9 months after 15 years away from tech ... wow things have changed since the days of JS in the early 2000's :SI've built a project using NodeJS+Express+MySQL with a HTTP request API, and now that i've gotten up to speed with JS etc. i've decided to move into react. Fortunately this is giving me the opportunity to rewrite my project from the top down, and I can right all the horrendous wrongs of my first projects 'learning phase'.The project can roughly be described as:-A web scraping service that harvests data from many different public facing sources on the WWW, incl. puppeteer driven web scraping, tapping into public APIs, etc.This service then feeds all the data into a MySQL database. The scraping service is rock solid, and has collected 5 million rows of data already and is working well with no issues. The DB itself could do with some optimizations, but the query with the highest cost returns the data in < 2 seconds so i'm not too worried.In the first iteration of my project, i created an API, which while rough and designed as i went, served it's purpose and did it well. However now that i am onto the re-write, i wanted to seek advice on design ideas around an API such as this.There was a single API endpoint, but i would create a view in MySQL for each different data set i wanted to be able to access. It wasn't a completely open conduit to the database, as the requested view had to be in a list that was checked, so accessing anything other than what i allowed was not possible.This also allowed me to use the view to transform the data, and the API code simply had to parse it into JSON and return it. This left the GROUP, ORDER, JOIN, etc. logic out of Node and in SQL where, in my view, it should be.The data from this API would be used to populate charts, and the front-end would just poll the API every 10 minutes, 30 minutes, or whatever, and update the chart ... Works well but i feel that there must be a better way.I guess my thoughts are primarily around whether using SQL views in this way has any downsides or hidden flaws. An immediate one i have come into is that using something like mysql-events in Node can track changed to BASE TABLES, but i'd need to write additional logic/mappings to establish which views are affected, what data is new, etc. etc.If i fed data from table to Node API server, the mysql-events could essentially just stream changes to the API (or another service), which could then be pushed via socket.io to subscribed clients.any ideas or recommendations? I prefer to keep things simple and dynamic as the project is very much a evolving design. I've had many years of programming experience in Pascal, PHP, SQL, JS, HTML, etc. but i have never working as a coder professionally, and this tends to make me feel that i deviate far from best practise sometimes.The API's only requirement is that it serves time-series data to the front-end, and keeps them updated when new data is brought in... Timeliness is not critical, but always welcome.The project does have a commercial sensitivity so i am reluctant to post the link on here, but i will be happy to provide it to those who want to contact me directly.I hope this makes sense... I started rambling :S
Submitted February 28, 2020 at 04:08AM by the-retlif
No comments:
Post a Comment