**Specific question: how do I set up a scheduler like cron / node-schedule within a pre-launch Node.js application?**

First reddit post after 10 years of stubbornly lurking, so clearly I am very, very motivated to solve this problem and desperate for help / advice :) I may need to set up a scheduler in Node.js to run certain batch data-analytics functions that generate our Mongo data collections. If I've provided more detail than you're interested in, feel free to just read the bold, which hits the key points. For now I just want to try it out and spend no more than 1-2 days playing around to see if this is a fruitful path.

**The technical issue: I can't seem to install a scheduler in our existing Node.js project, which uses TypeScript.** I can't for the life of me install a scheduling library, import it, and configure it so that I can console.log('hello world'), which would be the minimum baseline to work from. I know there are a number of tutorials online that show how to set up a demo app with a scheduler, and these tutorials (helpfully) walk through the setup of the entire server-side program. That said, I'm unclear where those dependencies go in our existing program. I am a startup CEO with more of a business background, and I am hesitant to mess around with our existing server.ts file (until we scale and I hire a full-time CTO). Ideally, I would like to write all the import statements in their own folder, for example Server Folder/Services/[SCHEDULER FOLDER would go here], with a primary file that sets up the basics, so that it's totally encapsulated and, if we don't go in this direction, we can remove it without surgery across the codebase. (A sketch of what I'm picturing is below, just after the business context.)

**The business context / issue: the particular way we use a route, triggered by a key user action in our workflow, to update our key data-analytics layers (which are the core of the product) is proving not to be a useful pattern.**

We are "close" to launching a full-stack web application. The stack is Mongo, Express, Angular 6, Node.js, and we use TypeScript. The application is gamified betting on the news. Users compete to make predictions on bets linked to controversial news stories about what will happen next (sports, politics, entertainment). The odds / lines on these markets change dynamically as users bet (like any betting market). When bets are settled, we figure out which users are right or wrong, and currently award them payouts in points (other payout structures like game currency or real currency could be added).

The users' scores are a critical part of the user experience. They want to see and visualize the scores, and the scores are hierarchical: a user may have a Sports score of +150 that breaks down into Basketball +100, Football +200, Soccer -150 (each being the sum of all points won or lost on that specific topic). There are a couple of (I think the word is) ONTOLOGIES with various levels of ID maps, so granular scores like NBA and College Basketball ROLL UP to a more general level (e.g. Basketball), which ROLLS UP to Sports, and so on.
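Here is roughly the encapsulated setup I'm picturing, as a minimal sketch. I'm assuming the node-cron package (node-schedule has a similar API), and the file path and function name are just placeholders:

```ts
// server/services/scheduler/scheduler.ts (hypothetical path)
// npm install node-cron && npm install --save-dev @types/node-cron
import * as cron from 'node-cron';

// Starts every scheduled job; meant to be called once when the server boots.
export function startScheduler(): void {
  // '* * * * *' is a cron expression that fires every minute
  cron.schedule('* * * * *', () => {
    console.log('hello world', new Date().toISOString());
  });
}
```

The only change to server.ts would then be one import and one call to startScheduler() after the app starts, so the whole folder could be deleted later without touching anything else.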
The scores will be served to the front end.

FRONT END: in the long term there will be SCORE EXPLORER functionality, so users may click one data visualization (like a sexy bar chart) representing SPORTS / POLITICS / ENTERTAINMENT / MARKETS / TECH, and when they click one bar they see the cascading elements underneath the selected element. [BACK-END IMPLICATION: when storing the scores in the database, we need to carefully label some of the metadata so the labels agree with the front-end logic and we don't do backflips converting back-end data into the necessary front-end models, etc.]

I suspect that setting up a timer to call the functions that build the data-analytics layers might be a fruitful way to get the server and database to cooperate.

I've received the feedback "encapsulate and test", but there isn't a specific function that runs badly in isolation. We've built a number of encapsulated functions that work; what I'm trying to solve is the "meta process" that connects those functions to the right API routes / business triggers. It's when we put everything together that we often get promise errors saying some process timed out, or a later-stage analytics layer outputs an empty collection even though the prior collection (upon investigation) had the expected data. I think the functions we built probably work, and the API triggers work; my sense is that we've written encapsulated code that, while it works in isolation, carries conflicting assumptions. (So, back to the drawing board, but to get to launch before time runs out I want to reuse what exists rather than reinvent the wheel.)

Function(s):

1. A query is sent to Mongo [BETS COLLECTION] ==> the bets are written into an aggregation collection called PAYOUTS (for now, let's assume a 1:1 relationship between bet and payout).
2. [This step happens several times, once per score-classification ontology] A query is sent to Mongo [PAYOUTS COLLECTION, built from the BETS COLLECTION] ==> GROUP BY, so the payouts turn into sums of scores segmented by UserID & ScoreDefinition (e.g. JOHNNY scored 55 TOTAL POINTS on the topic called SPORTS-BASKETBALL-NBA). The final step is an $out stage that rewrites a collection called ScoresA or ScoresB, etc., conforming to the hierarchy of the score.
3. [For each ontology, apply an almost parallel process] A query is sent to the SUM OF PAYOUTS collection (the main processing is a $lookup + GROUP BY) ==> outputs to a SUM OF SUMS of payouts collection. And so on: sum of sums of payouts ==> $lookup & GROUP BY & $out ==> sum of sums of sums of payouts, etc.

API: the current request that triggers these functions is a SUPER USER on our MODERATION TEAM completing a form where they select the correct outcome of the bet => this updates the bet information in the database, and it also updates the votes on that bet. THIS CALL also triggers the functions noted above (but maybe that's not the appropriate trigger?).

Updating the analytics is the final process that occurs. It calls the first aggregation (including the command to $out the data to the parent collection), then applies the next aggregation to that resulting collection, and so on; each function depends on the collection output by the prior query. My sense is THIS IS A BAD PATTERN (?) but I'm not quite sure how to make it better, which is why I'm investigating a cron scheduler. It's a bad pattern because the server side is in charge of the timing.
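To make the chain concrete, here is roughly how I understand the current pattern, sketched against the official mongodb driver; the collection names, field names, and ontology shape are invented for illustration. One relevant detail: in the Node driver, aggregate() returns a lazy cursor, so a pipeline ending in $out only executes once the cursor is consumed (e.g. with toArray()), and if a stage isn't awaited to completion the next stage can read a collection that isn't there yet.

```ts
import { Db } from 'mongodb';

// Sequential roll-up: each stage reads the collection the previous stage
// wrote, so each $out must fully complete before the next aggregate runs.
// All names here are placeholders, not our real schema.
export async function rebuildScores(db: Db): Promise<void> {
  // Level 1: payouts -> per-user, per-topic sums (e.g. SPORTS-BASKETBALL-NBA)
  await db.collection('payouts').aggregate([
    { $group: {
        _id: { userId: '$userId', scoreDefinition: '$scoreDefinition' },
        points: { $sum: '$points' },
    } },
    { $out: 'scoresLevel1' },
  ]).toArray(); // consuming the cursor is what actually runs the pipeline

  // Level 2: roll granular scores up one ontology level (NBA -> Basketball)
  await db.collection('scoresLevel1').aggregate([
    { $lookup: {
        from: 'ontology',                  // hypothetical child -> parent ID map
        localField: '_id.scoreDefinition',
        foreignField: 'child',
        as: 'parent',
    } },
    { $unwind: '$parent' },
    { $group: {
        _id: { userId: '$_id.userId', scoreDefinition: '$parent.parent' },
        points: { $sum: '$points' },
    } },
    { $out: 'scoresLevel2' },
  ]).toArray();

  // ...and so on for each remaining level (Basketball -> Sports, etc.)
}
```

If our real code fires these stages without awaiting each one (or never consumes the cursors), that alone could explain the empty downstream collections.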
Getting Mongo to run each query only after the new collection persists, then take the next collection, and so on, all as part of some trigger controlled by a moderator updating a bet, is probably too much coordination to work out between server and database (or I'm not sophisticated enough at this stage to confidently solve the problem).

I suspect that if we ran the aggregations / group-bys off some timed schedule, that would be fine. For example, we could kick off the batch once an hour and run each successive query a minute apart. The data-analytics collections could be processed in the database layer of the application, the server wouldn't be waiting for these processes to resolve, and everything would be happy?

I'm sure that's not the only way to solve the problem. Another avenue I am investigating is to replace the $out stage and add a $match parameter, so each real-time update only concerns the very small subset of users and scores related to the resolved bet, and then just upsert that data. (There's a sketch of this at the bottom of the post.)

So please help me, Node community!

**How can I get a scheduler for batch jobs up and running?** I'd use it to build the data-analytics layer, layer by layer, every interim unit of time (daily or hourly accuracy would be fine). **Success would be if I could install the right library, import the right file, run my server, and see "hello world" every minute.**

Or... judging from the overall issue I described, do you think I'm missing something? Do you have other suggestions to help me solve the problem quickly and efficiently so we can launch (or feedback you think would be helpful to take into account next time)?

Startup founder (business guy far deeper in the codebase than he expected to be). Willing to grind / decently competent at coding fundamentals, but keep in mind: not a leet-coder, and already in over his skis. Looking for a pattern to get to an MVP launch, on the assumption that if the data models are defined well, we can refactor the specific queries and server patterns into an entirely new API once we have the bandwidth / traction / money to hire the next tier of devs and I fire myself out of the job.

Thank you!
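PS: here is the $match + upsert alternative mentioned above, sketched with the $merge stage (MongoDB 4.2+), which upserts into an existing collection instead of rewriting it the way $out does. Names are placeholders again, and this version recomputes the full totals for just the affected users, so re-running it is harmless:

```ts
import { Db } from 'mongodb';

// Incremental alternative: after a bet settles, recompute level-1 scores only
// for the users who had payouts on that bet, and upsert them into 'scores'.
// Requires MongoDB 4.2+ for $merge; all names here are placeholders.
export async function updateScoresForUsers(db: Db, userIds: string[]): Promise<void> {
  await db.collection('payouts').aggregate([
    { $match: { userId: { $in: userIds } } }, // only users touched by the settled bet
    { $group: {
        _id: { userId: '$userId', scoreDefinition: '$scoreDefinition' },
        points: { $sum: '$points' },
    } },
    { $merge: {
        into: 'scores',
        on: '_id',               // upsert keyed on user + score definition
        whenMatched: 'replace',
        whenNotMatched: 'insert',
    } },
  ]).toArray(); // consume the cursor so the pipeline actually runs
}
```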
Submitted January 14, 2020 at 05:58PM by waystar_corp