Personally, I believe that "uptime is king." And by uptime, I mean both the application and the servers behind it, be it the web server or the database server.
In storytelling, many prefer to start with the things that need improvement and save the things that went well for later. I'll take that approach too.
So what are the things we identified that needed improvement?
(1) Since the system is new, there were a lot of familiarity issues in transitioning from Podio (our legacy CRM) to Salesforce. Though training was conducted, it will take time for people in our operations to fully embrace the "Salesforce" way. Time was one big factor we overlooked, and we made poor estimates on the timeline. If only we had allocated more time for training, it would have been smoother for people to jump back into their daily routine by the time we went live with the new system.
(2) Since we built the new platform by gluing Salesforce and Mainstack together via Heroku Connect, tracking changes was a pain in the ass. There are simply too many moving parts -- and if you're working in parallel with other engineers, you can't avoid the fact that your changes might break someone else's code (that was once working well).
(3) Accountability was somewhat neglected. Not because anyone was acting in bad faith, but because everyone was just trying their best to make things work within the timeline. When issues were found, there was no acknowledgment of how they happened; people just fixed the "known issues".
(4) Documentation is outdated. While we accomplished a lot on the programming side, the documentation was not updated alongside the changes. And we all know that documentation reconstructed from backtracked information can easily end up incomplete.
So what are the things we identified as excellent?
There are tons of things we can be proud of; however, I'll be specific about the points I share in this blog.
(1) One great outcome was our decision to separate the Node application's worker process and go with cronjobs on the host rather than the Node.js cron approach. This gives us more room to control the process's resources, depending on what the host has available.
|The blue line refers to the "cronserver"|
Imagine if we hadn't separated these processes and had hosted them on the same server where the application runs. What do you think would happen? For sure, resource contention would take place, with concurrent processes competing for allocation. Worse, a process might die or go stale (a zombie process) due to resource limits -- which means application errors and timeouts from time to time.
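To make the contrast concrete, here is a minimal sketch of what a host-cron worker can look like. All names and paths here are hypothetical (this is not Onerent's actual code): the idea is simply that the host's crontab launches a short-lived Node process instead of an in-process scheduler like node-cron running inside the web server.

```typescript
// Hypothetical worker entry point (worker.ts), launched by the HOST's cron
// rather than by an in-process scheduler such as node-cron.
//
// Illustrative crontab entry on the separate "cronserver":
//   */5 * * * * /usr/bin/node /opt/app/worker.js >> /var/log/worker.log 2>&1
//
// Each run is a fresh OS process: memory is reclaimed when it exits, and the
// host can cap its resources independently of the web server's.

export async function runJob(): Promise<string> {
  // placeholder for the real batch work (record syncing, notifications, ...)
  return "job finished";
}

// When cron executes this file, run the job once and log the outcome.
runJob().then((result) => console.log(result));
```

Because the worker exits after each run, a hung or leaking job can't slowly starve the web application of memory the way a long-lived in-process scheduler can.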
(2) Our backend APIs are designed in a modular way (if you want to know how we architect our platform, you can read more about it here). This means that changes to certain functions won't affect other modules, limiting the issues that come with additional feature requests.
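As a rough illustration of that kind of modularity, here is a toy sketch with hypothetical domain names (not our actual models): each module exposes a small public surface, and neighbors depend only on that surface, so internals can change freely.

```typescript
// Toy sketch of modular backend design: each domain lives in its own module
// behind a small interface, so a change inside one doesn't ripple into others.

interface Listing { id: string; rent: number; }
interface Lease { listingId: string; months: number; }

// "listings" module: knows nothing about leases.
const listings = {
  monthlyRent(l: Listing): number {
    return l.rent; // internal detail; callers only see the function
  },
};

// "leases" module: depends only on the listings module's public function,
// never on the Listing internals directly.
const leases = {
  totalValue(lease: Lease, listing: Listing): number {
    return listings.monthlyRent(listing) * lease.months;
  },
};

const listing: Listing = { id: "L1", rent: 2000 };
console.log(leases.totalValue({ listingId: "L1", months: 12 }, listing)); // 24000
```

If the rent calculation inside `listings` changes (say, to include fees), `leases` keeps working untouched, which is exactly the property that limits breakage when new features land.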
(3) By fully utilizing the content-delivery-network offerings of AWS (CloudFront) and Cloudinary, our frontend and WordPress landing pages serve assets with dramatically faster load and response times. Since page speed is known to affect SEO rankings and user engagement, we expect a great conversion rate (which, at the moment, the numbers on Google Analytics suggest is working).
(4) Incorporating tools like Rollbar and Jenkins has been a winning decision for us. They help everyone isolate issues and mitigate "unknown" errors, alongside the TDD approach we observe in crafting our backend and frontend applications. We've also created automated deployment scripts via Ansible (triggered via a Slack command -- chatops), which makes it handy for engineers to work on their code and test it in different environments.
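For a rough idea of the chatops flow, here is a minimal sketch: a Slack slash command reaches a small handler, which shells out to an Ansible playbook for the requested environment. The playbook path, inventory layout, and environment names are all illustrative assumptions, not our actual setup.

```typescript
// Hypothetical Slack slash-command handler: "/deploy staging" arrives as the
// command's text, and we hand off to ansible-playbook for that environment.
import { execFile } from "child_process";

const ALLOWED_ENVS = new Set(["staging", "production"]);

export function deployCommand(text: string): string {
  const env = text.trim();
  if (!ALLOWED_ENVS.has(env)) {
    return `unknown environment: ${env}`;
  }
  // Illustrative playbook/inventory names; errors are reported via the callback.
  execFile("ansible-playbook", ["deploy.yml", "-i", `inventories/${env}`], (err) => {
    if (err) console.error(`deploy to ${env} failed:`, err.message);
  });
  return `deploying to ${env}...`;
}
```

Keeping an explicit allow-list of environments is the important part: a typo in the Slack command gets rejected instead of deploying somewhere unexpected.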
So what are the initiatives we implemented, knowing the lapses?
Three weeks after the launch date, we huddled for half a day to talk about everything from the past six months of building the platform. It was a sort of retrospective-plus-postmortem session for everyone in Engineering.
Here are some agreements we came up with:
- Give everyone the freedom to experiment and innovate, but at the same time hold them responsible for whatever their code does (accountability matters more now).
- Documentation will always be kept up to date. Classification of notes should also be considered: user-specific, developer-specific, code-specific, and overview-of-everything (for our board of directors and stakeholders).
- Rules and guides should be in black and white. Rather than relying on someone's greater knowledge, we aim to avoid human biases and make sure what's agreed on is written down and followed. Inside Onerent's Engineering Department, no one is above our engineering bible.
There are still a lot of things we're planning to add to shape our Engineering Culture.
It'll be a matter of time before we nail down what's best for us, but at least there's improvement from time to time.
At the end of the day, it's all about delivery. Keeping our promise to our customers, clients, investors, and everyone working at Onerent is our fuel for building more advancements on the platform, and it's always a work in progress.