For our SaaS customers, we already do low-level monitoring of CPU, memory and disks. In addition, we measure application uptime. The chart with data from the last few weeks shows a worst uptime of 99.09 %; overall uptime was 99.92 %.
Not bad at all, but we know there is more to the story. What tells us how the application really performs? What started as extracting some of the key processes used by our SaaS customers has evolved into a full performance measurement. How does it work?
We identify key processes that are in use at all our SaaS customers, such as:
- login
- creating contact records
- executing queries
- opening the calendar
- searching for companies
- opening a company treeview
We then define a maximum time that each process is allowed to take. The measurement is implemented in our in-house monitoring solution, which runs Selenium scripts, API scripts written in Node.js, and ping checks. All results are sent to our central Zabbix, where we also run the low-level monitoring mentioned above and, where needed, create incidents or messages.
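To make the idea of a maximum time per process concrete, here is a minimal Node.js sketch of such a check. It is not our actual monitoring code: the names `MAX_DURATION_MS`, `evaluate` and `measure` and the threshold values are made up for illustration, and forwarding the result to Zabbix is left out.

```javascript
// Hypothetical per-process limits in milliseconds (illustrative values only).
const MAX_DURATION_MS = {
  login: 3000,
  'search companies': 2000,
};

// Pure check: compares a measured duration against the allowed maximum.
// A check that threw an error, or that has no limit defined, is never "ok".
function evaluate(name, durationMs, maxMs, error = null) {
  const ok = error === null && durationMs <= maxMs;
  return { name, durationMs, maxMs, ok, error };
}

// Times an async action (for example an API call or a Selenium step)
// and evaluates the elapsed time against the limit for that process.
async function measure(name, action, maxMs = MAX_DURATION_MS[name]) {
  const start = Date.now();
  let error = null;
  try {
    await action();
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
  }
  return evaluate(name, Date.now() - start, maxMs, error);
}
```

A call such as `measure('login', doLogin)`, where `doLogin` is whatever function performs the login, resolves to an object whose `ok` field says whether the process finished without an error and within its limit. That object is the kind of result a monitoring system can turn into an incident or a message.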
This is all part of an ongoing process to further improve the reliability of our SaaS platform.
If you have questions, or if you know of an important process that should be measured, please share your ideas below this post (or contact me anytime).