Usually, our applications run fast enough to give the end user a great experience. But what happens when the Product Owner tries to access the application and has to wait what feel like endless seconds, or cannot get in at all? What can we say when they call to complain? What future action can we promise them?
Monitoring is key to knowing whether an application is available, and it is essential for measuring its performance. This analysis enables us to answer questions about the application's availability, performance, capacity, and risk.
Availability tells us whether the application works and provides service: not only whether the servers and services are up, but also whether clients are actually using them. We need to monitor the services as well as the components that users depend on to access the application.
In order to verify availability, we should:
- Know whether the application's components are working in every layer. To do so, we should check at least one component per layer, so we know whether we are providing end-to-end service.
- Know whether the application's traffic is lower than usual. If it is, it may mean that all or some users are unable to access it. To detect this, we need to know the baseline, which means keeping a history of metrics.
- Run synthetic transactions to check that every functionality is providing service.
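The first two checks can be sketched in a few lines. This is only an illustration under stated assumptions: traffic is compared with the history for the same time slot, a drop of more than three standard deviations below the mean is treated as abnormal (the threshold `k=3.0` is an assumption, not a rule), and each layer is represented by a hypothetical probe function that could itself run a synthetic transaction against one component of that layer.

```python
from statistics import mean, stdev


def traffic_below_baseline(history, current, k=3.0):
    """Return True if current traffic is more than k standard deviations
    below the historical mean for the same time slot (assumed threshold)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    return current < baseline - k * spread


def end_to_end_available(layer_checks):
    """layer_checks maps a layer name to a hypothetical probe (a callable
    returning True if at least one component in that layer responds).
    Returns (all_layers_ok, list_of_failing_layers)."""
    failing = [layer for layer, probe in layer_checks.items() if not probe()]
    return (not failing, failing)
```

For example, with a history of 1200, 1150, 1230, 1180 and 1210 requests per minute for the same weekday and hour, a current value of 400 is flagged, while 1150 is not. In a real setup the probes would call health endpoints or replay a scripted user journey.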
Performance tells us how the application is responding from the user's point of view. For example, we can measure resource usage and response times.
For languages that run on a managed execution environment, such as the CLR (.NET) or the JVM (Java), what we see of resource usage differs depending on whether we evaluate the application or the operating system as a whole. For example, some memory usage is not visible from outside the runtime. That is why we need tools that expose this information, so we know the application's status at any time. Tools such as JConsole or VisualVM, for instance, let us explore a JVM and observe an application's behavior, though perhaps not in a production environment (we will talk about other tools on the market shortly).
In addition, if we know how our application and its components perform, we can predict their future behavior. For example, if the number of open components keeps growing, we will likely exhaust the resources needed to create new ones, and the application might stop working at some point. That prediction can help us take preventive action to avoid the situation, or to control it if it does happen (for example, we can restart one instance to prevent all the others from crashing without warning). There are different approaches to implementing monitoring:
- Learn how each component of the architecture is working: response times, memory usage, CPU usage, I/O, etc. Some tools can even show how many times the methods of selected classes are called and what their response times are.
- Learn how long a business transaction takes from the client's perspective, either by measuring it on the client side (with a snippet in the browser, for example) or by observing traffic from a controllable point in the network (from which we could check every HTTP interaction between client and server and work out the total duration of those interactions).
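Both approaches can be illustrated with a small sketch. The decorator below is a hypothetical stand-in for what an agent does when it instruments selected methods: it counts calls and accumulates wall-clock response time per method. The second function assumes we already have, for one business transaction, the start and end timestamps of every HTTP interaction captured at a network observation point, and computes the total duration as the time from the first request to the last response. All names here are illustrative, not part of any product's API.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-method call counts and cumulative response time in seconds.
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def monitored(func):
    """Count calls to func and accumulate its wall-clock response time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so failures are counted too.
            call_counts[func.__qualname__] += 1
            total_seconds[func.__qualname__] += time.perf_counter() - start
    return wrapper


def transaction_duration(interactions):
    """interactions: list of (request_start, response_end) timestamps, in
    seconds, for every HTTP exchange that belongs to one business transaction.
    Returns the time from the first request to the last response."""
    if not interactions:
        return 0.0
    first_request = min(start for start, _ in interactions)
    last_response = max(end for _, end in interactions)
    return last_response - first_request
```

Dividing `total_seconds[name]` by `call_counts[name]` gives a method's average response time. For three HTTP exchanges at (0.0, 0.4), (0.5, 1.2) and (1.3, 1.9) seconds, `transaction_duration` returns 1.9 seconds, which includes the gaps between requests that the user also experiences.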
In a mature monitoring practice, performance trends can be analyzed too: given the application's history, how do we expect it to provide service in the near future? This way, even without setting performance thresholds in advance for each of n metrics, we can still detect degradation in progress. In turn, we can implement preventive monitoring, take action, and anticipate possible failures.
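A simple way to turn history into a prediction is to fit a straight line to a metric and extrapolate when it will reach a known limit, as in the earlier example of open components piling up. The sketch below assumes the trend is roughly linear (commercial tools typically use more elaborate models) and uses an ordinary least-squares fit; the limit value is something we would have to know for our own resource, such as a pool's maximum size.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x. Returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def time_until_limit(times, values, limit):
    """Extrapolate the linear trend and return the time (on the same axis as
    `times`) at which the metric is expected to reach `limit`, or None if the
    trend is flat or decreasing."""
    intercept, slope = fit_line(times, values)
    if slope <= 0:
        return None
    return (limit - intercept) / slope
```

For instance, if open connections were 100, 120, 140, 160 and 180 at hours 0 to 4 and the assumed limit is 500, the function returns 20.0: at this pace the limit would be reached around hour 20, which is the kind of early warning that allows a planned restart instead of an outage.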
How many clients are using the application? How many are using the same functionality? What will be the impact of the next deployment to production regarding the most frequently used functionalities?
The concept of capacity can answer those questions. It can also help us estimate future infrastructure growth and how far our services need to scale. It allows us to:
- Know the transaction volume
- Know which resources were used
- Measure and restructure resources
- Plan releases
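The first items on that list can be made concrete with a short sketch. It assumes transaction records have already been parsed (for example, from access logs) into hypothetical `(minute, functionality)` pairs, and that we know the maximum throughput the system sustained in a load test; both inputs are assumptions for illustration.

```python
from collections import Counter


def transaction_volume(records):
    """records: iterable of (minute, functionality) pairs.
    Returns (transactions per functionality, peak transactions per minute)."""
    per_functionality = Counter()
    per_minute = Counter()
    for minute, functionality in records:
        per_functionality[functionality] += 1
        per_minute[minute] += 1
    peak = max(per_minute.values(), default=0)
    return per_functionality, peak


def headroom(peak_per_minute, max_per_minute):
    """Fraction of capacity still free at the observed peak, given the
    maximum throughput measured (for example, in a load test)."""
    return 1 - peak_per_minute / max_per_minute
```

`per_functionality.most_common()` then lists the most frequently used functionalities, which are the ones to watch most closely before the next deployment, and a shrinking headroom over successive months is an input for infrastructure planning.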
Risk can be seen as a corollary to the concepts discussed above. If less than α% of any layer of our application is available, we can say that the application is at high risk of being unable to provide service.
The analysis of each component’s ability to provide service will prove relevant when measuring the application’s risk.
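The α% rule can be expressed directly. The sketch below interprets "α% of a layer" as the share of that layer's instances that are currently available; that interpretation, the layer names, and the value of α are assumptions to adapt to each architecture.

```python
def layers_at_risk(layers, alpha):
    """layers maps a layer name to (available_instances, total_instances).
    alpha is the minimum acceptable availability, in percent.
    Returns {layer: availability_percent} for every layer below alpha."""
    at_risk = {}
    for name, (up, total) in layers.items():
        # A layer with no instances at all cannot provide service.
        availability = 100.0 * up / total if total else 0.0
        if availability < alpha:
            at_risk[name] = availability
    return at_risk
```

With a web layer at 4 of 4 instances, an application layer at 1 of 3, a database layer at 2 of 2 and α = 50, only the application layer is reported (about 33% available), even though the other layers are healthy: a single weak layer is enough to put the whole service at risk.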
By combining the three goals with monitoring of the various layers of our application, we can focus the analysis on specific functionalities. That is, by knowing which components are involved in a given transaction, we can tell whether a functionality is providing service in each layer of the architecture, whether it is degraded, or whether it is failing.
For example, in a given period, our website processed x sale attempts, of which 80% succeeded, 10% were cancelled by the user, and 10% failed due to a timeout. At the same time, the CPU periodically reached a critical level, and the methods involved in those transactions took longer than usual to execute.
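An analysis like the one in this example can be sketched as follows. The record layout (a `minute` and an `outcome` per sale attempt), the per-minute CPU samples, and the 90% "critical" threshold are all hypothetical. The second function measures how many timeouts coincided with critical CPU minutes; a high value points to where to investigate, but by itself it shows correlation, not cause.

```python
from collections import Counter


def outcome_breakdown(attempts):
    """attempts: list of dicts with an 'outcome' key such as 'success',
    'cancelled' or 'timeout'. Returns the percentage per outcome."""
    counts = Counter(a["outcome"] for a in attempts)
    total = len(attempts)
    return {outcome: 100.0 * n / total for outcome, n in counts.items()}


def timeouts_during_cpu_peaks(attempts, cpu_by_minute, critical=90.0):
    """Fraction of timed-out attempts that happened in a minute where CPU
    usage was at or above `critical` percent (assumed threshold)."""
    hot_minutes = {m for m, cpu in cpu_by_minute.items() if cpu >= critical}
    timeouts = [a for a in attempts if a["outcome"] == "timeout"]
    if not timeouts:
        return 0.0
    return sum(1 for a in timeouts if a["minute"] in hot_minutes) / len(timeouts)
```

Ten attempts with eight successes, one cancellation and one timeout yield the 80% / 10% / 10% split from the example; if the single timeout happened in a minute with 97% CPU, `timeouts_during_cpu_peaks` returns 1.0.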
Who is interested in the results obtained after monitoring?
Viewing monitoring in this way can bring many benefits to the IT area, since it helps control how applications behave in production and keep watch over pre-production (homologation) environments, to know whether new releases affect the application's performance in any way.
Monitoring both environments is also very useful for developers, who can see which types of transactions are performing poorly.
In addition, if we suspect an incident is about to happen, the help desk can prepare for a flood of complaints before it actually occurs.
Lastly, the Product Owner will want to know how much operational capacity the current IT budget provides, how clients feel about the app, and which areas need improvement.
There are tools on the market, of varying complexity and price, that cover some or all of the goals discussed. There are also open source applications to which we can add our own components to provide monitoring information and implement exactly the kind of monitoring we need. It is worth remembering that covering all the necessary goals, even with just a few metrics, is enough to improve our results.
World-Class Tools
In general, these tools are part of comprehensive monitoring suites. Most of them include components (agents) that inspect JVMs, CLRs, and even runtimes such as PHP with Zend from the inside. They are evolving toward newer technologies and can capture health data from Docker, VMware ESX, etc.
They are rated as leaders in the Gartner Magic Quadrant; they are usually expensive, come with high levels of support, and tend to be used by large companies and banks.
Open Source Tools
The following are the two most important open source tools for building a monitoring suite, each with its own strengths. There are plenty of reviews online (for example, https://geekflare.com/best-open-source-monitoring-software/).
They require more configuration and development effort to build a customized monitoring solution, but they can provide the same information as the world-class tools and let us meet our goals.