Performance

Memory management : Java containers on K8s

This page documents a few aspects of memory management on Java containers on K8s clusters. For java containers, memory management on K8s have various factors: Xmx and Xms limits managed by java Request/limit values for the container HPA policies used for scaling the number of pods Misconfigurations / misunderstanding of any of these parameters leads to OOMs of java containers on K8s clusters. Memory management on java containers: -XX:+UseContainerSupport is enabled by default form java 10+ ...

[Kubernetes]: CPU and Memory Request/Limits for Pods

In this write up, we will try and explore how to make the most out of the resources in K8s cluster for the Pods on them. Resource Types: When it comes to resources on Kubernetes cluster, they can be fairly divided in to two categories: compressible: If the usage of this resource for an application goes beyond the max, it can be throttled without directly killing the application/process. example : cpu - if a container consumes too much of compressible resource, they are throttled ...

[Memory-metrics]: Linux /proc interface

This writeup is more of a demo to showcase the power of “proc” (process information pseudo-filesystem) interface in linux to get the memory details of process, and also a quick brief on the power of “proc interface”. In the current trend of building abstraction over abstractions in software/tooling, very few tend to care about the source of truth of a metrics. There are various APM / Monitoring tools to get the memory details of a process for a linux system, but when the need arises, I believe, one must know the ways of going closer to the source of truth on a linux system and verify things. ...

[Performance] : Understanding CPU Time

As a Performance Engineer, time and again you will come across a situation where you want to profile CPU of a system. The reasons might be many; like, CPU usage being high, you want to trace a method to see its CPU cost or you suspect CPU times for a slow transaction. You might use one of the various profilers out there to do this. (I use yourkit and Jprofiler). All these profilers report the CPU costs in terms of CPU Time, when you profile the CPU. This time is not the equivalent of your watch time. ...

Elastic Search Best practices

These are the self-notes from managing 100+ node ES cluster, reading through various resources and a lot of production incidents due to unhealthy ES. Memory Always choose ES_HEAP_SIZE 50% of the total available memory. Sorting and aggregations both can be memory hungry, so enough heap space to accommodate these is required. This property is set inside the /etc/init.d/elasticsearch file. A machine with 64 GB of RAM is ideal; however, 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive (you end up needing smaller machines), and greater than 64 GB has problems in pointer compression. CPU Choose a modern processor with multiple cores. If you need to choose between faster CPUs or more cores, choose more cores. The extra concurrency that multiple cores offer will far outweigh a slightly faster clock speed. The number of threads is dependent on the number of cores. The more cores you have, the more threads you get for indexing, searching, merging, bulk, or other operations. Disks If you can afford SSDs, they are far superior to any spinning media. SSD-backed nodes see boosts in both querying and indexing performance. Avoid network-attached storage (NAS) to store data. Network The faster the network you have, the more performance you will get in a distributed system. Low latency helps to ensure that nodes communicate easily, while a high bandwidth helps in shard movement and recovery. Avoid clusters that span multiple data centers even if the data centers are collocated in close proximity. Definitely avoid clusters that span large geographic distances. General consideration It is better to prefer medium-to-large boxes. Avoid small machines because you don’t want to manage a cluster with a thousand nodes, and the overhead of simply running Elasticsearch is more apparent on such small boxes. Always use a Java version greater than JDK1.7 Update 55 from Oracle and avoid using Open JDK. A master node does not require much resources. In a cluster with 2 Terabytes of data having 100s of indexes, 2 GB of RAM, 1 Core CPU, and 10 GB of disk space is good enough for the master nodes. In the same scenario, the client nodes with 8 GB of RAM each and 2 Core CPUs is a very good configuration to handle millions of requests. The configuration of data nodes is completely dependent on the speed of indexing, the type of queries, and aggregations. However, they usually need very high configurations such as 64 GB of RAM and 8 Core CPUs. Some other important configuration changes ...

[Performance] : What does CPU% usage tell us ?

When you come across a system which is misbehaving, majority of the times the first metrics that we look at is CPU usage. But do we really understand what CPU usage of a system tells us ? In this article let us try and understand what X % usage of a system really means. One of the easy ways to check on CPU is “top” command. The “%Cpu(s)” metrics seen above is a combination of different components. ...

[Performance] : Using iperf3 tool for Network throughput test

In this world of Microservices and the distributed systems, a single request (generally) hops through multiple servers before being served. More often than not, these hops are also across the Network cards making the Network Performance the source of slowness in the application. These parameters makes the need to measure Network performance between servers/systems more critical for benchmarking or debugging. Iperf3 is one of the open source tools which can be used for network throughput measurement. Below are some of its features. ...

[Performance] : Java Thread Dumps - Part2

In the previous article about Java Thread Dumps (link here) we looked in to a few basics on Thread dumps(When to take?, How to take?, Sneak peaks? etc.) In this write up, I wanted to mention a few tools which can ease the process of collecting and analyzing thread dumps. Collecting multiple thread dumps: I prefer command-line over any APM tools for taking thread dumps. The best way for analyzing threads is to collect a few thread dumps (5 to 10) and look through the transition in the state of threads. ...

Thinking in-terms of Performance

A few short thoughts / ideas wrt of Performance centric product. In this world of infinite scaling of computes, pay close attention to common choke points. Like DB, storage(s) etc, which are shared by all the computes. Majority of the reads and writes have to happen in Bulk operations and NOT as single read/writes. Specially when there are 100’s-1000’s of reads/writes/deletes on storage(s). Threads. Pay close attention to which part of the entire flow is multi-threaded. Sometimes, only a small part of the flow is multi-threaded, but entire application is called multi-threaded, which is wrong. ...

[Performance] : Java Thread Dumps - Part1

This is first of a two parts article which talks about: What are thread dumps? When to take thread dumps ? How to take thread dumps ? What is inside a thread dumps ? What to look for in a thread dump? Majority of the systems today are mutlicore and hyper-threaded. Threading at the software level allows us to take advantage of a system’s mutlicores to achieve the desired pace and efficiency of the application operations. Along with pace and efficiency, multi-threading brings its own set of problems w.r.t thread contentions, thread racing, high CPU usage etc. In this write up we will see how to debug these problems by taking thread dumps on java applications. ...