Monday, December 28, 2015

Web Performance, tell me the story

Suppose you would like to know how your web site behaves for clients in two different geographic locations, for example Finland and Ireland, in terms of response time. How quickly can you see, and understand visually, what is going on?

Do you need to improve something? Change your network provider? Tune your application(s)? Spend more money? Start from a top-level view, understand the big picture, and have the ability to dive in and analyze at the detail level. Finally, can you visually tell the story of your web application in a short chart?

Enter Kronometrix Web Performance ...


Finland

  • Something happened between 1:00 and 1:15 AM
  • We can see that more than one request was affected by this event
  • Overall, all requests execute fast, below 0.5 seconds
  • There are some exceptions, taking more than 0.5 seconds

Ireland

  • Looking at the same things from Ireland
  • The picture is different: requests usually take longer to execute
  • Some requests are much slower than in Finland


Then we want to know more, check individual requests, and observe their evolution over time.




And then we can dive and analyze even further ...

Kronometrix has to be simple and self-explanatory. We believe our implementation, based on the streamgraph, can very quickly identify what is slow, show its evolution over time, and then, in one click, dive into another level of detail. It is like telling the story of your application over time.




Thursday, December 10, 2015

Web Performance subscription

We have been busy adding Web Performance support to our appliance. This means anyone running an HTTP application can send data to Kronometrix for analysis. Our idea of monitoring Web applications is a simple one: it starts from the operating system, the HTTP server, and the application itself. We report the most important metrics, including the application's response times. To make things even easier, we wanted to add support for complete solution stacks, like LAMP. (We still have lots of work to do to fully support them.)

And to have a complete picture of the Web service, we have introduced the workload management concept inside Kronometrix: gather and report data from one or many data sources and bind those to an SLA for upper-management reporting. Nice and easy.

Here are some first development snapshots from our implementation. Let's first switch to night mode; it is 23:10 here in Finland. So, here you go:

All requests dashboard


This shows a number of HTTP requests gathered as a stream graph, over time.

All requests as a stream
Then, for instant consumption, we switch to something simpler: a bar chart with breakdowns for each request:

Instant response time

Simple to see all requests as a continuous stream of data over time


Per-request dashboard


A simple breakdown reporting the following metrics:

  #01 timestamp : seconds since the Epoch

  #02 request   : the HTTP request name

  #03 ttime     : total time the entire operation lasted, in seconds

  #04 ctime     : connect time, from the start until the TCP connection
                  to the remote host (or proxy) was completed, in seconds

  #05 dnstime   : name lookup time, from the start until name
                  resolution was completed, in seconds

  #06 ptime     : protocol time, from the start until the file
                  transfer was just about to begin, in seconds

  #07 pktime    : first packet time, from the start until the first
                  byte was just about to be transferred, in seconds

  #08 size      : page size, the total number of bytes downloaded

  #09 status    : response status code, the numerical response code
                  found in the last retrieved HTTP(S) transfer
looking like this:

Per-request response time

Easy to break down at the request level, including outliers
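As an illustration, a raw record carrying these nine fields could be parsed like this. This is a sketch only: the whitespace-separated layout, the function name, and the sample values are our assumptions, not the actual Kronometrix wire format.

```python
# Hypothetical parser for one raw per-request record; field order follows
# the metric table above (#01..#09). Separator and values are invented.
FIELDS = ("timestamp", "request", "ttime", "ctime",
          "dnstime", "ptime", "pktime", "size", "status")

def parse_record(line):
    """Split one whitespace-separated raw record into typed fields."""
    rec = dict(zip(FIELDS, line.split()))
    rec["timestamp"] = int(rec["timestamp"])
    for key in ("ttime", "ctime", "dnstime", "ptime", "pktime"):
        rec[key] = float(rec[key])
    rec["size"] = int(rec["size"])
    rec["status"] = int(rec["status"])
    return rec

rec = parse_record("1449742200 homepage 0.412 0.035 0.012 0.047 0.288 15320 200")
print(rec["request"], rec["ttime"], rec["status"])  # homepage 0.412 200
```

With records in this shape, the per-request dashboard above is just a matter of grouping by `request` and charting each timing field.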


Close to these dashboards are the operating system metrics, giving a complete picture of the running system. Later we will show how we define and report all resources combined as a complete workload. Stay tuned and join our discussion group, here


We are the makers of Kronometrix, come and join us!




Thursday, November 5, 2015

60 messages per second

Kx 1.3.2 is our next release of Kronometrix for the x64 platform. Currently we are busy testing some new features in our QA environment. We are sending data from 750 data sources, delivered by our DataCenter Simulator. Each system delivers 5 messages.

Since yesterday we have been processing more than 5 million messages at 60% usage.
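These numbers are consistent with a quick back-of-the-envelope check, assuming each of the 750 simulated sources delivers its 5 messages once per minute (the 60-second interval is our assumption, not stated above):

```python
# Rough throughput check; the 60-second delivery interval is an assumption.
sources = 750
messages_per_source = 5
interval_seconds = 60

rate = sources * messages_per_source / interval_seconds
print(rate)                # 62.5 messages per second, roughly 60 msg/s
print(rate * 86400 / 1e6)  # 5.4 million messages per day
```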

Appliance Usage

These are the main screens within the appliance administration, which show the utilization and throughput of our system in real time. We also include some other information, such as how many users, subscriptions, and data messages we are processing.



and here is the system utilization:



This runs our latest Kronometrix on a *very* old hardware appliance, on purpose, to test and stress our software on an older spec. The red and blue pipelines are working just fine.

Kronometrix can consume events from different sources in real time: IT computer systems, weather stations, loggers, and applications. This means we can process and display data as soon as it arrives at our machinery, and we can continuously prepare the information for consumption by different dashboards. But not all data can be dispatched and displayed as soon as it arrives. Here is why, and how we do it.


The red (hot) pipeline

Inside our kernel we process all incoming raw data on a main line, the red pipeline. This is the most important pipeline within our kernel analytics. Here we transform raw data into statistics, like MAX, MIN, and AVG, for charting across different time intervals.

All sorts of UI dashboards are attached to the red pipeline, to display data as soon as it has been processed. On this line, a high level of redundancy and efficiency is enforced. The kernel must filter and calculate at a very fast rate here, so we spend lots of time optimizing and testing.
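A minimal sketch of the kind of per-interval statistics the red pipeline computes; the class and method names are ours, not Kronometrix internals:

```python
class IntervalStats:
    """Incremental MIN/MAX/AVG over a stream of samples, one instance
    per time interval, in the spirit of the red pipeline."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min = None
        self.max = None

    def add(self, value):
        # Update all three statistics in a single pass, no buffering needed
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0

stats = IntervalStats()
for sample in (12.0, 48.5, 30.1):
    stats.add(sample)
print(stats.min, stats.max, round(stats.avg, 2))  # 12.0 48.5 30.2
```

Because each sample is folded in incrementally, dashboards attached to the line can read the current MIN/MAX/AVG at any moment without waiting for the interval to close.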


The blue (cold) pipeline

On a different line, we compute summaries of summaries, numerical and inventory: top-level aggregates which should live a different life and not cause any trouble to the main incoming red line. Some examples of such aggregates:

  • Top 10 systems consuming highest CPU utilization 
  • Avg, Max CPU Utilization across all computers in a data center on different time intervals
  • Total number of network cards across all computer systems
  • Operating System distribution across data center
  • Disk kIOPS across all systems

We call this the blue line. Here things are aggregated at a slower rate, but are still available for UIs and dashboards. This line should not use as much computing power as the red line, and it should be possible to shut it down and run Kronometrix without it, if we want to.
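An aggregate such as "Top 10 systems consuming highest CPU utilization" boils down to a top-N selection over the latest per-system values; a minimal sketch, with invented system names and data:

```python
import heapq

# Hypothetical snapshot: latest CPU utilization (%) per system.
# Names and values are invented for illustration.
cpu = {"web01": 91.2, "db01": 74.5, "app03": 88.0, "cache02": 33.1}

def top_n(samples, n):
    """Blue-pipeline-style aggregate: top N systems by CPU utilization."""
    return heapq.nlargest(n, samples.items(), key=lambda kv: kv[1])

print(top_n(cpu, 3))  # [('web01', 91.2), ('app03', 88.0), ('db01', 74.5)]
```

Since the input is itself a summary (one value per system), this kind of aggregate can run at a slower cadence without touching the hot red line.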

A top goal was to make the entire processing as modular as possible and to ensure that a failure in one part of the kernel will not bring down the entire kernel.


Top goals

  • modular design
  • red pipeline must be always ON and highly available
  • red is hot, will always run and use more computing resources
  • blue pipeline can be stopped
  • blue is cold, and should not use lots of computing power
  • easy to stop blue pipeline without disrupting the red pipeline
  • dashboards can be bound to red or blue

With these top goals we can say we are able to efficiently process data in real time, display it as soon as it arrives, and summarize it as we want, without sacrificing performance and uptime.

We are the makers of Kronometrix, come and join us!

Thursday, August 27, 2015

The STALL data filter

Currently, Kronometrix supports two types of raw data filters: the STALL and RANGE filters.

A raw data filter is a mechanism to ensure that incoming raw data follows certain rules or conditions before numerical processing and data visualization within Kronometrix. For example, the STALL filter ensures raw data is properly arriving at our appliance and there are no delays. The RANGE filter ensures incoming raw data is sane and valid, and stays within a certain range of values: for example, air temperature is valid as long as it is between -50C and 50C, and the CPU utilization of a Windows server must be between 0 and 100%.
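A RANGE check is simple to sketch; the function name and signature below are ours, and the examples reuse the two ranges mentioned above:

```python
def range_filter(value, lo, hi):
    """RANGE-style sanity check: accept a sample only if it lies in [lo, hi]."""
    return lo <= value <= hi

# Air temperature must be between -50C and 50C:
print(range_filter(-55.0, -50.0, 50.0))  # False: below the valid range
# CPU utilization must be between 0 and 100%:
print(range_filter(87.3, 0.0, 100.0))    # True: a sane sample
```

Samples rejected by such a check would be flagged before they can distort statistics or charts.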


The STALL Filter

Defined under messages.json and part of the Library of Monitoring Objects, the STALL filter is part of the data message definition. For example, this is the way to define the STALL for a data message called sysrec, which belongs to a data source of type Linux (a computer system running the CentOS Linux operating system):

stall: seconds

The STALL filter is defined in seconds, describing the time allowed to pass without data before triggering a STALL warning in the Kronometrix event management console:



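In essence, a STALL check compares the time elapsed since a source's last message against the configured threshold; a minimal sketch (the function name and arguments are ours, not Kronometrix internals):

```python
import time

def is_stalled(last_seen, stall_seconds, now=None):
    """STALL-style check: has more than stall_seconds elapsed since the
    last message from this data source arrived?"""
    now = time.time() if now is None else now
    return (now - last_seen) > stall_seconds

# A source that last reported 400 s ago, with a 300 s stall threshold,
# triggers a warning; one that reported 200 s ago does not.
print(is_stalled(last_seen=1000.0, stall_seconds=300, now=1400.0))  # True
print(is_stalled(last_seen=1000.0, stall_seconds=300, now=1200.0))  # False
```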
Turn them off

Starting with Kronometrix 1.2, we are able to turn the STALL detector ON/OFF per subscription. That means if we have several data subscriptions, we can say for which subscriptions the filter should be ON or OFF, no matter what messages.json has configured.

See here:



The dcsim and mix subscriptions require some updates: some computer systems will need maintenance work, and during this period we don't want to receive warnings and alerts from any sources which belong to these two subscriptions. So we turned OFF the STALL filters for these two subscriptions.


Why the STALL filter is important and you should use it

  • because we want to see, and get notified, when we are not receiving data from sensors, computer hosts, weather stations, etc.
  • because we want to keep a close eye on how often data is missing or delayed
  • because we are forced by regulations and laws to monitor and report these delays (for example, airports and air traffic control)

Sunday, July 26, 2015

Windows, Linux, FreeBSD all welcome

By default for IT business, Kronometrix supports monitoring objects from Linux and FreeBSD data sources. Recently we have been porting our data recorders to the Windows operating system and starting to offer ready-made objects for it.

For example, in the latest Kronometrix 1.2 we plan to support Linux, FreeBSD, and Windows Server 2008 and 2012 editions, 64-bit. Below are several data sources within Kronometrix:





and then we can drill down per OS, for example by clicking centos:



That's all. This is part of the executive dashboard view.

Tuesday, July 21, 2015

Programming Republic of Perl, the Windows story

Task: port Kronometrix from Linux and FreeBSD to the Windows platform, including all data recorders and the transporter. Preferably, use a programming language like Perl to ease the porting process and reuse whatever we already have in Kronometrix.

Timeline: ASAP

Open Source: Yes


Goals

Some top rules for developing the new recorders:
  • all recorders must be coded in a scripting language
  • preferably, all recorders must work as CLI tools and as Windows services
  • all raw data should be easy to access, via C:\Kronometrix\log\ , no more mysteries about the AppData directory
  • the transporter should be done in a similar way, coded in a scripting language
  • memory footprint: 64MB RAM

Perl5

We had previously been experimenting with C/C++ for Kronometrix on Windows. Nothing wrong with C/C++, except that for every small change we had to do a lot of work and testing. We looked at PowerShell and other languages, but nothing came closer to feeling like home than Perl.

All our data recorders are simple Perl5 citizens already, so why not have Kronometrix on Windows done in Perl too?!

After some research and coding we found a very powerful module, Win32, capable of speaking WMI and accessing almost anything from a running Windows system. That's it. Enter Perl. We selected ActiveState PDK to compile each .pl into a Win32 executable service. Nice and easy.

A simple Win32 service sample, in Perl5 using PDK:

Win32 Perl Service


Windows

Here are the two main data recorders, and sender, the transporter, running as services on top of Windows Server 2008:


Kronometrix Windows Services


Source Code

Visit our repository to see and check out our Windows data recorders. This is work in progress; more recorders will soon be published and released.