Search This Blog

Showing posts with label data. Show all posts
Showing posts with label data. Show all posts

Thursday, August 20, 2015

Collecting Apache Storm time series data with Graphite

Naturally, Apache Storm processes data as tuples over time.  This is an ideal framework to utilize for streaming data through a 'pipe line'.  However, maintaining a record of this information is not necessarily an inherent feature of storm.  For metrics collection and analysis, I used Graphite.  Graphite is available in the Fedora EPEL and a great option for quickly collecting some metrics and generating a graph w/ overlaying analytical functions.

The easiest way to collect information is by going straight to the source.  The Nimbus server.  Using the NimbusClient class, available from the package backtype.storm.generated, you can easily review the ClusterSummary and iterate through each TopologySummary's ID to retrieve TopologyInfo and each topology's ExecutorSummary, to determine emitted and transferred tuples. Additionally, you can get information about errors, threads, and likely other details related to Apache Storm topologies I have not collected in my example code.

First, Lets look at how I will push data in to the carbon-cache daemon.
https://github.com/charlescva/graphite-common/blob/master/src/main/java/zkCliTest.java#L44

   // TCP Stream to carbon-cache.
    private void graphite(Map metrics) {
        // Current Time-Stamp for test
    long epoch = System.currentTimeMillis()/1000;
   
        try { // output  stream to the host on default port.
        Socket conn          = new Socket("cabon-cache.novalocal", 2003);
        DataOutputStream dos = new DataOutputStream(conn.getOutputStream());
        // graphite syntax map the #ngsm to output stream.
            for  (String metric : metrics.keySet() ){
                dos.writeBytes(metric +
            " " + metrics.get(metric) +
            " " + epoch + "\n");
            }
        //CLOSED CONNECTION.
            conn.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
The above code simply opens a TCP socket to the carbon-cache, writes a Map of metrics with the current timestamp, and closes the connection.  Easy.  The Take-away here, is the syntax. '[metric_path] [value] [time]' as String.

It would be redundant for me to display all of the Java code here: https://github.com/charlescva/graphite-common/blob/master/src/main/java/zkCliTest.java#L125 which essentially takes a Nimbus client connect, and builds a model of the nimbus state with your topoloy as each child node of the root context.

So, rather than bore you.  Here is a screenshot!


As you can see, the "My Topology" is a test, and clearly a static source that is linear.  But all and all, you can quickly get some good information.  Feel free to comment, as I find this article particularly interesting.