Champion of Cyrodiil: storm

Thursday, August 20, 2015

Collecting Apache Storm time series data with Graphite

Apache Storm processes data as a stream of tuples over time, which makes it an ideal framework for moving streaming data through a pipeline. However, keeping a historical record of that activity is not an inherent feature of Storm. For metrics collection and analysis, I used Graphite. Graphite is available in Fedora EPEL and is a great option for quickly collecting metrics and graphing them with analytical functions overlaid.

The easiest way to collect this information is to go straight to the source: the Nimbus server. Using the NimbusClient class (package backtype.storm.utils) together with the Thrift types in backtype.storm.generated, you can retrieve the ClusterSummary, iterate through each TopologySummary's ID to fetch its TopologyInfo, and then read each ExecutorSummary to determine the emitted and transferred tuple counts. You can also get information about errors, threads, and likely other details of Apache Storm topologies that my example code does not collect.
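To make the last step of that traversal concrete without pulling in the Storm jars, here is a minimal, dependency-free sketch that sums each executor's tuple counts into flat metric paths, the kind of Map the graphite(Map) method in this post sends. ExecutorCounts is a hypothetical stand-in for the data in an ExecutorSummary; its nested maps mirror the window → stream → count shape that, as far as I know, ExecutorStats exposes through get_emitted() and get_transferred() (verify against your Storm version). The "storm.<topology>.<component>" path layout and the ":all-time" window key used in the example are assumptions. The record syntax needs Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopologyMetrics {

    // Hypothetical stand-in for ExecutorSummary data (not a Storm class):
    // component id plus window -> stream -> count maps for emitted/transferred tuples.
    record ExecutorCounts(String componentId,
                          Map<String, Map<String, Long>> emitted,
                          Map<String, Map<String, Long>> transferred) {}

    // Sum the counts of every stream for one time window (0 if the window is absent).
    static long sumWindow(Map<String, Map<String, Long>> byWindow, String window) {
        Map<String, Long> byStream = byWindow.getOrDefault(window, Map.of());
        long total = 0;
        for (long count : byStream.values()) {
            total += count;
        }
        return total;
    }

    // Build "storm.<topology>.<component>.emitted|transferred" -> total,
    // adding together all executors that belong to the same component.
    static Map<String, Long> toMetrics(String topology, List<ExecutorCounts> executors, String window) {
        Map<String, Long> metrics = new HashMap<>();
        for (ExecutorCounts e : executors) {
            String prefix = "storm." + topology + "." + e.componentId();
            metrics.merge(prefix + ".emitted", sumWindow(e.emitted(), window), Long::sum);
            metrics.merge(prefix + ".transferred", sumWindow(e.transferred(), window), Long::sum);
        }
        return metrics;
    }

    public static void main(String[] args) {
        // Two spout executors and one bolt executor with made-up counts.
        List<ExecutorCounts> executors = List.of(
            new ExecutorCounts("spout",
                Map.of(":all-time", Map.of("default", 100L)),
                Map.of(":all-time", Map.of("default", 100L))),
            new ExecutorCounts("spout",
                Map.of(":all-time", Map.of("default", 50L)),
                Map.of(":all-time", Map.of("default", 50L))),
            new ExecutorCounts("bolt",
                Map.of(":all-time", Map.of("default", 30L, "__ack_ack", 5L)),
                Map.of(":all-time", Map.of("default", 30L))));
        System.out.println(toMetrics("test", executors, ":all-time"));
    }
}
```

Summing per component rather than per executor keeps the number of Graphite series stable when a topology is rebalanced onto a different number of executors.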

First, let's look at how I push data into the carbon-cache daemon:
https://github.com/charlescva/graphite-common/blob/master/src/main/java/zkCliTest.java#L44

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;
    import java.util.Map;

    // TCP stream to carbon-cache (Graphite plaintext protocol, default port 2003).
    private void graphite(Map<String, ?> metrics) {
        // Graphite expects the timestamp in seconds since the Unix epoch.
        long epoch = System.currentTimeMillis() / 1000;

        // try-with-resources closes the stream and the socket, even on error.
        try (Socket conn = new Socket("carbon-cache.novalocal", 2003);
             DataOutputStream dos = new DataOutputStream(conn.getOutputStream())) {
            // Write one "[metric_path] [value] [time]" line per map entry.
            for (Map.Entry<String, ?> metric : metrics.entrySet()) {
                dos.writeBytes(metric.getKey() + " " + metric.getValue() + " " + epoch + "\n");
            }
            dos.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

The code above simply opens a TCP socket to carbon-cache, writes each entry of a Map of metrics with the current timestamp, and closes the connection. Easy. The takeaway here is the syntax: each metric is sent as a single line of text in the form '[metric_path] [value] [time]', where [time] is a Unix timestamp in seconds.
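Because that line format is space-delimited and Graphite uses dots to separate levels of the metric path, a Storm topology or component name containing spaces or dots (a name such as "My Topology", for example) would either break the line or create extra path levels. Below is a minimal sketch of a formatter that guards against this; the class name GraphiteLine and the choice to replace every character outside [A-Za-z0-9_-] with an underscore are my own assumptions, not part of the original code.

```java
import java.util.regex.Pattern;

public class GraphiteLine {
    // Anything outside this set could break the plaintext line or the dotted hierarchy.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9_-]");

    // Turn a free-form name such as "My Topology" into a single safe path segment.
    static String segment(String name) {
        return UNSAFE.matcher(name).replaceAll("_");
    }

    // Format one "[metric_path] [value] [time]" line, terminated by a newline.
    static String line(String path, Number value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    public static void main(String[] args) {
        String path = "storm." + segment("My Topology") + ".emitted";
        System.out.print(line(path, 42, 1440086400L));
        // prints: storm.My_Topology.emitted 42 1440086400
    }
}
```

Sanitizing each segment separately, then joining with dots, keeps the intended hierarchy while making sure user-supplied names can only ever occupy one level.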

It would be redundant for me to display all of the Java code here (https://github.com/charlescva/graphite-common/blob/master/src/main/java/zkCliTest.java#L125). In essence, it takes a Nimbus client connection and builds a model of the Nimbus state, with each of your topologies as a child node of the root context.
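The linked code is not reproduced here, but the idea of a tree whose path from the root becomes the metric name can be sketched in a few lines. This is a hypothetical, simplified model (the Node class and the "storm" root name are assumptions, not the classes used in zkCliTest.java): a depth-first walk joins node names with dots and emits every node that carries a value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NimbusTree {
    // Hypothetical tree node: a name, an optional value, and child nodes.
    static final class Node {
        final String name;
        final Long value; // null for pure grouping nodes such as the root or a topology
        final List<Node> children = new ArrayList<>();

        Node(String name, Long value) {
            this.name = name;
            this.value = value;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Depth-first walk emitting "root.child.leaf" -> value for every node that has a value.
    static void flatten(Node node, String prefix, Map<String, Long> out) {
        String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
        if (node.value != null) {
            out.put(path, node.value);
        }
        for (Node child : node.children) {
            flatten(child, path, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("storm", null)
            .add(new Node("test", null)
                .add(new Node("emitted", 150L))
                .add(new Node("transferred", 150L)));
        Map<String, Long> metrics = new LinkedHashMap<>();
        flatten(root, "", metrics);
        System.out.println(metrics);
        // prints {storm.test.emitted=150, storm.test.transferred=150}
    }
}
```

The resulting Map has exactly the shape the graphite(Map) method expects, so the tree and the Graphite hierarchy stay aligned.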

So, rather than bore you, here is a screenshot!

As you can see, "My Topology" is a test topology fed by a static source, so its graph is clearly linear. But all in all, you can quickly get some good information. Feel free to comment, as I find this topic particularly interesting.

Saturday, January 4, 2014

"Cloud Manager" for Netbeans

I'm working on a side project to help automate server maintenance tasks for various open source distributed services.

ZooKeeper, Storm, Accumulo, Hadoop, and CentOS are what I currently want to manage with this tool. I am building it as a NetBeans Platform application for a few reasons:

Java runs on virtually any platform.
You don't need to know Java to run a NetBeans Platform application.
If you already know Java, you can contribute to this application through NetBeans.

If you want to contribute code or ideas to the project, you can do so through GitHub:

https://github.com/charlescva/cloud-manager

Currently, the tool lets you add server nodes, create actions for those nodes, and even assign a UI to an action for easier use. JAXB is used for marshalling XML, and the XSDs were generated from the XML served by the Accumulo monitor.

SSH support is integrated, so you can easily deploy Storm topologies through the Nimbus node action.
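The post does not show how the Nimbus node action works internally, so here is only a hypothetical sketch of the command such an action might run: it assumes key-based SSH login to the Nimbus host, that the `storm` CLI is on that host's PATH, and that the topology jar is already there. It builds the standard `storm jar <jar> <main class> [args]` invocation; the user, host, jar path, class name, and topology name are all made-up placeholders.

```java
import java.util.List;

public class DeployCommand {
    // Hypothetical helper: builds an ssh command that runs `storm jar` on the Nimbus host.
    // Assumes key-based SSH auth and the storm CLI on the remote PATH.
    static List<String> build(String user, String nimbusHost, String remoteJar,
                              String mainClass, String topologyName) {
        return List.of("ssh", user + "@" + nimbusHost,
                       "storm", "jar", remoteJar, mainClass, topologyName);
    }

    public static void main(String[] args) {
        List<String> cmd = build("storm", "nimbus.example", "/opt/topologies/my-topology.jar",
                                 "com.example.MyTopology", "my-topology");
        System.out.println(String.join(" ", cmd));
        // prints: ssh storm@nimbus.example storm jar /opt/topologies/my-topology.jar com.example.MyTopology my-topology
        // To actually run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the command as a list of arguments, rather than one concatenated string, avoids local shell quoting issues when it is handed to ProcessBuilder.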
