Expanding Data Analysis & Anomaly Detection Using partition & by Fields

In a previous post, I showed how easy it is to analyze multiple metrics simultaneously by adding multiple “detectors” to the job configuration for the Anomaly Detective Engine API. Now let’s take it a step further and expand the analysis across instances of things by using “byFieldName” and “partitionFieldName”.

The concepts of the “by field” and the “partition field” were originally developed for the Anomaly Detective Splunk app. Splunk has the notion of a “group by” clause, in which you get separate instances of things simply by naming them in the by-clause, while partition fields are specified using the “partitionfield=<fieldname>” option to the prelertautodetect command. In the Engine API, you can use the same capabilities through “byFieldName” and “partitionFieldName”. I’ll elaborate on the difference between the two in a future post, but for now let’s jump straight to a simple example.

Java Garbage Collectors: Comparing Performance Options

If you’ve read many of my previous posts you’ll probably be aware that my preferred programming language is C++, and that the core Prelert code is written in it.  However, large parts of the Prelert codebase are also written in Java, including the component of the Anomaly Detective® Engine API that you’ll communicate with RESTfully from your own code.

One of the benefits of Java over C++ is that memory management is handled by the JVM, liberating the developer from having to worry about it.  At least, that’s the theory.  In practice, when you write a non-trivial Java program you do have to consider what’s going on with the memory, but in a different way to writing C++.  The JVM uses a garbage collector to find objects that are no longer required and release the memory they occupy.

Proud to be a Finalist: MassTLC Innovative Big Data Technology Award

Last night at the Microsoft NERD Center, the Mass Technology Leadership Council (MassTLC) announced finalists for its 17th annual Leadership Awards, and we’re proud to let you know that Prelert was named a finalist in the “Innovative Technology of the Year – Big Data” category! With more than 550 member companies, MassTLC is the region’s leading technology association and the premier network for tech executives, entrepreneurs, investors and policy leaders.

Prelert is being recognized for its Anomaly Detective software alongside the other finalists in the category, including EnerNOC, HP Vertica, Pixability and WordStream.

Analyzing Multiple Metrics Using the Anomaly Detection Engine API

After hearing from some users that they assumed it takes multiple jobs to analyze multiple metrics with the Anomaly Detective Engine API, I thought I’d write this post to explain that analyzing multiple metrics within a single job is a lot easier than you might think.

Let’s first take a very simple example in which you have a variety of performance metrics that you’d like to find anomalies in - perhaps some network performance data:


std::getline is the poor relation

Earlier this year I was following a thread on the Boost users’ mailing list that started off with a question about whether to prefer Boost or the C++ standard library when both provide the same classes (which is a common occurrence now because much of C++11 started life in Boost).  The thread then moved on to discussing how much effort goes into optimising performance for C++ standard library implementations, and comparing performance for a few classes in different C++ standard library implementations.

As I read this, I had a wry smile on my face, as there’s a part of the C++ standard library that only appears to have had any optimisation applied to it by one of the standard library implementers – namely GNU with libstdc++.  The function I’m referring to is std::getline.
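For readers who have not used it, here is a minimal sketch of the usual std::getline idiom: call it in a loop until the stream reports failure. The helper name readLines is hypothetical and is not taken from the Prelert code; the sketch only illustrates the call and makes no claim about its performance on any particular standard library.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect every line of a stream into a vector.
// std::getline strips the '\n' delimiter and returns the stream, which
// converts to false once no further line can be extracted.
std::vector<std::string> readLines(std::istream &input)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(input, line))
    {
        lines.push_back(line);
    }
    return lines;
}
```

Passing a std::istringstream, a std::ifstream or std::cin all work, because each derives from std::istream. A final line without a trailing newline is still returned, and empty lines come back as empty strings.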

Connectors and Results Processors: Anomaly Detection Engine API

From time to time, you might hear us use the terms “Connector” or “Results Processor” in the context of integrating the Anomaly Detective Engine API into a data analysis workflow. So, what role does each of these components play?

The following diagram should help to put this into perspective:


Good Data for Anomaly Detection

The Prelert Anomaly Detective® Engine API is extremely good at finding anomalies in data sets, even “Big Data” sets. Given that “Big Data” can mean so many different things, what kind of data is good for anomaly detection?

In general, the best type of data to use with the Anomaly Detective Engine API is time-stamped, structured data.

Data Science for the Rest of Us

“Big Data Analytics” has become quite the buzzword lately, and it is becoming apparent that there are many players in this space. So where exactly does Prelert fit? Who really benefits from Prelert?

First and foremost, the goal of Anomaly Detective® is to bring sophisticated anomaly detection capabilities to the masses. We’re not a giant library of algorithms for data scientists.

Discrete Optimization Techniques

This post looks at optimization, which is an extremely important problem. Two examples that might be of particular interest to readers of this blog are machine learning, which often boils down to optimizing some objective function plus a regularization term for a set of observed and possibly labeled data, and operations research, which always contains an optimization component.
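To make the machine learning case concrete, the regularized objective is commonly written in the following generic form. The symbols are assumptions chosen for illustration and do not come from the post: the labeled observations are (x_i, y_i), f is a model with parameters θ, L is a loss function, R is the regularization term and λ ≥ 0 sets its weight.

```latex
\[
\hat{\theta} \;=\; \arg\min_{\theta} \; \sum_{i=1}^{n} L\bigl(f(x_i; \theta),\, y_i\bigr) \;+\; \lambda\, R(\theta)
\]
```

Setting λ to zero recovers the unregularized problem, while larger values trade fit to the observed data for a simpler set of parameters.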

STL Container Memory Usage when Developing with C++

My most popular blog post up to now has been the one about the memory usage of the C++ standard library’s std::deque container.

The C++ standard doesn’t dictate exact implementation details for standard library classes, so it’s unsurprising that details of memory usage are missing from most references.  Yet judging by the number of people who stumble across my previous post about std::deque, memory usage of the standard library containers is something that’s useful to know.  Therefore, for the benefit of anyone writing cross-platform C++ code, I thought I’d put together a summary of the memory usage of the main STL containers across several different C++ standard library implementations.
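One way to explore this on your own platform is to plug a counting allocator into each container and see how many bytes it requests. The sketch below is only an illustration under stated assumptions, not the method behind the summary: CountingAllocator, g_bytesRequested, heapBytesFor and the Counted aliases are all hypothetical names, the counter is not thread-safe, and it totals every request, including buffers a std::vector later frees as it grows, so it is not a peak or steady-state figure.

```cpp
#include <cstddef>
#include <deque>
#include <list>
#include <memory>
#include <vector>

// Running total of bytes requested through CountingAllocator.
// Illustrative only: a plain global, so not thread-safe.
static std::size_t g_bytesRequested = 0;

// Minimal C++11 allocator that records the size of each request and
// forwards the real work to std::allocator.
template <typename T>
struct CountingAllocator
{
    using value_type = T;

    CountingAllocator() = default;

    // Containers rebind the allocator to their internal node types.
    template <typename U>
    CountingAllocator(const CountingAllocator<U> &) {}

    T *allocate(std::size_t n)
    {
        g_bytesRequested += n * sizeof(T);
        return std::allocator<T>().allocate(n);
    }

    void deallocate(T *p, std::size_t n)
    {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &)
{
    return true;
}

template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &)
{
    return false;
}

using CountedVector = std::vector<int, CountingAllocator<int>>;
using CountedDeque = std::deque<int, CountingAllocator<int>>;
using CountedList = std::list<int, CountingAllocator<int>>;

// Total bytes requested from the heap while appending `count` ints.
template <typename Container>
std::size_t heapBytesFor(std::size_t count)
{
    g_bytesRequested = 0;
    Container container;
    for (std::size_t i = 0; i < count; ++i)
    {
        container.push_back(static_cast<int>(i));
    }
    return g_bytesRequested;
}
```

Calling heapBytesFor<CountedVector>(1000), heapBytesFor<CountedDeque>(1000) and heapBytesFor<CountedList>(1000) gives figures you can compare across compilers; the exact numbers depend on the standard library implementation, which is why no expected output is shown. sizeof(CountedVector) and friends report the fixed footprint of the container object itself.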
