BUDAPEST: BIG BLUE IBM is using Apache Spark to analyse radio signals for signs of extra-terrestrial intelligence.
Speaking at Apache: Big Data Europe, Anjul Bhambrhi, vice president of big data products at IBM, talked about how the firm has thrown its weight behind Spark.
"We think of [Spark] as the analytics operating system. Never before have so many capabilities come together on one platform," Bhambrhi said.
Spark is a key project because of its speed and ease of use, and because it integrates seamlessly with other open-source components, Bhambrhi explained.
"Spark is speeding up even MapReduce jobs, even though they are batch oriented, by two to six times. It's making developers more productive, enabling them to build applications in less time and with fewer lines of code," she claimed.
She revealed IBM is working with Nasa and Seti to analyse radio signals for signs of extra-terrestrial intelligence, using Spark to process the 60Gbit of data generated per second by various receivers.
Other applications IBM is working on with Spark include genome sequencing for personalised medicine via the Adam project at UC Berkeley in California, and early detection of conditions such as diabetes by analysing patient medical data.
"At IBM, we are certainly sold on Spark. It forms part of our big data stack, but most importantly we are contributing to the community by enhancing it," Bhambrhi said.
The Apache: Big Data Europe conference also saw Canonical founder Mark Shuttleworth outline some of the key problems in starting a big data project, not least finding engineers with the skills to build the infrastructure needed to run tools such as Hadoop.
"Analytics and machine learning are the next big thing, but the problem is there are just not enough 'unicorns', the mythical technologists who know everything about everything," he explained in his keynote address, adding that the blocker is often just getting the supporting infrastructure up and running.
Shuttleworth went on to demonstrate how the Juju service orchestration tool developed by Canonical could solve this problem. Juju enables users to describe the end configuration they want, and will automatically provision the servers and software and configure them as required.
This could be seen as a pitch for Juju, but Shuttleworth's message was that the open-source community is delivering tools that can manage the underlying infrastructure so that users can focus on the application itself.
"The value creators are the guys around the outside who take the big data store and do something useful with it," he said.
"Juju enables them to start thinking about the things they need for themselves and their customers in a tractable way, so they don't need to go looking for those unicorns."
The Apache community is working on a broad range of projects, many of which are focused on specific big data problems, such as Flume for handling large volumes of log data or Flink, another processing engine that, like Spark, is designed to replace MapReduce in Hadoop deployments. µ