APACHE SPARK, a high-speed analytics engine for the Hadoop distributed processing framework, is now available to plug into the YARN resource management tool.
This development means that it can now be easily deployed along with other workloads on a Hadoop cluster, according to Hadoop specialist Hortonworks.
Released as version 1.0.0 at the end of May, Apache Spark is a high-speed engine for large-scale data processing, created with the aim of being much faster than Hadoop's better-known MapReduce function, but for more specialised applications.
Hortonworks vice president of Corporate Strategy Shaun Connolly told The INQUIRER, "Spark is a memory-oriented system for doing machine learning and iterative analytics. It's mostly used by data scientists and high-end analysts and statisticians, making it a sub-segment of Hadoop workloads but a very interesting one, nevertheless."
As a relatively new addition to the Hadoop suite of tools, Spark is getting a lot of interest from developers using the Scala language to perform analysis on data in Hadoop for customer segmentation or other advanced analytics techniques such as clustering and classification of datasets, according to Connolly.
With Spark certified as YARN-ready, enterprise customers will be able to run memory and CPU-intensive Spark applications alongside other workloads on a Hadoop cluster, rather than having to deploy them in separate a cluster.
"Since Spark has requirements that are much heavier on memory and CPU, YARN-enabling it will ensure that the resources of a Spark user don't dominate the cluster when SQL or MapReduce users are running their application," Connolly explained.
Meanwhile, Hortonworks is also collaborating with Databricks, a firm founded by the creators of Apache Spark, in order to ensure that new tools and applications built on Spark are compatible with all implementations of it.
"We're working to ensure that Apache Spark and its APIs and applications maintain a level of compatibility, so as we deliver Spark in our Hortonworks Data Platform, any applications will be able to run on ours as well as any other platform that includes the technology," Connolly said. µ
Set for a UK release on 1 August
Microsoft claims next-gen database platform is the fastest on the planet, naturally
We round up the LG G5's best 'friends'. Aw.
Sane people would give up at 55 minutes or not try