Data Processing
Analytics Zoo

Analytics Zoo is an open source AI platform from Intel for distributed TensorFlow, Keras and BigDL on Apache Spark.


BigDL is a distributed deep learning library from Intel for Apache Spark.


Drools is a business rule management system with a forward-chaining and backward-chaining inference-based rule engine, allowing fast and reliable evaluation of business rules and complex event processing.
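Forward chaining can be illustrated with a minimal Python sketch. This is not Drools code and does not use its rule language. The rule format (a set of required facts paired with a conclusion), the function name `forward_chain` and the fact names are all hypothetical, chosen only to show the idea of deriving new facts from known ones until nothing more can be concluded.

```python
# Hypothetical rule format: (set of required facts, fact to conclude).
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all its premises are known
            # and its conclusion has not been derived yet.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Illustrative business rules (names are made up for this sketch).
rules = [
    ({"order_total_over_100"}, "eligible_for_discount"),
    ({"eligible_for_discount", "loyal_customer"}, "apply_gold_discount"),
]

derived = forward_chain({"order_total_over_100", "loyal_customer"}, rules)
```

Backward chaining works in the opposite direction: it starts from a goal such as `apply_gold_discount` and searches for rules and facts that would establish it.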


Hadoop is an open source library for reliable, scalable and distributed computing of large data sets across clusters of computers using simple programming models.

Spark ML

Apache Spark is a unified analytics engine for large-scale data processing and ML is its machine learning library.

Spark NLP

John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.

Spark SQL

Apache Spark is a unified analytics engine for large-scale data processing and SQL is its library for working with structured data at scale.

Spark Time

Spark Time is a time series analysis library built on top of Apache Spark ML. It provides state-of-the-art time series engineering, model building, forecasting, prediction and trend analysis.


TensorFlow is an end-to-end open source platform for machine learning and deep learning with a comprehensive, flexible ecosystem of tools, libraries and community resources.

Graph Processing

GraphFrames supports graph processing for Apache Spark based on the DataFrame API. This includes graph queries, motif finding and a variety of graph algorithms such as PageRank.
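To make the PageRank reference concrete, here is a minimal single-machine sketch in plain Python. It is not the GraphFrames API and makes several assumptions: the function name `pagerank`, the edge-list input, the damping factor of 0.85 and the fixed iteration count are illustrative choices, and ranks are normalized to sum to 1, which a distributed implementation may handle differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, destination) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        # Each node passes its damped rank on, split equally among its links.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Small illustrative graph: a cycle a -> b -> c -> a, plus d -> c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
```

In this example graph, node `d` has no incoming edges and therefore ends up with a lower rank than node `c`, which is linked from both `b` and `d`.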

Spark GraphX

Apache Spark is a unified analytics engine for large-scale data processing and GraphX is its library for parallel graph-processing at scale.


TinkerPop is a graph computing framework for graph databases, analytics and traversal systems.

Pipeline Technology
Google CDAP

CDAP is an open source data application and analytics platform with a standardized and unified pipeline technology. It is the core technology of Google's Cloud Data Fusion service.


StreamSets is a data application platform for pipeline-based data integration use cases.

Container Technology

Docker is an open source containerization solution that packages software to run in almost any computing environment.


Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

Data Stores

Aerospike is a NoSQL data platform for real-time and extreme scale data solutions with high availability, predictable performance and operational simplicity.


Cassandra is the NoSQL workhorse for data storage at extreme scale with high availability and fault-tolerance.

CrateDB

CrateDB is a distributed real-time SQL database built on top of a NoSQL foundation for machine data at IoT scale.


Druid is an open source real-time analytics database designed for fast slice-and-dice analytics on large data sets.


Elasticsearch is a distributed real-time search & analytics engine.


HBase is an open source distributed NoSQL database on top of HDFS and modeled after Google's Bigtable. It provides a fault-tolerant way of storing sparse data at extreme scale.


Ignite is a distributed in-memory database and processing platform for transactional, analytical, and streaming workloads at petabyte scale.

InfluxDB

InfluxDB is an open source, high performance time series database developed by InfluxData.


JanusGraph is a distributed graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges.


MongoDB is a distributed open source NoSQL database. It is document-based and the world's most popular backbone for modern application development.


OrientDB is the world's fastest graph and multi-model database.


Redis is an open source in-memory data structure store. It is used as a database, cache and message broker.

Data Streaming

Ignite integrates with major streaming frameworks and supports real-time ingestion of event streams at millions of events per second on a moderately sized cluster.


Kafka is an open source distributed streaming platform.


Kinesis is a distributed streaming platform developed by Amazon.


MQTT for Apache Spark is a secure streaming extension for Spark Streaming, implementing the MQTT connectivity protocol.

Spark Streaming

Apache Spark is a unified analytics engine for large-scale data processing and Streaming is its real-time streaming library.

Frontend Technology

Angular is an app-design framework and development platform for creating efficient and sophisticated single-page apps.


Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data.


ECharts is a declarative framework for rapid construction of web-based visualizations, with support for a broad range of modern web browsers.


jsPlumb is a visual connectivity library for web applications, based on SVG and with support for a broad range of modern web browsers.


React is an open source front-end library for building user interfaces for single-page or mobile applications.