Technology
Programming
Scala
Java
TypeScript
JavaScript
Python
Data Processing
Analytics Zoo

Analytics Zoo is an open source AI platform from Intel for distributed TensorFlow, Keras and BigDL on Apache Spark.

BigDL

BigDL is a distributed deep learning library from Intel for Apache Spark.

Drools

Drools is a business rule management system with an inference-based rules engine that supports forward chaining and backward chaining, allowing fast and reliable evaluation of business rules and complex event processing.
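As a rough illustration of forward chaining, the sketch below repeatedly fires rules whose conditions are met by the known facts until nothing new can be derived. It is not Drools' API or rule language: the `forward_chain` helper, the rule format, and the fact names are all assumptions made up for this example.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from `facts` using `rules`.

    Each rule is a (conditions, conclusion) pair: once every condition
    is a known fact, the conclusion is added as a new fact.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical business rules for an order-handling scenario.
rules = [
    (("order_total_over_1000",), "premium_order"),
    (("premium_order", "customer_verified"), "free_shipping"),
    (("free_shipping",), "notify_warehouse"),
]

derived = forward_chain({"order_total_over_1000", "customer_verified"}, rules)
print(sorted(derived))
```

Backward chaining would run in the opposite direction, starting from a goal such as `free_shipping` and searching for rules and facts that support it.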

Hadoop

Hadoop is an open source framework for reliable, scalable, distributed processing of large data sets across clusters of computers using simple programming models.

Spark ML

Apache Spark is a unified analytics engine for large-scale data processing and ML is its machine learning library.

Spark NLP

John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.

Spark SQL

Apache Spark is a unified analytics engine for large-scale data processing and SQL is its library for working with structured data at scale.

Spark Time

Spark Time is a time series analysis library built on top of Apache Spark ML. It provides state-of-the-art time series engineering, model building, forecasting, predicting and trend analysis.

TensorFlow

TensorFlow is an end-to-end open source platform for machine learning and deep learning with a comprehensive, flexible ecosystem of tools, libraries and community resources.

Graph Processing
GraphFrames

GraphFrames supports graph processing for Apache Spark based on the DataFrame API. This includes graph queries, motif finding and a variety of graph algorithms such as PageRank.
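To give a feel for what a PageRank computation does, here is a minimal textbook power-iteration sketch in plain Python. It is not the GraphFrames implementation, which runs distributed on Spark and may normalize scores differently. The `pagerank` helper, the damping factor of 0.85, and the four-node edge list are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        # Each node passes its damped rank equally to its successors.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical graph: a cycle a -> b -> c -> a, plus d pointing at c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

In this toy graph, node `c` ends up with the highest score because it receives links from both `b` and `d`, while `d` has no incoming links and keeps only the baseline share.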

Spark GraphX

Apache Spark is a unified analytics engine for large-scale data processing and GraphX is its library for parallel graph-processing at scale.

TinkerPop

TinkerPop is a graph computing framework for graph databases, analytics and traversal systems.

Pipeline Technology
Google CDAP

CDAP is an open source data application and analytics platform with a standardized and unified pipeline technology. It is the core technology of Google's Cloud Data Fusion service.

StreamSets

StreamSets is a data application platform for pipeline-based data integration use cases.

Container Technology
Docker

Docker is an open source containerization solution that packages software to run on almost every computing environment.

Kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

Data Stores
Aerospike

Aerospike is a NoSQL data platform for real-time and extreme scale data solutions with high availability, predictable performance and operational simplicity.

Cassandra

Cassandra is the NoSQL workhorse for data storage at extreme scale with high availability and fault-tolerance.

CrateDB

CrateDB is a distributed real-time SQL database built on top of a NoSQL foundation for machine data at IoT scale.

Druid

Druid is an open source real-time analytics database designed for fast slice-and-dice analytics on large data sets.

Elastic

Elasticsearch is a distributed real-time search & analytics engine.

HBase

HBase is an open source distributed NoSQL database on top of HDFS and modeled after Google's Bigtable. It provides a fault-tolerant way of storing sparse data at extreme scale.

Ignite

Ignite is a distributed in-memory database and processing platform for transactional, analytical, and streaming workloads at petabyte scale.

InfluxDB

InfluxDB is an open source, high performance time series database developed by InfluxData.

JanusGraph

JanusGraph is a distributed graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges.

MongoDB

MongoDB is a distributed open source NoSQL database. It is document-based and widely used as a backbone for modern application development.

OrientDB

OrientDB is an open source multi-model database that combines graph and document models.

Redis

Redis is an open source in-memory data structure store. It is used as a database, cache and message broker.

Data Streaming
Ignite

Ignite integrates with major streaming frameworks and supports real-time ingestion of event streams at millions of events per second on a moderately sized cluster.

Kafka

Kafka is an open source distributed streaming platform.

Kinesis

Kinesis is a distributed streaming platform developed by Amazon.

MQTT

MQTT for Apache Spark is a secure streaming extension for Spark Streaming, implementing the MQTT connectivity protocol.

Spark Streaming

Apache Spark is a unified analytics engine for large-scale data processing and Streaming is its real-time streaming library.

Frontend Technology
Angular

Angular is an app-design framework and development platform for creating efficient and sophisticated single-page apps.

Cytoscape

Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data.

ECharts

ECharts is a declarative framework for rapid construction of web-based visualizations, with support for a broad range of modern web browsers.

jsPlumb

jsPlumb is a visual connectivity library for web applications, based on SVG and with support for a broad range of modern web browsers.