Getting Started with Sparkflows

Sparkflows makes self-serve data preparation and advanced analytics fast and easy. Use it to find value in your data and scale to petabytes.

Install in the cloud, on-premises, or even on your laptop. Sparkflows integrates with complex enterprise environments.

This documentation provides how-to guidance and reference information for data analysts, data scientists, and data engineers working with Sparkflows. Sparkflows also lets these teams collaborate with each other.

Sparkflows provides the following features:

  • Connect to various data sources

  • Perform ETL and Data Preparation

  • Profile and Clean Data

  • Measure Data Quality

  • Build ML Models using various ML engines

  • Deploy and execute the ML models

  • Build Reports and Dashboards

  • Build Analytical Applications

Sparkflows supports both Batch and Streaming Jobs.

Installation & Configuration

Operations

User Guide

Tutorials

Migration

Databricks Guide

AWS Guide

Azure Guide

GCP Guide

HPE Guide

Incorta Guide

Snowflake Guide

Spark Standalone Guide

Jupyter Guide

MLOps Guide

ModelDoc Guide

What-if Analysis Guide

Change Data Capture

Lineage

Cloudera/Hadoop Guide

Kubernetes Guide

Best Practices

Frequently Asked Questions

Commands

Performance Guide

Troubleshooting

Developer Guide

REST API

How To

Release Notes

Third Party Acknowledgements
