Getting Started with Sparkflows

Sparkflows makes Self-Serve Data Preparation and Advanced Analytics fast and easy. With Sparkflows, you can find value in your data and scale to petabytes of data.

Install Sparkflows in the cloud, on-premises, or even on your laptop. Sparkflows integrates with even the most complex enterprise environments.

This documentation provides how-to guidance and reference information for data analysts, data scientists, and data engineers working with Sparkflows. Sparkflows also makes it easy for these teams to collaborate with one another.

Sparkflows provides the following features:

Connect to various data sources
Perform ETL and Data Preparation
Profile and Clean Data
Measure Data Quality
Build ML Models using various ML engines
Deploy and execute the ML models
Build Reports and Dashboards
Build Analytical Applications

Sparkflows supports both Batch and Streaming Jobs.

Installation & Configuration

Installation and Administration

Operations

Operations

User Guide

User’s Guide

Tutorials

Tutorials

Migration

Alteryx to Sparkflows Migration
- Purpose

Databricks Guide

Databricks Guide

AWS Guide

AWS Guide

Azure Guide

Azure Guide

GCP Guide

GCP Guide

HPE Guide

HPE Guide

Incorta Guide

Incorta Guide

Snowflake Guide

Snowflake Guide
- Snowflake Spark Connector

Spark Standalone Guide

Spark Standalone Guide

Jupyter Guide

Jupyter Guide

MLOps Guide

MLOps Guide

ModelDoc Guide

What-if Analysis Guide

What-if Analysis Guide

Change Data Capture

Change Data Capture Guide

Lineage

Lineage Guide

Cloudera/Hadoop Guide

Cloudera Guide

Kubernetes Guide

Kubernetes Guide

Best Practices

Best Practices

Frequently Asked Questions

FAQ

Commands

Commands Cheatsheet

Performance Guide

Performance Tuning

Troubleshooting

Troubleshooting Guide

Developer Guide

Developers Guide

REST API

REST API

How To

How To

Release Notes

Release Notes

Third Party Acknowledgements

Third Party Acknowledgements
