How It Works

An end-to-end Flask pipeline that automates your first hour of exploratory data analysis (EDA), with real code and real statistics behind every result, built for analysts who need to trust their numbers.

Step 1: Upload & Load

We support CSV, Excel, JSON, and Feather files. Your upload is handled securely: our custom function load_dataframe() detects the file format, converts the file into a pandas DataFrame, and uses fallback logic to guard against memory issues.
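Here is a minimal sketch of what format detection could look like, assuming the reader is chosen by file extension. The names READERS and load_dataframe_sketch are illustrative, not the app's actual code, and the memory fallback logic is not shown.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader.
# Excel needs an engine such as openpyxl; Feather needs pyarrow.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".json": pd.read_json,
    ".feather": pd.read_feather,
}


def load_dataframe_sketch(path):
    """Load a file into a DataFrame, picking the reader from the extension."""
    suffix = Path(path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return reader(path)
```

Dispatching on the extension keeps the loader easy to extend: supporting another format means adding one entry to the mapping.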

Step 2: Data Profiling

The core profiling is handled by our custom function data_quality_check(). It classifies every column by type and computes a statistical summary for each:

🔢 Numerics: mean, std, skew, kurtosis, variance, outliers (3σ–5σ)
🧮 Booleans: distribution, imbalance (≥70%)
🕒 Datetimes: flexible parsing, monotonic checks, time spans
🔠 Categoricals: mode, value counts, cardinality flags
🚫 Constants/Null Columns: flagged early for pipeline hygiene

We also handle edge cases such as single-value features and implicit booleans, and messy time columns are checked by check_timeseries().
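The classification step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code inside data_quality_check(): the function name classify_column, the label strings, and the set of values treated as implicit booleans are all hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes

# Assumed spellings that count as an "implicit boolean" when a column
# holds exactly two of them (after lowercasing and stripping).
IMPLICIT_BOOL_VALUES = {"0", "1", "true", "false", "yes", "no", "y", "n"}


def classify_column(series):
    """Return a coarse type label for one column."""
    non_null = series.dropna()
    if non_null.empty:
        return "all_null"
    if non_null.nunique() == 1:
        return "constant"
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    # Two distinct values drawn from the assumed boolean spellings.
    as_text = set(non_null.astype(str).str.strip().str.lower().unique())
    if len(as_text) == 2 and as_text <= IMPLICIT_BOOL_VALUES:
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    return "categorical"
```

Checking for all-null and constant columns first means they are flagged before any statistics are attempted on them.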

Step 3: Statistical Overview

Next, we build a bird’s-eye view using overview(). This aggregates key metrics across the dataset:

📐 Data shape, column names, column type counts
📉 Nulls, duplicates, low-variance, high-cardinality flags
📈 Skewness and kurtosis distributions
🚩 Outlier count by σ-bracket, including graphical output via Matplotlib
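To make the aggregation concrete, here is a small sketch of a dataset-level summary with outlier counts per σ-bracket. The function names, dictionary keys, and bracket boundaries (3 to 4, 4 to 5, and 5 or more standard deviations) are assumptions for illustration; the real overview() may structure its output differently, and plotting is omitted.

```python
import pandas as pd


def sigma_bracket_counts(series):
    """Count values whose absolute z-score falls in [3, 4), [4, 5), or >= 5."""
    values = series.dropna().astype(float)
    empty = {"3_to_4_sigma": 0, "4_to_5_sigma": 0, "5_sigma_plus": 0}
    if len(values) < 2:
        return empty
    std = values.std()
    if std == 0:
        return empty
    z = ((values - values.mean()) / std).abs()
    return {
        "3_to_4_sigma": int(((z >= 3) & (z < 4)).sum()),
        "4_to_5_sigma": int(((z >= 4) & (z < 5)).sum()),
        "5_sigma_plus": int((z >= 5).sum()),
    }


def overview_sketch(df):
    """Aggregate dataset-level metrics into one plain dictionary."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,
        "null_counts": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "skewness": numeric.skew().to_dict(),
        "kurtosis": numeric.kurt().to_dict(),
        "outliers": {c: sigma_bracket_counts(numeric[c]) for c in numeric.columns},
    }
```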

Step 4: Recommendations Engine

Here’s where things get opinionated. Our custom engine data_quality_recommendations() reviews all stats and triggers warnings & suggestions in structured categories:

⚠️ Missingness: 30%+ null warnings, full-null column drop alerts
🪞 Duplicates: exact row matches flagged
🚫 Constant Columns: fully repeated values
📈 Outliers: flagged at 3σ–5σ with encoded plots via generate_outlier_plot()
🎯 Skewness: classified by tail and severity via generate_skewness_plot()
🎭 Cardinality: high (>50) or medium (12–49) categorical uniqueness
📉 Variance: low numeric variance, boolean imbalance, dominant categories
🔗 Correlation: Pearson heatmap + Cramér's V for categorical pairs
📎 Multicollinearity: VIF analysis with pre-cleaning logic for nulls & constants
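Two of the checks above lend themselves to a short illustration. The sketch below computes an uncorrected Cramér's V from a contingency table, and computes VIF as the diagonal of the inverse correlation matrix. That is a swapped-in NumPy formulation rather than the statsmodels route the stack lists, and it may differ from what data_quality_recommendations() does. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def cramers_v(x, y):
    """Uncorrected Cramér's V between two categorical Series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    min_dim = min(table.shape) - 1
    if n == 0 or min_dim == 0:
        return float("nan")
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * min_dim)))


def vif_table(df):
    """VIF per numeric column after dropping null rows and constant columns."""
    numeric = df.select_dtypes(include="number").dropna()
    numeric = numeric.loc[:, numeric.nunique() > 1]
    if numeric.shape[1] < 2:
        return {col: 1.0 for col in numeric.columns}
    corr = np.corrcoef(numeric.to_numpy(dtype=float), rowvar=False)
    # With an intercept in the model, VIF_j equals the j-th diagonal entry
    # of the inverse correlation matrix. Perfectly collinear columns make
    # the matrix singular, so inv() raises LinAlgError or returns huge values.
    vif = np.diag(np.linalg.inv(corr))
    return {col: float(v) for col, v in zip(numeric.columns, vif)}
```

Dropping null rows and constant columns first mirrors the pre-cleaning mentioned above: a constant column has zero variance, so its correlations are undefined.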

Step 5: Output & Cleaning

Everything is encoded for front-end use via convert_numpy(), so even complex stats and plots are serialized to JSON-safe formats. User uploads and intermediate results are wiped hourly by periodic_cleanup(), which calls wipe_all_files_in_folder() on each folder.
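As an illustration of the JSON-safe conversion, here is a minimal recursive sketch. It assumes the goal is to turn NumPy scalars and arrays into plain Python types and to map non-finite floats to null; the actual convert_numpy() may handle more cases, such as datetimes or encoded plots. The name convert_numpy_sketch is hypothetical.

```python
import json
import math

import numpy as np


def convert_numpy_sketch(obj):
    """Recursively convert NumPy values into JSON-serializable Python types."""
    if isinstance(obj, dict):
        return {str(k): convert_numpy_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [convert_numpy_sketch(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return convert_numpy_sketch(obj.tolist())
    if isinstance(obj, np.generic):
        obj = obj.item()  # np.int64 -> int, np.float64 -> float, np.bool_ -> bool
    if isinstance(obj, float) and not math.isfinite(obj):
        return None  # strict JSON has no NaN or Infinity
    return obj


stats = {"mean": np.float64(1.5), "n": np.int64(3), "skew": float("nan")}
payload = json.dumps(convert_numpy_sketch(stats), allow_nan=False)
```

Passing allow_nan=False makes json.dumps fail loudly if a non-finite value slips through, instead of emitting output that browsers cannot parse.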

Framework & Stack

Everything runs inside a secure Flask app using:

📦 Pandas / NumPy: Data wrangling and math
📊 Seaborn / Matplotlib: Diagnostic plots
📐 SciPy / statsmodels: Z-scores, VIF, statistical tests
🖼️ Bootstrap 5 + Jinja2: Fully styled templates and modals

Designed with mobile users in mind. All analysis runs server-side in the Flask app, and the browser only renders the results. When the app runs on your own machine, no data leaves it.

🚀 Try it on your dataset