Everything you need to build compliant data flows. From visual design to automatic PII protection and Delta Lake storage.
Drag-and-drop canvas to design complex data flows. Connect sources, apply transformations, and define destinations visually.
6 protection methods for sensitive data with automatic detection. HIPAA Safe Harbor built in, covering all 18 identifiers.
Logical containers with compliance isolation. Each pod has its own protection rules and access policies.
Interactive data lineage visualization at table and column level. Impact analysis for schema changes.
Delta Lake format storage with ACID transactions, time travel, and schema evolution. Multi-cloud native.
Quality framework with Great Expectations integration. Automatic Trust Score from 0-100 for each dataset.
Automatic intelligent routing: DuckDB for fast queries, Databricks/Spark for petabyte-scale transformations.
SQL, Python, field mapping, and dbt model transformations. Visual configuration with automatic schema propagation.
Change Data Capture from database logs. Track WAL positions, binlog offsets, and resume tokens for precise recovery. SQL Server supports both CDC and lightweight Change Tracking.
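The quality card above mentions a Great Expectations integration and a 0-100 Trust Score. As a rough illustration of the idea only, here is a minimal sketch using the legacy pandas API of Great Expectations (pre-1.0 versions); the column names, the expectations, and the score formula are assumptions, not Nexion's actual implementation.

```python
import great_expectations as ge
import pandas as pd

# Hypothetical dataset; column names are placeholders.
df = ge.from_pandas(pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [34, 52, 130, 41],
}))

# Declare a couple of expectations on the dataset.
df.expect_column_values_to_not_be_null("patient_id")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)

# Validate and derive a simple 0-100 score from the pass rate
# (one possible way to compute a trust score; not the product's formula).
result = df.validate()
stats = result.statistics
trust_score = round(100 * stats["successful_expectations"] / stats["evaluated_expectations"])
print(f"Trust Score: {trust_score}/100")
```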
All data is stored in Delta Lake format by default, providing ACID transactions, time travel, and schema evolution in your cloud storage.
Ensures consistency even with concurrent data flows
Access historical versions of your data
Evolve schemas without breaking data flows
Compatible with Databricks Unity Catalog
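As a small illustration of what Delta Lake storage enables, here is a sketch using the open-source deltalake Python package (delta-rs). The local path and the sample data are placeholders; in production the table would live in your S3, ADLS, GCS, or OneLake bucket, and Nexion writes it for you.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Placeholder path; in practice an s3:// / abfss:// / gs:// URI in your cloud storage.
TABLE_URI = "/tmp/data-pod/customers"

# Append a batch; Delta's transaction log makes the write atomic (ACID).
batch = pd.DataFrame({"customer_id": [1, 2], "country": ["DE", "FR"]})
write_deltalake(TABLE_URI, batch, mode="append")

# Time travel: read the table as it existed at an earlier version.
previous = DeltaTable(TABLE_URI, version=0).to_pandas()

# Inspect the commit history recorded in the transaction log.
for commit in DeltaTable(TABLE_URI).history():
    print(commit.get("version"), commit.get("operation"))
```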
Microsoft OneLake: Microsoft Fabric native
Azure Data Lake Gen2: ADLS with hierarchical namespace
Amazon S3: AWS native storage
Google Cloud Storage: GCP native storage
Each execution captures detailed metrics for volumetry, timing, quality, resources, and costs. Full visibility into your data flow.
Volumetry: rows_extracted, rows_transformed, rows_loaded, rows_rejected, bytes_processed
Protection: pii_fields_detected, pii_records_protected, phi_fields_detected, phi_records_protected
Timing: extraction_duration_ms, transform_duration_ms, load_duration_ms, total_duration_ms
Quality: quality_score, quality_checks_passed, quality_checks_failed, data_freshness_score
CDC: cdc_start_position, cdc_end_position, lag_at_start_ms, lag_at_end_ms, transactions_processed
Resources: cpu_usage_percent, memory_usage_mb, disk_io_mb, network_io_mb

Extract multiple tables simultaneously with license-controlled parallelism. Process terabytes of data in hours instead of days with intelligent stream orchestration.
Extract up to 16 tables concurrently within a single pipeline. Perfect for databases with hundreds of tables.
Run multiple complete pipelines at the same time. Schedule all your data flows without queuing.
License-based rate limiting ensures fair resource allocation. Enterprise plans get unlimited throughput.
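To make the idea concrete, here is a rough sketch of bounded parallel table extraction with a worker pool. The connection string, table names, and MAX_STREAMS value are placeholders, not Nexion's actual engine; in the product the stream limit comes from your license tier.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd
import sqlalchemy as sa

# Placeholder connection and tables.
engine = sa.create_engine("postgresql+psycopg2://user:pass@db.example.com/app")
TABLES = ["orders", "customers", "invoices", "payments"]
MAX_STREAMS = 4  # e.g. the mid-tier limit; Enterprise would allow up to 16

def extract(table: str) -> tuple[str, pd.DataFrame]:
    # Each worker opens its own connection and pulls one table.
    with engine.connect() as conn:
        return table, pd.read_sql_table(table, conn)

with ThreadPoolExecutor(max_workers=MAX_STREAMS) as pool:
    futures = {pool.submit(extract, t): t for t in TABLES}
    for future in as_completed(futures):
        name, frame = future.result()
        print(f"{name}: {len(frame)} rows extracted")
```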
1 Parallel Stream, 2 Pipelines, 10K rows/sec
4 Parallel Streams, 10 Pipelines, 100K rows/sec
16 Parallel Streams, ∞ Pipelines, ∞ rows/sec
Capture changes directly from database transaction logs in real-time. Track exact positions for precise recovery and zero data loss. Sub-second latency with automatic lag monitoring.
WAL LSN for PostgreSQL, Binlog positions for MySQL, SCN for Oracle. Resume from exact position after failures - no data loss or duplication.
Real-time lag metrics in milliseconds. Automatic alerts when replication falls behind configurable thresholds.
Automatic slot creation and cleanup for PostgreSQL. Monitor slot lag in bytes to prevent disk exhaustion.
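For example, a slot-lag check can be expressed as a query against PostgreSQL's pg_replication_slots view, as in the sketch below. The DSN and the 1 GB alert threshold are illustrative, not Nexion's defaults.

```python
import psycopg2

# Placeholder DSN for the monitored source database.
conn = psycopg2.connect("host=db.example.com dbname=app user=monitor password=secret")

# How far each logical replication slot lags behind the current WAL position, in bytes.
LAG_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""

with conn, conn.cursor() as cur:
    cur.execute(LAG_SQL)
    for slot_name, lag_bytes in cur.fetchall():
        if lag_bytes is not None and lag_bytes > 1_000_000_000:  # ~1 GB of retained WAL
            print(f"ALERT: slot {slot_name} is lagging by {lag_bytes} bytes")
```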
PostgreSQL (Logical Replication): LSN position tracking, pgoutput plugin, managed slots
MySQL / MariaDB (Binary Log Replication): file + position tracking, GTID mode, ROW format
SQL Server (CDC & Change Tracking): CDC for full history, Change Tracking for lightweight capture
Oracle (LogMiner)
MongoDB (Change Streams)
CDC Health Dashboard
Monitor all sources in one view
Control exactly how data is read from sources and written to your Data Pod. Configure per-table or use smart defaults with auto-detected primary keys.
How data is extracted from the source
Extract all records every run. Ideal for small reference tables and lookup data.
Only extract changes since last run using timestamp or ID column. Reduces data transfer.
Read changes from database transaction log. Captures deletes, sub-second latency.
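A minimal sketch of the incremental strategy, assuming an updated_at column as the watermark; the table, column, and connection string are placeholders.

```python
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("postgresql+psycopg2://user:pass@db.example.com/app")

# Watermark persisted from the previous run (placeholder value).
last_watermark = "2024-01-01T00:00:00"

# Pull only rows changed since the last run.
query = sa.text("SELECT * FROM orders WHERE updated_at > :wm ORDER BY updated_at")
with engine.connect() as conn:
    changed = pd.read_sql_query(query, conn, params={"wm": last_watermark})

# The next run resumes from the newest change seen in this batch.
if not changed.empty:
    last_watermark = str(changed["updated_at"].max())
print(f"{len(changed)} changed rows; next watermark = {last_watermark}")
```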
How data is loaded into the Data Pod
Insert all records with no deduplication. Ideal for logs and events.
Delete existing data and reload in full. Ideal for reference tables.
Insert new records or update existing ones based on merge keys (SCD Type 1); see the example below.
Upsert new and changed records, then delete records no longer present in the source. Keeps the destination fully in sync.
Mark records with a deleted_at timestamp instead of physically deleting them. HIPAA-ready.
Keep full history with valid_from/valid_to columns (SCD Type 2). Built for data warehousing.
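As a rough sketch of the merge (upsert) strategy, here is what it could look like with the open-source deltalake package; the table path, merge key, and sample data are placeholders, not Nexion's internals.

```python
import pandas as pd
from deltalake import DeltaTable

# Placeholder Delta table inside a Data Pod (assumed to already exist).
dt = DeltaTable("/tmp/data-pod/customers")

# Incoming batch from the source; customer_id acts as the merge key.
updates = pd.DataFrame({"customer_id": [1, 2], "country": ["DE", "NL"]})

(
    dt.merge(
        source=updates,
        predicate="target.customer_id = source.customer_id",
        source_alias="source",
        target_alias="target",
    )
    .when_matched_update_all()      # SCD Type 1: overwrite changed rows in place
    .when_not_matched_insert_all()  # insert rows that are new in the source
    .execute()
)
```

The sync strategy would extend this with a delete step for rows no longer present in the source.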
3 read strategies, 6 write strategies, automatic merge key detection, per-table configuration
Automatic intelligent routing between local DuckDB for fast queries and Databricks/Spark for massive transformations. Process billions of rows without changing your pipeline.
Under 100M rows? Runs locally in milliseconds with DuckDB. Larger datasets automatically route to your Spark cluster.
First-class Databricks integration with Photon acceleration, Unity Catalog support, and warm cluster pooling.
Connect to Databricks on AWS, Azure, or GCP. EMR Serverless and Azure Synapse are also supported if you have existing investments there.
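A toy sketch of the routing rule, assuming the documented 100M-row cutoff; the submit_to_spark helper and the row estimate are hypothetical stand-ins for the remote engines.

```python
import duckdb
import pandas as pd

ROW_THRESHOLD = 100_000_000  # cutoff for local execution

def submit_to_spark(sql: str) -> pd.DataFrame:
    # Placeholder: in practice this would hand the job to Databricks,
    # EMR Serverless, or Azure Synapse.
    raise NotImplementedError("remote engine submission not shown in this sketch")

def run_transform(sql: str, estimated_rows: int) -> pd.DataFrame:
    """Route small workloads to local DuckDB, large ones to a Spark cluster."""
    if estimated_rows < ROW_THRESHOLD:
        return duckdb.sql(sql).df()  # millisecond startup, zero extra cost
    return submit_to_spark(sql)

# Example: a small lookup transformation runs locally.
result = run_transform("SELECT 42 AS answer", estimated_rows=1)
print(result)
```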
DuckDB (Local), default for <100M rows: <1s startup, $0 extra cost, up to 100M rows per run
Databricks, recommended for big data: <5s startup from warm pool, 3x faster with Photon, petabyte scale
EMR Serverless, pay-per-second: ~2m cold start, $0.16 per 50M rows, petabyte scale
Bring your own Spark cluster or use ours
Connect to databases, data warehouses, SaaS apps, APIs, and cloud storage. An on-premise gateway handles sources behind firewalls.
Configured Connections
Access data behind corporate firewalls securely. The Gateway installs on-premise and connects to Nexion via outbound WebSocket.
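The outbound-only connection pattern looks roughly like the sketch below, written with the websockets library. The endpoint, registration message, and payload format are invented for illustration and are not the actual Nexion protocol.

```python
import asyncio
import json

import websockets

GATEWAY_URL = "wss://gateway.example.com/agent"  # placeholder endpoint

async def run_gateway() -> None:
    # The agent dials OUT to the cloud service, so no inbound firewall ports
    # need to be opened on-premise.
    async with websockets.connect(GATEWAY_URL) as ws:
        await ws.send(json.dumps({"type": "register", "agent": "onprem-gateway-01"}))
        async for raw in ws:
            request = json.loads(raw)
            # ...run the requested extraction against the local database here...
            await ws.send(json.dumps({"type": "result", "request_id": request.get("id")}))

if __name__ == "__main__":
    asyncio.run(run_gateway())
```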
Schedule a demo with our team and see how Nexion can transform your data operations.