Platform Features

Everything you need to build compliant data flows, from visual design to automatic PII protection and Delta Lake storage.

Core Capabilities

Visual Data Flow Builder

Drag-and-drop canvas to design complex data flows. Connect sources, apply transformations, and define destinations visually.

  • Interactive canvas
  • Real-time validation
  • 40+ node types
  • Version control

PII/PHI Protection Engine

6 protection methods for sensitive data with automatic detection. HIPAA Safe Harbor built in, covering its 18 identifiers.

  • MASK, HASH, REDACT
  • ENCRYPT, TOKENIZE
  • Auto-detection
  • HIPAA Safe Harbor
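
As a rough illustration of what three of these methods do to a value — shown in plain Python, independent of the platform's own engine; the function names are ours, not Nexion's API:

```python
import hashlib

def mask(value: str, visible: int = 4) -> str:
    # MASK: hide everything except the trailing characters.
    return "*" * max(len(value) - visible, 0) + value[-visible:]

def hash_value(value: str, salt: str = "pod-salt") -> str:
    # HASH: one-way and deterministic, so hashed columns remain joinable.
    return hashlib.sha256((salt + value).encode()).hexdigest()

def redact(_value: str) -> str:
    # REDACT: drop the value entirely.
    return "[REDACTED]"

ssn = "123-45-6789"
print(mask(ssn))        # *******6789
print(hash_value(ssn))  # 64-char hex digest
print(redact(ssn))      # [REDACTED]
```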

Data Pods

Logical containers with compliance isolation. Each pod has its own protection rules and access policies.

  • Compliance isolation
  • GENERAL, PII, PHI levels
  • Access policies
  • Audit trails
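
A minimal sketch of what a pod definition might carry; the field names here are hypothetical, not the platform's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical Data Pod definition: each pod pins a compliance level
# and carries its own protection rules and access policies.
@dataclass
class DataPod:
    name: str
    compliance_level: str                       # "GENERAL", "PII", or "PHI"
    protection_rules: dict = field(default_factory=dict)
    allowed_roles: list = field(default_factory=list)

claims = DataPod(
    name="claims",
    compliance_level="PHI",
    protection_rules={"patient_ssn": "MASK", "diagnosis": "ENCRYPT"},
    allowed_roles=["phi_analyst", "compliance_officer"],
)
```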

Data Lineage (Add-on)

Interactive data lineage visualization at table and column level. Impact analysis for schema changes.

  • Table & column level
  • Impact analysis
  • SQL parsing
  • Visual graph
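
Column-level lineage from SQL parsing can be approximated with an off-the-shelf parser. A minimal sketch using the open-source sqlglot library — one way to extract lineage edges, not necessarily what the platform uses internally:

```python
import sqlglot
from sqlglot import exp

sql = "SELECT o.customer_id, SUM(o.amount) AS total FROM orders o GROUP BY o.customer_id"

# Walk the parsed tree and collect every source column the query reads;
# these become lineage edges pointing at the query's output columns.
tree = sqlglot.parse_one(sql)
sources = {(col.table, col.name) for col in tree.find_all(exp.Column)}
print(sources)  # {('o', 'customer_id'), ('o', 'amount')}
```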

Delta Lake Storage

Delta Lake format storage with ACID transactions, time travel, and schema evolution. Multi-cloud native.

  • ACID transactions
  • Time travel
  • Schema evolution
  • OneLake, ADLS, S3, GCS

Data Quality & Trust Score (Add-on)

Quality framework with Great Expectations integration. Automatic Trust Score from 0 to 100 for each dataset.

  • Great Expectations
  • Trust Score 0-100
  • Auto-profiling
  • Anomaly detection
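
The Trust Score itself is simple to reason about: a 0-100 rollup of passed checks. A toy version of that arithmetic — the real scoring and its Great Expectations integration are the platform's own:

```python
def trust_score(checks: dict[str, bool]) -> int:
    # Toy rollup: percentage of quality checks that passed, 0-100.
    passed = sum(checks.values())
    return round(100 * passed / len(checks)) if checks else 0

checks = {
    "no_null_ids": True,
    "unique_keys": True,
    "fresh_within_24h": False,
    "values_in_range": True,
}
print(trust_score(checks))  # 75
```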

Hybrid Spark Engine (New)

Automatic intelligent routing: DuckDB for fast queries, Databricks/Spark for petabyte-scale transformations.

  • Smart routing
  • Databricks Photon
  • Warm pool
  • Multi-cloud

Transformations (New)

SQL, Python, field mapping, and dbt model transformations. Visual configuration with automatic schema propagation.

  • SQL queries
  • Python scripts
  • Field mapping
  • dbt models
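
For a flavor of what a SQL transformation step does, here is a self-contained sketch running a SQL transform over an in-memory table with DuckDB, the same engine the hybrid executor uses for small workloads; the table and columns are made up:

```python
import duckdb
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [120.0, 35.5, 80.0],
    "country": ["US", "DE", "US"],
})

# DuckDB can query the local DataFrame by name, so a SQL transform step
# is just a SELECT that produces the downstream schema.
result = duckdb.sql("""
    SELECT country, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY country
""").df()
print(result)
```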

Real-Time CDC (New)

Change Data Capture from database logs. Track WAL positions, binlog offsets, and resume tokens for precise recovery. SQL Server supports both CDC and lightweight Change Tracking.

  • PostgreSQL WAL
  • MySQL Binlog/GTID
  • SQL Server CDC/CT
  • Oracle LogMiner
  • MongoDB Streams
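
To make position tracking concrete, here is a minimal sketch that reads the current WAL position from PostgreSQL and persists it as a checkpoint; the connection string and checkpoint file are illustrative, not the platform's API:

```python
import json
import psycopg2

# Hypothetical connection; in practice the Gateway Agent holds credentials.
conn = psycopg2.connect("dbname=prod user=cdc_reader")

with conn.cursor() as cur:
    # pg_current_wal_lsn() returns the server's current write-ahead log
    # position, e.g. '0/16B3748'. Persisting it lets the next run resume
    # from exactly this point instead of re-reading or skipping changes.
    cur.execute("SELECT pg_current_wal_lsn()")
    lsn = cur.fetchone()[0]

with open("checkpoint.json", "w") as f:
    json.dump({"source": "prod-db", "wal_lsn": str(lsn)}, f)
```
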
Recommended Format

Delta Lake Storage

All data is stored in Delta Lake format by default, providing ACID transactions, time travel, and schema evolution in your cloud storage.

ACID Transactions

Ensures consistency even with concurrent data flows

Time Travel

Access historical versions of your data

Schema Evolution

Evolve schemas without breaking data flows

Unity Catalog

Compatible with Databricks Unity Catalog
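
Time travel in particular is easy to try with the open-source deltalake (delta-rs) Python package; a minimal sketch, with a placeholder table path:

```python
from deltalake import DeltaTable

# Open a specific historical version of the table (time travel).
dt = DeltaTable("/data/pods/claims", version=12)  # placeholder path
df = dt.to_pandas()

# The transaction log also records every change for auditability.
for entry in dt.history(limit=3):
    print(entry["operation"], entry["timestamp"])
```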

Supported Storage Backends

  • Microsoft OneLake (Primary): Microsoft Fabric native
  • Azure Data Lake Gen2: ADLS with hierarchical namespace
  • Amazon S3: AWS native storage
  • Google Cloud Storage: GCP native storage

30+ Metrics Per Run

Each execution captures detailed metrics for volumetry, timing, quality, resources, and costs. Full visibility into your data flow.

Volumetry

  • rows_extracted
  • rows_transformed
  • rows_loaded
  • rows_rejected
  • bytes_processed

PII/PHI Tracking

  • pii_fields_detected
  • pii_records_protected
  • phi_fields_detected
  • phi_records_protected

Timing

  • extraction_duration_ms
  • transform_duration_ms
  • load_duration_ms
  • total_duration_ms

Quality

  • quality_score
  • quality_checks_passed
  • quality_checks_failed
  • data_freshness_score

CDC Tracking

  • cdc_start_position
  • cdc_end_position
  • lag_at_start_ms
  • lag_at_end_ms
  • transactions_processed

Resources

  • cpu_usage_percent
  • memory_usage_mb
  • disk_io_mb
  • network_io_mb
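
Put together, a single run's metrics read like the sketch below; the field names are the ones documented above, while every value is invented:

```python
# Illustrative run-metrics payload using the documented field names.
run_metrics = {
    "rows_extracted": 1_250_000,
    "rows_transformed": 1_249_400,
    "rows_loaded": 1_248_900,
    "rows_rejected": 500,
    "bytes_processed": 9_800_000_000,
    "pii_fields_detected": 6,
    "pii_records_protected": 312_000,
    "extraction_duration_ms": 42_000,
    "transform_duration_ms": 18_500,
    "load_duration_ms": 9_700,
    "total_duration_ms": 70_200,
    "quality_score": 94,
    "quality_checks_passed": 47,
    "quality_checks_failed": 3,
    "cdc_start_position": "0/16B3748",
    "cdc_end_position": "0/16C90A0",
    "cpu_usage_percent": 63.5,
    "memory_usage_mb": 2_048,
}
```
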
High Performance

Parallel Pipeline Execution

Extract multiple tables simultaneously with license-controlled parallelism. Process terabytes of data in hours instead of days with intelligent stream orchestration.

Parallel Streams

Extract up to 16 tables concurrently within a single pipeline. Perfect for databases with hundreds of tables.

Concurrent Pipelines

Run multiple complete pipelines at the same time. Schedule all your data flows without queuing.

Throughput Control

License-based rate limiting ensures fair resource allocation. Enterprise plans get unlimited throughput.
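
Conceptually, license-controlled parallelism is a bounded worker pool. A toy sketch, where extract_table stands in for the real extractor:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_table(table: str) -> str:
    # Stand-in for the real extractor: pull one table from the source.
    return f"{table}: done"

tables = ["customers", "orders", "invoices", "payments", "shipments"]
parallel_streams = 4  # e.g. the Professional plan's limit

# The license tier caps max_workers, so at most N tables stream at once.
with ThreadPoolExecutor(max_workers=parallel_streams) as pool:
    for result in pool.map(extract_table, tables):
        print(result)
```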

Performance by Plan

  • Starter ($2,500/mo): 1 parallel stream, 2 pipelines, 10K rows/sec
  • Professional (Popular): 4 parallel streams, 10 pipelines, 100K rows/sec
  • Enterprise (Custom): 16 parallel streams, unlimited throughput

Compare all plans →

Real-Time Sync

Change Data Capture (CDC)

Capture changes directly from database transaction logs in real-time. Track exact positions for precise recovery and zero data loss. Sub-second latency with automatic lag monitoring.

Position Tracking

WAL LSN for PostgreSQL, binlog positions for MySQL, SCN for Oracle. Resume from the exact position after a failure, with no data loss or duplication.

Lag Monitoring & Alerts

Real-time lag metrics in milliseconds. Automatic alerts when replication falls behind configurable thresholds.

Replication Slot Management

Automatic slot creation and cleanup for PostgreSQL. Monitor slot lag in bytes to prevent disk exhaustion.
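
Slot lag in bytes can be read straight from PostgreSQL's catalog; a minimal monitoring sketch, with illustrative connection details:

```python
import psycopg2

conn = psycopg2.connect("dbname=prod user=cdc_monitor")  # illustrative
with conn.cursor() as cur:
    # Bytes of WAL each replication slot is holding back; a growing
    # number means the consumer is falling behind and disk is at risk.
    cur.execute("""
        SELECT slot_name,
               pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
        FROM pg_replication_slots
    """)
    for slot_name, lag_bytes in cur.fetchall():
        print(slot_name, lag_bytes)
```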

Supported Databases

  • PostgreSQL: logical replication via the pgoutput plugin, WAL LSN positions, managed replication slots
  • MySQL / MariaDB: binary log replication, file+position and GTID modes, ROW format
  • SQL Server: CDC (full history) and Change Tracking (lightweight), LSN positions
  • Oracle: LogMiner, SCN positions
  • MongoDB: Change Streams, resume tokens

CDC Health Dashboard

Monitor all sources in one view, with per-source health status (e.g., 5 healthy, 1 lagging).

Complete Control

Intelligent Data Strategies

Control exactly how data is read from sources and written to your Data Pod. Configure per-table or use smart defaults with auto-detected primary keys.

Read Strategies

How data is extracted from source

Full Table (Default)

Extract all records every run. Ideal for small reference tables and lookup data.

Incremental (Efficient)

Only extract changes since the last run, using a timestamp or ID column (see the sketch below). Reduces data transfer.

Log-Based CDC (Real-Time)

Read changes from the database transaction log. Captures deletes, with sub-second latency.
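
A toy version of the incremental strategy with a timestamp watermark, shown here in DuckDB with made-up data:

```python
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE orders AS
    SELECT * FROM (VALUES
        (1, TIMESTAMP '2024-05-30 10:00:00'),
        (2, TIMESTAMP '2024-06-02 09:30:00')
    ) AS t(order_id, updated_at)
""")

# Incremental read: pull only rows changed since the stored watermark,
# then advance the watermark to the newest value just seen.
last_watermark = "2024-06-01 00:00:00"
rows = con.execute(
    "SELECT * FROM orders WHERE updated_at > CAST(? AS TIMESTAMP) ORDER BY updated_at",
    [last_watermark],
).fetchall()
new_watermark = rows[-1][1] if rows else last_watermark
print(rows, new_watermark)
```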

Write Strategies

How data is loaded to Data Pod

Append

Insert all records with no deduplication. Best for logs and events.

Replace

Delete and reload the full table. Best for reference tables.

Upsert (Recommended)

Insert new records or update existing ones based on merge keys (SCD Type 1); see the sketch below.

Merge

Upsert plus deletion of records no longer in the source. Full bidirectional sync.

Soft Delete (Audit)

Set a deleted_at marker instead of physically deleting rows. HIPAA-ready.

SCD Type 2 (History)

Keep full history with valid_from/valid_to columns. Built for data warehousing.
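
For flavor, here is what the Upsert strategy looks like in open-source Delta Lake with PySpark; this mirrors the strategy itself, not Nexion's internal code, and the path, merge key, and data are all placeholders:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Assumes the delta-spark package is installed and on the classpath.
spark = (SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate())

updates_df = spark.createDataFrame(
    [(1, "Ada"), (2, "Grace")], ["customer_id", "name"])

# Upsert (SCD Type 1): update rows matching on the merge key, insert the rest.
target = DeltaTable.forPath(spark, "/data/pods/customers")
(target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```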

  • 3 read strategies
  • 6 write strategies
  • Automatic merge-key detection
  • Per-table configuration

Unlimited Scale

Hybrid Spark Execution

Automatic intelligent routing between local DuckDB for fast queries and Databricks/Spark for massive transformations. Process billions of rows without changing your pipeline.

Smart Routing

Under 100M rows? Runs locally in milliseconds with DuckDB. Larger datasets automatically route to your Spark cluster.
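
The routing decision itself reduces to a size threshold. A toy sketch, where the 100M-row cutoff comes from this page and everything else is illustrative:

```python
ROUTING_THRESHOLD_ROWS = 100_000_000  # the cutoff described above

def choose_engine(estimated_rows: int) -> str:
    # Small jobs stay local on DuckDB; large ones route to the Spark cluster.
    return "duckdb" if estimated_rows < ROUTING_THRESHOLD_ROWS else "spark"

print(choose_engine(2_000_000))    # duckdb
print(choose_engine(500_000_000))  # spark
```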

Databricks Native

First-class Databricks integration with Photon acceleration, Unity Catalog support, and warm cluster pooling.

Multi-Cloud Spark

Connect to Databricks on AWS, Azure, or GCP. Also supports EMR Serverless and Azure Synapse for existing investments.

Execution Comparison

  • DuckDB (Local), the default for <100M rows: <1s startup, $0 extra cost, up to 100M rows per run
  • Databricks (Best), recommended for big data: <5s startup from the warm pool, 3x Photon speedup, petabyte scale
  • EMR Serverless, pay-per-second: ~2 min cold start, ~$0.16 per 50M rows, petabyte scale

Bring your own Spark cluster or use ours

81 Connectors

Connect with databases, data warehouses, SaaS apps, APIs and cloud storage. On-premise gateway for sources behind firewalls.

HL7 FHIR R4
Plaid
Microsoft OneLake
Salesforce
Google Analytics
Slack
PostgreSQL
MySQL
SQL Server
Oracle
MongoDB
MariaDB
HubSpot
Mailchimp
Mixpanel
Facebook Ads
Pipedrive
Zoho CRM
Google Sheets
Jira
Notion
Stripe
Square
Shopify
S3
Excel
CSV/JSON
+ 55 more...

Gateway Agent dashboard: live connection status (online, connected via WebSocket), agent version, last heartbeat, and the configured on-premise connections (e.g., PostgreSQL, SQL Server, Oracle) in one view.

On-Premise

Gateway Agent

Access data behind corporate firewalls securely. The Gateway installs on-premise and connects to Nexion via outbound WebSocket.

  • No inbound ports required - only outbound HTTPS
  • Credentials never leave your network
  • Auto-reconnect and health monitoring
  • HA deployment with multiple agents
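
Conceptually, the agent is an outbound-only WebSocket client with a heartbeat. A toy sketch using the open-source websockets library; the endpoint and message shape are invented:

```python
import asyncio
import json
import websockets

async def run_agent() -> None:
    # Outbound-only: the agent dials out over wss (HTTPS), so no inbound
    # firewall ports are needed. URL and payload here are invented.
    async with websockets.connect("wss://gateway.example.com/agent") as ws:
        while True:
            await ws.send(json.dumps({"type": "heartbeat", "status": "online"}))
            await asyncio.sleep(5)

asyncio.run(run_agent())
```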

Ready to see it in action?

Schedule a demo with our team and see how Nexion can transform your data operations.