Everything you need to build compliant data flows. From visual design to automatic PII protection and Delta Lake storage.
Drag-and-drop canvas to design complex data flows. Connect sources, apply transformations, and define destinations visually.
6 protection methods for sensitive data with automatic detection. HIPAA Safe Harbor built in, covering all 18 identifiers.
Logical containers with compliance isolation. Each pod has its own protection rules and access policies.
Interactive data lineage visualization at table and column level. Impact analysis for schema changes.
Delta Lake format storage with ACID transactions, time travel, and schema evolution. Multi-cloud native.
Quality framework with Great Expectations integration. Automatic Trust Score from 0-100 for each dataset.
Automatic intelligent routing: DuckDB for fast queries, Databricks/Spark for petabyte-scale transformations.
SQL, Python, field mapping, and dbt model transformations. Visual configuration with automatic schema propagation.
Change Data Capture from database logs. Track WAL positions, binlog offsets, and resume tokens for precise recovery. SQL Server supports both CDC and lightweight Change Tracking.
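The quality card above mentions a Great Expectations integration and a 0-100 Trust Score. As a rough illustration of the idea only, here is a minimal sketch using the legacy pandas API of Great Expectations (pre-1.0 versions); the column names, the expectations, and the score formula are assumptions, not Nexion's actual implementation.

```python
import great_expectations as ge
import pandas as pd

# Hypothetical dataset; column names are placeholders.
df = ge.from_pandas(pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [34, 52, 130, 41],
}))

# Declare a couple of expectations on the dataset.
df.expect_column_values_to_not_be_null("patient_id")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)

# Validate and derive a simple 0-100 score from the pass rate
# (one possible way to compute a trust score; not the product's formula).
result = df.validate()
stats = result.statistics
trust_score = round(100 * stats["successful_expectations"] / stats["evaluated_expectations"])
print(f"Trust Score: {trust_score}/100")
```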
All data is stored in Delta Lake format by default, providing ACID transactions, time travel, and schema evolution in your cloud storage.
Ensures consistency even with concurrent data flows
Access historical versions of your data
Evolve schemas without breaking data flows
Compatible with Databricks Unity Catalog
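As a small illustration of what Delta Lake storage enables, here is a sketch using the open-source deltalake Python package (delta-rs). The local path and the sample data are placeholders; in production the table would live in your S3, ADLS, GCS, or OneLake bucket, and Nexion writes it for you.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Placeholder path; in practice an s3:// / abfss:// / gs:// URI in your cloud storage.
TABLE_URI = "/tmp/data-pod/customers"

# Append a batch; Delta's transaction log makes the write atomic (ACID).
batch = pd.DataFrame({"customer_id": [1, 2], "country": ["DE", "FR"]})
write_deltalake(TABLE_URI, batch, mode="append")

# Time travel: read the table as it existed at an earlier version.
previous = DeltaTable(TABLE_URI, version=0).to_pandas()

# Inspect the commit history recorded in the transaction log.
for commit in DeltaTable(TABLE_URI).history():
    print(commit.get("version"), commit.get("operation"))
```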
Microsoft OneLake: Microsoft Fabric native
Azure Data Lake Gen2: ADLS with hierarchical namespace
Amazon S3: AWS native storage
Google Cloud Storage: GCP native storage
Each execution captures detailed metrics for volumetry, timing, quality, resources, and costs. Full visibility into your data flow.
Volumetry: rows_extracted, rows_transformed, rows_loaded, rows_rejected, bytes_processed
Protection: pii_fields_detected, pii_records_protected, phi_fields_detected, phi_records_protected
Timing: extraction_duration_ms, transform_duration_ms, load_duration_ms, total_duration_ms
Quality: quality_score, quality_checks_passed, quality_checks_failed, data_freshness_score
CDC: cdc_start_position, cdc_end_position, lag_at_start_ms, lag_at_end_ms, transactions_processed
Resources: cpu_usage_percent, memory_usage_mb, disk_io_mb, network_io_mb

Extract multiple tables simultaneously with license-controlled parallelism. Process terabytes of data in hours instead of days with intelligent stream orchestration.
Extract up to 16 tables concurrently within a single pipeline. Perfect for databases with hundreds of tables.
Run multiple complete pipelines at the same time. Schedule all your data flows without queuing.
License-based rate limiting ensures fair resource allocation. Enterprise plans get unlimited throughput.
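To make the idea concrete, here is a rough sketch of bounded parallel table extraction with a worker pool. The connection string, table names, and MAX_STREAMS value are placeholders, not Nexion's actual engine; in the product the stream limit comes from your license tier.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd
import sqlalchemy as sa

# Placeholder connection and tables.
engine = sa.create_engine("postgresql+psycopg2://user:pass@db.example.com/app")
TABLES = ["orders", "customers", "invoices", "payments"]
MAX_STREAMS = 4  # e.g. the mid-tier limit; Enterprise would allow up to 16

def extract(table: str) -> tuple[str, pd.DataFrame]:
    # Each worker opens its own connection and pulls one table.
    with engine.connect() as conn:
        return table, pd.read_sql_table(table, conn)

with ThreadPoolExecutor(max_workers=MAX_STREAMS) as pool:
    futures = {pool.submit(extract, t): t for t in TABLES}
    for future in as_completed(futures):
        name, frame = future.result()
        print(f"{name}: {len(frame)} rows extracted")
```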
1 Parallel Stream, 2 Pipelines, 10K rows/sec
4 Parallel Streams, 10 Pipelines, 100K rows/sec
16 Parallel Streams, ∞ Pipelines, ∞ rows/sec
Capture changes directly from database transaction logs in real-time. Track exact positions for precise recovery and zero data loss. Sub-second latency with automatic lag monitoring.
WAL LSN for PostgreSQL, Binlog positions for MySQL, SCN for Oracle. Resume from exact position after failures - no data loss or duplication.
Real-time lag metrics in milliseconds. Automatic alerts when replication falls behind configurable thresholds.
Automatic slot creation and cleanup for PostgreSQL. Monitor slot lag in bytes to prevent disk exhaustion.
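For example, a slot-lag check can be expressed as a query against PostgreSQL's pg_replication_slots view, as in the sketch below. The DSN and the 1 GB alert threshold are illustrative, not Nexion's defaults.

```python
import psycopg2

# Placeholder DSN for the monitored source database.
conn = psycopg2.connect("host=db.example.com dbname=app user=monitor password=secret")

# How far each logical replication slot lags behind the current WAL position, in bytes.
LAG_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""

with conn, conn.cursor() as cur:
    cur.execute(LAG_SQL)
    for slot_name, lag_bytes in cur.fetchall():
        if lag_bytes is not None and lag_bytes > 1_000_000_000:  # ~1 GB of retained WAL
            print(f"ALERT: slot {slot_name} is lagging by {lag_bytes} bytes")
```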
PostgreSQL (Logical Replication): LSN position tracking, pgoutput plugin, managed slots
MySQL / MariaDB (Binary Log Replication): file + position tracking, GTID mode, ROW format
SQL Server (CDC & Change Tracking): CDC for full history, Change Tracking for lightweight capture
Oracle (LogMiner)
MongoDB (Change Streams)
CDC Health Dashboard
Monitor all sources in one view
Control exactly how data is read from sources and written to your Data Pod. Configure per-table or use smart defaults with auto-detected primary keys.
How data is extracted from the source
Extract all records every run. Ideal for small reference tables and lookup data.
Only extract changes since last run using timestamp or ID column. Reduces data transfer.
Read changes from database transaction log. Captures deletes, sub-second latency.
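A minimal sketch of the incremental strategy, assuming an updated_at column as the watermark; the table, column, and connection string are placeholders.

```python
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("postgresql+psycopg2://user:pass@db.example.com/app")

# Watermark persisted from the previous run (placeholder value).
last_watermark = "2024-01-01T00:00:00"

# Pull only rows changed since the last run.
query = sa.text("SELECT * FROM orders WHERE updated_at > :wm ORDER BY updated_at")
with engine.connect() as conn:
    changed = pd.read_sql_query(query, conn, params={"wm": last_watermark})

# The next run resumes from the newest change seen in this batch.
if not changed.empty:
    last_watermark = str(changed["updated_at"].max())
print(f"{len(changed)} changed rows; next watermark = {last_watermark}")
```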
How data is loaded into the Data Pod
Insert all records with no deduplication. Ideal for logs and events.
Delete existing data and reload in full. Ideal for reference tables.
Insert new records or update existing ones based on merge keys (SCD Type 1); see the example below.
Upsert new and changed records, then delete records no longer present in the source. Keeps the destination fully in sync.
Mark records with a deleted_at timestamp instead of physically deleting them. HIPAA-ready.
Keep full history with valid_from/valid_to columns (SCD Type 2). Built for data warehousing.
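As a rough sketch of the merge (upsert) strategy, here is what it could look like with the open-source deltalake package; the table path, merge key, and sample data are placeholders, not Nexion's internals.

```python
import pandas as pd
from deltalake import DeltaTable

# Placeholder Delta table inside a Data Pod (assumed to already exist).
dt = DeltaTable("/tmp/data-pod/customers")

# Incoming batch from the source; customer_id acts as the merge key.
updates = pd.DataFrame({"customer_id": [1, 2], "country": ["DE", "NL"]})

(
    dt.merge(
        source=updates,
        predicate="target.customer_id = source.customer_id",
        source_alias="source",
        target_alias="target",
    )
    .when_matched_update_all()      # SCD Type 1: overwrite changed rows in place
    .when_not_matched_insert_all()  # insert rows that are new in the source
    .execute()
)
```

The sync strategy would extend this with a delete step for rows no longer present in the source.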
3 read strategies, 6 write strategies, automatic merge key detection, per-table configuration
Automatic intelligent routing between local DuckDB for fast queries and Databricks/Spark for massive transformations. Process billions of rows without changing your pipeline.
Under 100M rows? Runs locally in milliseconds with DuckDB. Larger datasets automatically route to your Spark cluster.
First-class Databricks integration with Photon acceleration, Unity Catalog support, and warm cluster pooling.
Connect to Databricks on AWS, Azure, or GCP. EMR Serverless and Azure Synapse are also supported if you have existing investments there.
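A toy sketch of the routing rule, assuming the documented 100M-row cutoff; the submit_to_spark helper and the row estimate are hypothetical stand-ins for the remote engines.

```python
import duckdb
import pandas as pd

ROW_THRESHOLD = 100_000_000  # cutoff for local execution

def submit_to_spark(sql: str) -> pd.DataFrame:
    # Placeholder: in practice this would hand the job to Databricks,
    # EMR Serverless, or Azure Synapse.
    raise NotImplementedError("remote engine submission not shown in this sketch")

def run_transform(sql: str, estimated_rows: int) -> pd.DataFrame:
    """Route small workloads to local DuckDB, large ones to a Spark cluster."""
    if estimated_rows < ROW_THRESHOLD:
        return duckdb.sql(sql).df()  # millisecond startup, zero extra cost
    return submit_to_spark(sql)

# Example: a small lookup transformation runs locally.
result = run_transform("SELECT 42 AS answer", estimated_rows=1)
print(result)
```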
DuckDB (Local), default for <100M rows: <1s startup, $0 extra cost, up to 100M rows per run
Databricks, recommended for big data: <5s startup from warm pool, 3x faster with Photon, petabyte scale
EMR Serverless, pay-per-second: ~2m cold start, $0.16 per 50M rows, petabyte scale
Bring your own Spark cluster or use ours
Connect to databases, data warehouses, SaaS apps, APIs, and cloud storage. An on-premise gateway handles sources behind firewalls.
Configured Connections
Access data behind corporate firewalls securely. The Gateway installs on-premise and connects to Nexion via outbound WebSocket.
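The outbound-only connection pattern looks roughly like the sketch below, written with the websockets library. The endpoint, registration message, and payload format are invented for illustration and are not the actual Nexion protocol.

```python
import asyncio
import json

import websockets

GATEWAY_URL = "wss://gateway.example.com/agent"  # placeholder endpoint

async def run_gateway() -> None:
    # The agent dials OUT to the cloud service, so no inbound firewall ports
    # need to be opened on-premise.
    async with websockets.connect(GATEWAY_URL) as ws:
        await ws.send(json.dumps({"type": "register", "agent": "onprem-gateway-01"}))
        async for raw in ws:
            request = json.loads(raw)
            # ...run the requested extraction against the local database here...
            await ws.send(json.dumps({"type": "result", "request_id": request.get("id")}))

if __name__ == "__main__":
    asyncio.run(run_gateway())
```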
Schedule a demo with our team and see how Nexion can transform your data operations.