Data Transformations

Transform your data with SQL, Python, or field mapping. Visual configuration with automatic schema propagation and intelligent execution routing.

ELT Philosophy: Keep Data Raw

Nexion follows the modern ELT approach (Extract-Load-Transform). We recommend minimal transformations during ingestion to preserve data fidelity.

Use Transforms For

  • Essential data cleaning (nulls, date formats)
  • PII/PHI protection (handled automatically)
  • Destination compatibility (type casting)

Raw Data Benefits

  • Reprocess with different criteria anytime
  • Full audit trail and compliance
  • Future analytics flexibility

Transform in your data warehouse with SQL/BI tools after loading. Your raw data stays intact in Delta Lake.

Transform Node Configuration

Example: Clean Customer Data (SQL Query Transform)

SELECT
  id,
  UPPER(TRIM(name)) as name,
  LOWER(email) as email,
  amount * 100 as amount_cents
FROM {{ source }}
WHERE status = 'active'

Input: 5 fields → Output: 4 fields · DuckDB engine

Three Ways to Transform

Choose the right tool for your transformation. From simple field mapping to powerful SQL queries.

SQL Query

Write SQL to transform your data. Powered by DuckDB for sub-second execution on datasets up to 100M rows.

SELECT
  id,
  UPPER(TRIM(name)) as name,
  LOWER(email) as email,
  amount * 100 as amount_cents,
  created_at
FROM {{ source }}
WHERE status = 'active'
  AND created_at >= '2024-01-01'
  • Standard SQL syntax
  • Use {{ source }} as input table
  • DuckDB engine (<100M rows)
  • Auto-routes to Spark for big data
  • Schema inference from query

Python Script

Write Python code with pandas DataFrames. Full access to numpy, datetime, and common data manipulation libraries.

def transform(df):
    """
    Transform input DataFrame.

    Args:
        df: pandas DataFrame
    Returns:
        Transformed DataFrame
    """
    # Clean email addresses
    df['email'] = df['email'].str.lower().str.strip()

    # Create full name
    df['full_name'] = df['first_name'] + ' ' + df['last_name']

    # Filter active records
    df = df[df['status'] == 'active']

    return df
  • Pandas DataFrame input/output
  • Access to numpy, datetime
  • Sandboxed execution
  • 5-minute timeout by default
  • Full error tracebacks
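
Outside the sandbox you can exercise the same `transform` contract locally. A minimal sketch (the function body repeats the example above so this snippet runs on its own):

```python
import pandas as pd

def transform(df):
    """Clean emails, build a full name, and keep active records."""
    df = df.copy()
    df["email"] = df["email"].str.lower().str.strip()
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df[df["status"] == "active"]

raw = pd.DataFrame({
    "first_name": ["Ada", "Bo"],
    "last_name": ["Lovelace", "Ek"],
    "email": [" Ada@Example.org ", "bo@ek.se"],
    "status": ["active", "churned"],
})
clean = transform(raw)
```

Testing the function this way before deploying it catches shape and dtype mistakes without waiting on a pipeline run.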

Field Mapping

Visual field mapping with rename, cast, and simple transformations. No code required.

[
  { "source": "customer_email", "target": "email", "transform": "lower" },
  { "source": "customer_name", "target": "name", "transform": "title" },
  { "source": "amount", "target": "amount_cents", "transform": "multiply:100" },
  { "source": "created_date", "target": "created_at", "transform": "date:%Y-%m-%d" },
  { "source": "status", "target": "status", "transform": "default:unknown" }
]
  • Rename fields visually
  • Type casting (string, int, float)
  • String transforms (upper, lower, strip)
  • Numeric transforms (multiply, divide)
  • Default values for nulls
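
To make the mapping semantics concrete, here is a toy interpreter for the spec format shown above. This is purely illustrative (Nexion's real mapping engine is not public); the helper names are hypothetical:

```python
from datetime import datetime

MAPPING = [
    {"source": "customer_email", "target": "email", "transform": "lower"},
    {"source": "customer_name", "target": "name", "transform": "title"},
    {"source": "amount", "target": "amount_cents", "transform": "multiply:100"},
    {"source": "created_date", "target": "created_at", "transform": "date:%Y-%m-%d"},
    {"source": "status", "target": "status", "transform": "default:unknown"},
]

def apply_transform(value, spec):
    """Apply one 'op' or 'op:arg' transform to a single value."""
    op, _, arg = spec.partition(":")
    if op == "lower":
        return value.lower()
    if op == "title":
        return value.title()
    if op == "multiply":
        return value * float(arg)
    if op == "date":
        return datetime.strptime(value, arg).date()
    if op == "default":
        return value if value is not None else arg
    raise ValueError(f"unknown transform: {spec}")

def map_record(record, mapping):
    """Rename source fields to target fields, transforming each value."""
    return {m["target"]: apply_transform(record.get(m["source"]), m["transform"])
            for m in mapping}

row = {"customer_email": "ADA@EXAMPLE.ORG", "customer_name": "ada lovelace",
       "amount": 12.5, "created_date": "2024-01-15", "status": None}
mapped = map_record(row, MAPPING)
```

Each entry renames one field and applies at most one transform, which is what keeps the visual editor simple; anything more involved belongs in a SQL or Python node.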

Intelligent Routing

Automatic Engine Selection

Nexion automatically routes your transformations to the optimal engine. Small datasets run instantly on DuckDB. Large datasets scale to Databricks or Spark.

Under 100M Rows

Runs locally with DuckDB. Sub-second execution, zero extra cost. Perfect for development and medium datasets.

Over 100M Rows

Automatically routes to your Spark cluster. Databricks Photon provides 3x faster execution with warm pool support.
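
The routing rule reduces to a size threshold. A hedged sketch of the decision (the real planner may weigh additional signals such as cluster availability or cost, and the function name here is hypothetical):

```python
def select_engine(estimated_rows: int, duckdb_limit: int = 100_000_000) -> str:
    """Route small datasets to local DuckDB, large ones to Spark."""
    return "duckdb" if estimated_rows < duckdb_limit else "spark"
```

Because the threshold is checked per node, a single pipeline can mix engines: an early filter may run on DuckDB while a downstream join over the full history runs on Spark.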

DuckDB

Default for datasets under 100M rows.

  • Startup: <1s
  • Cost: $0
  • Max rows: 100M

Databricks

Best for big data with Photon acceleration.

  • Startup: <5s
  • Speed: 3x
  • Scale: PB

EMR Serverless

Pay-per-second for sporadic workloads.

  • Startup: ~2m
  • Cost: $0.16 per 50M rows
  • Scale: PB

Field Mapping Transforms

Built-in transforms for the visual field mapping interface. No code required.

  • upper: Convert to uppercase ("hello world" → "HELLO WORLD")
  • lower: Convert to lowercase ("HELLO WORLD" → "hello world")
  • strip: Remove whitespace (" text " → "text")
  • title: Title case ("hello world" → "Hello World")
  • multiply:N: Multiply by N (multiply:100 turns 100 into 10000)
  • divide:N: Divide by N (divide:2 turns 100 into 50)
  • to_string: Cast to string (123 → "123")
  • to_int: Cast to integer ("123" → 123)
  • date:FORMAT: Parse date (date:%Y-%m-%d parses "2024-01-15")
  • default:VALUE: Fill nulls (null → "unknown")

Part of Your Data Pipeline

Transformations integrate seamlessly with the full pipeline execution flow.

  • Extraction: Pull from sources
  • PII Protection: Mask sensitive data
  • PHI Protection: HIPAA compliance
  • Transform: SQL, Python, or field mapping
  • Quality: Validate data
  • Loading: Write to Delta Lake
  • Lineage: Track provenance

Ready to Transform Your Data?

Start building visual data pipelines with powerful transformations. Free trial includes all transform types.