Data Transformations

Transform your data with SQL, Python, or field mapping. Visual configuration with automatic schema propagation and intelligent execution routing.

ELT Philosophy: Keep Data Raw

Nexion follows the modern ELT approach (Extract-Load-Transform). We recommend minimal transformations during ingestion to preserve data fidelity.

Use Transforms For

  • Essential data cleaning (nulls, date formats)
  • PII/PHI protection (handled automatically)
  • Destination compatibility (type casting)

Raw Data Benefits

  • Reprocess with different criteria anytime
  • Full audit trail and compliance
  • Future analytics flexibility

Transform in your data warehouse with SQL/BI tools after loading. Your raw data stays intact in Delta Lake.

Transform Node Configuration

Example: Clean Customer Data (SQL Query Transform)

SELECT
  id,
  UPPER(TRIM(name)) as name,
  LOWER(email) as email,
  amount * 100 as amount_cents
FROM {{ source }}
WHERE status = 'active'

Input: 5 fields → Output: 4 fields · DuckDB engine

Three Ways to Transform

Choose the right tool for your transformation. From simple field mapping to powerful SQL queries.

SQL Query

Write SQL to transform your data. Powered by DuckDB for sub-second execution on datasets up to 100M rows.

SELECT
  id,
  UPPER(TRIM(name)) as name,
  LOWER(email) as email,
  amount * 100 as amount_cents,
  created_at
FROM {{ source }}
WHERE status = 'active'
  AND created_at >= '2024-01-01'
  • Standard SQL syntax
  • Use {{ source }} as input table
  • DuckDB engine (<100M rows)
  • Auto-routes to Spark for big data
  • Schema inference from query

Python Script

Write Python code with pandas DataFrames. Full access to numpy, datetime, and common data manipulation libraries.

def transform(df):
    """
    Transform input DataFrame.

    Args:
        df: pandas DataFrame
    Returns:
        Transformed DataFrame
    """
    # Clean email addresses
    df['email'] = df['email'].str.lower().str.strip()

    # Create full name
    df['full_name'] = df['first_name'] + ' ' + df['last_name']

    # Filter active records
    df = df[df['status'] == 'active']

    return df
  • Pandas DataFrame input/output
  • Access to numpy, datetime
  • Sandboxed execution
  • 5-minute timeout by default
  • Full error tracebacks
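
Outside the sandbox you can exercise the same `transform` contract locally. A minimal sketch (the function body repeats the example above so this snippet runs on its own):

```python
import pandas as pd

def transform(df):
    """Clean emails, build a full name, and keep active records."""
    df = df.copy()
    df["email"] = df["email"].str.lower().str.strip()
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df[df["status"] == "active"]

raw = pd.DataFrame({
    "first_name": ["Ada", "Bo"],
    "last_name": ["Lovelace", "Ek"],
    "email": [" Ada@Example.org ", "bo@ek.se"],
    "status": ["active", "churned"],
})
clean = transform(raw)
```

Testing the function this way before deploying it catches shape and dtype mistakes without waiting on a pipeline run.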

Field Mapping

Visual field mapping with rename, cast, and simple transformations. No code required.

[
  { "source": "customer_email", "target": "email", "transform": "lower" },
  { "source": "customer_name", "target": "name", "transform": "title" },
  { "source": "amount", "target": "amount_cents", "transform": "multiply:100" },
  { "source": "created_date", "target": "created_at", "transform": "date:%Y-%m-%d" },
  { "source": "status", "target": "status", "transform": "default:unknown" }
]
  • Rename fields visually
  • Type casting (string, int, float)
  • String transforms (upper, lower, strip)
  • Numeric transforms (multiply, divide)
  • Default values for nulls
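
To make the mapping semantics concrete, here is a toy interpreter for the spec format shown above. This is purely illustrative (Nexion's real mapping engine is not public); the helper names are hypothetical:

```python
from datetime import datetime

MAPPING = [
    {"source": "customer_email", "target": "email", "transform": "lower"},
    {"source": "customer_name", "target": "name", "transform": "title"},
    {"source": "amount", "target": "amount_cents", "transform": "multiply:100"},
    {"source": "created_date", "target": "created_at", "transform": "date:%Y-%m-%d"},
    {"source": "status", "target": "status", "transform": "default:unknown"},
]

def apply_transform(value, spec):
    """Apply one 'op' or 'op:arg' transform to a single value."""
    op, _, arg = spec.partition(":")
    if op == "lower":
        return value.lower()
    if op == "title":
        return value.title()
    if op == "multiply":
        return value * float(arg)
    if op == "date":
        return datetime.strptime(value, arg).date()
    if op == "default":
        return value if value is not None else arg
    raise ValueError(f"unknown transform: {spec}")

def map_record(record, mapping):
    """Rename source fields to target fields, transforming each value."""
    return {m["target"]: apply_transform(record.get(m["source"]), m["transform"])
            for m in mapping}

row = {"customer_email": "ADA@EXAMPLE.ORG", "customer_name": "ada lovelace",
       "amount": 12.5, "created_date": "2024-01-15", "status": None}
mapped = map_record(row, MAPPING)
```

Each entry renames one field and applies at most one transform, which is what keeps the visual editor simple; anything more involved belongs in a SQL or Python node.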

Intelligent Routing

Automatic Engine Selection

Nexion automatically routes your transformations to the optimal engine. Small datasets run instantly on DuckDB. Large datasets scale to Databricks or Spark.

Under 100M Rows

Runs locally with DuckDB. Sub-second execution, zero extra cost. Perfect for development and medium datasets.

Over 100M Rows

Automatically routes to your Spark cluster. Databricks Photon provides 3x faster execution with warm pool support.
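
The routing rule reduces to a size threshold. A hedged sketch of the decision (the real planner may weigh additional signals such as cluster availability or cost, and the function name here is hypothetical):

```python
def select_engine(estimated_rows: int, duckdb_limit: int = 100_000_000) -> str:
    """Route small datasets to local DuckDB, large ones to Spark."""
    return "duckdb" if estimated_rows < duckdb_limit else "spark"
```

Because the threshold is checked per node, a single pipeline can mix engines: an early filter may run on DuckDB while a downstream join over the full history runs on Spark.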

DuckDB

Default for datasets under 100M rows.

  • Startup: <1s
  • Cost: $0
  • Max rows: 100M

Databricks

Best for big data with Photon acceleration.

  • Startup: <5s
  • Speed: 3x
  • Scale: PB

EMR Serverless

Pay-per-second for sporadic workloads.

  • Startup: ~2m
  • Cost: $0.16 per 50M rows
  • Scale: PB

Field Mapping Transforms

Built-in transforms for the visual field mapping interface. No code required.

  • upper: Convert to uppercase ("hello world" → "HELLO WORLD")
  • lower: Convert to lowercase ("HELLO WORLD" → "hello world")
  • strip: Remove whitespace (" text " → "text")
  • title: Title case ("hello world" → "Hello World")
  • multiply:N: Multiply by N (multiply:100 turns 100 into 10000)
  • divide:N: Divide by N (divide:2 turns 100 into 50)
  • to_string: Cast to string (123 → "123")
  • to_int: Cast to integer ("123" → 123)
  • date:FORMAT: Parse date (date:%Y-%m-%d parses "2024-01-15")
  • default:VALUE: Fill nulls (null → "unknown")

Part of Your Data Pipeline

Transformations integrate seamlessly with the full pipeline execution flow.

  • Extraction: Pull from sources
  • PII Protection: Mask sensitive data
  • PHI Protection: HIPAA compliance
  • Transform: SQL, Python, or field mapping
  • Quality: Validate data
  • Loading: Write to Delta Lake
  • Lineage: Track provenance

Ready to Transform Your Data?

Start building visual data pipelines with powerful transformations. Free trial includes all transform types.