Transform your data with SQL, Python, or field mapping. Visual configuration with automatic schema propagation and intelligent execution routing.
Nexion follows the modern ELT (Extract, Load, Transform) approach. We recommend minimal transformations during ingestion to preserve data fidelity: load first, then transform in your data warehouse with SQL or BI tools. Your raw data stays intact in Delta Lake.
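For example, the post-load transform step can be a plain SQL view over the raw table. A minimal sketch with DuckDB follows; the connection, table, and file names are illustrative, not Nexion-specific:

import duckdb

# Minimal ELT sketch: land raw data untouched, then derive a cleaned
# view with SQL afterwards. All names here are illustrative.
con = duckdb.connect("warehouse.db")
con.sql("CREATE OR REPLACE TABLE raw_orders AS "
        "SELECT * FROM read_parquet('landing/orders/*.parquet')")
con.sql("""
    CREATE OR REPLACE VIEW clean_orders AS
    SELECT id, LOWER(email) AS email, amount * 100 AS amount_cents
    FROM raw_orders
    WHERE status = 'active'
""")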
Choose the right tool for your transformation, from simple field mapping to powerful SQL queries.

SQL Query Transform
Write SQL to transform your data. Powered by DuckDB for sub-second execution on datasets up to 100M rows.
SELECT
    id,
    UPPER(TRIM(name)) AS name,
    LOWER(email) AS email,
    amount * 100 AS amount_cents,
    created_at
FROM {{ source }}
WHERE status = 'active'
  AND created_at >= '2024-01-01'

Python Transform
Write Python code with pandas DataFrames. Full access to numpy, datetime, and common data manipulation libraries.
def transform(df):
    """
    Transform input DataFrame.

    Args:
        df: pandas DataFrame

    Returns:
        Transformed DataFrame
    """
    # Clean email addresses
    df['email'] = df['email'].str.lower().str.strip()

    # Create full name
    df['full_name'] = df['first_name'] + ' ' + df['last_name']

    # Filter active records
    df = df[df['status'] == 'active']

    return df

Field Mapping
Visual field mapping with rename, cast, and simple transformations. No code required.
[
  { "source": "customer_email", "target": "email", "transform": "lower" },
  { "source": "customer_name", "target": "name", "transform": "title" },
  { "source": "amount", "target": "amount_cents", "transform": "multiply:100" },
  { "source": "created_date", "target": "created_at", "transform": "date:%Y-%m-%d" },
  { "source": "status", "target": "status", "transform": "default:unknown" }
]

Execution Engines
Nexion automatically routes your transformations to the optimal engine. Small datasets run instantly on DuckDB. Large datasets scale to Databricks or Spark.
Small jobs run locally on DuckDB with sub-second execution and zero extra cost, which is ideal for development and medium datasets. Larger jobs automatically route to your Spark cluster, where Databricks Photon provides 3x faster execution with warm pool support.
DuckDB
Default for datasets under 100M rows. Startup: <1s. Cost: $0. Max rows: 100M.

Databricks
Best for big data with Photon acceleration. Startup: <5s. Speed: 3x with Photon. Scale: petabytes.

EMR Serverless
Pay-per-second for sporadic workloads. Startup: ~2m. Cost: $0.16 per 50M rows. Scale: petabytes.
Built-in Transforms
The visual field mapping interface ships with the following built-in transforms. No code required.
Transform       Description            Example
upper           Convert to uppercase   "hello world" → "HELLO WORLD"
lower           Convert to lowercase   "HELLO WORLD" → "hello world"
strip           Remove whitespace      " text " → "text"
title           Title case             "hello world" → "Hello World"
multiply:N      Multiply by N          multiply:100: 100 → 10000
divide:N        Divide by N            divide:2: 100 → 50
to_string       Cast to string         123 → "123"
to_int          Cast to integer        "123" → 123
date:FORMAT     Parse date             date:%Y-%m-%d: "2024-01-15"
default:VALUE   Fill nulls             null → "unknown"
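To make the mini-language concrete, here is an illustrative sketch of how these transform specs could be interpreted over pandas columns; apply_transform is hypothetical, not Nexion's implementation:

import pandas as pd

def apply_transform(series: pd.Series, spec: str) -> pd.Series:
    # Hypothetical interpreter for the "name" or "name:arg" spec format.
    name, _, arg = spec.partition(":")
    if name == "upper":
        return series.str.upper()
    if name == "lower":
        return series.str.lower()
    if name == "strip":
        return series.str.strip()
    if name == "title":
        return series.str.title()
    if name == "multiply":
        return series * float(arg)
    if name == "divide":
        return series / float(arg)
    if name == "to_string":
        return series.astype(str)
    if name == "to_int":
        return series.astype(int)
    if name == "date":
        return pd.to_datetime(series, format=arg)
    if name == "default":
        return series.fillna(arg)
    raise ValueError(f"unknown transform: {spec}")

df = pd.DataFrame({"customer_email": [" Ada@Example.COM "]})
df["email"] = apply_transform(apply_transform(df["customer_email"], "strip"), "lower")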
Pipeline Integration
Transformations integrate seamlessly with the full pipeline execution flow:

Extraction → PII Protection → PHI Protection → Transform → Quality → Loading → Lineage

Extraction      Pull from sources
PII Protection  Mask sensitive data
PHI Protection  HIPAA compliance
Transform       SQL, Python, or field mapping
Quality         Validate data
Loading         Write to Delta Lake
Lineage         Track provenance
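For intuition only, a pipeline definition that slots a transform between the protection and quality stages might look like the sketch below; the schema, keys, and paths are illustrative, not Nexion's actual configuration format:

# Hypothetical pipeline definition; every key and value below is
# illustrative, not Nexion's actual configuration schema.
pipeline = {
    "extract":   {"source": "orders"},
    "protect":   {"pii": "mask", "phi": "hipaa"},
    "transform": {
        "type": "sql",
        "query": "SELECT id, LOWER(email) AS email FROM {{ source }}",
    },
    "quality":   {"checks": ["not_null:id", "unique:id"]},
    "load":      {"target": "delta://orders_clean"},
    "lineage":   {"enabled": True},
}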
Start building visual data pipelines with powerful transformations. Free trial includes all transform types.