What data engineering is, how it differs from related disciplines, career paths, essential skills, and New Zealand market demand.

Data Engineering

Data engineering is the discipline of designing, building, and maintaining the infrastructure and systems that enable organizations to collect, store, process, and analyze data at scale. Data engineers build the pipelines and platforms that transform raw data into reliable, accessible datasets for analysts, scientists, and business stakeholders.

What Data Engineers Do

At a high level, data engineers are responsible for:

Ingesting data from diverse sources (APIs, databases, event streams, files)
Transforming raw data into clean, structured formats
Orchestrating complex workflows that run reliably on schedule
Modeling data for efficient analytical querying
Ensuring data quality, lineage, and governance
Optimizing storage and compute for cost and performance

# A simplified view of a data engineer's daily work
pipeline_responsibilities = {
    "ingest": "Pull data from 50+ sources into a data lake",
    "transform": "Clean, deduplicate, and enrich raw records",
    "model": "Build dimensional models for the analytics team",
    "orchestrate": "Schedule and monitor DAGs in Airflow",
    "quality": "Run data validation checks before downstream use",
    "optimize": "Tune Spark jobs to cut costs by 40%",
}
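
Orchestration, as listed above, largely comes down to running tasks in dependency order. The following is a minimal sketch of that idea using only Python's standard-library `graphlib` (Python 3.9+), not Airflow itself. The task names and the dependency graph are assumptions chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "ingest": set(),
    "transform": {"ingest"},
    "quality": {"transform"},
    "model": {"quality"},
}


def execution_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dependencies)
print(order)
```

Real orchestrators add scheduling, retries, and monitoring on top of this ordering step, but the dependency graph is the core abstraction they share.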

Data Engineering vs Software Engineering

While both roles write production-quality code, the focus differs significantly:

Aspect	Software Engineering	Data Engineering
Primary output	Applications and services	Data pipelines and platforms
Data volume	Transactional (rows at a time)	Analytical (millions to billions of rows)
State management	Application state, sessions	Historical data, slowly changing dimensions
Testing focus	Unit tests, integration tests	Data quality tests, schema validation
Failure modes	Service outages, bugs	Silent data corruption, stale data
Key tools	React, Spring, Rails	Spark, Airflow, dbt, SQL

Data engineers often come from software engineering backgrounds and bring strong coding practices to data systems.
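
The "slowly changing dimensions" entry in the table above is worth a concrete illustration. One common approach, usually called Type 2, keeps history by closing the current row and appending a new one rather than overwriting in place. The sketch below assumes a hypothetical `CustomerRow` record and an in-memory list; a real warehouse would typically do this with a SQL `MERGE` or a tool-specific snapshot feature.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """Hypothetical dimension row; valid_to of None marks the current version."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None


def apply_scd2(history: List[CustomerRow], customer_id: int,
               new_city: str, as_of: date) -> List[CustomerRow]:
    """Return a new history with the change recorded as a Type 2 update."""
    updated = []
    changed = False
    found_current = False
    for row in history:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current:
            found_current = True
            if row.city != new_city:
                # Close the old version instead of overwriting it.
                updated.append(replace(row, valid_to=as_of))
                changed = True
                continue
        updated.append(row)
    # Append a new current row for a real change or a brand-new customer.
    if changed or not found_current:
        updated.append(CustomerRow(customer_id, new_city, valid_from=as_of))
    return updated


history = [CustomerRow(1, "Auckland", date(2023, 1, 1))]
history = apply_scd2(history, 1, "Wellington", date(2024, 6, 1))
for row in history:
    print(row)
```

After the update, the Auckland row is closed with an end date and the Wellington row becomes the current version, so queries can still reconstruct where the customer was at any point in time.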

Data Engineering vs Data Science

Data science and data engineering are complementary but distinct:

Aspect	Data Science	Data Engineering
Goal	Extract insights, build models	Build reliable data infrastructure
Key skills	Statistics, ML, experimentation	Distributed systems, SQL, pipeline design
Tools	Jupyter, scikit-learn, PyTorch	Spark, Airflow, Kafka, dbt
Output	Models, dashboards, recommendations	Clean datasets, APIs, pipelines
Data interaction	Reads and analyzes data	Builds systems that produce data

A common saying: "Data scientists are only as good as the data they receive." Data engineers ensure that data is timely, accurate, and accessible.
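
To make "timely and accurate" concrete, here is a minimal sketch of the kind of checks a pipeline might run before publishing a batch. The field names (`event_id`, `user_id`, `event_timestamp`), the 24-hour freshness threshold, and the `check_batch` function are all assumptions for illustration; dedicated data quality tools offer far richer checks.

```python
from datetime import datetime, timedelta


def check_batch(rows, now, max_age=timedelta(hours=24)):
    """Return a list of human-readable problems found in a batch of records."""
    if not rows:
        return ["batch is empty"]
    problems = []
    # Accuracy: required fields must be present.
    missing = sum(1 for r in rows if r.get("user_id") is None)
    if missing:
        problems.append(f"{missing} row(s) missing user_id")
    # Uniqueness: event_id is assumed to be the primary key.
    ids = [r["event_id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate event_id values")
    # Timeliness: the newest record must be recent enough.
    newest = max(r["event_timestamp"] for r in rows)
    if now - newest > max_age:
        problems.append(f"data is stale: newest record is from {newest.isoformat()}")
    return problems


now = datetime(2025, 1, 2, 12, 0)
batch = [
    {"event_id": 1, "user_id": 10, "event_timestamp": datetime(2025, 1, 1, 9, 0)},
    {"event_id": 1, "user_id": None, "event_timestamp": datetime(2025, 1, 1, 10, 0)},
]
for problem in check_batch(batch, now):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a pipeline report every issue in a single run and decide separately whether to block downstream consumers.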

The Modern Data Stack

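The modern data stack has evolved rapidly. A typical architecture looks like the following, where each column lists common options for one stage and the rows are not fixed pairings: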

Sources          Ingestion       Storage         Transform       Serve
─────────        ──────────      ──────────      ──────────      ──────────
APIs         →   Fivetran    →   Snowflake   →   dbt         →   Looker
Databases    →   Airbyte     →   BigQuery    →   Spark       →   Metabase
Event Streams→   Kafka       →   Databricks  →   Python      →   API Layer
Files        →   Custom      →   Delta Lake  →   SQL         →   Notebooks

Essential Skills for Data Engineers

Technical Skills

SQL - The foundation of data engineering. Master window functions, CTEs, and query optimization.

-- Example: daily active users and event totals for the last 30 days
-- (PostgreSQL-style syntax; date functions vary between warehouses)
WITH daily_metrics AS (
    SELECT
        date_trunc('day', event_timestamp) AS event_date,
        user_id,
        COUNT(*) AS event_count,
        COUNT(DISTINCT session_id) AS session_count
    FROM raw_events
    WHERE event_timestamp >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY 1, 2
)
SELECT
    event_date,
    COUNT(DISTINCT user_id) AS active_users,
    SUM(event_count) AS total_events,
    AVG(session_count) AS avg_sessions_per_user
FROM daily_metrics
GROUP BY 1
ORDER BY 1;
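
The query above shows a CTE with aggregation. Window functions, also mentioned above, are often used for a different job: picking the latest row per key. The sketch below runs such a query against an in-memory SQLite database from Python so it can be tried without a warehouse. It reuses the `raw_events` table name from the example, the sample rows are made up, and it assumes a SQLite build of 3.25 or later (required for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events (user_id INTEGER, session_id TEXT, event_timestamp TEXT)"
)
# Made-up sample rows for illustration.
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        (1, "a", "2025-01-01 09:00:00"),
        (1, "b", "2025-01-02 10:00:00"),
        (2, "c", "2025-01-01 11:00:00"),
    ],
)

# ROW_NUMBER() ranks each user's events from newest to oldest;
# keeping rank 1 leaves one "latest event" row per user.
latest_per_user = conn.execute(
    """
    WITH ranked AS (
        SELECT
            user_id,
            session_id,
            event_timestamp,
            ROW_NUMBER() OVER (
                PARTITION BY user_id
                ORDER BY event_timestamp DESC
            ) AS rn
        FROM raw_events
    )
    SELECT user_id, session_id, event_timestamp
    FROM ranked
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()

print(latest_per_user)
```

The same `ROW_NUMBER() ... WHERE rn = 1` pattern is a common way to deduplicate records when a source delivers the same key more than once.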

Python - For pipeline logic, API integrations, and Spark jobs.
Distributed Systems - Understanding partitioning, replication, consistency.
Cloud Platforms - AWS (S3, Glue, Redshift), GCP (BigQuery, Dataflow), Azure (Synapse, Data Factory).
Orchestration - Airflow, Dagster, or Prefect for workflow management.
Streaming - Kafka, Kinesis, or Flink for real-time data.
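
Partitioning, from the distributed systems item above, can be sketched in a few lines. The example below assigns records to partitions by hashing a key, which is one common strategy (range partitioning is another). The function names, the `user_id` key, and the choice of SHA-256 are assumptions for illustration, not how any particular engine implements it.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash."""
    # Python's built-in hash() is salted per process for strings, so use a
    # deterministic digest to get the same partition across runs and machines.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_records(records, key_field, num_partitions):
    """Group records by partition so each group could be processed independently."""
    partitions = defaultdict(list)
    for record in records:
        index = partition_for(str(record[key_field]), num_partitions)
        partitions[index].append(record)
    return dict(partitions)


# Made-up events: 20 records spread across 5 users.
events = [{"user_id": f"user-{i % 5}", "value": i} for i in range(20)]
for partition, rows in sorted(partition_records(events, "user_id", 4).items()):
    print(partition, sorted({r["user_id"] for r in rows}))
```

Because the partition depends only on the key, all records for one user land together, which is what lets per-user aggregations run without moving data between workers. The trade-off is that a very active key can overload a single partition.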

Soft Skills

Communication - Translating business requirements into technical designs
Problem decomposition - Breaking complex data flows into manageable steps
Operational mindset - Thinking about monitoring, alerting, and failure recovery

Career Path

A typical data engineering career progression:

Junior Data Engineer (0-2 years)
  → Write ETL scripts, maintain existing pipelines
  → Learn SQL deeply, understand data modeling basics

Data Engineer (2-5 years)
  → Design and build pipelines end-to-end
  → Optimize Spark jobs, implement data quality frameworks

Senior Data Engineer (5-8 years)
  → Architect data platforms, mentor juniors
  → Drive adoption of best practices, evaluate new tools

Staff / Principal Data Engineer (8+ years)
  → Set technical direction for the data org
  → Solve cross-team data challenges at scale

Engineering Manager / Data Architect
  → Lead teams, define data strategy
  → Bridge business and technical stakeholders

NZ Market Demand

Data engineering is one of the fastest-growing tech roles in New Zealand:

High demand: Companies across banking (ANZ, ASB, Westpac), telco (Spark, One NZ), government (Stats NZ, ACC), and tech (Xero, Trade Me) actively recruit data engineers.
Salary range: NZD $100,000 - $180,000+ for intermediate to senior roles in Auckland and Wellington (as of 2025).
Skills in demand: SQL, Python, Spark, cloud platforms (AWS/GCP/Azure), Airflow, dbt, and Snowflake/Databricks appear frequently in NZ job listings.
Remote-friendly: Many NZ organizations now offer hybrid or fully remote data engineering positions.
Growing ecosystem: The NZ data community is active with meetups in Auckland, Wellington, and Christchurch, and conferences like Data Engineering NZ.

Getting Started in NZ

If you are entering data engineering in New Zealand:

Build a portfolio - Create public data pipelines on GitHub using open datasets (Stats NZ, data.govt.nz).
Get cloud certified - The AWS Certified Data Engineer - Associate (which replaced the retired Data Analytics Specialty) and Google Cloud Professional Data Engineer certifications are valued.
Learn the local stack - Snowflake and dbt are particularly popular among NZ companies.
Network - Attend local data meetups and engage with the NZ tech community on LinkedIn.

What's Next

The following articles dive deep into core data engineering topics:

ETL & ELT - Data integration paradigms and tools
Data Modeling - Dimensional modeling, data vault, and schema design
Orchestration - Workflow management with Airflow, Dagster, and dbt
Apache Spark - Distributed data processing at scale
Lakehouse Architecture - Modern unified data platforms
Stream Processing - Event time, windowing, watermarks, and exactly-once with Kafka and Flink
Data Quality - Quality dimensions, testing, anomaly detection, SLAs, and alerting
Data Contracts - Schema management, evolution, compatibility, and CI enforcement
DataOps & Observability - CI/CD, lineage, monitoring, and incident response for data

Each topic builds on the fundamentals covered here and includes hands-on code examples you can adapt for your own projects.
