Data Engineering
What data engineering is, how it differs from related disciplines, career paths, essential skills, and New Zealand market demand.
Data engineering is the discipline of designing, building, and maintaining the infrastructure and systems that enable organizations to collect, store, process, and analyze data at scale. Data engineers build the pipelines and platforms that transform raw data into reliable, accessible datasets for analysts, scientists, and business stakeholders.
What Data Engineers Do
At a high level, data engineers are responsible for:
- Ingesting data from diverse sources (APIs, databases, event streams, files)
- Transforming raw data into clean, structured formats
- Orchestrating complex workflows that run reliably on schedule
- Modeling data for efficient analytical querying
- Ensuring data quality, lineage, and governance
- Optimizing storage and compute for cost and performance
# A simplified view of a data engineer's daily work
pipeline_responsibilities = {
    "ingest": "Pull data from 50+ sources into a data lake",
    "transform": "Clean, deduplicate, and enrich raw records",
    "model": "Build dimensional models for the analytics team",
    "orchestrate": "Schedule and monitor DAGs in Airflow",
    "quality": "Run data validation checks before downstream use",
    "optimize": "Tune Spark jobs to cut costs by 40%",
}
Data Engineering vs Software Engineering
While both roles write production-quality code, the focus differs significantly:
| Aspect | Software Engineering | Data Engineering |
|---|---|---|
| Primary output | Applications and services | Data pipelines and platforms |
| Data volume | Transactional (rows at a time) | Analytical (millions to billions of rows) |
| State management | Application state, sessions | Historical data, slowly changing dimensions |
| Testing focus | Unit tests, integration tests | Data quality tests, schema validation |
| Failure modes | Service outages, bugs | Silent data corruption, stale data |
| Key tools | React, Spring, Rails | Spark, Airflow, dbt, SQL |
Data engineers often come from software engineering backgrounds and bring strong coding practices to data systems.
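The "data quality tests, schema validation" row in the table above can be made concrete with a small sketch. This is illustrative only: `EXPECTED_SCHEMA`, `CheckResult`, and `validate_batch` are hypothetical names, and a real team would more likely use a dedicated testing tool. It checks a batch of records for the expected columns, null or mistyped values, and duplicate keys, which are the kinds of problems that corrupt data silently rather than crashing a job.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}


def validate_batch(rows, schema=EXPECTED_SCHEMA, key="order_id"):
    """Run schema, type/null, and uniqueness checks on a batch of dict records."""
    results = []

    # Schema check: every row has exactly the expected columns.
    bad_columns = [i for i, row in enumerate(rows) if set(row) != set(schema)]
    results.append(
        CheckResult("schema", not bad_columns, f"rows with wrong columns: {bad_columns}")
    )

    # Type/null check: every expected value is present and has the expected type.
    bad_values = [
        i
        for i, row in enumerate(rows)
        if any(not isinstance(row.get(col), typ) for col, typ in schema.items())
    ]
    results.append(
        CheckResult("types", not bad_values, f"rows with null or mistyped values: {bad_values}")
    )

    # Uniqueness check: the key column must not repeat within the batch.
    keys = [row.get(key) for row in rows]
    duplicates = sorted({k for k in keys if k is not None and keys.count(k) > 1})
    results.append(
        CheckResult("unique_key", not duplicates, f"duplicate keys: {duplicates}")
    )

    return results
```

A pipeline could call `validate_batch` before publishing a batch and stop, or alert, when any result has `passed` set to `False`, so that bad data never reaches downstream users.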
Data Engineering vs Data Science
Data science and data engineering are complementary but distinct:
| Aspect | Data Science | Data Engineering |
|---|---|---|
| Goal | Extract insights, build models | Build reliable data infrastructure |
| Key skills | Statistics, ML, experimentation | Distributed systems, SQL, pipeline design |
| Tools | Jupyter, scikit-learn, PyTorch | Spark, Airflow, Kafka, dbt |
| Output | Models, dashboards, recommendations | Clean datasets, APIs, pipelines |
| Data interaction | Reads and analyzes data | Builds systems that produce data |
A common saying: "Data scientists are only as good as the data they receive." Data engineers ensure that data is timely, accurate, and accessible.
The Modern Data Stack
The modern data stack has evolved rapidly. Here is a typical architecture:
Sources         Ingestion   Storage      Transform   Serve
─────────────   ─────────   ──────────   ─────────   ─────────
APIs          → Fivetran  → Snowflake  → dbt       → Looker
Databases     → Airbyte   → BigQuery   → Spark     → Metabase
Event Streams → Kafka     → Databricks → Python    → API Layer
Files         → Custom    → Delta Lake → SQL       → Notebooks
Essential Skills for Data Engineers
Technical Skills
- SQL - The foundation of data engineering. Master window functions, CTEs, and query optimization.
-- Example: Common analytical SQL pattern
WITH daily_metrics AS (
    SELECT
        date_trunc('day', event_timestamp) AS event_date,
        user_id,
        COUNT(*) AS event_count,
        COUNT(DISTINCT session_id) AS session_count
    FROM raw_events
    WHERE event_timestamp >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY 1, 2
)
SELECT
    event_date,
    COUNT(DISTINCT user_id) AS active_users,
    SUM(event_count) AS total_events,
    AVG(session_count) AS avg_sessions_per_user
FROM daily_metrics
GROUP BY 1
ORDER BY 1;
- Python - For pipeline logic, API integrations, and Spark jobs.
- Distributed Systems - Understanding partitioning, replication, consistency.
- Cloud Platforms - AWS (S3, Glue, Redshift), GCP (BigQuery, Dataflow), Azure (Synapse, Data Factory).
- Orchestration - Airflow, Dagster, or Prefect for workflow management.
- Streaming - Kafka, Kinesis, or Flink for real-time data.
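To illustrate the orchestration item in the list above: at its core, an orchestrator runs tasks in dependency order. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to show that idea. It is not how Airflow, Dagster, or Prefect are implemented, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "clean_orders": {"ingest_orders"},
    "build_order_facts": {"clean_orders", "ingest_customers"},
    "quality_checks": {"build_order_facts"},
}


def run_pipeline(dependencies, tasks):
    """Run each task once, only after all of its upstream tasks have finished.

    `tasks` maps a task name to a zero-argument callable.
    Returns the task names in the order they were executed.
    """
    completed = []
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        completed.append(name)
    return completed
```

Calling `run_pipeline(DEPENDENCIES, tasks)` with one callable per task name guarantees that `clean_orders` runs after `ingest_orders`. A real orchestrator adds scheduling, retries, parallelism, and monitoring on top of this ordering.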
Soft Skills
- Communication - Translating business requirements into technical designs
- Problem decomposition - Breaking complex data flows into manageable steps
- Operational mindset - Thinking about monitoring, alerting, and failure recovery
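The operational mindset described above often shows up in code as explicit failure handling. The following is a minimal sketch, assuming a task that can fail transiently (for example, a source API timing out): it retries with exponential backoff and reports every failure through a callback. The `run_with_retries` helper and its parameters are hypothetical, and in practice an orchestrator usually provides this behaviour.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0,
                     sleep=time.sleep, on_failure=print):
    """Call task() and retry on any exception with exponential backoff.

    Waits base_delay * 2 ** (attempt - 1) seconds between attempts.
    Every failure is reported through on_failure (a stand-in for alerting).
    The last exception is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            on_failure(f"attempt {attempt}/{max_attempts} failed: {exc!r}")
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` and `on_failure` as parameters is a deliberate choice: tests can run the retry logic without actually waiting, and the print-based reporting can be swapped for a real alerting hook.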
Career Path
A typical data engineering career progression:
Junior Data Engineer (0-2 years)
→ Write ETL scripts, maintain existing pipelines
→ Learn SQL deeply, understand data modeling basics
Data Engineer (2-5 years)
→ Design and build pipelines end-to-end
→ Optimize Spark jobs, implement data quality frameworks
Senior Data Engineer (5-8 years)
→ Architect data platforms, mentor juniors
→ Drive adoption of best practices, evaluate new tools
Staff / Principal Data Engineer (8+ years)
→ Set technical direction for the data org
→ Solve cross-team data challenges at scale
Engineering Manager / Data Architect
→ Lead teams, define data strategy
→ Bridge business and technical stakeholders
NZ Market Demand
Data engineering is one of the fastest-growing tech roles in New Zealand:
- High demand: Companies across banking (ANZ, ASB, Westpac), telco (Spark, One NZ), government (Stats NZ, ACC), and tech (Xero, Trade Me) actively recruit data engineers.
- Salary range: NZD $100,000 - $180,000+ for intermediate to senior roles in Auckland and Wellington (as of 2025).
- Skills in demand: SQL, Python, Spark, cloud platforms (AWS/GCP/Azure), Airflow, dbt, and Snowflake/Databricks appear frequently in NZ job listings.
- Remote-friendly: Many NZ organizations now offer hybrid or fully remote data engineering positions.
- Growing ecosystem: The NZ data community is active with meetups in Auckland, Wellington, and Christchurch, and conferences like Data Engineering NZ.
Getting Started in NZ
If you are entering data engineering in New Zealand:
- Build a portfolio - Create public data pipelines on GitHub using open datasets (Stats NZ, data.govt.nz).
- Get cloud certified - AWS Certified Data Engineer - Associate (which replaced the Data Analytics Specialty, retired in 2024) or Google Cloud Professional Data Engineer certifications are valued.
- Learn the local stack - Snowflake and dbt are particularly popular among NZ companies.
- Network - Attend local data meetups and engage with the NZ tech community on LinkedIn.
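As a starting point for the portfolio suggestion above, here is a minimal extract-transform-load sketch using only the Python standard library. The CSV content is made-up placeholder data, not real Stats NZ figures, and the function and table names are hypothetical. A real project would download an actual open dataset and add tests and scheduling.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a downloaded open-data CSV file.
SAMPLE_CSV = """region,year,population
Auckland,2023,100
Wellington,2023,50
Auckland,2023,100
Canterbury,2023,
"""


def extract(text):
    """Parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Drop rows with a missing population, remove exact duplicates, cast types."""
    seen, clean = set(), []
    for rec in records:
        if not rec["population"]:
            continue  # skip incomplete rows rather than loading nulls
        row = (rec["region"], int(rec["year"]), int(rec["population"]))
        if row not in seen:
            seen.add(row)
            clean.append(row)
    return clean


def load(rows, conn):
    """Write cleaned rows to a SQLite table and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS population "
        "(region TEXT, year INTEGER, population INTEGER)"
    )
    conn.executemany("INSERT INTO population VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM population").fetchone()[0]
```

Running `load(transform(extract(SAMPLE_CSV)), sqlite3.connect(":memory:"))` exercises the whole flow in memory. Keeping the three steps as separate functions makes each one easy to test and to replace later, for example by swapping SQLite for a cloud warehouse.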
What's Next
The following articles dive deep into core data engineering topics:
- ETL & ELT - Data integration paradigms and tools
- Data Modeling - Dimensional modeling, data vault, and schema design
- Orchestration - Workflow management with Airflow, Dagster, and dbt
- Apache Spark - Distributed data processing at scale
- Lakehouse Architecture - Modern unified data platforms
Each topic builds on the fundamentals covered here and includes hands-on code examples you can adapt for your own projects.