Steven's Knowledge

Data Engineering

What data engineering is, how it differs from related disciplines, career paths, essential skills, and New Zealand market demand.

Data engineering is the discipline of designing, building, and maintaining the infrastructure and systems that enable organizations to collect, store, process, and analyze data at scale. Data engineers build the pipelines and platforms that transform raw data into reliable, accessible datasets for analysts, scientists, and business stakeholders.

What Data Engineers Do

At a high level, data engineers are responsible for:

  • Ingesting data from diverse sources (APIs, databases, event streams, files)
  • Transforming raw data into clean, structured formats
  • Orchestrating complex workflows that run reliably on schedule
  • Modeling data for efficient analytical querying
  • Ensuring data quality, lineage, and governance
  • Optimizing storage and compute for cost and performance

# A simplified view of a data engineer's daily work
pipeline_responsibilities = {
    "ingest":      "Pull data from 50+ sources into a data lake",
    "transform":   "Clean, deduplicate, and enrich raw records",
    "model":       "Build dimensional models for the analytics team",
    "orchestrate": "Schedule and monitor DAGs in Airflow",
    "quality":     "Run data validation checks before downstream use",
    "optimize":    "Tune Spark jobs to cut costs by 40%",
}
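
The responsibilities above can be sketched as a tiny pipeline whose steps run in dependency order. This is a minimal illustration, not how Airflow itself works: the task names, the `RAW_ROWS` sample, and the `run_pipeline` helper are all hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real orchestrator.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory "source"; a real pipeline would pull from APIs or databases.
RAW_ROWS = [
    {"id": 1, "email": " Ana@Example.com "},
    {"id": 1, "email": " Ana@Example.com "},  # duplicate record
    {"id": 2, "email": "bo@example.com"},
    {"id": 3, "email": None},                 # will fail the quality check
]

def ingest(state):
    # Ingest: copy the raw rows so later steps never mutate the source.
    return [dict(row) for row in RAW_ROWS]

def transform(state):
    # Transform: deduplicate on id and normalize the email column.
    seen, clean = set(), []
    for row in state["ingest"]:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        email = row["email"].strip().lower() if row["email"] else None
        clean.append({"id": row["id"], "email": email})
    return clean

def quality(state):
    # Quality: keep only rows that are safe for downstream use.
    return [row for row in state["transform"] if row["email"] is not None]

# Task name -> (callable, names of upstream tasks it depends on)
TASKS = {
    "ingest": (ingest, set()),
    "transform": (transform, {"ingest"}),
    "quality": (quality, {"transform"}),
}

def run_pipeline():
    # Orchestrate: run each task only after its upstream tasks have finished.
    graph = {name: deps for name, (_, deps) in TASKS.items()}
    state = {}
    for name in TopologicalSorter(graph).static_order():
        state[name] = TASKS[name][0](state)
    return state

result = run_pipeline()
print(result["quality"])
```

Declaring dependencies separately from the task bodies is the same idea an orchestrator's DAG expresses: the scheduler, not the task, decides execution order.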

Data Engineering vs Software Engineering

While both roles write production-quality code, the focus differs significantly:

Aspect            Software Engineering            Data Engineering
────────────────  ──────────────────────────────  ───────────────────────────────────────────
Primary output    Applications and services       Data pipelines and platforms
Data volume       Transactional (rows at a time)  Analytical (millions to billions of rows)
State management  Application state, sessions     Historical data, slowly changing dimensions
Testing focus     Unit tests, integration tests   Data quality tests, schema validation
Failure modes     Service outages, bugs           Silent data corruption, stale data
Key tools         React, Spring, Rails            Spark, Airflow, dbt, SQL

Data engineers often come from software engineering backgrounds and bring strong coding practices to data systems.
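
To make the "data quality tests, schema validation" and "silent data corruption, stale data" rows concrete, here is a minimal sketch of two checks a pipeline might run before publishing a table. The schema, column names, and 24-hour freshness threshold are assumptions chosen for illustration; dedicated data quality tools cover far more cases.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expected schema for an orders table.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "loaded_at": datetime}

def check_schema(rows, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema violations."""
    problems = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                problems.append(f"row {i}: {column} is not {expected_type.__name__}")
    return problems

def check_freshness(rows, now, max_age=timedelta(hours=24)):
    """Flag stale data: the newest row must be younger than max_age."""
    if not rows:
        return ["no rows loaded"]
    newest = max(row["loaded_at"] for row in rows)
    if now - newest > max_age:
        return [f"stale data: newest row loaded at {newest.isoformat()}"]
    return []

# A fixed "now" keeps the example deterministic.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "amount": 19.99, "loaded_at": now - timedelta(hours=30)},
    {"order_id": 2, "amount": "5.00", "loaded_at": now - timedelta(hours=26)},  # amount is a string
]

schema_problems = check_schema(rows)
freshness_problems = check_freshness(rows, now)
print(schema_problems)
print(freshness_problems)
```

Neither problem would raise an exception in application code, which is why these failures are called silent: the job succeeds, but the data is wrong or out of date.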

Data Engineering vs Data Science

Data science and data engineering are complementary but distinct:

Aspect            Data Science                         Data Engineering
────────────────  ───────────────────────────────────  ─────────────────────────────────────────
Goal              Extract insights, build models       Build reliable data infrastructure
Key skills        Statistics, ML, experimentation      Distributed systems, SQL, pipeline design
Tools             Jupyter, scikit-learn, PyTorch       Spark, Airflow, Kafka, dbt
Output            Models, dashboards, recommendations  Clean datasets, APIs, pipelines
Data interaction  Reads and analyzes data              Builds systems that produce data

A common saying: "Data scientists are only as good as the data they receive." Data engineers ensure that data is timely, accurate, and accessible.

The Modern Data Stack

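The modern data stack has evolved rapidly. A typical architecture moves data through five stages; each column below lists common choices for that stage, and the rows do not imply fixed pairings: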

Sources          Ingestion       Storage         Transform       Serve
─────────        ──────────      ──────────      ──────────      ──────────
APIs         →   Fivetran    →   Snowflake   →   dbt         →   Looker
Databases    →   Airbyte     →   BigQuery    →   Spark       →   Metabase
Event Streams→   Kafka       →   Databricks  →   Python      →   API Layer
Files        →   Custom      →   Delta Lake  →   SQL         →   Notebooks

Essential Skills for Data Engineers

Technical Skills

  1. SQL - The foundation of data engineering. Master window functions, CTEs, and query optimization.

-- Example: Common analytical SQL pattern
WITH daily_metrics AS (
    SELECT
        date_trunc('day', event_timestamp) AS event_date,
        user_id,
        COUNT(*) AS event_count,
        COUNT(DISTINCT session_id) AS session_count
    FROM raw_events
    WHERE event_timestamp >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY 1, 2
)
SELECT
    event_date,
    COUNT(DISTINCT user_id) AS active_users,
    SUM(event_count) AS total_events,
    AVG(session_count) AS avg_sessions_per_user
FROM daily_metrics
GROUP BY 1
ORDER BY 1;

  2. Python - For pipeline logic, API integrations, and Spark jobs.
  3. Distributed Systems - Understanding partitioning, replication, consistency.
  4. Cloud Platforms - AWS (S3, Glue, Redshift), GCP (BigQuery, Dataflow), Azure (Synapse, Data Factory).
  5. Orchestration - Airflow, Dagster, or Prefect for workflow management.
  6. Streaming - Kafka, Kinesis, or Flink for real-time data.
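
The SQL item above mentions window functions, which the CTE example does not use. The sketch below shows two common ones, a rolling average and `LAG`, run through Python's built-in `sqlite3` module so it can be tried without a warehouse. The `daily_metrics` table and its values are made up for illustration, and window functions require SQLite 3.25 or newer, which recent Python builds typically bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_metrics (event_date TEXT, active_users INTEGER)")
conn.executemany(
    "INSERT INTO daily_metrics VALUES (?, ?)",
    # Made-up sample values, one row per day.
    [("2025-01-01", 10), ("2025-01-02", 20), ("2025-01-03", 30), ("2025-01-04", 40)],
)

query = """
SELECT
    event_date,
    active_users,
    -- Rolling average over the current row and the two rows before it
    AVG(active_users) OVER (
        ORDER BY event_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_3d_avg,
    -- Change versus the previous day; NULL on the first row
    active_users - LAG(active_users) OVER (ORDER BY event_date) AS day_over_day
FROM daily_metrics
ORDER BY event_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Unlike `GROUP BY`, a window function keeps one output row per input row, which is what makes running totals and period-over-period comparisons straightforward.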

Soft Skills

  • Communication - Translating business requirements into technical designs
  • Problem decomposition - Breaking complex data flows into manageable steps
  • Operational mindset - Thinking about monitoring, alerting, and failure recovery

Career Path

A typical data engineering career progression:

Junior Data Engineer (0-2 years)
  → Write ETL scripts, maintain existing pipelines
  → Learn SQL deeply, understand data modeling basics

Data Engineer (2-5 years)
  → Design and build pipelines end-to-end
  → Optimize Spark jobs, implement data quality frameworks

Senior Data Engineer (5-8 years)
  → Architect data platforms, mentor juniors
  → Drive adoption of best practices, evaluate new tools

Staff / Principal Data Engineer (8+ years)
  → Set technical direction for the data org
  → Solve cross-team data challenges at scale

Engineering Manager / Data Architect
  → Lead teams, define data strategy
  → Bridge business and technical stakeholders

NZ Market Demand

Data engineering is one of the fastest-growing tech roles in New Zealand:

  • High demand: Companies across banking (ANZ, ASB, Westpac), telco (Spark NZ, One NZ), government (Stats NZ, ACC), and tech (Xero, Trade Me) actively recruit data engineers.
  • Salary range: NZD $100,000 - $180,000+ for intermediate to senior roles in Auckland and Wellington (as of 2025).
  • Skills in demand: SQL, Python, Spark, cloud platforms (AWS/GCP/Azure), Airflow, dbt, and Snowflake/Databricks appear frequently in NZ job listings.
  • Remote-friendly: Many NZ organizations now offer hybrid or fully remote data engineering positions.
  • Growing ecosystem: The NZ data community is active with meetups in Auckland, Wellington, and Christchurch, and conferences like Data Engineering NZ.

Getting Started in NZ

If you are entering data engineering in New Zealand:

  1. Build a portfolio - Create public data pipelines on GitHub using open datasets (Stats NZ, data.govt.nz).
  2. Get cloud certified - The AWS Certified Data Engineer - Associate (which replaced the retired Data Analytics - Specialty exam) and Google Cloud Professional Data Engineer certifications are valued.
  3. Learn the local stack - Snowflake and dbt are particularly popular among NZ companies.
  4. Network - Attend local data meetups and engage with the NZ tech community on LinkedIn.
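
As a starting point for the portfolio step above, a first project can be as small as parsing an open-data CSV, rejecting bad rows, and producing one aggregate. The sketch below uses only the standard library. The `SAMPLE_CSV` content, column names, and region labels are invented placeholders, not real Stats NZ or data.govt.nz data; swap in a file you have downloaded yourself.

```python
import csv
import io
from collections import defaultdict

# Invented sample standing in for a downloaded open-data CSV file.
SAMPLE_CSV = """region,year,value
Region A,2023,100
Region A,2024,120
Region B,2023,80
Region B,2024,not available
"""

def load_rows(text):
    """Parse CSV text, casting types and setting aside rows that fail to parse."""
    rows, rejected = [], []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({
                "region": record["region"],
                "year": int(record["year"]),
                "value": float(record["value"]),
            })
        except ValueError:
            # Keep rejected rows so they can be reported rather than silently dropped.
            rejected.append(record)
    return rows, rejected

def total_by_region(rows):
    """Sum the value column per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["value"]
    return dict(totals)

rows, rejected = load_rows(SAMPLE_CSV)
totals = total_by_region(rows)
print(totals)
print(f"rejected {len(rejected)} row(s)")
```

Reporting the rejected rows alongside the result is a small habit that shows reviewers you think about data quality, not just the happy path.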

What's Next

The following articles dive deep into core data engineering topics:

  • ETL & ELT - Data integration paradigms and tools
  • Data Modeling - Dimensional modeling, data vault, and schema design
  • Orchestration - Workflow management with Airflow, Dagster, and Prefect
  • Apache Spark - Distributed data processing at scale
  • Lakehouse Architecture - Modern unified data platforms

Each topic builds on the fundamentals covered here and includes hands-on code examples you can adapt for your own projects.
