Senior Software Engineer — ML Data Platform

DuckDuckGoose

Software Engineering, Data Science
South Holland, Netherlands
Posted on Aug 16, 2025

Location: Delft (hybrid)

Type: Full-time

Start: ASAP

We protect citizens, enterprises, and governments from synthetic media fraud. Everything you see and hear online can now be manipulated; our job is to make sure people can trust what they see and hear. As part of our forensics platform team, you’ll work on the data backbone that makes large-scale detection possible, from ingestion and versioning to training, evaluation, and production.

You’ll join a small, senior team where your work will have immediate impact, and you’ll have ownership over the systems you build.

What You’ll Drive
  • Data platform architecture: Define unified schemas, lineage, and dataset versioning for large image/video + context data.
  • Ingestion at scale: Build reliable pipelines from research repos, APIs, and internal generators; automate connectors and jobs.
  • Quality & governance: Implement deduplication, validation, health dashboards, and drift/coverage checks with auditable lineage.
  • Curation & access: Deliver one-command dataset builds, deterministic splits, and fast sampling tools for training/eval (a deterministic-split sketch follows this list).
  • Performance & cost: Tune S3/object storage layouts, partitioning, and lifecycle policies for speed and spend.
  • Orchestration & ops: Productionize pipelines with CI/CD, containerization, scheduling/monitoring, and safe rollbacks.
  • Reliability & operations: Build for simplicity and observability; participate in a planned, compensated support rotation.
  • Engineering productivity: Create internal tools/CLIs, docs, and templates that make everyone faster.
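
To give a concrete flavor of what “deterministic splits” means here (an illustrative sketch only, not our internal code): one common approach is to assign each item to train/val/test by hashing a stable item ID, so every rerun and every machine produces the same split. All names below are hypothetical.

```python
import hashlib

def assign_split(item_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically assign an item to a train/val/test split.

    Hashing a stable ID (instead of random sampling) means the same
    item lands in the same split on every machine and every rerun.
    """
    # Map the ID to a roughly uniform number in [0, 1) via SHA-256.
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64

    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"

# Same input, same split, every time (hypothetical item ID):
assert assign_split("video_000123.mp4") == assign_split("video_000123.mp4")
```
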
Must-haves
  • Strong software engineering foundation: Master’s in Computer Science, Data Engineering, or a related field.
  • Production experience: 5–8+ years building and operating data platforms for large unstructured datasets (images/video).
  • Data lifecycle ownership: Ingest → validate → catalog → version → sample/serve → monitor.
  • Pipelines & orchestration: Experience with modern schedulers (e.g., Airflow/Prefect) and containerized jobs.
  • Storage & formats: Hands-on with object storage (e.g., S3), columnar formats/partitioning, and performance tuning.
  • Versioning & lineage: Experience with dataset versioning and reproducibility (e.g., DVC/lakeFS/Delta or equivalents).
  • Quality at scale: Deduplication, schema/label checks, and automated QC gates in CI (a QC-gate sketch follows this list).
  • Security & privacy: IAM, access controls, and privacy-aware workflows suitable for regulated customers.
  • Domain awareness: Familiarity with digital forensics, misinformation threats, or synthetic media — and willingness to deepen expertise.
  • Flexibility: Comfortable moving between data engineering, infra, and tooling tasks when needed.
  • Mindset & delivery: Thrive in a fast-moving environment; proactive problem-solver; ship, measure, simplify.
  • Communication: Excellent written and verbal skills; explain complex ideas clearly.
  • Independence: Deliver quality work on time without constant oversight.
  • Language: Fluent in English.
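
As an illustrative sketch of an “automated QC gate in CI” (hypothetical file layout and manifest schema, not our actual pipeline): such a gate is typically a script that exits nonzero when a check fails, which is enough to fail the CI job. Here, exact-duplicate media files are caught by content hash and records are validated against required fields.

```python
import hashlib
import json
import sys
from pathlib import Path

REQUIRED_FIELDS = {"item_id", "path", "label"}  # hypothetical manifest schema

def file_sha256(path: Path) -> str:
    """Content hash used to catch exact-duplicate media files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def main(manifest_path: str) -> int:
    lines = Path(manifest_path).read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    errors = []

    # Schema/label check: every record must carry the required fields.
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            errors.append(f"record {i}: missing fields {sorted(missing)}")

    # Dedup check: flag records whose file content is byte-identical.
    seen = {}
    for rec in records:
        path = Path(rec.get("path", ""))
        if not path.is_file():
            continue
        digest = file_sha256(path)
        if digest in seen:
            errors.append(f"{path} duplicates {seen[digest]}")
        else:
            seen[digest] = path

    for err in errors:
        print(f"QC FAIL: {err}", file=sys.stderr)
    return 1 if errors else 0  # nonzero exit fails the CI job

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```
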
Nice-to-haves
  • Streaming & events: Kafka/Kinesis or similar for near-real-time ingestion.
  • Vector search: Experience with embedding stores or similarity search at scale.
  • Synthetic data: Building pipelines to generate/stress-test rare scenarios.
  • Cloud & on-prem: Terraform/CDK, Kubernetes, and hybrid/on-prem data deployments.
  • FinOps: Cost monitoring and optimization for data workloads.
  • Technical track record: Strong GitHub profile, open-source contributions, publications, patents, or public talks.
  • Leadership: Mentoring and guiding technical direction.
  • Dutch language: Fluency is a plus.
Key Deliverables (First 90 Days)
  • A unified schema + catalog with key datasets onboarded, versioned, and reproducibly built via one command (a minimal build sketch follows this list).
  • Automated QC gates (dedup/validation) with a red/amber/green dataset health dashboard and clear lineage.
  • Fast sampling/curation tools for the ML team, plus cost controls (storage layouts, lifecycle policies) in place.
  • Data migration: Inventory and migrate existing/legacy datasets into the new platform; reformat to the new schema, backfill metadata, validate checksums/lineage, and deprecate legacy paths with a rollback plan.
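
To make “reproducibly built via one command” concrete (again an illustrative sketch; the tool, names, and flags below are hypothetical): the deliverable usually reduces to a single entry point that pins a dataset version and writes a manifest recording exactly what was built.

```python
import argparse
import json
from datetime import datetime, timezone
from pathlib import Path

def build_dataset(name: str, version: str, out_dir: Path) -> None:
    """Materialize a pinned dataset version and record its lineage.

    A real platform would pull versioned data from object storage here;
    this sketch only shows the shape of a reproducible, pinned build.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "dataset": name,
        "version": version,  # pinned, so reruns describe identical inputs
        "built_at": datetime.now(timezone.utc).isoformat(),
    }
    (out_dir / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))

if __name__ == "__main__":
    # One command, e.g.: python build.py faces-v3 --version 2025.08.1
    parser = argparse.ArgumentParser()
    parser.add_argument("name")
    parser.add_argument("--version", required=True)
    parser.add_argument("--out", default="build")
    args = parser.parse_args()
    build_dataset(args.name, args.version, Path(args.out) / args.name / args.version)
```
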
Compensation & benefits
  • Own the backbone: Define schemas, lineage, and dataset versioning used across research and production.
  • Company participation: Meaningful equity/virtual shares aligned with company growth.
  • Flexible work: Hybrid (Delft), flexible hours, minimal ceremony, async-first collaboration.
  • Data platform mandate: Real say in stack choices (orchestration, catalog, storage/layout) and time to implement them right.
  • Repro & auditability: Space to enforce deterministic builds, splits, and traceable lineage—no heroics needed.
  • Quality culture: Backing to implement dedup, drift/coverage checks, and dataset health dashboards org-wide.
  • FinOps mindset: Budget and support to balance speed, reliability, and total cost.
  • Pragmatic on-call: Planned, compensated rotation with automation-first recovery and rollback plans.
  • Growth path: IC track to Staff/Principal; opportunities to mentor and codify data standards.
  • Learning budget: Annual budget for courses/books + two data/ML-infra conferences per year.
  • Home office: Modest stipend for an ergonomic setup; commuting support (public transport or mileage).
  • Relocation + visa: Visa sponsorship and relocation support for internationals.

Join us and be part of a company committed to creating a more secure and trustworthy digital future. Apply today to become part of our mission-driven team!