Data Platform Engineer (Data Quality & AI Workflows)

Tritone Analytics, Inc

Salary: $400 - $600 gross
Type: Full time

Tags: Python Git SQL CI/CD

Tritone Analytics is a music-technology startup building a forensic royalty auditing platform. We help artists, managers, and rights-holders identify unpaid or misreported royalties by combining deterministic financial analysis with AI-assisted workflows.

Our platform ingests real royalty statements from major distributors, labels, and publishers, normalizes them into a canonical analytical schema, and surfaces discrepancies through both deterministic audit checks and LLM-powered contract analysis.

You'll work directly on the data infrastructure that powers everything: the ingestion pipeline, schema normalization, data quality systems, and the preparation layer that feeds our AI workflows.

This is hands-on, production-grade work. Our platform processes 15M+ royalty rows across 30+ databases from 20+ label and publisher datasets. The data is real, messy, and financially consequential.

This is a short-term internship with the intention of converting to a full-time position.

What You'll Work On

Data ingestion & normalization

  • Extend our profile-based CSV detection system to handle new sources and edge cases
  • Map inconsistent raw schemas to our canonical normalized schema, reconciling formats across a wide range of distributor and publisher statement types
  • Debug ingestion failures, detect schema variants, and write deterministic fix logic
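
To give a flavor of this kind of work, here is a minimal sketch of profile-based CSV detection and mapping to a canonical schema. It is an illustration under stated assumptions, not our actual implementation: `SourceProfile`, `detect_profile`, `read_normalized`, the header names, and the canonical column names are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceProfile:
    """Hypothetical profile: a source name plus a raw-header -> canonical-column map."""
    name: str
    column_map: dict[str, str]


def normalize_header(header: str) -> str:
    # Drop a UTF-8 BOM if present, trim whitespace, and use snake_case-like keys.
    return header.replace("\ufeff", "").strip().lower().replace(" ", "_")


def detect_profile(text: str, profiles: list[SourceProfile]) -> Optional[SourceProfile]:
    """Return the first profile whose required headers all appear in the file."""
    first_row = next(csv.reader(io.StringIO(text)), None)
    if first_row is None:
        return None  # empty file: nothing to detect
    headers = {normalize_header(h) for h in first_row}
    for profile in profiles:
        if set(profile.column_map) <= headers:
            return profile
    return None


def read_normalized(text: str, profile: SourceProfile) -> list[dict[str, str]]:
    """Map each raw row onto the (assumed) canonical column names."""
    reader = csv.reader(io.StringIO(text))
    headers = [normalize_header(h) for h in next(reader)]
    rows = []
    for raw in reader:
        record = dict(zip(headers, raw))
        rows.append(
            {canon: record[src].strip() for src, canon in profile.column_map.items()}
        )
    return rows
```

A file whose header row is `Track Title, Units ,Net Amount` (with or without a BOM) would match a profile keyed on `track_title`, `units`, and `net_amount`. A file that matches no profile returns `None`, which is where a new schema variant would get investigated.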

Data quality & validation

  • Write SQL validation queries to catch normalization errors, equation mismatches, and financial inconsistencies
  • Build and extend audit checks that verify royalty calculations deterministically (e.g. rate × units = net, gross × participation = artist amount)
  • Profile and document data loss during transformation
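
As an illustration of a deterministic audit check, the `rate × units = net` rule above could be sketched roughly as follows. The `RoyaltyRow` type, the field names, and the one-cent tolerance are assumptions made for this example, not a description of our schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class RoyaltyRow:
    """Hypothetical normalized row; field names are illustrative only."""
    row_id: str
    rate: Decimal
    units: int
    net: Decimal


def check_rate_times_units(
    rows: list[RoyaltyRow],
    tolerance: Decimal = Decimal("0.01"),  # assumed rounding tolerance
) -> list[tuple[str, Decimal, Decimal]]:
    """Return (row_id, expected_net, reported_net) for rows that fail the check."""
    failures = []
    for row in rows:
        expected = row.rate * row.units  # Decimal avoids binary float drift
        if abs(expected - row.net) > tolerance:
            failures.append((row.row_id, expected, row.net))
    return failures
```

Using `Decimal` rather than `float` keeps the comparison exact, so a flagged row reflects a real discrepancy in the statement and not a floating-point artifact.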

AI pipeline support

  • Prepare and chunk contract documents (PDF/DOCX) for vector-based retrieval
  • Clean and structure inputs for LLM extraction workflows (contract term extraction, rate comparison, anomaly classification)
  • Help maintain the read-only SQL agent that answers financial queries against live analytical databases
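
For a sense of what preparing documents for retrieval can look like, here is a minimal sketch of overlapping word-based chunking. It assumes the contract text has already been extracted from PDF/DOCX elsewhere; `chunk_text` and its default sizes are hypothetical choices for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap means a clause that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap. A production version would more likely split on clause or section boundaries than on raw word counts.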

Infrastructure & reliability

  • Write and maintain tests (4,000+ in the suite); new modules require 80%+ coverage
  • Improve error handling and logging in ingestion pipelines
  • Collaborate via GitHub: PRs, code review, and CI/CD workflows (tests, linting, type checking run on every PR)
  • Work within strict type-checking constraints (mypy strict mode throughout)

What You'll Need

Must have:

  • Strong Python: comfortable with real production data scripts, not just notebooks
  • Strong SQL: analytical queries, data validation, debugging financial discrepancies
  • Experience wrangling genuinely messy data — inconsistent headers, encoding issues, multi-format CSVs, missing columns
  • Ability to trace a data bug from raw file → transformation → output and explain what went wrong
  • Comfortable working with Git and GitHub: branching, PRs, code review, and CI/CD pipelines
  • Comfortable reading and extending other people's code in a mid-size codebase
  • Comfortable using AI tools (e.g., ChatGPT, Claude, Copilot) as part of your development workflow, whether for debugging, data analysis, or accelerating implementation

Nice to have:

  • DuckDB (our primary analytical engine — a real advantage)
  • Parquet / columnar formats (PyArrow, Polars)
  • Vector databases or RAG pipelines
  • Experience with financial or accounting data
  • Music industry domain knowledge (royalty statements, publishing vs master, mechanical vs performance)
  • pytest, mypy, type-annotated Python

Who Thrives Here

You enjoy finding the bug in a pipeline by reading data, not just logs. You're comfortable with "the schema changed again" as a normal day. You care about correctness — in this domain, a normalization error means an artist doesn't get paid. You write tests because you've been burned before, not because someone asked you to.

We're a small team moving fast on real problems in a notoriously opaque industry.

Source: GetOnBoard | Main Category: Other