Senior Data Engineer – AI Data Infrastructure

Vequity

Salary: $3,500 - $5,200 (gross)
Type: Full time

Tags: Python SQL JSON Cloud Computing

Vequity is building the world’s most robust, contextualized buyer intelligence network for investment banks, private equity firms, and strategic acquirers. Our platform currently houses over 1.5 million buyer profiles, with approximately 100 structured and inferred data fields per profile. We use proprietary AI agents to continuously enrich, infer, and structure buyer intelligence at scale. As a Senior Data Engineer, you will own the architecture, quality, and scalability of our data ecosystem, from ingestion and cleaning to inference and output generation. You will partner with the AI, product, and engineering teams to deliver data APIs and feeds that power our platform's decision-support capabilities. Your work will directly affect data reliability, operational efficiency, and the precision of the buyer attributes our customers rely on.


Key Responsibilities

Multi-Source Data Architecture

Work with systems handling multiple write paths: external providers, LLM hygiene agents, and customer-claimed edits
Define standards for data versioning, lineage, and observability across pipelines
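To illustrate what versioning and lineage across several write paths can look like, here is a minimal Python sketch of an append-only, field-level history. It assumes a hypothetical precedence policy (customer-claimed edits over LLM hygiene agents over external providers); the posting does not say how conflicts are resolved, and every name below is illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical precedence policy: a higher number wins when write paths disagree.
SOURCE_PRECEDENCE = {"external_provider": 1, "llm_hygiene_agent": 2, "customer_claim": 3}


@dataclass(frozen=True)
class FieldEvent:
    """One immutable write to one field of one entity, tagged with its source."""
    entity_id: str
    field_name: str
    value: object
    source: str
    recorded_at: datetime


@dataclass
class FieldHistory:
    events: list = field(default_factory=list)

    def append(self, event: FieldEvent) -> None:
        # Append-only: earlier events are never updated or deleted.
        if event.source not in SOURCE_PRECEDENCE:
            raise ValueError(f"unknown source: {event.source}")
        self.events.append(event)

    def _for(self, entity_id: str, field_name: str) -> list:
        return [e for e in self.events
                if e.entity_id == entity_id and e.field_name == field_name]

    def current_value(self, entity_id: str, field_name: str):
        candidates = self._for(entity_id, field_name)
        if not candidates:
            return None
        # Highest-precedence source wins; ties go to the most recent write.
        best = max(candidates, key=lambda e: (SOURCE_PRECEDENCE[e.source], e.recorded_at))
        return best.value

    def lineage(self, entity_id: str, field_name: str) -> list:
        # Full, time-ordered provenance trail for audits and replay.
        ordered = sorted(self._for(entity_id, field_name), key=lambda e: e.recorded_at)
        return [(e.recorded_at, e.source, e.value) for e in ordered]
```

Because nothing is overwritten, a later external-provider refresh cannot silently clobber a customer-claimed value, and the full trail stays available for audits.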

Entity Lifecycle & Master Data Management

Handle entity lifecycle complexity: mergers, acquisitions, spin-offs, rebranding, and temporal relationship changes
Design entity resolution systems using deterministic blocking (fuzzy matching, location) combined with LLM-based evaluation for match decisions
Build confidence scoring models and surface low-confidence cases for human review
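The entity resolution flow above (deterministic blocking, a similarity score used as confidence, and an optional model judgment for the ambiguous band) can be sketched as follows. This is a simplified illustration: `difflib` string similarity stands in for a real confidence model, `llm_judge` is a hypothetical callable in place of the LLM-based evaluation, and the thresholds and suffix list are assumptions.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical thresholds; real values would be tuned against human-labeled pairs.
AUTO_MATCH = 0.90
REVIEW_FLOOR = 0.60
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "lp"}


def normalize(name: str) -> str:
    # Lowercase, keep alphanumeric tokens, drop common legal suffixes.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def blocking_key(record: dict) -> tuple:
    # Deterministic block: name prefix plus country, so only plausible pairs are compared.
    return (normalize(record["name"])[:3], record["country"])


def name_score(a: dict, b: dict) -> float:
    # Cheap fuzzy similarity in [0, 1], used here as the confidence score.
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def resolve(records, llm_judge=None):
    """Return (matches, review_queue) as lists of (id_a, id_b, score)."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)

    matches, review_queue = [], []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = name_score(a, b)
            if score >= AUTO_MATCH:
                matches.append((a["id"], b["id"], score))
            elif score >= REVIEW_FLOOR:
                # Ambiguous band: ask the optional judge (True, False, or None = unsure).
                verdict = llm_judge(a, b) if llm_judge else None
                if verdict is True:
                    matches.append((a["id"], b["id"], score))
                elif verdict is None:
                    review_queue.append((a["id"], b["id"], score))
                # verdict False: treated as a non-match and dropped
    return matches, review_queue
```

Blocking keeps the pairwise comparison cost bounded, and only the uncertain middle band reaches the (more expensive) model or a human reviewer.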

Machine Learning & Matching Systems

Work with embeddings infrastructure: vector generation, retrieval optimization, and quality measurement
Optimize semantic search pipelines including embedding strategies, namespace design, and reranking
Establish evaluation frameworks to measure model performance against human judgment
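One common way to measure embedding retrieval against human judgment is recall@k over a set of human-labeled queries. The sketch below uses brute-force cosine similarity over plain Python lists; a production system would use a vector store, and the function names and data shapes here are illustrative assumptions.

```python
import math


def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_vec, index: dict, k: int) -> list:
    # Brute-force nearest neighbours over {doc_id: vector}.
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]


def recall_at_k(retrieved: list, relevant: set) -> float:
    # Share of human-judged relevant items that the retriever surfaced.
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def evaluate(queries: dict, index: dict, human_labels: dict, k: int) -> float:
    """Mean recall@k over queries, using human relevance labels as ground truth."""
    scores = [recall_at_k(top_k(vec, index, k), human_labels.get(qid, set()))
              for qid, vec in queries.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Tracking a metric like this over a fixed labeled set makes it possible to compare embedding strategies, namespace layouts, or a reranking step on equal terms.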

Collaboration & Team Development

Educate and mentor the engineering team on data best practices, patterns, and common pitfalls
Lead continuous improvement of the data infrastructure roadmap

Relationship & Graph Modeling

Design data models for complex relationships: parent/subsidiary hierarchies, PE firm → portfolio company chains
Evaluate and implement graph query capabilities (Apache AGE, Neo4j, or optimized Postgres patterns) for relationship traversal that semantic search cannot address
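As an example of the "optimized Postgres patterns" option, a recursive CTE can walk parent/subsidiary or PE firm → portfolio company chains. The sketch below runs the same `WITH RECURSIVE` pattern on SQLite so it is self-contained; the single `ownership(parent, child)` table is a hypothetical simplification of a real relationship model.

```python
import sqlite3

# Recursive CTE: walk ownership edges downward from a root, with a depth cap
# so that bad data containing a cycle cannot recurse forever.
DESCENDANTS_SQL = """
WITH RECURSIVE chain(entity, depth) AS (
    SELECT child, 1 FROM ownership WHERE parent = ?
    UNION
    SELECT o.child, c.depth + 1
    FROM ownership AS o
    JOIN chain AS c ON o.parent = c.entity
    WHERE c.depth < ?
)
SELECT entity, MIN(depth) AS min_depth
FROM chain
GROUP BY entity
ORDER BY min_depth, entity
"""


def build_db(edges):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ownership (parent TEXT NOT NULL, child TEXT NOT NULL, "
        "PRIMARY KEY (parent, child))"
    )
    conn.executemany("INSERT INTO ownership (parent, child) VALUES (?, ?)", edges)
    return conn


def descendants(conn, root: str, max_depth: int = 10) -> list:
    """Return [(entity, shortest_depth), ...] for everything below root."""
    return conn.execute(DESCENDANTS_SQL, (root, max_depth)).fetchall()
```

For example, with edges `Fund A → PortCo 1`, `Fund A → PortCo 2`, `PortCo 1 → Sub X`, `Sub X → Sub Y`, `descendants(conn, "Fund A")` returns all four entities with depths 1, 1, 2, and 3. Multi-hop questions like this are where a graph or recursive query complements semantic search.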

Data Quality, Testing & Operations

Build quality-control layers including confidence scoring, human-in-the-loop validation, and automated anomaly detection
Implement testing strategies including data contracts, pipeline unit tests, and integration testing
Build proactive monitoring, alerting, and runbooks for data health issues
Ensure compliance with data governance, privacy, and security standards

Requirements

5+ years of experience in data engineering, with strong Python (Pydantic is a bonus), SQL, and cloud data stack skills (including GCP)
Experience with orchestration frameworks (Airflow, Dagster, Prefect) and/or data platforms (Databricks)
Experience designing or integrating AI/LLM agents for data enrichment with structured AI → JSON → database pipelines including error recovery and monitoring
Understanding of embedding-based retrieval
Excellent communication and cross-team collaboration skills
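A structured AI → JSON → database pipeline with error recovery might look roughly like the sketch below: parse the model output, validate it against a contract, retry on failure, and send persistent failures to a dead-letter table that monitoring can watch. `call_llm` is a hypothetical stand-in for whatever model client is used, and the field names, retry count, and SQLite tables are illustrative assumptions.

```python
import json
import sqlite3

# Hypothetical output contract for one enrichment call.
REQUIRED_FIELDS = {"buyer_id": str, "sector": str, "check_size_usd": (int, float)}


class ContractError(ValueError):
    """Raised when model output parses as JSON but breaks the contract."""


def init_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrichment (buyer_id TEXT, sector TEXT, check_size_usd REAL)")
    conn.execute("CREATE TABLE dead_letter (buyer_id TEXT, error TEXT, attempts INTEGER)")
    return conn


def parse_and_validate(raw: str) -> dict:
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ContractError("expected a JSON object")
    for key, expected in REQUIRED_FIELDS.items():
        if key not in data:
            raise ContractError(f"missing field: {key}")
        value = data[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ContractError(f"bad type for {key}: {type(value).__name__}")
    return data


def enrich(conn, buyer_id: str, call_llm, max_attempts: int = 3) -> bool:
    """Call the model, validate, load on success; dead-letter after max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = call_llm(buyer_id, attempt)
        try:
            record = parse_and_validate(raw)
        except ValueError as exc:
            last_error = str(exc)  # retry; a real pipeline might re-prompt with this error
            continue
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO enrichment (buyer_id, sector, check_size_usd) VALUES (?, ?, ?)",
                (record["buyer_id"], record["sector"], record["check_size_usd"]),
            )
        return True
    with conn:
        conn.execute(
            "INSERT INTO dead_letter (buyer_id, error, attempts) VALUES (?, ?, ?)",
            (buyer_id, last_error, max_attempts),
        )
    return False
```

Keeping invalid output out of the main table, while recording why it failed, turns model flakiness into a measurable operational signal instead of silent data corruption.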

Desirable

Prior experience with machine learning algorithms or semantic search
Prior experience with entity resolution or master data management — you understand why matching company records is fundamentally hard
Familiarity with graph databases or graph query patterns (Neo4j, Apache AGE, recursive CTEs) for complex entity relationships
Experience with event sourcing or append-only architectures for audit trails and data replay
Background in investment data, market intelligence, or deal sourcing platforms
Familiarity with agent orchestration tools (LangChain, LlamaIndex) and data quality frameworks (dbt, Great Expectations)
Experience as an early/first data hire at a startup
Understanding of prompt engineering, MCP Servers, function calling, and embedding-based retrieval

Benefits

Competitive compensation and Paid Time Off (PTO).

Fully remote: you can work from anywhere in the world.


Source: GetOnBoard | Main Category: Other