Senior Data Engineer – AI Data Infrastructure

Vequity

Salary: $3500 - $5200 (gross)
Type: Full-time

Tags: Python, SQL, JSON, Cloud Computing

Vequity is building the world’s most robust, contextualized buyer intelligence network for investment banks, private equity firms, and strategic acquirers. Our platform currently houses over 1.5 million buyer profiles with approximately 100 structured and inferred data fields per profile. We use proprietary AI agents to continuously enrich, infer, and structure buyer intelligence at scale. As a Senior Data Engineer, you will own the architecture, quality, and scalability of our data ecosystem, from ingestion and cleaning to inference and output generation. You will partner with AI, product, and engineering teams to deliver the data APIs and feeds that power our platform's decision-support capabilities. Your work will directly affect data reliability, operational efficiency, and the precision of the buyer attributes our customers rely on.

Apply to this job opportunity at getonbrd.com.

Key Responsibilities

Multi-Source Data Architecture

  • Work with systems handling multiple write paths: external providers, LLM hygiene agents, and customer-claimed edits
  • Define standards for data versioning, lineage, and observability across pipelines


Entity Lifecycle & Master Data Management

  • Handle entity lifecycle complexity: mergers, acquisitions, spin-offs, rebranding, and temporal relationship changes
  • Design entity resolution systems using deterministic blocking (fuzzy matching, location) combined with LLM-based evaluation for match decisions
  • Build confidence scoring models and surface low-confidence cases for human review
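For candidates who want a concrete picture of this kind of work, here is a minimal sketch of blocking plus confidence triage. It is illustrative only and is not a description of Vequity's system: the `Company` fields, the blocking key (city plus first letter of the name), the `difflib` similarity score, and both thresholds are assumptions chosen for the example. In a real pipeline the middle band would go to an LLM evaluator or a human reviewer rather than simply being returned.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Company:
    id: int
    name: str
    city: str


def block_key(c: Company) -> tuple[str, str]:
    # Hypothetical deterministic blocking rule: same city and same first
    # letter of the name. Only records sharing a key are ever compared.
    return (c.city.strip().lower(), c.name.strip().lower()[:1])


def name_similarity(a: Company, b: Company) -> float:
    # Simple fuzzy score in [0, 1] from the standard library.
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def candidate_pairs(companies):
    blocks = defaultdict(list)
    for c in companies:
        blocks[block_key(c)].append(c)
    for members in blocks.values():
        yield from combinations(members, 2)


def triage(companies, auto_match=0.9, review=0.6):
    """Split candidate pairs into confident matches and a review queue."""
    matches, needs_review = [], []
    for a, b in candidate_pairs(companies):
        score = name_similarity(a, b)
        if score >= auto_match:
            matches.append((a.id, b.id, score))
        elif score >= review:
            # Low-confidence band: hand off to an LLM judge or a human.
            needs_review.append((a.id, b.id, score))
    return matches, needs_review


companies = [
    Company(1, "Acme Capital Partners", "Boston"),
    Company(2, "Acme Capital Partners LLC", "Boston"),
    Company(3, "Apex Holdings", "Boston"),
    Company(4, "Acme Capital Partners", "Denver"),
    Company(5, "Acme Capital", "Boston"),
]
matches, needs_review = triage(companies)
```

Note that record 4 is never compared with record 1 despite the identical name, because the two fall into different blocks. That is the usual trade-off of blocking: far fewer comparisons at the cost of missed matches when the blocking field is wrong or out of date.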

Machine Learning & Matching Systems

  • Work with embeddings infrastructure: vector generation, retrieval optimization, and quality measurement
  • Optimize semantic search pipelines including embedding strategies, namespace design, and reranking
  • Establish evaluation frameworks to measure model performance against human judgment

Collaboration & Team Development

  • Educate and mentor the engineering team on data best practices, patterns, and common pitfalls
  • Lead continuous improvement of the data infrastructure roadmap

Relationship & Graph Modeling

  • Design data models for complex relationships: parent/subsidiary hierarchies, PE firm → portfolio company chains
  • Evaluate and implement graph query capabilities (Apache AGE, Neo4j, or optimized Postgres patterns) for relationship traversal that semantic search cannot address
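As a small illustration of the relationship traversal mentioned above, the sketch below walks a parent/subsidiary chain with a recursive CTE. It uses an in-memory SQLite database from the Python standard library as a stand-in for Postgres, and the `company` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def descendants(conn: sqlite3.Connection, root_id: int) -> list[tuple[str, int]]:
    """Return (name, depth) for every company below root_id in the hierarchy."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            -- Anchor row: the company we start from.
            SELECT id, name, 0 FROM company WHERE id = ?
            UNION ALL
            -- Recursive step: children of anything already in the tree.
            SELECT c.id, c.name, t.depth + 1
            FROM company AS c
            JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT name, depth FROM tree WHERE depth > 0 ORDER BY depth, name
        """,
        (root_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "parent_id INTEGER REFERENCES company(id))"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [
        (1, "Example PE Fund", None),
        (2, "HoldCo A", 1),
        (3, "OpCo A1", 2),
        (4, "HoldCo B", 1),
        (5, "Unrelated Co", None),
    ],
)
result = descendants(conn, 1)
```

A single `parent_id` column only models strict trees. Ownership that changes over time or involves multiple parents would need an edge table with validity dates, which is where a dedicated graph layer may start to pay off.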

Data Quality, Testing & Operations

  • Build quality-control layers including confidence scoring, human-in-the-loop validation, and automated anomaly detection
  • Implement testing strategies including data contracts, pipeline unit tests, and integration testing
  • Build proactive monitoring, alerting, and runbooks for data health issues
  • Ensure compliance with data governance, privacy, and security standards

Requirements

  • 5+ years in data engineering with strong Python (Pydantic a bonus), SQL, and cloud data stacks (including GCP)
  • Experience with orchestration frameworks (Airflow, Dagster, Prefect) and/or data platforms (Databricks)
  • Experience designing or integrating AI/LLM agents for data enrichment with structured AI → JSON → database pipelines including error recovery and monitoring
  • Understanding of embedding-based retrieval
  • Excellent communication and cross-team collaboration skills

Desirable

  • Prior experience with Machine Learning algorithms / semantic search
  • Prior experience with entity resolution or master data management — you understand why matching company records is fundamentally hard
  • Familiarity with graph databases or graph query patterns (Neo4j, Apache AGE, recursive CTEs) for complex entity relationships
  • Experience with event sourcing or append-only architectures for audit trails and data replay
  • Background in investment data, market intelligence, or deal sourcing platforms
  • Familiarity with agent orchestration tools (LangChain, LlamaIndex) and data quality frameworks (dbt, Great Expectations)
  • Experience as an early/first data hire at a startup
  • Understanding of prompt engineering, MCP Servers, function calling, and embedding-based retrieval

Benefits

Competitive compensation and Paid Time Off (PTO).

Fully remote: you can work from anywhere in the world.

Source: GetOnBoard | Main Category: Other