Job Openings

Position: Data Scientist and Transformation Engineer
Location: Remote
Employment Type: Full-Time

Role Overview

The Data Scientist and Transformation Engineer will play a pivotal role in shaping Homefile’s data strategy. You’ll lead efforts to acquire, clean, and format data from diverse sources while ensuring that datasets are optimized for training and fine-tuning advanced AI models. This role combines data engineering, analysis, and machine learning, with a specific focus on preparing data for the small language models (SLMs) and large language models (LLMs) that power Homefile’s AI assistant, Homie.

Key Responsibilities

Data Discovery and Acquisition

  • Identify, evaluate, and acquire diverse data sources, including:
    • Home maintenance costs and trends.
    • Appliance warranties and product data.
    • Home improvement and service provider metrics.
  • Collaborate with internal and external stakeholders to integrate proprietary and third-party data into Homefile’s systems.

Data Transformation and Engineering

  • Build and maintain pipelines to process, clean, and transform raw data into structured, scalable formats.
  • Develop and optimize datasets specifically designed for training and fine-tuning SLMs and LLMs, ensuring compliance with model requirements.
  • Implement techniques for formatting unstructured data into structured formats (e.g., tabular data, JSON, or knowledge graphs).
  • Manage schema design and ensure data quality for use in machine learning pipelines and LLM APIs.

AI Model Preparation and Integration

  • Curate high-quality training datasets for SLMs and LLMs, focusing on tasks such as:
    • Contextual understanding of homeowner needs.
    • Automation of maintenance recommendations.
    • ROI-driven home improvement suggestions.
  • Pre-process and annotate data for supervised learning or fine-tuning tasks.
  • Collaborate with the AI team to integrate formatted data into Homefile’s Generative AI pipelines.

Data Science and Machine Learning

  • Conduct exploratory data analysis to uncover trends, patterns, and actionable insights.
  • Develop predictive models and algorithms to power features such as cost forecasting, improvement ROI analysis, and service recommendations.
  • Continuously refine models for performance and scalability in production environments.

Collaboration and Impact

  • Work closely with product managers, AI engineers, and backend developers to ensure data initiatives align with Homefile’s mission.
  • Document and share best practices for data preparation, transformation, and integration with machine learning systems.
  • Champion ethical and transparent data usage, ensuring user trust and compliance.

Qualifications

Education

Bachelor’s or Master’s degree in Data Science, Computer Science, or a related field, or equivalent experience.

Experience

  • 3+ years of experience in data engineering, data science, or machine learning.
  • Proven ability to prepare data for use in training or fine-tuning SLMs or LLMs (e.g., GPT, BERT, T5).

Technical Skills

  • Proficiency in Python and tools like Pandas, NumPy, and Scikit-learn.
  • Experience with data orchestration and transformation tools (e.g., Apache Airflow, dbt).
  • Strong understanding of data preprocessing techniques for LLMs, including tokenization and embedding generation.
  • Knowledge of machine learning libraries and frameworks (e.g., TensorFlow, PyTorch).
  • Familiarity with APIs for deploying LLMs (e.g., OpenAI, Hugging Face).

Preferred Skills

  • Strong problem-solving and critical-thinking skills.
  • Effective communication and collaboration with cross-functional teams.
  • Experience with home management, property technology, or service provider platforms.
  • Hands-on experience with MLOps tools for deploying and monitoring machine learning pipelines.
  • Knowledge of data annotation tools and processes.
  • Familiarity with multi-modal AI (text, images, and structured data).

Why Join Homefile?

  • Purpose: Play a vital role in powering AI systems that redefine homeownership.
  • Innovation: Be part of a team leveraging SLMs and LLMs to create smarter, data-driven solutions for homeowners.
  • Growth: Join a growing company where your work directly impacts users and the platform’s evolution.
  • Flexibility: Enjoy a hybrid/remote work environment, competitive salary, and comprehensive benefits.