Position: Data Scientist and Transformation Engineer
Location: Remote
Employment Type: Full-Time
Role Overview
The Data Scientist and Transformation Engineer will play a pivotal role in shaping Homefile’s data strategy. You’ll lead efforts to acquire, clean, and format data from diverse sources while ensuring that datasets are optimized for training and fine-tuning advanced AI models. This role combines data engineering, analysis, and machine learning, with a specific focus on preparing data for the small language models (SLMs) and large language models (LLMs) that power Homefile’s AI assistant, Homie.
Key Responsibilities
Data Discovery and Acquisition
- Identify, evaluate, and acquire diverse data sources, including:
  - Home maintenance costs and trends.
  - Appliance warranties and product data.
  - Home improvement and service provider metrics.
- Collaborate with internal and external stakeholders to integrate proprietary and third-party data into Homefile’s systems.
Data Transformation and Engineering
- Build and maintain pipelines to process, clean, and transform raw data into structured, scalable formats.
- Develop and optimize datasets specifically designed for training and fine-tuning SLMs and LLMs, ensuring compliance with model requirements.
- Implement techniques for formatting unstructured data into structured formats (e.g., tabular data, JSON, or knowledge graphs).
- Manage schema design and ensure data quality for use in machine learning pipelines and LLM APIs.
AI Model Preparation and Integration
- Curate high-quality training datasets for SLMs and LLMs, focusing on tasks such as:
  - Contextual understanding of homeowner needs.
  - Automation of maintenance recommendations.
  - ROI-driven home improvement suggestions.
- Pre-process and annotate data for supervised learning or fine-tuning tasks.
- Collaborate with the AI team to integrate formatted data into Homefile’s Generative AI pipelines.
Data Science and Machine Learning
- Conduct exploratory data analysis to uncover trends, patterns, and actionable insights.
- Develop predictive models and algorithms to power features such as cost forecasting, improvement ROI analysis, and service recommendations.
- Continuously refine models for performance and scalability in production environments.
Collaboration and Impact
- Work closely with product managers, AI engineers, and backend developers to ensure data initiatives align with Homefile’s mission.
- Document and share best practices for data preparation, transformation, and integration with machine learning systems.
- Champion ethical and transparent data usage, ensuring user trust and compliance.
Qualifications
Education
Bachelor’s or Master’s degree in Data Science, Computer Science, or a related field, or equivalent experience.
Experience
- 3+ years of experience in data engineering, data science, or machine learning.
- Proven ability to prepare data for use in training or fine-tuning SLMs or LLMs (e.g., GPT, BERT, T5).
Technical Skills
- Proficiency in Python and tools like Pandas, NumPy, and Scikit-learn.
- Experience with data processing frameworks (e.g., Apache Airflow, dbt).
- Strong understanding of data preprocessing techniques for LLMs, including tokenization and embedding generation.
- Knowledge of machine learning libraries and frameworks (e.g., TensorFlow, PyTorch).
- Familiarity with APIs for deploying LLMs (e.g., OpenAI, Hugging Face).
Preferred Skills
- Strong problem-solving and critical-thinking skills.
- Effective communication and collaboration with cross-functional teams.
- Experience with home management, property technology, or service provider platforms.
- Hands-on experience with MLOps tools for deploying and monitoring machine learning pipelines.
- Knowledge of data annotation tools and processes.
- Familiarity with multi-modal AI (text, images, and structured data).
Why Join Homefile?
- Purpose: Play a vital role in powering AI systems that redefine homeownership.
- Innovation: Be part of a team leveraging SLMs and LLMs to create smarter, data-driven solutions for homeowners.
- Growth: Join a growing company where your work directly impacts users and the platform’s evolution.
- Flexibility: Enjoy a hybrid/remote work environment, competitive salary, and comprehensive benefits.