The NDIF Engineering Fellowship
About
The National Deep Inference Fabric (NDIF) is awarding Summer Engineering Fellowships for a high-priority summer project to develop and rapidly scale AI research infrastructure, keeping pace with the steep increase in scale of state-of-the-art AI.
NDIF is an NSF-funded project to create an innovative, highly transparent, large-scale AI inference infrastructure that enables scientific research on the largest open AI models. Read more about the NDIF project and team here, and read about the NNsight API architecture here. This Twitter thread also provides more context.
The NDIF Summer Engineering Fellowship is an intensive program for systems- and ML-focused students to contribute to an active state-of-the-art AI research engineering project in the public interest.
During your fellowship, you will work under the supervision of a team of leading professors in machine learning and software systems engineering. Your work will be essential to scaling up the NDIF inference infrastructure, laying the groundwork for interpretability, safety auditing, and transparent research at a scale that matches the ongoing increase in AI model complexity.
This program will be held in-person in Boston from May 18th to August 2nd. NDIF can provide housing accommodations for fellows throughout the duration of the fellowship upon request.
Fellowship Tracks
Full Stack Tool Development Fellow
About the Role
We're seeking a skilled Full-Stack Developer to lead the development and maintenance of an AI interpretability toolbox, a user-facing application built on our proprietary software stack (NNsight and NDIF). This role offers an exciting opportunity to work at the intersection of machine learning infrastructure and web development.
Primary Responsibilities
- Own and drive the development of the AI interpretability toolbox from conception to deployment.
- Architect and implement full-stack solutions that leverage our NNsight and NDIF frameworks.
- Design and develop intuitive user interfaces that make complex ML visualization tools accessible.
- Build and maintain robust backend APIs to handle data processing and model interactions.
- Collaborate with the core platform team to ensure seamless integration with existing systems.
- Implement best practices for performance, testing, and code quality.
Qualifications
- Strong proficiency in Python and ML-adjacent software development.
- Experience with full-stack web application development using modern JavaScript frameworks (React, Vue, or Angular).
- Backend API development expertise using Python frameworks (FastAPI or Flask) and a solid understanding of web technologies (HTML, CSS, REST APIs).
- Experience with version control systems (Git) and CI/CD pipelines.
- Current or recent PhD, Master’s, or Undergraduate student in computer science, data science, engineering, or related fields.
- Familiarity with machine learning concepts and visualization techniques.
- Experience with real-time data processing and visualization.
- Track record of owning and delivering complex technical projects.
- Strong communication skills and ability to work independently.
Site Reliability Engineering Fellow
About the Role
We are seeking a highly motivated and independent Full-Stack Developer to lead critical initiatives in observability, telemetry, and automated testing for NDIF, a platform enabling cutting-edge research in AI. This role requires a hands-on developer who can design, implement, and optimize tooling that improves system reliability, provides actionable insights, and streamlines our development processes.
As part of your work, you’ll architect tools and frameworks that make debugging proactive, implement telemetry dashboards to track service usage and performance, and develop robust automated testing environments. This role demands someone who thrives in solving complex technical challenges with minimal supervision and can collaborate effectively across teams.
Primary Responsibilities
- Design and implement observability systems, including alerts and metrics, to monitor the health and performance of distributed services.
- Develop telemetry dashboards that track service usage, provide actionable insights, and enable predictive analysis.
- Architect and maintain a robust automated testing infrastructure to validate code changes, ensure CI/CD quality, and improve reliability.
- Debug and resolve complex distributed system issues (e.g., Ray connectivity, resource allocation strategies).
- Collaborate with the platform and research teams to integrate observability and telemetry features with NDIF’s existing tools.
- Ensure best practices for performance optimization, code quality, and test-driven development.
Qualifications
- Proficiency in Python and backend frameworks like FastAPI or Flask.
- Hands-on experience with distributed systems, observability tools (e.g., Prometheus, Grafana), and telemetry systems.
- Solid understanding of CI/CD pipelines, automated testing tools, and infrastructure as code.
- Expertise in modern JavaScript frameworks (React, Vue, or Angular) for building intuitive UIs.
- Proficiency in web technologies (HTML, CSS, REST APIs).
- Experience developing and maintaining APIs for real-time data processing.
- Proven ability to debug and optimize distributed systems with minimal guidance.
- Knowledge of SLURM or other resource schedulers.
- Strong communication skills and ability to work independently while aligning with team goals.
- Track record of owning and delivering complex projects from start to finish.
- Current or recent PhD, Master’s, or Undergraduate student in computer science, engineering, or related fields.
- Familiarity with machine learning concepts and distributed computing frameworks.
- Experience with setting up observability and telemetry pipelines for large-scale systems.
- Strong interest in building scalable tools for interdisciplinary research.
Outreach Fellow
About the Role
We are seeking a motivated Outreach Fellow to build out an AI interpretability curriculum for NDIF. This unique role bridges communications, marketing, and technical artificial intelligence work, offering a valuable opportunity to gain experience across these fields.
You will help further user adoption of our research computing platform by supporting NDIF's Outreach Team in developing tutorials, organizing coding bootcamps, and creating impactful social media content. Proficiency in Python and strong communication and writing skills are required. Website development experience is a plus.
Primary Responsibilities
- Develop tutorials for NNsight, NDIF’s AI interpretability library, using Python.
- Ensure tutorials are accessible to both technical and non-technical audiences.
- Create social media content highlighting key NDIF developments and features.
- Assist with writing press releases and news articles to showcase NDIF's advancements.
- Organize community-building events to drive NDIF adoption by researchers, such as live interpretability tutorials, user feedback sessions, and educational workshops.
- Compile and analyze data from social media and website engagement to inform and optimize social media strategies, improve SEO, and enhance the website interface.
Qualifications
- Proficiency in Python and machine learning frameworks like PyTorch.
- Hands-on experience with web technologies (e.g., HTML, CSS).
- Familiarity with large language models and AI interpretability research and techniques.
- Experience with version control systems (Git).
- Ability to clearly communicate complex concepts to technical and non-technical audiences.
- Excellent writing skills to create tutorials, press releases, and marketing materials.
- Experience managing social media accounts and creating engaging marketing content.
- Familiarity with SEO practices and data-driven content optimization.
- Current or recent PhD, Master’s, or Undergraduate student in computer science, data science, engineering, or related fields.
- Strong foundation in machine learning concepts and enthusiasm for AI interpretability research.
- Prior experience creating educational materials or tutorials, particularly for technical audiences.
- Experience conducting workshops, coding bootcamps, or similar instructional events.
- Background in marketing, technical communications, or community outreach in a tech-focused organization.
- Excellent communication and writing skills and ability to work independently.
Eligibility
The NDIF Engineering Fellowship accepts applications from current and recent PhD, Master’s, and Undergraduate students in computer science and related fields. The ideal applicant will have knowledge of machine learning systems, such as quantized and low-precision machine learning, ML compilers, and architectures for low-latency transformer inference.
Both systems-focused and machine-learning-focused applicants will be considered.
Timeline
Applications will be accepted on a rolling basis.
- Application Deadline: February 7, 2025
- Project Start Date: May 18, 2025
- Project End Date: August 2, 2025
How to apply
To apply, submit your contact information, CV, current transcript, and an explanation of your interest and potential contributions to the project using the following forms:
Full Stack Tool Development Fellow
Site Reliability Engineering Fellow
Outreach Fellow
For any questions about the application process or the NDIF platform, please contact us at info@ndif.us.