Supreeth Rao

Machine Learning Engineer

A picture of me in Ladakh, 2023

I'm Supreeth, a Senior Machine Learning Engineer at GrapheneAI. My work revolves around the intersection of Natural Language Processing, High-Performance Computing, Distributed Computing, and Agentic Systems. With experience in both academia and industry, I bridge the gap between research and practical ML applications. I've led multiple projects in the past, and I'm always looking for new challenges. Beyond tech, I'm passionate about all things food and coffee.

Reading

DeepSeekAI - 3FS

repository By DeepSeekAI

3FS is a fast, distributed filesystem designed for AI workloads, optimized for high throughput, low latency, and scalability. It is built for modern storage hardware such as SSDs and NVMe drives, and delivers higher performance on AI workloads than alternatives like HDFS and S3.

AlphaChip

article By Anna Goldie, Azalia Mirhoseini

AlphaChip is groundbreaking work by DeepMind that uses reinforcement learning to automate chip design, achieving state-of-the-art results on a range of benchmarks. To me, it closes the loop: AI designing the chips that will train the next generation of AI. It's a great example of how AI can be used to solve complex problems and push the boundaries of what is possible.

JAX-ML Scaling Book

book By JAX-ML

This book is a comprehensive guide to scaling model training. It covers a wide range of topics, from hardware to software to model design, and gives a thorough overview of the current state of the art. It's a great read for anyone who wants to understand the challenges and solutions involved in training at scale. It also has exercises and code examples to help you understand the concepts.
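In the spirit of the book's exercises, here is a minimal back-of-the-envelope sketch (my own, not from the book) using the common approximation that training FLOPs ≈ 6 × parameters × tokens. The model size, token count, GPU count, peak FLOP/s, and MFU below are illustrative assumptions, not real benchmark numbers.

```python
# Back-of-the-envelope training-time estimate.
# Assumes the standard approximation: total FLOPs ~= 6 * params * tokens.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs (forward + backward passes)."""
    return 6 * params * tokens

def training_days(params: float, tokens: float, num_gpus: int,
                  peak_flops_per_gpu: float, mfu: float = 0.4) -> float:
    """Estimated wall-clock days at a given model FLOPs utilization (MFU)."""
    effective_flops = num_gpus * peak_flops_per_gpu * mfu
    return training_flops(params, tokens) / effective_flops / 86_400

# Hypothetical example: 70B params, 2T tokens, 1,024 GPUs at ~1e15 FLOP/s peak.
days = training_days(70e9, 2e12, 1024, 1e15, mfu=0.4)
print(f"~{days:.0f} days")  # → ~24 days
```

Sketches like this are how the book frames scaling decisions: before touching code, check whether the cluster and timeline are even in the right ballpark.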

Stanford CS153: Infra at Scale

video By Mike Abbott, Anjney Midha

This video is a great overview of the challenges and solutions for scaling AI inference, covering everything from hardware to software. Mike Abbott, from Anthropic, talks about the difficulties of scaling both training and inference, and why scaling modern language models is such a fascinating engineering problem.

How to train a model on 10K H100 GPUs

article By Soumith Chintala

Soumith Chintala, from Meta AI, gives a detailed overview of the challenges and solutions involved in training a model on 10K H100 GPUs, from hardware to software. It's a great read for anyone who wants to understand what it takes to train a model on a large cluster.
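One small piece of the bookkeeping the article makes vivid is splitting a fixed global batch across thousands of data-parallel ranks. Here is a hedged sketch of that arithmetic (my own illustration, not Soumith's code); the batch and GPU counts are made-up examples.

```python
# Sketch: splitting a global batch across data-parallel ranks.
# Assumes pure data parallelism with gradient accumulation.

def per_rank_schedule(global_batch: int, micro_batch: int, num_ranks: int):
    """Return (samples_per_rank, grad_accum_steps) for one optimizer step.

    Assumes global_batch divides evenly across ranks and micro-batches,
    which real launch scripts typically validate up front.
    """
    if global_batch % (micro_batch * num_ranks) != 0:
        raise ValueError("global_batch must divide evenly by micro_batch * num_ranks")
    samples_per_rank = global_batch // num_ranks
    grad_accum_steps = samples_per_rank // micro_batch
    return samples_per_rank, grad_accum_steps

# Hypothetical example: global batch of 40,960 sequences, 10,240 GPUs,
# micro-batch of 2 sequences per GPU per forward pass.
print(per_rank_schedule(40_960, 2, 10_240))  # → (4, 2)
```

At 10K-GPU scale even this trivial division interacts with everything else, since the micro-batch size is pinned by memory, the global batch by optimization stability, and the rank count by whatever hardware survived the morning.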