COMET Internship Blog – Large InSAR dataset for training foundational machine learning models

Summer 2025

Name of internship student: Araav Bhardwaj (University of Leeds)

Project title: Large InSAR dataset for training foundational machine learning models

Supervisors: Robert Popescu and Dr Nantheera Anantrasirichai (University of Bristol)

Bio: I am Araav Bhardwaj, a Computer Science student at the University of Leeds with a strong interest in machine learning, data science, and Earth observation technologies. Having lived across India, Egypt, and Saudi Arabia, I bring a global perspective and adaptability to problem-solving. My academic and professional journey has been marked by hands-on experience across AI-driven projects, web development, and applied data science. I have worked on projects ranging from AI-based tuberculosis detection to data-engineered census analysis, combining technical precision with real-world impact. At COMET, I collaborated to create and refine a large-scale InSAR dataset for geohazard monitoring. My passion lies in leveraging AI and remote sensing to solve global challenges related to climate, health, and sustainability. Beyond academics, I actively engage in research, sports, and technology communities, striving to build innovative solutions that contribute to both science and society. 

Project: During my internship with COMET, I developed a large-scale dataset from Sentinel-1 InSAR satellite imagery to advance research in geohazard monitoring and Earth observation. Using data from the COMET LiCS system, I curated and processed millions of interferograms to create a structured, high-quality dataset suitable for AI applications. Inspired by Meta’s DINOv2 model, I designed a data pipeline focused on deduplication and image retrieval to enhance dataset quality and consistency. The project aimed to facilitate self-supervised machine learning approaches for detecting surface changes such as volcanic deformation and land displacement. I also tested the dataset using a self-supervised image encoder, demonstrating its potential for improving global hazard detection. This project bridges remote sensing and AI, contributing valuable resources for future scientific research in geophysics and climate resilience. 
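As an illustration only, the deduplication and retrieval idea described above can be sketched on top of image embeddings. This is not the pipeline code from the internship: the function names `deduplicate` and `retrieve`, the cosine-similarity criterion, and the 0.95 threshold are all assumptions chosen for the example, and the embeddings are presumed to come from some image encoder that is not shown here.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def deduplicate(embeddings, threshold=0.95):
    """Greedy near-duplicate removal (hypothetical helper, assumed threshold).

    An item is kept only if its cosine similarity to every previously
    kept item is below `threshold`. Returns the indices of kept items.
    """
    unit = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    kept = []
    for i in range(len(unit)):
        if not kept or np.max(unit[kept] @ unit[i]) < threshold:
            kept.append(i)
    return kept


def retrieve(query, embeddings, k=3):
    """Return the indices of the k embeddings most similar to `query` (hypothetical helper)."""
    unit = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    q = l2_normalize(np.asarray(query, dtype=np.float64)[None, :])[0]
    sims = unit @ q
    # Sort by descending similarity; a stable sort keeps ties in input order.
    order = np.argsort(-sims, kind="stable")[:k]
    return order.tolist()


if __name__ == "__main__":
    # Toy 2-D "embeddings": items 1 and 3 are near-duplicates of item 0.
    toy = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0], [1.0, 0.0]]
    print(deduplicate(toy))
    print(retrieve([0.0, 1.0], toy, k=1))
```

The greedy loop compares each item against everything kept so far, which is quadratic in the worst case. For millions of interferograms an approximate nearest-neighbour index would likely be needed in place of the brute-force comparison.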

Outputs: The internship produced a comprehensive InSAR dataset curated from the COMET LiCS system, covering multiple volcanoes and geographic regions. I developed an automated pipeline that processes, filters, and organizes Sentinel-1 imagery into a machine-learning-ready format. The pipeline integrates deduplication and retrieval mechanisms inspired by self-supervised learning techniques, ensuring data diversity and minimizing redundancy. Preliminary experiments with a self-supervised encoder validated the dataset’s potential for geohazard classification and surface deformation analysis. The project outputs include a documented data preparation workflow, sample encoded representations for image retrieval, and an internal report highlighting potential downstream research opportunities. The dataset and methods can be used to accelerate the development of AI models for Earth observation, supporting COMET’s ongoing mission to monitor and understand geological hazards. 

The overall experience: My internship at COMET was an incredibly rewarding experience that deepened my understanding of Earth observation and machine learning. Working with large-scale Sentinel-1 satellite data challenged me to combine programming, geospatial analysis, and AI to create tangible research tools. I particularly enjoyed the autonomy and trust I was given to design my own data pipeline, which strengthened my problem-solving and research skills. Collaborating with experienced scientists and engineers broadened my technical perspective and introduced me to the practical challenges of working with real-world satellite data. The internship not only reinforced my passion for AI in environmental applications but also inspired me to pursue future research in remote sensing and self-supervised learning. Overall, the experience was intellectually stimulating, collaborative, and transformative for my academic and career goals.