I'm a Master's student at the Centre for Visual Information Technology (CVIT), IIIT Hyderabad.

My research focuses on Computer Vision, particularly Video Understanding and Vision-Language Models. I previously completed my Bachelor of Science (BS) in Data Science & Engineering at IISER Bhopal.

News

  • 2026 🎉 Our paper “RoadTones: Tone Controllable Text Generation from Road Event Videos” was accepted to the CVPR 2026 Findings track!
  • 2025 📌 Virtually presented a poster on image tampering detection at the CV4Science Workshop, CVPR 2025.

Publications

CVPR 2026 · Findings Track

RoadTones: Tone Controllable Text Generation from Road Event Videos

Chirag Parikh, Siddhi Pravin Lipare, Dr. Ravi Kiran Sarvadevabhatla

A dataset-model-evaluation stack for tone-controllable text generation from road event videos.

CVPR 2025 · CV4Science Workshop | ICML 2026 · WiML Workshop

Improving Image Tampering Detection: A Hybrid Approach Using Frequency and Spatial Domain Features

Siddhi Pravin Lipare, Vishesh Kumar, Dr. Akshay Agarwal

Used the Discrete Wavelet Transform with a sub-band attention module to capture frequency-spatial cues, enabling precise localization of tampered regions in an image.

Experience

Applied Research Fellow

CVIT, IIIT Hyderabad

May 2025 – Dec 2025
  • Designed RoadTones, a framework for tone-controllable road-video captioning (accepted at CVPR 2026 Findings).
  • Curated the RoadTones-51k dataset via a scalable data generation pipeline.
  • Built an evaluation stack with new metrics for factual consistency and tone alignment, and benchmarked 10 state-of-the-art Video-Language Models.

Research Intern

MiRL, IIT Madras

May 2024 – Jul 2024
  • Explored super-resolution architectures for 3D medical imaging to enhance z-axis resolution across CT and MRI scans.
  • Applied GANs, diffusion models, and NeRF to improve diagnostic clarity.

Education

MS by Research, Computer Science & Engineering

IIIT Hyderabad — CVIT

Jan 2026 – Present

Bachelor of Science (BS) in Data Science & Engineering

IISER Bhopal

2021 – 2025

Projects

VLLM Inferencing for Long Videos

CVIT, IIIT Hyderabad · 2026

Investigated the VLLM codebase to ensure minimal to no quality degradation in video inference.

RoadTones

CVIT, IIIT Hyderabad · 2025

Built a dataset-model-evaluation stack to enable tone-controllable video captioning (accepted at CVPR 2026 Findings).

Image Tampering Detector

Bachelor's Thesis · TBVL, IISER Bhopal · 2025

Integrated the Discrete Wavelet Transform and a sub-band attention layer into an existing transformer backbone (SAFIRE) to reach higher accuracy in fewer training epochs.

Monocular Depth Estimation in Dim Light Road Scenes

IISER Bhopal · 2024

Added denoising and deblurring layers ahead of a DepthAnything-V2 backbone to enhance depth estimation in noisy, blurry night-time footage.

3D Medical Super-Resolution

MiRL, IIT Madras · 2024

Investigated GAN-, diffusion-, and NeRF-based super-resolution methods to enhance z-axis resolution in CT and MRI volumes for sharper diagnostics.

ICU Mortality Prediction

IISER Bhopal · 2023

Built a binary-classification pipeline using classical supervised ML to flag critical mortality indicators for heart-failure patients admitted to the ICU.

Honors & Awards

  • ISRO Robotics Challenge 2024 — Top 30 teams out of 400 nationwide.
  • IEEE CAS Student Design Competition 2022 — 3rd place, Asia–Pacific region.

Technical Skills

Languages
Python, C/C++, JavaScript, MATLAB, R, SQL
Frameworks
PyTorch, TensorFlow, Keras, OpenCV, HuggingFace
Tools
Docker, Linux, Git, Firebase, LaTeX