Resume

Jaskirat Singh Sudan

M.S. in Artificial Intelligence ('26) · University of Michigan

Representation Learning · Audio Deepfake Detection · Speech + Vision ML

GitHub · LinkedIn · Google Scholar · YouTube

Interested in what signals encode about the physical world and how to build models that learn that structure robustly.

MS in AI from the University of Michigan, researching audio deepfake detection at the ISSF Lab under Prof. Hafiz Malik. My work focuses on the latent representations of speech produced by SSL models: how similarity function choice and negative scaling shape the geometry of those representations, and what that geometry reveals about the boundary between real and synthetic voice. I use supervised contrastive learning to reshape the embedding space of large speech encoders (XLS-R, WavLM, HuBERT) for deepfake detection, and I am interested in what different layers encode about the physical structure of speech.

My background spans representation learning, contrastive and self-supervised methods, and signal processing. On the computer vision side, I have worked on semantic segmentation, behavioral cloning agents, and Siamese networks.

Research

Thesis — Contrastive Audio Deepfake Detection

Master's Thesis · Aug 2025–Present · Representation Learning

Supervised Contrastive Learning with Cross-Batch Negatives for Deepfake Audio Detection

A controlled study of what actually drives out-of-distribution generalization in SSL-based deepfake detection. Over a frozen XLS-R-300M, a contrastive projection head (Stage 1) feeds a linear classifier (Stage 2); I sweep similarity geometry (cosine vs. geodesic), negative-queue size, and temperature to isolate each factor's effect. 0.25% EER on ASVspoof 2019, 8.3% on In-the-Wild.

0.25% EER (ASVspoof 2019) · 8.3% EER (In-the-Wild) · 6.6% EER (In-the-Wild, pooled training)

Thesis → Presentation → Poster → Code →

SLIM — two-stage SSL deepfake detection replication

Oct 2025 · Self-Supervised Learning

Speaker-Specific SLIM Replication (From-Scratch)

A from-scratch rebuild of a two-stage SSL deepfake detector (Wav2Vec2 / WavLM): Stage 1 compresses style and linguistic structure, Stage 2 classifies with waveform augmentation. Adds per-speaker embedding diagnostics and UMAP/t-SNE views that expose entity-level failures hidden by aggregate EER.

Code: Private (ISSF Lab)

ViKey — visible-light backscatter authentication

IEEE MASS 2025 · Oct 2025 · Co-Primary Author · Published

ViKey: Secure Door Access Control Using Passive Visible Light Tags

The first passive visible-light tag for door access: layered transparent tapes exploit polarized birefringence to produce 3D, position-dependent color patterns that can't be cloned. A real-time CV pipeline (SIFT → FLANN → RANSAC, no deep learning) reaches 90.5% accuracy at 0.5m with <100ms latency, using a $0.20 tag.

90.5% accuracy @ 0.5m · <100ms latency · $0.20 tag cost

Paper → Code →

Projects

Jun 2026 · Audio ML · Web App

Truvox: Audio Deepfake Detection

A deployed web app that flags AI-generated speech using a spectrogram autoencoder: audio that reconstructs poorly is scored as synthetic. Accepts file uploads, batches, and YouTube links, renders the spectrogram, and produces a plain-language AI summary of each verdict.

Demo →

Apr 2026 · Anomaly Detection · Audio ML

Spectrogram Autoencoder for Zero-Shot Deepfake Detection

An autoencoder trained only on real speech flags deepfakes by reconstruction error: synthetic audio reconstructs poorly. Compares CNN vs. ViT-Tiny backbones and two spectrogram masks. Best config: 15.5% EER on ASVspoof 2019, 13.1% on In-the-Wild, fully zero-shot.

Code →

Apr 2025 · Representation Learning

Contrastive Learning with Siamese Network

A Siamese network maps MNIST into a 128-D space where same-class pairs cluster (99% pair accuracy). Includes an interactive GUI for few-shot classification and 3D PCA exploration of the learned embeddings.

Code → Demo →

Apr 2025 · Behavioral Cloning

Self-Driving Mario Kart (CNN-LSTM)

End-to-end behavioral cloning: a CNN-LSTM learns to drive from 50K frame-action pairs, combining spatial features with temporal memory. Runs closed-loop at ≤60ms latency, hitting 94% action accuracy and completing full laps autonomously.

Code → Slides →

Nov 2024 · Transfer Learning

Low-Light Semantic Segmentation

Tackles the day-to-night domain gap on BDD100K by pretraining on daytime images, then fine-tuning on night images, lifting an Xception U-Net from 0.08 to 0.90 Dice. Benchmarked against MobileNetV2 U-Net to map the accuracy/compute tradeoff.

Code → Slides →

Dec 2024 · Diffusion / Multimodal

Speech→Image with Latent Diffusion

A spoken prompt becomes an image: Whisper transcribes the speech, and the transcript prompts a DreamBooth-fine-tuned Stable Diffusion v2 trained with prior-preservation loss to keep general priors intact while learning a custom subject.

Code → Talk → Demo →

Feb 2025 · Image Segmentation

Segment Anything Desktop GUI

A local Tkinter interface for Meta's SAM (ViT-H/L/B) with point, box, and text prompts and real-time mask preview: full interactive segmentation on consumer hardware, no cloud.

Code →

Feb 2024 · Representation Learning

Scalable Convolutional Autoencoder (SCA)

An autoencoder you can reshape live: encoder/decoder depth and latent dimension are adjustable at runtime, and a GUI shows reconstructions update as you change the bottleneck. A hands-on tool for building intuition about compression.

Code → Demo →

2023 · Computer Vision + Astronomy

Star-Based Navigation

Estimates latitude/longitude from a photo of the night sky by matching it against a star catalog. Combines template matching, SIFT/ORB homography, and a grid search for robustness to rotation and scale.

Code →

2023 · Computer Vision + HCI

Air-Draw (Gesture-Controlled Whiteboard)

A virtual whiteboard you draw on with your hand: MediaPipe tracks finger gestures and OpenCV renders the strokes, enabling live annotation in video calls with no keyboard or mouse.

Code → Demo →

Experience

ISSF Lab, University of Michigan

Aug 2025 – Present · MI

Research Assistant — Speech ML / Audio Anti-Spoofing

Advisor: Prof. Hafiz Malik

–Developing MS thesis on contrastive representation learning for audio deepfake detection: a controlled study of cosine vs. geodesic similarity and cross-batch negative scaling over a frozen XLS-R-300M. Best results: 0.25% EER (ASVspoof 2019 eval) and 8.3% EER (In-the-Wild) with single-dataset training; 0.8% / 6.6% EER on ASVspoof 2019 and In-the-Wild, respectively, with training pooled across ASVspoof 2019, ASVspoof 5, and MLAAD.
–Contributed to platform-scale deepfake detection in direct collaboration with Google, analyzing speech of flagged public figures to inform content removal on YouTube.

TAI Lab, University of Michigan

Dec 2024 – Aug 2025 · MI

Research Assistant — Computer Vision & ML Systems

Advisor: Prof. Xiao Zhang

–Co-primary author on ViKey, published at IEEE MASS 2025: a $0.20 optical authentication system using polarized birefringence patterns as unclonable physical keys, eliminating cloning and replay attack surfaces.
–Designed the real-time computer vision pipeline that distinguishes genuine tag signatures from environmental reflections using per-channel SIFT, FLANN matching, RANSAC geometric verification, and temporal consistency checks, achieving 90.5% accuracy at 0.5m with <100ms latency and no deep learning.

Indian Institute of Technology, Indore

Jan 2023 – May 2023 · Indore, India

Research Assistant — Signal Processing + ML

Advisor: Prof. Narendranath Patra

–Built a rotation- and scale-invariant star-catalog navigation system using DNNs and normalized cross-correlation, addressing a known fragility of template-matching approaches.
–Reduced RFI contamination in GMRT radio telescope interferometric data using K-Means clustering for unsupervised anomaly flagging.

Publications

arXiv 2025

Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection

Jaskirat Sudan, Hashim Ali, Surya Subramani, Hafiz Malik

A controlled study of how similarity-function choice and negative scaling drive out-of-distribution generalization in SSL-based audio deepfake detection. The two-stage pipeline runs over a frozen Wav2Vec2-XLS-R-300M backbone: a supervised contrastive projection head, then a linear classifier. Achieves 0.25% EER on ASVspoof 2019 and 8.3% EER on In-the-Wild under single-dataset training.

Paper →

IEEE MASS 2025

ViKey: Secure Door Access Control Using Passive Visible Light Tags

Jaskirat Sudan, Fatima Qasem, Hasky E Fynn, Fatima Mohammed, Ashwin Sarvadey, Tian Xie, Ang Li, Xiao Zhang

Low-cost, privacy-preserving door access control via visible-light backscatter. Polarized birefringence tags create 3D, position-dependent color patterns as unclonable optical keys. The <$0.20 prototype achieves 90.5% authentication accuracy at 0.5m while eliminating cloning, replay, and privacy attack surfaces.

Paper → Code →

IEEE MASS 2025

Demo: A Passive Optical Tagging Approach for Secure and Revocable Entry Systems

Hasky Fynn, Jaskirat Sudan, Fatima Qasem, Fatima Mohammed, Xiao Zhang

Companion demo paper presenting the live ViKey system, the first door access control system based on visible-light backscatter. It uses polarized birefringence to generate 3D, position-dependent color patterns as keys, enabling robust, contactless authentication.

Paper →

IIETA 2024

A Review of EEG Artifact Removal Methods for Brain-Computer Interface Applications

Safdar Sardar Khan*, Jaskirat Singh Sudan*, Anuj Pathak*, Rakesh Pandit, Pinky Rane, Ashish Kumar Kumawat (* equal contribution)

Systematic survey of 25+ EEG denoising methods across statistical, ICA-based, wavelet-based, and DL-based approaches for BCI applications. Hybrid ICA + wavelet pipelines consistently outperform single-method approaches for preserving neural signal quality in real-time pipelines.

Paper →

Blog

Mar 2023 · Astronomy

Radio Astronomy

How radio telescopes see what eyes can't, revealing pulsars, black holes, and interstellar gas to map the invisible universe.

Read →

Nov 2023 · History of Computing

Most Important Algorithm That Averted a Nuclear Conflict

How an early-warning algorithm and crucial human judgment prevented a Cold War false alarm from escalating into catastrophe.

Read →