Anurag Mishra

I am a researcher studying how machine learning models work internally. My primary focus is mechanistic interpretability: tracing transformer circuits, testing causal hypotheses, and studying the internal representations and computations that drive model behavior.

I completed an M.S. in Artificial Intelligence at Rochester Institute of Technology in May 2025. My work spans mechanistic interpretability, LLM evaluation and reliability, multimodal deepfake detection, explainable AI, and statistical ML.


News

Mar 2026

Served as a reviewer for the AIMS 2026 workshop on AI-generated media and security at CVPR 2026.

Aug 2025

Joined Eaton Ventures (Rochester Appliances) as an AI Engineer.

May 2025

Completed an M.S. in Artificial Intelligence at Rochester Institute of Technology.

2023-2026

Published research across IEEE, arXiv, and other venues in mechanistic interpretability, translation explainability, time-series foundation models, and applied machine learning.

May 2023

Completed a B.Tech in Computer Science & Engineering at Sikkim Manipal Institute of Technology.

Research Interests

Interpretability

Mechanistic interpretability and causal circuit analysis. Activation tracing, attention-head profiling, and intervention-based analysis to identify task-relevant transformer mechanisms and measure causal contribution to final behavior.
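As a toy illustration of the intervention-based analysis mentioned above, the sketch below caches a hidden activation from a clean run of a tiny two-layer NumPy stand-in model and patches it into a corrupted run, then checks how much of the clean behavior the patch restores. The model, weights, and inputs are illustrative assumptions, not from any specific project listed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer "model": hidden = relu(W1 x), out = W2 hidden.
# A stand-in for one transformer layer's residual stream.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(2, 8))

def forward(x, patch_hidden=None):
    """Run the model, optionally overwriting the hidden
    activation with a cached one (an activation patch)."""
    h = np.maximum(W1 @ x, 0.0)
    if patch_hidden is not None:
        h = patch_hidden  # causal intervention at this site
    return W2 @ h, h

x_clean = rng.normal(size=4)
x_corrupt = x_clean + rng.normal(scale=2.0, size=4)

out_clean, h_clean = forward(x_clean)
out_corrupt, _ = forward(x_corrupt)
# Patch the clean hidden state into the corrupted run.
out_patched, _ = forward(x_corrupt, patch_hidden=h_clean)

# If the patched site carries the relevant computation, the patch
# should close the gap between corrupted and clean outputs.
gap_before = np.linalg.norm(out_corrupt - out_clean)
gap_after = np.linalg.norm(out_patched - out_clean)
print(gap_before, gap_after)
```

In this toy case the output depends only on the patched hidden state, so the patch recovers the clean output exactly; in a real transformer the recovered fraction is the interesting quantity, measured site by site.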

Reliability

LLM evaluation, reliability, and hallucination reduction. Benchmark-driven evaluation across frontier model families with retrieval-augmented prompting, stress tests, and regression checks for reasoning and factual consistency.

ML Systems

Performance-aware AI systems. Latency and throughput optimization, profiling, asynchronous orchestration, and runtime regression debugging for production model-serving systems.

Multimodal

Multimodal detection and robustness. CNN-transformer architectures for lip-sync manipulation detection, synthetic data generation, and out-of-distribution evaluation under realistic deployment constraints.

Publications

Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models

Anurag Mishra

1st ICLR Workshop on Time Series in the Age of Large Models, 2026

Automatic Short Answer Grading Using a LSTM Based Approach

Udit Kr Chakraborty, Anurag Mishra

2023 IEEE World Conference on Applied Intelligence and Computing (AIC), 332-337, 2023

Mechanistic Interpretability of GPT-like Models on Summarization Tasks

Anurag Mishra

arXiv preprint arXiv:2505.17073, 2025

Advancing Explainability in Neural Machine Translation: Analytical Metrics for Attention and Alignment Consistency

Anurag Mishra

arXiv preprint arXiv:2412.18669, 2024

Analyzing the Impact of Climate Change With Major Emphasis on Pollution: A Comparative Study of ML and Statistical Models in Time Series Data

Anurag Mishra, Ronen Gold, Sanjeev Vijayakumar

arXiv preprint arXiv:2405.15835, 2024

Research Showcase

Conference and research posters from recent work.

ICLR 2026 · Sole Author

Dissecting Chronos

Capstone Project

Mechanistic Interpretability of GPT-like Models on Summarization Tasks

Projects

2025

Causal Mechanisms of Backtracking in Reasoning Models

Mechanistic interpretability · Causal ablations

Studied backtracking behavior in reasoning models and identified causally important components through targeted interventions, measuring how backtracking dynamics correlate with answer correctness under controlled perturbations.

2024-2025

DeFake: Multimodal Lip-Sync Manipulation Detection

PyTorch · C++/CUDA · Multimodal systems

Built CNN + transformer detection pipelines over 18,000+ videos, with custom C++/CUDA tensor kernels, synthetic data expansion, and robustness testing under deployment-like conditions.

2024-2025

LLM Evaluation and Robustness Harness

Frontier model evaluation · Reliability benchmarking

Designed benchmark workflows to evaluate reasoning, factual consistency, tool-calling behavior, and latency-quality trade-offs, with regression checks and retrieval-augmented prompting strategies that reduced hallucination on factual QA tasks.
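A minimal sketch of the regression-check idea, under assumed names and made-up scores: compare a candidate model's per-task benchmark scores against a stored baseline and flag any task that dropped past a tolerance.

```python
def regression_check(baseline, candidate, tol=0.02):
    """Return tasks where the candidate underperforms the
    baseline by more than `tol` (hypothetical task names)."""
    return {
        task: (baseline[task], candidate.get(task, 0.0))
        for task in baseline
        if candidate.get(task, 0.0) < baseline[task] - tol
    }

baseline = {"reasoning": 0.81, "factual_qa": 0.74, "tool_calling": 0.66}
candidate = {"reasoning": 0.83, "factual_qa": 0.69, "tool_calling": 0.66}

regressions = regression_check(baseline, candidate)
print(regressions)  # only factual_qa drops past the tolerance
```

In a real harness the scores come from reproducible benchmark runs, and a flagged regression blocks the release until investigated.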

2024

Explainable Neural Machine Translation

Attention analysis · Interpretability metrics

Built analytical metrics for attention and alignment consistency in sequence-to-sequence translation models, focusing on interpretable behavior diagnostics for generation workflows.
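One simple alignment-consistency metric of this flavor, sketched with made-up numbers (this is an illustrative toy, not the exact metrics from the paper above): for each target token, measure the share of attention mass that lands on its gold-aligned source tokens.

```python
import numpy as np

def alignment_agreement(attn, gold_pairs, n_src):
    """Mean attention mass on gold-aligned source tokens.
    `attn` is (n_tgt, n_src) with rows summing to 1;
    `gold_pairs` maps target index -> aligned source indices."""
    scores = []
    for t, srcs in gold_pairs.items():
        mask = np.zeros(n_src)
        mask[list(srcs)] = 1.0
        scores.append(float(attn[t] @ mask))
    return float(np.mean(scores))

# 3 target tokens attending over 3 source tokens.
attn = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
])
gold = {0: {0}, 1: {1}, 2: {2}}  # diagonal gold alignment

score = alignment_agreement(attn, gold, n_src=3)
print(round(score, 3))  # → 0.767
```

A score near 1.0 means attention concentrates on aligned tokens; tracking this per head or per layer turns attention maps into a quantitative diagnostic rather than a qualitative heatmap.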

Experience

Industry

Aug 2025 - Present

AI Engineer

Eaton Ventures (Rochester Appliances)

  • Architected and deployed production AI services supporting 10K+ daily requests at 99.9% uptime.
  • Reduced end-to-end API latency by 35% through request-path and model-serving optimization.
  • Implemented CI/CD validation and regression pipelines that reduced release-cycle time by 60%.

Research

Jan 2025 - May 2025

Machine Learning Research Assistant, Mechanistic Interpretability

Rochester Institute of Technology

  • Built activation tracing tooling in PyTorch/C++ for layer-wise transformer analysis.
  • Identified summarization-relevant circuits in layers 2, 3, and 5.
  • Applied targeted LoRA interventions with 40% faster convergence and 75% fewer trainable parameters.

Oct 2024 - Aug 2025

Machine Learning Research Assistant, LLM Evaluation and Robustness

RIT Office of the Provost

  • Designed evaluation harnesses for frontier models across reasoning, factuality, and tool calling.
  • Developed retrieval-augmented prompting workflows that reduced hallucination by 35%.
  • Built reproducible benchmark suites for quality and runtime regression tracking.

Oct 2024 - Aug 2025

Machine Learning Research Assistant, DeFake Project

Rochester Institute of Technology

  • Developed multimodal CNN + transformer systems on 18,000+ videos for lip-sync manipulation detection.
  • Implemented custom C++/CUDA tensor kernels that improved training efficiency by 40%.
  • Expanded stress-test data with 50K+ synthetic samples and improved OOD generalization by 18%.

Education

2023 - 2025

M.S. in Artificial Intelligence

Rochester Institute of Technology

CGPA: 3.89/4.00.

Research emphasis: Mechanistic Interpretability, NLP, Multimodal Imaging, and Statistical Machine Learning.

2019 - 2023

B.Tech in Computer Science & Engineering

Sikkim Manipal Institute of Technology

Minor in Artificial Intelligence. AI/ML Lead at the Google Developer Student Club. Research intern at DRDO; NLP intern at multiple startups.

Writing

External

Substack

Essays and notes on mechanistic interpretability, AI reliability, and ML systems engineering.

Current public writing is published externally rather than hosted as on-site blog posts.