
Updated Dec 28, 2025

Boston, MA • chaitanyachakkavsk@gmail.com

Chaitanya Chakka

Multimodal ML • Vision-Language • Systems

I work at the intersection of multimodal research and production-grade engineering, building systems that are interpretable, benchmarked, and fast.


focus.txt — bash

~$ cat focus.txt

• multimodal semantic misalignment

• attention interpretability + evaluation

• fast, interactive ML tooling

• production systems (APIs, gateways, observability)

~$ echo "Let's build."

Let's build.


About

I build AI systems that work in the real world, from research to production.

Currently exploring multimodal ML and how vision-language models process information.

Shipped systems at scale: 1500+ TPS APIs, real-time viz with millions of points, and containerized ML services.

Journey

My path through education and industry.

💼 Work

Jun 2025 – Aug 2025

Data Scientist Intern

Prospect33 • New York, NY

Built a low-latency WebGL/React visualization that renders 8M+ points (Deck.gl + GPU buffers) at a sustained 60 FPS.
Designed a human-in-the-loop active-learning labeling workflow that surfaces the top 1% most informative points.
Added GPU-accelerated embeddings (PCA, t-SNE, UMAP, autoencoder) to improve downstream model F1.

Deck.gl • Embeddings

🎓 Education

Sep 2024 – May 2026

MS in Artificial Intelligence

Boston University • Boston, MA

GPA: 3.96/4.0 • Multimodal ML, Vision-Language Models, NLP

💼 Work

Jun 2023 – Jun 2024

Software Development Engineer

Cashfree Payments • Bangalore, India

Spearheaded Kong API Gateway integration with Golang wrappers across 30 teams, serving ~1500 TPS in production.
Shipped custom Lua plugins and deployed the gateway on Kubernetes with a PostgreSQL backend.
Introduced Twilio for WhatsApp/SMS routing, improving delivery rates by ~20%.

🎓 Education

Aug 2019 – May 2023

BS in Computer Science

Birla Institute of Technology and Science, Pilani, Hyderabad Campus • Hyderabad, India

GPA: 8.96/10 • Computer Science foundations, NLP, Data Science

Publications

Research on multimodal learning, vision-language models, and NLP

Under Review

Some Modalities Are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs

Built MMA-Bench, a controlled audio–video–text semantic misalignment benchmark.
Built an interpretability pipeline combining black-box tests with white-box attention statistics.
LoRA fine-tuning improved modality-specific accuracy by 20–40%.

CVPR 2025

Improving Prompt Alignment in Vision Language Models: A Self-Learning Framework for Generative Models

Fine-tuned diffusion models (SD, SDXL, Flux, OmniGen) with gains on Object-State-Bench.
Synthetic data pipeline for absent/empty object states using LLM/LVLM recaptioning.

ICAART 2024

Social Implications of OCEAN Personality: An Automated BERT-Based Approach

This paper presents an automated approach that combines BERT with psycholinguistic features to predict the Big Five (OCEAN) personality traits from text, drawing on multiple combined datasets.
It also empirically investigates how social factors like age, gender, profession, and zodiac sign relate to personality variations.

Projects

Featured builds. Full list includes experiments and long-tail work.


May 2025 – Jul 2025

Layer-Residual Co-Attention Networks for VQA

Built a multimodal VQA system combining ResNet-152 visual features with GloVe+LSTM text encoding. Implemented a layer-wise residual co-attention mechanism, reaching ~60% accuracy on the VQAv2 benchmark.

Multimodal • VQA • Deep Learning • ResNet • LSTM

Oct 2024 – Dec 2024

Optimizing LLM Question Generation for Conversational QA

Developed a 3-module pipeline that generates contextual follow-up questions with iterative correctness checks. Fine-tuned on 26k immigration QA pairs, improving ROUGE scores over the baseline.

NLP • LLMs • Question Answering

Skills & Technologies

Tools and technologies I use to build AI systems

Languages

ML Frameworks

Infrastructure

ML & GenAI

Multimodal Learning • LLMs • LoRA • Diffusion • PEFT • Prompting

Vision & Multimodal

Computer Vision • VQA • VLN • CLIP • Cross-Modal Attention • Image/Video

Let's Connect

Always open to discussing multimodal research, ML systems, or interesting collaboration opportunities.

Socials

LinkedIn Instagram Email

Research & Code

Google Scholar GitHub

📍 Boston, MA

