Data Science · AI/ML · Data Engineering

Moulya Reddygari Bhupal

Data Engineer | AI/ML Enthusiast | Building Agentic AI Systems

Data Science graduate with hands-on experience building scalable data pipelines, ETL workflows, and AI/ML systems. Skilled in Python, SQL, Spark, Airflow, and cloud platforms (GCP, Azure, AWS), with a focus on turning raw multi-source data into actionable insights and shipping solutions with cross-functional teams.

About

I am a data-focused engineer passionate about building intelligent systems that combine rigorous data analysis with modern AI. I thrive on real-world, messy datasets and on designing end-to-end pipelines that turn raw inputs into decisions people can trust.

I care about clarity, reproducibility, and collaboration—whether that means solid data engineering, thoughtful modeling, or agentic workflows that make analysis faster and more reliable.

Education

M.S., Data Science

University of North Texas

2024 – 2026 · GPA 4.0 / 4.0

B.S., Computer Science

Sree Vidyanikethan Engineering College

2019 – 2023 · GPA 3.4 / 4.0

Professional experience

Packaged App Development Associate

Accenture · Bangalore, India · Jul 2023 – Jul 2024

Spring Boot, Node.js microservices, React, and cloud deployment with CI/CD.

Built Spring Boot and Node.js microservices on GCP for 10+ client applications; reduced API latency by tuning request/response flows and adding caching.
Built responsive UIs with React to improve usability of web-based applications.
Implemented cloud-native microservices with CI/CD on Azure and GCP for secure deployment and reliability across distributed environments, enabling regular low-risk AI feature releases.
Structured and optimized RESTful APIs with Node.js for microservice communication and system performance.
Used MongoDB for data management and Docker for containerization to streamline storage and deployment.

Technical skills

Languages

Python SQL

Data

Pandas NumPy ETL/ELT Apache Spark Feature engineering Time-series forecasting A/B testing

Visualization

Plotly Matplotlib Tableau Streamlit

Databases

PostgreSQL MySQL MongoDB ChromaDB

AI & ML

LLaMA 3 Groq LangChain LangGraph TensorFlow Keras scikit-learn XGBoost LightGBM Vector databases Prompt engineering LLM applications

Tools

Git Docker Airflow Jenkins CI/CD

Cloud

Azure GCP AWS Google Cloud Scheduler

Projects

Featured project

🤖 Agentic Data Analyst AI

Personal project

Automates end-to-end data analysis from raw datasets to insights using AI-driven reasoning.

Built an AI-powered data analysis system to automate data cleaning, exploratory data analysis (EDA), and insight generation from raw datasets.
Implemented an agentic workflow to detect missing values, remove duplicates, and handle inconsistent data.
Developed a natural language interface for querying datasets and generating structured insights.
Designed automated visualization pipelines for trends, distributions, and correlations using Plotly.
Integrated LLaMA 3 via Groq API for AI-driven reasoning and insight generation.
Deployed the application using Streamlit for real-time, interactive analysis.
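
The cleaning steps listed above (missing values, duplicates, inconsistent values) can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's actual code: the function name `clean_dataframe`, the median fill strategy, and the whitespace/lowercase normalization are all assumptions chosen for the example.

```python
import pandas as pd


def _normalize(value):
    """Trim and lowercase strings; leave every other value untouched."""
    return value.strip().lower() if isinstance(value, str) else value


def clean_dataframe(df: pd.DataFrame):
    """Return a cleaned copy of df plus a small report of what was found.

    Hypothetical helper: the fill and normalization rules are illustrative.
    """
    cleaned = df.copy()
    report = {"missing_per_column": df.isna().sum().to_dict()}

    # Normalize inconsistent text first so near-identical rows become equal.
    for col in cleaned.columns:
        if not pd.api.types.is_numeric_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].map(_normalize)

    # Count and drop exact duplicates after normalization.
    report["duplicate_rows"] = int(cleaned.duplicated().sum())
    cleaned = cleaned.drop_duplicates().reset_index(drop=True)

    # Fill numeric gaps with the column median (an assumed strategy).
    for col in cleaned.select_dtypes(include="number").columns:
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())

    return cleaned, report
```

Returning the report alongside the cleaned frame keeps the changes visible, so a downstream agent or user can review what was altered before trusting the result.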

Tech stack: Python · Pandas · NumPy · Plotly · Streamlit · Groq (LLaMA 3)

GitHub Live demo

RAG-Based Interview Agent

Research project · Advisor: Dr. Clifford Whitworth, University of North Texas · Mar 2026 – May 2026

Production-grade Retrieval-Augmented Generation system with modular, role-based architecture for grounded interview workflows.

Built with LangChain, LangGraph, and ChromaDB; top-k semantic retrieval with context injection and a hallucination guard for grounded responses.
Scalable document ingestion with duplicate detection, recursive chunking (100–300 words), metadata tagging, and embedding-based indexing.
LangGraph interview assistant with multi-stage prompts for question generation, answer evaluation, and structured responses.
Source-grounded outputs with structured JSON and citation support for traceability.
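
As a rough illustration of the ingestion step described above, the sketch below shows one way recursive chunking and duplicate detection could work using only the standard library. It is not the project's implementation: the separator order, the hash-based duplicate check, and the names `recursive_chunks`, `merge_pieces`, and `deduplicate` are assumptions. It enforces only the upper word limit (300); the 100-word lower bound mentioned above is not enforced here.

```python
import hashlib


def merge_pieces(pieces, max_words):
    """Greedily join adjacent pieces without exceeding max_words per chunk."""
    chunks, current, count = [], [], 0
    for piece in pieces:
        n = len(piece.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(piece)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def recursive_chunks(text, max_words=300, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first, recursing into oversized parts."""
    if len(text.split()) <= max_words:
        stripped = text.strip()
        return [stripped] if stripped else []
    head, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(head):
        if rest:
            pieces.extend(recursive_chunks(part, max_words, rest))
        elif part.strip():
            pieces.append(part.strip())
    return merge_pieces(pieces, max_words)


def content_hash(text):
    """Hash of case- and whitespace-normalized text, used as a duplicate key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(chunks):
    """Keep the first occurrence of each distinct chunk, preserving order."""
    seen, unique = set(), []
    for chunk in chunks:
        key = content_hash(chunk)
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

Exact hashing only catches verbatim repeats after normalization; near-duplicate detection would need a similarity measure over embeddings or shingles instead.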

Tools used: Python · LangChain · LangGraph · ChromaDB · Groq · LLMs · Streamlit · vector databases

GitHub

Energy Consumption Forecasting (Smart Grids)

Personal project · Sep 2025 – Dec 2025

Time-series forecasting of smart-grid energy use on 1M+ records with 34 features.

Achieved R² 0.969, RMSE 94.33, MAE 66.03 with LightGBM after extensive feature engineering and model comparison.
Engineered 20+ features: cyclical encoding, lags, rolling statistics, weather indicators (HDD/CDD).
Compared Linear Regression, Ridge, Random Forest, XGBoost, LightGBM, and an ANN, with further gains from ensembling; used time-series split validation to prevent leakage.
Custom ETL for multi-source data, KNN imputation, and IQR-based outlier handling.
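
The feature types and validation scheme named above can be sketched as follows. This is an illustrative example rather than the project's code: it assumes hourly data with a sorted DatetimeIndex, hypothetical column names `load` and `temp_f`, and the conventional 65 °F base temperature for heating and cooling degree values (computed per hour here, not per day).

```python
import numpy as np
import pandas as pd


def add_time_features(df, target="load", temp="temp_f", base_temp=65.0):
    """Add cyclical, lag, rolling, and degree-based features.

    Column names and the base temperature are assumptions for this sketch.
    """
    out = df.copy()

    # Cyclical encoding: hour 23 and hour 0 end up close together.
    hour = out.index.hour
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)

    # Lags and rolling statistics built from past values only (shift first).
    out["lag_1"] = out[target].shift(1)
    out["lag_24"] = out[target].shift(24)
    out["roll_mean_24"] = out[target].shift(1).rolling(24).mean()

    # Heating / cooling degrees relative to the base temperature.
    out["hdd"] = (base_temp - out[temp]).clip(lower=0)
    out["cdd"] = (out[temp] - base_temp).clip(lower=0)

    # Drop the warm-up rows that have no complete history.
    return out.dropna()


def expanding_window_splits(n_rows, n_splits=3):
    """Yield (train_idx, test_idx) pairs where training always precedes test."""
    fold = n_rows // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        test_end = min(train_end + fold, n_rows)
        yield np.arange(train_end), np.arange(train_end, test_end)
```

Shifting before rolling and splitting chronologically both serve the same goal: no feature or fold should see values from the period it is trying to predict.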

Tools used: Python · Pandas · scikit-learn · XGBoost · LightGBM · time-series forecasting · feature engineering

GitHub

Taxi Fare Prediction System

Personal project · Aug 2025 – Dec 2025

ML pipeline on the NYC TLC dataset to predict fares from trip-level and engineered distance and temporal features.

Compared Linear Regression, Decision Tree, and Random Forest; feature engineering drove the largest gains.
Preprocessing: missing-value imputation and outlier removal for higher data quality.
Streamlit app for real-time fare prediction and interactive visualization.
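
One common way to build distance and temporal features for trip data is shown below. It is a sketch under assumptions, not the project's pipeline: it assumes each trip record carries pickup and dropoff coordinates (the source does not say which fields were used), uses the haversine great-circle formula for distance, and the name `trip_features` is hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_features(pickup: datetime, lat1, lon1, lat2, lon2):
    """Build a small feature dict for one trip (illustrative field names)."""
    weekday = pickup.weekday()  # Monday = 0, Sunday = 6
    return {
        "distance_km": haversine_km(lat1, lon1, lat2, lon2),
        "hour": pickup.hour,
        "day_of_week": weekday,
        "is_weekend": weekday >= 5,
    }
```

Straight-line distance understates actual driving distance in a street grid, so it is best treated as one signal among several.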

Tools used: Python · Pandas · scikit-learn · Streamlit · feature engineering

GitHub

GlobeScope — Global Data Analytics Dashboard

Personal project · May 2025 – Jul 2025

End-to-end analytics platform ingesting data from APIs (e.g., World Bank, Google Trends) and visualizing real-time metrics across 150+ countries.

Designed Python/SQL pipelines for automated ingestion, transformation, and storage in PostgreSQL with strong data-quality controls.
Built interactive Tableau and Streamlit dashboards for regional analysis, metric selection, and trends, with query times under about 2 seconds. Evaluated Apache Spark for large-scale multi-country ingestion.
Automated refresh with Airflow and Google Cloud Scheduler, cutting manual reporting effort by roughly 60%.

Tools used: Python · SQL · Pandas · PostgreSQL · Streamlit · Airflow · Tableau · Google Cloud Scheduler

GitHub

Game of Tiles — AI Agent for 2048

Personal project · Feb 2025 – Apr 2025

Reinforcement-style game agent that learns to play 2048 from experience, using reward feedback to improve policy over time.

Trained a deep neural network to choose moves directly from the game state in place of a human player, improving through continued gameplay.
Outperformed rule-based baselines by about 35%; reached scores of 65,000+ and the 2048 tile within ~25 minutes of training.
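
To make "game state" and "reward feedback" concrete, here is a minimal sketch of the standard 2048 left-move rule, with the reward taken to be the sum of the tiles created by merges. The reward definition and the function names are assumptions; the source does not describe how the agent's reward was computed.

```python
def merge_row_left(row):
    """Slide one row left and merge equal neighbours once per move.

    Returns the new row and the reward (sum of newly merged tile values).
    """
    tiles = [v for v in row if v != 0]
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            reward += tiles[i] * 2
            i += 2  # both tiles are consumed by the merge
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), reward


def move_left(board):
    """Apply the left move to every row; return the new board and total reward."""
    new_board, total = [], 0
    for row in board:
        new_row, reward = merge_row_left(row)
        new_board.append(new_row)
        total += reward
    return new_board, total
```

The other three directions can be obtained by reversing or transposing the board before and after calling `move_left`.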

Tools used: Python · TensorFlow · Keras · Pandas · NumPy

GitHub

Catering Service Database

Advisor: Dr. Sahara Ali, College of Information, University of North Texas · Oct 2024 – Dec 2024

Relational database for an event-management catering workflow—food supply chain, events, staff, and payroll—with less redundancy and clearer operations.

MySQL schema and processes that reduced data redundancy by about 35% and manual record-keeping effort by about 40%.
Led a team of four with on-time delivery; designed the schema to scale as data volume grows.
Visualization and analysis for stakeholders; earned the Best Presentation Award and a recommendation letter from the professor.

Tools used: MySQL · Python

GitHub

AI-Powered E-Commerce Product Recommendation System

Advisor: Prof. Narendra Kumar Rao, Sree Vidyanikethan Engineering College · Dec 2022 – Apr 2023

Hybrid recommender combining collaborative filtering with deep learning for personalized product suggestions on simulated e-commerce data.

Matrix factorization plus neural components; ~25% higher recommendation accuracy on a 10,000-user synthetic dataset vs. baselines.
Published in IEEE ICRTDA 2024 proceedings; presented at NCKITS 2023 (Tirupati, India).
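
The matrix factorization part of such a hybrid recommender can be sketched with plain NumPy as below. This covers only the collaborative-filtering component, not the neural components, and it is not the published method: the full-batch gradient updates, the hyperparameters, and the names `factorize` and `masked_rmse` are assumptions for illustration.

```python
import numpy as np


def masked_rmse(ratings, mask, U, V):
    """RMSE over observed entries only (mask is 1 where a rating exists)."""
    err = mask * (ratings - U @ V.T)
    return float(np.sqrt((err ** 2).sum() / mask.sum()))


def factorize(ratings, mask, rank=2, lr=0.01, reg=0.01, epochs=200, seed=0):
    """Learn user factors U and item factors V so that U @ V.T fits ratings.

    Uses full-batch gradient descent with L2 regularization on observed
    entries; all hyperparameters here are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    for _ in range(epochs):
        err = mask * (ratings - U @ V.T)
        # Compute both update directions from the same error before updating.
        U_step = err @ V - reg * U
        V_step = err.T @ U - reg * V
        U += lr * U_step
        V += lr * V_step
    return U, V
```

Predicted scores for unobserved user-item pairs are then read from `U @ V.T`; a hybrid model would typically feed these factors, or the predictions, into a neural network alongside other features.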

Tools used: Python · Pandas · NumPy · TensorFlow · Keras · scikit-learn

GitHub

Contact

I'm open to Data Engineering, Data Science, and AI/ML roles, with a focus on building scalable data pipelines and AI agents.

Send a message

Or reach me directly

moulyarb02@gmail.com LinkedIn GitHub