Intro

Sumer Shinalkar

Hi, I'm Sumer. I am an M.Tech Data Science student at COEP and an AWS-trained Data Engineer. I have hands-on experience architecting scalable ETL pipelines and real-time data systems using Kafka, Spark, Delta Lake, dbt, DuckDB, and AWS Cloud. With a demonstrated ability to optimize query performance and build robust architectures, I am currently seeking opportunities in Big Data Engineering.

Education

  • M.Tech, Data Science | COEP Technological University [CGPA: 8.0/10]
  • B.E., Computer Science | Savitribai Phule Pune University [CGPA: 8.5/10]

Technical Skills

  • Big Data: Apache Spark, Kafka, Delta Lake, DuckDB
  • Cloud (AWS): Lambda, S3, Glue, EventBridge, IAM
  • Languages: Python (Pandas, PySpark), SQL (CTEs, Window Functions)
  • DevOps: Docker, Airflow, dbt, Git

Projects

1. BidLock: Distributed Real-Time Bidding Engine

[Image: BidLock architecture diagram]

Tech Stack: Kafka, Spark Structured Streaming, Redis, Docker.
Key Achievements:
  • Architected a negotiation platform handling concurrent user bids with sub-millisecond latency, using Redis for state.
  • Engineered a Spark Structured Streaming pipeline to process real-time events and enforce business-logic validation.
  • Eliminated race conditions during simultaneous bidding by implementing optimistic locking in Redis.
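The optimistic-locking idea above can be sketched in a few lines: read the current bid with a version number, validate the new bid, and commit only if the version is unchanged. This is a minimal in-memory sketch of the pattern (the real system would use Redis WATCH/MULTI for the atomic step); all names and the retry count are illustrative.

```python
import threading

class OptimisticStore:
    """In-memory stand-in for Redis, supporting versioned compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()  # guards only the atomic compare step
        self._bids = {}                # item_id -> (version, amount)

    def read(self, item_id):
        return self._bids.get(item_id, (0, 0.0))

    def compare_and_set(self, item_id, expected_version, amount):
        # Succeeds only if nobody updated the bid since we read it,
        # mirroring Redis aborting a MULTI/EXEC after a WATCHed key changed.
        with self._lock:
            current_version, _ = self._bids.get(item_id, (0, 0.0))
            if current_version != expected_version:
                return False           # lost the race; caller retries
            self._bids[item_id] = (current_version + 1, amount)
            return True

def place_bid(store, item_id, amount, retries=10):
    """Optimistic retry loop: read, validate the business rule, attempt write."""
    for _ in range(retries):
        version, current = store.read(item_id)
        if amount <= current:
            return False               # business rule: bids must strictly increase
        if store.compare_and_set(item_id, version, amount):
            return True
    return False
```

Because conflicting writers simply retry, no bid ever blocks waiting on a lock, which is what keeps latency low under contention.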


2. ShopPulse: Real-Time E-Commerce Data Lakehouse

[Image: ShopPulse architecture diagram]

Tech Stack: AWS S3, Delta Lake, PySpark, Kafka, Spark Streaming.
Key Achievements:
  • Built a scalable ETL pipeline ingesting high-velocity clickstream logs via Kafka and Spark Streaming into a Delta Lake for ACID-compliant analytics.
  • Achieved 40% faster query performance on high-velocity data by applying storage optimization techniques.
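One common storage optimization on a clickstream lakehouse is partitioning the table by event date, so queries with a date filter prune whole directories of files. The sketch below shows just the partition-key logic, independent of Spark; the S3 path and field names are illustrative, not taken from the actual pipeline.

```python
from collections import defaultdict
from datetime import datetime, timezone

def partition_path(event):
    """Derive a Hive-style partition path (dt=YYYY-MM-DD) from an event
    timestamp; Delta/Spark lays partitioned data out the same way."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"s3://shoppulse/events/dt={ts:%Y-%m-%d}/"  # illustrative bucket

def bucket_events(events):
    """Group a micro-batch by partition path, as a streaming writer would
    before emitting one file per partition per batch."""
    buckets = defaultdict(list)
    for event in events:
        buckets[partition_path(event)].append(event)
    return dict(buckets)
```

A query such as "sessions on 2024-01-05" then touches only the `dt=2024-01-05` partition instead of scanning every file in the table.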


3. Modern Data Warehouse Pipeline

[Image: Warehouse architecture diagram]

Tech Stack: dbt, DuckDB, ELT, Data Modeling.
Key Achievements:
  • Developed a modular ELT pipeline using DuckDB for high-performance local compute and dbt for SQL-based transformations and testing.
  • Reduced bad-data ingestion by 90% by automating rigorous data quality tests directly within the dbt workflow.
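dbt's built-in schema tests (not_null, unique, accepted_values) compile to SQL queries that count violating rows; zero rows means the test passes. The sketch below runs the same style of checks by hand, using Python's stdlib sqlite3 for a self-contained demo (the real pipeline runs against DuckDB via dbt; the table and column names here are illustrative).

```python
import sqlite3

# Each test is a query counting rows that violate a data-quality rule.
TESTS = {
    "order_id_not_null": "SELECT COUNT(*) FROM orders WHERE order_id IS NULL",
    "order_id_unique": (
        "SELECT COUNT(*) FROM "
        "(SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
    "status_accepted_values": (
        "SELECT COUNT(*) FROM orders "
        "WHERE status NOT IN ('placed', 'shipped', 'returned')"
    ),
}

def run_tests(conn):
    """Return {test_name: violating_row_count}; all zeros means clean data."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in TESTS.items()}
```

In dbt these rules live declaratively in a model's YAML, so every run of the pipeline re-validates the data before it reaches downstream consumers.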

About

I am a dedicated Data Engineer driven by a deep interest in transforming data into actionable insights. I excel at Data Modeling, Database Design, and ETL processes, leveraging modern tools to manipulate and analyze complex datasets.

Experience

Data Analytics Intern | Ceres Canopus Pvt Ltd | Pune, India

  • Optimized complex SQL queries for client reporting, reducing data retrieval time by 40% to accelerate decisions.
  • Automated manual daily workflows using Python scripts (Pandas), saving the team 5+ hours weekly.
  • Collaborated with engineering teams to validate data integrity across source systems, ensuring 100% accuracy.

Certifications & Publications

Publications

  • FinSight: A Hybrid Real-time Multi-Source Framework for Stock Price Prediction (IEEE ESCI 2025 - Accepted)
  • Stock Price Prediction Using LSTM (Int. Journal of Research Publication & Review, IJRPR)

Certifications

  • AWS Cloud Practitioner (Udemy)
  • AWS Data Engineering Essentials (Udemy)
  • Data Analytics Professional Certificate (Google)
  • AWS Cloud Practitioner Essentials (Amazon Web Services)
  • AWS Data Engineering Foundations (Amazon Web Services)
  • Responsible & Safe AI (NPTEL - IIT Hyderabad)