Intro

Sumer Shinalkar

Hi, I'm Sumer. I am an M.Tech Data Science student at COEP and an AWS-trained Data Engineer. I have hands-on experience architecting scalable ETL pipelines and real-time data systems using Kafka, Spark, Delta Lake, dbt, DuckDB, and AWS Cloud. With a demonstrated ability to optimize query performance and build robust architectures, I am currently seeking opportunities in Big Data Engineering.

Education

  • M.Tech, Data Science | COEP Technological University [CGPA: 8.0/10]
  • B.E., Computer Science | Savitribai Phule Pune University [CGPA: 8.5/10]

Technical Skills

  • Big Data: Apache Spark, Kafka, Delta Lake, DuckDB
  • Cloud (AWS): Lambda, S3, Glue, EventBridge, IAM
  • Languages: Python (Pandas, PySpark), SQL (CTEs, Window Functions)
  • DevOps: Docker, Airflow, dbt, Git

Projects

1. BidLock: Distributed Real-Time Bidding Engine

[Image: BidLock architecture diagram]

Tech Stack: Kafka, Spark Structured Streaming, Redis, Docker.
Key Achievements:
  • Architected a negotiation platform handling concurrent user bids with sub-millisecond latency, using Redis for state.
  • Engineered a Spark Structured Streaming pipeline to process real-time events and enforce business-logic validation.
  • Eliminated race conditions during simultaneous bidding by implementing optimistic locking in Redis.
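The optimistic-locking idea above can be sketched in a few lines: read the current bid with a version number, validate the new bid, and commit only if the version is unchanged. This is a minimal in-memory sketch of the pattern (the real system would use Redis WATCH/MULTI for the atomic step); all names and the retry count are illustrative.

```python
import threading

class OptimisticStore:
    """In-memory stand-in for Redis, supporting versioned compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()  # guards only the atomic compare step
        self._bids = {}                # item_id -> (version, amount)

    def read(self, item_id):
        return self._bids.get(item_id, (0, 0.0))

    def compare_and_set(self, item_id, expected_version, amount):
        # Succeeds only if nobody updated the bid since we read it,
        # mirroring Redis aborting a MULTI/EXEC after a WATCHed key changed.
        with self._lock:
            current_version, _ = self._bids.get(item_id, (0, 0.0))
            if current_version != expected_version:
                return False           # lost the race; caller retries
            self._bids[item_id] = (current_version + 1, amount)
            return True

def place_bid(store, item_id, amount, retries=10):
    """Optimistic retry loop: read, validate the business rule, attempt write."""
    for _ in range(retries):
        version, current = store.read(item_id)
        if amount <= current:
            return False               # business rule: bids must strictly increase
        if store.compare_and_set(item_id, version, amount):
            return True
    return False
```

Because conflicting writers simply retry, no bid ever blocks waiting on a lock, which is what keeps latency low under contention.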


2. ShopPulse: Real-Time E-Commerce Data Lakehouse

[Image: ShopPulse architecture diagram]

Tech Stack: AWS S3, Delta Lake, PySpark, Kafka, Spark Streaming.
Key Achievements:
  • Built a scalable ETL pipeline ingesting high-velocity clickstream logs via Kafka and Spark Streaming into a Delta Lake for ACID-compliant analytics.
  • Achieved 40% faster query performance on high-velocity data by applying storage optimization techniques.
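One common storage optimization on a clickstream lakehouse is partitioning the table by event date, so queries with a date filter prune whole directories of files. The sketch below shows just the partition-key logic, independent of Spark; the S3 path and field names are illustrative, not taken from the actual pipeline.

```python
from collections import defaultdict
from datetime import datetime, timezone

def partition_path(event):
    """Derive a Hive-style partition path (dt=YYYY-MM-DD) from an event
    timestamp; Delta/Spark lays partitioned data out the same way."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"s3://shoppulse/events/dt={ts:%Y-%m-%d}/"  # illustrative bucket

def bucket_events(events):
    """Group a micro-batch by partition path, as a streaming writer would
    before emitting one file per partition per batch."""
    buckets = defaultdict(list)
    for event in events:
        buckets[partition_path(event)].append(event)
    return dict(buckets)
```

A query such as "sessions on 2024-01-05" then touches only the `dt=2024-01-05` partition instead of scanning every file in the table.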


3. Modern Data Warehouse Pipeline

[Image: Warehouse architecture diagram]

Tech Stack: dbt, DuckDB, ELT, Data Modeling.
Key Achievements:
  • Developed a modular ELT pipeline using DuckDB for high-performance local compute and dbt for SQL-based transformations and testing.
  • Reduced bad-data ingestion by 90% by automating rigorous data quality tests directly within the dbt workflow.
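dbt's built-in schema tests (not_null, unique, accepted_values) compile to SQL queries that count violating rows; zero rows means the test passes. The sketch below runs the same style of checks by hand, using Python's stdlib sqlite3 for a self-contained demo (the real pipeline runs against DuckDB via dbt; the table and column names here are illustrative).

```python
import sqlite3

# Each test is a query counting rows that violate a data-quality rule.
TESTS = {
    "order_id_not_null": "SELECT COUNT(*) FROM orders WHERE order_id IS NULL",
    "order_id_unique": (
        "SELECT COUNT(*) FROM "
        "(SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
    "status_accepted_values": (
        "SELECT COUNT(*) FROM orders "
        "WHERE status NOT IN ('placed', 'shipped', 'returned')"
    ),
}

def run_tests(conn):
    """Return {test_name: violating_row_count}; all zeros means clean data."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in TESTS.items()}
```

In dbt these rules live declaratively in a model's YAML, so every run of the pipeline re-validates the data before it reaches downstream consumers.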

About

I am a dedicated Data Engineer driven by a deep interest in transforming data into actionable insights. I excel at Data Modeling, Database Design, and ETL processes, leveraging modern tools to manipulate and analyze complex datasets.

Experience

Data Analytics Intern | Ceres Canopus Pvt Ltd | Pune, India

  • Optimized complex SQL queries for client reporting, reducing data retrieval time by 40% to accelerate decisions.
  • Automated manual daily workflows using Python scripts (Pandas), saving the team 5+ hours weekly.
  • Collaborated with engineering teams to validate data integrity across source systems, ensuring 100% accuracy.

Certifications & Publications

Publications

  • FinSight: A Hybrid Real-time Multi-Source Framework for Stock Price Prediction (IEEE ESCI 2025 - Accepted)
  • Stock Price Prediction Using LSTM (Int. Journal of Research Publication & Review, IJRPR)

Certifications

  • AWS Cloud Practitioner (Udemy)
  • AWS Data Engineering Essentials (Udemy)
  • Data Analytics Professional Certificate (Google)
  • AWS Cloud Practitioner Essentials (Amazon Web Services)
  • AWS Data Engineering Foundations (Amazon Web Services)
  • Responsible & Safe AI (NPTEL - IIT Hyderabad)