Intro
Hi, I'm Sumer, an M.Tech Data Science student at COEP and an AWS-trained data engineer. I have hands-on experience architecting scalable ETL pipelines and real-time data systems with Kafka, Spark, Delta Lake, dbt, DuckDB, and AWS. I have a track record of optimizing query performance and building robust data architectures, and I am currently seeking opportunities in Big Data Engineering.
Education
- M.Tech, Data Science | COEP Technological University [CGPA: 8.0/10]
- B.E., Computer Science | Savitribai Phule Pune University [CGPA: 8.5/10]
Technical Skills
- Big Data: Apache Spark, Kafka, Delta Lake, DuckDB
- Cloud (AWS): Lambda, S3, Glue, EventBridge, IAM
- Languages: Python (Pandas, PySpark), SQL (CTEs, Window Functions)
- DevOps & Data Tooling: Docker, Git, Airflow, dbt
Projects
1. BidLock: Distributed Real-Time Bidding Engine
Tech Stack: Kafka, Spark Structured Streaming, Redis, Docker.
Key Achievements:
- Architected a real-time bidding platform that handles concurrent user bids with sub-millisecond latency, using Redis to hold bid state.
- Engineered a Spark Structured Streaming pipeline that processes bid events in real time and enforces business-rule validation.
- Eliminated race conditions during simultaneous bidding by implementing optimistic locking in Redis.
2. ShopPulse: Real-Time E-Commerce Data Lakehouse
Tech Stack: Kafka, PySpark (Spark Streaming), Delta Lake, AWS S3.
Key Achievements:
- Built a scalable ETL pipeline that ingests high-velocity clickstream logs through Kafka and Spark Streaming into a Delta Lake lakehouse, enabling ACID-compliant analytics.
- Improved query performance by 40% by applying storage optimization techniques to the lakehouse tables.
3. Modern Data Warehouse Pipeline
Tech Stack: dbt, DuckDB, SQL (ELT, Data Modeling).
Key Achievements:
- Developed a modular ELT pipeline using DuckDB for high-performance local compute and dbt for SQL-based transformations and testing.
- Reduced bad data reaching downstream models by 90% by automating data quality tests directly within the dbt workflow.
About
I am a data engineer focused on turning raw data into actionable insights. My core strengths are data modeling, database design, and ETL development, and I use modern tools to process and analyze complex datasets.
Experience
Data Analytics Intern | Ceres Canopus Pvt Ltd | Pune, India
- Optimized complex SQL queries for client reporting, reducing data retrieval time by 40% and speeding up client decision-making.
- Automated manual daily workflows using Python scripts (Pandas), saving the team 5+ hours weekly.
- Collaborated with engineering teams to validate data integrity across source systems, ensuring 100% accuracy.
Certifications & Publications
Publications
- FinSight: A Hybrid Real-time Multi-Source Framework for Stock Price Prediction (IEEE ESCI 2025, accepted)
- Stock Price Prediction Using LSTM (International Journal of Research Publication & Review, IJRPR)
Certifications
- AWS Cloud Practitioner (Udemy)
- AWS Data Engineering Essentials (Udemy)
- Data Analytics Professional Certificate (Google)
- AWS Cloud Practitioner Essentials (Amazon Web Services)
- AWS Data Engineering Foundations (Amazon Web Services)
- Responsible & Safe AI (NPTEL - IIT Hyderabad)
Contact
Email: shinalkarsumer@gmail.com
Phone: +91 8390302819