Building real-time data pipelines, intelligent ETL workflows, and scalable AI-powered solutions with 3+ years of hands-on experience
- Data Engineer specializing in Big Data, Cloud ETL, and Automation
- 🌱 Expertise in Apache Spark, Kafka, AWS, Airflow, Selenium, GenAI, and RAG systems
- 💡 Passionate about real-time streaming analytics, web automation, and intelligent data pipelines
- 🎯 Built production-grade systems for fraud detection, RAG pipelines, and ETL automation
- 📫 Reach me: vicky0x07@gmail.com
- ⚡ Fun fact: I automate everything, from data pipelines to web scraping!
Production-ready automation tool for downloading complete O'Reilly courses with automatic organization
- 🎥 Video + transcript extraction with Selenium automation
- Headless mode with Chrome DevTools Protocol
- Smart chapter-based organization
- ⚡ 10x faster transcript-only mode
- Resume capability for interrupted downloads
Tech: Python, Selenium, FFmpeg, Chrome DevTools Protocol
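One common way to provide resume capability for interrupted downloads is to record each finished item in a small manifest file and skip those items on the next run. The sketch below is an illustrative assumption, not the tool's actual code: `download_course`, `load_done`, the `fetch` callback, and the JSON manifest are hypothetical names chosen for the example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_done(manifest: Path) -> set[str]:
    """Return the lesson IDs already recorded as finished (empty if no manifest yet)."""
    if manifest.exists():
        return set(json.loads(manifest.read_text()))
    return set()


def download_course(
    lesson_ids: Iterable[str],
    fetch: Callable[[str], None],  # hypothetical per-lesson downloader
    manifest: Path,
) -> list[str]:
    """Call fetch() for each lesson not yet in the manifest.

    The manifest is rewritten after every successful lesson, so an
    interrupted run resumes from the first unfinished lesson.
    Returns the IDs fetched during this run.
    """
    done = load_done(manifest)
    fetched = []
    for lesson_id in lesson_ids:
        if lesson_id in done:
            continue  # finished in an earlier run, skip it
        fetch(lesson_id)  # may raise; progress so far is already saved
        done.add(lesson_id)
        manifest.write_text(json.dumps(sorted(done)))
        fetched.append(lesson_id)
    return fetched
```

Running it a second time with the same manifest fetches only the lessons that did not complete before, because finished IDs are persisted one at a time rather than at the end of the run.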
Currently working on exciting data engineering and automation projects. Stay tuned!
```python
class NikeshChavhan:
    def __init__(self):
        self.name = "Nikesh Chavhan"
        self.role = "Data Engineer"
        self.location = "Nagpur, India"
        self.experience = "3+ years"

    def get_skills(self):
        return {
            "big_data": ["Apache Spark", "Kafka", "Flink", "AWS Kinesis"],
            "cloud_etl": ["AWS (S3, Lambda, Redshift, EMR, Glue)", "Airflow", "DBT"],
            "automation": ["Selenium", "Puppeteer", "BeautifulSoup", "Scrapy"],
            "genai_rag": ["LLMs (GPT, LLaMA)", "RAG Pipelines", "Prompt Engineering"],
            "ml_analytics": ["XGBoost", "LightGBM", "scikit-learn", "Pandas"],
            "devops": ["Docker", "Kubernetes", "Terraform", "GitHub Actions"],
            "monitoring": ["Prometheus", "Grafana", "ELK Stack"],
            "databases": ["Snowflake", "Redshift", "PostgreSQL", "MongoDB"],
        }

    def current_focus(self):
        return [
            "🔥 Real-time data pipelines with Spark Streaming",
            "🤖 Building production-grade RAG systems",
            "Web automation with Selenium/Puppeteer",
            "Auto-scaling ETL pipelines on AWS",
            "Streaming analytics with Kafka + Redshift",
        ]
```

- Real-Time Data Pipelines: Kafka + Spark Streaming for sub-second processing
- Web Automation & Scraping: Selenium, Puppeteer for intelligent data extraction
- GenAI & RAG Systems: LLM-powered pipelines with vector search + generation
- Cloud ETL: AWS (S3, Lambda, Glue, Redshift, EMR) + Airflow orchestration
- ML & Analytics: XGBoost, scikit-learn for fraud detection and predictions
- Auto-Scaling Infrastructure: Cost-optimized pipelines with Terraform + K8s
- CI/CD Automation: Docker + Kubernetes + GitHub Actions
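The "vector search + generation" shape of a RAG pipeline can be illustrated with a toy retriever. This is a minimal sketch under stated assumptions, not the author's pipeline: it swaps real embeddings and a vector database for bag-of-words count vectors compared by cosine similarity, and the hypothetical `build_prompt` only assembles the prompt that would be sent to an LLM; no model is called.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (the 'vector search' step)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Combine retrieved context with the question (the input to the 'generation' step)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a production system the `embed` function would call an embedding model and `retrieve` would query a vector index, but the retrieve-then-prompt flow stays the same.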
Shivaji Science College, Nagpur | BS in Computer Science (2017-2021) | CGPA: 8/10
Certifications:
- Data Engineering Associate (ongoing) - AWS
- Data Engineering Professional (ongoing) - Google Cloud
- Meta Database Engineer Professional - Coursera
- Data Scientist Professional - Datacamp
From Nikesh Chavhan | Data Engineer & Automation Enthusiast


