Ravikant Pal
6+ years building TB-scale pipelines, real-time CDC systems, and cloud-native analytics platforms.
Founder of a live SaaS product with 200+ paying customers. Immediate joiner. Relocating to the Netherlands.


Who I Am

I am a Senior Data Engineer who builds data systems that survive production at scale.

My work spans real-time CDC pipelines with sub-second latency, TB-scale PySpark ETL, analytics infrastructure on Snowflake and PostgreSQL, and distributed data architectures with no central database. I have shipped these systems across AdTech, HR tech, fintech, and SaaS domains.

I also founded a SaaS platform that reached 200+ paying customers, built on a local-first distributed data architecture I designed entirely from scratch, with no team and no funding.

I am currently expanding into AWS Glue, dbt, and Databricks Delta Lake to round out the modern lakehouse stack.


Tech Stack

Core (Production Experience)

Languages: Python · PySpark · SQL · Java 8-21 · Kotlin
Batch Processing: Apache Spark · PySpark · Spark SQL · Spark Streaming · Distributed ETL
Stream Processing: Apache Kafka · Kafka Streams · Kafka Connect · Event-Driven Architecture
CDC Pipelines: Debezium · Change Data Capture · Log-based replication · Kafka Connect
Data Warehousing: Snowflake · PostgreSQL · Cassandra · MySQL · Redis · MongoDB
Search and Analytics: Elasticsearch · ELK Stack
AWS (Data): S3 · MSK (Managed Kafka) · Redshift · Athena · Kinesis · Lambda · EMR · CloudWatch · IAM · EKS · EC2
Containers and Infra: Docker · Kubernetes · Helm · Terraform · ArgoCD · Jenkins · CI/CD
Observability: Datadog · Prometheus · Grafana · ELK Stack · Automated Alerting · Schema Validation
Data Modeling: Dimensional Modeling · Partitioning Strategies · Data Vault concepts · ETL/ELT patterns

Expanding (Active Learning)

AWS Managed ETL: AWS Glue · Glue Data Catalog · Glue Crawlers · Glue Studio
Lakehouse: Databricks · Delta Lake · Unity Catalog · Medallion Architecture
Transformation: dbt · SQL-first modeling · data lineage · incremental models
GCP (Data): BigQuery · Dataflow (Apache Beam) · Pub/Sub · Dataproc · Cloud Composer · Cloud Storage · Looker Studio

Experience

MML — Library Management SaaS (Self-Founded)

Founder and Data Architect | Sep 2025 — Present | Bangalore

Built a production SaaS platform from zero, solo. Designed every layer of the data architecture from client-side storage to sync pipelines to multi-tenant isolation.


Employ Inc

Senior Data Engineer | Oct 2024 — Mar 2026 | Bangalore

Built the data backbone for an enterprise HR tech platform operating across multiple ATS products at scale.


Times Internet

Data Engineer | Sep 2023 — Oct 2024 | Noida

Built the analytics data infrastructure for one of India’s largest digital media and AdTech companies.

Software Engineer (Data Infrastructure) | Sep 2021 — Sep 2023 | Noida

Contributed to the data infrastructure and event platform powering the AdTech analytics ecosystem before transitioning to the dedicated Data Engineering role.


Kane Solutions

Software Engineer | Jul 2020 — Sep 2021 | Noida


MountBlue Technologies

Software Development Engineer | Jul 2019 — Jun 2020 | Bangalore


Projects

Earthquake ETL Pipeline and Live Heatmap

A production-style batch ETL pipeline that ingests live global earthquake data from the USGS API every hour, stores it in a partitioned PostgreSQL database, and serves an interactive heatmap, all running locally with a single command:

docker-compose up

Live map preview:

Once running, open http://localhost:8050. The map renders all seismic events from the past 30 days, magnitude-weighted, with a toggle between a heatmap layer and an individual-point layer. The stats panel refreshes automatically every 5 minutes, showing event counts by magnitude range, the most active region, and average depth.
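A minimal sketch of the idempotent-ingestion step, assuming USGS GeoJSON features keyed by their `id` field; the function name, column names, and `earthquakes` table here are illustrative, not the project's actual code:

```python
from datetime import datetime, timezone

def features_to_rows(features):
    """Flatten USGS GeoJSON earthquake features into rows keyed by event id.

    USGS re-reports events with revised magnitudes, so keeping the last
    occurrence per id makes re-running an hourly window safe.
    """
    rows = {}
    for f in features:
        props = f["properties"]
        lon, lat, depth_km = f["geometry"]["coordinates"]
        rows[f["id"]] = {
            "event_id": f["id"],
            "magnitude": props["mag"],
            "place": props["place"],
            # USGS timestamps are epoch milliseconds
            "occurred_at": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
            "lon": lon,
            "lat": lat,
            "depth_km": depth_km,
        }
    return list(rows.values())

# Idempotency on the database side: upsert keyed on event_id, so re-running
# the hourly job (or overlapping time windows) never duplicates events.
UPSERT_SQL = """
INSERT INTO earthquakes (event_id, magnitude, place, occurred_at, lon, lat, depth_km)
VALUES (%(event_id)s, %(magnitude)s, %(place)s, %(occurred_at)s, %(lon)s, %(lat)s, %(depth_km)s)
ON CONFLICT (event_id) DO UPDATE
SET magnitude = EXCLUDED.magnitude, place = EXCLUDED.place;
"""
```

Deduplicating in memory and upserting on a unique key together make the hourly job safe to retry after a partial failure, which is what "idempotent ingestion" in the stack list refers to.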

Stack: Python · PostgreSQL · Docker · PySpark · Plotly Dash · USGS API · Partitioned tables · Idempotent ingestion

View on GitHub


MML Data Architecture — Local-First Distributed Sync

The data engineering problem underneath a SaaS product: how do you synchronize structured data across 5 devices per tenant, for 500+ tenants, with no central database and zero infrastructure cost?

Why it is a data engineering problem: The constraints (no server, no central DB, multi-device writes, eventual consistency) required the same thinking as designing a distributed pipeline with exactly-once semantics and partition isolation.
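The conflict-resolution core of such a sync can be sketched as a last-write-wins merge over per-record version stamps. This is a simplified illustration of the pattern, not MML's actual implementation; the function name, record shape, and tie-breaking rule are assumptions:

```python
def merge_replicas(local, remote):
    """Last-write-wins merge of two device replicas.

    Each replica maps record_id -> record dict carrying an "updated_at"
    epoch timestamp. The newer version of each record wins; ties break
    toward the local copy, so the merge is deterministic, and re-applying
    the same remote replica is a no-op (idempotent).
    """
    merged = dict(local)
    for rec_id, remote_rec in remote.items():
        local_rec = merged.get(rec_id)
        if local_rec is None or remote_rec["updated_at"] > local_rec["updated_at"]:
            merged[rec_id] = remote_rec
    return merged
```

Deletions propagate as tombstone records (e.g. a `"deleted": True` flag) rather than physical removals, so a delete on one device eventually wins on all five, which is how eventual consistency holds without a central database arbitrating writes.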

Stack: JavaScript · IndexedDB · Google Drive API · Custom conflict resolution · Sharding patterns · ETL migration scripts

Live Product


What I Am Learning Right Now

AWS Glue: building ETL jobs, exploring the Glue Data Catalog and Crawlers (AWS-native managed ETL is standard in Dutch cloud-first data engineering stacks)
dbt: SQL-first modeling, incremental models, lineage graphs (Snowflake-heavy teams in the Netherlands use dbt as the transformation standard)
Databricks / Delta Lake: lakehouse architecture, Delta table internals, medallion patterns (increasingly required alongside Spark in Amsterdam-based DE roles)

Education

B.Tech, Computer Science and Engineering | Dr. A P J Abdul Kalam Technical University | 2015 — 2019 | Uttar Pradesh, India


Languages

English — Professional Proficiency · Hindi — Native · Dutch — Beginner (A1, in progress)


Open to Relocation: Netherlands (Amsterdam preferred) and broader Europe

EU work authorization sponsorship may be required. Immediate joiner, available from day one.

If you are hiring a Senior Data Engineer who builds systems that survive production and ships products that customers pay for: