Hello, I'm

Nhu-Dat Cao (Dat)


Data Engineer with 3+ years of experience specialising in ETL pipelines, distributed data processing, and cloud data infrastructure. Currently pursuing an MSc in Data & Computer Science at Heidelberg University under a DAAD scholarship.

Databricks Certified Professional · Azure Data Engineer · DAAD Scholar

Work Experience

Oct 2023 — Mar 2026

Senior Data Engineer

Hitachi Digital Services · Vietnam Delivery Center
  • Developed automated Databricks batch/streaming workflows processing 100 GB+ daily (~15M records/day) using PySpark and Medallion Architecture on Azure, doubling speed and cutting DBU costs by 50%.
  • Designed and optimised Pentaho flows and PostgreSQL queries for complex financial reporting, speeding up execution by 2–5×.
  • Delivered insights via SSRS and Qlik Sense dashboards for cross-cultural stakeholders.
  • Collaborated with BA, DA, PM, and DS roles to design data models, implement CDC via Delta Live Tables, and introduce Auto Loader for incremental ingestion.
PySpark · Databricks · Azure · Medallion Architecture · Pentaho · Qlik Sense · Delta Live Tables
Mar 2023 — Aug 2023

Data Engineer

Viettel Big Data Analytics Center · Viettel Telecom
  • Built data pipelines for user logs and content using HDFS, Spark, and Pentaho, supporting Recommendation System algorithms.
  • Designed schemas in PostgreSQL, Elasticsearch, and Redis, improving retrieval speed and system scalability.
  • Installed and operated Elasticsearch, Kibana, Redis, and Kafka on testing servers.
  • Conducted acceptance testing for backend (Java Spring Boot) and frontend (ReactJS) components.
Spark · HDFS · Elasticsearch · Redis · Kafka · PostgreSQL
Aug 2022 — Feb 2023

Data Engineer / Analyst

Rocket Game Studio
  • As a founding data team member, built a full data platform for collection, storage, processing, and visualisation using Python, NodeJS, MongoDB, Redis, Kafka, Docker, Elasticsearch & Kibana.
  • Managed migration of game data from Firebase to Google BigQuery; visualised results in Looker Studio for Marketing insights.
  • Optimised MySQL-based HR data processing for timekeeping records.
BigQuery · Looker Studio · MongoDB · Kafka · Docker · MySQL
Aug 2020 — Aug 2022

Research Assistant

AIoT Lab, MSO Lab · HUST
  • Applied deep learning and reinforcement learning to Multipath-TCP Scheduling problems in 5G networks.
  • Researched Genetic Algorithm (GA) and optimisation techniques.
Deep Learning · Reinforcement Learning · Genetic Algorithm

Technical Skills

Languages

Python · Java · Scala · JavaScript
🔧

Big Data Frameworks

Apache Spark · Hadoop · Kafka · Airflow · Pentaho
☁️

Cloud & Platforms

Microsoft Azure · Databricks · GCP · AWS
🗄️

Databases

PostgreSQL · SQL Server · BigQuery · MongoDB · Elasticsearch · Redis
📊

BI & Visualisation

Qlik Sense · SSRS · Looker Studio · Kibana
🤖

AI / ML

Machine Learning · NLP · Deep Learning · TensorFlow

Personal Projects

📰

Trend Analysis System

Big Data platform for trend analysis of large online news datasets using Scrapy, Spark, and HDFS. Built an Elasticsearch data warehouse with Flask and ReactJS interfaces.

Scrapy · Spark · HDFS · Elasticsearch · Flask · ReactJS
🎮

Game Data Platform

Integrated platform for managing and processing gamer behaviour data using Python, Redis, MongoDB, BigQuery, and Looker for advanced visualisation.

Python · Redis · MongoDB · BigQuery · Looker

Coin–Twitter Correlation

Lambda architecture system correlating Binance crypto transactions with Twitter sentiment in real time using Spark Streaming, HDFS, and Cassandra.

Spark Streaming · Lambda Arch · Cassandra · HDFS · NLP
🎵

Spotify Sentiment Analysis

NLP-based sentiment analysis of Spotify app reviews integrating RNN, LSTM, GRU, Bi-LSTM, and Transformer models built with Python and TensorFlow.

NLP · TensorFlow · LSTM · Transformer
💹

Finance Mini Platform

Streamlined financial data collection, storage, and analysis pipeline using Python, Airflow for orchestration, and AWS Cloud services for scalable infrastructure.

Python · Airflow · AWS
🎬

Movie Recommendation System

Personalised movie recommendation engine using Regression, KNN, and SGD models in Python, improving search results and recommendation accuracy.

Python · KNN · SGD · Regression

Education

2019 — 2023

BSc Computer Science — Talented Program

Hanoi University of Science and Technology

GPA 3.80 / 4.0 · 2nd in CS Vietnam (QS Ranking)

Big Data Storage & Processing · Deep Learning · Software Design & Construction

2016 — 2019

Mathematics Specialised

Le Hong Phong High School for the Gifted

GPA 8.7 / 10

Achievements & Certifications

🏆

DAAD Scholarship 2025–2026

Highly competitive DAAD scholarship for STEM disciplines, supporting Master's studies in Germany.

🥇

Golden Prize — Vietnam Math Olympiad 2023

National university competition · Algebra · Ranked 6th out of 161 contestants.

🥈

Consolation Prize — Vietnam Math Olympiad 2019

National high school competition specialising in mathematics.

🥇

Golden Prize — Northern Math Competition 2017

Northern high school mathematics competition.

🔷

Databricks Certified Data Engineer Professional

Associated with Hitachi Digital Services.

☁️

Microsoft Certified: Azure Data Engineer Associate

Associated with Hitachi Digital Services.

🗃️

Microsoft Certified: Azure Database Administrator

Associated with Hitachi Digital Services.

📊

Google Data Analytics Certificate

Online certification.

🧩

SQL Advanced Certificate — HackerRank

Online certification.

🎖️

Outstanding Graduate — HUST

Excellent classification · Conduct score above 90/100.

Get in Touch

I'm open to new opportunities, collaborations, or just a chat about data engineering.