Projects

Note:

  • Company projects cannot be shown in detail; only a brief summary is provided.
  • Personal projects are described in more detail.

Company Projects


May, 2019 - Aug, 2020

Prudential Financial:

  • Developing machine learning models for business use cases based on time series analysis, mortality analysis, and customer churn analysis (Python: scikit-learn, TensorFlow, Keras)
    • Policy persistency prediction (Python ML libraries, TensorFlow/Keras)
    • Survival analysis (PySurvival: Kaplan-Meier, Cox)
    • Rapid underwriting classification (Python ML libraries, TensorFlow/Keras)
    • Cohort behavior analysis (Python ML libraries, TensorFlow/Keras)
    • Product purchase pattern analysis (Python ML libraries, TensorFlow/Keras)
  • Research projects: solutions to address model bias and model drift in production
  • Providing dashboards on analytical results and insights (using MS Power BI)
  • Cooperating with DevOps teams to deploy models
  • Researching concerns about, and the effectiveness of, the MDLC (ML model development lifecycle)

June, 2018 - April, 2019

American Express:

  • Developed MVPs (minimum viable products) leveraging machine learning algorithms
    • Global sanction list downsizing (XGBoost | NLTK, spaCy: text classification, string matching)
    • Fraud/anomaly detection (XGBoost | logistic regression | ARIMA)
    • Social media analysis (NLTK, spaCy: sentiment analysis, entity recognition)
    • Next web-page prediction (Markov | NetworkX)
  • Set up the end-to-end pipeline for machine learning as a service (AWS: EC2 | Lambda | Kinesis | S3)
  • Provisioning of an engineered dataset and a benchmark solution for an internal Kaggle competition
  • Feature engineering for big data (Python | PySpark)
  • Docker build-up and containerization (Spark customization)

Personal Projects


January 26, 2018

Voyage to the intelligent music streaming service: Ep 1. The analysis of Top ranked Songs and Artists

  • Previously, in 2012 and 2013, I worked on developing a music recommendation algorithm based on information gathered from smartphone sensors. There were several reasons it was not put into mass production, but the critical one was the processing time for big data and its clustering, which was not efficient enough for a real-time engine at that time. Now I am rebooting this project from scratch by myself. This journey will cover not only the music recommendation system but also smart artificial intelligence systems in general. Even though I do not know where this journey will end, enjoying it is enough. My journey starts with an analysis of current music streaming trends using a database from Spotify.

February 16, 2018

Voyage to the intelligent music streaming service: Ep 2. Comparative analysis on the reliability of Spotify’s music ranking data

  • Do you enjoy listening to music? Whether or not you do, people wearing headsets or earphones are easy to find everywhere, and I would argue that most of them are listening to music. Have you ever thought about a world without music? Commercials without music, movies with no soundtracks, and Broadway shows without songs: we cannot even imagine that. Music makes people more comfortable, less stressed, joyful, and relaxed. That’s why music streaming services should be more intelligent and customized to each individual. The IoT (Internet of Things) ecosystem for this kind of smart service (sensors, information, data, devices, cloud services, connectivity) is around the corner. All we need to do is connect them in a well-organized way. I hope to encounter one of the good solutions for this on my journey.
  • My first voyage started with an analysis of the contemporary music streaming trend (refer to Episode 1), which used Spotify’s database. It was a pretty easy start because I could get the data from Kaggle with little effort. During the journey I was concerned about whether the data was reliable, but I didn’t have any alternatives at that time, so I put the concern aside and concentrated on the analysis. Now I am examining the suitability of this data.

March 2, 2018

House Price Prediction with Creative Feature Engineering and Advanced Regression Techniques

  • Can you imagine that a house can be described with 79 explanatory variables covering (almost) every aspect of a residential home? If so, how accurate can house sale price prediction be with state-of-the-art machine learning technologies? What new challenges arise in advancing beyond the state of the art? … These are some of the exciting questions and opportunities that we need to answer in this project. Specifically, our focus is primarily on developing creative and effective feature engineering schemes, and on exploring and exploiting the potential of advanced regression techniques such as random forest (RF), gradient boosting machine (GBM), and model stacking.

March 27, 2018

E-Commerce Data Analysis of Locks and Safes

  • For this capstone project, we were given an opportunity to conduct an e-commerce data analysis for MasterLock. MasterLock’s main objective was to use e-commerce data analysis to design strategies that would increase its product sales. Our task was to scrape customer reviews and product/website details from e-commerce sources that sold MasterLock’s products. With this data, we would create a data-driven analytics report, which in turn could help MasterLock achieve its goals.