Manny (Mamiya) Adachi

A recent graduate of Masters in Health Data Science (Extension) at UNSW, formerly a business consultant in healthcare, hotel, and retail and ICT business analyst at a medical device company

LinkedIn: https://www.linkedin.com/in/mamiyaad/

 

This page showcases my data science projects and tasks I engaged in. I included code repositories and figures from some of my works. For the engagements under confidentiality restrictions, I described the overviews and processes on a best-effort basis.

 

Tip: Each project/task is tagged with keywords. Please search the keywords of your interest using “Ctrl + f” (e.g., statistical modeling/analysis, machine learning, visualization, data processing, image analysis).

 

Project / Task List

Forecasting all-cause mortality: leveraging cause-of-death data through neural networks

 

[Keyword]

Public Health, Supervised Machine Learning, Convolutional Neural Network (CNN), Lee-Carter Model, Dimensionality Reduction, Hyperparameter Tuning, Data Processing, Data Manipulation, Data Management, Exploratory Data Analysis, Data Visualization, Data Interpretation, Python

 

Winning SAS Institute’s analytics competition and earning the SAS Viya skill certificate

 

[Keyword]

Non-Profit Organization, 1st place winner, Internship, Supervised Machine Learning, Data Processing, Exploratory Data Analysis, Data Visualization, SAS Enterprise Miner

 

 

 

Investigating women’s employment status tendency among Canadian couples and families

 

[Keyword]

Socioeconomics, Statistical Modeling/Analysis, Generalized Linear Model (GLM), Exploratory Data Analysis, Data Visualization, R Programming

 

Investigating the media attention impact on dispensing contraceptives in Australia

 

[Keyword]

Public Health, Interrupted Time Series, ARIMA, Exploratory Data Analysis, Data Visualization, R Programming

 

 

 

Predicting the hospital readmission of diabetic patients

 

[Keyword]

Hospital Operation, Supervised Machine Learning, Random Forest, Logistic Regression, Machine Learning Pipeline, Feature Transformation, Hyperparameter Tuning, Data Visualization, Python

 

Developing the decision support algorithm for Parkinson’s disease early-stage screening

 

[Keyword]

Health Analytics, Decision Support, Supervised Machine Learning, Logistic Regression, Random Forest, Gradient Boosting Machine (GBM), Artificial Neural Network (ANN), Ensemble Models, Machine Learning Pipeline, Feature Transformation, Hyperparameter Tuning, Data Visualization, Python

 

 

 

Distinguishing the medical images (blood cells infected or uninfected by malaria)

 

[Keyword]

Medical Image Analysis, Unsupervised Machine Learning, Autoencoder, Neural Network, Data Visualization, Python

 

Distinguishing the electroencephalogram data of alcoholic or non-alcoholic subjects

 

[Keyword]

Health Analytics, Unsupervised Machine Learning, Autoencoder, Long-Short Term Memory (LSTM), Neural Network, Time-Series Data, Data Visualization, Python

 

 

 

Developing a decision support algorithm for hypotensive patient management in the ICU

 

[Keyword]

Health Analytics, Decision Support, Reinforcement Learning, Unsupervised Machine Learning, Batch-Constrained Q-Learning (BCQL), K-Means Clustering, Time-Series Data, Data Preprocessing, Data Visualization, Python

 

Extracting the various specified information from the old datasets of UNSW

 

[Keyword]

PostgreSQL, SQL, Data Extraction, Data retrieval

 

 

 

Mapping the emergency departments (ED) and visualizing the distances from the nearest ED

 

[Keyword]

Public Health, Spatial Data, Data Visualization, Shiny App, R Programming

 

Developing the Microsoft Excel-based analysis tools (business setting)

 

[Keyword]

Data Manipulation, Data Visualization, Master Data Management, Sales Forecast, Managerial Accounting, MS Excel

 

 

 

Developing and Implementing BI dashboard (business setting)

 

[Keyword]

BI dashboard, Process Automation, Data Visualization, SAP Business Object 4.0 Web Intelligence

 

Data entry, administration, and migration (business setting)

 

[Keyword]

Data Entry, Data Administration, Data Migration, Data Warehouse, MS Excel

 

 

 

Consulting a small company by delivering actionable insights from numbers (business setting)

 

[Keyword]

Retail Business, Small Business Consulting, Financial Simulation, Managerial Accounting, Business Planning, Data Visualization, MS Excel, MS PowerPoint, MS Word

 

 

 

Back to top

 

 

 

Main

 

Forecasting all-cause mortality: leveraging cause-of-death data through neural networks

[Keyword]

Public Health, Supervised Machine Learning, Convolutional Neural Network (CNN), Lee-Carter Model, Dimensionality Reduction, Hyperparameter Tuning, Data Processing, Data Manipulation, Data Management, Exploratory Data Analysis, Data Visualization, Data Interpretation, Python

 

[Overview]

My master's thesis was developing a new CNN-based mortality forecasting model integrating cause-of-death information using Python. The dataset was the U.S. mortality data from 1959 to 2019, initially with 6 million+ records.
(Repository:
https://github.com/MannyAdc/ForecastModel_LC_ML)

 

 

[Approach]

n Preprocessed/cleaned the large and fragmented raw datasets (6 million+ records in total) with track and record of the process to enhance the reproducibility of the project output

Ø Documented the cleaning process visually by flowcharts, including the checks on data format and duplication, deletion, and aggregation of rows and columns

Ø Made a data dictionary of the final dataset stating each variable’s description, data type, and range/options

Ø Assessed the data quality by key categories during exploratory data analysis (EDA) and visualization

Ø Omitted/kept abnormalities by iterating the above to avoid the potential "noise" while training the model

 

Sample Images of Documenting Data Preprocessing/Cleaning, Data Dictionary, and EDA

 

n Developed the new CNN model and other models for comparison; the models known for weaker performance were omitted beforehand based on the findings from the past research studies

 

 

n Assessed the model performance by a comparison table showing the training time and mean square error (MSE) values and graphs of forecast vs. actual of each model that visualized the performance per age group

 

Comparison Table

 

Sample Visualization of the Forecast (Dotted Line) vs. Actual (Solid Line)

 

n Clarified the spot for future studies by mentioning what needed to be added to my thesis's scope

 

[Outcome]

The new model outperformed most compared models by the smaller total MSE value. The training time was significantly longer than the other models, but this criterion’s priority depends on the (business) context. I presented the result to UNSW professors and research fellows and achieved high distinction.

 

 

Back to Project / Task List

 

 

Winning SAS Institute’s analytics competition and earning the SAS Viya skill certificate

[Keyword]

Non-Profit Organization, 1st place winner, Internship, Supervised Machine Learning, Data Processing, Exploratory Data Analysis, Data Visualization, SAS Enterprise Miner

 

[Overview]

Please see https://hds-hub.cbdrh.med.unsw.edu.au/posts/2023-01-13-sas-cortex/ for details. It refers to,

n winning 1st place in the SAS Cortex Analytics Simulation 5-Day Challenge in April 2022,

n participating in the internship program at SAS Institute Australia as the reward for the above, and achieving SAS Certified Associate: Programming Fundamentals Using SAS Viya in June 2022.

 

Back to Project / Task List

 

 

Investigating women’s employment status tendency among Canadian couples and families

[Keyword]

Socioeconomics, Statistical Modeling/Analysis, Generalized Linear Model (GLM), Exploratory Data Analysis, Data Visualization, R Programming

 

[Overview]

I analyzed the tendency of women’s employment status concerning their marital status, income of the male member in the household, presence of children, and region of residence. The dataset was based on 1977 surveys of Canadian couples and families. The model was GLM in the binomial family since the dataset had both continuous and categorical variables.

 

[Approach]

n Exploratory data analysis (EDA) by observing the correlations,

Ø between the outcome variable (i.e., women’s employment status) and input variables and

Ø among the input variables

n Model-fitting in various scenarios:

Ø using all input variables,

Ø using fewer variables,

Ø using all input variables one of which interacts with other input variables, and

Ø using fewer variables one of which interacts with remaining input variables

n Evaluation on models based on the p-value of ANOVA test, AIC score, and AUROC score

 

[Outcome]

The analysis implied,

n the strong association of the presence of children and

n the slight association of male members’ income to women’s employment status.

I achieved high distinction for this task.

 

Back to Project / Task List

 

 

Investigating the media attention impact on dispensing contraceptives in Australia

[Keyword]

Public Health, Interrupted Time Series, ARIMA, Exploratory Data Analysis, Data Visualization, R Programming

 

[Overview]

I analyzed the media attention impact on dispensing contraceptives (combined/simple contraceptives) using R. The dataset consisted of monthly rates (per 1000 women of reproductive age) of PBS-subsidized dispensing of combined and simple contraceptives between January 2013 and December 2016. The media attention peaked in the last week of May 2015.

 

[Approach]

n Exploratory data analysis (EDA) by decomposing each time series data (i.e., the combined or simple) to observe the trend, seasonality, outliers, stationarity, and autocorrelation

n Log-transformation of the data for eliminating autocorrelation and non-stationarity effects

n Model selection for each data based on the EDA (e.g., stationarity + no-autocorrelation à segmented time series, no-stationarity + autocorrelation à ARIMA)

n Model fitting for each time series by iteratively testing different parameters

n Evaluation of time series changes: step (= interruption) and slope after media attention (= intervention).

n Quantifying the above changes in tables and visualized by the actual time series against the counterfactual (simulative plot if no intervention was present)

 

Sample Image of PBS-Subsidised Dispensing of the Combined Contraceptives
(Log of Monthly Rates per 1,000 Women of Reproductive Age)

 

[Outcome]

The media impact was agreeable on the combined contraceptives based on the changes (in %) with the monthly dispensing rate confidence intervals from the step change. I achieved distinction for this task.

 

Back to Project / Task List

 

 

Predicting the hospital readmission of diabetic patients

[Keyword]

Hospital Operation, Supervised Machine Learning, Random Forest, Logistic Regression, Machine Learning Pipeline, Feature Transformation, Hyperparameter Tuning, Data Visualization, Python

 

[Overview]

I developed the algorithms to predict the risk of diabetic patients’ readmission to a hospital after discharge. The scenario was deploying the prediction algorithm for a hospital home-visit care unit, given that the operation cost is higher for readmitted patients. The dataset was a simulative electric health record data with binary labels of readmission (i.e., yes or no) and provided as “clean” data for this task. The algorithms used were logistic regression and random forest.

 

[Approach]

n Train/test split of the dataset with stratifying along the labels (target variable)

n Developing the following pipeline for logistic regression algorithm due to its sensitivity to the value scales:

Ø feature transformation

Ø training/validation

Ø hyperparameter tuning

n Fitting (training/validation) with GridSearchCV on the Scikit-Learn library

n Model evaluation by f1 scores (i.e., 2 x (precision x recall) / (precision + recall)),

n Observation of feature variables by SHAP (for general feature importance) and LIME (for feature importance of a single sample prediction)

 

[Outcome]

I chose the random forest algorithm as the final model, given the higher scores in f1 (0.6706). Although the f1 score was not high, I concluded that the random forest algorithm was deployable given the high precision score with test data and 81% higher cost efficiency with home visits per patient. I achieved distinction for the task.

 

Back to Project / Task List

 

 

Developing the decision support algorithm for Parkinson’s disease early-stage screening

[Keyword]

Health Analytics, Decision Support, Supervised Machine Learning, Logistic Regression, Random Forest, Gradient Boosting Machine (GBM), Artificial Neural Network (ANN), Ensemble Models, Machine Learning Pipeline, Feature Transformation, Hyperparameter Tuning, Data Visualization, Python

 

[Overview]

I developed machine learning models (final model as a decision support algorithm) for early-stage screening of Parkinson's disease patients. The dataset consisted of 252 subjects (188 patients and 64 controls) with 3 records for each subject. The algorithms used were logistic regression, random forest, GBM, ANN, Ensemble models (voting/ensemble classifiers)

 

[Approach]

n Exploratory data analysis/data preprocessing (e.g., data type check, categorical variable check for encoding, balance between patient and control numbers, train/test data split)

n Developing models per algorithm (with pipeline construction, when necessary, e.g., for logistic regression)

n Model evaluation by AUROC, recall, and f1 scores for prediction performance and over/underfitting: minimizing the false negative rate as the top priority

 

[Outcome]

Based on the model evaluation criteria above (priority in order), the random forest model was the best (AUROC: 0.8300, recall:0.97, and f1: 0.8706) and likely viable only for preliminary screening purposes. I achieved distinction for this task.

 

Back to Project / Task List

 

 

Distinguishing the medical images (blood cells infected or uninfected by malaria)

[Keyword]

Medical Image Analysis, Unsupervised Machine Learning, Autoencoder, Neural Network, Data Visualization, Python

 

[Overview]

I developed a classification model for the images of blood cells infected or uninfected by malaria using Python. The image data was provided as the compressed pixel data, and labels for the images were provided (both infected and uninfected images, 13,779 each).

 

[Approach]

n Developing an autoencoder (feed-forward neural network-based, unsupervised) to assess its power to distinguish the image by using a limited portion of the whole dataset (8,819 uninfected cell images) as the training data, aiming to develop a model faster and less training data

n Assessing the performance through several data visualization: direct comparison of the actual/reconstructed images and the t-SNE cluster plot

 

Illustration of an Autoencoder

 

[Outcome]

I presented how my model could distinguish the images. I achieved distinction for the presentation.

 

Back to Project / Task List

 

 

Distinguishing the electroencephalogram data of alcoholic or non-alcoholic subjects

[Keyword]

Health Analytics, Unsupervised Machine Learning, Autoencoder, Long-Short Term Memory (LSTM), Neural Network, Time-Series Data, Data Visualization, Python

 

[Overview]

I developed an autoencoder with LSTM for electroencephalogram classification among alcoholic and control patients using Python. The dataset was from UCI Machine Learning Repository and consisted of 122 patients (120 trials for each patient and 255-step time series for each trial).

 

[Approach]

n Narrowing the data amount to 30 patients with 30 trials (20 patients with 20 trials for model training) due to computational capacity on my platform (Google Colab)

n Data loading and cleaning from .gz files (per trial) to pandas data frames (saved as CSV)

n Constructing and training the autoencoder with LSTM cells at each layer

n Assessing the autoencoder’s prediction values from test datasets of alcoholic and control patients

 

[Outcome]

The difference in mean square errors was distinct between the alcoholic and the control. Hence, my autoencoder was able to distinguish the data. I achieved high distinction for this task.

 

Back to Project / Task List

 

 

Developing a decision support algorithm for hypotensive patient management in the ICU

[Keyword]

Health Analytics, Decision Support, Reinforcement Learning, Unsupervised Machine Learning, Batch-Constrained Q-Learning (BCQL), K-Means Clustering, Time-Series Data, Data Preprocessing, Data Visualization, Python

 

[Overview]

I developed a reinforcement learning (BCQL, hence model-free) algorithm for hypotensive patient management in the ICU using Python. The dataset consisted of vital signs, lab tests, and treatments measured over 48 hours in 3,910 patients with acute hypotension, and no new and additional real-time data was available.

 

[Approach]

n Defining the reward function which labels the mean arterial pressure (a vital sign) in the next time step

n Labeling state (of each patient at each time point) by k-means clustering (i.e., unsupervised learning)

Ø The number of clustered states resulted in 100 based on Davies Bouldin score (the lower, the better)

n Computing a tabular state-action (treatment) value function (i.e., RL policy)

n Evaluating the RL policy performance against the clinical policy (i.e., simply the observation dataset) and a simple Q-Learning policy (being more biased toward the initialization values of Q-policy, hence less realistic)

[Outcome]

The developed BCQL algorithm outperformed the clinical policy based on the expected value of reward and was more realistic than the simple Q-Learning. I achieved high distinction for this task.

 

Back to Project / Task List

 

 

Extracting the various specified information from the old datasets of UNSW

[Keyword]

PostgreSQL, SQL, Data Extraction, Data retrieval

 

[Overview]

Using the relational schema (57 data tables) about the information (e.g., people, program/course/class enrolment, facility, organization) from my university, I developed PostgreSQL codes to generate views, tables, and functions which take user inputs based on the task requirements.

 

[Outcome]

I achieved high distinction for this task.

 

Back to Project / Task List

 

 

Mapping the emergency departments (ED) and visualizing the distances from the nearest ED

[Keyword]

Public Health, Spatial Data, Data Visualization, Shiny App, R Programming

 

[Overview]

I developed an interactive shiny app by R, mapping the 1010 emergency departments (EDs) and coloring the grids where each ED was by the distance from the nearest other ED. The data source was AIHW, which included each hospital's information (e.g., name, address, phone, webpage, longitude, and latitude). I set the shiny app to show the blue boundary for the selected states. The state boundary data was provided in .shp format beforehand. Please note that the source code and the shiny app are unavailable for publishing.

 

Sample Image of the ED mapping with distance coloring around Sydney

 

[Outcome]

The shiny app worked perfectly, as shown in the sample image above. I achieved high distinction for this task.

 

Back to Project / Task List

 

 

Developing the Microsoft Excel-based analysis tools (business setting)

[Keyword]

Data Manipulation, Data Visualization, Master Data Management, Sales Forecast, Managerial Accounting, MS Excel

 

[Overview]

At a medical device company, I developed MS Excel-based analysis tools complimentary to BI dashboards; for example,

n intragroup sales/cost forecast by items (from global headquarter to overseas affiliates),

n master data check file for annual maintenance, and

n revenue/profit breakdown simulation in various currency rates.

 

The purpose was to enhance the flexibility of analysis operation while optimally standardizing the tools for efficient and consistent usage. Some used functions are vlookup, hlookup, index, indirect, subtotal, concatenate, pivot table, and pivot graph.

 

Back to Project / Task List

 

 

Developing and Implementing BI dashboard (business setting)

[Keyword]

BI dashboard, Process Automation, Data Visualization, SAP Business Object 4.0 Web Intelligence

 

[Overview]

At a medical device company, I developed and delivered a BI dashboard using SAP Business Objects 4.0 Web Intelligence. This responsibility started as a project of re-engineering the financial data analysis/reporting, which initially required an entirely manual process in Excel sheets before the data analysis with frequent errors (e.g., incorrect copy/paste, inconsistent version control).

 

[Approach/Solution]

I provided the BI dashboard and complimentary data extraction/checking tools to standardize and semi-automate the data preprocessing and visualize the financial data (e.g., by time, subsidiary, country, and business field). The tool development process included,

n identifying user requirements,

n assessing raw data,

n modeling the to-be operation process,

n communicating with the system vendor,

n validating the system (data warehouse) developed by the vendor,

n developing BI dashboard and complementary tools, and

n providing aftercare of users' ad-hoc requirements.

 

[Outcome]

n Although the data granularity was often insufficient to create sophisticated analytical outputs given that the individual transaction level data was not publishable for multiple stakeholders, I still delivered the frequently missed observation points (e.g., trend, irregularity, potentially incorrect data, false abnormality) through BI dashboard delivery.

n In addition, through this project I improved the existing data analysis workload of stakeholders in Japan by 50%+. I received an award from the Executive Vice President (head of division) for the improvement.

n Overall, I engaged in the project for three and a half years, including post-project engagement for ad-hoc requests (it has taken the majority of the time).

 

Back to Project / Task List

 

 

Data entry, administration, and migration (business setting)

[Keyword]

Data Entry, Data Administration, Data Migration, Data Warehouse, MS Excel

 

[Overview]

At a medical device company, I engaged in data entry, administration, and migration as part of financial analysis operation re-engineering through BI dashboard development and implementation.

 

[Issues/challenges]

The monthly financial reporting data (base data) did not have the granularity required for complete analysis. Initially, the data gathering and processing schemes for those “non-base” data were neither standardized nor automated (e.g., excel sheets in inconsistent form, calculation error during consolidation).

 

[Approach/Solution]

n I developed the scheme and tools that partially standardized and automated the data entry and administration processes through templates and rules for routine data updates. Along with the implementation, I cleansed the past “non-base” data to align the templates and rules and then migrated it into the data warehouse, where the BI dashboard retrieved the data.

n I documented the data cleaning process as manuals with figures and succeeded the task to other colleagues to establish the routine operation. Overall, I engaged in the tasks for around three years, including routine operation, task succession, and supervision.

 

Back to Project / Task List

 

 

Consulting a small company by delivering actionable insights from numbers (business setting)

[Keyword]

Retail Business, Small Business Consulting, Financial Simulation, Managerial Accounting, Business Planning, Data Visualization, MS Excel, MS PowerPoint, MS Word

 

[Overview]

I consulted a small company who initially feared their business termination due to their significant staff shortage. I engaged in this project for nine months, parallel to projects with other clients.

 

[Approach/Solution]

I analyzed numerically and visually their P/L and BS to quantify the possibility, measures, and potential risk of CF shortage. It required me to estimate

n the minimum viable revenue and

n the operation cost and cash flow along with the above,

n production capacity for sales at the store and to their distributors, and

n the condition of the client's staff members.

 

I then advised the client that they could maintain their business by

n reducing the business day/hours,

n temporarily discontinuing sales to their distributors,

n clarifying the time leeway before the cash flow shortage, and

n collaborating with another advisory team for job postings.

 

In addition, the key communication approaches I took while facing the clients were

n presenting only the details necessary to inform the conclusion,

n documented the details in graphs and tables with descriptions in simple languages to encourage the audience to understand fast and recap after my presentation,

n communicating interactively instead of simply explaining, and

n slowing down the explanation when the audience had difficulty understanding or was emotionally stressed.

 

[Outcome]

As a result, the client could maintain their businesses.

Back to Project / Task List