Integrative Machine Learning of Genetic and Lifestyle Factors for Personalized Skin Health

Document Type

Article

Source of Publication

IEEE Journal of Translational Engineering in Health and Medicine

Publication Date

1-1-2026

Abstract

Objective: To develop an AI framework that combines genetic, phenotypic, and lifestyle data to profile skin-health patterns and generate hypothesis-supporting summaries for potential decision support. Methods and procedures: A dataset of 5,254 individuals integrates six genes (FLG, AQP3, MMP-1, MMP-3, SOD2, GPX), six phenotype severities, and 20+ lifestyle factors. Mutation burden and interactions are tested by ANOVA. K-modes clustering identifies four interpretable dermatological profiles within the cohort and is embedded in leakage-free nested cross-validation (train-only selection; test labels from training centroids). Subtypes are predicted from genetics plus lifestyle using an XGBoost (XGB) classifier; explainability uses gain, permutation importance, and SHAP contributions aggregated across outer folds. Results: Four subtypes are identified. Mutation burden differs across phenotypes (ANOVA, p < 0.05). Interactions are observed for AQP3×Winter→Dryness, GPX×Medication→Pigmentation, and MMP-3×City Living→Redness. Nested-CV prediction achieves 0.9789 ± 0.0083 accuracy with macro-F1 0.9711 ± 0.0126 and macro-recall 0.9697 ± 0.0091. The multimodal model outperforms unimodal baselines and generalizes better across all folds. Drivers are stable across folds and include scrub usage, stress, sleep, low water intake, menopause, and camouflage habits, alongside oxidative-stress and MMP genes. Conclusion: Integrating genomic susceptibility with modifiable exposures enables robust, interpretable skin-profile prediction and highlights actionable targets for stratified counseling beyond genetic predisposition.
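
The leakage-free protocol named in the Methods (train-only cluster selection, test labels from training centroids) can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`hamming`, `assign`, `fit_kmodes`, `leakage_free_folds`) are hypothetical, the k-modes routine is a simplified pure-Python stand-in, and the inner model-selection loop and the XGBoost classifier are left out.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(rows, centroids):
    """Label each row with the index of its nearest mode (ties go to the lowest index)."""
    return [
        min(range(len(centroids)), key=lambda c: hamming(row, centroids[c]))
        for row in rows
    ]


def fit_kmodes(rows, k, n_iter=20, seed=0):
    """Simplified k-modes: each centroid is the per-column mode of its members.

    Assumes rows are tuples and that at least k distinct rows exist.
    """
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(rows)), k)  # k distinct starting modes
    for _ in range(n_iter):
        labels = assign(rows, centroids)
        updated = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if not members:
                updated.append(centroids[c])  # keep the old mode of an empty cluster
                continue
            updated.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids


def leakage_free_folds(rows, k, n_folds=5, seed=0):
    """Yield (train_idx, test_idx, train_labels, test_labels) for each outer fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    for f in range(n_folds):
        test_idx = idx[f::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_rows = [rows[i] for i in train_idx]
        # The modes are fitted on training rows only.
        centroids = fit_kmodes(train_rows, k, seed=seed)
        train_labels = assign(train_rows, centroids)
        # Held-out rows are labelled by the training modes, never re-clustered.
        test_labels = assign([rows[i] for i in test_idx], centroids)
        yield train_idx, test_idx, train_labels, test_labels
```

In each outer fold a downstream classifier would be trained on the training rows with `train_labels` and scored against `test_labels`. Because the modes never see held-out rows, the subtype labels used for evaluation carry no information from the test split back into training.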

ISSN

2168-2372

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Volume

14

First Page

164

Last Page

178

Disciplines

Computer Sciences | Medicine and Health Sciences

Keywords

Dermatogenomics, gene-environment interactions, k-modes clustering, leakage-free evaluation, model interpretability, multimodal data integration, nested cross-validation, permutation importance, SHAP, XGBoost

Scopus ID

105033796028

Creative Commons License

Creative Commons Attribution 4.0 International License

Indexed in Scopus

yes

Open Access

yes

Open Access Type

Gold: This publication is openly available in an open access journal/series
