A: UMAP projection of CV items (axes: UMAP_1, UMAP_2). Click on clusters to explore.

Clustering Parameters
These control how the nodes are positioned and grouped. In principle, the closer two nodes are, the more similar they are.

In this example, I first define a list of keywords to compare against. Then, I calculate similarity between these keywords and the descriptions of each CV element. This is done in one of two ways:

Jaccard Index: Calculate the Jaccard index between the words of each CV element description and the keywords.
Embeddings: Use a large language model to transform the keywords for each CV element into high-dimensional dense vectors, then calculate the cosine similarity between them.
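The two similarity measures above can be sketched in a few lines. This is a minimal illustration, not the code behind this page: the function names, the example keywords, and the toy three-dimensional vectors are assumptions made for the example (real embedding vectors have hundreds of dimensions).

```python
import math


def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of tokens."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for a zero vector
    return dot / (norm_u * norm_v)


# Hypothetical keyword list and tokenised CV description.
keywords = {"python", "rna-seq", "vae"}
description_tokens = {"python", "vae", "organoid", "fastapi"}
print(jaccard_index(keywords, description_tokens))  # 2 shared / 5 distinct = 0.4

# Toy stand-ins for embedding vectors.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))  # approximately 0.5
```

The Jaccard index only rewards exact token overlap, whereas cosine similarity on embeddings can score related but non-identical terms as similar, which is why the two methods can produce different layouts.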

Semantic Weight: Balance between embedding similarity and category identity.
Category Weight: Influence of project categories on positioning.

*Note: Under normal circumstances, the category and semantic features would be scaled and both weights fixed at 1.0 for a standard UMAP projection. To make this visualization more interactive, I allow manual adjustment of the weights, which artificially inflates the variance of either the semantic similarity scores or the category indicators and thereby changes the clustering. This is not a standard procedure in UMAP analysis.
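One way such weights could act is by multiplying each block of features before the vectors are compared. The sketch below assumes a one-hot encoding of the category and a plain Euclidean distance; `build_features`, the example scores, and the category names are hypothetical and are not taken from this page's backend.

```python
import math


def build_features(semantic_scores, category, categories,
                   semantic_weight=1.0, category_weight=1.0):
    """Concatenate weighted similarity scores with a weighted one-hot category."""
    one_hot = [1.0 if category == c else 0.0 for c in categories]
    return ([semantic_weight * s for s in semantic_scores]
            + [category_weight * x for x in one_hot])


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


categories = ["Experience", "Projects", "Publications"]
# Two items with similar keyword scores but different categories (made-up values).
item_a = ([0.9, 0.1], "Projects")
item_b = ([0.8, 0.2], "Publications")

for weight in (1.0, 3.0):
    fa = build_features(*item_a, categories, category_weight=weight)
    fb = build_features(*item_b, categories, category_weight=weight)
    print(weight, round(euclidean(fa, fb), 3))
```

Raising the category weight increases the distance between items from different categories while leaving same-category pairs unchanged, so clusters separate by category; raising the semantic weight has the opposite emphasis.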

Number of PCs: How many Principal Components to preserve before reducing down to a final 2D UMAP projection.
Method: Jaccard (keywords) vs Embeddings (dense vectors).
Gravity: Pulls nodes to the centre.
Friction: Affects how quickly nodes decelerate.
Spring Strength: Stiffness of connections between nodes.
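The last three parameters describe a force-directed layout. The sketch below shows one plausible update rule, under stated assumptions: gravity is a linear pull toward the origin, springs follow Hooke's law with a hypothetical `rest_length`, and friction is a factor that multiplies the velocity on every tick. The actual simulation on this page may define these terms differently.

```python
import math


def simulation_step(positions, velocities, edges,
                    gravity, friction, spring_strength, rest_length=1.0):
    """Advance a 2D node layout by one tick; returns (positions, velocities)."""
    forces = [[0.0, 0.0] for _ in positions]

    # Gravity: pull every node toward the centre at (0, 0).
    for i, (x, y) in enumerate(positions):
        forces[i][0] -= gravity * x
        forces[i][1] -= gravity * y

    # Springs: Hooke's law along each edge, equal and opposite on both ends.
    for i, j in edges:
        dx = positions[j][0] - positions[i][0]
        dy = positions[j][1] - positions[i][1]
        dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
        pull = spring_strength * (dist - rest_length)
        fx, fy = pull * dx / dist, pull * dy / dist
        forces[i][0] += fx
        forces[i][1] += fy
        forces[j][0] -= fx
        forces[j][1] -= fy

    # Friction: assumed here to scale the velocity each tick (smaller = faster stop).
    new_velocities = [((vx + fx) * friction, (vy + fy) * friction)
                      for (vx, vy), (fx, fy) in zip(velocities, forces)]
    new_positions = [(x + vx, y + vy)
                     for (x, y), (vx, vy) in zip(positions, new_velocities)]
    return new_positions, new_velocities


# Two connected nodes start far apart and settle near the centre.
positions = [(4.0, 0.0), (-4.0, 0.0)]
velocities = [(0.0, 0.0), (0.0, 0.0)]
for _ in range(200):
    positions, velocities = simulation_step(
        positions, velocities, edges=[(0, 1)],
        gravity=0.20, friction=0.80, spring_strength=0.03)
print(positions)
```

In this formulation a stronger gravity compacts the whole layout, a stiffer spring holds connected nodes closer to the rest length, and a lower friction factor makes the layout settle sooner.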

Slider values as displayed: 1.00, 1.00, 7, 0.20, 0.80, 0.03

B, C: Click a node to see details

D: PCA Explained Variance

E: Top Feature Loadings (PC1)
This graph shows which characteristics mathematically drive the most variation along the dataset's first principal component (PC1).

Jaccard Mode: Displays specific CV keywords (e.g. "Python"). Values of larger magnitude, positive or negative, indicate that the keyword is a stronger driver of the variation in PC1.

Embeddings Mode: Displays the abstract dimensions of the underlying neural network (e.g. "Dim 12"). I correlated each abstract dimension with the closest matching keyword from all CV items to give you an idea of what that dimension represents. Often several dimensions are correlated with the same keyword, demonstrating how high-dimensional embeddings capture complex relationships.
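To make the idea of a loading concrete, the sketch below computes the PC1 loadings and the PC1 explained variance ratio for a tiny made-up feature matrix. It uses power iteration on the covariance matrix so that it runs with the standard library only; this is a stand-in for illustration, and scikit-learn's `PCA` exposes the equivalent quantities as `components_` and `explained_variance_ratio_`. The feature names and scores are hypothetical.

```python
import math


def pc1_loadings(rows, iterations=500):
    """Return (loadings, explained_variance_ratio) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]

    # Power iteration: repeated multiplication converges to the dominant
    # eigenvector, provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Hypothetical keyword scores for four CV items.
features = ["Python", "R", "qPCR"]
rows = [
    [1.0, 0.9, 0.0],
    [0.8, 1.0, 0.1],
    [0.1, 0.0, 1.0],
    [0.0, 0.2, 0.9],
]
loadings, ratio = pc1_loadings(rows)
for name, value in zip(features, loadings):
    print(f"{name}: {value:+.3f}")
print(f"PC1 explained variance ratio: {ratio:.3f}")
```

In this toy data the two programming keywords load with one sign and the wet-lab keyword with the opposite sign, so PC1 separates computational items from experimental ones. The overall sign of a principal component is arbitrary, which is why the magnitude of a loading matters more than its direction.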

About Me

I am a Master of Molecular Biosciences candidate with a hybrid background in end-to-end bioinformatics, translating computational findings into wet-lab experimental biology.

My expertise lies in single-cell and bulk RNA-sequencing pipelines and applications of Variational Autoencoders (VAEs). I am fluent in Python and R, have moderate experience in Java, JavaScript, HTML, and CSS, and have full-stack development experience.

Experience

MSc. Internship with Michael Platten (DKFZ)

DKFZ | Nov 2025 - Present

Conducting high-throughput single-cell analysis to validate non-variant reference genes for qPCR normalization, improving data accuracy for downstream applications.

MSc. Internship with Michael Boutros (DKFZ)

DKFZ | Apr 2025 - Nov 2025

Engineered multi-omics data (RNA-seq, FACS, morphological features) as input to a deep learning VAE architecture, deconvoluting patient-derived organoid batch effects and specific drug-perturbation effects.

Scientific Assistant Full-Stack Web Developer (HIWI) with Michael Boutros (DKFZ)

DKFZ | Apr 2025 - Nov 2025

Worked on the modernization of back-end services for the GenomeRNAi web portal, a database of RNAi screening data. Migrated the data from a legacy system to a modern database and integrated FastAPI to serve data to the front-end.

Scientific Assistant (HIWI) with Michael Boutros (DKFZ)

DKFZ | Apr 2025 - Nov 2025

Annotated a library of >1,000 under-characterized small molecules by building automated pipelines to extract external database data via REST APIs, and predicting target drug pathways by gene ontology analysis.

MSc. Internship with Stefan Wiemann (DKFZ)

DKFZ | Mar 2025 - July 2025

Analyzed Illumina TruSeq bulk RNA-Seq data to categorize CRISPR-mediated GLYATL1 knockout mutants. Verified bioinformatic findings by executing Sanger sequencing and PCR protocols. These contributions earned co-authorship on a manuscript in review at BMC Clinical Epigenetics. See publications.

BSc. Full-time Researcher with Dr. Kyoung-Han Kim (uOttawa)

uOttawa Heart Institute | Sept 2022 - July 2024

Identified the role of ketogenesis in adipogenesis using scRNA-seq bioinformatic pipelines, and validated adipocyte hypertrophy and morphology using ImageJ algorithms. Further wet-lab experiments included mouse husbandry, tissue extraction, immunohistochemistry and qPCR. Additionally produced key bioinformatic evidence for the role of ketone bodies in hepatocyte-immune cell crosstalk in a sex-dependent manner. These contributions earned co-authorship on a manuscript in review at Diabetes (American Diabetes Association). See publications.

Projects

AnkiExam (Educational Software)

Software Development

Developed a Python-based study tool plugin for Anki. Engineered a secure, end-to-end encrypted user authentication system using OAuth 2.0 and PostgreSQL, with a PyQt6 front-end. Although support for this project is currently limited, it reached 50 monthly active users, demonstrating a strong proof of concept.

LabMetrics (Academic Mentorship Analytics)

Data Analytics / Full-stack

Developed LabMetrics, an academic scanner that calculates mentorship metrics (M-Index) based on lab co-authorship data. The platform helps students identify supportive Principal Investigators by weighing student-led publications, filtering out traditional H-Index biases. Features an interactive leaderboard and lab comparison tools. labmetrics.online

Flappy Bird Genetic Algorithm

Deep Learning / Neural Networks

Developed a custom Genetic Algorithm (GA) to evolve the weights and biases of a fixed-topology neural network (8 inputs → 12 hidden → 1 output) to play Flappy Bird. The implementation features tournament selection, uniform crossover, Gaussian mutation, and a 'Hall of Fame' system to preserve top-performing agents. I initially developed this project using pygame for rendering and have since rewritten it entirely in JavaScript with HTML5 canvas. The live demo at /flappy runs the genetic algorithm directly in the browser.
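The operators named above can be sketched as follows. This is an illustrative reconstruction rather than the project's code: the function names, population size, mutation rate, and sigma are assumptions, elitism is used as a simple stand-in for the 'Hall of Fame', and the fitness function is a toy placeholder (in the game it would be the score an agent achieves).

```python
import random

# 8 inputs -> 12 hidden -> 1 output: weights plus biases flattened into one genome.
GENOME_LENGTH = 8 * 12 + 12 + 12 * 1 + 1


def tournament_select(population, fitnesses, k, rng):
    """Pick k random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


def uniform_crossover(parent_a, parent_b, rng):
    """Take each gene from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def gaussian_mutate(genome, rate, sigma, rng):
    """Add zero-mean Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in genome]


def next_generation(population, fitnesses, rng,
                    elite_count=2, k=3, rate=0.1, sigma=0.5):
    """Copy the top genomes unchanged, then fill the rest with mutated offspring."""
    ranked = sorted(zip(fitnesses, population), key=lambda p: p[0], reverse=True)
    children = [list(genome) for _, genome in ranked[:elite_count]]
    while len(children) < len(population):
        a = tournament_select(population, fitnesses, k, rng)
        b = tournament_select(population, fitnesses, k, rng)
        children.append(gaussian_mutate(uniform_crossover(a, b, rng),
                                        rate, sigma, rng))
    return children


rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(GENOME_LENGTH)]
              for _ in range(20)]
# Toy placeholder fitness: prefer small weights.
fitnesses = [-sum(g * g for g in genome) for genome in population]
children = next_generation(population, fitnesses, rng)
print(len(children), len(children[0]))
```

Keeping the network topology fixed means every genome has the same length, which is what makes gene-by-gene uniform crossover well defined.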

Get In Touch

Based in Heidelberg, Germany. I am available for opportunities in Computational Biology and Data Science.

Publications

Sex and Ketogenesis-Dependent Effects of Intermittent Fasting

Bioinformatics, scRNA-seq, Ketogenesis, Research

Sex- and ketogenesis-dependent effects of intermittent fasting against diet-induced obesity and fatty liver disease. Co-authored preprint investigating the role of hepatic ketogenesis in the metabolic benefits of intermittent fasting using scRNA-seq. doi:10.1101/2025.11.17.688915

GLYATL1 Endocrine Resistance

Breast Cancer, Endocrine Resistance, Metabolism, Epigenetics, scRNA-seq, Research

GLYATL1 is associated with metabolic and epigenetic changes and with endocrine resistance in luminal breast cancer. Co-authored preprint investigating ERα-positive luminal breast cancer and aromatase inhibitor (AI) resistance. doi:10.1101/2025.10.17.682994

Figure 1: Interactive visualization of curriculum vitae (CV) topology via real-time dimensionality reduction. The dashboard functions as a real-time simulation, utilizing a scikit-learn backend to dynamically process structural and semantic features of CV items. (A) The primary view renders CV entries (e.g., Projects, Publications) as nodes within a 2D manifold. User-defined constraints (bottom-left sliders) trigger immediate re-computation of Principal Component Analysis (PCA) and UMAP (Uniform Manifold Approximation and Projection) reductions, reorganizing the layout based on high-dimensional feature vectors. (B, C) Detail views for selected nodes. (D) Real-time scree plot showing the explained variance ratio for the first six principal components. (E) Feature loading analysis for the first principal component (PC1), quantifying the contribution of specific attributes—such as categorical tags or semantic similarity—to the current clustering configuration.