Tauska Wang


About me

I’m a machine learning engineer focused on post-training large language models and high-integrity data pipelines—especially where scientific rigor and real-world impact intersect.
My work spans agentic systems, reward modeling (RLAIF/RLHF), and evaluation at scale, with a track record of building reproducible tooling and shipping measurable improvements.
Recent focus areas:
LLM post-training & evaluation: Scaled rubric-based reward datasets to the million-pair range; trained summary/CoT reward models; delivered a ~5% lift on a health benchmark.
Agentic systems & tooling: Built Pantheon-CLI and an LLM query-router; delivered end-to-end single-cell analysis agents; added cross-provider API compatibility; surpassed a general SWE-agent CLI on biomedical tasks.
Scientific/health AI & data engineering: Parallelized BKMR on k8s; vectorized 30 years of NHANES; developed TCGA/GEO pipelines; created a mixture-of-experts framework for COPD metal-risk assessment; integrated wet-lab and dry-lab workflows.
Quality & evidence: Author on peer-reviewed studies; reviewer for leading journals and ICLR; co-inventor on patents spanning transformer-based multimodal medical AI and biotechnology.
Tooling & stack: Python, PyTorch, PyG, SQL; GCP/Azure; open-source contributions (PantheonOS/Agentic Data Science, OmicVerse, RAG Web UI, AstrBot).
I am currently pursuing advanced studies bridging medical science and computer science, and I thrive on projects where careful ablations, clear metrics, and tough benchmarks drive decisions.
If you’re tackling LLM post-training, agentic workflows, or scientific/health data at production scale, I’m interested in collaborating on systems that move from promising results to verifiable outcomes.

Research

My research integrates reproductive endocrinology, metabolism, and artificial intelligence to uncover mechanisms of endocrine–metabolic disorders such as polycystic ovary syndrome (PCOS). I develop AI-driven frameworks for multi-omics integration and single-cell data analysis to reveal cross-organ communication among reproductive and metabolic tissues. I also explore large language models (LLMs) for biomedical text mining, literature synthesis, and clinical decision support. By combining systems biology with machine learning, my goal is to advance precision medicine and digital health approaches that bridge molecular discoveries with real-world clinical data, ultimately improving diagnosis, prevention, and treatment of reproductive–metabolic diseases.

Articles

Journal article: ADVANCED SCIENCE. 2026;:e75782
SIRT6-Mediated Deacetylation of ATF3 Promotes Silica-Induced Lung Fibrosis by Enhancing its Nuclear Import via Binding to Importin α
Cheng D; Bu W; Wang F; Jin Y; Liu R; Zhao R; Wang X; Jiang M; Shen J; Cheng X; Chen Z; Zhu L; Li J; Ge Z; Miao S; Xu H; Zhou X; Wang D; Zhao X
Journal article: ANGEWANDTE CHEMIE-INTERNATIONAL EDITION. 2026;65(11):e2203782
SIRT6-Mediated Deacetylation of ATF3 Promotes Silica-Induced Lung Fibrosis by Enhancing its Nuclear Import via Binding to Importin α
Cheng D; Bu W; Wang F; Jin Y; Liu R; Zhao R; Wang X; Jiang M; Shen J; Cheng X; Chen Z; Zhu L; Li J; Ge Z; Miao S; Xu H; Zhou X; Wang D; Zhao X
Journal article: IMETAOMICS. 2026;3(1)
CellOntologyMapper: Consensus mapping of cell type annotation
Zeng Z; Wang X; Du H; Xing C
Journal article: JOURNAL OF TRANSLATIONAL MEDICINE. 2025;23(1):515
Pulmonary fibrosis: from mechanisms to therapies
Jiang M; Bu W; Wang X; Ruan J; Shi W; Yu S; Huang L; Xue P; Tang J; Zhao X; Su L; Cheng D
Journal article: TOXICS. 2025;13(2):118
Comprehensive Cross-Sectional Study of the Triglyceride Glucose Index, Organophosphate Pesticide Exposure, and Cardiovascular Diseases: A Machine Learning Integrated Approach
Wang X; Tian M; Shen Z; Tian K; Fei Y; Cheng Y; Ruan J; Mo S; Dai J; Xia W; Jiang M; Zhao X; Zhu J; Xiao J
Journal article: ENVIRONMENTAL HEALTH AND PREVENTIVE MEDICINE. 2025;30:35
Association between brominated flame retardants and obesity: a mediation analysis through markers of oxidative stress and inflammation
Fei Y; Cheng Y; Wang X; Ruan J; Zheng D; Cao H; Wang X; Wang X; Zhao X; Yang J
Journal article: TOXICS. 2024;12(12):918
Associations Between Brominated Flame Retardant Exposure and Depression in Adults: A Cross-Sectional Study
Cheng Y; Fei Y; Xu Z; Huang R; Jiang Y; Sun L; Wang X; Yu S; Luo Y; Mao X; Zhao X
Journal article: TOXICS. 2024;12(12):908
Unmasking the Invisible Threat: Biological Impacts and Mechanisms of Polystyrene Nanoplastics on Cells
Bu W; Cui Y; Jin Y; Wang X; Jiang M; Huang R; Egbobe JO; Zhao X; Tang J
Journal article: BIORESOURCE TECHNOLOGY. 2024;413:131539
Bifunctional sludge-derived redox carbon dots with photoelectron storage and delivery properties for ammonia production by photosensitized Shewanella oneidensis MR-1
Li Q; Lu H; Tian T; Zhang H; Cheng F; Li X; Sun H; Wang X; Zhou J
Journal article: ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY. 2024;283:116842
Construction of metal interpretable scoring system and identification of tungsten as a novel risk factor in COPD
Wang X; Wang X; Cheng Y; Luo C; Xia W; Gao Z; Bu W; Jiang Y; Fei Y; Shi W; Tang J; Liu L; Zhu J; Zhao X
Journal article: BMC CANCER. 2024;24(1):1046
APOL6 predicts immunotherapy efficacy of bladder cancer by ferroptosis
Fan Z; Liu Y; Wang X; Xu Y; Huang R; Shi W; Qu Y; Ruan J; Zhou C; Zhao X; Liu L
Journal article: FOOD AND CHEMICAL TOXICOLOGY. 2024;184:114378
TRPML1 contributes to antimony-induced nephrotoxicity by initiating ferroptosis via chaperone-mediated autophagy
Liu L; Luo C; Zheng D; Wang X; Wang R; Ding W; Shen Z; Xue P; Yu S; Liu Y; Zhao X
Journal article: JOURNAL OF AFFECTIVE DISORDERS. 2024;344:554-562
The association between polycyclic aromatic hydrocarbons exposure and neuropsychiatric manifestations in perimenopausal women: A cross-sectional study
Cheng Y; Zhang Z; Ma X; Wang X; Chen L; Luo Y; Cao X; Yu S; Wang X; Cao Y; Zhao X
Journal article: FRONTIERS IN PHARMACOLOGY. 2023;14:1138452
Integrated single-cell and transcriptome sequencing analyses develops a metastasis-based risk score system for prognosis and immunotherapy response in uveal melanoma
Meng S; Zhu T; Fan Z; Cheng Y; Dong Y; Wang F; Wang X; Dong D; Yuan S; Zhao X
Journal article: FRONTIERS IN PHARMACOLOGY. 2023;13:1098136
Construction of a ferroptosis scoring system and identification of LINC01572 as a novel ferroptosis suppressor in lung adenocarcinoma
Hong L; Wang X; Cui W; Wang F; Shi W; Yu S; Luo Y; Zhong L; Zhao X
Journal article: FRONTIERS IN ONCOLOGY. 2022;12:1054564
A prognostic model and immune regulation analysis of uterine corpus endometrial carcinoma based on cellular senescence
Gao L; Wang X; Wang X; Wang F; Tang J; Ji J
Journal article: JOURNAL OF ONCOLOGY. 2022;2022:1-10
HPV-Related Prognostic Signature Predicts Survival in Head and Neck Squamous Cell Carcinoma
Zhao H; Wang F; Wang X; Zhao X; Ji J
Journal article: FRONTIERS IN GENETICS. 2022;13:981603
Comprehensive analysis of PTPN gene family revealing PTPN7 as a novel biomarker for immuno-hot tumors in breast cancer
Wang F; Wang X; Liu L; Deng S; Ji W; Liu Y; Wang X; Wang R; Zhao X; Gao E
Journal article: JCI INSIGHT. 2022;7(18):e161940
Statin shapes inflamed tumor microenvironment and enhances immune checkpoint blockade in non-small cell lung cancer
Mao W; Cai Y; Chen D; Jiang G; Xu Y; Chen R; Wang F; Wang X; Zheng M; Zhao X; Mei J
Journal article: BIOMED RESEARCH INTERNATIONAL. 2022;2022:8577821
Systematic Characterization of Expression Patterns and Immunocorrelations of Formin-Like Genes in Breast Cancer
Gao E; Wang X; Wang F; Deng S; Xia W; Wang R; Wang X; Zhao X; Qian H
Journal article: FRONTIERS IN ONCOLOGY. 2022;12:995929
Molecular subtypes, clinical significance, and tumor immune landscape of angiogenesis-related genes in ovarian cancer
Tang H; Shan J; Liu J; Wang X; Wang F; Han S; Zhao X; Wang J
Journal article: FRONTIERS IN ONCOLOGY. 2022;12:955979
The IFN-γ-related long non-coding RNA signature predicts prognosis and indicates immune microenvironment infiltration in uterine corpus endometrial carcinoma
Gu C; Lin C; Zhu Z; Hu L; Wang F; Wang X; Ruan J; Zhao X; Huang S
Journal article: FRONTIERS IN GENETICS. 2022;13:925231
Identification of a Gene Signature of Cancer-Associated Fibroblasts to Predict Prognosis in Ovarian Cancer
Zeng L; Wang X; Wang F; Zhao X; Ding Y

All other publications

Preprint: BIORXIV. 2025
CellOntologyMapper: Consensus mapping of cell type annotation
Zeng Z; Wang X; Du H

Grants

Thinking Machines Lab Tinker Research Grant
Thinking Machines Lab
10 November 2025 - 10 November 2026
Polycystic ovary syndrome (PCOS) is a heterogeneous endocrine and metabolic disorder in women, characterized by complex dysregulation across multiple tissues, including the ovary, endometrium, adipose tissue, skeletal muscle, and circulating immune and metabolic pathways. This complexity has led to highly fragmented bioinformatics workflows and substantial analyst-to-analyst variability, which limits reproducibility and slows translation of omics findings into clinically relevant insights.
Supported by the Tinker research grant (USD 5,000 in compute credits over 12 months), this project aims to develop and evaluate a domain-adapted large language model (LLM)–driven “PCOS data analysis agent” for multi-omics analysis in women’s health. We will curate public bulk and single-cell transcriptomic and proteomic datasets related to PCOS and female metabolic health, along with high-quality method descriptions, analysis scripts, and reporting templates. Using Tinker’s infrastructure, we will fine-tune an LLM on this corpus and integrate it with our existing OmicVerse/OVAgent ecosystem to perform end-to-end tasks, such as pipeline design, parameter selection, quality control reasoning, cell-type annotation, and generation of structured analysis reports.
The agent’s performance will be benchmarked against general-purpose LLMs and human expert baselines on accuracy, robustness, and run-to-run reproducibility of analysis outputs. By systematically testing whether a domain-adapted LLM can provide more consistent, transparent, and auditable analyses than ad-hoc expert workflows, this project seeks to establish a blueprint for trustworthy AI assistants in women’s health omics and to release reusable tools, prompts, and evaluation protocols to the research community.
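The abstract names run-to-run reproducibility as a benchmark criterion but does not define how it is scored. As an illustrative assumption only (not the project's stated protocol), one simple choice for cell-type annotation is the mean pairwise label agreement across repeated agent runs on the same cells. The function names and the cell labels in this sketch are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(run_a, run_b):
    """Fraction of cells that received the same label in two runs.

    Assumes both runs annotate the same cells in the same order
    (a hypothetical convention for this sketch).
    """
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must annotate the same non-empty set of cells")
    matches = sum(a == b for a, b in zip(run_a, run_b))
    return matches / len(run_a)


def run_to_run_reproducibility(runs):
    """Mean pairwise agreement over every pair of repeated runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    scores = [pairwise_agreement(a, b) for a, b in combinations(runs, 2)]
    return sum(scores) / len(scores)


# Hypothetical labels from three repeated runs over the same five cells.
runs = [
    ["granulosa", "theca", "stromal", "immune", "endothelial"],
    ["granulosa", "theca", "stromal", "immune", "stromal"],
    ["granulosa", "theca", "theca", "immune", "endothelial"],
]
print(round(run_to_run_reproducibility(runs), 3))  # 0.733
```

Raw agreement does not correct for label frequencies, so a chance-corrected statistic such as Cohen's kappa would be a stricter alternative for the same comparison.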
Google Cloud Credits for Gemini research project at Karolinska Institutet
Google (Google Cloud / Gemini)
