Nancy Zhang

Associate Professor of Statistics at The Wharton School

Schools

  • The Wharton School

Biography

I received a BSc in Mathematics (2001), an MSc in Computer Science (2001), and a PhD in Statistics, all from Stanford University. From 2005 to 2006 I was a postdoctoral researcher at UC Berkeley. In 2006 I joined the Department of Statistics at Stanford University as an assistant professor. I moved to the University of Pennsylvania in 2011.

Nancy Zhang, B Yakir, Charlie L. Xia, David O. Siegmund (2016), Scanning a Poisson random field for local signals, Annals of Applied Statistics, to appear.

X. Wang, M. Chen, X. Yu, N. Pornputtapong, H. Chen, Nancy Zhang, RS Powers, M Krauthammer (2016), Global copy number profiling of cancer genomes, Bioinformatics, to appear.

Anna Cushing, Amanda Kamali, M. Winters, Erik Hopmans, J Bell, S. Grimes, Li C. Xia, Nancy Zhang, Ronald B. Moss, M Holodniy, Hanlee P. Ji (2015), Emergence of Hemagglutinin Mutations During the Course of Influenza Infection, Scientific Reports, 5.

M Yue, X Han, L De Masi, C Zhu, X Ma, Junje Zhang, Renwei Wu, Robert Schmieder, Radhey S. Kaushik, George P. Fraser, Shaohua Zhao, Patrick F. McDermott, François-Xavier Weill, Jacques G. Mainil, Cesar Arze, W. Florian Fricke, Robert A. Edwards, Dustin Brisson, Nancy Zhang, Shelley C. Rankin, Dieter M. Schifferli (2015), Allelic Variation Contributes to Bacterial Host Specificity, Nature Communications, 6.

Lucia L Peixoto, Mathieu Wimmer, Shane G Poplawski, Jennifer C Tudor, Charles A Kenworthy, Shichong Liu, Keiko Mizuno, Benjamin Garcia, Nancy Zhang, K Peter Giese, Ted Abel (2015), Memory acquisition and retrieval impact different epigenetic processes that regulate gene expression, BMC Genomics, 16.

Y Jiang, DA Oldridge, SJ Diskin, Nancy Zhang (2015), CODEX: a normalization and copy number variation detection method for whole exome sequencing, Nucleic Acids Research, 43.

Hao Chen and Nancy Zhang (2015), Graph-based change-point detection, The Annals of Statistics, 43 (139).

H. Chen, J Bell, Nicolas A. Zavala, Hanlee P. Ji, Nancy Zhang (2014), Allele-specific copy number profiling by next-generation DNA sequencing, Nucleic Acids Research, 23 (e23).

Lincoln D Nadauld, Sarah Garcia, Georges Natsoulis, John Bell, Laura Miotke, Erik Hopmans, Hua Xu, Reetesh K Pai, Curt Palm, John F Regan, Hao Chen, Patrick Flaherty, Akifumi Ootani, Nancy Zhang, James M Ford, Calvin J Kuo, Hanlee P. Ji (2014), Metastatic tumor evolution and organoid modeling implicate TGFBR2 as a cancer driver in diffuse gastric cancer, Genome Biology, 15 (428).

Georges Natsoulis, Nancy Zhang, Katrina Welch, John Bell, Hanlee Ji (2013), Identification of Insertion Deletion Mutations from Deep Targeted Resequencing, Journal of Data Mining in Genomics & Proteomics, 4, p. 132.

Abstract: Taking advantage of the deep targeted sequencing capabilities of next generation sequencers, we have developed a novel two step insertion deletion (indel) detection algorithm (IDA) that can determine indels from single read sequences with high computational efficiency and sensitivity when indels are fractionally less compared to wild type reference sequence. First, it identifies candidate indel positions utilizing specific sequence alignment artifacts produced by rapid alignment programs. Second, it confirms the location of the candidate indel by using the Smith-Waterman (SW) algorithm on a restricted subset of sequence reads. We demonstrate that IDA is applicable to indels of varying sizes from deep targeted sequencing data at low fractions where the indel is diluted by wild type sequence. Our algorithm is useful in detecting indel variants present at variable allelic frequencies such as may occur in heterozygotes and mixed normal-tumor tissue.

Past Courses

STAT102 INTRO BUSINESS STAT

Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

STAT405 STAT COMPUTING WITH R

The goal of this course is to introduce students to the R programming language and related ecosystem. This course will provide a skillset that is in demand in both the research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so that this class will prepare students for these higher level classes and electives.

STAT431 STATISTICAL INFERENCE

Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course. This course does not have business applications but has significant overlap with STAT 101 and 102.

STAT471 MODERN DATA MINING

Modern Data Mining: Statistics, or Data Science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods covered in the class.

STAT701 MODERN DATA MINING

Modern Data Mining: Statistics, or Data Science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods covered in the class.

STAT705 STAT COMPUTING WITH R

The goal of this course is to introduce students to the R programming language and related ecosystem. This course will provide a skillset that is in demand in both the research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so that this class will prepare students for these higher level classes and electives.

STAT991 SEM IN ADV APPL OF STAT

This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory, with principal emphasis on applications.

Sloan Fellowship, 2011

New World Silver Medal for Best PhD Thesis in Mathematical Sciences, 2007
