Hein Putter started working at the Department of Medical Statistics of the Leiden University Medical Center (LUMC) in February 2000. Before coming (or rather returning) to Leiden, he worked from 1998 to 1999 at NATEC and the Department of Human Retrovirology of the AMC in Amsterdam. There he gained experience in clinical trials and in modelling the complex interaction of HIV with the immune system, in close collaboration with Frank de Wolf and the Wellcome Trust Centre for the Epidemiology of Infectious Diseases in Oxford. Before that he spent a (much too short) period at the Statistical Laboratory of the University of Cambridge as a post-doc of the European Network on Computational and Spatial Statistics, working on kriging with Alistair Young. Before Cambridge he did post-doc work on semi-parametric methods at the Free University and the University of Amsterdam. His PhD in mathematical statistics, on the subject of resampling methods, was supervised by Prof. W. R. van Zwet.
> Master of Science from the University of Leiden, 1989, Prof. dr. W. R. van Zwet
> PhD in 1994, Leiden: "Consistency of Resampling Methods", Prof. dr. W. R. van Zwet
> 1995, University of Amsterdam
> 1996-1997, Free University Amsterdam
> 1997-1998, Statistical Laboratory, Cambridge, UK
> 1998-1999, Amsterdam Medical Center
> 2000-present, Assistant Professor (UD), Leiden University Medical Center
Relative efficiency of haplotype frequency estimation in sibships and nuclear families compared to unrelated individuals
Putter H, Meulenbelt I, van Houwelingen JC (2007). Relative efficiency of haplotype frequency estimation in sibships and nuclear families compared to unrelated individuals. Hum Hered. 64: 52-62. http://content.karger.com/produktedb/produkte.asp?typ=fulltext&file=000101423
The problem of estimating haplotype frequencies from unphased single nucleotide polymorphism (SNP) genotype data in sibships with and without parents is considered. We focus on the Fisher information of the haplotype frequencies of the parents in order to correctly deal with the dependence of haplotypes within sibships. We compare these Fisher information matrices with those obtained for unrelated individuals and study the relative efficiency of sibships with and without parents compared to unrelated individuals in estimating haplotype frequencies. Crudely summarizing, the second sib contributes half the information of the first, except for rare haplotypes, when the second sib counts almost as one. We argue that the relative efficiencies can also be used to correct for dependence in the calculation of standard errors after initially ignoring the dependence in the estimation phase.
Standard survival data measure the time span from some time origin until the occurrence of one type of event. If several types of events occur, a model describing progression to each of these competing risks is needed. Multi-state models generalize competing risks models by also describing transitions to intermediate events. Methods to analyze such models have been developed over the last two decades. Fortunately, most of the analyses can be performed within the standard statistical packages, but may require some extra effort with respect to data preparation and programming. This tutorial aims to review statistical methods for the analysis of competing risks and multi-state models. Although some conceptual issues are covered, the emphasis is on practical issues like data preparation, estimation of the effect of covariates, and estimation of cumulative incidence functions and state and transition probabilities. Examples of analysis with standard software are shown.
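As a rough illustration of the cumulative incidence functions mentioned in this summary, the sketch below computes the usual nonparametric (Aalen-Johansen type) estimate for one cause in the presence of competing risks. It is not taken from the tutorial itself; the function name `cumulative_incidence` and the coding of `causes` (0 for censoring, positive integers for event types) are assumptions made for this example.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause under competing risks.

    times  : observed times (event or censoring time per subject)
    causes : 0 if censored, otherwise a positive integer event type
             (this coding is an assumption of the sketch)
    Returns a list of (time, cumulative incidence) at each distinct event time.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0   # probability of being event-free just before the current time
    cif = 0.0    # cumulative incidence of the cause of interest
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        counts = Counter()
        j = i
        # Collect all subjects observed at the same time t.
        while j < len(data) and data[j][0] == t:
            counts[data[j][1]] += 1
            j += 1
        n_events = sum(n for c, n in counts.items() if c != 0)
        if n_events > 0:
            # Increment: event-free survival times the cause-specific hazard.
            cif += surv * counts[cause_of_interest] / at_risk
            surv *= 1.0 - n_events / at_risk
            out.append((t, cif))
        at_risk -= j - i
        i = j
    return out
```

Unlike one minus a Kaplan-Meier estimate that treats competing events as censored, the estimates for all causes computed this way add up to one minus the overall event-free survival.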
An important aim in clinical studies in oncology is to study how treatment and prognostic factors influence the course of disease of a patient. Typically in these trials, besides overall survival, other endpoints such as locoregional recurrence or distant metastasis are also of interest. Most commonly in these situations, Cox regression models are applied for each of these endpoints separately or to composite endpoints such as disease-free survival. These approaches, however, fail to give insight into what happens to a patient after a first event. We re-analyzed data of 2795 patients from a breast cancer trial (EORTC 10854) by applying a multi-state model, with local recurrence, distant metastasis, and both local recurrence and distant metastasis as transient states and death as absorbing state. We used an approach where the clock is reset on entry into a new state. The influence of prognostic factors on each of the transition rates is studied, as well as the influence of the time at which intermediate events occur. The estimated transition rates between the states in the model are used to obtain predictions for patients with a given history. Formulas are developed and illustrated for these prediction probabilities for the clock reset approach.
Combining evidence for association from transmission disequilibrium and case-control studies using single-nucleotide polymorphisms
Putter H, Houwing-Duistermaat JJ, Nagelkerke NJ (2005). Combining evidence for association from transmission disequilibrium and case-control studies using single-nucleotide polymorphisms. BMC Genet. 6: S106.
http://www.biomedcentral.com/1471-2156/6/S1/S106
The aim of the present analysis is to combine evidence for association from the two most commonly used designs in genetic association analysis, the case-control design and the transmission disequilibrium test (TDT) design. The cases here are affected offspring from nuclear families and are used in both the case-control and TDT designs. As a result, inference from these designs is not independent. We applied a simple logistic regression method for combining evidence for association from case-control and TDT designs to single-nucleotide polymorphism data purchased on a region on chromosome 3, replicate 1 of the Aipotu population. Combining the evidence from the case-control and TDT designs yielded a 5-10% reduction in the standard errors of the relative risk estimates. The authors did not know the results before the analyses were conducted.
The mean identity-by-descent (IBD) specification used in the Generalized Estimating Equations (GEE) methodology for linkage is only valid, strictly speaking, under the assumption of fully polymorphic markers. In practice, markers often provide only partial IBD information, which can potentially result in inconsistency of the locus location and gene effect estimates obtained by the GEE method. Using both simulations and theory, we identify some realistic conditions about marker information under which the validity of the GEE linkage methods may be arguable. Namely, researchers should not trust the GEE parameters' estimates and their associated confidence intervals in areas of the genome where IBD information is sparse or when this information changes abruptly. We show that properly standardized statistics based on IBD sharing provide a valid alternative.
Long-term survival with non-proportional hazards: results from the Dutch Gastric Cancer Trial
Putter H, Sasako M, Hartgrink HH, van de Velde CJ, van Houwelingen JC (2005). Long-term survival with non-proportional hazards: results from the Dutch Gastric Cancer Trial. Stat Med. 24: 2807-21. http://www3.interscience.wiley.com/cgi-bin/abstract/111082867/ABSTRACT
Randomized clinical trials with long-term survival data comparing two treatments often show Kaplan-Meier plots with crossing survival curves. Such behaviour implies a violation of the proportional hazards assumption for treatment. The Cox proportional hazards regression model with treatment as a fixed effect can therefore not be used to assess the influence of treatment on survival. In this paper we analyse long-term follow-up data from the Dutch Gastric Cancer Trial, a randomized study comparing limited (D1) lymph node dissection with extended (D2) lymph node dissection. We illustrate a number of ways of dealing with survival data that do not obey the proportional hazards assumption, each of which can be easily implemented in standard statistical packages.
Missing forms and dropout in the TME quality of life substudy
Putter H, Marijnen CA, Kranenbarg EK, van de Velde CJ, Stiggelbout AM (2005). Missing forms and dropout in the TME quality of life substudy. Qual Life Res. 14: 857-65. http://www.springerlink.com/content/w1678nj414876118/
OBJECTIVE: Missing forms may pose problems in health related quality of life (QOL) studies, because the absence of a QOL measure may be related to the patient's health and hence to the patient's QOL itself. Studying patterns of missingness, dropout, and the possible impact of missing data on QOL measures is an important step in reporting outcomes of QOL studies. We study patterns of dropout and evaluate the impact of missing forms in the TME QOL substudy. METHODS: Patients with rectal cancer, randomized to receive either radiotherapy plus total mesorectal excision (TME) or TME only were included in the TME trial. QOL was evaluated in 1302 Dutch patients, before treatment, and 3, 6, 12, 18 and 24 months after surgery. Here only the visual analogue score (VAS) was studied. RESULTS: At baseline, differences between VAS scores were found with respect to whether the QOL forms were dated before or after radiotherapy and surgery. Differences were small between different statistical methods accounting for dropout; only a cross-sectional analysis gave biased results. CONCLUSION: The results of the sensitivity analysis indicated that a linear mixed model analysis is a reliable and attractive approach for this study.
Competing events concerning individual subjects are of interest in many medical studies. For example, leukemia-free patients surviving a bone marrow transplant are at risk of developing acute or chronic graft-versus-host disease, or they might develop infections. In this situation, competing risks models provide a natural framework to describe the disease. When incorporating covariates influencing the transition intensities, an obvious approach is to use Cox's proportional hazards model for each of the transitions separately. A practical problem then is how to deal with the abundance of regression parameters. Our objective is to describe the competing risks model in fewer parameters, both in order to avoid imprecise estimation in transitions with rare events and in order to facilitate interpretation of these estimates. Suppose that the regression parameters are gathered into a $p \times K$ matrix $B$, with $p$ and $K$ as the number of covariates and transitions, respectively. We propose the use of reduced rank models, where $B$ is required to be of lower rank $R$, smaller than both $p$ and $K$. One way to achieve this is to write $B = A\Gamma^\top$ with $A$ and $\Gamma$ matrices of dimensions $p \times R$ and $K \times R$, respectively. We shall outline an algorithm to obtain estimates and their standard errors in a reduced rank proportional hazards model for competing risks and illustrate the approach on a competing risks model applied to 8966 leukemia patients from the European Group for Blood and Marrow Transplantation.
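To make the saving in parameters concrete, the reduced rank structure described in this summary can be written out as follows. The notation $\lambda_{k0}$, $\beta_k$ and $Z$ is introduced here for illustration and is not quoted from the paper.

$$
\lambda_k(t \mid Z) = \lambda_{k0}(t)\,\exp\!\big(\beta_k^\top Z\big), \qquad B = (\beta_1, \dots, \beta_K) = A\,\Gamma^\top, \qquad k = 1, \dots, K .
$$

An unrestricted $p \times K$ matrix $B$ has $pK$ free regression parameters, whereas a matrix of rank $R$ has $R(p + K - R)$. With purely illustrative values $p = 6$, $K = 4$ and $R = 1$ (not taken from the paper), this is 9 parameters instead of 24.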
Consider a semiparametric model with a Euclidean parameter and an infinite-dimensional parameter, to be called a Banach parameter. Assume:
(a) There exists an efficient estimator of the Euclidean parameter.
(b) When the value of the Euclidean parameter is known, there exists an estimator of the Banach parameter, which depends on this value and is efficient within this restricted model.
Substituting the efficient estimator of the Euclidean parameter for the value of this parameter in the estimator of the Banach parameter, one obtains an efficient estimator of the Banach parameter for the full semiparametric model with the Euclidean parameter unknown. This hereditary property of efficiency completes estimation in semiparametric models in which the Euclidean parameter has been estimated efficiently. Typically, estimation of both the Euclidean and the Banach parameter is necessary in order to describe the random phenomenon under study to a sufficient extent. Since efficient estimators are asymptotically linear, the above substitution method is a particular case of substituting asymptotically linear estimators of a Euclidean parameter into estimators that are asymptotically linear themselves and that depend on this Euclidean parameter. This more general substitution case is studied for its own sake as well, and a hereditary property for asymptotic linearity is proved.
In many biomedical investigations, multivariate or clustered failure time data are encountered. They arise, for instance, when the sample consists of centres and each centre contains several patients. For this kind of data, frailty models can be used to take into account the possible correlation within centres. Most common are centre-specific frailty models in which the centre-effect on failure time is assumed to be constant with follow-up. But in many applications this assumption is too restrictive. More realistic are models with time-varying frailties. Therefore, we studied ways to extend the constant centre-specific frailty model to allow time dependence of the frailties. To begin, we followed and adapted Paik et al. (1994), who generalized the shared frailty model by introducing additional random frailty terms for different time-intervals. Although the model allows the frailty to vary between intervals, it is very cumbersome to calculate and difficult to fit. We developed two much simpler centre-specific frailty models. First, we proposed a model with a power parameter in which the effect of the centre-specific frailty is allowed to vary between intervals. Secondly, we extended the model, permitting the centre-specific frailty to vary with time. Although the convenience of the gamma and positive stable frailty models is lost in both extensions of the frailty model, the computations for these models are much easier than for the adapted Paik model. We applied these time-dependent frailty models to a data set from the European Blood Marrow Transplant (EBMT) registry, where a decaying centre-effect was found.
We present a unified approach to selection and linkage analysis of selected samples, for both quantitative and dichotomous complex traits. It is based on the score test for the variance attributable to the trait locus and applies to general pedigrees. The method is equivalent to regressing excess IBD sharing on a function of the traits. It is shown that when population parameters for the trait are known, such inversion does not entail any loss of information. For dichotomous traits, pairs of pedigree members of different phenotypic nature (e.g., affected sib pairs and discordant sib pairs) can easily be combined as well as populations with different trait prevalences.
Genetic linkage analysis for complex diseases offers a major challenge to geneticists. In these complex diseases multiple genetic loci are responsible for the disease and they may vary in the size of their contribution; the effect of any single one of them is likely to be small. In many situations, like in extensive twin registries, trait values have been recorded for a large number of individuals, and preliminary studies have revealed summary measures for those traits, like mean, variance and components of variance, including heritability. Given the small effect size, a random sample of twins will require a prohibitively large sample size. It is well known that selective sampling is far more efficient in terms of genotyping effort. In this paper we derive easy expressions for the information contributed by sib pairs for the detection of linkage to a quantitative trait locus (QTL). We consider random samples as well as samples of sib pairs selected on the basis of their trait values. These expressions can be rapidly computed and do not involve simulation. We extend our results for quantitative traits to dichotomous traits using the concept of a liability threshold model. We present tables with required sample sizes for height, insulin levels and migraine, three of the traits studied in the GenomEUtwin project.
The two most popular methods to detect linkage of a quantitative trait to a marker are the Haseman-Elston regression method and the variance components likelihood-ratio test. In the literature, these methods are frequently compared and the relative advantages and disadvantages of each method are well known. In this article, we derive a score test for the variance component attributable to a specific quantitative trait locus and show that for sib-pairs it is mathematically equivalent to a recently proposed version of the Haseman-Elston method that optimally combines the sum squared and the difference squared of the centered phenotype values of the sibs. Because score tests and likelihood-ratio tests are equivalent for large sample sizes, the variance components likelihood-ratio test is also asymptotically equivalent to this optimal Haseman-Elston test. This fact gives a theoretical explanation of the empirical observation from simulation studies reporting similar power of the variance components likelihood-ratio test and the optimal Haseman-Elston method. Perhaps more importantly for practical purposes, the score test can also be extended in a natural way to support the simultaneous analysis of more than two subjects and multivariate phenotypes.
Bootstrap methods have gained wide acceptance and huge popularity in the field of applied statistics. The bootstrap is able to provide accurate answers in cases where other methods are simply not available, or where the usual approximations are invalid. The number of applications in chemistry, however, has been rather limited. One possible cause for this is the overwhelming number of techniques available. This tutorial aims to introduce the basic concepts of bootstrap methods, provide some guidance as to what bootstrap methods are appropriate in different situations, and illustrate several potential application areas in chemometrics by worked examples.
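For readers new to the idea, the basic nonparametric bootstrap can be sketched in a few lines. The example below computes a simple percentile confidence interval; it is a generic illustration rather than one of the tutorial's worked examples, and the function name `bootstrap_percentile_ci` and its default settings are assumptions.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(sample).

    Resamples the data with replacement n_boot times, recomputes the
    statistic on each resample, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of the replicates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

A call such as `bootstrap_percentile_ci(measurements, statistics.median)` gives an interval for the median without any distributional formula. The percentile interval is only the simplest of several bootstrap intervals, and which variant is appropriate depends on the situation.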
In the context of a mathematical model describing HIV infection, we discuss a Bayesian modelling approach to a non-linear random effects estimation problem. The model and the data exhibit a number of features that make the use of an ordinary non-linear mixed effects model intractable: (i) the data are from two compartments fitted simultaneously against the implicit numerical solution of a system of ordinary differential equations; (ii) data from one compartment are subject to censoring; (iii) random effects for one variable are assumed to be from a beta distribution. We show how the Bayesian framework can be exploited by incorporating prior knowledge on some of the parameters, and by combining the posterior distributions of the parameters to obtain estimates of quantities of interest that follow from the postulated model.
The kriging procedure gives an optimal linear predictor of a spatial process at a point $x_0$, given observations of the process at other locations $x_1, \dots, x_n$, taking into account the spatial dependence of the observations. The kriging predictor is optimal if the weights are calculated from the correct underlying covariance structure. In practice, this covariance structure is unknown and is estimated from the data. An important, but not very well understood, problem in kriging theory is the effect on the accuracy of the kriging predictor of substituting the optimal weights by weights derived from the estimated covariance structure. We show that the effect of estimation is negligible asymptotically if the joint Gaussian distributions of the process at $x_0, \dots, x_n$ under the true and the estimated covariance are contiguous almost surely. We consider a number of commonly used parametric covariance models where this can indeed be achieved.
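The substitution step discussed in this summary can be written down explicitly in the simplest setting. The display below assumes a zero-mean process $Z$ (simple kriging) with a covariance function indexed by a parameter $\theta$; the symbols $Z$, $\Sigma_\theta$, $c_\theta$ and $\hat\theta$ are notation chosen for this sketch, not the paper's.

$$
\hat Z_\theta(x_0) = c_\theta^\top \Sigma_\theta^{-1} \mathbf{Z}, \qquad
\mathbf{Z} = \big(Z(x_1), \dots, Z(x_n)\big)^\top, \quad
(\Sigma_\theta)_{ij} = \mathrm{Cov}_\theta\big(Z(x_i), Z(x_j)\big), \quad
(c_\theta)_i = \mathrm{Cov}_\theta\big(Z(x_0), Z(x_i)\big).
$$

The optimal weights are $\Sigma_\theta^{-1} c_\theta$ evaluated at the true $\theta$. The plug-in predictor replaces $\theta$ by an estimate $\hat\theta$ and uses $\hat Z_{\hat\theta}(x_0) = c_{\hat\theta}^\top \Sigma_{\hat\theta}^{-1} \mathbf{Z}$; the question is how much accuracy is lost by this replacement.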
Second Order and Bootstrap Approximation to Student's t Statistic
We prove the validity of one-term Edgeworth expansion for Student's t-statistic under minimal conditions: the distribution of observations is nonlattice and has finite third moment. As a corollary we obtain the second-order correctness for the bootstrap of Student's t-statistic under these optimal conditions, thus extending a classical result of Singh [Ann. Statist., 9 (1981), pp. 1187--1195] to the Studentized mean.
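For orientation, the one-term Edgeworth expansion for Student's t-statistic has the following standard form. Here $T_n$ is the t-statistic based on $n$ i.i.d. observations with mean $\mu$ and variance $\sigma^2$, $\Phi$ and $\phi$ are the standard normal distribution function and density, and $\kappa_3$ is the standardized third moment; this notation is supplied for the sketch and is not quoted from the paper.

$$
P(T_n \le x) = \Phi(x) + \frac{\kappa_3}{6\sqrt{n}}\,(2x^2 + 1)\,\phi(x) + o\big(n^{-1/2}\big), \qquad
\kappa_3 = \frac{E(X - \mu)^3}{\sigma^3},
$$

uniformly in $x$. Second-order correctness of the bootstrap means that the bootstrap distribution of the t-statistic matches this expansion up to the $o(n^{-1/2})$ remainder, so it captures the skewness term that the plain normal approximation ignores.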
In this paper the validity of a one-term Edgeworth expansion for Studentized symmetric statistics is proved. We propose jackknife estimates for the unknown constants appearing in the expansion and prove their consistency. As a result we obtain the second-order correctness of the empirical Edgeworth expansion for a very general class of statistics, including $U$-statistics, $L$-statistics and smooth functions of the sample mean. We illustrate the application of the bootstrap in the case of a $U$-statistic of degree two.
Resampling: Consistency of substitution estimators
On the basis of N i.i.d. random variables with a common unknown distribution P we wish to estimate a functional $\tau_N(P)$. An obvious and very general approach to this problem is to find an estimator $\hat{P}_N$ of P first, and then construct a so-called substitution estimator $\tau_N (\hat{P}_N)$ of $\tau_N(P)$. In this paper we investigate how to choose the estimator $\hat{P}_N$ so that the substitution estimator $\tau_N (\hat{P}_N)$ will be consistent.
Although our setup covers a broad class of estimation problems, the main substitution estimator we have in mind is a general version of the bootstrap where resampling is done from an estimated distribution $\hat{P}_N$. We do not focus in advance on a particular estimator $\hat{P}_N$, such as, for example, the empirical distribution, but try to indicate which resampling distribution should be used in a particular situation. The conclusion that we draw from the results and the examples in this paper is that the bootstrap is an exceptionally flexible method which comes into its own when full use is made of its flexibility. However, the choice of a good bootstrap method in a particular case requires rather precise information about the structure of the problem at hand. Unfortunately, this may not always be available.
On a Set of the First Category
September 03, 2010