Steen Magnussen: An assessment of three variance estimators for the k-nearest neighbour technique. Silva Fennica vol. 47 no. 1, article id 925.

A jackknife (JK), a bootstrap (BOOT), and an empirical difference estimator (EDE) of totals and variance were assessed in simulated sampling from three artificial but realistically complex multivariate populations (N = 8000 elements) organized in clusters of four elements. Intra-cluster correlations of the target variables (Y) varied from 0.03 to 0.26. Time-saving implementations of JK and BOOT are detailed. In simple random sampling (SRS), bias in totals was ≤ 0.4% for the two largest sample sizes (n = 200, 300), but slightly larger for n = 50 and 100. In cluster sampling (CLU), bias was typically 0.1% higher and more variable. The lowest overall bias was in EDE. In both SRS and CLU, JK estimates of standard error were slightly (3%) too high, while the bootstrap estimates in both designs were too low (8%). Estimates of error suggested a trend in EDE toward overestimation with increasing sample size. Calculated 95% confidence intervals achieved a coverage that in most cases was fairly close (± 2%) to the nominal level. For estimation of a population total, the EDE estimator appears to be slightly better than the JK estimator.


Introduction
The k-nearest neighbour technique (kNN) is used to impute values of one or more target variables (Y) for elements in a finite population without a direct observation of Y (Paass 1985; Aha 1997). Imputations are based on a set of selected auxiliary variables (X) known for all (N) elements in the population and correlated with Y. In a typical kNN application, a probability sample of n elements provides paired observations of X and Y. A population element can be a pixel in an image of the population or simply a fixed-sized contiguous regular spatial area suitable for a tessellation of the population and compatible with the scale of the target variable.
The sample of n elements is referred to as the reference set, while the N − n elements with no observation of Y are referred to as the target set (Tomppo 1991). The imputed Y-value for an element in the target set is a fixed known function (f) of the k Y-values in the reference set whose associated X-values are closest, in terms of a selected distance metric, to the X-values of the element to receive an imputation. The analyst chooses X, f, k, and the distance metric, usually through a combination of cross-validation procedures and ranking of goodness-of-fit statistics (McRoberts 2009).
The appeal of the kNN technique lies in: i) its simplicity and flexibility (non-parametric, distribution-free); ii) predictions of the target variable(s) for all elements in a population (suitable for mapping); iii) small area estimation of totals; and iv) an ability to handle multivariate imputations as easily as univariate imputations (Crookston and Finley 2008). The most important drawbacks are: i) difficulty in optimizing performance (bias and accuracy); ii) the curse of dimensionality; and iii) a lack of small area estimators of variance.
Leave-one-out cross-validation is often used to guide the analyst towards a good choice of k, an appropriate function f, and a suitable distance metric. Selection of X-variables has been done in the context of regression modeling or with more advanced methods like simulated annealing and genetic algorithms (Tomppo and Halme 2004; Barth et al. 2009). Given the non-parametric (distribution-free) nature of kNN, and the lack of a proper joint distribution of imputations (Lin and Jeon 2006), one cannot infer performance at the population level from in-sample point estimates of bias and accuracy (Chen and Shao 2000; Baffetta et al. 2009).
The curse of dimensionality (Scott 1992, p. 27) manifests itself in the fact that distances in X-space from a target element to the n reference elements become increasingly similar as the dimension of X grows. Conversely, the number of reference elements with a similar distance to a target element grows exponentially with the dimension of X. Or, as stated by Beyer et al. (1999), "when dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point".
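The effect described by Beyer et al. (1999) is easy to demonstrate numerically. The following sketch (toy uniform data, not from this study) reports the average ratio of nearest to farthest Euclidean distance from a target point to a set of reference points as the dimension p grows; the ratio climbs toward 1:

```python
import numpy as np

rng = np.random.default_rng(42)

def distance_contrast(p, n=1000, targets=50):
    """Average ratio of nearest to farthest Euclidean distance
    from a target point to n uniform reference points in p dimensions."""
    reference = rng.uniform(size=(n, p))
    ratios = []
    for t in rng.uniform(size=(targets, p)):
        d = np.sqrt(((reference - t) ** 2).sum(axis=1))
        ratios.append(d.min() / d.max())
    return float(np.mean(ratios))

# As p grows, nearest and farthest reference points become
# nearly indistinguishable in distance.
contrast = {p: distance_contrast(p) for p in (1, 3, 10, 100)}
for p, r in contrast.items():
    print(p, round(r, 3))
```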
In forestry the kNN technique has appeal (Maltamo and Kangas 1998; Holmström and Fransson 2003; Maselli et al. 2005; LeMay et al. 2008; Breidenbach et al. 2010) due to: i) readily available low-cost remotely-sensed auxiliary variables correlated with Y; ii) difficulties encountered with alternative parametric and non-parametric multivariate modeling approaches (Koistinen et al. 2008) due to locally varying relationships between X and Y in response to variation in species, ages, forest structures, soils, and climate (Zhang and Shi 2004; Opsomer et al. 2008; McRoberts et al. 2010); and iii) predictions for individual population elements for mapping and small area estimation problems (Tomppo 2006).
A summary of a kNN inventory typically requires an estimate of precision of estimated strata and population totals. In early kNN applications, an estimate of precision was typically given as the average, over the n reference elements, of the root-mean-squared error (RMSE) obtained in a leave-one-out cross-validation (e.g. Maltamo and Kangas 1998; Katila and Tomppo 2001). However, the RMSE only applies to the sampled elements and cannot be scaled to a population (stratum) total (Kim and Tomppo 2006; McRoberts et al. 2007). Kim and Tomppo (2006) proposed an ordinary-kriging-based estimator of the variance of element-level predictions (Cressie 1993, p. 127) and block-kriging (Van der Meer 2012) for small area estimation (SAE) problems. However, the underlying assumption of stationarity is rarely reasonable in forestry. As well, the application of ordinary kriging theory to a prediction based on the k nearest neighbours, rather than a weighted sum of all n reference elements, may lead to a bias akin to the bias following from a tapering of a covariance matrix (Kaufman et al. 2008).
McRoberts et al. (2007) were the first to propose a kNN variance estimator for a population total that explicitly considered the variance and covariance of all population elements in both simple random and cluster sampling designs. I refer to this estimator as 'vMCR'. Computation of vMCR involves a double summation over all elements included in a total. There was no formal testing of vMCR, but in three case studies it agreed well with bootstrapped estimates of variance (McRoberts et al. 2011). Computation time for vMCR can be sharply reduced by a Monte Carlo approximation to the double summation (McRoberts et al. 2011). vMCR can serve as a direct estimator of variance in SAE problems (Särndal et al. 1992).
Magnussen et al. (2009) proposed a model-based mean squared error (MSE) estimator for a kNN estimate of a population mean. They showed that the average element-level MSE estimator from a leave-one-out cross-validation underestimates the MSE of an areal mean by a significant margin. The applicability of their MSE to SAE problems requires further investigation. Chen and Shao (2000) developed a design-based estimator of variance for nearest neighbour imputations (k = 1), but it is only applicable to imputation of missing values in a planned survey. Baffetta et al. (2009) proposed a design-based "empirical difference" variance estimator (vEDE) for a bias-corrected kNN estimator of a population total (mean). They viewed kNN imputations as proxy values for the target variable (Särndal et al. 1992, p. 221). Simulated simple random sampling from a population of N = 312 with relative sample sizes of 5%, 10%, 20%, and 30% illustrated that the estimated relative bias of vEDE declined with increasing sample size. Bias became unimportant (< 5%) for the three largest sample sizes with k > 3 and when the coefficient of determination between Y and X was less than 0.8. By construct, EDE and vEDE are limited to direct estimation of totals and variances in SAE problems.
An alternative design-based resampling variance estimator was proposed by Magnussen et al. (2009). A modified balanced repeated replication (BRR) scheme (Wolter 2007, p. 117) was employed to obtain estimates of element-level variances and covariances and a variance estimator of a total (vBRR). A comparison of vEDE and vBRR in simulated sampling from seven populations (576 ≤ N ≤ 900) indicated a good performance of both estimators in simple random sampling (SRS) and one-stage cluster sampling (CLU). In theory, vBRR is suited for design-based inference in SAE problems (Särndal et al. 1992). Nothdurft et al. (2009) were among the first to use a bootstrap resampling variance estimator in a forest inventory kNN application (k = 1). Their primary objective was to compare variability in stand-level kNN imputations and in calibrated kNN empirical best linear unbiased predictors (EBLUP).
In a recent study of actual forest inventory kNN applications, McRoberts et al. (2011) compared vMCR to a jackknife (vJK) and a bootstrap (vBOOT) estimator of variance. In the case of SRS, the differences between vMCR^0.5, vJK^0.5, and vBOOT^0.5 were minor (< 3%). For CLU, the agreement between vMCR and vBOOT (no vJK results for clustered data) was generally satisfactory, but less so than in the SRS designs. An unexplained systematic effect of the bootstrap resampling protocol applied to clustered data (single-stage cluster sampling versus single-stage cluster sampling followed by simple random sampling within clusters (Field and Welsh 2007)) was manifest. With large clusters (14–18 elements), the two-stage bootstrap sampling generated vBOOT estimates that agreed with vMCR estimates. With small clusters the agreement was lacking. Firm conclusions cannot be drawn from these non-replicated case studies. The application of vJK and vBOOT to SAE problems requires further study.
The current lack of extensive testing of kNN variance estimators for a population total, and the sheer absence of tested kNN variance estimators for SAE problems, make it difficult to make recommendations to practice. As a first step towards improving the situation, this study compares the performance (bias, accuracy, and coverage of calculated nominal 95% confidence intervals) of three variance estimators for a kNN estimate of a population total: vJK, vBOOT, and vEDE, in simulated sampling from three artificial yet realistically complex multivariate populations (N = 8000). Results are presented for four sample sizes in SRS and CLU. vBRR was not included because it is complex and still demands too much computer time to be of practical use. SAE problems are beyond the scope of this study. To qualify as a suitable kNN variance estimator in SAE problems, an estimator must first qualify as suited at the population level. A follow-up study of potentially suitable kNN variance estimators for SAE problems is anticipated.
In forestry, population sizes are generally large (N ≥ 10^4), and sample fractions are low (n / N ≤ 0.01), yet the number of reference elements is relatively large (n > 100). Hence, resampling estimators of variance for a kNN estimate of a population total can be computationally demanding despite a steady increase in the processing speed of desktop computers. Fast neighbour-search algorithms like the 'kd-tree' (e.g. Finley et al. 2006), implemented, for example, in the R-package 'yaImpute' (Crookston and Finley 2008), MATLAB®, and MATHEMATICA (Wolfram 1999), and parallel processing have sharply reduced the problem. Computations with graphics processing units (GPUs) will eventually deflate the time issue (Schenk et al. 2008). To improve the practicality of resampling estimators of variance for a population total, this study details steps that will curtail the time needed to compute a vJK or a vBOOT estimate.
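For orientation, the kd-tree search pattern can be sketched in a few lines. This is a generic illustration on simulated data (all sizes and names are assumptions); scipy's `cKDTree` stands in for the implementations cited above:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Toy setting: N population elements with p auxiliary variables and a
# reference set of n elements with observed Y.
N, n, p, k = 8000, 200, 3, 4
X_pop = rng.normal(size=(N, p))
sample = rng.choice(N, size=n, replace=False)
y_ref = 100.0 + X_pop[sample].sum(axis=1) + rng.normal(scale=2.0, size=n)

# Standardize X (mean 0, variance 1) and build the tree once; each of the
# N queries then costs roughly O(log n) instead of the O(n) brute force.
Z = (X_pop - X_pop.mean(axis=0)) / X_pop.std(axis=0)
tree = cKDTree(Z[sample])
dist, idx = tree.query(Z, k=k)     # idx: (N, k) positions in the reference set

y_hat = y_ref[idx].mean(axis=1)    # equal-weight imputation for all N elements
print(y_hat.shape)
```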

Population, sampling objectives, and notation
A finite area population U composed of N equal-area spatial elements is considered, with the objective of estimating the total of one or more target variables (Y) from a probability sample of size n. A set of auxiliary variables (X) is known for every element in the population. The auxiliary variables have been selected on grounds of their ability to predict Y. The probability sample (s) is obtained, without replacement, by either SRS or CLU. The population contains M clusters of size m so that N = m × M, with m = 1 in SRS. In SRS a sample unit is equal to a population element; in CLU it is a cluster of m elements. Notation is for a univariate Y, but extension to a multivariate case requires no new theory.

The kNN estimator
The kNN estimator of Y in the ith population element (Haara et al. 1997) can be written succinctly as

  ŷ_i^k = Σ_{j~i} w_ij y_j    (1)

where summation in Eq. 1 is over the k elements (j ~ i) in the sample (s) with auxiliary variable values closest to X_i, and w_ij is the weight given to the reference element y_{j∈s}. Euclidean distances in X-space were used for the selection of the k nearest neighbours. A standardized Euclidean distance metric is used throughout; i.e. the X-variables have been standardized to a mean of zero and a variance of one for the computation of distances in X-space. This is also the metric used by the 'euclidean' option in the R-package 'yaImpute' (Crookston and Finley 2008).
In practice the weights may be a function of distance that optimizes precision (McRoberts 2009). Here w_ij = k^(−1), since the choice of weights is inconsequential for an assessment of the kNN variance estimators.
A kNN estimator of the population total of Y (T_y) is denoted T_y^k and computed as the sum of ŷ_i^k over the N population elements. The expected values of ŷ_i^k and T_y^k over all possible samples are invariant to the sampling design (Baffetta et al. 2009).
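As a concrete illustration of Eq. 1 and the total T_y^k, the sketch below imputes Y for every element of a small simulated population with equal weights w_ij = 1/k and a standardized Euclidean metric. The function name, sizes, and toy data are assumptions for illustration, not the study's populations:

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_impute(X_all, X_ref, y_ref, k):
    """Equal-weight kNN imputation (Eq. 1) with a standardized Euclidean metric."""
    mu, sd = X_all.mean(axis=0), X_all.std(axis=0)
    Z_all, Z_ref = (X_all - mu) / sd, (X_ref - mu) / sd
    d2 = ((Z_all[:, None, :] - Z_ref[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]   # k nearest reference elements (j ~ i)
    return y_ref[nn].mean(axis=1)        # w_ij = 1/k

N, n, k = 2000, 100, 4
X = rng.normal(size=(N, 3))
y = 10.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=N)
s = rng.choice(N, size=n, replace=False)

y_hat = knn_impute(X, X[s], y[s], k)
T_knn = y_hat.sum()                      # kNN estimate of the total T_y
print(round(T_knn), round(y.sum()))
```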

The jackknife kNN estimator
The jackknife kNN estimator (JK) of T_y (Efron 1982) is

  T_y^jk = n^(−1) Σ_{l=1}^{n} T_y^jk(l)    (2)

where T_y^jk(l) is the kNN estimator (pseudo-value) of T_y after deleting the lth sample unit (l = 1, …, n). The jackknife estimator of variance (vJK) of T_y^jk is then (Wolter 2007, p. 153)

  vJK = fpc × (n − 1) n^(−1) Σ_{l=1}^{n} (T_y^jk(l) − T_y^jk)^2    (3)

where fpc is the finite population correction factor 1 − n × m / N. The variance estimator vJK is used as an estimator of the variance of T_y^k (Wolter 2007, p. 153). Appendix A provides a time-saving implementation of the jackknife estimators.
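The delete-one mechanics can be sketched as follows: a naive implementation on toy data that recomputes the full kNN total n times, without the Appendix A speed-ups (helper name and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_total(X_all, X_ref, y_ref, k=4):
    """kNN total of Y over all population elements (equal weights)."""
    d2 = ((X_all[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    return y_ref[nn].mean(axis=1).sum()

N, n, k = 1000, 60, 4
X = rng.normal(size=(N, 3))
y = 10.0 + X.sum(axis=1) + rng.normal(scale=0.5, size=N)
s = rng.choice(N, size=n, replace=False)

keep = np.arange(n)
T_l = np.array([knn_total(X, X[s][keep != l], y[s][keep != l], k)
                for l in range(n)])      # delete-one totals T_y^jk(l)
T_jk = T_l.mean()                        # jackknife estimate of the total
fpc = 1 - n / N                          # m = 1 under SRS
vJK = fpc * (n - 1) / n * ((T_l - T_jk) ** 2).sum()
print(round(T_jk), round(float(np.sqrt(vJK))))
```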

The bootstrap kNN estimator
The bootstrap kNN estimator of Y in the ith population element (ŷ_i) and bth bootstrap sample is

  ŷ_i^k*(b) = Σ_{j~i, j∈s*(b)} w_ij y_j    (4)

where s*(b) is the bth bootstrap sample generated by SRS with replacement from the original sample of n sample units (Field and Welsh 2007). Accordingly, a bootstrap replication estimator of a population total is T_y^k*(b), the sum of ŷ_i^k*(b) over the N population elements; the bootstrap estimator (BOOT) of T_y is the mean of the B bootstrap replication estimates, here denoted T_y^boot. It follows that the bootstrap estimator of variance (vBOOT) of T_y^boot is

  vBOOT = (B − 1)^(−1) Σ_{b=1}^{B} (T_y^k*(b) − T_y^boot)^2    (5)

As in the case of vJK, the vBOOT in Eq. 5 will be used as a variance estimator for T_y^k (Wolter 2007, p. 195).
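A naive implementation of Eqs. 4–5 on toy data looks as follows; each replicate rebuilds the neighbour search from scratch, which is exactly the cost that motivates the faster variant discussed in the text (helper name and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def knn_total(X_all, X_ref, y_ref, k=4):
    """kNN total of Y over all population elements (equal weights)."""
    d2 = ((X_all[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    return y_ref[nn].mean(axis=1).sum()

N, n, k, B = 1000, 60, 4, 200
X = rng.normal(size=(N, 3))
y = 10.0 + X.sum(axis=1) + rng.normal(scale=0.5, size=N)
s = rng.choice(N, size=n, replace=False)

T_star = np.empty(B)
for b in range(B):
    sb = rng.choice(s, size=n, replace=True)   # bootstrap sample s*(b)
    T_star[b] = knn_total(X, X[sb], y[sb], k)  # replicate total T_y^k*(b)

T_boot = T_star.mean()                         # BOOT estimate of the total
vBOOT = ((T_star - T_boot) ** 2).sum() / (B - 1)
print(round(T_boot), round(float(np.sqrt(vBOOT))))
```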
In implementations of the bootstrap procedure, a replicate (b) specific N × k array of nearest-neighbour identifiers (NN_N^k(b), see Appendix A for details) is needed for each bootstrap sample, making computing times for vBOOT B times longer than for T_y^k. A faster variant of BOOT (called FBOOT) is detailed in Appendix A. Variances estimated with this variant are denoted vFBOOT.

The empirical difference estimator

Baffetta et al. (2009) proposed a design-based bias-adjusted empirical difference estimator (EDE) of T_y for element sampling (m = 1). With an extension to clusters of size m ≥ 1, the EDE becomes

  T_y^ede = T_y^k + Σ_{j=1}^{n} m ē_j / π_j    (6)

where π_j is the sample inclusion probability of sample unit j (Särndal et al. 1992), and ē_j is the mean of the m differences between the actual and the kNN-imputed values of Y in the jth sample unit (j = 1, …, n). In SRS, m = 1 and ē_j is simply the difference (residual) for the jth sample unit. The associated Horvitz-Thompson type estimator of variance (vEDE) is

  vEDE = Σ_{j=1}^{n} Σ_{h=1}^{n} ((π_jh − π_j π_h) / π_jh) (m ē_j / π_j) (m ē_h / π_h)    (7)

where π_jh is the joint sample inclusion probability of sample units j and h (with π_jj = π_j).
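Under SRS without replacement (m = 1, π_j = n/N) the estimator and its variance reduce to simple closed forms; the sketch below uses the familiar SRS simplification vEDE = N²(1 − n/N)s_e²/n for the residual total. Toy data; the helper and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def knn_impute(X_all, X_ref, y_ref, k=4):
    """Equal-weight kNN imputation for every row of X_all."""
    d2 = ((X_all[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    return y_ref[nn].mean(axis=1)

N, n, k = 1000, 60, 4
X = rng.normal(size=(N, 3))
y = 10.0 + X.sum(axis=1) + rng.normal(scale=0.5, size=N)
s = rng.choice(N, size=n, replace=False)

y_hat = knn_impute(X, X[s], y[s], k)   # imputations for all N elements
T_knn = y_hat.sum()                    # plain kNN total
e = y[s] - y_hat[s]                    # residuals on the sample units
T_ede = T_knn + (N / n) * e.sum()      # bias-corrected EDE total
vEDE = N ** 2 * (1 - n / N) * e.var(ddof=1) / n
print(round(T_ede), round(float(np.sqrt(vEDE))))
```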

Estimator performance
The kNN, JK, BOOT, FBOOT, and EDE estimators of T_y are assessed for bias in simulated SRS (m = 1) and CLU (m = 4) from three artificial yet realistic populations (see 2.8). Bias is estimated as the difference between the mean of 400 replicated kNN estimates of a total and the known true total (see 2.7). For ease of comparison, the bias is expressed in percent of the true total.
Variance estimates obtained with vJK, vBOOT, vFBOOT, and vEDE were compared to the corresponding empirical estimate of variance (vEMP) of replicated estimates of a kNN total (see 2.7). Normal-quantile 95% confidence intervals for estimated population totals, calculated from vJK and vEDE estimates of variance, are assessed for their coverage (the relative frequency with which a calculated confidence interval includes T_y). Accelerated and bias-corrected normal quantiles were used to estimate coverage of bootstrap intervals (Efron and Tibshirani 1993, p. 188).
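The coverage bookkeeping can be sketched generically. Here a plain expansion estimator under SRS stands in for the kNN machinery; the point is the assessment procedure (replicate, form a normal-quantile interval, count hits), not the estimator, and the toy population is an assumption:

```python
import numpy as np

rng = np.random.default_rng(5)

# Over R replicated samples, form a normal-quantile 95% interval from each
# replicate's variance estimate and count how often it contains the known total.
N, n, R = 8000, 200, 400
y = rng.gamma(shape=2.0, scale=5.0, size=N)   # skewed toy population
T_true = y.sum()

hits = 0
for _ in range(R):
    s = rng.choice(N, size=n, replace=False)
    T_hat = N * y[s].mean()
    v_hat = N ** 2 * (1 - n / N) * y[s].var(ddof=1) / n
    half = 1.959964 * np.sqrt(v_hat)          # normal 97.5% quantile
    hits += (T_hat - half <= T_true <= T_hat + half)

coverage = hits / R                            # should sit near 0.95
print(coverage)
```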
The distributions of replicated estimates of a total (see 2.7) were tested for normality at the 5% level with an Anderson-Darling (AD) test (Anderson and Darling 1952), and differences between estimated and actual totals (bias) were tested under the null hypothesis of a zero-mean normal distribution (AD test). Distributions of estimated JK, BOOT, FBOOT, and EDE totals were tested for equality to the distribution of T_y^k (AD test).
The replicate mean of vJK, vBOOT, vFBOOT, and vEDE estimates of variance was compared to vEMP and tested for equality with an AD test using a bootstrap distribution of 400 paired differences.

Case studies
Three artificial multivariate populations (POP1, POP2, and POP3) of size N = 8000 elements were generated from known marginal distributions of X and Y and a target correlation coefficient between the variables in X and Y. There are three Y-variables (Y1, Y2, and Y3) in each population, three X-variables in POP1 (X1, X2, and X3), and four in POP2 and POP3 (X1, X2, X3, and X4). The marginal distributions of variables in the three populations were complex in order to reflect scenarios with skewed, multimodal, and non-Gaussian distributions in forest inventory applications, as seen in actual inventory data (LeMay et al. 2008; Magnussen et al. 2009). Details of the populations are in Appendix B. The data used in this study can be accessed at http://www.silvafennica.fi/article/925.
It is recognized that the size of the simulated populations is orders of magnitude smaller than in practical kNN applications (Katila 2006; Bernier et al. 2010). Yet the bias of a kNN estimator appears to be largely a function of sample size, not population size (Katila 2006; McRoberts et al. 2011). A doubling of N would have added just 1% to the marginal variance of the target variables. A further doubling would have added less than 0.5% to the variances. Finally, the accuracy of the tested variance estimators is governed by intrinsic properties and sample sizes more than by population size and sample fractions (Särndal et al. 1992). On an absolute scale, the studied sample sizes, like those in practice, are small, which should vouch for the practical relevance of the study.

Choice of k
The k-value that resulted in the lowest RMSE varied from a low of 4 in POP1 to a high of 8 (POP2 with SRS and POP3 with CLU).The intermediate value of 6 was best in POP2 with CLU and POP3 with SRS.However, the effect of k on RMSE was modest once k was equal to or greater than 4. A common choice of k = 6 would not have changed the results by much.
Henceforth results are only reported for the k-value for which the average (across Y-variables) relative RMSE (in percent of the actual total) of a kNN total T_y^k was lowest.

Bias
Under an SRS design the kNN estimator of a total had a bias ≤ 0.3% in POP1 and POP3 (Table 1). In POP2, bias approached 1% for Y1 and the two smallest sample sizes (50, 100). Apart from these two cases, there was no apparent effect of sample size. Results for JK, BOOT, and FBOOT were almost identical. Bias for non-reported k-values was not materially different. Bias of T_y^ede was in 7 out of 10 cases less than the bias of T_y^k. On average, EDE reduced the bias in T_y^k by approximately 33%; in just two cases (out of 36) did EDE results suggest a slightly larger bias than in T_y^k. Results for CLU followed trends seen in SRS except for a larger variation and a tendency towards a larger bias for the two smallest sample sizes (Table 2). A bias between 0.5% and 1.0% was encountered in 9 out of 36 cases for T_y^k and T_y^ede. In POP1, EDE was slightly less efficient in reducing bias than in POP2 and POP3. Fortunately, the squared bias was much smaller than the associated variance. Hence, inferences based on estimated variances will be approximately valid (Cochran 1977, p. 12).
The distributions of the 400 replicated estimates of a kNN total were approximately normal in SRS designs (P-values in AD tests were above 0.10 in 137 out of 144 cases). Under CLU there was no rejection of the hypothesis of a Gaussian sampling distribution of estimated totals.

Standard errors
The jackknifed estimates of standard error, vJK^0.5, were close to vEMP^0.5 in SRS (Table 3); the difference was just 1% in half the scenarios. The hypotheses of equal variances (vJK = vEMP) were not rejected at the 5% level of significance. Yet distributions of standardized estimates of vJK were significantly different from a normal distribution in all but four cases.
SRS standard errors obtained from vEDE were slightly conservative, with an overestimation that increased with sample size from 3% with n = 50 to 7% with n = 300 (Table 3). However, only one estimate (POP1, Y1, n = 50) was found to be statistically significantly different from vEMP^0.5.
Trends in standard errors estimated under a CLU design (Table 4) were, by and large, similar to those reported for SRS.
For SRS and CLU designs with equal element sample sizes (n_SRS = 200, n_CLU = 50), the standard errors of a CLU design were larger than for an SRS design; the design effect of CLU (i.e. the ratio of CLU to SRS sampling variances for designs with an equal number of sampled elements (Särndal et al. 1992, p. 53)) was approximately 1.1 for all estimators of variance.

Coverage of confidence intervals
Coverage rates of computed 95% confidence intervals are in Tables 5 and 6. Overall, jackknife intervals tend to be slightly too wide, while those of BOOT, FBOOT, and EDE are slightly too narrow. With 400 Monte Carlo replications, a departure of 2.1% from the nominal value is statistically significant at the 5% level of significance (t-test). Table 5 reports only 4 significant deviations out of 144 entries. Under the null hypothesis the expected number is 7 (144 × 0.05). Accordingly, the simultaneous null hypothesis was not rejected (Miller 1981, p. 9).
For the CLU designs (Table 6), there were 20 cases out of 144 where the coverage was either significantly above (3 cases) or below (17 cases) the nominal level. Significant departures were concentrated in bootstrap intervals. Without the accelerated bias-correction of bootstrap confidence intervals, the rate of significant departures would have been higher. Jackknife and EDE intervals achieved an average coverage of 0.95 with an almost perfect balance between over- and under-coverage. The BOOT intervals were, on average, too short, with a mean coverage of 0.94. FBOOT intervals were the worst, with a mean coverage of 0.93.

Discussion
In forestry kNN applications, the preferred values of k are typically greater than the values of 4 to 8 reported in this study (Maltamo and Kangas 1998; Tomppo and Halme 2004; Maselli et al. 2005; Falkowski 2010; McRoberts 2011). This is to be expected, since the number of auxiliary X-variables in many kNN applications is considerably greater than three, and because the optimal k is tied to the dimension of the feature space of X (Singh et al. 1993). Under ideal conditions with symmetric distributions of independent p-dimensional X variables significantly correlated with Y, the optimal k should not be far from 2^p (Stage and Crookston 2007). Under such favorable conditions, and with a sufficiently large reference set, a majority of kNN imputations have a balance of kNN X-values above and below the X-value of a target element, a necessary requirement for accurate imputations (Chen and Shao 2000). A lower-than-optimal k-value may be preferred for mapping purposes, where it can be important to preserve, as far as possible, the variance in observed sample values of Y (Meng et al. 2007).
The comparatively low dimension (three) of the feature space in this study does not limit the practical relevance of the study. A high-dimensional X-space suffers from the mentioned 'curse of dimensionality', which sets the stage for inefficient selections of nearest neighbours, and hence loss of precision. Efforts towards reducing the dimension of X (Kim and Tomppo 2006; Magnussen et al. 2009) should precede variable selection, optimization of the distance metric, and searches for an 'optimal' weighting of X-variables (Katila and Tomppo 2001; Sironen et al. 2001; Tomppo and Halme 2004; Breidenbach et al. 2010; Latifi et al. 2010).
The kNN estimator is biased (Stroup and Mulitze 1991; Snapp and Venkatesh 1998; Chen and Shao 2000), but this study confirmed that the risk of a practically important bias (in a kNN population total) in forestry applications with a reference set greater than 200 appears to be low (Fehrmann et al. 2008; Magnussen et al. 2010b; McRoberts 2011). The EDE estimator can, as expected (Baffetta et al. 2009), achieve an effective reduction in the bias of a kNN population total. The effectiveness of EDE in reducing bias appears to vary among populations and variables. Bias could be an issue in CLU designs with sample sizes < 30 clusters. Yet Magnussen et al. (2010a) did not report an increase in bias of the classic kNN estimator when the reference data were obtained under a CLU design. Further studies are needed to clarify the efficacy of EDE in bias reduction under a CLU design. Sampling distributions of estimates of a population total (mean) obtained with kNN, JK, BOOT, and FBOOT appeared to be approximately Gaussian, which is one condition for obtaining correct quantile-based confidence intervals (Casella and Berger 2002, p. 240). With the low number of Monte Carlo replications, it was not possible to detect differences among the resampling distributions of estimated population parameters.
Widely available low-cost auxiliary information correlated with the variables of interest (Y), ease of implementation (Crookston and Finley 2008), and provision of element-level predictions of Y are important factors behind the popularity of the kNN technique in forestry (Franco-Lopez et al. 2001; Holmström and Fransson 2003). In applications of the kNN technique, it is usually required that an estimate of a population (stratum) total (mean) be accompanied by an estimate of variance, preferably an MSE, given the biased nature of a kNN estimator. A generally modest level of bias in kNN estimates supported by a reasonably large number of reference units means that estimates of variance can be used in lieu of an MSE (Cochran 1977, p. 15).
The vEDE proposed by Baffetta et al. (2009) is easy and fast to compute for any probability design. Alternative variance estimators for the kNN are either derived from models of element-level variance and covariance (McRoberts et al. 2007; Magnussen et al. 2009) or computer-intensive resampling techniques (Magnussen 2009; McRoberts et al. 2011). In a small-scale testing (N = 312) of vEDE, estimates of RMSEs were only slightly biased (< 4%) when sample sizes were above 20 and the coefficient of determination in a multiple linear regression of the six auxiliary X variables and Y was between 0.2 and 0.6 (Baffetta et al. 2009). A second testing of vEDE in SRS with two actual and three somewhat larger artificial populations (567 ≤ N ≤ 900) by Magnussen et al. (2010a) confirmed that vEDE was close to the Monte Carlo estimates of variance. A third confirmation was provided by this study and will hopefully encourage application. Although vEDE also performed well in CLU, further testing may be needed with larger clusters (m > 4) and possibly a stronger intra-cluster correlation. Since vEDE is computed as a Horvitz-Thompson estimator of variance (Thompson 1992, p. 49), it can fail with small sample sizes (e.g. < 30), as in SAE applications (Fuller 2009, p. 311).
Although vJK, on average, was slightly better than vEDE at matching the empirical variances, the non-trivial burden of computing n leave-one-out estimates of a kNN total (mean) still weighs in favour of vEDE, despite the demonstrated opportunity to accelerate vJK computations.As computation speed improves, the reason to choose vEDE over vJK may dissipate (Schenk et al. 2008).
The performance of vJK was similar in SRS and CLU, although slightly more variable in the latter. We saw no sign that the level of intra-cluster correlation influenced the results. It remains to test vJK in designs with clusters of more than four elements. The success of the jackknife estimators supports the assumption underlying a jackknife estimator that the bias in a kNN estimator of a total can be written as a quadratic function of the sampled Y-values (Wolter 2007, p. 153).
Attempts at accelerating computation of vJK even further, by estimating the effect of leaving out a sample unit on a kNN total from changes in the marginal frequencies with which elements 1, …, n × m enter the total, failed due to an unacceptable level of bias in estimated variances (> 30%). The large bias suggests that the joint inclusion probability of membership in a nearest-neighbour clique of size k is distinctly different from the product of marginal inclusion probabilities (Baffetta et al. 2009).
The kNN bootstrap variance estimator was also employed by McRoberts et al. (2011), who compared it to a parametric estimator (vMCR) of variance (McRoberts et al. 2007). In SRS designs there was good agreement between the two estimators, lending support to the assumption in vMCR that the variance of a single kNN imputation is (approximately) the variance of the Y-values of the k nearest reference elements. For clustered data, the performance of the bootstrap appeared to depend on whether elements within a cluster were resampled or not (Field and Welsh 2007). For large clusters (m ≥ 8), a resampling of clusters followed by a resampling of elements within the cluster produced a stronger agreement with vMCR than a single-stage resampling of clusters. It was not possible to explain the effects of the bootstrap procedure. Without the positive results with vEDE and vJK, vMCR would also have been studied.
In this study the performance of vBOOT and vFBOOT in SRS and CLU designs was comparable but slightly inferior to that of vEDE and vJK.Accelerating the computation of the bootstrap variance -by replacing nearest neighbours missed in a bootstrap sample with their nearest neighbour in the bootstrap sample -sharply reduced the time to compute a variance.For large populations and sample sizes vFBOOT will be faster than vJK and our results suggest that for sufficiently large n the two estimators will produce similar estimates.
Confidence intervals provide an intuitive summary of the uncertainty associated with an estimate obtained from a probability sample (Beal 1989). The combination of a low level of bias in a kNN estimate of a population total, an approximately Gaussian sampling distribution, and a consistent estimate of variance sets the stage for computing a confidence interval with an expected coverage close to the nominal level (e.g. 95%). Results from SRS confirmed this, but results from the CLU simulations reiterated the importance of choosing a consistent variance estimator. The analyst apparently has a greater chance of a correct coverage with confidence intervals computed from vJK or vEDE than with intervals computed from vBOOT or vFBOOT. Standard percentile bootstrap confidence intervals would, in a majority of cases, have shown a significant under-coverage. Thus bias-corrected accelerated percentile intervals (Efron and Tibshirani 1993, p. 188) are recommended for bootstrap kNN estimators. For sample fractions greater than 0.1, JK confidence intervals can be improved by a finite population correction (Wolter 2007, p. 167).
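A minimal sketch of the bias-corrected and accelerated (BCa) construction, applied here to a sample mean as a stand-in for a bootstrap kNN total (the statistic, data, and helper name are illustrative assumptions, not the study's implementation):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)
nd = NormalDist()

def bca_interval(x, B=2000, alpha=0.05):
    """BCa percentile interval for the mean of sample x."""
    theta = x.mean()
    boot = np.array([rng.choice(x, size=x.size, replace=True).mean()
                     for _ in range(B)])
    # bias correction z0: fraction of replicates below the point estimate
    z0 = nd.inv_cdf(float((boot < theta).mean()))
    # acceleration a: jackknife skewness of the statistic
    jk = np.array([np.delete(x, i).mean() for i in range(x.size)])
    d = jk.mean() - jk
    a = (d ** 3).sum() / (6 * ((d ** 2).sum()) ** 1.5)
    z = nd.inv_cdf(1 - alpha / 2)
    lo = nd.cdf(z0 + (z0 - z) / (1 - a * (z0 - z)))   # adjusted percentiles
    hi = nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    return np.quantile(boot, [lo, hi])

x = rng.gamma(2.0, 5.0, size=100)    # skewed toy sample
ci = bca_interval(x)
print(ci)
```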
This study emulated a realistic testing of kNN variance estimators in simulated SRS and CLU sampling with a larger population size than seen in other published studies. To make the simulations realistic and relevant to forestry, considerable effort was directed towards creating realistic populations with a cluster structure. Advancements in creating multivariate distributions with copulas (Srinivas et al. 2006; Fischer 2010) greatly facilitated the task. Further realism could come from adding outliers, which are known to occur in inventory samples (McRoberts 2009). However, the addition of outliers, although simple to do, would balloon the number of scenarios and push computing times to impractical levels. A study on the robustness of kNN estimators of totals and variance (Wang and Raftery 2002) against outliers is needed.
The target pair-wise correlation coefficients among the variables in the three populations are in Table B1. Generation of the 8000 multivariate correlated random variables was done using the copula technique with a multivariate Gaussian copula defined by the target correlation structures in Table B1 (Nelsen 1999; Srinivas et al. 2006; Fischer 2010).
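The copula construction can be sketched as follows: draw multivariate normal deviates with the target correlation matrix, map them to uniforms with the normal CDF, then apply the inverse CDF of each desired marginal. The correlation matrix and marginals below are illustrative assumptions, not the Table B1 targets:

```python
import numpy as np
from scipy.stats import norm, gamma, lognorm

rng = np.random.default_rng(7)

N = 8000
R = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.5],
              [0.4, 0.5, 1.0]])              # target correlation matrix
z = rng.multivariate_normal(np.zeros(3), R, size=N)
u = norm.cdf(z)                              # correlated uniforms (the copula)

x1 = gamma.ppf(u[:, 0], a=2.0, scale=5.0)    # skewed marginal
x2 = lognorm.ppf(u[:, 1], s=0.5)             # log-normal marginal
x3 = norm.ppf(u[:, 2], loc=100, scale=10)    # Gaussian marginal

# Realized Pearson correlations sit a little below the copula targets
# because the marginal transforms are non-linear.
C = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(C.round(2))
```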
A cluster structure with clusters of size m was incorporated in the three populations by: i) adding a uniformly distributed (0,1) random variable (u) to the three populations; ii) specifying a target correlation ρ between u and the X- and Y-variables in a population; iii) sorting the population elements on their u-values; iv) adding an element identifier variable ω (ω = 1, …, N) to the sorted population values; and v) adding a cluster identifier γ (γ = 1, …, M) defined as [ω × m^(−1)], where [x] is the smallest integer greater than or equal to x.
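Steps i)–v) can be sketched as follows (a toy Y-variable; the u–Y correlation of 0.3 is an assumption for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)

N, m, rho = 8000, 4, 0.3
y = rng.normal(size=N)                        # toy target variable
z = rho * y + np.sqrt(1 - rho ** 2) * rng.normal(size=N)
u = norm.cdf(z)                               # i)-ii) uniform(0,1), correlated with Y

order = np.argsort(u)                         # iii) sort elements on u
omega = np.arange(1, N + 1)                   # iv) element id in sorted order
gamma_id = np.ceil(omega / m).astype(int)     # v) cluster id = ceil(omega / m)

# Elements sharing a cluster now have similar u, hence correlated Y; the
# variance of cluster means exceeds var(Y)/m when the intra-cluster
# correlation is positive.
y_sorted = y[order]
cluster_means = y_sorted.reshape(N // m, m).mean(axis=1)
print(gamma_id.min(), gamma_id.max(), round(float(cluster_means.var()), 3))
```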


Table 1 .
Relative bias (RB%) of kNN and EDE estimators of totals in SRS (m = 1). RB% = (estimate − actual) / actual × 100. Results are for the k-value (k_opt) that minimized the RMSE of an estimated total.

Table 2 .
Relative bias (RB%) of kNN and EDE estimators of totals in CLU (m = 4). RB% = (estimate − actual) / actual × 100. Results are for the k-value (k_opt) that minimized the RMSE of an estimated total.

Table 3 .
Estimates of relative standard error (se%) in SRS (se% = standard error of total ÷ total × 100). An estimate significantly different from its empirical counterpart at the 5% level (AD test) is indicated in gray.

Table 4 .
Estimates of relative standard error (se%) in CLU (se% = standard error of total ÷ total × 100). An estimate significantly different from its empirical counterpart at the 5% level (AD test) is indicated in gray.

Table 5 .
Achieved coverage rates (%) of nominal 95% confidence intervals under SRS. Statistically significant (5% level) departures from the nominal level are in gray.

Table 6 .
Achieved coverage rates (%) of nominal 95% confidence intervals under CLU. Statistically significant (5% level) departures from the nominal level are in gray.

Table B1 .
Pair-wise variable target correlations in the three populations (POP1, POP2, and POP3).Realized correlations between two different variables in the randomly generated populations of 8000 elements may deviate by up to 0.02 from the target.