Cut-off Importance Sampling of Bole Volume

Cut-off importance sampling (CIS) is introduced as a means of sampling individual trees for the purpose of estimating bole volume. The novel feature of this variant of importance sampling is the establishment on the bole of a cut-off height, $H_c$, above which sampling is precluded. An estimator of bole volume between predetermined heights $H_L$ and $H_U > H_c$ is proposed, and its design-based bias and mean square error are derived. In an application of CIS as the second stage of a two-stage sample to estimate aggregate bole volume, the gain in precision realized from CIS more than offset its bias when compared to the precision of importance sampling when $H_c = H_U$.


Introduction
Importance sampling (IS) has received considerable attention in both the forestry and statistics literature as a method of estimating bole volume (e.g., Gregoire et al. 1986, 1993, 1995; Wiant et al. 1989; Van Deusen 1990; Valentine et al. 1992; Schreuder et al. 1993; Robinson and Wood 1994). Application of the method ordinarily requires the measurement of diameters or cross-sectional areas of the bole at heights selected at random. The ease with which a bole measurement can be made on a standing tree depends upon the selected height and whether it is within the crown of the tree. High measurement points may be obscured from view or difficult to locate exactly (Wood and Wiant 1992).
Särndal et al. (1992) discuss the possibility of cut-off sampling, in which a part of the target population is deliberately excluded from the sampling frame. This is presented as a compromise between probability sampling and nonprobabilistic selection, and it leads to biased estimators. It is considered reasonable when it would cost too much to construct and maintain a complete frame, and the bias is not expected to be very great.
We suggest "cut-off importance sampling" (CIS) as a means to avoid measurements in the uppermost section of a bole. The technique takes its name from the fact that the sampling is restricted to the region of the bole beneath a cut-off height or "cut-off." The cut-off is determined for each bole prior to sampling. The restriction of the IS to beneath the cut-off tends to reduce the sampling variance. Of course, there is a downside to CIS: the estimate of the volume above the cut-off is biased. However, our studies indicate that, for small sample sizes, cut-off importance sampling yields an overall reduction in mean-square error.

Method
The estimation of bole volume of a standing tree by IS ordinarily requires a measurement of the height ($H$) of the tree at the outset. The sampling also requires auxiliary information in the form of an integrable "proxy taper function" (ptf).
The ptf defines the cross-sectional area of a "proxy bole" and predicts the cross-sectional area of the bole of interest at any height $0 < h < H$. The ptf is used to construct a probability density function for $h$ from which the sample heights are selected at random (see, e.g., Gregoire et al. 1986). The ptf need not be specifically fitted to the species being sampled; a simple generic taper function will suffice. However, a very accurate ptf, in the sense that the taper of the proxy bole is nearly proportional to the taper of the bole of interest, will afford very efficient IS.

Preliminaries
Let $A(h)$ denote the cross-sectional area of the bole of interest at height $h$, and let $A_p(h)$ denote the cross-sectional area of the proxy bole defined by the ptf at height $h$. The volume of the bole between the limits of interest, $H_L$ and $H_U$, is denoted by $V(H_L, H_U)$; therefore,

$$V(H_L, H_U) = \int_{H_L}^{H_U} A(h)\,dh.$$

Similarly, the volume of the proxy bole defined by the ptf between the limits $H_L$ and $H_U$ is denoted by $V_p(H_L, H_U)$; therefore,

$$V_p(H_L, H_U) = \int_{H_L}^{H_U} A_p(h)\,dh.$$

The height of the cut-off is denoted by $H_c$, and the volumes of the bole and the proxy bole between $H_L$ and $H_c$ are denoted, respectively, by $V(H_L, H_c)$ and $V_p(H_L, H_c)$.
Under IS, $V(H_L, H_U)$ is estimated unbiasedly by

$$\hat{V}(H_L, H_U) = \frac{1}{m}\sum_{i=1}^{m}\frac{A(\theta_i)}{f(\theta_i)} \tag{1a}$$

$$\phantom{\hat{V}(H_L, H_U)} = \frac{V_p(H_L, H_U)}{m}\sum_{i=1}^{m}\frac{A(\theta_i)}{A_p(\theta_i)}, \tag{1b}$$

where $h = \theta_i$ ($i = 1, \ldots, m$) is selected at random from the probability density function

$$f(h) = \begin{cases} A_p(h)/V_p(H_L, H_U), & H_L \le h \le H_U, \\ 0, & \text{otherwise.} \end{cases}$$

The variance of $\hat{V}(H_L, H_U)$ is

$$\mathrm{Var}[\hat{V}(H_L, H_U)] = \frac{1}{m}\left[\int_{H_L}^{H_U}\frac{A^2(h)}{f(h)}\,dh - V^2(H_L, H_U)\right]. \tag{2}$$
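The selection of heights from a density proportional to the ptf, and the resulting IS estimate, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the tree height `H`, the synthetic "true" taper `bole_area`, and the linear `proxy_area` (chosen only because its cumulative distribution can be inverted in closed form).

```python
import math
import random

H = 20.0  # total tree height in metres (illustrative value)

def bole_area(h):
    """Synthetic 'true' cross-sectional area (m^2); stands in for a field measurement."""
    return 0.05 * ((H - h) / H) ** 1.5

def proxy_area(h):
    """Hypothetical ptf: cross-sectional area declining linearly to zero at the tip."""
    return H - h

def proxy_volume(lo, hi):
    """Closed-form integral of proxy_area between heights lo and hi."""
    return 0.5 * ((H - lo) ** 2 - (H - hi) ** 2)

def draw_height(lo, hi, rng):
    """Inverse-CDF draw from f(h) = proxy_area(h) / proxy_volume(lo, hi)."""
    a, b = (H - lo) ** 2, (H - hi) ** 2
    return H - math.sqrt(a - rng.random() * (a - b))

def is_estimate(area, lo, hi, m, rng):
    """IS estimate of volume between lo and hi: the mean of area/f over m heights."""
    vp = proxy_volume(lo, hi)
    heights = [draw_height(lo, hi, rng) for _ in range(m)]
    return vp * sum(area(t) / proxy_area(t) for t in heights) / m

rng = random.Random(1)
print(is_estimate(bole_area, 0.3, H, 2, rng))  # one estimate from m = 2 heights
```

If `area` were exactly proportional to `proxy_area`, every sampled ratio would be identical and the estimate would have zero variance, which is the sense in which a well-fitting ptf makes IS efficient.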

Cut-off Sampling
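Under the cut-off scheme, sampling is restricted to heights between $H_L$ and $H_c$, where $H_c < H_U$. We define the cut-off probability density function, $g(h)$, in the usual manner, i.e.,

$$g(h) = \begin{cases} A_p(h)/V_p(H_L, H_c), & H_L \le h \le H_c, \\ 0, & \text{otherwise.} \end{cases}$$

If heights $h = \theta_i$ ($i = 1, \ldots, m$) are selected at random from $g(h)$, then the usual unbiased estimator of the volume from $H_L$ to the cut-off height, $H_c$, is

$$\hat{V}(H_L, H_c) = \frac{1}{m}\sum_{i=1}^{m}\frac{A(\theta_i)}{g(\theta_i)}.$$

Presumably the volume of the bole from $H_L$ to $H_U$ remains the parameter of interest. As an estimator of $V(H_L, H_U)$ under CIS we propose a ratio adjustment of the unbiased estimator $\hat{V}(H_L, H_c)$:

$$\tilde{V}(H_L, H_U) = \hat{V}(H_L, H_c)\,\frac{V_p(H_L, H_U)}{V_p(H_L, H_c)} \tag{3a}$$

$$\phantom{\tilde{V}(H_L, H_U)} = \frac{1}{m}\sum_{i=1}^{m}\frac{A(\theta_i)}{f^*(\theta_i)}, \tag{3b}$$

where $f^*(h) = A_p(h)/V_p(H_L, H_U)$ for $H_L \le h \le H_c$ and zero otherwise. The mean-square error of $\tilde{V}(H_L, H_U)$ as an estimator of $V(H_L, H_U)$ is

$$\mathrm{MSE}[\tilde{V}(H_L, H_U)] = \left[\frac{V_p(H_L, H_U)}{V_p(H_L, H_c)}\right]^2 \frac{1}{m}\left[\int_{H_L}^{H_c}\frac{A^2(h)}{g(h)}\,dh - V^2(H_L, H_c)\right] + \left[V(H_L, H_c)\,\frac{V_p(H_L, H_U)}{V_p(H_L, H_c)} - V(H_L, H_U)\right]^2. \tag{4}$$

The analogy of (3b) with (1a) is evident. The essential difference between (1a) and (3b) is that $f^*(\cdot)$ in (3b) is not a probability density, as it does not integrate to unity. This estimator of $V(H_L, H_U)$ is biased, in general:

$$E[\tilde{V}(H_L, H_U)] - V(H_L, H_U) = V(H_L, H_U)\left[\frac{V(H_L, H_c)/V(H_L, H_U)}{V_p(H_L, H_c)/V_p(H_L, H_U)} - 1\right]. \tag{5}$$

It is evident from the expression in square brackets that $\tilde{V}(H_L, H_U)$ unbiasedly estimates $V(H_L, H_U)$ only if the proportions of the total volume above and below the cut-off are identical for both the bole of interest and the proxy bole. The CIS estimator of $V(H_L, H_U)$ can be partitioned into two components: $\hat{V}(H_L, H_c)$, which is unbiased for $V(H_L, H_c)$, and $\tilde{V}(H_c, H_U) = \tilde{V}(H_L, H_U) - \hat{V}(H_L, H_c)$, which subsumes all of the bias of $\tilde{V}(H_L, H_U)$. Ordinarily, however, the magnitude of $V(H_L, H_c)$ will be far greater than that of $V(H_c, H_U)$, and hence there is a priori reason to believe that the bias of $\tilde{V}(H_L, H_U)$ will be small relative to $V(H_L, H_U)$.

Test
The statistical performance of $\tilde{V}(H_L, H_U)$ under CIS was compared to that of $\hat{V}(H_L, H_U)$ under IS with the aid of detailed stem measurements on five species of trees: loblolly pine (Pinus taeda L.), southern red oak (Quercus falcata Michx.), slash pine (Pinus elliottii Engelm.), sweetgum (Liquidambar styraciflua L.), and white oak (Quercus alba L.). Information summarizing tree sizes is displayed in Table 1.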
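A per-bole comparison of this kind can be sketched numerically. The example below does not use the study data: the synthetic taper `bole_area`, the linear `proxy_area`, the heights, and the midpoint-rule helper `integrate` are all illustrative assumptions. It evaluates the design variance of the IS estimator and the bias, variance, and mean-square error of the ratio-adjusted cut-off estimator by direct integration, for cut-offs at 60%, 80%, and 100% of tree height.

```python
import math

H = 20.0              # total tree height in metres (illustrative value)
H_L, H_U = 0.3, 20.0  # lower and upper limits of the volume of interest

def bole_area(h):
    """Synthetic 'true' cross-sectional area (m^2)."""
    return 0.05 * ((H - h) / H) ** 1.5

def proxy_area(h):
    """Hypothetical ptf: area declining linearly to zero at the tip."""
    return H - h

def integrate(fn, lo, hi, steps=4000):
    """Midpoint-rule quadrature; never evaluates fn at the endpoints."""
    width = (hi - lo) / steps
    return width * sum(fn(lo + (i + 0.5) * width) for i in range(steps))

def is_variance(area, proxy, lo, hi, m):
    """Design variance of the IS estimator of volume on [lo, hi] with m heights."""
    vp = integrate(proxy, lo, hi)
    v = integrate(area, lo, hi)
    second_moment = vp * integrate(lambda h: area(h) ** 2 / proxy(h), lo, hi)
    return (second_moment - v ** 2) / m

def cis_properties(area, proxy, lo, cut, hi, m):
    """Bias, variance and MSE of the ratio-adjusted cut-off estimator."""
    ratio = integrate(proxy, lo, hi) / integrate(proxy, lo, cut)
    bias = ratio * integrate(area, lo, cut) - integrate(area, lo, hi)
    variance = ratio ** 2 * is_variance(area, proxy, lo, cut, m)
    return bias, variance, variance + bias ** 2

se_is = math.sqrt(is_variance(bole_area, proxy_area, H_L, H_U, 2))
for frac in (0.6, 0.8, 1.0):
    bias, var, mse = cis_properties(bole_area, proxy_area, H_L, frac * H, H_U, 2)
    print(f"cut-off {frac:.0%}: bias={bias:.5f} RMSE={math.sqrt(mse):.5f} IS SE={se_is:.5f}")
```

With the cut-off at 100% of height the ratio adjustment equals one, so the bias vanishes and the variance coincides with that of unrestricted IS.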
Each tree in the database had been felled and cut into roughly 1 m sections. The cross-sectional area at the base of each section was calculated from a measurement of outside-bark diameter. We used a cubic spline to interpolate between the successive cross-sectional areas of each bole. Thus the cubic spline defined $A(h)$ for $H_L < h < H_U$ and its integral gave $V(H_L, H_U)$. Any deviation between this determination of bole volume and that obtainable from gravimetric techniques was assumed to be inconsequential. An advantage to determining bole volume as the integral of bole cross-sectional area is the ability to compute volume to any stipulated upper-bole diameter. Proxy boles were defined by the following ptf: [...] where $H_B$ denotes breast height. Following Gre- [...]
For the sake of illustrating the performance of the suggested sampling strategy, a two-stage design was implemented to estimate the aggregate bole volume in each population. The first stage consisted of the selection of trees from the population by list sampling with replacement using selection probabilities proportional to the integrated ptf, namely $V_p(H_L, H_U)$.
The second stage consisted of the independent selection of $m$ sampling heights on each tree chosen in stage 1 by both IS and CIS. In all cases the lower limit of integration, $H_L$, matched the stump height of the tree. The upper limit, $H_U$, was alternately set at total tree height, the height to an upper-bole diameter of 5 cm (2 inches), or the height to an upper-bole diameter of 10 cm (4 inches), in order to compare the effect that varying $H_U$ has on the performance of the estimator of aggregate volume.
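The target parameter estimated by the two-stage sampling was $V = \sum_{k=1}^{N} V_k$, where $V_k$ denotes the volume, $V(H_L, H_U)$, of the $k$th of $N$ boles. Let $\hat{V}_{2S}$ or $\tilde{V}_{2S}$, respectively, denote the estimator of $V$ where IS or CIS is the second-stage method. In the first case, the estimator of $V$ is

$$\hat{V}_{2S} = \frac{1}{n}\sum_{k=1}^{n}\frac{\hat{V}_k}{P_k},$$

where the sum extends over the $n$ first-stage selections, $\hat{V}_k = \hat{V}(H_L, H_U)$ for the $k$th bole (eqn (1)), and $P_k$ is the first-stage selection probability of the $k$th bole from the population of $N$ boles. Its variance is

$$\mathrm{Var}(\hat{V}_{2S}) = \frac{1}{n}\left[\sum_{k=1}^{N} P_k\left(\frac{V_k}{P_k} - V\right)^2 + \sum_{k=1}^{N}\frac{\mathrm{Var}(\hat{V}_k)}{P_k}\right], \tag{6}$$

where $\mathrm{Var}(\hat{V}_k)$ is the variance of $\hat{V}(H_L, H_U)$ for the $k$th bole (eqn (2)). When CIS is used in the second stage, the estimator of $V$ is

$$\tilde{V}_{2S} = \frac{1}{n}\sum_{k=1}^{n}\frac{\tilde{V}_k}{P_k}.$$

The bias of $\tilde{V}_{2S}$ is

$$B(\tilde{V}_{2S}) = \sum_{k=1}^{N} B_k, \qquad B_k = E[\tilde{V}_k] - V_k,$$

and its mean-square error is

$$\mathrm{MSE}(\tilde{V}_{2S}) = \frac{1}{n}\left[\sum_{k=1}^{N}\frac{\mathrm{MSE}(\tilde{V}_k)}{P_k} + 2\sum_{k=1}^{N} B_k\left(\frac{V_k}{P_k} - V\right) + \sum_{k=1}^{N} P_k\left(\frac{V_k}{P_k} - V\right)^2\right] + \frac{n-1}{n}\,B^2(\tilde{V}_{2S}), \tag{9}$$

where $\tilde{V}_k$ and $\mathrm{MSE}(\tilde{V}_k)$ obtain from eqns (3) and (4), respectively.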
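The two-stage quantities can be computed directly once the per-bole values are known. The sketch below is an illustration under stated assumptions rather than the authors' code: it takes the design to be list sampling with replacement with an independent second-stage sample on every draw, and the function names and the three-tree example population are hypothetical. The mean-square error expression follows from expanding the squared error of a single draw and averaging over `n` independent draws.

```python
import math

def two_stage_is_variance(volumes, probs, variances, n):
    """Variance of the two-stage estimator with unbiased IS in stage 2."""
    total = sum(volumes)
    between = sum(p * (v / p - total) ** 2 for v, p in zip(volumes, probs))
    within = sum(s / p for s, p in zip(variances, probs))
    return (between + within) / n

def two_stage_cis_bias(biases):
    """Bias of the two-stage estimator: the sum of the per-bole biases."""
    return sum(biases)

def two_stage_cis_mse(volumes, probs, biases, mses, n):
    """Mean-square error of the two-stage estimator with CIS in stage 2."""
    total = sum(volumes)
    bias = sum(biases)
    between = sum(p * (v / p - total) ** 2 for v, p in zip(volumes, probs))
    within = sum(e / p for e, p in zip(mses, probs))
    cross = 2.0 * sum(b * (v / p - total) for b, v, p in zip(biases, volumes, probs))
    return (between + within + cross) / n + (1.0 - 1.0 / n) * bias ** 2

# Hypothetical three-tree population (volumes in m^3); values are made up.
volumes = [0.8, 1.5, 2.9]
probs = [0.17, 0.29, 0.54]          # first-stage selection probabilities
is_vars = [0.010, 0.030, 0.090]     # per-bole IS variances
cis_bias = [0.01, 0.03, 0.05]       # per-bole CIS biases
cis_mses = [0.006, 0.020, 0.060]    # per-bole CIS mean-square errors

total = sum(volumes)
se = math.sqrt(two_stage_is_variance(volumes, probs, is_vars, 1))
rmse = math.sqrt(two_stage_cis_mse(volumes, probs, cis_bias, cis_mses, 1))
print(f"IS SE = {100 * se / total:.1f}% of V")
print(f"CIS bias = {100 * two_stage_cis_bias(cis_bias) / total:.1f}% of V")
print(f"CIS RMSE = {100 * rmse / total:.1f}% of V")
```

When every per-bole bias is zero and the per-bole mean-square errors equal the per-bole variances, `two_stage_cis_mse` reduces to `two_stage_is_variance`.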

Results
The results in Table 2 pertain to the case where $n = 1$ tree was selected in the first stage and $m = 2$ heights were selected from either $f(h)$ (for IS) or $g(h)$ (for CIS) in the second stage. The target volume, $V$, was aggregate bole volume to the height of a 5 cm (2 inch) upper-bole diameter. Therefore $H_U$ varied from tree to tree. Standard errors (SE) for $\hat{V}_{2S}$ were calculated exactly from (6), because we knew the actual volume, $V_k$, and first-stage selection probability, $P_k$, of each tree in the population, and we could integrate the cubic-spline profile of each tree to evaluate $\mathrm{Var}(\hat{V}_k)$ as well. Bias and root-mean-square errors (RMSE) of $\tilde{V}_{2S}$ were calculated exactly using (5) and (9). All results in Table 2 are expressed as a percentage of $V$. We also calculated errors for samplings to the height of an upper diameter of 10 cm (4 inches); the net effect was reduced bias and RMSE for all cases compared to the results in Table 2. The bias percentages reported in Table 2 are invariant to the size of both stages of sampling. The SE and RMSE results shown in the table can be prorated to first-stage samples of size $n > 1$ by dividing by $\sqrt{n}$.
For all five tree species, the RMSE of the estimate decreased uniformly as the cut-off was lowered. The bias increased at varying rates, and it was worse for those species for which the ptf was a poor fit (Table 2). Specifically, the overall cost was higher for hardwoods than for softwoods, because the ptf was a better fit to the latter.
Profiles are presented for slash pine and white oak in Fig. 1. Note that the RMSE of $\tilde{V}_{2S}$ at $H_c = H_U$ is equivalent to the SE of $\hat{V}_{2S}$, so to compare the performance of CIS at any particular cut-off with that of IS, one compares it with CIS at 100% on the same graph. Both graphs clearly show that, as the cut-off descends the bole, the RMSE remains stable or actually decreases, while the bias increases. The RMSE decreases and the bias increases more quickly for the white oak than for the slash pine, as can be seen in the higher axis intercept of the RMSE curve.
A comparison profile for loblolly pine using the different shape parameters can be found in Fig. 2. From the intercepts of the RMSE curves we can see that the ptf fits loblolly pine better with $c = 3$ than with $c = 4$. Again, the bias increases more quickly for the poorer fit, but the RMSE seems to decrease at about the same rate. This illustrates the cost of using an inappropriate ptf.
We also examined diagnostic graphs to better understand the relative behaviors of IS and CIS. A sample pair of graphs, for white oak, is displayed in Fig. 3. Fig. 3a shows that, on an individual bole basis, the relative RMSE of $\tilde{V}(H_L, H_U)$ is generally a bit smaller than the standard error of $\hat{V}(H_L, H_U)$. Finally, we note from Fig. 3b that the maximum bias of $\tilde{V}(H_L, H_U)$ is less than 2.0% and that, as bole size increases, the bias increases for this particular choice of ptf.

Discussion
When the cut-off is established relatively high on the bole, the bias of total bole volume estimation by $\tilde{V}_{2S}$ appears to be small or negligible, at least for the tree populations and two-stage sampling strategies investigated here. We have demonstrated that CIS enables more accurate estimation of aggregate bole volume than unrestricted importance sampling in these circumstances. While we have not explicitly addressed the issue of measurement error, we anticipate an added advantage of CIS: measurement error should decrease because the upper bole is excluded from the sampling frame.

Fig. 3. Estimation of white oak bole volumes. In (a), the standard error of $\tilde{V}(H_L, H_U)$ versus the standard error of $\hat{V}(H_L, H_U)$. In (b), the bias of $\tilde{V}(H_L, H_U)$ versus $V(H_L, H_U)$. The shape parameter of the proxy function was set to $c = 3$.

Table 1. Summary information for the five sets of tree measurements used in the study of $\tilde{V}(H_L, H_U)$ as an estimator of bole volume following CIS.

Table 2. Summary of two-stage samplings of aggregate bole volume. The first- and second-stage sample sizes, respectively, were $n = 1$ and $m = 2$. Bole volume to a 5 cm top diameter was estimated. SE signifies the standard error of $\hat{V}_{2S}$. RMSE and Bias signify the root mean-square error and bias of $\tilde{V}_{2S}$, respectively. The cut-offs were 60% or 80% of tree height. SE, RMSE, and Bias are presented as percentages of the true aggregate volume to a 5 cm top diameter.
† Shape parameter of the proxy taper function.