Ola Sallnäs Categorical mapping from estimates of continuous forest attributes – classification and accuracy

Spatially explicit data on forest attributes is demanded for various research with landscape perspective. Existing datasets with estimates of continuous forest variables are often used as the basis for producing categorical forest type maps. Normally, this type of maps are used without knowing their accuracy. This paper presents a Bayesian network model for estimating pixel level class membership probabilities of thus derived categorical maps. Class membership probabilities can be used as a post-classification measure of map accuracy and in the process of map classification affecting the assignments of class labels. The method is applied in mapping deciduous dominated forests on the basis of the k-NN Sweden 2005 dataset in a study area in southern Sweden. The results indicate rather low accuracy for deciduous class regardless of the map classification method: 0.48 versus 0.50 in the maps classified without and with the use of the class membership probabilities given equal deciduous area. When probability-based classification is applied, the level of accuracy varies with the assumed map class proportions. Thus, when deciduous class area corresponding to the National Forest Inventory estimate was used, the accuracy of only 0.35 was obtained for the deciduous map class.


Introduction
The need for landscape perspective in analysis, implementation and evaluation of forest management, and more broadly land-use has been demonstrated by numerous studies, and is now widely recognized (e.g.Andersson et al. 2006;Hansson and Angelstam 1991;Lindbladh et al. 2011).To implement it in practice relevant information on the state of the landscape is needed.In Sweden, the sample-plot based statistical National Forest Inventory (NFI) is carried out in 5-year cycles since the 1920s.From the NFI data, multiple characteristics of wood stock, forest land and forest ecosystems can be inferred for regional and national levels (Ranneby et al. 1987).A more recent undertaking launched in 2003, the National Inventory of Landscapes in Sweden (NILS) has an overall objective of providing national level data for and performing analyses of landscape biodiversity conditions (Ståhl et al. 2011).However, there are problems that require spatially-explicit, wall-to-wall data coverage.For instance, landscape ecological studies often focus on specific landscape elements or patterns.To meet these needs, several land-cover datasets have been produced over the last decades: the Digital Terrain Type Classification of Satellite Data (TTCSD) from the Swedish Space Corporation in 1900-1991 (Mikusinski et al. 2003), wRESEx from the research program Remote Sensing for the Environment (RESE) (Angelstam et al. 2003) and the Swedish Land Cover Map (SMD) and its upscaled version as a part of the EU CORINE program (Hagner and Reese 2007;Lantmäteriet 2005a).Forest attributes such as standing wood volume by species, mean stand height, age and biomass have been mapped using satellite images and NFI data and compiled in k-NN Sweden (k-NN) datasets (Reese et al. 2003;Reese et al. 2002).Various other methods, including aerial photography and increasingly LIDAR, are used in forest inventories to produce spatially explicit data at lower, for example, estate scales; however, those data are private property held by landowners.
Many recent research with a landscape perspective have used k-NN Sweden or SMD or a combination of the two datasets for identifying or mapping patterns constituted of patches of specific forest types Andersson et al. (2012), Mikusiński and Edenius (2006), Lindbladh et al. (2011), Manton et al. (2005) Stighäll et al. (2011).In principle, k-NN Sweden datasets allow for mapping whatever forest type can be defined on the basis of the forest attributes in the dataset, which means, essentially, creating a new categorical map every time.SMD, in contrast, is limited to the forest classes which are present in the dataset: young, deciduous, mixed and coniferous.
The most straight-forward way of producing a categorical map on the basis of a map (or maps) of continuous variables is what we call a "face value" map classification.It is also the most commonly used method (see, for example, Andersson et al. 2012;Lindbladh et al. 2011;Manton et al. 2005;Mikusiński and Edenius 2006;Paltto et al. 2006;Stighäll et al. 2011).The pixels are classified by directly applying the classification rule to the values of the forest variables from the given dataset (e.g.k-NN Sweden).The classification rule is essentially the class definitions.The classification procedure produces no information on the accuracy of the product.Typically, authors refer to the uncertainty of the forest attribute estimates in the original datasets (e.g.Manton et al. 2005) which is not quite the same thing; an accuracy specification must apply to the actual classes in the map.With no idea of the accuracy, the maps might be, as Mac Roberts (2011) put it, "just pretty pictures".Due to the costs of the procedure, empirical accuracy assessments are not realistic for maps that are ad-hoc i.e. produced to serve a very specific purpose such as an assessment of a particular habitat type in the landscape.It is for this very reason that this type of maps are derived from existing datasets rather than estimated from spectral records.The first objective of our study was to find a way of estimating the uncertainty of categorical forest type maps that can be derived from the k-NN datasets and to thus improve the scientific and practical validity of their applications.However, the problem and the proposed method are not limited to using k-NN estimated data as the basis for mapping.The estimates of the continuous variables might be based on almost any method; the main problem lies in the transformation of the uncertainty from continuous to categorical variables.
We consider map accuracy to be an expression of map units' classification (un)certainty and address the problem at pixel level.At pixel level, the (un)certainty of classification can be defined as class membership probabilities (Foody et al. 1992).Richards (1996) demonstrated that class membership probabilities ("accuracy" in his usage) can be derived from known classifier performance and marginal map class proportions intended to represent the prior probabilities of the classes.Steel et al. (2003) went further by trying to actually estimate the performance of a k-NN classifier in the course of image classification in order to obtain the membership probabilities in the same manner as suggested by Richards.The differences in the terminology used by the different authors might be confusing.Steel et al. (2003) speak of "probability of correct classification", Foody et al. (1992) of "a posteriori probability of membership" and Richards (1996) of "posterior probability" or "accuracy"; all these terms stand for the same concept i.e. the probability that a map unit assigned (or would-be assigned) a certain class label actually belongs to that class.In this paper, we use the term "class membership probability".
Even though in our problem, existing estimates of continuous forest attributes rather than spectral records serve as predictor variables, it is not fundamentally different.We propose a Bayesian network formulation with the same logic as in the studies cited above.Bayesian network properties allow for effective utilization of the relationships between the forest attributes by modifying the prior probabilities associated with any variable in response to the observed values of the related variables.The class membership probabilities estimated using the Bayesian model can serve not only as a classification accuracy measure but also as a basis for the classification itself.Maps can be classified based on pixel class membership probabilities rather than directly on the forest attribute values ("face values").Thus, the second objective of the study was to explore the possibilities of probability-based classification methods for improved classification of categorical forest type maps based on the k-NN datasets compared to the "face value" method.
In this study we used the k-NN Sweden 2005 dataset to map deciduous dominated forests in an 8 000 km 2 large area in southern Sweden.With the help of the presented method, we estimated pixel class membership probabilities, which we then used to assess the accuracy of a "face-value" map classification and to produce two alternative maps through probability-based classifications.We then compared the maps and discussed the implications of the results for the validity of the method.

Bayesian networks
Estimating class membership probabilities for map units from uncertain predictor variables involves reasoning probabilistically about the values of the variables.Multivariate probabilistic problems can be represented with advantage by graphical models.In our treatment, we use Bayesian network (BN) representation, a graphical model whose graph is a directed acyclic graph.Here, we only offer a concise presentation of the basic ideas; there is an extensive literature on the subject (Murphy 2012;Pearl 2009).As a matter of fact, BN represent joint probability distributions by specifying conditional independence assumptions.By conditional independence is meant that the variables are independent given fixed values of parent variables.In graphs, the nodes represent variables of interest and the links represent causal or information relationships between variables.The strength of a dependency between parent and child nodes is expressed by conditional prob-abilities usually, but not necessarily, stored in a table.Hence BN can be used to represent any probability distribution of the form where pa i are parent variables of variable x i .With respect to the joint probability distribution each node in the graph corresponds to the factor p(x i | pa i ).Thus, the pairwise-conditional probabilities are all what is needed to specify the full joint distribution.

Estimating pixel class membership probabilities
Let us denote the forest attributes covered by a spatial dataset as Y i ∈ (Y 1 ,…,Y n ) and the estimates of those attributes as X i ∈ (X 1 ,…,X n ).Suppose that Y i and X i are random variables and that the lower case letters denote specific realizations of these variables for each mapping unit j ∈ (1,…,m) on the ground and on the map; then the ground truth is Suppose also that the mapping units in an area of interest need to be classified into a set of disjoint classes C ∈ (c 1 ,…,c k ) such that c h ≡ f(y 1 ,…,y n ) where c h ∈ C and f is a classification rule corresponding to the class definitions.Then, for any map unit j, the true class is c hj ≡ f(y 1j ,…,y nj ).
Given the uncertain knowledge of the ground truth, the actual class membership is also uncertain and can be defined as the probability . How the p j is computed depends on the actual class definitions and the probability distribution for Y 1j ,…,Y nj .Hence, in order to obtain class membership probabilities the probability distributions for Y 1j ,…,Y nj , which specify possible values of the ground truth and their corresponding probabilities, need to be estimated.
Let us first look at Y ij X ij in isolation without considering possible correlations with other variables.The variable Y ij could be any of the forest attributes for any pixel j.The value of the estimate X ij can be observed, whilst Y ij is hidden.To begin, we can assume that the distribution of X ij , given any y ij of the variable Y ij is Gaussian.This distribution is the performance of the estimator.The use of normal approximation concerning the distribution of measurements is widely accepted in map classification (Strahler 1980).We can specify the link between the estimated y ij and the estimate X ij for any pixel as where y ij is the mean and σ 2 is the variance of the distribution.To employ this model, we must be able to approximate the variance σ 2 for any y ij .Having specified the generative model for the data, that is to say the estimator performance, and knowing x ij , we can use the prior distribution of the variable P(Y ij ) = P(Y i ) to obtain its posterior distribution P(Y ij | x ij ).Prior probabilities P(Y ij ) describe our knowledge of how likely different values of the variable Y ij are to occur in absence of any information about X ij .The expression takes the familiar form of Bayes theorem, The left hand side of the expression denotes the posterior distribution of P(Y ij ) given x ij .On the right hand side, P(x ij | Y i ) is the probability of x ij approximated from the generative model ( 2) given possible values of Y ij .Prior probabilities p(Y i ) might be known from an independent source or they can be approximated by deriving them from the map itself (Richards 1996).However, in the latter case, the quality of the approximation depends on the unbiasedness of the underlying map estimators.
If correlation exists between the two variables, the posterior probabilities of the variable(s) of interest could be refined by using not only their own estimates but also the estimates of the related variable(s).However, this works only to the degree to which the original estimation errors for the different variables are independent; in this treatment, we assume that they are.This kind of conditional independence assumption is a property of naïve Bayes models which proved to be useful in many fields (Koller and Friedman 2009).
Fig. 1 shows a BN consisting of four variables.Extending the notations used previously, let the nodes Y 1j and Y 2j represent two forest attributes as random variables, of which only the prior probabilities are known; let X 1j and X 2j represent the estimates, also as random variables, of Y 1j and Y 2j respectively.Recall, however, that X 1j and X 2j can be observed, whilst Y 1j and Y 2j cannot.According to the graph structure, the full distribution can be factorized as In addition to the already familiar terms, this expression contains p(Y 2j | Y 1j ) which specifies the link between the two forest attributes.The conditional probability distribution p(Y 2 | Y 1 ) can be either generated, if an independent model of that relationship is available, or estimated from the dataset.The correctness of the latter approximation will, however, depend on the unbiasedness and independence of the estimates.It should be noted that for computational purposes, it is not necessary to actually generate the distributions p(X 1j | Y 1j ) and p(X 2j | Y 2j ) as long as a generative model is specified.
Having thus specified the BN model, the visible variables X 1j and X 2j can be "clamped" to their observed values x 1j and x 2j to obtain the conditional p(Y 1j | x 1j , x 2j ) and/or p(Y 2j | x 1j , x 2j ).Multiple algorithms for message propagation in graphical models are described in literature (e.g.Murphy 2012).A small problem as in our example is not computationally challenging allowing to simply "sum-out" the joint distribution.In Fig. 1, the numbered dashed arrows show the sequence of updating the graph for obtaining p(Y 2j | x 1j , x 2j ).Mathematically, propagation against the direction of the links involves Bayes theorem according to the expression (3), whilst propagation along the links involves marginalizing, i.e. summing probabilities of the parental variable over the child variable.In our example, message passing from Y 1j to Y 2j is realized as On the right hand side of the expression is the marginalized conditional distribution of Y 2j .The term x 1j is included to stress that marginalization was conducted over p(Y 1j | x 1j ).Thus, regardless of the number of variables in the network, we can, by employing expressions (3) and ( 5), obtain the posterior distributions of the variables of interest for any map unit j.Knowing the class definitions and posterior distributions of the variables, it will not be difficult to compute class probabilities.

Application example
In Sweden, deciduous forests are of particular interest for ecological and other research on forestry and land-use embracing landscape perspective.This is reflected by numerous studies focusing, in one or another way, on deciduous forests in a landscape context (e.g.Andersson et al. 2006;Ask 2002;Eriksson et al. 2010;Mikusinski et al. 2003).It means that data on the amount and distribution of deciduous forests in general and of specific deciduous forest types is highly demanded.Some of the needs are met by the NFI; however, often spatially explicit data over entire landscapes are requisite.Therefore, in several studies, k-NN Sweden datasets were used to identify deciduous dominated forest areas.
In this example, we map deciduous dominated forests in Kronoberg County in southern Sweden using the k-NN Sweden 2005 dataset.With the help of the presented method, we estimate class membership probabilities at pixel level, derive the overall measures of accuracy for a map classified at "face-value" and produce two alternative maps by probability-based classification.

Study area
Kronoberg is located in the hemi boreal zone between the 56th and 57th latitude and has a land area of approximately 849 000 ha (Fig. 2).It is a densely forested region; about 76% of the area is forest land.The forest land ownership pattern is small-scale, dominated by family owners.Norway spruce (Picea abies ( L.) Karst.)accounts for almost half of the standing volume.Scots pine (Pinus sylvestris L.) which is the next most common species makes up about 30% of the standing volume.The deciduous species are mainly represented by birch (Betula spp.) 11%, oak (Quercus spp.) 2.5% and beech (Fagus sylvatica L.) 1% of the standing volume (Nilsson et al. 2008).In the last two decades, the need for increasing the amount of deciduous forests has been emphasized in the environmental debate.Estimated area of deciduous forests, based on the data from the NFI 2001-2005, is 61 000 ha (± 15 000 ha at 95% confidence interval) (NFI 2012).

Data
k-NN Sweden are spatially explicit countrywide estimates or forest attributes derived from satellite images and NFI field-plot data produced at the Swedish University of Agricultural Science (Reese et al. 2003;Reese et al. 2002).The datasets owe their names to the estimation algorithm used -k Nearest Neighbor (k-NN).The value of the estimated parameter is calculated as a weighted mean of the k nearest samples in the spectral space.Inverse squared Euclidian distance is used for assigning weights to each of the k samples.In average, some 1250 NFI plots (6-years' time span) are used in estimation of one satellite scene (Reese et al. 2003).The estimated attributes are standing wood volume by tree species (six individual species and one species group), mean stand age, height and total biomass.The size of pixels is 25*25 meters.As to date, three datasets k-NN Sweden (2000, 2005 and 2010) are available.The accuracy of the estimates in the k-NN Sweden was found to be rather low at a pixel level but improving for averages over larger areas (Reese et al. 2003;Reese et al. 2002).A summary of the reported results regarding pixel level accuracy of k-NN Sweden datasets can be found in Appendix 1.

Map classes and the probability model
In this example, we defined a set of two classes: deciduous dominated c d and other forests c o .Deciduous forest class is defined by the ratio between deciduous and coniferous volume: at least 70% of the total volume has to be deciduous, which roughly corresponds to a ratio of 2.33 between the deciduous and the coniferous volume.
Besides the standing volume which directly enters the forest class definition, we expected that from all the forest attributes in the k-NN dataset, age is most strongly related with species composition.Thus three forest attributes were represented in the model: deciduous volume, coniferous volume and age.We denote the random variables representing the forest attributes as D j y for deciduous volume, S j y for coniferous volume and  utes, also as random variables, are denoted D j x , S j x and A j x .Like previously, possible realizations of the random variables are denoted by lower case letters.We use "x" and "y" in superscript to distinguish between the actual forest attributes and their estimates.
Fig. 3 shows a Bayesian network formulation of the problem.The upper part of the graph specifies the dependencies between the forests attributes.Both deciduous D j y and coniferous S j y volumes correlate with the age A j y but are assumed to be conditionally independent from each other.Forest class C j y is a function of the forest attributes D j y and S j y , which is indicated by the rhomboid shape of the node.The estimates A j x , D j x and S j x are conditionally independent from each other and thus only depend on the respective forest attribute variable.The darker fill of the nodes A j x , D j x and S j x indicates that these variables can be observed, whilst the nodes without shading represent the hidden variables.
Probability of deciduous forest class p(c j d ) for a pixel j is algebraically given by Maximum possible deciduous volume d max is a constant and so is the maximum coniferous volume defined as s max = 2.33 -1 d max .In the computations, we set the maximum deciduous volume equal to the actual highest estimated deciduous volume in the application area.
Since the other class is complementary, the probability of c j o is Prior probability distribution for A j y in the mapping area, and the conditional distributions  the k-NN dataset (to represent deciduous and coniferous volumes, volume estimates of corresponding tree species were summed).We treated the variables as discrete and used the empirical distributions derived from the dataset rather than fitting any continuous probability density functions.With respect to A j x , D j x and S j x , i.e. the estimator performance, we assumed Gaussian distributions on grounds discussed earlier.The variances of the distributions were set proportionally to the distributions' means.In the base case, reported cross-validation RMSE's were used as coefficients.In addition, sensitivity to the assumed error-level of deciduous volume estimates was tested.Detailed explanation of this point and the parameter values are given in Appendix 1.Since the joint distribution was specified for discrete variables, we used integration over intervals of the probability density function for obtaining the probabilities of a j x , d j x and s j x .For every pixel j, the joint distribution was conditioned on the data, i.e. the observed a j x , d j x and s j x .The distributions for the hidden variables were updated according to the graph structure.Since only D j y and S j y influence the class probabilities directly, A j y did not need to be updated with the evidence from D j x and S j x but rather served as a pass for the evidence from A j x to reach D j y and S j y .The posterior distributions for D j y and S j y were used in expression ( 6) to calculate the posterior probability of c j d .

Map classification methods
We tested several classification methods.In all cases we used the class definitions specified above.The first method is what we call "face value classification" (FVC) i.e. a simple classification considering the kNN estimates of the forest variables rather than the computed probabilities (i.e.assuming that d j y = d j x and s j y = s j x ).In FVC, all pixels that fulfill d j x ≥ 2.33s j x are classified as deciduous.
Maximum a posteriori classification (MAP) assigns every pixel to the class with the highest class membership probability (Steele 2005).In our case with only two classes it means that all pixels with p(c j d ) ≥ 0.5 are labeled as deciduous.However, a preliminary calculation showed that MAP results in only 11 000 ha deciduous (NFI reports 61 000 ha).This clearly showed that a "pure" MAP classification was not suitable in this context and therefore it was excluded from further analyses.
Probability-based classification with predefined class areas maximizes the average membership probabilities for the class as a whole.Maximization is achieved by ranking the pixels according to their probability of class membership and picking the needed number of pixels from the higher end of the range.We tested two different values of deciduous class area.First value was the deciduous area resulting from FVC (27 000 ha, hence MAP27) that allowed for direct comparison of the accuracy values between the two methods.The other value was the NFI estimate of the deciduous area in the region (61 000 ha, hence MAP61).In this case it was of interest to determine the maximum accuracy that could be achieved with such class area.

Results
In Fig. 4 (a) and (b), the series "Unclassified" show the distribution of all pixels in the application area by the deciduous class membership probability (we will discuss the other series later).Note, that this is pre-classification distribution of pixels.Provided that deciduous dominated forests, by all estimates, constitute less than 10% of the total forest land, naturally, the distribution is dominated by low-probability i.e. very unlikely deciduous pixels.However, the effect is amplified because of the uncertainty associated with the predictor variables.The consistency of the computed probabilities can be roughly tested by calculating the expected map class areas.By summing pixel-level probability values, a figure of approximately 37 000 ha deciduous is obtained which is reasonable number considering the 61 000 ha (±15 000 ha at 95% confidence interval) reported by the NFI (average 2001-2005)(NFI 2012).
Fig. 4 (a) and (b) show the distributions of the pixels labeled as deciduous in each of the three classification methods FVC, MAP27 and MAP61.The distribution in FVC is approximately symmetric with the mode at around 0.50 and ranges from about 0.15 to about 0.85.As mentioned above, the total deciduous area in FVC was 27 000 ha.The distributions of MAP27 and MAP61 are essentially the right hand side of the all-pixels distribution truncated at different points.MAP27 deciduous class encompasses pixels that have class membership probability higher than 0.35 while MAP 61 deciduous class encompasses almost all pixels with the class membership probability above 0.15.
The average deciduous class membership probability in FVC is 0.48.In MAP27 it is 0.50 corresponding to an improvement of 0.02.The average class membership probability for deciduous pixels in MAP61 is about 0.35.
The sensitivity analysis in which the pixel class membership probabilities were calculated using a different σ r value for deciduous (0.75 compared to 1.50 in the base case) resulted in 0.51 for FVC, 0.53 for MAP27 and 0.37 for MAP61.Noteworthy, the differences between the average probability values of the different classifications were preserved and the difference from the base case is small for all methods.
The average class membership probability for the other ("not deciduous") class in the three maps was: 0.95 for FVC, 0.95 for MAP27 and 0.96 for MAP61.Such high values indicate a very low frequency of misclassification.This outcome is logical considering that the overall class proportion for other forests in the study area is about 90%.
Fig. 5 shows a 4 km 2 map fragment with pixels colors scaled according to deciduous class membership probabilities i.e. the unclassified map and the same fragment of the maps classified according to FVC, MAP27 and MAP61.The spatial distribution of pixels with different probability values in the unclassified map exhibits visible regularities.Lower probability-pixels are more frequently found isolated or in small patches while higher probability-pixels occur more often in the core parts of larger contiguous patches.This pattern is plausible considering the fact that edge-pixels are typically prone to estimation errors due to their mixed nature and possible spatial displacement.It is noteworthy that the model produced such spatial pattern even though no spatial relationships at all were considered.In MAP61 there are more deciduous patches and they are more "dense" than in MAP27 and FVC.Between FVC and MAP27 only a small difference can be noticed.
Spatial comparison between full FVC and MAP27 (rather than the fragment depicted in Fig. 5) showed that approximately 13% of the deciduous pixels did not match.The substituted pixels in average were "older" than the substituting, 52 years compared to 26 years.This indicates that the model assigned higher probability of deciduous class membership to lower-age pixels, given same ratio of the deciduous and coniferous volumes.
In FVC and MAP 27, the different satellite scenes comprising the map could be clearly distinguished by the amount of deciduous pixels.Also, when displaying the deciduous probabilities prior to any classification, it could be seen that in some scenes the probabilities are mostly low as compared to the other scenes.In contrast, the borders between the satellite scenes are nearly indiscernible in MAP61.Obviously, the low deciduous probabilities and the low number of deciduous pixels in FVC in some scenes are caused by the underestimation of deciduous volume within these scenes in the k-NN dataset (compare the NFI and the FVC estimates).That the magnitude of the bias differs between the scenes is not strange: estimation in kNN Sweden is scene-wise i.e. based on different training data (Reese et al. 2003).For our study, this observation signifies that the probability measures are not completely comparable across the scenes, although the differences between the pixels within individual scenes might be reflected correctly.Nevertheless, the scene level biases in deciduous became less pronounced when larger, and probably more correct, total deciduous area was allocated in MAP61.In summary, our calculation showed a rather low accuracy of 0.48 for deciduous class in the FVC map.By using a probability-based allocation (MAP27) the accuracy could be improved to 0.50 with the same class area.In general, the user's accuracy of 0.85 and higher is considered a benchmark for thematic maps.For example, the estimated user's accuracy of deciduous class in the SMD dataset is 0.69 (Lantmäteriet 2005b).However, considering the notoriously poor accuracy of the k-NN estimates of deciduous volume, the obtained values seem fair.Both in FVC and in MAP27, there were clearly visible differences in the density of deciduous pixels across the satellite scene borders.Increasing the total deciduous class area to the level of the NFI estimate comes at a cost of the class accuracy reduced to 0.35.However, the transitions between satellite scenes smooth out considerably in this case.

Discussion
We presented a method for estimating pixel level class membership probabilities when existing remote sensing-based estimates of continuous forest variables are used for map classification.Class membership probabilities allow to produce a probability-based map classification rather than a "face value" classification and to implement a more correct overall class proportion when a reliable independent estimate is available.Measures of overall map classification accuracy, which are necessary to support the user's confidence in the map, can be readily derived from the pixel level probabilities.In addition, the probability measures on a per pixel base are a valuable supplement to the overall classification accuracy measures and enable a fuller use of the map (Foody et al. 1992).When the intended use of the map is identifying forests with specific characteristics for subsequent field inventory of, for instance, biodiversity values, the class membership probabilities can be considered in selecting the locations to be visited.When, however, the intended use of the map is spatial ecological modeling, the probabilities can be taken into account to adjust the inputs to or the parameters of the models.For example, assume that the quality of a certain habitat type is affected by the amount of a specific forest type within a certain distance.Instead of using the nominal amount of the forest type from the map, we can thus calculate the expected value and use that as the input to the habitat model.Similarly, when studying connectivity, pixels' probability values can be considered in determining path costs.More simply, only pixels with probabilities above a defined threshold can be selected.If, in contrary, inclusiveness is more important than precision, a larger area than the unbiased estimate suggests can be selected based on probabilities without modifying the class definitions The method involves several assumptions.We suggested assuming the variances of the original estimators of the continuous forest variables from empirical relative RMSE's.RMSE might be a poor basis for approximating the variance; for the kNN data, however, it is presently the only available measure of uncertainty of estimates of continuous forest variables.Once a better estimator Silva Fennica vol. 48 no. 2 article id 975 • Trubins and Sallnäs • Categorical mapping from estimates of continuous… of variance is available, it can be used instead.Moreover, the results were rather insensitive to the change of value of the variance parameter for deciduous volume.In the application example, we could not account for biases in the k-NN data due to the lack of relevant assessments.In general, both variance and bias of forest attributes estimated with k-NN technique can vary between scenes due to different sets of reference plots, which we could not account for.The prior probabilities, which also affect the results, can be derived from NFI data rather than from the spatial dataset itself as we did in the example.The same applies to the conditional probabilities when two or more related forest attributes are included in the model.Hence, there are possibilities to use independent data for priors and thus free them from eventual biases the existent spatial dataset.The transparency with respect to the assumptions is an advantage of the method.Furthermore, we would like to note that regardless of the issue of reliability of the model inputs, e.g.RMSE of the k-NN estimates, the method is useful in that it demonstrates the effect of the uncertainty in the underlying continuous forest variables on the accuracy of derived forest type maps.
In summary, the presented method is flexible, transparent and relatively easy to apply.In particular, it can strengthen the validity of inferences made from ad-hoc categorical maps derived from existent remote sensing-based estimates of continuous forest variables, such as kNN Sweden, and encourage their wider and more informed use in ecological and other research with landscape perspective.

Fig. 1 .
Fig. 1.Bayesian network representation of the joint probability distribution p(X 1j , X 2j , Y 1j , Y 2j ).Shaded nodes represent the observed variables, the nodes without shading the hidden variables.The solid-line arrows represent the relationships between the variables.The numbered dashed-line arrows show the sequence of message propagation for inferring the posterior distribution of Y 2j .
A j y for age for any pixel j ∈ (1,…,m) in the mapping area.Forest class as a dichotomous random variable is denoted by C j y .The possible outcomes of C j y are the deciduous class c j d and the other class c j o .The estimates of the forest attrib-
d max is the maximum possible deciduous volume, s max is the maximum coniferous volume and d min is the minimum deciduous volume.In accordance with the class definition d min = 2.33s j y .
p(D j y | A j y ) = p(D y | A y ) and p(S j y | A j y ) = p(S y | A y ) were approximated using the variable estimates from

Fig. 3 .
Fig. 3. Bayesian network model for estimating pixels' class membership probabilities in deciduous forest mapping problem.The variable D j y is deciduous volume, S j y coniferous volume, A j y age, and C j y is forest class.D j x , S j x and A j x represent the k-NN estimates of the respective forest attributes.Shaded nodes represent the observed variables, the nodes without shading the hidden variables.The rhomboid shape of the node indicates that C j y is a deterministic function of D j y and S j y .

Fig. 4 .
Fig. 4. Distribution of pixels by deciduous class membership probability.Unclassified -all pixels prior to classification; MAP61, MAP27 and FVConly pixels labeled as deciduous using the respective classification method.(a) Full range of values; (b) left-hand side truncated at 0.10.

Fig. 5 .
Fig. 5.A fragment of the study area (left bottom corner coordinates x: 1367200, y: 6334550 ; coordinate system RT1990).Unclassified -probability map with no class labels assigned; MAP61, MAP27 and FVC -maps produced using the respective classification method.The bar on the left shows the colors used for different levels of deciduous class membership probabilities in the unclassified map.