Title: | Perform Spatial Error Estimation and Variable Importance Assessment |
---|---|
Description: | Implements spatial error estimation and permutation-based variable importance measures for predictive models using spatial cross-validation and spatial block bootstrap. |
Authors: | Alexander Brenning [aut, cre] , Patrick Schratz [aut] , Tobias Herrmann [ctb] |
Maintainer: | Alexander Brenning <[email protected]> |
License: | GPL-3 |
Version: | 3.0.5 |
Built: | 2024-11-12 05:43:15 UTC |
Source: | https://github.com/giscience-fsu/sperrorest |
This package implements spatial error estimation and permutation-based spatial variable importance using different spatial cross-validation and spatial block bootstrap methods. To cite ‘sperrorest’ in publications, reference the paper by Brenning (2012).
Brenning, A. 2012. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23-27 July 2012, p. 5372-5375.
Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.
Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.
Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for yield prediction in Precision Agriculture. In Advances in Intelligent Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, Tucson, AZ, USA, 19-21 May 2010. Lecture Notes in Computer Science, 6065 LNCS: 184-195.
Add distance information to resampling objects
add.distance(object, ...)

## S3 method for class 'resampling'
add.distance(object, data, coords = c("x", "y"), ...)

## S3 method for class 'represampling'
add.distance(object, data, coords = c("x", "y"), mode = "future", ...)
object |
resampling or represampling object. |
... |
Additional arguments to dataset_distance and add.distance.resampling, respectively. |
data |
|
coords |
(ignored by |
mode |
Use |
Nearest-neighbour distances are calculated for each sample in the test set. These nearest-neighbour distances (one per test sample) are then averaged. Aggregation methods other than the mean can be chosen using the fun argument, which will be passed on to dataset_distance.
A resampling or represampling object containing an additional $distance component in each resampling object. The distance component is a single numeric value indicating, for each train/test pair, the (by default, mean) nearest-neighbour distance between the two sets.
dataset_distance represampling resampling
# Muenchow et al. (2012), see ?ecuador
nsp.parti <- partition_cv(ecuador)
sp.parti <- partition_kmeans(ecuador)
nsp.parti <- add.distance(nsp.parti, data = ecuador)
sp.parti <- add.distance(sp.parti, data = ecuador)
# non-spatial partitioning: very small test-training distance:
nsp.parti[[1]][[1]]$distance
# spatial partitioning: more substantial distance, depending on number of
# folds etc.
sp.parti[[1]][[1]]$distance
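As a hedged illustration (not part of the original examples), an aggregation function other than the mean can be passed on to dataset_distance via the ... argument:

# use the median instead of the mean nearest-neighbour distance:
sp.parti.med <- add.distance(sp.parti, data = ecuador, fun = median)
sp.parti.med[[1]][[1]]$distance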
Functions for handling represampling objects, i.e. lists of resampling objects.
as.represampling(object, ...)

## S3 method for class 'list'
as.represampling(object, ...)

## S3 method for class 'represampling'
print(x, ...)

is_represampling(object)
object |
object of class |
... |
currently not used. |
x |
object of class |
represampling objects are (named) lists of resampling objects. Such objects are typically created by partition_cv, partition_kmeans, represampling_disc_bootstrap and related functions. In r-repeated k-fold cross-validation, for example, the corresponding represampling object has length r, and each of its r resampling objects has length k.
as.represampling.list coerces object to class represampling while coercing its elements to resampling objects. Some validity checks are performed.
as.represampling methods return an object of class represampling with the contents of object.
resampling, partition_cv, partition_kmeans, represampling_disc_bootstrap, etc.
# Muenchow et al. (2012), see ?ecuador
# Partitioning by elevation classes in 300 m steps:
fac <- factor(as.character(floor(ecuador$dem / 300)))
summary(fac)
parti <- as.resampling(fac)
# a list of lists specifying sets of training and test sets,
# using each factor level at a time as the test set:
str(parti)
summary(parti)
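A small additional sketch (not from the original examples; it assumes that a plain list of resampling objects is acceptable input to as.represampling): manually combine two resampling objects into a represampling object.

cv1 <- partition_cv(ecuador)[[1]] # a single resampling object
cv2 <- partition_cv(ecuador)[[1]]
reparti <- as.represampling(list(cv1, cv2))
is_represampling(reparti) # TRUE
summary(reparti)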
Create/coerce and print resampling objects, e.g., partitionings or bootstrap samples derived from a data set.
as.resampling(object, ...)

## Default S3 method:
as.resampling(object, ...)

## S3 method for class 'factor'
as.resampling(object, ...)

## S3 method for class 'list'
as.resampling(object, ...)

validate.resampling(object)

is.resampling(x, ...)

## S3 method for class 'resampling'
print(x, ...)
object |
depending on the function/method, a list or a vector of type factor defining a partitioning of the dataset. |
... |
currently not used. |
x |
object of class |
A resampling object is a list of lists defining a set of training and test samples.
In the case of k-fold cross-validation partitioning, for example, the corresponding resampling object would be of length k, i.e. contain k lists. Each of these k lists defines a training set of size n(k-1)/k (where n is the overall sample size) and a test set of size n/k. The resampling object does, however, not contain the data itself, but only indices between 1 and n identifying the selection (see Examples).
Another example is bootstrap resampling. represampling_bootstrap with argument oob = TRUE generates rep resampling objects with indices of a bootstrap sample in the train component and indices of the out-of-bag sample in the test component (see Examples below).
as.resampling.factor: For each factor level of the input variable, as.resampling.factor determines the indices of samples in this level (= test samples) and outside this level (= training samples). Empty levels of object are dropped without warning.
as.resampling.list checks if the list in object has a valid resampling object structure (with components train and test etc.) and assigns the class attribute 'resampling' if successful.
as.resampling methods: An object of class resampling.
represampling, partition_cv, partition_kmeans, represampling_bootstrap, etc.
# Muenchow et al. (2012), see ?ecuador
# Partitioning by elevation classes in 200 m steps:
parti <- factor(as.character(floor(ecuador$dem / 200)))
smp <- as.resampling(parti)
summary(smp)
# Compare:
summary(parti)

# k-fold (non-spatial) cross-validation partitioning:
parti <- partition_cv(ecuador)
parti <- parti[[1]] # the first (and only) resampling object in parti
# data corresponding to the test sample of the first fold:
str(ecuador[parti[[1]]$test, ])
# the corresponding training sample - larger:
str(ecuador[parti[[1]]$train, ])

# Bootstrap training sets, out-of-bag test sets:
parti <- represampling_bootstrap(ecuador, oob = TRUE)
parti <- parti[[1]] # the first (and only) resampling object in parti
# out-of-bag test sample: approx. one-third of nrow(ecuador):
str(ecuador[parti[[1]]$test, ])
# bootstrap training sample: same size as nrow(ecuador):
str(ecuador[parti[[1]]$train, ])
Functions for generating and handling alphanumeric tile names of
the form 'X2:Y7'
as used by partition_tiles and
represampling_tile_bootstrap.
as.tilename(x, ...)

## S3 method for class 'numeric'
as.tilename(x, ...)

## S3 method for class 'tilename'
as.character(x, ...)

## S3 method for class 'tilename'
as.numeric(x, ...)

## S3 method for class 'character'
as.tilename(x, ...)

## S3 method for class 'tilename'
print(x, ...)
x |
object of class |
... |
additional arguments (currently ignored). |
object of class tilename
, character
, or numeric vector of length
2
partition_tiles, represampling, represampling_tile_bootstrap
tnm <- as.tilename(c(2, 3))
tnm # 'X2:Y3'
as.numeric(tnm) # c(2, 3)
dataset_distance
calculates Euclidean nearest-neighbour
distances between two point datasets and summarizes these distances using
some function, by default the mean.
dataset_distance( d1, d2, x_name = "x", y_name = "y", fun = mean, method = "euclidean", ... )
d1 |
a |
d2 |
see |
x_name |
name of column in |
y_name |
same for y coordinates |
fun |
function to be applied to the vector of nearest-neighbor
distances of |
method |
type of distance metric to be used; only |
... |
additional arguments to |
Nearest-neighbour distances are calculated for each point in d1, resulting in a vector of length nrow(d1), and fun is applied to this vector.
Depends on fun; typically (e.g., mean) a numeric vector of length 1.
df <- data.frame(x = rnorm(100), y = rnorm(100))
dataset_distance(df, df) # == 0
Calculate a variety of accuracy measures from observations and predictions of numerical and categorical response variables.
err_default(obs, pred)
obs |
factor, logical, or numeric vector with observations |
pred |
factor, logical, or numeric vector with predictions. Must be of
same type as |
A list with (currently) the following components, depending on the type of prediction problem:
'hard' classification: Misclassification error, overall accuracy; if two classes, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), kappa
'soft' classification: area under the ROC curve, error and accuracy at an obs > 0.5 dichotomization, false-positive rate (FPR; 1 - specificity) at 70, 80 and 90 percent sensitivity, and true-positive rate (sensitivity) at 80, 90 and 95 percent specificity.
regression: Bias, standard deviation, mean squared error, MAD (mad), median, interquartile range (IQR) of residuals
NA
values are currently not handled by this function,
i.e. they will result in an error.
ROCR
obs <- rnorm(1000)
# Two mock (soft) classification examples:
err_default(obs > 0, rnorm(1000)) # just noise
err_default(obs > 0, obs + rnorm(1000)) # some discrimination
# Three mock regression examples:
err_default(obs, rnorm(1000)) # just noise, but no bias
err_default(obs, obs + rnorm(1000)) # some association, no bias
err_default(obs, obs + 1) # perfect correlation, but with bias
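For completeness, a hedged 'hard' classification sketch (not from the original examples), with factor observations and predictions of the same type:

obs_f <- factor(sample(c("A", "B"), 1000, replace = TRUE))
pred_f <- factor(sample(c("A", "B"), 1000, replace = TRUE))
err_default(obs_f, pred_f) # misclassification error close to 0.5 for random predictions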
get_small_tiles
identifies partitions (tiles) that are too
small according to some defined criterion / criteria (minimum number of
samples in tile and/or minimum fraction of entire dataset).
get_small_tiles(tile, min_n = NULL, min_frac = 0, ignore = c())
tile |
factor: tile/partition names for all samples; names must be
coercible to class tilename, i.e. of the form |
min_n |
integer (optional): minimum number of samples per partition. |
min_frac |
numeric >0, <1: minimum relative size of partition as percentage of sample. |
ignore |
character vector: names of tiles to be ignored, i.e. to be retained even if the inclusion criteria are not met. |
character vector: names of tiles that are considered 'small' according to these criteria
# Muenchow et al. (2012), see ?ecuador
# Rectangular partitioning without removal of small tiles:
parti <- partition_tiles(ecuador, nsplit = c(10, 10), reassign = FALSE)
summary(parti)
length(parti[[1]])
# Same in factor format for the application of get_small_tiles:
parti_fac <- partition_tiles(ecuador,
  nsplit = c(10, 10), reassign = FALSE,
  return_factor = TRUE
)
get_small_tiles(parti_fac[[1]], min_n = 20) # tiles with less than 20 samples
parti2 <- partition_tiles(ecuador,
  nsplit = c(10, 10), reassign = TRUE,
  min_n = 20, min_frac = 0
)
length(parti2[[1]]) # < length(parti[[1]])
partition_cv creates a represampling object for length(repetition)-repeated nfold-fold cross-validation.
partition_cv( data, coords = c("x", "y"), nfold = 10, repetition = 1, seed1 = NULL, return_factor = FALSE )
data |
|
coords |
(ignored by |
nfold |
number of partitions (folds) in |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
seed1 |
|
return_factor |
if |
This function does not actually perform a cross-validation or partition the data set itself; it simply creates a data structure containing the indices of training and test samples.
If return_factor = FALSE (the default), a represampling object. Specifically, this is a (named) list of length(repetition) resampling objects. Each of these resampling objects is a list of length nfold corresponding to the folds. Each fold is represented by a list containing the components train and test, which specify the indices of training and test samples (row indices for data). If return_factor = TRUE (mainly used internally), a (named) list of length length(repetition). Each component of this list is a vector of length nrow(data) of type factor, specifying for each sample the fold to which it belongs. The factor levels are factor(1:nfold).
data(ecuador)
## non-spatial cross-validation:
resamp <- partition_cv(ecuador, nfold = 5, repetition = 5)
# plot(resamp, ecuador)
# first repetition, second fold, test set indices:
idx <- resamp[["1"]][[2]]$test
# test sample used in this particular repetition and fold:
ecuador[idx, ]
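A hedged sketch (not from the original examples) of the return_factor = TRUE output described above:

fac <- partition_cv(ecuador, nfold = 5, repetition = 1, return_factor = TRUE)
summary(fac[[1]]) # number of samples assigned to each of the 5 folds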
partition_cv_strat
creates a set of sample indices
corresponding to cross-validation test and training sets.
partition_cv_strat( data, coords = c("x", "y"), nfold = 10, return_factor = FALSE, repetition = 1, seed1 = NULL, strat )
data |
|
coords |
vector of length 2 defining the variables in |
nfold |
number of partitions (folds) in |
return_factor |
if |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
seed1 |
|
strat |
character: column in |
A represampling object, see also partition_cv().
partition_cv_strat works like partition_cv, but is stratified with respect to the variable data[, strat]; i.e., cross-validation partitioning is done within each set data[data[, strat] == i, ] (i in levels(data[, strat])), and the i-th folds of all levels are combined into one cross-validation fold.
sperrorest()
, as.resampling()
, resample_strat_uniform()
data(ecuador)
parti <- partition_cv_strat(ecuador,
  strat = "slides", nfold = 5,
  repetition = 1
)
idx <- parti[["1"]][[1]]$train
mean(ecuador$slides[idx] == "TRUE") / mean(ecuador$slides == "TRUE")
# always == 1

# Non-stratified cross-validation:
parti <- partition_cv(ecuador, nfold = 5, repetition = 1)
idx <- parti[["1"]][[1]]$train
mean(ecuador$slides[idx] == "TRUE") / mean(ecuador$slides == "TRUE")
# close to 1 because of large sample size, but with some random variation
partition_disc partitions the sample into training and test sets by selecting circular test areas (possibly surrounded by an exclusion buffer) and using the remaining samples as training samples (leave-one-disc-out cross-validation). partition_loo creates training and test sets for leave-one-out cross-validation with an (optional) buffer.
partition_disc( data, coords = c("x", "y"), radius, buffer = 0, ndisc = nrow(data), seed1 = NULL, return_train = TRUE, prob = NULL, replace = FALSE, repetition = 1 )

partition_loo(data, ndisc = nrow(data), replace = FALSE, ...)
data |
|
coords |
vector of length 2 defining the variables in |
radius |
radius of test area discs; performs leave-one-out resampling if radius <0. |
buffer |
radius of additional 'neutral area' around test area discs that is excluded from training and test sets; defaults to 0, i.e. all samples are either in the test area or in the training area. |
ndisc |
Number of discs to be randomly selected; each disc constitutes a
separate test set. Defaults to |
seed1 |
|
return_train |
If |
prob |
optional argument to sample. |
replace |
optional argument to sample: sampling with or without replacement? |
repetition |
see |
... |
arguments to be passed to |
A represampling object. Contains length(repetition)
resampling
objects. Each of these contains ndisc
lists with indices of test and (if
return_train = TRUE
) training sets.
Test area discs are centered at (random) samples, not at general
random locations. Test area discs may (and likely will) overlap independently
of the value of replace
. replace
only controls the replacement
of the center point of discs when drawing center points from the samples.
radius < 0
does leave-one-out resampling with an optional buffer.
radius = 0
is similar except that samples with identical coordinates
would fall within the test area disc.
Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.
sperrorest, partition_cv, partition_kmeans
data(ecuador)
parti <- partition_disc(ecuador,
  radius = 200, buffer = 200,
  ndisc = 5, repetition = 1:2
)
# plot(parti, ecuador)
summary(parti)

# leave-one-out with buffer:
parti.loo <- partition_loo(ecuador, buffer = 200)
summary(parti.loo)
partition_factor
creates a represampling object, i.e. a set
of sample indices defining cross-validation test and training sets.
partition_factor( data, coords = c("x", "y"), fac, return_factor = FALSE, repetition = 1 )
data |
|
coords |
vector of length 2 defining the variables in |
fac |
either the name of a variable (column) in |
return_factor |
if |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
A represampling object, see also partition_cv for details.
In this partitioning approach, all repetitions are identical and therefore pseudo-replications.
sperrorest, partition_cv, as.resampling.factor
data(ecuador)
# I don't recommend using this partitioning for cross-validation,
# this is only for demonstration purposes:
breaks <- quantile(ecuador$dem, seq(0, 1, length = 6))
ecuador$zclass <- cut(ecuador$dem, breaks, include.lowest = TRUE)
summary(ecuador$zclass)
parti <- partition_factor(ecuador, fac = "zclass")
# plot(parti, ecuador)
summary(parti)
partition_factor_cv
creates a represampling object, i.e. a
set of sample indices defining cross-validation test and training sets,
where partitions are obtained by resampling at the level of groups of
observations as defined by a given factor variable. This can be used, for
example, to resample agricultural data that is grouped by fields, at the
agricultural field level in order to preserve spatial autocorrelation
within fields.
partition_factor_cv( data, coords = c("x", "y"), fac, nfold = 10, repetition = 1, seed1 = NULL, return_factor = FALSE )
data |
|
coords |
vector of length 2 defining the variables in |
fac |
either the name of a variable (column) in |
nfold |
number of partitions (folds) in |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
seed1 |
|
return_factor |
if |
A represampling object, see also partition_cv for details.
In this partitioning approach, the number of factor levels in fac
must be large enough for this factor-level resampling to make sense.
sperrorest, partition_cv, partition_factor, as.resampling.factor
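This topic has no worked example; the following hedged sketch (the elevation-band grouping variable is constructed here purely for illustration) shows the intended use:

data(ecuador)
# group observations into 100 m elevation bands:
ecuador$zclass <- factor(floor(ecuador$dem / 100))
parti <- partition_factor_cv(ecuador, fac = "zclass", nfold = 5, repetition = 1)
summary(parti)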
partition_kmeans
divides the study area into irregularly
shaped spatial partitions based on k-means (kmeans) clustering of
spatial coordinates.
partition_kmeans( data, coords = c("x", "y"), nfold = 10, repetition = 1, seed1 = NULL, return_factor = FALSE, balancing_steps = 1, order_clusters = TRUE, ... )
data |
|
coords |
vector of length 2 defining the variables in |
nfold |
number of cross-validation folds, i.e. parameter k in k-means clustering. |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
seed1 |
|
return_factor |
if |
balancing_steps |
if |
order_clusters |
if |
... |
additional arguments to kmeans. |
A represampling object, see also partition_cv for details.
Default parameter settings may change in future releases.
Brenning, A., Long, S., & Fieguth, P. (2012). Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Remote Sensing of Environment, 125, 227-237. doi:10.1016/j.rse.2012.07.005
Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.
sperrorest, partition_cv, partition_disc, partition_tiles, kmeans
data(ecuador)
resamp <- partition_kmeans(ecuador, nfold = 5, repetition = 2)
# plot(resamp, ecuador)
partition_tiles
divides the study area into a specified number
of rectangular tiles. Optionally small partitions can be merged with
adjacent tiles to achieve a minimum number or percentage of samples in each
tile.
partition_tiles( data, coords = c("x", "y"), dsplit = NULL, nsplit = NULL, rotation = c("none", "random", "user"), user_rotation, offset = c("none", "random", "user"), user_offset, reassign = TRUE, min_frac = 0.025, min_n = 5, iterate = 1, return_factor = FALSE, repetition = 1, seed1 = NULL )
data |
|
coords |
vector of length 2 defining the variables in |
dsplit |
optional vector of length 2: equidistance of splits in
(possibly rotated) x direction ( |
nsplit |
optional vector of length 2: number of splits in (possibly
rotated) x direction ( |
rotation |
indicates whether and how the rectangular grid should be
rotated; random rotation is only between |
user_rotation |
if |
offset |
indicates whether and how the rectangular grid should be shifted by an offset. |
user_offset |
if |
reassign |
logical (default |
min_frac |
numeric >=0, <1: minimum relative size of partition as
percentage of sample; argument passed to get_small_tiles. Will be ignored
if |
min_n |
integer >=0: minimum number of samples per partition; argument
passed to get_small_tiles. Will be ignored if |
iterate |
argument to be passed to tile_neighbors |
return_factor |
if |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
seed1 |
|
A represampling object. Contains length(repetition)
resampling
objects as repetitions. The exact number of folds / test-set tiles within
each resampling objects depends on the spatial configuration of the data
set and possible cleaning steps (see min_frac
, min_n
).
Default parameter settings may change in future releases. This function, especially its rotation and shifting part and the algorithm for cleaning up small tiles, is still somewhat experimental; use with caution. For non-zero offsets (offset != 'none'), the number of tiles may actually be greater than nsplit[1]*nsplit[2] because of fractional tiles protruding into the study region. reassign = TRUE with suitable thresholds is therefore recommended for non-zero (including random) offsets.
sperrorest, as.resampling.factor, get_small_tiles, tile_neighbors
data(ecuador)
set.seed(42)
parti <- partition_tiles(ecuador, nsplit = c(4, 3), reassign = FALSE)
# plot(parti, ecuador) # tile A4 has only 55 samples
# same partitioning, but now merge tiles with less than 100 samples to
# adjacent tiles:
parti2 <- partition_tiles(ecuador,
  nsplit = c(4, 3), reassign = TRUE,
  min_n = 100
)
# plot(parti2, ecuador)
summary(parti2)
# tile B4 (in 'parti') was smaller than A3, therefore A4 was merged with B4,
# not with A3
# now with random rotation and offset, and tiles of 2000 m length:
parti3 <- partition_tiles(ecuador,
  dsplit = 2000, offset = "random",
  rotation = "random", reassign = TRUE, min_n = 100
)
# plot(parti3, ecuador)
summary(parti3)
plot.represampling displays the partitions or samples arising from the resampling of a data set.
## S3 method for class 'represampling'
plot(x, data, coords = c("x", "y"), pch = "+", wiggle_sd = 0, ...)

## S3 method for class 'resampling'
plot(x, ...)
x |
a represampling resp. resampling object. |
data |
a |
coords |
vector of length 2 defining the variables in |
pch |
point symbol (to be passed to points). |
wiggle_sd |
'wiggle' the point locations in x and y direction to avoid overplotting of samples drawn multiple times by bootstrap methods; this is a standard deviation (in the units of the x/y coordinates) of a normal distribution and defaults to 0 (no wiggling). |
... |
additional arguments to plot. |
This function is not intended for samples obtained by resampling with replacement (e.g., bootstrap) because training and test points will be overplotted in that case. The size of the plotting region will also limit the number of maps that can be displayed at once, i.e., the number of rows (repetitions) and fields (columns).
data(ecuador)
# non-spatial cross-validation:
resamp <- partition_cv(ecuador, nfold = 5, repetition = 1:2)
# plot(resamp, ecuador)
# spatial cross-validation using k-means clustering:
resamp <- partition_kmeans(ecuador, nfold = 5, repetition = 1:2)
# plot(resamp, ecuador)
represampling_bootstrap
draws a bootstrap random sample (with
replacement) from data
.
represampling_bootstrap( data, coords = c("x", "y"), nboot = nrow(data), repetition = 1, seed1 = NULL, oob = FALSE )
data |
|
coords |
vector of length 2 defining the variables in |
nboot |
Size of bootstrap sample |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
seed1 |
|
oob |
logical (default |
A represampling object. This is a (named) list containing length(repetition) resampling objects. Each of these contains only one list with indices of training and test samples. Indices are row indices for data.
data(ecuador)
# only 10 bootstrap repetitions, normally use >=100:
parti <- represampling_bootstrap(ecuador, repetition = 10)
# plot(parti, ecuador) # careful: overplotting occurs
# because some samples are included in both the training and
# the test sample (possibly even multiple times)
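A hedged sketch (not from the original examples) of the out-of-bag variant, where the test set consists of the samples not drawn into the bootstrap training sample:

parti_oob <- represampling_bootstrap(ecuador, repetition = 1, oob = TRUE)
# out-of-bag test sample, roughly one third of nrow(ecuador):
length(parti_oob[[1]][[1]]$test)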
represampling_disc_bootstrap
performs a spatial block
bootstrap by resampling at the level of rectangular partitions or 'tiles'
generated by partition_tiles
.
represampling_disc_bootstrap( data, coords = c("x", "y"), nboot, repetition = 1, seed1 = NULL, oob = FALSE, ... )
data |
|
coords |
vector of length 2 defining the variables in |
nboot |
number of bootstrap samples; you may specify different values
for the training sample ( |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
seed1 |
|
oob |
logical (default |
... |
additional arguments to be passed to partition_disc; note that a
|
Performs nboot
out of nrow(data)
resampling of circular discs. This
is an overlapping spatial block bootstrap where the blocks are circular.
data(ecuador)
# Overlapping disc bootstrap:
parti <- represampling_disc_bootstrap(ecuador,
  radius = 200, nboot = 20,
  oob = FALSE
)
# plot(parti, ecuador)
# Note that a 'buffer' argument would make no difference because bootstrap
# sets of discs are drawn independently for the training and test sample.
#
# Overlapping disc bootstrap for training sample, out-of-bag sample as test
# sample:
parti <- represampling_disc_bootstrap(ecuador,
  radius = 200, buffer = 200,
  nboot = 10, oob = TRUE
)
# plot(parti, ecuador)
represampling_factor_bootstrap
resamples partitions defined by
a factor variable. This can be used for non-overlapping block bootstraps
and similar.
represampling_factor_bootstrap( data, fac, repetition = 1, nboot = -1, seed1 = NULL, oob = FALSE )
data |
|
fac |
defines a grouping or partitioning of the samples in |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
nboot |
number of bootstrap replications used for generating the
bootstrap training sample ( |
seed1 |
|
oob |
if |
nboot
refers to the number of groups (as defined by the factors)
to be drawn with replacement from the set of groups. I.e., if fac
is a
factor variable, nboot
would normally not be greater than nlevels(fac)
,
nlevels(fac)
being the default as per nboot = -1
.
represampling_disc_bootstrap, represampling_tile_bootstrap
data(ecuador)
# a dummy example for demonstration, performing bootstrap
# at the level of an arbitrary factor variable:
parti <- represampling_factor_bootstrap(ecuador,
  factor(floor(ecuador$dem / 100)),
  oob = TRUE
)
# plot(parti, ecuador)
# using the factor bootstrap for a non-overlapping block bootstrap
# (see also represampling_tile_bootstrap):
fac <- partition_tiles(ecuador,
  return_factor = TRUE, repetition = c(1:3),
  dsplit = 500, min_n = 200, rotation = "random", offset = "random"
)
parti <- represampling_factor_bootstrap(ecuador, fac,
  oob = TRUE,
  repetition = c(1:3)
)
# plot(parti, ecuador)
represampling_tile_bootstrap
performs a non-overlapping
spatial block bootstrap by resampling at the level of rectangular
partitions or 'tiles' generated by partition_tiles.
represampling_tile_bootstrap( data, coords = c("x", "y"), repetition = 1, nboot = -1, seed1 = NULL, oob = FALSE, ... )
data |
|
coords |
vector of length 2 defining the variables in |
repetition |
numeric vector: cross-validation repetitions to be
generated. Note that this is not the number of repetitions, but the indices
of these repetitions. E.g., use |
nboot |
|
seed1 |
|
oob |
|
... |
additional arguments to be passed to partition_tiles |
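This topic has no worked example; the following hedged sketch illustrates a non-overlapping spatial block bootstrap, with the tile-related arguments (dsplit, min_n) passed on to partition_tiles via ...:

data(ecuador)
parti <- represampling_tile_bootstrap(ecuador,
  repetition = 1, oob = TRUE,
  dsplit = 500, min_n = 100
)
summary(parti)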
resample_factor
draws a random (sub)sample (with or without
replacement) of the groups or clusters identified by the fac
argument.
resample_factor(data, param = list(fac = "class", n = Inf, replace = FALSE))
data |
a |
param |
a list with the following components: |
If param$replace=FALSE
, a subsample of
min(param$n,nlevel(data[,fac]))
groups will be drawn from data
. If
param$replace=TRUE
, the number of groups to be drawn is param$n
.
a data.frame
containing a subset of the rows of data
.
resample_strat_uniform()
, sample()
resample_strat_uniform
draws a stratified random sample (with
or without replacement) from the samples in data
. Stratification is over
the levels of data[, param$response]
. The same number of samples is drawn
within each level.
resample_strat_uniform( data, param = list(strat = "class", nstrat = Inf, replace = FALSE) )
data |
a |
param |
a list with the following components: |
If param$replace=FALSE
, a subsample of size
min(param$n,nrow(data))
will be drawn from data
. If param$replace=TRUE
,
the size of the subsample is param$n
.
a data.frame
containing a subset of the rows of data
.
data(ecuador) # Muenchow et al. (2012), see ?ecuador
d <- resample_strat_uniform(ecuador,
  param = list(strat = "slides", nstrat = 100)
)
nrow(d) # == 200
sum(d$slides == "TRUE") # == 100
resample_uniform
draws a random (sub)sample (with or without
replacement) from the samples in data
.
resample_uniform(data, param = list(n = Inf, replace = FALSE))
data |
a |
param |
a list with the following components: |
If param$replace=FALSE
, a subsample of size
min(param$n,nrow(data))
will be drawn from data
. If
param$replace=TRUE
, the size of the subsample is param$n
.
a data.frame
containing a subset of the rows of data
.
resample_strat_uniform()
, sample()
# Muenchow et al. (2012), see ?ecuador
d <- resample_uniform(ecuador, param = list(strat = "slides", n = 200)) # nrow(d) == 200
sum(d$slides == "TRUE")
sperrorest is a flexible interface for multiple types of parallelized spatial and non-spatial cross-validation and bootstrap error estimation and parallelized permutation-based assessment of spatial variable importance.
sperrorest( formula, data, coords = c("x", "y"), model_fun, model_args = list(), pred_fun = NULL, pred_args = list(), smp_fun = partition_cv, smp_args = list(), train_fun = NULL, train_param = NULL, test_fun = NULL, test_param = NULL, err_fun = err_default, imp_variables = NULL, imp_permutations = 1000, imp_sample_from = c("test", "train", "all"), importance = !is.null(imp_variables), distance = FALSE, do_gc = 1, progress = "all", benchmark = FALSE, mode_rep = c("future", "sequential", "loop"), mode_fold = c("sequential", "future", "loop"), verbose = 0 )
formula |
A formula specifying the variables used by the |
data |
a |
coords |
vector of length 2 defining the variables in |
model_fun |
Function that fits a predictive model, such as |
model_args |
Arguments to be passed to |
pred_fun |
Prediction function for a fitted model object created by
|
pred_args |
(optional) Arguments to |
smp_fun |
A function for sampling training and test sets from |
smp_args |
(optional) Arguments to be passed to |
train_fun |
(optional) A function for resampling or subsampling the training sample in order to achieve, e.g., uniform sample sizes on all training sets, or maintaining a certain ratio of positives and negatives in training sets. E.g. resample_uniform or resample_strat_uniform. |
train_param |
(optional) Arguments to be passed to |
test_fun |
(optional) Like |
test_param |
(optional) Arguments to be passed to |
err_fun |
A function that calculates selected error measures from the
known responses in |
imp_variables |
(optional; used if |
imp_permutations |
(optional; used if |
imp_sample_from |
(default: |
importance |
logical (default: |
distance |
logical (default: |
do_gc |
numeric (default: 1): defines frequency of memory garbage
collection by calling gc; if |
progress |
character (default: |
benchmark |
(optional) logical (default: |
mode_rep , mode_fold
|
character (default: |
verbose |
Controls the amount of information printed while processing. Defaults to 0 (no output). |
Custom predict functions passed to pred_fun
, which consist of
multiple child functions, must be defined in one function.
A list (object of class sperrorest) with (up to) six components:
error_rep: sperrorestreperror
containing
predictive performances at the repetition level
error_fold: sperroresterror
object containing predictive
performances at the fold level
represampling: represampling object
importance: sperrorestimportance
object containing
permutation-based variable importances at the fold level
benchmark: sperrorestbenchmark
object containing
information on the system the code is running on, starting and
finishing times, number of available CPU cores and runtime performance
package_version: sperrorestpackageversion
object containing
information about the sperrorest package version
Running in parallel is supported via package future.
Have a look at vignette("future-1-overview", package = "future")
.
In short: Choose a backend and specify the number of workers, then call
sperrorest()
as usual. Example:
future::plan(future.callr::callr, workers = 2)
sperrorest()
Parallelization at the repetition level is recommended when using repeated cross-validation. If the 'granularity' of parallelized function calls is too fine, the overall runtime will suffer because the overhead for passing arguments and handling environments becomes too large. Use fold-level parallelization only when the processing time of individual folds is very large and the number of repetitions is small or equals 1.
Note that nested calls to future
are not possible.
Therefore a sequential sperrorest
call should be used for
hyperparameter tuning in a nested cross-validation.
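A hedged sketch of a parallel run (it assumes the future and future.callr packages are available; a logistic regression is used only to keep the example small):

library(future)
plan(future.callr::callr, workers = 2)
res <- sperrorest(
  formula = slides ~ dem + slope + hcurv + vcurv,
  data = ecuador,
  model_fun = glm, model_args = list(family = "binomial"),
  pred_args = list(type = "response"),
  smp_fun = partition_kmeans,
  smp_args = list(repetition = 1:4, nfold = 5),
  mode_rep = "future"
)
plan(sequential)
summary(res$error_rep)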
Brenning, A. 2012. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23-27 July 2012, p. 5372-5375. https://ieeexplore.ieee.org/document/6352393
Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6), 853-862. doi:10.5194/nhess-5-853-2005
Brenning, A., S. Long & P. Fieguth. 2012. Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Remote Sensing of Environment, 125, 227-237. doi:10.1016/j.rse.2012.07.005
Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.
Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for yield prediction in Precision Agriculture. In Advances in Intelligent Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, Tucson, AZ, USA, 19-21 May 2010. Lecture Notes in Computer Science, 6065 LNCS: 184-195.
## ------------------------------------------------------------
## Classification tree example using non-spatial partitioning
## ------------------------------------------------------------

# Muenchow et al. (2012), see ?ecuador
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope

library(rpart)
mypred_part <- function(object, newdata) predict(object, newdata)[, 2]
ctrl <- rpart.control(cp = 0.005) # show the effects of overfitting
fit <- rpart(fo, data = ecuador, control = ctrl)

### Non-spatial cross-validation:
nsp_res <- sperrorest(
  data = ecuador, formula = fo,
  model_fun = rpart,
  model_args = list(control = ctrl),
  pred_fun = mypred_part,
  progress = TRUE,
  smp_fun = partition_cv,
  smp_args = list(repetition = 1:2, nfold = 3)
)
summary(nsp_res$error_rep)
summary(nsp_res$error_fold)
summary(nsp_res$represampling)
# plot(nsp_res$represampling, ecuador)

### Spatial cross-validation:
sp_res <- sperrorest(
  data = ecuador, formula = fo,
  model_fun = rpart,
  model_args = list(control = ctrl),
  pred_fun = mypred_part,
  progress = TRUE,
  smp_fun = partition_kmeans,
  smp_args = list(repetition = 1:2, nfold = 3)
)
summary(sp_res$error_rep)
summary(sp_res$error_fold)
summary(sp_res$represampling)
# plot(sp_res$represampling, ecuador)

smry <- data.frame(
  nonspat_training = unlist(summary(nsp_res$error_rep, level = 1)$train_auroc),
  nonspat_test = unlist(summary(nsp_res$error_rep, level = 1)$test_auroc),
  spatial_training = unlist(summary(sp_res$error_rep, level = 1)$train_auroc),
  spatial_test = unlist(summary(sp_res$error_rep, level = 1)$test_auroc)
)
boxplot(smry,
  col = c("red", "red", "red", "green"),
  main = "Training vs. test, nonspatial vs. spatial",
  ylab = "Area under the ROC curve"
)
Calculates sample sizes of training and test sets within
repetitions and folds of a resampling
or represampling
object.
## S3 method for class 'represampling'
summary(object, ...)

## S3 method for class 'resampling'
summary(object, ...)
object |
A |
... |
currently ignored. |
A list of data.frame
s summarizing the sample sizes of training and
test sets in each fold of each repetition.
summary.sperroresterror
calculates mean, standard deviation,
median etc. of the calculated error measures at the specified level
(overall, repetition, or fold). summary.sperrorestreperror
does the same
with the pooled error, at the overall or repetition level.
## S3 method for class 'sperroresterror'
summary(object, level = 0, pooled = TRUE, na.rm = TRUE, ...)
object |
|
level |
Level at which errors are summarized: 0: overall (i.e. across all repetitions); 1: repetition; 2: fold |
pooled |
If |
na.rm |
Remove |
... |
additional arguments (currently ignored) |
Let's use an example to explain the pooled argument. Assume we are using 100-repeated 10-fold cross-validation. If pooled = TRUE (the default), the mean and standard deviation calculated when summarizing at level = 0 are calculated across the error estimates obtained for each of the 100 * 10 = 1000 folds. If pooled = FALSE, mean and standard deviation are calculated across the 100 repetitions, using the weighted average of the fold-level errors to calculate an error value for the entire sample. This essentially does not affect the mean value, but it does affect the standard deviation of the error. pooled = FALSE is not recommended; it is mainly intended for testing purposes. When the test sets are small (as in leave-one-out cross-validation, in the extreme case), consider examining only the error_rep component of the sperrorest result.
Depending on the level of aggregation, a list
or data.frame
with
mean, and at level 0 also standard deviation, median and IQR of the error
measures.
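A hedged usage sketch (it assumes nsp_res is a sperrorest result as in the example under ?sperrorest):

# summary(nsp_res$error_fold)             # level = 0: overall mean, sd, median, IQR
# summary(nsp_res$error_fold, level = 1)  # summarized per repetition
# summary(nsp_res$error_fold, level = 2)  # one row per fold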
summary.sperrorestimportance calculates mean, standard deviation, median etc. of the permutation-based variable importance measures at the specified level (overall, repetition, or fold).
## S3 method for class 'sperrorestimportance'
summary(object, level = 0, na.rm = TRUE, which = NULL, ...)
object |
|
level |
Level at which errors are summarized: 0: overall (i.e. across all repetitions); 1: repetition; 2: fold |
na.rm |
Remove |
which |
optional character vector specifying selected variables for which the importances should be summarized |
... |
additional arguments (currently ignored) |
a list or data.frame, depending on the level
of aggregation
Summary methods provide varying levels of detail, while print methods provide full details.
## S3 method for class 'sperrorestreperror'
summary(object, level = 0, na.rm = TRUE, ...)

## S3 method for class 'sperrorest'
summary(object, ...)

## S3 method for class 'sperrorestimportance'
print(x, ...)

## S3 method for class 'sperroresterror'
print(x, ...)

## S3 method for class 'sperrorestreperror'
print(x, ...)

## S3 method for class 'sperrorest'
print(x, ...)

## S3 method for class 'sperrorestbenchmarks'
print(x, ...)

## S3 method for class 'sperrorestpackageversion'
print(x, ...)
object |
a sperrorest object |
level |
Level at which errors are summarized: 0: overall (i.e. across all repetitions); 1: repetition; 2: fold |
na.rm |
Remove |
... |
additional arguments for summary.sperroresterror or summary.sperrorestimportance |
x |
Depending on method, a sperrorest, |
sperrorest, summary.sperroresterror, summary.sperrorestimportance
tile_neighbors determines the names of a tile's neighbouring tiles; this is based on 'counting' up and down from the tile name.
tile_neighbors(nm, tileset, iterate = 0, diagonal = FALSE)
nm |
Character string or factor: name of a tile, e.g., |
tileset |
Admissible tile names; if missing and |
iterate |
internal - do not change default: to control behaviour in an interactive call to this function. |
diagonal |
if |
Character string.
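A hedged sketch (not from the original examples): neighbours of tile 'X2:Y2' within a 4 x 3 grid of admissible tile names.

tiles <- paste0("X", rep(1:4, each = 3), ":Y", rep(1:3, times = 4))
tile_neighbors("X2:Y2", tileset = tiles)
# expected to contain 'X1:Y2', 'X3:Y2', 'X2:Y1' and 'X2:Y3'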