autoSTK provides automatic spatio-temporal Kriging inspired by
automap (Hiemstra et al. 2010). It extends the automap workflow to
spatio-temporal variogram fitting, Kriging prediction, and cross-validation
for common R spatio-temporal data classes.
remotes::install_github("sigmafelix/autoSTK")Before installing, make sure the core spatial dependencies are available:
gstat, spacetime, sp, sf, sftime, and automap. The cubble
package is suggested and is only required when working directly with
cubble_df objects.
- Automatic spatio-temporal variogram fitting with
autofitVariogramST(). - Automatic spatio-temporal Kriging with
autoKrigeST(). - Explicit fit/predict workflow with
fitVariogramST()andpredictKrigeST(). - Cross-validation with
autoKrigeST.cv()using spatial, temporal, spatio-temporal, or random folds. - Support for
STFDF,STSDF,STIDF,sftime, andcubbleinputs in the main fitting and prediction workflow. - Conversion helper
cubble_to_sftime()for nested or temporalcubbleobjects. - Multiple spatio-temporal variogram structures, including
sumMetric,metric,productSum, andseparable. - Multiple optimizers, including
lbfgsb,grid,sa, andga.
autofitVariogramST(): automatically fits a spatio-temporal variogram.fitVariogramST(): fits a variogram and returns a reusableSTVariogramFitobject.predictKrigeST(): predicts from a fittedSTVariogramFitobject.autoKrigeST(): fits the variogram and predicts in one call.autoKrigeST.cv(): cross-validates spatio-temporal Kriging results.cubble_to_sftime(): convertscubble_dfobjects tosftime.
The main Kriging workflow accepts both legacy spacetime classes and modern
simple-feature classes:
| Input class | Typical use |
|---|---|
STFDF |
Full space-time grids, such as all stations observed at all dates. |
STSDF |
Sparse space-time grids, such as missing station-date observations. |
STIDF |
Irregular space-time observations. |
sftime |
Simple-feature workflows with one geometry and one time column. |
cubble_df |
Tidy spatio-temporal data stored in spatial and temporal faces. |
Internally, autoSTK converts modern inputs to the spacetime classes
required by gstat::variogramST() and gstat::krigeST().
Use autoKrigeST() when you have repeated measurements at monitoring
stations and want predictions on a future spatio-temporal grid. This is the
classic STFDF/STSDF workflow.
library(autoSTK)
library(gstat)
library(sp)
library(spacetime)
library(stars)
data(air)
deair <- STFDF(stations, dates, data.frame(PM10 = as.vector(air)))
deair_sf <- st_as_stars(deair) |>
st_transform("+proj=longlat +ellps=sphere") |>
st_transform(3857)
deair_r <- as(deair_sf, "STFDF")
deair_r@sp@proj4string <- CRS(
"+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs +type=crs"
)
deair_rs <- deair_r[, 3701:3800]
deair_rss <- as(deair_rs, "STSDF")
akst_stk <- autoKrigeST(
formula = PM10 ~ 1,
input_data = deair_rss,
cutoff = 300000,
width = 30000,
type_stv = "sumMetric",
model = c("Exc", "Mat", "Ste", "Exp", "Wav"),
tlags = 0:7,
cores = 8,
optimizer = "sa",
verbose = TRUE
)
akst_stk_stars <- st_as_stars(akst_stk$krige_output)
plot(akst_stk_stars[1, ])Use sftime when your data already lives in an sf-style workflow with one
geometry column and one time column. autoKrigeST() accepts sftime for both
training data and prediction data.
library(autoSTK)
library(sf)
library(sftime)
# station_long has columns:
# id, time, PM10, geometry
# where geometry is an sfc_POINT column and time is Date/POSIXct.
station_sf <- st_as_sf(
station_long,
sf_column_name = "geometry",
crs = 3857
)
station_st <- st_as_sftime(
station_sf,
time_column_name = "time",
sf_column_name = "geometry"
)
fit <- fitVariogramST(
formula = PM10 ~ 1,
data = station_st,
typestv = "sumMetric",
candidate_model = c("Sph", "Exp", "Gau", "Ste"),
cutoff = 300000,
width = 30000,
tlags = 0:7
)
pred <- predictKrigeST(
fit = fit,
data = station_st,
newdata = prediction_grid_st,
formula = PM10 ~ 1,
nmax = 40
)fitVariogramST() and predictKrigeST() are useful when you want to reuse the
same fitted variogram across several prediction grids.
Use cubble when your workflow stores station metadata in the spatial face
and repeated measurements in the temporal face. autoSTK can convert either
the nested or temporal face to sftime.
library(autoSTK)
library(cubble)
# cb is a cubble_df created from an sf object with station geometries and a
# temporal table containing PM10 observations.
pm10_st <- cubble_to_sftime(cb, time_col = "time", key_col = "id")
akst_cb <- autoKrigeST(
formula = PM10 ~ 1,
input_data = pm10_st,
cutoff = 300000,
width = 30000,
tlags = 0:7,
type_stv = "sumMetric"
)You can also pass a cubble_df directly to the main Kriging workflow when
supplying a prediction grid:
akst_cb <- autoKrigeST(
formula = PM10 ~ 1,
input_data = cb,
new_data = prediction_grid_st,
cutoff = 300000,
width = 30000,
tlags = 0:7
)For cross-validation, convert cubble or sftime data to a spacetime
object first, because autoKrigeST.cv() builds folds from the @sp and
@time slots.
pm10_stfdf <- as(as(pm10_st, "STIDF"), "STFDF")
akst_cv_cb <- autoKrigeST.cv(
formula = PM10 ~ 1,
data = pm10_stfdf,
nfold = 3,
fold_dim = "temporal",
cutoff = 300000,
width = 30000,
tlags = 0:7
)autoKrigeST.cv() evaluates prediction performance by holding out different
parts of the spatio-temporal data set.
fold_dim |
What is held out | Good for |
|---|---|---|
"spatial" |
Groups of monitoring locations | Testing transfer to unseen stations. |
"temporal" |
Groups of dates/times | Testing forecasting over unseen periods. |
"spacetime" |
Blocks of locations and times | Testing combined spatial and temporal extrapolation. |
"random" |
Random observations | Testing interpolation under scattered missingness. |
akst_cv_t <- autoKrigeST.cv(
formula = PM10 ~ 1,
data = deair_rs,
nfold = 3,
fold_dim = "temporal",
cutoff = 300000,
width = 30000,
tlags = 0:7,
cores = 8,
optimizer = "sa"
)
akst_cv_s <- autoKrigeST.cv(
formula = PM10 ~ 1,
data = deair_rs,
nfold = 3,
fold_dim = "spatial",
cutoff = 300000,
width = 30000,
tlags = 0:7,
cores = 8
)
akst_cv_spt <- autoKrigeST.cv(
formula = PM10 ~ 1,
data = deair_rs,
nfold = 4,
fold_dim = "spacetime",
cutoff = 300000,
width = 30000,
tlags = 0:7,
cores = 8
)The result is a data frame with one row per fold:
akst_cv_t
#> CVFold RMSE MAE BIAS
#> 1 1 ...
#> 2 2 ...
#> 3 3 ...The columns are:
CVFold: fold identifier.RMSE: root mean squared prediction error.MAE: mean absolute prediction error.BIAS: mean prediction minus mean observed value.
Summarise the cross-validation results to compare model settings:
summary_cv <- data.frame(
fold_dim = "temporal",
mean_RMSE = mean(akst_cv_t$RMSE, na.rm = TRUE),
mean_MAE = mean(akst_cv_t$MAE, na.rm = TRUE),
mean_BIAS = mean(akst_cv_t$BIAS, na.rm = TRUE)
)
summary_cvLower RMSE and MAE indicate smaller prediction errors. BIAS values close
to zero indicate little systematic over- or under-prediction.
For fold_dim = "spacetime", nfold must be a perfect square, such as 4,
9, or 16, because the folds are built from spatial groups crossed with
temporal groups.
- Projected coordinate reference systems are recommended because variogram distances are computed in map units.
- Universal Kriging formulas such as
PM10 ~ elevation + trafficrequirenew_datawith matching covariate columns. predict_chunkcan be used for large prediction grids to reduce memory pressure.variogram_from_full = TRUEinautoKrigeST.cv()fits one variogram on the full data set and reuses it across folds, which can be much faster when that assumption is appropriate.
