| Title: | Nonparametric Kernel Estimation of Distribution Function |
|---|---|
| Description: | Nonparametric kernel distribution function estimation is performed. Three bandwidth selectors are implemented: the plug-in selectors of Altman and Leger and of Polansky and Baker, and the cross-validation selector of Bowman, Hall and Prvan. The exceedance function, the mean return period and the return level are also computed. For details, see Quintela-del-Río and Estévez-Pérez (2012) <doi:10.18637/jss.v050.i08>. |
| Authors: | Alejandro Quintela-del-Río [aut], Graciela Estévez-Pérez [aut], Ignacio López-de-Ullibarri [cre] |
| Maintainer: | Ignacio López-de-Ullibarri <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.3-1 |
| Built: | 2026-05-21 09:21:53 UTC |
| Source: | https://github.com/cran/kerdiest |
Nonparametric kernel distribution function estimation for continuous random variables is performed. Three automatic bandwidth selection procedures for nonparametric kernel distribution function estimation are implemented: the plug-in method of Altman and Leger, the plug-in method of Polansky and Baker, and the modified cross-validation method of Bowman, Hall and Prvan. The exceedance function, the mean return period and the return level are also computed.
Graciela Estévez Pérez and Alejandro Quintela del Río
Maintainer: Ignacio López-de-Ullibarri
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
The bandwidth parameter for the distribution function kernel estimator is calculated, using the plug-in method of Altman and Leger (1995). Four possible kernel functions can be used for the kernel estimator: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight.
ALbw(type_kernel = "n", vec_data)ALbw(type_kernel = "n", vec_data)
type_kernel |
The kernel function. You can use four types: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The normal kernel is used by default. |
vec_data |
The data sample. |
Altman and Leger (1995) recommend the use of the Epanechnikov kernel, because in this case the rate of convergence for the kernel derivative estimator is improved. For the sake of uniformity along the package, the gaussian kernel is used by default, but the user can obviously choose the Epanechnikov function.
A numeric value for the bandwidth parameter.
Graciela Estévez Pérez and Alejandro Quintela del Río
Altman, N. and Leger, C. (1995), "Bandwidth selection for kernel distribution function estimation", Journal of Statistical Planning and Inference, 46, 195-214.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
## Compute the plug-in bandwidth for a sample of 100 random N(0,1) data x <- rnorm(100, 0, 1) h_AL <- ALbw(type_kernel = "e", vec_data = x) h_AL ## A quick plot of a distribution function estimate x <- rnorm(1000) h_AL <- ALbw(vec_data = x) F_AL <- kde(vec_data = x, bw = h_AL) plot(F_AL$grid, F_AL$Estimated_values, type = "l") ## Plotting the distribution function estimate, controling the grid points ## and the kernel function ss <- quantile(x, c(0.05, 0.95)) ## Number of points to be used in the representation of estimated distribution ## function n_pts <- 100 y <- seq(ss[1], ss[2], length.out = n_pts) F_AL <- kde(type_kernel = "e", x, y, h_AL)$Estimated_values ## Plot of the theoretical and estimated distribution functions plot(y, F_AL, type = "l", lty = 2) lines(y, pnorm(y), type = "l", lty = 1) legend(-1, 0.8, c("Real", "Nonparametric"), lty = 1:2)## Compute the plug-in bandwidth for a sample of 100 random N(0,1) data x <- rnorm(100, 0, 1) h_AL <- ALbw(type_kernel = "e", vec_data = x) h_AL ## A quick plot of a distribution function estimate x <- rnorm(1000) h_AL <- ALbw(vec_data = x) F_AL <- kde(vec_data = x, bw = h_AL) plot(F_AL$grid, F_AL$Estimated_values, type = "l") ## Plotting the distribution function estimate, controling the grid points ## and the kernel function ss <- quantile(x, c(0.05, 0.95)) ## Number of points to be used in the representation of estimated distribution ## function n_pts <- 100 y <- seq(ss[1], ss[2], length.out = n_pts) F_AL <- kde(type_kernel = "e", x, y, h_AL)$Estimated_values ## Plot of the theoretical and estimated distribution functions plot(y, F_AL, type = "l", lty = 2) lines(y, pnorm(y), type = "l", lty = 1) legend(-1, 0.8, c("Real", "Nonparametric"), lty = 1:2)
The bandwidth parameter for the distribution function kernel estimator is calculated using the modified cross-validation method of Bowman, Hall and Prvan (1998). Four possible kernel functions can be used: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The cross-validation function involves an integral term, that is calculated using Simpson's rule.
CVbw(type_kernel = "n", vec_data, n_pts = 100, seq_bws = NULL)CVbw(type_kernel = "n", vec_data, n_pts = 100, seq_bws = NULL)
type_kernel |
The kernel function. You can use four types: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The normal kernel is used by default. |
vec_data |
The data sample. |
n_pts |
The number of points used to approximate, by Simpson's rule, the integral term in the cross-validation function. Because this numeric method increases the computing time, you can check different numbers, depending on your sample size. By default, 100 points are used. |
seq_bws |
The sequence of bandwidths in which to compute the cross-validation function. By default, the procedure defines a sequence of 50 points, from the range of the data divided by 200 to the range divided by 2. |
A list consisting of:
seq_bws |
The sequence of bandwidths. |
CV function |
The values of the cross-validation function in the bandwidths grid. |
bw |
A numeric value for the cross-validation bandwidth. |
Graciela Estévez Pérez and Alejandro Quintela del Río
Bowman, A.W., Hall, P. and Prvan,T. (1998), "Cross-validation for the smoothing of distribution functions", Biometrika. 85, 799-808.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
## Compute the cross-validation bandwidth for a sample of 100 random N(0,1) ## data x <- rnorm(100, 0, 1) num_bws <- 50 seq_bws <- seq(((max(x) - min(x))/2)/50, (max(x) - min(x))/2,length = num_bws) hCV <- CVbw(type_kernel = "e", vec_data = x, n_pts = 200, seq_bws = seq_bws) hCV ## The CV function is plotted h_CV <- CVbw(vec_data = x) h_CV$bw plot(h_CV$seq_bws, h_CV$CVfunction, type = "l") ## Plotting the distribution function estimate controlling the grid points ## and the kernel function ss <- quantile(x, c(0.05, 0.95)) ## Number of points to be used in the representation of estimated distribution ## function n_pts <- 100 y <- seq(ss[1], ss[2], length.out = n_pts) F_CV <- kde(type_kernel = "e", x, y, h_CV$bw)$Estimated_values ## Plot of the theoretical and estimated distribution functions plot(y, F_CV, type = "l", lty = 2) lines(y, pnorm(y), type = "l", lty = 1) legend(-1, 0.8, c("Real", "Nonparametric"), lty = 1:2)## Compute the cross-validation bandwidth for a sample of 100 random N(0,1) ## data x <- rnorm(100, 0, 1) num_bws <- 50 seq_bws <- seq(((max(x) - min(x))/2)/50, (max(x) - min(x))/2,length = num_bws) hCV <- CVbw(type_kernel = "e", vec_data = x, n_pts = 200, seq_bws = seq_bws) hCV ## The CV function is plotted h_CV <- CVbw(vec_data = x) h_CV$bw plot(h_CV$seq_bws, h_CV$CVfunction, type = "l") ## Plotting the distribution function estimate controlling the grid points ## and the kernel function ss <- quantile(x, c(0.05, 0.95)) ## Number of points to be used in the representation of estimated distribution ## function n_pts <- 100 y <- seq(ss[1], ss[2], length.out = n_pts) F_CV <- kde(type_kernel = "e", x, y, h_CV$bw)$Estimated_values ## Plot of the theoretical and estimated distribution functions plot(y, F_CV, type = "l", lty = 2) lines(y, pnorm(y), type = "l", lty = 1) legend(-1, 0.8, c("Real", "Nonparametric"), lty = 1:2)
Computes the exceedance probability, i.e., the probability that a
specified value c (a magnitude of a seismic event, a flow level...)
will be exceeded in D time units.
ef(type_kernel = "n", vec_data, c, bw = PBbw(type_kernel = "n", vec_data, 2), Dmin = 0, Dmax = 15, size_grid = 50, lambda)ef(type_kernel = "n", vec_data, c, bw = PBbw(type_kernel = "n", vec_data, 2), Dmin = 0, Dmax = 15, size_grid = 50, lambda)
type_kernel |
The kernel function. You can use four types: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The Normal kernel is used by default. |
vec_data |
The data sample (earthquake magnitudes, flow levels, wind speed...) |
c |
The concrete level in which we want to compute the exceedance probability. |
bw |
The bandwidth parameter. The plug-in method of Polansky and Baker (2000) is used by default. |
Dmin |
Minimum value for D time units (years, days... ). Default is 0. |
Dmax |
Maximum value for D time units (years, days... ). Default is 15. |
size_grid |
Length of a grid in which we compute the exceedance function. By default, 50. |
lambda |
The mean activity rate. |
The exceedance function is usually calculated assuming that event
occurrence follows a Poisson process. In this case, the exceedance
function, i.e., the probability of an specific value c is calculated
as
See, e.g., Orlecka-Sikora (2008) or Quintela-del-Rio (2010) for earthquake data applications.
Returns a list containing:
Estimated_values |
Vector containing the estimated function. |
grid |
The used grid. |
bw |
Value of the bandwidth. |
Graciela Estévez Pérez and Alejandro Quintela del Río
Orlecka-Sikora, B. (2008), "Resampling methods for evaluating the uncertainty of the nonparametric magnitude distribution estimation in the probabilistic seismic hazard analysis", Tectonophysics 456, 38-51.
Quintela-del-Rio, A. (2010), "On nonparametric techniques for area-characteristic seismic hazard parameters", Geophysical Journal International, 180, 339-346.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
## Working with earthquake data. We use the catalogue of the National ## Geographic Institute (IGN) of Spain and select the data of the Northwest ## of the Iberian Peninsula. data(nwip) require(chron) require(date) ## The data with magnitude greater than 3 are considered mg <- nwip$magnitude[nwip$magnitude > 3.0] x1 <- nwip$year x2 <- nwip$month x3 <- nwip$day ys <- paste(x1, x2, x3) earthquake_date <- as.character(ys) y1s <- as.date(earthquake_date, order = "ymd") ## Computation of the total number of years y2s <- as.POSIXct(y1s) z <- years(y2s) n.years <- length(levels(z)) ## Mean rate of earthquakes per year lambda <- length(mg)/n.years ## Estimation of the exceedance probability for magnitude = 4 est <- ef(vec_data = mg, c = 4, lambda = lambda) plot(est$grid, est$Estimated_values, type = "l", xlab = "Years", ylab = "Probability of Exceedance")## Working with earthquake data. We use the catalogue of the National ## Geographic Institute (IGN) of Spain and select the data of the Northwest ## of the Iberian Peninsula. data(nwip) require(chron) require(date) ## The data with magnitude greater than 3 are considered mg <- nwip$magnitude[nwip$magnitude > 3.0] x1 <- nwip$year x2 <- nwip$month x3 <- nwip$day ys <- paste(x1, x2, x3) earthquake_date <- as.character(ys) y1s <- as.date(earthquake_date, order = "ymd") ## Computation of the total number of years y2s <- as.POSIXct(y1s) z <- years(y2s) n.years <- length(levels(z)) ## Mean rate of earthquakes per year lambda <- length(mg)/n.years ## Estimation of the exceedance probability for magnitude = 4 est <- ef(vec_data = mg, c = 4, lambda = lambda) plot(est$grid, est$Estimated_values, type = "l", xlab = "Years", ylab = "Probability of Exceedance")
Computes the value of the kernel estimator of the distribution function, in a single value or in a grid. Four possibilites for the kernel function are implemented, and the bandwidth parameter can be directly calculated by the plug-in method of Polansky and Baker (2000).
kde(type_kernel = "n", vec_data, y = NULL, bw = PBbw(type_kernel = "n", vec_data, 2))kde(type_kernel = "n", vec_data, y = NULL, bw = PBbw(type_kernel = "n", vec_data, 2))
type_kernel |
The kernel function. You can use four types: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The normal kernel is used by default. |
vec_data |
The data sample. |
y |
The single value or the grid vector where the distribution function is estimated. By default, a grid of 100 equidistant points from the minimum to the maximum of the data sample is selected. |
bw |
The bandwidth used. If it is not provided, the plug-in bandwidth of Polansky and Baker (2000) is computed. |
A list containing:
Estimated_values |
Vector containing the estimated function in the grid values. |
grid |
The used grid. |
bw |
Value of the bandwidth. |
Graciela Estévez Pérez and Alejandro Quintela del Río
Reiss, R.D. (1981), "Nonparametric estimation of smooth distribution functions", Scandinavian Journal of Statistics, 8, 116-119.
Simonoff, J. (1996), "Smoothing Methods in Statistics", Springer, New York.
Polansky, A.M. and Baker, E.R. (2000), "Multistage plug-in bandwidth selection for kernel distribution function estimates", Journal of Statistical Computation and Simulation, 65, 63-80.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
## Comparison of three bandwidth selection methods x <- rnorm(100) ## The bandwidths by cross-validation, plug-in of Altman and Leger ## and plug-in of Polansky and Baker are computed using a normal kernel ## and a standard setting of parameters h_CV <- CVbw(vec_data = x)$bw h_CV ## Plug-in of Altman and Leger h_AL <- ALbw(vec_data = x) h_AL ## Plug-in of Polansky and Baker h_PB <- PBbw(vec_data = x) h_PB ## Plot of the three estimates together with the real distribution F_CV <- kde(vec_data = x, bw = h_CV) F_AL <- kde(vec_data = x, bw = h_AL) F_PB <- kde(vec_data = x, bw = h_PB) y <- F_CV$grid Ft <- pnorm(y) plot(y, Ft, ylab = "Distribution", xlab = "Data", type = "l", lty = 1) lines(y, F_CV$Estimated_values, type = "l", lty = 2) lines(y, F_AL$Estimated_values, type = "l", lty = 3) lines(y, F_PB$Estimated_values, type = "l", lty = 4) legend(1, 0.4, c("Real", "F_CV", "F_AL", "F_PB"), lty = 1:4)## Comparison of three bandwidth selection methods x <- rnorm(100) ## The bandwidths by cross-validation, plug-in of Altman and Leger ## and plug-in of Polansky and Baker are computed using a normal kernel ## and a standard setting of parameters h_CV <- CVbw(vec_data = x)$bw h_CV ## Plug-in of Altman and Leger h_AL <- ALbw(vec_data = x) h_AL ## Plug-in of Polansky and Baker h_PB <- PBbw(vec_data = x) h_PB ## Plot of the three estimates together with the real distribution F_CV <- kde(vec_data = x, bw = h_CV) F_AL <- kde(vec_data = x, bw = h_AL) F_PB <- kde(vec_data = x, bw = h_PB) y <- F_CV$grid Ft <- pnorm(y) plot(y, Ft, ylab = "Distribution", xlab = "Data", type = "l", lty = 1) lines(y, F_CV$Estimated_values, type = "l", lty = 2) lines(y, F_AL$Estimated_values, type = "l", lty = 3) lines(y, F_PB$Estimated_values, type = "l", lty = 4) legend(1, 0.4, c("Real", "F_CV", "F_AL", "F_PB"), lty = 1:4)
This functions computes an estimate of the time between two values of a concrete level (size of an earthquake, flow lewel, wind speed...).
mrp(type_kernel = "n", vec_data, y = NULL, bw = PBbw(type_kernel = "n", vec_data, 2), lambda)mrp(type_kernel = "n", vec_data, y = NULL, bw = PBbw(type_kernel = "n", vec_data, 2), lambda)
type_kernel |
The kernel function. You can use four types: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The normal kernel is used by default. |
vec_data |
The data sample (earthquake magnitudes, flow levels, wind speed...). |
y |
A grid or a singular value where the estimator is computed. By default, a grid of 50 values between the minimum and the maximum of the data is computed. |
bw |
The bandwidth parameter. By default, the plug-in method of Polansky and Baker (2000) is used. |
lambda |
The mean activity rate. |
The mean return period is usually calculated assuming that event
occurrence follows a Poisson process. In this case, the mean return
period of events of size c is calculated as
In Orlecka-Sikora (2008) or Quintela-del-Rio (2010) an application to
earthquake data is made. In hydrological applications, if we work with
annual maxima data, the parameter of the Poisson variable is 1 (one maximum
per year). The mean return period between flow levels of value c is
calculated as
See, for instance, Quintela-del-Rio (2011), for an application to data of Salt River near Roosevelt, AZ, USA (saltriver data).
A list containing:
Estimated_values |
Vector containing the estimated function. |
grid |
The used grid. |
bw |
Value of the bandwidth. |
Graciela Estévez Pérez and Alejandro Quintela del Río
Orlecka-Sikora, B. (2008), "Resampling methods for evaluating the uncertainty of the nonparametric magnitude distribution estimation in the probabilistic seismic hazard analysis", Tectonophysics, 456, 38-51.
Quintela-del-Rio, A. (2010), "On nonparametric techniques for area- characteristic seismic hazard parameters", Geophysical Journal International, 180, 339-346.
Quintela-del-Rio, A. (2011), "On bandwidth selection for nonparametric estimation in flood frequency analysis", Hydrological Processes, 25, 671-678.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
## Working with earthquake data. We use the catalogue of the National ## Geographic Institute (IGN) of Spain and select the data of the Northwest ## of the Iberian Peninsula. data(nwip) require(chron) require(date) ## Data with magnitude greater than 3 are considered mg <- nwip$magnitude[nwip$magnitude > 3.0] x1 <- nwip$year x2 <- nwip$month x3 <- nwip$day ys <- paste(x1, x2, x3) earthquake_date <- as.character(ys) y1s <- as.date(earthquake_date, order = "ymd") ## Computation of the total number of years y2s <- as.POSIXct(y1s) z <- years(y2s) n.years <- length(levels(z)) ## Mean rate of earthquakes per year lambda <- length(mg)/n.years ## Estimation of the mean return period (in years) between earthquakes of ## the same magnitude est2 <- mrp(vec_data = mg, lambda = lambda) plot(est2$grid, est2$Estimated_values, type = "l", xlab = "Magnitude", ylab = "Mean return period (years)") ## Working with hydrological data: annual peak instantaneous flow of the ## Salt River near Roosevelt, AZ, USA, for 1924-2009. data(saltriver) peak <- saltriver$peakflow year <- saltriver$year plot(year, peak, type = "l", xlab = "Year", ylab = "Annual peak flow") ## Mean return period for the Saltriver data rp <- mrp(type_kernel = "n", vec_data = peak, lambda = 1) plot(rp$grid, rp$Estimated_values, type = "l", xlab = "Flow level", ylab = "Years ", main = "Mean return period")## Working with earthquake data. We use the catalogue of the National ## Geographic Institute (IGN) of Spain and select the data of the Northwest ## of the Iberian Peninsula. data(nwip) require(chron) require(date) ## Data with magnitude greater than 3 are considered mg <- nwip$magnitude[nwip$magnitude > 3.0] x1 <- nwip$year x2 <- nwip$month x3 <- nwip$day ys <- paste(x1, x2, x3) earthquake_date <- as.character(ys) y1s <- as.date(earthquake_date, order = "ymd") ## Computation of the total number of years y2s <- as.POSIXct(y1s) z <- years(y2s) n.years <- length(levels(z)) ## Mean rate of earthquakes per year lambda <- length(mg)/n.years ## Estimation of the mean return period (in years) between earthquakes of ## the same magnitude est2 <- mrp(vec_data = mg, lambda = lambda) plot(est2$grid, est2$Estimated_values, type = "l", xlab = "Magnitude", ylab = "Mean return period (years)") ## Working with hydrological data: annual peak instantaneous flow of the ## Salt River near Roosevelt, AZ, USA, for 1924-2009. data(saltriver) peak <- saltriver$peakflow year <- saltriver$year plot(year, peak, type = "l", xlab = "Year", ylab = "Annual peak flow") ## Mean return period for the Saltriver data rp <- mrp(type_kernel = "n", vec_data = peak, lambda = 1) plot(rp$grid, rp$Estimated_values, type = "l", xlab = "Flow level", ylab = "Years ", main = "Mean return period")
This data set corresponds to the earthquakes occurred in the Northwest of the Iberian Peninsula, from 25/Nov/1924 to 31/Jul/2010. The area is limited by the coordinates 41 N–44 N and 6 W–10 W, involving the autonomic region of Galicia (Spain) and northern Portugal.
data(nwip)data(nwip)
A data frame with 3491 observations on the following 10 variables, corresponding to the earthquake epicenters and time of ocurrence.
dayDay
monthMonth
yearYear
hourHour
minuteMinute
secondsecond
latitudeLatitude in degrees
longitudeLongitude in degrees
depthDepth in km
magnitudeMagnitude in Richter Scale
The data catalogue has been obtained from the National Geographic Institute (IGN) of Spain. The web page is www.ign.es.
Rueda, J., and J. Mezcua (2001), "Sismicidad, sismotectonica y peligrosidad sismica en Galicia", IGN Technical Publication, 35.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
The bandwidth parameter for the distribution function kernel estimator is calculated, using the plug-in method of Polansky and Baker (2000). Four possible kernel functions can be used for the kernel estimator: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. Because kernel estimators of derivatives of order greater than two are required, only the normal kernel is used in this case.
PBbw(type_kernel = "n", vec_data, num_stage = 2)PBbw(type_kernel = "n", vec_data, num_stage = 2)
type_kernel |
The kernel function used. You can use four types: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The normal kernel is used by default. |
vec_data |
The data sample. |
num_stage |
The number of iterations in the Polansky and Baker method. The default, 2, is usually a good option. Values of 3 or 4 are also allowed. |
A numeric value for the bandwidth parameter.
Graciela Estévez Pérez and Alejandro Quintela del Río
Polansky, A.M. and Baker, E.R. (2000), "Multistage plug-in bandwidth selection for kernel distribution function estimates", Journal of Statistical Computation and Simulation, 65, 63-80.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
## Compute the plug-in bandwidth for a sample of 100 random N(0,1) data x <- rnorm(100, 0, 1) h_PB <- PBbw(vec_data = x, num_stage = 4) h_PB ## A quick plot of a distribution function estimate x <- rnorm(1000) h_PB <- PBbw(vec_data = x) F_PB <- kde(vec_data = x, bw = h_PB) plot(F_PB$grid, F_PB$Estimated_values, type = "l") ## Plotting the distribution function estimate controling the grid points and ## the kernel function ss <- quantile(x, c(0.05, 0.95)) ## number of points to be used in the representation of the estimated ## distribution function n_pts <- 100 y <- seq(ss[1], ss[2], length.out = n_pts) F_PB <- kde(type_kernel = "e", x, y, h_PB)$Estimated_values ## Plot of the theoretical and estimated distribution functions plot(y, F_PB, type = "l", lty = 2) lines(y, pnorm(y), type = "l", lty = 1) legend(-1.2, 0.8, c("Real", "Nonparametric"), lty = 1:2)## Compute the plug-in bandwidth for a sample of 100 random N(0,1) data x <- rnorm(100, 0, 1) h_PB <- PBbw(vec_data = x, num_stage = 4) h_PB ## A quick plot of a distribution function estimate x <- rnorm(1000) h_PB <- PBbw(vec_data = x) F_PB <- kde(vec_data = x, bw = h_PB) plot(F_PB$grid, F_PB$Estimated_values, type = "l") ## Plotting the distribution function estimate controling the grid points and ## the kernel function ss <- quantile(x, c(0.05, 0.95)) ## number of points to be used in the representation of the estimated ## distribution function n_pts <- 100 y <- seq(ss[1], ss[2], length.out = n_pts) F_PB <- kde(type_kernel = "e", x, y, h_PB)$Estimated_values ## Plot of the theoretical and estimated distribution functions plot(y, F_PB, type = "l", lty = 2) lines(y, pnorm(y), type = "l", lty = 1) legend(-1.2, 0.8, c("Real", "Nonparametric"), lty = 1:2)
The T-return level is defined as the value of the observed
variable that can be expected to be once exceeded during a T-period of
time. This is computed as the quantile of the distribution corresponding
to the value .
rl(type_kernel = "n", vec_data, time, bw = PBbw(type_kernel = "n", vec_data, 2))rl(type_kernel = "n", vec_data, time, bw = PBbw(type_kernel = "n", vec_data, 2))
type_kernel |
The kernel function used. You can use four types: "e" Epanechnikov, "n" Normal, "b" Biweight and "t" Triweight. The normal kernel is used by default. |
vec_data |
The data sample (earthquake magnitudes, flow levels, wind speeds...). |
time |
A time or a vector of times for T. |
bw |
The bandwidth parameter. By default, he plug-in method of Polansky and Baker (2000) is used. |
In several scientific fields, it is of interest to estimate quantiles
corresponding to a probability of exceedance. E.g., in hydrology, the
T-return level is defined as the value of the observed flow
that can be expected to be once exceeded during a T-period of time; i.e.,
the quantile
It can be directly estimated by
See, e.g., Quintela-del-Rio (2011), for an application to data of Salt River near Roosevelt, AZ, USA.
A single value or an array for the estimated quantiles.
Graciela Estévez Pérez and Alejandro Quintela del Río
Quintela-del-Rio, A. (2011), "On bandwidth selection for nonparametric estimation in flood frequency analysis". Hydrological Processes 25, 671-678.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.
data(saltriver) peak <- saltriver$peakflow year <- saltriver$year plot(year, peak, type = "l", ylab = "Annual peak flow") ## Calculating the return values for a period from 2 to 100 years times <- seq(2,100, length.out = 100) ret.lev <- rl(vec_data = peak, time = times) plot(times, ret.lev, type = "l", xlab = "Years", ylab = "Flow (cumecs)", main = "Return level Plot")data(saltriver) peak <- saltriver$peakflow year <- saltriver$year plot(year, peak, type = "l", ylab = "Annual peak flow") ## Calculating the return values for a period from 2 to 100 years times <- seq(2,100, length.out = 100) ret.lev <- rl(vec_data = peak, time = times) plot(times, ret.lev, type = "l", xlab = "Years", ylab = "Flow (cumecs)", main = "Return level Plot")
The annual peak instantaneous flow of the Salt River near Roosevelt, AZ, USA, for 1924-2009. Data are in cfs (0.028317 m3/s); water year October-September. The data were examined in several papers related with extreme values in hydrology. Among others, they were analyzed by Anderson and Meerschaert (1998) and Dettinger and Diaz (2000), where they were fitted to GEV and GPD distributions. In Quintela-del-Rio (2011), a nonparametric analysis for this data set is presented.
data(saltriver)data(saltriver)
A data frame with 85 observations on the following 2 variables.
yearYear
peakflowThe annual observed maximum peak flow
US Geological Survey http://water.usgs.gov/nwis/peak.
Anderson, P.L. and Meerschaert, M.M. (1998), "Modeling river flows with heavy tails", Water Resources Research, 34, 2271-2280.
Dettinger, M.D. and Diaz, H.F. (2000), "Global characteristics of stream flow seasonality and variability", Journal of Hydrometeorology, 1, 289-310.
Quintela-del-Rio, A. (2011), "On bandwidth selection for nonparametric estimation in flood frequency analysis". Hydrological Processes 25, 671-678.
Quintela-del-Río, A. and Estévez-Pérez, G. (2012), "Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications", Journal of Statistical Software, 50(8), 1-21.