---
title: "Cate and Nelson (1971)"
author: Adrian Correndo
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Cate and Nelson (1971)}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width=6,
fig.height=4
)
```
## Description
The `soiltestcorr`-package also allows users to implement the quadrants analysis approach, also known as the Cate-Nelson analysis. This tutorial is intended to show how to deploy the `cate_nelson_1971()` function for estimating critical soil test values based on Cate and Nelson (1971). This approach is also known as the "statistical" version of the Cate-Nelson analysis. The first step of this alternative version is to estimate the CSTV (x-axis) as the minimum stv that minimizes the residual sum of squares when dividing data points in two classes (lower or greater than the CSTV) without using an arbitrary ry value. This refined version does not constrains the model performance (measured with the coefficient of determination -R2-) but the user has no control on the ry level for the CSTV estimation.
## General Instructions
i. Load your dataframe with soil test value (stv) and relative yield (ry) data.
ii. Specify the following arguments into the function -cate_nelson_1971()-:
(a). `data` (optional),
(b). `stv` (soil test value) and `ry` (relative yield) columns or vectors,
(c). `plot` TRUE (produces a ggplot as main output) or FALSE (DEFAULT -no plot-, only produces a list or data.frame),
(d). `tidy` TRUE (produces a data.frame with results) or FALSE (store results as list),
iii. Run and check results.
iv. Adjust plot as desired.
# Tutorial
```{r setup}
library(soiltestcorr)
```
Suggested packages
```{r warning=FALSE, message=FALSE}
# Install if needed
library(ggplot2) # Plots
library(dplyr) # Data wrangling
library(tidyr) # Data wrangling
library(utils) # Data wrangling
library(purrr) # Mapping
```
This is a basic example using three different datasets:
## Load datasets
```{r}
# Example 1 dataset
# Fake dataset manually created
data_1 <- data.frame("RY" = c(65,80,85,88,90,94,93,96,97,95,98,100,99,99,100),
"STV" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
# Example 2. Native fake dataset from soiltestcorr package
data_2 <- soiltestcorr::data_test
# Example 3. Native dataset from soiltestcorr package, Freitas et al. (1966), used by Cate & Nelson (1971)
data_3 <- soiltestcorr::freitas1966
```
# Fit cate_nelson_1971()
## 1. Individual fits
RY target = 90%, replace with your desired value
### 1.1. `tidy` = FALSE
It returns a LIST (more efficient for multiple fits at once)
```{r warning=TRUE, message=TRUE}
# Using dataframe argument, tidy = FALSE -> return a LIST
fit_1_tidy_false <-
soiltestcorr::cate_nelson_1971(data = data_1,
ry = RY,
stv = STV,
tidy = FALSE)
utils::head(fit_1_tidy_false)
```
### 1.2. `tidy` = TRUE
It returns a data.frame (more organized results)
```{r warning=TRUE, message=TRUE}
# Using dataframe argument, tidy = FALSE -> return a LIST
fit_1_tidy_true <-
soiltestcorr::cate_nelson_1971(data = data_1,
ry = RY,
stv = STV,
tidy = TRUE)
utils::head(fit_1_tidy_true)
```
### 1.3. Alternative using the vectors
You can call `stv` and `ry` vectors using the `$`.
The `tidy` argument still applies for controlling the output type
```{r warning=TRUE, message=TRUE}
fit_1_vectors_list <-
soiltestcorr::cate_nelson_1971(ry = data_1$RY,
stv = data_1$STV,
tidy = FALSE)
fit_1_vectors_tidy <-
soiltestcorr::cate_nelson_1971(ry = data_1$RY,
stv = data_1$STV,
tidy = TRUE)
```
### 1.4. Data 2. Test dataset
```{r warning=TRUE, message=TRUE}
fit_2 <-
soiltestcorr::cate_nelson_1971(data = data_2,
ry = RY,
stv = STV,
tidy = TRUE)
utils::head(fit_2)
```
### 1.5. Data 3. Freitas et al. 1966
```{r warning=TRUE, message=TRUE}
fit_3 <-
soiltestcorr::cate_nelson_1971(data = data_3,
ry = RY,
stv = STK,
tidy = TRUE)
utils::head(fit_3)
```
## 2. Multiple fits at once
### 2.1. Using map
#### Create nested data
Note: the `stv` column needs to have the same name for all datasets
```{r warning=T, message=F}
#
data.all <- dplyr::bind_rows(data_1, data_2,
data_3 %>% dplyr::rename(STV = STK),
.id = "id") %>%
tidyr::nest(data = c("STV", "RY"))
```
#### Fit
```{r warning=T, message=F}
# Run multiple examples at once with map()
fit_multiple_map <- data.all %>%
dplyr::mutate(mod_alcc = purrr::map(data,
~ soiltestcorr::cate_nelson_1971(ry = .$RY,
stv = .$STV,
tidy = TRUE)))
utils::head(fit_multiple_map)
```
### 2.2. Using group_map
Alternatively, with group_map, we do not require nested data.
However, it requires to dplyr::bind_rows and add an `id` column specifying the name of each dataset.
This option return models as lists objects.
```{r warning=T, message=F}
fit_multiple_group_map <-
data.all %>% tidyr::unnest(data) %>%
#dplyr::bind_rows(data_1, data_2, .id = "id") %>%
dplyr::group_by(id) %>%
dplyr::group_modify(~ soiltestcorr::cate_nelson_1971(data = .,
ry = RY,
stv = STV,
tidy = TRUE))
utils::head(fit_multiple_group_map)
```
## 3. Bootstrapping
A suitable alternative for obtaining confidence intervals for parameters or derived quantities is bootstrapping.
Bootstrapping is a resampling technique (with replacement) that draws samples from the original data with the same size. If you have groups within your data, you can specify grouping variables as arguments in order to maintain, within each resample, the same proportion of observations than in the original dataset.
This function returns a table with as many rows as the resampling size (n) containing the results for each resample.
```{r}
boot_cn71 <- soiltestcorr::boot_cn_1971(data = data_1,
ry = RY, stv = STV, n = 99)
boot_cn71 %>% dplyr::slice_head(., n=5)
# CSTV Confidence Interval
quantile(boot_cn71$CSTV, probs = c(0.025, 0.5, 0.975))
# Plot
boot_cn71 %>%
ggplot2::ggplot(aes(x = CSTV))+
geom_histogram(color = "grey25", fill = "#9de0bf", bins = 10)
```
## 4. Plots
### 4.1. Calibration Curve
We can generate a ggplot with the same mod_alcc() function.
We just need to specify the argument `plot = TRUE`.
```{r warning=F, message=F}
soiltestcorr::cate_nelson_1971(data = data_1,
ry = RY,
stv = STV,
plot = TRUE)
soiltestcorr::cate_nelson_1971(data = data_2,
ry = RY,
stv = STV,
plot = TRUE)
soiltestcorr::cate_nelson_1971(data = data_3,
ry = RY,
stv = STK,
plot = TRUE)
```
References
*Cate, R.B. Jr., and Nelson, L.A., 1971. A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Sci. Soc. Am. Proc. 35:658-659 *