Title: A Fast Permutation-Based Split-Half Reliability Algorithm
Description: Accurately estimates the reliability of cognitive tasks using a fast and flexible permutation-based split-half reliability algorithm that supports stratified splitting while maintaining equal split sizes. See Kahveci, Bathke, and Blechert (2022) <doi:10.31234/osf.io/ta59r> for details.
Authors: Sercan Kahveci [aut, cre]
Maintainer: Sercan Kahveci <[email protected]>
License: GPL (>= 2)
Version: 0.5
Built: 2025-03-05 12:38:22 UTC
Source: https://github.com/spiritspeak/rapidsplit
Create a matrix of bootstrap samples expressed as frequency weights
bootstrapWeights(size, times)
size: Number of values to bootstrap.
times: Number of bootstraps.
A matrix with bootstrap samples expressed as frequency weights. Each column represents a single bootstrap iteration and each row represents a case.
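To make the frequency-weight representation concrete, the following sketch (plain base R, not part of this package) shows that such a weight vector encodes the same information as a classic bootstrap resample:

# A single bootstrap resample expressed as frequency weights (illustration)
x <- rnorm(50)
idx <- sample(50, replace=TRUE)               # classic resample: indices drawn with replacement
w <- tabulate(idx, nbins=50)                  # the same resample as per-case frequency weights
all.equal(mean(x[idx]), weighted.mean(x, w))  # TRUE: both describe the same resample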
# Rapidly compute a bootstrapped median to obtain its standard error
myweights <- bootstrapWeights(size=50, times=100)
meds <- mediansByWeight(x=rnorm(50), weights=myweights)
# SE
sd(meds)
Fast matrix column aggregators
colMedians(x)
colProds(x)
colSds(x)
colMediansMasked(x, mask)
colMeansMasked(x, mask)
colSdsMasked(x, mask)
x: A numeric matrix to compute column aggregates of.
mask: A logical matrix determining which data points to include in the column-wise aggregations.
A numeric vector representing values aggregated by column.
Sercan Kahveci
colMeans, mediansByMask, maskAggregators
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
colMedians(x)
colProds(x)
colSds(x)
mask <- cbind(rep(c(TRUE,FALSE), 4), rep(c(TRUE,FALSE), each=4))
colMediansMasked(x, mask)
colMeansMasked(x, mask)
colSdsMasked(x, mask)
Correlate each column of one matrix with the same column in another matrix
corByColumns(x, y)
corByColumns_mask(x, y, mask)
corStatsByColumns(x, y)
x, y: Matrices whose values to correlate by column.
mask: Logical matrix marking which data points to include.
The primary use for these functions is to rapidly compute the correlations between two sets of split-half scores stored in matrix columns.
corStatsByColumns() produces the mean correlation of all column pairs using the formula mean(covariances) / sqrt(mean(col1variances) * mean(col2variances)). This method is more accurate than cormean() and was suggested by Prof. John Christie of Dalhousie University.
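As a concrete check of that formula, the following sketch recomputes the mean correlation by hand from the raw column variances and covariances (base R only; it should match the cormean item returned by corStatsByColumns()):

# Manual check of the (co)variance-averaging formula
m1 <- matrix(rnorm(30), ncol=3)
m2 <- matrix(rnorm(30), ncol=3)
covs <- sapply(1:3, function(i) cov(m1[,i], m2[,i]))
manual <- mean(covs) / sqrt(mean(apply(m1, 2, var)) * mean(apply(m2, 2, var)))
manual
corStatsByColumns(m1, m2)$cormean  # should match 'manual'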
corByColumns() and corByColumns_mask() return a numeric vector with the correlation of each pair of columns.
corStatsByColumns() returns a list with named items:
cormean: the aggregated correlation coefficient of all column pairs (see Details)
allcors: the correlations of each column pair
xvar: the column variances of matrix x
yvar: the column variances of matrix y
covar: the covariances of each column pair
Sercan Kahveci
m1 <- matrix((1:9)+rnorm(9), ncol=3)
m2 <- matrix((9:1)+rnorm(9), ncol=3)
corByColumns(m1, m2)
mask <- 1 - diag(3)
corByColumns_mask(m1, m2, mask)
corStatsByColumns(m1, m2)
This function computes a minimally biased average of correlation values. This is needed because simple averaging of correlations is negatively biased, and the often used z-transformation method of averaging correlations is positively biased. The algorithm was developed by Olkin & Pratt (1958).
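For intuition about these biases, the sketch below contrasts the three averaging approaches on the same correlations (using r2z() and z2r() from this package; the example values are arbitrary):

# Simple, Fisher-z, and Olkin-Pratt averaging of the same correlations
rs <- c(.1, .4, .7)
mean(rs)             # simple average: negatively biased
z2r(mean(r2z(rs)))   # Fisher-z average: positively biased
cormean(rs, n=20)    # Olkin-Pratt average: minimally biased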
cormean(
  r,
  n,
  weights = c("none", "n", "df"),
  type = c("OP5", "OP2", "OPK"),
  na.rm = FALSE,
  incl.trans = FALSE
)
r: A vector containing correlation values.
n: A single value or vector containing sample sizes.
weights: Character. How should the correlations be weighted? "none" applies no weighting, "n" weights by sample size, and "df" weights by sample size minus one.
type: Character. Determines which averaging algorithm to use, with "OP5" usually being the most accurate.
na.rm: Logical. Should missing values be removed?
incl.trans: Logical. Should the transformed correlations be included?
An average correlation.
Olkin, I., & Pratt, J. (1958). Unbiased estimation of certain correlation coefficients. The Annals of Mathematical Statistics, 29. https://doi.org/10.1214/aoms/1177706717
Shieh, G. (2010). Estimation of the simple correlation coefficient. Behavior Research Methods, 42(4), 906-917. https://doi.org/10.3758/BRM.42.4.906
cormean(c(0,.3,.5), c(30,30,60))
Helper functions to compute important statistics from correlation coefficients.
r2z(r)
z2r(z)
r2t(r, n)
t2r(t, n)
r2p(r, n)
rconfint(r, n, alpha = 0.05)
compcorr(r1, r2, n1, n2)
## S3 method for class 'compcorr'
print(x, ...)
r, r1, r2: Correlation values.
z: Z-scores.
n, n1, n2: Sample sizes.
t: t-scores.
alpha: The significance level to use.
x: A compcorr object.
...: Ignored.
For r2z(), z2r(), r2t(), t2r(), and r2p(), a numeric vector with the requested transformation applied.
For rconfint(), a numeric vector with two values representing the lower and upper confidence intervals of the correlation coefficient.
For compcorr(), a compcorr object containing a z and p value for the requested comparison, which can be printed with print.compcorr().
r2z(): Converts correlation coefficients to z-scores.
z2r(): Converts z-scores to correlation coefficients.
r2t(): Converts correlation coefficients to t-scores.
t2r(): Converts t-scores to correlation coefficients.
r2p(): Computes the two-sided p-value for a given correlation.
rconfint(): Computes confidence intervals for one or multiple correlation coefficients.
compcorr(): Computes the significance of the difference between two correlation coefficients.
print(compcorr): Prints the result of a comparison of two correlation coefficients.
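For reference, these helpers correspond to standard textbook conversions; the sketch below spells them out (the closed forms are standard results and assumed to match the package's implementation, not taken from its source):

# Standard conversions assumed to underlie these helpers
r <- .5; n <- 30
all.equal(r2z(r), atanh(r))                               # Fisher z-transformation
all.equal(r2t(r, n), r * sqrt((n - 2) / (1 - r^2)))       # t-statistic with n-2 df
all.equal(r2p(r, n), 2 * pt(-abs(r2t(r, n)), df = n - 2)) # two-sided p-value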
Sercan Kahveci
z <- r2z(.5)
r <- z2r(z)
t <- r2t(r, 30)
r <- t2r(t, 30)
r2p(r, 30)
print(rconfint(r, 30))
print(compcorr(.5, .7, 20, 20))
Different masks (columns of a logical matrix) are applied to the same input vector, and outliers in each resulting subvector are marked with FALSE in the mask.
excludeOutliersByMask(x, mask, sdlim = 3)
x: Vector to exclude outliers from.
mask: A logical matrix determining which data points to include and which to exclude.
sdlim: Standard deviation limit to apply; values beyond it are classified as outliers and masked.
An updated mask.
x <- rnorm(50)
x[1] <- 100
x[2] <- 50
mask <- matrix(TRUE, ncol=3, nrow=50)
mask[1,2] <- FALSE
mask[2,3] <- FALSE
excludeOutliersByMask(x, mask)
This data originates from an approach-avoidance task examining approach bias towards food. Participants responded to the stimulus category (food or object) by pulling or pushing a joystick. Instructions were flipped from one block to the next.
data(foodAAT, package="rapidsplithalf")
An object of class "data.frame".
subjectid: Participant ID.
stimid: Stimulus ID.
is_pull: Whether the trial required an approach response (1) or an avoid response (0).
is_target: Whether the trial featured a food stimulus (1) or an object stimulus (0).
error: Whether the response was incorrect (1) or correct (0).
RT: The response initiation time.
FullRT: The time from stimulus onset to response completion.
trialnum: The trial number.
blocknum: The block number.
palatability: The participant's palatability rating for the stimulus (foods only).
valence: The participant's valence rating for the stimulus.
FCQS_2_craving: The participant's FCQS state food craving score at time of testing.
FCQS_2_hunger: The participant's FCQS state hunger score at time of testing.
Lender, A., Meule, A., Rinck, M., Brockmeyer, T., & Blechert, J. (2018). Measurement of food-related approach–avoidance biases: Larger biases when food stimuli are task relevant. Appetite, 125, 42–47. doi:10.1016/j.appet.2018.01.032
Generates split-half indices that can be stratified by multiple subgroup variables while guaranteeing near-equal numbers of trials in both halves.
generateSplits(data, subsetvars, stratvars = NULL, splits, verbose = TRUE)
data: A dataset to generate split-halves from.
subsetvars: Variables identifying subgroups that must be individually split into equally sized halves, e.g. participant number and experimental condition.
stratvars: Variables identifying subgroups that are nested within the subsetvars and must be split as fairly as possible while preserving the equal size of the two halves of each subset identified by the subsetvars, e.g. stimulus ID.
splits: How many splits to generate.
verbose: Display progress bar?
A logical matrix in which each row represents a row of the input dataset, and each column represents a single split.
data(foodAAT)
mysplits <- generateSplits(data=foodAAT,
                           subsetvars=c("subjectid","is_pull","is_target"),
                           stratvars="stimid",
                           splits=1)
half1 <- foodAAT[ mysplits[,1],]
half2 <- foodAAT[!mysplits[,1],]
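To verify the near-equal halves, one can tabulate the per-subset imbalance; a minimal sketch with base R's aggregate(), reusing the objects from the example above:

# Each subject/condition subset should differ by at most one trial between halves
imbalance <- aggregate(mysplits[,1],
                       by=foodAAT[,c("subjectid","is_pull","is_target")],
                       FUN=function(h) abs(sum(h) - sum(!h)))
max(imbalance$x)  # should be at most 1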
Methods to aggregate the same vector with different masks or frequency weights. Useful for fast bootstrapping or split-half scoring. A single aggregate value of x is computed for each column of the mask or weight matrix.
mediansByMask(x, mask)
meansByMask(x, mask)
sdsByMask(x, mask)
mediansByWeight(x, weights)
meansByWeight(x, weights)
sdsByWeight(x, weights)
x: A vector to aggregate over with different masks or weights.
mask: Logical matrix where each column represents a separate mask to aggregate x with.
weights: Integer matrix where each column represents frequency weights to weight the aggregation by.
A vector with each value representing an aggregate of the same single input vector, but with different masks or frequency weights applied.
Sercan Kahveci
colMedians, colAggregators, generateSplits
# Demonstration of mediansByMask()
x <- 1:6
mask <- rbind(c(TRUE,FALSE,FALSE),
              c(TRUE,FALSE,FALSE),
              c(FALSE,TRUE,FALSE),
              c(FALSE,TRUE,FALSE),
              c(FALSE,FALSE,TRUE),
              c(FALSE,FALSE,TRUE))
mediansByMask(x, mask)

# Compute split-halves for a single participant, stratified by stimulus
data(foodAAT)
currdata <- foodAAT[foodAAT$subjectid==3,]
currdata$stratfactor <- interaction(currdata$is_pull,
                                    currdata$is_target,
                                    currdata$stimid)
currdata <- currdata[order(currdata$stratfactor),]
groupsizes <- rle(as.character(currdata$stratfactor))$lengths
mysplits <- stratifiedItersplits(splits=1000, groupsizes=groupsizes)

# Median for half 1
mediansByMask(currdata$RT, mysplits==1)

# How to use meansByMask()
meansByMask(x, mask)
sd(meansByMask(currdata$RT, mysplits==1))

# How to use sdsByMask() to compute mask-based D-scores
meansByMask(currdata$RT, mysplits==1) / sdsByMask(currdata$RT, mysplits==1)

# Compute the bootstrapped standard error of a median
weights <- bootstrapWeights(size=nrow(currdata), times=1000)
bootmeds <- mediansByWeight(currdata$RT, weights)
sd(bootmeds) # bootstrapped standard error

# Compute the bootstrapped standard error of a mean
bootmeans <- meansByWeight(currdata$RT, weights)
sd(bootmeans) # bootstrapped standard error
# exact standard error for comparison
sd(currdata$RT)/sqrt(length(currdata$RT))

# Use sdsByWeight() to compute bootstrapped D-scores
bootsds <- sdsByWeight(currdata$RT, weights)
# bootstrapped standard error of the D-score
sd(bootmeans/bootsds)
Generate or update a mask matrix based on outlyingness of values in each column.
maskOutliers(x, sdlim = 3)
maskOutliersMasked(x, mask, sdlim = 3)
x: Matrix in which to mark SD-based outliers by column.
sdlim: Standard deviation limit to apply; values beyond it are classified as outliers and masked.
mask: A logical matrix determining which data points to include and which to exclude.
A logical matrix with outliers (and previously masked values) marked as FALSE.
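As a plausible reading of the SD criterion (an assumption for illustration; the package's exact implementation, e.g. how previously masked values enter the mean and SD, may differ), a value is masked when it lies more than sdlim standard deviations from its column mean:

# Sketch of the per-column SD criterion (assumed behavior, single column)
x <- c(rnorm(49), 100)
keep <- abs(x - mean(x)) <= 3 * sd(x)  # FALSE marks the outlier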
# Generate data with outliers
testmat <- matrix(rnorm(100), ncol=2)
testmat[1,] <- 100
testmat[2,] <- 50

# Detect outliers
maskOutliers(testmat)

# Generate a mask
testmask <- matrix(TRUE, ncol=2, nrow=50)
testmask[1,1] <- FALSE

# Detect outliers with pre-existing mask
maskOutliersMasked(x=testmat, mask=testmask, sdlim=3)
This data originates from the publicly available implicit association test (IAT) on racial prejudice hosted by Project Implicit. 200 participants were randomly sampled from the full trial-level data available for participants from 2002 to 2022. We included only those IAT blocks relevant to scoring (3,4,6,7) and only individuals with full data.
data(raceIAT, package="rapidsplithalf")
An object of class "data.frame".
session_id: The session id, proxy for participant number.
task_name: Subtype of IAT used.
block_number: IAT block number.
block_pairing_definition: Stimulus pairing displayed in block.
block_trial_number: Trial number within block.
stimulus: Stimulus name.
required_response: The response required from the participant.
latency: Participant's response latency.
error: Whether the response was wrong (TRUE).
trial_number: Experimentwise trial number.
stimcat: The stimulus category.
respcat: Category of the required response.
blocktype: Either practice block or full IAT block.
congruent: Whether the block was congruent with anti-black bias (TRUE) or not.
latency2: Response latencies with those for incorrect responses replaced by the block mean plus a penalty.
Xu, K., Nosek, B., & Greenwald, A. G. (2014). Psychology data from the race implicit association test on the Project Implicit demo website. Journal of Open Psychology Data, 2(1), e3. doi:10.5334/jopd.ac
A very fast algorithm for computing stratified permutation-based split-half reliability.
rapidsplit(
  data,
  subjvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  aggvar,
  splits = 6000,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL,
                       fixedpenalty = 600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE
)

## S3 method for class 'rapidsplit'
print(x, ...)

## S3 method for class 'rapidsplit'
plot(
  x,
  type = c("average", "minimum", "maximum", "random", "all"),
  show.labels = TRUE,
  ...
)

rapidsplit.chunks(
  data,
  subjvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  aggvar,
  splits = 6000,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL,
                       fixedpenalty = 600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE,
  split.chunksize = 10000,
  sample.chunksize = 200
)
data: Dataset, a data.frame.
subjvar: Subject ID variable name, a character string.
diffvars: Names of variables that determine which conditions need to be subtracted from each other; a character vector.
stratvars: Additional variables that the splits should be stratified by; a character vector.
subscorevar: A character string naming a variable that identifies subsets within each participant's data from which separate subscores are computed and then averaged into a single score (see Details).
aggvar: Name of the variable whose values to aggregate; a character string.
splits: Number of split-halves to average; an integer.
aggfunc: The function by which to aggregate the variable defined in aggvar; either "means", "medians", or a custom function (see Examples).
errorhandling: A list with 4 named items, to be used to replace error trials with the block mean of correct responses plus a fixed penalty, as in the IAT D-score. The 4 items are type, errorvar, fixedpenalty, and blockvar.
standardize: Whether to divide scores by the subject's SD; a logical.
include.scores: Include all individual split-half scores?
verbose: Display progress bars? Defaults to TRUE.
check: Check input for possible problems?
x: A rapidsplit object.
...: Ignored.
type: Character argument indicating what should be plotted. By default, this plots the random split whose correlation is closest to the average. However, this can also plot the random split with the minimum or maximum correlation, a fully random split, or all splits.
show.labels: Should participant IDs be shown above their points in the scatterplot? Defaults to TRUE.
split.chunksize, sample.chunksize: Sizes of the chunks into which the splits and the sample are divided for more memory-efficient computation. This has no bearing on the result.
The order of operations (with optional steps in parentheses) is:
Splitting
(Replacing error trials within block within split)
Computing aggregates per condition (per subscore) per person
Subtracting conditions from each other
(Dividing the resulting (sub)score by the SD of the data used to compute that (sub)score)
(Averaging subscores together into a single score per person)
Computing the covariances of scores from one half with scores from the other half for every split
Computing the variances of scores within each half for every split
Computing the average split-half correlation from the average variances and covariance across all splits, using corStatsByColumns()
Applying the Spearman-Brown formula to the absolute correlation using spearmanBrown(), and restoring the original sign afterwards
cormean() was used to aggregate correlations in previous versions of this package and in the associated manuscript, but the method based on (co)variance averaging was found to be more accurate; it was suggested by Prof. John Christie of Dalhousie University.
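For illustration, the same pipeline can be assembled by hand from this package's lower-level functions; the sketch below covers the simplest case of a plain mean-RT score, without difference conditions, error handling, or standardization (rapidsplit() itself is considerably more optimized):

# Hand-rolled split-half reliability of the mean RT, mirroring the steps above
data(foodAAT)
nsplits <- 100
splits <- generateSplits(data=foodAAT, subsetvars="subjectid",
                         splits=nsplits, verbose=FALSE)
subjects <- unique(foodAAT$subjectid)
half1 <- half2 <- matrix(NA, nrow=length(subjects), ncol=nsplits)
for(i in seq_along(subjects)){
  rows <- foodAAT$subjectid == subjects[i]
  half1[i,] <- meansByMask(foodAAT$RT[rows],  splits[rows,,drop=FALSE])
  half2[i,] <- meansByMask(foodAAT$RT[rows], !splits[rows,,drop=FALSE])
}
# Average the split-half correlation via (co)variance averaging,
# then Spearman-Brown correct the absolute value and restore the sign
r <- corStatsByColumns(half1, half2)$cormean
spearmanBrown(abs(r)) * sign(r)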
A list containing:
r: the averaged reliability.
ci: the 95% confidence intervals.
allcors: a vector with the reliability of each iteration.
nobs: the number of participants.
rcomponents: a list containing the mean variance of the scores of both halves, as well as their mean covariance.
scores: the individual participants' scores in each split-half, contained in a list with two matrices (only included if requested with include.scores).
The rapidsplit() function can use a lot of memory in one go. If you are computing the reliability of a large dataset or you have little RAM, it may pay off to use rapidsplit.chunks() instead.
It is currently unclear whether it is better to pre-process your data before or after splitting it. If you are computing the IAT D-score, you can therefore use errorhandling and standardize to perform these two actions after splitting, or you can process your data before splitting and forgo these two options.
Sercan Kahveci
Kahveci, S., Bathke, A.C. & Blechert, J. (2024) Reaction-time task reliability is more accurately computed with permutation-based split-half correlations than with Cronbach’s alpha. Psychonomic Bulletin and Review. doi:10.3758/s13423-024-02597-y
data(foodAAT)
# Reliability of the double difference score:
# [RT(push food)-RT(pull food)] - [RT(push object)-RT(pull object)]
frel <- rapidsplit(data=foodAAT,
                   subjvar="subjectid",
                   diffvars=c("is_pull","is_target"),
                   stratvars="stimid",
                   aggvar="RT",
                   splits=100)
print(frel)
plot(frel, type="all")

# Compute a single random split-half reliability of the error rate
rapidsplit(data=foodAAT,
           subjvar="subjectid",
           aggvar="error",
           splits=1,
           aggfunc="means")

# Compute the reliability of an IAT D-score
data(raceIAT)
rapidsplit(data=raceIAT,
           subjvar="session_id",
           diffvars="congruent",
           subscorevar="blocktype",
           aggvar="latency",
           errorhandling=list(type="fixedpenalty", errorvar="error",
                              fixedpenalty=600, blockvar="block_number"),
           splits=100,
           standardize=TRUE)

# Unstratified reliability of the median RT,
# computed in chunks of 10 participants at a time
# to handle large samples
rapidsplit.chunks(data=raceIAT,
                  subjvar="session_id",
                  aggvar="latency",
                  splits=200,
                  aggfunc="medians",
                  sample.chunksize=10)

# Compute the reliability of Tukey's trimean of the RT,
# in chunks of 150 splits and 20 participants per run
trimean <- function(x){
  sum(quantile(x, c(.25,.5,.75)) * c(1,2,1)) / 4
}
rapidsplit.chunks(data=foodAAT,
                  subjvar="subjectid",
                  aggvar="RT",
                  splits=400,
                  aggfunc=trimean,
                  split.chunksize=150,
                  sample.chunksize=20)
To learn more about rapidsplithalf, view the introductory vignette:
vignette("rapidsplithalf",package="rapidsplithalf")
Sercan Kahveci
Spearman-Brown correction
Perform a Spearman-Brown correction on the provided correlation score.
spearmanBrown(r, ntests = 2, fix.negative = c("mirror", "nullify", "none"))
r: To-be-corrected correlation coefficient.
ntests: An integer indicating how many times larger the full test is, for which the corrected correlation coefficient is being computed.
fix.negative: How will negative input values be dealt with? With "mirror", the formula is applied to the absolute value and the sign is restored afterwards; with "nullify", negative input is treated as zero; with "none", the formula is applied as-is.
When ntests=2, the formula computes what the correlation coefficient would be if the test were twice as long.
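The underlying prophecy formula is a standard result, written out below for reference (assumed to match this implementation for non-negative input):

# Spearman-Brown prophecy formula
sb <- function(r, ntests=2) (ntests * r) / (1 + (ntests - 1) * r)
sb(.5)             # 0.667: expected reliability of a test twice as long
spearmanBrown(.5)  # assumed to match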
Spearman-Brown corrected correlation coefficients.
Sercan Kahveci
spearmanBrown(.5)
Generate stratified splits for a single participant
stratifiedItersplits(splits, groupsizes)
splits: Number of iterations.
groupsizes: An integer vector of how many RTs per group need to be stratified.
This equally splits what can be equally split within groups. Then it randomly splits all the leftovers to ensure near-equal split sizes. This function is mostly for internal use, but you can use it if you know what you are doing.
A matrix with zeroes and ones. Each column is a random split.
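To see the stratification guarantee in action, the sketch below checks that within every group the two halves differ by at most one trial (the group sizes are arbitrary illustration values):

# Within-group imbalance check for stratified splits
groupsizes <- c(5, 4, 7)
sp <- stratifiedItersplits(splits=100, groupsizes=groupsizes)
g <- rep(seq_along(groupsizes), groupsizes)
sapply(split(as.data.frame(sp), g),
       function(m) max(abs(colSums(m == 1) - colSums(m == 0))))  # each at most 1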
# We will create splits stratified by stimulus for a single participant
data(foodAAT)
currdata <- foodAAT[foodAAT$subjectid==3,]
currdata$stratfactor <- interaction(currdata$is_pull,
                                    currdata$is_target,
                                    currdata$stimid)
currdata <- currdata[order(currdata$stratfactor),]
groupsizes <- rle(as.character(currdata$stratfactor))$lengths
mysplits <- stratifiedItersplits(splits=1000, groupsizes=groupsizes)

# Now the data can be split with the values from any column.
half1 <- currdata[mysplits[,1]==1,]
half2 <- currdata[mysplits[,1]==0,]

# Or the split objects can be used as masks
# for the aggregation functions in this package
meansByMask(x=currdata$RT, mask=mysplits==1)