- Added the ability to generate data from an empirical distribution by using new functions `genDataDensity` and `addDataDensity`.
- The binary and binomial distributions can now accommodate a "log" link.
- `addCorGen` no longer requires all clusters to have the same size when using the `rho` and `corstr` arguments to define the correlation.
- Fixed an issue that prevented functions defined outside the global namespace from being referenced in `defData`.
- Added the option to specify a customized distribution in `defData` and `defDataAdd` by specifying `dist = "custom"`.
- `addPeriods` now includes a new argument `periodVec` that allows users to designate specific measurement time periods using a vector.
- Function `logisticCoefs` now correctly handles double-dot notation.
- `trtAssign` with `ratio=NULL` used to produce 0-indexed values, but 1-indexed values if `ratio` was set. Both versions now produce 0-indexed values. This is a potentially breaking change for existing scripts that use the generated treatment values while assuming the old behavior (e.g., using hard-coded values to filter).
- Function `logisticCoefs` determines the intercept and treatment/exposure parameter for a data generating process (based on a logistic regression model) that has a specific target population prevalence of a binary outcome, and an option to target a risk ratio, risk difference, or AUC.
- Data generation speed has been improved for very large data sets with many variables.
- Added double-dot (dynamic) functionality to defSurv. Users can now specify double-dot variables in scale, shape, and formula parameters.
- It is possible to generate variable cluster sizes using the `clusterSize` distribution in `defData` and `defDataAdd`.
- X-axis limits can now be set in the plot generated by `survParamPlot`.
- Improved the random effect variance generation for function `iccRE` under the Poisson distribution. The current approach is based on the 2013 paper by Nakagawa & Schielzeth titled "A general and simple method for obtaining $R^2$ from generalized linear mixed-effects models."
- Modified internal function to speed up beta distribution data generation.
- Added functions `blockExchangeMat` and `blockDecayMat`. Users can now generate correlation matrices that can accommodate clustered observations over time where the within-cluster correlation in the same time period can be different from the within-cluster correlation across time periods.
- Updated function `genCorMat` to allow generation of cluster-specific correlation matrices in case one wants to induce variability in correlation across clusters.
- Overhauled function `addCorGen` to make it more flexible. It can now handle cluster-dependent data, and not just time-dependent data. In addition, performance has been dramatically improved.
- Fixed bug in `genSpline`.
- Fixed bug in `trtAssign`.
- Updated `genFormula` to allow for 'double dot' functionality.
- Added new functions `genSynthetic` and `addSynthetic`. These allow users to sample records with replacement from an existing data table.
- Added argument 'startProb' to `genMarkov`. This allows the user to set the probability distribution of the start state.
- Added utility functions `survGetParams` and `survParamPlot` to aid users in identifying parameters that can be used to generate desired distributions of time-to-event data.
- Major updates to functions `defSurv` and `genSurv`. It is now possible to generate survival outcomes with hazard functions that change over time. In addition, competing risk outcomes can be explicitly generated.
- genOrdCat now supports non-proportional odds.
- Added functions defRepeat and defRepeatAdd to facilitate the definition of multiple variables that share identical data definitions.
- Fixed bug resulting from rounding error when specifying probabilities for 'categorical' distributions.
- You can now use non-scalar variables with double-dot notation. See the Dynamic Data Definition Vignette.
- The 'categorical' distribution now supports the variance parameter to introduce categories other than 1...n.
- You can now use [trtAssign()] as a distribution with [defData()].
- Added CITATION
- genData now warns that a set 'id' parameter will override previously defined 'id' names from the data definition.
- genData now handles NULL as 'id' value in data definitions (e.g. when definitions are not created via defData etc.) by defaulting to 'id'.
- Fix an error in genOrdCat when only a single adjustment variable is given but more than one new category will be created.
- Fix a bug where ..variables did not work within a function using `dist="beta"`.
- Improve documentation and vignettes.
- Add 'backports' for compatibility with R < 4.0
- Fix a bug on R < 4.0 in genOrdCat
- Current version is now only compatible with R version >= 3.3.0
- Moved genCorOrdCat's functionality into genOrdCat. genCorOrdCat is now deprecated.
- Renamed catProbs to genCatFormula for naming consistency. catProbs is now deprecated.
- Introduced a new system for formula definitions and completely reworked the underlying code. See vignette "Dynamic Data Definition".
- The new function genMixFormula generates mixture formulas from different inputs.
- Some simstudy functions now produce custom errors and warnings. Eventually all conditions will be replaced by the new system to make error handling easier for the user.
- Added new vignettes.
- Created documentation pages for:
  - the release version https://kgoldfeld.github.io/simstudy/
  - and development version https://kgoldfeld.github.io/simstudy/dev
- genCatFormula now warns if an additional category is created or probabilities are normalized.
- Fixed bug in trtAssign related to the new ratio argument.
- Fixed bug in trtAssign when strata had count of one.
- defData now also checks the first row in the definition table for validity.
- Added "mixture" distribution that takes a value from an existing column with a specified probability.
- Modified function trtAssign to improve speed performance of stratified sampling with very large numbers of strata.
- Add argument "ratio" to function trtAssign to allow users to specify more than 1:1 randomization.
- Added function trimData (which uses the new Rcpp function clipVec) to clip or truncate a longitudinal data set after a certain event has occurred.
- Fixed bug in addMarkov, added trimvalue argument to use trimData function
- Added trimvalue argument in genMarkov
- Added functions genMarkov and addMarkov to create data.table with (or add to existing data.table) individual chains of Markov processes.
- Added function genNthEvent to create data.table with binary event outcome in a longitudinal setting.
- Updated function genCluster so that cluster size can be specified as an integer, and will be constant across all clusters.
- Updated function addPeriods so that the period name can be specified.
- Updated function trtStepWedge so that a transition period can be included.
- Fixed bug in function delColumns related to multiple keys.
- Added negative binomial distribution as an option to function iccRE
- Fixed function genCorOrdCat so that it can accept a user-specified correlation matrix
- Added function trtStepWedge to generate treatment assignment for a stepped-wedge design cluster randomized trial.
- Fixed genCorFlex and addMultiFac to accommodate bug fixes with package data.table
- Added negative binomial option to genCorGen, addCorGen, genCorFlex, and addCorFlex
- Fixed bug in function genFactor
- Added LAG() functionality to missing data generation - updated functions genMiss and added two new internal functions .checkLags and .addLags
- Function catProbs now accepts a vector of probabilities or weights as an argument
- Fixed bugs in function addCondition
- Added function genCorMat - generate an n x n correlation matrix
- Added function genCorOrdCat - generate correlated ordinal categorical data
- Added beta distribution option to function defData (and associated functions)
- Added function betaGetShapes
- Implemented Emrich and Piedmonte algorithm for correlated binary data for function genCorGen and addCorGen
- Modified function genOrdCat - allows adjVar = NULL
- Fixed bug in function addCorFlex
- Added function catProbs - to be used to generate categorical data
- Added binomial distribution
- Added ability to specify formula in variance
- Added function genMultiFac - generates multi-factorial design data
- Added function addMultiFac - adds multi-factorial design data
- Added function iccRE - generates required random effect variance for specified intra-class coefficients (ICCs)
- Fixed bug in function genCorFlex
- Fixed bug in numerous functions related to error checking and scoping
- Fixed bug in function addCondition
- Fixed function updateDef
- Fixed bug in internal function genbinom
- Added function genCorFlex - generate correlated data from variables that have different marginal distributions
- Added function addCorFlex - generate correlated data from variables that have different marginal distributions, can be dependent on previously defined data
- Added function genOrdCat - creates ordinal categorical data
- Added function genFormula - creates a linear formula in the form of a string
- Added function updateDef - modify existing data definition table (to be used in genData())
- Added function updateDefData - modify existing data def table (to be used in addColumns())
- Fixed function genSurv
- Added spline generating functions
- Added uniform integer distribution (uniformInt)
- Added negative binomial distribution (negBinomial)
- Added exponential distribution (exponential)
- Added function delColumns - deletes one or more columns from data.table
- Added error check to verify that specified distributions are valid
- Added function genFactor - converts an existing (non-double) field in a data.table to a factor
- Added function genDummy - creates dummy variables from an integer or factor field in a data.table
- Added function defCondition - define distribution conditional on existing fields
- Added function defReadCond - read in conditional definitions from external csv file
- Added function addCondition - generate data based on conditional definition
- Modified "nonrandom" data generation to allow "log" and "logit" link options.
- Added function genCorGen - generate a new data.table with correlated data from various distributions.
- Added function addCorData - add correlated data from various distributions to existing data.tables.
- Fixed index variable issue related to generating categorical data
- Fixed index variable issue related to generating longitudinal data
- Fixed issue that arose when creating categorical variable in first field
- Improved the speed of generating categorical data with large sample sizes
- Categorical data can now accommodate probabilities conditional on covariates
- Fix: package data.table 1.10.0 broke genMissDataMat. genMissDataMat has been updated.
- This is the first submission of simstudy, so there is no news yet!