Title: | Data Management of Large Hierarchical Data |
---|---|
Description: | Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases. |
Authors: | Benjamin Becker [aut, cre], Karoline Sachse [ctb], Johanna Busse [ctb] |
Maintainer: | Benjamin Becker <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.1.9000 |
Built: | 2024-11-22 13:30:14 UTC |
Source: | https://github.com/beckerbenj/eatgads |
Function to apply meta data changes to a GADSdat object specified by a change table extracted by getChangeMeta.
applyChangeMeta(changeTable, GADSdat, ...)

## S3 method for class 'varChanges'
applyChangeMeta(changeTable, GADSdat, checkVarNames = TRUE, ...)

## S3 method for class 'valChanges'
applyChangeMeta(
  changeTable,
  GADSdat,
  existingMeta = c("stop", "value", "value_new", "drop", "ignore"),
  ...
)
changeTable |
Change table as provided by |
GADSdat |
|
... |
further arguments passed to or from other methods. |
checkVarNames |
Logical. Should new variable names be checked by |
existingMeta |
If values are recoded, which meta data should be used (see details)? |
Values for which the change columns contain NA remain unchanged. If changes are performed on value level, recoding into existing values can occur. In these cases, existingMeta determines how the resulting meta data conflicts are handled: either raising an error if any occur ("stop"), keeping the original meta data for the value ("value"), using the meta data in the changeTable and, if incomplete, from the recoded value ("value_new"), or leaving the respective meta data untouched ("ignore").
Furthermore, one might recode multiple old values into the same new value. This is currently only possible with existingMeta = "drop", which drops all related meta data on value level, or existingMeta = "ignore", which leaves all related meta data on value level untouched.
Returns the modified GADSdat object.
# Change a variable name and label
varChangeTable <- getChangeMeta(pisa, level = "variable")
varChangeTable[1, c("varName_new", "varLabel_new")] <- c("IDstud", "Person ID")
pisa2 <- applyChangeMeta(varChangeTable, GADSdat = pisa)
Recode one or multiple variables based on a lookup table created via createLookup (and potentially formatted by collapseColumns).
applyLookup(GADSdat, lookup, suffix = NULL)
GADSdat |
A |
lookup |
Lookup table created by |
suffix |
Suffix to add to the existing variable names. If |
If there are missing values in the column value_new, NAs are inserted as new values and a warning is issued.
The complete work flow when using a lookup table to recode multiple variables in a GADSdat could be:
(0) Optional: Recode empty strings to NA (necessary if the lookup table is written to Excel).
(1) Create a lookup table with createLookup.
(2) Save the lookup table to .xlsx with write_xlsx from eatAnalysis.
(3) Fill out the lookup table via Excel.
(4) Import the lookup table back to R via read_excel from readxl.
(5) Apply the final lookup table with applyLookup.
See applyLookup_expandVar for recoding a single variable into multiple variables.
Returns a recoded GADSdat.
## create an example GADSdat
iris2 <- iris
iris2$Species <- as.character(iris2$Species)
gads <- import_DF(iris2)

## create Lookup
lu <- createLookup(gads, recodeVars = "Species")
lu$value_new <- c("plant 1", "plant 2", "plant 3")

## apply lookup table
gads2 <- applyLookup(gads, lookup = lu, suffix = "_r")

## only recode some values
lu2 <- createLookup(gads, recodeVars = "Species")
lu2$value_new <- c("plant 1", "plant 2", NA)
gads3 <- applyLookup(gads, lookup = lu2, suffix = "_r")
Recode one or multiple variables based on a lookup table created via createLookup. In contrast to applyLookup, this function allows the creation of multiple resulting variables from a single input variable. All variables in lookup except variable and value are treated as recode columns.
applyLookup_expandVar(GADSdat, lookup)
GADSdat |
A |
lookup |
Lookup table created by |
If a variable contains information that should be split into multiple variables via manual recoding, applyLookup_expandVar can be used. If there are missing values in any recode column, NAs are inserted as new values. A warning is issued only for the first column.
The complete work flow when using a lookup table to expand variables in a GADSdat based on manual recoding could be:
(1) Create a lookup table with createLookup.
(2) Save the lookup table to .xlsx with write_xlsx from eatAnalysis.
(3) Fill out the lookup table via Excel.
(4) Import the lookup table back to R via read_excel from readxl.
(5) Apply the final lookup table with applyLookup_expandVar.
See applyLookup for simply recoding variables in a GADSdat.
Returns a recoded GADSdat.
## create an example GADSdat
example_df <- data.frame(ID = 1:6,
                         citizenship = c("germ", "engl", "germ, usa", "china",
                                         "austral, morocco", "nothin"),
                         stringsAsFactors = FALSE)
gads <- import_DF(example_df)

## create Lookup
lu <- createLookup(gads, recodeVars = "citizenship", addCol = c("cit_1", "cit_2"))
lu$cit_1 <- c("German", "English", "German", "Chinese", "Australian", NA)
lu$cit_2 <- c(NA, NA, "USA", NA, "Morocco", NA)

## apply lookup table
gads2 <- applyLookup_expandVar(gads, lookup = lu)
Applies recodes as specified by a numCheck data.frame, as created by createNumCheck.
applyNumCheck(GADSdat, numCheck)
GADSdat |
A |
numCheck |
A |
This function is currently under development.
Returns a recoded GADSdat.
# tbd
Assimilate all value labels of multiple variables as part of a GADSdat or all_GADSdat object.
assimilateValLabels(GADSdat, varNames, lookup = NULL)
GADSdat |
|
varNames |
Character string of a variable name. |
lookup |
Look up |
Assimilation can be performed using all existing value labels or a look up table containing at least all existing value labels. Missing codes are reused based on the meta data of the first variable in varNames.
Returns the GADSdat object with changed meta data and recoded values.
# Example data set
facs_df <- data.frame(id = 1:3,
                      fac1 = c("Eng", "Aus", "Ger"),
                      fac2 = c("Ger", "Franz", "Ita"),
                      fac3 = c("Kor", "Chi", "Alg"),
                      stringsAsFactors = TRUE)
facs_gads <- import_DF(facs_df)

assimilateValLabels(facs_gads, varNames = paste0("fac", 1:3))
Auto recode a variable in a GADSdat. A look up table is created containing the respective recode pairs. An existing look up table can be utilized via template. This function somewhat mirrors the functionality provided by the SPSS function autorecode.
autoRecode(
  GADSdat,
  var,
  var_suffix = "",
  label_suffix = "",
  csv_path = NULL,
  template = NULL
)
GADSdat |
A |
var |
Character string of the variable name which should be recoded. |
var_suffix |
Variable suffix for the newly created |
label_suffix |
Suffix added to variable label for the newly created variable in the |
csv_path |
Path for the |
template |
Existing look up table. |
If an existing template is used and a look up table is saved as a .csv file, the resulting look up table will contain the existing recodes plus additional recode pairs required for the data.
Returns a GADSdat object.
gads <- import_DF(data.frame(v1 = letters))

# auto recode without saving look up table
gads2 <- autoRecode(gads, var = "v1", var_suffix = "_num")

# auto recode with saving look up table
f <- tempfile(fileext = ".csv")
gads2 <- autoRecode(gads, var = "v1", var_suffix = "_num", csv_path = f)
Calculate a scale variable based on multiple items.
calculateScale(
  GADSdat,
  items,
  scale,
  maxNA = length(items),
  reportDescr = FALSE
)
GADSdat |
A |
items |
A character vector with all item variable names. |
scale |
A character vector with the scale name. |
maxNA |
Maximum number of allowed |
reportDescr |
Should descriptive statistics be reported for the calculated scale. |
Descriptive statistics (including Cronbach's alpha, credit to the psy package) are calculated and printed to the console. The new scale variable is automatically inserted right after the last item in the original GADSdat.
Returns a GADSdat containing the newly computed variable.
## calculate a scale from six items
items <- paste0("norms_", letters[1:6])
pisa_new <- calculateScale(pisa, items = items, scale = "norms")
Bind two GADSdat objects into a single GADSdat object by columns.
This is a secure way to cbind the data and the meta data of two GADSdat objects. Currently, only limited merging options are supported.
## S3 method for class 'GADSdat'
cbind(..., deparse.level = 1)
... |
Multiple |
deparse.level |
Argument is ignored in this method. |
If there are duplicate variables (except the variables specified in the by argument), these variables are removed from y. The meta data is joined for the remaining variables via rbind.
Returns a GADSdat object.
Change or add missing codes of one or multiple variables as part of a GADSdat object.
changeMissings(GADSdat, varName, value, missings)
GADSdat |
|
varName |
Character vector containing variable names. |
value |
Numeric values. |
missings |
Character vector of the new missing codes, either |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta and applyChangeMeta.
The function supports changing multiple missing tags (missings) as well as missing tags of multiple variables (varName) at once.
Returns the GADSdat object with changed meta data.
# Set a specific value to missing
pisa2 <- changeMissings(pisa, varName = "computer_age", value = 5, missings = "miss")

# Set multiple values to missing
pisa3 <- changeMissings(pisa, varName = "computer_age", value = 1:4,
                        missings = c("miss", "miss", "miss", "miss"))

# Set a specific value to not missing
pisa4 <- changeMissings(pisa2, varName = "computer_age", value = 5, missings = "valid")

# Add missing tags to multiple variables
pisa5 <- changeMissings(pisa, varName = c("g8g9", "computer_age"), value = c(-99, -98),
                        missings = c("miss", "miss"))
Change the SPSS format of one or multiple variables as part of a GADSdat object.
changeSPSSformat(GADSdat, varName, format)
GADSdat |
|
varName |
Character vector of variable names. |
format |
A single string containing the new SPSS format, for example 'A25' or 'F10'. |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta and applyChangeMeta.
SPSS format is supplied following SPSS logic. 'A' represents character variables, 'F' represents numeric variables. The number following this letter represents the maximum width. Optionally, another number can be added after a dot, representing the number of decimals in case of a numeric variable. For instance, 'F8.2' is used for a numeric variable with a maximum width of 8 and 2 decimal places.
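To make the format logic concrete, the following base R sketch splits a format string into its parts. The helper parse_spss_format is hypothetical and not part of eatGADS; it only covers the 'A' and 'F' formats described above.

```r
# Hypothetical helper (not part of eatGADS): split an SPSS format string
# such as "F8.2" or "A25" into type, width and number of decimals.
parse_spss_format <- function(format) {
  pattern <- "^([AF])([0-9]+)(\\.([0-9]+))?$"
  if (!grepl(pattern, format)) stop("Unsupported format: ", format)
  decimals <- sub(pattern, "\\4", format)  # empty string if no decimals are given
  list(
    type = sub(pattern, "\\1", format),
    width = as.integer(sub(pattern, "\\2", format)),
    decimals = if (nzchar(decimals)) as.integer(decimals) else NA_integer_
  )
}

f82 <- parse_spss_format("F8.2")  # numeric, width 8, 2 decimals
a25 <- parse_spss_format("A25")   # character, width 25, no decimals
```

Formats outside this pattern raise an error in the sketch; how eatGADS itself validates format strings is not shown here.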
Returns the GADSdat object with changed meta data.
# change SPSS format for a single variable (numeric variable with no decimals)
pisa2 <- changeSPSSformat(pisa, varName = "idstud", format = "F10.0")

# change SPSS format for multiple variables (numeric variable with no decimals)
pisa2 <- changeSPSSformat(pisa, varName = c("idstud", "idschool"), format = "F10.0")
Change or add value labels of one or multiple variables as part of a GADSdat object.
changeValLabels(GADSdat, varName, value, valLabel)
GADSdat |
|
varName |
Character vector containing variable names. |
value |
Numeric values which are being labeled. |
valLabel |
Character vector of the new value labels.
Labels are applied in the same ordering as |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta and applyChangeMeta.
The function supports changing multiple value labels (valLabel) as well as value labels of multiple variables (varName) at once.
Returns the GADSdat object with changed meta data.
# Change existing value labels
pisa2 <- changeValLabels(pisa, varName = "repeated", value = c(1, 2),
                         valLabel = c("no grade repetition", "grade repetition"))

# Add value label to unlabeled value
mtcars_g <- import_DF(mtcars)
mtcars_g2 <- changeValLabels(mtcars_g, varName = "cyl", value = c(4, 6, 8),
                             valLabel = c("four", "six", "eight"))

# Add value labels to multiple variables at once
mtcars_g3 <- changeValLabels(mtcars_g, varName = c("mpg", "cyl", "disp"),
                             value = c(-99, -98),
                             valLabel = c("missing", "not applicable"))
Change variable labels of one or multiple variables as part of a GADSdat object.
changeVarLabels(GADSdat, varName, varLabel)
GADSdat |
|
varName |
Character vector of variable names. |
varLabel |
Character vector of the new variable labels. |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta and applyChangeMeta.
Returns the GADSdat object with changed meta data.
# Change one variable label
pisa2 <- changeVarLabels(pisa, varName = "repeated",
                         varLabel = c("Has a grade been repeated?"))

# Change multiple variable labels
pisa2 <- changeVarLabels(pisa, varName = c("repeated", "gender"),
                         varLabel = c("Has a grade been repeated?", "Student gender"))
Change variable names of a GADSdat or all_GADSdat object.
changeVarNames(GADSdat, oldNames, newNames, checkVarNames = TRUE)
GADSdat |
|
oldNames |
Vector containing the old variable names. |
newNames |
Vector containing the new variable names, in identical order as |
checkVarNames |
Logical. Should new variable names be checked by |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta and applyChangeMeta.
Returns the GADSdat object with changed variable names.
# Change multiple variable names
pisa2 <- changeVarNames(pisa, oldNames = c("idstud", "idschool"),
                        newNames = c("IDstud", "IDschool"))
SPSS Compliance of Meta Data
Function to check if variable names and labels, value labels and missing codes comply with SPSS requirements for meta data.
check4SPSS(GADSdat)
GADSdat |
|
The function measures the length of variable names ("varNames_length", maximum of 64 characters), variable labels ("varLabels", maximum of 256 characters), and value labels ("valLabels", maximum of 120 characters). Furthermore, missing codes are counted ("missings", maximum of three missing codes for character variables) and special characters are flagged in variable names ("varNames_special"). Check results are reported back on variable level, with the exception of "valLabels", which is a list with entries per violating variable.
Returns a list with the entries "varNames_special", "varNames_length", "varLabels", "valLabels" and "missings".
# Change example data set (create a violating label)
pisa2 <- changeVarLabels(pisa, varName = "computer_age",
                         varLabel = paste(rep("3", 125), collapse = ""))
check4SPSS(pisa2)
Check value labels for (a) value labels with no occurrence in the data (checkEmptyValLabels) and (b) values with no value labels (checkMissingValLabels).
checkEmptyValLabels(
  GADSdat,
  vars = namesGADS(GADSdat),
  valueRange = NULL,
  output = c("list", "data.frame")
)

checkMissingValLabels(
  GADSdat,
  vars = namesGADS(GADSdat),
  classes = c("integer"),
  valueRange = NULL,
  output = c("list", "data.frame")
)
GADSdat |
A |
vars |
Character vector with the variable names to which |
valueRange |
[optional] Numeric vector of length 2: In which range should numeric values be checked? If specified, only numeric values are returned and strings are omitted. |
output |
Should the output be structured as a |
classes |
Character vector with the classes to which |
NAs are excluded from this check. Designated missing codes are reported normally.
Returns a list of length vars or a data.frame.
checkEmptyValLabels(): check for superfluous value labels
checkMissingValLabels(): check for missing value labels
# Check a categorical and a metric variable
checkMissingValLabels(pisa, vars = c("g8g9", "age"))
checkEmptyValLabels(pisa, vars = c("g8g9", "age"))

# Check while defining a specific value range
checkMissingValLabels(pisa, vars = c("g8g9", "age", "idschool"), valueRange = c(0, 5))
checkEmptyValLabels(pisa, vars = c("g8g9", "age", "idschool"), valueRange = c(0, 5))
Function to check if SPSS format statements are specified correctly in a GADSdat object.
checkFormat(GADSdat, type = "SPSS", changeFormat = TRUE)
GADSdat |
|
type |
If |
changeFormat |
If |
The function compares the SPSS format statements ("format") with the actual character length and decimal places of all variables in a GADSdat object and its meta data information. Mismatches are reported and can be automatically adjusted.
Returns a GADSdat object.
# Check the SPSS format statements of the example data set
pisa2 <- checkFormat(pisa)
Functions to check if missings are tagged and labeled correctly in a GADSdat object.
checkMissings(
  GADSdat,
  missingLabel = "missing",
  addMissingCode = TRUE,
  addMissingLabel = FALSE
)

checkMissingsByValues(GADSdat, missingValues = -50:-99, addMissingCode = TRUE)
GADSdat |
|
missingLabel |
Single regular expression indicating how missing labels are commonly named in the value labels. |
addMissingCode |
If |
addMissingLabel |
If |
missingValues |
Numeric vector of values which are commonly used for missing values. |
checkMissings() compares value labels (valLabels) and missing tags (missings) of a GADSdat object and its meta data information. checkMissingsByValues() compares labeled values (value) and missing tags (missings) of a GADSdat object and its meta data information. Mismatches are reported and can be automatically adjusted. Note that all checks are only applied to the meta data information, not the actual data. For detecting missing value labels, see checkMissingValLabels.
Returns a GADSdat object with modified missing tags, if specified.
checkMissings(): compare missing tags and value labels
checkMissingsByValues(): compare missing tags and values in a certain range
# checkMissings
pisa2 <- changeValLabels(pisa, varName = "computer_age", value = 5,
                         valLabel = "missing: No computer use")
pisa3 <- checkMissings(pisa2)

# checkMissingsByValues
pisa4 <- changeValLabels(pisa, varName = "computer_age", value = c(-49, -90, -99),
                         valLabel = c("test1", "test2", "test3"))
pisa5 <- checkMissingsByValues(pisa4, missingValues = -50:-99)
eatGADS data bases.
This function checks if both data bases perform identical joins via foreign keys, if they contain the same variable names and if these variables have the same value labels. Results of this comparison are reported on data table level as messages and as an output list.
checkTrendStructure(filePath1, filePath2)
filePath1 |
Path of the first |
filePath2 |
Path of the second |
An error is thrown if the key structure or the data table structure differs between the two data bases. Differences regarding meta data for missing value labels and for variable labels (and formatting) are ignored.
Reported differences regarding meta data can be inspected further via inspectMetaDifferences.
Returns a report list.
Function to check if a variable is unique for all cases of an identifier variable.
checkUniqueness(GADSdat, varName, idVar)
GADSdat |
|
varName |
Single string containing the variable name for which the check should be performed. |
idVar |
Single string containing the identifier variable name. |
For example, if missing values are multiply imputed and data is stored in a long format, checking the uniqueness of a variable within an identifier can be tricky. This function automates this task.
Returns either TRUE if the variable is unique within each value of idVar or a GADSdat object including the non-unique cases.
## create an example GADSdat
iris2 <- iris
iris2$Species <- as.character(iris2$Species)
gads <- import_DF(iris2, checkVarNames = FALSE)

## check uniqueness
checkUniqueness(gads, varName = "Sepal.Length", idVar = "Species")
Function to check if a variable is unique for all cases of an identifier variable. This is a faster and more efficient version of checkUniqueness which always returns a logical, non-missing value of length one.
checkUniqueness2(GADSdat, varName, idVar, impVar)
GADSdat |
|
varName |
Single string containing the variable name for which the check should be performed. |
idVar |
Single string containing the name of the identifier variable. |
impVar |
Single string containing the name of the imputation number. |
For example, if missing values are multiply imputed and data is stored in a long format, checking the uniqueness of a variable within an identifier can be tricky. This function automates this task by reshaping the data into wide format and testing equality among the reshaped variables. Similar functionality (via matrices) is covered by lme4::isNested, which is more general and performs similarly.
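The reshape-and-compare idea can be sketched in base R as follows. This is only an illustration on hypothetical data (long_df and its columns are made up for this sketch), not the implementation used by checkUniqueness2.

```r
# Hypothetical long-format data: 3 persons, 2 imputations each
long_df <- data.frame(
  id  = rep(1:3, each = 2),
  imp = rep(1:2, times = 3),
  v1  = c(10, 10, 20, 20, 30, 31)  # person 3 differs across imputations
)

# Reshape to wide format: one row per id, one column of v1 per imputation
wide_df <- reshape(long_df, idvar = "id", timevar = "imp",
                   v.names = "v1", direction = "wide")

# v1 is unique within id if all imputation columns agree in every row
value_cols <- setdiff(names(wide_df), "id")
per_id_unique <- apply(wide_df[, value_cols, drop = FALSE], 1,
                       function(x) length(unique(x)) == 1)
is_unique <- all(per_id_unique)
```

In this toy data set, is_unique is FALSE because the third person has two different values of v1.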
Returns a logical of length one.
## create an example GADSdat
l <- 1000
long_df <- data.table::data.table(id = sort(rep(1:l, 15)),
                                  v1 = sort(rep(1:l, 15)),
                                  imp = rep(1:15, l))
gads <- import_DF(long_df)

## check uniqueness
checkUniqueness2(gads, varName = "v1", idVar = "id", impVar = "imp")
Function to look for occurrences of a specific value in a GADSdat.
checkValue(GADSdat, value, vars = namesGADS(GADSdat))
GADSdat |
|
value |
Single string indicating how missing labels are commonly named in the value labels. |
vars |
Character vector with the variable names to which |
The function checks occurrences of a specific value in a set of variables (default: all variables) in the GADSdat and outputs a vector containing the count of occurrences for all variables in which the value occurs. It explicitly supports checking for NA.
Returns a named integer vector.
# for all variables in the data
checkValue(pisa, value = 99)

# only for specific variables in the data
checkValue(pisa, vars = c("idschool", "g8g9"), value = 99)
SQLite column name conventions.
Checks names for SQLite column name conventions and applies appropriate variable name changes to GADSdat or all_GADSdat objects.
checkVarNames(GADSdat, checkKeywords = TRUE, checkDots = TRUE)
GADSdat |
|
checkKeywords |
Logical. Should |
checkDots |
Logical. Should occurrences of |
Invalid column names in a SQLite data base include SQLite keywords (see sqlite_keywords) and column names containing a ".". The corresponding variable name changes are appending the suffix "Var" to all SQLite keywords and changing all "." in variable names to "_".
Note that avoiding "." in variable names is beneficial for multiple reasons, such as avoiding confusion with S3 methods in R and issues when importing from Stata.
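The two renaming rules can be illustrated with a small base R sketch. The function fix_names and its short keyword vector are hypothetical; the package uses the full list in sqlite_keywords, and the case-insensitive keyword matching shown here is an assumption of this sketch.

```r
# Illustrative re-implementation of the two renaming rules (not the package code).
# 'keywords' is a small, hypothetical subset of the SQLite keyword list.
fix_names <- function(var_names,
                      keywords = c("SELECT", "TABLE", "INDEX", "ORDER")) {
  # Rule 1: append "Var" to names that are SQLite keywords
  # (matching is case insensitive here, which is an assumption)
  is_keyword <- toupper(var_names) %in% keywords
  var_names[is_keyword] <- paste0(var_names[is_keyword], "Var")
  # Rule 2: replace every "." by "_"
  gsub(".", "_", var_names, fixed = TRUE)
}

new_names <- fix_names(c("computer.age", "order", "idstud"))
```

Here new_names contains "computer_age", "orderVar" and "idstud".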
Returns the original object with updated variable names.
# Change example data set (create an invalid variable name)
pisa2 <- changeVarNames(pisa, oldNames = "computer_age", newNames = "computer.age")

pisa3 <- checkVarNames(pisa2)
Deprecated. The cached data base is now cleaned automatically when the R session ends.
clean_cache(tempPath = tempdir())
tempPath |
Local directory in which the data base was temporarily stored. |
Cleans the temporary cache, specified by tempdir(). This function had to be executed at the end of an R session if getGADS_fast or getTrendGADS with fast = TRUE had been used.
Returns nothing.
Clone a variable as part of a GADSdat object.
cloneVariable(
  GADSdat,
  varName,
  new_varName,
  label_suffix = "",
  checkVarName = TRUE
)
GADSdat |
|
varName |
Name of the variable to be cloned. |
new_varName |
Name of the new variable. |
label_suffix |
Suffix added to variable label for the newly created variable in the |
checkVarName |
Logical. Should |
The variable is simply duplicated and assigned a new name.
Returns a GADSdat.
# duplicate the variable schtype
pisa_new <- cloneVariable(pisa, varName = "schtype", new_varName = "schtype_new")
Collapse two columns or format a single column of a lookup table created by createLookup.
collapseColumns(lookup, recodeVars, prioritize)
lookup |
For example a lookup table |
recodeVars |
Character vector of column names which should be collapsed (currently only up to two variables are supported). |
prioritize |
Character vector of length 1. Which of the columns in |
If a lookup table is created by createLookup
, different recoding columns can be specified by the addCols
argument.
This might be the case if two rater suggest recodes or one rater corrects recodes by another rater in a separate column.
After the recoding columns have been filled out, collapseColumns
can be used to either:
(a) Collapse two recoding columns into one recoding column. This might be desirable, if the two columns contain missing values.
prioritize
can be used to specify, which of the two columns should be prioritized if both columns contain valid values.
(b) Format the lookup table for applyLookup
, if recodeVars
is a single variable.
This simply renames the single variable specified under recodeVars
.
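The collapsing rule in (a) can be sketched in base R. This is only an illustration of the described behavior, not the eatGADS implementation; the helper `collapse_two` is a hypothetical name introduced here.

```r
# Base R sketch of the collapsing rule described above (assumption: the
# prioritized column wins whenever it holds a valid, non-NA value).
lookup_raw <- data.frame(
  variable = "var1",
  value = c("germa", "German", "dscherman"),
  recode1 = c(NA, "English", "German"),
  recode2 = c("German", "German", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: use the prioritized column where it is not NA,
# otherwise fall back to the other column.
collapse_two <- function(primary, secondary) {
  ifelse(is.na(primary), secondary, primary)
}

lookup_sketch <- lookup_raw[, c("variable", "value")]
lookup_sketch$value_new <- collapse_two(lookup_raw$recode2, lookup_raw$recode1)
lookup_sketch
```

Here `recode2` plays the role of the prioritized column, so the conflicting entry for "German" is resolved in its favor, while the missing entry for "dscherman" is filled from `recode1`.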
Returns a data.frame
that can be used for applyLookup
, with the columns:
variable |
Variable names |
value |
Old values |
value_new |
New values. Renamed and/or collapsed column. |
## (a) Collapse two columns
# create example recode data.frame
lookup_raw <- data.frame(variable = c("var1"),
                         value = c("germa", "German", "dscherman"),
                         recode1 = c(NA, "English", "German"),
                         recode2 = c("German", "German", NA),
                         stringsAsFactors = FALSE)
# collapse columns
lookup <- collapseColumns(lookup_raw, recodeVars = c("recode1", "recode2"), prioritize = "recode2")

## (b) Format one column
# create example recode data.frame
lookup_raw2 <- data.frame(variable = c("var1"),
                          value = c("germa", "German", "dscherman"),
                          recode1 = c("German", "German", "German"),
                          stringsAsFactors = FALSE)
# collapse columns
lookup2 <- collapseColumns(lookup_raw2, recodeVars = c("recode1"))
Recode a labeled integer variable (based on a multiple choice item), according to a character variable (e.g. an open answer item).
collapseMC_Text( GADSdat, mc_var, text_var, mc_code4text, var_suffix = "_r", label_suffix = "(recoded)" )
GADSdat |
A |
mc_var |
The variable name of the multiple choice variable. |
text_var |
The variable name of the text variable. |
mc_code4text |
The value label in |
var_suffix |
Variable name suffix for the newly created variables. If |
label_suffix |
Variable label suffix for the newly created variable (only added in the meta data). If |
Multiple choice variables can be represented as labeled integer variables in a GADSdat
. Multiple choice items with a forced choice
frequently contain an open answer category. However, sometimes open answers overlap with the existing categories in the multiple choice
item. collapseMC_Text
allows recoding the multiple choice variable based on the open answer variable.
mc_code4text
indicates when entries in the text_var
should be used. Additionally, entries in the text_var
are also
used when there are missings on the mc_var
. New values for the mc_var
are added in the meta data, while preserving the initial
ordering of the value labels. Newly added value labels are sorted alphabetically.
For more details see the help vignette:
vignette("recoding_forcedChoice", package = "eatGADS")
.
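The recoding rule described above can be sketched with plain character vectors. This is a simplified base R illustration under the assumption that a text entry replaces the multiple choice entry whenever the multiple choice value equals the "use text" category or is missing; it ignores value labels and meta data, which the actual function handles.

```r
# Simplified sketch: plain character vectors instead of a labeled GADSdat.
mc   <- c("blue", "blue", "green", "other", "other")
open <- c(NA, NA, NA, "yellow", "blue")

# Use the text entry if one exists and the mc entry is "other" or missing.
use_text <- !is.na(open) & (is.na(mc) | mc == "other")
mc_r <- ifelse(use_text, open, mc)
mc_r
```

In this sketch the open answer "blue" is folded back into the existing category, while "yellow" would become a new category.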
Returns a GADSdat
containing the newly computed variable.
# Example gads
example_df <- data.frame(ID = 1:5,
                         mc = c("blue", "blue", "green", "other", "other"),
                         open = c(NA, NA, NA, "yellow", "blue"),
                         stringsAsFactors = FALSE)
example_df$mc <- as.factor(example_df$mc)
gads <- import_DF(example_df)
# recode
gads2 <- collapseMC_Text(gads, mc_var = "mc", text_var = "open", mc_code4text = "other")
Recode multiple variables (representing a single multiple choice item) based on multiple character variables (representing a text field).
collapseMultiMC_Text( GADSdat, mc_vars, text_vars, mc_var_4text, var_suffix = "_r", label_suffix = "(recoded)", invalid_miss_code = -98, invalid_miss_label = "Missing: Invalid response", notext_miss_code = -99, notext_miss_label = "Missing: By intention" )
GADSdat |
A |
mc_vars |
A character vector with the variable names of the multiple choice variable. Names of the character
vector are the corresponding values that are represented by the individual variables.
Creation by |
text_vars |
A character vector with the names of the text variables which should be collapsed. |
mc_var_4text |
The name of the multiple choice variable that signals that information from the text variable should be used. This variable is recoded according to the final status of the text variables. |
var_suffix |
Variable suffix for the newly created |
label_suffix |
Suffix added to variable label for the newly created or modified variables in the |
invalid_miss_code |
Missing code which is given to new character variables if all text entries were recoded into the dichotomous variables. |
invalid_miss_label |
Value label for |
notext_miss_code |
Missing code which is given to empty character variables. |
notext_miss_label |
Value label for |
If a multiple choice item can be answered with ticking multiple boxes, multiple variables in the data
set are necessary to represent this item. In this case, an additional text field for further answers can also
contain multiple values at once. However, some of the answers in the text field might be redundant to
the dummy variables. collapseMultiMC_Text
allows recoding multiple MC items of this
kind based on multiple text variables. The recoding can be prepared by expanding the single text variable
(createLookup
and applyLookup_expandVar
) and by matching the dummy variables
to its underlying values stored in variable labels (matchValues_varLabels
).
The function recodes the dummy variables according to the character variables. Additionally, the mc_var_4text
variable is recoded according to the final status of the text_vars
(exception: if the text variables were
originally NA
, mc_var_4text
is left as it was).
Missing values in the character variables can be represented either by NAs
or by empty characters.
The multiple choice variables specified with mc_vars
can only contain the values 0
,
1
and missing codes. The value 1
must always represent "this category applies".
If necessary, use recodeGADS
for recoding.
For cases for which the text_vars
contain only values that can be recoded into the mc_vars
,
all new text_vars
are given specific missing codes (see invalid_miss_code
and invalid_miss_label
).
All remaining NAs
on the character variables are given a specific missing code (notext_miss_code
).
Returns a GADSdat
containing the newly computed variables.
# Prepare example data
mt2 <- data.frame(ID = 1:4,
                  mc1 = c(1, 0, 0, 0), mc2 = c(0, 0, 0, 0), mc3 = c(0, 1, 1, 0),
                  text1 = c(NA, "Eng", "Aus", "Aus2"),
                  text2 = c(NA, "Franz", NA, "Ger"),
                  stringsAsFactors = FALSE)
mt2_gads <- import_DF(mt2)
mt3_gads <- changeVarLabels(mt2_gads, varName = c("mc1", "mc2", "mc3"),
                            varLabel = c("Lang: Eng", "Aus spoken", "other"))
## All operations (see also respective help pages of functions for further explanations)
mc_vars <- matchValues_varLabels(mt3_gads, mc_vars = c("mc1", "mc2", "mc3"),
                                 values = c("Aus", "Eng", "Eng"),
                                 label_by_hand = c("other" = "mc3"))
out_gads <- collapseMultiMC_Text(mt3_gads, mc_vars = mc_vars,
                                 text_vars = c("text1", "text2"), mc_var_4text = "mc3")
out_gads2 <- multiChar2fac(out_gads, vars = c("text1_r", "text2_r"))
final_gads <- remove2NAchar(out_gads2, vars = c("text1_r_r", "text2_r_r"),
                            max_num = 1, na_value = -99, na_label = "missing: excessive answers")
Compare multiple variables of two GADSdat
or all_GADSdat
objects.
compareGADS( GADSdat_old, GADSdat_new, varNames, output = c("list", "data.frame", "aggregated") )
GADSdat_old |
|
GADSdat_new |
|
varNames |
Character string of variable names to be compared. |
output |
How should the output be structured? |
Returns "all equal"
if the variable is identical across the objects or a data.frame
containing a frequency table with the values which have been changed. Especially useful for checks
after recoding.
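The kind of output described above can be sketched for a single variable in base R. This is only an illustration of the idea (either "all equal" or a frequency table of changed values); the column names `value_old` and `value_new` are assumptions made for this sketch and need not match the actual output of `compareGADS`.

```r
# Hypothetical values of one variable before and after recoding.
old <- c(1, 2, 3, 3, 2)
new <- c(1, 2, 9, 9, 2)

changed <- old != new
res <- if (!any(changed)) {
  "all equal"
} else {
  # Frequency table of (old value, new value) pairs among changed rows.
  as.data.frame(table(value_old = old[changed], value_new = new[changed]))
}
res
```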
Returns either a list with "all equal"
and data.frames
or a single data.frame
.
# Recode a GADS
pisa2 <- recodeGADS(pisa, varName = "schtype", oldValues = 3, newValues = 9)
pisa2 <- recodeGADS(pisa2, varName = "language", oldValues = 1, newValues = 15)
# Compare
compareGADS(pisa, pisa2, varNames = c("ganztag", "schtype", "language"), output = "list")
compareGADS(pisa, pisa2, varNames = c("ganztag", "schtype", "language"), output = "data.frame")
compareGADS(pisa, pisa2, varNames = c("ganztag", "schtype", "language"), output = "aggregated")
Create a composite variable out of two variables.
composeVar(GADSdat, sourceVars, primarySourceVar, newVar, checkVarName = TRUE)
GADSdat |
|
sourceVars |
Character vector of length two containing the variable names which represent the sources of information. |
primarySourceVar |
Character vector containing a single variable name. Which of the |
newVar |
Character vector containing the name of the new composite variable. |
checkVarName |
Logical. Should |
A common use case for creating a composite variable is if there are multiple sources for the same information. For example, a child and the parents are asked about the child's native language. In such cases a composite variable contains information from both variables, meaning that one source is preferred and the other source is used to substitute missing values.
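The substitution logic can be sketched in base R with plain vectors. This is a minimal illustration that treats missing values as `NA`; the actual function works on a `GADSdat` and its tagged missing codes.

```r
# Hypothetical source vectors: the child's answer is the primary source.
child  <- c("Engl", "Ger", NA, NA)
father <- c("Engl", "Engl", "Ger", NA)

# Prefer the primary source; use the secondary source only where the
# primary source is missing.
comp <- ifelse(is.na(child), father, child)
comp
```

Where both sources are missing, the composite stays missing as well.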
The modified GADSdat
.
# example data
dat <- data.frame(ID = 1:4,
                  nat_lang_child = c("Engl", "Ger", "missing", "missing"),
                  nat_lang_father = c("Engl", "Engl", "Ger", "missing"),
                  stringsAsFactors = TRUE)
gads <- import_DF(dat)
# tag the value 3 ("missing") as missing and keep the result
gads <- changeMissings(gads, "nat_lang_child", value = 3, missings = "miss")
gads <- changeMissings(gads, "nat_lang_father", value = 3, missings = "miss")
# compose variable
composeVar(gads, sourceVars = c("nat_lang_child", "nat_lang_father"),
           primarySourceVar = "nat_lang_child", newVar = "nat_lang_comp")
Convert a character vector, all character variables in a data.frame
or selected variables in a GADSdat
to
upper ("upper"
), lower ("lower"
), or first letter upper and everything else lower case ("upperFirst"
).
convertCase(x, case = c("lower", "upper", "upperFirst"), ...)
## S3 method for class 'GADSdat'
convertCase(x, case = c("lower", "upper", "upperFirst"), vars, ...)
x |
A character vector, |
case |
Character vector of length 1. What case should the strings be converted to? Available options are
|
... |
further arguments passed to or from other methods. |
vars |
Character vector. What variables in the |
Returns the converted object.
convertCase(GADSdat)
: convert case for GADSdats
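For plain character vectors, the three conversions can be approximated in base R. The helper `upper_first` is a hypothetical name used only in this sketch, and it does not treat `NA` entries specially.

```r
x <- c("Hi", "HEllo", "greaT")

# Hypothetical helper: first letter upper case, everything else lower case.
upper_first <- function(x) {
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
}

x_upper <- toupper(x)
x_lower <- tolower(x)
x_first <- upper_first(x)
x_first
```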
# for character
convertCase(c("Hi", "HEllo", "greaT"), case = "upperFirst")
# for GADSdat
input_g <- import_DF(data.frame(v1 = 1:3, v2 = c("Hi", "HEllo", "greaT"), stringsAsFactors = FALSE))
convertCase(input_g, case = "upperFirst", vars = "v2")
Creates a relational eatGADS data base containing hierarchically stored data with meta information (e.g. value and variable labels).
createGADS(allList, pkList, fkList, filePath)
allList |
An object created via |
pkList |
List of primary keys. |
fkList |
List of foreign keys. |
filePath |
Path to the db file to write (including name); has to end on '.db'. |
Uses createDB
from the eatDB
package to create a relational data base. For details on how to define
keys see the documentation of createDB
.
Creates a data base in the given path, returns NULL
.
# see createDB vignette
Extract unique values from one or multiple variables of a GADSdat
object for recoding (e.g. via an Excel spreadsheet).
createLookup(GADSdat, recodeVars, sort_by = NULL, addCols = c("value_new"))
GADSdat |
A |
recodeVars |
Character vector of variable names which should be recoded. |
sort_by |
By which column ( |
addCols |
Character vector of additional column names for recoding purposes. |
If recoding of one or multiple variables is more complex, a lookup table can be created for later application via
applyLookup
or applyLookup_expandVar
. The function allows the extraction of the values
of multiple variables and sorting of these unique values via variable
and/or values
.
If addCols
are specified the lookup table has to be formatted via collapseColumns
,
before it can be applied to recode data.
Returns a data frame in long format with the following variables:
variable |
Variables as specified in |
value |
Unique values of the variables specified in |
value_new |
This is the default for |
# create example GADS
dat <- data.frame(ID = 1:4,
                  var1 = c(NA, "Eng", "Aus", "Aus2"),
                  var2 = c(NA, "French", "Ger", "Ita"),
                  stringsAsFactors = FALSE)
gads <- import_DF(dat)
# create Lookup table for recoding
lookup <- createLookup(gads, recodeVars = c("var1", "var2"), sort_by = c("value", "variable"))
# create Lookup table for recoding by multiple recoders
lookup2 <- createLookup(gads, recodeVars = c("var1", "var2"), sort_by = c("value", "variable"),
                        addCols = c("value_recoder1", "value_recoder2"))
All numerical variables without value labels in a GADSdat
are selected and a data.frame
is created, which allows the specification
of minima and maxima.
createNumCheck(GADSdat)
GADSdat |
A |
This function is currently under development.
A data.frame with the following variables:
variable |
All numerical variables in the |
varLabel |
Corresponding variable labels |
min |
Minimum value for the specific variable. |
max |
Maximum value for the specific variable. |
value_new |
Which value should be inserted if values exceed the specified range? |
# tbd
Create an empty variable as part of a GADSdat
object.
createVariable(GADSdat, varName, checkVarName = TRUE)
GADSdat |
|
varName |
Name of the variable to be created. |
checkVarName |
Logical. Should |
Returns a GADSdat
.
# create a new variable
pisa_new <- createVariable(pisa, varName = "new_variable")
Drop rows with duplicate IDs in a GADSdat
object based on numbers of missing values.
dropDuplicateIDs(GADSdat, ID, varNames = setdiff(namesGADS(GADSdat), ID))
GADSdat |
A |
ID |
Name of the ID variable. |
varNames |
Character vector of variable names: The sum of missing values on these variables decides which rows are kept. Per default, all variables except the ID variable are used. |
If duplicate IDs occur, it is often desirable to keep the row with the least missing information.
Therefore, dropDuplicateIDs
drops rows based on number of missing values
on the specified variables (varNames
).
If multiple rows have the same number of missing values, a warning is issued and the first of the respective rows is kept.
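The selection rule can be sketched in base R on a plain data frame. This is an illustration under the assumption that missing values are already coded as `NA`; unlike the actual function, the sketch does not issue a warning for ties.

```r
# Hypothetical data: ID 4 occurs twice, the second row has a missing value.
dat <- data.frame(id_var = c(1, 2, 5, 4, 4), var1 = c(1, 2, NA, 1, NA))

# Number of missing values per row on the variables of interest.
n_miss <- rowSums(is.na(dat[, "var1", drop = FALSE]))

# For every ID keep the row with the fewest missing values;
# which.min() returns the first minimum, so ties keep the first row.
keep <- rep(FALSE, nrow(dat))
for (id in unique(dat$id_var)) {
  rows <- which(dat$id_var == id)
  keep[rows[which.min(n_miss[rows])]] <- TRUE
}
dat_unique <- dat[keep, ]
dat_unique
```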
Returns the GADSdat
with duplicate ID rows removed.
# create example data set
gads_ori <- import_DF(data.frame(id_var = c(1, 2, 5, 4, 4), var1 = c(1, 2, -99, 1, -99)))
gads_ori <- changeMissings(gads_ori, varName = "var1", value = -99, missings = "miss")
# drop duplicate IDs
dropDuplicateIDs(gads_ori, ID = "id_var")
Convert a set of dummy variables into a set of character variables.
dummies2char(GADSdat, dummies, dummyValues, charNames, checkVarNames = TRUE)
GADSdat |
A |
dummies |
A character vector with the names of the dummy variables. |
dummyValues |
A vector with the values which the dummy variables represent. |
charNames |
A character vector containing the new variable names. |
checkVarNames |
Logical. Should |
A set of dummy variables is transformed to an equal number of character variables.
The character variables are aligned to the left and the remaining character variables are set to NA
.
For each new variable the missing codes of the respective dummy variable are reused.
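The left alignment can be sketched in base R. This is a simplified illustration with 0/1 coded dummies in a plain data frame (missing codes are ignored); the helper `left_align` is a hypothetical name introduced for the sketch.

```r
# Hypothetical 0/1 dummies for three languages.
dummies <- data.frame(d1 = c(1, 0, 1), d2 = c(1, 1, 0), d3 = c(0, 1, 0))
dummy_values <- c("english", "french", "german")

# Hypothetical helper: collect the values of all ticked dummies on the left
# and pad the remaining positions with NA.
left_align <- function(row) {
  hits <- dummy_values[row == 1]
  c(hits, rep(NA_character_, length(row) - length(hits)))
}

chars <- t(apply(as.matrix(dummies), 1, left_align))
colnames(chars) <- c("char1", "char2", "char3")
chars
```

The second row, for instance, yields "french" and "german" in the first two character variables and `NA` in the third.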
Returns a GADSdat
.
## create an example GADSdat
dummy_df <- data.frame(d1 = c("eng", "no eng", "eng"),
                       d2 = c("french", "french", "no french"),
                       d3 = c("no ger", "ger", "no ger"),
                       stringsAsFactors = TRUE)
dummy_g <- import_DF(dummy_df)
## transform dummy variables
dummy_g2 <- dummies2char(dummy_g, dummies = c("d1", "d2", "d3"),
                         dummyValues = c("english", "french", "german"),
                         charNames = c("char1", "char2", "char3"))
Set all values within one or multiple variables to NA
.
emptyTheseVariables(GADSdat, vars, label_suffix = "")
GADSdat |
A |
vars |
Character vector of variable names which should be set to |
label_suffix |
Suffix added to variable labels for the affected variables in the |
Returns the recoded GADSdat
.
# empty multiple variables
pisa2 <- emptyTheseVariables(pisa, vars = c("idstud", "idschool"))
Run tests to check whether two GADSdat
objects are (nearly) equal. Variable names, number of rows in the data,
meta data and data differences are checked and reported as a list output.
equalGADS( target, current, id = NULL, metaExceptions = c("display_width", "labeled"), tolerance = sqrt(.Machine$double.eps) )
target |
A |
current |
A |
id |
A character vector of length 1 containing the unique identifier column of both |
metaExceptions |
Should certain meta data columns be excluded from the comparison? |
tolerance |
A numeric value greater than or equal to |
More detailed checks for individual variables can be performed via inspectDifferences
and inspectMetaDifferences
.
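The default of `tolerance` equals the default tolerance of base R's `all.equal()`. Whether `equalGADS` applies the tolerance in exactly the same way is an assumption here; the sketch only shows what "nearly equal" means for numeric values under that default.

```r
tol <- sqrt(.Machine$double.eps)  # roughly 1.5e-8

# A difference far below the tolerance counts as equal ...
tiny_diff <- isTRUE(all.equal(1, 1 + 1e-10, tolerance = tol))
# ... while a difference above the tolerance does not.
large_diff <- isTRUE(all.equal(1, 1 + 1e-6, tolerance = tol))

c(tiny_diff = tiny_diff, large_diff = large_diff)
```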
Returns a list.
haven's read_spss stores data together with meta data (e.g. value and variable labels) in a
tibble with attributes on variable level. This function transforms a GADSdat object to such a tibble.
export_tibble(GADSdat)
GADSdat |
|
This function is mainly intended for internal use. For further documentation see also write_spss
.
Returns a tibble
.
pisa_tbl <- export_tibble(pisa)
Extract data.frame
from a GADSdat
object for analyses in R
. Value labels can be
selectively applied via defining convertLabels
and convertVariables
.
For extracting meta data see extractMeta
.
extractData( GADSdat, convertMiss = TRUE, convertLabels = c("character", "factor", "numeric"), convertVariables = NULL, dropPartialLabels = TRUE )
GADSdat |
A |
convertMiss |
Should values tagged as missing values be recoded to |
convertLabels |
If |
convertVariables |
Character vector of variable names to which labels should be applied.
All other variables remain as numeric variables in the data.
If not specified [default], value labels are applied to all variables for which labels are available.
Variable names not in the actual |
dropPartialLabels |
Should value labels for partially labeled variables be dropped?
If |
A GADSdat
object includes actual data (GADSdat$dat
) and the corresponding meta data information
(GADSdat$labels
). extractData
extracts the data and applies relevant meta data on value level (missing conversion, value labels),
so the data can be used for analyses in R
. Variable labels are retained as label
attributes on column level.
If factors are extracted via convertLabels = "factor"
, an attempt is made to preserve the underlying integers.
If this is not possible, a warning is issued.
As SPSS
has almost no limitations regarding the underlying values of labeled
integers and R
's factor
format is very strict (no 0
, only integers increasing by + 1
),
this procedure can lead to frequent problems.
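The strictness of R's factor format can be seen in a short base R example. The values and labels are hypothetical; the point is that the integer codes of a factor always run from 1 upwards in steps of one, regardless of the original values.

```r
# Hypothetical labeled values as they might occur in an SPSS file.
spss_values <- c(0, 5, 9, 5)

f <- factor(spss_values, levels = c(0, 5, 9), labels = c("none", "some", "all"))

# The underlying integer codes are 1, 2, 3, not 0, 5, 9.
codes <- as.integer(f)
codes
```

The original values 0, 5 and 9 therefore cannot be preserved as the underlying integers of the factor.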
Returns a data frame.
# Extract Data for Analysis
dat <- extractData(pisa)
# convert labeled variables to factors
dat <- extractData(pisa, convertLabels = "factor")
# convert only some variables to factor, all others remain numeric
dat <- extractData(pisa, convertLabels = "factor", convertVariables = c("schtype", "ganztag"))
# convert only some variables to character, all others remain numeric
dat <- extractData(pisa, convertLabels = "character", convertVariables = c("schtype", "ganztag"))
# schtype is now character
table(dat$schtype)
# gender remains numeric
table(dat$gender)
Extract data.frame
from a GADSdat
object for analyses in R
. Per default, missing codes are applied but
value labels are dropped. Alternatively, value labels can be selectively applied via
labels2character
, labels2factor
, and labels2ordered
.
For extracting meta data see extractMeta
.
extractData2( GADSdat, convertMiss = TRUE, labels2character = NULL, labels2factor = NULL, labels2ordered = NULL, dropPartialLabels = TRUE )
GADSdat |
A |
convertMiss |
Should values tagged as missing values be recoded to |
labels2character |
For which variables should values be recoded to their labels? The resulting variables
are of type |
labels2factor |
For which variables should values be recoded to their labels? The resulting variables
are of type |
labels2ordered |
For which variables should values be recoded to their labels? The resulting variables
are of type |
dropPartialLabels |
Should value labels for partially labeled variables be dropped?
If |
A GADSdat
object includes actual data (GADSdat$dat
) and the corresponding meta data information
(GADSdat$labels
). extractData2
extracts the data and applies relevant meta data on value level
(missing tags, value labels),
so the data can be used for analyses in R
. Variable labels are retained as label
attributes on column level.
If factors are extracted via labels2factor
or labels2ordered
, an attempt is made to preserve the underlying integers.
If this is not possible, a warning is issued.
As SPSS
has almost no limitations regarding the underlying values of labeled
integers and R
's factor
format is very strict (no 0
, only integers increasing by + 1
),
this procedure can lead to frequent problems.
If multiple values of the same variable are assigned the same value label and the variable should be transformed to
character
, factor
, or ordered
, a warning is issued and the transformation is correctly performed.
Returns a data frame.
# Extract Data for Analysis
dat <- extractData2(pisa)
# convert only some variables to character, all others remain numeric
dat <- extractData2(pisa, labels2character = c("schtype", "ganztag"))
# convert only some variables to factor, all others remain numeric
dat <- extractData2(pisa, labels2factor = c("schtype", "ganztag"))
# convert all labeled variables to factors
dat <- extractData2(pisa, labels2factor = namesGADS(pisa))
# convert some variables to factor, some to character
dat <- extractData2(pisa, labels2character = c("schtype", "ganztag"),
                    labels2factor = c("migration"))
Support for linking error data bases has been removed from eatGADS
.
extractDataOld
provides (for the time being) backwards compatibility, so linking errors can still be merged automatically.
extractDataOld( GADSdat, convertMiss = TRUE, convertLabels = "character", dropPartialLabels = TRUE, convertVariables = NULL )
GADSdat |
A |
convertMiss |
Should values coded as missing values be recoded to |
convertLabels |
If |
dropPartialLabels |
Should value labels for partially labeled variables be dropped? If |
convertVariables |
Character vector of variable names to which labels should be applied. If not specified (default), value labels are applied to all variables for which labels are available. Variable names not in the actual GADS are silently dropped. |
See extractData
for the current functionality.
Returns a data frame.
Function to extract a single GADSdat
from an all_GADSdat
object.
extractGADSdat(all_GADSdat, name)
all_GADSdat |
|
name |
A character vector with length 1 with the name of the |
GADSdat
objects can be merged into a single all_GADSdat
object via mergeLabels
. This function performs the
reverse action, extracting a single GADSdat
object.
Returns a GADSdat
object.
# see createGADS vignette
Extract meta data (e.g. variable and value labels) from an eatGADS
object. This can be a GADSdat
, an all_GADSdat
,
a labels data.frame
, or the path to an existing data base.
extractMeta(GADSobject, vars = NULL)
GADSobject |
Either a |
vars |
A character vector containing variable names. If |
Meta data is stored tidily in all GADSdat
objects as a separate long format data frame. This information can be extracted for a single or
multiple variables.
Returns a long format data frame with meta information.
# Extract Meta data from data base
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
extractMeta(db_path, vars = c("schtype", "sameteach"))
# Extract Meta data from loaded/imported GADS
extractMeta(pisa, vars = c("schtype", "sameteach"))
Extract or remove variables and their meta data from a GADSdat
object.
extractVars(GADSdat, vars)
removeVars(GADSdat, vars)
GADSdat |
|
vars |
A character vector containing the variable names in the |
Both functions simply perform the variable removal or extraction from the underlying data.frame
in the GADSdat
object followed by calling updateMeta
.
Returns a GADSdat
object.
## create an example GADSdat
example_df <- data.frame(ID = 1:4,
                         age = c(12, 14, 16, 13),
                         citizenship1 = c("German", "English", "Polish", "Chinese"),
                         citizenship2 = c(NA, "German", "Chinese", "Polish"),
                         stringsAsFactors = TRUE)
gads <- import_DF(example_df)
## remove variables from GADSdat
gads2 <- removeVars(gads, vars = c("citizenship2", "age"))
## extract GADSdat with specific variables
gads3 <- extractVars(gads, vars = c("ID", "citizenship1"))
Convert a factor variable with n levels to n dummy variables.
fac2dummies(GADSdat, var)
GADSdat |
A |
var |
A character vector with the name of the factor variable. |
Newly created variables are named as the original variable with the suffix "_a"
, "_b"
and so on. Variable labels
are created by using the original variable label (if available) and adding the value label of the corresponding level.
All missing codes are forwarded to all dummy variables.
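The naming scheme described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the data and the object names (`fac`, `dummies`) are made up for illustration, and meta data handling (variable labels, missing codes) is left out entirely.

```r
# Hypothetical factor with three levels (illustration only)
fac <- factor(c("setosa", "virginica", "setosa", "versicolor"))
lv <- levels(fac)

# One 0/1 column per factor level
dummies <- sapply(lv, function(l) as.integer(fac == l))

# Name the columns after the original variable plus "_a", "_b", ...
colnames(dummies) <- paste0("Species_", letters[seq_along(lv)])
dummies
```

Each row contains exactly one `1`, because every observation belongs to exactly one factor level.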
Returns a GADSdat
containing the newly computed variables.
## create an example GADSdat
suppressMessages(gads <- import_DF(iris))
## transform factor variable
gads2 <- fac2dummies(gads, var = "Species")
Convert a factor variable with complex factor levels (factor levels contain combinations of other factor levels) to dummy variables.
Dummy variables are coded 1
("yes"
) and 0
("no"
).
fac2dummies_complex(GADSdat, var)
GADSdat |
A |
var |
A character vector with the name of the factor variable. |
The basic functionality of this function is analogous to fac2dummies
. However, the function expects factor levels to only go
to 9
. Higher numbers are treated as combinations of factor levels, for example "13"
as "1"
and "3"
.
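The digit-splitting rule can be sketched in base R as follows. This is only an illustration of the idea under simplifying assumptions, not the package's code; `codes`, `single_levels` and the column prefix `fac_` are hypothetical names, and labels and missing codes are ignored.

```r
# Numeric codes in which each digit stands for one basic option
codes <- c(1, 12, 123, 2, 3, 23)

# Split every code into its single digits, e.g. 13 -> "1", "3"
digits <- strsplit(as.character(codes), "")

# The basic options are all digits that occur anywhere
single_levels <- sort(unique(unlist(digits)))

# One 0/1 column per basic option (1 = "yes", 0 = "no")
dummies <- t(sapply(digits, function(d) as.integer(single_levels %in% d)))
colnames(dummies) <- paste0("fac_", letters[seq_along(single_levels)])
dummies
```

The code `12` therefore yields `1` in the first two dummy variables and `0` in the third.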
Returns a GADSdat
containing the newly computed variables.
## create an example GADSdat
df_fac <- data.frame(id = 1:6,
                     fac = c("Opt a", "Opt c, Opt b", "Opt c", "Opt b", "Opt a, Opt b", "Opt a, Opt b, Opt c"),
                     stringsAsFactors = TRUE)
g_fac <- import_DF(df_fac)
g_fac <- recodeGADS(g_fac, varName = "fac",
                    oldValues = c(1, 2, 3, 4, 5, 6),
                    newValues = c(1, 12, 123, 2, 3, 23))
## transform factor variable
fac2dummies_complex(g_fac, "fac")
Fill imputed values in an imputed GADSdat_imp
object with original, non-imputed values from a GADSdat
.
fillImputations(GADSdat, GADSdat_imp, varName, varName_imp = varName, id, imp)
GADSdat |
A |
GADSdat_imp |
A |
varName |
A character vector of length 1 containing the variable name in |
varName_imp |
A character vector of length 1 containing the variable name in |
id |
A character vector of length 1 containing the unique identifier column of both |
imp |
A character vector of length 1 containing the imputation number in |
This function only fills in missing values in the imputed variable from the non-imputed variable. It provides parts
of the functionality of subImputations
but does not check whether values have been mistakenly imputed. However,
performance is increased substantially.
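The filling step can be pictured with a small base R sketch. It assumes a long-format imputed data set with one row per person and imputation, which is an assumption made for illustration; the data frames `orig` and `imp` and their columns are hypothetical, and the sketch is not the package's implementation.

```r
# Original (non-imputed) data: one row per person
orig <- data.frame(id = 1:3, x = c(10, NA, 30))

# Imputed data in long format: one row per person and imputation
imp <- data.frame(id  = rep(1:3, times = 2),
                  imp = rep(1:2, each = 3),
                  x   = c(NA, 21, NA, NA, 22, NA))

# Look up each imputed row's original value via the identifier
idx <- match(imp$id, orig$id)

# Fill only where the imputed value is missing and an original value exists
fill <- is.na(imp$x) & !is.na(orig$x[idx])
imp$x[fill] <- orig$x[idx][fill]
imp
```

Values that were actually imputed (here `21` and `22`) are left untouched, which mirrors the statement that no check for mistakenly imputed values takes place.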
The modified GADSdat_imp
.
# tbd
Remove special characters from a character vector or a GADSdat
object.
Also suitable to fix encoding problems of a character vector or a GADSdat
object. See details for available options.
fixEncoding(x, input = c("other", "ASCII", "windows1250", "BRISE"))
x |
A character vector or |
input |
Which encoding was used in |
The option "other"
replaces correctly encoded special signs.
The option "ASCII"
works for strings which were encoded presumably using UTF-8
and imported using ASCII
encoding.
The option "windows1250"
works for strings which were encoded presumably using UTF-8
and imported using windows-1250
encoding.
The option "BRISE"
covers a unique case used at the FDZ at IQB
.
If entries are all upper case, special characters are also transformed to all upper case (e.g., "AE"
instead
of "Ae"
).
The modified character vector or GADSdat
object.
fixEncoding(c("\U00C4pfel", "\U00C4PFEL", paste0("\U00DC", "ben"), paste0("\U00DC", "BEN")))
Function to obtain a data frame from a GADSdat
object for changes to meta data on variable or on value level.
getChangeMeta(GADSdat, level = "variable")
GADSdat |
|
level |
|
Changes on variable level include variable names (varName
), variable labels (varLabel
),
SPSS format (format
) and display width (display_width
).
Changes on value level include values (value
), value labels (valLabel
) and
missing codes (missings
).
Returns the meta data sheet for all variables including the corresponding change columns.
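The idea of a change table can be sketched in base R: the existing meta data is shown next to empty change columns, and only entries that are filled in lead to changes, while NA entries leave the meta data untouched. The sketch below is an illustration only; the `_new` suffix for the change columns and all object names are assumptions, and the real tables contain more columns.

```r
# Hypothetical variable-level meta data
meta <- data.frame(varName  = c("idstud", "schtype"),
                   varLabel = c("Student ID", "School type"),
                   stringsAsFactors = FALSE)

# Change table: existing meta data plus empty change columns (assumed names)
change <- meta
change$varName_new  <- NA_character_
change$varLabel_new <- NA_character_

# The user fills in only the entries that should change
change$varLabel_new[change$varName == "schtype"] <- "Type of school"

# Applying the change table: NA in a change column means "keep as is"
updated <- meta
has_new <- !is.na(change$varLabel_new)
updated$varLabel[has_new] <- change$varLabel_new[has_new]
updated
```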
# For changes on variable level
varChangeTable <- getChangeMeta(pisa, level = "variable")
# For changes on value level
valChangeTable <- getChangeMeta(pisa, level = "value")
Extracts variables from a GADS data base. Only the specified variables are extracted. Note that this selection determines the format of
the data.frame
that is extracted.
getGADS(vSelect = NULL, filePath)
vSelect |
Character vector of variable names. |
filePath |
Path of the existing |
See createDB
and dbPull
for further explanation of the query and merging processes.
Returns a GADSdat
object.
# Use data base within package
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
pisa_gads <- getGADS(db_path, vSelect = c("schtype", "sameteach"))
Extracts variables from an eatGADS
data base. Only the specified variables are extracted. Note that this selection determines the format
of the data.frame
that is extracted. CAREFUL: This function uses a local temporary directory to speed up loading the data base
from a server and caches the data base locally for a running R session. The temporary data base is removed automatically when the
running R
session is terminated.
getGADS_fast(vSelect = NULL, filePath, tempPath = tempdir())
vSelect |
Character vector of variable names. |
filePath |
Path of the existing |
tempPath |
Local directory in which the data base can temporarily be stored. Using the default is recommended. |
A random temporary directory is used for caching the data base and is removed when the R
session terminates. See
createDB
and dbPull
for further explanation of the query and merging processes.
Returns a GADSdat
object.
Extracts variables from multiple eatGADS
data bases.
Data can then be extracted from the GADSdat
object via
extractData
. For extracting meta data from a data base or a GADSdat
object see extractMeta
. To speed
up the data loading, getGADS_fast
is used by default.
getTrendGADS( filePaths, vSelect = NULL, years, fast = TRUE, tempPath = tempdir(), verbose = TRUE )
filePaths |
Character vectors with paths to the |
vSelect |
Variables from all GADS to be selected (as character vector). |
years |
A numeric vector with identical length as |
fast |
Should |
tempPath |
The directory in which the GADS will be temporarily stored. Using the default is strongly recommended. |
verbose |
Should the loading process be reported? |
This function extracts data from multiple GADS data bases. All data bases have to be created via
createGADS
. The data bases are joined via rbind()
and a variable year
is added, corresponding to the
argument years
. The GADSdat
object can then further
be used via extractData
. See createDB
and dbPull
for further explanation
of the querying and merging processes.
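The joining step described above (row-binding the data sets and adding a year variable) can be sketched with plain data frames. This is a simplified illustration, not the package's code: the two data frames are invented, and meta data handling is omitted.

```r
# Hypothetical data from two measurement occasions
gads_2012 <- data.frame(idstud = 1:2, score = c(500, 520))
gads_2018 <- data.frame(idstud = 3:4, score = c(510, 530))

dat_list <- list(gads_2012, gads_2018)
years <- c(2012, 2018)

# Add the matching entry of 'years' to each data set, then stack them
stacked <- do.call(rbind, Map(function(d, y) cbind(d, year = y), dat_list, years))
stacked
```

The order of `years` has to match the order of the data sets, just as `years` has to match `filePaths` in the function call.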
Returns a GADSdat
object.
# See getGADS vignette
Support for linking error data bases has been removed from eatGADS
.
getTrendGADSOld
provides (for the time being) backwards compatibility, so linking errors can still be extracted automatically.
getTrendGADSOld( filePath1, filePath2, lePath = NULL, vSelect = NULL, years, fast = TRUE, tempPath = tempdir() )
filePath1 |
Path of the first |
filePath2 |
Path of the second |
lePath |
Path of the linking error db file. If |
vSelect |
Variables from both GADS to be selected (as character vector). |
years |
A numeric vector of length 2. The first element corresponds to |
fast |
Should |
tempPath |
The directory in which both GADS will be temporarily stored. Using the default is strongly recommended. |
See getGADS
for the current functionality.
Returns a GADSdat
object.
# See getGADS vignette
convertLabel
Function to import a data.frame
object created by convertLabel
for use in eatGADS
. If possible, importing data via import_spss
should always be preferred.
import_convertLabel(df, checkVarNames = TRUE)
df |
A |
checkVarNames |
Should variable names be checked for violations of |
convertLabel
from R
package eatAnalysis
converts an object imported via read.spss
(from the foreign
package) to a data.frame
with factors and variable labels stored in variable attributes.
Returns a list with the actual data dat
and a data frame with all meta information in long format labels
.
data.frame
Function to import a data.frame
object for use in eatGADS
while extracting value labels from factors.
import_DF(df, checkVarNames = TRUE)
df |
A |
checkVarNames |
Should variable names be checked for violations of |
Factors are integers with labeled variable levels. import_DF
extracts these labels and stores them in a separate meta data data.frame.
See import_spss
for detailed information.
Returns a list with the actual data dat
and a data frame with all meta information in long format labels
.
dat <- import_DF(iris, checkVarNames = FALSE)
# Inspect Meta data
extractMeta(dat)
# Extract Data
dat <- extractData(dat, convertLabels = "character")
Function to import a data.frame
object for use in eatGADS
while adding explicit variable and value meta information through
separate data.frames
.
import_raw(df, varLabels, valLabels = NULL, checkVarNames = TRUE)
df |
A |
varLabels |
A |
valLabels |
A |
checkVarNames |
Should variable names be checked for violations of |
The argument varLabels
has to contain exactly two variables, namely varName
and varLabel
. valLabels
has
to contain exactly four variables, namely varName
, value
, valLabel
and missings
. The column value
can only contain numerical values. The column missings
can only contain the values "valid"
and "miss"
.
Variables of type factor
are not supported in any of the data.frames
.
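The requirements on the value label data frame can be checked up front with a few lines of base R. The helper `check_val_labels()` below is hypothetical and not part of eatGADS; it is a minimal sketch that only mirrors the rules stated above (four columns with fixed names, numeric values, missing tags restricted to "valid" and "miss", no factors).

```r
# Hypothetical helper: TRUE if 'valLabels' follows the documented structure
check_val_labels <- function(valLabels) {
  required <- c("varName", "value", "valLabel", "missings")
  length(names(valLabels)) == 4 &&
    setequal(names(valLabels), required) &&
    !any(vapply(valLabels, is.factor, logical(1))) &&
    is.numeric(valLabels$value) &&
    all(valLabels$missings %in% c("valid", "miss"))
}

good <- data.frame(varName = c("grade", "grade"), value = c(1, 2),
                   valLabel = c("very good", "good"),
                   missings = c("valid", "valid"),
                   stringsAsFactors = FALSE)
bad <- good
bad$missings[2] <- "missing"   # not an allowed tag

check_val_labels(good)
check_val_labels(bad)
```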
Returns a list with the actual data dat
and with all meta information in long format labels
.
dat <- data.frame(ID = 1:5, grade = c(1, 1, 2, 3, 1))
varLabels <- data.frame(varName = c("ID", "grade"),
                        varLabel = c("Person Identifier", "School grade Math"),
                        stringsAsFactors = FALSE)
valLabels <- data.frame(varName = c("grade", "grade", "grade"),
                        value = c(1, 2, 3),
                        valLabel = c("very good", "good", "sufficient"),
                        missings = c("valid", "valid", "valid"),
                        stringsAsFactors = FALSE)
gads <- import_raw(df = dat, varLabels = varLabels, valLabels = valLabels, checkVarNames = FALSE)
# Inspect Meta data
extractMeta(gads)
# Extract Data
dat <- extractData(gads, convertLabels = "character")
Function to create a GADSdat
object based on a dat
data.frame
and a labels
data.frame
.
import_raw2(dat, labels)
dat |
A |
labels |
A |
A GADSdat
is basically a list
with two elements: a dat
and a labels
data.frame
. If these elements are
separated, they can be cleanly tied together again by import_raw2
. The function performs extensive checks on the integrity of the
resulting GADSdat
object. See import_spss
and import_raw
for further details.
Returns a GADSdat
object.
dat <- data.frame(ID = 1:5, grade = c(1, 1, 2, 3, 1))
varLabels <- data.frame(varName = c("ID", "grade"),
                        varLabel = c("Person Identifier", "School grade Math"),
                        stringsAsFactors = FALSE)
valLabels <- data.frame(varName = c("grade", "grade", "grade"),
                        value = c(1, 2, 3),
                        valLabel = c("very good", "good", "sufficient"),
                        missings = c("valid", "valid", "valid"),
                        stringsAsFactors = FALSE)
gads <- import_raw(df = dat, varLabels = varLabels, valLabels = valLabels, checkVarNames = FALSE)
# separate the GADSdat object
dat <- gads$dat
labels <- gads$labels
# rejoin it
dat <- import_raw2(dat, labels)
RDS
file
Function to import a data.frame
stored as a .RDS
file while extracting value labels from factors.
import_RDS(filePath, checkVarNames = TRUE)
filePath |
Source file location, ending on |
checkVarNames |
Should variable names be checked for violations of |
Factors are integers with labeled variable levels. import_RDS
extracts these labels and stores them in a separate meta data data.frame.
See import_DF
for detailed information. This function is a wrapper around import_DF
.
Returns a list with the actual data dat
and a data frame with all meta information in long format labels
.
Function to import .sav
files while extracting meta information, e.g. variable and value labels.
import_spss( filePath, checkVarNames = TRUE, labeledStrings = c("drop", "keep", "transform"), encoding = NULL )
filePath |
Source file location, ending on |
checkVarNames |
Should variable names be checked for violations of |
labeledStrings |
Should strings as labeled values be allowed? If |
encoding |
The character encoding used for the file. The default, |
SPSS files (.sav
) store variable and value labels and assign specific formatting to variables. import_spss
imports
data from SPSS, while storing this meta-information separately in a long format data frame. Value labels and missing labels are used
to identify missing values (see checkMissings
). Time and date variables are converted to character.
In some special cases, .sav
files seem to consist of a mix of different encoding types. In such cases, haven
might
throw an error if the encoding argument is not specified or UTF-8
is selected as encoding. To circumvent this problem we
recommend using encoding = "ASCII"
and fixing the resulting issues manually. For example, fixEncoding
provides some fixes for encoding issues specific to the German language.
Returns a list with the actual data dat
and a data frame with all meta information in long format labels
.
# Use spss data from within package
spss_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
pisa_gads <- import_spss(spss_path)
Stata
data
Function to import .dta
files while extracting meta information, e.g. variable and value labels.
import_stata(filePath, checkVarNames = TRUE, labeledStrings = FALSE)
filePath |
Source file location, ending on |
checkVarNames |
Should variable names be checked for violations of |
labeledStrings |
Should strings as labeled values be allowed? This possibly corrupts all labeled values. |
Stata
files (.dta
) store variable and value labels and assign specific formatting to variables. import_stata
imports
data from Stata
, while storing this meta-information separately in a long format data frame. Time and date variables are converted to character.
Returns a list with the actual data dat
and a data frame with all meta information in long format labels
.
tibble
Function to import a tibble
while extracting meta information, e.g. variable and value labels.
import_tibble( tibble, checkVarNames = TRUE, labeledStrings = c("drop", "keep", "transform") )
tibble |
A |
checkVarNames |
Should variable names be checked for violations of |
labeledStrings |
Should strings as labeled values be allowed? If |
Tibbles
may store variable and value labels as well as missing tags via the labelled
class. import_tibble
restructures this meta information separately in a long format data.frame
. Value labels and missing tags are used
to identify missing values (see checkMissings
). Time and date variables are converted to character.
Returns a list with the actual data dat
and a data frame with all meta information in long format labels
.
# Use spss data from within package
spss_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
pisa_gads <- import_spss(spss_path)
GADSdat.
Deprecated. Please use relocateVariable
instead.
insertVariable(GADSdat, var, after = NULL)
GADSdat |
A |
var |
Character string of the variable name which should be sorted. |
after |
Character string of the variable name after which |
Inspect differences within a single GADSdat
or between two GADSdat
objects for a specific variable.
inspectDifferences( GADSdat, varName, other_GADSdat = GADSdat, other_varName = varName, id )
GADSdat |
A |
varName |
A character vector of length 1 containing the variable name. |
other_GADSdat |
A second |
other_varName |
A character vector of length 1 containing the other variable name.
If omitted, it is assumed that both variables have identical names (as supplied in |
id |
A character vector of length 1 containing the unique identifier column of both |
Two GADSdat
objects can be compared using equalGADS
.
If differences in the data for specific variables in the two objects occur,
these variables can be further inspected using inspectDifferences
.
Differences on meta data-level can be inspected via inspectMetaDifferences
.
A list.
# create a second GADS with different data
pisa2 <- pisa
pisa2$dat$age[400:nrow(pisa$dat)] <- sample(pisa2$dat$age[400:nrow(pisa$dat)])
# inspect via equalGADS()
equalGADS(pisa, pisa2)
# inspect via inspectDifferences()
inspectDifferences(GADSdat = pisa, varName = "age", other_GADSdat = pisa2, id = "idstud")
Inspect meta data differences within a single GADSdat
or between two GADSdat
objects
or GADSdat
data bases regarding a specific variable.
inspectMetaDifferences( GADSdat, varName, other_GADSdat = GADSdat, other_varName = varName )
GADSdat |
A |
varName |
A character vector of length 1 containing the variable name. |
other_GADSdat |
A second |
other_varName |
A character vector of length 1 containing the other variable name.
If omitted, it is assumed that both variables have identical names (as supplied in |
Two GADSdat
objects can be compared using equalGADS
.
If meta data differences for specific variables in the two objects occur,
these variables can be further inspected using inspectMetaDifferences
.
For data-level differences for a specific variable, see inspectDifferences
.
A list.
# create a second GADS with different meta data
pisa2 <- pisa
pisa2 <- changeVarLabels(pisa2, varName = "sameteach", varLabel = "Same math teacher")
pisa2 <- recodeGADS(pisa2, varName = "sameteach", oldValues = c(1, 2), newValues = c(0, 1))
# inspect via equalGADS()
equalGADS(pisa, pisa2)
# inspect via inspectMetaDifferences()
inspectMetaDifferences(GADSdat = pisa, varName = "sameteach", other_GADSdat = pisa2)
eatGADS
data base.
Returns the variable and value labels of all variables in the eatGADS
data base.
labelsGADS(filePath)
filePath |
Path of the existing |
Variable, value and missing labels as stored in the original SPSS-files and factors from R files are converted to long format for
storage in the data base. labelsGADS
returns them as a long format data frame.
Returns a long format data frame including variable names, labels, values, value labels and missing labels.
# Extract Meta data from data base
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
metaData <- labelsGADS(db_path)
Using variable labels, matchValues_varLabels
matches a vector of regular expressions to a set of variable names.
matchValues_varLabels(GADSdat, mc_vars, values, label_by_hand = character(0))
GADSdat |
A |
mc_vars |
A vector containing the names of the variables, which should be matched according to their variable labels. |
values |
A character vector containing the regular expressions for which the |
label_by_hand |
Additional value - |
Multiple choice items can be stored as multiple dichotomous variables with the information about the variable
stored in the variable labels. The function collapseMultiMC_Text
can be used to collapse such dichotomous
variables and a character variable, but requires a character vector with variable names of the multiple choice variables.
matchValues_varLabels
creates such a vector based on matching regular expressions (values
) to variable labels.
Note that all variables in mc_vars
have to be assigned exactly one value (and vice versa).
If a variable name is missing in the output,
an error will be thrown. In this case, the label_by_hand
argument should be used to specify the regular expression
variable name pair manually.
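The matching logic can be sketched with `grepl()` on a named vector of variable labels. This is only an illustration in base R with invented labels; it is not the package's implementation and skips the checks for missing or duplicate matches.

```r
# Hypothetical variable labels, named by variable
var_labels <- c(mc1 = "Lang: Eng", mc2 = "Aus spoken", mc3 = "other")

# Regular expressions that should each identify one variable
values <- c("Aus", "Eng")

# For every regular expression, find the variable whose label matches
matched <- sapply(values, function(v) names(var_labels)[grepl(v, var_labels)])

# A pair that cannot be found via the labels is supplied by hand
by_hand <- c("other" = "mc3")
out <- c(matched, by_hand)
out
```

The result has the same shape as the documented return value: the names are the regular expressions and the values are the variable names.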
Returns a named character vector. Values of the vector are the variable names in the GADSdat
, names of the vector
are the regular expressions.
# Prepare example data
mt2 <- data.frame(ID = 1:4,
                  mc1 = c(1, 0, 0, 0), mc2 = c(0, 0, 0, 0), mc3 = c(0, 1, 1, 0),
                  text1 = c(NA, "Eng", "Aus", "Aus2"),
                  text2 = c(NA, "Franz", NA, NA),
                  stringsAsFactors = FALSE)
mt2_gads <- import_DF(mt2)
mt3_gads <- changeVarLabels(mt2_gads, varName = c("mc1", "mc2", "mc3"),
                            varLabel = c("Lang: Eng", "Aus spoken", "other"))
out <- matchValues_varLabels(mt3_gads, mc_vars = c("mc1", "mc2", "mc3"),
                             values = c("Aus", "Eng", "Eng"),
                             label_by_hand = c("other" = "mc3"))
GADSdat
objects into a single GADSdat
object.
Provides a secure way to merge the data and the meta data of two GADSdat
objects. Currently, only limited merging options are supported.
## S3 method for class 'GADSdat'
merge(x, y, by, all = TRUE, all.x = all, all.y = all, ...)
x |
|
y |
|
by |
A character vector. |
all |
A logical (TRUE for a full join, FALSE for an inner join). |
all.x |
See merge. |
all.y |
See merge. |
... |
Further arguments are currently not supported but have to be included for |
If there are duplicate variables (except the variables specified in the by
argument), these variables are removed from y.
The meta data is joined for the remaining variables via rbind
.
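The handling of duplicate variables can be sketched for the data part with base R's `merge()`. The data frames below are invented, and the sketch covers only the data, not the joining of the meta data.

```r
x <- data.frame(id = 1:3, a = c(1, 2, 3), b = c("u", "v", "w"),
                stringsAsFactors = FALSE)
y <- data.frame(id = 2:4, b = c("V", "W", "X"), c = c(TRUE, FALSE, TRUE),
                stringsAsFactors = FALSE)
by <- "id"

# Variables that occur in both objects, apart from the key
dup <- setdiff(intersect(names(x), names(y)), by)

# Drop the duplicates from y, then perform a full join
y_red <- y[, !names(y) %in% dup, drop = FALSE]
merged <- merge(x, y_red, by = by, all = TRUE)
merged
```

Because `b` is removed from `y` before merging, the merged data keep the values of `b` from `x` only.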
Returns a GADSdat
object.
Transform multiple GADSdat
objects into a list ready for data base creation.
mergeLabels(...)
... |
|
The function createGADS
takes multiple GADSdat
objects as input. The function preserves the ordering
in which the objects are supplied, which is then used for the merging order in createGADS
. Additionally,
the separate lists of meta information for each GADSdat
are merged and a data frame unique identifier is added.
Returns an all_GADSdat
object, which consists of a list with a list of all data frames "datList"
and a single data frame containing all meta data information "allLabels"
.
# see createGADS vignette
NA
Recode Missings to NA
according to missing labels in label data.frame
.
miss2NA(GADSdat)
GADSdat |
A |
Missings are imported as their values via import_spss
. Using the value labels in the labels data.frame
,
miss2NA
recodes these missing codes to NA
. This function is mainly intended for internal use.
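A minimal base R sketch of this recoding, assuming a labels data frame with the columns `varName`, `value` and `missings` as described for `import_raw`; the data are invented and the loop is not the package's implementation.

```r
dat <- data.frame(id = 1:4, grade = c(1, -99, 2, -98))
labels <- data.frame(varName  = c("grade", "grade", "grade"),
                     value    = c(-99, -98, 1),
                     valLabel = c("omitted", "not reached", "very good"),
                     missings = c("miss", "miss", "valid"),
                     stringsAsFactors = FALSE)

# For each labeled variable, set all values tagged as "miss" to NA
for (v in unique(labels$varName)) {
  miss_codes <- labels$value[labels$varName == v & labels$missings == "miss"]
  dat[[v]][dat[[v]] %in% miss_codes] <- NA
}
dat
```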
Returns a data.frame
with NA
instead of missing codes.
Convert one or multiple character variables to factors. If multiple variables are converted, a common set of value labels is created, which is identical across variables. Existing value labels are preserved.
multiChar2fac( GADSdat, vars, var_suffix = "_r", label_suffix = "(recoded)", convertCases = NULL )
GADSdat |
A |
vars |
A character vector with all variables that should be transformed to factor. |
var_suffix |
Variable suffix for the newly created |
label_suffix |
Suffix added to variable label for the newly created variable in the |
convertCases |
Should cases be transformed for all variables? Default |
If a set of variables has the same possible values, it is desirable that these variables share the same
value labels, even if some of the values do not occur on the individual variables. This function allows
the transformation of multiple character variables to factors while assimilating the value labels.
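The idea of a shared set of value labels can be sketched in base R by building one common level set and applying it to every variable. The data and the `_r` suffix (taken from the default of `var_suffix`) are used for illustration only; meta data handling is not shown.

```r
df <- data.frame(citizenship1 = c("German", "English", "Chinese"),
                 citizenship2 = c("Polish", "German", NA),
                 stringsAsFactors = FALSE)
vars <- c("citizenship1", "citizenship2")

# Common set of values across all variables (NA is dropped by sort())
all_levels <- sort(unique(unlist(df[vars])))

# Every variable receives the same value-to-label mapping
for (v in vars) {
  df[[paste0(v, "_r")]] <- as.integer(factor(df[[v]], levels = all_levels))
}
all_levels
df
```

"German" is coded `3` in both new variables, even though "Polish" and "English" each occur in only one of them.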
The SPSS format of the newly created variables is set to F10.0
.
A current limitation of the function is that prior to the conversion, all variables specified in vars
must have identical
meta data on value level (value labels and missing tags).
If necessary, missing codes can be set after transformation via checkMissings
for setting missing codes
depending on value labels for all variables or
changeMissings
for setting missing codes for specific values in a specific variable.
The argument convertCases
uses the function convertCase
internally. See the respective documentation for more details.
Returns a GADSdat
containing the newly computed variable.
## create an example GADSdat
example_df <- data.frame(ID = 1:4,
                         citizenship1 = c("German", "English", "missing by design", "Chinese"),
                         citizenship2 = c("missing", "German", "missing by design", "Polish"),
                         stringsAsFactors = FALSE)
gads <- import_DF(example_df)
## transform one character variable
gads2 <- multiChar2fac(gads, vars = "citizenship1")
## transform multiple character variables
gads2 <- multiChar2fac(gads, vars = c("citizenship1", "citizenship2"))
## set values to missings
gads3 <- checkMissings(gads2, missingLabel = "missing")
Variable names of a GADSdat
object, an all_GADSdat
object or an eatGADS
data base.
namesGADS(GADS)
GADS |
A |
If the function is applied to a GADSdat
object, a character vector with all variable names is returned. If the function is
applied to an all_GADSdat
object or to the path of an eatGADS
data base, a named list is returned. Each list entry
represents a data table in the object.
Returns a character vector or a named list of character vectors.
# Extract variable names from data base
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
namesGADS(db_path)
# Extract variable names from loaded/imported GADS
namesGADS(pisa)
GADSdat.
Order the variables in a GADSdat
according to a character vector. If there are discrepancies between the two sets, a warning is issued.
orderLike(GADSdat, newOrder)
GADSdat |
A |
newOrder |
A character vector containing the order of variables. |
The variables in the dat
and in the labels
section are ordered. Variables not contained in the character vector are moved to the end of the data.
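The ordering rule can be expressed with two set operations in base R. The variable names below are invented, and the sketch covers only the ordering itself, not the warning or the meta data.

```r
vars <- c("idstud", "age", "sex", "score")   # current variable order
newOrder <- c("score", "idstud")             # requested order

# Requested variables first, all remaining variables moved to the end
ordered <- c(intersect(newOrder, vars), setdiff(vars, newOrder))
ordered
```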
Returns a GADSdat
object.
A small example data set from the German PISA Plus campus files as distributed by the Forschungsdatenzentrum, IQB
.
pisa
A data.frame with 500 rows and 133 variables, including:
Person ID variable
School ID variable
School type
Research Data Center at the Institute for Educational Quality Improvement (2020). Programme for International Student Assessment - Plus 2012, 2013 (PISA Plus 2012-2013) - Campus File (Version 1) [Data set]. Berlin: Institute for Educational Quality Improvement. doi:10.5159/IQB_PISA_Plus_2012-13_CF_v1
NA.
Recode multiple values in multiple variables in a GADSdat
to NA
.
recode2NA(GADSdat, recodeVars = namesGADS(GADSdat), value = "")
GADSdat | A |
recodeVars | Character vector of variable names which should be recoded. |
value | Which values should be recoded to |
If there are value labels given to the specified value, a warning is issued. The number of recodes per variable is reported.
If a data set is imported from .sav, character variables frequently contain empty strings. Especially if parts of the data are written to .xlsx, this can cause problems (e.g. as look up tables from createLookup), as most functions which write to .xlsx convert empty strings to NAs. recode2NA can be used to recode all empty strings to NA beforehand.
Returns the recoded GADSdat.
# create example GADS
dat <- data.frame(ID = 1:4,
                  var1 = c("", "Eng", "Aus", "Aus2"),
                  var2 = c("", "French", "Ger", "Ita"),
                  stringsAsFactors = FALSE)
gads <- import_DF(dat)

# recode empty strings
gads2 <- recode2NA(gads)

# recode numeric value
gads3 <- recode2NA(gads, recodeVars = "ID", value = 1:3)
Recode one or multiple variables as part of a GADSdat or all_GADSdat object.
recodeGADS(
  GADSdat,
  varName,
  oldValues,
  newValues,
  existingMeta = c("stop", "value", "value_new", "drop", "ignore")
)
GADSdat | |
varName | Name of the variable to be recoded. |
oldValues | Vector containing the old values. |
newValues | Vector containing the new values (in the respective order as |
existingMeta | If values are recoded, which meta data should be used (see details)? |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta and applyChangeMeta. Beyond that, unlabeled variables and values are recoded as well. oldValues and newValues are matched by ordering in the function call.
If changes are performed on value levels, recoding into existing values can occur. In these cases, existingMeta determines how the resulting meta data conflicts are handled, either raising an error if any occur ("stop"), keeping the original meta data for the value ("value"), using the meta data in the changeTable and, if incomplete, from the recoded value ("value_new"), or leaving the respective meta data untouched ("ignore").
Furthermore, one might recode multiple old values in the same new value. This is currently only possible with existingMeta = "drop", which drops all related meta data on value level, or existingMeta = "ignore", which leaves all related meta data on value level untouched.
Missing values (NA) are supported in oldValues but not in newValues. For recoding values to NA see recode2NA instead. For recoding character variables, using lookup tables via createLookup is recommended. For changing value labels see changeValLabels.
Returns a GADSdat.
# Example gads
example_df <- data.frame(ID = 1:5,
                         color = c("blue", "blue", "green", "other", "other"),
                         animal = c("dog", "Dog", "cat", "hors", "horse"),
                         age = c(NA, 16, 15, 23, 50),
                         stringsAsFactors = FALSE)
example_df$animal <- as.factor(example_df$animal)
gads <- import_DF(example_df)

# simple recode
gads2 <- recodeGADS(gads, varName = "animal", oldValues = c(3, 4), newValues = c(7, 8))
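The case of recoding several old values into the same new value, which the details above tie to existingMeta = "drop" or "ignore", might look like the following sketch. It assumes the eatGADS package is installed; the variable v1 and its labels are invented for illustration, and the printed meta data is left for the reader to inspect rather than asserted here.

```r
library(eatGADS)

# hypothetical example: a numeric variable with value labels
gads <- import_DF(data.frame(v1 = c(1, 2, 3, 3)))
gads <- changeValLabels(gads, varName = "v1", value = c(1, 2, 3),
                        valLabel = c("low", "medium", "high"))

# collapse the values 2 and 3 into the new value 9;
# "drop" asks for the related value-level meta data to be dropped
gads2 <- recodeGADS(gads, varName = "v1", oldValues = c(2, 3),
                    newValues = c(9, 9), existingMeta = "drop")

# recoded data and remaining meta data
gads2$dat$v1
extractMeta(gads2, "v1")
```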
Recode NAs in multiple variables in a GADSdat to a numeric value with a value label and a missing tag.
recodeNA2missing(
  GADSdat,
  recodeVars = namesGADS(GADSdat),
  value = -99,
  valLabel = "missing"
)
GADSdat | A |
recodeVars | Character vector of variable names which should be recoded. |
value | Which value should |
valLabel | Which value label should |
The value label and missing tag are only added to variables which contain NAs and which have been recoded. If a variable has an existing value label for value, the existing value label is overwritten and a missing tag is added. A corresponding warning is issued.
Returns the recoded GADSdat.
# create example GADS
dat <- data.frame(ID = 1:4,
                  age = c(NA, 18, 21, 23),
                  siblings = c(0, 2, NA, NA))
gads <- import_DF(dat)

# recode NAs
gads2 <- recodeNA2missing(gads)
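To see the effect on both the data and the meta data, the result can be inspected with extractMeta. The following is a minimal sketch that assumes the eatGADS package is installed; it spells out the default arguments value = -99 and valLabel = "missing" from the usage section.

```r
library(eatGADS)

dat <- data.frame(ID = 1:4,
                  age = c(NA, 18, 21, 23),
                  siblings = c(0, 2, NA, NA))
gads <- import_DF(dat)
gads2 <- recodeNA2missing(gads, value = -99, valLabel = "missing")

# NAs in the data are replaced by the missing code
gads2$dat$age

# 'age' contained NAs, so it receives the value label and the missing tag
extractMeta(gads2, "age")[, c("varName", "value", "valLabel", "missings")]

# 'ID' contained no NAs, so no value label is added
extractMeta(gads2, "ID")[, c("varName", "value", "valLabel", "missings")]
```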
Deprecated, use recode2NA instead.
recodeString2NA(GADSdat, recodeVars = namesGADS(GADSdat), string = "")
GADSdat | A |
recodeVars | Character vector of variable names which should be recoded. |
string | Which string should be recoded to |
Returns the recoded GADSdat.
Reorder a single variable in a GADSdat. The variable (var) can be inserted right after another variable (after) or at the beginning of the GADSdat via after = NULL.
relocateVariable(GADSdat, var, after = NULL)
GADSdat | A |
var | Character string of the variable name which should be sorted. |
after | Character string of the variable name after which |
The variables in the dat and in the labels section are ordered. For reordering the whole GADSdat, see orderLike.
Returns a GADSdat object.
# Insert variable 'migration' after variable 'idclass'
pisa2 <- relocateVariable(pisa, var = "migration", after = "idclass")

# Insert variable 'idclass' at the beginning of the data set
pisa2 <- relocateVariable(pisa, var = "idclass", after = NULL)
Limit a set of text variables to a maximum number, coding cases with overflowing answers as complete missings.
remove2NAchar(GADSdat, vars, max_num = 2, na_value, na_label)
GADSdat | A |
vars | A character vector with the names of the text variables. |
max_num | Maximum number of text variables. Additional text variables will be removed and NA codes given accordingly. |
na_value | Which NA value should be given in cases of too many values on text variables. |
na_label | Which value label should be given to the |
In some cases, multiple text variables contain the information of one variable (e.g. multiple answers to an open item). If this is the case, sometimes the number of text variables displaying this variable should be limited. remove2NAchar allows shortening multiple character variables; that is, character variables after max_num are removed from the GADSdat. Cases which had valid responses on these removed variables are coded as missings (using na_value and na_label).
Returns the modified GADSdat.
## create an example GADSdat
example_df <- data.frame(ID = 1:4,
                         citizenship1 = c("German", "English", "missing by design", "Chinese"),
                         citizenship2 = c(NA, "German", "missing by design", "Polish"),
                         citizenship3 = c(NA, NA, NA, "German"),
                         stringsAsFactors = FALSE)
gads <- import_DF(example_df)

## shorten character variables
gads2 <- remove2NAchar(gads, vars = c("citizenship1", "citizenship2", "citizenship3"),
                       na_value = -99, na_label = "missing: too many answers")
Remove unused value labels and missing tags of a variable as part of a GADSdat object.
removeEmptyValLabels(GADSdat, vars, whichValLabels = c("miss", "valid", "all"))
GADSdat | |
vars | Character string of variable names. |
whichValLabels | Should unused missing value tags and labels ( |
Returns the GADSdat object with changed meta data.
gads <- import_DF(data.frame(v1 = 1))
gads <- changeMissings(gads, varName = "v1", value = c(-99, -98), missings = c("miss", "miss"))
gads <- changeValLabels(gads, varName = "v1", value = c(-99), valLabel = c("not reached"))

gads2 <- removeEmptyValLabels(gads, vars = "v1")
Remove meta data for specific values (value) of a single variable (varName). This includes value labels and missing tags.
removeValLabels(GADSdat, varName, value, valLabel = NULL)
GADSdat | |
varName | Character string of a variable name. |
value | Numeric values. |
valLabel | [optional] Regular expressions in the value labels corresponding to |
If the argument valLabel is provided, the function checks for value and valLabel pairs in the meta data that match both arguments.
Returns the GADSdat object with changed meta data.
# Remove a label based on value
extractMeta(pisa, "schtype")
pisa2 <- removeValLabels(pisa, varName = "schtype", value = 1)
extractMeta(pisa2, "schtype")

# Remove multiple labels based on value
extractMeta(pisa, "schtype")
pisa3 <- removeValLabels(pisa, varName = "schtype", value = 1:3)
extractMeta(pisa3, "schtype")

# Remove multiple labels based on value - valLabel combination
extractMeta(pisa, "schtype")
pisa4 <- removeValLabels(pisa, varName = "schtype", value = 1:3,
                         valLabel = c("Gymnasium", "other", "several courses"))
extractMeta(pisa4, "schtype")
Transfer meta information from one GADSdat to another for one or multiple variables.
reuseMeta(
  GADSdat,
  varName,
  other_GADSdat,
  other_varName = NULL,
  missingLabels = NULL,
  addValueLabels = FALSE
)
GADSdat | |
varName | Character vector with the names of the variables that should get the new meta data. |
other_GADSdat | |
other_varName | Character vector with the names of the variables in |
missingLabels | How should meta data for missing values be treated? If |
addValueLabels | Should only value labels be added and all other meta information retained? |
Transfer of meta information can mean substituting the complete meta information, only adding value labels, adding only "valid" or adding only "miss" missing labels. See the arguments missingLabels and addValueLabels for further details.
Returns the original object with updated meta data.
# see createGADS vignette
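As a complement to the vignette, the simplest case (substituting the complete meta information) might be sketched as follows. This assumes the eatGADS package is installed; both objects and the variable names v1 and v2 are hypothetical and exist only for this illustration.

```r
library(eatGADS)

# hypothetical source object with value labels on 'v1'
gads_a <- import_DF(data.frame(v1 = c(1, 2)))
gads_a <- changeValLabels(gads_a, varName = "v1", value = c(1, 2),
                          valLabel = c("no", "yes"))

# hypothetical target object with an unlabeled variable 'v2'
gads_b <- import_DF(data.frame(v2 = c(2, 1, 1)))

# substitute the meta data of 'v2' with the meta data of 'v1'
gads_b2 <- reuseMeta(gads_b, varName = "v2", other_GADSdat = gads_a,
                     other_varName = "v1")
extractMeta(gads_b2, "v2")
```

The arguments missingLabels and addValueLabels are left at their defaults here, so the call replaces the meta data rather than merging it.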
Split a GADSdat into multiple, specified hierarchical levels.
splitGADS(GADSdat, nameList)
GADSdat | A |
nameList | A list of character vectors. The names in the list correspond to the hierarchy levels. |
The function takes a GADSdat object and splits it into its desired hierarchical levels (an all_GADSdat object). The hierarchy level of a variable is also accessible in the meta data via the column data_table. If not all variable names are included in the nameList, the missing variables will be dropped.
Returns an all_GADSdat object, which consists of a list with a list of all data frames ("datList") and a single data frame containing all meta data information ("allLabels"). For more details see also mergeLabels.
# see createGADS vignette
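In addition to the vignette, a split into two levels could look like the sketch below. It assumes the eatGADS package is installed; the data frame, the level names (students, schools) and the variable names are invented for illustration.

```r
library(eatGADS)

# hypothetical flat data with student and school level variables
dat <- data.frame(idstud = 1:4,
                  idschool = c(1, 1, 2, 2),
                  score = c(12, 15, 9, 20),
                  schtype = c(1, 1, 2, 2))
gads <- import_DF(dat)

# each list entry names one hierarchy level and lists its variables;
# the linking ID 'idschool' is included in both levels
all_gads <- splitGADS(gads,
                      nameList = list(students = c("idstud", "idschool", "score"),
                                      schools = c("idschool", "schtype")))

class(all_gads)
namesGADS(all_gads)
```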
Transform a string variable within a GADSdat or all_GADSdat object to a numeric variable.
stringAsNumeric(GADSdat, varName)
GADSdat | |
varName | Character string of a variable name. |
Applied to a GADSdat or all_GADSdat object, this function uses asNumericIfPossible to change the variable class and changes the format column in the meta data.
Returns the GADSdat object with the changed variable.
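A minimal sketch of the conversion, assuming the eatGADS package is installed; the data frame and the variable name age are hypothetical.

```r
library(eatGADS)

# hypothetical example: numbers stored as strings
dat <- data.frame(ID = 1:3, age = c("10", "11", "12"),
                  stringsAsFactors = FALSE)
gads <- import_DF(dat)
class(gads$dat$age)

# convert the string variable to a numeric variable
gads2 <- stringAsNumeric(gads, varName = "age")
class(gads2$dat$age)

# the format entry in the meta data is adjusted as well
extractMeta(gads2, "age")
```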
Substitute imputed values in an imputed GADSdat_imp object with original, not imputed values from a GADSdat.
subImputations(GADSdat, GADSdat_imp, varName, varName_imp = varName, id, imp)
GADSdat | A |
GADSdat_imp | A |
varName | A character vector of length 1 containing the variable name in |
varName_imp | A character vector of length 1 containing the variable name in |
id | A character vector of length 1 containing the unique identifier column of both |
imp | A character vector of length 1 containing the imputation number in |
There are two cases in which values are substituted: (a) there are missings in varName_imp, (b) values have been imputed even though there is valid information in varName.
Returns the modified GADSdat_imp.
# tbd
Update the meta data of a GADSdat or all_GADSdat object according to the variables in a new data object.
updateMeta(GADSdat, newDat, checkVarNames = TRUE)
GADSdat | |
newDat | |
checkVarNames | Logical. Should new variable names be checked by |
If the data of a GADSdat or an all_GADSdat has changed (supplied via newDat), updateMeta assimilates the corresponding meta data set. If variables have been removed, the corresponding meta data is also removed. If variables have been added, empty meta data is added for these variables. Factors are transformed to numerical and their levels added to the meta data set.
Returns the original object with updated meta data (and removes factors from the data).
# see createGADS vignette
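Beyond the vignette, removing one variable and adding another might be sketched like this. The example assumes the eatGADS package is installed; the data frame and the variable names (ID, age, grade) are made up for illustration.

```r
library(eatGADS)

# hypothetical starting object
gads <- import_DF(data.frame(ID = 1:3, age = c(10, 11, 12)))

# modify a copy of the data outside of the GADSdat:
# drop 'age' and add the new variable 'grade'
newDat <- gads$dat
newDat$age <- NULL
newDat$grade <- c(4, 5, 6)

# synchronize the meta data with the modified data
gads2 <- updateMeta(gads, newDat = newDat)
namesGADS(gads2)
gads2$labels
```

The meta data row for age should be gone afterwards, and grade should appear with empty meta data.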
Write a GADSdat object, which contains meta information as value and variable labels, to an SPSS file (sav) or Stata file (dta). See 'details' for some important limitations.
write_spss(GADSdat, filePath)
write_stata(GADSdat, filePath)
GADSdat | A |
filePath | Path of |
The provided functionality relies on haven's write_sav and write_dta functions.
Currently known limitations for write_spss are:
a) value labels for long character variables (> A10) are dropped,
b) under specific conditions very long character variables (> A254) are incorrectly displayed as multiple character variables in SPSS,
c) exporting date or time variables is currently not supported,
d) missing tags are slightly incompatible between SPSS and eatGADS, as eatGADS supports unlimited discrete missing tags (but no range of missing tags) and SPSS only supports up to three discrete missing tags or ranges of missing tags. For this reason, if a variable is assigned more than three discrete missing tags, write_spss() (more precisely export_tibble) performs a silent conversion of the discrete missing tags into a missing range. If this conversion affects other value labels or values in the data not tagged as missing, an error is issued.
Currently known limitations for write_stata are:
a) variable format is dropped,
b) missing codes are dropped.
Writes the file to disk, returns NULL.
# write to spss
tmp <- tempfile(fileext = ".sav")
write_spss(pisa, tmp)

# write to stata
tmp <- tempfile(fileext = ".dta")
write_stata(pisa, tmp)
Write a GADSdat object to a text file (txt) and an accompanying SPSS syntax file containing all meta information (e.g. value and variable labels).
write_spss2(
  GADSdat,
  txtPath,
  spsPath = NULL,
  savPath = NULL,
  dec = ".",
  fileEncoding = "UTF-8",
  chkFormat = TRUE,
  ...
)
GADSdat | A |
txtPath | Path of |
spsPath | Path of |
savPath | Path of |
dec | Decimal delimiter for your SPSS version. Other values for |
fileEncoding | Data file encoding for SPSS. Default is |
chkFormat | Whether format checks via |
... | Arguments to pass to |
This function is based on eatPrep's writeSpss function and is currently under development.
Writes a txt and an sav file to disk, returns nothing.
# write to spss
tmp_txt <- tempfile(fileext = ".txt")
write_spss2(pisa, txtPath = tmp_txt)