| Title: | Data Management of Large Hierarchical Data |
|---|---|
| Description: | Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases. |
| Authors: | Benjamin Becker [aut, cre], Karoline Sachse [ctb], Johanna Busse [ctb] |
| Maintainer: | Benjamin Becker <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.2.0.9000 |
| Built: | 2026-06-07 07:28:03 UTC |
| Source: | https://github.com/beckerbenj/eatgads |
Function to apply meta data changes to a GADSdat object specified by a change table extracted by getChangeMeta.
applyChangeMeta(changeTable, GADSdat, ...)

## S3 method for class 'varChanges'
applyChangeMeta(changeTable, GADSdat, checkVarNames = TRUE, ...)

## S3 method for class 'valChanges'
applyChangeMeta(
  changeTable,
  GADSdat,
  existingMeta = c("stop", "value", "value_new", "drop", "ignore"),
  ...
)

| Argument | Description |
|---|---|
| changeTable | Change table as provided by |
| GADSdat | |
| ... | further arguments passed to or from other methods. |
| checkVarNames | Logical. Should new variable names be checked by |
| existingMeta | If values are recoded, which meta data should be used (see details)? |
Values for which the change columns contain NA remain unchanged. If changes are performed on value levels, recoding into
existing values can occur. In these cases, existingMeta determines how the resulting meta data conflicts are handled,
either raising an error if any occur ("stop"),
keeping the original meta data for the value ("value"),
using the meta data in the changeTable and, if incomplete, from the recoded value ("value_new"),
or leaving the respective meta data untouched ("ignore").
Furthermore, one might recode multiple old values into the same new value. This is currently only possible with
existingMeta = "drop", which drops all related meta data on value level, or
existingMeta = "ignore", which leaves all related meta data on value level untouched.
Returns the modified GADSdat object.
# Change a variable name and label
varChangeTable <- getChangeMeta(pisa, level = "variable")
varChangeTable[1, c("varName_new", "varLabel_new")] <- c("IDstud", "Person ID")
pisa2 <- applyChangeMeta(varChangeTable, GADSdat = pisa)
Recode one or multiple variables based on a lookup table created via createLookup
(and potentially formatted by collapseColumns).
applyLookup(GADSdat, lookup, suffix = NULL)

| Argument | Description |
|---|---|
| GADSdat | A |
| lookup | Lookup table created by |
| suffix | Suffix to add to the existing variable names. If |
If there are missing values in the column value_new, NAs are inserted as new values
and a warning is issued.
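The recoding logic described above can be illustrated in base R. This is a conceptual sketch only, not the implementation used by applyLookup; the vector values and the flag has_missing_recodes are made up for illustration, while the column names value and value_new follow the text.

```r
# Values of a hypothetical character variable
values <- c("setosa", "versicolor", "virginica", "setosa")

# Lookup table with one recode left empty
lookup <- data.frame(
  value     = c("setosa", "versicolor", "virginica"),
  value_new = c("plant 1", "plant 2", NA),
  stringsAsFactors = FALSE
)

# applyLookup issues a warning in this situation; here we only record it
has_missing_recodes <- anyNA(lookup$value_new)

# Recode by matching each value to its row in the lookup table;
# values without a recode become NA
recoded <- lookup$value_new[match(values, lookup$value)]
recoded
```

Matching on the old value keeps the recode independent of the row order of the lookup table, which matters once the table has been edited by hand.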
The complete work flow when using a lookup table to recode multiple variables in a GADSdat could be:
(0) Optional: Recode empty strings to NA (necessary if the lookup table is written to Excel).
(1) Create a lookup table with createLookup.
(2) Save the lookup table to .xlsx with write_xlsx from eatAnalysis.
(3) Fill out the lookup table via Excel.
(4) Import the lookup table back to R via read_excel from readxl.
(5) Apply the final lookup table with applyLookup.
See applyLookup_expandVar for recoding a single variable into multiple variables.
Returns a recoded GADSdat.
## create an example GADSdat
iris2 <- iris
iris2$Species <- as.character(iris2$Species)
gads <- import_DF(iris2)

## create Lookup
lu <- createLookup(gads, recodeVars = "Species")
lu$value_new <- c("plant 1", "plant 2", "plant 3")

## apply lookup table
gads2 <- applyLookup(gads, lookup = lu, suffix = "_r")

## only recode some values
lu2 <- createLookup(gads, recodeVars = "Species")
lu2$value_new <- c("plant 1", "plant 2", NA)
gads3 <- applyLookup(gads, lookup = lu2, suffix = "_r")
Recode one or multiple variables based on a lookup table created via createLookup.
In contrast to applyLookup, this function allows the creation of multiple resulting
variables from a single input variable. All variables in lookup except
variable and value are treated as recode columns.
applyLookup_expandVar(GADSdat, lookup)

| Argument | Description |
|---|---|
| GADSdat | A |
| lookup | Lookup table created by |
If a variable contains information that should be split into multiple variables via manual recoding,
applyLookup_expandVar can be used. If there are missing values in any recode column,
NAs are inserted as new values. A warning is issued only for the first column.
The complete work flow when using a lookup table to expand variables in a GADSdat based on manual recoding could be:
(1) Create a lookup table with createLookup.
(2) Save the lookup table to .xlsx with write_xlsx from eatAnalysis.
(3) Fill out the lookup table via Excel.
(4) Import the lookup table back to R via read_excel from readxl.
(5) Apply the final lookup table with applyLookup_expandVar.
See applyLookup for simply recoding variables in a GADSdat.
Returns a recoded GADSdat.
## create an example GADSdat
example_df <- data.frame(ID = 1:6,
                         citizenship = c("germ", "engl", "germ, usa", "china",
                                         "austral, morocco", "nothin"),
                         stringsAsFactors = FALSE)
gads <- import_DF(example_df)

## create Lookup
lu <- createLookup(gads, recodeVars = "citizenship", addCol = c("cit_1", "cit_2"))
lu$cit_1 <- c("German", "English", "German", "Chinese", "Australian", NA)
lu$cit_2 <- c(NA, NA, "USA", NA, "Morocco", NA)

## apply lookup table
gads2 <- applyLookup_expandVar(gads, lookup = lu)
Applies recodes as specified by a numCheck data.frame, as created by createNumCheck.
applyNumCheck(GADSdat, numCheck)

| Argument | Description |
|---|---|
| GADSdat | A |
| numCheck | A |
This function is currently under development.
A recoded GADSdat.
# tbd
Assimilate all value labels of multiple variables as part of a GADSdat or all_GADSdat object.
assimilateValLabels(GADSdat, varNames, lookup = NULL)

| Argument | Description |
|---|---|
| GADSdat | |
| varNames | Character string of a variable name. |
| lookup | Lookup |
Assimilation can be performed using all existing value labels or a lookup table containing at least all existing value labels.
Missing codes are reused based on the meta data of the first variable in varNames.
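The idea of assimilating value labels can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: in particular, numbering the pooled labels in alphabetical order is a choice made here, and the names all_labels, codes1 and codes2 are hypothetical.

```r
# Two hypothetical labeled variables, given by their value labels
fac1 <- c("Eng", "Aus", "Ger")
fac2 <- c("Ger", "Franz", "Ita")

# Pool all existing value labels into one shared label set
all_labels <- sort(unique(c(fac1, fac2)))

# Recode both variables to the position of their label in the shared set,
# so that identical labels receive identical values in both variables
codes1 <- match(fac1, all_labels)
codes2 <- match(fac2, all_labels)

data.frame(value = seq_along(all_labels), valLabel = all_labels)
```

After this step "Ger" carries the same value in both variables, which is what makes the variables comparable.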
Returns the GADSdat object with changed meta data and recoded values.
# Example data set
facs_df <- data.frame(id = 1:3,
                      fac1 = c("Eng", "Aus", "Ger"),
                      fac2 = c("Ger", "Franz", "Ita"),
                      fac3 = c("Kor", "Chi", "Alg"),
                      stringsAsFactors = TRUE)
facs_gads <- import_DF(facs_df)

assimilateValLabels(facs_gads, varNames = paste0("fac", 1:3))
Auto recode a variable in a GADSdat.
Auto recode a variable in a GADSdat, mirroring the core functionality provided by
AUTORECODE in SPSS. A lookup table containing the respective recode pairs can be
applied and/or saved.
autoRecode(
  GADSdat,
  var,
  var_suffix = "",
  label_suffix = "",
  csv_path = NULL,
  template = NULL
)

| Argument | Description |
|---|---|
| GADSdat | A |
| var | Character string of the variable name which should be recoded. |
| var_suffix | Variable suffix for the newly created |
| label_suffix | Suffix added to variable label for the newly created variable in the |
| csv_path | Path for the |
| template | Existing lookup table. |
Existing values are replaced with sequential numbers, and all existing value-level metadata
(valLabel and missings) are dropped. This can be useful to remove confidential
information from ID variables. If the original (character) values are to be preserved as
valLabels, multiChar2fac should be used instead.
An existing template may be used to ensure that identical original values are recoded as
the same new values. The lookup table used to recode var may also be saved as a
.csv file, e.g. to be used as a template later. If both an existing template
is used and the lookup table is saved, the resulting lookup table will contain the existing
recodes and additional recode pairs required for the data, if any were needed.
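How a template is applied and expanded can be sketched in base R. The helper auto_recode and the column names oldValue and newValue are hypothetical and chosen for illustration; the ordering of newly assigned numbers is likewise an assumption and may differ from what autoRecode does.

```r
# Hypothetical helper: recode values to sequential numbers, reusing an
# existing lookup table (template) and appending pairs for unseen values
auto_recode <- function(x, template = NULL) {
  lookup <- if (is.null(template)) {
    data.frame(oldValue = character(0), newValue = integer(0),
               stringsAsFactors = FALSE)
  } else {
    template
  }
  new_values <- setdiff(unique(x), lookup$oldValue)
  if (length(new_values) > 0) {
    # Continue numbering after the largest number already in use
    start <- if (nrow(lookup) > 0) max(lookup$newValue) else 0L
    lookup <- rbind(lookup,
                    data.frame(oldValue = sort(new_values),
                               newValue = start + seq_along(new_values),
                               stringsAsFactors = FALSE))
  }
  list(recoded = lookup$newValue[match(x, lookup$oldValue)], lookup = lookup)
}

first  <- auto_recode(c("b", "a", "c"))
second <- auto_recode(c("a", "aa", "c"), template = first$lookup)
second$lookup
```

In the second call, "a" and "c" keep the numbers from the first run, and only "aa" receives a new number, so the returned lookup table contains the existing recodes plus the one additional pair.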
Returns a GADSdat object.
gads <- import_DF(data.frame(v1 = letters))

# auto recode without saving lookup table
gads2 <- autoRecode(gads, var = "v1", var_suffix = "_num")

# auto recode with saving lookup table
f <- tempfile(fileext = ".csv")
gads2 <- autoRecode(gads, var = "v1", var_suffix = "_num", csv_path = f)

# auto recode with applying and expanding a lookup table
gads3 <- import_DF(data.frame(v2 = c(letters[1:3], "aa")))
gads3 <- autoRecode(gads3, var = "v2", csv_path = f, template = read.csv(f))
Calculate a scale variable based on multiple items.
calculateScale(
  GADSdat,
  items,
  scale,
  maxNA = length(items),
  reportDescr = FALSE
)

| Argument | Description |
|---|---|
| GADSdat | A |
| items | A character vector with all item variable names. |
| scale | A character vector with the scale name. |
| maxNA | Maximum number of allowed |
| reportDescr | Should descriptive statistics be reported for the calculated scale. |
Descriptive statistics (including Cronbach's alpha, credit to the psy package) are calculated and printed to the console.
The new scale variable is automatically inserted right after the last item in the original GADSdat.
Returns a GADSdat containing the newly computed variable.
## compute the scale "norms" from its six items
items <- paste0("norms_", letters[1:6])
pisa_new <- calculateScale(pisa, items = items, scale = "norms")
Bind two GADSdat objects into a single GADSdat object by columns.
This is a secure way to cbind the data and the meta data of two GADSdat objects. Currently, only limited merging options are supported.
## S3 method for class 'GADSdat'
cbind(..., deparse.level = 1)

| Argument | Description |
|---|---|
| ... | Multiple |
| deparse.level | Argument is ignored in this method. |
If there are duplicate variables (except the variables specified in the by argument), these variables are removed from y.
The meta data is joined for the remaining variables via rbind.
Returns a GADSdat object.
Change or add missing codes of one or multiple variables as part of a GADSdat object.
changeMissings(GADSdat, varName, value, missings)

| Argument | Description |
|---|---|
| GADSdat | |
| varName | Character vector containing variable names. |
| value | Numeric values. |
| missings | Character vector of the new missing codes, either |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of
getChangeMeta and applyChangeMeta.
The function supports changing multiple missing tags (missings) as well as missing tags of
multiple variables (varName) at once.
Returns the GADSdat object with changed meta data.
# Set a specific value to missing
pisa2 <- changeMissings(pisa, varName = "computer_age", value = 5, missings = "miss")

# Set multiple values to missing
pisa3 <- changeMissings(pisa, varName = "computer_age", value = 1:4,
                        missings = c("miss", "miss", "miss", "miss"))

# Set a specific value to not missing
pisa4 <- changeMissings(pisa2, varName = "computer_age", value = 5, missings = "valid")

# Add missing tags to multiple variables
pisa5 <- changeMissings(pisa, varName = c("g8g9", "computer_age"),
                        value = c(-99, -98), missings = c("miss", "miss"))
Change the SPSS format of one or multiple variables as part of a GADSdat object.
changeSPSSformat(GADSdat, varName, format)

| Argument | Description |
|---|---|
| GADSdat | |
| varName | Character vector of variable names. |
| format | A single string containing the new SPSS format, for example 'A25' or 'F10'. |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper
of getChangeMeta and applyChangeMeta.
SPSS format is supplied following SPSS logic. 'A' represents character variables,
'F' represents numeric variables. The number following this letter represents the maximum width.
Optionally, another number can be added after a dot, representing the number of decimals
in case of a numeric variable. For instance, 'F8.2' is used for a numeric variable with
a maximum width of 8 with 2 decimal numbers.
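To make the format logic concrete, the following base R sketch splits a format string into its parts. The helper parse_spss_format is hypothetical and not part of the package; it only covers the 'A' and 'F' formats described above.

```r
# Hypothetical helper: split an SPSS format string such as "A25" or "F8.2"
# into variable type, maximum width and number of decimals
parse_spss_format <- function(format) {
  pattern <- "^([AF])([0-9]+)(\\.([0-9]+))?$"
  m <- regmatches(format, regexec(pattern, format))[[1]]
  if (length(m) == 0) stop("Not a recognized SPSS format: ", format)
  list(
    type     = if (m[2] == "A") "character" else "numeric",
    width    = as.integer(m[3]),
    # no decimal part given: treat as zero decimals
    decimals = if (nzchar(m[5])) as.integer(m[5]) else 0L
  )
}

fmt1 <- parse_spss_format("F8.2")  # numeric, width 8, 2 decimals
fmt2 <- parse_spss_format("A25")   # character, width 25
```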
Returns the GADSdat object with changed meta data.
# change SPSS format for a single variable (numeric variable with no decimals)
pisa2 <- changeSPSSformat(pisa, varName = "idstud", format = "F10.0")

# change SPSS format for multiple variables (numeric variables with no decimals)
pisa2 <- changeSPSSformat(pisa, varName = c("idstud", "idschool"), format = "F10.0")
Change or add value labels of one or multiple variables as part of a GADSdat object.
changeValLabels(GADSdat, varName, value, valLabel)

| Argument | Description |
|---|---|
| GADSdat | |
| varName | Character vector containing variable names. |
| value | Numeric values which are being labeled. |
| valLabel | Character vector of the new value labels. Labels are applied in the same ordering as |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper
of getChangeMeta and applyChangeMeta.
The function supports changing multiple value labels (valLabel) as well as value labels of
multiple variables (varName) at once.
Returns the GADSdat object with changed meta data.
# Change existing value labels
pisa2 <- changeValLabels(pisa, varName = "repeated", value = c(1, 2),
                         valLabel = c("no grade repetition", "grade repetition"))

# Add value label to unlabeled value
mtcars_g <- import_DF(mtcars)
mtcars_g2 <- changeValLabels(mtcars_g, varName = "cyl", value = c(4, 6, 8),
                             valLabel = c("four", "six", "eight"))

# Add value labels to multiple variables at once
mtcars_g3 <- changeValLabels(mtcars_g, varName = c("mpg", "cyl", "disp"),
                             value = c(-99, -98),
                             valLabel = c("missing", "not applicable"))
Change variable labels of one or multiple variables as part of a GADSdat object.
changeVarLabels(GADSdat, varName, varLabel)

| Argument | Description |
|---|---|
| GADSdat | |
| varName | Character vector of variable names. |
| varLabel | Character vector of the new variable labels. |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper
of getChangeMeta and applyChangeMeta.
Returns the GADSdat object with changed meta data.
# Change one variable label
pisa2 <- changeVarLabels(pisa, varName = "repeated",
                         varLabel = c("Has a grade been repeated?"))

# Change multiple variable labels
pisa2 <- changeVarLabels(pisa, varName = c("repeated", "gender"),
                         varLabel = c("Has a grade been repeated?", "Student gender"))
Change variable names of a GADSdat or all_GADSdat object.
changeVarNames(GADSdat, oldNames, newNames, checkVarNames = TRUE)

| Argument | Description |
|---|---|
| GADSdat | |
| oldNames | Vector containing the old variable names. |
| newNames | Vector containing the new variable names, in identical order as |
| checkVarNames | Logical. Should new variable names be checked by |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta and
applyChangeMeta.
Returns the GADSdat object with changed variable names.
# Change multiple variable names
pisa2 <- changeVarNames(pisa, oldNames = c("idstud", "idschool"),
                        newNames = c("IDstud", "IDschool"))
Check SPSS Compliance of Meta Data
Function to check if variable names and labels, value labels and missing codes comply with SPSS requirements for meta data.
check4SPSS(GADSdat)

| Argument | Description |
|---|---|
| GADSdat | |
The function measures the length of variable names ("varNames_length", maximum of 64 characters),
variable labels ("varLabels", maximum of 256 characters), and
value labels ("valLabels", maximum of 120 characters). Furthermore,
missing codes are counted ("missings", maximum of three missing codes for character variables)
and special characters are flagged in variable names ("varNames_special").
Check results are reported back on variable level, with the exception of "valLabels", which is a list
with entries per violating variable.
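The length checks can be illustrated with plain nchar() comparisons. This is a sketch under assumptions and not the code used by check4SPSS: the meta data below is made up, and the definition of a "special" character (anything other than ASCII letters, digits, underscore and dot) is an assumption made here.

```r
# Limits as stated in the text
limits <- c(varName = 64, varLabel = 256, valLabel = 120)

# Made-up variable-level meta data
meta <- data.frame(
  varName  = c("id", "sex"),
  varLabel = c("Person identifier", strrep("x", 300)),
  stringsAsFactors = FALSE
)

# Variables whose names or labels exceed the respective limit
too_long_names  <- meta$varName[nchar(meta$varName)  > limits[["varName"]]]
too_long_labels <- meta$varName[nchar(meta$varLabel) > limits[["varLabel"]]]

# Variable names containing special characters (assumed definition)
special_names <- meta$varName[grepl("[^A-Za-z0-9_.]", meta$varName)]
```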
Returns a list with the entries "varNames_special", "varNames_length",
"varLabels", "valLabels" and "missings".
Other dataset compliance checks:
check4Stata()
# Change example data set (create a violating label;
# 300 characters exceed the limit of 256 for variable labels)
pisa2 <- changeVarLabels(pisa, varName = "computer_age",
                         varLabel = paste(rep("3", 300), collapse = ""))

check4SPSS(pisa2)
Check a GADSdat for compatibility with Stata.
This function performs all relevant checks to assess if a GADSdat complies with all of
Stata's dataset requirements. Run this before exporting a dataset as .dta, using
write_stata.
check4Stata(GADSdat, version = c("Stata", "Stata 19/BE", "Stata 19/MP"))

| Argument | Description |
|---|---|
| GADSdat | A |
| version | Optional single string to request checks for a specific Stata version (see details). |
Specifically, the following requirements are tested:
| Requirement | Description |
|---|---|
| dots_in_varNames* | Variable names do not contain dots (checkVarNames). |
| special_chars_in_varNames* | Variable names do not contain special characters. |
| varName_length* | Variable names are not longer than the specific limit (checkVarNames). |
| labeled_fractionals* | There are no labeled fractional values (checkLabeledFractionals). |
| large_integers* | All labeled values can be coerced as.integer (checkIntOverflow). |
| varLabel_length | Variable labels are not longer than the specific limit (checkVarLabels). |
| valLabel_length | Value labels are not longer than the specific limit (checkValLabels). |
| long_strings | String variables do not contain string values that are longer than the specific limit. |
| too_many_rows* | The number of rows/observations does not exceed the specific limit. |
| too_many_cols* | The number of columns/variables does not exceed the specific limit. |
Not complying with the marked (*) requirements will prevent a dataset from being exported to a
.dta file. Issues with unmarked requirements will be solved automatically by truncating.
Limits to different aspects of the dataset vary between versions of the software. By default
(version = "Stata"), compliance with the limits for Stata 19/SE is checked.
Checks against the limits for Stata 19/BE or Stata 19/MP can be requested by
specifying version with the corresponding string. For more details, see program_limits.
Either NULL if all checks are passed successfully, or a list of all
check results (see details for explanations of the keywords) if any problem was detected.
Other dataset compliance checks:
check4SPSS()
check4Stata(pisa)
Check value labels for (a) value labels with no occurrence in the data (checkEmptyValLabels) and
(b) values with no value labels (checkMissingValLabels).
checkEmptyValLabels(
  GADSdat,
  vars = namesGADS(GADSdat),
  valueRange = NULL,
  output = c("list", "data.frame")
)

checkMissingValLabels(
  GADSdat,
  vars = namesGADS(GADSdat),
  classes = c("integer"),
  valueRange = NULL,
  output = c("list", "data.frame")
)

| Argument | Description |
|---|---|
| GADSdat | A |
| vars | Character vector with the variable names to which |
| valueRange | [optional] Numeric vector of length 2: In which range should numeric values be checked? If specified, only numeric values are returned and strings are omitted. |
| output | Should the output be structured as a |
| classes | Character vector with the classes to which |
NAs are excluded from this check. Designated missing codes are reported normally.
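Both checks boil down to set differences between the labeled values and the values that occur in the data. The following base R sketch illustrates this for a single made-up variable; it is not the package's implementation.

```r
data_values    <- c(1, 2, 2, 4, NA)  # values occurring in the data
labeled_values <- c(1, 2, 3)         # values carrying a value label

# NAs are excluded from the check
observed <- unique(data_values[!is.na(data_values)])

# (a) value labels with no occurrence in the data
empty_labels <- setdiff(labeled_values, observed)

# (b) values occurring in the data without a value label
missing_labels <- setdiff(observed, labeled_values)
```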
Returns a list with one element per variable in vars, or a data.frame.
checkEmptyValLabels(): check for superfluous value labels
checkMissingValLabels(): check for missing value labels
# Check a categorical and a metric variable
checkMissingValLabels(pisa, vars = c("g8g9", "age"))
checkEmptyValLabels(pisa, vars = c("g8g9", "age"))

# Check while defining a specific value range
checkMissingValLabels(pisa, vars = c("g8g9", "age", "idschool"), valueRange = c(0, 5))
checkEmptyValLabels(pisa, vars = c("g8g9", "age", "idschool"), valueRange = c(0, 5))
Function to check if SPSS format statements are specified correctly in a GADSdat object.
checkFormat(GADSdat, type = "SPSS", changeFormat = TRUE)

| Argument | Description |
|---|---|
| GADSdat | |
| type | If |
| changeFormat | If |
The function compares SPSS format statements "format" and actual character length and
decimal places of all variables in a GADSdat object and its
meta data information. Mismatches are reported and can be automatically adjusted.
Returns a GADSdat object.
# Check the SPSS format statements of the example data and adjust mismatches
pisa2 <- checkFormat(pisa)
Check a GADSdat for large labeled whole-number values.
Check a GADSdat object for any occurrences of labeled whole-number values
that would be too large for R to handle if they were coerced as.integer().
checkIntOverflow(GADSdat)

| Argument | Description |
|---|---|
| GADSdat | A |
According to its documentation, R can only handle integer
values of up to (roughly) ±2 × 10^9 (2,147,483,647 to be exact;
cf. .Machine$integer.max).
This restriction is relevant when exporting a GADSdat to .dta
and only when any value exceeding the limit is also labeled (or tagged as missing).
This is because Stata only accepts labeled integer (not labeled floating-point;
c.f. checkLabeledFractionals() in this package)
values. haven's write_dta function will therefore
try to coerce any labeled values as.integer(). Unlabeled values, however, will
stay generic numeric values that have a higher limit.
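The limit can be demonstrated directly in base R. The value 9999999999 is only an example; the sketch shows the coercion behavior and is not how checkIntOverflow is implemented.

```r
int_max <- .Machine$integer.max   # 2147483647

# A large whole number is stored as a double, which is unproblematic
large_value <- 9999999999

# Coercing it to integer fails: R returns NA (with a warning, suppressed here)
coerced <- suppressWarnings(as.integer(large_value))

# A simple way to flag such values before any coercion takes place
is_too_large <- abs(large_value) > int_max
```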
Returns a data.frame, listing the affected varNames,
the large whole-number values, their respective missings tags,
and whether they actually occur in the data (empty).
The rownums of the affected rows in GADSdat$labels are also
provided in a separate column as a fail safe.
# Introduce a large labeled value into meta data
pisa2 <- changeMissings(GADSdat = pisa, varName = "schtype",
                        value = 9999999999, missings = "miss")
eatGADS:::checkIntOverflow(pisa2)
Check a GADSdat for labeled fractional values.
Check a GADSdat object for any occurrences of fractional values in its metadata,
including both "truly" labeled values and values tagged as missings.
checkLabeledFractionals(GADSdat)

| Argument | Description |
|---|---|
| GADSdat | A |
This function is mainly useful to ensure a data set can be saved as a .dta file.
Unlike, for example, SPSS, Stata only allows for integer values
(and so-called extended missing values) to be labeled
(Stata manual: 12.6.3).
Trying to export (meta) data with labeled fractional values would therefore cause problems
and run into an error from haven's write_dta function.
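Detecting fractional values is straightforward in base R: a value is fractional if it differs from its truncated version. The vector below is made up for illustration, and the sketch is not the package's implementation.

```r
# Made-up labeled values of a variable (including a missing code)
labeled_values <- c(1, 2, 0.5, -99, 3.25)

# A value is fractional if truncating it changes it
is_fractional <- labeled_values != trunc(labeled_values)

fractional_values <- labeled_values[is_fractional]
fractional_values
```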
Returns a data.frame, listing the affected varNames,
the labeled fractional values, their respective missings tags,
and whether they actually occur in the data (empty).
# Introduce a fractional value into meta data
pisa2 <- recodeGADS(GADSdat = pisa, varName = "schtype",
                    oldValues = 2, newValues = .5)
eatGADS:::checkLabeledFractionals(pisa2)
Functions to check if missings are tagged and labeled correctly in a GADSdat object.
checkMissings(
  GADSdat,
  missingLabel = "missing",
  addMissingCode = TRUE,
  addMissingLabel = FALSE
)

checkMissingsByValues(GADSdat, missingValues = -50:-99, addMissingCode = TRUE)

| Argument | Description |
|---|---|
| GADSdat | |
| missingLabel | Single regular expression indicating how missing labels are commonly named in the value labels. |
| addMissingCode | If |
| addMissingLabel | If |
| missingValues | Numeric vector of values which are commonly used for missing values. |
checkMissings() compares value labels (valLabels) and missing tags (missings) of a GADSdat object and its
meta data information.
checkMissingsByValues() compares labeled values (value) and missing tags (missings) of a GADSdat object
and its meta data information.
Mismatches are reported and can be automatically adjusted. Note that all checks are only applied to the
meta data information, not the actual data. For detecting missing value labels, see checkMissingValLabels.
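The comparison performed by checkMissingsByValues() can be sketched on a plain data.frame. The table below is made up and only mimics value-level meta data with the columns value, valLabel and missings named in the text; the sketch is not the package's implementation.

```r
# Made-up value-level meta data of one variable
labels <- data.frame(
  varName  = "computer_age",
  value    = c(-49, -90, -99, 1),
  valLabel = c("test1", "test2", "test3", "yes"),
  missings = c("valid", "valid", "miss", "valid"),
  stringsAsFactors = FALSE
)

missing_values <- -50:-99
in_range <- labels$value %in% missing_values

# Labeled values in the missing range that are not tagged as missing
untagged <- labels[in_range & labels$missings != "miss", ]

# Values tagged as missing although they lie outside the range
tagged_outside <- labels[!in_range & labels$missings == "miss", ]

# Adjust the tags for values in the range (only the meta data is touched)
labels$missings[in_range] <- "miss"
```

Here only the value -90 is reported, since -49 lies outside the range and -99 is already tagged.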
Returns a GADSdat object with - if specified - modified missing tags.
checkMissings(): compare missing tags and value labels
checkMissingsByValues(): compare missing tags and values in a certain range
# checkMissings
pisa2 <- changeValLabels(pisa, varName = "computer_age", value = 5,
                         valLabel = "missing: No computer use")
pisa3 <- checkMissings(pisa2)

# checkMissingsByValues
pisa4 <- changeValLabels(pisa, varName = "computer_age",
                         value = c(-49, -90, -99),
                         valLabel = c("test1", "test2", "test3"))
pisa5 <- checkMissingsByValues(pisa4, missingValues = -50:-99)
Checks compatibility of two eatGADS data bases.
This function checks if both data bases perform identical joins via foreign keys, if they contain the same variable names and if these variables have the same value labels. Results of this comparison are reported on data table level as messages and as an output list.
checkTrendStructure(filePath1, filePath2)

| Argument | Description |
|---|---|
| filePath1 | Path of the first |
| filePath2 | Path of the second |
An error is thrown if the key structure or the data table structure differs between the two data bases. Differences regarding meta data for missing value labels and for variable labels (and formatting) are ignored.
Reported differences regarding meta data can be inspected further via inspectMetaDifferences.
Returns a report list.
Function to check if a variable is unique for all cases of an identifier variable.
checkUniqueness(GADSdat, varName, idVar)

| Argument | Description |
|---|---|
| GADSdat | |
| varName | Single string containing the variable name for which the check should be performed. |
| idVar | Single string containing the identifier variable name. |
For example, if missing values are multiply imputed and data is stored in a long format, checking the uniqueness of a variable within an identifier can be tricky. This function automates this task.
Returns either TRUE if the variable is unique within each value of idVar or a GADSdat object including
the non-unique cases.
## create an example GADSdat
iris2 <- iris
iris2$Species <- as.character(iris2$Species)
gads <- import_DF(iris2, checkVarNames = FALSE)

## check uniqueness
checkUniqueness(gads, varName = "Sepal.Length", idVar = "Species")
Function to check if a variable is unique for all cases of an identifier variable. This is a faster and more efficient version of
checkUniqueness which always returns a logical, non missing value of length one.
checkUniqueness2(GADSdat, varName, idVar, impVar)

| Argument | Description |
|---|---|
| GADSdat | |
| varName | Single string containing the variable name for which the check should be performed. |
| idVar | Single string containing the name of the identifier variable. |
| impVar | Single string containing the name of the imputation number. |
For example, if missing values are multiply imputed and data are stored in a long format, checking the uniqueness of a variable
within an identifier can be tricky. This function automates this task via reshaping the data into wide format and testing equality
among the reshaped variables. Similar functionality (via matrices) is covered by lme4::isNested,
which is more general and performs similarly.
Returns a logical of length one.
## create an example GADSdat
l <- 1000
long_df <- data.table::data.table(id = sort(rep(1:l, 15)),
                                  v1 = sort(rep(1:l, 15)),
                                  imp = rep(1:15, l))
gads <- import_DF(long_df)
## check uniqueness
checkUniqueness2(gads, varName = "v1", idVar = "id", impVar = "imp")
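The reshaping approach described above can be illustrated with a small base R sketch. It assumes a plain data.frame instead of a GADSdat and uses `stats::reshape`; how the package itself reshapes the data is not shown in this documentation, so treat this as one possible way to express the idea.

```r
# Toy long-format data with 3 imputations per id (hypothetical example)
long_df <- data.frame(
  id  = rep(1:4, each = 3),
  imp = rep(1:3, times = 4),
  v1  = rep(c(10, 20, 30, 40), each = 3)
)

# One row per id, one column of v1 per imputation (v1.1, v1.2, v1.3)
wide <- reshape(long_df, idvar = "id", timevar = "imp", direction = "wide")

# The variable is unique within id if all reshaped columns are equal
value_cols <- setdiff(names(wide), "id")
first_col <- wide[[value_cols[1]]]
same_as_first <- vapply(
  value_cols,
  function(col) identical(wide[[col]], first_col),
  logical(1)
)
is_unique <- all(same_as_first)
```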
Check if the value or variable labels of a GADSdat comply with the length limits imposed
by SPSS or Stata.
checkValLabels(
  GADSdat,
  charLimits = c("SPSS", "Stata"),
  vars = namesGADS(GADSdat),
  printLength = 40
)

checkVarLabels(
  GADSdat,
  charLimits = c("SPSS", "Stata"),
  vars = namesGADS(GADSdat),
  printLength = 40
)
GADSdat |
A |
charLimits |
Character vector of the program(s) against whose limit(s) the labels should be checked. |
vars |
Optional character vector of the variables whose value labels should be checked. By default, all value labels will be checked. |
printLength |
Single numeric value. The first n = |
If more than one program name is given in charLimits, the most restrictive limit will be
applied. For details about program-specific limits, see program_limits.
Please note that setting printLength to NULL (and thereby deactivating label
truncation) might not actually result in the printing of the full length of the exceedingly
long labels if you are using RStudio. The program's own limits on the number of characters
printed to the console may still apply
(see Stack Overflow).
A data.frame, reporting every (truncated) long varLabel/valLabel,
their respective length in the relevant unit and the varName in which
they occur. For checkValLabels, the labeled value, as well as whether that value
actually occurs in the data (empty), is also reported.
checkValLabels(): Check value labels for length limits.
checkVarLabels(): Check variable labels for length limits.
# check value labels
pisa2 <- pisa
pisa2$labels[4, "valLabel"] <- paste0(rep("abcdefg", 4300), collapse = "")
eatGADS:::checkValLabels(pisa2)
# check variable labels
pisa2$labels[1, "varLabel"] <- paste0(rep("abcdefg", 12), collapse = "")
eatGADS:::checkVarLabels(pisa2)
Function to look for occurrences of a specific value in a GADSdat.
checkValue(GADSdat, value, vars = namesGADS(GADSdat))
GADSdat |
|
value |
Single value whose occurrences should be counted in the data. |
vars |
Character vector with the variable names to which |
The function checks occurrences of a specific value in a set of variables (default: all variables) in the GADSdat and outputs a vector
containing the count of occurrences for all variables in which the value occurs. It explicitly supports checking for NA.
A named integer.
# for all variables in the data
checkValue(pisa, value = 99)
# only for specific variables in the data
checkValue(pisa, vars = c("idschool", "g8g9"), value = 99)
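The counting logic described above (occurrences per variable, reported only for variables in which the value occurs, with explicit support for NA) could look roughly as follows in base R. The helper `count_value` and the toy data are hypothetical and operate on a plain data.frame, not on a GADSdat.

```r
# Toy data (hypothetical example)
dat <- data.frame(a = c(1, 99, 99), b = c(NA, 2, 3), c = c(99, NA, NA))

# Count occurrences of 'value' per column; NA is handled explicitly
count_value <- function(dat, value) {
  counts <- vapply(dat, function(col) {
    if (is.na(value)) sum(is.na(col)) else sum(col == value, na.rm = TRUE)
  }, integer(1))
  counts[counts > 0]  # keep only variables in which the value occurs
}

count_99 <- count_value(dat, 99)
count_na <- count_value(dat, NA)
```

Both results are named integer vectors, matching the return type documented above.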
SQLite column name conventions and length limits.
Checks names for SQLite column name conventions and SPSS/Stata
variable name limits, and applies appropriate variable name changes to
GADSdat or all_GADSdat objects.
checkVarNames(
  GADSdat,
  checkKeywords = TRUE,
  checkDots = TRUE,
  checkDuplicates = TRUE,
  charLimits = NULL
)
GADSdat |
|
checkKeywords |
Logical. Should |
checkDots |
Logical. Should occurrences of |
checkDuplicates |
Logical. Should case insensitive duplicate variable names be checked and modified? |
charLimits |
Optional character vector of one or more program names(s) for the
limit check (see details). Currently, these are implemented: |
Invalid column names in a SQLite data base include
SQLite keywords (see sqlite_keywords),
column names with a "." in them and
duplicate variable names which arise from SQLite being case insensitive.
The corresponding variable name changes are
appending the suffix "Var" to all SQLite keywords,
changing all "." in variable names to "_", and
appending "_2" to case insensitive duplicated variable names.
Note that avoiding "." in variable names is beneficial for multiple reasons, such as
avoiding confusion with S3 methods in R and issues when exporting to Stata.
The length of variable names is limited to 64 bytes in SPSS and to 32
characters in Stata. If more than one program name is provided in
charLimits, the most restrictive among the chosen limits will be applied. Variable names
exceeding that limit will be truncated and marked with the suffix "_tr".
Returns the original object with updated variable names.
# Change example data set (create an invalid variable name)
pisa2 <- changeVarNames(pisa, oldNames = "computer_age", newNames = "computer.age")
pisa3 <- checkVarNames(pisa2)
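The three SQLite-related renaming rules listed above can be sketched on a plain character vector. This is an illustration only: `fix_names` is a hypothetical helper, the keyword list is a small hand-picked subset of SQLite keywords (the package uses its own list, see sqlite_keywords), and truncation to program-specific length limits is left out.

```r
# Hypothetical helper applying the three renaming rules to a character vector
fix_names <- function(x, keywords = c("SELECT", "TABLE", "ORDER")) {
  # 1) append "Var" to names that are SQLite keywords (case insensitive)
  is_keyword <- toupper(x) %in% keywords
  x[is_keyword] <- paste0(x[is_keyword], "Var")
  # 2) replace every "." with "_"
  x <- gsub(".", "_", x, fixed = TRUE)
  # 3) append "_2" to case-insensitive duplicates
  is_dup <- duplicated(tolower(x))
  x[is_dup] <- paste0(x[is_dup], "_2")
  x
}

fixed <- fix_names(c("order", "computer.age", "ID", "id"))
# "orderVar" "computer_age" "ID" "id_2"
```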
Deprecated. The cached data base is now cleaned automatically when the R session ends.
clean_cache(tempPath = tempdir())
tempPath |
Local directory in which the data base was temporarily stored. |
Cleans the temporary cache, specified by tempdir(). This function had to be executed at the end of an R session if
getGADS_fast or getTrendGADS with fast = TRUE had been used.
Returns nothing.
Clone a variable as part of a GADSdat object.
cloneVariable(
  GADSdat,
  varName,
  new_varName,
  label_suffix = "",
  checkVarName = TRUE
)
GADSdat |
|
varName |
Name of the variable to be cloned. |
new_varName |
Name of the new variable. |
label_suffix |
Suffix added to variable label for the newly created variable in the |
checkVarName |
Logical. Should |
The variable is simply duplicated and assigned a new name.
Returns a GADSdat.
# duplicate the variable schtype
pisa_new <- cloneVariable(pisa, varName = "schtype", new_varName = "schtype_new")
Collapse two columns or format a single column of a lookup table created by createLookup.
collapseColumns(lookup, recodeVars, prioritize)
lookup |
For example a lookup table |
recodeVars |
Character vector of column names which should be collapsed (currently only up to two variables are supported). |
prioritize |
Character vector of length 1. Which of the columns in |
If a lookup table is created by createLookup, different recoding columns can be specified by the addCols argument.
This might be the case if two raters suggest recodes or one rater corrects recodes by another rater in a separate column.
After the recoding columns have been filled out, collapseColumns can be used to either:
(a) Collapse two recoding columns into one recoding column. This might be desirable, if the two columns contain missing values.
prioritize can be used to specify, which of the two columns should be prioritized if both columns contain valid values.
(b) Format the lookup table for applyLookup, if recodeVars is a single variable.
This simply renames the single variable specified under recodeVars.
Returns a data.frame that can be used for applyLookup, with the columns:
variable |
Variable names |
value |
Old values |
value_new |
New values. Renamed and/or collapsed column. |
## (a) Collapse two columns
# create example recode data.frame
lookup_raw <- data.frame(variable = c("var1"),
                         value = c("germa", "German", "dscherman"),
                         recode1 = c(NA, "English", "German"),
                         recode2 = c("German", "German", NA),
                         stringsAsFactors = FALSE)
# collapse columns
lookup <- collapseColumns(lookup_raw, recodeVars = c("recode1", "recode2"), prioritize = "recode2")

## (b) Format one column
# create example recode data.frame
lookup_raw2 <- data.frame(variable = c("var1"),
                          value = c("germa", "German", "dscherman"),
                          recode1 = c("German", "German", "German"),
                          stringsAsFactors = FALSE)
# collapse columns
lookup2 <- collapseColumns(lookup_raw2, recodeVars = c("recode1"))
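The prioritization rule from case (a) can be written in a few lines of base R. This sketch reuses the example data above and only illustrates the rule itself (take the prioritized column where it has a valid value, otherwise fall back to the other column); it is not the package's code.

```r
lookup_raw <- data.frame(variable = "var1",
                         value = c("germa", "German", "dscherman"),
                         recode1 = c(NA, "English", "German"),
                         recode2 = c("German", "German", NA),
                         stringsAsFactors = FALSE)

# recode2 is prioritized; recode1 only fills its missing entries
prioritized <- lookup_raw$recode2
fallback <- lookup_raw$recode1
value_new <- ifelse(is.na(prioritized), fallback, prioritized)

# Lookup table in the documented output format
lookup <- data.frame(variable = lookup_raw$variable,
                     value = lookup_raw$value,
                     value_new = value_new,
                     stringsAsFactors = FALSE)
```

In the second row both raters supplied a value ("English" and "German"), so the prioritized column decides.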
Recode a labeled integer variable (based on a multiple choice item), according to a character variable (e.g. an open answer item).
collapseMC_Text(
  GADSdat,
  mc_var,
  text_var,
  mc_code4text,
  var_suffix = "_r",
  label_suffix = "(recoded)"
)
GADSdat |
A |
mc_var |
The variable name of the multiple choice variable. |
text_var |
The variable name of the text variable. |
mc_code4text |
The value label in |
var_suffix |
Variable name suffix for the newly created variables. If |
label_suffix |
Variable label suffix for the newly created variable (only added in the meta data). If |
Multiple choice variables can be represented as labeled integer variables in a GADSdat. Multiple choice items with a forced choice
frequently contain an open answer category. However, sometimes open answers overlap with the existing categories in the multiple choice
item. collapseMC_Text allows recoding the multiple choice variable based on the open answer variable.
mc_code4text indicates when entries in the text_var should be used. Additionally, entries in the text_var are also
used when there are missings on the mc_var. New values for the mc_var are added in the meta data, while preserving the initial
ordering of the value labels. Newly added value labels are sorted alphabetically.
For more details see the help vignette:
vignette("recoding_forcedChoice", package = "eatGADS").
Returns a GADSdat containing the newly computed variable.
# Example gads
example_df <- data.frame(ID = 1:5,
                         mc = c("blue", "blue", "green", "other", "other"),
                         open = c(NA, NA, NA, "yellow", "blue"),
                         stringsAsFactors = FALSE)
example_df$mc <- as.factor(example_df$mc)
gads <- import_DF(example_df)
# recode
gads2 <- collapseMC_Text(gads, mc_var = "mc", text_var = "open", mc_code4text = "other")
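The recoding rule can be illustrated at the level of value labels with plain character vectors. This is a simplified sketch under the assumption that working on labels is enough to show the logic; the actual function operates on labeled integers and also updates the meta data, which is omitted here.

```r
mc <- c("blue", "blue", "green", "other", "other")
open <- c(NA, NA, NA, "yellow", "blue")

# Text entries are used where mc equals the 'mc_code4text' label
# or where mc is missing, provided a text entry exists
use_text <- (mc == "other" | is.na(mc)) & !is.na(open)

mc_r <- mc
mc_r[use_text] <- open[use_text]
# "blue" "blue" "green" "yellow" "blue"
```

The last case shows the overlap mentioned above: the open answer "blue" matches an existing category, while "yellow" would become a new value label.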
Recode multiple variables (representing a single multiple choice item) based on multiple character variables (representing a text field).
collapseMultiMC_Text(
  GADSdat,
  mc_vars,
  text_vars,
  mc_var_4text,
  var_suffix = "_r",
  label_suffix = "(recoded)",
  invalid_miss_code = -98,
  invalid_miss_label = "Missing: Invalid response",
  notext_miss_code = -99,
  notext_miss_label = "Missing: By intention"
)
GADSdat |
A |
mc_vars |
A character vector with the variable names of the multiple choice variable. Names of the character
vector are the corresponding values that are represented by the individual variables.
Creation by |
text_vars |
A character vector with the names of the text variables which should be collapsed. |
mc_var_4text |
The name of the multiple choice variable that signals that information from the text variable should be used. This variable is recoded according to the final status of the text variables. |
var_suffix |
Variable suffix for the newly created |
label_suffix |
Suffix added to variable label for the newly created or modified variables in the |
invalid_miss_code |
Missing code which is given to new character variables if all text entries were recoded into the dichotomous variables. |
invalid_miss_label |
Value label for |
notext_miss_code |
Missing code which is given to empty character variables. |
notext_miss_label |
Value label for |
If a multiple choice item can be answered by ticking multiple boxes, multiple variables in the data
set are necessary to represent this item. In this case, an additional text field for further answers can also
contain multiple values at once. However, some of the answers in the text field might be redundant to
the dummy variables. collapseMultiMC_Text allows recoding multiple MC items of this
kind based on multiple text variables. The recoding can be prepared by expanding the single text variable
(createLookup and applyLookup_expandVar) and by matching the dummy variables
to its underlying values stored in variable labels (matchValues_varLabels).
The function recodes the dummy variables according to the character variables. Additionally, the mc_var_4text
variable is recoded according to the final status of the text_vars (exception: if the text variables were
originally NA, mc_var_4text is left as it was).
Missing values in the character variables can be represented either by NAs or by empty characters.
The multiple choice variables specified with mc_vars can only contain the values 0,
1 and missing codes. The value 1 must always represent "this category applies".
If necessary, use recodeGADS for recoding.
For cases for which the text_vars contain only values that can be recoded into the mc_vars,
all new text_vars are given specific missing codes (see invalid_miss_code and invalid_miss_label).
All remaining NAs on the character variables are given a specific missing code (notext_miss_code).
Returns a GADSdat containing the newly computed variables.
# Prepare example data
mt2 <- data.frame(ID = 1:4, mc1 = c(1, 0, 0, 0), mc2 = c(0, 0, 0, 0), mc3 = c(0, 1, 1, 0),
                  text1 = c(NA, "Eng", "Aus", "Aus2"), text2 = c(NA, "Franz", NA, "Ger"),
                  stringsAsFactors = FALSE)
mt2_gads <- import_DF(mt2)
mt3_gads <- changeVarLabels(mt2_gads, varName = c("mc1", "mc2", "mc3"),
                            varLabel = c("Lang: Eng", "Aus spoken", "other"))

## All operations (see also respective help pages of functions for further explanations)
mc_vars <- matchValues_varLabels(mt3_gads, mc_vars = c("mc1", "mc2", "mc3"),
                                 values = c("Aus", "Eng", "Eng"),
                                 label_by_hand = c("other" = "mc3"))
out_gads <- collapseMultiMC_Text(mt3_gads, mc_vars = mc_vars,
                                 text_vars = c("text1", "text2"), mc_var_4text = "mc3")
out_gads2 <- multiChar2fac(out_gads, vars = c("text1_r", "text2_r"))
final_gads <- remove2NAchar(out_gads2, vars = c("text1_r_r", "text2_r_r"),
                            max_num = 1, na_value = -99, na_label = "missing: excessive answers")
Compare multiple variables of two GADSdat or all_GADSdat objects.
compareGADS(
  GADSdat_old,
  GADSdat_new,
  varNames,
  output = c("list", "data.frame", "aggregated")
)
GADSdat_old |
|
GADSdat_new |
|
varNames |
Character vector of variable names to be compared. |
output |
How should the output be structured? |
Returns "all equal" if the variable is identical across the objects or a data.frame
containing a frequency table with the values which have been changed. Especially useful for checks
after recoding.
Returns either a list with "all equal" and data.frames or a single data.frame.
# Recode a GADS
pisa2 <- recodeGADS(pisa, varName = "schtype", oldValues = 3, newValues = 9)
pisa2 <- recodeGADS(pisa2, varName = "language", oldValues = 1, newValues = 15)
# Compare
compareGADS(pisa, pisa2, varNames = c("ganztag", "schtype", "language"), output = "list")
compareGADS(pisa, pisa2, varNames = c("ganztag", "schtype", "language"), output = "data.frame")
compareGADS(pisa, pisa2, varNames = c("ganztag", "schtype", "language"), output = "aggregated")
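What a frequency table of changed values might look like for a single variable can be sketched in base R. The helper `compare_values` is hypothetical, works on two plain vectors of equal length without missing values, and does not reproduce the exact column layout of the package's output.

```r
# Hypothetical helper: "all equal" or a frequency table of changed values
compare_values <- function(old, new) {
  changed <- old != new
  if (!any(changed)) return("all equal")
  as.data.frame(table(old = old[changed], new = new[changed]))
}

old <- c(1, 2, 3, 3, 2)
new <- c(1, 2, 9, 9, 2)   # value 3 was recoded to 9

res_changed <- compare_values(old, new)  # one row: old = 3, new = 9, Freq = 2
res_equal <- compare_values(old, old)    # "all equal"
```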
Create a composite variable out of two variables.
composeVar(GADSdat, sourceVars, primarySourceVar, newVar, checkVarName = TRUE)
GADSdat |
|
sourceVars |
Character vector of length two containing the variable names which represent the sources of information. |
primarySourceVar |
Character vector containing a single variable name. Which of the |
newVar |
Character vector containing the name of the new composite variable. |
checkVarName |
Logical. Should |
A common use case for creating a composite variable is if there are multiple sources for the same information. For example, a child and the parents are asked about the child's native language. In such cases a composite variable contains information from both variables, meaning that one source is preferred and the other source is used to substitute missing values.
The modified GADSdat.
# example data
dat <- data.frame(ID = 1:4,
                  nat_lang_child = c("Engl", "Ger", "missing", "missing"),
                  nat_lang_father = c("Engl", "Engl", "Ger", "missing"),
                  stringsAsFactors = TRUE)
gads <- import_DF(dat)
# tag the value 3 ("missing") as missing; the result has to be assigned
gads <- changeMissings(gads, "nat_lang_child", value = 3, missings = "miss")
gads <- changeMissings(gads, "nat_lang_father", value = 3, missings = "miss")
# compose variable
composeVar(gads, sourceVars = c("nat_lang_child", "nat_lang_father"),
           primarySourceVar = "nat_lang_child", newVar = "nat_lang_comp")
Convert a character vector, all character variables in a data.frame or selected variables in a GADSdat to
upper ("upper"), lower ("lower"), or first letter upper and everything else lower case ("upperFirst").
convertCase(x, case = c("lower", "upper", "upperFirst"), ...)

## S3 method for class 'GADSdat'
convertCase(x, case = c("lower", "upper", "upperFirst"), vars, ...)
x |
A character vector, |
case |
Character vector of length 1. What case should the strings be converted to? Available options are
|
... |
further arguments passed to or from other methods. |
vars |
Character vector. What variables in the |
Returns the converted object.
convertCase(GADSdat): convert case for GADSdats
# for character
convertCase(c("Hi", "HEllo", "greaT"), case = "upperFirst")
# for GADSdat
input_g <- import_DF(data.frame(v1 = 1:3, v2 = c("Hi", "HEllo", "greaT"),
                                stringsAsFactors = FALSE))
convertCase(input_g, case = "upperFirst", vars = "v2")
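For plain character vectors, the "upperFirst" conversion can be expressed with base R string functions. The helper `upper_first` below is a hypothetical stand-in used only to illustrate the transformation; it is not the function exported by the package.

```r
# Hypothetical helper: first letter upper case, everything else lower case
upper_first <- function(x) {
  out <- paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

converted <- upper_first(c("Hi", "HEllo", "greaT"))
# "Hi" "Hello" "Great"
```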
eatGADS data base.
Creates a relational data base containing hierarchically stored data with meta information (e.g. value and variable labels).
createGADS(allList, pkList, fkList, filePath)
allList |
An object created via |
pkList |
List of primary keys. |
fkList |
List of foreign keys. |
filePath |
Path to the db file to write (including name); has to end on '.db'. |
Uses createDB from the eatDB package to create a relational data base. For details on how to define
keys see the documentation of createDB.
Creates a data base in the given path, returns NULL.
# see createDB vignette
Extract unique values from one or multiple variables of a GADSdat object for recoding (e.g. via an Excel spreadsheet).
createLookup(GADSdat, recodeVars, sort_by = NULL, addCols = c("value_new"))
GADSdat |
A |
recodeVars |
Character vector of variable names which should be recoded. |
sort_by |
By which column ( |
addCols |
Character vector of additional column names for recoding purposes. |
If recoding of one or multiple variables is more complex, a lookup table can be created for later application via
applyLookup or applyLookup_expandVar. The function allows the extraction of the values
of multiple variables and sorting of these unique values via variable and/or values.
If addCols are specified the lookup table has to be formatted via collapseColumns,
before it can be applied to recode data.
Returns a data frame in long format with the following variables:
variable |
Variables as specified in |
value |
Unique values of the variables specified in |
value_new |
This is the default for |
# create example GADS
dat <- data.frame(ID = 1:4,
                  var1 = c(NA, "Eng", "Aus", "Aus2"),
                  var2 = c(NA, "French", "Ger", "Ita"),
                  stringsAsFactors = FALSE)
gads <- import_DF(dat)
# create Lookup table for recoding
lookup <- createLookup(gads, recodeVars = c("var1", "var2"), sort_by = c("value", "variable"))
# create Lookup table for recoding by multiple recoders
lookup2 <- createLookup(gads, recodeVars = c("var1", "var2"), sort_by = c("value", "variable"),
                        addCols = c("value_recoder1", "value_recoder2"))
All numerical variables without value labels in a GADSdat are selected and a data.frame is created, which allows the specification
of minima and maxima.
createNumCheck(GADSdat)
GADSdat |
A |
This function is currently under development.
A data.frame with the following variables:
variable |
All numerical variables in the |
varLabel |
Corresponding variable labels |
min |
Minimum value for the specific variable. |
max |
Maximum value for the specific variable. |
value_new |
Which value should be inserted if values exceed the specified range? |
# tbd
Create an empty variable as part of a GADSdat object.
createVariable(GADSdat, varName, checkVarName = TRUE)
GADSdat |
|
varName |
Name of the variable to be created. |
checkVarName |
Logical. Should |
Returns a GADSdat.
# create a new variable
pisa_new <- createVariable(pisa, varName = "new_variable")
GADSdat.
Drop rows with duplicate IDs in a GADSdat object based on numbers of missing values.
dropDuplicateIDs(GADSdat, ID, varNames = setdiff(namesGADS(GADSdat), ID))
GADSdat |
A |
ID |
Name of the ID variable. |
varNames |
Character vector of variable names: The sum of missing values on these variables decides which rows are kept. By default, all variables except the ID variable are used. |
If duplicate IDs occur, it is often desirable to keep the row with the least missing information.
Therefore, dropDuplicateIDs drops rows based on number of missing values
on the specified variables (varNames).
If multiple rows have the same number of missing values, a warning is issued and the first of the respective rows is kept.
Returns the GADSdat with duplicate ID rows removed.
# create example data set
gads_ori <- import_DF(data.frame(id_var = c(1, 2, 5, 4, 4),
                                 var1 = c(1, 2, -99, 1, -99)))
gads_ori <- changeMissings(gads_ori, varName = "var1", value = -99, missings = "miss")
# drop duplicate IDs
dropDuplicateIDs(gads_ori, ID = "id_var")
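The selection rule (keep the row with the fewest missing values per ID, and the first row in case of ties) can be sketched in base R. In this illustration missing values are plain NAs in a data.frame rather than values tagged as missing in a GADSdat, no warning is issued for ties, and all object names are hypothetical.

```r
# Toy data: ids 4 and 5 are duplicated (hypothetical example)
dat <- data.frame(id = c(1, 2, 4, 4, 5, 5),
                  v1 = c(1, 2, NA, 1, NA, 3),
                  v2 = c(1, NA, NA, 2, 4, NA))

# Number of missing values per row on the relevant variables
n_miss <- rowSums(is.na(dat[, c("v1", "v2"), drop = FALSE]))

# Sort by id and number of missings; ties keep their original order
sorted <- dat[order(dat$id, n_miss), ]

# Keep the first (= least missing) row per id
deduped <- sorted[!duplicated(sorted$id), ]
```

For id 4 the complete row wins; for id 5 both rows have one missing value, so the first of them is kept.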
Convert a set of dummy variables into a set of character variables.
dummies2char(GADSdat, dummies, dummyValues, charNames, checkVarNames = TRUE)
GADSdat |
A |
dummies |
A character vector with the names of the dummy variables. |
dummyValues |
A vector with the values which the dummy variables represent. |
charNames |
A character vector containing the new variable names. |
checkVarNames |
Logical. Should |
A set of dummy variables is transformed to an equal number of character variables.
The character variables are aligned to the left and the remaining character variables are set to NA.
For each new variable the missing codes of the respective dummy variable are reused.
Returns a GADSdat.
## create an example GADSdat
dummy_df <- data.frame(d1 = c("eng", "no eng", "eng"),
                       d2 = c("french", "french", "no french"),
                       d3 = c("no ger", "ger", "no ger"),
                       stringsAsFactors = TRUE)
dummy_g <- import_DF(dummy_df)
## transform dummy variables
dummy_g2 <- dummies2char(dummy_g, dummies = c("d1", "d2", "d3"),
                         dummyValues = c("english", "french", "german"),
                         charNames = c("char1", "char2", "char3"))
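The left alignment described above can be illustrated with 0/1 coded dummies in a plain data.frame. This sketch assumes that 1 means "category applies" and ignores missing codes; it only shows how the values are shifted to the left, not how the package handles meta data.

```r
# 0/1 coded dummy variables (hypothetical example)
dummies <- data.frame(d1 = c(1, 0, 1), d2 = c(1, 1, 0), d3 = c(0, 1, 0))
dummy_values <- c("english", "french", "german")

# Per row: collect the values whose dummy equals 1, then pad with NA
chars <- t(apply(dummies, 1, function(row) {
  hit <- dummy_values[row == 1]
  c(hit, rep(NA_character_, length(dummy_values) - length(hit)))
}))
colnames(chars) <- c("char1", "char2", "char3")
```

The second row (french and german ticked) yields "french", "german", NA: the values start in the first character variable even though the first dummy is 0.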
NA.
Set all values within one or multiple variables to NA.
emptyTheseVariables(GADSdat, vars, label_suffix = "")
GADSdat |
A |
vars |
Character vector of variable names which should be set to |
label_suffix |
Suffix added to variable labels for the affected variables in the |
Returns the recoded GADSdat.
# empty multiple variables
pisa2 <- emptyTheseVariables(pisa, vars = c("idstud", "idschool"))
GADSdat objects are (nearly) equal
Run tests to check whether two GADSdat objects are (nearly) equal.
equalData compares variable names, number of rows in the data, and data differences.
equalMeta compares variable names and meta data differences.
equalGADS combines both functions. All functions produce a test report in list format.
equalGADS(
  target,
  current,
  id = NULL,
  metaExceptions = c("display_width", "labeled"),
  tolerance = sqrt(.Machine$double.eps)
)

equalData(target, current, id = NULL, tolerance = sqrt(.Machine$double.eps))

equalMeta(target, current, metaExceptions = c("display_width", "labeled"))
target |
A |
current |
A |
id |
A character vector containing the unique identifier columns of both |
metaExceptions |
Should certain meta data columns be excluded from the comparison? |
tolerance |
A numeric value greater than or equal to |
More detailed checks for individual variables can be performed via inspectDifferences
and inspectMetaDifferences.
Returns a list with the following entries:
names_not_in_1 |
Which variables are included in |
names_not_in_2 |
Which variables are included in |
data_nrow |
Do the actual data sets have the same number of rows? |
data_differences |
For which variables are the data different? |
meta_data_differences |
For which variables are the meta data different? |
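Why a tolerance matters for "nearly equal" numeric data can be shown with base R alone. The sketch assumes that the tolerance argument behaves similarly to the one of `all.equal()`, whose default is also `sqrt(.Machine$double.eps)`; the documentation above does not spell out the exact comparison used internally.

```r
target <- c(0.1 + 0.2, 1, NA)
current <- c(0.3, 1, NA)

# Exact comparison fails because of floating point representation
exactly_equal <- identical(target, current)

# Comparison within a tolerance treats the vectors as equal
nearly_equal <- isTRUE(all.equal(target, current,
                                 tolerance = sqrt(.Machine$double.eps)))
```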
GADSdat to a tibble
haven's read_spss stores data together with meta data (e.g. value and variable labels) in a
tibble with attributes on variable level. This function transforms a GADSdat object to such a tibble.
export_tibble(GADSdat)
GADSdat |
|
This function is mainly intended for internal use. For further documentation see also write_spss.
Returns a tibble.
pisa_tbl <- export_tibble(pisa)
Extract data.frame from a GADSdat object for analyses in R. Value labels can be
selectively applied via defining convertLabels and convertVariables.
For extracting meta data see extractMeta.
extractData(
  GADSdat,
  convertMiss = TRUE,
  convertLabels = c("character", "factor", "numeric"),
  convertVariables = NULL,
  dropPartialLabels = TRUE
)
GADSdat |
A |
convertMiss |
Should values tagged as missing values be recoded to |
convertLabels |
If |
convertVariables |
Character vector of variable names to which value labels should be applied.
All other variables remain as numeric variables in the data.
If not specified [default], value labels are applied to all variables for which labels are available.
Variable names not in the actual |
dropPartialLabels |
Should value labels for partially labeled variables be dropped?
If |
A GADSdat object includes actual data (GADSdat$dat) and the corresponding meta data information
(GADSdat$labels). extractData extracts the data and applies relevant meta data on value level (missing conversion, value labels),
so the data can be used for analyses in R. Variable labels are retained as label attributes on column level.
If factors are extracted via convertLabels = "factor", an attempt is made to preserve the underlying integers.
If this is not possible, a warning is issued.
As SPSS has almost no limitations regarding the underlying values of labeled
integers and R's factor format is very strict (no 0, only integers increasing by + 1),
this procedure can lead to frequent problems.
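The restriction on R factors can be seen with base R alone. The following is a minimal sketch, not the package's implementation; the values and labels are made up for illustration. A labeled variable coded 0/1/2 cannot keep its original codes as a factor, because factor codes always run from 1 upward in steps of 1.

```r
# Hypothetical labeled variable as it might come from SPSS: codes start at 0
values <- c(0, 1, 2, 1)
value_labels <- c("no", "yes", "unknown")  # made-up labels for the codes 0, 1, 2

# Converting to a factor attaches the labels ...
f <- factor(values, levels = c(0, 1, 2), labels = value_labels)

# ... but the underlying integer codes are renumbered to 1, 2, 3
codes <- as.integer(f)
print(data.frame(original = values, label = as.character(f), factor_code = codes))
```

In a case like this the original integers cannot be preserved, which is the situation in which the warning described above would be expected.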
Returns a data frame.
# Extract Data for Analysis
dat <- extractData(pisa)
# convert labeled variables to factors
dat <- extractData(pisa, convertLabels = "factor")
# convert only some variables to factor, all others remain numeric
dat <- extractData(pisa, convertLabels = "factor", convertVariables = c("schtype", "ganztag"))
# convert only some variables to character, all others remain numeric
dat <- extractData(pisa, convertLabels = "character", convertVariables = c("schtype", "ganztag"))
# schtype is now character
table(dat$schtype)
# gender remains numeric
table(dat$gender)
Extract data.frame from a GADSdat object for analyses in R. Per default, missing codes are applied but
value labels are dropped. Alternatively, value labels can be selectively applied via
labels2character, labels2factor, and labels2ordered.
For extracting meta data see extractMeta.
extractData2(GADSdat, convertMiss = TRUE, labels2character = NULL, labels2factor = NULL, labels2ordered = NULL, dropPartialLabels = TRUE)
GADSdat |
A |
convertMiss |
Should values tagged as missing values be recoded to |
labels2character |
For which variables should values be recoded to their labels? The resulting variables
are of type |
labels2factor |
For which variables should values be recoded to their labels? The resulting variables
are of type |
labels2ordered |
For which variables should values be recoded to their labels? The resulting variables
are of type |
dropPartialLabels |
Should value labels for partially labeled variables be dropped?
If |
A GADSdat object includes actual data (GADSdat$dat) and the corresponding meta data information
(GADSdat$labels). extractData2 extracts the data and applies relevant meta data on value level
(missing tags, value labels),
so the data can be used for analyses in R. Variable labels are retained as label attributes on column level.
If factors are extracted via labels2factor or labels2ordered, an attempt is made to preserve the underlying integers.
If this is not possible, a warning is issued.
As SPSS has almost no limitations regarding the underlying values of labeled
integers and R's factor format is very strict (no 0, only integers increasing by + 1),
this procedure can lead to frequent problems.
If multiple values of the same variable are assigned the same value label and the variable should be transformed to
character, factor, or ordered, a warning is issued and the transformation is correctly performed.
Returns a data frame.
# Extract Data for Analysis
dat <- extractData2(pisa)
# convert only some variables to character, all others remain numeric
dat <- extractData2(pisa, labels2character = c("schtype", "ganztag"))
# convert only some variables to factor, all others remain numeric
dat <- extractData2(pisa, labels2factor = c("schtype", "ganztag"))
# convert all labeled variables to factors
dat <- extractData2(pisa, labels2factor = namesGADS(pisa))
# convert some variables to factor, some to character
dat <- extractData2(pisa, labels2character = c("schtype", "ganztag"), labels2factor = c("migration"))
Support for linking error data bases has been removed from eatGADS.
extractDataOld provides (for the time being) backwards compatibility, so linking errors can still be merged automatically.
extractDataOld(GADSdat, convertMiss = TRUE, convertLabels = "character", dropPartialLabels = TRUE, convertVariables = NULL)
GADSdat |
A |
convertMiss |
Should values coded as missing values be recoded to |
convertLabels |
If |
dropPartialLabels |
Should value labels for partially labeled variables be dropped? If |
convertVariables |
Character vector of variable names to which value labels should be applied. If not specified (default), value labels are applied to all variables for which labels are available. Variable names not in the actual GADS are silently dropped. |
See extractData for the current functionality.
Returns a data frame.
GADSdat from all_GADSdat
Function to extract a single GADSdat from an all_GADSdat object.
extractGADSdat(all_GADSdat, name)
all_GADSdat |
|
name |
A character vector with length 1 with the name of the |
GADSdat objects can be merged into a single all_GADSdat object via mergeLabels. This function performs the
reverse action, extracting a single GADSdat object.
Returns a GADSdat object.
# see createGADS vignette
Extract meta data (e.g. variable and value labels) from an eatGADS object. This can be a GADSdat, an all_GADSdat,
a labels data.frame, or the path to an existing data base.
extractMeta(GADSobject, vars = NULL)
GADSobject |
Either a |
vars |
A character vector containing variable names. If |
Meta data is stored tidily in all GADSdat objects as a separate long format data frame. This information can be extracted for a single or
multiple variables.
Returns a long format data frame with meta information.
# Extract Meta data from data base
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
extractMeta(db_path, vars = c("schtype", "sameteach"))
# Extract Meta data from loaded/imported GADS
extractMeta(pisa, vars = c("schtype", "sameteach"))
GADSdat.
Extract or remove variables and their meta data from a GADSdat object.
extractVars(GADSdat, vars)
removeVars(GADSdat, vars)
GADSdat |
|
vars |
A character vector containing the variable names in the |
Both functions simply perform the variable removal or extraction from the underlying data.frame
in the GADSdat object followed by calling updateMeta.
Returns a GADSdat object.
## create an example GADSdat
example_df <- data.frame(ID = 1:4,
                         age = c(12, 14, 16, 13),
                         citizenship1 = c("German", "English", "Polish", "Chinese"),
                         citizenship2 = c(NA, "German", "Chinese", "Polish"),
                         stringsAsFactors = TRUE)
gads <- import_DF(example_df)
## remove variables from GADSdat
gads2 <- removeVars(gads, vars = c("citizenship2", "age"))
## extract GADSdat with specific variables
gads3 <- extractVars(gads, vars = c("ID", "citizenship1"))
Convert a factor variable with n levels to n dummy variables.
fac2dummies(GADSdat, var)
GADSdat |
A |
var |
A character vector with the name of the factor variable. |
Newly created variables are named as the original variable with the suffix "_a", "_b" and so on. Variable labels
are created by using the original variable label (if available) and adding the value label of the corresponding level.
All missing codes are forwarded to all dummy variables.
Returns a GADSdat containing the newly computed variables.
## create an example GADSdat
suppressMessages(gads <- import_DF(iris))
## transform factor variable
gads2 <- fac2dummies(gads, var = "Species")
Convert a factor variable with complex factor levels (factor levels contain combinations of other factor levels) to dummy variables.
Dummy variables are coded 1 ("yes") and 0 ("no").
fac2dummies_complex(GADSdat, var)
GADSdat |
A |
var |
A character vector with the name of the factor variable. |
The basic functionality of this function is analogous to fac2dummies. However, the function expects factor levels to only go
to 9. Higher numbers are treated as combinations of factor levels, for example "13" as "1" and "3".
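The digit-splitting rule can be illustrated in base R. This is a rough sketch of the idea only, not the code used by fac2dummies_complex; the dummy column names (fac_a, fac_b, fac_c) are assumptions that mirror the suffix scheme described for fac2dummies.

```r
# Recoded values where, for example, 12 means that options 1 and 2 were both chosen
codes <- c(1, 12, 123, 2, 3, 23)

# Split every code into its single digits, e.g. 123 -> "1", "2", "3"
digits <- strsplit(as.character(codes), "")

# One dummy per basic level (1 to 3 here): 1 = "yes", 0 = "no"
basic_levels <- as.character(1:3)
dummies <- t(sapply(digits, function(d) as.integer(basic_levels %in% d)))
colnames(dummies) <- paste0("fac_", letters[seq_along(basic_levels)])  # assumed names

print(cbind(code = codes, dummies))
```

Splitting on single characters is what makes the restriction to levels 1 to 9 necessary: a level such as 10 could not be told apart from the combination of "1" and "0".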
Returns a GADSdat containing the newly computed variables.
## create an example GADSdat
df_fac <- data.frame(id = 1:6,
                     fac = c("Opt a", "Opt c, Opt b", "Opt c", "Opt b", "Opt a, Opt b", "Opt a, Opt b, Opt c"),
                     stringsAsFactors = TRUE)
g_fac <- import_DF(df_fac)
g_fac <- recodeGADS(g_fac, varName = "fac", oldValues = c(1, 2, 3, 4, 5, 6), newValues = c(1, 12, 123, 2, 3, 23))
## transform factor variable
fac2dummies_complex(g_fac, "fac")
Fill imputed values in an imputed GADSdat_imp object with original, not imputed values from a GADSdat.
fillImputations(GADSdat, GADSdat_imp, varName, varName_imp = varName, id, imp)
GADSdat |
A |
GADSdat_imp |
A |
varName |
A character vector of length 1 containing the variable name in |
varName_imp |
A character vector of length 1 containing the variable name in |
id |
A character vector of length 1 containing the unique identifier column of both |
imp |
A character vector of length 1 containing the imputation number in |
This function only fills in missing values in the imputed variable from the not imputed variable. It provides parts
of the functionality of subImputations but does not check whether values have been mistakenly imputed. However,
performance is increased substantially.
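Since no example is provided yet, the filling step can be sketched in base R on plain data frames. This illustrates the idea under assumed toy data (the column names id, imp and score are hypothetical); it is not the function's implementation and it ignores the GADSdat meta data entirely.

```r
# Original, non-imputed data: one row per person
orig_df <- data.frame(id = 1:4, score = c(10, NA, 30, 40))

# Imputed data in long format: two imputations per person, some values still missing
imp_df <- data.frame(id = rep(1:4, times = 2),
                     imp = rep(1:2, each = 4),
                     score = c(NA, 21, NA, 40, NA, 22, 30, NA))

# Look up the original row for every imputed row via the identifier
idx <- match(imp_df$id, orig_df$id)

# Fill only the missing entries; existing imputed values are left untouched
to_fill <- is.na(imp_df$score)
imp_df$score[to_fill] <- orig_df$score[idx[to_fill]]

print(imp_df)
```

As in the description above, nothing here checks whether a value was imputed by mistake; the sketch only copies original values into the gaps.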
Returns the modified GADSdat_imp.
# tbd
Remove special characters from a character vector or a GADSdat object.
Also suitable to fix encoding problems of a character vector or a GADSdat object. See details for available options.
fixEncoding(x, input = c("other", "ASCII", "windows1250", "BRISE"))
x |
A character vector or |
input |
Which encoding was used in |
The option "other" replaces correctly encoded special signs.
The option "ASCII" works for strings which were encoded presumably using UTF-8 and imported using ASCII encoding.
The option "windows1250" works for strings which were encoded presumably using UTF-8
and imported using windows-1250 encoding.
The option "BRISE" covers a unique case used at the FDZ at IQB.
If entries are all upper case, special characters are also transformed to all upper case (e.g., "AE" instead
of "Ae").
The modified character vector or GADSdat object.
fixEncoding(c("\U00C4pfel", "\U00C4PFEL", paste0("\U00DC", "ben"), paste0("\U00DC", "BEN")))
Function to obtain a data frame from a GADSdat object for changes to meta data on variable or on value level.
getChangeMeta(GADSdat, level = "variable")
GADSdat |
|
level |
|
Changes on variable level include variable names (varName), variable labels (varLabel),
SPSS format (format) and display width (display_width).
Changes on value level include values (value), value labels (valLabel) and
missing codes (missings).
Returns the meta data sheet for all variables including the corresponding change columns.
# For changes on variable level
varChangeTable <- getChangeMeta(pisa, level = "variable")
# For changes on value level
valChangeTable <- getChangeMeta(pisa, level = "value")
Extracts variables from a GADS data base. Only the specified variables are extracted. Note that this selection determines the format of
the data.frame that is extracted.
getGADS(vSelect = NULL, filePath)
vSelect |
Character vector of variable names. |
filePath |
Path of the existing |
See createDB and dbPull for further explanation of the query and merging processes.
Returns a GADSdat object.
# Use data base within package
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
pisa_gads <- getGADS(db_path, vSelect = c("schtype", "sameteach"))
Extracts variables from an eatGADS data base. Only the specified variables are extracted. Note that this selection determines the format
of the data.frame that is extracted. CAREFUL: This function uses a local temporary directory to speed up loading the data base
from a server and caches the data base locally for a running R session. The temporary data base is removed automatically when the
running R session is terminated.
getGADS_fast(vSelect = NULL, filePath, tempPath = tempdir())
vSelect |
Character vector of variable names. |
filePath |
Path of the existing |
tempPath |
Local directory in which the data base can temporarily be stored. Using the default is recommended. |
A random temporary directory is used for caching the data base and is removed when the R session terminates. See
createDB and dbPull for further explanation of the query and merging processes.
Returns a GADSdat object.
Get the (most restrictive) limits that SPSS and/or Stata
imposes on a specific aspect of a dataset.
getProgramLimit(program = c("SPSS", "Stata", "Stata 19/BE", "Stata 19/MP"), component = c("varNames", "varLabels", "valLabels", "stringvars", "nrows", "ncols"))
program |
Character vector of the programs/program version that should be considered. |
component |
Single string. Which limits should be returned? |
For more details about program specific limits as well as a full list, see program_limits.
In program, "SPSS" implies SPSS 30, and "Stata" implies
Stata 19/SE, as these are the most relevant versions among the ones implemented here.
If more than one program/version name is given in program, the most restrictive limit
will be returned.
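How the "most restrictive" limit might be resolved across several programs can be sketched in base R. The program names and numbers below are placeholders, not the real limits (those are listed in program_limits), and taking the smallest value is an assumption about how the selection works, not a description of the package internals.

```r
# Placeholder limits for three hypothetical programs (NOT the real SPSS/Stata values)
limits <- data.frame(program = c("prog_A", "prog_B", "prog_C"),
                     value = c(64, 32, 128),
                     unit = "char",
                     stringsAsFactors = FALSE)

# Hypothetical helper: return the smallest limit among the requested programs
most_restrictive <- function(programs, limits) {
  sel <- limits[limits$program %in% programs, ]
  as.list(sel[which.min(sel$value), c("value", "unit")])
}

res <- most_restrictive(c("prog_A", "prog_B"), limits)
print(res)
```

The helper returns a list with the elements value and unit so that its shape resembles the return value described for getProgramLimit.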
A list of two elements: value (numeric size of the limit) and
unit ("char", "byte", or "generic").
# Show all implemented limits
program_limits
# Get the specific limit on variable name lengths under SPSS
getProgramLimit("SPSS", "varNames")
# Get the variable name length limit a dataset has to adhere to to be compatible with
# both SPSS and Stata 19/SE
getProgramLimit(c("Stata", "SPSS"), "varNames")
Extracts variables from multiple eatGADS data bases.
Data can then be extracted from the GADSdat object via
extractData. For extracting meta data from a data base or a GADSdat object see extractMeta. To speed
up the data loading, getGADS_fast is used per default.
getTrendGADS(filePaths, vSelect = NULL, years, fast = TRUE, tempPath = tempdir(), verbose = TRUE)
filePaths |
Character vectors with paths to the |
vSelect |
Variables from all GADS to be selected (as character vector). |
years |
A numeric vector with identical length as |
fast |
Should |
tempPath |
The directory in which the GADS data bases will be temporarily stored. Using the default is strongly recommended. |
verbose |
Should the loading process be reported? |
This function extracts data from multiple GADS data bases. All data bases have to be created via
createGADS. The data bases are joined via rbind() and a variable year is added, corresponding to the
argument years. The GADSdat object can then further
be used via extractData. See createDB and dbPull for further explanation
of the querying and merging processes.
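The joining step described above (rbind() plus an added year variable) can be sketched with plain data frames in base R. The data and column names are invented for illustration, and the sketch leaves out the meta data handling that getTrendGADS performs on GADSdat objects.

```r
# Toy extracts standing in for two data bases (made-up values)
dat_first <- data.frame(idstud = 1:3, score = c(480, 510, 495))
dat_second <- data.frame(idstud = 4:5, score = c(502, 520))

data_list <- list(dat_first, dat_second)
years <- c(2015, 2018)  # one entry per data base, in the same order

# Add the matching year to every data set, then stack them row-wise
with_year <- Map(function(d, y) cbind(d, year = y), data_list, years)
trend <- do.call(rbind, with_year)

print(trend)
```

Because the rows are stacked with rbind(), the selected variables need to be present in every data base, which is why years must have the same length and order as the file paths.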
Returns a GADSdat object.
# See getGADS vignette
Support for linking error data bases has been removed from eatGADS.
getTrendGADSOld provides (for the time being) backwards compatibility, so linking errors can still be extracted automatically.
getTrendGADSOld(filePath1, filePath2, lePath = NULL, vSelect = NULL, years, fast = TRUE, tempPath = tempdir())
filePath1 |
Path of the first |
filePath2 |
Path of the second |
lePath |
Path of the linking error db file. If |
vSelect |
Variables from both GADS to be selected (as character vector). |
years |
A numeric vector of length 2. The first elements corresponds to |
fast |
Should |
tempPath |
The directory in which both GADS will be temporarily stored. Using the default is strongly recommended. |
See getGADS for the current functionality.
Returns a GADSdat object.
# See getGADS vignette
convertLabel
Function to import a data.frame object created by convertLabel for use in eatGADS. If possible, importing data via import_spss should always be preferred.
import_convertLabel(df, checkVarNames = TRUE)
df |
A |
checkVarNames |
Should variable names be checked for violations of |
convertLabel from R package eatAnalysis converts an object imported via read.spss (from the foreign package) to a data.frame with factors and variable labels stored in variable attributes.
Returns a list with the actual data dat and a data frame with all meta information in long format labels.
data.frame
Function to import a data.frame object for use in eatGADS while extracting value labels from factors.
import_DF(df, checkVarNames = TRUE)
df |
A |
checkVarNames |
Should variable names be checked for violations of |
Factors are integers with labeled variable levels. import_DF extracts these labels and stores them in a separate meta data data.frame.
See import_spss for detailed information.
Returns a list with the actual data dat and a data frame with all meta information in long format labels.
dat <- import_DF(iris, checkVarNames = FALSE)
# Inspect Meta data
extractMeta(dat)
# Extract Data
dat <- extractData(dat, convertLabels = "character")
Function to import a data.frame object for use in eatGADS while adding explicit variable and value meta information through
separate data.frames.
import_raw(df, varLabels, valLabels = NULL, checkVarNames = TRUE)
df |
A |
varLabels |
A |
valLabels |
A |
checkVarNames |
Should variable names be checked for violations of |
The argument varLabels has to contain exactly two variables, namely varName and varLabel. valLabels has
to contain exactly four variables, namely varName, value, valLabel and missings. The column value
can only contain numerical values. The column missings can only contain the values "valid" and "miss".
Variables of type factor are not supported in any of the data.frames.
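The requirements on valLabels can be checked up front with a few lines of base R. The helper below is a hypothetical sketch (check_valLabels is not part of eatGADS) that simply restates the rules from the paragraph above; import_raw performs its own checks, which may differ in detail.

```r
# Hypothetical pre-check that mirrors the documented rules for valLabels
check_valLabels <- function(valLabels) {
  expected <- c("varName", "value", "valLabel", "missings")
  stopifnot(
    is.data.frame(valLabels),
    length(names(valLabels)) == 4,
    setequal(names(valLabels), expected),               # exactly these four columns
    is.numeric(valLabels$value),                        # values must be numeric
    all(valLabels$missings %in% c("valid", "miss")),    # only "valid" or "miss"
    !any(vapply(valLabels, is.factor, logical(1)))      # no factor columns
  )
  invisible(TRUE)
}

ok <- data.frame(varName = c("grade", "grade"), value = c(1, 2),
                 valLabel = c("very good", "good"),
                 missings = c("valid", "valid"), stringsAsFactors = FALSE)
check_valLabels(ok)

# A data frame with an unsupported entry in 'missings' is rejected
bad <- transform(ok, missings = c("valid", "missing"))
bad_result <- try(check_valLabels(bad), silent = TRUE)
print(inherits(bad_result, "try-error"))
```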
Returns a list with the actual data dat and with all meta information in long format labels.
dat <- data.frame(ID = 1:5, grade = c(1, 1, 2, 3, 1))
varLabels <- data.frame(varName = c("ID", "grade"),
                        varLabel = c("Person Identifier", "School grade Math"),
                        stringsAsFactors = FALSE)
valLabels <- data.frame(varName = c("grade", "grade", "grade"),
                        value = c(1, 2, 3),
                        valLabel = c("very good", "good", "sufficient"),
                        missings = c("valid", "valid", "valid"),
                        stringsAsFactors = FALSE)
gads <- import_raw(df = dat, varLabels = varLabels, valLabels = valLabels, checkVarNames = FALSE)
# Inspect Meta data
extractMeta(gads)
# Extract Data
dat <- extractData(gads, convertLabels = "character")
Function to create a GADSdat object based on a dat data.frame and a labels data.frame.
import_raw2(dat, labels)
dat |
A |
labels |
A |
A GADSdat is basically a list with two elements: a dat and a labels data.frame. If these elements are
separated, they can be cleanly tied together again by import_raw2. The function performs extensive checks on the integrity of the
resulting GADSdat object. See import_spss and import_raw for further details.
Returns a GADSdat object.
dat <- data.frame(ID = 1:5, grade = c(1, 1, 2, 3, 1))
varLabels <- data.frame(varName = c("ID", "grade"),
                        varLabel = c("Person Identifier", "School grade Math"),
                        stringsAsFactors = FALSE)
valLabels <- data.frame(varName = c("grade", "grade", "grade"),
                        value = c(1, 2, 3),
                        valLabel = c("very good", "good", "sufficient"),
                        missings = c("valid", "valid", "valid"),
                        stringsAsFactors = FALSE)
gads <- import_raw(df = dat, varLabels = varLabels, valLabels = valLabels, checkVarNames = FALSE)
# separate the GADSdat object
dat <- gads$dat
labels <- gads$labels
# rejoin it
dat <- import_raw2(dat, labels)
RDS file
Function to import a data.frame stored as a .RDS file while extracting value labels from factors.
import_RDS(filePath, checkVarNames = TRUE)
filePath |
Source file location, ending on |
checkVarNames |
Should variable names be checked for violations of |
Factors are integers with labeled variable levels. import_RDS extracts these labels and stores them in a separate meta data data.frame.
See import_DF for detailed information. This function is a wrapper around import_DF.
Returns a list with the actual data dat and a data frame with all meta information in long format labels.
Function to import .sav files while extracting meta information, e.g. variable and value labels.
import_spss(filePath, checkVarNames = TRUE, labeledStrings = c("drop", "keep", "transform"), encoding = NULL)
filePath |
Source file location, ending on |
checkVarNames |
Should variable names be checked for violations of |
labeledStrings |
Should strings as labeled values be allowed? If |
encoding |
The character encoding used for the file. The default, |
SPSS files (.sav) store variable and value labels and assign specific formatting to variables. import_spss imports
data from SPSS, while storing this meta-information separately in a long format data frame. Value labels and missing labels are used
to identify missing values (see checkMissings). Time and date variables are converted to character.
In some special cases, .sav files seem to consist of a mix of different encoding types. In such cases, haven might
throw an error if the encoding argument is not specified or UTF-8 is selected as encoding. To circumvent this problem we
recommend using encoding = "ASCII" and fixing the resulting issues manually. For example, fixEncoding
provides some fixes for encoding issues specific to the German language.
Returns a list with the actual data dat and a data frame with all meta information in long format labels.
# Use spss data from within package
spss_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
pisa_gads <- import_spss(spss_path)
Stata data
Function to import .dta files while extracting meta information, e.g. variable and value labels.
import_stata(filePath, checkVarNames = TRUE, labeledStrings = FALSE)
filePath |
Source file location, ending on |
checkVarNames |
Should variable names be checked for violations of |
labeledStrings |
Should strings as labeled values be allowed? This possibly corrupts all labeled values. |
Stata files (.dta) store variable and value labels and assign specific formatting to variables. import_stata imports
data from Stata, while storing this meta-information separately in a long format data frame. Time and date variables are converted to character.
Returns a list with the actual data dat and a data frame with all meta information in long format labels.
tibble
Function to import a tibble while extracting meta information, e.g. variable and value labels.
import_tibble(tibble, checkVarNames = TRUE, labeledStrings = c("drop", "keep", "transform"))
tibble |
A |
checkVarNames |
Should variable names be checked for violations of |
labeledStrings |
Should strings as labeled values be allowed? If |
Tibbles may store variable and value labels as well as missing tags via the labelled class. import_tibble
restructures this meta information separately in a long format data.frame. Value labels and missing tags are used
to identify missing values (see checkMissings). Time and date variables are converted to character.
Returns a list with the actual data dat and a data frame with all meta information in long format labels.
# Use spss data from within package
spss_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
pisa_gads <- import_spss(spss_path)
GADSdat.
Deprecated. Please use relocateVariable instead.
insertVariable(GADSdat, var, after = NULL)
GADSdat |
A |
var |
Character string of the variable name which should be sorted. |
after |
Character string of the variable name after which |
Inspect differences within a single GADSdat or between two GADSdat objects for a specific variable.
inspectDifferences(GADSdat, varName, other_GADSdat = GADSdat, other_varName = varName, id)
GADSdat |
A |
varName |
A character vector of length 1 containing the variable name. |
other_GADSdat |
A second |
other_varName |
A character vector of length 1 containing the other variable name.
If omitted, it is assumed that both variables have identical names (as supplied in |
id |
A character vector of length 1 containing the unique identifier column of both |
Two GADSdat objects can be compared using equalGADS.
If differences in the data for specific variables in the two objects occur,
these variables can be further inspected using inspectDifferences.
Differences on meta data-level can be inspected via inspectMetaDifferences.
A list.
# create a second GADS with different data
pisa2 <- pisa
pisa2$dat$age[400:nrow(pisa$dat)] <- sample(pisa2$dat$age[400:nrow(pisa$dat)])
# inspect via equalGADS()
equalGADS(pisa, pisa2)
# inspect via inspectDifferences()
inspectDifferences(GADSdat = pisa, varName = "age", other_GADSdat = pisa2, id = "idstud")
Inspect meta data differences within a single GADSdat or between two GADSdat objects
or GADSdat data bases regarding a specific variable.
inspectMetaDifferences(GADSdat, varName, other_GADSdat = GADSdat, other_varName = varName)
GADSdat |
A |
varName |
A character vector of length 1 containing the variable name. |
other_GADSdat |
A second |
other_varName |
A character vector of length 1 containing the other variable name.
If omitted, it is assumed that both variables have identical names (as supplied in |
Two GADSdat objects can be compared using equalGADS.
If meta data differences for specific variables in the two objects occur,
these variables can be further inspected using inspectMetaDifferences.
For data-level differences for a specific variable, see inspectDifferences.
A list.
# create a second GADS with different meta data
pisa2 <- pisa
pisa2 <- changeVarLabels(pisa2, varName = "sameteach", varLabel = "Same math teacher")
pisa2 <- recodeGADS(pisa2, varName = "sameteach", oldValues = c(1, 2), newValues = c(0, 1))
# inspect via equalGADS()
equalGADS(pisa, pisa2)
# inspect via inspectMetaDifferences()
inspectMetaDifferences(GADSdat = pisa, varName = "sameteach", other_GADSdat = pisa2)
Returns the variable and value labels of all variables in the eatGADS data base.
labelsGADS(filePath)
filePath |
Path of the existing |
Variable, value and missing labels as stored in the original SPSS-files and factors from R files are converted to long format for
storage in the data base. labelsGADS returns them as a long format data frame.
Returns a long format data frame including variable names, labels, values, value labels and missing labels.
# Extract Meta data from data base
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
metaData <- labelsGADS(db_path)
Using variable labels, matchValues_varLabels matches a vector of regular expressions to a set of variable names.
matchValues_varLabels(GADSdat, mc_vars, values, label_by_hand = character(0))
GADSdat |
A |
mc_vars |
A vector containing the names of the variables, which should be matched according to their variable labels. |
values |
A character vector containing the regular expressions for which the |
label_by_hand |
Additional value - |
Multiple choice items can be stored as multiple dichotomous variables with the information about the variable
stored in the variable labels. The function collapseMultiMC_Text can be used to collapse such dichotomous
variables and a character variable, but requires a character vector with the variable names of the multiple choice variables.
matchValues_varLabels creates such a vector based on matching regular expressions (values) to variable labels.
Note that all variables in mc_vars have to be assigned exactly one value (and vice versa).
If a variable name is missing in the output,
an error will be thrown. In this case, the label_by_hand argument should be used to specify the pair of regular expression
and variable name manually.
Returns a named character vector. Values of the vector are the variable names in the GADSdat, names of the vector
are the regular expressions.
# Prepare example data
mt2 <- data.frame(ID = 1:4, mc1 = c(1, 0, 0, 0), mc2 = c(0, 0, 0, 0), mc3 = c(0, 1, 1, 0),
                  text1 = c(NA, "Eng", "Aus", "Aus2"), text2 = c(NA, "Franz", NA, NA),
                  stringsAsFactors = FALSE)
mt2_gads <- import_DF(mt2)
mt3_gads <- changeVarLabels(mt2_gads, varName = c("mc1", "mc2", "mc3"),
                            varLabel = c("Lang: Eng", "Aus spoken", "other"))
out <- matchValues_varLabels(mt3_gads, mc_vars = c("mc1", "mc2", "mc3"),
                             values = c("Aus", "Eng", "Eng"),
                             label_by_hand = c("other" = "mc3"))
A secure way to merge the data and the meta data of two GADSdat objects into a single GADSdat object.
Currently, only limited merging options are supported.
## S3 method for class 'GADSdat'
merge(
  x,
  y,
  by,
  all = TRUE,
  all.x = all,
  all.y = all,
  missingValue = NULL,
  missingValLabel = NULL,
  ...
)
x |
|
y |
|
by |
A character vector. |
all |
A logical. If TRUE (the default), a full join is performed; if FALSE, an inner join. |
all.x |
See merge. |
all.y |
See merge. |
missingValue |
A numeric value that is used to replace missing values introduced through the merge. |
missingValLabel |
The value label that is assigned to all variables into which |
... |
Further arguments are currently not supported but have to be included for |
If there are duplicate variables (except the variables specified in the by argument), these variables are removed from y.
The meta data is joined for the remaining variables via rbind.
The function supports automatically recoding missing values created through merging with a designated missing code
(missingValue) and a value label (missingValLabel).
Returns a GADSdat object.
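The merge described above can be sketched as follows. This is a minimal illustration, not an official example: the two data sets, the key variable id, the missing code -99 and its label are made up for this sketch.

```r
library(eatGADS)

# Two small, hypothetical data sets sharing the key variable 'id'
g1 <- import_DF(data.frame(id = 1:3, x = c(1, 2, 3)))
g2 <- import_DF(data.frame(id = 2:4, y = c(10, 20, 30)))

# Full join (all = TRUE); NAs introduced by the merge are recoded to the
# designated missing code and receive the supplied value label
merged <- merge(g1, g2, by = "id", all = TRUE,
                missingValue = -99, missingValLabel = "missing by design")

namesGADS(merged)
extractMeta(merged, c("x", "y"))
```

Inspecting the result with extractMeta shows which meta data were added for the designated missing code.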
Transform multiple GADSdat objects into a list ready for data base creation.
mergeLabels(...)
... |
|
The function createGADS takes multiple GADSdat objects as input. The function preserves the ordering
in which the objects are supplied, which is then used for the merging order in createGADS. Additionally,
the separate lists of meta information for each GADSdat are merged and a data frame unique identifier is added.
Returns an all_GADSdat object, which consists of a list of all data frames "datList" and a single data frame containing all meta data information "allLabels".
# see createGADS vignette
Recode Missings to NA according to missing labels in label data.frame.
miss2NA(GADSdat)
GADSdat |
A |
Missings are imported as their values via import_spss. Using the value labels in the labels data.frame,
miss2NA recodes these missing codes to NA. This function is mainly intended for internal use.
Returns a data.frame with NA instead of missing codes.
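A short sketch of this behavior, assuming a made-up data set in which -99 is tagged as a missing code via changeMissings:

```r
library(eatGADS)

# Hypothetical data: -99 is used as a missing code in 'x'
gads <- import_DF(data.frame(id = 1:3, x = c(1, -99, 2)))
gads <- changeMissings(gads, varName = "x", value = -99, missings = "miss")

# Values tagged as missing in the meta data are returned as NA
dat <- miss2NA(gads)
dat$x
```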
Convert one or multiple character variables to factors. If multiple variables are converted, a common set of value labels is created, which is identical across variables. Existing value labels are preserved.
multiChar2fac(
  GADSdat,
  vars,
  var_suffix = "_r",
  label_suffix = "(recoded)",
  convertCases = NULL
)
GADSdat |
A |
vars |
A character vector with all variables that should be transformed to factor. |
var_suffix |
Variable suffix for the newly created |
label_suffix |
Suffix added to variable label for the newly created variable in the |
convertCases |
Should cases be transformed for all variables? Default |
If a set of variables has the same possible values, it is desirable that these variables share the same
value labels, even if some of the values do not occur on the individual variables. This function allows
the transformation of multiple character variables to factors while assimilating the value labels.
The SPSS format of the newly created variables is set to F10.0.
A current limitation of the function is that prior to the conversion, all variables specified in vars must have identical
meta data on value level (value labels and missing tags).
If necessary, missing codes can be set after transformation via checkMissings for setting missing codes
depending on value labels for all variables or
changeMissings for setting missing codes for specific values in a specific variable.
The argument convertCases uses the function convertCase internally. See the respective documentation for more details.
Returns a GADSdat containing the newly computed variable.
## create an example GADSdat
example_df <- data.frame(ID = 1:4,
                         citizenship1 = c("German", "English", "missing by design", "Chinese"),
                         citizenship2 = c("missing", "German", "missing by design", "Polish"),
                         stringsAsFactors = FALSE)
gads <- import_DF(example_df)
## transform one character variable
gads2 <- multiChar2fac(gads, vars = "citizenship1")
## transform multiple character variables
gads2 <- multiChar2fac(gads, vars = c("citizenship1", "citizenship2"))
## set values to missings
gads3 <- checkMissings(gads2, missingLabel = "missing")
Variable names of a GADSdat object, an all_GADSdat object or an eatGADS data base.
namesGADS(GADS)
GADS |
A |
If the function is applied to a GADSdat object, a character vector with all variable names is returned. If the function is
applied to an all_GADSdat object or to the path of an eatGADS data base, a named list is returned. Each list entry
represents a data table in the object.
Returns a character vector or a named list of character vectors.
# Extract variable names from data base
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
namesGADS(db_path)
# Extract variable names from loaded/imported GADS
namesGADS(pisa)
Order the variables in a GADSdat according to a character vector. If there are discrepancies between the two sets, a warning is issued.
orderLike(GADSdat, newOrder)
GADSdat |
A |
newOrder |
A character vector containing the order of variables. |
The variables in the dat and in the labels section are ordered. Variables not contained in the character vector are moved to the end of the data.
Returns a GADSdat object.
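As a brief illustration, the following sketch reorders a small, made-up GADSdat (the variable names a, b and c are hypothetical):

```r
library(eatGADS)

# Hypothetical GADSdat with three variables
gads <- import_DF(data.frame(a = 1:2, b = 3:4, c = 5:6))

# Put 'c' first; data and meta data are both reordered
gads2 <- orderLike(gads, newOrder = c("c", "a", "b"))
namesGADS(gads2)
```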
A small example data set from the German PISA Plus campus files as distributed by the Forschungsdatenzentrum, IQB.
pisa
A data.frame with 500 rows and 133 variables, including:
Person ID variable
School ID variable
School type
Research Data Center at the Institute for Educational Quality Improvement (2020). Programme for International Student Assessment - Plus 2012, 2013 (PISA Plus 2012-2013) - Campus File (Version 1) [Data set]. Berlin: Institute for Educational Quality Improvement. doi:10.5159/IQB_PISA_Plus_2012-13_CF_v1
Different programs impose different limits on different components of their datasets.
Additionally, limits may vary between software versions. This primarily applies to
Stata's product tiers, but also to (very) old SPSS versions. eatGADS
offers a number of check functions - chiefly check4SPSS and
check4Stata as wrappers - to ensure a GADSdat complies with these limits,
and can be exported into an SPSS or Stata file.
Use getProgramLimit for a more convenient interface for obtaining specific limits.
program_limits
A data.frame listing relevant limits (see details) imposed on datasets by
SPSS and Stata.
While datasets have several components and characteristics, the following were deemed the most important and their limits implemented in this package's checks:
varNames: length of variable names
varLabels: length of variable labels
valLabels: length of value labels
stringvars: length of strings in character variables
nrows: number of observations
ncols: number of variables
While SPSS has only one set of limits (disregarding legacy limits for older versions),
Stata employs different limits for different product versions [1]. Within this package,
"SPSS" always implies SPSS 30, and "Stata" implies Stata 19/SE.
Limits of Stata 19/BE and Stata 19/MP are implemented as additional options.
However, no additional versions of SPSS have been implemented yet.
[1] Stata: Comparison of limits
Recode multiple values in multiple variables in a GADSdat to NA.
recode2NA(GADSdat, recodeVars = namesGADS(GADSdat), value = "")
GADSdat |
A |
recodeVars |
Character vector of variable names which should be recoded. |
value |
Which values should be recoded to |
If there are value labels given to the specified value, a warning is issued. The number of recodes per variable is reported.
If a data set is imported from .sav, character variables frequently contain empty strings. Especially if parts of the
data are written to .xlsx, this can cause problems (e.g. for lookup tables from createLookup),
as most functions that write to .xlsx convert empty strings to NAs. recode2NA can be
used to recode all empty strings to NA beforehand.
Returns the recoded GADSdat.
# create example GADS
dat <- data.frame(ID = 1:4,
                  var1 = c("", "Eng", "Aus", "Aus2"),
                  var2 = c("", "French", "Ger", "Ita"),
                  stringsAsFactors = FALSE)
gads <- import_DF(dat)
# recode empty strings
gads2 <- recode2NA(gads)
# recode numeric value
gads3 <- recode2NA(gads, recodeVars = "ID", value = 1:3)
Recode one or multiple variables as part of a GADSdat or all_GADSdat object.
recodeGADS(
  GADSdat,
  varName,
  oldValues,
  newValues,
  existingMeta = c("stop", "value", "value_new", "drop", "ignore")
)
GADSdat |
|
varName |
Character vector containing variable names. |
oldValues |
Vector containing the old values. |
newValues |
Vector containing the new values (in the respective order as |
existingMeta |
If values are recoded, which meta data should be used (see details)? |
Applied to a GADSdat or all_GADSdat object, this function is a wrapper of getChangeMeta
and applyChangeMeta. Beyond that, unlabeled variables and values are recoded as well.
oldValues and newValues are matched by ordering in the function call.
If changes are performed on value levels, recoding into existing values can occur.
In these cases, existingMeta determines how the resulting meta data conflicts are handled,
either raising an error if any occur ("stop"),
keeping the original meta data for the value ("value"),
using the meta data in the changeTable and, if incomplete, from the recoded value ("value_new"),
or leaving the respective meta data untouched ("ignore").
Furthermore, one might recode multiple old values into the same new value. This is currently only possible with
existingMeta = "drop", which drops all related meta data on value level, or
existingMeta = "ignore", which leaves all related meta data on value level untouched.
Missing values (NA) are supported in oldValues but not in newValues. For recoding values to
NA see recode2NA instead.
For recoding character variables, using lookup tables via createLookup is recommended. For changing
value labels see changeValLabels.
Returns a GADSdat.
# Example gads
example_df <- data.frame(ID = 1:5,
                         color = c("blue", "blue", "green", "other", "other"),
                         animal = c("dog", "Dog", "cat", "hors", "horse"),
                         age = c(NA, 16, 15, 23, 50),
                         stringsAsFactors = FALSE)
example_df$animal <- as.factor(example_df$animal)
gads <- import_DF(example_df)
# simple recode
gads2 <- recodeGADS(gads, varName = "animal", oldValues = c(3, 4), newValues = c(7, 8))
Recode NAs in multiple variables in a GADSdat to a numeric value with a value label and a missing tag.
recodeNA2missing(
  GADSdat,
  recodeVars = namesGADS(GADSdat),
  value = -99,
  valLabel = "missing"
)
GADSdat |
A |
recodeVars |
Character vector of variable names which should be recoded. |
value |
Which value should |
valLabel |
Which value label should |
The value label and missing tag are only added to variables which contain NAs and which have been recoded.
If a variable has an existing value label for value, the existing value label is overwritten and a missing tag is added.
A corresponding warning is issued.
Returns the recoded GADSdat.
# create example GADS
dat <- data.frame(ID = 1:4, age = c(NA, 18, 21, 23), siblings = c(0, 2, NA, NA))
gads <- import_DF(dat)
# recode NAs
gads2 <- recodeNA2missing(gads)
Deprecated, use recode2NA instead.
recodeString2NA(GADSdat, recodeVars = namesGADS(GADSdat), string = "")
GADSdat |
A |
recodeVars |
Character vector of variable names which should be recoded. |
string |
Which string should be recoded to |
Returns the recoded GADSdat.
Reorder a single variable in a GADSdat. The variable (var) can be inserted right after another variable (after) or at the beginning
of the GADSdat via after = NULL.
relocateVariable(GADSdat, var, after = NULL)
GADSdat |
A |
var |
Character string of the variable name which should be sorted. |
after |
Character string of the variable name after which |
The variables in the dat and in the labels section are ordered. For reordering the whole GADSdat, see
orderLike.
Returns a GADSdat object.
# Insert variable 'migration' after variable 'idclass'
pisa2 <- relocateVariable(pisa, var = "migration", after = "idclass")
# Insert variable 'idclass' at the beginning of the data set
pisa2 <- relocateVariable(pisa, var = "idclass", after = NULL)
Shorten text variables from a certain number on while coding overflowing answers as complete missings.
remove2NAchar(GADSdat, vars, max_num = 2, na_value, na_label)
GADSdat |
A |
vars |
A character vector with the names of the text variables. |
max_num |
Maximum number of text variables. Additional text variables will be removed and NA codes given accordingly. |
na_value |
Which NA value should be given in cases of too many values on text variables. |
na_label |
Which value label should be given to the |
In some cases, multiple text variables contain the information of one variable (e.g. multiple answers to an open item).
If this is the case, the number of text variables representing this variable sometimes needs to be limited. remove2NAchar
allows shortening multiple character variables: character variables after max_num are removed
from the GADSdat. Cases that had valid responses on these removed variables are coded as missing (using
na_value and na_label).
Returns the modified GADSdat.
## create an example GADSdat
example_df <- data.frame(ID = 1:4,
                         citizenship1 = c("German", "English", "missing by design", "Chinese"),
                         citizenship2 = c(NA, "German", "missing by design", "Polish"),
                         citizenship3 = c(NA, NA, NA, "German"),
                         stringsAsFactors = FALSE)
gads <- import_DF(example_df)
## shorten character variables
gads2 <- remove2NAchar(gads, vars = c("citizenship1", "citizenship2", "citizenship3"),
                       na_value = -99, na_label = "missing: too many answers")
Remove unused value labels and missing tags of a variable as part of a GADSdat object.
removeEmptyValLabels(GADSdat, vars, whichValLabels = c("miss", "valid", "all"))
GADSdat |
|
vars |
Character string of variable names. |
whichValLabels |
Should unused missing value tags and labels ( |
Returns the GADSdat object with changed meta data.
gads <- import_DF(data.frame(v1 = 1))
gads <- changeMissings(gads, varName = "v1", value = c(-99, -98), missings = c("miss", "miss"))
gads <- changeValLabels(gads, varName = "v1", value = c(-99), valLabel = c("not reached"))
gads2 <- removeEmptyValLabels(gads, vars = "v1")
Remove meta data for specific values (value) of a single variable (varName).
This includes value labels and missing tags.
removeValLabels(GADSdat, varName, value, valLabel = NULL)
GADSdat |
|
varName |
Character string of a variable name. |
value |
Numeric values. |
valLabel |
[optional] Regular expressions in the value labels corresponding to |
If the argument valLabel is provided, the function checks for value and valLabel pairs in the
meta data that match both arguments.
Returns the GADSdat object with changed meta data.
# Remove a label based on value
extractMeta(pisa, "schtype")
pisa2 <- removeValLabels(pisa, varName = "schtype", value = 1)
extractMeta(pisa2, "schtype")
# Remove multiple labels based on value
extractMeta(pisa, "schtype")
pisa3 <- removeValLabels(pisa, varName = "schtype", value = 1:3)
extractMeta(pisa3, "schtype")
# Remove multiple labels based on value - valLabel combination
extractMeta(pisa, "schtype")
pisa4 <- removeValLabels(pisa, varName = "schtype", value = 1:3,
                         valLabel = c("Gymnasium", "other", "several courses"))
extractMeta(pisa4, "schtype")
Transfer meta information from one GADSdat to another for one or multiple variables.
reuseMeta(
  GADSdat,
  varName,
  other_GADSdat,
  other_varName = NULL,
  missingLabels = NULL,
  addValueLabels = FALSE
)
GADSdat |
|
varName |
Character vector with the names of the variables that should get the new meta data. |
other_GADSdat |
|
other_varName |
Character vector with the names of the variables in |
missingLabels |
How should meta data for missing values be treated? If |
addValueLabels |
Should only value labels be added and all other meta information retained? |
Transfer of meta information can mean substituting the complete meta information, only adding value labels, adding only
"valid" or adding only "miss" missing labels.
See the arguments missingLabels and addValueLabels for further details.
Returns the original object with updated meta data.
# see createGADS vignette
Split a GADSdat into multiple, specified hierarchical levels.
splitGADS(GADSdat, nameList)
GADSdat |
A |
nameList |
A list of character vectors. The names in the list correspond to the hierarchy levels. |
The function takes a GADSdat object and splits it into its desired hierarchical levels (an all_GADSdat object).
Hierarchy level of a variable is also accessible in the meta data via the column data_table. If not all variable names
are included in the nameList, the missing variables will be dropped.
Returns an all_GADSdat object, which consists of a list of all data frames "datList" and
a single data frame containing all meta data information "allLabels". For more details see also mergeLabels.
# see createGADS vignette
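In addition to the vignette, a minimal sketch of a split into two levels might look as follows. The data set, the variable names and the level names students and schools are assumptions made for this illustration.

```r
library(eatGADS)

# Hypothetical data: four students nested in two schools
df <- data.frame(idstud = 1:4,
                 idschool = c(1, 1, 2, 2),
                 score = c(500, 480, 520, 510),
                 schtype = c(1, 1, 2, 2))
gads <- import_DF(df)

# The list names define the hierarchy levels; 'idschool' links both levels
split_gads <- splitGADS(gads,
                        nameList = list(students = c("idstud", "idschool", "score"),
                                        schools = c("idschool", "schtype")))

# One entry of variable names per hierarchy level
namesGADS(split_gads)
```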
Transform a string variable within a GADSdat or all_GADSdat object to a numeric variable.
stringAsNumeric(GADSdat, varName)
GADSdat |
|
varName |
Character string of a variable name. |
Applied to a GADSdat or all_GADSdat object, this function uses asNumericIfPossible to
change the variable class and changes the format column in the meta data.
Returns the GADSdat object with the changed variable.
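A minimal sketch of the conversion, assuming a made-up character variable score that only contains numbers stored as text:

```r
library(eatGADS)

# Hypothetical data: numbers stored as strings
gads <- import_DF(data.frame(id = 1:3, score = c("1", "2", "3"),
                             stringsAsFactors = FALSE))

gads2 <- stringAsNumeric(gads, varName = "score")

# The variable is now numeric; the format column in the meta data is updated as well
class(gads2$dat$score)
extractMeta(gads2, "score")
```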
Substitute imputed values in an imputed GADSdat_imp object with original, not imputed values from a GADSdat.
subImputations(GADSdat, GADSdat_imp, varName, varName_imp = varName, id, imp)
GADSdat |
A |
GADSdat_imp |
A |
varName |
A character vector of length 1 containing the variable name in |
varName_imp |
A character vector of length 1 containing the variable name in |
id |
A character vector of length 1 containing the unique identifier column of both |
imp |
A character vector of length 1 containing the imputation number in |
There are two cases in which values are substituted: (a) there are missings in varName_imp, (b) values have been imputed
even though there is valid information in varName.
The modified GADSdat_imp.
# tbd
Update the meta data of a GADSdat or all_GADSdat object according to the variables in a new data object.
updateMeta(GADSdat, newDat, checkVarNames = TRUE)
GADSdat |
|
newDat |
|
checkVarNames |
Logical. Should new variable names be checked by |
If the data of a GADSdat or an all_GADSdat has changed (supplied via newDat), updateMeta
assimilates the corresponding meta data set. If variables have been removed, the corresponding meta data is also removed.
If variables have been added, empty meta data is added for these variables. Factors are transformed to numerical
and their levels added to the meta data set.
Returns the original object with updated meta data (and removes factors from the data).
# see createGADS vignette
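For a quick illustration outside the vignette, the following sketch adds a variable to the data and lets updateMeta create the matching (empty) meta data. The data set and the variable name x_doubled are hypothetical.

```r
library(eatGADS)

# Hypothetical GADSdat
gads <- import_DF(data.frame(id = 1:3, x = c(1, 2, 3)))

# Modify the data outside of the GADSdat: add a new variable
newDat <- gads$dat
newDat$x_doubled <- newDat$x * 2

# Meta data are brought in line with the new data
gads2 <- updateMeta(gads, newDat = newDat)
namesGADS(gads2)
extractMeta(gads2, "x_doubled")
```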
Write a GADSdat object, which contains meta information such as value and variable labels, to an SPSS file (sav)
or Stata file (dta).
See 'details' for some important limitations.
write_spss(GADSdat, filePath)
write_stata(GADSdat, filePath)
GADSdat |
A |
filePath |
Path of |
The provided functionality relies on haven's write_sav and
write_dta functions.
Currently known limitations for write_spss are:
a) value labels for long character variables (> A10) are dropped,
b) under specific conditions very long character variables (> A254) are incorrectly
displayed as multiple character variables in SPSS,
c) exporting date or time variables is currently not supported,
d) missing tags are slightly incompatible between SPSS and eatGADS
as eatGADS supports unlimited discrete missing tags (but no range of missing tags) and
SPSS only supports up to three discrete missing tags or ranges of missing tags. For this purpose, if a variable
is assigned more than three discrete missing tags, write_spss() (more precisely export_tibble)
performs a silent conversion of the discrete missing tags into a missing range.
If this conversion affects other value labels or values in the data not tagged as missing, an error is issued.
Currently known limitations for write_stata are:
a) Variable format is dropped,
b) missing codes are dropped.
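The check behind limitation d) for write_spss can be illustrated with a small base R sketch. The helper missing_range_is_safe is hypothetical and not part of eatGADS. It only mirrors the idea that a set of discrete missing codes can be replaced by a range if no other labeled value or data value falls inside that range.

```r
# Hypothetical helper (not an eatGADS function): can the discrete missing codes
# be replaced by one missing range without affecting other values?
missing_range_is_safe <- function(missing_codes, other_values) {
  rng <- range(missing_codes)
  # Any non-missing value inside the range would wrongly be treated as missing
  !any(other_values >= rng[1] & other_values <= rng[2], na.rm = TRUE)
}

# Four neighbouring missing codes, valid values outside the range
safe <- missing_range_is_safe(c(-99, -98, -97, -96), other_values = c(1, 2, 3))

# The missing codes span the valid values 1 to 3, so a range would be unsafe
unsafe <- missing_range_is_safe(c(-99, -98, -97, 9), other_values = c(1, 2, 3))

safe
unsafe
```

In the second case, the described behavior of write_spss corresponds to issuing an error instead of converting silently.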
Writes file to disc, returns NULL.
# write to spss
tmp <- tempfile(fileext = ".sav")
write_spss(pisa, tmp)
# write to stata
tmp <- tempfile(fileext = ".dta")
write_stata(pisa, tmp)
Write a GADSdat object to a text file (txt) and an accompanying SPSS syntax file containing all meta information (e.g. value and variable labels).
write_spss2(
  GADSdat,
  txtPath,
  spsPath = NULL,
  savPath = NULL,
  dec = ".",
  fileEncoding = "UTF-8",
  chkFormat = TRUE,
  ...
)
GADSdat |
A |
txtPath |
Path of |
spsPath |
Path of |
savPath |
Path of |
dec |
Decimal delimiter for your SPSS version. Other values for |
fileEncoding |
Data file encoding for SPSS. Default is |
chkFormat |
Whether format checks via |
... |
Arguments to pass to |
This function is based on eatPrep's writeSpss function and is currently under development.
Writes a txt and an sav file to disc, returns nothing.
# write to spss
tmp_txt <- tempfile(fileext = ".txt")
write_spss2(pisa, txtPath = tmp_txt)