--- title: "Typical Use of `eatATA`: a Pilot Study Example" author: "Benjamin Becker, Dries Debeer" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Typical Use of `eatATA`: a Pilot Study Example} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` `eatATA` efficiently translates test design requirements for Automated Test Assembly (`ATA`) into constraints for a Mixed Integer Linear Programming Model (MILP). A number of efficient and user-friendly functions are available that translate conceptual test assembly constraints to constraint objects for MILP solvers, like the `GLPK` solver. In the remainder of this vignette I will illustrate the typical use of `eatATA` using a case-based example. A general overview over `eatATA` can be found in the vignette [Overview of `eatATA` Functionality](overview.html). ## Setup The `eatATA` package can be installed from `CRAN`. ```{r installation, eval = FALSE} install.packages("eatATA") ``` First, `eatATA` is loaded into the `R` session. ```{r library, message = FALSE} # loading eatATA library(eatATA) ``` ## Item pool No `ATA` without an item pool. In this example we use a fictional example item pool of 80 items. The item pool information is stored as an `excel` file that is included in the package. To import the item pool information into `R` we recommend using the package `readxl`. This package imports the data as a `tibble`, but in the code below, the item pool is immediately transformed into a `data.frame`. Note that `R` requires a rectangular data set. Yet, often excel files store additional information in rows above or below the "rectangular" item pool information. The `skip` argument in the `read_excel()` function can be used to skip unnecessary rows in the `excel` file. (Note that the item pool can also be directly accessed in the package via `items`; see `?items` for more information.) ```{r import items, message = FALSE} items_path <- system.file("extdata", "items.xlsx", package = "eatATA") items <- as.data.frame(readxl::read_excel(path = items_path), stringsAsFactors = FALSE) ``` Inspection of the item pool indicates that the items have different properties: item format (`MC`, `CMC`, `short_answer`, or `open`), difficulty (`diff_1` - `diff_5`), average response times in minutes (`time`). In addition, similar items can not be in the same booklet or test form. This information is stored in the column `exclusions`, which indicates which items are too similar and should not be in the same booklet with the item in that row.. ```{r inspect items, message = FALSE} head(items) ``` ## Prepare item information Before defining the constraints, item pool data has to be in the correct format. For instance, some dummy variables (indicator variables) in the item pool use both `NA` and `0` to indicate "the category does not apply". Therefore, the dummy variables should be transformed so that there are only two values (1 = "the category applies", and 0 = "the category does not apply"). Often a set of dummy variables can be summarized into a single factor variable. This is automatically done by the function `dummiesToFactor()`. However, the function can only be used when the categories are mutually exclusive. For instance, in the example item pool, items can contain sub-items with different format or difficulties. As a result, some items contain two sub-items with different formats. Therefore, in this example, the `dummiesToFactor()` function throws an error and cannot be used. ```{r dummies to factors, error = TRUE} # clean data set (categorical dummy variables must contain only 0 and 1) items <- dummiesToFactor(items, dummies = c("MC", "CMC", "short_answer", "open"), facVar = "itemFormat") items <- dummiesToFactor(items, dummies = paste0("diff_", 1:5), facVar = "itemDiff") items[c(24, 33, 37, 47, 48, 54, 76), ] ``` In addition, the column `short_answer` can have `r max(items$short_answer)` as a value, and is consequently not a dummy variable. Therefore, we will (a) treat `short_answer` as a numerical value, (b) collapse `MC` and `open` into a new factor `MC_open_none`, (these dummies are mutually exclusive), and (c) turn `CMC` and the difficulty indicators into factors. (See `?autoItemValuesMinMax` and `?computeTargetValues` for further information on the different treatment of factors and numerical variables.) ```{r data cleaning, message = FALSE} # make new factor with three levels: "MC", "open" and "else" items <- dummiesToFactor(items, dummies = c("MC", "open"), facVar = "MC_open_none") # clean data set (NA should be 0) for(ty in c(paste0("diff_", 1:5), "CMC", "short_answer")){ items[, ty] <- ifelse(is.na(items[, ty]), yes = 0, no = items[, ty]) } # make factors of CMC dummi items$f_CMC <- factor(items$CMC, labels = paste("CMC", c("no", "yes"), sep = "_")) # example item format table(items$short_answer) ``` ## `ATA` goal In this example, the goal is to assemble 14 booklets out of the 80 items item pool. All items should be assigned to one (and only one booklet), so that there is no item overlap and the item pool is completely depleted. To be more precise, the required constraints are: * no item overlap between test blocks * complete item pool depletion * equal distribution of item formats across test blocks * equal difficulty distribution across test blocks * some items can not be together in the same booklet (item exclusions) * as similar as possible response times across booklets Note that the booklets will later be manually assembled to test forms. For ease of use, we set up two variables that we will use frequently: the number of test forms or booklets to be created (`nForms`) and the number of items in the item pool (`nItems`). ```{r fixed variables, message = FALSE} # set up fixed variables nItems <- nrow(items) # number of items nForms <- 14 # number of blocks ``` `eatATA` offers a variety of functions that automatically compute the constraints mentioned above and which are illustrated in the following. ## Objective Function First, we are setting up an optimization constraint. This constraint is not a clear *yes* or *no* constraint, and it does not have to be attained perfectly. Instead, the solver will minimize the distance of the actual booklet value for all booklets towards a target value. In our example, we specify 10 minutes as the target response time `time` for all booklets. ```{r target constraints} # optimize average time av_time <- minimaxObjective(nForms, itemValues = items$time, targetValue = 10, itemIDs = items$item) ``` ## Set up constraints The first two constraints (no item overlap and item pool depletion) can be implemented by a single function: `itemUsageConstraint()`. To achieve this, the `operator` argument should be set to `"="`, meaning that every item should be used exactly once in the booklet assembly. ```{r item usage constraints} itemOverlap <- itemUsageConstraint(nForms, targetValue = 1, operator = "=", itemIDs = items$item) ``` Constraints with respect to categorical variables or factors (like `MC_open_none`) or numerical variables (like `short_answer`), can be easily implemented using the `autoItemValuesMinMax()` function. The result of this function depends on whether a factor or a numerical variable is used. That is, `autoItemValuesMinMax()` automatically determines the minimum and maximum frequency of each category of a factor. But for numerical variables, it automatically determines the target value. The `allowedDeviation` argument specifies the allowed range between booklets regarding the category or the numerical value. If the argument is omitted, it defaults to "no deviation is allowed" for numerical values, and to the minimal possible deviation for categorical variables or factors. Hence, for numeric values, we specify `allowedDeviation = 1`. The function prints the calculated target value or the resulting allowed value range on booklet level. ```{r categorical constraints} # item formats mc_openItems <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$MC_open_none, itemIDs = items$item) cmcItems <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$f_CMC, itemIDs = items$item) saItems <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$short_answer, allowedDeviation = 1, itemIDs = items$item) # difficulty categories Items1 <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$diff_1, allowedDeviation = 1, itemIDs = items$item) Items2 <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$diff_2, allowedDeviation = 1, itemIDs = items$item) Items3 <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$diff_3, allowedDeviation = 1, itemIDs = items$item) Items4 <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$diff_4, allowedDeviation = 1, itemIDs = items$item) Items5 <- autoItemValuesMinMaxConstraint(nForms = nForms, itemValues = items$diff_5, allowedDeviation = 1, itemIDs = items$item) ``` To implement item exclusion constraints, two function can be used: `itemExclusionTuples()` and `itemExclusionConstraint()`. When item exclusions are supplied as a single character string for each item, with item identifiers separated by `", "`, they should be transformed first. ```{r exclusion demo} # item exclusions variable items$exclusions[1:5] ``` This transformation can be done using the `itemExclusionTuples()` function, which creates so called *tuples*: pairs of exclusive items. These *tuples* can be used directly with the `itemExclusionConstraint()` function. ```{r exclusion constraints} # item exclusions exclusionTuples <- itemTuples(items, idCol = "item", infoCol = "exclusions", sepPattern = ", ") excl_constraints <- itemExclusionConstraint(nForms = 14, itemTuples = exclusionTuples, itemIDs = items$item) ``` Another helpful function is the `itemsPerFormConstraint()` function, which constrains the number of items per booklet. However, since this is not required in this example, we will not use these constraints in the final `ATA` constraints. ```{r item numbers constraints} # number of items per test form min_Nitems <- floor(nItems / nForms) - 3 noItems <- itemsPerFormConstraint(nForms = nForms, operator = ">=", targetValue = min_Nitems, itemIDs = items$item) ``` ## Run solver Before calling the optimization algorithm the specified constraints are collected in a `list`. ```{r prepare constraints, eval = T} # Prepare constraints constr_list <- list(itemOverlap, mc_openItems, cmcItems, saItems, Items1, Items2, Items3, Items4, Items5, excl_constraints, av_time) ``` Now we can call the `useSolver()` function, which restructures the constraints internally and solves the optimization problem. Using the `solver` argument we specify `GLPK` as the solver (other alternatives are `lpSolve`, `Symphony` and `Gurobi`). Using the `timeLimit` argument we set the time limit to 10. This means we limit the solver to stop searching for an optimal solution after 10 seconds. Note that the computation times might depend on the solver you have selected. ```{r solver_local, eval = FALSE, echo = FALSE} # Local optimization chunk; prevents occasional errors on servers if no feasible solution is found output_local <- capture.output(solver_raw_local <- useSolver(constr_list, nForms = nForms, nItems = nItems, itemIDs = items$item, solver = "GLPK", timeLimit = 30)) saveRDS(solver_raw_local, "use_case_solver_out.RDS") saveRDS(output_local, "use_case_output.RDS") ``` ```{r solver, eval = FALSE, message=TRUE} # Optimization solver_raw <- useSolver(constr_list, nForms = nForms, nItems = nItems, itemIDs = items$item, solver = "GLPK", timeLimit = 10) ``` ```{r solver_load, eval = TRUE, echo = FALSE} solver_raw <- readRDS("use_case_solver_out.RDS") output_local <- readRDS("use_case_output.RDS") output_local message("The solution is feasible, but may not be optimal") ``` The function provides output that indicates whether an optimal solution has been found. In our case, a viable solution has been found but the function reached the time limit before finding the optimal solution. If there is no feasible solution, one option is to relax some of the constraints. Further, for first diagnostic purposes you can omit some constraints completely, to see which constraints are especially challenging. If you have a better grasp of the possibilities of the item pool, you can add these constraints back, but for example with larger `allowedDeviations`. ## Inspect Solution The solution provided by `eatATA` can be inspected using the `inspectSolution()` function. It allows us to inspect the assembled item blocks at a first glance, including some column sums. ```{r inspect solution} out_list <- inspectSolution(solver_raw, items = items, idCol = "item", colSums = TRUE, colNames = c("time", "subitems", "MC", "CMC", "short_answer", "open", paste0("diff_", 1:5))) # first two booklets out_list[1:2] ``` In our case we want to assemble the created booklets into test forms. Therefore, we are interested in booklet exclusions that can result from item exclusions. The `analyzeBlockExclusion()` function can be used to obtain tuples with booklet exclusions. ```{r block exclusions} analyzeBlockExclusion(solverOut = solver_raw, item = items, idCol = "item", exclusionTuples = exclusionTuples) ``` ## Save as Excel To save the item distribution on blocks or test forms, we can use the `appendSolution()` function. The function simply merges the new variables containing the solution to the test assembly problem to the original item pool. ```{r append solution} out_df <- appendSolution(solver_raw, items = items, idCol = "item") ``` Finally, when the solution should be exported as an `excel` file (`.xlsx`), this can, for example, be achieved via the `eatAnalysis` package, which has to be installed from `Github`. ```{r export solution to excel, eval = FALSE} devtools::install_github("beckerbenj/eatAnalysis") eatAnalysis::write_xlsx(out_df, filePath = "example_excel.xlsx", row.names = FALSE) ```