future.
library(fxtract)
xtractor = Xtractor$new("xtractor")
Data must be added as dataframes with $add_data, and the grouping variable must be specified. You can also add a separate dataframe for each ID, which is especially helpful for large datasets.
xtractor$add_data(iris, group_by = "Species")
Features must be added as functions that take a dataframe as input and return a named vector. A named list with atomic entries of length 1 is also allowed as output (useful for numerical and categorical outputs). Each function is evaluated separately for each ID of the grouping variable.
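The difference between the two output types matters once a feature function returns both a number and a category. The following is a minimal base R sketch, not part of the package: the function names `fun_vector` and `fun_list` and the `petal_size` feature are made up for illustration.

```r
# Hypothetical feature functions that return one number and one category.
# Version 1 returns a named vector built with c().
fun_vector = function(data) {
  c(mean_petal_length = mean(data$Petal.Length),
    petal_size = if (mean(data$Petal.Length) > 3) "long" else "short")
}

# Version 2 returns a named list with atomic entries of length 1.
fun_list = function(data) {
  list(mean_petal_length = mean(data$Petal.Length),
       petal_size = if (mean(data$Petal.Length) > 3) "long" else "short")
}

setosa = iris[iris$Species == "setosa", ]
str(fun_vector(setosa))  # c() coerces every entry to character
str(fun_list(setosa))    # the list keeps numeric and character entries apart
```

With `c()`, the numeric mean is silently converted to a string, so a list is the safer return type whenever the outputs have different types.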
fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}
fun2 = function(data) {
  list(mean_petal_length = mean(data$Petal.Length),
       sd_petal_length = sd(data$Petal.Length))
}
xtractor$add_feature(fun1)
xtractor$add_feature(fun2)
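To check a feature function before registering it, it can help to reproduce the per-ID calculation by hand. The sketch below uses only base R and is an illustration of the idea, not the package's internal implementation. `manual_result` is a name chosen here.

```r
# Feature function from the text: dataframe in, named numeric vector out.
fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}

# Split the data by the grouping variable, one dataframe per ID.
groups = split(iris, iris$Species)

# Apply the feature function to each group separately.
per_group = lapply(groups, fun1)

# Bind the named vectors into a matrix with one row per ID.
manual_result = do.call(rbind, per_group)
print(manual_result)
```

The rows of `manual_result` should agree with the corresponding columns of the final dataframe, which makes this a quick sanity check for new feature functions.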
The desired final dataframe can be accessed by the slot $results:
xtractor$results
##      Species mean_sepal_length sd_sepal_length mean_petal_length
## 1     setosa             5.006       0.3524897             1.462
## 2 versicolor             5.936       0.5161711             4.260
## 3  virginica             6.588       0.6358796             5.552
##   sd_petal_length
## 1       0.1736640
## 2       0.4699110
## 3       0.5518947
Parallelization is realized with the package future. Feature calculation and data preprocessing are parallelized. On Windows and Linux machines, you can parallelize as follows: