future.
library(fxtract)
xtractor = Xtractor$new("xtractor")
Data must be added as dataframes with $add_data, where the grouping variable must be specified (here via group_by). You can also add dataframes for each ID individually, which is especially helpful for large datasets.
xtractor$add_data(iris, group_by = "Species")
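Adding data one ID at a time might look like the sketch below. It uses only base R to split iris by the grouping variable. The commented-out add_data loop is an assumption drawn from the sentence above (repeated calls with the same group_by), not a verified recipe, and the name chunks is introduced here for illustration.

```r
# Split iris into one dataframe per ID of the grouping variable.
chunks = split(iris, iris$Species)

# Each chunk holds the rows of exactly one species.
rows_per_id = vapply(chunks, nrow, integer(1))
rows_per_id

# Assumed usage (not verified here): add the chunks one after another,
# e.g. when the data for each ID comes from its own file.
# for (chunk in chunks) xtractor$add_data(chunk, group_by = "Species")
```

With this pattern, only one chunk has to be built or read at a time before it is handed over.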
Features must be added as functions that take a dataframe as input and return a named vector. A named list with atomic entries of length 1 is also allowed as output (useful for mixing numerical and categorical outputs). Each function is evaluated separately for every ID of the grouping variable.
# Feature returning a named vector
fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}
# Feature returning a named list
fun2 = function(data) {
  list(mean_petal_length = mean(data$Petal.Length),
       sd_petal_length = sd(data$Petal.Length))
}
xtractor$add_feature(fun1)
xtractor$add_feature(fun2)
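To make the contract for feature functions concrete, here is a base R sketch of what evaluating a feature for each ID individually amounts to. It is not a description of how fxtract works internally. The names fun3 and apply_per_group are hypothetical and introduced only for this illustration.

```r
# Hypothetical feature: a named list with one numeric and one categorical entry.
fun3 = function(data) {
  list(
    max_petal_width = max(data$Petal.Width),
    long_sepals = if (mean(data$Sepal.Length) > 6) "yes" else "no"
  )
}

# Hypothetical helper: evaluate `fun` once per ID and bind the results row-wise.
apply_per_group = function(data, group_by, fun) {
  chunks = split(data, data[[group_by]])
  rows = lapply(names(chunks), function(id) {
    out = as.list(fun(chunks[[id]]))
    # Check the contract: named output, atomic entries of length 1.
    stopifnot(
      !is.null(names(out)),
      all(vapply(out, is.atomic, logical(1))),
      all(lengths(out) == 1L)
    )
    # One row per ID: the ID column first, then the feature columns.
    as.data.frame(c(setNames(list(id), group_by), out), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res3 = apply_per_group(iris, "Species", fun3)
res3
```

The helper accepts both output styles, because as.list turns a named vector into a named list before the check runs.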
The desired final dataframe can be accessed by the slot $results:
xtractor$results
##      Species mean_sepal_length sd_sepal_length mean_petal_length
## 1     setosa             5.006       0.3524897             1.462
## 2 versicolor             5.936       0.5161711             4.260
## 3  virginica             6.588       0.6358796             5.552
##   sd_petal_length
## 1       0.1736640
## 2       0.4699110
## 3       0.5518947
Parallelization is realized with the package future. Both feature calculation and data preprocessing are parallelized. On Windows and Linux machines you can parallelize as follows: