This implements the greedy algorithm from Wang and Stephens. It can be used to add factors to an existing fit or to start from scratch. Factors are added iteratively: at each stage a new factor is added and then optimized. The algorithm is "greedy" in that it does not return to re-optimize previous factors. The function stops when an added factor contributes nothing or when Kmax factors have been added. Each new factor is initialized by applying the function `init_fn` to the residuals after removing previously fitted factors.
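The greedy structure described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `greedy_rank_one` and its `min_d` threshold are hypothetical, and a plain rank-one SVD of the residuals stands in for the empirical Bayes optimization of each factor.

```r
# Hypothetical helper: greedy rank-one fitting. A plain SVD replaces the
# empirical Bayes update (an assumption made purely for illustration).
greedy_rank_one <- function(Y, Kmax = 1, min_d = 1e-8) {
  n <- nrow(Y)
  p <- ncol(Y)
  U <- matrix(0, n, 0)
  V <- matrix(0, p, 0)
  d <- numeric(0)
  R <- Y                                  # residuals start as the data
  for (k in seq_len(Kmax)) {
    s <- svd(R, nu = 1, nv = 1)           # fit one new factor to the residuals
    if (s$d[1] <= min_d) break            # new factor contributes nothing: stop
    U <- cbind(U, s$u)
    V <- cbind(V, s$v)
    d <- c(d, s$d[1])
    R <- R - s$d[1] * s$u %*% t(s$v)      # remove it; earlier factors stay fixed
  }
  list(u = U, d = d, v = V)
}

set.seed(1)
Y <- outer(rnorm(20), rnorm(5))           # exactly rank one, no noise
fit <- greedy_rank_one(Y, Kmax = 3)
length(fit$d)                             # number of factors actually kept
```

Because the simulated matrix is exactly rank one, the loop stops before reaching `Kmax`, which mirrors the stopping rule described above.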

flash_add_greedy(data, Kmax = 1, f_init = NULL, var_type = c("by_column",
  "by_row", "constant", "zero", "kroneker"), init_fn = "udv_si", tol = 0.01,
  ebnm_fn = ebnm_pn, ebnm_param = flash_default_ebnm_param(ebnm_fn),
  verbose = FALSE, nullcheck = TRUE, seed = 123)

Arguments

data

An n by p matrix or a flash data object created using flash_set_data.

Kmax

The maximum number of factors to be added to the flash object. (If nullcheck = TRUE, the actual number of factors added might be less than Kmax.)

f_init

The flash object to which new factors are to be added. If f_init = NULL, then a new flash fit object is created.

var_type

The type of variance structure to assume for residuals.

init_fn

The function used to initialize factors. This function should take parameters (Y,K), where Y is an n by p matrix of data (or a flash data object) and K is the number of factors. It should output a list with elements (u,d,v), where u is an n by K matrix, v is a p by K matrix, and d is a vector of length K. See udv_si for an example. (If the input data includes missing values, then this function must be able to deal with missing values in its input matrix.)
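As a rough illustration of the (Y,K) to (u,d,v) contract, the following base R sketch defines a hypothetical initializer, `init_svd`, built on `svd`. It assumes Y is a plain numeric matrix without missing values; it does not handle flash data objects.

```r
# Hypothetical initializer following the (Y, K) -> list(u, d, v) contract.
# Assumes Y is a plain matrix with no missing values.
init_svd <- function(Y, K = 1) {
  s <- svd(Y, nu = K, nv = K)
  list(u = s$u, d = s$d[seq_len(K)], v = s$v)
}

set.seed(1)
Y <- matrix(rnorm(100 * 10), nrow = 100)
init <- init_svd(Y, K = 2)
dim(init$u)      # n by K
dim(init$v)      # p by K
length(init$d)   # K
```

A function of this shape could presumably be supplied as `init_fn = init_svd`, in the same way the softImpute example passes a custom function, provided the data contain no missing values.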

tol

The convergence tolerance: optimization is considered not yet converged as long as the objective changes by more than tol in a single iteration.
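The role of such a tolerance can be sketched with a generic iterative fit. The helper `fit_rank_one` below is hypothetical and uses alternating least squares with the residual sum of squares as its objective, which is an assumption standing in for flash's own objective and updates.

```r
# Hypothetical rank-one alternating least squares loop. The objective is the
# residual sum of squares, used here only to illustrate a tol-based stop.
fit_rank_one <- function(Y, tol = 0.01, maxiter = 100) {
  v <- rep(1, ncol(Y)) / sqrt(ncol(Y))     # simple deterministic start
  obj_old <- Inf
  for (iter in seq_len(maxiter)) {
    u <- drop(Y %*% v) / sum(v^2)          # best u given v
    v <- drop(t(Y) %*% u) / sum(u^2)       # best v given u
    obj <- sum((Y - outer(u, v))^2)
    if (abs(obj_old - obj) < tol) break    # change below tol: converged
    obj_old <- obj
  }
  list(u = u, v = v, obj = obj, iter = iter)
}

set.seed(1)
Y <- outer(rnorm(100), rnorm(10)) + matrix(rnorm(1000), nrow = 100)
res <- fit_rank_one(Y, tol = 0.01)
res$iter                                   # iterations used before stopping
```

A smaller `tol` generally means more iterations and a tighter fit; a larger one stops earlier.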

ebnm_fn

The function used to solve the Empirical Bayes Normal Means problem.

ebnm_param

A named list containing parameters to be passed to ebnm_fn when optimizing; defaults are set by flash_default_ebnm_param().

verbose

If TRUE, various progress updates will be printed.

nullcheck

If TRUE, then after running hill-climbing updates, flash will check whether the achieved optimum is better than setting the factor to 0. If the check is performed and fails then the factor will be set to 0 in the returned fit.

seed

A random number seed to use before running flash - for reproducibility. Set to NULL if you don't want the seed set. (The seed can affect initialization when there are missing data; otherwise the algorithm is deterministic.)

Value

A fitted flash object.

Examples

l = rnorm(100)
f = rnorm(10)
Y = outer(l,f) + matrix(rnorm(1000),nrow=100)
f = flash_add_greedy(Y,10)
#> fitting factor/loading 1
#> fitting factor/loading 2
# Gives the weights for each factor (analogue of singular values).
flash_get_ldf(f)$d
#> [1] 26
# Example to show how to use a different initialization function.
library(softImpute)
#> Loading required package: Matrix
#> Loaded softImpute 1.4
f2 = flash_add_greedy(Y,10,init_fn = function(x,K=1){ softImpute(x,K,lambda=10) })
#> fitting factor/loading 1
#> fitting factor/loading 2