A named list containing settings for the analysis. If NULL, defaults will be used. The settings list may contain:
- fileHeader
A data frame mapping the original column names to remapped column names. Used for t-SNE input preparation.
- selectedColumns
Character vector of columns to be used for the analysis. Defaults to NULL.
- cutOffColumnSize
Numeric. The maximum number of columns allowed in the dataset. Defaults to 50,000.
- excludedColumns
Character vector of columns to exclude from the analysis. Defaults to NULL.
- groupingVariables
Character vector of columns to use for grouping the data during analysis. Defaults to NULL.
- colorVariables
Character vector of columns to use for coloring in the plots. Defaults to NULL.
- preProcessDataset
Character vector of preprocessing methods to apply (e.g., scaling, normalization). Defaults to NULL.
- fontSize
Numeric. Font size for plots. Defaults to 12.
- pointSize
Numeric. Size of points in plots. Defaults to 1.5.
- theme
Character. The ggplot2 theme to use (e.g., "theme_gray"). Defaults to "theme_gray".
- colorPalette
Character. Color palette for plots (e.g., "RdPu"). Defaults to "RdPu".
- aspect_ratio
Numeric. The aspect ratio of plots. Defaults to 1.
- clusterType
Character. The clustering method to use. Options are "Louvain", "Hierarchical", "Mclust", "Density". Defaults to "Louvain".
- removeNA
Logical. Whether to remove rows with NA values. Defaults to FALSE.
- datasetAnalysisGrouped
Logical. Whether to perform grouped dataset analysis. Defaults to FALSE.
- plot_size
Numeric. The size of the plot. Defaults to 12.
- knn_clusters
Numeric. The number of clusters for KNN-based clustering. Defaults to 250.
- perplexity
Numeric. The perplexity parameter for t-SNE. Defaults to NULL (automatically determined).
- exaggeration_factor
Numeric. The exaggeration factor for t-SNE. Defaults to NULL.
- max_iter
Numeric. The maximum number of iterations for t-SNE. Defaults to NULL.
- theta
Numeric. The Barnes-Hut approximation parameter for t-SNE. Defaults to NULL.
- eta
Numeric. The learning rate for t-SNE. Defaults to NULL.
- clustLinkage
Character. Linkage method for hierarchical clustering. Defaults to "ward.D2".
- clustGroups
Numeric. The number of groups for hierarchical clustering. Defaults to 9.
- distMethod
Character. Distance metric for clustering. Defaults to "euclidean".
- minPtsAdjustmentFactor
Numeric. Adjustment factor for the minimum number of points (minPts) in DBSCAN clustering. Defaults to 1.
- epsQuantile
Numeric. Quantile used to compute the epsilon (eps) parameter for DBSCAN clustering. Defaults to 0.9.
- assignOutliers
Logical. Whether to assign outliers in the clustering step. Defaults to TRUE.
- excludeOutliers
Logical. Whether to exclude outliers from clustering. Defaults to TRUE.
- legendPosition
Character. Position of the legend in plots (e.g., "right", "bottom"). Defaults to "right".
- datasetAnalysisClustLinkage
Character. Linkage method for dataset-level analysis. Defaults to "ward.D2".
- datasetAnalysisType
Character. Type of dataset analysis (e.g., "heatmap"). Defaults to "heatmap".
- datasetAnalysisRemoveOutliersDownstream
Logical. Whether to remove outliers during downstream dataset analysis (e.g., machine learning). Defaults to FALSE.
- datasetAnalysisSortColumn
Character. The column used to sort dataset analysis results. Defaults to "cluster".
- datasetAnalysisClustOrdering
Numeric. The order of clusters for analysis. Defaults to 1.
- anyNAValues
Logical. Whether the dataset contains NA values. Defaults to FALSE.
- categoricalVariables
Logical. Whether the dataset contains categorical variables. Defaults to FALSE.
- resolution_increments
Numeric vector. The resolution increments to be used for Louvain clustering. Defaults to c(0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5).
- min_modularities
Numeric vector. The minimum modularity thresholds to test for clustering. Defaults to c(0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9).
- target_clusters_range
Numeric vector. The acceptable range for the number of clusters. Defaults to c(3, 6).
- pickBestClusterMethod
Character. The method to use for picking the best clustering result ("Modularity", "Silhouette", or "SIMON"). Defaults to "Modularity".
- weights
List. Weights for evaluating clusters based on AUROC, modularity, and silhouette. Defaults to list(AUROC = 0.5, modularity = 0.3, silhouette = 0.2). These weights help choose the most relevant clustering for the user's goals:
  - AUROC
  Weight for predictive performance (area under the receiver operating characteristic curve). Prioritize it when predictive accuracy is the main goal; for predictive analysis, a suggested configuration is list(AUROC = 0.8, modularity = 0.1, silhouette = 0.1).
  - modularity
  Weight for the modularity score, which indicates the strength of the community structure. Higher modularity suggests that clusters are well separated. To prioritize well-separated clusters, use a configuration such as list(AUROC = 0.4, modularity = 0.4, silhouette = 0.2).
  - silhouette
  Weight for the silhouette score, which measures how close each point is to its own cluster compared with the nearest other cluster. Useful when cluster cohesion and interpretability are desired. For balanced clusters, a suggested configuration is list(AUROC = 0.4, modularity = 0.3, silhouette = 0.3).
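Because a NULL settings list falls back to the defaults, any user-supplied list has to be merged with them at some point. A minimal sketch, assuming the defaults live in a plain R list: `defaultSettings` and `resolveSettings()` are hypothetical names (not the package's API), only a subset of the documented defaults is shown, and base R's `utils::modifyList()` does the merging.

```r
# Hypothetical subset of the documented defaults; the package may store
# and merge its defaults differently.
defaultSettings <- list(
  cutOffColumnSize = 50000,
  fontSize = 12,
  pointSize = 1.5,
  theme = "theme_gray",
  colorPalette = "RdPu",
  clusterType = "Louvain",
  removeNA = FALSE,
  target_clusters_range = c(3, 6),
  pickBestClusterMethod = "Modularity",
  weights = list(AUROC = 0.5, modularity = 0.3, silhouette = 0.2)
)

# Hypothetical helper: return the defaults for NULL, otherwise overlay
# the user's values on top of the defaults.
resolveSettings <- function(settings = NULL, defaults = defaultSettings) {
  if (is.null(settings)) {
    return(defaults)
  }
  # modifyList() recurses into nested lists such as `weights`.
  utils::modifyList(defaults, settings)
}

# Switch to hierarchical clustering and the predictive weight configuration.
userSettings <- list(
  clusterType = "Hierarchical",
  weights = list(AUROC = 0.8, modularity = 0.1, silhouette = 0.1)
)
resolved <- resolveSettings(userSettings)
str(resolved[c("clusterType", "fontSize", "weights")])
```

Because `modifyList()` merges recursively, a partial `weights` list would keep the remaining default weights, so supply all three when they are meant to sum to 1.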
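For clusterType = "Hierarchical", the distMethod, clustLinkage, and clustGroups settings map naturally onto base R's `dist()`, `hclust()`, and `cutree()`. The sketch below assumes that mapping and assumes clustering runs on a numeric matrix (for example, two-dimensional t-SNE coordinates); `clusterHierarchical()` is a hypothetical helper and the input is random toy data.

```r
# Hypothetical helper mapping the documented settings onto base R.
clusterHierarchical <- function(x, settings) {
  d  <- stats::dist(x, method = settings$distMethod)       # e.g. "euclidean"
  hc <- stats::hclust(d, method = settings$clustLinkage)   # e.g. "ward.D2"
  stats::cutree(hc, k = settings$clustGroups)              # one label per row
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)  # 100 toy points in two dimensions
labels <- clusterHierarchical(x, list(distMethod = "euclidean",
                                      clustLinkage = "ward.D2",
                                      clustGroups = 9))
table(labels)
```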
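For clusterType = "Density" (DBSCAN), epsQuantile and minPtsAdjustmentFactor steer the two DBSCAN parameters. One common way to derive eps is to take a quantile of each point's distance to its k-th nearest neighbour. The sketch below assumes that approach together with a baseline of minPts = 2 * ncol(x) scaled by the adjustment factor; both formulas and the helper name `dbscanParams()` are assumptions, not the package's documented method. The resulting values could then be passed to a DBSCAN implementation such as `dbscan::dbscan()`.

```r
# Hypothetical helper; the package's actual formulas may differ.
dbscanParams <- function(x, minPtsAdjustmentFactor = 1, epsQuantile = 0.9) {
  x <- as.matrix(x)
  # Assumed heuristic: minPts = 2 * number of dimensions, scaled by the factor.
  minPts <- max(2L, as.integer(round(2 * ncol(x) * minPtsAdjustmentFactor)))
  k <- minPts - 1L
  d <- as.matrix(stats::dist(x))
  # Distance from each point to its k-th nearest neighbour; after sorting,
  # element 1 is the point's zero distance to itself, so take element k + 1.
  kDist <- apply(d, 1, function(row) sort(row)[k + 1L])
  eps <- unname(stats::quantile(kDist, probs = epsQuantile))
  list(minPts = minPts, eps = eps)
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)  # 100 toy points
params <- dbscanParams(x)
params
```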
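The Louvain settings resolution_increments, min_modularities, and target_clusters_range suggest a search over candidate clusterings. The following sketch assumes one Louvain run per resolution value has already produced a modularity and a cluster count, keeps candidates whose cluster count falls within target_clusters_range, and relaxes the modularity threshold from strictest to loosest until one qualifies. The selection rule, `selectLouvainCandidate()`, and the candidate table (made-up numbers) are all hypothetical.

```r
# Hypothetical selection rule; not necessarily what the package does.
selectLouvainCandidate <- function(candidates,
                                   min_modularities = c(0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9),
                                   target_clusters_range = c(3, 6)) {
  inRange <- candidates$nClusters >= target_clusters_range[1] &
    candidates$nClusters <= target_clusters_range[2]
  # Try the strictest threshold first and relax it until an in-range
  # candidate qualifies; report which threshold was met.
  for (threshold in sort(min_modularities, decreasing = TRUE)) {
    ok <- candidates[inRange & candidates$modularity >= threshold, , drop = FALSE]
    if (nrow(ok) > 0) {
      best <- ok[which.max(ok$modularity), , drop = FALSE]
      return(list(candidate = best, threshold = threshold))
    }
  }
  NULL  # nothing met even the loosest threshold
}

# Toy candidate table: one row per resolution value (made-up numbers).
candidates <- data.frame(
  resolution = c(0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5),
  modularity = c(0.91, 0.88, 0.82, 0.74, 0.69, 0.63, 0.58),
  nClusters  = c(2, 2, 4, 5, 7, 9, 12)
)
res <- selectLouvainCandidate(candidates)
res
```

With these toy values only the 4- and 5-cluster rows are in range; the 0.9 and 0.85 thresholds admit neither, so the rule settles on resolution 0.1 at threshold 0.8.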
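The weights setting can be read as a weighted combination of the three metrics. A minimal sketch, assuming a plain weighted sum after normalising the weights to sum to 1 and rescaling silhouette from [-1, 1] to [0, 1]; the package may combine the metrics differently, and `scoreClusterings()` as well as the two candidate rows are hypothetical.

```r
# Hypothetical scoring rule: weighted sum of AUROC, modularity and a
# rescaled silhouette score, highest score first.
scoreClusterings <- function(metrics,
                             weights = list(AUROC = 0.5, modularity = 0.3, silhouette = 0.2)) {
  metricNames <- c("AUROC", "modularity", "silhouette")
  w <- unlist(weights)[metricNames]
  w <- w / sum(w)  # normalise so the weights sum to 1
  m <- metrics[, metricNames]
  # Silhouette lies in [-1, 1]; map it to [0, 1] to match the other metrics.
  m$silhouette <- (m$silhouette + 1) / 2
  metrics$score <- as.vector(as.matrix(m) %*% w)
  metrics[order(metrics$score, decreasing = TRUE), , drop = FALSE]
}

# Two toy candidate clusterings (made-up metric values).
metrics <- data.frame(
  candidate  = c("A", "B"),
  AUROC      = c(0.90, 0.75),
  modularity = c(0.60, 0.85),
  silhouette = c(0.20, 0.50)
)
ranked <- scoreClusterings(metrics)
predictive <- scoreClusterings(metrics, list(AUROC = 0.8, modularity = 0.1, silhouette = 0.1))
ranked
predictive
```

With the default weights candidate B ranks first (0.78 vs 0.75); the predictive configuration list(AUROC = 0.8, modularity = 0.1, silhouette = 0.1) favours candidate A instead (0.84 vs 0.76).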