Title: | Genomic Selection using Single Trait |
---|---|
Description: | Genomic Selection (GS) is a latest development in animal and plant breeding where whole genome markers information is used to predict genetic merit of an individuals in a practical breeding programme. GS is one of the promising tool for improving genetic gain in animal and plants in today’s scenario. This package is basically developed for genomic predictions by estimating marker effects. These marker effects further used for calculation of genotypic merit of individual i.e. genome estimated breeding values (GEBVs). Genomic selection may be based on single trait or multi traits information. This package performs genomic selection only for single traits hence named as STGS i.e. single trait genomic selection. STGS is a comprehensive package which gives single step solution for genomic selection based on most commonly used statistical methods. |
Authors: | Neeraj Budhlakoti, D C Mishra, Anil Rai, K K Chaturvedi |
Maintainer: | Neeraj Budhlakoti <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-11-01 11:16:16 UTC |
Source: | https://github.com/cran/STGS |
Calculates the Genomic Estimated Breeding Value based on ANN method.
STGS.ann(X, Y, r)
STGS.ann(X, Y, r)
X |
X is a design matrix of marker genotype of size n×p where n are no of Individuals under study (i.e. genotype, lines) and p are no of markers. |
Y |
Y is a vector of individuals of size n×1. |
r |
fraction of testing data (ranges from (0-1)) used during model fitting (suppose if one want to use 75% of data for model training and remaining 25% for model testing so one has to define r=0.25). |
This function fits model by dividing data into two part i.e. training sets and testing sets. Former one is used to build the models and later one for performance evaluation. The performance of model is evaluated by calculating model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value. Whole procedures is repeated 25 times and accuracy is averaged.
$fit meta-data of ANN model fitting
$Pred GEBV's for genotype under study
$Accuracy model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value
Foresee, F. D., and M. T. Hagan. 1997. "Gauss-Newton approximation to Bayesian regularization", Proceedings of the 1997 International Joint Conference on Neural Networks.
MacKay, D. J. C. 1992. "Bayesian interpolation", Neural Computation, vol. 4, no. 3, pp. 415-447.
Nguyen, D. and Widrow, B. 1990. "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights", Proceedings of the IJCNN, vol. 3, pp. 21-26.
Paulino Perez Rodriguez and Daniel Gianola (2018). brnn: Bayesian Regularization for Feed-Forward Neural Networks. R package version 0.7. https://CRAN.R-project.org/package=brnn.
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.ann(X,Y,r)
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.ann(X,Y,r)
Calculates the Genomic Estimated Breeding Value based on BLUP.
STGS.blup(X, Y, r)
STGS.blup(X, Y, r)
X |
X is a design matrix of marker genotype of size n×p where n are no of Individuals under study (i.e. genotype, lines) and p are no of markers. |
Y |
Y is a vector of individuals of size n×1. |
r |
fraction of testing data (ranges from (0-1)) used during model fitting (suppose if one want to use 75% of data for model training and remaining 25% for model testing so one has to define r=0.25). |
This function fits model by dividing data into two part i.e. training sets and testing sets. Former one is used to build the models and later one for performance evaluation. The performance of model is evaluated by calculating model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value. Whole procedures is repeated 25 times and accuracy is averaged.
$Vu variance of random effect i.e. u
$Ve error variance
$beta estimate of fixed effects i.e. BLUE
$u estimate of random effects i.e. BLUP(u)
$LL maximized log-likelihood
$Pred GEBVs for genotype under study
$Accuracy model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255.
Searle, S.R., G. Casella and C.E. McCulloch. 1992. Variance Components. John Wiley, Hoboken.
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.blup(X,Y,r)
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.blup(X,Y,r)
Calculates the Genomic Estimated Breeding Value using LASSO method.
STGS.lasso(X, Y, r)
STGS.lasso(X, Y, r)
X |
X is a design matrix of marker genotype of size n×p where n are no of Individuals under study (i.e. genotype, lines) and p are no of markers. |
Y |
Y is a vector of individuals of size n×1. |
r |
fraction of testing data (ranges from (0-1)) used during model fitting (suppose if one want to use 75% of data for model training and remaining 25% for model testing so one has to define r=0.25). |
This function fits model by dividing data into two part i.e. training sets and testing sets. Former one is used to build the models and later one for performance evaluation. The performance of model is evaluated by calculating model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value. Whole procedures is repeated 25 times and accuracy is averaged.
$fit Lists various coeffecient assocaited to LASSO model fitting
$Pred GEBV's for genotype under study
$Accuracy model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B (Methodological). 267–288.
Searle, S.R., G. Casella and C.E. McCulloch. 1992. Variance Components. John Wiley, Hoboken.
Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010 http://www.jstatsoft.org/v33/i01/.
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL http://www.jstatsoft.org/v33/i01/ .
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.lasso(X,Y,r)
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.lasso(X,Y,r)
Calculates the Genomic Estimated Breeding Value by using Random Forest method.
STGS.rf(X, Y, r)
STGS.rf(X, Y, r)
X |
X is a design matrix of marker genotype of size n×p where n are no of Individuals under study (i.e. genotype, lines) and p are no of markers. |
Y |
Y is a vector of individuals of size n×1. |
r |
fraction of testing data (ranges from (0-1)) used during model fitting (suppose if one want to use 75% of data for model training and remaining 25% for model testing so one has to define r=0.25). |
This function fits model by dividing data into two part i.e. training sets and testing sets. Former one is used to build the models and later one for performance evaluation. The performance of model is evaluated by calculating model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value. Whole procedures is repeated 25 times and accuracy is averaged.
$Pred GEBV's for genotype under study
$Accuracy model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value
Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.
Breiman, L (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”,https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf.
A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.rf(X,Y,r)
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.rf(X,Y,r)
Calculates the Genomic Estimated Breeding Value using RR.
STGS.rr(X, Y, r)
STGS.rr(X, Y, r)
X |
X is a design matrix of marker genotype of size n×p where n are no of Individuals under study (i.e. genotype, lines) and p are no of markers. |
Y |
Y is a vector of individuals of size n×1. |
r |
fraction of testing data (ranges from (0-1)) used during model fitting (suppose if one want to use 75% of data for model training and remaining 25 for model testing so one has to define r=0.25). |
This function fits model by dividing data into two part i.e. training sets and testing sets. Former one is used to build the models and later one for performance evaluation. The performance of model is evaluated by calculating model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value. Whole procedures is repeated 25 times and accuracy is averaged.
$bhat estimate of marker effects
$Pred GEBV's for genotype under study
$Accuracy model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value
de los Campos, G., and P. P. Rodriguez, 2010 BLR: Bayesian Linear Regression. R package version 1.2. http://CRAN.R-project.org/package=BLR.
de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra et al., 2009 Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182(1): 375–385.
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.rr(X,Y,r)
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.rr(X,Y,r)
Calculates the Genomic Estimated Breeding Value based on SVM method.
STGS.svm(X, Y, r)
STGS.svm(X, Y, r)
X |
X is a design matrix of marker genotype of size n×p where n are no of Individuals under study (i.e. genotype, lines) and p are no of markers. |
Y |
Y is a vector of individuals of size n×1. |
r |
fraction of testing data (ranges from (0-1)) used during model fitting (suppose if one want to use 75% of data for model training and remaining 25% for model testing so one has to define r=0.25). |
This function fits model by dividing data into two part i.e. training sets and testing sets. Former one is used to build the models and later one for performance evaluation. The performance of model is evaluated by calculating model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value. Whole procedures is repeated 25 times and accuracy is averaged.
$fit List various coeffecient associated with SVM model fitting
$Pred GEBV's for genotype under study
$Accuracy model accuracy i.e. pearson correlation coefficient between actual phenotypic value and predicted phenotypic value
Vapnik, V., 1995. The Nature of Statistical Learning Theory, Ed. 2. Springer, New York.
Vapnik, V., and A. Vashist, 2009. A new learning paradigm: Learning using privileged information. Neural Networks 22: 544–557.
Alexandros Karatzoglou, Alex Smola, Kurt Hornik, Achim Zeileis (2004). kernlab - An S4 Package for Kernel Methods in R. Journal of Statistical Software 11(9), 1-20. URL http://www.jstatsoft.org/v11/i09/ .
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.svm(X,Y,r)
library(STGS) data(wheat_data) X<-wheat_data[,1:100] Y<-as.data.frame(wheat_data[,101]) r<-0.25 STGS.svm(X,Y,r)
Dataset used in this study is subset from CIMMYT 599 wheat lines. In our sample dataset it has one response for 50 lines genotyped for 100 markers.
data("wheat_data")
data("wheat_data")
A data frame with 50 rows as genotypes with 101 columns (i.e. First 100 columns contains information of genotyped markers and last column represent data of phenotypic trait under study).
This dataset is created by taking 50 wheat lines genotyped for 100 Markers as a subset from CIMMYT 599 wheat lines. Wheat lines were genotyped using 1447 Diversity Array Technology markers generated by Triticarte Pty. Ltd. (Canberra, Australia; http://www.triticarte.com.au). These markers may take two different values i.e. their presence (1) or absence (0). Phenotypic trait used under this study is grain yield.
International Maize and Wheat Improvement Center (CIMMYT), Mexico.
library(STGS) data(wheat_data) X<-wheat_data[,1:100] #### Extracting Genotype Y<-as.data.frame(wheat_data[,101]) #### Extracting Phenotype
library(STGS) data(wheat_data) X<-wheat_data[,1:100] #### Extracting Genotype Y<-as.data.frame(wheat_data[,101]) #### Extracting Phenotype