Comparison of Two Strategies for Setting the Hyper-parameters of Prior Distributions in Genomic Prediction

(Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture, Jimei University, Xiamen, Fujian 361021)

genomic selection; prior distribution; hyper-parameter setting; accuracy; simulation study

Comparison of Two Strategies for Setting Hyper-parameters of Prior Distribution in Genomic Prediction
DONG Linsong, FANG Ming, WANG Zhiyong*

(Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture, Jimei University, Xiamen 361021, China)

genomic selection; prior distribution; hyper-parameter setting; accuracy; simulation study

DOI: 10.6043/j.issn.0438-0479.201709021


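Genomic selection is a breeding method in which the genomic breeding values of individuals are estimated from genome-wide marker information and then used for selection. Research has centred mainly on best linear unbiased prediction (BLUP) and Bayesian methods. All of these methods operate under some prior assumption, so the parameters of the prior distribution must be set. Based on the principles for setting prior hyper-parameters, this study examines how to set them under two strategies, standardizing or not standardizing the single nucleotide polymorphism (SNP) genotypes. Using the QTLMAS2012 simulated data, genomic breeding values were calculated under both strategies with seven prediction methods: ridge-regression BLUP (RRBLUP), BayesA, BayesB, BayesCπ, fast BayesB (FBayesB), fast mixture of normal distributions (FMixP), and MixP based on the Markov chain Monte Carlo algorithm (MMixP). The results show that, for a given prediction method, whether or not the SNP genotypes are standardized does not affect the estimated genomic breeding values. However, because standardizing the genotypes is more generally applicable and can highlight SNP loci with large effects, we recommend standardizing the SNP genotypes before estimating SNP effects and then setting the parameter values of the prior distribution.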

Genomic selection is a breeding method that uses whole-genome markers to predict genomic estimated breeding values, on which individual selection is then based. Various statistical methods have recently been proposed for this purpose, mainly best linear unbiased prediction (BLUP) and Bayesian methods. Each of these methods relies on a prior assumption, so the parameters of the prior distribution must be set. This study describes the theory behind setting prior hyper-parameters and discusses how to set them under two strategies: standardizing or not standardizing the single nucleotide polymorphism (SNP) genotypes. Seven prediction methods (ridge-regression BLUP (RRBLUP), BayesA, BayesB, BayesCπ, fast BayesB (FBayesB), fast MixP (FMixP), and MixP based on the Markov chain Monte Carlo algorithm (MMixP)) were used to estimate genomic breeding values under both strategies with the QTLMAS2012 simulated data. The results showed that, within a given statistical method, prediction accuracies were very similar whether or not the SNP genotypes were standardized. Because standardizing the SNP genotypes applies to a wider range of cases and highlights the SNPs with larger effects, we suggest standardizing the genotypes first and then setting the prior hyper-parameters before predicting SNP effects.
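
The two strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it covers only the ridge-regression case. It assumes genotypes coded 0/1/2 and a commonly used allocation of the additive genetic variance across markers: divide by the sum of 2p(1-p) for raw genotypes, and by the number of markers for standardized genotypes. The function names, the simulated data, and the variance values are all hypothetical and are not taken from the QTLMAS2012 data set.

```python
import numpy as np


def allele_frequencies(X):
    """Frequency of the counted allele per SNP, for genotypes coded 0/1/2."""
    return X.mean(axis=0) / 2.0


def standardize_genotypes(X):
    """Centre each SNP by 2p and scale it by sqrt(2p(1-p))."""
    p = allele_frequencies(X)
    het = 2.0 * p * (1.0 - p)
    if np.any(het == 0):
        raise ValueError("remove monomorphic SNPs before standardizing")
    return (X - 2.0 * p) / np.sqrt(het)


def marker_variance(sigma2_a, X, standardized):
    """Prior variance of one SNP effect (assumed allocation rule, see lead-in)."""
    p = allele_frequencies(X)
    if standardized:
        return sigma2_a / X.shape[1]
    return sigma2_a / np.sum(2.0 * p * (1.0 - p))


def rrblup_gebv(W, y, sigma2_g, sigma2_e):
    """Ridge-regression BLUP: solve (W'W + lambda*I) b = W'(y - mean(y))."""
    Wc = W - W.mean(axis=0)              # centre columns so no intercept is needed
    lam = sigma2_e / sigma2_g            # ridge penalty from the variance ratio
    lhs = Wc.T @ Wc + lam * np.eye(W.shape[1])
    b = np.linalg.solve(lhs, Wc.T @ (y - y.mean()))
    return Wc @ b                        # genomic breeding values of the training animals


# Hypothetical toy data: 200 individuals, 50 SNPs.
rng = np.random.default_rng(0)
n, m = 200, 50
p_true = rng.uniform(0.1, 0.9, size=m)
X = rng.binomial(2, p_true, size=(n, m)).astype(float)
true_bv = X @ rng.normal(0.0, 1.0, size=m)
sigma2_a = true_bv.var()                 # assumed known additive variance
sigma2_e = sigma2_a                      # assumed heritability of 0.5
y = true_bv + rng.normal(0.0, np.sqrt(sigma2_e), size=n)

Z = standardize_genotypes(X)
gebv_raw = rrblup_gebv(X, y, marker_variance(sigma2_a, X, False), sigma2_e)
gebv_std = rrblup_gebv(Z, y, marker_variance(sigma2_a, X, True), sigma2_e)
print("correlation between strategies:", np.corrcoef(gebv_raw, gebv_std)[0, 1])
```

Under either strategy the per-marker variances add up to the same total additive variance, which is one way to see why the two sets of breeding values are expected to agree closely. The Bayesian methods named in the abstract would need their own prior-specific hyper-parameters, which this sketch does not attempt.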