A Multiclass Classification Method for Microarray Data Based on Iterative Extension Error-Correcting Output Codes

(Software School, Xiamen University, Xiamen, Fujian 361005, China)

microarray; error-correcting output codes; multiclass classification algorithm; cancer gene; data complexity

Microarray Data Multiple Classification Method Based on Iterative Extension Error Correct Output Code
ZHONG Tianyun, LIU Kunhong*, WANG Beizhan

(Software School of Xiamen University, Xiamen 361005, China)

microarray; ECOC; multi-class; cancer gene; data complexities

DOI: 10.6043/j.issn.0438-0479.201710011

Notes

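Microarray technology makes rapid, large-scale gene detection possible, and there is a pressing need to use it to improve disease diagnosis. Research on microarray data analysis has therefore developed rapidly, with multiclass classification of the data being especially prominent. However, because microarray data have many features and few samples, traditional statistical learning methods classify them poorly. To solve the multiclass classification problem in a way tailored to these characteristics, an iterative extension error correct output coding (IE-ECOC) algorithm is proposed. On several feature subsets, together with feature-related data complexity measures, a binary-tree-based coding method is used to generate a column pool, and a column selection strategy is proposed to construct the coding matrix; the matrix is then extended according to the results of iterative validation. Classification experiments on cancer gene microarray data show that IE-ECOC is well suited to data with many features and few samples and can produce better results than some classical ECOC algorithms; the effectiveness of the IE-ECOC algorithm is also verified in the experiments.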

Microarray technology makes it possible to detect large numbers of genes quickly, and there is an urgent need to use this technique to improve disease diagnosis. Research on microarray data analysis has therefore developed rapidly, and multiclass classification is particularly important. However, the "large feature size and small sample size" problem continues to hamper traditional statistical classification methods. To solve this problem, we propose an iterative extension error correcting output coding (IE-ECOC) algorithm. On several feature subsets, we use a binary-tree-based coding method, guided by feature-related data complexities, to generate a column pool, and we develop a selection method that constructs a coding matrix from the columns in the pool. Then, according to validation results, we extend the matrix iteratively. Classification experiments on cancer gene microarray data show that IE-ECOC is well suited to "large feature size and small sample size" data. Compared with some classical ECOC algorithms, the IE-ECOC algorithm can produce better results, and the effectiveness of the extension algorithm is also verified experimentally.
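The general idea of extending an ECOC coding matrix according to validation results can be illustrated with a minimal Python sketch. It is not the IE-ECOC algorithm itself: the binary-tree-based coding, the data complexity measures, and the feature subsets are all omitted. The column pool is written by hand, the nearest-centroid base learner, the greedy acceptance rule, and the toy two-dimensional data are assumptions made for illustration, and every function name is hypothetical.

```python
def hamming_decode(code_matrix, outputs):
    """Return the class whose codeword is closest to `outputs`.
    Entries are +1, -1 or 0; zero entries are ignored in the distance."""
    def dist(row):
        return sum(1 for r, o in zip(row, outputs) if r != 0 and r != o)
    return min(range(len(code_matrix)), key=lambda c: dist(code_matrix[c]))

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train_column(X, y, column):
    """Assumed base learner: one centroid per side of the dichotomy.
    column[c] is +1, -1 or 0 (class c left out of this dichotomy)."""
    pos = [x for x, c in zip(X, y) if column[c] == 1]
    neg = [x for x, c in zip(X, y) if column[c] == -1]
    return centroid(pos), centroid(neg)

def predict_column(model, x):
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return 1 if d_pos <= d_neg else -1

def accuracy(columns, models, X, y, n_classes):
    # Rows of the coding matrix are class codewords, one entry per column.
    matrix = [[col[c] for col in columns] for c in range(n_classes)]
    hits = 0
    for x, label in zip(X, y):
        outputs = [predict_column(m, x) for m in models]
        hits += hamming_decode(matrix, outputs) == label
    return hits / len(X)

def extend_iteratively(pool, X_tr, y_tr, X_val, y_val, n_classes, start=1):
    """Greedy stand-in for the extension step: keep a candidate column
    only if it raises accuracy on the validation set."""
    columns = list(pool[:start])
    models = [train_column(X_tr, y_tr, c) for c in columns]
    best = accuracy(columns, models, X_val, y_val, n_classes)
    for cand in pool[start:]:
        model = train_column(X_tr, y_tr, cand)
        score = accuracy(columns + [cand], models + [model],
                         X_val, y_val, n_classes)
        if score > best:
            columns.append(cand)
            models.append(model)
            best = score
    return columns, best

# Toy data (assumed): three well-separated classes in two dimensions.
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5),
           (10, 0), (10, 1), (9, 0)]
y_train = [0, 0, 0, 1, 1, 1, 2, 2, 2]
X_valid = [(0.5, 0.5), (5.5, 5.5), (9.5, 0.5)]
y_valid = [0, 1, 2]

# Hand-written column pool (assumed); each column splits the classes in two.
pool = [(1, -1, -1), (-1, 1, -1), (1, 1, -1)]

columns, best = extend_iteratively(pool, X_train, y_train,
                                   X_valid, y_valid, n_classes=3)
print(len(columns), best)
```

Starting from a single column, the matrix cannot tell all three classes apart, so the loop accepts a further column only when validation accuracy improves. Accepting on strict improvement keeps the matrix short, which is one possible design choice rather than the rule used in the paper.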