[1] ZHONG Tianyun, LIU Kunhong*, WANG Beizhan. Microarray Data Multiple Classification Method Based on Iterative Extension Error Correct Output Code[J]. Journal of Xiamen University (Natural Science), 2018, 57(03): 396-403. [doi:10.6043/j.issn.0438-0479.201711003]

Microarray Data Multiple Classification Method Based on Iterative Extension Error Correct Output Code

Journal of Xiamen University (Natural Science) [ISSN: 0438-0479 / CN: 35-1070/N]

Volume:
57
Issue:
2018, No. 03
Pages:
396-403
Section:
Research Article
Publication date:
2018-05-31

Article Info

Title:
Microarray Data Multiple Classification Method Based on Iterative Extension Error Correct Output Code
Article ID:
0438-0479(2018)03-0396-08
Author(s):
ZHONG Tianyun, LIU Kunhong*, WANG Beizhan
Software School of Xiamen University, Xiamen 361005, Fujian, China
Keywords:
microarray; ECOC; multi-class; cancer gene; data complexities
CLC number:
TP 391.4
DOI:
10.6043/j.issn.0438-0479.201711003
Document code:
A
Abstract:
Microarray technology makes it possible to measure large numbers of genes quickly, and there is an urgent need to use it to improve disease diagnosis. Research on microarray data analysis has therefore developed rapidly, with multiclass classification receiving particular attention. However, because microarray data have many features and few samples, traditional statistical learning methods classify them poorly. To address multiclass classification under these conditions, we propose an iterative extension error-correcting output coding (IE-ECOC) algorithm. On several feature subsets, a binary-tree-based coding method guided by feature-related data complexity measures generates a column pool, and a column selection strategy constructs a coding matrix from the columns in the pool. The matrix is then extended iteratively according to validation results. Classification experiments on cancer gene microarray data show that IE-ECOC is well suited to data with many features and few samples, and that it can produce better results than some classical ECOC algorithms. The effectiveness of the extension algorithm is also verified experimentally.
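
As an illustration of the generic ECOC idea that the abstract builds on, the following minimal Python sketch shows Hamming-distance decoding and how adding columns to a coding matrix increases the separation between class codewords. It is not the IE-ECOC algorithm: the 4-class matrices, the hand-picked extra columns, and the function names `hamming_decode` and `min_row_distance` are assumptions made for illustration, and the column pool, data complexity measures, and validation-driven extension described in the abstract are not reproduced.

```python
import numpy as np


def hamming_decode(code_matrix, outputs):
    """Return the index of the class whose codeword is closest to `outputs`."""
    distances = np.sum(code_matrix != outputs, axis=1)
    return int(np.argmin(distances))


def min_row_distance(code_matrix):
    """Smallest Hamming distance between any two class codewords."""
    n_classes = code_matrix.shape[0]
    return min(
        int(np.sum(code_matrix[i] != code_matrix[j]))
        for i in range(n_classes)
        for j in range(i + 1, n_classes)
    )


# Hypothetical 4-class coding matrix: one row per class, one column per
# binary problem (+1 / -1 mark the two groups of classes in that problem).
base = np.array([
    [+1, +1, +1],
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
])

# Extra columns chosen by hand for this example (one-vs-rest splits).
# A real method would pick them by some selection criterion instead.
extra = np.array([
    [+1, -1, -1, -1],
    [-1, +1, -1, -1],
    [-1, -1, +1, -1],
    [-1, -1, -1, +1],
])
extended = np.hstack([base, extra])

# Simulate the binary classifiers' outputs for a class-2 sample,
# with the first classifier giving the wrong answer.
noisy = extended[2].copy()
noisy[0] = -noisy[0]

print("min distance, base:", min_row_distance(base))
print("min distance, extended:", min_row_distance(extended))
print("decoded class with extended matrix:", hamming_decode(extended, noisy))
```

In this toy case the minimum distance between codewords rises from 2 to 4 after the extension, so a single wrong binary output can still be decoded to the correct class.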

Memo

Received: 2017-11-07; Accepted: 2018-03-22
Funding: National Natural Science Foundation of China (61502402, 61772023); Natural Science Foundation of Fujian Province (2016J01320, 2015J05129)
*Corresponding author: lkhqz@xmu.edu.cn
Citation: ZHONG T Y, LIU K H, WANG B Z. Microarray data multiple classification method based on iterative extension error correct output code[J]. J Xiamen Univ Nat Sci, 2018, 57(3): 396-403. (in Chinese)