CN110895969B - Atrial fibrillation prediction decision tree and its pruning method - Google Patents

Atrial fibrillation prediction decision tree and its pruning method

Info

Publication number
CN110895969B
Authority
CN
China
Prior art keywords
atrial fibrillation
decision tree
patient
judged
attribute
Prior art date
Legal status
Active
Application number
CN201811068303.1A
Other languages
Chinese (zh)
Other versions
CN110895969A (en)
Inventor
张敏
张树龙
汪祖民
杨慧英
Current Assignee
Dalian University
Original Assignee
Dalian University
Priority date
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN201811068303.1A priority Critical patent/CN110895969B/en
Publication of CN110895969A publication Critical patent/CN110895969A/en
Application granted granted Critical
Publication of CN110895969B publication Critical patent/CN110895969B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G16 — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H — HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 — ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 — ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

An atrial fibrillation prediction decision tree and a pruning method thereof belong to the field of data processing. To solve the problem of constructing a decision tree that mines out the indicators influencing atrial fibrillation prediction, the root node of the decision tree is the A peak, the attribute with the largest information gain rate, whose normal range is 41 to 87. In the first branch of the decision tree, a denotes the value of the A peak: when a <= 0 the patient is judged to have atrial fibrillation, and since the data contain no values below 0 this means that atrial fibrillation is judged when a = 0; when a > 0 the ef attribute is considered next, and when the ef value is less than 58 the patient is judged to be normal. The decision tree has important guiding significance for determining some important reference indicators that influence atrial fibrillation.

Description

Atrial fibrillation prediction decision tree and its pruning method

Technical field

The invention belongs to the field of data processing and relates to a method for constructing a decision tree for predicting atrial fibrillation, and to the decision tree itself.

Background

Atrial fibrillation is a supraventricular tachyarrhythmia characterized by rapid, disordered atrial electrical activity. On the electrocardiogram it mainly manifests as disappearance of the P wave, which is replaced by irregular fibrillatory waves, and an absolutely irregular RR interval (when atrioventricular conduction is present). These findings are also the main basis currently used in medicine to diagnose atrial fibrillation. Medically, atrial fibrillation is classified mainly by the duration of its episodes into paroxysmal AF, persistent AF, long-standing persistent AF and permanent AF. The specific classification is shown in Table 1.

Table 1. Detailed medical classification of atrial fibrillation

Atrial fibrillation is a very common clinical arrhythmia; its incidence in China is 0.5%-1%, and the probability of onset increases with age. The risk of atrial fibrillation in patients with hypertension is 1.7 times higher than in people with normal blood pressure, and 33% of atrial fibrillation cases are currently attributed to hypertension. In view of the high incidence of atrial fibrillation in hypertensive patients, some even regard atrial fibrillation as another manifestation of hypertensive target-organ damage. However, there is still no good clinical indicator for predicting the occurrence of AF in hypertensive patients. In addition, some patients with atrial fibrillation have no obvious clinical symptoms, leaving them unknowingly exposed to the risk of various critical illnesses; by the time clinical symptoms appear or the disease strikes suddenly, organic cardiovascular lesions have often already developed, which greatly affects the patient's health and may even be life-threatening. It is therefore particularly important to study the probability of atrial fibrillation in the hypertensive population.

There are currently many methods for predicting atrial fibrillation. In the medical field, work starts from the treatment of atrial fibrillation. Internationally, the CHA2DS2-VASc score (hypertension, age, diabetes, stroke, vascular disease, sex, congestive heart failure) and the HATCH score (hypertension, age, cerebral ischemic attack, chronic obstructive pulmonary disease, heart failure) are used to predict atrial fibrillation, but both scores have various limitations, so the prediction methods are not standardized and the prediction results are inaccurate. In the computer field, prediction is generally based on the patient's electrocardiogram, judging whether the patient has atrial fibrillation from factors such as the P wave and the change of the RR-interval distribution over time; the algorithms used include both statistical and machine-learning methods. Other approaches use smart watches to measure certain physiological indicators for prediction, or smartphones that scan the face and predict from the complexion; sometimes, for asymptomatic patients, medical instruments are used directly to test the patient's Holter heart rate for prediction. However, these methods still lack standardization and have no specific criteria.

Summary of the invention

In order to solve the problem of building a decision tree that mines out the indicators affecting atrial fibrillation prediction, the present invention proposes the following technical solutions:

An atrial fibrillation prediction decision tree, in which the root node is the A peak, the attribute with the largest information gain rate, whose normal range is 41 to 87. In the first branch of the decision tree, a denotes the value of the A peak: when a <= 0 the patient is judged to have atrial fibrillation, and since the data contain no values below 0 this means that atrial fibrillation is judged when a = 0; when a > 0 the ef attribute is considered next, and when the ef value is less than 58 the patient is judged to be normal.

Another atrial fibrillation prediction decision tree, in which the root node is XGN. When the XGN grade is greater than 1 the patient is judged to have atrial fibrillation; when the XGN grade is less than or equal to 1, the A peak is considered next. When the A peak is 0, FS is considered: when FS is greater than 0 the patient is judged to have atrial fibrillation, otherwise FJB is considered. When FJB is less than or equal to 0, LVPWD is considered; when LVPWD is less than or equal to 9, the value of EF is considered: when EF is less than or equal to 57 the patient is judged to be normal, otherwise atrial fibrillation. Backtracking to the right branch of LVPWD, when LVPWD is greater than 9 the value of FDMB1 is considered: when it is less than or equal to 101 the patient is judged to have atrial fibrillation; otherwise LAD is considered, and when LAD is less than or equal to 50 the patient is judged to have atrial fibrillation, otherwise normal. Backtracking to the right branch of FJB, when FJB is greater than 0, GXB is considered: when GXB is less than or equal to 2 the patient is judged to be normal, otherwise atrial fibrillation. Backtracking to the right branch of FS, when FS is greater than 0 the patient is judged to have atrial fibrillation. Backtracking to the right branch of the A peak, when A is greater than 0, TNB is considered: when TNB is less than or equal to 0 the patient is judged to be normal; otherwise FDMB is considered, and when FDMB is greater than 0 the patient is judged to be normal; otherwise the E value is considered, and when E is greater than 72 the patient is judged to have atrial fibrillation; otherwise the MCHC value is considered, and when MCHC is less than or equal to 338 the patient is judged to have atrial fibrillation, otherwise normal. In this way the whole decision tree is traversed.

A method for pruning an atrial fibrillation prediction decision tree, comprising:

1) Calculating three kinds of predicted misclassification counts: the sum of the predicted misclassified samples over all leaf nodes of the subtree Tv, denoted E1; the number of predicted misclassified samples when the subtree Tv is pruned and replaced by a leaf node, denoted E2; and the number of predicted misclassified samples of the largest branch of the subtree Tv, denoted E3;

2) Comparing them: when E1 is smallest, do not prune; when E2 is smallest, prune and replace the subtree Tv with a leaf node; when E3 is smallest, replace the subtree Tv with its largest branch.

Beneficial effects: the present invention provides a prediction decision tree, and this decision tree has important guiding significance for determining some important reference indicators that affect atrial fibrillation. The pruning method is applicable to this decision tree, and its pruning efficiency and accuracy are better.

Description of the drawings

Figure 1 is a schematic diagram of the decision tree structure;

Figure 2 is a schematic diagram of the original medical data;

Figure 3 is a schematic diagram of the data exported to an Excel table;

Figure 4 is a schematic diagram of the cardiac ultrasound attributes;

Figure 5 is a schematic diagram of the Weka operation interface;

Figure 6 is a schematic diagram of the decision tree built with all default parameter values;

Figure 7 is a schematic diagram of the accuracy of that decision tree;

Figure 8 is a schematic diagram of the decision tree over 154 factors;

Figure 9 is a schematic diagram of the accuracy of that decision tree.

Detailed description of the embodiments

Example 1:

In order to solve the problem of constructing a decision tree for atrial fibrillation prediction, the present invention proposes the following technical solution, a method for constructing an atrial fibrillation prediction decision tree, comprising:

Step 1: If the data set S belongs to a single category, create a leaf node, mark it with the corresponding class label, and stop building the tree; otherwise, go to Step 2;

Step 2: Calculate the information gain rate Gain-rate(A) of every attribute in the data set S;

Step 3: Select the attribute A with the maximum information gain rate;

Step 4: Make attribute A the root node of the decision tree T, where T is the decision tree to be constructed;

Step 5: Divide the data set into multiple subsets according to the different values of attribute A, execute Steps 1-4 recursively on each subset Sv, and construct a subtree Tv, where Sv is the subset of samples whose attribute A takes the value v;

Step 6: Add the subtree Tv to the corresponding branch of the decision tree T;

Step 7: When the loop ends, the decision tree T is obtained.

Further, the data processing method is as follows: if the class label is missing, the record is deleted directly; if an attribute value is missing, the value is merged into the most common category or replaced with the most frequently used value. To handle continuous values, the data are first sorted, the data set is split using each data value as a candidate threshold, the information gain of each split is calculated, the threshold is selected according to the maximum gain, and the data set is divided with that threshold.

Further, a pruning operation is performed on the decision tree:

1) Calculate three kinds of predicted misclassification counts: the sum of the predicted misclassified samples over all leaf nodes of the subtree Tv, denoted E1; the number of predicted misclassified samples when the subtree Tv is pruned and replaced by a leaf node, denoted E2; and the number of predicted misclassified samples of the largest branch of the subtree Tv, denoted E3;

2) Compare them: when E1 is smallest, do not prune; when E2 is smallest, prune and replace the subtree Tv with a leaf node; when E3 is smallest, replace the subtree Tv with its largest branch.

Further, the splitting attribute is selected according to the information gain rate:

The formula for information entropy is:

H(S) = -Σi p(ci)·log2 p(ci)   (1)

where S represents the data set, ci represents the i-th class of the data set, and p(ci) represents the probability that class ci is selected;

When splitting in the decision tree, what is generally calculated is the information entropy of a certain feature attribute. Suppose feature attribute A has n different values; then attribute A divides the data set S into n small data sets, denoted si, and the probability of each small data set being selected is p(si). According to formula (1), the information entropy of each small data set si is H(si). The information entropy of feature attribute A is calculated as:

H(A) = Σi=1..n p(si)·H(si)   (2)

The information gain is calculated as:

Info_Gain(A) = H(S) - H(A)   (3)

The information gain rate is calculated as:

Gain-rate(A) = Info_Gain(A) / SplitInfo(A),  where SplitInfo(A) = -Σi=1..n p(si)·log2 p(si)   (4)

Further, by changing the parameters of the decision tree algorithm, the constructed decision tree is continuously adjusted so that both its accuracy and its branch attribute values are optimal. The J48 algorithm has 11 modifiable parameters in total; binarySplits, debug, saveInstanceData, subtreeRaising, unpruned and useLaplace are left at their default values, while the five parameters confidenceFactor, minNumObj, numFolds, seed and reducedErrorPruning are modified and verified so as to approach the accurate values of the medical data. The preprocessed data file is loaded into the Weka software, the algorithm and its corresponding parameters are selected and modified, the results are produced, experiments are run over the possible values of each parameter, and finally the optimal experimental result is selected;

The experiments are divided into two branches:

In the first branch, experiments are carried out on several attributes of the cardiac ultrasound indicators, with the last column as the class label, f for atrial fibrillation and z for normal, and all parameters of the algorithm at their default values. According to the decision tree, among the cardiac ultrasound attributes the three with the greatest influence on atrial fibrillation are the A peak, ef and lasd. Specifically, the root node of the decision tree is the A peak, the attribute with the largest information gain rate, whose normal range is 41 to 87. In the first branch of the decision tree, a denotes the value of the A peak: when a <= 0 the patient is judged to have atrial fibrillation, and since the data contain no values below 0 this means that atrial fibrillation is judged when a = 0; when a > 0 the ef attribute is considered next, and when the ef value is less than 58 the patient is judged to be normal;
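Read as code, the first-branch rules just described amount to two nested tests. The following Python sketch is only an illustration of that reading; the function name is an assumption of this sketch, and the branches the text does not spell out (those continuing with lasd in Figure 6) are left undecided.

```python
def predict_af_ultrasound(a_peak: float, ef: float):
    """Decision rules spelled out in the text for the cardiac-ultrasound tree
    (root: A peak, then ef). Returns 'f' (atrial fibrillation), 'z' (normal),
    or None where the text defers to the deeper branches of Figure 6."""
    if a_peak <= 0:      # in the data this only ever happens as a == 0
        return "f"       # atrial fibrillation
    if ef < 58:
        return "z"       # normal
    return None          # deeper branches (lasd, ...) are not described in the text
```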

In the second branch, the patients' characteristic indicators are collected, including blood routine, thyroid function, coagulation profile, liver function, blood lipid and cardiac ultrasound test items as the attribute columns; the last column is the class label, f for atrial fibrillation and z for normal, and all parameters of the algorithm use their default values. According to the decision tree, the attributes that act on atrial fibrillation are XGN (cardiac function grade), A peak (cardiac ultrasound indicator), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound indicator), EF (cardiac ultrasound indicator), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound indicator), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasound indicator). Specifically, the root node of the decision tree is XGN: when the XGN grade is greater than 1 the patient is judged to have atrial fibrillation; when the XGN grade is less than or equal to 1, the A peak is considered next. When the A peak is 0, FS is considered: when FS is greater than 0 the patient is judged to have atrial fibrillation, otherwise FJB is considered. When FJB is less than or equal to 0, LVPWD is considered; when LVPWD is less than or equal to 9, the value of EF (EF1 in the decision tree) is considered: when EF is less than or equal to 57 the patient is judged to be normal, otherwise atrial fibrillation. Backtracking to the right branch of LVPWD, when LVPWD is greater than 9 the value of FDMB1 is considered: when it is less than or equal to 101 the patient is judged to have atrial fibrillation; otherwise LAD is considered, and when LAD is less than or equal to 50 the patient is judged to have atrial fibrillation, otherwise normal. Backtracking to the right branch of FJB, when FJB is greater than 0, GXB is considered: when GXB is less than or equal to 2 the patient is judged to be normal, otherwise atrial fibrillation. Backtracking to the right branch of FS, when FS is greater than 0 the patient is judged to have atrial fibrillation. Backtracking to the right branch of the A peak, when A is greater than 0, TNB is considered: when TNB is less than or equal to 0 the patient is judged to be normal; otherwise FDMB is considered, and when FDMB is greater than 0 the patient is judged to be normal; otherwise the E value is considered, and when E is greater than 72 the patient is judged to have atrial fibrillation; otherwise the MCHC value is considered, and when MCHC is less than or equal to 338 the patient is judged to have atrial fibrillation, otherwise normal. In this way the whole decision tree is traversed.
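The traversal described in this paragraph maps directly onto nested conditionals. The Python sketch below is a literal transcription of those rules for illustration only; the dictionary-based interface and the function name are assumptions of this sketch, while the attribute names and thresholds are taken from the text.

```python
def predict_af_full(p: dict):
    """Transcription of the 154-attribute tree described above.
    `p` maps attribute names (XGN, A, FS, FJB, LVPWD, EF, FDMB1, LAD, GXB,
    TNB, FDMB, E, MCHC) to the patient's values.
    Returns 'f' for atrial fibrillation and 'z' for normal."""
    if p["XGN"] > 1:
        return "f"
    if p["A"] == 0:
        if p["FS"] > 0:
            return "f"
        if p["FJB"] <= 0:
            if p["LVPWD"] <= 9:
                return "z" if p["EF"] <= 57 else "f"
            # right branch of LVPWD (LVPWD > 9)
            if p["FDMB1"] <= 101:
                return "f"
            return "f" if p["LAD"] <= 50 else "z"
        # right branch of FJB (FJB > 0)
        return "z" if p["GXB"] <= 2 else "f"
    # right branch of the A peak (A > 0)
    if p["TNB"] <= 0:
        return "z"
    if p["FDMB"] > 0:
        return "z"
    if p["E"] > 72:
        return "f"
    return "f" if p["MCHC"] <= 338 else "z"
```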

Example 2:

The present disclosure uses data mining methods to establish a standardized decision tree model for medical reference.

Explanation of the standard terms involved:

Data mining (DM) extracts valuable patterns, relationships and knowledge from massive data that come from many sources and have been accumulated over a long time. It mines data and discovers knowledge without prior assumptions. Data mining is a technique that finds regularities in large amounts of data by analysing every record, and it mainly consists of three steps: data preparation, pattern finding and pattern representation. Data mining tasks include association analysis, cluster analysis, classification analysis, anomaly analysis, specific-group analysis and evolution analysis. What the present invention performs is classification analysis, analysing whether hypertensive patients suffer from atrial fibrillation.

The decision tree algorithm is a typical algorithm used for classification and prediction in the field of data mining; it has low computational complexity and intuitive output. The present invention applies the decision tree algorithm to predicting the probability of atrial fibrillation in hypertensive patients.

A decision tree is a basic method for classification and regression; the present invention mainly uses classification decision trees. A decision tree model has a tree structure and, in a classification problem, represents the process of classifying instances based on features. Compared with naive Bayesian classification, the advantage of a decision tree is that its construction requires no domain knowledge or parameter setting, so in practical applications decision trees are better suited to exploratory knowledge discovery. Decision tree algorithms include the ID3, C4.5 and CART algorithms; the present invention uses the C4.5 algorithm for its experiments. C4.5 is mainly an improvement on ID3: when ID3 selects attributes by information gain, it is biased towards attributes with many values, and to solve this problem C4.5 uses the information gain rate instead of the information gain. A decision tree is a tree structure consisting of a root node, a series of internal nodes and leaf nodes; each node has only one parent node and two or more child nodes, and nodes are connected by branches. Each internal node of the decision tree corresponds to a non-class attribute or a combination of attributes, each edge corresponds to a possible value of that attribute, and each leaf node corresponds to a class attribute value. An example of the decision tree structure is shown in Figure 1.
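The node/edge/leaf structure described in the paragraph above can be captured by a small data type. The sketch below is a minimal Python illustration; the class and field names are assumptions of this sketch, not terminology from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DecisionNode:
    """One node of a decision tree: an internal node carries a splitting
    attribute and one child per attribute value; a leaf carries a class label."""
    attribute: Optional[str] = None                      # None for a leaf node
    children: Dict[object, "DecisionNode"] = field(default_factory=dict)
    label: Optional[str] = None                          # class label of a leaf

    def is_leaf(self) -> bool:
        return self.attribute is None
```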

Based on the well-known meaning of decision trees described above, the present invention applies them to the classification of indicators for atrial fibrillation prediction, as follows:

C4.5 algorithm flow

Step 1: If the data set S belongs to a single category, create a leaf node, mark it with the corresponding class label, and stop building the tree; otherwise, go to Step 2;

Step 2: Calculate the information gain rate Gain-rate(A) of every attribute in the data set S;

Step 3: Select the attribute A with the maximum information gain rate;

Step 4: Make attribute A the root node of the decision tree T, where T is the decision tree to be constructed;

Step 5: Divide the data set into multiple subsets according to the different values of attribute A, execute Steps 1-4 recursively on each subset Sv, and construct a subtree Tv, where Sv is the subset of samples whose attribute A takes the value v;

Step 6: Add the subtree Tv to the corresponding branch of the decision tree T;

Step 7: When the loop ends, the decision tree T is obtained (a code sketch of these steps is given below).
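A compact transcription of Steps 1-7 follows. It is a sketch under assumptions, not the patent's implementation: the data set is taken to be a list of dictionaries with a 'class' key, attributes are treated as categorical, the tree is returned as plain nested dictionaries, and the gain-rate computation is the standard C4.5 one annotated with formulas (1)-(4) below.

```python
import math
from collections import Counter


def _entropy(rows):
    """H(S) over the class labels of `rows` (formula (1))."""
    counts = Counter(r["class"] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def _gain_rate(rows, attr):
    """Standard C4.5 gain ratio for splitting `rows` on categorical `attr`."""
    total = len(rows)
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attr], []).append(r)
    h_a = sum(len(s) / total * _entropy(s) for s in subsets.values())   # formula (2)
    gain = _entropy(rows) - h_a                                         # formula (3)
    split_info = -sum((len(s) / total) * math.log2(len(s) / total)
                      for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0                 # formula (4)


def build_tree(rows, attrs):
    """Steps 1-7: returns a class label (leaf) or
    {"attribute": A, "branches": {value: subtree, ...}}."""
    labels = {r["class"] for r in rows}
    if len(labels) == 1 or not attrs:                      # Step 1
        return Counter(r["class"] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: _gain_rate(rows, a))   # Steps 2-3
    node = {"attribute": best, "branches": {}}             # Step 4
    for value in {r[best] for r in rows}:                  # Step 5
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attrs if a != best]
        node["branches"][value] = build_tree(subset, remaining)   # Step 6
    return node                                            # Step 7
```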

Numerical processing: the algorithm can handle training data with missing attribute values. Records with a missing class label are deleted directly; missing attribute values are merged into the most common category or replaced with the most frequently used value. Continuous-valued attributes can also be handled: the data are first sorted, the data set is split using each data value as a candidate threshold, the information gain of each split is calculated, the threshold is selected according to the maximum gain, and the data set is divided with that threshold, as sketched below.
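The threshold-selection step for continuous values can be sketched as follows. This is an illustration under assumptions: the input is taken to be parallel lists of attribute values and class labels, and the function names are invented for this sketch.

```python
import math
from collections import Counter


def label_entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Sort the data, try each observed value as a split threshold and
    return the threshold whose split has the largest information gain."""
    pairs = sorted(zip(values, labels))
    base = label_entropy(labels)
    best_gain, best_t = -1.0, None
    for candidate, _ in pairs:
        left = [l for v, l in pairs if v <= candidate]
        right = [l for v, l in pairs if v > candidate]
        if not left or not right:
            continue                       # a split must have two non-empty sides
        weighted = (len(left) * label_entropy(left)
                    + len(right) * label_entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_gain, best_t = gain, candidate
    return best_t, best_gain
```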

Pruning: through the decision tree generation procedure above, a decision tree based on the training data set can be built, but evaluating the accuracy of that tree, along with some of its other properties, is work that still has to be done. Because the resulting decision tree is based purely on the training data set, there will be some overfitting. To solve this problem, the decision tree needs to be pruned. The basic idea of decision tree pruning is to remove the parts of the tree (subtrees) that do not help the classification accuracy on unknown test samples, producing a simpler, easier-to-understand tree. There are two improved recursive branching methods: pre-pruning and post-pruning.

Pre-pruning: decisions are made before branching to prevent the data set from generating too many branches; pruning is performed while the decision tree is constructed.

Post-pruning: mainly used to counter the effect of noise by pruning away redundant branches.

Because the J48 algorithm used in the present invention applies post-pruning, the post-pruning methods are introduced here in detail. Post-pruning methods include REP (Reduced Error Pruning), PEP (Pessimistic Error Pruning), MEP (Minimum Error Pruning) and CCP (Cost-Complexity Pruning). The default pruning method of the C4.5 algorithm is REP. Its basic idea is:

1) Calculate three kinds of predicted misclassification counts: the sum of the predicted misclassified samples over all leaf nodes of the subtree Tv, denoted E1; the number of predicted misclassified samples when the subtree Tv is pruned and replaced by a leaf node, denoted E2; and the number of predicted misclassified samples of the largest branch of the subtree Tv, denoted E3.

2) Compare them. When E1 is smallest, do not prune; when E2 is smallest, prune and replace the subtree Tv with a leaf node; when E3 is smallest, adopt the "grafting" strategy, that is, replace the subtree Tv with this largest branch (a code sketch of this comparison follows).
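The three-way REP comparison above can be written out directly. The sketch below assumes the error counts E1, E2 and E3 have already been measured (for example on a separate pruning set); the function name and the way the three candidate structures are passed in are assumptions of this sketch.

```python
def rep_decide(subtree, leaf, largest_branch, e1, e2, e3):
    """Reduced-error-pruning decision for one subtree Tv.
    e1: summed misclassifications of Tv's leaves (keep Tv),
    e2: misclassifications if Tv is replaced by a single leaf (prune),
    e3: misclassifications of Tv's largest branch (graft).
    Returns the structure that should stand in place of Tv."""
    best = min(e1, e2, e3)
    if best == e1:
        return subtree          # E1 smallest: do not prune
    if best == e2:
        return leaf             # E2 smallest: replace Tv with a leaf node
    return largest_branch       # E3 smallest: "grafting", replace Tv with its largest branch
```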

Splitting attribute selection: the criterion used to select the splitting attribute is the fundamental difference between decision tree algorithms. As mentioned above, ID3 selects the splitting attribute by information gain, while C4.5 selects it by the information gain rate. Information entropy is the expected value of information; for a data set, it expresses the degree of disorder of the data set. The more categories the data set contains, the larger the corresponding information entropy. The formula is:

H(S) = -Σi p(ci)·log2 p(ci)   (1)

where S represents the data set, ci represents the i-th class of the data set, and p(ci) represents the probability that class ci is selected;

When splitting in the decision tree, what is generally calculated is the information entropy of a certain feature attribute. Suppose feature attribute A has n different values; then attribute A divides the data set S into n small data sets, denoted si, and the probability of each small data set being selected is p(si). According to formula (1), the information entropy of each small data set si is H(si). The information entropy of feature attribute A is calculated as:

H(A) = Σi=1..n p(si)·H(si)   (2)

The information gain is calculated as:

Info_Gain(A) = H(S) - H(A)   (3)

The information gain rate is calculated as:

Gain-rate(A) = Info_Gain(A) / SplitInfo(A),  where SplitInfo(A) = -Σi=1..n p(si)·log2 p(si)   (4)
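As a quick numerical illustration of formulas (1)-(4) (the numbers below are invented for this sketch and are not taken from the patent's data, and the denominator used for formula (4) is the standard C4.5 split information): suppose a data set of 10 records contains 5 'f' and 5 'z' labels, and an attribute A with two values splits it into one subset of 6 records (4 'f', 2 'z') and one of 4 records (1 'f', 3 'z').

```python
import math


def h(probabilities):
    """Entropy of a discrete distribution; the building block of formulas (1)-(2)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


H_S = h([5 / 10, 5 / 10])                                            # formula (1): 1.0 bit
H_A = (6 / 10) * h([4 / 6, 2 / 6]) + (4 / 10) * h([1 / 4, 3 / 4])    # formula (2)
gain = H_S - H_A                                                     # formula (3)
split_info = h([6 / 10, 4 / 10])                                     # denominator of formula (4)
gain_rate = gain / split_info                                        # formula (4)

print(round(H_S, 3), round(H_A, 3), round(gain, 3), round(gain_rate, 3))
# prints approximately: 1.0 0.875 0.125 0.128
```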

Algorithm application

Data description: the data used in the present invention were provided by a hospital in Dalian and were produced by actual measurements on hypertensive patients, 360 records in total. The laboratory reports mainly include the white blood cell count (WBC), absolute granulocyte count (Neu#), NT-proBNP, EF (ejection fraction), LVEF (left ventricular ejection fraction), hypertension grade, whether the patient has atrial fibrillation, and so on. Figure 2 shows some of the original data items.

Data preprocessing: the data file type run on the Weka platform is a .csv file, whereas our data file is an Excel spreadsheet, so the first step is to convert the data file into a .csv file. The indicators in the hospital data that the present invention does not consider are filtered out, leaving only the research objects. Abnormal data are deleted, and missing-value attributes are handled automatically by the J48 algorithm. Since the values of the 154-dimensional data span very different orders of magnitude, the present invention uses the relevant medical standards to purposefully extract 11 of these dimensions, namely the cardiac ultrasound indicators such as ef (ejection fraction), the a peak and the e peak, for more specific experiments. The simplified data are shown in Figure 3.
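The Excel-to-.csv conversion can be done in a couple of lines of Python with pandas; the file names below are placeholders invented for this sketch, not the actual files used in the experiments.

```python
import pandas as pd

# Placeholder file names; the real spreadsheet and its column set are not published here.
df = pd.read_excel("hypertension_patients.xlsx")

# Write a .csv file that the Weka Explorer can open directly.
df.to_csv("hypertension_patients.csv", index=False)
```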

Running environment: the Waikato Environment for Knowledge Analysis (WEKA) is free, non-commercial, open-source machine learning and data mining software based on Java, whose main developers are from New Zealand. The official website is http://weka.wikispaces.com/. As an open data mining platform, WEKA integrates a large number of machine learning algorithms that can undertake data mining tasks, including data preprocessing, association analysis, classification, regression, clustering and visualisation in a new interactive interface. Embedding WEKA into MyEclipse facilitates secondary development of WEKA: the latest data mining algorithms can be modified or added, and the mining results can be displayed in various forms, so that users can find the required knowledge conveniently and clearly. Before mining, JDBC needs to be configured and the database driver loaded. The Weka control platform and operation interface are shown in Figure 4. When using the Weka software, first open the control platform and select the first option, Explorer, to start the experiment; the opened interface is shown in the operation-interface figure. The first step is to select the experiment to be carried out through the Open file option, and then perform different experiments on the data according to the experimental requirements, with option parameters such as data preprocessing, classification algorithms, clustering algorithms and association rules. According to the experimental requirements, the present invention selects the J48 algorithm among the classification algorithms for its experiments. The software operation interface is shown in Figure 5.

Decision tree construction: the construction of a decision tree is not unique; unfortunately, constructing the optimal decision tree is an NP problem, so how to construct a good decision tree is the focus of the research. The present invention continuously adjusts the constructed decision tree by changing the parameters of the decision tree algorithm, so that both the accuracy and the branch attribute values of the constructed tree are optimal. The J48 algorithm has 11 modifiable parameters in total; binarySplits, debug, saveInstanceData, subtreeRaising, unpruned and useLaplace use their default values, and the five parameters confidenceFactor, minNumObj, numFolds, seed and reducedErrorPruning are modified. The experiments of the present invention mainly modify and verify these five parameters so as to approach the accurate values of the medical data, making the decision tree more accurate and more feasible. The Weka software is similar to a black box: once the processed data file is loaded into Weka, the desired algorithm is selected and its corresponding parameters are modified, the results can be produced. Experiments were conducted over the possible values of the parameters, and the optimal experimental results were finally selected as follows. The experiments are divided into two branches. The first part experiments on the 11 cardiac ultrasound attributes, with the last column as the class label, f for atrial fibrillation and z for normal. The experimental data contain 360 records in total, 186 males and 174 females; 178 have atrial fibrillation and 182 are normal (normal here refers to patients with pure hypertension). All parameters of the algorithm use their default values, and the experimental results are shown in Figure 6.
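The parameter sweep described here can also be scripted instead of being repeated by hand in the Explorer. The sketch below only builds Weka command lines over a small grid; the grid values, file name and classpath are assumptions of this sketch, while the option letters follow Weka's standard J48 command-line interface (-C for the pruning confidence, -M for the minimum number of instances per leaf, -t for the training file, -x for the number of cross-validation folds).

```python
import itertools

confidence_factors = [0.1, 0.25, 0.5]   # assumed grid, not the values used in the patent
min_num_objs = [2, 5, 10]

for c, m in itertools.product(confidence_factors, min_num_objs):
    cmd = [
        "java", "-cp", "weka.jar", "weka.classifiers.trees.J48",
        "-t", "heart_ultrasound.arff",  # placeholder training file (the .csv saved as ARFF)
        "-x", "10",                     # 10-fold cross-validation
        "-C", str(c),                   # confidenceFactor
        "-M", str(m),                   # minNumObj
    ]
    print(" ".join(cmd))                # run each line with a shell or subprocess.run
```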

From the decision tree above it can be seen that, among the cardiac ultrasound attributes, the three with the greatest influence on atrial fibrillation are the A peak, ef and lasd. Specifically, the root node of the decision tree is the A peak (when it disappears, atrial fibrillation has already occurred); this attribute has the largest information gain rate, and its normal range is 41 to 87. In the first branch we can see that when a <= 0 the patient develops atrial fibrillation; since there are no values below 0 in the data, this means that when a = 0 it can be judged that the patient develops atrial fibrillation. When a > 0 the ef attribute is considered next, and when the ef value is less than 58 the patient is judged to be normal. The rest of the decision tree is analysed in the same way. The decision tree accuracy screenshot includes the accuracy, error rate, Kappa value and so on, all of which can be used to evaluate the quality of the algorithm; the present invention mainly uses the accuracy as the basis of judgement. Figure 7 shows that the accuracy is 83.0556%.

The second part of the experimental data contains 308 records, with 154 patient characteristic indicators, including blood routine, thyroid function, coagulation profile, liver function, blood lipid, cardiac ultrasound and other test items, as the attribute columns; the last column is the class label, f for atrial fibrillation and z for normal. The data contain 162 males and 146 females; 128 patients have atrial fibrillation and 180 are normal. As above, all parameters of the algorithm use their default values, and the experimental results are shown in Figure 8.

From the decision tree we can see that, among the 154 attributes, those that play a role in atrial fibrillation are XGN (cardiac function grade), A peak (cardiac ultrasound indicator), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound indicator), EF (cardiac ultrasound indicator), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound indicator), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasound indicator). Some of these 13 indicators have not yet received enough attention in medicine, for example the influence of hemoglobin concentration on atrial fibrillation.

Specifically, in the decision tree the root node is XGN, which shows that this indicator plays a large role in the occurrence of atrial fibrillation. When the XGN grade is less than or equal to 1, the A peak is considered next; when the A peak is 0, FS is considered: when FS is greater than 0 the patient is judged to have atrial fibrillation, otherwise FJB is considered. When FJB is less than or equal to 0, LVPWD is considered; when LVPWD is less than or equal to 9, the value of EF (EF1 in the decision tree) is considered: when EF is less than or equal to 57 the patient is judged to be normal, otherwise atrial fibrillation. Backtracking to the right branch of LVPWD, when LVPWD is greater than 9 the value of FDMB1 is considered: when it is less than or equal to 101 the patient is judged to have atrial fibrillation; otherwise LAD is considered, and when LAD is less than or equal to 50 the patient is judged to have atrial fibrillation, otherwise normal. Backtracking to the right branch of FJB, when FJB is greater than 0, GXB is considered: when GXB is less than or equal to 2 the patient is judged to be normal, otherwise atrial fibrillation. Backtracking to the right branch of FS, when FS is greater than 0 the patient is judged to have atrial fibrillation. Backtracking to the right branch of the A peak, when A is greater than 0, TNB is considered: when TNB is less than or equal to 0 the patient is judged to be normal; otherwise FDMB is considered, and when FDMB is greater than 0 the patient is judged to be normal; otherwise the E value is considered, and when E is greater than 72 the patient is judged to have atrial fibrillation; otherwise the MCHC value is considered, and when MCHC is less than or equal to 338 the patient is judged to have atrial fibrillation, otherwise normal; and so on, traversing the entire decision tree. The accuracy of this model is 85.0649%.

Through the different experiments above, considering both the decision tree and its accuracy, the present invention selects Figure 8 as the final model. This model considers many factors and is relatively comprehensive, while remaining concise and clear for medical workers. The model has also received medical recognition.

Aiming at the problems that the medical community has no uniformly standardized model for predicting atrial fibrillation and that hypertensive patients have a higher probability of atrial fibrillation than ordinary people, the present invention draws on the medical literature reviewing atrial fibrillation prediction and proposes a decision-tree-based atrial fibrillation prediction method to solve this problem. Through this method, an intuitive and concise decision tree is established for medical research reference. The model combines a large amount of real medical data to guarantee its accuracy as comprehensively as possible; the accuracy of the model is 85.0649%. During model building, not only can the potential relationships between the medical indicators of hypertensive patients be mined, but it can also be discovered which indicators are more likely to cause atrial fibrillation, including some indicators that at first received little in-depth attention in medicine. In the next stage of work, the first point is to increase the amount of data so that the model generalizes better and overfitting is prevented; the second point is to use machine learning algorithms to achieve better classification and to build a practical, standardized decision tree.

Example 3:

In order to solve the problem of selecting indicators that more accurately reflect atrial fibrillation, the present invention constructs a method for selecting indicators for an atrial fibrillation artificial-intelligence experiment:

S1. Construct a decision tree;

S2. Adjust the parameters to optimize the decision tree;

S3. Experiment over the possible values of the parameters and finally select the optimal experimental result, which serves as the main indicators for the decision tree prediction.

Further, the main indicators are the three cardiac ultrasound attributes A peak, ef and lasd.

Further, the main indicators are XGN (cardiac function grade), A peak (cardiac ultrasound indicator), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound indicator), EF (cardiac ultrasound indicator), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound indicator), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasound indicator).

The method of constructing the decision tree is as described in Examples 1 and 2.

The present invention also relates to the application of the prediction decision tree in atrial fibrillation prediction.

Through artificial intelligence and big data processing, the present invention makes a more reasonable selection of the indicators for atrial fibrillation prediction; these indicators, obtained through big data processing, reflect atrial fibrillation more accurately, and using them to assess atrial fibrillation can reduce missed detections of atrial fibrillation. The present invention also provides a method for constructing a decision tree that can be used for atrial fibrillation prediction and fully explains the process of building that decision tree, so that a standard, mathematically constructed decision tree model can be established in the field of atrial fibrillation prediction; the decision tree has important guiding significance for determining some important reference indicators that affect atrial fibrillation.

Example 4:

Atrial fibrillation (AF) is one of the most common clinical arrhythmias, with a prevalence of approximately 0.4%-1.0% in the overall population, and the prevalence increases with age: studies show that the prevalence is only 0.1% in people under 55 years of age, but as high as 9% in people over 80. A common clinical complication of atrial fibrillation is systemic thromboembolism. Stroke is the main embolic event caused by atrial fibrillation and is also the complication with the highest disability rate in atrial fibrillation patients; compared with patients without atrial fibrillation, patients with atrial fibrillation have a 5-fold higher incidence of stroke and a 2-fold higher mortality. Ischemic stroke is the main reason for the increased mortality, atrial fibrillation is an independent risk factor for ischemic stroke, and its incidence increases with age. Other hazards of atrial fibrillation include heart failure caused by the loss of the atrial auxiliary pump function, sudden death caused by electrical disorders, and physical and mental impairment caused by irregular and fast ventricular rates.

Accurately predicting the occurrence of atrial fibrillation and applying effective preventive measures is an important part of the treatment of atrial fibrillation. At present, the diagnosis of atrial fibrillation is mainly based on the electrocardiogram and its extensions, such as the ambulatory electrocardiogram, the monitoring electrocardiogram and the implanted long-term electrocardiogram. In recent years the fusion of electrocardiogram technology with artificial intelligence has also achieved considerable results; however, although the accuracy of diagnosing atrial fibrillation based on the more than 100-year-old traditional electrocardiogram technology is high, the rate of missed diagnosis is also high, especially for infrequently occurring paroxysmal atrial fibrillation and for asymptomatic atrial fibrillation, whose harm is no less than that of symptomatic atrial fibrillation. The present technology develops a new atrial fibrillation diagnosis system based on clinical big data combined with artificial intelligence (AI), with a view to replacing the traditional electrocardiogram diagnostic technology, or at least serving as a screening and diagnosis system for patients at high risk of atrial fibrillation before electrocardiogram examination and as an important supplement to the classic electrocardiogram examination.

Methods and techniques: this study uses the information integration platform of the applicant's affiliated hospital, Zhongshan Hospital Affiliated to Dalian University, to analyse all clinical, imaging and laboratory data of hypertensive patients, and uses big data processing means, such as the decision tree method described in Example 3, to build an automatic intelligent diagnosis model such as a decision tree model. On this basis, the model is used to diagnose and analyse atrial fibrillation in hypertensive patients, the deep-learning capability of the AI system is further used to revise and continuously improve the model, and a complete artificial-intelligence diagnosis system for atrial fibrillation is finally developed. The present invention closely combines clinical big data with AI; through big data processing and AI self-learning, it will surely open up a new breakthrough for predicting the occurrence of AF and provide an important diagnostic means for atrial fibrillation prevention and treatment strategies.

AI model construction: the information integration platform of the applicant's affiliated hospital, Zhongshan Hospital Affiliated to Dalian University, is used to perform big data processing on the clinical data (medical history, physical examination, physical and chemical examinations, etc.) of the hypertensive patients registered in the hospital from January 2010 to December 2017, and a primary diagnostic model is established.

AI model verification: using the primary AI model, the relevant parameter data of patients hospitalised and diagnosed with hypertension in the hospital are entered into the computer to test the diagnostic ability of the AI model (including prediction sensitivity, specificity, coincidence rate and prediction efficiency).

AI model improvement: through the self deep-learning capability of AI, the model is continuously revised and improved, and gradually developed and perfected.

The above is only a preferred specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent replacement or change of the technical solution and inventive concept disclosed by the present invention that a person skilled in the art can make within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention.


Claims (2)

1. A judgment method based on an atrial fibrillation prediction decision tree, characterized in that:

the method for constructing the atrial fibrillation prediction decision tree comprises:

Step 1: if the data set S belongs to a single class, create a leaf node, mark it with the corresponding class label, and stop building the tree; otherwise, go to Step 2;

Step 2: calculate the information gain rate Gain-rate(A) of every attribute in the data set S;

Step 3: select the attribute A with the maximum information gain rate;

Step 4: make attribute A the root node of the decision tree T, where T is the decision tree to be constructed;

Step 5: divide the data set into several subsets according to the different values of attribute A, and for each subset Sv repeat Steps 1-4 to construct a subtree Tv, where Sv is the subset of samples for which attribute A takes the value v;

Step 6: add the subtree Tv to the corresponding branch of the decision tree T;

Step 7: when the loop ends, the decision tree T is obtained;

the atrial fibrillation prediction decision tree is a computer-implemented decision model whose decision process specifically comprises: when the root node of the decision tree is the A peak, this attribute has the largest information gain rate and its normal range is 41 to 87; in the first branch of the decision tree, when a <= 0, where a denotes the value of the A peak, the patient is judged to have atrial fibrillation, and since the data contain no values below 0 this condition in effect means a = 0; when a > 0, the ef attribute is considered next, and when the ef value is less than 58 the patient is judged to be normal;

when the root node of the decision tree is XGN: when the XGN grade is greater than 1, the patient is judged to have atrial fibrillation; when the XGN grade is less than or equal to 1, the A peak is considered next; when the A peak is 0, FS is considered next; when FS is greater than 0, the patient is judged to have atrial fibrillation, otherwise FJB is considered; when FJB is less than or equal to 0, LVPWD is considered; when LVPWD is less than or equal to 9, the EF value is considered next, and when EF is less than or equal to 57 the patient is judged to be normal, otherwise to have atrial fibrillation; backtracking to the right branch of LVPWD, when LVPWD is greater than 9, the value of FDMB1 is considered, and when this value is less than or equal to 101 the patient is judged to have atrial fibrillation, otherwise LAD is considered, and when LAD is less than or equal to 50 the patient is judged to have atrial fibrillation, otherwise the patient is judged to be normal; backtracking to the right branch of FJB, when FJB is greater than 0, GXB is considered, and when GXB is less than or equal to 2 the patient is judged to be normal, otherwise the patient is judged to have atrial fibrillation; backtracking to the right branch of FS, when FS is greater than 0 the patient is judged to have atrial fibrillation; backtracking to the right branch of the A peak, when A is greater than 0, TNB is considered, and when TNB is less than or equal to 0 the patient is judged to be normal, otherwise FDMB is considered; when FDMB is greater than 0 the patient is judged to be normal, otherwise the E value is considered; when E is greater than 72 the patient is judged to have atrial fibrillation, otherwise the MCHC value is considered; when MCHC is less than or equal to 338 the patient is judged to have atrial fibrillation, otherwise the patient is judged to be normal; in this way the entire decision tree is traversed;

wherein the splitting attribute is selected according to the information gain rate:

the information entropy is

H(S) = -Σi p(ci) log2 p(ci)    (1)

where S is the data set, ci is the i-th class of the data set, and p(ci) is the probability that class ci is selected;

when splitting the decision tree, what is generally calculated is the information entropy of a particular feature attribute; assuming that the feature attribute A has n distinct values, attribute A divides the data set S into n small data sets, denoted si, each selected with probability p(si); by formula (1), the information entropy of each small data set si is H(si), and the information entropy of the feature attribute A is

H(A) = Σi p(si) H(si)    (2)

the information gain is

Info_Gain(A) = H(S) - H(A)    (3)

and the information gain rate Gain-rate(A) is calculated according to formula (4).

2. The judgment method based on an atrial fibrillation prediction decision tree according to claim 1, characterized in that the method for pruning the atrial fibrillation prediction decision tree comprises:

1) calculating three kinds of predicted misclassification counts: the sum of the predicted misclassified samples over all leaf nodes of the subtree Tv, denoted E1; the number of predicted misclassified samples when the subtree Tv is pruned and replaced by a leaf node, denoted E2; and the number of predicted misclassified samples of the largest branch of the subtree Tv, denoted E3;

2) comparing them: when E1 is the smallest, no pruning is performed; when E2 is the smallest, pruning is performed and the subtree Tv is replaced by a leaf node; when E3 is the smallest, the subtree Tv is replaced by its largest branch.
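Read as an algorithm, Steps 1-7 of claim 1 describe a standard recursive construction. The following Python sketch is offered only as an illustration of that procedure under stated assumptions; it is not the patented implementation, and the names build_tree, gain_rate, samples and attributes are hypothetical.

from collections import Counter

def build_tree(samples, attributes, gain_rate):
    """Sketch of Steps 1-7 of claim 1.

    samples    -- list of (feature_dict, label) pairs
    attributes -- attribute names still available for splitting
    gain_rate  -- callable(samples, attribute) -> information gain rate
    """
    labels = [label for _, label in samples]

    # Step 1: if all samples share one class (or no attribute is left,
    # an added stopping condition), return a leaf with the class label.
    if len(set(labels)) == 1 or not attributes:
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    # Steps 2-3: compute the gain rate of every attribute, pick the largest.
    best = max(attributes, key=lambda a: gain_rate(samples, a))

    # Step 4: make the chosen attribute the root of this (sub)tree.
    tree = {"attribute": best, "branches": {}}

    # Step 5: split the data set by the values of the chosen attribute
    # and recurse on each subset Sv.
    values = {features[best] for features, _ in samples}
    remaining = [a for a in attributes if a != best]
    for v in values:
        subset = [(f, y) for f, y in samples if f[best] == v]
        # Step 6: attach the subtree Tv to the corresponding branch.
        tree["branches"][v] = build_tree(subset, remaining, gain_rate)

    # Step 7: the recursion ends and the decision tree T is returned.
    return tree

A gain_rate callable such as the one sketched after the formulas below could be passed in to drive Steps 2-3.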
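The entropy and gain quantities of formulas (1)-(3) can be illustrated in the same way. The helpers below are a minimal, hypothetical sketch: formula (4) itself did not survive extraction of the claim text, so the gain rate is assumed here to normalise the information gain by the attribute entropy H(A) of formula (2); dividing by the C4.5 split information would be an equally plausible reading.

import math
from collections import Counter

def entropy(labels):
    """Formula (1): H(S) = -sum_i p(ci) * log2 p(ci)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def attribute_entropy(samples, attribute):
    """Formula (2): H(A) = sum_i p(si) * H(si) over the subsets induced by A."""
    total = len(samples)
    groups = {}
    for features, label in samples:
        groups.setdefault(features[attribute], []).append(label)
    return sum((len(g) / total) * entropy(g) for g in groups.values())

def info_gain(samples, attribute):
    """Formula (3): Info_Gain(A) = H(S) - H(A)."""
    labels = [label for _, label in samples]
    return entropy(labels) - attribute_entropy(samples, attribute)

def gain_rate(samples, attribute):
    """Assumed form of formula (4): normalise the gain by H(A)."""
    h_a = attribute_entropy(samples, attribute)
    return info_gain(samples, attribute) / h_a if h_a else 0.0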
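The XGN-rooted decision process recited in claim 1 amounts to a chain of threshold tests. The function below transcribes those thresholds literally as nested conditionals; the dictionary keys are the attribute abbreviations used in the claim (XGN, A peak, FS, FJB, LVPWD, EF, FDMB1, LAD, GXB, TNB, FDMB, E, MCHC), and the function name classify_xgn_tree is illustrative rather than anything defined in the patent.

def classify_xgn_tree(p):
    """Traverse the XGN-rooted decision tree recited in claim 1.

    p is a dict keyed by the attribute abbreviations used in the claim.
    Returns "AF" (atrial fibrillation) or "normal".
    """
    if p["XGN"] > 1:
        return "AF"
    # XGN <= 1: consider the A peak next
    if p["A"] == 0:
        if p["FS"] > 0:
            return "AF"
        # FS <= 0: consider FJB
        if p["FJB"] <= 0:
            if p["LVPWD"] <= 9:
                return "normal" if p["EF"] <= 57 else "AF"
            # LVPWD > 9: consider FDMB1
            if p["FDMB1"] <= 101:
                return "AF"
            return "AF" if p["LAD"] <= 50 else "normal"
        # FJB > 0: consider GXB
        return "normal" if p["GXB"] <= 2 else "AF"
    # A peak > 0: consider TNB
    if p["TNB"] <= 0:
        return "normal"
    if p["FDMB"] > 0:
        return "normal"
    if p["E"] > 72:
        return "AF"
    return "AF" if p["MCHC"] <= 338 else "normal"

For example, a record with XGN = 0, A = 45, TNB = 1, FDMB = 0 and E = 80 would be classified as atrial fibrillation by the E > 72 test.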
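Claim 2 prunes each subtree Tv by comparing three misclassification counts. A minimal sketch of that comparison, assuming E1, E2 and E3 have already been estimated on the tree, is given below; how ties are broken is not specified in the claim, and this sketch resolves them in the order E1, E2, E3.

def prune_decision(e1, e2, e3):
    """Claim 2, step 2): decide what to do with subtree Tv.

    e1 -- misclassified samples summed over all leaves of Tv
    e2 -- misclassified samples if Tv is replaced by a single leaf
    e3 -- misclassified samples of the largest branch of Tv
    """
    smallest = min(e1, e2, e3)
    if smallest == e1:
        return "keep subtree"             # E1 smallest: do not prune
    if smallest == e2:
        return "replace with leaf"        # E2 smallest: prune Tv to a leaf
    return "replace with largest branch"  # E3 smallest: graft the largest branch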
CN201811068303.1A 2018-09-13 2018-09-13 Atrial fibrillation prediction decision tree and its pruning method Active CN110895969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811068303.1A CN110895969B (en) 2018-09-13 2018-09-13 Atrial fibrillation prediction decision tree and its pruning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811068303.1A CN110895969B (en) 2018-09-13 2018-09-13 Atrial fibrillation prediction decision tree and its pruning method

Publications (2)

Publication Number Publication Date
CN110895969A CN110895969A (en) 2020-03-20
CN110895969B true CN110895969B (en) 2023-12-15

Family

ID=69785498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811068303.1A Active CN110895969B (en) 2018-09-13 2018-09-13 Atrial fibrillation prediction decision tree and its pruning method

Country Status (1)

Country Link
CN (1) CN110895969B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115137369B * 2021-03-30 2023-10-20 Huawei Technologies Co., Ltd. Electronic equipment and systems for early warning of atrial fibrillation based on different stages of atrial fibrillation
CN113598741B * 2021-06-30 2024-03-22 Hefei University of Technology Atrial fibrillation evaluation model training method, atrial fibrillation evaluation method and atrial fibrillation evaluation device
CN115423224B * 2022-11-04 2023-04-18 Foshan E-Government Technology Co., Ltd. Secondary water supply amount prediction method and device based on big data and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214213A (en) * 2011-05-31 2011-10-12 中国科学院计算技术研究所 Method and system for classifying data by adopting decision tree
WO2015089484A1 (en) * 2013-12-12 2015-06-18 Alivecor, Inc. Methods and systems for arrhythmia tracking and scoring
CN107296604A (en) * 2017-08-29 2017-10-27 心云(北京)医疗器械有限公司 A kind of atrial fibrillation determination methods
CN107610771A (en) * 2017-08-23 2018-01-19 上海电力学院 A kind of medical science Testing index screening technique based on decision tree

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214213A (en) * 2011-05-31 2011-10-12 中国科学院计算技术研究所 Method and system for classifying data by adopting decision tree
WO2015089484A1 (en) * 2013-12-12 2015-06-18 Alivecor, Inc. Methods and systems for arrhythmia tracking and scoring
CN107610771A (en) * 2017-08-23 2018-01-19 上海电力学院 A kind of medical science Testing index screening technique based on decision tree
CN107296604A (en) * 2017-08-29 2017-10-27 心云(北京)医疗器械有限公司 A kind of atrial fibrillation determination methods

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Comparison of decision tree pruning methods; Wei Hongning; Journal of Southwest Jiaotong University (Issue 01); pp. 44-48 *
Research on decision tree classification for data mining; Ju Hui; Fujian Computer (Issue 12); pp. 96-97 *
Echocardiographic observation and analysis of patients with atrial fibrillation; Li Nuo et al.; Chinese Journal of Ultrasound Diagnosis (Issue 05); pp. 333-334 *

Also Published As

Publication number Publication date
CN110895969A (en) 2020-03-20

Similar Documents

Publication Publication Date Title
Lutimath et al. Prediction of heart disease using machine learning
CN112786204A (en) Machine learning diabetes onset risk prediction method and application
CN110895969B (en) Atrial fibrillation prediction decision tree and its pruning method
CN108256452A (en) A kind of method of the ECG signal classification of feature based fusion
Laxmikant et al. An efficient approach to detect diabetes using XGBoost classifier
Singh et al. Prominent features based chronic kidney disease prediction model using machine learning
Ahmed et al. Predicting diabetes using distributed machine learning based on apache spark
Bhukya et al. Detection and classification of cardiac arrhythmia using artificial intelligence
CN115607166B (en) A method and system for intelligent analysis of ECG signals, and an intelligent ECG auxiliary system
CN110895669A (en) A method for constructing a decision tree for atrial fibrillation prediction
Janghorbani et al. Prediction of acute hypotension episodes using logistic regression model and support vector machine: A comparative study
CN118312831A (en) Multi-mode heart data processing optimization method and device and computer equipment
Harris et al. Preoperative risk prediction of major cardiovascular events in noncardiac surgery using the 12-lead electrocardiogram: an explainable deep learning approach
CN117312958A (en) Oxygenation index prediction method based on continuous noninvasive parameters
Rahman et al. Quantifying uncertainty of a deep learning model for atrial fibrillation detection from ECG signals
Li et al. M-XAF: Medical explainable diagnosis system of atrial fibrillation based on medical knowledge and semantic representation fusion
Erdogan et al. Prediction of Major Adverse Cardiac Events After Transcatheter Aortic Valve Implantation: A Machine Learning Approach with GRACE Score
CN110895972A (en) Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction
Muhammad et al. Machine Learning Analysis of Heartbeat Sounds Using Weka for Classification
Xue et al. Clinical Application of AI-ECG
Bernatavičienė et al. Rule induction for ophthalmological data classification
Wang et al. Application of Artificial Intelligence Technology in Clinical Auxiliary Diagnosis and Treatment Equipment
Gadde et al. Automated Detection of Heart Disease Using PhysioNet ECG Signals and Machine Learning Models
Gilani Machine learning classifiers for critical cardiac conditions
Alcober et al. Predicting the Mortality of Female Patients suffering from Myocardial Infarction using Data Mining Methods: A Comparison

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant