UDC 631.52; 630*165.3
Lev Vladimirovich Utkin, Doctor of Technical Sciences, Professor,
lev.utkin@mail.ru
Yulia Aleksandrovna Zhuk, Candidate of Pedagogical Sciences, Associate Professor,
zhuk_yua@mail.ru
Aleksandr Anatolyevich Egorov, Candidate of Biological Sciences, Associate Professor,
egorovfta@yandex.ru
Nikolai Pavlovich Vasilyev, Candidate of Technical Sciences, Associate Professor
Elena Kirillovna Potokina, Doctor of Biological Sciences, Professor
Anatoly Igorevich Chekh, postgraduate student
A DNA MARKER SELECTION ALGORITHM
BASED ON THE FRÉCHET INEQUALITY
Keywords: objective function, forest-forming tree species, phenotype, conditional probability.
Breeding the main forest-forming tree species with the help of DNA marking, in order to develop new hybrid and varietal forms, is one of the directions that drive the innovative development of biotechnology. The most important tools for pursuing it are efficient bioinformatics algorithms and models, which are necessary links in the technological chain of marker-based breeding of forest-forming species.
The expression of a "useful" trait results from the interaction of several genes. Each gene (locus) that contributes to the variation of a quantitative trait is called a QTL (quantitative trait locus). The size of each gene's contribution to the expression of a quantitative trait is determined by environmental factors. A difference in trait expression between two plants grown under the same external conditions is explained by the fact that the same genes are "represented" by different alleles in different plants. In different individuals, linear stretches of DNA may differ because one nucleotide is replaced by another (SNP, single nucleotide polymorphism) or for other reasons (deletions and insertions). These differences, or SNPs, may determine the expression of a given trait. The main task of SNP-marker (DNA-marker) selection is to find the subsets of SNPs that, unlike the other SNPs, have the greatest influence on the variability of traits, i.e. of the phenotype.
The input data for the problem are the observed values of some trait (phenotype) that varies in a population of recombinants obtained by crossing two parental genotypes, or among different specimens of plants of the same type, together with a binary table whose rows correspond to SNPs and whose columns correspond to the genotypes of the recombinant progeny.
Many models and software packages implement various algorithms for identifying SNP markers. Most of these models, however, take into account neither the binary nature of the data in the analyzed databases nor the correlation between SNP markers. Moreover, the number of SNP markers in these databases is orders of magnitude larger than the number of progeny studied, which makes it impossible to build classification or regression models correctly with standard techniques. We therefore propose a new algorithm for finding the subset of SNP markers that matter most for determining a given trait; the algorithm takes the correlation between SNP markers into account.
The new algorithm uses the Fréchet inequality to compute joint probability distributions of alleles for the selected subsets of correlated SNP markers. It is important to note that the alleles are assumed to follow a Bernoulli distribution.
The subsets of correlated SNP markers are formed by a modification of the Add-Del algorithm [1, 7]: a given number of SNPs that maximize the objective function are added to the already selected subset of the most important SNP markers. The objective function is the absolute value of the difference between the expected phenotype values of two classes of specimens. After the addition step, Add-Del removes the "weakest" SNPs from the subset. The procedure is repeated a given number of times.
Taken together, the proposed algorithms form a general method for finding the set of SNPs that have the greatest influence on the variation of traits, i.e. of the phenotype.
Main approaches to SNP marker selection. A great many methods and algorithms for SNP marker selection rest on the assumption that the marker set contains irrelevant and redundant elements. Roughly speaking, irrelevant markers carry no useful information about changes in the phenotype values, while redundant markers carry information that is already present in more informative markers. Removing irrelevant and redundant markers is one of the goals of SNP marker selection.
In classification problems, where the phenotype takes two values, three main approaches to marker selection are used. The first, filter methods, relies on statistical properties of the markers and assumes that these properties differ between classes [2]. If a marker has different probability distributions in the two classes, a filter method measures the distance between those distributions. Measures of the difference between the two class distributions include the Fisher criterion, the t-statistic, the Kullback-Leibler divergence, and others.
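As a rough illustration of a filter method, the sketch below ranks binary markers by the absolute value of a two-sample (Welch) t-statistic between the two classes. The function names `t_statistic` and `rank_markers` and the toy genotype table are assumptions made for this example only; they are not taken from the paper, and each class is assumed to contain at least two specimens.

```python
import math

def t_statistic(col_a, col_b):
    """Welch t-statistic between two samples of 0/1 genotype values."""
    na, nb = len(col_a), len(col_b)
    mean_a, mean_b = sum(col_a) / na, sum(col_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in col_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in col_b) / (nb - 1)
    se = math.sqrt(var_a / na + var_b / nb)
    if se == 0.0:
        # No within-class variation: the marker either separates the classes
        # perfectly or carries no information at all.
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / se

def rank_markers(genotypes, labels):
    """Return (marker indices sorted by decreasing |t|, list of |t| scores).

    genotypes: one row of 0/1 values per specimen; labels: -1 or 1 per specimen.
    """
    m = len(genotypes[0])
    scores = []
    for j in range(m):
        neg = [row[j] for row, y in zip(genotypes, labels) if y == -1]
        pos = [row[j] for row, y in zip(genotypes, labels) if y == 1]
        scores.append(abs(t_statistic(neg, pos)))
    return sorted(range(m), key=lambda j: -scores[j]), scores

# Hypothetical data: 6 specimens, 3 markers.
genotypes = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0], [0, 1, 1]]
labels = [-1, -1, -1, 1, 1, 1]
order, scores = rank_markers(genotypes, labels)
print(order, scores)
```

A filter of this kind scores every marker on its own, so it cannot see the correlations between markers that the rest of the paper is concerned with.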
The second approach, wrapper methods, often gives a more accurate solution than filter methods but requires considerably more computation [6]. Wrapper methods can also be applied when the phenotype takes arbitrary values. One of the most popular wrapper methods is recursive feature elimination (SVM-RFE), which is built on the support vector machine (SVM).
The third approach uses embedded methods, which select DNA markers while the classification or regression problem itself is being solved. The most common embedded methods replace the quadratic regularization (smoothing) term in the SVM objective function with the L1 or L0 norm. Among embedded methods, the LASSO regression method [5] deserves mention as one of the most effective and popular. It assumes, however, that the SNPs are statistically independent and that the phenotype depends linearly on the SNP values. To handle nonlinear dependencies, a number of LASSO modifications have been proposed that use kernels to transform the space of examples rather than the feature space, as the standard SVM does. Because suitable kernels are hard to specify, these modifications have not found wide use.
It should also be said that existing SNP selection methods do not fully take into account the structure of the data formed by the SNP values. Nor do existing approaches always take into account the selection principle appropriate to each specific applied problem.
An algorithm based on the Bernoulli distribution
Consider the formal statement of the SNP-marker subset selection problem. Suppose we have a set of observations $(z_i, y_i)$, $i = 1, \ldots, n$, such that $z_i \in \{0,1\}^m$ and $y_i \in \{-1, 1\}$. Consider two subsets of variables $z_i^- \in \{0,1\}^m$ and $z_i^+ \in \{0,1\}^m$ such that every $z_i^-$ belongs to class $y = -1$ and every $z_i^+$ belongs to class $y = 1$.
It is shown in [3] that the joint probability distribution of correlated binary variables $Z_1, \ldots, Z_m$ can be written as

\[
p(z_1, \ldots, z_m) = \left( \prod_{j=1}^{m} p_j^{z_j} q_j^{1-z_j} \right) \left( 1 + \sum_{i<j} \rho_{ij} u_i u_j + \sum_{i<j<k} \rho_{ijk} u_i u_j u_k + \ldots + \rho_{1,2,\ldots,m}\, u_1 \cdots u_m \right).
\]

Here

\[
p_j = \Pr\{Z_j = 1\} = 1 - q_j, \quad j = 1, \ldots, m,
\]
\[
U_j = \frac{Z_j - p_j}{\sqrt{p_j q_j}}, \quad u_j = \frac{z_j - p_j}{\sqrt{p_j q_j}}, \quad \rho_{j_1 j_2 \ldots j_k} = E\left[ U_{j_1} U_{j_2} \cdots U_{j_k} \right].
\]
Note that the first factor on the right-hand side of the expression for $p(z_1, \ldots, z_m)$ is the joint probability distribution under the assumption that all the variables are independent. The second factor contains all the statistical correlations, from the pairwise (second-order) ones up to order $m$. Note also that $U_j$ is a standardized variable, which is estimated after $p_j$ has been estimated.
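The representation can be checked numerically for a small $m$. The sketch below takes a joint distribution on $\{0,1\}^m$, computes the marginals $p_j$ and all correlations $\rho_{j_1 \ldots j_k}$, and rebuilds the distribution from the two factors. The function name `bahadur_from_pmf` and the example distribution are illustrative assumptions; every marginal $p_j$ is assumed to lie strictly between 0 and 1.

```python
import itertools
import math

def bahadur_from_pmf(pmf, m):
    """Rebuild a joint pmf on {0,1}^m through the full Bahadur representation."""
    points = list(itertools.product((0, 1), repeat=m))
    # Marginals p_j = Pr{Z_j = 1} and q_j = 1 - p_j.
    p = [sum(pmf[z] for z in points if z[j] == 1) for j in range(m)]
    q = [1.0 - pj for pj in p]

    def u(z, j):
        # Standardized value u_j of the j-th coordinate of z.
        return (z[j] - p[j]) / math.sqrt(p[j] * q[j])

    # rho_{j1...jk} = E[U_j1 ... U_jk] for every index set of size >= 2.
    rho = {}
    for k in range(2, m + 1):
        for idx in itertools.combinations(range(m), k):
            rho[idx] = sum(pmf[z] * math.prod(u(z, j) for j in idx) for z in points)

    rebuilt = {}
    for z in points:
        independent = math.prod(p[j] if z[j] else q[j] for j in range(m))
        correction = 1.0 + sum(r * math.prod(u(z, j) for j in idx)
                               for idx, r in rho.items())
        rebuilt[z] = independent * correction
    return rebuilt

# Hypothetical joint distribution on {0,1}^3 with weights 1/36, ..., 8/36.
points = list(itertools.product((0, 1), repeat=3))
pmf = {z: (i + 1) / 36 for i, z in enumerate(points)}
rebuilt = bahadur_from_pmf(pmf, 3)
```

The number of correlation terms is $2^m - m - 1$, which is why the full representation is practical only for very small subsets.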
The corresponding estimates of the probabilities and correlation coefficients $p_j$, $u_j$, $\rho_{j_1 j_2 \ldots j_k}$ for class $i$ (here $i$ takes the values $-1$ and $1$), denoted $\hat p_j^{(i)}$, $\hat u_j^{(i)}$, $\hat\rho^{(i)}_{j_1 j_2 \ldots j_k}$, are computed as

\[
\hat p_j^{(i)} = \frac{1}{n^{(i)}} \sum_{l=1}^{n^{(i)}} x_{jl}^{(i)} = 1 - \hat q_j^{(i)}, \quad j = 1, \ldots, m, \qquad
\hat\rho^{(i)}_{j_1 j_2 \ldots j_k} = \frac{1}{n^{(i)}} \sum_{l=1}^{n^{(i)}} \hat u_{j_1 l}^{(i)} \cdots \hat u_{j_k l}^{(i)},
\]

where $n^{(i)}$ is the number of observations in class $i$ and

\[
\hat u_{jl}^{(i)} = \frac{x_{jl}^{(i)} - \hat p_j^{(i)}}{\sqrt{\hat p_j^{(i)} \hat q_j^{(i)}}}
\]

is the $l$-th observed value of the variable $U_j$ in class $i$.
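A minimal sketch of these estimates for a single class is given below; it is restricted to pairwise correlations to keep the example short. The function name `class_estimates` and the four toy observations are assumptions for illustration, and every marker is assumed to take both values within the class, so that $0 < \hat p_j < 1$.

```python
import math

def class_estimates(x):
    """Estimate p_j, q_j, standardized values u_jl and pairwise rho_jk for one class.

    x: observations of one class, each a list of m values in {0, 1}.
    """
    n, m = len(x), len(x[0])
    p = [sum(row[j] for row in x) / n for j in range(m)]
    q = [1.0 - pj for pj in p]
    # u[l][j] is the standardized value of marker j in observation l.
    u = [[(row[j] - p[j]) / math.sqrt(p[j] * q[j]) for j in range(m)] for row in x]
    rho = {(j, k): sum(ul[j] * ul[k] for ul in u) / n
           for j in range(m) for k in range(j + 1, m)}
    return p, q, u, rho

# Hypothetical class with 4 specimens and 3 markers; markers 0 and 1 always agree.
p, q, u, rho = class_estimates([[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]])
print(p, rho)
```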
Consider first the main elements of the algorithm. Denote the mean age of the parents (Steptoe and Morex) by $a_0$; age plays the role of the phenotype here. Order all specimens by their phenotype value and split the ordered set into two subsets corresponding to two classes. The first class, labelled (1), consists of the specimens whose age does not exceed $a_0$; the second class, labelled (2), consists of the specimens whose age exceeds $a_0$. For each class $i = 1, 2$ and each SNP $j$ in the set $S$, estimate the probability $p_j^{(i)}$ that a specimen of class $i$ inherits a particular value of the $j$-th SNP, say 0 for definiteness, i.e. $p_j^{(i)} = \Pr\{Z_j = 0 \mid \text{class is } i\}$. Also estimate the parameters $\rho_{j_1 j_2 \ldots j_k}$ and $u_j$ needed to compute the probability $p_k(z(S) \mid \text{class is } i)$ of the $k$-th example of class $i$. Here $z(S)$ is the vector of SNPs whose indices belong to $S$; for example, if $S = \{12, 43, 67\}$, then $z(S) = (z_{12}, z_{43}, z_{67})$. For brevity, denote this probability by $p_k^{(i)}(S)$. Given the phenotype values $g_k^{(i)}$, $k = 1, \ldots, n^{(i)}$, of the examples in each class $i$, the expected phenotype values $R(S, i)$, $i = 1, 2$, can be found as follows:
\[
R(S, i) = \sum_{k=1}^{n^{(i)}} g_k^{(i)} P_k^{(i)}(S), \quad \text{where} \quad P_k^{(i)}(S) = \frac{p_k^{(i)}(S)}{\sum_{j=1}^{n^{(i)}} p_j^{(i)}(S)}.
\]
We use the ratio $P_k^{(i)}(S)$ instead of $p_k^{(i)}(S)$ in order to satisfy the normalization condition $P_1^{(i)}(S) + \ldots + P_{n^{(i)}}^{(i)}(S) = 1$.
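The normalized expectation is simple to compute once the per-example probabilities are available. The helper below is a sketch; the name `expected_phenotype` and the numbers in the example are hypothetical.

```python
def expected_phenotype(g, p):
    """R(S, i) = sum_k g_k * P_k, where P_k = p_k / sum_j p_j.

    g: phenotype values of the examples of one class;
    p: (unnormalized) probabilities p_k(S) of the same examples.
    """
    total = sum(p)
    if total <= 0:
        raise ValueError("the probabilities must have a positive sum")
    return sum(gk * pk / total for gk, pk in zip(g, p))

# Hypothetical class with three examples.
r = expected_phenotype([10.0, 20.0, 30.0], [0.1, 0.1, 0.2])
print(r)
```

Dividing by the sum makes the weights add up to one even though the raw probabilities $p_k^{(i)}(S)$ of the observed genotypes generally do not.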
The difference or the ratio of the expected values $R(S, 1)$ and $R(S, 2)$ shows how strongly the SNP set $S$ separates examples of different classes. The larger the absolute difference $|R(S, 1) - R(S, 2)|$, the more significant the set $S$. The same can be said of the ratio $R(S, 1) / R(S, 2)$.
The second feature of the proposed algorithm is the assumption that the most important SNP is known, say the SNP with index $s_0$. Starting from it, one can obtain a given number $r$ of SNPs that may be correlated with SNP $s_0$ and that jointly affect the difference between the expected values $R(S, 1)$ and $R(S, 2)$.
The next step is to apply a modification of the Add-Del algorithm [1, 7] to the SNP selection problem. Denote $D(S) = |R(S, 1) - R(S, 2)|$.
The resulting algorithm for finding significant SNP markers is given below.
Algorithm 1.
Input: $s_0$ (the most important SNP), $s$ (the number of significant SNPs), $T$ (the training set), $N = \{1, \ldots, m\}$, $l_a$ (the number of SNPs added in the Add-Del algorithm), $l_d$ (the number of SNPs removed in the Add-Del algorithm).
Output: $S$ (the set of significant SNPs).
Order $T$ by phenotype value within each class.
$l \leftarrow 1$, $S \leftarrow \{s_0\}$.
Repeat: Compute $D(S)$.
For each $k \in N \setminus S$, add the $k$-th SNP to the set $S$, i.e. $S^* \leftarrow S \cup \{k\}$.
Compute $D(S^*)$ and $Q_k \leftarrow D(S^*) - D(S)$.
$l \leftarrow l + 1$; $S \leftarrow S \cup \{k_{opt}\}$, where $k_{opt}$ is the index with the largest $Q_k$.
While $l < l_a$.
$l \leftarrow 1$.
Repeat: Compute $D(S)$.
For each $j \in S$, remove the $j$-th SNP from the set $S$, i.e. $S^* \leftarrow S \setminus \{j\}$.
Compute $D(S^*)$ and $Q_j \leftarrow D(S^*) - D(S)$.
$l \leftarrow l + 1$; $S \leftarrow S \setminus \{j_{opt}\}$, where $j_{opt}$ is the index with the largest $Q_j$.
While $l \le l_d$ or $\mathrm{card}(S) < s$.
End of algorithm.
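The add and delete phases can be sketched in a few lines. The version below is not a transcription of Algorithm 1: the stopping rule (grow until the subset reaches the target size, then trim), the requirement `n_add > n_del`, the `max_rounds` safeguard and the rule that the seed marker is never removed are assumptions of this sketch. The objective is passed in as a callable, so any definition of $D(S)$ can be plugged in; the additive toy objective used here is purely hypothetical.

```python
def add_del(objective, s0, candidates, target_size, n_add, n_del, max_rounds=100):
    """Alternate greedy additions and deletions around a fixed seed marker s0.

    objective: callable mapping a frozenset of marker indices to D(S);
    candidates: iterable of all marker indices N;
    target_size: desired number of markers s (at least 1);
    n_add, n_del: markers added / removed per round, with n_add > n_del.
    """
    if target_size < 1 or n_add <= n_del:
        raise ValueError("need target_size >= 1 and n_add > n_del")
    pool = set(candidates)
    selected = {s0}

    def add_best():
        # Add the free marker that gives the largest D(S + k).
        free = pool - selected
        if free:
            selected.add(max(free, key=lambda k: objective(frozenset(selected | {k}))))

    def drop_weakest():
        # Remove the marker whose removal leaves the largest D(S - j); keep s0.
        removable = selected - {s0}
        if removable:
            selected.remove(max(removable,
                                key=lambda j: objective(frozenset(selected - {j}))))

    for _ in range(max_rounds):
        if len(selected) >= target_size:
            break
        for _ in range(n_add):
            add_best()
        for _ in range(n_del):
            drop_weakest()
    while len(selected) > target_size:
        drop_weakest()
    return selected

# Hypothetical additive objective: each marker contributes a fixed weight.
weights = {0: 5.0, 1: 1.0, 2: 4.0, 3: 3.0, 4: 0.5}
chosen = add_del(lambda S: sum(weights[k] for k in S), 0, weights, 3, 2, 1)
print(sorted(chosen))
```

With an additive objective the greedy search is trivially optimal; the alternation matters when markers interact, because a marker added early can become redundant once a correlated one has joined the subset.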
Linear-fractional programming for finding SNP markers
The Bahadur representation is a fairly general tool for computing a joint probability distribution that accounts for correlation between binary variables. Its main drawback is the extremely large number of parameters that must be computed and used to obtain the joint probability. To cope with this complexity, a truncated distribution has been proposed that keeps only the second-order correlations, i.e. the correlations between pairs of binary variables. The joint probability then takes the form

\[
p(S) = \left( \prod_{i \in S} p_i^{z_i} q_i^{1-z_i} \right) \left( 1 + \sum_{i<j,\; i,j \in S} \rho_{ij} u_i u_j \right).
\]
This representation does simplify the computation of joint probabilities considerably. The main obstacle to its use, however, is that the resulting "probabilities" may be negative. In addition, it is impossible to assess how dropping the higher-order correlations affects the joint probability. To avoid these problems, we propose to use the Fréchet bounds on joint probabilities, which have the form

\[
\max\left( 0,\; \sum_{j \in S} p(z_j) - (m_S - 1) \right) \le p(S) \le \min_{j \in S} p(z_j),
\]
where $p(z_j)$ is the marginal probability of the observed genotype value of the $j$-th SNP and $m_S$ is the number of SNPs in $S$.
Substituting the values of $p(z_j)$, the Fréchet inequalities become

\[
\max\left( 0,\; \sum_{j \in S} p_j^{z_j} q_j^{1-z_j} - (m_S - 1) \right) \le p(S) \le \min_{j \in S} p_j^{z_j} q_j^{1-z_j}. \tag{1}
\]
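The contrast between the truncated representation and the bounds (1) can be seen on a small example. In the sketch below, three SNPs are perfectly correlated with $p_j = 0.2$; for the genotype $(1, 0, 0)$ the second-order Bahadur value is negative, while the Fréchet bounds still form a valid interval. The function names and the numerical values are assumptions chosen for illustration.

```python
import math

def marginal(p_j, z_j):
    """p_j^{z_j} q_j^{1-z_j}: probability of the observed value of one SNP."""
    return p_j if z_j == 1 else 1.0 - p_j

def frechet_bounds(p, z):
    """Lower and upper Frechet bounds (1) on the joint probability of genotype z."""
    margins = [marginal(pj, zj) for pj, zj in zip(p, z)]
    lower = max(0.0, sum(margins) - (len(margins) - 1))
    upper = min(margins)
    return lower, upper

def truncated_bahadur(p, z, rho):
    """Second-order Bahadur value; rho maps pairs (i, j), i < j, to correlations."""
    u = [(zj - pj) / math.sqrt(pj * (1.0 - pj)) for pj, zj in zip(p, z)]
    independent = math.prod(marginal(pj, zj) for pj, zj in zip(p, z))
    correction = 1.0 + sum(r * u[i] * u[j] for (i, j), r in rho.items())
    return independent * correction

p = [0.2, 0.2, 0.2]                              # Pr{Z_j = 1} for three SNPs
rho = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}    # perfectly correlated SNPs
z = (1, 0, 0)

value = truncated_bahadur(p, z, rho)   # about -0.096, not a probability
lower, upper = frechet_bounds(p, z)    # interval [0, 0.2]
print(value, lower, upper)
```

For perfectly correlated SNPs the genotype $(1, 0, 0)$ has true probability 0, which lies inside the interval returned by `frechet_bounds`.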
Thus the exact probability $p(S)$ of each example is unknown, but we know that it lies in an interval whose bounds are computed for every $S$ and every specimen. It follows that the expected phenotype value $R(S, i)$, $i = 1, 2$, cannot be computed for each class. One can only find its lower and upper bounds $\underline{R}(S, i)$ and $\overline{R}(S, i)$ for each class, which are obtained by solving the following linear-fractional programming problems:
\[
\underline{R}(S, i) = \min_{p_k^{(i)}(S)} \frac{\sum_{k=1}^{n^{(i)}} g_k^{(i)} p_k^{(i)}(S)}{\sum_{j=1}^{n^{(i)}} p_j^{(i)}(S)}, \qquad
\overline{R}(S, i) = \max_{p_k^{(i)}(S)} \frac{\sum_{k=1}^{n^{(i)}} g_k^{(i)} p_k^{(i)}(S)}{\sum_{j=1}^{n^{(i)}} p_j^{(i)}(S)},
\]

subject to constraints (1).
These problems can be reduced to ordinary linear programming by means of the Charnes-Cooper transformation [4]. In particular, the first optimization problem becomes
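\[
\underline{R}(S, i) = \min_{y,\, t} \sum_{k=1}^{n^{(i)}} g_k^{(i)} y_k
\]

subject to

\[
y_k - t \cdot \min_{j \in S} p_j^{z_j} q_j^{1-z_j} \le 0, \quad k = 1, \ldots, n^{(i)},
\]
\[
-y_k + t \cdot \max\left( 0,\; \sum_{j \in S} p_j^{z_j} q_j^{1-z_j} - (m_S - 1) \right) \le 0, \quad k = 1, \ldots, n^{(i)},
\]
\[
\sum_{k=1}^{n^{(i)}} y_k = 1, \quad y_k \ge 0, \quad k = 1, \ldots, n^{(i)},
\]

where the Fréchet bounds in the $k$-th pair of constraints are evaluated at the genotype $z$ of the $k$-th example, $y_k = t\, p_k^{(i)}(S)$, and $t \ge 0$ is the auxiliary variable of the transformation.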
The problem for the upper bound $\overline{R}(S, i)$ can be written in the same way.
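For small examples the bounds can be obtained without an LP solver. The sketch below does not implement the Charnes-Cooper linear program; it swaps in a direct enumeration that works only because constraints (1) restrict each $p_k$ to its own interval: the extreme values of a weighted mean are then reached when the examples with the smallest (or largest) phenotype values take their upper bound and all the others take their lower bound. The function name `phenotype_bounds` and the numbers are assumptions for illustration.

```python
def phenotype_bounds(g, lower, upper):
    """Bounds of sum_k g_k p_k / sum_k p_k over lower_k <= p_k <= upper_k.

    g: phenotype values of the examples of one class;
    lower, upper: Frechet bounds (1) on each example's probability.
    Assumes sum(upper) > 0.
    """
    def minimum(values):
        order = sorted(range(len(values)), key=lambda k: values[k])
        best = None
        # The `cut` examples with the smallest values get their upper bound,
        # the remaining ones their lower bound.
        for cut in range(len(order) + 1):
            p = [upper[k] if pos < cut else lower[k] for pos, k in enumerate(order)]
            total = sum(p)
            if total > 0:
                ratio = sum(values[k] * pk for k, pk in zip(order, p)) / total
                best = ratio if best is None else min(best, ratio)
        return best

    r_low = minimum(g)
    # The maximum of the ratio is minus the minimum for negated phenotype values.
    r_high = -minimum([-gk for gk in g])
    return r_low, r_high

# Hypothetical class with three examples and identical probability intervals.
r_low, r_high = phenotype_bounds([10.0, 20.0, 30.0], [0.1, 0.1, 0.1], [0.3, 0.3, 0.3])
print(r_low, r_high)
```

In this example the expected phenotype can lie anywhere between 16 and 24, although every example has the same interval $[0.1, 0.3]$.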
The next question is how to compare subsets $S$. Note that the values of $R$ for the first class are smaller than the corresponding values of $R$ for the second class. We need to find the SNPs that produce the largest difference between the expected phenotype values $R$ of the two classes. In other words, among all subsets of SNPs we need the subset $S$ that gives the largest difference
$D(S) = |R(S, 1) - R(S, 2)|$.
However, the exact values of $R(S, 1)$ and $R(S, 2)$ are unknown; only their bounds are known. Hence different decision strategies can be applied. We use the maximin strategy, which seeks the largest difference $D(S)$ under the worst conditions, i.e. for the smallest $D(S)$ over the probabilities $p^{(i)}(S)$. Since the values of $R$ are smaller for the first class, this means maximizing $R(S, 1)$ and minimizing $R(S, 2)$, i.e. finding $\overline{R}(S, 1)$ and $\underline{R}(S, 2)$. The optimal subset $S$ is then found by solving $\max_S |\overline{R}(S, 1) - \underline{R}(S, 2)|$.
To summarize, the maximin strategy here means that we first look for the probability distribution $p(S)$ satisfying the Fréchet inequalities that minimizes the absolute difference $|R(S, 1) - R(S, 2)|$, and then maximize these smallest differences over $S$.
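Once the bounds of $R$ are available for each candidate subset, the maximin choice is a plain comparison. The sketch below assumes the bounds have already been computed and are passed in as pairs (lower, upper); the function names and all numbers are hypothetical.

```python
def maximin_score(bounds_class1, bounds_class2):
    """|upper R(S,1) - lower R(S,2)| from the (lower, upper) pairs of the two classes."""
    _, r1_upper = bounds_class1
    r2_lower, _ = bounds_class2
    return abs(r1_upper - r2_lower)

def best_subset(bounds_by_subset):
    """Pick the subset with the largest maximin score.

    bounds_by_subset: dict mapping a subset (frozenset of SNP indices) to
    ((lower R(S,1), upper R(S,1)), (lower R(S,2), upper R(S,2))).
    """
    return max(bounds_by_subset, key=lambda s: maximin_score(*bounds_by_subset[s]))

# Hypothetical bounds for two candidate subsets.
bounds = {
    frozenset({1, 2}): ((10.0, 14.0), (18.0, 25.0)),   # score |14 - 18| = 4
    frozenset({1, 3}): ((9.0, 16.0), (17.0, 26.0)),    # score |16 - 17| = 1
}
winner = best_subset(bounds)
print(sorted(winner))
```

The second subset has the wider intervals and therefore the smaller guaranteed gap, even though its most favourable gap (26 - 9) is larger.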
The result is a fairly simple linear programming problem that is easy to solve. Its computational cost is far below that of the Bahadur representation, which grows rapidly with the number of elements in the subset $S$. At the same time, the approach has one significant drawback: the upper Fréchet bound depends on the smallest value of $p_j^{z_j} q_j^{1-z_j}$ within the subset $S$, and this value may be the same for different $S$. Different subsets then yield identical genotype probabilities for individual specimens, which makes it hard to single out the "best" subset $S$. Removing this drawback requires tighter bounds, which in turn increases the computational complexity of the whole problem.
Refined model. The main idea of the proposed model is to rule out the cases in which the probability of a vector $Z_i$ comes out negative because the second-order Bahadur representation is used. Knowing the most important SNP, say with index $t$, we are interested in the correlations of the SNP pairs $(t, i)$, $i = 1, \ldots, m - 1$, where $m$ is the number of elements of the set $S$, including $t$. The correlation between arbitrary pairs $(j, i)$, $j, i \ne t$, is less important, because our goal is to determine which SNPs, together with the most significant SNP $t$, affect the phenotype values most. The joint probability of the two variables of a pair $(t, i)$ can be determined exactly (given the estimates of the probabilities $p_t$) from the second-order Bahadur representation:
\[
\Pr(X_t = x_t, X_i = x_i) = p(x_t, x_i) = \left( p_t^{x_t} q_t^{1-x_t}\, p_i^{x_i} q_i^{1-x_i} \right) \left( 1 + \rho_{ti}\, u_t u_i \right).
\]
If the current set $S$ has the form $\{1, \ldots, t, \ldots, m\}$, we obtain $m - 1$ probabilities $p(x_t, x_k)$. The probability $p(S)$, however, is needed to obtain the expected phenotype value $R(S, i)$. Since only the probabilities $p(x_t, x_k)$ are known, only bounds on $p(S)$ can be determined; they are obtained by solving the following linear programming problems:
\[
\underline{p}(S) = \min p(S), \qquad \overline{p}(S) = \max p(S),
\]

subject to

\[
\sum_{X_{tk} \in \{0,1\}^{m-2}} p(X) = p(x_t, x_k), \quad k = 1, \ldots, m, \; k \ne t, \qquad
\sum_{X \in \{0,1\}^{m}} p(X) = 1.
\]
Here $X_{tk}$ is the vector $X$ without the elements corresponding to the SNP pair with indices $t$ and $k$. In other words, we fix the elements with indices $t$ and $k$ in $X$ and run over all possible values of the other elements of $X$. The number of terms on the left-hand side of each such constraint is $2^{m-2}$.
We now turn to the dual problems. The dual problem for the lower bound $\underline{p}(S)$ has the form

\[
\underline{p}(S) = \max \left( c_0 + \sum_{k=1,\, k \ne t}^{m} c_k\, p(x_t, x_k) \right),
\]

subject to $c_0 \in \mathbb{R}$, $c_k \ge 0$,

\[
c_0 + \sum_{k=1,\, k \ne t}^{m} c_k\, 1_{X_t = x_t,\, X_k = x_k}(X) \le 1_S(X), \quad X \in \{0,1\}^m.
\]
Here $1_{X_t = x_t,\, X_k = x_k}(X)$ is the indicator function equal to 1 if the element of $X$ with index $t$ equals $x_t$ and the element with index $k$ equals $x_k$; $1_S(X)$ is the indicator function equal to 1 only for the single vector $X$ that fully matches the subset $S$. For example, if we seek the probability $\Pr(X_1 = x_1, \ldots, X_m = x_m)$, then $1_S(X) = 1$ if $X = (x_1, \ldots, x_m)$.
The problem for the upper bound $\overline{p}(S)$ has the form

\[
\overline{p}(S) = \min \left( c_0 + \sum_{k=1,\, k \ne t}^{m} c_k\, p(x_t, x_k) \right),
\]

subject to $c_0 \in \mathbb{R}$, $c_k \ge 0$,

\[
c_0 + \sum_{k=1,\, k \ne t}^{m} c_k\, 1_{X_t = x_t,\, X_k = x_k}(X) \ge 1_S(X), \quad X \in \{0,1\}^m.
\]

Let us prove that

\[
\overline{p}(S) = \min_{k = 1, \ldots, m,\; k \ne t} p(x_t, x_k), \qquad
\underline{p}(S) = \max\left( 0,\; \sum_{k=1,\, k \ne t}^{m} p(x_t, x_k) - (m - 2) \right).
\]
Indeed, the variable $c_t$ does not appear in the dual optimization problem. Moreover, the constraints for all vectors $X$ with $X_t \ne x_t$ reduce to the single constraint $c_0 \le 0$ in the lower-bound problem (and $c_0 \ge 0$ in the upper-bound problem). Hence this element can be removed from the vector $X$. As a result we obtain an optimization problem equivalent to the one for marginal probabilities $p^*(x_k)$, $k \ne t$, such that $p^*(x_k) = p(x_t, x_k)$. The lower and upper joint probabilities in that case, with no independence assumption, are given by the Fréchet inequalities

\[
\max\left( 0,\; \sum_{k=1,\, k \ne t}^{m} p(x_t, x_k) - (m - 2) \right) \le p(S) \le \min_{k = 1, \ldots, m,\; k \ne t} p(x_t, x_k).
\]
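The refined bounds can be tried out on a small joint distribution. In the sketch below the pairwise probabilities $p(x_t, x_k)$ are read off a known joint distribution instead of being estimated through the Bahadur formula, which is an assumption made to keep the example self-contained; the function names and the distribution are hypothetical.

```python
import itertools

def refined_bounds(pair_probs):
    """Frechet-type bounds on p(S) from the m - 1 pairwise probabilities p(x_t, x_k)."""
    # len(pair_probs) = m - 1, so the subtracted constant equals m - 2.
    lower = max(0.0, sum(pair_probs) - (len(pair_probs) - 1))
    upper = min(pair_probs)
    return lower, upper

def pairwise_from_pmf(pmf, x, t):
    """p(x_t, x_k) = Pr{X_t = x_t, X_k = x_k} for every k != t, from a joint pmf."""
    m = len(x)
    return [sum(pr for point, pr in pmf.items()
                if point[t] == x[t] and point[k] == x[k])
            for k in range(m) if k != t]

# Hypothetical joint distribution on {0,1}^3 with weights 1/36, ..., 8/36.
points = list(itertools.product((0, 1), repeat=3))
pmf = {pt: (i + 1) / 36 for i, pt in enumerate(points)}

x, t = (1, 1, 1), 0          # genotype of interest and index of the key SNP
lower, upper = refined_bounds(pairwise_from_pmf(pmf, x, t))
print(lower, pmf[x], upper)
```

Here the true probability 8/36 of the genotype lies between the lower bound 0 and the upper bound 14/36, as the inequalities require.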
Conclusion. We have proposed a new algorithm for finding the most significant SNP markers. It differs from existing ones chiefly in that it takes the correlation between SNP markers into account. The algorithm relies on linear-fractional programming, which is implemented in many statistical data-processing systems. Results of numerical experiments are not reported because of the limit on the length of the paper.
The main drawback of the proposed algorithm is that the Fréchet bounds are rather wide and may lead to imprecise results. In other words, the reduction in computational complexity is paid for by a possible loss of accuracy. Further research should therefore look for ways to narrow the bounds using additional information.
References
1. Vorontsov, K.V. Machine Learning [Electronic resource]: lecture course / K.V. Vorontsov. Available at: http://www.machinelearning.ru/
2. Altidor, W. Ensemble feature ranking methods for data intensive computing applications [Text] / W. Altidor, T. Khoshgoftaar, J.V. Hulse, A. Napolitano // Handbook of Data Intensive Computing, Springer. N. Y., 2011. P. 349–376.
3. Bahadur, R. A representation of the joint distribution of response to n dichotomous items [Text] / R. Bahadur // Studies in Item Analysis and Prediction, Stanford University Press. Palo Alto, CA. 1961. P. 158–168.
4. Charnes, A. Programming with Linear Fractional Functionals [Text] / A. Charnes, W.W. Cooper // Naval Research Logistics Quarterly. 1962. Vol. 9, no. 3–4. P. 181–196.
5. Hastie, T. The Elements of Statistical Learning: Data Mining, Inference and Prediction [Text] / T. Hastie, R. Tibshirani, J. Friedman. N. Y.: Springer, 2001.
6. Kohavi, R. Wrappers for feature subset selection [Text] / R. Kohavi, G.H. John // Artificial Intelligence. 1997. Vol. 97, no. 1–2. P. 273–324.
7. Somol, P. Efficient Feature Subset Selection and Subset Size Optimization [Text] / P. Somol, J. Novovicova, P. Pudil // Pattern Recognition Recent Advances, InTech, Croatia, China. 2010. P. 75–97.
A new algorithm is proposed for selecting the significant DNA markers that determine the properties of forest-forming tree species. The algorithm uses the multivariate Bernoulli distribution to compute the joint probabilities of different subsets of DNA markers for each observed specimen. The objective function for optimizing the marker subset is the maximum difference between the expected phenotype values of specimens conditionally divided into two classes. Possible correlations are taken into account by using the Fréchet inequality to compute the conditional probabilities.