Skip to content
ManažérskaInformatika.sk

ManažérskaInformatika.sk

vedecký časopis o informatike

Menu
  • Domov
  • O nás
    • O nás
    • O časopise
    • Redakčná rada
    • Indexovanie časopisu
    • Zásady ochrany osobných údajov
  • Časopis MI
    • Časopis MI
      • Ročník 1, 2020, číslo 1
      • Ročník 1, 2020, číslo 2
    • Študentský článok
    • Recenzovaný článok
    • Článok
    • Recenzia
  • Pre autorov
    • Pre autorov
    • Návody a šablóny
    • Publikačná etika
    • Odoslanie príspevku
  • Newsletter
    • Newsletter
    • RSS
  • Kontakt
Menu

A New Approach to Modelling the Nature of Individual Morbidity Using Partial Functional Dependencies

Posted on 13 mája, 202125 augusta, 2022 by admin

Nataliia Melnykova

Abstract

The paper proposes the development of an approach to modelling the nature of individual morbidity based on the Big Data approach. Analysis of large amounts of data requires the definition of groups of attributes that form functional dependencies. However, in real datasets obtained from different sources, important relationships are defined only for a subset of attribute group values. There are relationships, for example, between previously transmitted diseases and the nature of the disease now – such a relationship is established between subsets of values ​​of different tuples and cannot be found in existing methods of searching for hidden data. The authors will call such dependences partial functional dependencies. Accordingly, the level of support for such dependencies is low, which does not allow to use them for further data analysis. At the same time, partial functional dependencies are modified associative rules, but they are executed only for a part of the data and depend on the time factor. The method of finding such dependencies will be based on the modification of the method of associative rules, which allows to reduce the time complexity and to use parallel and distributed mode for calculation.

Keywords: Big Data, associative rules, dependency analysis, Covid`19, partial functional dependencies

Introduction

It is a well-known fact that artificial intelligence really gives us many opportunities to understand and solve many complex problems in the practical and scientific space. The use of artificial intelligence can be useful primarily in systems that are designed to detect, track and predict disease outbreaks. The better we can track the spread of the virus, the more effective and faster we can fight it.

By analysing press reports, social networking platforms, and government documents, artificial intelligence can learn to detect outbreaks faster. It even turns out that such systems already exist and work, for example, the Canadian system for launching BlueDot [1]. The company’s software is designed to protect against the risk of pandemic outbreaks and protects lives by reducing the impact of infectious diseases that threaten human health. The developers claim that their software allows you to convey all the information about the threat of the COVID`19 epidemic within a few hours to the Centres for Disease Control and Prevention or the World Health Organization. The app also helps detect potentially infected people.

Artificial intelligence experts have used machine learning techniques to process online activity, news, health organizations’ reports and media activities to predict the spread of the outbreak in China. As well as the application of the Bayesian approach to predict the number of deaths in the future, using empirical data [5].

An important role is played by the problem of analysing the relationships between the individual characteristics of the patient and the characteristics of dynamics of changes in his condition during treatment and recovery.

The course of diseases, that caused by infections and viruses (even with known treatment prevention schemes) is influenced by various factors, namely [6,9]:

  • variability of strains,
  • nature of interaction,
  • features of the distribution area: climatic conditions, development of infrastructure and connections, quality of medical care, chronic diseases inherent in this area, political situation, etc.

Materials and Methods

The peculiarity of medical data is their hierarchy and networking. Network data include information on comorbidities, allergic reactions, etc. These are the direct or indirect factor, which determines the nature of disease of the individual.  Therefore, to find the hidden dependencies of the data and to determine the nature of the disease of the individual, it is necessary to find not only linear dependencies in the data.

The task of finding dependencies in the data requires the analysis of dependencies between dozens of parameters of the studied process and hundreds of possible sources of influence on this process. Dependencies are nondeterministic, and therefore modelling requires the use of statistical methods for analysing random processes. Often much of the information is hidden from observation or is not monitored. This introduces many difficulties in the process of analysing the collected information.

The modelling process requires data that can be obtained from known statistics, namely:

  • the population of the districts of the region;
  • the population density;
  • the social distance between people;
  • the disease duration;
  • the probability of disease in human contact; mortality rate;
  • the availability of crowded places (supermarkets, churches, pharmacies, markets, construction sites, gyms);
  • the percentage of people who carry the disease asymptomatically;
  • the presence of the procedure of isolation of sick people;
  • the ability to move a person from the area to the regional centre and back;
  • the incubation period
  • and others.

The above factors can negatively affect the conduct, interpretation and generalization of research results, as well as the understanding and interpretation of the phenomenon under study. For stability of result, it is necessary to use ensembles of models, which are easy to parallelize.

The analysis of large amounts of data requires the definition of groups of attributes that form functional dependencies. However, in real data sets obtained from different sources, important relationships are defined only for a subset of attribute group values. ​​For example, between previously transferred disease and the nature of the disease now – such a relationship is established between subsets of values ​​of different tuples and cannot be found by existing hidden methods. We can call such dependences partial functional dependencies. Accordingly, the level of support for such dependencies is low, which does not allow to use them for further data analysis. At the same time, partial functional dependencies are modified associative rules, but they are executed only for a part of the data and depend on the time factor. The method of finding such dependencies will be based on the modification of associative rules. This reduces time complexity and uses parallel and distributed mode for calculation.

As part of the study of the problem, is necessary to:

  • propose the development of already special methods of forming a training set of data and preprocessing of attributes, taking into account the specifics of the content of medical data and environmental data.
  • develop an ensemble of data imputation models on the basis of basic models of various nature as a part of specialized information technology of recovery of the missed data for the automated processing of information.
  • simulate different scenarios of influence by the state at the next stage of forecasting the dynamics of infection.

The formal model of the individual

Therefore, for a formal representation of the individual’s condition, the task of which is to find a personalized assessment and find solutions to improve it, it is necessary to build a formal model of the object, as a model of production expert system, which is usually used to solve this class of problems.

The knowledge base in accordance with the structural scheme of the system of personalized assessment is the selection of a set of rules R [7,12,14]:

                                                                                       (1)

where the production of type

                                                           (2)

The use of associative rules for study dependencies

The basic concepts in the theory of associative rules are subject set and transaction. A thematic set is a non-empty set of elements that can be part of a transaction:

                                                                             (3)
where ik – are the elements included in the subject sets, k = 1..n, n – is the number of elements of the set I.

The database has a certain set of transactions:

                                                                                (4)

where ti – is the corresponding transaction, m – is the total number of transactions.

The concepts of set and associative rule are closely related to another characteristic of an associative rule – trust, which is calculated as the ratio of a set that has both a condition and a consequence (in other words, support for an associative rule) to support a set that has only a condition.

                                                 (5)

MinSupp and MinConf minimum support and validity thresholds are used to determine the significance of the rules, which are usually determined by experts based on their own experience:

                                                                     (6)

                                                                     (7)

Methods for finding associative rules find all associations that meet the constraints of support and confidence. However, this leads to the need to review a large number of associative rules, which it is desirable to reduce in such a way as to analyse only the most important of them.

Methods for finding associative rules find all associations that meet the constraints of support and confidence. However, this leads to the need to review a large number of associative rules, which it is desirable to reduce in such a way as to analyse only the most important of them.

Among the main algorithms for generating associative rules are AIS, SETM, Apriori, AprioriTid, AprioriHybrid [8]. The efficiency and feasibility of using each of them is determined by the structure and scope of the data set for which the search for associative rules, as the basis of these methods lies in the different principles of generation and selection of subject sets – candidates [13,15].

The proposed method

Based on the data analysis, information was collected about patients taking into account the parameters of the individual to determine the presence of diseases that are characterized as a priority.

Dataset is collected using google form https://docs.google.com/forms/d/1o8CMGVZv6BDkw-QIYg2F8VQzqcXxqklomRwXCLOZIcY, is funded by Central European Initiative and is verified by Lviv regional centre Covid`19 resistance. This dataset consists of the following characteristics:

  • Age (categorical): 0-15, 16-22, 23-40, 41-65,>66
  • Gender (categorical): male, female
  • Region (string): Lviv (Ukraine), Chernivtsi (Ukraine), Belarus, Germany, Other
  • Do you smoke (Boolean): yes, no
  • Have you had COVID`19 (categorical): yes, no, maybe
  • IgM level (numerical): [0..0.9) (negative), [0.9..1.1) (indefinite), >=1.1 (positive)
  • IgG level (numerical): [0..0.9) (negative), [0.9..1.1) (indefinite), >=1.1 (positive)
  • Blood group (numerical)
  • Do you vaccinated influenza? (categorical): yes, no, maybe
  • Do you vaccinated tuberculosis? (categorical): yes, no, maybe
  • Have you had influenza this year? (categorical): yes, no, maybe
  • Have you had tuberculosis this year? (categorical): yes, no, maybe

Taken into account 480 responses are presented in dataset.

Sample for training, consisting of states of 30 individuals. For data analysis, it was proposed:

  • grouping of data by patient ID,
  • separation of factors by patient ID,
  • division into factors according to the patient’s condition,
  • separation of factors according to the treatment scheme.

A three-step algorithm was used to recognize the patterns:

  • creating a cluster of patients – to find the behavior of the condition,
  • template construction – to find the sequence of state changes,
  • next run ConditionType – to predict the next condition of the patient.

The cluster of the studied objects was created by means of R, factoextra packages (Fig. 1 shows the results of hierarchical clustering):

Fig. 1: Dendrogoram of hierarchical clustering

Random forest shows the better results. 500 trees are built.

OOB estimate of  error rate: 16.61%

The biggest error is for class 1 (COVID’19 – yes). It can be explained by differences in IgG and IgM representation (data scatter is between 0.00 and 18.00) in different countries. The minimal depth values for all trees in a random forest is given on Fig. 2.

Fig. 2: Distribution of  minimal depth of developed trees

Conclusion

An approach to modelling the nature of individual morbidity based on the Big Data approach is proposed. Analysis of large amounts of data requires the definition of groups of attributes that form functional dependencies.

Explained on real data sets obtained from different sources, important relationships are defined only for a subset of the values ​​of the attribute group.

Dependencies with partial functional dependencies have a low level of support, which does not allow their use for further data analysis, and partial functional dependencies are modified associative rules, but they are executed only for part of the data and depend on the time factor.

A method of finding dependences is proposed, which is based on the modification of the method of associative rules, which allows to reduce the complexity of time and use parallel and distributed mode for calculation.

References

  1. https://root-nation.com/ua/articles-ua/tech-ua/ua-shtuchnij-intelekt-covid-19/
  2. https://qz.com/1803737/chinas-facial-recognition-tech-can-crack-masked-faces-amid-coronavirus/
  3. Perumal, Varalakshmi, et al. “Detection of COVID-19 Using CXR and CT Images Using Transfer Learning and Haralick Features.” Applied Intelligence, Aug. 2020. DOI.org (Crossref), doi:10.1007/s10489-020-01831-z.
  4. Chimmula, V. K. R., & Zhang, L. (2020). Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos, Solitons & Fractals, 135, 109864. https://doi.org/10.1016/j.chaos.2020.109864
  5. Bayes, C., Rosas, V. S. y., and Valdivieso, L., “Modelling death rates due to COVID-19: A Bayesian approach”, arXiv e-prints, 2020.
  6. Some Methods for classification and Analysis of Multivariate Observations”​. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. pp. 281–297 ​ .
  7. Shakhovska, Natalya & Kaminskyy, R. & Zasoba, E. & Tsiutsiura, M.. (2018). Association rules mining in big data. International Journal of Computing. 17.25-32.
  8. The personalized approach to the processing and analysis of patients’ medical data. CEUR Workshop Proceedings. – 2018. – Vol. 2255:Proceedings of the 1st International workshop on informatics & Data-driven medicine (IDDM 2018) Lviv, Ukraine, November 28–30, 2018., 103-112
  9. Shakhovska, N., Fedushko, S., Greguš ml., M., Melnykova, N., Shvorob, I., & Syerov, Y. (2019b). Big Data analysis in development of personalized medical system. Procedia Computer Science, 160, 229–234. https://doi.org/10.1016/j.procs.2019.09.461
  10. Shakhovska, N., Fedushko, S., Greguš ml., M., Melnykova, N., Shvorob, I., & Syerov, Y. (2019b). Big Data analysis in development of personalized medical system. Procedia Computer Science, 160, 229–234. https://doi.org/10.1016/j.procs.2019.09.461
  11. Nataliya Boyko, Lesia Mochurad, Iryna Stetsiv, Yurii Kryvenchuk : Modeling of the Information System for Processing of a Large Distilled Data for the Investigation of Competitiveness of Enterprises // Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2020). Volume I: Main Conference Lviv, Ukraine, April 23-24, 2020. рр. 964-978
  12. Nataliya Boyko , Lesia Mochurad, Iryna Andrusiak, Yurii Drevnytskyi : Organizational and Legal Aspects of Managing the Process of Recognition of Objects in the Image // Proceedings of the International Workshop on Cyber Hygiene (CybHyg-2019) co-located with 1st International Conference on Cyber Hygiene and Conflict Management in Global Information Networks (CyberConf 2019), Kyiv, Ukraine, November 30, 2019. – 571-59
  13. Gaudart J, Ghassani M, Mintsa J, Waku J, Rachdi M, Doumbo OK, Demongeot J (2010) Demographic and spatial factors as causes of an epidemic            spread, the copule approach: application to the retro-prediction of the black           death epidemy of 1346. In: 2010 IEEE 24th International conference on           advanced information networking and applications workshops (pp 751–758).
  14. .Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. ​ Lancet.(2020)           395:497–506. doi: 10.1016/S0140-6736(20)30183-5
  15. Pham QV, Nguyen DC, Hwang WJ, Pathirana PN. Artificial intelligence (AI) and big data for coronavirus (COVID-19) pandemic: a survey on the            state-of-the-arts. ​ Preprints. (2020) 2020:2020040383. doi:     20944/preprints202004.0383.v

 

Prihlásenie na odber Newsletteru

Najnovšie články

  • COMPUTERS IN MEDICAL GOWNS: USAGE OF ARTIFICIAL INTELLIGENCE IN MEDICINE
  • ARTIFICIAL INTELLIGENCE AND ITS EVERYDAY USE
  • A NOVEL SCALOGRAM-BASED MODEL FOR HUMAN ACTIVITY RECOGNITION
  • RESEARCH OF METHODS FOR POLLUTION FORECASTING USING TIME SERIES AND NEURAL NETWORKS
  • CLASSIFICATION OF CLOUD TYPES ON SATELLITE IMAGES USING DEEP LEARNING

Kategórie

  • Článok
  • Ročník 1, 2020, číslo 1
  • Ročník 1, 2020, číslo 2
  • Ročník 1, 2021, číslo 1
  • Ročník 1, 2021, číslo 2
  • Ročník 1, 2022, číslo 1
  • Ročník 1, 2022, číslo 2
  • Študentský článok

Tag cloud

antivirus (1) Antivirus software (1) Antivírusový softvér (1) automatizácia (1) bezpečnostná politika (1) Bezpečnosť (7) convolutional neural networks (1) COVID-19 (2) GDPR (8) implementation (3) implementácia (3) industrial espionage (2) innovation (2) inovácia (2) internetová ochrana (1) Internet security (2) lambda architecture (1) malware (1) management (2) manažment (2) obchodné tajomstvo (1) object search (1) online nakupovanie (1) online shopping (1) osobné údaje (2) parallel calculations (1) personal data (2) priemyselná špionáž (2) project (2) project management (2) projekt (2) projektový manažment (2) python (1) remote work (1) Security (6) security policy (1) Signal (1) sociálne siete (1) user (2) užívateľ (2) virus (2) vzdialená práca (1) web Applications (1) webové aplikácie (1) WhatsApp (1)

Archív

  • december 2022
  • jún 2022
  • apríl 2022
  • október 2021
  • máj 2021
  • december 2020

Manažérska informatika

Časopis Manažérska informatika je vydávaný 2x ročne. ISSN 2729-8310. Posledná aktualizácia k 31.12.2020. Posledné vydanie časopisu : Vol. I, No. 1 k 31.12.2020

©2023 ManažérskaInformatika.sk | Built using WordPress and Responsive Blogily theme by Superb
Správa nastavenia COOKIES
Na našich webových stránkach používame súbory cookie, aby sme vám poskytli najrelevantnejšie zážitky pamätaním vašich preferencií a opakovaných návštev. Kliknutím “Súhlasím”, súhlasíte s použitím VŠETKÝCH cookies.
Nastavenie COOKIESSúhlasím
Manage consent

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Vždy zapnuté
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
ULOŽIŤ A PRIJAŤ