ManažérskaInformatika.sk

scientific journal on informatics

RESEARCH OF METHODS FOR POLLUTION FORECASTING USING TIME SERIES AND NEURAL NETWORKS

Posted on 10 December 2022 (updated 22 January 2023) by admin

Roman Sulymka, Lviv Polytechnic National University, Lviv, Ukraine

Dmytro Fedasyuk, Lviv Polytechnic National University, Lviv, Ukraine

Abstract. The paper presents the results of pollution forecasting with ARIMA and LSTM models based on time series and neural networks. The specifics of applying ARIMA and LSTM models to air pollution forecasting are identified. The LSTM model was found to give better pollution predictions on datasets with a larger number of records. The outcome of the research is a comparative analysis of the ARIMA and LSTM forecasting methods and software developed using time series and neural networks.

Keywords: pollution forecasting, time series, neural networks, ARIMA, LSTM.

I. Introduction

Pollution is one of the most widespread environmental problems. A large number of factories emit harmful substances that settle in the soil and pollute the environment, yet software tools for predicting soil pollution are still lacking [1].

However, there are many algorithms and models that can be used to construct a pollution forecast.

Accordingly, this study investigates pollution forecasting models based on time series and neural networks, focusing on the ARIMA and LSTM methods.

II. The relevance of environmental pollution forecasting

Many enterprises in Ukraine currently emit large quantities of harmful substances into the air and wastewater as a result of their activities. The main sources of chemical soil pollution include agricultural chemicals and mineral fertilizers, atmospheric deposition around industrial enterprises (especially chemical and metallurgical plants), and mineral mining [2]. Many sources of soil pollution act locally, but some act on a regional or even global scale, particularly deposition from precipitation and the application of fertilizers over large areas of land [2].

A detailed analysis of existing systems found many software tools for forecasting air and water pollution, but no applications for analyzing and forecasting soil pollution.

The research addresses the following main tasks:

  • review existing information tools and the literature on pollution forecasting;
  • analyze the methods most commonly used for pollution forecasting;
  • develop software for predicting soil contamination using time series and neural networks;
  • feed real data to the neural network;
  • train a neural network and determine the best model for pollution prediction.

III. Literature review

The ARIMA method is based on time series analysis, which comprises methods for extracting meaningful statistics and other characteristics from data [2]. Time series forecasting is the use of a model to predict future values from previously observed values [3].

ARIMA (autoregressive integrated moving average) is a form of regression analysis in which a variable is regressed on its own past values and past forecast errors. It predicts future values by modelling the differences between successive values of the series rather than the raw values themselves [2].
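For reference, the standard textbook form of an ARIMA(p, d, q) model is shown below. The notation is conventional and not taken from the source: L is the lag operator (L X_t = X_{t-1}), φ_i are the autoregressive coefficients, θ_j the moving-average coefficients and ε_t white-noise errors. The factor (1 − L)^d is the d-fold differencing described above; for d = 1 it reduces to X_t − X_{t−1}.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right) (1 - L)^{d} X_t
  = \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right) \varepsilon_t ,
\qquad L X_t = X_{t-1}
```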

ARIMA algorithm

  1. Load input data.
  2. Check whether the time series is stationary. If it is, go to step 3; if it is not, find the order of differencing d that makes the series stationary and go to step 4 [3].
  3. Put d = 0.
  4. Plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to determine the input parameters of the ARIMA model.
  5. Using the ACF and PACF plots, determine the orders p and q of the ARIMA model (typically p from the lag at which the PACF cuts off and q from the lag at which the ACF cuts off).
  6. Fit an ARIMA(p, d, q) model with these orders.
  7. Forecast the values of the test portion of the time series.
  8. Calculate the root mean square error (RMSE) between the predicted and actual values.
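Steps 2, 4-5 and 8 of the algorithm can be illustrated with a minimal, dependency-free Python sketch: d-fold differencing, the sample ACF, the PACF computed with the Durbin-Levinson recursion, and the RMSE. The helper names are hypothetical and this is not the authors' software. The stationarity test of step 2 and the model fitting of step 6 are not shown; in practice they would usually come from a library, for example `adfuller` and `ARIMA` in statsmodels.

```python
import math
from typing import List


def difference(series: List[float], d: int = 1) -> List[float]:
    """Step 2: apply first differencing d times (x_t - x_{t-1})."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def acf(series: List[float], max_lag: int) -> List[float]:
    """Step 4: sample autocorrelations r_0 .. r_max_lag (r_0 == 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series: List[float], max_lag: int) -> List[float]:
    """Step 4: partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi: List[float] = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Step 8: root mean square error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Pollution values from Table 1, differenced once (d = 1).
pollution = [129.0, 148.0, 159.0, 181.0, 138.0, 109.0, 105.0, 124.0, 120.0, 132.0]
diffed = difference(pollution, d=1)
print(diffed[:3])  # [19.0, 11.0, 22.0]
print([round(v, 2) for v in pacf(diffed, 2)])
```

On a real series, the lags at which the printed PACF and ACF values drop to near zero would guide the choice of p and q in step 5.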

The long short-term memory (LSTM) network is one of the most popular neural network architectures for time series forecasting; it was designed to address the problem of learning long-term dependencies.

An LSTM has a chain-like structure whose repeating module contains four neural network layers that interact in a special way [4]. Its distinguishing feature is that it can retain information over long periods of time. Its advantages are that it requires fewer restrictions and assumptions about the data, can capture complex nonlinear dependencies in a time series, offers high forecast accuracy, and lends itself to automation [5]. Its disadvantages are low interpretability and the need for a large amount of data for accurate prediction.
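In the standard LSTM formulation, the four interacting layers are the forget gate, the input gate, the candidate layer and the output gate. The sketch below shows one time step for a single memory unit (scalar input and state), assuming the common formulation without peephole connections. The weights in the usage lines are hypothetical values, not parameters of the authors' model; a layer with 10 memory units, as in the algorithm below, would use vectors and weight matrices instead of scalars.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class Gate:
    """Weights of one gate layer for a scalar input and scalar hidden state."""
    w: float  # weight on the current input x_t
    u: float  # weight on the previous hidden state h_{t-1}
    b: float  # bias

    def pre(self, x: float, h_prev: float) -> float:
        return self.w * x + self.u * h_prev + self.b


def lstm_step(x: float, h_prev: float, c_prev: float,
              forget: Gate, inp: Gate, cand: Gate, out: Gate) -> Tuple[float, float]:
    """One time step of a standard LSTM cell (no peephole connections)."""
    f = sigmoid(forget.pre(x, h_prev))  # forget gate: how much old memory to keep
    i = sigmoid(inp.pre(x, h_prev))     # input gate: how much new content to write
    g = math.tanh(cand.pre(x, h_prev))  # candidate values for the cell state
    o = sigmoid(out.pre(x, h_prev))     # output gate: how much memory to expose
    c = f * c_prev + i * g              # updated cell state (long-term memory)
    h = o * math.tanh(c)                # updated hidden state (cell output)
    return h, c


def run(sequence: Sequence[float], gates: Sequence[Gate]) -> Tuple[float, float]:
    """Feed a sequence through the cell, carrying h and c between steps."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, *gates)
    return h, c


# Hypothetical weights for the forget, input, candidate and output layers.
gates = (Gate(0.5, 0.1, 1.0), Gate(0.4, 0.2, 0.0),
         Gate(0.3, 0.1, 0.0), Gate(0.6, 0.1, 0.0))
print(run([0.2, 0.5, 0.1], gates))
```

The cell state c is only rescaled by the forget gate and added to, which is what lets information persist across many steps.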

LSTM algorithm for pollution forecasting

  1. Define Network: We will construct an LSTM neural network with one input time step and one input feature in the visible layer, 10 memory units in the LSTM hidden layer, and 1 neuron in the fully connected output layer with a linear (default) activation function.
  2. Compile Network: We will use the efficient Adam optimization algorithm with its default configuration and the mean squared error loss function, since this is a regression problem.
  3. Fit Network: We will fit the network for 1,000 epochs with a batch size equal to the number of patterns in the training set, and turn off all verbose output.
  4. Evaluate Network: We will evaluate the network on the training dataset; normally the model would be evaluated on a separate test or validation set.
  5. Make Predictions: We will make predictions for the training input data; again, in practice predictions would be made on data for which the correct answer is not known.
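Step 1 presupposes that the series has been turned into input/output patterns with one time step and one feature. A minimal sketch of that framing follows, assuming the [samples, time steps, features] input layout used by Keras-style LSTM layers; the helper `to_supervised` is hypothetical and not part of the authors' software.

```python
from typing import List, Tuple


def to_supervised(series: List[float], lag: int = 1) -> Tuple[List[List[List[float]]], List[float]]:
    """Frame a univariate series as LSTM training patterns.

    Each input sample holds `lag` time steps with one feature, so X has the
    nested shape [samples][time steps][features]; y is the next value.
    """
    X, y = [], []
    for t in range(lag, len(series)):
        X.append([[value] for value in series[t - lag:t]])
        y.append(series[t])
    return X, y


pollution = [129.0, 148.0, 159.0, 181.0, 138.0]  # first values of Table 1
X, y = to_supervised(pollution, lag=1)
print(X[0], y[0])  # [[129.0]] 148.0
print(len(X))      # 4
```

With lag=1 every sample has one time step and one feature, matching step 1; a batch size equal to the number of patterns (step 3) would then simply be `len(X)`.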

IV. Implementation of pollution forecasting using ARIMA and LSTM

The dataset contains 1000 records of atmospheric air pollution, taken from hourly observations for 2010 to 2014 (Table 1) [6].

Table 1.

Air pollution input data [4].

| Date | Pollution | Dew point | Temperature | Atmospheric pressure | Wind speed |
|---|---|---|---|---|---|
| 02/01/2010 00:00 | 129 | -16 | -4 | 1020 | 1.79 |
| 02/01/2010 01:00 | 148 | -15 | -4 | 1020 | 2.68 |
| 02/01/2010 02:00 | 159 | -11 | -5 | 1021 | 3.57 |
| 02/01/2010 03:00 | 181 | -7 | -5 | 1022 | 5.36 |
| 02/01/2010 04:00 | 138 | -7 | -5 | 1022 | 6.25 |
| 02/01/2010 05:00 | 109 | -7 | -6 | 1022 | 7.14 |
| 02/01/2010 06:00 | 105 | -7 | -6 | 1023 | 8.93 |
| 02/01/2010 07:00 | 124 | -7 | -5 | 1024 | 10.72 |
| 02/01/2010 08:00 | 120 | -8 | -6 | 1024 | 12.51 |
| 02/01/2010 09:00 | 132 | -7 | -5 | 1025 | 14.3 |

Forecasting the 1000 records of the dataset with the ARIMA model gave the results shown in Fig. 1.

Fig. 1. Pollution forecast using the ARIMA method.

The result of forecasting pollution using the LSTM model is shown in Fig. 2.

Fig. 2. Forecasting pollution using the LSTM method.

An excerpt of the forecast results obtained is shown in Table 2.

Table 2.

The results of the air pollution forecast using ARIMA and LSTM models.

| Date | Actual value | ARIMA forecast | LSTM forecast | ARIMA error | LSTM error |
|---|---|---|---|---|---|
| 02/01/2010 00:00 | 138 | 190.463533 | 180.791779 | 38% | 31% |
| 02/01/2010 01:00 | 109 | 142.876833 | 134.663025 | 31% | 24% |
| 02/01/2010 02:00 | 105 | 113.50899 | 112.399124 | 8% | 7% |
| 02/01/2010 03:00 | 124 | 107.3276 | 107.894363 | 13% | 13% |
| 02/01/2010 04:00 | 120 | 126.869533 | 122.34301 | 6% | 2% |
| 02/01/2010 05:00 | 132 | 123.029785 | 118.099701 | 7% | 11% |
| 02/01/2010 06:00 | 140 | 135.957285 | 128.783752 | 3% | 8% |
| 02/01/2010 07:00 | 152 | 144.881011 | 135.095505 | 5% | 11% |
| 02/01/2010 08:00 | 148 | 158.088652 | 145.482468 | 7% | 2% |
| 02/01/2010 09:00 | 164 | 153.853202 | 141.957169 | 6% | 13% |
| 02/01/2010 10:00 | 158 | 171.703913 | 158.602753 | 9% | 0% |
| … | … | … | … | … | … |
| Mean error | | | | 26% | 20% |

As Table 2 shows, on the dataset of 1000 records the LSTM model performed better, with a mean error of 20% compared with 26% for the ARIMA model. A likely reason is that neural networks make better predictions when trained on large datasets.
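The per-row percentages in Table 2 are consistent with the absolute percentage error, |forecast − actual| / actual × 100, and the mean error row with its average over all forecasts (MAPE). This reading is inferred from the table values rather than stated in the text, so the sketch below is an assumption about how the figures were computed.

```python
from typing import Sequence


def abs_pct_error(actual: float, forecast: float) -> float:
    """Absolute percentage error of a single forecast, in percent."""
    return abs(forecast - actual) / abs(actual) * 100.0


def mean_abs_pct_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error (MAPE) over paired observations."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs_pct_error(a, f) for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


# First row of Table 2: actual 138, ARIMA 190.463533, LSTM 180.791779.
print(round(abs_pct_error(138, 190.463533)))  # 38
print(round(abs_pct_error(138, 180.791779)))  # 31
```

The reported averages of 26% and 20% cover the full series, so they cannot be reproduced from the excerpt alone.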

Based on the forecast results obtained, it can be concluded that the ARIMA model is better suited to short-term forecasts or to forecasts with a small amount of input data. The LSTM model should be used when the input dataset contains a large number of records and the forecast is long-term, because neural network models require a large amount of input data for training.

V. Conclusion

On the air pollution dataset, the LSTM model produced more accurate forecasts than the ARIMA model: the mean error was 20% for LSTM and 26% for ARIMA.

The forecasting analysis showed that on large data samples the neural-network-based LSTM model gives better forecasting results than the ARIMA model.

References

  1. Omran E.-S. E. Environmental modelling of heavy metals using pollution indices and multivariate techniques in the soils of Bahr El Baqar, Egypt. Modeling Earth Systems and Environment. 2016. Vol. 2, no. 3. URL: https://doi.org/10.1007/s40808-016-0178-7 (date of access: 28.11.2022).
  2. Statistical approaches for forecasting primary air pollutants: a review / K. Liao et al. Atmosphere. 2021. Vol. 12, no. 6. P. 686. URL: https://doi.org/10.3390/atmos12060686 (date of access: 28.11.2022).
  3. Ye Z. Air pollutants prediction in Shenzhen based on ARIMA and Prophet method. E3S Web of Conferences. 2019. Vol. 136. P. 05001. URL: https://doi.org/10.1051/e3sconf/201913605001 (date of access: 28.11.2022).
  4. Spatiotemporal prediction of air quality based on LSTM neural network / D. Seng et al. Alexandria Engineering Journal. 2021. Vol. 60, no. 2. P. 2021–2032. URL: https://doi.org/10.1016/j.aej.2020.12.009 (date of access: 02.11.2022).
  5. Statistical approaches for forecasting primary air pollutants: a review / K. Liao et al. Atmosphere. 2021. Vol. 12, no. 6. P. 686. URL: https://doi.org/10.3390/atmos12060686 (date of access: 28.11.2022).
  6. Toxicity criteria database – catalog. Dataset – Catalog. URL: https://catalog.data.gov/dataset/toxicity-criteria-database-22828 (date of access: 28.11.2022).

Manažérska informatika

The journal Manažérska informatika is published twice a year. ISSN 2729-8310. Last updated 31.12.2020. Latest issue: Vol. I, No. 1, as of 31.12.2020.
