Analysis of news and social media to identify propaganda in armed conflicts and wars

Veronika Kovalisko, 4th-year student of the Department of Artificial Intelligence Systems, Lviv Polytechnic National University

This work aims to develop a system for detecting and analyzing disinformation and propaganda spread through the media during armed conflicts and wars. The system uses machine-learning algorithms for the following tasks:

Identification of propaganda content: Development of methods that automatically recognize materials aimed at manipulating the audience and creating a deceptive image of events in armed conflicts.
Creating a fake news detection system: Developing tools to identify and classify false information that can be spread during military conflicts.
Analyzing the impact of disinformation on public opinion: Researching the impact of propaganda content and fake news on the beliefs and behavior of users of social media and news platforms.

To summarize, the aim of the work is to develop tools that can improve information security and objectivity of information in the context of armed conflicts.

The subject of the study is the methods and tools for analyzing information content disseminated during armed conflicts and military events through various media resources, including news sites and social media platforms.

In addition, the study covers the impact of this information on the public perception of events and the assessment of the objectivity of information related to military conflicts and crises.

The study is relevant because of the rapid growth of propaganda content that is widely disseminated through social media and news resources during armed conflicts and military events. This problem threatens information security and the veracity of information, which can have serious consequences for society.

Developing a system for analyzing and identifying such information is becoming an urgent task to ensure information security and preserve the objectivity of public opinion during crises.

Thus, this research area is of great importance for society and political processes, as it can help raise public awareness and make the information space more objective and reliable.

Now, I propose to take a closer look at the concept of “propaganda” as a phenomenon and identify specific features that can distinguish ordinary news from propaganda.

Propaganda is the dissemination of political, religious, philosophical, scientific, or other views by conveying to the masses various arguments, true or half-true facts, rumors, or outright lies to manipulate public consciousness [1].

The main features of propaganda [2]:

Intentionality. Propaganda is not an accidental mistake or inaccuracy, but rather a tactic aimed at changing the system of values and behavior.
Ready-made answers. Propaganda can take different forms and use different means, but its power lies in the proposed ready-made answers.
Influence on the attitude to certain phenomena or groups of people – this attitude is always contrasting and highly emotional.
Partial basis in truthful information: what is presented is often not the whole truth, or it is a lie that pretends to be the truth (trust is replaced by blind faith).

In connection with recent events, the word “propaganda” has acquired a firmly negative connotation in the international community. This is due to global events in which political leaders have used propaganda techniques to manipulate the consciousness of the masses, which has led to many military conflicts.

The words “propaganda” and “manipulation” are often used synonymously, but this is not entirely correct. Propaganda always aims to get you to behave in some way:

exceed performance targets at work,
fight for the state or faith,
support a party or government.

But there can also be “positive directions”, such as:

wash your hands before lunch,
do not use drugs.

In other words, propaganda can be quite in line with your interests and needs. It is not always something dishonest or harmful to you. And it does not necessarily involve lies [3].

However, regardless of the intentions behind propaganda, it is important to be able to recognize it, so that you do not allow yourself to be manipulated and can form your own view of the world.

The implementation of the bachelor’s project will include the following steps:

In-depth literature review: a detailed analysis of research and literature related to the detection of propaganda and fake news during armed conflicts;
Study of analysis methods: mastering modern methods of analyzing text, images, and the impact of disinformation;
Data collection and processing: collecting and processing information from various sources, such as news resources and social media;
Data labeling and classification: Identifying and classifying data to create the training set to be used for training machine learning models;
Development of machine learning algorithms: creating and optimizing algorithms to effectively detect propaganda content and fake news;
Use of natural language processing and computer vision techniques: application of NLP techniques for text analysis and computer vision for image processing for more accurate and complete content analysis;
Expanding data sources: searching for and integrating new data sources to obtain more complete information;
Real-time operation: developing a system that can analyze content and respond in real time to quickly identify propaganda as it emerges;
Ensuring security and privacy: implementing security measures to protect the data and privacy of system users;
Development of visualization and interface: creating an interface for visualizing the results and easy access to analytical data for users.
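
The labeling, training, and detection steps above can be illustrated with a minimal sketch. It is not the project's actual model: it assumes a tiny hand-labeled toy dataset (invented here purely for illustration) and a simple multinomial Naive Bayes baseline written with the Python standard library only. The names `tokenize` and `NaiveBayesTextClassifier` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled training set (illustrative only, not real data).
train_texts = [
    "the enemy are monsters and traitors, only we tell the truth",
    "glorious heroes crush the evil enemy, victory is certain",
    "officials reported three injured after shelling in the city",
    "the ministry said talks will resume on monday",
]
train_labels = ["propaganda", "propaganda", "news", "news"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("our glorious heroes will crush the traitors"))
print(model.predict("officials said three were injured in the city"))
```

A baseline like this is mainly useful as a reference point: a real system would need a much larger labeled corpus and would most likely rely on a stronger model.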

To reach all of these targets, we must choose how to build the model for our service. There are two ways to do it: write your own model or pick one that already exists. Both have their pros and cons, so let’s look at each.

Creating your own models gives you:

personalization for a specific task: Custom models can be built to meet specific project requirements and features. For example, you can train the model on your own dataset;
better adaptation to a specific language style: If your project requires analyzing text from a specific language style or field, a proprietary model can better adapt to it.

Choosing an existing model also offers many advantages, such as:

efficiency and accuracy: Models that have been pre-trained on large amounts of data can have impressive efficiency and accuracy as they learn a wide range of language interactions and patterns;
saving time and resources: using off-the-shelf models removes the need to spend time and computing resources on full model training. This is especially important if you have a limited budget or equipment.

It is important to mention the cons of both methods. If you choose to create your own model, you will face:

the need for significant amounts of data: creating a custom model that is effective can require a significant amount of training data, which can be difficult to secure;
high cost of computing resources: training a custom model can be resource-intensive, and the high-performance hardware it requires can be expensive;
complexity and time: developing a custom model can be labour-intensive and time-consuming, especially if you are not experienced in deep learning.

If you choose the other way, you will face different problems, such as:

general assumptions: off-the-shelf models are usually trained on a wide range of general data, which can make them less accurate for specific tasks;
data confidentiality issues: the use of off-the-shelf models may require the transfer of sensitive data to external servers for processing, which may raise security and privacy issues.

To summarize, the choice between off-the-shelf and custom models depends on:

the amount of data: off-the-shelf models may require less data to train, whereas custom models may require significantly more data to achieve high accuracy;
time and resources: Using off-the-shelf models can save time and resources, as training your own model can be costly;
specific project needs: If you have specific requirements that are not met by off-the-shelf models, building your own may be the best choice.

In my case, I decided to use an existing model, since many specialists have already created well-trained models for working with text, such as BERT (Bidirectional Encoder Representations from Transformers) or the well-known GPT. Using these tools will save a large amount of time, which can be spent on preparing data and creating a user-friendly app, but this approach is not as simple as it first appears. The main problem is language: the training data will not be in English, which creates a need for translation.
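
The translate-then-classify idea can be sketched as a small pipeline with two pluggable parts. This is only an illustration under stated assumptions: `PropagandaPipeline`, `toy_translate`, and `toy_classify` are hypothetical names, and the two toy functions are stand-ins for a real machine translation service and a real pre-trained model such as BERT, neither of which is called here.

```python
from dataclasses import dataclass
from typing import Callable

Translator = Callable[[str], str]    # source-language text -> English text
Classifier = Callable[[str], float]  # English text -> propaganda score in [0, 1]


@dataclass
class PropagandaPipeline:
    """Translate a text to English, then score it with a classifier."""
    translate: Translator
    classify: Classifier
    threshold: float = 0.5  # assumed cut-off, to be tuned on real data

    def analyze(self, text: str) -> dict:
        english = self.translate(text)
        score = self.classify(english)
        return {
            "original": text,
            "translated": english,
            "score": score,
            "is_propaganda": score >= self.threshold,
        }


# Hypothetical stand-in for a translation service: word-by-word lookup.
TOY_DICTIONARY = {"ворог": "enemy", "зрадники": "traitors", "місто": "city"}


def toy_translate(text: str) -> str:
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


# Hypothetical stand-in for a pre-trained model: share of loaded words.
LOADED_WORDS = {"enemy", "traitors"}


def toy_classify(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in LOADED_WORDS for word in words) / len(words)


pipeline = PropagandaPipeline(translate=toy_translate, classify=toy_classify)
result = pipeline.analyze("ворог і зрадники")
print(result["translated"], result["is_propaganda"])
```

Keeping translation and classification as separate callables means either part can be replaced later, for example by a multilingual model that makes the translation step unnecessary.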

References