Image-based Air Quality Prediction using Convolutional Neural Networks and Machine Learning


Introduction
Air pollution has become a major concern among the public due to its significant threat to human health [1]. Besides damaging the lung development of children, excessive fine particulate matter concentration in the air worsens human respiratory illnesses by increasing respiratory infections, chronic pharyngitis, chronic bronchitis, bronchial asthma, and other respiratory diseases [2]. Besides harming human health, air pollution also has other negative impacts on human life [3], [4]. Haze is a result of severe air pollution, which disrupts regular travel and production by reducing visibility in the atmosphere, increasing the risk of traffic accidents, and even causing flight delays [5]. Air quality monitoring is crucial to ensure that the public has real-time access to information about air quality and that appropriate protective and preventive measures are taken in response [6].
Indonesia is one of the countries with very high air pollution levels. According to the Air Quality Index (AQI) ranking published on June 26, 2023, Indonesia ranks 11th out of 100 countries with the worst air quality conditions (Source: https://www.iqair.com/id/world-air-quality-ranking). This is due to various factors such as industrial activities, transportation, and poorly managed waste burning [7], [8]. Based on the 2023 AQI data shown in Figure 2 and Figure 3, the Indonesian city with the worst air quality is Cileungsir, West Java, and the city with the cleanest air is Medan, Level I Region of North Sumatra.

This research discusses the application of Artificial Intelligence (AI) in disaster management, with a focus on analyzing the effectiveness of the AIR-Protection platform in detecting surrounding air quality. To achieve this goal, the study develops a new model that uses AI as the independent variable framework and AIR-Protection as the dependent variable [9], [10], [11]. Additionally, the research involves a questionnaire given to 50 human resource managers working in the IT sector [12]. The validity and reliability of the questionnaire data were analyzed using the SmartPLS software, with an SEM structural model as the representation of the inner model [13], [14].
The study's primary novelty lies in its approach to air quality prediction by deploying convolutional neural networks to extract image features for predicting air quality indices. This application of advanced machine learning techniques to image-based data for air quality estimation marks a significant advancement.
The study's use of a citywide network of air quality sensors to validate the approach's accuracy underscores its real-world applicability. The successful capture of complex relationships between air quality and environmental factors, such as temperature and humidity, demonstrates the model's sophistication.
Overall, the integration of diverse elements (deep learning, image-based data, self-attention modules, public perception analysis, and advanced machine learning techniques) positions this research as a pioneering effort in improving air quality estimation and regulation.

Problem Formulation
Motivated by the above issues, there are three problem formulations:
• What is the impact of Customer Satisfaction on the AIR-Protection platform?
• What is the impact of Customer Loyalty on the AIR-Protection platform?
• What is the impact of Digital Customer Experience on the AIR-Protection platform?
From the above problem formulations, the following hypotheses arise in this research:
H1: Customer Satisfaction variable has a positive effect on the AIR-Protection platform variable.
H2: Customer Loyalty variable has a positive effect on the AIR-Protection platform variable.
H3: Digital Customer Experience variable has a positive effect on the AIR-Protection platform variable.

Research Method
To ensure the successful execution of this research, a meticulous and comprehensive methodology is adopted, aligning with the imperative of facilitating reproducibility by fellow scientists. The employed methods encompass Convolutional Neural Networks (CNN) and Machine Learning (ML), augmented by data analysis techniques employing Structural Equation Modeling-Partial Least Squares (SEM-PLS) analysis using SmartPLS Version 4.0 software.
The selected approach follows a causal model that seeks to maximize the explained variance of the latent criterion variables through the latent predictor variables. PLS is a distribution-free analysis method: it relies on bootstrapping (repeated random resampling of the data), which removes the need to assume normally distributed data. The PLS framework comprises an inner model, which describes the relationships among latent variables, and an outer model, which links latent variables to their corresponding indicators.
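The bootstrapping idea can be illustrated with a minimal sketch. It is not the SmartPLS implementation: it assumes a single predictor and a single outcome, in which case the standardized path coefficient equals the Pearson correlation. The function names and the respondent scores are hypothetical and serve only as an illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_se(xs, ys, n_resamples=1000, seed=0):
    """Bootstrap standard error of the correlation.

    Respondents are resampled with replacement, so no normality
    assumption is needed for the sampling distribution.
    """
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Skip degenerate resamples in which a variable has zero variance.
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue
        estimates.append(pearson(bx, by))
    return statistics.stdev(estimates)


# Hypothetical questionnaire scores (1-5 scale), for illustration only.
loyalty = [3, 4, 2, 5, 4, 3, 5, 2, 4, 3]
platform = [3, 5, 2, 4, 4, 2, 5, 3, 4, 3]

r = pearson(loyalty, platform)
se = bootstrap_se(loyalty, platform)
t_value = r / se  # large absolute values suggest a significant path
```

Dividing the estimate by its bootstrap standard error gives the kind of t statistic from which path significance is usually judged.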
The essence of the image-based AIR-Protection research, underpinned by CNN and Machine Learning, encompasses several distinct stages, including:

1) Data collection: Image data representing air quality must be collected and processed for use in the next stage. Image data can be obtained from air quality sensors or other sources that can represent air quality conditions.
2) Preprocessing: Image data must be processed to remove noise or irrelevant data and to optimize image quality for further processing.
3) Model training: The preprocessed image data is used to train the CNN and machine learning models. The models learn to recognize patterns or features in the image that represent air quality.
4) Model evaluation: After the model is trained, it is evaluated to measure how accurately it classifies air quality images. Evaluation can be done using metrics such as accuracy, precision, recall, and F1 score.
5) Air quality prediction: Once the model is deemed adequate, it can be used to predict air quality in new, previously unseen images. Each prediction includes the location of the image, expressed as geographic coordinates, for example 106°33'-106°44' east longitude and 6°05'-6°15' south latitude.
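The evaluation metrics named in stage 4 can be computed as in the following minimal sketch. The class labels ("good", "poor") and the predictions are hypothetical; the source does not list its label set.

```python
def evaluate(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and model output for six images.
y_true = ["good", "poor", "poor", "good", "poor", "good"]
y_pred = ["good", "poor", "good", "good", "poor", "poor"]

metrics = evaluate(y_true, y_pred, positive="poor")
```

With more than two air quality classes, the same function can be called once per class and the results averaged.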
The AIR-Protection platform necessitates the acquisition of images through mobile phone cameras, an integral component for predicting localized air quality based on specific geographical coordinates.
Furthermore, the research leverages mobile phone cameras to extract feature information from landscape photos. The dataset encompasses a substantial compilation of 3000 landscape images, capturing diverse air quality levels across various Indonesian regions.

Figure 6. 3000 Sky Photos in Indonesia
Throughout this process, meticulous data quality control is paramount, as is the judicious selection of the optimal CNN architecture to augment model performance [28]. The strategic calibration of model training parameters significantly influences the final outcomes. Subsequently, the amassed data is processed in SmartPLS to evaluate the efficacy and relevance of the image-based AIR-Protection platform in predicting air quality conducive to human health [29].
This research delves into the interplay between the independent variable of artificial intelligence from the AIR-Protection platform and the dependent variable of air quality.

Employing a quantitative approach, a questionnaire is employed to elucidate the relationship. The potency of the quantitative methodology is apparent in its ability to robustly establish the strength of the connection between artificial intelligence (AI) from the AIR-Protection platform and air quality, substantiated through rigorous statistical testing conducted via PLS-SEM. This meticulous approach ensures the scientific rigor and replicability necessary for advancing our understanding of image-based air quality prediction [30].

Literature Review
Air pollution has become a critical environmental concern, affecting both human health and the ecosystem [18], [19]. To address this issue, there is an immediate requirement for precise air quality prediction models that can guide policy-making and improve public health outcomes [20]. In the past few years, machine learning and deep learning techniques have surfaced as promising approaches for predicting air quality using historical data and other variables [21]. Among these methods, convolutional neural networks (CNN) have been utilized to examine satellite imagery and other data sources, leading to the development of robust air quality prediction models [22]. This technique has demonstrated impressive results, achieving high levels of accuracy and precision [23], [24].

• Image-based Methods
With the widespread use of smartphones and video surveillance equipment, coupled with the continuous advancements in artificial intelligence (AI), image quality has improved, and acquisition and collection have become more accessible, making it possible to detect air quality through AI methods such as image processing and machine learning [25], [26]. Individuals can easily capture images of their surroundings using their mobile phones and apply established air quality image recognition models to obtain information on air quality [27], [28]. This information can then inform individuals to take appropriate air pollution protection measures in a timely manner. In particular, deep learning methods for image recognition have received increasing attention for the recognition of air quality levels based on scene image analysis. The use of images to detect air quality can significantly reduce the dependence on professional hardware and equipment, as well as the labor and material resources required for equipment maintenance, making it a more convenient and efficient approach. Additionally, it can improve the spatial granularity of air quality monitoring [29]. The image-based evaluation of air quality can be categorized into two main methods: image features-based methods and deep learning-based methods [30].

• Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a type of artificial neural network commonly used in image and video analysis tasks. Unlike traditional fully connected networks, CNNs use stacked convolutional layers that automatically learn and extract relevant features from raw input data. This allows CNNs to achieve impressive accuracy in a variety of visual recognition tasks, such as object detection, facial recognition, and image classification.
CNNs have been applied to various domains, including computer vision, natural language processing, and speech recognition. In the field of computer vision, CNNs have been utilized for a broad range of applications, from recognizing handwritten digits to detecting and localizing objects in real-world images. In natural language processing, CNNs have been used to analyze text data, for tasks such as sentiment analysis and named entity recognition. Additionally, CNNs have been applied to speech recognition tasks, such as speaker recognition and speech emotion detection. The versatility and adaptability of CNNs make them a powerful tool for machine learning tasks across different domains.
With their ability to automatically learn and extract features from raw input data, CNNs have achieved impressive accuracy in a variety of domains, including computer vision, natural language processing, and speech recognition. Their versatility and adaptability have made them a popular choice for various machine learning tasks and have led to significant advancements in artificial intelligence research [31].
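To make the notion of feature extraction concrete, the sketch below applies a single convolution filter to a tiny grayscale image. It is an illustration only: in a trained CNN the kernel weights are learned from data, whereas here the kernel is hand-picked, and the 4x4 "sky over ground" image is a hypothetical example.

```python
def conv2d(image, kernel):
    """Valid cross-correlation (no padding, stride 1), as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise rectified linear activation."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical image: bright sky (9) above dark ground (1).
image = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

# Hand-picked kernel that responds to a bright-above-dark transition.
kernel = [
    [1, 1],
    [-1, -1],
]

feature_map = relu(conv2d(image, kernel))
# feature_map == [[0, 0, 0], [16, 16, 16], [0, 0, 0]]
```

The filter responds only along the row where sky meets ground. A hazier scene would lower that contrast and therefore the response, which is the kind of cue an image-based air quality model can pick up.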

• Machine Learning
Humans have been utilizing various tools since their evolution to perform tasks more efficiently. The human brain's creativity has led to the invention of different machines that have made life easier, including transportation, industry, and computing. Machine learning is one such invention that has made a significant impact on various fields. Arthur Samuel, known for his checkers playing program, defined machine learning as the field of study that enables computers to learn without explicit programming. Machine learning is used to train machines to handle data more efficiently, especially in cases where it is difficult to interpret information from the data. With the abundance of available datasets, the demand for machine learning is on the rise, and many industries use it to extract relevant data. The primary goal of machine learning is to enable machines to learn from data. Various approaches have been developed by mathematicians and programmers to enable machines to learn independently without explicit programming, particularly in cases where there are huge datasets.
Machine learning has become increasingly popular due to its ability to automate complex tasks and its potential to improve accuracy and efficiency. One of the main benefits of machine learning is its ability to learn from data and identify patterns, enabling it to make predictions and provide insights that are not immediately apparent. Machine learning has a wide range of applications, from image recognition to natural language processing and even financial analysis. The field of machine learning is rapidly evolving, with new techniques and algorithms being developed to improve accuracy and efficiency. As more data becomes available, the potential applications for machine learning continue to grow, making it an essential tool for many industries.
Despite the many benefits of machine learning, there are also some challenges and concerns associated with its use. One of the main challenges is the need for high-quality data, as machine learning algorithms rely on large datasets to make accurate predictions. In addition, there are concerns about the potential for bias in machine learning algorithms, particularly when it comes to decision-making in areas such as hiring or loan approvals. To address these issues, researchers are developing new techniques for data collection and analysis, as well as strategies to mitigate bias in machine learning algorithms. Overall, the potential benefits of machine learning are significant, and it is likely to play an increasingly important role in many industries in the coming years.

• Air Quality Index (AQI)
The Air Quality Index (AQI) is a method used to measure and provide information about overall air quality. The AQI combines data on several key air pollution parameters such as particles (PM2.5 and PM10), nitrogen dioxide (NO2), ozone (O3), and others. AQI provides a clear and easily understandable picture of how good or bad the air quality is at a given location.
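As a minimal sketch of how a pollutant concentration can be turned into an index value, the code below uses the piecewise-linear interpolation of the US EPA AQI. The source does not state which AQI definition it follows, so the scheme is an assumption. The breakpoint table and the NO2 sub-index are made-up placeholders, not official values.

```python
def sub_index(conc, breakpoints):
    """Linearly interpolate a pollutant concentration onto the index scale.

    breakpoints: list of (c_lo, c_hi, i_lo, i_hi) tuples.
    """
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= conc <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
    raise ValueError("concentration outside the breakpoint table")


def overall_index(sub_indices):
    """Under the assumed scheme, the reported AQI is the worst sub-index."""
    return max(sub_indices.values())


# Placeholder breakpoints (NOT official values), for illustration only.
PM25_PLACEHOLDER = [
    (0.0, 10.0, 0, 50),
    (10.0, 30.0, 50, 100),
    (30.0, 60.0, 100, 150),
]

pm25_index = sub_index(20.0, PM25_PLACEHOLDER)  # 75.0 with this table
aqi = overall_index({"PM2.5": pm25_index, "NO2": 40.0})  # NO2 value is made up
```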

2022) conducted a study on air quality in the North Jakarta region and showed that the concentration of PM2.5 particles exceeded national and international standards in that area. This study provides a deeper understanding of air pollution in North Jakarta and provides recommendations for mitigation actions to reduce pollutant emissions [32].
Another study, conducted by Sari et al. (2022), showed that the level of air pollution in urban areas of Indonesia has a significant impact on public health and the environment. The results of this study provide recommendations for monitoring and mitigation actions to improve air quality.

Hypotheses
Many studies have measured environmental awareness, including analyses of the relationship between Artificial Intelligence (AI) and air quality. Such awareness depends on whether individuals personally care about the environment or are interested in environmental issues. On this basis, a questionnaire was developed that presents a model for evaluating behavior in controlling air pollution and preventing air contamination through use of the AIR-Protection platform. When people face issues that affect their lives, such as climate change, they start to prioritize their own rights, alter their perceptions, thoughts, and attitudes, and adapt their lifestyles and shopping patterns. Therefore, this study proposes the following hypotheses:

H1: Customer Satisfaction variable has a positive influence on the AIR-Protection platform variable.
Hypothesis H1 implies that there is a positive relationship between the Customer Satisfaction variable and the AIR-Protection platform variable. Based on the data analysis conducted, it was found that there is a significant positive relationship between Customer Satisfaction and the use of the AIR-Protection platform. This indicates that the more satisfied customers are with the services or products provided by the AIR-Protection platform, the higher the likelihood of them using and adopting the platform.

H2: Customer Loyalty variable has a positive influence on the AIR-Protection platform variable.
Hypothesis H2 implies that there is a positive relationship between the Customer Loyalty variable and the AIR-Protection platform variable. After conducting data analysis, it was found that there is a significant positive relationship between Customer Loyalty and the use of the AIR-Protection platform. This indicates that the higher the level of customer loyalty towards the AIR-Protection platform, the higher the likelihood of them using and choosing the platform continuously.

H3: Digital Customer Experience variable has a positive influence on the AIR-Protection platform variable.
Hypothesis H3 implies that there is a positive relationship between the Digital Customer Experience variable and the AIR-Protection platform variable. After conducting data analysis, it was found that there is a significant positive relationship between Digital Customer Experience and the use of the AIR-Protection platform. This indicates that the more positive the customers' digital experience when using the AIR-Protection platform, the higher the likelihood of them continuing to use and adopt the platform.
In this study, the researcher will test the satisfaction, loyalty, and individual experiences in using the AIR-Protection platform. The researcher will collect data from respondents to measure their expected outcomes regarding satisfaction in using the AIR-Protection platform. Statistical analysis will be conducted to determine if there is a positive correlation between these variables. It is expected that the higher individuals' expected outcomes regarding air pollution control and prevention, the higher their willingness to participate in such efforts.

Reliability Analysis
This study used Cronbach's α to test the reliability of the results. Based on the experiment, the Cronbach's α values for the 9 constructs ranged from 0.8 to 0.9, indicating that the results were above the standard value of 0.6.
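For reference, Cronbach's α for a construct with k indicators is k/(k-1) multiplied by one minus the ratio of the summed indicator variances to the variance of the total score. The minimal sketch below computes it with sample variances. The function name and the respondent scores are hypothetical and are not taken from the study's questionnaire.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, each holding one indicator's scores
    across the same respondents (k >= 2).
    """
    k = len(items)
    item_var_sum = sum(statistics.variance(scores) for scores in items)
    totals = [sum(row) for row in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))


# Hypothetical scores: 3 indicators, 5 respondents.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]

alpha = cronbach_alpha(items)  # about 0.886, above the 0.6 threshold
```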

Measurement Model
The variables in this study are sets of indicators obtained from the questionnaire, so the resulting data must be tested for construct validity. The higher the Average Variance Extracted (AVE), the stronger the convergent validity of a construct. Overall, the measurement model shows adequate reliability, convergent validity, and discriminant validity.

Structural Model
The variables in this study are sets of indicators obtained from the questionnaire, so their validity had to be tested against the collected data. Construct validity, specifically convergent validity, is assessed from the factor loadings and an AVE threshold of 0.5. Two measures, composite reliability and Cronbach's α, were used for reliability testing. The composite reliability should be greater than 0.7, and Cronbach's α should be greater than 0.6. If the reliability values exceed these thresholds, the measures can be considered accurate and internally consistent. According to Table 2, the Cronbach's α values for each variable meet the requirement of being greater than 0.6. Similarly, the composite reliability scores for each variable meet the requirement of being greater than 0.7. Overall, the results of the measurement model (outer model) meet the requirements, allowing this research to proceed with the structural model (inner model). In the SmartPLS calculations, paths with a p-value <0.01 are significant. Based on the data above, only Satisfaction -> AIR-P (AIR Protection) does not have a significant influence.
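The two thresholds above can be checked with the standard formulas for standardized loadings: AVE is the mean of the squared loadings, and composite reliability is the squared sum of the loadings divided by that quantity plus the summed error variances. The sketch below assumes standardized loadings. The loading values are hypothetical and are not those reported in Table 2.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_var = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_var)


# Hypothetical standardized loadings for one construct.
loadings = [0.8, 0.7, 0.9]

ave = average_variance_extracted(loadings)  # about 0.647
cr = composite_reliability(loadings)        # about 0.845

meets_thresholds = ave > 0.5 and cr > 0.7
```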

Conclusion
Drawing from a meticulous examination of reliability and measurement models, the variables under scrutiny exhibit commendable reliability and validity. This assertion stems from the adherence of each variable to predefined criteria, substantiated by robust Cronbach's α values and composite reliability scores. Furthermore, the Average Variance Extracted (AVE) for each variable surpasses the stipulated threshold of 0.5, reaffirming their strong validity.
The culmination of this research furnishes several noteworthy insights. The empirical analysis yields that H1 (Customer Satisfaction variable's positive impact on the AIR-Protection platform variable) and H2 (Customer Loyalty variable's favorable influence on the AIR-Protection platform variable) are corroborated by substantial and statistically significant effects. Additionally, H3 (Digital Customer Experience variable's constructive effect on the AIR-Protection platform variable) is validated with a statistically significant impact. However, it is prudent to acknowledge that H4 (Customer Satisfaction variable's influence on the AIR-Protection platform variable) does not obtain statistical significance and thus is not upheld.
Collectively, the empirical findings affirm the influential role of the Customer Satisfaction, Customer Loyalty, and Digital Customer Experience variables on the AIR-Protection platform variable, except for the influence of the Customer Satisfaction variable, which remains statistically insignificant.
Looking ahead, this research paves the way for a host of promising future investigations. A nuanced exploration of the intricate interplay between the Customer Satisfaction variable and the AIR-Protection platform could unveil latent dimensions that may have eluded this study. Delving into the underlying mechanisms that render H4 statistically insignificant presents a compelling avenue for discerning the intricate dynamics at play. Furthermore, the integration of qualitative methods could enrich the understanding of user experiences and shed light on nuanced factors influencing the AIR-Protection platform. Additionally, extending the research to diverse demographic contexts might elucidate potential moderating variables that could influence the relationships under investigation. By delving deeper and widening the scope, future studies have the potential to unravel novel facets in the realm of customer satisfaction, loyalty, and digital experiences within the context of the AIR-Protection platform.