Automated Detection of Container-based Audio Forgery Using Mobile Crowdsourcing for Dataset Building

This research paper introduces a new approach for detecting digital audio file forgeries, highlighting entrepreneurship spirit through innovation and market adaptability. It utilizes a cyclical system aided by mobile crowdsourcing to collect a comprehensive dataset of smartphone recordings, demonstrating an entrepreneurship approach to leveraging technology for scalable solutions. Digital forgery, an area ripe for entrepreneurship investment, is becoming increasingly accessible, making it challenging to detect manipulated audio files. This new method emphasizes the utility of metadata and file structure analysis and is scalable, reflecting an entrepreneurship mindset in creating adaptable and sustainable solutions. The researchers developed a mobile web-based prototype system to collect diverse audio data and automatically detect forgeries, showcasing their initiative and innovative thinking. They also conducted scenario-based testing to validate the effectiveness of their methodology, a step that underscores the entrepreneurship value of practical validation. This approach has the potential to significantly advance digital forensic practices by enabling broader detection of manipulated audio files, opening avenues for entrepreneurship ventures in digital security. The collected dataset will be made publicly available and serve as a valuable resource for future forensic investigations, encouraging entrepreneurship collaboration and knowledge sharing. The paper lays the groundwork for future research in expanding the scope of digital forensics, fostering technological innovation, and enhancing participatory models for data collection, all of which are essential elements in entrepreneurship ecosystems. This is an open access article under the CC BY 4.0 license.


INTRODUCTION
In today's society, the proliferation of smartphones has made it easier to create, edit, and utilize audio files, thereby increasing the risk of forgery [1,2].Furthermore, with advancements in digital audio editing software and artificial intelligence technologies such as Deep Voice software and deepfakes, it is now possible for laypeople to create forged voice files either by deleting specific parts of digital audio files or by inserting arbitrary content [3][4][5].This study tackles the challenge of detecting digital audio file forgeries using a cyclical approach that requires continuous improvement throughout analyzing and modeling data.By leveraging mobile crowdsourcing, we minimize the resources required for extensive forensic analyses and engage a diverse range ❒ E-ISSN: 2656-8888 | P-ISSN: 2655-8807 of participants in the forensic process.Such forged audio files can cause significant problems when used as critical data such as court evidence [6].In such situations, experts expend significant time and effort to determine whether audio recording files have been tampered with [7].Accordingly, there is a pressing need to develop a method to detect forgeries in digital audio files efficiently.Methods for detecting forged audio files are broadly classified into container-based and content-based forgery detection methods [8].Content-based forgery detection methods, a core task of audio forensics, detect the area forged by audio editing software within the audio content [9].Owing to recent advances in deep learning, numerous studies have leveraged deep learning to detect forged audio files within audio content [4,5], [10][11][12].Deep-learning-based methods detect audio file forgery by training models on large-scale datasets, supporting multimodal inputs, including spectrogram images and text.This approach enables better learning of audio features and more accurate differentiation between original and forged audio.However, acquiring an authentically forged dataset edited by various audio editing software with their functions and encoded with the same encoder as the original recording device presents significant challenges [13].For this reason, current deep learning-based and content-based forgery detection methods have limitations.
On the other hand, container-based forgery detection methods focus primarily on the file structure and metadata alterations occurring during audio file forgery [14].As the factors for this technique can vary depending on smartphone models, operating systems, recording applications(apps), and file formats, various studies have been conducted to address these variables [15][16][17][18][19][20].However, forging the structure and metadata of audio files to appear identical to the original file is possible because of the variety of audio recording apps available for smartphones and their re-encoding capabilities [21,22].Nevertheless, a technique for detecting forgery by comparing the metadata and structure of an audio file with those of an original test audio file is crucial for enhancing the efficiency of digital audio forensics.Based on the systematic characteristics with clear factors for detecting forged files, this approach has the potential to evolve into an automated forgery detection system.However, to the best of our knowledge, no audio dataset exists as original audio files for testing recorded across several smartphone models, various recording apps, and their options, considering that there are approximately 250 smartphone brands globally [23] [24].The absence of such datasets makes it difficult to detect tampering using a container-based method.
This study aims to develop a mobile web-based prototype system by collecting audio files recorded on smartphones, embodying an entrepreneurship approach in technological innovation and market responsiveness.To achieve this objective, we built a cyclical approach for detecting container-based forgery, illustrating entrepreneurship insight in problem-solving and market needs assessment [25][26].This approach extracts elements to determine whether smartphone audio files have been tampered with, leveraging entrepreneurship skills in creating novel solutions [27].Our methodology is based on various experiments conducted using Samsung Galaxy S23 Ultra (Android 14), guided by a comprehensive literature review, demonstrating an entrepreneurship spirit in pioneering research and development [28].To prove the validity of this method, we developed a mobile web-based prototype system capable of collecting audio files recorded on smartphones and automatically detecting forged audio files created using container-based methods, showcasing an entrepreneurship venture in forensic technology [29].Furthermore, this system was validated through scenario-based testing, demonstrating the practicality and robustness of our method for identifying forged audio files, reflecting the entrepreneurship value of product testing and market adaptation [4].This system enables fast and accurate data collection through a simple and easy-to-use user interface, eliminating the need for user-generated annotations, an entrepreneurship achievement in user experience design.The system constructed in this study demonstrates cyclical and self-sustaining characteristics, epitomizing the entrepreneurship goal of creating scalable and adaptable solutions [30].In other words, the detection range was automatically expanded by incorporating audio files from new smartphone models not included in the dataset, an entrepreneurship strategy in continuous improvement and market expansion.
This research is significant because it lays the foundation for an efficient method by building a dataset of audio files and developing an automatic audio forgery detection system based on the container-based method.Furthermore, making this dataset available to the public is expected to be a valuable resource for researchers and students seeking to explore the realms of audio forensics, fostering an entrepreneurship ecosystem in academic and research communities.

LITERATURE REVIEWS
APTISI Transactions on Technopreneurship (ATT), Vol.The analysis of metadata plays a pivotal role in identifying forgeries and alterations within voice recording files.This importance stems from the observation that the editing or manipulating of audio files invariably leads to modifications in their metadata and structure from the state of the original file.Grigoras and Smith [15] examined the metadata structures in MP3, WAV, and WMA audio files, revealing that metadata undergoes alterations during the editing process.They posited that analyzing the file format and metadata is a crucial step in digital audio authentication [31].Similarly, Gangwar et al. [16] discovered that attributes, such as metadata, file signatures, and hex data, are instrumental in pinpointing the origin of a recording file, thereby making metadata a significant indicator for detecting file tampering across various formats.
Furthermore, metadata properties exhibit variability based on the smartphone model used for the recording.Distinctions exist between devices such as Apple iPhones and Samsung Galaxy smartphones and among models from the same manufacturer.Zeng et al. [17] demonstrated that voice recording files vary in their specific file structure, time information, and metadata depending on the iPhone model and operating system.Koenig and Lacey [18] also highlighted a discernible difference in metadata structure and properties when editing and re-encoding a voice recording file on an iPhone's Voice Memos compared to the original file.Park et al. [19] demonstrated that adding log records registered in the device during the creation of audio recordings under these foundational characteristics makes it possible to verify the authenticity of audio recordings generated using the Voice Memos application on the iPhone [32].
M4A files, MPEG-4 Audio, are predominantly used as smartphone audio files.M4A files can encode audio using Advanced Audio Coding (AAC) or Apple Lossless Audio Codec (ALAC).By leveraging the unique file structure of M4A, Michalek [20] introduced a novel method for detecting forgeries and alterations in audio files.Michalek's analysis, conducted on recordings made with the default Samsung Galaxy smartphone app and nine Android recording apps, indicated that editing not only the metadata properties but also the structure of the MPEG multimedia container.Theoretically, upgrading the operating system (OS), even within the same model, could impact audio data encoding, potentially leading to modifications in the M4A file structure.Similarly, updates to recording apps might affect audio data encoding and result in structural changes to the M4A files.However, rapid updates to smartphone operating systems and recording apps limit the scope of these studies, as they may not fully account for the impact of such upgrades [33].
The deviation between the metadata and file structure of the original and altered audio files plays an important role as a clear indicator of forgery.However, the frequent introduction of new smartphone models and the swift release of updates for operating systems and recording applications makes it difficult to conduct a comprehensive and precise analysis of audio file forgery.Consequently, constructing an extensive database that categorizes metadata and file structure characteristics by operating system and recording application versions according to smartphone models is crucial for enhancing audio file authenticity verification [34].

Optimizing Mobile Crowdsourcing for Enhanced Data Collection
Crowdsourcing, a concept introduced by Jeff Howe in 2006, encapsulates the collective effort of soliciting contributions, ideas, skills, and information from a wide audience [35].This methodology harnesses the "wisdom of the crowd" for various purposes, aiding entities such as corporations, academic researchers, and governmental bodies to address specific challenges, foster innovation, and gather niche data [36].Crowdsourcing is particularly beneficial for tasks that require human intellect to be superior to machine capability [37].It is also advantageous for projects where leveraging a vast, heterogeneous crowd can improve efficiency in terms of both time and cost, compared to hiring specialized professionals [38].
The emergence of mobile technology has broadened the scope and effectiveness of crowdsourcing [39].Mobile devices enable active participation without geographical or temporal limitations, enhancing accessibility.Moreover, the growing demand for data, propelled by advancements in artificial intelligence and the widespread availability of sensor-equipped smartphones, has facilitated the compilation of large-scale datasets through mobile crowdsourcing efforts [40-42][43].
Ensuring the efficacy of mobile crowdsourcing requires a thorough evaluation of several key factors, including task design, incentive mechanisms, quality control measures, and security and privacy concerns [44].Incentives such as financial rewards, gamification elements, and social recognition are imperative for motivating extensive participation [45].Implementing robust quality control protocols is vital for maintaining the accuracy and reliability of collected data [46].In response to these challenges, research has shifted towards developing systems that automatically generate annotations from metadata harvested from various sensors within mobile devices, moving away from reliance on user-generated annotations [47][48].
Task architecture must be explicit and targeted to promote contributions through clear definitions, a refined user interface (UI), and appropriate granularity [49], [50].UI has emerged as a pivotal component within mobile crowdsourcing initiatives.A UI that is straightforward and user-friendly and facilitates swift completion of tasks is deemed essential [51].While traditional efforts have emphasized aesthetic design to achieve this, the advent of UI technologies that capitalize on data-related information can significantly diminish user effort and foster UI enhancements, culminating in expedited task processing [52].Such advancements in UI design bolster participant engagement and improve data quality.They also highlight the need for a comprehensive strategy.This strategy should integrate incentives, privacy and security measures, quality assurance, and a focused emphasis on the UI to collect a wide array of smartphone recordings efficiently [53].

Experiments for Container-based Forgery Detection
In this study, we adopted a three-step approach to detect container-based forgery in system design, involving: 1) observing changes in the audio file structure, 2) defining immutable metadata attributes, and 3) noting changes according to recording app settings.We utilized the latest Samsung Galaxy S23 Ultra with Android 14 as the operating system given the popularity of Samsung Galaxy smartphones in Korea.The original file was generated using a Voice Recorder and the default voice recording application on the Samsung Galaxy S23 Ultra [54].Subsequently, leveraging the original file, we produced two forged versions: one rerecorded using the editing function of the built-in Voice Recorder app and the other using the audio editing software, DemoCreator [55] [56].

Immutable Attributes of Metadata
An experiment was conducted using the MediaInfo software [57] to identify immutable elements within the metadata of the recording files.Audio file metadata can be divided into two categories: general and audio metadata.General metadata includes file name, size, format, duration, tagged date, and encoded date.In contrast, audio metadata covers the bit rate, sampling rate, number of channels, recording length, and codec [58] [59].Certain metadata elements, including file name, size, duration, tagged date, and encoded date, may vary based on the recording environment.Conversely, audio files recorded using the same model, operating system, and recording application with identical settings exhibit immutable metadata attributes and display consistent properties and values.As shown in Table 1, the metadata of voice files can be segregated into immutable and mutable attributes, highlighting the distinction between changeable and unchangeable properties [60][61].Figure 1 shows that edited files display significant differences, underscoring metadata modifications when files are re-encoded using various editing tools.Files edited with the built-in voice recording app lacked metadata related to the operating system and file name of the model [62].Conversely, files modified using the DemoCreator Software exhibited changes in the format profile, codec ID, and title values.Additionally, the encoded and tagged dates were eliminated and replaced with properties in the 'Default' and 'Alternate groups,' which previously did not exist.It is essential to verify the presence and sequence of attributes of the original data.Identifying immutable attributes and their consistent values is crucial for authentication purposes [63].

Changes in Audio File Structure
An M4A file features a hierarchical structure with various nested segments known as atoms or boxes [18] [64].Existing research has demonstrated that re-encoding an M4A file leads to structural changes that differ from the original file [65].The purpose of this experiment is to identify changes in the file structure resulting from re-encoding and to verify whether these changes can be restored to the original file structure.To achieve this, we employed a specialized MP4 parser, MP4 Inspector [66], and a hex editor, HxD [67], for an in-depth analysis [68].It is not easy to alter the structure within an M4A file.However, using a hex editor, the structure and name of the boxes in the original audio files can be modified [69].Such changes can lead to errors that may cause the M4A file to become non-functional [70].This issue arises because recordings made with Samsung Galaxy devices' built-in apps employ a 64-bit size in the media data box ('mdat'), in comparison to regular M4A files that use a 32-bit size.Consequently, the latest smartphones support a larger type of M4A file, allowing storage of up to 2 to the power of 64 bytes in the 'mdat' box.However, when re-encoded after editing in the built-in recording app or using DemoCreator, the file is treated as a regular M4A file. Figure 3 compares the media data box types of the original and the forged files.In the original file, the large type of a 'mdat' box is indicated when the 4 bytes immediately preceding the 'mdat' type identifier, represented in ASCII code as '6D 64 61 74, ' are '00 00 00 01'; a regular type of media data box represents any value other than '00 00 00 01.Typically, forged files feature a regular type in their media data boxes.When forged files of a regular type are converted into a large type, the same file structure can be created [71].context, container-based forgery detection methods are expected to be more effective in identifying audio file forgeries [72].

Changes According to Recording App Settings
The built-in recording apps of smartphones provide a range of configurable settings.The Voice Recorder app on the Samsung Galaxy S23 Ultra offers three quality settings for recording: High, Medium, and Low.The High setting uses a bit rate of 256kbps and a sampling rate of 48kHz, the Medium setting defaults to 128kbps and 44.1kHz, and the Low setting defaults to 64kbps and 44.1kHz.Despite these differences, all three options maintain a consistent file structure.The immutable attributes of bit rates, sampling rates, and potential frame rates in the metadata are key to distinguishing among recording quality options.
Moreover, the Voice Recorder app allows users to select from the three modes on the recording screen.The Standard mode was the default voice-recording mode.The Interview mode amplified the voices to record conversations more clearly.Finally, the speech-to-text mode converts speech into text in real-time and displays it on a screen.The metadata of these three modes are influenced by the quality settings; for example, when set to the Medium setting, they follow a 128kbps bit rate and a 44.1kHz sampling rate.However, the file structures of the three modes were not influenced by quality settings.As shown in Figure 4 While the Standard and Interview modes use the large type of a 'mdat' box, the Speech-to-Text mode employs a regular type, indicating higher susceptibility to forgery.Based on the built-in recording app, a single smartphone model could generate nine different recording files.This file information can be used not only as a criterion for identifying tampering but also for automatic annotation when collecting data.

Our Strategies for Collecting Data Based on Mobile Crowdsourcing
In our data collection, we aim to collect audio files from diverse smartphone models, targeting audio files without individual voice recordings to comply with privacy norms.Our mobile crowdsourcing approach, reflecting an entrepreneurship mindset, is underpinned by three principal strategies: (1) Leveraging the motivational power of social contribution as tangible incentives, encouraging participants by highlighting the societal benefits of their contributions, which aligns with entrepreneurship values of community impact and social entrepreneurship.(2) Developing a user interface that automatically generates data-related information through computing technology without requiring user-generated annotations, accommodating technical proficiencies and thereby lowering barriers to participation, demonstrating entrepreneurship acumen in creating accessible, market-driven solutions.And (3) supporting automated quality control to ensure the reliability and integrity of collected data, akin to entrepreneurship standards for product quality and consumer trust.This includes reducing participant errors by aiding automatic annotations using audio file metadata, a reflection of entrepreneurship innovation in process optimization.Furthermore, enhancing the quality ❒ E-ISSN: 2656-8888 | P-ISSN: 2655-8807 of data by allowing users to upload data that already are included in the database is an essential feature of our system, showcasing entrepreneurship foresight in data management and system sustainability.
In this research, our approach to collecting non-voice smartphone recording data through mobile crowdsourcing is comprehensive, integrating motivational incentives, user-centric design, and rigorous quality control, embodying entrepreneurship principles in operational efficiency, customer engagement, and ethical standards.These strategies collectively guarantee efficient data collection, strong participant engagement, and adherence to privacy standards, illustrating a holistic entrepreneurship strategy in research and development.

Building a Cyclical Process
Our system integrates two fundamental operations: the collection of audio file data and authentication.The core idea of this integration is to blur the lines between data collection participants and users.This approach enables users to examine audio files for potential manipulation that could be used against them as harmful evidence, as well as to verify the authenticity of their audio submissions as unaltered evidence.Consequently, it allows users to participate in crowdsourcing activities from both perspectives.
When users upload a file for inspection, the system initiates a verification process.It examines the file's internal metadata to identify specific details such as the model, OS, recording app, and app settings used.If the system does not find a matching original recording in the database, it notifies the user of the findings.Users can then secure their audio data and, in data collection mode, upload it for forgery detection.This approach fosters the development of a self-sustaining and cyclical system, contributing to its ongoing growth.Figure 5 illustrates this cyclical process through a flowchart within our system.The architecture of our system is intricately designed to simultaneously execute two main functions: data collection and forgery detection, through the analysis of metadata and the file structure of audio files.To achieve this, the system is organized into five distinct modules: the User Interface Module, Extraction Module, Analysis Module, Management Module, and Database Module.While not shown here, Figure 6 in the manuscript offers a visual representation of the system's architecture.When users access our service, the server automatically detects the smartphone model and operating system, adjusting the user interface to fit the device specifications.In this mode, uploaded M4A audio files are stored in the repository, triggering the Extraction Module to annotate the files with metadata and file structure information before automatically forwarding them to the database.To further mitigate technical complexity for users, this module introduces a guided process for new users through tutorials, ensuring they can navigate the system effortlessly.Additionally, real-time support and FAQ sections will be incorporated, providing immediate assistance to users encountering difficulties.
Conversely, Authentication Mode is initiated when a file is uploaded for forgery detection.The Extraction Module retrieves metadata and file structure details from the data stored, searching the database for records that match the smartphone model, the closest operating system version, and identical settings.The Analysis Module then evaluates the extracted data against the information in the database to ascertain authenticity, with the findings presented via the Authentication Mode interface.To enhance user understanding and trust in the forgery detection process, detailed explanations of the analysis process and criteria used are made accessible.This transparency aims to demystify the system's operation for non-technical users.Additionally, our system features a Management Module, essentially a dashboard for administrators to monitor and manage collected data.This module offers tools for visualizing statistical data and generating graphs, facilitating effective data management and analytical insights.

Implementation
This prototype system has been developed as a mobile web-based system using the Flask framework, with the authors building the strategies for future scalability.Our web-based application is to be containerized because it can be horizontally scaled based on Kubernetes in the future.The choice to be created as a mobile web system was driven by the dataset's independence from smartphone sensor data, prioritizing user accessibility above all else.The primary objective during the data collection phase was to enable participants to upload data swiftly and with minimal errors, thereby enhancing the quality of the data collected.
When participants access the mobile web interface, as illustrated in Figure 7(b), they can immediately verify their smartphone model and operating system.This is achieved by parsing the User-Agent string in the HTTP request headers from the web browser to the server, which extracts details about the smartphone model and OS.As shown in Figure 7(d), information on the recording app settings is provided to participants through an automatic metadata check, ensuring the process is completed successfully.This approach eliminates the need for manual annotation by participants, as annotations are automatically generated, ensuring the verification and authentication of the audio files.For metadata extraction, we used a Python wrapper for the MediaInfo library, pymediainfo 6.1.0[73], and for extracting file structure information of M4A files, we utilized construct 2.10.70 Figure 8 illustrates the forgery detection workflow in our prototype system.The process for detecting container-based forgery is user-friendly: participants need only select and upload a recording file to initiate analysis and receive results.To ensure privacy, uploaded audio files are automatically deleted immediately following the analysis.By selecting the 'Check detail results' button, users can access in-depth comparisons with the original audio files.Should the comparison audio file be absent from the database, the system's analysis of the uploaded file's metadata will reveal crucial details about the original file, such as the smartphone model, operating system, and recording application utilized.Furthermore, users can transition to data collection mode by clicking the 'Go to Upload' button, facilitating seamless navigation within the system.

Scenario-based Testing and Validation
To validate the proposed system's capabilities effectively, we conducted scenario-based testing by simulating realistic conditions with various recording process variations.This approach not only demonstrated the system's efficacy but also provided valuable feedback for improving the user interface's quality.In light of the increasing use of voice recording files as evidence in Korean courts and the rising allegations of forgery, we designed a scenario to investigate the potential forgery of digital audio files in legal disputes.The initial stage of developing the legal dispute scenario involved identifying two key personas: an individual committing the forgery and another responsible for detecting it.This foundational step was essential for the detailed elaboration of the scenario, which we systematically divided into two main phases: the forgery and its subsequent detection.We simulated the act of forgery using the Voice Recorder application on a Samsung Galaxy Z Flip3 with Android 14.Table 2 shows the forgery scenario procedure.The forgery scenario was composed of actions and software used according to the process of audio file forgery.Submit the forged evidence to court -*Easy Voice Recorder is recognized as the most popular app in the Google Play store [36][75].
The detection phase outlines the challenges and necessary processes for identifying forgeries in audio files, that are notoriously time-consuming and costly.These limit the general public's access to specialized services for detecting voice file forgeries.Table 3 presents the workflow of forgery detection in our system within the framework of a legal dispute.Verify why the evidence has been manipulated Authentication Submission Item 5 -Even though the metadata of the original and forged voice files were identical, detection was possible due to differences in the M4A file structure.In legal contexts, the Korean judiciary relies on the Supreme Court precedent which states: "No signs of editing can be found in the copy of the recording file, and the file information and recording frequency band of this case's recording file copy are the same as those generated by the above digital recorder" [21].Thus, analysis results from our system identifying such discrepancies will be accepted as evidence in court.The application of our system in these simulated scenarios reaffirmed its practicality and robustness.Scenario-based testing has demonstrated our system's capability to differentiate between original and forged files.

5.
CONCLUSIONS Our research introduces a novel and cyclical methodology for detecting digital audio file forgeries, emphasizing container-based forgery detection and complementing it with mobile crowdsourcing for data collec-

Figure 2
Figure2illustrates the M4A file structure comparison between the original and the forged files.The original M4A file encoded by a built-in app reveals the absence of the 'meta' box in the M4A file edited and re-encoded by the built-in recording app.Additionally, the re-encoding process changed sub-boxes within the 'stbl' box.Converting to M4A by DemoCreator generated a 'free' box, absent in the original file, and generated two 'udta' boxes, along with changes within the 'stbl' box.

Figure 3 .
Comparison of M4A file structure: (a) Original; (b) Edited by a built-in app; (c) Edited by DemoCreator Manipulating the file structure after editing the original file presents a challenge for the latest Samsung Galaxy smartphones.To achieve this level of forgery, it is necessary to utilize an encoder capable of supporting a large type of a 'mdat' box or manually converting it into a large type.Considering that most audio editing software and converters are designed for a regular type of M4A file and manual conversion requires a deep understanding of the MP4 file format, making direct changes is nearly impossible for the general public.In this APTISI Transactions on Technopreneurship (ATT), Vol. 6, No. 1, March 2024, pp.119-135 APTISI Transactions on Technopreneurship (ATT) ❒ 125

Figure 4 .
, the three modes have different file structures.Unlike the Standard mode, the Interview mode and the Speech-to-Text mode contain a 'metd' box and an 'sttd' box within the user data box ('udta') respectively.Comparison of M4A file structure: (a) Original; (b) Edited by a built-in app; (c) Edited by DemoCreator

Figure 5 .
Figure 5. Cyclical procedure based on flowchart

❒Figure 7 .
E-ISSN: 2656-8888 | P-ISSN: 2655-8807 [74], a library aiding in binary parsing.The extracted data are then stored in a JSON database.Upon completing the upload, participants who wish to use the ranking feature are encouraged to sign up.UI of data collection mode: (a) Introduction; (b) Selection; (c) Upload; (d) Results

APTISI
Transactions on Technopreneurship (ATT), Vol. 6, No. 1, March 2024, pp.119-135 APTISI Transactions on Technopreneurship (ATT) ❒ 129 Advancements in Container-based Forgery DetectionMetadata is essentially structured data that provides detailed descriptions of other data entities.In the context of voice recording files, metadata encompasses fundamental attributes, such as file name, type, size, creation date and time, and last modification date and time, alongside audio-specific characteristics, such as sample rate, bit rate, codec, number of channels, and audio length.

Table 1 .
Classification of immutable and mutable attributes in M4A file metadata

Table 3 .
Forgery detection procedure in our system