In this work, we propose a smart-analysis system of sound events for smart/intelligent environments based on an autonomic cycle of data analysis tasks.
We propose an autonomic cycle of data analysis tasks. An autonomic cycle of data analysis tasks is a set of data analysis tasks that supervise and control a process autonomously, based on knowledge models (of prediction, recognition, etc.) and interacting with each other to reach a common goal. Each task has a different function in the cycle: observation of the process, its analysis, or decision-making.
This work presents the autonomic cycle. With its components, this autonomic cycle detects sound information using a taxonomic model of the sound events to analyze them and give a recommendation about the context. The taxonomic model is a hierarchical pattern that considers different aspects to recognize the sound events. This work defines the architecture of this autonomic cycle, specifies its machine-learning-based analysis tasks and evaluates its capabilities of reasoning, adaptation and communication in case studies.
Future work should improve the accuracy of the system by implementing neural networks or more sophisticated techniques. Parallel function management could take the implemented autonomic cycle to a higher level. The automation also needs improvement. In addition, future work will address not only sound events but also emotion recognition and its relation to sound events happening simultaneously.
The main contributions of this paper are as follows: the detailed description of the intelligent sound analysis (ISA) autonomic cycle for the smart sound analysis of sound events (SAS-SE) in an intelligent environment (IE); the specification of the machine-learning-based analysis tasks of ISA for the smart sound analysis and the development of a case study that settles the use of the system in different IEs.
1. Introduction
The monitoring process in an intelligent environment (IE) is mostly done with sensors and video data. In this work, the process considers only the use of audio sensors as proposed in a previous work [1]. The acoustic waves traveling in an IE can provide information and data about what occurs in that environment (e.g. activities, occupation, among other things), and that information can be used to improve the experience of the users in an IE from the acoustic perspective. In general, as explained in Ref. [2], the detection of sound events can be used in environments in various ways.
Particularly, ReM-AM needs to extract acoustic information in an IE in real time and identify single sound events. ReM-AM aims to generate automatic responses according to the specific time and context (the combination of the same sound events could have different meanings depending on the location and moment). A Smart-Analysis System of Sound Events (SAS-SE) performs data acquisition from an IE and classifies sound events based on specific parameters. It then responds according to the events taking place and their duration.
The work of Mesaros et al. [2] uses supervised learning for sound event detection and general classification of the sound. Bilen et al. [3] present an evaluation of sound event detection accuracy using a method adaptable to real-time requirements through adjustable evaluation parameters. Huang et al. [4] use deep-learning neural network models to develop a computational system for sound event detection in domestic environments. Our SAS-SE presents a complete solution (observation, analysis and decision-making) for IE using a hierarchical pattern and a specification of descriptors that operate autonomously, adapting its responses to different contexts.
Specifically, in this work, we propose an SAS-SE based on an autonomic cycle of data analysis tasks according to the intelligent sound analysis (ISA) architecture [5]. For the SAS-SE autonomic cycle, its data analysis tasks use the components of the ReM-AM and a taxonomic model of sound events to extract, recognize, and recommend actions in a given scenario. The main contributions of this paper are:
The detailed description of the ISA autonomic cycle for the SAS-SE in an IE,
The specification of the machine-learning-based analysis tasks of ISA for the smart sound analysis, and
The development of a case study that settles the use of the system in different IEs.
This article presents in Section 2 the State of the Art, and in Section 3 the theoretical framework of ReM-AM and autonomic cycles, followed in Section 4 by the specification of our SAS-SE, which describes the hierarchical pattern of sound events and the autonomic cycle. Section 5 describes the experiments, including task implementation, verification, and the general analysis of results. Finally, there is a comparison with similar works in Section 6, and the conclusions in Section 7.
2. State of the art
To describe the different terms necessary for this work, it is important to review the literature and make comparisons to highlight the contributions in the fields of sound events, sound analysis, and sound in IEs. The main works related to our proposal are the following:
The work of Martín-Morató [6] presented a workflow to collect data from sound events of real everyday environments, even though it shows redundancy and incomplete annotations from the data. This work offers an important approach to the audio collecting used in SAS-SE. In Ref. [7], Huh et al. created a dataset identifying actions that can be discriminated purely from audio. This work is relevant because it uses audio recognition models that are verified against visual labels to eliminate ambiguities. The work of Czopek et al. [8] described an acoustic monitoring system including sound pressure levels, time and speed, applied in the identification of energy wastes, important for fields such as production energy and efficiency management.
In the work [9], Laska et al. introduced a sound analysis scheme of the vocal tract for speech coding, considering only coughing. This interesting approach can be applied to a smart home to offer non-invasive assistance. Finally, Hou et al. [10] presented a soundscape analysis using artificial intelligence to analyze automatic soundscape characterization, with the classification of sounds to analyze their correlation.
Table 1 presents a comparison between previous works and our proposal, emphasizing contributions in the fields of sound events and sound analysis, both general and in IEs.
All these works gave specific insights into the various topics that must be integrated to form the foundation for the development of SAS-SE. However, none of the works combines them for integral use, particularly in an IE. As shown in Table 1, SAS-SE is the only one that takes into account sound events, sound analysis, and sound in IE to develop a system with a broader approach to analyzing sound events in IE. Our approach presents an integration of these processes through an autonomic cycle of sound analysis tasks that process the sound events present in an IE. The originality of this work lies in our real-time adaptive warning system using sound perception based on autonomic cycles of sound data analysis tasks.
3. Theoretical framework
This section describes the ReM-AM and its components, as well as the term autonomic cycle.
3.1 ReM-AM
As explained in Ref. [1], ReM-AM (Figure 1) is an extension of the AmICL Middleware that uses digital resources to improve the learning process by combining educational services from the cloud with multiagent systems (MAS) [11, 12]. In ReM-AM, a new layer, called Audio Management Layer (AML), is added, which has three components: a Collecting Audio Data (CAD) component that characterizes and categorizes sound events, generates metadata, and defines their properties; an Interaction System-User-Audio (ISUA) component that recognizes sound patterns and performs advanced analysis to identify sources using cloud-based algorithms; and a Decision-Making (DM) component that adapts an IE based on acoustic information, optimizing its performance.
A diagram of the ReM-AM structure. The diagram is divided into several layers, each containing different components represented by flasks. The top layer is labeled MAS Management Layer (MMAL) and contains three flasks labeled AMA, CCA, and DMA. The next layer is labeled Services Management Layer (SML) and contains four flasks labeled KMA, AdMA, SMA, and WSA. The third layer is labeled Audio Management Layer (AML) and contains three flasks labeled ISUA, CAD, and DM. Below these layers is the Ami Logical Management Layer (ILL) containing six flasks labeled SPA, CA, TA, CEA, CMA, and DA. The bottom layer is labeled Physical Layer (PL) and contains categories labeled Sensors, Effectors, Intelligent Objects, Users, and Other Objects. The diagram also includes vertical labels on the left side indicating Meta, Security, and Base, and horizontal labels on the right side indicating MAP, Context Awareness, and E.
ReM-AM
3.2 Autonomic cycle of data analysis tasks
According to Ref. [13], an Autonomic Cycle of Data Analysis Tasks (ACODAT) is a set of data analysis tasks that supervise and control a process autonomously, based on knowledge models (e.g. prediction, recognition, etc.), which interact to achieve a common goal. As explained in Ref. [13], each task has a different function in the cycle: observation of the process, its analysis, or decision-making.
In the case of ReM-AM, each component has a role: CAD observes, ISUA analyzes and DM makes decisions. Additionally, ReM-AM has three autonomic cycles: (1) the General Acoustic Management (GAM) has three processes: the absorption of acoustic waves that create reverberation, the blocking of wave dispersion to focus the sound in the right direction, and the covering of unwanted noise by using noise-canceling systems [14]; (2) the ISA identifies the acoustic features in an IE to determine the tasks that can be executed depending on the use of the IE; finally, (3) the Artificial Sound Perception (ASP) offers an artificial soundscape of the IE, representing virtually its acoustic features.
4. Specification of our SAS-SE
This section describes the SAS-SE, specifically, the hierarchical pattern for sound events, and its autonomic cycle for the ISA using the methodology proposed in Ref. [15].
4.1 Hierarchical pattern
As described in Ref. [16], a sound object is an acoustic action that corresponds to the intention of listening. In this work, we use the term sound event to broaden the spectrum of identified acoustic signals. To recognize a sound event and understand its influence on emotions, SAS-SE considers a three-level pattern (see Figure 2).
A diagram illustrating a hierarchical pattern of sound events with three levels. The first level is labeled 'Level I-Sound Source,' the second level is labeled 'Level II-Sound Parameters,' and the third level is labeled 'Level III-Emotional Responses related to Sound.' Arrows indicate the flow from Level I to Level II, and from Level II to Level III.
Hierarchical pattern of sound events
In general, this pattern has a set of descriptors to identify factors related to acoustic signals and their emotional impact. In the context of our system, the hierarchical pattern is used in a classification process to determine the characteristics of the environment based on the detected sound properties.
Specifically, classification involves grouping or categorizing sound properties into distinct classes based on predefined rules. These rules are defined using formal languages within a formal grammar. Therefore, we use classification rules as a mapping of sound properties to environmental categories. Formally, a classification rule defines a function $f$ that maps each set of sound properties of an input to one of several predefined categories that define the environment. Let $f: \Sigma^* \to C$, where $\Sigma^*$ is the set of all strings of sound properties and $C$ is the set of environmental categories. Classification aims to partition the set of strings $\Sigma^*$ into subsets according to the environmental categories they belong to. For example, for an input string $w \in \Sigma^*$, the rules classify $w$ by assigning it to one of the categories $C_i \in C$.
The levels of the hierarchical pattern are described below.
Level I: Sound Source. At this level, the main focus will be on the location and start of a sound event. This will be considered from both the temporal and spatial perspectives, and will be defined by the set of descriptors (see Table 2).
Descriptors of the pattern of sound source
| Descriptor | Description |
|---|---|
| Visual field | It describes if the sound source is located in an easily visual area |
| Horizontal-vertical | It describes the direction from where the sound source is perceived |
| Distance | It describes if the sound source is close or far from the listening point |
| Identification | It describes if the sound corresponds to an anthropogenic or natural source (see Figure 3) |
Source(s): Authors' own creation
In Figure 3, presented in Refs. [5, 17], there are two categories of sound events: anthropogenic and natural. The anthropogenic sounds are those created by humans and this category is divided into non-mechanic, mechanic, and transportation. Natural sounds are divided into biological and geophysical. This is a general classification of sound events, serving as a starting point for the development of our system. Some examples of classification rules at this level are:
If the sound is in the visual field and the frequency is in a determined range, then the sound is domestic (biological).
If the direction is horizontal and the amplitude is over a threshold and there is high sound pressure then the sound is impulsive (mechanical).
If the speed of the sound is high and the intensity is high then the sound is in the visual field (musical).
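The classification function $f$ and the first two example rules above can be sketched as an ordered rule list. This is a minimal illustration only: the property names (`visual_field`, `frequency_hz`, `direction`, `amplitude`, `sound_pressure`), the numeric thresholds, and the "first matching rule wins" policy are assumptions made for the sketch, since the paper does not give concrete values.

```python
from typing import Any, Callable, Dict, List, Tuple

Properties = Dict[str, Any]
Rule = Tuple[Callable[[Properties], bool], str]

# Illustrative thresholds (assumptions; the paper gives no numeric values).
FREQ_RANGE_HZ = (300.0, 3000.0)
AMPLITUDE_THRESHOLD = 0.8

# Each rule pairs a condition on the sound properties with a category.
RULES: List[Rule] = [
    # Visual field + frequency within a given range -> domestic (biological)
    (lambda p: p.get("visual_field") is True
               and FREQ_RANGE_HZ[0] <= p.get("frequency_hz", 0.0) <= FREQ_RANGE_HZ[1],
     "domestic (biological)"),
    # Horizontal direction + amplitude over threshold + high sound pressure
    # -> impulsive (mechanical)
    (lambda p: p.get("direction") == "horizontal"
               and p.get("amplitude", 0.0) > AMPLITUDE_THRESHOLD
               and p.get("sound_pressure") == "high",
     "impulsive (mechanical)"),
]


def classify(properties: Properties, default: str = "unclassified") -> str:
    """Return the category of the first rule whose condition holds."""
    for condition, category in RULES:
        if condition(properties):
            return category
    return default


print(classify({"visual_field": True, "frequency_hz": 1000.0}))
```

Keeping the rules as data rather than nested conditionals makes it straightforward to add the Level II rules in the same form.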
Level II: Sound Parameters. This level describes the different features of an individual sound event in terms of its physics. The parameters are considered from the physics of acoustic waves and will be defined by the set of descriptors in Table 3. Some examples of classification rules at this level are:
If the code system is phonetic and the psychological threshold is low then the frequency is slow.
If the critical band is slow and the auditory masking is true then the amplitude is low.
If the code system is rhythm-melody and the auditory recognition is music, then the sound pressure is high.
Level III: Emotional Responses related to Sound. This level considers the process of psychoacoustics and cognitive neuroscience of hearing. These processes are directly related to the emotions that different sound events can generate [18–21], and they will be defined by the set of descriptors in Table 4.
Descriptors of the pattern of sound parameters
| Descriptor | Description |
|---|---|
| Sound source | It describes the different types of acoustic waves that the source could create |
| Frequency | It describes the frequencies in a sound event defined in Hz |
| Wavelength | It will depend on the medium that the sound wave travels through |
| Amplitude | It describes the value between the peak and trough (highest and lowest value in a wave) |
| Sound pressure | It describes the local pressure deviation in the atmospheric pressure |
| Intensity | It describes the power carried by an acoustic wave |
| Speed | It describes the distance traveled per unit of time by an acoustic wave |
| Direction | It describes the direction in which the wave propagates |
Source(s): Authors' own creation
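Two of the descriptors above are linked by standard acoustic relations, which can be sketched as follows. The function names are hypothetical; the sketch assumes the usual definitions, namely wavelength as propagation speed divided by frequency, and sound pressure level in decibels relative to the 20 µPa reference commonly used in air. The paper does not specify how these descriptors are computed in SAS-SE.

```python
import math

# Standard reference pressure for sound in air (20 micropascals).
REFERENCE_PRESSURE_PA = 20e-6


def wavelength_m(speed_m_s: float, frequency_hz: float) -> float:
    """Wavelength = propagation speed / frequency (depends on the medium via speed)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz


def sound_pressure_level_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to REFERENCE_PRESSURE_PA."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


# Example: a 440 Hz tone in air, taking roughly 343 m/s as the speed of sound.
print(wavelength_m(343.0, 440.0))
print(sound_pressure_level_db(0.02))
```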
Descriptors of the pattern of emotional responses related to sound
| Descriptor | Description |
|---|---|
| Sound parameters | It describes the individual features of the acoustic signal that is perceived |
| Psychological threshold | It describes the levels at which an acoustic stimulus generates a visible reaction |
| Auditory masking | It describes if an acoustic signal is being masked in the ear |
| Critical band | It describes the frequency bandwidth in the hearing system |
| Auditory recognition | It describes the brain areas that recognize different sound events (natural, music, speech) |
| Code systems | It describes if the sound event has to be processed as rhythm-melody or as phonetic |
Source(s): Authors' own creation
A diagram categorizes sound events into two main groups: anthropogenic and natural. Anthropogenic sounds are divided into mechanic and non-mechanic. Mechanic sounds further split into engines, impulsive, water, and transportation, with transportation subdividing into terrestrial, railway, and aerial. Non-mechanic sounds are categorized into vocals and musical, with musical further divided into vocals and non-vocals. Natural sounds are divided into biological and geophysical. Biological sounds split into wild and domestic, while geophysical sounds are categorized into wind and water.
Classification structure for sound events
The main goal of this Hierarchical Pattern for Sound Events is to define the entire process of an acoustic signal, from the origin to the processing by the auditory system. Emotion recognition management in an IE can use this information to achieve its goal [19, 22, 23].
This sound pattern offers a range of emotional responses discovered from the sound, showing the multiple emotions that different sounds can generate in a person [18, 20–22].
4.2 Autonomic cycle of the SAS-SE based on ISA architecture
The acoustic features vary from space to space, since the dimensions of a space and the materials that cover its surfaces and objects can change the behavior of acoustic waves and their propagation. For this work, the system acquires information from the IE using CAD and analyzes the sounds in the IE with ISUA. Therefore, to implement the ReM-AM from our previous work [1], it was necessary to use the ISA autonomic cycle, which has the tasks of extraction, analysis and configuration of the response. The main components used are CAD and ISUA; the DM is excluded because the system indicates the action that should take place but does not execute it (see Figure 4).
The flowchart starts with a sound event in an intelligent environment (IE). This sound event is extracted by the CAD module. The extraction process uses a hierarchical pattern of sound events. The extracted information is then analyzed by the ISUA module. The analysis passes on the recognized information to the recommendation module. The recommendation module, part of the response module, gives a notification based on the recognized information.
Diagram of the autonomic cycle ISA of our SAS-SE
As seen in Figure 4, the CAD component is responsible for the extraction task, determining the information used by the autonomic cycle. Specifically, CAD extracts and prepares the data from the sound events taking place in the IE using the hierarchical pattern and its descriptors [24]. Additionally, ISUA analyzes that information to interpret the sounds in the IE, and particularly, to recognize the sound events to pass on to the response module. Finally, the response module raises an alarm that reports information derived from the detected sound events in the IE, such as emotions and activities, among others. Each task in the ISA autonomic cycle for this case is described as follows:
Extraction:
When a sound event happens in an IE, the system identifies it to extract the data that can be used and to detect the descriptors using the CAD component. These data are then prepared according to the descriptors of Table 4 to move on to the next task. See the macro-algorithm in Table 5.
Macro-algorithm of the extraction process
| Input | Sound event |
|---|---|
| Procedure | 1. Identification 1.1. Extraction from audio information 2. Data preparation 2.1. Detection of descriptors |
| Output | Data collected from sound events |
Source(s): Authors' own creation
Analysis:
This task is the recognition of the information contained in the sound data using the ISUA component, with the aim of interpreting the sound events from their descriptors (see its macro-algorithm in Table 6).
Macro-algorithm of the analysis process
| Input | Data collected from sound events |
|---|---|
| Procedure | 1. Interpretation of the sound events 2. Classification of the sound events using the descriptors |
| Output | Recognized descriptors information |
Source(s): Authors' own creation
Recommendation:
The final task generates a notification in response to a sound event in the IE using the recognized descriptors. When different sound events occur simultaneously, their combination in the final step generates a combined response, expressed as an alarm. The response module is responsible for this task, and gives a recommendation based on the relationship between events and actions that will be explained in the following sections (see its macro-algorithm in Table 7).
Macro-algorithm of the recommendation process
| Input | Recognized descriptors information |
|---|---|
| Procedure | 1. Identification of the quantity of sound events happening 2. Matching the descriptors of the sound events with the possible actions 2.1. Analysis of the possible combinations of sound events 2.2. Making the decision about convenient recommendations |
| Output | Notification of the possible actions |
Source(s): Authors' own creation
Table 8 shows the tasks, their roles, and the data sources. The extraction task has the sub-tasks of 1-Data detection and 2-Data preparation using descriptors; these sub-tasks directly use the information provided by the specific sound events in the IE. This information goes to the analysis task, which has the sub-tasks of 3-Data recognition and 4-Data transmission, meaning that once the data is ready it can be passed on to the last task. The recommendation task is the last step, responsible for the final notification from the system.
Tasks description
| Role (tasks) | Sub-tasks | Data source |
|---|---|---|
| Extraction | Sub-task 1 – Data detection; Sub-task 2 – Data preparation using descriptors | Sound Events from IE |
| Analysis | Sub-task 3 – Data recognition; Sub-task 4 – Data transmission | Processed Data from the previous task |
| Recommendation | Sub-task 5 – Notification | Processed Data from the previous task |
Source(s): Authors' own creation
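The flow of the five sub-tasks can be sketched as three chained functions. This is only an illustration under stated assumptions: the `SoundEvent` type, the peak-amplitude descriptor, and the 0.5 amplitude threshold standing in for the recognition model are hypothetical placeholders, not the components used in SAS-SE. The four-minute rule for mechanic sounds is taken from the Response Module described in Section 5.2.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class SoundEvent:
    samples: List[float]  # raw audio samples (hypothetical representation)
    duration_s: float     # duration of the event in seconds


def extraction(event: SoundEvent) -> Dict[str, float]:
    """Sub-tasks 1-2: detect the data and prepare it as descriptors."""
    peak = max(abs(s) for s in event.samples)  # sub-task 1: data detection
    # Sub-task 2: data preparation (only two toy descriptors here).
    return {"peak_amplitude": peak, "duration_s": event.duration_s}


def analysis(descriptors: Dict[str, float]) -> Dict[str, Union[str, float]]:
    """Sub-tasks 3-4: recognize the event and pass the result on."""
    # Placeholder for a trained recognition model (assumption).
    label = "mechanic" if descriptors["peak_amplitude"] > 0.5 else "non mechanic"
    return {"label": label, "duration_s": descriptors["duration_s"]}


def recommendation(recognized: Dict[str, Union[str, float]]) -> str:
    """Sub-task 5: notify the action for the recognized event."""
    if recognized["label"] == "mechanic" and recognized["duration_s"] > 240:
        return "Warning signal to the person/authority in charge"
    return "No action"


def isa_cycle(event: SoundEvent) -> str:
    """Run extraction, analysis and recommendation in sequence."""
    return recommendation(analysis(extraction(event)))


print(isa_cycle(SoundEvent(samples=[0.1, -0.9, 0.3], duration_s=300.0)))
```

Each task consumes only the output of the previous one, which mirrors the "Processed Data from the previous task" data source listed in the table.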
5. Experimentation with our SAS-SE
The experimentation process was divided into 3 steps: the case study, the implementation and verification of tasks, and the general analysis of the autonomic cycle behavior.
5.1 Case study
To provide a general overview of the system, the case study presents a scenario in a rural school where random anthropogenic and natural sound events occur consecutively, with some overlapping. The system recognizes that two or more simultaneous sounds could conflict, given the school setting and ongoing activities within the IE [13].
While the system detects all sound events, warnings will focus on those that may be problematic. For example, indoor background noise typically does not trigger alarms due to its stable nature. However, if the school is closed, the system can detect anthropogenic sounds, such as footsteps, which, when combined with background noise, may trigger a notification for unauthorized entry.
Sound events in the school’s surrounding area may also indicate a potential conflict. In the recreational area, student activities generate a background soundscape that includes voices, sports noises, and crowds. A key system function is to process this sound data and determine the appropriate actions. For example, if a wild animal is detected nearby, the system issues a warning so authorities can secure the gates.
The system evaluates all data according to the descriptors in the hierarchical pattern proposed in this work, prioritizing key acoustic features from sound sources and relevant parameters, to give an adequate response.
5.2 Implementation and verification of tasks
The tasks of the ACODAT in each role were verified as follows:
Extraction: The system recognized the sound information and prepared the data with the extracted sound parameters from the hierarchical pattern of sound events using CAD and the techniques proposed in Ref. [25]. The dataset used had 22 category folders with approximately 50 sound events each, and 2 minutes of each audio file were used as a sample for the test. The categorization of the sound events in the dataset is shown in Table 9, using the structure from Figure 3.
Analysis: ISUA takes the processed information from the previous task and stores it in the database together with the information about the IE itself. This information was used to define the decision tree in Figure 5 [26] for the different classes of sound events (the information in each class represents its average noise levels, entropy, and values of the 18 descriptors described in Section 4.1).
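The entropy reported at each node of a decision tree can be illustrated with a short sketch. It assumes Shannon entropy over the class labels that reach a node, as commonly used by decision-tree learners; the paper does not state the exact splitting criterion or tool, and the labels below are invented for the example.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the class labels at a tree node."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent: Sequence[str],
                     left: Sequence[str],
                     right: Sequence[str]) -> float:
    """Entropy reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Invented labels: a node with two anthropogenic and two natural events.
node = ["anthropogenic", "anthropogenic", "natural", "natural"]
print(entropy(node))
print(information_gain(node, node[:2], node[2:]))
```

A node containing a single class has zero entropy, while an even mix of two classes has an entropy of one bit.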
Dataset categories
| Category | Subcategory | Dataset folders |
|---|---|---|
| Anthropogenic | Non Mechanic | Human Crowds, Children & Footsteps Comedy, Fantasy & Humor Babies Schools & Crowds Sport & Leisure Hospitals |
| | Mechanic | Household Interior Backgrounds Communications Industry Cities Farm Machinery Building |
| | Transportation | Transport Cars Ships & Boats Aircraft Trains |
| Natural | Biological | Animals & Birds Horses Dogs Cats |
| | Geophysical | Weather Water |
Source(s): Authors' own creation
A decision tree diagram with a root node labeled with specific values and entropy. The tree splits into two main branches based on a condition. The left branch further splits into two sub-branches, each leading to nodes labeled with different values, entropy, and sample sizes. The right branch also splits into two sub-branches, each leading to nodes labeled with different values, entropy, and sample sizes. Each node contains specific values, entropy, sample sizes, and class labels.
Decision tree
The initial recognition system used only two categories and reached an accuracy of 90%; as more categories were added, the accuracy decreased. The noise parameter was also discrepant across categories because of their different features. The entropy likewise decreased as sub-categories were added, and not all of the 18 descriptors could be recognized, which resulted in more zeros and an accuracy that sank to 33%. To improve these numbers, some data were altered and the accuracy increased 50%-60%.
Recommendation: The combination of events taking place determines specific actions. The Response Module, acting as orchestrator, manages the recognized events and their timing and sends a notification to indicate the required action. The relationships among events and actions are defined in the Response Module rules shown in Table 10, which consider, for example, that a persistent non-mechanic sound can indicate a person in need, or that a long-duration mechanic sound can be related to a machine malfunction. The Response Module was tested following the process proposed in Ref. [27].
Response module
| Sound events taking place | Action as response |
|---|---|
| Non mechanic + more than 2 minutes of duration | Warning signal to the person/authority in charge |
| Mechanic + more than 4 minutes of duration | Warning signal to the person/authority in charge |
| Transportation + more than 4 minutes of duration | Warning signal to the person/authority in charge |
| Mechanic + biological | Generation of an acoustic signal in infrasound and ultrasound frequencies |
| Non mechanic + Mechanic + more than 4 minutes | Supervision required |
| Non mechanic + transportation + more than 30 seconds of duration | Warning signal to parents/person/authority in charge (imperative if children/babies) |
| Transportation + geophysical | Warning signal about weather conditions on the road |
| Non mechanic + biological + more than 2 minutes | Supervision required |
| Mechanic + geophysical | Supervision required |
| Non mechanic + geophysical | Warning signal about weather conditions |
Source(s): Authors' own creation
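The rules of the Response module table can be sketched as a small lookup structure. This is a minimal illustration, not the authors' implementation. It rests on several assumptions that the text does not specify: subcategories are plain lowercase strings, durations are given in seconds, a duration threshold applies to the shortest of the events named in the rule, and every matching rule is reported without any priority ordering. The names `Rule`, `RULES` and `respond` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    categories: frozenset  # subcategories that must all be present
    min_seconds: int       # duration must exceed this (0 = no threshold)
    action: str

WARN = "Warning signal to the person/authority in charge"
SUPERVISE = "Supervision required"

# One entry per row of the Response module table, in table order.
RULES = [
    Rule(frozenset({"non mechanic"}), 120, WARN),
    Rule(frozenset({"mechanic"}), 240, WARN),
    Rule(frozenset({"transportation"}), 240, WARN),
    Rule(frozenset({"mechanic", "biological"}), 0,
         "Generation of an acoustic signal in infrasound and ultrasound frequencies"),
    Rule(frozenset({"non mechanic", "mechanic"}), 240, SUPERVISE),
    Rule(frozenset({"non mechanic", "transportation"}), 30,
         "Warning signal to parents/person/authority in charge"),
    Rule(frozenset({"transportation", "geophysical"}), 0,
         "Warning signal about weather conditions on the road"),
    Rule(frozenset({"non mechanic", "biological"}), 120, SUPERVISE),
    Rule(frozenset({"mechanic", "geophysical"}), 0, SUPERVISE),
    Rule(frozenset({"non mechanic", "geophysical"}), 0,
         "Warning signal about weather conditions"),
]

def respond(events):
    """Return (row number, action) for every rule that fires.

    events: dict mapping a recognized subcategory to its duration
    in seconds (assumed positive).
    """
    fired = []
    for number, rule in enumerate(RULES, start=1):
        if not rule.categories.issubset(events):
            continue  # some required subcategory is absent
        # Assumption: the threshold applies to the shortest matching event.
        shortest = min(events[c] for c in rule.categories)
        if shortest > rule.min_seconds:
            fired.append((number, rule.action))
    return fired

# A short mechanic event together with a short biological event.
notifications = respond({"mechanic": 10, "biological": 5})
```

In this sketch, a mechanic event combined with a biological event fires only the fourth row of the table, because neither event lasts long enough to trigger a duration-based rule.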
5.3 Analysis of the autonomic cycle general behavior
Figure 6 shows the system identifying sound events in an IE and indicating, according to the Response Module, which of them could generate a conflict, together with the events that were triggered. In this case, an interior background and horses are not compatible, so the Response Module sent the notification of rule 4 (see row 4 of Table 10): generation of an acoustic signal in infrasound and ultrasound frequencies.
General result analysis (figure): screenshot of the monitoring dashboard. A navigation menu on the left lists Dashboard, Classification Module, Response Module, Profile and Smart Room. The Events column lists timestamped events by category (Household, Horses, Sport & Leisure, Water, Interior Backgrounds), and the Notifications column lists the triggered responses, such as the generation of an acoustic signal and a warning signal about weather conditions.
We carried out a final test of our system with sounds from different events occurring at the same time, knowing in advance what was happening in the environment (for example, if there was a scream, it was because a person had fallen). The response of the system was then compared against the real one for 50 random cases (see Table 11, which shows the accuracy of the system in detecting the simultaneous presence of sound events).
Results for combined events
| Number of combined sound events | Quality (%) |
|---|---|
| 2 | 90 |
| 3 | 85 |
| 4 or more | 60 |
Source(s): Authors' own creation
Our system discriminates two simultaneous sound events well (90%) and gives an adequate recommendation, and it still performs well for three (85%). With four or more simultaneous sound events, the system becomes less accurate (60%).
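The comparison of the system response against the real situation can be sketched as a per-group percentage of matching cases. This is only an illustration of how such a quality figure may be computed: the function name `quality_by_group`, the tuple layout and the sample cases are hypothetical and are not the 50 cases used in the test.

```python
from collections import defaultdict

def quality_by_group(cases):
    """Percentage of cases where the system action matches the expected one.

    cases: iterable of (n_events, expected_action, system_action),
    grouped as "2", "3" or "4 or more" simultaneous sound events.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_events, expected, observed in cases:
        group = "4 or more" if n_events >= 4 else str(n_events)
        totals[group] += 1
        hits[group] += expected == observed  # True counts as 1
    return {group: 100.0 * hits[group] / totals[group] for group in totals}

# Hypothetical cases, for illustration only.
sample_cases = [
    (2, "Supervision required", "Supervision required"),
    (2, "Warning signal about weather conditions",
        "Warning signal about weather conditions"),
    (3, "Supervision required", "Warning signal about weather conditions"),
    (4, "Supervision required", "Supervision required"),
    (5, "Supervision required", "Warning signal about weather conditions"),
]
quality = quality_by_group(sample_cases)
```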
The general behavior of the system is satisfactory and shows the potential of descriptor recognition for intelligent acoustic management in an IE. With only a few categories, the system achieved good accuracy. However, as more categories were added, its performance decreased. Given these initial results, the system can be used in IEs with few activities, serving as a solid initial reference.
6. Comparison with similar works
To compare this work with others, it is important to establish which elements they could have in common, which are defined as criteria in our case. The first criterion is the detection of sound events, the second is the classification of those events, and the third is the implementation of a recommendation system for an IE. Table 12 presents the comparison of our work with the previous works most similar to ours, based on these three criteria.
The work [2] uses machine learning for sound detection and employs data augmentation to increase the volume of training data. It also labels sounds for classification. However, this work is not directly implemented in an IE. Similar to Ref. [2], the work [3] is based on sound event detection and shows good experimental results, but it offers neither a specific sound classification nor an IE implementation.
The work [28] detects sound events using its own recordings but does not provide a general sound classification. It demonstrates the implementation in smart cities. Ref. [29] presents an interesting approach to sound detection in the automotive industry using machine learning. However, it is not implemented in an IE.
The work [30] performs an acoustic descriptor analysis using machine learning, but it is applied only to musical sounds, detecting and classifying them to compare different musical instruments by their timbres, and it has no application in an IE.
This work proposes an ACODAT for SAS-SE, which includes the detection and classification of sound events from audio descriptors, allowing for better recognition of each sound event based on its features and expanding its usage possibilities in various situations. Additionally, our SAS-SE is developed directly for IEs.
A quantitative comparison with other works is not feasible, as our system extracts sound data in an IE through the proposed descriptors. Thus, its input is tailored specifically to IEs and is completely different from the datasets used in previous works. Nonetheless, the general structure, with its roles and tasks, allows for a qualitative comparison of the process itself.
7. Conclusions
This article demonstrates the utility of an ACODAT for the analysis of sounds in IEs, based on a set of descriptors employed by its machine-learning-based tasks. This system can be used in different IEs, providing relevant notifications when required. The autonomic cycle uses a hierarchical pattern of sound event descriptors to improve the handling of sound parameter information across different environments and to create better coordination in the system-user relationship.
The case study shows the process in a specific scenario, giving an overview of the system; however, it is adaptable to other situations and customizable to specific requirements. The minimum requirements for the system to function properly with the autonomic cycle have been met. Each step was implemented and integrated, enabling the monitoring of its general behavior, with promising experimental results.
This system could have valuable applications in smart cities and smart homes, among other areas. Some examples of its use in different contexts are as follows. In a smart city, the system could enhance efficiency and safety by analyzing sound patterns to detect traffic jams and their causes, or to identify emerging public emergency situations and assess their magnitude. It could also adjust street lighting based on pedestrian sounds to maximize people’s safety. In an ambient assisted living setting, the system could monitor patients and identify distress signals or changes in vocal patterns that may indicate an emergency. In a home security system, it would allow immediate reactions to situations detected through sound events. Finally, in smart classrooms, the system could adapt soundscapes and adjust background noises to improve concentration based on students' learning styles. This makes the IE less invasive, more perceptive and more adaptable to special needs. Perhaps the primary disadvantage is the instrumentation required for the system to capture sound events effectively in context.
Future work should focus on improving the accuracy of the system by implementing neural networks or more advanced techniques. For example, implementations based on deep learning could automatically extract sound descriptors from events, which would make the tasks of capturing and recognizing sound events much more efficient. To improve the autonomic cycle implementation, parallel function management could be employed. Automation improvements are also needed. Additionally, future work will address not only sound events but also emotion recognition and its relationship with simultaneous sound events.
We would like to thank Daniel Ochoa, Andrea González, Sebastián Gomez and Ángela Ochoa, from the University of EAFIT, Colombia, for providing insight and expertise that assisted this research.
Availability of data and material: The data will be available if requested, with a justification of what its use will be. The implemented codes are found at https://github.com/SantGa/SAS-SE.git
Conflict of interests/competing interests: The authors declare that they have no conflict of interest.
