Specification of a smart-analysis system of sound events for smart environments

Santiago, Gabriela; Aguilar, Jose

doi:10.1108/ACI-06-2024-0240

Purpose

In this work, we propose a smart-analysis system of sound events for smart/intelligent environments based on an autonomic cycle of data analysis tasks.

Design/methodology/approach

We propose an autonomic cycle of data analysis tasks: a set of data analysis tasks that supervise and control a process autonomously, based on knowledge models (of prediction, recognition, etc.), and that interact with each other to reach a common goal. Each task has a different function in the cycle: observation of the process, its analysis, or decision-making.

Findings

This work presents the autonomic cycle. With its components, this autonomic cycle detects sound information using a taxonomic model of the sound events to analyze them and give a recommendation about the context. The taxonomic model is a hierarchical pattern that considers different aspects to recognize the sound events. This work defines the architecture of this autonomic cycle, specifies its machine-learning-based analysis tasks and evaluates its capabilities of reasoning, adaptation and communication in case studies.

Research limitations/implications

Future work should improve the accuracy of the system by implementing neural networks or more sophisticated techniques. Parallel function management could take the implemented autonomic cycle to a higher level, and the automation also needs improvement. In addition, future work will address not only sound events but also emotion recognition and its relation to sound events happening simultaneously.

Originality/value

The main contributions of this paper are as follows: the detailed description of the intelligent sound analysis (ISA) autonomic cycle for the smart sound analysis of sound events (SAS-SE) in an intelligent environment (IE); the specification of the machine-learning-based analysis tasks of ISA for the smart sound analysis and the development of a case study that settles the use of the system in different IEs.

1. Introduction

The monitoring process in an intelligent environment (IE) is mostly done with sensors and video data. In this work, the process considers only the use of audio sensors, as proposed in a previous work [1]. The acoustic waves traveling in an IE can provide information and data about what occurs in that environment (e.g. activities, occupation, among other things), and that information can be used to improve the experience of the users in an IE from the acoustic perspective. In general, as explained in Ref. [2], the detection of sound events can be used in environments in various ways.

Particularly, ReM-AM needs to extract acoustic information in an IE in real time and identify single sound events. ReM-AM aims to generate automatic responses according to the specific time and context (the combination of the same sound events could have different meanings depending on the location and moment). A Smart-Analysis System of Sound Events (SAS-SE) performs data acquisition from an IE and classifies sound events based on specific parameters. It then responds according to the events taking place and their duration.

The work of Mesaros et al. [2] uses supervised learning for sound event detection and general classification of the sound. Bilen et al. [3] present an evaluation of sound event detection accuracy using a method adaptable to real-time requirements through adjustable evaluation parameters. Huang et al. [4] use deep-learning neural network models to develop a computational system for sound event detection in domestic environments. Our SAS-SE presents a complete solution (observation, analysis and decision-making) for IE using a hierarchical pattern and a specification of descriptors that operate autonomously, adapting its responses to different contexts.

Specifically, in this work, we propose an SAS-SE based on an autonomic cycle of data analysis tasks according to the intelligent sound analysis (ISA) architecture [5]. For the SAS-SE autonomic cycle, its data analysis tasks use the components of the ReM-AM and a taxonomic model of sound events to extract, recognize, and recommend actions in a given scenario. The main contributions of this paper are:

The detailed description of the ISA autonomic cycle for the SAS-SE in an IE,
The specification of the machine-learning-based analysis tasks of ISA for the smart sound analysis, and
The development of a case study that settles the use of the system in different IEs.

This article presents in Section 2 the State of the Art, and in Section 3 the theoretical framework of ReM-AM and autonomic cycles, followed in Section 4 by the specification of our SAS-SE, which describes the hierarchical pattern of sound events and the autonomic cycle. Section 5 describes the experiments, including task implementation, verification, and the general analysis of results. Finally, there is a comparison with similar works in Section 6, and the conclusions in Section 7.

2. State of the art

To describe the different terms necessary for this work, it is important to review the literature and make comparisons to highlight the contributions in the fields of sound events, sound analysis, and sound in IEs. The main works related to our proposal are the following:

The work of Martín-Morató [6] presented a workflow to collect data from sound events of real everyday environments, even though it shows redundancy and incomplete annotations from the data. This work offers an important approach to the audio collecting used in SAS-SE. In Ref. [7], Huh et al. created a dataset identifying actions that can be discriminated purely from audio. This work is relevant because it uses audio recognition models that are verified against visual labels to eliminate ambiguities. The work of Czopek et al. [8] described an acoustic monitoring system including sound pressure levels, time and speed, applied in the identification of energy wastes, important for fields such as production energy and efficiency management.

In the work [9], Laska et al. introduced a sound analysis scheme of the vocal tract for speech coding, considering only coughing. This interesting approach can be applied to a smart home to offer non-invasive assistance. Finally, Hou et al. [10] presented a soundscape analysis using artificial intelligence to analyze automatic soundscape characterization, with the classification of sounds to analyze their correlation.

Table 1 presents a comparison between previous works and our proposal, emphasizing contributions in the fields of sound events and sound analysis, both general and in IEs.

Table 1

Contribution from previous literature

Work	Sound events	Sound analysis	Sound in IE
[6]	X
[7]	X
[8]		X
[9]	X	X	X
[10]		X	X
SAS-SE	X	X	X

Source(s): Authors' own creation

All these works gave specific insights into the various topics that must be integrated to form the foundation for the development of SAS-SE. However, none of the works combine them for integral use, particularly in an IE. As shown in Table 1, SAS-SE is the only one that takes into account sound events, sound analysis, and sound in IE to develop a system with a broader approach to analyzing sound events in IE. Our approach presents an integration of these processes through an autonomous cycle of sound analysis tasks that process the sound events present in an IE. The originality of this work lies in our real-time adaptive warning system using sound perception based on autonomic cycles of sound data analysis tasks.

3. Theoretical framework

This section describes the ReM-AM and its components, as well as the term autonomic cycle.

3.1 ReM-AM

As explained in Ref. [1], ReM-AM (Figure 1) is an extension of the AmICL Middleware that uses digital resources to improve the learning process by combining educational services from the cloud with multiagent systems (MAS) [11, 12]. In ReM-AM, a new layer, called Audio Management Layer (AML), is added, which has three components: a Collecting Audio Data (CAD) component that characterizes and categorizes sound events, generates metadata, and defines their properties; an Interaction System-User-Audio (ISUA) component that recognizes sound patterns and performs advanced analysis to identify sources using cloud-based algorithms; and a Decision-Making (DM) component that adapts an IE based on acoustic information, optimizing its performance.

Figure 1

A diagram representing the structure of ReM-AM with various management layers and their components.

A diagram of the ReM-AM structure. The diagram is divided into several layers, each containing different components represented by flasks. The top layer is labeled MAS Management Layer (MMAL) and contains three flasks labeled AMA, CCA, and DMA. The next layer is labeled Services Management Layer (SML) and contains four flasks labeled KMA, AdMA, SMA, and WSA. The third layer is labeled Audio Management Layer (AML) and contains three flasks labeled ISUA, CAD, and DM. Below these layers is the Ami Logical Management Layer (ILL) containing six flasks labeled SPA, CA, TA, CEA, CMA, and DA. The bottom layer is labeled Physical Layer (PL) and contains categories labeled Sensors, Effectors, Intelligent Objects, Users, and Other Objects. The diagram also includes vertical labels on the left side indicating Meta, Security, and Base, and horizontal labels on the right side indicating MAP, Context Awareness, and E.

ReM-AM

3.2 Autonomic cycle of data analysis tasks

According to Ref. [13], an Autonomic Cycle of Data Analysis Tasks (ACODAT) is a set of data analysis tasks that supervise and control a process autonomously, based on knowledge models (e.g. prediction, recognition, etc.), which interact to achieve a common goal. Each task has a different function in the cycle; as explained in Ref. [13]: observation, analysis, or decision-making of the process.

In the case of ReM-AM, each component has a role: CAD observes, ISUA analyzes and DM makes decisions. Additionally, ReM-AM has three autonomic cycles: (1) the General Acoustic Management (GAM) comprises three processes: absorbing the acoustic waves that create reverberation, blocking wave dispersion to focus sound in the right direction, and covering unwanted noise by using noise-canceling systems [14]; (2) the ISA identifies the acoustic features in an IE to determine the tasks that can be executed depending on the use of the IE; finally, (3) the Artificial Sound Perception (ASP) offers an artificial soundscape of the IE, virtually representing its acoustic features.

4. Specification of our SAS-SE

This section describes the SAS-SE, specifically, the hierarchical pattern for sound events, and its autonomic cycle for the ISA using the methodology proposed in Ref. [15].

4.1 Hierarchical pattern

As described in Ref. [16], a sound object is an acoustic action that corresponds to the intention of listening. In this work, we use the term sound event to broaden the spectrum of identified acoustic signals. To recognize a sound event and understand its influence on emotions, SAS-SE considers a three-level pattern (see Figure 2).

Figure 2

A diagram showing a three-level pattern of sound events.

A diagram illustrating a hierarchical pattern of sound events with three levels. The first level is labeled 'Level I-Sound Source,' the second level is labeled 'Level II-Sound Parameters,' and the third level is labeled 'Level III-Emotional Responses related to Sound.' Arrows indicate the flow from Level I to Level II, and from Level II to Level III.

Hierarchical pattern of sound events

In general, this pattern has a set of descriptors to identify factors related to acoustic signals and their emotional impact. In the context of our system, the hierarchical pattern is used in a classification process to determine the characteristics of the environment based on the detected sound properties.

Specifically, classification involves grouping or categorizing sound properties into distinct classes based on predefined rules. These rules are defined using formal languages within a formal grammar. Therefore, we use classification rules as a mapping of sound properties to environmental categories. Formally, a classification rule defines a function $f$ that maps each set of sound properties of an input to one of several predefined categories that define the environment. Let $f: \Sigma^* \to C$, $w \mapsto C_i$, where $\Sigma^*$ is the set of all strings of sound properties and $C$ is the set of environmental categories. Classification aims to partition the set of strings $\Sigma^*$ into subsets according to the environmental categories they belong to. For example, for an input string $w \in \Sigma^*$, the rules classify $w$ by assigning it to one of the categories $C_i \in C$.
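As a minimal sketch of such a mapping, the following Python fragment builds a function from strings of property symbols to categories out of an ordered list of rules. The symbol names (`visual_field`, `freq_in_range`, and so on), the first-match policy, and the `unclassified` default category are assumptions made for illustration and are not taken from the system itself.

```python
from typing import Callable, List, Sequence, Tuple

# A string w in Sigma* is modeled as a sequence of property symbols.
Properties = Sequence[str]
# A rule pairs a predicate over w with the category it assigns.
Rule = Tuple[Callable[[Properties], bool], str]


def make_classifier(rules: List[Rule], default: str) -> Callable[[Properties], str]:
    """Build f: Sigma* -> C from an ordered list of (predicate, category) rules."""

    def f(w: Properties) -> str:
        # The first matching rule wins, so each w receives exactly one category.
        for matches, category in rules:
            if matches(w):
                return category
        # The default keeps f total: every string belongs to some category.
        return default

    return f


# Hypothetical symbols and categories, for illustration only.
rules: List[Rule] = [
    (lambda w: "visual_field" in w and "freq_in_range" in w, "domestic"),
    (
        lambda w: "horizontal" in w
        and "amp_over_threshold" in w
        and "high_pressure" in w,
        "impulsive",
    ),
]

f = make_classifier(rules, default="unclassified")
print(f(["visual_field", "freq_in_range"]))
print(f(["distance_far"]))
```

Because the first matching rule wins and unmatched strings fall into a default category, every input receives exactly one category, which is what makes the result a partition of the input strings.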

The levels of the hierarchical pattern are described below.

Level I: Sound Source. At this level, the main focus will be on the location and start of a sound event. This will be considered from both the temporal and spatial perspectives, and will be defined by the set of descriptors (see Table 2).

Table 2

Descriptors of the pattern of sound source

Descriptor	Description
Visual field	It describes if the sound source is located in an easily visible area
Horizontal-vertical	It describes the direction from where the sound source is perceived
Distance	It describes if the sound source is close or far from the listening point
Identification	It describes if the sound corresponds to an anthropogenic or natural source (see Figure 3)

Source(s): Authors' own creation

In Figure 3, presented in Refs. [5, 17], there are two categories of sound events: anthropogenic and natural. The anthropogenic sounds are those created by humans and this category is divided into non-mechanic, mechanic, and transportation. Natural sounds are divided into biological and geophysical. This is a general classification of sound events, serving as a starting point for the development of our system. Some examples of classification rules at this level are:

If the sound is in the visual field and the frequency is in a determined range, then the sound is domestic (biological).
If the direction is horizontal and the amplitude is over a threshold and there is high sound pressure then the sound is impulsive (mechanical).
If the speed of the sound is high and the intensity is high then the sound is in the visual field (musical).

Level II: Sound Parameters. This level describes the different features of an individual sound event in terms of its physics. The parameters are considered from the physics of acoustic waves and are defined by the set of descriptors in Table 3. Some examples of classification rules at this level are:

If the code system is phonetic and the psychological threshold is low then the frequency is slow.
If the critical band is slow and the auditory masking is true then the amplitude is low.
If the code system is rhythm-melody and the auditory recognition is music, then the sound pressure is high.

Level III: Emotional Responses related to Sound. This level considers the process of psychoacoustics and cognitive neuroscience of hearing. These processes are directly related to the emotions that different sound events can generate [18–21], and they are defined by the set of descriptors in Table 4.

Table 3

Descriptors of the pattern of sound parameters

Descriptor	Description
Sound source	It describes the different types of acoustic waves that the source could create
Frequency	It describes the frequencies in a sound event defined in Hz
Wavelength	It will depend on the medium that the sound wave travels through
Amplitude	It describes the value between the peak and trough (highest and lowest value in a wave)
Sound pressure	It describes the local pressure deviation in the atmospheric pressure
Intensity	It describes the power carried by an acoustic wave
Speed	It describes the distance traveled per unit of time by an acoustic wave
Direction	It describes the direction in which the wave propagates

Source(s): Authors' own creation

Table 4

Descriptors of the pattern of emotional responses related to sound

Descriptor	Description
Sound parameters	It describes the individual features of the acoustic signal that is perceived
Psychological threshold	It describes the levels at which an acoustic stimulus generates a visible reaction
Auditory masking	It describes if an acoustic signal is being masked in the ear
Critical band	It describes the frequency bandwidth in the hearing system
Auditory recognition	It describes the brain areas that recognize different sound events (natural, music, speech)
Code systems	It describes if the sound event has to be processed as rhythm-melody or as phonetic

Source(s): Authors' own creation

Figure 3

A diagram of sound event classification structure.

A diagram categorizes sound events into two main groups: anthropogenic and natural. Anthropogenic sounds are divided into mechanic and non-mechanic. Mechanic sounds further split into engines, impulsive, water, and transportation, with transportation subdividing into terrestrial, railway, and aerial. Non-mechanic sounds are categorized into vocals and musical, with musical further divided into vocals and non-vocals. Natural sounds are divided into biological and geophysical. Biological sounds split into wild and domestic, while geophysical sounds are categorized into wind and water.

Classification structure for sound events

The main goal of this Hierarchical Pattern for Sound Events is to define the entire process of an acoustic signal, from the origin to the processing by the auditory system. An emotion recognition management in an IE can use this information to achieve its goal [19, 22, 23].

This sound pattern offers a range of emotional responses discovered from the sound, showing the multiple emotions that different sounds can generate in a person [18, 20–22].
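One possible way to hold the three levels of the pattern in code is sketched below, with one record per level and one field per descriptor from Tables 2, 3 and 4. The class names, field names, and value types are assumptions for illustration; the source does not prescribe a data representation. A value of `None` is used here to mark a descriptor that could not be recognized.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SoundSource:  # Level I (Table 2)
    visual_field: Optional[bool] = None
    horizontal_vertical: Optional[str] = None
    distance: Optional[str] = None
    identification: Optional[str] = None  # e.g. "anthropogenic" or "natural"


@dataclass
class SoundParameters:  # Level II (Table 3)
    sound_source: Optional[SoundSource] = None
    frequency_hz: Optional[float] = None
    wavelength: Optional[float] = None
    amplitude: Optional[float] = None
    sound_pressure: Optional[float] = None
    intensity: Optional[float] = None
    speed: Optional[float] = None
    direction: Optional[str] = None


@dataclass
class EmotionalResponse:  # Level III (Table 4)
    sound_parameters: Optional[SoundParameters] = None
    psychological_threshold: Optional[float] = None
    auditory_masking: Optional[bool] = None
    critical_band: Optional[float] = None
    auditory_recognition: Optional[str] = None  # natural, music, speech
    code_system: Optional[str] = None  # "rhythm-melody" or "phonetic"


def descriptor_count() -> int:
    """Total number of descriptors across the three levels."""
    levels = (SoundSource, SoundParameters, EmotionalResponse)
    return sum(len(fields(level)) for level in levels)


def unrecognized(level) -> List[str]:
    """Names of the descriptors of one level that are still unset."""
    return [f.name for f in fields(level) if getattr(level, f.name) is None]


print(descriptor_count())
print(unrecognized(SoundSource(visual_field=True, distance="close")))
```

Each level carries the previous one as its first descriptor, as in Tables 3 and 4, and the field count adds up to the 18 descriptors mentioned in Section 5.2.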

4.2 Autonomic cycle of the SAS-SE based on ISA architecture

The acoustic features vary from space to space, since the dimensions of a space and the materials that cover its surfaces and objects can change the behavior and propagation of acoustic waves. For this work, the system acquires information from the IE using CAD and analyzes the sounds in the IE with ISUA. Therefore, to implement ReM-AM from the previous work [1], it was necessary to use the ISA autonomic cycle, which has the tasks of extraction, analysis and configuration of the response. The main components used are CAD and ISUA; DM is excluded because the system indicates the action that should take place but does not execute it (see Figure 4).

Figure 4

A flowchart illustrating the process of sound event analysis and recommendation in an intelligent environment.

The flowchart starts with a sound event in an intelligent environment (IE). This sound event is extracted by the CAD (Collecting Audio Data) component. The extraction process uses a hierarchical pattern of sound events. The extracted information is then analyzed by the ISUA (Interaction System-User-Audio) component. The analysis passes on the recognized information to the recommendation module. The recommendation module, part of the response module, gives a notification based on the recognized information.

Diagram of the autonomic cycle ISA of our SAS-SE

As seen in Figure 4, the CAD component is responsible for the extraction task, determining the information used by the autonomic cycle. Specifically, CAD extracts and prepares the data from the sound events taking place in the IE using the hierarchical pattern and its descriptors [24]. Additionally, ISUA analyzes that information to interpret the sounds in the IE and, particularly, to recognize the sound events to pass on to the response module. Finally, the response module issues an alarm that reports information based on the detected sound events in the IE, such as emotions and activities, among others. Each task in the ISA autonomic cycle for this case is described as follows:

Extraction:

When a sound event happens in an IE, the system identifies it to extract the data that can be used and to detect the descriptors using the CAD component. These data are then prepared according to the descriptors of Table 4 to move on to the next task. See the macro-algorithm in Table 5.

Table 5

Macro-algorithm of the extraction process

Input	Sound event
Procedure	1. Identification 1.1. Extraction from audio information 2. Data preparation 2.1. Detection of descriptors
Output	Data collected from sound events

Source(s): Authors' own creation

Analysis:

This task is the recognition of the information contained in the sound data using the ISUA component, with the aim of interpreting the sound events from their descriptors (see its macro-algorithm in Table 6).

Table 6

Macro-algorithm of the analysis process

Input	Data collected from sound events
Procedure	1. Interpretation of the sound events 2. Classification of the sound events using the descriptors
Output	Recognized descriptors information

Source(s): Authors' own creation

Recommendation:

The final task generates a notification in response to a sound event in the IE using the recognized descriptors. When different sound events occur simultaneously, their combination in the final step generates a combined response, expressed as an alarm. The response module is responsible for this task, and gives a recommendation based on the relationship between events and actions that will be explained in the following sections (see its macro-algorithm in Table 7).

Table 7

Macro-algorithm of the recommendation process

Input	Recognized descriptors information
Procedure	1. Identification of the quantity of sound events happening 2. Matching the descriptors of the sound events with the possible actions 2.1. Analysis of the possible combinations of sound events 2.2. Making the decision about convenient recommendations
Output	Notification of the possible actions

Source(s): Authors' own creation

Table 8 shows the tasks, their roles, and the data sources. The extraction task has the steps 1-Data detection and 2-Data preparation using descriptors; these sub-tasks directly use the information provided by the specific sound events in the IE. This information goes to the analysis task, which has the sub-tasks 3-Data recognition and 4-Data transmission; that is, once the data are ready, they are passed on to the last task. The recommendation task is the last step, responsible for the final notification from the system.

Table 8

Tasks description

Role (tasks)	Sub-tasks	Data source
Extraction	Sub-task 1 – Data detection; Sub-task 2 – Data preparation using descriptors	Sound Events from IE
Analysis	Sub-task 3 – Data recognition; Sub-task 4 – Data transmission	Processed Data from the previous task
Recommendation	Sub-task 5 – Notification	Processed Data from the previous task

Source(s): Authors' own creation
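The chaining of the three roles in Table 8 can be sketched as three functions, each consuming the output of the previous one. This is only an illustration of the data flow: the two descriptors computed here (RMS amplitude and duration), the `threshold` value, and the labels are placeholders chosen for the example, not the descriptors or models used by CAD and ISUA.

```python
import math
from typing import Dict, List


def extract(samples: List[float], sample_rate: int) -> Dict[str, float]:
    """Extraction role: detect the event and prepare descriptor data."""
    if samples:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    else:
        rms = 0.0
    return {"amplitude_rms": rms, "duration_s": len(samples) / sample_rate}


def analyze(data: Dict[str, float], threshold: float = 0.1) -> Dict[str, object]:
    """Analysis role: recognize the event from its descriptors and pass it on."""
    # Placeholder recognition step; a trained model would be used in practice.
    label = "loud event" if data["amplitude_rms"] > threshold else "background"
    return {"label": label, "duration_s": data["duration_s"]}


def recommend(recognized: Dict[str, object]) -> str:
    """Recommendation role: turn the recognized event into a notification."""
    if recognized["label"] == "loud event":
        return f"Notify: loud event lasting {recognized['duration_s']:.1f} s"
    return "No notification"


def run_cycle(samples: List[float], sample_rate: int) -> str:
    """One pass of the cycle: extraction, then analysis, then recommendation."""
    return recommend(analyze(extract(samples, sample_rate)))


print(run_cycle([0.5, -0.5] * 8000, 8000))
print(run_cycle([0.0] * 8000, 8000))
```

Keeping each role as a separate function mirrors the table: the only input that comes from the IE is the raw sound event, and every later step works on processed data from the step before it.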

5. Experimentation with our SAS-SE

The experimentation process was divided into 3 steps: the case study, the implementation and verification of tasks, and the general analysis of the autonomic cycle behavior.

5.1 Case study

To provide a general overview of the system, the case study presents a scenario in a rural school where random anthropogenic and natural sound events occur consecutively, with some overlapping. The system recognizes that two or more simultaneous sounds could conflict, given the school setting and ongoing activities within the IE [13].

While the system detects all sound events, warnings will focus on those that may be problematic. For example, indoor background noise typically does not trigger alarms due to its stable nature. However, if the school is closed, the system can detect anthropogenic sounds, such as footsteps, which, when combined with background noise, may trigger a notification for unauthorized entry.

Sound events in the school’s surrounding area may also indicate a potential conflict. In the recreational area, student activities generate a background soundscape that includes voices, sports noises, and crowds. A key system function is to process this sound data and determine the appropriate actions. For example, if a wild animal is detected nearby, the system issues a warning so authorities can secure the gates.

The system evaluates all data according to the descriptors in the hierarchical pattern proposed in this work, prioritizing key acoustic features from sound sources and relevant parameters, to give an adequate response.

5.2 Implementation and verification of tasks

The tasks of the ACODAT in each role were verified as follows:

Extraction: The system recognized the sound information and prepared the data with the extracted sound parameters from the hierarchical pattern of sound events using CAD and the techniques proposed in Ref. [25]. The dataset used had 22 category folders with approximately 50 sound events each, and 2 minutes per audio file were used as a sample for the test. The categorization of the sound events in the dataset is shown in Table 9, using the structure from Figure 3.
Analysis: ISUA takes the processed information from the previous task and stores it in the database together with the information of the IE itself. This information was used to define the Decision Tree in Figure 5 [26] for the different classes of sound events (the information in each class represents its average noise levels, entropy, and values of the 18 descriptors described in Section 4.1).

Table 9

Dataset categories

Category		Dataset folders
Anthropogenic	Non Mechanic	Human Crowds, Children & Footsteps Comedy, Fantasy & Humor Babies Schools & Crowds Sport & Leisure Hospitals
	Mechanic	Household Interior Backgrounds Communications Industry Cities Farm Machinery Building
	Transportation	Transport Cars Ships & Boats Aircraft Trains
Natural	Biological	Animals & Birds Horses Dogs Cats
Natural	Geophysical	Weather Water

Source(s): Authors' own creation

Figure 5

A decision tree diagram with multiple branches and nodes.

A decision tree diagram with a root node labeled with specific values and entropy. The tree splits into two main branches based on a condition. The left branch further splits into two sub-branches, each leading to nodes labeled with different values, entropy, and sample sizes. The right branch also splits into two sub-branches, each leading to nodes labeled with different values, entropy, and sample sizes. Each node contains specific values, entropy, sample sizes, and class labels.

Decision tree

The initial recognition system used only two categories and reached an accuracy of 90%; as more categories were added, the accuracy decreased. The noise parameter was also inconsistent because of the differing features of the categories. The entropy likewise decreased as sub-categories were added, and not all of the 18 descriptors could be recognized, which produced more zero values and lowered the accuracy to 33%. To improve these numbers, some data were altered and the accuracy increased 50%-60%.
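For readers unfamiliar with the entropy value reported in decision tree nodes, the sketch below computes it under the assumption that it is the Shannon entropy of the class labels reaching a node, which is the usual choice for entropy-based decision trees; the source does not state the exact criterion. The function names and the example labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of the class labels at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / n) * math.log2(c / n) for c in counts)


def information_gain(
    parent: Sequence[str], left: Sequence[str], right: Sequence[str]
) -> float:
    """Entropy reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical labels: a node holding two classes in equal proportion.
node = ["anthropogenic", "anthropogenic", "natural", "natural"]
print(entropy(node))
print(information_gain(node, node[:2], node[2:]))
```

A node with a single class has zero entropy, and a split is preferred when it yields a larger information gain.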

Recommendation: The combination of events taking place determines specific actions. The Response module as orchestrator manages the recognized events and their timing and sends a notification to indicate the required action. The relationships among events and actions are defined in the Response Module shown in Table 10, which considers that a persistent non-mechanic sound can be a person in need, or that a long-duration mechanic sound can be related to machine malfunction. The Response Module was tested following the process proposed in Ref. [27].

Table 10

Response module

Sound events taking place	Action as response
Non mechanic + more than 2 minutes of duration	Warning signal to the person/authority in charge
Mechanic + more than 4 minutes of duration	Warning signal to the person/authority in charge
Transportation + more than 4 minutes of duration	Warning signal to the person/authority in charge
Mechanic + biological	Generation of an acoustic signal in infrasound and ultrasound frequencies
Non mechanic + Mechanic + more than 4 minutes	Supervision required
Non mechanic + transportation + more than 30 seconds of duration	Warning signal to parents/person/authority in charge (imperative if children/babies)
Transportation + geophysical	Warning signal about weather conditions on the road
Non mechanic + biological + more than 2 minutes	Supervision required
Mechanic + geophysical	Supervision required
Non mechanic + geophysical	Warning signal about weather conditions

Source(s): Authors' own creation
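The rules of Table 10 can be read as a small rule base. The following is a minimal sketch of that reading, not the authors' implementation: the names `Rule`, `RULES` and `triggered_rules` are hypothetical, durations are expressed in seconds, and the matching policy (a rule fires when all of its categories are present and the shortest of their durations exceeds the threshold) is an assumption, since the paper does not spell it out. Rules without a duration in Table 10 are given a threshold of 0 seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int            # row number in Table 10
    categories: frozenset  # sound-event categories that must be present
    min_seconds: int       # duration that must be exceeded (0 = no duration condition)
    action: str


WARN = "Warning signal to the person/authority in charge"
SUPERVISE = "Supervision required"

# Transcription of Table 10 (category identifiers are illustrative).
RULES = [
    Rule(1, frozenset({"non_mechanic"}), 120, WARN),
    Rule(2, frozenset({"mechanic"}), 240, WARN),
    Rule(3, frozenset({"transportation"}), 240, WARN),
    Rule(4, frozenset({"mechanic", "biological"}), 0,
         "Generation of an acoustic signal in infrasound and ultrasound frequencies"),
    Rule(5, frozenset({"non_mechanic", "mechanic"}), 240, SUPERVISE),
    Rule(6, frozenset({"non_mechanic", "transportation"}), 30,
         "Warning signal to parents/person/authority in charge"),
    Rule(7, frozenset({"transportation", "geophysical"}), 0,
         "Warning signal about weather conditions on the road"),
    Rule(8, frozenset({"non_mechanic", "biological"}), 120, SUPERVISE),
    Rule(9, frozenset({"mechanic", "geophysical"}), 0, SUPERVISE),
    Rule(10, frozenset({"non_mechanic", "geophysical"}), 0,
         "Warning signal about weather conditions"),
]


def triggered_rules(active):
    """Return the rules fired by `active`, a dict mapping each detected
    category to the number of seconds it has been sounding."""
    fired = []
    for rule in RULES:
        if not rule.categories.issubset(active):
            continue  # at least one required category is missing
        # Assumed policy: the shortest duration among the rule's categories
        # must exceed the rule's threshold.
        shortest = min(active[c] for c in rule.categories)
        if shortest > rule.min_seconds:
            fired.append(rule)
    return fired


if __name__ == "__main__":
    # A mechanic and a biological event sounding together fire rule 4 only.
    for rule in triggered_rules({"mechanic": 10, "biological": 5}):
        print(rule.number, rule.action)
```

Keeping the rules as data rather than as nested conditionals makes it straightforward to add or adjust rows for a different IE without touching the matching logic.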

5.3 Analysis of the autonomic cycle general behavior

Figure 6 shows the system identifying sound events in an IE and indicating, according to the response module, which of them could generate a conflict, together with the events that were triggered. In this case, an interior background and horses are not compatible, so the response module sent the notification of rule 4 (see row 4 of Table 10): generation of an acoustic signal in infrasound and ultrasound frequencies.

Figure 6

A screenshot of a dashboard interface for a monitoring system. The interface includes a navigation menu on the left side with options such as Dashboard, Clasification Module, Response Module, Profile, and Smart Room. The main section of the dashboard is divided into two columns: Events and Notifications. The Events column lists several events with timestamps and icons representing different categories such as Household, Horses, Sport & Leisure, Water, and Interior Backgrounds. Each event has a play button and a progress bar. The Notifications column displays notifications with timestamps and descriptions of triggered events, such as the generation of an acoustic signal and a warning signal about weather conditions.

General result analysis

We carried out one last test of our system with sounds from different events while knowing in advance what was happening in the environment (for example, if there was a scream, it was because a person had fallen). The response of the system was then compared against the real situation for 50 random cases (see Table 11, which shows the accuracy of the system in detecting the simultaneous presence of sound events).

Table 11

Results for combined events

Combined sound events	Quality (%)
2	90
3	85
4 or more	60

Source(s): Authors' own creation

Our system discriminates two simultaneous sound events well (90%) and gives an adequate recommendation, and it still performs quite well for three (85%). When there are four or more simultaneous sound events, the system becomes less accurate (60%).
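One plausible reading of the Quality column in Table 11 is the percentage of test cases in which the system's response matched the known situation, grouped by the number of simultaneous events. The sketch below illustrates that computation under this assumption; the function name `quality_by_group` is hypothetical, and the sample records are invented for illustration only and are not the data of the experiment.

```python
from collections import defaultdict


def quality_by_group(cases):
    """cases: iterable of (n_events, expected_action, system_action).
    Returns the percentage of matching responses per group of Table 11."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_events, expected, predicted in cases:
        # Table 11 pools every case with four or more simultaneous events.
        group = "4 or more" if n_events >= 4 else str(n_events)
        totals[group] += 1
        hits[group] += int(expected == predicted)
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Invented records, only to show the shape of the input.
    sample = [
        (2, "warning", "warning"),
        (2, "supervision", "supervision"),
        (3, "warning", "warning"),
        (3, "warning", "supervision"),
        (4, "warning", "warning"),
        (5, "supervision", "warning"),
    ]
    print(quality_by_group(sample))
```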

The general behavior of the system is satisfactory and shows considerable potential for descriptor recognition and its use in intelligent acoustic management in an IE. With only a few categories, the system achieved good accuracy. However, as more categories were added, its performance decreased. Given these initial results, the system can be used in IEs with few activities, serving as a solid initial reference.

6. Comparison with similar works

To compare this work with others, it is important to establish which elements they could have in common, which are defined as criteria in our case. The first criterion is the detection of sound events, the second is the classification of those events, and the third is the implementation of a recommendation system for an IE. Table 12 presents the comparison of our work with the previous works most similar to ours, based on these three criteria.

Table 12

Comparison with other works

Work	[2]	[3]	[28]	[29]	[30]	Our ACODAT
Detection	X	X	X	X	X	X
Classification	X			X	X	X
Implementation in an IE			X			X

Source(s): Authors' own creation
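For readers who prefer a machine-readable form, Table 12 can be encoded as a mapping from each work to the criteria it meets. This is only a transcription sketch; the identifiers `CRITERIA`, `COMPARISON` and `missing_criteria` are hypothetical names introduced here.

```python
CRITERIA = ("detection", "classification", "implementation_in_ie")

# Transcription of Table 12: criteria marked with an X for each work.
COMPARISON = {
    "[2]": {"detection", "classification"},
    "[3]": {"detection"},
    "[28]": {"detection", "implementation_in_ie"},
    "[29]": {"detection", "classification"},
    "[30]": {"detection", "classification"},
    "Our ACODAT": {"detection", "classification", "implementation_in_ie"},
}


def missing_criteria(work):
    """Return the criteria of Table 12 that the given work does not meet."""
    return [c for c in CRITERIA if c not in COMPARISON[work]]


if __name__ == "__main__":
    for work in COMPARISON:
        print(work, missing_criteria(work))
```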

The work [2] uses machine learning for sound detection and employs data augmentation to increase the volume of training data. It also provides labeling for sound classification. However, this work is not directly implemented in an IE. Similar to Ref. [2], the work [3] is based on sound event detection and shows good experimental results, but it offers neither a specific sound classification nor an IE implementation.

The work [28] detects sound events using its authors' own recordings, but it does not provide a general sound classification; it demonstrates an implementation in smart cities. Ref. [29] presents an interesting machine-learning approach to sound detection in the automotive industry. However, it is not implemented in an IE.

The work [30] performs an acoustic descriptor analysis using machine learning, but it is applied only to musical sounds: it detects and classifies sounds to compare different musical instruments by their timbres, and it has no application in an IE.

This work proposes an ACODAT for SAS-SE, which includes the detection and classification of sound events from audio descriptors, allowing for better recognition of each sound event based on its features and expanding its usage possibilities in various situations. Additionally, the implementation of our SAS-SE is directly developed for IE.

A quantitative comparison with other works is not feasible, as our system extracts sound data in an IE through the proposed descriptors. Thus, its input is tailored specifically to IEs and is completely different from the datasets used in previous works. Nonetheless, the general structure, with its roles and tasks, allows for a qualitative comparison of the process itself.

7. Conclusions

This article demonstrates the utility of an ACODAT for the analysis of sounds in IEs, based on a set of descriptors employed by its machine-learning-based tasks. This system can be used in different IEs, providing relevant notifications when required. The autonomic cycle uses a hierarchical pattern of sound event descriptors to improve the handling of sound parameter information across different environments and to create better coordination in the system-user relationship.

The case study shows the process in a specific scenario, giving an overview of the system; however, it is adaptable to other situations and customizable to specific requirements. The minimum requirements for the system to function properly with the autonomic cycle have been met. Each step was implemented and integrated, enabling the monitoring of its general behavior, with promising experimental results.

This system could have valuable applications in smart cities and smart homes, among other areas. Some examples of the use of our system in different contexts are as follows. In a smart city, the system could enhance efficiency and safety by analyzing sound patterns to detect traffic jams and their causes, or to identify emerging public emergency situations and assess their magnitude, among other things. It could also adjust street lighting based on pedestrian sounds to maximize people’s safety. In an ambient assisted living setting, the system could monitor patients and identify distress signals or changes in vocal patterns that may indicate an emergency. In a home security system, it would allow immediate reactions to situations detected through sound events. Finally, in smart classrooms, the system could adapt soundscapes and adjust background noises to improve concentration based on students' learning styles. This makes the IE less invasive, and more perceptive and adaptable to special needs. Perhaps the primary disadvantage is the instrumentation required for the system to effectively capture sound events in context.

Future work should focus on improving the accuracy of the system by implementing neural networks or more advanced techniques. For example, implementations based on deep learning could automatically extract sound descriptors from events, which would make the tasks of capturing and recognizing sound events much more efficient. To improve the autonomic cycle implementation, parallel function management could be employed. Automation improvements are also needed. Additionally, future work will address not only sound events but also emotion recognition and its relationship with simultaneous sound events.

We would like to thank Daniel Ochoa, Andrea González, Sebastián Gomez and Ángela Ochoa, from the University of EAFIT, Colombia, for providing insight and expertise that assisted this research.

Availability of data and material: The data will be available if requested, with a justification of what its use will be. The implemented codes are found at https://github.com/SantGa/SAS-SE.git

Conflict of interests/competing interests: The authors declare that they have no conflict of interest.

References

1. Santiago G, Aguilar J. Ontological model for the acoustic management in intelligent environments. Appl Comput Inform. 2022; ahead-of-print (No. ahead-of-print). doi: https://doi.org/10.1108/ACI-09-2021-0246.

2. Mesaros A, Heittola T, Virtanen T, Plumbley MD. Sound event detection: a tutorial. IEEE Signal Process. Mag. 2021; 38(5): 67-83. doi: https://doi.org/10.1109/msp.2021.3090678.

3. Bilen Ç, Ferroni G, Tuveri F, Azcarreta J, Krstulović S. A framework for the robust evaluation of sound event detection. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2020. p. 61-5.

4. Huang SJ, Liu CC, Chen CP. Sound event detection system based on VGGSKCCT model architecture with knowledge distillation. Appl Artif Intelligence. 2023; 37(1): 2152948. doi: https://doi.org/10.1080/08839514.2022.2152948.

5. Aguilar J, Jerez M, Exposito E, Villemur T. CARMiCLOC: context awareness middleware in cloud computing. In: Latin American Computing Conference (CLEI); 2015.

6. Martín-Morató I, Mesaros A. Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation. IEEE/ACM transactions on audio, speech, and language processing. 2023; 31: 902-14. doi: https://doi.org/10.1109/taslp.2022.3233468.

7. Huh J, Chalk J, Kazakos E, Damen D, Zisserman A. Epic-sounds: a large-scale dataset of actions that sound. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2023. p. 1-5.

8. Czopek D, Gryboś D, Leszczyński J, Wiciak J. Identification of energy wastes through sound analysis in compressed air systems. Energy. 2022; 239: 122122. doi: https://doi.org/10.1016/j.energy.2021.122122.

9. Laska B, Valdés J, Xi P, Goubran R, Wallace B, Cohen-McFarlane M. Cough sound analysis using vocal tract models. In: IEEE International Instrumentation and Measurement Technology Conference (I2MTC). IEEE; 2024. p. 1-6.

10. Hou Y, Ren Q, Zhang H, Mitchell A, Aletta F, Kang J, Botteldooren D. AI-based soundscape analysis: jointly identifying sound sources and predicting annoyance. The J Acoust Soc America. 2023; 154(5): 3145-57. doi: https://doi.org/10.1121/10.0022408.

11. Sánchez M, Aguilar J, Cordero J, Valdiviezo-Díaz P, Barba-Guamán L, Chamba-Eras L. Cloud computing in smart educational environments: application in learning analytics as service. In: Rocha Á, Correia A, Adeli H, Reis L, Mendonça Teixeira M (Eds). New advances in information systems and Technologies. Advances in intelligent systems and computing; 2016. p. 993-1002.

12. Jovanovic D, Milovanov S, Ruskovski I, Govedarica M, Sladic D, Radulovic A, Pajic V. Building virtual 3D city model for Smart Cities applications: a case study on campus area of the University of Novi Sad. ISPRS Int J Geo-information. 2020; 9(8): 476. doi: https://doi.org/10.3390/ijgi9080476.

13. Sánchez M, Exposito E, Aguilar J. Implementing self-* autonomic properties in self-coordinated manufacturing processes for the Industry 4.0 context. Comput Industry. 2020; 121: 103247. doi: https://doi.org/10.1016/j.compind.2020.103247.

14. Santiago G, Aguilar J. Integration of ReM-AM in smart environments. WSEAS Trans Comput. 2019; 18: 97-100.

15. Carrión-Toro M, Santorum M, Acosta-Vargas P, Aguilar J, Pérez M. iPlus a user-centered methodology for serious games design. Appl Sci. 2020; 10(24): 9007. doi: https://doi.org/10.3390/app10249007.

16. Godøy RI. Perceiving sound objects in the musique concrète. Front Psychol. 2021; 12: 1702. doi: https://doi.org/10.3389/fpsyg.2021.672949.

17. Santiago G, Jiménez M, Aguilar J, Montoya E. Audio feature engineering for occupancy and activity estimation in smart buildings. Electronics. 2021; 10(21): 2599. doi: https://doi.org/10.3390/electronics10212599.

18. Salazar C, Aguilar J, Monsalve-Pulido J, Montoya E. Affective recommender systems in the educational field. A systematic literature review. Comput Sci Rev. 2021; 40: 100377. doi: https://doi.org/10.1016/j.cosrev.2021.100377.

19. Yang H, Cai M, Diao Y, Liu R, Liu L, Xiang Q. How does interactive virtual reality enhance learning outcomes via emotional experiences? A structural equation modeling approach. Front Psychol. 2023; 13: 1081372. doi: https://doi.org/10.3389/fpsyg.2022.1081372.

20. Jeon M, Walker B, Yim J. Effects of specific emotions on subjective judgment, driving performance, and perceived workload. Traffic Psychol Behav. 2014; 24: 197-209. doi: https://doi.org/10.1016/j.trf.2014.04.003.

21. Kessous L, Castellano G, Caridakis G. Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis. J. Multimodal User Interfaces. 2010; 3(1-2): 33-48. doi: https://doi.org/10.1007/s12193-009-0025-5.

22. Cordero J, Aguilar J, Aguilar K, Chávez D, Puerto E. Recognition of the driving style in vehicle drivers. Sensors. 2020; 20(9): 2597. doi: https://doi.org/10.3390/s20092597.

23. Huang C, Yu J, Wu F, Wang Y, Chen N. Uncovering emotion sequence patterns in different interaction groups using deep learning and sequential pattern mining. J Comput Assist Learn. 2024; 40(4): 1777-90. doi: https://doi.org/10.1111/jcal.12977.

24. Aguilar J, Salazar C, Velasco H, Monsalve-Pulido J, Montoya E. Comparison and evaluation of different methods for the feature extraction from educational contents. Computation. 2020; 8(30): 30. doi: https://doi.org/10.3390/computation8020030.

25. Pinardi D, Farina A. Metrics for evaluating the spatial accuracy of microphone arrays. In: 2021 immersive and 3D audio: from architecture to automotive (I3DA); 2021.

26. Hao J, Ho TK. Machine learning made easy: a review of scikit-learn package in python programming language. J Educ Behav Stat. 2019; 44(3): 348-61. doi: https://doi.org/10.3102/1076998619832248.

27. Vidal-Silva CL, Sánchez-Ortiz A, Serrano J, Rubio JM. Experiencia académica en desarrollo rápido de sistemas de información web con Python y Django. Formación universitaria. 2021; 14(5): 85-94. doi: https://doi.org/10.4067/s0718-50062021000500085.

28. Ciaburro G, Iannace G. Improving smart cities safety using sound events detection based on deep neural network algorithms. Informatics. 2020; 7(3): 23. doi: https://doi.org/10.3390/informatics7030023.

29. Espinosa R, Ponce H, Gutiérrez S. Click-event sound detection in automotive industry using machine/deep learning. Appl Soft Comput. 2021; 108: 107465. doi: https://doi.org/10.1016/j.asoc.2021.107465.

30. Huang R, Li M, Yang D, Shi J, Chang X, Ye Z, Wu Y, Hong Z, Huang J, Liu J, Ren Y, Zhao Z, Watanabe S. Audiogpt: understanding and generating speech, music, sound, and talking head. arXiv preprint arXiv:2304.12995. 2023.

2025

Gabriela Santiago and Jose Aguilar

Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

Input	Sound event
Procedure	1. Identification; 1.1. Extraction from audio information; 2. Data preparation; 2.1. Detection of descriptors
Output	Data collected from sound events

Input	Data collected from sound events
Procedure	1. Interpretation of the sound events; 2. Classification of the sound events using the descriptors
Output	Recognized descriptors information