Purpose

This paper develops a computer-vision method for collecting football players’ positions during a match. It also compares several existing object detection models to determine which performs best for this task. The obtained positional information is then illustrated as a heatmap.

Design/methodology/approach

The proposed method utilizes a dataset of side-view football broadcast footage to detect the players’ positions on the football pitch at any given moment of the match. Using YOLO object detection, the players of each team are identified and recorded to create illustrations that coaches, training staff and others can further analyze. Three different implementations of YOLO (YOLOv5m6, YOLOv5l-tph and YOLOv5l-tph-plus) are compared to find the best implementation for this specific task.

Findings

After each YOLO implementation was trained using the same dataset, the results showed that YOLOv5l-tph performed best, achieving a precision of 0.9868. Meanwhile, YOLOv5l-tph-plus placed second, with a precision of 0.9786. Finally, YOLOv5m6 performed worst, with a precision of 0.8214.

Originality/value

In sports, various analytical information can be extracted from games, whether from statistical records or recorded footage. For “sports analytics,” the goal is to provide valuable insights that are otherwise not obtainable through traditional means, such as merely rewatching past footage. In both the past and the present, there have been multiple attempts at using statistical records to predict or quantify various aspects of the games.

Football players must constantly adapt their positioning while maintaining their primary role. Defenders, for example, are primarily confined to one side of the field but must strategically move to support offensive play or prevent opponents’ advances. Poor positioning can create defensive vulnerabilities, exposing the team to potential attacks. After matches, players often review game footage to analyze their spatial awareness and identify moments where they were out of optimal position. This ongoing challenge requires players to balance tactical responsibilities with quick, situational decision-making to maintain team effectiveness.

Machine learning in sports involves using algorithms and statistical models to analyze vast amounts of data to enhance performance, strategy and decision-making. Applications include player performance analysis, injury prediction and prevention, game strategy optimization and fan engagement through personalized experiences. By identifying patterns and making data-driven predictions, machine learning helps teams gain competitive advantages and improve overall efficiency in sports operations. Due to the competitive advantage it provides, many studies have been conducted in the hope of giving a team an edge [1]. In this paper, multi-object detection models are used to optimize strategy.

Using footage from previously played matches, we can apply computer vision techniques to detect objects, track motions and assign the motion to specific objects. After the footage is obtained, it must first be labeled to explicitly locate each player on each team; this process is often done by hand. The labeled footage can then be loaded into a pre-trained machine learning model, which learns from the footage it is given and can detect multiple classes within a single frame. After object detection, an algorithm tracks the motion of each detected object, and each object is assigned a specific identification (ID) so that its information is preserved even if the player moves out of the frame.
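
The tracking-and-ID step described above can be sketched as a toy nearest-neighbour tracker; real systems use more robust matching (e.g. SORT’s Kalman filter plus Hungarian assignment), and the distance threshold here is an arbitrary illustrative value.

```python
import math

# Toy sketch of the ID-assignment step: each new detection is matched
# to the nearest previously tracked position and inherits its ID, so
# identities persist across frames. The 50-pixel threshold is an
# arbitrary illustrative choice, not a value from the paper.

def assign_ids(prev, detections, max_dist=50.0):
    """prev: {track_id: (x, y)} from the last frame; detections: [(x, y)]."""
    assigned = {}
    free = dict(prev)                       # tracks not yet matched
    next_id = max(prev, default=-1) + 1
    for det in detections:
        if free:
            # nearest unmatched track by Euclidean distance
            pid = min(free, key=lambda i: math.hypot(det[0] - free[i][0],
                                                     det[1] - free[i][1]))
            if math.hypot(det[0] - free[pid][0], det[1] - free[pid][1]) <= max_dist:
                assigned[pid] = det
                del free[pid]
                continue
        assigned[next_id] = det             # unmatched detection: new track
        next_id += 1
    return assigned

tracks = assign_ids({0: (10.0, 10.0), 1: (200.0, 200.0)},
                    [(12.0, 11.0), (400.0, 400.0)])
```

Matched detections inherit the previous frame’s IDs, while detections too far from any existing track open new tracks.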

1.2.1 You only look once (YOLO)

First proposed in 2016 [2], YOLO is designed with a unified architecture that is much simpler than other methods existing at the time. R-CNN, for example, relies on a complex region-proposal pipeline in which each component, generating bounding boxes, running classifiers on each box and post-processing, must be trained individually. YOLO instead reframes object detection as a single regression problem, mapping image pixels directly to bounding boxes and class probabilities, which enables end-to-end training and real-time speed while maintaining high average precision.
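
As a sketch of this regression formulation, the following decodes a single grid cell’s predicted (x, y, w, h) into an image-space box, following the original paper’s parameterization; the grid size, image size and prediction values are illustrative.

```python
# Sketch of how YOLO's single-regression output maps a grid-cell
# prediction to an image-space bounding box. In the original paper,
# x and y are offsets within the cell (0..1) and w, h are fractions
# of the whole image. The cell indices and values below are made up.

def cell_to_box(row, col, pred, grid=7, img_w=448, img_h=448):
    """Convert one cell's (x, y, w, h) prediction to pixel coordinates."""
    x, y, w, h = pred
    cx = (col + x) / grid * img_w      # box centre, pixels
    cy = (row + y) / grid * img_h
    bw = w * img_w                     # box size, pixels
    bh = h * img_h
    return (cx - bw / 2, cy - bh / 2, bw, bh)  # (left, top, width, height)

box = cell_to_box(3, 3, (0.5, 0.5, 0.25, 0.5))
```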

Compared to other techniques at the time, YOLO has three major advantages for object detection and classification.

  • (1)

    Improved speed: YOLO frames its detection problem as a single unified regression problem, eliminating the need for a complex pipeline that requires training for each proposed component. With improved speed, the base model can be run in real-time.

  • (2)

    Global context: techniques like Deformable Parts Models (DPM) use a sliding-window approach, running a specialized classifier at evenly spaced locations over the entire image. YOLO, by contrast, sees the entire image during training and testing, encoding both contextual and appearance information.

  • (3)

    Generalization: since YOLO learns more generalized representations of images, it outperforms other existing methods when trained on natural images and evaluated on artwork.

Since its introduction, YOLO has gone through multiple iterations, each improving the model in one way or another. Here are a few iterations and their improvements over the past years.

  • (1)

    YOLOv2 enhanced the original YOLO model by increasing input resolution from 224 × 224 to 448 × 448 and improving feature granularity, resulting in a 1% performance boost across its key features.

  • (2)

    The main highlight of YOLOv5 is the integration of the focus layer, which reduces the number of layers and parameters and in turn further increases training speed.

  • (3)

    YOLOv8 introduced further improvements, including a newly introduced extensibility framework, a new backbone network, an anchor-free detection head and a new loss function. It promises to be highly efficient compared to previous versions while improving performance.

  • (4)

    YOLOv10 introduces significant advancements over YOLOv8 and YOLOv5, including NMS-free training for improved efficiency and reduced inference times, and spatial-channel decoupled downsampling for better feature extraction. It also employs large-kernel convolutions to enhance accuracy, particularly for smaller objects, while minimizing computational overhead. YOLOv10 achieves a balance of high performance and low resource demands, making it ideal for real-time applications in diverse environments.

1.2.2 General use of machine learning in sports

For “sports analytics,” the goal is to provide valuable insights that are otherwise not obtainable through traditional means, such as merely rewatching past footage. There have been multiple attempts to use statistical records to predict or quantify different aspects of the games. However, statistical analysis alone limits the field to retrospectively evaluating a player’s performance and making predictions based on numerical averages.

One of the first applications of machine learning in sports is injury diagnosis prediction. An early example is Machine Learning Applied to Diagnosis of Sport Injuries [3]. The techniques used in that work are decision tree learners and variants of Bayesian classifiers; using various attributes of handball injuries, the models predict the diagnosis of each injury. The results show a moderate accuracy of 61.7% for the decision tree models, while the Naive Bayes model performs better at 69.4%. “A compound framework for sports results prediction: a football case study” [4], published in 2008, aims to create a framework that combines Bayesian inference, rule-based reasoning and an in-game time-series approach to predict the results of any sport. Unlike other frameworks at the time, the proposed framework is novel in multiple ways. First, it uses both a rule-based reasoner and a Bayesian network to handle both the controllable and the stochastic aspects of sport. Second, the model considers many data points and features, such as current score, fatigue, morale and skill. The remaining novelties are a time series to handle the continuous flow of the sport and a Monte Carlo approach to evaluating the overall result of the game, again owing to its stochastic nature. Using the proposed framework, the authors out-predicted previously proposed methods when applying it to the 2002 World Cup.

It is also possible to use machine learning to assist in collecting the data itself. Historically, data collection was done manually, with a group of volunteers recording each measure by hand and storing it in various databases. With the introduction of more sophisticated sensors, data collection has been automated to a certain extent, but the downsides of this approach are the high cost of entry and the intrusive nature of the sensors. For example, to accurately track the position of each player in a football match, every player must wear a GPS vest, which can cost hundreds of US dollars per player, for the entirety of the match. These GPS trackers do, however, serve as the most accurate data source for player positioning; a systematic review of methods to collect and interpret external training load using microtechnology incorporating GPS in professional football has been published [5].

By leveraging computer vision technology, especially deep learning methodologies, one can reliably extract a player’s position from video footage. The survey in Ref. [6] describes how computer vision is used in various sports to extract information such as player positioning, ball trajectory, jersey number recognition and action recognition. Most of the work in that survey heavily leverages deep learning techniques, but a typical computer vision use case in sports follows a common framework:

  • (1)

    After obtaining the footage, the playing field/boundary is extracted and everything outside it is excluded from consideration, since it usually does not provide any value to the analysis;

  • (2)

    Regardless of whether the objective is detection, tracking or both, detections must be made in order to track or record information about the objects of interest;

  • (3)

    Lastly, after the objects are detected/tracked, the information is analyzed according to the specific objective and the result is presented.

1.2.3 Commercial use of computer vision in sports

After years of research work being published, we are now able to see various uses of machine learning in sports as a commercial product. The following is a list of examples of machine learning (especially computer vision) in a commercial setting:

  • (1)

    Veo is a camera specifically designed to record and automatically analyze footage of sports such as football, rugby and handball. The camera can capture field-wide footage using a single unit, and the analysis is done via “AI” technology; the company does not disclose which machine learning methodology it used to develop the product. The analysis covers multiple aspects of the sport, such as event detection, shot mapping and statistical recording, pass strings and player position heatmaps.

  • (2)

    HomeCourt: Basketball Training, a free application for mobile devices, leverages computer vision technology to create interactive activities for users to train their fundamental basketball skills, from basic footwork and hand movements up to shooting form. During and after these activities, users can review the footage together with suggestions the application makes based on its analysis of the video.

The combination of YOLOv3 and the Simple Online and Realtime Tracking (SORT) algorithm has been implemented before [7] with extremely positive results: the authors report 93.7% on the multiple object tracking accuracy metric, with a detection speed of 23.7 FPS (frames per second) and a tracking speed of 11.3 FPS. The methodology handles partial occlusions and the reappearance of players and the ball well but struggles with severe occlusion. Another closely related work [8] utilizes YOLOv4 instead of YOLOv3, in combination with SORT, to track players and referees with color recognition; the reported results are 96% tracking accuracy and 60% on the Multiple Object Tracking Accuracy (MOTA) and Generalized Multiple Object Tracking Accuracy (GMOTA) metrics, while running at a 23 FPS detection speed.

More related works have been proposed over the years. In multi-camera multi-player tracking with deep player identification in sports video [9], the authors propose the DeepPlayer model to track and identify players using a multi-camera setup. The model employs an extended Cascade Mask RCNN for robust player detection and segmentation, consisting of two stages: Cascade Mask RCNN-P handles coarse-grained player detection, classifying players by team and generating instance masks, while Cascade Mask RCNN-J focuses on fine-grained jersey number localization and recognition within the detected player bounding boxes. By sharing CNN feature maps and utilizing specialized region proposal networks for both players and jersey numbers, the model ensures accurate identification and segmentation, providing essential features for subsequent player ID inference. Beyond player detection and tracking, other published work on object detection and tracking proposes methods that may also apply to player detection tasks [10, 11].

While traditional object detection approaches often require substantial labeled data, few-shot learning (FSL) has emerged as a viable method for scenarios with limited labeled examples. Part-Aware Correlation Networks (PACNet), for instance, apply partial representation and a semantic covariance matrix to detect objects with high contextual relevance by focusing on distinctive object parts and label correlations, thus enabling effective FSL in complex scenes [12]. This is highly relevant for football player detection, as focusing on partial features, such as player orientation and limb positions, can enhance the model’s ability to discern individual players within a crowded scene.

Another relevant approach is the Differential Feature Awareness Network (DFANet), which uses antagonistic learning to separate and fuse features from different modalities. DFANet has demonstrated effectiveness in scenarios like infrared-visible object detection, where it utilizes complementary features across modalities to improve detection robustness under challenging conditions. Techniques from DFANet, particularly its attention-based differential feature fusion, may inform improvements in YOLOv5’s architecture for capturing unique features of each player even amidst occlusions [13].

Image enhancement techniques like U2D2Net, which integrates dehazing and denoising, are also useful for handling environmental variables in wide-angle footage, such as weather effects or background noise. U2D2Net leverages a unified framework to improve visibility and detail retention, which is essential in sports settings where high clarity and detail are crucial for accurate player detection and positioning [14].

In total, combining these approaches (YOLOv5’s efficiency, PACNet’s part-based feature extraction, DFANet’s differential fusion strategies and U2D2Net’s image enhancement) could create a robust model for detecting and tracking football players in wide-angle footage, overcoming challenges like occlusions, variable lighting and complex backgrounds. Additionally, there is a gap in using the most up-to-date versions of the YOLO model (YOLOv8 and YOLOv10 as of November 2024), and both detection speed and the handling of occlusions still leave room for improvement. The overall framework of the proposed solution is shown in Figure 1.

Figure 1

High-level overview of the proposed pipeline to detect players and create the positional heatmap


There are plenty of datasets made for machine learning in sports, such as ImageNet [15], but not every dataset can be used for every research project. To be valid for our purposes, a dataset must contain sample images relevant to the subject matter, i.e. images of football matches with players present in each image, and the label of each image must be provided as ground truth. Depending on the model that will consume the dataset, the labels must also be represented in the appropriate format.

For our objectives, the SoccerTrack [16] dataset, developed by members of the University of Tsukuba, Japan, will be used to train, validate and test the models. SoccerTrack provides both a fish-eye view of the football pitch and a top-down view captured by drone. Since the dataset was also developed for the SORT algorithm with YOLO as the object detection model, we can conveniently apply it directly for our own purposes. The dataset was semi-automatically annotated with the help of GPS devices attached to the players on the pitch. Table 1 displays the specification of the dataset.
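
On label representation: YOLO-family models expect one text file per image, with one line per object in the form `class x_center y_center width height`, all normalized to 0..1. A minimal sketch of the conversion from a pixel-space box follows; the pixel values and frame size are made up for illustration.

```python
# Convert a pixel-space bounding box (left, top, width, height) into a
# YOLO-format label line with coordinates normalised by the image size.
# The example box and frame size below are illustrative, not from the
# SoccerTrack annotations.

def to_yolo_label(cls, left, top, w, h, img_w, img_h):
    xc = (left + w / 2) / img_w    # normalised box centre
    yc = (top + h / 2) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# e.g. a "Team 1" player (class 0) at pixel box (100, 200, 50, 120)
# in a 1920x1080 frame:
line = to_yolo_label(0, 100, 200, 50, 120, 1920, 1080)
```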

Table 1

The specification for the device used to capture the video for SoccerTrack

Device: Z CAM E2-F8
Resolution: 8K (7680 × 4320 pixels)
FPS: 30
Number of classes: 3 (Team 1, Team 2, Ball)
Attributes provided: bounding box height, bounding box width, bounding box position, frame number
Number of videos: 66

Source(s): Authors’ own creation/work

2.1.1 Pre-processing

Since the video footage from the SoccerTrack dataset is captured as a fish-eye view of the entire football field, the first step of the pre-processing pipeline is to find the key points of each video frame. For this, the authors of the dataset provide the coordinates of the key points in a JSON file that can be downloaded alongside the rest of the image set and its annotations. Once the key points are defined, the rest of the footage is cut out in order to train the model on footage that consists only of the football pitch.
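
A minimal sketch of this cropping step, assuming the key points arrive as a simple JSON list of (x, y) pixel coordinates; the dataset’s actual JSON schema may differ.

```python
import json
import numpy as np

# Sketch of the pre-processing step: read the pitch key points and crop
# each frame to their bounding rectangle, discarding everything outside
# the pitch. The JSON structure and the sample values are assumptions.

def crop_to_keypoints(frame: np.ndarray, keypoints_json: str) -> np.ndarray:
    pts = np.array(json.loads(keypoints_json)["keypoints"])  # (N, 2) x,y
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return frame[y0:y1, x0:x1]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # one dummy frame
kp = json.dumps({"keypoints": [[100, 50], [1800, 50],
                               [100, 1000], [1800, 1000]]})
pitch = crop_to_keypoints(frame, kp)
```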

Ever since YOLO exploded in popularity, there have been many implementations of YOLO created to specialize in specific tasks. These tasks include, but are not limited to, detecting small objects, detecting objects in a certain color spectrum and detecting objects of certain shapes.

Each implementation of YOLO may involve simple adjustments of the hyperparameters or a full redesign of the YOLO architecture. For instance, efficient-lightweight YOLO (el-YOLO [17]) is designed to detect objects in densely populated images captured from drones. By modifying the original YOLOv5 architecture with the addition of high-resolution low-level feature maps, el-YOLO achieves higher precision metrics than the original YOLOv5.

For this experiment, the YOLO implementations selected alongside the unadjusted YOLOv5 model are YOLOv5-tph [18] and YOLOv5-tph-plus [19]. In YOLOv5-tph, an additional prediction head is added to detect objects at different scales, and the original YOLOv5 prediction heads are replaced with transformer prediction heads (TPH) to explore the predictive potential of a self-attention mechanism. YOLOv5-tph-plus replaces the additional prediction head of YOLOv5-tph with a cross-layer asymmetric transformer (CA-Trans), which significantly reduces computational cost compared to YOLOv5-tph while improving detection speed. Since football players recorded from a distant sideline appear comparatively small, these models were selected specifically for their specialization in small object detection. The main architectures and key differences are shown in Figure 2 and Table 2.

Figure 2

Methodology: the proposed method implements the original YOLOv5 and compares it against two modified versions (YOLOv5-tph [18] and YOLOv5-tph++ [19])

Table 2

A brief description of each YOLO model involved

No. | Model | Description
1 | YOLOv5 | The original YOLOv5 model, released on January 6th, 2020
2 | YOLOv5-tph [18] | Introduces a transformer prediction head (TPH), leveraging transformers’ capability to capture long-range dependencies, which enhances object detection in the complex and cluttered backgrounds typical of drone imagery
3 | YOLOv5-tph-plus [19] | Introduces a cross-layer asymmetric transformer to better capture hierarchical features and improve detection performance, particularly in complex scenarios like those captured by drones

Source(s): Authors’ own creation/work

After passing the dataset through the listed YOLOv5 models, we obtain the coordinates of the detected players and the teams they belong to. The teams are classified as the home team and the away team, distinguishable by jersey color. Since the SoccerTrack footage is captured from the sidelines, the obtained coordinates are from a side view; the next objective is therefore to translate them into a top-down view. To achieve this, homographic transformation is utilized.

Homographic transformation, also known as a projective transformation, is a fundamental concept in the field of geometry and computer vision. It refers to a mapping between two planes, typically in a projective space, which preserves the straightness of lines but not necessarily their parallelism or distances. Mathematically, it can be represented by a 3 × 3 matrix acting on homogeneous coordinates. This transformation is widely used in image processing for tasks such as image rectification, perspective correction and stitching, where it enables the alignment of images taken from different viewpoints by accounting for their geometric distortions. The ability to model the relationship between different planar views makes homographic transformations essential for applications requiring accurate spatial representation and transformation.

With homographic transformation, we can estimate a homography matrix from the key points of the footage and use it to map the coordinates produced by the model into a new set of coordinates in a top-down view. The last step in creating the heatmap is to average out the position of each team on the field.
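
As a sketch, once a homography matrix H is known (in practice it would be estimated from the pitch key points, e.g. with OpenCV’s findHomography), mapping detected coordinates to the top-down view is a matrix product in homogeneous coordinates. The matrix below is a hand-picked illustrative example, not one estimated from real footage.

```python
import numpy as np

# Apply a 3x3 homography H to player coordinates: lift each (x, y) to
# homogeneous coordinates, multiply by H, then divide by the third
# component. The example H simply scales by 0.5 and shifts by (10, 20).

def warp_points(points, H):
    """Apply homography H to an (N, 2) array of (x, y) coordinates."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # homogeneous
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # de-homogenise

H = np.array([[0.5, 0.0, 10.0],
              [0.0, 0.5, 20.0],
              [0.0, 0.0, 1.0]])
top_down = warp_points(np.array([[100.0, 200.0]]), H)
```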

Once all necessary information is obtained, each individual player’s position information will be mapped onto a top-down view of the football field. The player’s historical positional information will be individually mapped out in the form of a heat map as shown in Figure 3. With each heatmap, we will be able to learn which area of the football field is most frequented by that player.
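
The heatmap itself can be sketched as a normalized 2D histogram of a player’s accumulated top-down positions; the pitch dimensions, bin counts and sample positions below are illustrative choices.

```python
import numpy as np

# Accumulate a player's top-down positions into a coarse 2D histogram
# over the pitch and normalise it to a frequency map. A 105 x 68 m
# pitch and a 21 x 14 grid are illustrative, not values from the paper.

def position_heatmap(xs, ys, pitch_w=105.0, pitch_h=68.0, bins=(21, 14)):
    heat, _, _ = np.histogram2d(xs, ys, bins=bins,
                                range=[[0, pitch_w], [0, pitch_h]])
    return heat / heat.sum()   # each cell = fraction of time spent there

xs = np.array([10.0, 11.0, 52.5, 90.0])   # sample x positions, metres
ys = np.array([34.0, 35.0, 34.0, 30.0])   # sample y positions, metres
heat = position_heatmap(xs, ys)
```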

Figure 3

Heatmap creation pipeline


Although YOLO models are pre-trained to detect and classify objects, additional training is still required for them to effectively distinguish between players on different teams, detect the goalkeepers and learn to exclude the referee. After training with the SoccerTrack dataset, however, we discovered that the model struggles to associate the goalkeepers with their respective teams. From the perspective of obtaining valuable sports insights, though, goalkeepers add little to positional heatmaps and are often intentionally excluded, so this failure does not diminish the value of the model. Given the machine specifications, the priority was to train the model at the highest possible image size with a batch size the machine could handle. The machine used in this study has an AMD Ryzen 9 3900X 12-core CPU, 64 GB of RAM and an NVIDIA RTX 2080. Model training was conducted with a batch size of 4, 34 epochs, an image size of 1,536 and 75 images per epoch. More details and the source code used in this research are available in the GitHub repository at: https://github.com/atiteptan/PlayerDetection
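
For reference, a training run with this configuration would use YOLOv5’s standard train.py script roughly as follows; the dataset YAML file name is a hypothetical placeholder, and the weights argument would be swapped per implementation.

```shell
# Sketch of the training invocation; batch size, epoch count and image
# size follow the configuration reported above. "soccertrack.yaml" is
# an assumed dataset config name, not one shipped with the repository.
python train.py --img 1536 --batch 4 --epochs 34 \
    --data soccertrack.yaml --weights yolov5m6.pt
```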

From the results obtained, it is evident that the worst-performing model is YOLOv5m6, the base model without any customization for our use case. Sample results are shown in Figure 4. The difference in performance between the other two models, YOLOv5-tph and YOLOv5-tph-plus, is marginal, as shown in Table 3. Using each model’s results, the following heatmaps are produced.

Figure 4

Resulting image and bounding boxes obtained from training YOLOv5

Table 3

Results from training each YOLOv5 implementation

(a)
Model | Batch | Epochs | Image size | mAP_0.5 | Precision | Recall
YOLOv5l-tph [18] | 4 | 34 | 1536 | 0.6539 | 0.9868 | 0.6267
YOLOv5l-tph-plus [19] | 4 | 34 | 1536 | 0.6504 | 0.9786 | 0.6134
YOLOv5m6 | 4 | 34 | 1536 | 0.3888 | 0.6594 | 0.5254

(b)
Model | train/cls_loss | train/obj_loss | train/box_loss | val/cls_loss | val/obj_loss | val/box_loss
YOLOv5l-tph [18] | 0.004023 | 0.003542 | 0.1263 | 0.00395 | 0.009098 | 0.1139
YOLOv5l-tph-plus [19] | 0.003859 | 0.01353 | 0.1321 | 0.003512 | 0.0361 | 0.1215
YOLOv5m6 | 0.02405 | 0.05673 | 0.1274 | 0.02452 | 0.1472 | 0.1231

Source(s): Authors’ own creation/work

In each resulting heatmap we can observe three panels: the first shows the average position of the home team, the second the average position of the away team, and the third the positions of both teams overlaid on one another, as shown in Figure 5. From the results of all three models, although YOLOv5m6 scored considerably lower on every metric, it still generates a similar heatmap. These heatmaps provide valuable insights into how each team plays. From the dataset we can observe that the home team (blue) plays a much more compact formation, whereas the away team (red) plays a spread formation that utilizes the width of the pitch.

Figure 5

Resulting heatmap obtained using the different YOLOv5 implementations, YOLOv5-tph, YOLOv5-tph-plus and YOLOv5m6


With the heatmaps presented in Figure 5, coaches, trainers, managers and players can observe the opposition’s actual game plan and positioning. Although player formations are announced before the match starts, or can be observed at a specific moment, each team reacts differently to the circumstances of each match. With this information, a team can make swift adjustments to its plan and maximize its chances of winning. In addition, a comparison between the implemented work and a similar work [8] on a different dataset is shown in Table 4.

Table 4

Comparison between the implemented work against another similar work [8] on a different dataset

Method | Dataset | Classes | Precision (%) | Recall (%) | mAP (%)
YOLOv5l-tph [18] | SoccerTrack [16] | Team 1, Team 2 | 98.68 | 62.67 | 65.39
YOLOv5l-tph-plus [19] | SoccerTrack [16] | Team 1, Team 2 | 97.86 | 61.34 | 65.04
DeepPlayer-track [8] | ISSA [20] | Player, Soccer Ball, Referee, Background | 93 | 95 | 92.47
DeepPlayer-track [8] | SoccerNet [21] | Player, Soccer Ball, Referee, Background | 94 | 95 | 91.76

Source(s): Authors’ own creation/work

The study successfully applied YOLO object detection models to football player detection, with YOLOv5l-tph achieving the highest precision (0.9868) and mAP (0.6539), demonstrating its effectiveness in creating heatmaps for tactical analysis. This approach offers a cost-effective, non-intrusive alternative to traditional methods and highlights the potential of machine learning in sports analytics. Future work should address model limitations, such as goalkeeper classification and explore newer detection models for further refinement.

This research project was partially supported by the Faculty of Information and Communication Technology, Mahidol University.

1. Du M, Yuan X. A survey of competitive sports data visualization and visual analysis. J Visualization. 2020;24(1):47-67.
2. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas; 2016. p. 779-88.
3. Zelič I, Kononenko I, Lavrač N, Vuga V. Machine learning applied to diagnosis of sport injuries. Artif Intelligence Med. 1997;1211:138-41.
4. Min B, Kim J, Choe C, Eom H, McKay R. A compound framework for sports results prediction: a football case study. Knowledge-Based Syst. 2008;21(7):551-62.
5. Rago V, Brito J, Figueiredo P, Costa J, Barreira D, Krustrup P, Rebelo A. Methods to collect and interpret external training load using microtechnology incorporating GPS in professional football: a systematic review. Res Sports Med. 2020;28(3):437-58.
6. Naik BT, Hashmi MF, Bokde ND. A comprehensive review of computer vision in sports: open issues, future trends and research directions. Appl Sci. 2022;12(9):4429.
7. Naik BT, Hashmi MF. YOLOv3-SORT: detection and tracking player/ball in soccer sport. J Electron Imaging. 2022;32(1).
8. Naik BT, Hashmi MF, Geem ZW, Bokde ND. DeepPlayer-track: player and referee tracking. IEEE Access. 2022;10:32494-509.
9. Zhang R, Wu L, Yang Y, Wu W, Chen Y, Xi M. Multi-camera multi-player tracking with deep player identification in sports video. Pattern Recognition. 2020;102:107260.
10. Zhang R, Cao Z, Yang S, Si L, Sun H, Xu L, Sun F. Cognition-driven structural prior for instance-dependent label transition matrix estimation. IEEE Trans Neural Networks Learn Syst. 2024:1-14.
11. Zhang R, Xu L, Yu Z, Shi Y, Mu C, Xu M. Deep-IRTarget: an automatic target detector in infrared imagery using dual-domain feature extraction and allocation. IEEE Trans Multimedia. 2022;24:1735-49.
12. Zhang R, Tan J, Cao Z, Xu L, Liu Y, Si L, Sun F. Part-aware correlation networks for few-shot learning. IEEE Trans Multimedia. 2024;26:9527-38.
13. Zhang R, Li L, Zhang Q, Zhang J, Xu L, Zhang B, Wang B. Differential feature awareness network within antagonistic learning for infrared-visible object detection. IEEE Trans Circuits Syst Video Technol. 2024;34(8):6735-48.
14. Ding B, Zhang R, Xu L, Liu G, Yang S, Liu Y, Zhang Q. U2D2Net: unsupervised unified image dehazing and denoising network for single hazy image enhancement. IEEE Trans Multimedia. 2024;26:202-17.
15. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, Miami; 2009.
16. Scott A, Onishi M, Kameda Y, Fukui K, Fujii K. SoccerTrack: a dataset and tracking algorithm for soccer with fish-eye and drone videos. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans; 2022. p. 3568-78.
17. Hu MZ, Li ZY, Yu J, Wan XQ, Tan HT, Lin ZY. Efficient-lightweight YOLO: improving small object detection in YOLO for aerial images. Sensors. 2023;23(14):6423.
18. Zhu X, Lyu S, Wang X, Zhao Q. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In: International Conference on Computer Vision Workshops (ICCV Workshops), Montreal; 2021. p. 2778-88.
19. Zhao Q, Liu B, Lyu S, Wang C, Zhang H. TPH-YOLOv5++: boosting object detection on drone-captured scenarios with cross-layer asymmetric transformer. Remote Sensing. 2023;15(6):1687.
20. D'Orazio T, Leo M, Mosca N, Spagnolo P, Mazzeo PL. A semi-automatic system for ground truth generation of soccer video sequences. In: 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance; 2009. p. 559-64.
21. Giancola S, Amine M, Dghaily T, Ghanem B. SoccerNet: a scalable dataset for action spotting in soccer videos. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City; 2018. p. 1792-179210.
Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
