Tutorials

Tutorial 1

Speaker: Prof. Keisuke Fujii (Graduate School of Informatics, Nagoya University / CyberAgent)

Title: Frontiers in Sports AI Research

Abstract:

In recent years, advances in computer-vision-based technologies for automatically measuring sports data, and in the machine learning methods built on them, have raised expectations for quantitative understanding of play and for decision-making support. However, many issues remain unresolved at every stage, from measurement, recognition, classification, and prediction to evaluation and proposal. They stem from restrictions on data acquisition rights, measurement difficulties, the complexity of group behavior, and the context-dependent nature of judgments. This presentation will focus primarily on team sports and introduce initiatives related to automated data acquisition using computer vision, play evaluation using counterfactual prediction based on machine learning, and evaluation and action proposals for the entire match and all players using reinforcement learning. Finally, as part of efforts to make these technologies widely available, we will introduce various publicly released datasets, the open-source analysis platform OpenSTARLab, and competitions that have been held, and discuss future prospects.

Tutorial 2

Speaker: Prof. Naoya Chiba (The University of Osaka)

Title: Trends in 3D Data Processing

Abstract:

3D data is an important research topic in computer vision. Recently, in addition to point clouds, meshes, and depth images, new representations such as NeRF and 3DGS have become common. With the development of deep learning technologies, the emergence of foundation models, the growth of computational resources and datasets, and the evolution of generative models, new tasks such as feedforward reconstruction and scene generation/editing are attracting attention alongside traditional tasks such as classification, segmentation, and monocular depth estimation. In this tutorial, I will give an overview of research on processing 3D data with deep learning and introduce recent trends from the perspectives of both data representation and tasks.

Tutorial 3

Speaker: Prof. Itsumi Saito (Graduate School of Information Sciences, Tohoku University)

Title: Vision-Language Models and Document Image Generation

Abstract:

With the rapid advancement of large language models, large-scale models that jointly process visual information, such as images and videos, together with language have also progressed significantly. These models combine language models with visual encoders to enable unified understanding of both visual and textual modalities. Recent approaches have gone beyond visual understanding to enable the generation of visually well-structured document images, such as web pages, diagrams, and charts, by producing structured text in the form of renderable intermediate representations, including HTML, SVG, and Python code. In this tutorial, we provide an overview of vision-language models, from fundamental concepts to recent advances, and systematically examine techniques and applications for document image generation based on these models. We also discuss future directions and open challenges in this emerging area.