AI for archives browsing

Multimodal semantic analysis for the performing arts heritage

Performing art archives and collections are called to preserve and document a multimodal and multidimensional experience. The space, the sound, the lights, the performers, the objects, the audience, the music, the words are all interconnected parts of a performance, many times, in dialogue with other performances, or artistic works in general. For this, we want to go beyond indexing/cataloguing, and move forward to a paradigm of connectivity and multimodality, where vision, speech and sound are analysed as interrelated aspects of the performing work and are made searchable and accessible. Our focus is the use and reuse of the archive from the point of view of the performing arts practice and knowledge.

Our goal

Generate semantic information exploitable by the user of the archive. We have two main scope of application:

Browsing archives

Allow search based on multimodal semantic information.

3D Virtual Theatre

Allow customizable visualisation of semantic information on screen.

Archived data and in real-time during performances.

Technologies

Multimodal semantic analysis refers to the process of understanding, or making meaning, out of information in textual, sound, and visual modalities. AI technologies are used in this process, in order to automatise the extraction, classification and cross-reference of the information in large scale. This, leverages the efforts needed to perform manual description tasks, while allows to establish different criteria for looking into the archive. Training the algorithms and models on the parameters of the performing arts practice, means that we are creating AI that can look into the archive with the eyes of a creator, a scenographer, a dancer, a scriptwriter. 

Audio analysis

Describe “what” and “when” in the semantic content of audio

Extraction of information from audio recordings, either speech, music or other events and events indexing.

  • Text: named entity recognition, speaker diarization (indexing speakers and speeches), subtitles/script alignment
  • Music: style, tempo and beat / metre, musical instruments, tonality, signing voice, music emotion
  • Audio events: event description, audio roughness

Audio analysis © Inês Barahona e Miguel Fragata 2013, A Caminhada dos Elefantes

Language technologies

Automatically recognise speech and transcribe it into text, for greater accessibility or subtitles generation

Speech-to-text and speaker diarization off-line and real-time cases

  • Automatically off-line transcribe the speech audio to text
  • Automatically align the speech audio with an existing script
  • Automatically real-time transcribe the streaming speech audio to text
  • Automatically identify subtitle from the speech audio of a stream

NLP © Ricardo Pais 1998, Noite de Reis

Video analysis

Reach a three-dimensional understanding of a two-dimensional material

3D scene analysis and 3D pose trajectories estimation

 

  • Object/human detection​
  • Human 2D/3D pose estimation​
  • Light color and position​ estimation
  • Depth/3D layout of the room​
  • Human ID tracking​
  • Human motion ​

Light position estimation © Una Hora Menos 2021, Moria

Sentiment analysis

Analyse sentiment expression in different modalities

  • Body motion analysis and synthesis, based on Laban Movement Analysis system
    • Analysis of motions/sentiments
    • Synthesis of motion (temporal composition, style transfer, long sequences)
    • Motion generation through text
  • Emotion recognition from face expression
  • Text/Speech based human sentiment analysis

Cross-linking

Enable search across audio, video or text data 

Retrieval of content based on semantic relations across modalities

  • Simple text-based query: “A fast-paced scene with happy outcome”
  • Similar scenes: “Find scenes that are similar to theone I’m watching”

FORTHCOMING COMMUNITY CALLS

Are you working in the digital transformation in the performing arts? Join us in the coming months for three online gatherings! These events will be an opportunity to explore our latest research, connect with artists, venues, archivists, and other key stakeholders shaping this space.

Leave your email to receive an alert as soon as the dates are announced.

In practice

The application of these technologies in the performing arts collections is made on the basis of pretrained models, that are retrained and fine-tuned for the specific setting of dance and theatre, following the criteria established by the art institutions members of the project, and professionals that have participated in the user studies.

Semantic data crosslinking

One of the challenges is how all these technologies can perform tasks together, and speak to each other, and for this, the project has developed a “matching/synchronisation” data service, compatible with standard streaming protocols. Many of these technologies have been used by art institutions or media broadcasters to work with their archives (e.g. automatic transcriptions or subtitles, object or face recognition) though their combination is an innovative aspect of Premiere.

Content management system

What the project proposes and develops is a platform that brings all these technologies together, upon a content management infrastructure. At a first moment, the platform generates AI annotations for the videos of the performances, coming from the data of the audio, video and text analysis. Then, human experts can enrich and correct them through a video annotation tool. At the end, the browsing interface offers the data of the work’s record and visualises them in live streaming. A user of the archive can not only watch a performance enriched with the semantic information, but can also search for objects, sounds, musical instruments, motion qualities or make queries of similar scenes.  

The technologies and methodologies developed, consider that the challenges the archives face nowadays are not only technological, but rather include issues like IP management, sustainability of the infrastructures and need of resources and expert skills. For this, we aim to make the platform comprehensible, transparent and sustainable.

The vision driving these efforts is to allow broad access to the performing arts heritage and make the archive a place of knowledge and creativity. Museums, theatres, festivals or private collections host a heritage that allows us to understand how European societies have thought about their identity in different times and geographies, and to continue elaborating on it.

Who participates?

  • Athena Research Center
  • FITEI – Festival Internacional de Teatro de Expressão Ibérica
  • Forum Dança – Associaçao Cultural
  • Medidata.net – Sistemas de Informação Para Autarquias

Relevant resources

Creating a live performance dataset

Creating a live performance dataset

The Université Jean Monnet (UJM) team is working on the creation of a Dance Motion Dataset composed of live performance video recordings. These recordings introduce variations of lightning conditions, pose and occlusion patterns, tackling important challenges in human...