“The battle of series and characters” Presentation

The Chaire LIAvignon is organizing an AI challenge, entitled “The battle of series and characters”, dedicated to the students of Avignon University. The aim is to recognize which series character spoke a short line, based on the orthographic transcription of the line and/or the audio recording of the line. Recognizing the series, and not just the character, is also part of the challenge.



This challenge has two aims:

1. To implement open-source AI solutions, in the machine-learning sense, in the field of automatic language processing, with two approaches: NLP and audio.
2. To illustrate the fundamental values of the LIAvignon Chair: transparency, ethics and reliability.


How to participate (Challenge completed):

This challenge is open to all students of Avignon University. Participants can be individuals or teams of up to four students.

A participant declares their intention by e-mail to contact@liavignon.fr. After checking the information provided, the organisers will register the participant, recording the name, surname and student number of each member of the team, as well as the team name.

Participation is free and voluntary: participants can withdraw at any time.

Participants agree to respect the rules of the challenge and, in particular, not to disseminate the data distributed to them. These data are protected by copyright, and only a few short extracts of less than 3 s may be distributed, for illustration purposes. Participants also commit to destroying these data after the end of the challenge.

The organisers undertake to give the challenge permanent visibility among the industrial and institutional partners of the LIAvignon Chair, as well as more widely. A certificate of participation will be given to each participant having completed the challenge.


Participation consists of three phases:

1. A collaborative and competitive system development phase.

The aim is to develop systems using only the corpus provided and to test them on the development set, the test results being shared among all participants. Follow-up and support will be provided by LIA permanent staff throughout this phase.

2. A very short test phase.


It consists of running the systems in blind mode on test sets provided by the organisers. Each system may only be run once per test set (optimizing systems during this phase is not allowed).

3. An analysis, documentation and presentation phase.

An analysis of your solution will be requested, in terms of the explainability of your system’s decisions and the discovery of possible biases affecting these decisions. This analysis report is an important deliverable that must be returned to the challenge organizers on time. Finally, a presentation of your results, essentially based on this analysis, will be requested at the results meeting.


Description of tasks:

The challenge consists of two main tasks, character detection and series detection.

1. Character detection:

Task 1 is to detect which character spoke a given audio clip.

This audio clip, Ei, has the following characteristics:

  • it contains a single speaker (only one speaker is present in the recording)
  • its duration varies between 0.5 and 10 seconds of speech
  • noise and/or music may be present
  • it is accompanied by its orthographic transcription

Two sub-tasks are defined:

1.1 Binary detection:

A test is presented in the form (Ei, X, Y), where Ei was spoken by either character X or character Y. The expected response is of the form (ID_segment, decision, score), where decision is the chosen character, X or Y, and score is the numerical output of the system, with larger scores indicating greater confidence in the decision.
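As a purely illustrative sketch of the binary-detection response format (the segment IDs, decisions, scores and the CSV file name are all invented for the example; the organisers’ exact file conventions may differ), a prediction file could be written as follows:

```python
import csv

# Hypothetical binary-detection predictions: for each test segment,
# the chosen character (X or Y) and a confidence score, where a
# larger score means greater confidence in the decision.
predictions = [
    ("seg_0001", "walter_white", 0.92),   # invented example values
    ("seg_0002", "jesse_pinkman", 0.37),
]

with open("binary_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for seg_id, decision, score in predictions:
        writer.writerow([seg_id, decision, f"{score:.4f}"])
```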

1.2 Identification among an open set of N characters:

A test is presented in the form (Ei, P1, P2,…,PN). P1 to PN are the identities of the N characters who potentially pronounced Ei (N = 6; respectively [cersei_lannister, daenerys_targaryen, jesse_pinkman, skyler_white, tyrion_lannister, walter_white]).

Note: Ei may have been spoken by a character other than the N proposed. The expected answer is of the form (Ei, ID, S0, S1,…,SN), where ID is the character who spoke the extract, either taken among P1 to PN, or “Null” to indicate that the extract was spoken by a character other than those proposed. S1 to SN are the scores corresponding to the characters P1 to PN, and S0 is the score attached to the “Null” hypothesis.
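One simple way to produce the open-set answer is to take the highest-scoring candidate and fall back to “Null” when no score is high enough. The sketch below is only an illustration of the answer format: the threshold value, the way S0 is derived and the example scores are all invented, not part of the challenge specification.

```python
def open_set_decision(seg_id, candidates, scores, null_threshold=0.5):
    """Pick the highest-scoring candidate, or "Null" if no score
    reaches the threshold (threshold and S0 are illustrative only)."""
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    if scores[best_idx] < null_threshold:
        identity = "Null"
    else:
        identity = candidates[best_idx]
    s0 = 1.0 - max(scores)  # toy "Null" score for illustration
    # Answer in the form (Ei, ID, S0, S1, ..., SN)
    return (seg_id, identity, s0, *scores)

candidates = ["cersei_lannister", "daenerys_targaryen", "jesse_pinkman",
              "skyler_white", "tyrion_lannister", "walter_white"]
answer = open_set_decision("seg_0003", candidates,
                           [0.05, 0.10, 0.72, 0.04, 0.06, 0.03])
# answer[1] is "jesse_pinkman" (highest score, above the threshold)
```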

2. Identification of the series

The aim is to determine to which of the N proposed series the extract Ei belongs. A test is presented in the form (Ei, F1, F2,…,FN). F1 to FN are the identities of the N series to which the character who spoke Ei potentially belongs (N = 2; respectively [breaking_bad, game_of_throne]).

Note: Ei may have been spoken by a character who does not belong to any of the N proposed series. The expected answer is of the form (Ei, ID, S0, S1,…,SN), where ID is the series, among F1 to FN, or “Null” if the character does not belong to any of the proposed series. S1 to SN are the scores corresponding to the series F1 to FN, and S0 is the score attached to the “Null” hypothesis.

Constraints:

  • For a given test, data from other tests in the current evaluation must not be used (apart from the training data, only data from the current test is allowed).
  • In the case of task 1.2, for a given test (Ei, P1, P2,…,PN), only knowledge of the characters P1 to PN is allowed (no data from other characters may be used). Similarly, only the series F1 to FN may be used for task 2.

Monitoring and participation:

Participants are followed up through several face-to-face meetings, whose aim is to provide technical follow-up and to answer participants’ questions. Three meetings are scheduled, on Friday 15 October 2021, 22 October and 5 November. The times and venue will be announced at a later stage.

To facilitate student monitoring and participation, an e-UAPV course is available: “Challenge IA”. The registration key for the course will be provided in reply to the registration e-mail of participating teams.
It allows participants to download the resources made available, as well as to submit entries. It will also be used to disseminate information throughout the challenge. It is important for participants to register for this course.

Participation will be tracked through a blind test set, available for the whole of the system development phase (phase 1). Participants submit a prediction file, in the form indicated in the task description, and the organisers evaluate the performance of the systems on this test set. There is no limit to the number of prediction files that may be submitted on this blind test. The results on this blind test set feed into a leaderboard. This test set is different from the one used in the very short test phase (phase 2).

The analysis phase is also part of the participation and carries significant weight in the final scoring of the entries. It will start after the short test phase, and the analysis report must be returned to the organisers no later than 3 days before the presentation to the jury, scheduled for Friday 19 November 2021.

All submissions must be made on the e-UAPV page dedicated to the AI challenge.

Available resources

All the resources provided are available on the e-UAPV page dedicated to the AI challenge.

Initially, two data sets are provided: a training set with a duration of 150 minutes and a development set with a duration of 30 minutes. They are composed of dialogues from the series “Breaking Bad” and “Game of Thrones”, for the following characters: Cersei Lannister, Daenerys Targaryen, Jesse Pinkman, Skyler White, Tyrion Lannister and Walter White.

Each data item consists of an audio segment together with the textual transcription of the spoken dialogue. Segments have an average length of about 3 seconds. The development set contains about 630 segments, while the training set contains about 3,200.

This data is in JSON format and an example from the development set is provided below:

{
"bb_23509": {
        "audio_path": "data/bb/jesse_pinkman/23509.wav",
        "cat_id": 4,
        "doc_id": "23509",
        "end": 941.71,
        "part_id": 6,
        "spk_label": "jesse_pinkman",
        "start": 935.06,
        "video_id": 5,
        "video_paths": [
            "data/bb/jesse_pinkman/23509"
        ],
        "video_start": 935.04,
        "words": "So if I 'm out here in a guard - type capacity to watch over the money , that means I need , like , a gun , right ?"
    },
    "got_00613": {
        "audio_path": "data/got/tyrion_lannister/00613.wav",
        "cat_id": 1,
        "doc_id": "00613",
        "end": 351.419,
        "part_id": 4,
        "spk_label": "tyrion_lannister",
        "start": 350.219,
        "video_id": 2,
        "video_paths": [
            "data/got/tyrion_lannister/00613"
        ],
        "video_start": 350.2,
        "words": "Do you understand ?"
    }
}
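Assuming entries shaped like the excerpt above, a set can be loaded with the standard `json` module and iterated segment by segment. The snippet below rebuilds a tiny two-entry sample in memory (with truncated `words`) so that it runs without the real corpus; only the field names come from the format shown above.

```python
import json

# Tiny in-memory sample mirroring the structure of the development set.
sample = {
    "bb_23509": {"spk_label": "jesse_pinkman",
                 "start": 935.06, "end": 941.71,
                 "words": "So if I 'm out here ..."},
    "got_00613": {"spk_label": "tyrion_lannister",
                  "start": 350.219, "end": 351.419,
                  "words": "Do you understand ?"},
}

with open("dev_sample.json", "w") as f:
    json.dump(sample, f)

# Load the JSON file and iterate over the segments.
with open("dev_sample.json") as f:
    segments = json.load(f)

for seg_id, seg in segments.items():
    duration = seg["end"] - seg["start"]  # seconds of speech
    print(seg_id, seg["spk_label"], f"{duration:.2f}s")
```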

These data will be made available through a download link provided by e-mail to participants. Their use is strictly reserved for the AI challenge of the LIAvignon Chair. They may not be shared with third parties, nor kept after the challenge.

In addition to the data, two baseline systems are available to all participants:

  • A textual baseline using a “Random Forest” classifier and “TF-IDF” vectors.
  • An audio baseline using a “ResNet” neural network and “softmax” classification.

Each of the baselines is provided with a pre-trained model that can be immediately re-used. The performance, over the development set, is provided in the table below:

Baseline   Accuracy
Text       40.90 %
Speech     91.64 %
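In the same spirit as the textual baseline described above, a minimal TF-IDF + Random Forest pipeline can be sketched with scikit-learn. This is only an illustration of the approach, not the provided baseline code: its settings, and all the training lines except the two drawn from the development-set excerpt earlier, are invented.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy training data (invented lines, labels from the challenge characters).
train_texts = [
    "So if I 'm out here in a guard - type capacity , that means I need , like , a gun , right ?",
    "Do you understand ?",
    "We need to cook .",
    "Pay your debts .",
]
train_labels = ["jesse_pinkman", "tyrion_lannister",
                "walter_white", "tyrion_lannister"]

# TF-IDF vectors feeding a Random Forest classifier.
model = make_pipeline(TfidfVectorizer(),
                      RandomForestClassifier(random_state=0))
model.fit(train_texts, train_labels)

pred = model.predict(["Do you understand ?"])[0]
```

With real data, the same pipeline would simply be fitted on the training-set transcriptions and applied to development-set segments.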

Evaluation and ranking:

The ranking will be done in three categories, according to the level of study of the participants (Bachelor, Master, Doctorate). If the team is composed of several people, the highest level will be retained.

The evaluation consists of three parts:

1. Monitoring (25%), based on the number of entries on the leaderboard and the evolution of the proposed solution.

2. Performance (25%), estimated from the evaluation set distributed in “blind mode” and for a short duration.

3. The quality of the documentation provided and the analysis of the results from an “explainability” perspective (50%).

Jury

The evaluation is steered by a jury composed of:

  • Vincent Labatut (President, AU)
  • Corinne Fredouille (AU)
  • Marie-Jean Meurs (Humania/UQAM)
  • Xavier Bost (Orkis)
  • Olivier Galibert (LNE)

It is supported by a committee of experts composed of:

  • Antoine Caubrière (LIAvignon)
  • Yannick Estève (AU)
  • Jean-François Bonastre (AU)
  • Tania Jimenez (AU)
  • Orange AI
  • Airbus D&S
  • Bertin IT
  • Validsoft
  • CERCO