Spring CLEF 2026 Competition
- Sign up here: CLEF 2026 Competition Sign up Form
Are you interested in doing research but need help knowing where to start? Are you interested in sinking your teeth into applicable, real-world datasets to solve challenging problems? Join the Data Science at Georgia Tech (DS@GT) club as we tackle information retrieval and machine learning competitions at CLEF 2026 in the Spring for our fifth year! The DS@GT competition team has won $10,000 worth of prizes across four working note competitions, and has had over 30 working note papers accepted into workshop proceedings. For CLEF 2025 alone, we had 44 published authors! By joining us, you’ll gain valuable experience, build a network of like-minded individuals, and have the opportunity to participate in competitive research. You can read more about our achievements on our Impact page.
The CLEF 2026 competition begins in March and ends in May, with working note papers submitted by the end of May. To participate with the DS@GT-ARC CLEF 2026 team, you should meet at least one of the following criteria:
- Returning members of DS@GT ARC who have published working note(s) with us
- Participants in the 2025 Fall Interest Group
- Experienced students (please see the self-assessment section below)
For those who are looking to earn academic credit for their research, we will also be offering an optional CS 8903 Special Problems section for the first time. All participants in the course must submit a research proposal for the CLEF task they are interested in. Seats in the course will be limited, and completing the sign-up form does not guarantee enrollment. If you’re an alumnus and would like to join our research group in the spring, you will be required to register for CS 8903 (or any other course) for PACE access. CS 8903 may be taken as a 3-credit course or a 1-credit course.
We hope you find this exciting and complete the sign-up form to begin team formation! Once you have completed the sign-up form, a member of our team will reach out with next steps. Lab leads will be assigned first, and each lab lead will reach out to potential lab members.
Feel free to share this page and join us in the #applied-research-competitions channel on the DS@GT Slack: https://linktr.ee/datasciencegt
DS@GT ARC - CLEF Competition Schedule
We hold monthly general meetings to share ideas and progress between labs. Each lab will also have biweekly meetings (the exact meeting frequency and times will be coordinated by each lab lead). The rough schedule for the spring semester is as follows:
Late November to December - Begin Lab and Task team formation
January - Finalize Lab and Task team formation; Kickoff, Task Review, Research Planning, and Literature Reviews
February - Preliminary Experiments
March - Datasets Released, Research Begins
April - Research Continues
May - Final Submissions for the Competition
End of May - Working Note Papers 1st Submission
June - Working Notes Feedback and Final Paper Submission
Team Structure
- Size: Typically 3-5 members per team (including the lead). No more than 5 people on a single task, due to the complexity of sharing work.
- Composition: Aim for a mix of skills and experience levels where possible, fostering inclusivity and knowledge sharing (e.g., pairing experienced members with newer ones).
- Roles:
  - Lab Lead: Responsible for managing a lab team in the Spring. Duties include registering the task teams, defining the technical approach/plan, conducting weekly meetings, delegating tasks, tracking progress, reporting updates, leading paper writing/submission, and confirming team members. Requires significant time commitment and technical/project management skills. Lab lead positions are limited to 1-2 per lab.
  - Lab Member: Responsible for actively contributing (coding, analysis, experiments), attending weekly meetings, reporting progress, contributing to the paper, and potentially presenting updates. Expected to have relevant technical skills (Python, Git, ML/data analysis basics) and commit sufficient time.
Available Labs and Lab Leads
All labs currently have openings for new members. We will highlight labs when they reach full capacity:
- BioASQ: Large-scale Biomedical Semantic Indexing and Question Answering
  - TBD
- CheckThat!: Predicting Check-Worthiness, Subjectivity, Persuasion, Roles and Authorities
  - TBD
- ELOQUENT: Evaluating Generative Language Models
  - TBD
- eRisk: Early Risk Prediction on the Internet
  - TBD
- EXIST: sEXism Identification in Social neTworks
  - TBD
- FinMMEval: Multilingual and multimodal evaluation of financial AI systems
  - TBD
- HIPE: Evaluating accurate and efficient person-place relation extraction from multilingual historical texts
  - TBD
- ImageCLEF: Multimodal Challenge in CLEF
  - TBD
- JOKER: Automatic Wordplay Analysis
  - TBD
- LifeCLEF: Multimedia Retrieval in Nature
  - Anthony Miyaguchi, acmiyaguchi@gatech.edu
  - Murilo Gustineli, murilogustineli@gatech.edu
- LongEval: Longitudinal Evaluation of Model Performance
  - TBD
- PAN: Lab on Stylometry and Digital Text Forensics
  - TBD
- QuantumCLEF: Quantum Computing at CLEF
  - TBD
- SimpleText: Automatic Simplification of Scientific Texts
  - TBD
- TalentCLEF: Natural Language Processing for Human Resources
  - TBD
- Touché: Argumentation Systems
  - TBD
Self-Assessment
These questions capture the big idea that many of our teams will be exploring:
- Dataset: Using the 20 newsgroups dataset, create a subset of 4 groups of 25 examples each.
- What is transfer learning? What is unsupervised learning? What is an embedding space? Demonstrate the usage of Hugging Face transformers to embed each post in the newsgroup dataset.
- When would you use the cosine distance over the Euclidean distance when measuring distance in an embedding space? Write an assertion demonstrating the triangle inequality with embedding vectors.
- What is a k-NN graph? Demonstrate the construction of a 3-NN graph with a ball tree, using an edge-list representation. How many edges are in the graph? What is the maximum number of edges in the graph?
- What is Precision@k? What is NDCG? Why would you use one over the other? Find the five nearest neighbors of an item in the set as an ordered list. Compute scores between two random lists in the set. Compute the score between lists in the same group. Compute the score between lists in two different groups.
- What is learning to rank? Create a dataset composed of items and their neighborhood lists. Demonstrate learning to rank on the dataset with XGBoost.
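As a warm-up for the distance question above, here is a minimal sketch in plain Python. The vectors x, y, and z are hypothetical toy values standing in for real embeddings, and the helper names euclidean and cosine_distance are our own, not taken from any library.

```python
import math

def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; depends only on direction, not magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical toy "embeddings"; real ones would come from a model.
x = [1.0, 0.0, 2.0]
y = [0.5, 1.5, 1.0]
z = [2.0, 0.0, 4.0]  # same direction as x, twice the length

# Triangle inequality for the Euclidean metric: d(x, z) <= d(x, y) + d(y, z)
assert euclidean(x, z) <= euclidean(x, y) + euclidean(y, z)

# Cosine distance ignores scale: x and z point the same way,
# so their cosine distance is (numerically) zero ...
assert math.isclose(cosine_distance(x, z), 0.0, abs_tol=1e-12)
# ... while their Euclidean distance is clearly not.
assert euclidean(x, z) > 0.0
```

Keep in mind that cosine distance (1 minus cosine similarity) does not satisfy the triangle inequality in general, so think about which distance your own assertion should use.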
Try thinking through each question and task, and self-assess your ability to solve this problem. Consider solving this problem in a notebook and timing yourself, given access to the library documentation. To save you the copy and paste, here’s how ChatGPT answered: https://chatgpt.com/share/b41a40f2-e831-4df0-b051-02b629e1bd9b
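If the ranking metrics are new to you, the following rough sketch may help as a starting point. It assumes one common formulation (precision divided by k, and a log2 rank discount for NDCG); the function names, the relevance labels, and the ranked list are all hypothetical and only for illustration.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains.

    The item at 1-based rank r contributes gain / log2(r + 1);
    with the 0-based index i used below, that is log2(i + 2).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked, relevance, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [relevance.get(item, 0.0) for item in ranked]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded relevance labels for a single query.
relevance = {"a": 3.0, "b": 2.0, "c": 1.0}
# Hypothetical system output; "x" and "y" are irrelevant items.
ranked = ["b", "x", "a", "c", "y"]

p3 = precision_at_k(ranked, set(relevance), k=3)
n3 = ndcg_at_k(ranked, relevance, k=3)
print(f"Precision@3 = {p3:.3f}, NDCG@3 = {n3:.3f}")
```

Precision@k only counts how many relevant items appear in the top k, while NDCG also rewards placing the most relevant items earliest, which is one way to frame the "why one over the other" question.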
Experiences
It would be helpful if you have practical software experience, at a bare minimum:
- You know how to use git and what most of these subcommands roughly do: clone, fetch, commit, push, pull, stash, merge, log, and rebase.
- You have used the scientific Python stack (e.g., imported numpy and pandas) and know how to use virtual environments.
It would also be helpful if you’ve had any of the following experiences:
- You have scripting experience with bash or Python and have changed file permissions via the command line.
- You have used a Docker container, have written a Dockerfile, and know how to write a Docker Compose specification.
- You have used pre-trained Torch or TensorFlow models for a domain-specific task.
- You have built a data/model pipeline. You have experience with Airflow or Luigi. You know about DAGs and idempotency as they relate to pipelines.
- You know how to project, filter, rank, and join relational data. You have used Spark, Hadoop, or Postgres. You can convert a dataset between CSV and Parquet.
- You have created or managed resources in a cloud provider like AWS or GCP, or on PACE. You have experience with Terraform and/or Ansible.
You might find “The Missing Semester of Your CS Education” useful if you feel weak in the software aspect of building IR/ML systems: The Missing Semester of Your CS Education
Videos
Here is a video where we shared our trip to CLEF 2025 in Madrid, Spain during one of the meetings in the Fall. Ten members were able to attend in person and present their approaches and results.
Points of Contact:
Murilo Gustineli murilogustineli@gatech.edu
Ritesh Mehta rmehta307@gatech.edu
Questions?
Please check out the FAQ page and feel free to contact us if you have more questions!