BelgaLogos Dataset

Content-based retrieval of logos and trademarks in large collections of natural images is of high interest for many applications, including dissemination impact evaluation, detection of prohibited or suspicious logos, and automatic annotation. Trademark recognition has been widely addressed by the pattern recognition community over the last decades, but surprisingly few works deal with collections of natural images. The BelgaLogos dataset was created specifically for this purpose within the European project VITALAS, with a major contribution from the BELGA press agency.

Images

The images of the BelgaLogos dataset have been provided by, and are copyrighted by, the BELGA press agency. They are freely available for research purposes only. The dataset is composed of 10,000 images covering all aspects of life and current affairs: politics and economics, finance and social affairs, sports, culture and personalities. All images are in JPEG format and have been resized, preserving aspect ratio, so that the larger of height and width equals 800 pixels.
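The resizing rule can be illustrated with a minimal sketch. The function name `fit_longer_side` is hypothetical, and the sketch assumes every image is scaled so that its longer side becomes exactly 800 pixels; the page does not say how images that were originally smaller were handled.

```python
from typing import Tuple


def fit_longer_side(width: int, height: int, max_side: int = 800) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer side equals max_side.

    Aspect ratio is preserved up to rounding to whole pixels.
    """
    longer = max(width, height)
    new_width = max(1, round(width * max_side / longer))
    new_height = max(1, round(height * max_side / longer))
    return new_width, new_height


# A landscape and a portrait image with the same proportions.
landscape = fit_longer_side(1600, 1200)
portrait = fit_longer_side(1200, 1600)
```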

Download images

Annotations

The 10,000 images of the BelgaLogos dataset have been manually annotated. Two different groundtruths are provided: a global groundtruth and a local groundtruth.

Global groundtruth

In the global groundtruth, each image is labelled for each of the 26 different logos with 1 if the logo is present in the image and with 0 if it is not. A given image can contain one logo, several logos, or no logo at all. The location of the logo is not provided for all 10,000 images, but only for the queries (see the Queries section). The list of logos that were annotated is given in the following table with an illustration of the targeted object. Logos whose bounding box has a smaller side (the minimum of height and width) under 10 pixels were not annotated.
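As an illustration only, the global labelling and the minimum-size rule can be sketched as follows. The image names, the three-logo subset, and the in-memory layout are hypothetical; the actual file format of the groundtruth is not described on this page.

```python
# Toy subset of logo names; the real global groundtruth covers 26 logos.
LOGOS = ["Adidas", "Nike", "Puma"]

# Hypothetical in-memory form: image name -> set of logos visible in it.
present = {
    "img_0001.jpg": {"Nike"},
    "img_0002.jpg": set(),              # an image may contain no logo at all
    "img_0003.jpg": {"Adidas", "Puma"},  # or several logos
}


def label_row(image_name):
    """Return one 0/1 label per logo, in the order of LOGOS."""
    return [1 if logo in present[image_name] else 0 for logo in LOGOS]


def is_large_enough(box_width, box_height, min_side=10):
    """Apply the annotation rule: the smaller side must be at least min_side pixels."""
    return min(box_width, box_height) >= min_side
```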

Local groundtruth

In the local groundtruth, every logo instance (37 different logos) has been surrounded with a rectangular bounding box. A given image can contain several bounding boxes. Each annotated instance has then been visually classified as "OK" or "junk" by a set of 3 users, according to how easily the instance can be recognized without the image context.

Logo name          #OK  #Junk  Total

Adidas             147    896   1043
Adidas-text         63    115    178
Airness             11    109    120
Base               162     86    248
BFGoodrich          86    222    308
Bik                 65    205    270
Bouygues            14     18     32
Bridgestone         31     74    105
Bridgestone-text    64     74    201
Carglass            18     47     65
Citroen             78    164    242
Citroen-text       197    134    331
CocaCola            40     33     73
Cofidis             45     45     90
Dexia              235    391    626
ELeclerc            15      5     20
Ferrari             77    136    213
Gucci                2      2      4
Kia                141    101    242
Mercedes            86    193    279
Nike               235   2007   2242
Peugeot              6      2      8
Puma               157    643    800
Puma-text           27     53     80
Quick               57    196    253
Reebok              18     48     66
Roche                2      0      2
Shell              123    113    236
SNCF                 7      3     10
Std-Liege           98    283    381
StellaArtois        21      8     29
TNT                102     81    183
Total               78     18     96
US-President        14      0     14
Umbro              153    506    659
Veolia              12     65     77
VRT                 10      8     18
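The #OK column can be cross-checked against the size of Qset3, which is described in the Queries section as containing all "OK" instances (2697 queries over 37 logos). The short sketch below copies the #OK counts from the table; the variable names are illustrative.

```python
# "OK" instance counts per logo, copied from the #OK column of the table.
ok_counts = {
    "Adidas": 147, "Adidas-text": 63, "Airness": 11, "Base": 162,
    "BFGoodrich": 86, "Bik": 65, "Bouygues": 14, "Bridgestone": 31,
    "Bridgestone-text": 64, "Carglass": 18, "Citroen": 78,
    "Citroen-text": 197, "CocaCola": 40, "Cofidis": 45, "Dexia": 235,
    "ELeclerc": 15, "Ferrari": 77, "Gucci": 2, "Kia": 141,
    "Mercedes": 86, "Nike": 235, "Peugeot": 6, "Puma": 157,
    "Puma-text": 27, "Quick": 57, "Reebok": 18, "Roche": 2,
    "Shell": 123, "SNCF": 7, "Std-Liege": 98, "StellaArtois": 21,
    "TNT": 102, "Total": 78, "US-President": 14, "Umbro": 153,
    "Veolia": 12, "VRT": 10,
}

number_of_logos = len(ok_counts)      # logos in the local groundtruth
total_ok = sum(ok_counts.values())    # should match the number of Qset3 queries
```

Note that "Total" in the logo list is a brand name, not a summary row.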

Download local groundtruth

Queries

Three distinct pools of queries can be used for evaluation, Qset1, Qset2, and Qset3:

Qset1 is composed of 55 internal queries, each defined by an image name and the coordinates of the logo bounding box in this image. The most frequent logos in the dataset (see the table above) are represented by more queries than less frequent ones. Queries targeting the same logo have the same root name and an incremental number (e.g., Adidas1, Adidas2, etc.).

Download internal Qset1 queries

Download internal Qset1 groundtruth

Qset2 is composed of 26 JPEG thumbnails downloaded from the first Google results page after querying 'logo $logoname'. The logo illustrations provided in the table above are resized versions of the 26 thumbnails composing Qset2.

Download external Qset2 queries

Download external Qset2 groundtruth

Qset3 is composed of 2697 internal queries, representing all the instances of the 37 logos annotated as "OK", each defined by an image name and the coordinates of the logo bounding box in this image. Queries targeting the same logo have the same root name and an incremental number (e.g., Adidas1, Adidas2, etc.).
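A minimal sketch of how an internal query and its name could be handled is given below. The `Query` class, its field names, the (x1, y1, x2, y2) box convention, and the example image name are assumptions made for illustration; the actual layout of the query files is not specified on this page.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Root name (the logo) followed by a trailing incremental number.
_QUERY_NAME = re.compile(r"^(?P<logo>.+?)(?P<index>\d+)$")


@dataclass(frozen=True)
class Query:
    name: str                        # e.g. "Adidas2"
    image: str                       # name of the image containing the logo
    box: Tuple[int, int, int, int]   # assumed (x1, y1, x2, y2) in pixels


def split_query_name(name: str) -> Tuple[str, int]:
    """Split a query name into its logo root and its incremental number."""
    match = _QUERY_NAME.match(name)
    if match is None:
        raise ValueError(f"not a '<logo><number>' query name: {name!r}")
    return match.group("logo"), int(match.group("index"))


# Hypothetical query used only to show the structure.
example = Query(name="Adidas2", image="example.jpg", box=(10, 20, 60, 50))
logo, index = split_query_name(example.name)
```

Grouping queries by the logo root is what the per-logo secondary metrics rely on.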

Download internal and local Qset3 queries

Download internal and local Qset3 groundtruth

Evaluation

Evaluation Metric

The primary metric used for the evaluation is the Mean Average Precision over all queries of a given query set (Qset1, Qset2 or Qset3). Each query has to be searched independently of all other queries (even when a targeted logo is represented by several queries). Average precision is computed for each query and the mean over the query set is computed afterwards.
Secondary metrics can be used to study in detail the performance for each of the 26 logos. In this case the Mean Average Precision has to be computed as the mean of the average precisions of each query targeting the same logo (in a given query set).
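The two levels of averaging can be sketched as follows. This is a minimal illustration using the usual non-interpolated definition of average precision, not the code of the official evaluation tools; the function names and the toy data are hypothetical, and each ranked list is assumed to contain no duplicate images.

```python
from collections import defaultdict
from statistics import mean


def average_precision(ranked, relevant):
    """Non-interpolated average precision of one ranked list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image in enumerate(ranked, start=1):
        if image in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, groundtruth):
    """Primary metric: mean of the per-query average precisions."""
    return mean(average_precision(runs.get(q, []), groundtruth[q])
                for q in groundtruth)


def per_logo_map(runs, groundtruth, logo_of):
    """Secondary metric: mean AP over the queries that target each logo."""
    by_logo = defaultdict(list)
    for q in groundtruth:
        by_logo[logo_of[q]].append(
            average_precision(runs.get(q, []), groundtruth[q]))
    return {logo: mean(aps) for logo, aps in by_logo.items()}


# Toy example: three queries, each searched independently.
runs = {"Adidas1": ["a", "b", "c", "d"], "Adidas2": ["x", "a"], "Nike1": ["n1", "n2"]}
groundtruth = {"Adidas1": {"a", "c"}, "Adidas2": {"a", "z"}, "Nike1": {"n1"}}
logo_of = {"Adidas1": "Adidas", "Adidas2": "Adidas", "Nike1": "Nike"}

overall = mean_average_precision(runs, groundtruth)
by_logo = per_logo_map(runs, groundtruth, logo_of)
```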

Evaluation software

Qset1 and Qset2 are evaluated with trec_eval.
Qset3 is evaluated with a dedicated tool (BelgaLogosEval) that uses the spatial position of the instances.
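For Qset1 and Qset2, results have to be written in the run format that trec_eval reads, one line per retrieved image: query id, the literal Q0, document id, rank, score, and a run tag. The sketch below shows one way to produce such lines; the function name, the run tag, the image names, and the choice of deriving the score from the rank are assumptions. Whether the distributed groundtruth files are already in the matching qrels format is not stated on this page.

```python
def trec_run_lines(query_id, ranked_images, run_tag="myrun"):
    """Format one ranked list as trec_eval run lines.

    Line layout: <query id> Q0 <document id> <rank> <score> <run tag>.
    trec_eval orders results by the score column, so the score is made
    strictly decreasing with the rank.
    """
    count = len(ranked_images)
    lines = []
    for rank, image in enumerate(ranked_images, start=1):
        score = float(count - rank + 1)
        lines.append(f"{query_id} Q0 {image} {rank} {score:.1f} {run_tag}")
    return lines


# Hypothetical result list for one query.
lines = trec_run_lines("Adidas1", ["img_a.jpg", "img_b.jpg", "img_c.jpg"])
```

The lines of all queries are concatenated into one run file, which is then passed to trec_eval after the groundtruth (qrels) file.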

Download BelgaLogosEval

Download

Download the full BelgaLogos package

References

All publications making use of the BelgaLogos dataset must include the following reference:

Alexis Joly and Olivier Buisson, Logo retrieval with a contrario visual query expansion, In Proceedings of the 17th ACM international Conference on Multimedia, 2009.

@inproceedings{belgalogos09,
author = {Joly, Alexis and Buisson, Olivier},
title = {Logo retrieval with a contrario visual query expansion},
booktitle = {MM '09: Proceedings of the seventeen ACM international conference on Multimedia},
year = {2009},
pages = {581--584},
}

If you use the local groundtruth or the Qset3 queries, you must include the following reference:

Pierre Letessier, Olivier Buisson, Alexis Joly, Scalable Mining of Small Visual Objects, In Proceedings of the 20th ACM international Conference on Multimedia, 2012.

@inproceedings{letessier2012scalable,
title={Scalable mining of small visual objects},
author={Letessier, Pierre and Buisson, Olivier and Joly, Alexis},
booktitle={Proceedings of the 20th ACM international conference on Multimedia},
pages={599--608},
year={2012},
organization={ACM}
}

Related publications

Please send your publications related to BelgaLogos to belgalogos@inria.fr

Contact

Alexis Joly, alexis(dot)joly(at)inria.fr
Pierre Letessier,

INRIA - Rocquencourt - IMEDIA Project