BelgaLogos Dataset
Content-based retrieval of logos and trademarks in large natural image collections is of high interest for many applications, including dissemination impact evaluation, detection of prohibited or suspicious logos, and automatic annotation. Trademark recognition has been widely addressed by the pattern recognition community in recent decades, but surprisingly few works deal with natural image collections. The BelgaLogos dataset was created specifically for this purpose within the scope of the European project VITALAS, with a major contribution from the BELGA press agency.
Images
The images of the BelgaLogos dataset have been provided by, and are copyrighted by, the BELGA press agency. They are freely available for research purposes only. The dataset is composed of 10,000 images covering all aspects of life and current affairs: politics and economics, finance and social affairs, sports, culture and personalities. All images are in JPEG format and have been resized, preserving aspect ratio, so that the larger of height and width is 800 pixels.
Download images
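As an illustration of the resizing rule described for the images, the following minimal sketch computes the target dimensions of an image whose longest side is limited to 800 pixels. The function name `fit_within` is hypothetical, and the choice to leave smaller images untouched is an assumption, since the page does not say whether small originals were enlarged.

```python
def fit_within(width: int, height: int, max_side: int = 800) -> tuple[int, int]:
    """Return (width, height) scaled so the longest side is max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        # Assumption: images already within the limit are not enlarged.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never return a zero-sized side.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(1600, 1200))
    print(fit_within(600, 1200))
```

A landscape image of 1600x1200 pixels and a portrait image of 600x1200 pixels both end up with a longest side of 800 pixels, with the other side scaled by the same factor.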
Annotations
The 10,000 images of the BelgaLogos dataset have been manually annotated. Two different groundtruths are provided: a global groundtruth and a local groundtruth.
Global groundtruth
In the global groundtruth, each image is labelled for each logo (26 different logos) with 1 if the logo is present in the image and with 0 if it is not. A given image can contain one logo, several logos, or no logo at all. The localization of the logo is not provided for all 10,000 images, but only for the queries (see the Queries section). The annotated logos are listed in the following table, with an illustration of the targeted object. Logos whose bounding box has a minimum of height and width below 10 pixels were not annotated.
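The labelling rule of the global groundtruth can be sketched as follows. This is only an illustration under stated assumptions: the `Box` type, the `(x1, y1, x2, y2)` corner layout, and the function names are hypothetical and do not describe the format of the distributed files. The only facts taken from the description are the binary label per (image, logo) pair and the 10-pixel minimum side.

```python
from dataclasses import dataclass

MIN_SIDE = 10  # pixels: logos whose smaller side is below this were not annotated


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box given by two corners."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1


def is_annotated(box: Box) -> bool:
    """A logo counts only if the smaller of its width and height reaches MIN_SIDE."""
    return min(box.width, box.height) >= MIN_SIDE


def global_labels(boxes_by_logo: dict[str, list[Box]], logos: list[str]) -> dict[str, int]:
    """Binary label per logo for one image: 1 if at least one annotated instance exists."""
    return {
        logo: int(any(is_annotated(b) for b in boxes_by_logo.get(logo, [])))
        for logo in logos
    }


if __name__ == "__main__":
    image_boxes = {"Nike": [Box(0, 0, 30, 8)], "Kia": [Box(5, 5, 40, 25)]}
    print(global_labels(image_boxes, ["Nike", "Kia", "Shell"]))
```

In this example the Nike instance is only 8 pixels high, so under the minimum-size rule it would not contribute a positive label.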
Local groundtruth
In the local groundtruth, every logo instance (37 different logos) has been surrounded with a rectangular bounding box. A given image can contain several bounding boxes. The annotated instances have then been visually classified as "OK" or "junk" by a set of 3 users, according to whether an instance can easily be recognized without the image context.
Logo name | Illustration | #OK | #Junk | Total |
Adidas | | 147 | 896 | 1043 |
Adidas-text | | 63 | 115 | 178 |
Airness | | 11 | 109 | 120 |
Base | | 162 | 86 | 248 |
BFGoodrich | | 86 | 222 | 308 |
Bik | | 65 | 205 | 270 |
Bouygues | | 14 | 18 | 32 |
Bridgestone | | 31 | 74 | 105 |
Bridgestone-text | | 64 | 137 | 201 |
Carglass | | 18 | 47 | 65 |
Citroen | | 78 | 164 | 242 |
Citroen-text | | 197 | 134 | 331 |
CocaCola | | 40 | 33 | 73 |
Cofidis | | 45 | 45 | 90 |
Dexia | | 235 | 391 | 626 |
ELeclerc | | 15 | 5 | 20 |
Ferrari | | 77 | 136 | 213 |
Gucci | | 2 | 2 | 4 |
Kia | | 141 | 101 | 242 |
Mercedes | | 86 | 193 | 279 |
Nike | | 235 | 2007 | 2242 |
Peugeot | | 6 | 2 | 8 |
Puma | | 157 | 643 | 800 |
Puma-text | | 27 | 53 | 80 |
Quick | | 57 | 196 | 253 |
Reebok | | 18 | 48 | 66 |
Roche | | 2 | 0 | 2 |
Shell | | 123 | 113 | 236 |
SNCF | | 7 | 3 | 10 |
Std-Liege | | 98 | 283 | 381 |
StellaArtois | | 21 | 8 | 29 |
TNT | | 102 | 81 | 183 |
Total | | 78 | 18 | 96 |
US-President | | 14 | 0 | 14 |
Umbro | | 153 | 506 | 659 |
Veolia | | 12 | 65 | 77 |
VRT | | 10 | 8 | 18 |
Download local groundtruth
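The #OK, #Junk and Total columns of the table above can be derived from a list of annotated instances, as in the following minimal sketch. The `Instance` record and its field names are hypothetical and are not the format of the distributed local groundtruth. How the judgments of the 3 users were combined into a single status is not described, so the sketch starts from the final status.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one annotated logo instance."""
    image: str
    logo: str
    box: tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) layout
    status: str                     # "OK" or "junk"


def tally(instances: list[Instance]) -> dict[str, tuple[int, int, int]]:
    """Return logo -> (#OK, #junk, total), mirroring the columns of the table."""
    counts: dict[str, Counter] = {}
    for inst in instances:
        counts.setdefault(inst.logo, Counter())[inst.status] += 1
    return {
        logo: (c["OK"], c["junk"], c["OK"] + c["junk"])
        for logo, c in counts.items()
    }


if __name__ == "__main__":
    demo = [
        Instance("img_a.jpg", "Roche", (10, 10, 60, 40), "OK"),
        Instance("img_b.jpg", "Roche", (5, 20, 80, 70), "OK"),
        Instance("img_b.jpg", "SNCF", (0, 0, 25, 12), "junk"),
    ]
    print(tally(demo))
```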
Queries
Three distinct pools of queries can be used for evaluation, Qset1, Qset2, and Qset3:
Qset1 is composed of 55 internal queries, each defined by an image name and the coordinates of the logo bounding box in this image. The most frequent logos in the dataset (see the table above) are represented by more queries than less frequent ones. Queries targeting the same logo share the same root name followed by an incremental number (e.g. Adidas1, Adidas2, etc.).
Download internal Qset1 queries
Download internal Qset1 groundtruth
Qset2 is composed of 26 JPEG thumbnails downloaded from the first page of Google results for the query 'logo $logoname'. The logo illustrations provided in the table above are resized versions of the 26 thumbnails composing Qset2.
Download external Qset2 queries
Download external Qset2 groundtruth
Qset3 is composed of 2697 internal queries, representing all the instances annotated as "OK" for the 37 logos, each defined by an image name and the coordinates of the logo bounding box in this image. Queries targeting the same logo share the same root name followed by an incremental number (e.g. Adidas1, Adidas2, etc.).
Download internal and local Qset3 queries
Download internal and local Qset3 groundtruth
Evaluation
Evaluation Metric
The primary metric used for the evaluation is the Mean Average Precision over all queries of a given query set (Qset1, Qset2 or Qset3). Each query has to be searched independently from all other queries (even when a targeted logo is represented by several queries). Average precision is computed for each query and the mean over the query set is computed afterwards.
Secondary metrics can be used to study performance in detail for each of the 26 logos. In this case, the Mean Average Precision is computed as the mean of the average precisions of all queries targeting the same logo (in a given query set).
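The two metrics can be sketched in a few lines. This is a minimal illustration, not the official evaluation code: it uses the standard non-interpolated average precision, and it assumes that each query is identified by its name, that each ranked list contains no duplicate image, and that the logo name is obtained by stripping the trailing number from the query name (Adidas1 gives Adidas). All function names are hypothetical.

```python
import re
from collections import defaultdict


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant image is retrieved."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image in enumerate(ranked, start=1):
        if image in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant images that are never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(runs: dict[str, list[str]], truth: dict[str, set[str]]) -> float:
    """Primary metric: mean of the per-query average precisions over a query set."""
    aps = [average_precision(runs[q], truth[q]) for q in runs]
    return sum(aps) / len(aps)


def per_logo_map(runs: dict[str, list[str]], truth: dict[str, set[str]]) -> dict[str, float]:
    """Secondary metric: mean average precision over the queries sharing a root name."""
    by_logo: dict[str, list[float]] = defaultdict(list)
    for q in runs:
        logo = re.sub(r"\d+$", "", q)  # strip the incremental number
        by_logo[logo].append(average_precision(runs[q], truth[q]))
    return {logo: sum(v) / len(v) for logo, v in by_logo.items()}


if __name__ == "__main__":
    runs = {"Adidas1": ["a", "b", "c", "d"], "Adidas2": ["x", "a"], "Nike1": ["n"]}
    truth = {"Adidas1": {"a", "c"}, "Adidas2": {"a"}, "Nike1": {"n"}}
    print(mean_average_precision(runs, truth))
    print(per_logo_map(runs, truth))
```

Each query is scored on its own ranked list, which matches the requirement that queries be searched independently even when several of them target the same logo.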
Evaluation software
Qset1 and Qset2 are evaluated with trec_eval.
Qset3 is evaluated with a dedicated tool (BelgaLogosEval) that uses the spatial position of the instances.
Download BelgaLogosEval
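For Qset1 and Qset2, trec_eval reads two plain-text files: a relevance file with lines of the form `query_id iteration doc_id relevance`, and a results file with lines of the form `query_id Q0 doc_id rank score run_tag`. The sketch below formats both from in-memory data. The function names, the image names and the run tag are hypothetical, and whether the downloadable groundtruth files already use the trec_eval relevance format is an assumption to check against the files themselves.

```python
def qrels_lines(truth: dict[str, list[str]]) -> list[str]:
    """truth maps a query id to the names of the images that contain the logo."""
    return [
        f"{query} 0 {image} 1"
        for query in sorted(truth)
        for image in sorted(truth[query])
    ]


def run_lines(runs: dict[str, list[tuple[str, float]]], tag: str = "myrun") -> list[str]:
    """runs maps a query id to (image name, score) pairs, best match first."""
    lines = []
    for query in sorted(runs):
        for rank, (image, score) in enumerate(runs[query], start=1):
            lines.append(f"{query} Q0 {image} {rank} {score:.6f} {tag}")
    return lines


if __name__ == "__main__":
    truth = {"Adidas1": ["img2.jpg", "img1.jpg"]}
    runs = {"Adidas1": [("img1.jpg", 0.9), ("img3.jpg", 0.5)]}
    with open("qrels.txt", "w") as f:
        f.write("\n".join(qrels_lines(truth)) + "\n")
    with open("run.txt", "w") as f:
        f.write("\n".join(run_lines(runs)) + "\n")
```

The two files can then be passed to the tool as `trec_eval qrels.txt run.txt`. Note that trec_eval orders results by the score column rather than by the rank column, so scores should decrease along each ranked list.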
Download
Download the full BelgaLogos package
References
All publications making use of the BelgaLogos dataset must include the following reference:
Alexis Joly and Olivier Buisson, Logo retrieval with a contrario visual query expansion, In Proceedings of the 17th ACM International Conference on Multimedia, 2009.
@inproceedings{belgalogos09,
author = {Joly, Alexis and Buisson, Olivier},
title = {Logo retrieval with a contrario visual query expansion},
booktitle = {MM '09: Proceedings of the 17th ACM international conference on Multimedia},
year = {2009},
pages = {581--584},
}
If you use the local groundtruth or the Qset3 queries, you must include the following reference:
Pierre Letessier, Olivier Buisson, Alexis Joly, Scalable Mining of Small Visual Objects, In Proceedings of the 20th ACM international Conference on Multimedia, 2012.
@inproceedings{letessier2012scalable,
title={Scalable mining of small visual objects},
author={Letessier, Pierre and Buisson, Olivier and Joly, Alexis},
booktitle={Proceedings of the 20th ACM international conference on Multimedia},
pages={599--608},
year={2012},
organization={ACM}
}
Related publications
Please send your publications related to BelgaLogos to belgalogos@inria.fr
Alexis Joly, alexis(dot)joly(at)inria.fr
Pierre Letessier,