
My PhD Research Activities at INA

My PhD thesis (in French): pdf

Short Illustrated Overview:

What is a copy?

Contrary to popular belief, a copy is not an identical or nearly replicated document but rather a transformed document, i.e. a document obtained from an original one by a succession of processing steps. A copy is therefore not necessarily more visually similar to the original than other kinds of similar documents.

All the copies of an original document can be represented by a tree in which each node corresponds to a new transformation. This copies tree records the history of the contexts of use: where, when and how a document has been used or displayed. The copies tree carries a great deal of information, and constructing it is a challenging task.
Note that watermarking-based copy detection techniques cannot detect all copies, only those derived from the document that contains the invisible mark. Content-based copy detection techniques can in theory detect all copies, even if the original document is very old.


copies tree
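
As a rough illustration of the copies tree described above, the following minimal sketch stores each document together with the transformation that produced it from its parent. The class name `CopyNode`, the helper `transformation_path`, and the example documents and transformations are all hypothetical; this is not the representation used in the thesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CopyNode:
    """One document in the copies tree (all names here are illustrative)."""
    document: str
    transformation: Optional[str] = None  # None for the original document
    children: List["CopyNode"] = field(default_factory=list)

    def add_copy(self, document: str, transformation: str) -> "CopyNode":
        """Attach a copy obtained from this document by one transformation."""
        child = CopyNode(document, transformation)
        self.children.append(child)
        return child


def transformation_path(root: CopyNode, document: str) -> Optional[List[str]]:
    """Return the transformations leading from the root to `document`,
    or None if the document is not in the tree."""
    if root.document == document:
        return []
    for child in root.children:
        path = transformation_path(child, document)
        if path is not None:
            return [child.transformation] + path
    return None


# Hypothetical example: an original, an excerpt of it, and a copy of the excerpt.
original = CopyNode("broadcast_master")
excerpt = original.add_copy("news_excerpt", "crop + logo insertion")
excerpt.add_copy("web_clip", "resize + re-encoding")
print(transformation_path(original, "web_clip"))
```

Reading the path from the root to a node gives the succession of transformations that produced that copy, which is the "history of use" the tree is meant to capture.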

Content-based video copy detection framework

Video local signatures extraction

  • Key frame detection based on maxima of motion intensity
  • Harris interest points detection
  • Local spatio-temporal differential invariants (20-dimensional vectors)
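
The first step of the list above, key frame selection, can be sketched as follows. This is only an illustration under stated assumptions: the page does not say how motion intensity is measured, so the sketch uses the mean absolute gray-level difference between consecutive frames, followed by a moving average and a local-maximum test. The function names and the `radius` parameter are hypothetical. Harris point detection and the differential invariants are not covered here.

```python
from typing import List, Sequence


def motion_intensity(frames: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute gray-level change between consecutive frames.

    Assumption: the source does not specify the motion measure.
    Entry k describes the change from frame k to frame k + 1.
    Each frame is a flat sequence of gray levels of equal length.
    """
    return [
        sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]


def smooth(values: Sequence[float], radius: int) -> List[float]:
    """Moving average with a window truncated at the sequence borders."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def key_frame_indices(intensity: Sequence[float], radius: int = 1) -> List[int]:
    """Indices of the local maxima of the smoothed motion intensity."""
    s = smooth(intensity, radius)
    return [
        i for i in range(1, len(s) - 1)
        if s[i - 1] < s[i] >= s[i + 1]
    ]


# Hypothetical motion intensity profile with two bursts of motion.
intensity = [0, 1, 3, 1, 0, 2, 5, 2, 0]
print(key_frame_indices(intensity))  # → [2, 6]
```

Smoothing before the maximum test is a design choice of this sketch, meant to avoid selecting a key frame on every small fluctuation.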


Distortion-based similarity search

A signature $S(t(M))$, extracted from a transformed document $t(M)$, can be considered a distorted version of the original signature $S(M)$, extracted from the original document $M$. We define the distortion as the following random variable:

$$\Delta S = S(M) - S(t(M))$$

We define a distortion-based probabilistic query of probability $\alpha$ as the search for all the signatures contained in a region $V_\alpha$ of the feature space satisfying:

$$\int_{V_\alpha} p_{\Delta S}(X - Q)\, dX = \alpha$$

where $Q$ is the query (i.e. the candidate signature) and $p_{\Delta S}(\cdot)$ is the probability density function of the distortion. Intuitively, the probabilistic query selects a region of the feature space such that the probability of finding in it the signatures that could belong to a copy is equal to $\alpha$.
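
To make the definition above concrete, here is a minimal sketch under strong assumptions that are not taken from this page: the components of the distortion are modelled as independent, zero-mean Gaussian variables with a common standard deviation `sigma`, and the region is chosen as an axis-aligned hypercube centred on the query. Under these assumptions the integral factorises, and the half-width `a` of the cube satisfies (2 Φ(a / sigma) - 1)^D = alpha, where Φ is the standard normal CDF and D the dimension. The region satisfying the constraint is not unique, and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def box_half_width(alpha: float, dim: int, sigma: float) -> float:
    """Half-width a such that P(|d_i| <= a for all i) = alpha,
    assuming i.i.d. N(0, sigma^2) distortion components (an assumption)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    per_component = alpha ** (1.0 / dim)  # probability mass needed per axis
    return sigma * NormalDist().inv_cdf((1.0 + per_component) / 2.0)


def probabilistic_query(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    alpha: float,
    sigma: float,
) -> List[Sequence[float]]:
    """Return the database signatures lying inside the hypercube around the query.

    This is a brute-force scan for illustration only, not an index structure.
    """
    a = box_half_width(alpha, len(query), sigma)
    return [
        x for x in database
        if all(abs(xi - qi) <= a for xi, qi in zip(x, query))
    ]


# Hypothetical 20-dimensional example: one close signature, one distant one.
query = [0.0] * 20
database = [[0.1] * 20, [5.0] * 20]
matches = probabilistic_query(query, database, alpha=0.8, sigma=1.0)
print(len(matches))
```

Raising `alpha` enlarges the region: more true copies are retrieved, at the cost of examining more signatures.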

INA television monitoring system

A single standard PC can continuously monitor one TV channel against a reference database containing 50,000 hours of video (about 3 billion 20-dimensional vectors). The system is currently used at the French National Audiovisual Institute (INA) to monitor ten channels against various large archive databases.
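
A quick back-of-the-envelope check gives a feel for these figures. The hour and vector counts come from the text above and are approximate. The storage estimate additionally assumes one byte per vector component, which is purely an illustrative assumption and not a figure from this page.

```python
HOURS = 50_000               # size of the reference database, from the text
VECTORS = 3_000_000_000      # "about 3 billion" signatures, from the text
DIM = 20                     # signature dimension, from the text
BYTES_PER_COMPONENT = 1      # assumption, not stated in the source

vectors_per_hour = VECTORS / HOURS
vectors_per_second = vectors_per_hour / 3600
raw_size_gb = VECTORS * DIM * BYTES_PER_COMPONENT / 1e9

print(f"{vectors_per_hour:.0f} signatures per hour of video")
print(f"{vectors_per_second:.1f} signatures per second of video")
print(f"{raw_size_gb:.0f} GB of raw signatures under the 1-byte assumption")
```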

French television detection results

The system is robust to most of the pre-production and post-production processes that distort the images.
