SPEECH - Internship presentation slides: what I will say
--------------------------------------------------------

1 Title - Hi. As you may know, I'm Ezequiel Adrián Cura, and I will be showing some of the work I've done during my stay here in project Pulsar. As you may *not* know, this work is centered on developing a way to build general models with unsupervised tools. Let's start.

2 Motivation (state of the art) - Nowadays most detection/segmentation algorithms work in a certain way:
+ First a target is well defined: person, chair. (image)
+ Then a sort of two-step procedure is performed by the scientist:
  + choosing a good collection of features over a set of such images (Haar for faces, HOG for humans, color for flies, Gabor or SIFT for license plates, shape context, etc.)
  + finding metrics/distances over these features that allow them to discriminate the target (Euclidean, Mahalanobis, kernels, SVM, k-nearest neighbors, etc.) (image) example
Usually, both steps are biased toward the final target. (image) For example, the task might be detection/segmentation of faces, humans, flies, trees, roads, movement, activities, position, color, etc. Almost always, what works for one of these topics doesn't work for the others -> i.e. detecting white flies by color is a good idea, but for cats it doesn't work at all.

3 Motivation (state of the art) - To move beyond this sort of specialized algorithm, and looking for a more general approach, some works defined grammars, vocabularies and codebooks over images. (image) The main idea is to find a syntax for images and from there extract some semantic meaning. Some of them are:
- A Stochastic Grammar of Images (Song-Chun Zhu and D. Mumford, 2006)
- A Numerical Study of the Bottom-up and Top-down Inference Processes in And-Or Graphs (T. Wu and Song-Chun Zhu, ICCV 2009)
These two works define a strong grammar over images. For example, a face will be the combination of hair, ears, nose and eyes.
For this they build a huge database of hand-made segmentations which follow this grammar. This work takes the context into account by performing three different strategies during the recognition process.
- A Fragment-Based Approach to Object Representation and Classification (S. Ullman et al.): finds representative fragments/features of the image -> faces
- Learning Shape Prior Models for Object Matching (T. Jiang, F. Jurie and C. Schmid)
- Groups of Adjacent Contour Segments for Object Detection (V. Ferrari, L. Fevrier, F. Jurie and C. Schmid)
These approaches are purely shape oriented: they work with edges, without regard for the context.
- Context and Hierarchy in a Probabilistic Image Model (Y. Jin and S. Geman)
As in the image we have seen, the main drawback is that the bricks are defined by hand.
In all these works the main issues are still two:
! they involve strongly supervised learning steps (segmentations done by hand, a fixed vocabulary, etc.)
! they rely on fixed strategies, vocabularies and grammars, which biases their behavior. Also, they take for granted that the best way for a machine to "read" an image is the same one they assumed we humans use.

4 Motivation (goal) - We want to propose a new approach combining object detection and segmentation. This new approach should have these main characteristics:
+ allow us to perform unsupervised training, if we want.
+ be unbiased at the start, which means it can learn whatever we feed it.
+ fast -> we want something useful, not only a theoretical result.
+ expressive -> the model must be able to reach all the possible conclusions.
+ compressive -> maybe this sounds a little awkward, but we are looking for methods which compress data in a meaningful way. Many methods discard a huge amount of data and return only a little information, e.g. methods like SIFT. Segmentation methods, instead, give us the object we are looking for.
This will help us find a general approach to the object detection and segmentation problems.

5 Model -> Reviewing the graph on the second slide, we stated that most works begin with a certain target and try to find a special way to reach it in a certain kind of image. So, from now on, I will call these *targets* we are looking for _concepts_ (image), because that is indeed what they are: a way to classify information into "common cases" that we find useful. Also, when we find such a concept in the real world, we have a _fact_: a real-world occurrence of the concept. Finally, the collection of functions that allow us to spot these facts of known concepts will be called spotters.
Now we can see this model in a much simpler way and realize that something is missing. Here concepts must be defined by us. That is what forces us to build strongly supervised methods and to bias the vocabulary towards a *human interpretation*. So from here it sounds natural to include something that allows the model to build "new concepts". We call this a conceiver. A conceiver takes some facts and defines new concepts from them (example: humans -> crowd).

6 Model -> definitions and example

7 Conceiving (entropy) - We are looking for useful (in the sense of meaningful) representations. Information theory (Shannon, 1948) tells us that the most *expressive* alphabet is obtained when the entropy is maximized. A known way to maximize the entropy over a data set is to compress it. So, one way to start is by thinking of ways of compressing data. Here, one of the easiest approaches (Lempel–Ziv–Welch) is to find data redundancy and replace it by a certain code (concept). This process is repeated, and it leads us towards a hierarchical model. In fact, most of the compression schemes we can think of lead us to group things by appearance over and over until we get this most *compressed* version.
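As a side note for the slide: the replace-redundancy-with-a-code idea can be sketched in a few lines. This is a minimal illustration using repeated pair replacement (a simpler relative of LZW, not the full dictionary scheme); every name in it is illustrative. Each new code is a concept defined in terms of earlier symbols or codes, which is exactly the hierarchy mentioned above.

```python
from collections import Counter

def conceive(data, rounds=10):
    """Repeatedly replace the most frequent adjacent pair of symbols
    with a new code. Each code is a 'concept' defined in terms of
    earlier symbols/codes, so the codebook is hierarchical."""
    data = list(data)
    concepts = {}                      # code -> the pair it stands for
    next_code = 0
    for _ in range(rounds):
        pairs = Counter(zip(data, data[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                  # no redundancy left worth naming
            break
        code = ('C', next_code)
        next_code += 1
        concepts[code] = pair
        # Rewrite the data, replacing every occurrence of the pair.
        out, i = [], 0
        while i < len(data):
            if i + 1 < len(data) and (data[i], data[i + 1]) == pair:
                out.append(code)
                i += 2
            else:
                out.append(data[i])
                i += 1
        data = out
    return data, concepts

compressed, concepts = conceive("abababcabc")
# ('C', 0) stands for ('a', 'b'); ('C', 1) is built from two ('C', 0)s:
# the concepts are defined over each other, giving a hierarchy.
```

Note how the second concept is expressed in terms of the first one: grouping by appearance "over and over", as the slide says.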
8 Conceiving (Kolmogorov) - Another way of seeing this conceiving process is as a reduction of the Kolmogorov complexity of the data.
Kolmogorov complexity: the Kolmogorov-Chaitin complexity of data D is measured as the number of characters needed to write the smallest program P such that P() outputs D.
There are some theoretical results showing that the length of such a program is related to the entropy of the data (example: random data). Unfortunately, Kolmogorov complexity is not computable. But if we add some restriction on the number of steps in the description, it seems theoretically possible to obtain such a program. This program will give us the smallest representation of the data. In practice, we can start from a very simple program whose output is the given data and iteratively apply a set of operators, trying to minimize a certain energy (the length of the program). Operators can be things like, for example, adding loops to capture redundancies. After applying each operator a new program is generated; at the end we reach a small program where the groups of loops in the code define our different concepts.

9 Global strategies - The supervisor is also in charge of measuring how useful each concept, spotter, conceiver, etc. is, and when to look for it. We can think of this process of strategy definition as a second level of learning: there are some tools we create, and at this level we optimize when to use them. Ideally, this process is a kind of iteration of the first one. It is the part that learns how to detect (where and how to look) and how to learn (what are the important things to remember). Finally, there are many possible approaches, but all of them make sense only when the supervisor has a big set of images to train on. Runtimes are expected to be reduced.

10 Offline optimizations -> Minimization: depending on the conceiver definition, it will be possible to reduce the expressions of concepts, attributes and criterions.
Looking for common sub-expressions: this can be formalized as concepts! Generalization: again, given some attributes and criterions, it is possible to think of ways of producing new concepts from the given ones, without need of facts ~ something like dreaming.

11 Supervisor overview -> stochastic components to avoid bias!

12 Software - class diagram

13 Software - naifspotter activity diagram

14 Conclusions & work in progress
+ Breaking through specialized learning models. Finding a way to produce iterative *meta*-generalization looks like the breakthrough that will allow us to fully understand how we are able to learn in layers, always using a similar process. In the end, in our heads all things are kinds of concepts. We are trying to design a learning model which is capable of learning from itself. It isn't an easy task, but we like to think that we are going to take a little step forward in this direction. We still have to find ways to express all the data in the same form, because in the computer there are only bits; so the second-level strategies must be concepts just like chair, person, cat or fly.
+ Parallel computing. Thanks to the concept definition it is possible for different systems which have learned different things to interact. A difficult task will be selecting the better concepts, or defining which ones will survive, something like a Darwinian process: "the ones that are more useful will survive".
+ Theoretical definitions. During this work we reached some conclusions regarding compression, hierarchical models, recursion and function expressions, but we still need to formalize these proofs and produce others, mainly showing that the chosen conceivers and the representation of concepts are powerful enough. Some questions to answer are, for example, whether attributes need more input parameters, or whether the approach should be bit oriented.
+ Generating spotters. That is still something we have not even tried to analyze directly.
Our current work focuses on the concepts; in the future we must define a way of improving the spotters as well.
+ Utility measure formalization. We have chosen some information theory definitions and compression as the evolution path for our concepts. In fact, we are always looking for compressed concepts and fast spotters. We still have to formalize the definition of the utility of a concept.
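One candidate formalization of that utility, sketched purely under my own assumptions (nothing here is fixed in the model yet): score a concept by the description length it saves, i.e. the bits of data it explains minus the bits needed to state its definition. All function names and the definition-cost term are illustrative.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Total Shannon code length (in bits) of a symbol sequence
    under its empirical distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())

def utility(data, concept, code):
    """MDL-style utility of a concept: bits saved by rewriting the
    data with `concept` (a pair of symbols) replaced by `code`,
    minus a crude cost for stating the concept's definition."""
    rewritten, i = [], 0
    while i < len(data):
        if tuple(data[i:i + 2]) == concept:
            rewritten.append(code)
            i += 2
        else:
            rewritten.append(data[i])
            i += 1
    definition_cost = entropy_bits(list(concept) + [code])
    return entropy_bits(list(data)) - entropy_bits(rewritten) - definition_cost

# A concept that captures real redundancy has positive utility...
print(utility("abababab", ("a", "b"), "C") > 0)  # True
# ...while one that occurs only once does not pay for itself.
print(utility("abcdefgh", ("a", "b"), "C") > 0)  # False
```

A measure like this would also rank competing concepts for the "Darwinian" selection mentioned in the conclusions: keep the concepts whose utility stays positive across the training images.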