Education Background

  • 02/2013 - Present, Ph.D. student in the SCALE team, INRIA Sophia-Antipolis, and at Lab MAS, Ecole Centrale Paris

  • 09/2010 - 01/2013, Master of Engineering, Ecole Centrale Pekin, Beihang University, China

  • 09/2009 - 01/2013, Ingénieur Généraliste (general engineering diploma), Ecole Centrale Pekin, Beihang University, China

  • 09/2006 - 06/2010, Undergraduate studies (B.Sc.), Beihang University, China

Work Experience

    07/2012 – 12/2012, Internship at Oracle

    I worked in a team building a distributed file system on Solaris. My work was to become familiar with the Unix kernel, learn kernel development, and write shell scripts to test new release versions of our products.

    02/2012 – 04/2012, Internship at Baidu

    My task was to manage a 300-node cluster for an Internet contest: I installed and maintained Hadoop on the cluster and provided technical support for the contest.

    04/2011 – 09/2011, Internship at INRIA

    I was an intern in the OASIS team at INRIA Sophia-Antipolis. My work was to model the performance of Hadoop and, together with other Ph.D. and master's students, to create a continuous version of Hadoop.

Projects

    03/2014, Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce

    Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. However, as the volume and dimensionality of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions with the MapReduce programming model, because it is well suited to large-scale data processing and can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has its own constraints and properties, and there is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. First, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Second, we describe precisely the different techniques published so far. Finally, we provide a set of objective criteria that can be used to make informed decisions.

    09/2013, Mini-project for the ESWC Summer School

    This project is a framework for working with linked, structured music data on the Web. Its modules allow a user to manage data sources, map entities between them, visualize the data, and republish it with added insights. This project won first prize at the 2013 ESWC Summer School.

    10/2011 – 11/2012, MapReduce Scheduling

    This project proposed a new scheduling algorithm for Hadoop, the open-source implementation of MapReduce, that avoids unnecessary data transfers in the distributed file system when running tasks and makes efficient use of each machine resource (CPU, disk, bandwidth, etc.). This was my master's final project.

    04/2010 - 12/2011, ScilabCloud

    The purpose of this project was to port the open-source scientific computing software Scilab to the cloud and make it run in a parallel and distributed way, so that it can be used more conveniently and efficiently. This project won first prize at the 2011 International OW2 Contest.

Publications

  • Solutions for Processing K Nearest Neighbor Joins for Big Data on MapReduce - 23rd International Conference on Parallel, Distributed and Network-based Processing, Turku, Finland - Ge Song, Justine Rochas, Fabrice Huet, Frederic Magoules

  • A Hadoop MapReduce Performance Prediction Method - HPCC 2013 - Ge Song, Zide Meng, Fabrice Huet, Frederic Magoules, Lei Yu, Xuelian Lin

  • A Game Theory Based MapReduce Scheduling Algorithm - ICM 2012 - Ge Song, Zide Meng, Lei Yu, Xuelian Lin

Volunteer Experience

    07/2008 – 09/2008, Volunteer at the Beijing Olympic Games

    I worked as a bilingual (English and French) translator at the 29th Olympic Games in Beijing.

Language Skills

  • Chinese: Native Speaker

  • French: B2

  • English: B2

  • Japanese: JLPT N3

IT Skills

    Java, Python, Hadoop, MapReduce, Storm, SPARQL, MySQL, C, C++, OpenCV, Scilab, Matlab, Office, Photoshop, InDesign

Hobbies

    Playing the piano, Chinese calligraphy, and cooking.