The amount of data collected or produced daily has grown dramatically over the last decade. Exploiting such big data will benefit many businesses and scientific areas. Despite extraordinary advances in its supporting technology, Big Data remains a hot topic in database research.
As part of a PhD thesis co-funded by UVSQ and CNES, the UVSQ/ADAM group proposed and implemented ASTROIDE [7,11], a distributed framework for efficient querying of astronomical surveys.
Drawing on this experience, the present thesis proposal aims to extend the existing ASTROIDE framework towards greater genericity, richer functionality, and better adaptivity.
More precisely, the first objective of this PhD thesis is to fill this gap by proposing an optimized, generic storage and exchange model for astronomical data in a distributed processing environment. Using Parquet or an equivalent (e.g., Kudu) will favour its adoption, since various systems use them as native storage. The second objective is to improve overall system performance throughout execution. We envision a solution that monitors system activity and learns performance behaviour from previous execution traces in order to optimize the current execution. Applying data mining and machine learning algorithms for this purpose is a promising research direction.
The applicant should hold a Master's degree in Computer Science, or equivalent, with the following profile:
- Good background in data management, data mining and machine learning
- Strong programming, system, and big data skills
- Good oral communication and technical reading and writing skills in English
- Proficiency in French is desirable, but not required.
To apply, please contact the PhD/research supervisor and complete, with him or her, the co-financing part of the online application form (Reply to the offer) by April 1, 2019.