Learning Outcomes and Organization
Academic year 2017-2018
The present document describes the learning outcomes and the course organization of the new Master programme in Data Science proposed by the School of Engineering and the Montefiore Institute of the University of Liège.
A number of newly created courses appear in the programme. They are not yet fully documented in the ‘ULg progcours’ database and their outline is therefore included at the end of this document.
For further information, please contact Prof. Louis WEHENKEL or Prof. Guy LEDUC.
2. Learning outcomes
The programme is taught entirely in English, i.e. English is the only language used and required in this programme. It aims at developing the following learning outcomes.
Outcome 1: Mastering the scientific fundamentals of Data Science
Data Science relies essentially on applied mathematics (probability, statistics, optimisation), on computer science (algorithms, data structures, automata, computational complexity), and on artificial intelligence (machine learning, knowledge representation, automatic reasoning).
In order to acquire a sustainable expertise and be able to exploit future techniques, it is paramount to master these scientific fundamentals.
Outcome 2: Being able to use computational tools
The purpose of Data Science is to extract synthetic and usable knowledge by exploiting real-world data. These data are often of heterogeneous quality, come in high volumes, and typically in very diverse forms (text, numbers, images, time-series). The knowledge to be extracted from the data may also take various forms (predictive models of behaviour, clusters of homogeneous behaviours, subsets of relevant variables). The available tools for extracting knowledge from data include machine learning and optimization toolboxes, programming languages and paradigms, and massive data storage and processing systems.
The practice of Data Science requires an excellent knowledge of the strengths and weaknesses of these tools and experience in deploying them to develop practical solutions.
Outcome 3: Being able to develop an effective Big Data solution in a real environment
A Big Data solution is developed in several stages: the definition of the target knowledge to be extracted, the choice of the particular data streams to exploit, the prototyping of the data processing pipeline, the data collection per se, the testing and optimization of the pipeline, the presentation of the results in a suitable form, and the design of a life-cycle management approach to ensure the sustainability of the proposed solution. In order to make sure that the solution fits the end-user needs and can be deployed in the target operational environment (research lab, industry, administration, etc.), it is essential to involve the end-users and the management team of the client at both the design and implementation stages, so as to fully understand the nature of the data and of the field constraints.
It is therefore necessary to master the principles of Big Data project management, and be able to establish a dialog with the field experts and the IT department of the client, in order to make the right technical choices during the project development.
Outcome 4: Being able to carry out a cost-benefit analysis
In order to help companies make the right choices in terms of leveraging data science, it is necessary to be able to carry out a cost-benefit analysis of a big data project, covering both the initial stages of the project and the longer-term strategy.
The Data Scientist therefore has to be equipped with a methodology for carrying out such cost-benefit analyses, based on the information provided by a company and in close collaboration with its strategic management.
Outcome 5: Understanding the legal and societal implications
The use of Data Science solutions may lead to important changes in terms of workload in the companies and/or to exploiting information about people and their actions (workers, clients, general public). In order to be acceptable, these solutions must be in line with the legal and ethical rules of society and of the companies.
The Data Scientist must be aware and respectful of the legal and societal implications of the projects they engage in.
3. Programme curriculum
The programme is organized in two blocks of 60 ECTS credits, each corresponding to one year of study. The first block aims at mastering the scientific and technological fundamentals of Data Science, its problem-solving methods and the enabling technologies, and their application in the context of a “Big Data Project”. The second block includes a Master thesis, and various courses broadening the student’s outlook and/or allowing her to specialize.
NB: Q1 means that the course is organized in the first term (September – December 2017); Q2 means that it is organized in the second term (February – May 2018).
Computer Science, Applied Mathematics and Data Science fundamentals (20 credits):
|INFO0016||Introduction to the theory of computation||5||Q1|
|MATH0461||Introduction to numerical optimization||5||Q1|
|INFO8006||Introduction to artificial intelligence||5||Q1|
|ELEN0062||Introduction to machine learning||5||Q1|
Professional focus in data science (30 credits):
|MATH2021||High-dimensional data analysis||3||Q1|
|INFO8002||Large-scale database systems||5||Q1|
|PROJ0016||Big data project||7||Q1 & Q2|
|ELEN0060||Information and coding theory||5||Q2|
|INFO8004||Advanced machine learning||5||Q2|
Elective courses (choose 10 credits in the following list):
|INFO8003||Optimal decision making for complex problems||5||Q2|
|INFO0010||Introduction to computer networks||5||Q2|
|INFO0045||Introduction to computer security||5||Q2|
Master thesis and internship (30 credits)
Management and legal issues (10 credits):
|GEST3162||Principles of management||5||Q1|
|DROI8031-n||Law of Artificial Intelligence, Robots and Data-Driven Algorithmic Applications||5||Q1|
Note: Students who have already acquired the skills and knowledge of GEST3162 (or equivalent) will replace it by GEST3032 (see electives below).
Elective courses (choose 20 credits in the following topics):
Elective courses in computer science
|INFO2049||Web and text analytics||5||Q1|
|INFO0939||High performance scientific computing||5||Q1|
|INFO0010||Introduction to computer networks||5||Q2|
|INFO0045||Introduction to computer security||5||Q2|
Elective courses in applied mathematics
|MATH2022||Large sample analysis: theory and practice||5||Q1|
Elective courses in bioinformatics
|GBIO0002||Genetics and bioinformatics||5||Q1|
|GBIO0009||Topics in bioinformatics||5||Q1|
|GBIO0030||Computational approaches to statistical genetics||5||Q2|
Elective course in management
|GEST3032||eBusiness and eCommerce||5||Q1|
|INGE0012||Scientific research in engineering and its impact on innovation||5||Q2|
With the agreement of the President of the Jury, students may also choose:
- Up to 15 credits in the application area of their Master thesis in other programmes of the university,
- 5 credits in any other programme of the university.
4. Description of new courses included in the programme
In this section, we provide some information about the new courses that appear in the programme but are not yet fully documented on the “ULg progcours” website.
Introduction to artificial intelligence (5 ECTS, Th 25h, Pr 10h, Proj 45h)
The course aims at giving a perspective on both the research goals of AI and the techniques developed over the years to build intelligent agents. It will be based on several chapters of the textbook “AI: a modern approach” (by S. Russell and P. Norvig), used worldwide since 1995 to teach the essentials of AI. Many of the specialized parts (e.g. first-order logic, machine learning, optimization and control, games, computer vision, robotics) treated in the reference textbook are already covered at a deeper level in companion courses offered in our programmes. The present course will therefore not address these topics in detail; rather, they will be discussed by providing links to the other courses of the curriculum that cover them more thoroughly. Topics to be covered:
- The overall goal of AI
- The AI challenge: foundations, history and state of the art
- Intelligent agents: modelling ‘rational behaviour’ in a ‘complex environment’
- Problem solving
  - Basic search methods for single-agent problem solving over a known environment
  - Discussion of the need to handle complex and partially unknown environments and adverse agents
- Reasoning and planning
  - Agent reasoning based on propositional theorem proving
  - Discussion of the need for higher-order logics
  - Classical planning: state-space search and planning graphs
  - Discussion of multi-agent problem solving and knowledge representation
- Managing uncertainties and learning
  - Discussion of inference and decision making under uncertainty
  - Discussion of the need for learning and overview of various learning paradigms
- Communicating, perceiving and acting
  - Natural language processing
  - Discussion of perception and robotics
- Philosophical foundations and future of AI
- Possible practical projects:
  - Implementing A* for a problem of interest
  - Implementing a propositional logic theorem prover
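As an indication of what the first suggested project involves, the following is a minimal sketch of A* search on a small grid world. The grid dimensions, wall positions and Manhattan-distance heuristic are illustrative assumptions, not course material:

```python
import heapq

def astar(start, goal, neighbours, h):
    """A* search: returns a shortest path from start to goal, or None."""
    # Frontier entries: (f = g + h, g, node, path so far).
    open_heap = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        for nxt, cost in neighbours(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None

# Example: 4x5 grid, 4-connected moves of cost 1, a wall of blocked cells.
walls = {(1, 1), (1, 2), (1, 3)}

def grid_neighbours(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (x + dx, y + dy)
        if 0 <= q[0] < 4 and 0 <= q[1] < 5 and q not in walls:
            yield q, 1

goal = (3, 4)
manhattan = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
path = astar((0, 0), goal, grid_neighbours, manhattan)
```

With an admissible heuristic such as Manhattan distance on a unit-cost grid, A* is guaranteed to return a shortest path.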
High-dimensional data analysis (3 ECTS, Th 15h, Labo 10h, Proj 15h)
In this course, the focus is on exploratory techniques for high-dimensional data. First, two dimension reduction techniques based on projections will be considered:
- Principal components analysis, which constructs an optimal subspace using the correlation structure in the data, and
- Factorial discriminant analysis, which searches for subspaces in which different sub-groups of data are most discriminated.
Then, automatic classification algorithms will be developed. These rely on the definition of distances (or dissimilarities) and follow some aggregation methods based on different criteria (closest or farthest neighbours, within and between inertia, …).
Kernel smoothing procedures will also be introduced in this course, both in the density estimation context and in a regression framework.
Finally, penalized techniques allowing the handling of flat data (data with more dimensions than subjects) will be discussed, both in the multivariate estimation setting and in the regression setting (lasso and ridge regression).
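As an illustration of the first technique in this outline, principal components analysis can be computed via a singular value decomposition of the centred data matrix. The sketch below uses synthetic data; all numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 50 points in 3-D that mostly vary along one direction.
X = rng.normal(size=(50, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.1 * rng.normal(size=(50, 3))

# PCA by SVD of the centred data matrix: the rows of Vt are the
# principal directions, ordered by decreasing explained variance.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)   # variance share per component
Z = Xc @ Vt[:2].T                 # projection on the first 2 PCs
```

Since the synthetic data vary essentially along a single direction, the first component captures nearly all of the variance here.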
Large-scale database systems (5 ECTS, Th 25h, Pr 10h, Proj 45h)
This course studies the architecture, design, and implementation of large-scale database systems. It addresses fundamental concepts of distributed database theory, including design and architecture, security, integrity, query processing and optimization, transaction management, concurrency control, and fault tolerance. It then applies these concepts to large-scale blockchain, data warehouse and cloud computing systems. Cloud computing topics include MapReduce and large-scale cloud databases.
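To give a flavour of the MapReduce model cited among the cloud computing topics, the following single-machine sketch mimics its map, shuffle and reduce phases on the classic word-count example; no actual distributed framework is involved:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Mapper: emit a (word, 1) pair for every word in one document.
    return [(w.lower(), 1) for w in doc.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would do
    # across the network between mappers and reducers.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(key, values):
    # Reducer: aggregate all values observed for one key.
    return key, sum(values)

docs = ["big data", "big models big ideas"]
mapped = chain.from_iterable(map(map_phase, docs))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts == {"big": 3, "data": 1, "models": 1, "ideas": 1}
```

In a real MapReduce deployment, each phase runs in parallel over many machines and the shuffle moves data over the network; the dataflow, however, is exactly the one sketched here.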
Big data project (7 ECTS, Th 10h, Pr 0, Proj 180h)
The purpose of this course/project is for the students to apply knowledge acquired in the Data Science and Engineering programme to a project involving actual data in a realistic setting. During the project, the students will engage in the entire process of solving a real-world data science problem: formalizing the problem, collecting and processing data, applying appropriate analytical methods and algorithms, and deploying a solution.
The course will offer a number of seminars given by industry experts and covering specific topics relevant for big data solutions: large-scale data storage systems, distributed computing frameworks, data science software libraries, specialized machine learning and statistics topics.
The students will work in groups to carry out a practical project over a big dataset, aiming at using the available software and hardware systems for retrieving a specific kind of information from the dataset. The project will be carried out within modern distributed computing and storage environments.
Advanced machine learning (5 ECTS, Th 30h, Pr 5h, Proj 45h)
This course is complementary to ELEN0062 and can be followed independently of the latter. With respect to ELEN0062, the aim of this course is to provide a deeper and more theoretical coverage of supervised learning techniques. The course will formalize the problem of statistical learning and present the main theoretical tools in the domain (bias-variance trade-off, empirical risk minimization, Bayesian approaches). The main families of supervised learning algorithms will be covered, with an emphasis on modern techniques (kernel methods, ensemble methods, deep learning, Gaussian processes, sparse linear models). Implementation issues and scalability of the algorithms will be discussed. The last part of the course will be devoted to a selection of more advanced techniques to deal with structured input and output spaces (rankings, texts, images, graphs) and non-standard learning protocols (semi-supervised learning, transfer learning, …).
At the end of the class, the students will be able to understand the state of the art in the domain. They will be able to implement, combine, or extend existing algorithms to address complex supervised learning tasks. The course will also aim at providing the necessary background to carry out research in the domain.
Depending on the topic, ex-cathedra lectures will be supplemented or replaced by discussions with the students around key papers in the field or by research seminars given by external speakers. Personal student projects will consist either in the implementation and evaluation of advanced algorithms or in critical reading of scientific papers on specific subtopics, depending on the interest and background of the student.
Optimal decision making for complex problems (5 ECTS, Th 25h, Pr 10h, Proj 45h)
There are numerous decision-making problems that can be formalised as problems for which one needs to maximize a numerical reward (or equivalently minimize a cost) when interacting with an environment which is stochastic or (partially) unknown, exhibits little structure (e.g., it is not linear/convex), has a sequential nature (e.g., a sequence of decisions needs to be taken to reach an objective) and/or is adversarial (e.g., an opponent takes its decisions so as to minimize your payoff).
Typical examples of such problems are:
- The design of artificial intelligences able to learn to play computer games,
- The placement of advertisements on webpages to maximize the number of clicks,
- The control of a rocket so as to safely reach a target with minimum fuel cost,
- The synthesis of winning strategies for playing with the stock market,
- The design of artificial intelligences for autonomous robots,
- The design of clinical trials.
The goal of this class is to teach the techniques for taking optimal decisions for such complex problems. These techniques will borrow from results from system theory, probability theory, information theory, supervised learning as well as linear and convex optimisation.
Topics covered:
(a) Optimal control theory for interacting with linear systems whose dynamics are fully known. Extension of the results to robust control.
(b) Multistage stochastic programming for interacting with systems whose dynamics are linear, fully known, but stochastic.
(c) Computation of optimal strategies in discrete and stochastic environments that are perfectly known. Review of dynamic programming and direct policy search techniques.
(d) Learning to interact with discrete and stochastic environments that are unknown at the beginning of the interaction. Review of model-learning and model-free reinforcement learning techniques. Review of techniques for solving the exploration/exploitation trade-off.
(e) Extension of the techniques seen in (c) and (d) to environments having very large and/or continuous action spaces.
(f) Learning in environments that are partially observable.
(g) Tree search techniques in single-player environments.
(h) Tree search techniques in multi-player environments.
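As a small illustration of dynamic programming in a perfectly known stochastic environment, here is a value-iteration sketch on a toy two-state, two-action Markov decision process. All transition probabilities and rewards are invented for illustration:

```python
# Toy MDP: P[a][s][t] = probability of moving from state s to state t
# under action a; R[a][s] = expected immediate reward.
P = {0: [[0.9, 0.1], [0.2, 0.8]],
     1: [[0.1, 0.9], [0.7, 0.3]]}
R = {0: [1.0, 0.0], 1: [0.0, 2.0]}
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup
# V(s) <- max_a [ R(a,s) + gamma * sum_t P(a,s,t) V(t) ].
V = [0.0, 0.0]
for _ in range(500):
    V = [max(R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in (0, 1))
             for a in (0, 1))
         for s in (0, 1)]

# Greedy policy with respect to the converged value function.
policy = [max((0, 1),
              key=lambda a: R[a][s] + gamma *
              sum(P[a][s][t] * V[t] for t in (0, 1)))
          for s in (0, 1)]
```

Because the backup is a gamma-contraction, the iterates converge geometrically to the unique optimal value function, from which the greedy policy is optimal.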
Semantic data (5 ECTS, Th 25h, Pr 10h, Proj 45h)
The course will first cover the conceptual foundations of the representation of semantic knowledge and its use in inference, in order to provide a strong theoretical basis for the remaining content.
Semantic networks and ontologies will be presented, and the historical difficulties of reasoning with semantic networks explained. Description logics will be introduced as a theoretical basis for ontology-based reasoning, with appropriate formal semantics and inference algorithms, and their relationship with first-order logic will be explained.
The course will then show how these concepts are reused by the semantic web initiative, and present the semantic web standards (description framework, ontologies, query language, rule language). The link between description logics and the ontology web language OWL will be further developed.
Finally, the course will illustrate how semantic data are used in modern industrial areas, such as big data, software engineering, and specific business domains (biomedicine, web design and search, document publishing, …).
Law of artificial intelligence, robots and data-driven algorithmic applications (Th 20h)
This course is an extended version of DROI8031 “Law of robots and AI”. DROI8031 discusses the legal questions raised by the regulation of artificial intelligence (AI), a matter of growing importance given current technological developments and the medium-term commercialisation of AI-based services. Among the numerous examples illustrating this trend, the most emblematic is probably the driverless car developed by Google. The development of AI raises profound theoretical questions – the appropriateness of regulation in a context of technological innovation, the level of regulation (international/local), the type of control (self-regulation/binding regulation, etc.) – but also practical ones: rights of AI, AI liability, intellectual property of AI, uses of AI for non-commercial purposes, etc. This new course provides an overview of emerging legal issues related to the emergence of AI and robots.
This course will also address the legal aspects of “Big Data”. The ex-cathedra lectures will be complemented by readings to be prepared by the students.
Large sample analysis: theory and practice (Th 24h, Pr 12h, Proj 40h)
We first provide an overview of the basic material concerning large sample analysis (including a refresher on convergence modes and the main approximation theorems). We then devote a chapter to the theory of fixed-n asymptotics (including Berry–Esseen theorems and Stein’s method) and its applications in sample analysis and goodness-of-fit testing. Particular focus will be devoted to inference and model evaluation for highly complex or intractable probabilistic models, such as those arising from Markov chain Monte Carlo methods, probabilistic graphical models and deep learning models.
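For reference, the Berry–Esseen theorem mentioned above can be stated as follows: for i.i.d. variables $X_1, \dots, X_n$ with mean $\mu$, variance $\sigma^2 > 0$ and finite third absolute moment $\rho = \mathbb{E}|X_1 - \mu|^3$,

```latex
\sup_{x \in \mathbb{R}}
\left|\,
\mathbb{P}\!\left( \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x \right)
- \Phi(x)
\right|
\le \frac{C\,\rho}{\sigma^{3}\sqrt{n}},
```

where $\Phi$ is the standard normal distribution function and $C$ is an absolute constant (known to be smaller than $0.5$). Unlike the central limit theorem, this bound holds for every fixed $n$, which is what makes it a fixed-n asymptotic tool.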