Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy-to-use software, the course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains.

Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically). This technology has become available only recently, but it can be applied to any type of operational process (in organizations and systems). Example applications include: analyzing treatment processes in hospitals, improving customer service processes in a multinational, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. All of these applications have in common that dynamic behavior needs to be related to process models. Hence, we refer to this as “data science in action”.

The course explains the key analysis techniques in process mining. Participants will learn various process discovery algorithms. These can be used to automatically learn process models from raw event data. Various other process analysis techniques that use event data will be presented. Moreover, the course will provide easy-to-use software, real-life data sets, and practical skills to directly apply the theory in a variety of application domains.

This course starts with an overview of approaches and technologies that use event data to support decision making and business process (re)design. Then the course focuses on process mining as a bridge between data mining and business process modeling. The course is at an introductory level with various practical assignments.

The course covers the three main types of process mining.

1. The first type of process mining is discovery. A discovery technique takes an event log and produces a process model without using any a-priori information. An example is the Alpha-algorithm that takes an event log and produces a process model (a Petri net) explaining the behavior recorded in the log.

2. The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process. Conformance checking can be used to check if reality, as recorded in the log, conforms to the model and vice versa.

3. The third type of process mining is enhancement. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model. An example is the extension of a process model with performance information, e.g., showing bottlenecks.

Process mining techniques can be used not only in an offline setting but also in an online setting. The latter is known as operational support. An example is the detection of non-conformance at the moment the deviation actually takes place. Another example is time prediction for running cases, i.e., given a partially executed case, the remaining processing time is estimated based on historic information of similar cases.
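As a rough illustration of the discovery and conformance ideas described above, the following Python sketch counts the directly-follows relation in a small event log and then flags steps of a new trace that were never observed. This is not the Alpha-algorithm and it does not produce a Petri net; it is only a simplified first step that many discovery techniques build on. The event log, the activity names, and the function names are hypothetical and chosen for illustration.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log:
        # Pair every event with its immediate successor in the same case.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def deviations(trace, relation):
    """Return the steps of a trace that never occur in the given relation."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in relation]


# Hypothetical event log: one list of activity names per case.
log = [
    ["register", "check", "decide", "pay"],
    ["register", "check", "decide", "reject"],
    ["register", "decide", "pay"],
]

dfg = directly_follows(log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")

# A crude conformance check: which steps of a new case were never observed?
print(deviations(["register", "pay"], dfg))
```

A real conformance check compares a log against a process model with explicit semantics (for example by replaying traces on a Petri net); the check on unseen pairs here is only a stand-in under that simplifying assumption.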

Process mining provides not only a bridge between data mining and business process management; it also helps to address the classical divide between “business” and “IT”. Evidence-based business process management based on process mining helps to create a common ground for business process improvement and information systems development.

The course uses many examples based on real-life event logs to illustrate the concepts and algorithms. After taking this course, you will be able to run process mining projects and will have a good understanding of the Business Process Intelligence field.

After taking this course you should:

- have a good understanding of Business Process Intelligence techniques (in particular process mining),

- understand the role of Big Data in today’s society,

- be able to relate process mining techniques to other analysis techniques such as simulation, business intelligence, data mining, machine learning, and verification,

- be able to apply basic process discovery techniques to learn a process model from an event log (both manually and using tools),

- be able to apply basic conformance checking techniques to compare event logs and process models (both manually and using tools),

- be able to extend a process model with information extracted from the event log (e.g., show bottlenecks),

- have a good understanding of the data needed to start a process mining project,

- be able to characterize the questions that can be answered based on such event data,

- be able to explain how process mining can also be used for operational support (prediction and recommendation), and

- be able to conduct process mining projects in a structured manner.

Data science courses contain math, and there is no avoiding that! This course is designed to teach learners the basic math they will need in order to be successful in almost any data science math course. It was created for learners who have basic math skills but may not have taken algebra or pre-calculus. Data Science Math Skills introduces the core math that data science is built upon, with no extra complexity, introducing unfamiliar ideas and math symbols one at a time.

Topics include:

- Set theory, including Venn diagrams

- Properties of the real number line

- Interval notation and algebra with inequalities

- Uses for summation and Sigma notation

- Math on the Cartesian (x,y) plane, slope and distance formulas

- Graphing and describing functions and their inverses on the x-y plane

- The concept of instantaneous rate of change and tangent lines to a curve

- Exponents, logarithms, and the natural log function

- Probability theory, including Bayes’ theorem

While this course is intended as a general introduction to the math skills needed for data science, it can be considered a prerequisite for learners interested in the course, “Mastering Data Analysis in Excel,” which is part of the Excel to MySQL Data Science Specialization. Learners who master Data Science Math Skills will be fully prepared for success with the more advanced math concepts introduced in “Mastering Data Analysis in Excel.”

This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience, you will have the opportunity to walk through hands-on examples with the Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments, you will be guided through how data scientists apply important concepts and techniques, such as Map-Reduce, to solve fundamental problems in big data. You’ll feel empowered to have conversations about big data and the data analysis process.
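To give a feel for the Map-Reduce idea mentioned above, here is a minimal single-machine sketch in plain Python that counts words in three explicit phases: map, shuffle, and reduce. It does not use Hadoop or Spark and is not taken from the course assignments; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster, the map and reduce calls run in parallel on different machines and the framework handles the shuffle; the sketch only shows how the work is divided.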

Learn to analyze big data using Apache Spark’s distributed computing framework.

In a series of focused, practical tasks, you will start by launching a Spark cluster on Amazon’s EC2 cloud computing platform. As you progress to working with real data, you will gain exposure to a variety of useful tools, including RDFlib and SPARQL.

The practical tasks in this course make use of Project Gutenberg data, the world’s largest open collection of ebooks. This offers no end of opportunity for highly engaging and novel analyses.

As the taught material and example code are given in Python, it is strongly recommended that all students have previous Python programming experience. Furthermore, launching and interacting with a cluster on EC2 requires basic knowledge of the Unix command line, and some experience with a command-line editor such as vim or nano would also be advantageous.

With these minimal prerequisites, this course is designed to get you up and running in Spark as quickly and painlessly as possible, so that by the end, you will be comfortable and competent enough to start engineering your own big data solutions.

This course introduces the Bayesian approach to statistics, starting with the concept of probability and moving to the analysis of data. We will learn about the philosophy of the Bayesian approach as well as how to implement it for common types of data. We will compare the Bayesian approach to the more commonly taught Frequentist approach, and see some of the benefits of the Bayesian approach. In particular, the Bayesian approach allows for better accounting of uncertainty, results that have more intuitive and interpretable meaning, and more explicit statements of assumptions. This course combines lecture videos, computer demonstrations, readings, exercises, and discussion boards to create an active learning experience. For computing, you have the choice of using Microsoft Excel or the open-source, freely available statistical package R, with equivalent content for both options. The lectures provide some of the basic mathematical development as well as explanations of philosophy and interpretation. Completion of this course will give you an understanding of the concepts of the Bayesian approach, an understanding of the key differences between the Bayesian and Frequentist approaches, and the ability to do basic data analyses.
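As a small illustration of the kind of basic Bayesian analysis described above, the sketch below updates a Beta prior on a success probability with binomial data and compares the posterior mean with the frequentist sample proportion. The course itself offers Excel or R; Python is used here only for illustration, and the prior, the data, and the function names are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior with binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical setup: a uniform Beta(1, 1) prior, then 7 successes in 10 trials.
prior = (1.0, 1.0)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
print("sample proportion:", 7 / 10)
```

The posterior mean sits between the prior mean (0.5) and the sample proportion (0.7), which is one way the prior assumption is made explicit in the result.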

This course introduces methods to perform systematic reviews and meta-analyses of clinical trials. It covers how to formulate an answerable research question, define inclusion and exclusion criteria, search for the evidence, extract data, assess the risk of bias in clinical trials, and perform a meta-analysis.

Upon successfully completing this course, you will be able to:

- Describe the steps in conducting a systematic review

- Develop an answerable question using the “Participants, Interventions, Comparisons, Outcomes” (PICO) framework

- Describe the process used to collect and extract data from reports of clinical trials

- Describe methods to critically assess the risk of bias of clinical trials

- Describe and interpret the results of meta-analyses
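To make the final step, performing a meta-analysis, more concrete, the sketch below pools effect estimates with a fixed-effect, inverse-variance model, one common approach among several. The numbers are invented for illustration and do not come from any real trials; the function names are hypothetical.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted average of study effect estimates."""
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


def confidence_interval(pooled, pooled_se, z=1.96):
    """Approximate 95% interval under a normal assumption."""
    return pooled - z * pooled_se, pooled + z * pooled_se


# Invented effect estimates (e.g., log odds ratios) and standard errors.
estimates = [0.2, 0.4]
std_errors = [0.1, 0.2]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
low, high = confidence_interval(pooled, pooled_se)
print(f"pooled effect: {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The more precise study (smaller standard error) dominates the pooled estimate. A fixed-effect model assumes all studies estimate the same underlying effect; when that is doubtful, a random-effects model is usually considered instead.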

Welcome to the Advanced Linear Models for Data Science Class 1: Least Squares. This class is an introduction to least squares from a linear algebraic and mathematical perspective. Before beginning the class make sure that you have the following:

- A basic understanding of linear algebra and multivariate calculus.

- A basic understanding of statistics and regression models.

- At least a little familiarity with proof-based mathematics.

- Basic knowledge of the R programming language.

After taking this course, students will have a firm foundation in a linear algebraic treatment of regression modeling. This will greatly augment applied data scientists’ general understanding of regression models.
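As a minimal illustration of least squares, the sketch below fits a line with one predictor and an intercept using the closed-form solution of the normal equations. The class itself uses R and a general matrix treatment; plain Python and the single-predictor case are used here only to keep the example self-contained, and the data and function name are hypothetical.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals.

    For one predictor plus an intercept, solving the normal equations
    gives slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
intercept, slope = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```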

Welcome to the Advanced Linear Models for Data Science Class 2: Statistical Linear Models. This class is an introduction to least squares from a linear algebraic and mathematical perspective. Before beginning the class make sure that you have the following:

- A basic understanding of linear algebra and multivariate calculus.

- A basic understanding of statistics and regression models.

- At least a little familiarity with proof-based mathematics.

- Basic knowledge of the R programming language.

After taking this course, students will have a firm foundation in a linear algebraic treatment of regression modeling. This will greatly augment applied data scientists’ general understanding of regression models.

This course provides a unique opportunity for you to learn key components of text mining and analytics, aided by real-world datasets and a text mining toolkit written in Java. Hands-on experience in core text mining techniques, including text preprocessing, sentiment analysis, and topic modeling, helps learners become competent data scientists.

By combining lecture notes with lab sessions based on the y-TextMiner toolkit developed for the class, learners will be able to develop interesting text mining applications.

What are the ethical considerations regarding the privacy and control of consumer information and big data, especially in the aftermath of recent large-scale data breaches?

This course provides a framework to analyze these concerns as you examine the ethical and privacy implications of collecting and managing big data. Explore the broader impact of the data science field on modern society and the principles of fairness, accountability and transparency as you gain a deeper understanding of the importance of a shared set of ethical values. You will examine the need for voluntary disclosure when leveraging metadata to inform basic algorithms and/or complex artificial intelligence systems while also learning best practices for responsible data management, understanding the significance of the Fair Information Practices Principles Act and the laws concerning the “right to be forgotten.”

This course will help you answer questions such as who owns data, how we value privacy, how to obtain informed consent, and what it means to be fair.

Data scientists and anyone beginning to use or expand their use of data will benefit from this course. No particular prior knowledge is needed.