Open Textbook:

From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science

Professor Norm Matloff, University of California, Davis

OVERVIEW:

The materials here form a textbook for a course in mathematical probability and statistics for computer science students. (It would work fine for general students too.)

"Why is this text different from all other texts?"

Computer science examples are used throughout, in areas such as computer networks, data and text mining, computer security, remote sensing, computer performance evaluation, software engineering, and data management.

The R statistical/data manipulation language is used throughout. Since this is a computer science audience, a greater sophistication in programming can be assumed. It is recommended that my R tutorials be used as a supplement:

Chapter 1 of my book on R software development, The Art of R Programming, NSP, 2011.

Part of a VERY rough and partial draft of that book. It is only about 50% complete, has various errors, and presents a number of topics differently from the final version, but should be useful in R work for this class.

Throughout the units, mathematical theory and applications are interwoven, with a strong emphasis on modeling: What do probabilistic models really mean, in real-life terms? How does one choose a model? How do we assess the practical usefulness of models? For instance, the chapter on continuous random variables begins by explaining that such distributions do not actually exist in the real world, due to the discreteness of our measuring instruments. The continuous model is therefore just that, a model, and indeed a very useful one. There is actually an entire chapter on modeling, discussing the tradeoff between accuracy and simplicity of models.
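As a rough illustration of the point about discreteness, here is a minimal sketch. It is written in Python rather than the R used in the book, and it is not taken from the book. The function names, the normal model with mean 70 and standard deviation 3, and the instrument resolution of one decimal place are all assumptions chosen for the example. It simulates measurements that an instrument rounds to a fixed resolution, then compares the fraction of readings in an interval with the probability given by the continuous model.

```python
import random
from statistics import NormalDist

# Hypothetical model parameters, chosen only for illustration.
MEAN, SD = 70.0, 3.0

def simulate_measurements(n, decimals=1, seed=1):
    """Draw n values from the normal model, then round each one to the
    instrument's resolution, as a real measuring device would."""
    rng = random.Random(seed)
    return [round(rng.gauss(MEAN, SD), decimals) for _ in range(n)]

def empirical_prob(data, lo, hi):
    """Fraction of recorded measurements falling in [lo, hi]."""
    return sum(lo <= x <= hi for x in data) / len(data)

def model_prob(lo, hi):
    """Probability of (lo, hi) under the continuous normal model."""
    dist = NormalDist(MEAN, SD)
    return dist.cdf(hi) - dist.cdf(lo)

data = simulate_measurements(100_000)

# The recorded data are discrete: far fewer distinct values than readings.
distinct = len(set(data))

# Interval endpoints sit halfway between instrument ticks, so the rounded
# readings 67.0 through 73.0 correspond to this range of underlying values.
emp = empirical_prob(data, 66.95, 73.05)
mod = model_prob(66.95, 73.05)

print("distinct recorded values:", distinct)
print("empirical fraction:      ", emp)
print("continuous-model value:  ", mod)
```

The recorded data take only a limited set of distinct values, so no continuous distribution describes them exactly. Even so, the continuous model's probability stays close to the observed fraction, which is the sense in which it is a useful model.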

There is considerable discussion of the intuition behind probabilistic concepts, and the concepts themselves are defined through intuition. However, all models and so on are described precisely in terms of random variables and distributions.

For topical coverage, see the book's detailed table of contents.

The materials are continuously evolving, with new examples and topics being added.

Prerequisites: The student must know calculus and basic matrix algebra, and have some minimal skill in programming.

LICENSING:



This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 United States License. Copyright is retained by N. Matloff in all non-U.S. jurisdictions, but permission to use these materials in teaching is still granted, provided the authorship and licensing information here is displayed. I would appreciate being notified if you use this book for teaching, just so that I know the materials are being put to use, but this is not required.

The book is freely available, subject to the conditions above, and can be downloaded from http://heather.cs.ucdavis.edu/~matloff/132/PLN/probstatbook/ProbStatBook.pdf.

AUTHOR'S BIO:

Norm Matloff is a Professor of Computer Science at the University of California, Davis. He was formerly a statistics professor at that university, and thus approaches the subject matter here as both a statistician and a computer scientist. His research has covered a number of diverse areas in the two fields, and he has been a recipient of the university's Distinguished Teaching Award. He was born and raised in Los Angeles, and earned his PhD in theoretical mathematics (probability theory/functional analysis) at UCLA.