Administrative info

Contents

State-of-the-art algorithmic techniques and models for massive data sets. Official description

Teachers

Inge Li Gørtz, inge@dtu.dk, office hours Friday 12.15-13.00.

Philip Bille, phbi@dtu.dk, office hours Monday + Thursday 12.30-13.00.

When and where

Monday 8.15-10, Bldg. 308/Aud. 11.

Monday 10-12, Bldg. 308/1. floor south.

The course runs in the DTU spring semester (Feb 3rd to May 13th). There is no teaching during the Easter break (April 14th and April 21st).

Mandatory exercises

Use the template.tex file to prepare your hand-in exercises. Compile using LaTeX. Upload the resulting pdf file (and only this file) via Campusnet. The finished pdf must be at most 2 pages. The exercise from week x must be handed in no later than Sunday of week x at 20.00.

Collaboration policy for mandatory exercises

You may collaborate with fellow students on the hand-in exercises.

Collaboration is limited to discussion of ideas only, and you should write up the solutions entirely on your own.

Do not use or seek out solutions from previous years of the course, solutions from similar courses, or solutions found on the internet.

You should list your collaborators (see the template) as well as cite any references you may have used.

Weekplan

The weekplan is preliminary. It will be updated during the course. Under each week there are a number of suggestions for reading material regarding that week's lecture. It is not the intention that you read ALL of the papers. It is a list of papers and notes where you can read about the subject discussed at the lecture.

Week 1: Introduction and Hashing: Chained, Universal, and Perfect.

Week 2: Predecessor Data Structures: x-fast tries and y-fast tries.

Week 3: Decremental Connectivity in Trees: Cluster decomposition, Word-Level Parallelism.

Week 4: Nearest Common Ancestors: Distributed data structures, Heavy-path decomposition, alphabetic codes.

Week 5: Introduction to approximation algorithms: TSP, k-center, and vertex cover.

David P. Williamson and David Shmoys: The Design of Approximation Algorithms (sections 1.1, 2.2, and 2.4)

Jeff Erickson: Non-Lecture K: Approximation Algorithms

You can also read about some of these problems in Kleinberg and Tardos: “Algorithm Design”, V.V. Vazirani: “Approximation Algorithms”, and Cormen, Leiserson, Rivest, and Stein: “Introduction to Algorithms”.

Exercises

Week 6: Approximation algorithms: Stable matching and TSP.

Week 7: Range Reporting: Range Trees, Fractional Cascading, and kD Trees.

Week 8: Persistent data structures.

Week 9: Union-Find and amortized analysis (potential method). CANCELLED!

Amortized Analysis:

Rebecca Fiebrink’s notes on amortized analysis from Princeton.

Pawel Winter’s notes on amortized analysis from DIKU.

R. E. Tarjan: Amortized Computational Complexity, SIAM J. on Algebraic and Discrete Methods, Volume 6, Issue 2, pp. 306-318 (April 1985)

Union-Find:

Week 10: String Indexing: Dictionaries, Tries, Suffix trees, and Suffix Sorting.

Week 11: External Memory: I/O Algorithms, Cache-Oblivious Algorithms, and Dynamic Programming.

FAQ

How should I write my mandatory exercises?

The ideal writing format for mandatory exercises is classical scientific writing, such as the writing found in the peer-reviewed articles listed as reading material for this course (not textbooks and other pedagogical material). One of the objectives of this course is to practice and learn this kind of writing. A few tips:

Write things directly: Cut to the chase and avoid anything that is not essential. Test your own writing by answering the following question: “Is this the shortest, clearest, and most direct exposition of my ideas/analysis/etc.?”

Add structure: Don’t mix up description and analysis unless you know exactly what you are doing. For a data structure, explain the following things separately: the contents of the data structure, how to build it, how to query/update it, correctness, analysis of space, analysis of query/update time, and analysis of preprocessing time. For an algorithm, explain separately what it does, correctness, analysis of time complexity, and analysis of space complexity.

Be concise: Convoluted explanations, excessively long sentences, fancy wording, etc. have no place in scientific writing. Do not repeat the problem statement.

Try to avoid pseudocode: Generally, aim for a human-readable description of algorithms that can easily and unambiguously be translated into code.

Examples for support: Use figures and examples to illustrate key points of your algorithms and data structures.

How much do the mandatory exercises count in the final grade?

The final grade is an overall evaluation of your mandatory exercises and the oral exam combined. Thus, there is no precise division of these parts in the final grade. However, expect that (in most cases, and under normal circumstances) the mandatory exercises account for a large fraction of the final grade, and the oral exam is a “fine tuning” of your scores in the mandatory exercises.

What do I do if I want to do an MSc/BSc thesis or project in Algorithms?

Great! Algorithms is an excellent topic to work on 🙂 and Algorithms for Massive Data Sets is designed to prepare you to write a strong thesis. Some basic tips and points:

Let us know well in advance: Identifying an interesting problem in algorithms that matches your interest can take time. With enough time to go over the related literature and study up on relevant topics, your project will likely be more successful. It may also be a good idea to do an initial “warm up” project before a large thesis to test ideas or survey an area.

Join the community: It is a very good idea to enter the local algorithms community at DTU and the Copenhagen area to get a feel for what kind of stuff you could work on for your thesis and what thesis work in algorithms is about. Talk to other students doing thesis work in algorithms. Go to algorithms talks and thesis defenses in algorithms.

Collaborate: We strongly encourage you to do your thesis in pairs. We think that having a collaborator to discuss with greatly helps in many aspects of thesis work in algorithms. Our experience confirms this.

No strings attached: Choosing a topic for your thesis is important. You are welcome to discuss master's thesis topics with us without pressure to actually write your thesis in algorithms. We encourage you to carefully select your topic.