by Lukas Mueller

Descriptors:

General Bioinformatics, Common Topics, Common Tools, Programming languages, Scripts, Pipelines, Databases.

Justification:

This introduction gives a global view of the course and of bioinformatics, and prepares the virtual machine for the rest of the course.



Estimated Time:

Lecture: 1h

Exercises: 1h





Block 2: Web Tools (Core)

by Susan Strickler

Descriptors:

Biological Databases and Repositories, Genome Browsers, Sequence Analysis Tools, Phylogenetic Tools.

Justification:

Many bioinformatics tools are available through web interfaces, such as BLAST or GBrowse. The idea is to show the basic use of some of them in order to maximize the information we can get from them. Examples include the correct use of BLAST to search for homologous genes, and how to download sequences or annotations using GBrowse.

Estimated Time:

Lecture: 1h 30m

Exercises: 30m





Block 3: Basic Linux (Core)

by Lukas Mueller

Descriptors:

Command-line interface, Bash, Shortcuts, File and Directory Commands, Manual and Help, Monitoring Resources, Permissions and Ownership, Installing Programs, Piping and Bash Scripting, Environment Variables, Popular Combinations in Bioinformatics.

Justification:

Most bioinformatics tools are programs designed to be run from a console, generally on a Unix system; examples include NGS mapping tools (bwa, bowtie…) and sequence assembly tools (mira, velvet, abyss…). Knowledge of basic Linux commands is essential to run these programs, monitor the machine's resource usage, or preprocess some of the results. Additionally, there are some "basic recipes" for manipulating big files, such as "grep -c '>' fastafile" to count the number of sequences in a FASTA file.
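As an illustration of such a recipe, here is a minimal sketch. The file name example.fasta and its three sequences are made up for the example and are not part of the course material. Note that the quotes around '>' matter: without them the shell would treat > as an output redirection and overwrite the file.

```shell
# Create a small example FASTA file (hypothetical content) in a temporary directory.
tmpdir=$(mktemp -d)
fasta="$tmpdir/example.fasta"
printf '>seq1\nACGTACGT\n>seq2\nGGCCTTAA\nACGT\n>seq3\nTTTT\n' > "$fasta"

# Count the header lines; '^>' anchors the match to the start of the line.
seq_count=$(grep -c '^>' "$fasta")
echo "Sequences: $seq_count"   # prints "Sequences: 3"

# Piping: extract the header lines and strip the leading '>' to list the identifiers.
seq_ids=$(grep '^>' "$fasta" | sed 's/^>//')
echo "$seq_ids"

# Remove the temporary directory.
rm -rf "$tmpdir"
```

Anchoring the pattern with ^ is a small refinement over the plain '>' pattern: it only counts lines that start with >, which is what marks a new sequence in a FASTA file.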

Estimated Time:

Lecture: 2h

Exercises: 1h





Block 4: Basic Databases and SQL (Core)

by Naama Menda

Descriptors:

Database definition; Common database software: GUI (OpenOffice Base), CLI (Postgres); Database structure: Tables, Columns, Rows, Views; SQL Language: Create Table, Insert, Select, Copy; Biological Database Schemas: Chado.

Justification:

Why do we need databases? Small-scale data is easy to manage in a spreadsheet (such as MS Excel), but when your data becomes large and diverse, it is time to consider moving to a database system.

This class includes an overview of data and supporting data structures, and of popular Database Management Systems (DBMS), from basic ones with a graphical interface (Microsoft Access, OpenOffice Base) to more robust ones, generally controlled through the SQL language (MySQL or PostgreSQL).

Estimated Time:

Lecture: 1h

Exercises: 2h





Block 5: Basic R (Core)

by Aureliano Bombarely

Descriptors:

R description, RStudio, Objects, Functions, Simple Statistical Analysis, Simple Graphs.

Justification:

Most scientific data analysis needs statistical support. R is a free software environment for statistical computing and graphics, extensively used for expression analysis and GO term analysis. Do you want to draw a Venn diagram or a pie chart of the functional distribution of your expressed genes? R can probably help you with that.

Estimated Time:

Lecture: 2h

Exercises: 3h





Block 6: Basic Perl (Core)

by Lukas Mueller and Naama Menda

Descriptors:

What is Perl, Variables, Functions, Conditionals, Opening Files, Regular Expressions, Subroutines, Modules, Scripting and Testing, Popular Modules and BioPerl.

Justification:

Perl is an easy-to-learn programming language. Some basic bioinformatics analyses require simple tools and scripts to process and filter the data. Some knowledge of Perl is useful for creating these tools, for example to count the BLAST hits of a set of sequences mapped against a similar genome sequence, or simply to change the format of some files.

Estimated Time:

Lecture: 2h

Exercises: 3h