Database Systems

Course instructor: Yanif Ahmad.
Course staff: P.C. Shyamshankar (TA)
Class schedule: MW 12-1:15pm, Shaffer 100.

Office hours:
Yanif: W 1:30-3pm.
Shyam: Tu 3-4:30pm. (Malone 239)

Course Description

This course is an introduction to the architecture and design of modern database management systems. Database management systems (DBMS) are widely used to store, manage and query diverse datasets, and have become an invaluable tool in today's enterprises and large web companies, with applications in transaction processing, business intelligence and analytics. Topics include query processing algorithms and data structures, data organization and storage, query optimization and cost modeling, transaction management and concurrency control, high-availability mechanisms, parallel and distributed databases, and a survey of modern architectures including NoSQL, column-oriented and streaming databases. In addition to technical material, we will devote a portion of weekly lectures to the use of database technology in today's enterprises, including document indexing at Google, parallel data warehousing with systems such as Hadoop and Hive, and transactional web applications. Coursework includes programming assignments and experimentation in a simple database framework written in Python.

Which JHU Database course do I take? Database Systems (CS 600.316/416) is complementary to Databases (CS 600.315/415), offered in the Fall semester, and the two make a natural course sequence. 315/415 focuses on using databases in applications, for example to support web sites, with topics including database schema design and models, and database programming languages. 316/416 focuses on database internals and the implementation of a database runtime to realize declarative queries, including storage and data organization, indexing, query processing and optimization, and transactions. 316/416 does not have 315/415 as a prerequisite, but we recommend that you have some prior exposure to SQL (for example, by reading through the SQL tutorials below).

Course area: Systems.
Prerequisites: CS 600.120 (Intermediate Programming) and CS 600.226 (Data Structures) or equivalent.
Class discussions: See our Piazza page.

The following links may be useful if you're looking for a refresher on SQL, or want to play with DBMS software.
SQL tutorial: PDF, CS 315@JHU, SQLZoo, W3Schools
Open-source DBMS: SQLite, Postgres, MySQL
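
For a quick hands-on refresher, SQLite can be driven directly from Python through the standard-library sqlite3 module. The snippet below is only a minimal sketch: the student and enrolled tables, their columns, and the sample rows are made up for illustration and are not part of any course handout.

```python
import sqlite3


def run_warmup():
    """Build a tiny in-memory database and run one join + aggregate query."""
    # ":memory:" keeps everything in RAM; nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical schema, used only for this example.
    cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)")
    cur.execute("CREATE TABLE enrolled (student_id INTEGER, course TEXT)")

    # Parameterized inserts: the ? placeholders are filled from each tuple.
    cur.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "Ada", 3), (2, "Edgar", 4), (3, "Grace", 3)],
    )
    cur.executemany(
        "INSERT INTO enrolled VALUES (?, ?)",
        [(1, "600.316"), (2, "600.416"), (3, "600.316")],
    )

    # Count the students enrolled in each course.
    cur.execute(
        """
        SELECT e.course, COUNT(*)
        FROM student s JOIN enrolled e ON s.id = e.student_id
        GROUP BY e.course
        ORDER BY e.course
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for course, count in run_warmup():
        print(course, count)
```

The query is declarative: it states which rows to combine and count, and the DBMS decides how to execute it.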


Academic Conduct

All activities related to this course are subject to JHU's academic ethics and student conduct policies. Students are also expected to adhere to the Computer Science Academic Integrity Code.

Grading Scheme

60% Assignments
15% Midterm
15% Final
10% Class participation

Assignments will be made available on Wednesdays at 9am, and are due on Wednesdays at 11:59pm in their hand-in weeks. We encourage you to work in pairs to develop and debug your assignment solution. Assignments will be graded on a 100-point scale for CS316 and a 125-point scale for CS416. We will return your grades to you within two weeks of hand-in.

Late policy: You have a total of 10 late days to use throughout the semester. Once you have used up your late days, any assignment handed in late will automatically lose 20% of its grade. We strongly suggest you start programming assignments well in advance of their hand-in date to give yourself time to fix software bugs and to ask for clarifications as needed.
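
As a rough illustration of how the weights and the late policy combine, here is a small Python sketch. It is not the official grading procedure and rests on several assumptions that the policy above does not spell out: each score is a fraction of its maximum (out of 100 points for CS316 or 125 for CS416), assignments are weighted equally, late days are consumed in hand-in order, and a hand-in that needs more late days than remain takes the 20% deduction without consuming the remaining days.

```python
# Weights from the grading scheme above.
WEIGHTS = {"assignments": 0.60, "midterm": 0.15, "final": 0.15, "participation": 0.10}
LATE_DAY_BUDGET = 10   # total late days per semester
LATE_PENALTY = 0.20    # deduction once late days are used up


def apply_late_policy(submissions, budget=LATE_DAY_BUDGET):
    """Return adjusted score fractions for (score_fraction, days_late) pairs.

    Assumption: submissions are processed in hand-in order, and a hand-in
    needing more late days than remain is penalized without consuming them.
    """
    remaining = budget
    adjusted = []
    for score, days_late in submissions:
        if days_late <= remaining:
            remaining -= days_late          # covered by the late-day budget
            adjusted.append(score)
        else:
            adjusted.append(score * (1 - LATE_PENALTY))
    return adjusted


def course_grade(assignments, midterm, final, participation):
    """Weighted course grade as a percentage; all inputs are fractions in [0, 1].

    Assumption: every assignment counts equally toward the 60% component.
    """
    assignment_avg = sum(assignments) / len(assignments)
    total = (WEIGHTS["assignments"] * assignment_avg
             + WEIGHTS["midterm"] * midterm
             + WEIGHTS["final"] * final
             + WEIGHTS["participation"] * participation)
    return 100 * total


if __name__ == "__main__":
    # Three hand-ins that are 4, 6 and 1 days late; the third exceeds the budget.
    scores = apply_late_policy([(0.9, 4), (0.8, 6), (1.0, 1)])
    print(scores)
    print(course_grade(scores, midterm=0.85, final=0.9, participation=1.0))
```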

Syllabus

Week | Topic | Suggested reading
1: January 30 | Welcome, SQL+DBMS intro. | SQL tutorials above.
 | Assignment 0: query and Python warmup. |
2: February 6 | Storage and data organization. | Cow 9.3-9.7, Boat 10.5-10.8
 | Assignment 1: storage manager. Hand-in: week 4. |
3: February 13 | Indexing and access methods. | Cow 10, Boat 11-11.4
4: February 20 | Query processing. | Cow 12-14, Boat 12
 | Assignment 2: operator algorithms. Hand-in: week 6. |
5: February 27 | Query optimization. | Cow 15, Boat 13-13.4
6: March 6 | Physical database design. | Cow 20, Boat 13.5-13.6,24.1-24.1
7: March 13 | Transaction processing. | Cow 16-17, Boat 14,15
 | Midterm exam. | 4-day take home.
8: March 20 | No class, spring break! | The Manga Guide to Databases, while here
9: March 27 | Logging and recovery. | Cow 17-18, Boat 15,16
 | Assignment 3: query optimization. Hand-in: week 11. |
10: April 3 | Data analytics. | Cow 25, Boat 20
11: April 10 | Parallel and distributed QP. | Cow 22, Boat 18
 | Assignment 4: distributed database. Hand-in: week 13. |
12: April 17 | Modern architectures: NoSQL. | Cassandra/HBase, MongoDB.
13: April 24 | Modern architectures: MPP and streams. | Hadoop, Hive, StreamBase, Storm.
14: May 1 | Modern architectures: graph and prob. DBs. | Pregel/GoldenOrb/Giraph, MayBMS, MCDB.
 | Final exam. | Date to be announced (mid-May)

Organization

Database Systems is a 13-week course that meets twice a week and is subdivided into system design, algorithms and architecture topics, covered in one-week units as listed in the syllabus. Classes consist of lectures, discussions and reading, with a series of programming assignments comprising the bulk of a student's grade. Further details on the assignments can be found below. In addition, there will be two short quizzes making up an in-class midterm and final. The primary source of course material will be lecture slides made available on the course's Blackboard page. The textbooks below are not required, but may be useful as reference material for in-depth study of topics.

Textbooks (recommended, not required):
Database Management Systems. (3rd edition)
Raghu Ramakrishnan and Johannes Gehrke.
http://pages.cs.wisc.edu/~dbbook/
(the "Cow" book)

Database System Concepts. (6th edition)
Avi Silberschatz, Henry F. Korth, S. Sudarshan.
http://codex.cs.yale.edu/avi/db-book/
(the "Boat" book)

Assignments:
There will be four two-week programming assignments in this course on the following topics:
  • Database storage, implementing a heap file and buffer pool.
  • Query processing, implementing query operators.
  • Query optimization, implementing the System-R optimizer.
  • Scalable analytics, implementing shuffling and querying in a Map-Reduce style parallel dataflow.
We will use Python as our programming language in this course. Students may use any development environment of their choice, although we recommend Sublime Text or Eclipse. Each assignment will have details on how to hand in your solution. We will ask you to perform a codewalk if we have difficulty getting your assignment to run. Assignment grades will be reported back via email.

Collaboration:
You may discuss Python-related issues, potential "bugs" in your code, and clarifications on the assignment handouts with other students. We will use Piazza rather than Blackboard's discussion forums for this purpose. Your activity on Piazza, in-class questions and office hour interactions will determine the class participation component of your grade. All code that you turn in must be your own. Please adhere to the Computer Science Academic Integrity Code and the University's Ethics Code.

Office hours:
We encourage you to use office hours to ask further questions on course material and any detailed assignment questions specific to your solution that should not be shared with other students. You can also email Yanif for a 1-on-1 if you're unable to meet any of the course staff during office hours due to scheduling conflicts.

Course material

We will be using Piazza for lecture notes, assignment handouts, and class discussion. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. The course staff will check this page regularly and we encourage you to do the same to engage with your classmates, the TA, and me. Here is our Piazza page.