CS 550: Massive Data Mining and Learning (Spring 2019)


Course Information


Instructor: Yongfeng Zhang
Email: yongfeng.zhang@rutgers.edu
Office: CoRE 309

Time: Fridays, 3:20-6:20 pm
Location: SEC 117
Office Hours: Thursdays 3:00-4:00pm or by appointment

TAs: Yunqi Li, Hanxiong Chen
Email: yunqi.li@rutgers.edu, hc691@scarletmail.rutgers.edu
Office: CoRE 329
TA Office Hours: Tuesdays 2:00-3:00pm or by appointment

Textbook: (LRU) Mining of Massive Datasets by J. Leskovec, A. Rajaraman, J. D. Ullman



Announcements

  • 2/15: Homework 1 released on Sakai; due Saturday 3/2 at 11:59pm, one late day allowed.
  • 1/25: Homework 0 released on Sakai; this is a practice homework and does not require submission.
  • 1/10: The first class is on Friday, Jan 25.


    Course Description

    This class introduces the computing infrastructures, algorithms, theories, and practice of massive data analytics and machine learning, as well as their application in frequently used scenarios, including recommender systems, web search engines, social networks, computational advertising, and e-commerce. Students will learn algorithms to store, process, mine, analyze, and synthesize streaming data, or data at rest that does not fit in random access memory. The material covered here equips students with the main backend algorithms and infrastructure for the Capstone Project and for research tasks closely related to data science and analytics.



    Prerequisites

  • CS 512 or CS 513 (Fundamental Algorithms)
  • Linear Algebra, Basic Probability (Moments, Typical Distributions, MLE)
  • Programming Languages: C++/Java
  • Infrastructure: Hadoop Cluster


    Expected Work

  • Homework: 4 homework assignments (10% each, 40% total)
  • Midterm: Friday, Mar 15, 4:45pm - 6:15pm (25%)
  • Final Project: Work in a team of at most 4 students. Choose one topic from a list of provided project tasks, give a 10-minute presentation, and submit the code, data, and a report (35%)

  • The midterm is closed-book, but you may bring one letter-sized page of notes that you prepared yourself.



    Tentative Schedule

    The schedule is subject to change (e.g., due to snow or a campus closure). Please check the course website frequently for the latest schedule.

    Week  Date  Topics and Assignments
    1     1/25  Introduction (Reading: Ch 1, Ch 2.1-2.4)
    2     2/1   Frequent Itemset Mining (Reading: Ch 6)
                Association Rule Mining (Reading: Ch 6)
    3     2/8   Locality-Sensitive Hashing (Reading: Ch 3)
    4     2/15  Clustering, similarity, k-means, BFR (Reading: Ch 7.1-7.4)
    5     2/22  Dimensionality Reduction, SVD, CUR (Reading: Ch 11)
    6     3/1   Content-based Recommendation (Reading: Ch 9.1-9.2)
                Collaborative Filtering, Latent Factor Models (Reading: Ch 9.3-9.4)
    7     3/8   Learning to Rank and Deep Learning for Recommender Systems, Project Description
                Link Analysis, PageRank (Reading: Ch 5.1-5.3, 5.5)
    8     3/15  Web Spam, TrustRank (Reading: Ch 5.4)
                Midterm exam (4:45pm - 6:15pm)
    9     3/22  No class, Spring recess
    10    3/29  Social Networks, Community Detection (Reading: Ch 10.1-10.2, 10.6)
                Spectral Clustering, Trawling (Reading: Ch 10.1-10.2, 10.6)
    11    4/5   Overlapping Communities (Reading: Ch 10.3-10.5, 10.7-10.8)
                Large-scale Machine Learning (Reading: Ch 12)
    12    4/12  Mining Data Streams (Reading: Ch 4)
    13    4/19  Computational Advertising (Reading: Ch 8)
                Learning through Experimentation with Bandit-based Learning
    14    4/26  Neural Networks and Graph Neural Networks
    15    5/3   Project Presentations and Summary of the Class