UNIVERSITY OF WINDSOR
Selected Topics on Web Data Extraction Techniques for Recommendation Systems
03-60-592-1 Winter 2017
Classes: Mon: 2:30 - 5:20pm (ER 1114)
Instructor: Dr. C.I. Ezeife
Office: LT 5105
Phone: 253-3000 ext. 3012
e-mail: cezeife@uwindsor.ca
Office hours: Monday: 5:30pm - 6:50pm (by appointment too)
RECOMMENDED MATERIALS
C.I. Ezeife, Course Notes for 60-592, Selected Topics on Web Data Extraction
Techniques and Recommendation Systems, University of Windsor, Winter 2017.
Recommended Text:
1. Charu C. Aggarwal. Recommender Systems: The Textbook, Springer,
ISBN 978-3-319-29657-9. Available through the bookstore or by online order
from Springer.
Reference Materials:
1. Jiawei Han, Micheline Kamber and Jian Pei. Data Mining - Concepts and
Techniques, Third Edition, Morgan Kaufmann/Elsevier, 2011,
ISBN 978-0-12-381479-1. **** Most comprehensive and useful to read for
mining algorithms.
2. Ian Witten, Eibe Frank, Mark Hall, and Christopher Pal. Data Mining:
Practical Machine Learning Tools and Techniques, 4th Edition, Morgan Kaufmann,
ISBN 978-0-12-804291-5. **** Good for reviewing data mining tools like WEKA.
3. Ryan Mitchell. Web Scraping with Python: Collecting Data from the Modern
Web, O’Reilly, 2017, ISBN 978-1-4919-1029-0.
4. Bing Liu. Web Data Mining - Exploring Hyperlinks, Contents and Usage Data,
Springer-Verlag, 2007, ISBN 978-3-540-37881-5. ** Good for Web Mining.
5. Ezeife, C.I. and Titas Mutsuddy, “Towards Comparative Mining of Web
Document Objects with NFA: WebOMiner System”, International Journal of Data
Warehousing and Mining (IJDWM), 8(4), pp. 1-21, October-December 2012. ****
Our research and application in data extraction.
6. Ezeife, C.I. and Bindu Peravali, “Comparative Mining of B2C Web Sites by
Discovering Web Database Schemas”, in the proceedings of the 20th ACM
International Database Engineering & Applications Symposium (IDEAS16),
pp. 183-192, Montreal, QC, Canada, 11-13 July, 2016. *** Our WebOMiner_S data
extractor.
LEARNING OUTCOMES
Students who successfully complete
this course will be able to:
§ Understand data mining methods, in particular association rule mining, web
sequential mining, classification and clustering techniques.
§ Understand web data extraction, since web data is a data source for
recommendation systems.
§ Understand web recommendation systems, their challenges, and application
domains such as retail, music, content and web search.
§ Understand recommendation systems
data (e.g., (i) collaborative filtering data from user-item interactions such
as ratings or buying behavior, and (ii) extracted web content data on users and
items).
§ Understand collaborative filtering
models, approaches, challenges, and outcomes.
§ Understand recent research results in Web data extraction and recommendation
systems published in reputable ACM/IEEE venues on Web systems and
recommendation systems.
§ Develop and implement web data
extraction and recommendation systems using collaborative filtering approaches.
NOTE: By successfully completing this course, students will have progressed
towards the training needed to embark on independent original research in
databases, mining, recommendation systems, collaborative filtering or related
areas.
COURSE CONTENT
Data mining provides the tools for transforming massive data into valuable
information that an organization can quickly exploit to gain a competitive
advantage. The Web, as a medium for online E-Commerce and other transactions,
is a driving force for the development of recommender systems technology,
since it allows users to provide feedback about their likes and dislikes (or
to rate items). Even the browsing of product items can be collected as data
indicating an endorsement of the item. Thus, recommendation systems use these
various sources of data about users (customers) and items (products) to infer
customer interests. Recommendation analysis can be done by applying data
mining and learning algorithms to these data. One recommendation system
approach is collaborative filtering which, in simple terms, uses ratings from
multiple users in a collaborative way to predict missing ratings so that
product or service recommendations can be made to those users.
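As a rough illustration of the collaborative filtering idea described above, the following minimal Python sketch predicts a missing rating as a similarity-weighted average of other users' ratings. The toy ratings, the user and item names, and the choice of cosine similarity are assumptions made for illustration only; they are not taken from the course notes or the textbook.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating on a 1-5 scale}); not course data.
ratings = {
    "alice": {"book": 5, "film": 3, "song": 4},
    "bob":   {"book": 4, "film": 2, "song": 5, "game": 4},
    "carol": {"book": 1, "film": 5, "game": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(user, item):
    """Similarity-weighted average of the other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None

# Alice has not rated "game"; estimate it from Bob's and Carol's ratings.
predicted = predict_rating("alice", "game")
```

Because Alice's ratings resemble Bob's more than Carol's, the estimate is pulled towards Bob's rating of the item. Practical systems usually also normalize for each user's rating bias and restrict the average to the most similar neighbours.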
The objective of this course is to (i) learn the basic data mining techniques
of association rule mining, classification and clustering for analyzing
extracted data, (ii) discuss web data extraction approaches for recommendation,
and (iii) study recommendation systems and collaborative filtering, along with
their applications and challenges.
Thus, topics discussed include:
· Data mining (mining techniques of association rule mining such as the
Apriori algorithm, classification such as the Decision Tree algorithm, and
clustering such as the K-Means algorithm). The reference book by Ian Witten
et al. will be used here.
· Web data extraction techniques using Python and as surveyed in our two
recent papers on web data extraction with our WebOMiner system.
· Web recommendation systems as discussed in the book by Charu C. Aggarwal.
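To give a flavour of web data extraction with Python, here is a minimal sketch that pulls (name, price) records out of a small product listing using only the standard library's html.parser module. The sample markup, the class names "name" and "price", and the ProductParser class are hypothetical and chosen for illustration; this is not the WebOMiner approach from the papers listed above.

```python
from html.parser import HTMLParser

# Hypothetical product-listing markup; the class names are assumptions.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Laptop</span><span class="price">899.99</span></li>
  <li class="product"><span class="name">Phone</span><span class="price">499.00</span></li>
</ul>
"""

class ProductParser(HTMLParser):
    """Collects (name, price) tuples from <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None   # field currently being read, if any
        self._current = {}   # fields gathered for the current <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Text outside the two tracked spans (e.g. whitespace) is ignored.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            name = self._current.get("name")
            price = float(self._current.get("price", "nan"))
            self.records.append((name, price))
            self._current = {}

parser = ProductParser()
parser.feed(SAMPLE_HTML)
products = parser.records
```

The extracted tuples could then be stored as item data for a recommendation system. Real pages are rarely this regular, which is why dedicated extraction techniques are studied in the course.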
Students are urged to attend all given formal lectures/seminars, with the
tentative schedule as follows:
60-592 TENTATIVE SCHEDULE (Winter 2017)
Week (of) | Activity
1 (Jan 5) | Course Outlines and Data Mining Techniques
2 (Jan 9) | Data Mining Techniques
3 (Jan 16) | Data Mining Techniques and Web Data Extraction
4 (Jan 23) | Web Data Extraction
5 (Jan 30) | Web Recommendation Systems
6 (Feb 6) | Web Recommendation Systems
7 (Feb 13) | Web Recommendation Systems
8 (Feb 20) | Reading week (No classes)
9 (Feb 27) | Midterm Test
10 (Mar 6) | Student research seminar (week 1)
11 (Mar 13) | Student research seminar (week 2)
12 (Mar 20) | Student research seminar (week 3)
13 (Mar 27) | Project presentation (week 1)
14 (Apr 3) | Project presentation (week 2)
Apr 10 | Project demonstrations pre-scheduled with prior report submission
*All schedules presented in this document are only
tentative and subject to possible revisions in the course of the term. Any changes will be announced in class or
will be posted on the course website.
COURSE EVALUATION
Work | Mark (out of 100%)
Midterm exam (Feb. 27, 2017) | 25% (covers all lecture materials)
Student seminar (Mar. 6 to Mar. 20) | 15% (graded 25% by students in the class and 75% by me)
Seminar attendance and contributions | 10%
2 Seminar reports (due by Mar. 20) | 20%
Project presentation (Mar. 20 to Mar. 27) | 10% (graded 25% by students in the class and 75% by me)
Project content and report (due Apr. 3) | 20% (includes project demo scheduled for Apr. 10)
CONVERSION OF MARKS (new % marking scheme used for Fall 2016)
Only raw % scores are assigned in course work, and the meaning of scores in
transcripts is:
% Score | Grade | % Score | Grade
90-100 | A+ | 63-66.99 | C
85-89.99 | A | 60-62.99 | C-
80-84.99 | A- | 57-59.99 | D+
77-79.99 | B+ | 53-56.99 | D
73-76.99 | B | 50-52.99 | D-
70-72.99 | B- | 0-49.99 | F
67-69.99 | C+ | |
Comments: In computing a student's average, grades from 0% to 22% are
calculated as 22%. Grades from 23% to 40% are calculated as 40%. Grades from
40% to 49% are calculated as is into the student’s average. All grades are
recorded in the transcript as is. All grades below 50% are considered failures
(see the mark/grades descriptor page of the calendar, www.uwindsor.ca/calendar,
for details).
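The conversion above can be expressed as a short Python sketch. The function names letter_grade and score_for_average are hypothetical, and the treatment of fractional scores between 22% and 23% (counted here as 22%) is an assumption, since the comments do not cover that case; the calendar remains the authoritative source.

```python
# Lower bounds and letter grades, transcribed from the conversion table above.
GRADE_BANDS = [
    (90, "A+"), (85, "A"), (80, "A-"),
    (77, "B+"), (73, "B"), (70, "B-"),
    (67, "C+"), (63, "C"), (60, "C-"),
    (57, "D+"), (53, "D"), (50, "D-"),
]

def letter_grade(score):
    """Map a raw % score (0-100) to its letter grade; below 50 is an F."""
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter
    return "F"

def score_for_average(score):
    """Value that enters a student's average, following the comments above.

    Assumption: scores strictly between 22 and 23 are treated like 22.
    """
    if score < 23:
        return 22
    if score <= 40:
        return 40
    return score

# Example: a raw score of 35% is an F and counts as 40% in the average.
example = (letter_grade(35), score_for_average(35))
```

The transcript still records the raw score as is; only the averaging step applies the floor values.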
IMPORTANT DATES (as in University calendar www.uwindsor.ca/calendar)
Thurs., Jan. 5, 2017 ………… Classes begin.
Wed., Jan. 18, 2017 ……….. Final day for registration revisions.
Sat., Feb. 18, 2017 - Sun., Feb. 26, 2017 …. Study Week (No classes).
Mon., Feb. 20, 2017 ………… Family Day (No classes).
Wed., Mar. 15, 2017 ………. Last day for voluntary withdrawal from courses.
Last day to receive partial refund for withdrawal from courses.
Wed., Apr. 5, 2017 ……….. Last day of classes.
Sat., Apr. 8, 2017 ……… Winter term final examinations begin.
Fri., Apr. 14, 2017 ……….. Good Friday (No classes).
Wed., Apr. 21, 2017 ……….. Winter term final examinations end.
ASSIGNMENTS AND COURSE WORK
1. Completed reports must be handed in five minutes before the beginning of
class on the day on which they are due. Late reports will not normally be
accepted.
2. All reports must be neatly stapled together or cerlox bound. Each report
should include a title page clearly marked on the outside with the student’s
name, student number, course and instructor’s name.
3. No make-up tests will be
given for missed tests.
4. All parts of the course must
be done to obtain a final grade in the course.
5. The following confidentiality agreement and statement
of honesty will need to be signed by students for all handed-in course work to
discourage and prevent academic dishonesty and cheating. Note that if two
assignments are found to be a copy of each other, a mark of 0 will be assigned
to both assignments.
CONFIDENTIALITY AGREEMENT & STATEMENT OF HONESTY
I confirm that I will keep the content of this assignment/examination
confidential.
I confirm that I have not received any unauthorized assistance in preparing for
or doing this assignment/examination. I confirm knowing that a mark of 0 may be
assigned for copied work.
________________________________________ ________________________________________
Student Signature Student
Name (please print)
________________________________________ ________________________________________
Student I.D. Number Date
____________________________________________________________________________________________________________________
PENALTIES AND DISCIPLINARY ACTION FOR DEFICIENT TERM WORK
1. Seminar attendance is
compulsory. Students are expected to read the papers being presented to be able
to make meaningful contributions in the seminars. Failing to do this leads to loss of some
marks.
2. While collaboration with
course mates is encouraged for discussing class topics, students are expected
to develop individual research abilities in the area and hand in projects and
reports prepared individually by themselves.
In other words, cheating is not allowed in this course.
The professors and
teaching assistants will report any suspicion of cheating to the Director of
the
1) Copying assignments,
2) Allowing another student to copy an assignment from you and present it as
their own work,
3) Copying from another student during a test or exam,
4) Referring to notes, textbooks, etc. during a test or exam,
5) Talking during a test or an exam,
6) Not sitting at the pre-assigned seat during a test or exam,
7) Communicating with another student in any way during a test or exam,
8) Having access to the exam/test paper prior to the exam/test,
9) Asking a teaching assistant for the answer to a question during an
exam/test,
10) Presenting another’s work as your own,
11) Modifying answers after they have been marked,
12) Any other behaviour which attempts unfairly to give you an advantage over
other students in the grade-assessment process,
13) Refusing to obey the instructions of the officer in charge of an
examination.
Students who are
found guilty of any form of cheating will be given a grade of F- for the whole
course.
Several University of Windsor students have been caught cheating during the
last few years. In most cases the evidence was sufficient to invoke a
disciplinary process which resulted in various forms of punishment, including
letters of censure, loss of marks, failing grades, and expulsions. Do not
cheat. If you are caught and found guilty, you could be thrown out of the
university and will have to explain why when you go looking for a job.