UNIVERSITY OF WINDSOR

 

Selected Topics on Web Data Extraction Techniques for Recommendation Systems          

03-60-592-1    Winter 2017                                                   

 

Classes: Mondays, 2:30 – 5:20pm (ER 1114)

Instructor: Dr. C.I. Ezeife

Office: LT 5105                     

Phone: 253-3000 ext. 3012

e-mail: cezeife@uwindsor.ca

Office hours: Mondays, 5:30pm – 6:50pm (also by appointment)

 

RECOMMENDED MATERIALS:

C.I. Ezeife, Course Notes for 60-592, Selected Topics on Web Data Extraction Techniques and Recommendation Systems, University of Windsor, Winter 2017.

Recommended Text:
1. Charu C. Aggarwal. Recommender Systems: The Textbook, Springer, ISBN 978-3-319-29657-9. Available through the bookstore or by online order from Springer.

Reference Materials:

1. Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann/Elsevier, 2011, ISBN 978-0-12-381479-1. **** The most comprehensive and useful reference for mining algorithms.

2. Ian Witten, Eibe Frank, Mark Hall, and Christopher Pal. Data Mining: Practical Machine Learning Tools and Techniques, 4th Edition, Morgan Kaufmann, ISBN 978-0-12-804291-5. **** Good for reviewing data mining tools such as WEKA.

3. Ryan Mitchell. Web Scraping with Python: Collecting Data from the Modern Web, O’Reilly, 2017, ISBN 978-1-4919-1029-0.

4. Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer-Verlag, 2007, ISBN 978-3-540-37881-5. ** Good for Web mining.

5. Ezeife, C.I. and Titas Mutsuddy, “Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System”, International Journal of Data Warehousing and Mining (IJDWM), 8(4), pp. 1-21, October-December 2012. **** Our research and application in data extraction.

6. Ezeife, C.I. and Bindu Peravali, “Comparative Mining of B2C Web Sites by Discovering Web Database Schemas”, in Proceedings of the 20th ACM International Database Engineering & Applications Symposium (IDEAS16), pp. 183-192, Montreal, QC, Canada, 11-13 July 2016. *** Our WebOMiner_S data extractor.

 

LEARNING OUTCOMES

Students who successfully complete this course will be able to:

§ Understand data mining methods, in particular association rule mining and web sequential mining techniques, as well as classification and clustering techniques.

§ Understand web data extraction, since web data is a data source for recommendation systems.

§ Understand web recommendation systems, their challenges, and their application domains such as retail, music, content, and web search.

§ Understand recommendation systems data (e.g., (i) collaborative filtering data from user-item interactions such as ratings or buying behavior, and (ii) extracted web content data on users and items).

§ Understand collaborative filtering models, approaches, challenges, and outcomes.

§ Understand recent research results in Web data extraction and recommendation systems published in good-quality ACM/IEEE venues on Web systems and recommendation systems.

§ Develop and implement web data extraction and recommendation systems using collaborative filtering approaches.

 

NOTE: By successfully completing this course, students will have gained some of the training needed to embark on independent original research in databases, data mining, recommendation systems, collaborative filtering or related areas.

 

COURSE CONTENT

Data Mining provides the tools for transforming massive data into valuable information that an organization can quickly exploit to gain a competitive advantage. The Web, as a medium for online e-commerce and other transactions, is a driving force for the development of recommender systems technology, since it allows users to provide feedback about their likes and dislikes (for example, by rating items). Even a user's browsing of product items can be collected as data indicating an endorsement of those items. Recommendation systems use these various sources of data about users (customers) and items (products) to infer customer interests, and this analysis can be done by applying data mining and learning algorithms to the data. One recommendation approach is collaborative filtering, which, in simple terms, uses the ratings of multiple users in a collaborative way to predict missing ratings so that product or service recommendations can be made to those users.
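As a rough illustration of the collaborative filtering idea described above, the following Python sketch predicts one missing rating using user-based collaborative filtering. It is a minimal sketch under stated assumptions, not the method used in the course: the toy ratings, the user and item names, and the helper functions `cosine_similarity` and `predict_rating` are all hypothetical, and cosine similarity over co-rated items is only one of several possible similarity measures (Pearson correlation is a common alternative).

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
# An item missing from a user's dict is a rating we may want to predict.
RATINGS = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item1": 4, "item2": 2, "item3": 4, "item4": 5},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}


def cosine_similarity(u, v, ratings):
    """Cosine similarity of users u and v, computed over co-rated items only."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0  # no co-rated items, so no evidence of similarity
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user, item, ratings):
    """User-based CF: predict a missing rating as the similarity-weighted
    average of the ratings that other users gave to the same item.
    Returns None when no similar user has rated the item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(user, other, ratings)
        if sim <= 0:
            continue  # skip users who give no positive evidence
        weighted_sum += sim * other_ratings[item]
        total_weight += sim
    return weighted_sum / total_weight if total_weight else None


# alice has not rated item4; bob (rated it 5) is more similar to her
# than carol (rated it 2), so the prediction leans toward 5.
print(round(predict_rating("alice", "item4", RATINGS), 2))  # prints 3.79
```

Practical recommenders usually mean-center ratings (or use Pearson correlation) to correct for users who rate systematically high or low; this sketch omits that step to keep the arithmetic easy to follow.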

 

The objectives of this course are to (i) learn the basic data mining techniques of association rule mining, classification and clustering for analyzing extracted data, (ii) discuss web data extraction approaches for recommendation, and (iii) study recommendation systems and collaborative filtering, together with their applications and challenges. Topics discussed include:

·         Data mining: association rule mining (e.g., the Apriori algorithm), classification (e.g., decision tree algorithms) and clustering (e.g., the K-Means algorithm). The reference book by Ian Witten et al. will be used here.

·         Web data extraction techniques using Python and those surveyed in our two recent papers on the WebOMiner web data extraction system.

·         Web recommendation systems as discussed in the book by Charu C. Aggarwal.

 

Students are urged to attend all formal lectures and seminars, which follow this tentative schedule:

60-592 TENTATIVE SCHEDULE (Winter 2017)

Week (of)      Activity
1  (Jan 5)     Course Outline and Data Mining Techniques
2  (Jan 9)     Data Mining Techniques
3  (Jan 16)    Data Mining Techniques and Web Data Extraction
4  (Jan 23)    Web Data Extraction
5  (Jan 30)    Web Recommendation Systems
6  (Feb 6)     Web Recommendation Systems
7  (Feb 13)    Web Recommendation Systems
8  (Feb 20)    Reading week (no classes)
9  (Feb 27)    Midterm Test
10 (Mar 6)     Student research seminar (week 1)
11 (Mar 13)    Student research seminar (week 2)
12 (Mar 20)    Student research seminar (week 3)
13 (Mar 27)    Project presentation (week 1)
14 (Apr 3)     Project presentation (week 2)
Apr 10         Project demonstrations (pre-scheduled; report must be submitted beforehand)

*All schedules presented in this document are only tentative and subject to possible revisions in the course of the term.  Any changes will be announced in class or will be posted on the course website.

 

 

COURSE EVALUATION

Work                                            Mark (out of 100%)
Midterm exam (Feb. 27, 2017)                    25% (covers all lecture material)
Student seminar (Mar. 6 to Mar. 20)             15% (graded 25% by students in the class and 75% by me)
Seminar attendance and contributions            10%
2 seminar reports (due by Mar. 20)              20%
Project presentation (Mar. 20 to Mar. 27)       10% (graded 25% by students in the class and 75% by me)
Project content and report (due Apr. 3)         20% (includes project demo scheduled for Apr. 10)
Total                                           100%

 

CONVERSION OF MARKS (new % marking scheme used for Fall 2016)

Only raw percentage scores are assigned in course work. The University of Windsor uses a percentage marking and grading scale, and the meaning of the scores on transcripts is:

% Score        Grade
90-100         A+
85-89.99       A
80-84.99       A-
77-79.99       B+
73-76.99       B
70-72.99       B-
67-69.99       C+
63-66.99       C
60-62.99       C-
57-59.99       D+
53-56.99       D
50-52.99       D-
0-49.99        F

In computing a student's average, grades from 0% to 22% are calculated as 22%, and grades from 23% to 40% are calculated as 40%. Grades from 40% to 49% are included in the student's average as is. All grades are recorded on the transcript as is. All grades below 50% are considered failures (see the mark/grades descriptor page of the calendar, www.uwindsor.ca/calendar, for details).

 

 

 

IMPORTANT DATES (as listed in the University calendar, www.uwindsor.ca/calendar)

Thurs, Jan. 5, 2017      …………        Classes begin.

Wed., Jan. 18, 2017    ………..         Final day for registration revisions.

Sat., Feb. 18, 2017 – Sun., Feb. 26, 2017 …. Study Week (No classes).

Mon., Feb. 20, 2017    …………        Family Day (No classes).

Wed., Mar. 15, 2017   ……….           Last day for voluntary withdrawal from courses.

                                    Last day to receive partial refund for withdrawal from courses

Wed., Apr. 5, 2017     ………..          Last day of classes

Sat., Apr. 8, 2017        ………            Winter term final examinations begin

Fri., Apr. 14, 2017      ………..          Good Friday (No classes)

Fri., Apr. 21, 2017      ………..          Winter term final examinations end

 

ASSIGNMENTS AND COURSE WORK

1.      Completed reports must be handed in no later than five minutes before the beginning of class on the day they are due. Late reports will not normally be accepted.

2.      All reports must be neatly stapled or Cerlox bound. Each report should include a title page clearly marked with the student’s name, student number, course and instructor’s name.

3.      No make-up tests will be given for missed tests.

4.      All parts of the course must be completed to obtain a final grade in the course.

5.      To discourage and prevent academic dishonesty and cheating, students must sign the following confidentiality agreement and statement of honesty for all submitted course work. Note that if two assignments are found to be copies of each other, a mark of 0 will be assigned to both assignments.

 

CONFIDENTIALITY AGREEMENT & STATEMENT OF HONESTY

I confirm that I will keep the content of this assignment/examination confidential. 

I confirm that I have not received any unauthorized assistance in preparing for or doing this assignment/examination.   I confirm knowing that a mark of 0 may be assigned for copied work. 

 ________________________________________                                            ________________________________________

Student Signature                                                                Student Name (please print)

________________________________________                                             ________________________________________

Student I.D. Number                                                                                           Date

____________________________________________________________________________________________________________________

 

 

PENALTIES AND DISCIPLINARY ACTION FOR DEFICIENT TERM WORK

1.      Seminar attendance is compulsory. Students are expected to read the papers being presented so that they can make meaningful contributions to the seminars. Failing to do so will result in a loss of marks.

2.      While collaboration with course mates is encouraged for discussing class topics, students are expected to develop individual research abilities in the area and to hand in projects and reports that they have prepared individually. Cheating is not allowed in this course.

Policy on cheating

The professors and teaching assistants will report any suspicion of cheating to the Director of the School of Computer Science.  If sufficient evidence is available, the Director will begin a formal process according to the University Senate Bylaws.  The instructor will not negotiate with students who are accused of cheating but will pass all information to the Director of the School of Computer Science. The following behaviour will be regarded as cheating (together with other acts that would normally be regarded as cheating in the broad sense of the term):

1) Copying assignments
2) Allowing another student to copy an assignment from you and present it as their own work
3) Copying from another student during a test or exam
4) Referring to notes, textbooks, etc. during a test or exam
5) Talking during a test or an exam
6) Not sitting at the pre-assigned seat during a test or exam
7) Communicating with another student in any way during a test or exam
8) Having access to the exam/test paper prior to the exam/test
9) Asking a teaching assistant for the answer to a question during an exam/test
10) Presenting another’s work as your own
11) Modifying answers after they have been marked
12) Any other behaviour which attempts unfairly to give you an advantage over other students in the grade-assessment process
13) Refusing to obey the instructions of the officer in charge of an examination

Students who are found guilty of any form of cheating will be given a grade of F- for the whole course.

Several University of Windsor students have been caught cheating during the last few years. In most cases the evidence was sufficient to invoke a disciplinary process, which resulted in various forms of punishment including letters of censure, loss of marks, failing grades, and expulsions. Do not cheat: if you are caught and found guilty, you could be thrown out of the university and will have to explain why when you go looking for a job.