Twitter Classification

Background

• Goal: Mesh the viewing of RSS articles with related Twitter messages

◦ Use machine learning to determine the most significant tweets based on the user's preferences.

◦ Determine which ML model best handles dynamic content.

Process

• Article Processing

◦ Clean Article Text

◦ Keyword Extraction (Regular Expressions)

    ▪ Proper Nouns

    ▪ Twitter Hashtags / User Names

• Keyword Analysis

• Twitter Querying
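
The cleaning and keyword-extraction steps above might look roughly like the sketch below. The source does not give the actual regular expressions, so every pattern and function name here (`clean_text`, `extract_keywords`) is an assumption made for illustration. The proper-noun pattern is deliberately naive: it treats any run of capitalized words as a proper noun, so it also picks up ordinary words at the start of a sentence.

```python
import re

# Hypothetical patterns; the original project's expressions are not given.
TAG_RE = re.compile(r"<[^>]+>")      # leftover HTML tags in the feed text
HASHTAG_RE = re.compile(r"#\w+")     # Twitter hashtags, e.g. #news
MENTION_RE = re.compile(r"@\w+")     # Twitter user names, e.g. @whitehouse
# One or more consecutive capitalized words, e.g. "New York".
PROPER_NOUN_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def clean_text(raw):
    """Strip HTML tags and collapse runs of whitespace."""
    without_tags = TAG_RE.sub(" ", raw)
    return re.sub(r"\s+", " ", without_tags).strip()


def extract_keywords(text):
    """Return hashtags, user names and candidate proper nouns found in text."""
    hashtags = HASHTAG_RE.findall(text)
    mentions = MENTION_RE.findall(text)
    # Remove hashtags and user names first so they are not counted twice.
    remainder = MENTION_RE.sub(" ", HASHTAG_RE.sub(" ", text))
    proper_nouns = PROPER_NOUN_RE.findall(remainder)
    return {
        "hashtags": hashtags,
        "mentions": mentions,
        "proper_nouns": proper_nouns,
    }
```

A real pipeline would probably filter the proper-noun candidates further, for example with a stop-word list, before using them to query Twitter.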

Multinomial Naive Bayes

• Training / Learning

◦ Let T be the Set of Tweets Used in Training

1.) Extract Keywords (K) from the Tweets

2.) For Each Tweet in T

2a.) Calculate the Prior Based on Previous User Feedback

2b.) For Each Keyword Extracted from the Tweet

3.) Aggregate Keyword Weights and Compute the Conditional Probability
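
As a rough illustration of the training steps above, here is a minimal multinomial naive Bayes sketch in plain Python. It is not MALLET's API and not necessarily how the project computed its weights. The labels ("liked", "ignored"), the function name `train`, and the use of add-one (Laplace) smoothing are all assumptions.

```python
from collections import Counter, defaultdict


def train(tweets):
    """Estimate priors and conditional probabilities.

    tweets: list of (keywords, label) pairs, where label stands in for
    the user's earlier feedback on that tweet (hypothetical labels).
    """
    class_counts = Counter()               # tweets seen per label
    keyword_counts = defaultdict(Counter)  # keyword occurrences per label
    vocabulary = set()

    for keywords, label in tweets:
        class_counts[label] += 1
        for keyword in keywords:
            keyword_counts[label][keyword] += 1
            vocabulary.add(keyword)

    total = sum(class_counts.values())
    # Prior: share of training tweets that received each label.
    priors = {label: n / total for label, n in class_counts.items()}

    # Conditional probability P(keyword | label) with add-one smoothing
    # (an assumption; the source does not say how zeros were handled).
    conditionals = {}
    for label in class_counts:
        denominator = sum(keyword_counts[label].values()) + len(vocabulary)
        conditionals[label] = {
            keyword: (keyword_counts[label][keyword] + 1) / denominator
            for keyword in vocabulary
        }
    return priors, conditionals
```

Smoothing keeps a keyword that never appeared under a label from receiving a probability of exactly zero, which matters when new tweets keep introducing new vocabulary.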

• Classification

1. Extract the Keywords from the Article

2. For Each Tweet to Be Classified

2a. Go Through Each Keyword in the Article

2b. Calculate the Conditional Probability

2c. Sum this Value for All Keywords in the Tweet

3. Sort Tweets by this Value to Get the Ranking
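
One literal reading of the classification steps is sketched below: each tweet is scored by summing the conditional probabilities of the article keywords it contains, and tweets are then sorted by that score. This is an interpretation, not the project's confirmed method. A textbook multinomial naive Bayes classifier would usually add log-probabilities and a log prior instead. The function name `rank_tweets` and the probability values in the usage lines are made up for illustration.

```python
def rank_tweets(article_keywords, tweets, cond_prob, default=0.0):
    """Rank tweets by summed conditional probability of shared keywords.

    article_keywords: keywords extracted from the article
    tweets: list of (tweet_id, keywords) pairs
    cond_prob: mapping keyword -> P(keyword | relevant), from training
    default: value used for keywords never seen in training (assumption)
    """
    article_set = set(article_keywords)
    scored = []
    for tweet_id, tweet_keywords in tweets:
        tweet_set = set(tweet_keywords)
        # Only keywords shared by the article and the tweet contribute.
        score = sum(cond_prob.get(k, default) for k in article_set & tweet_set)
        scored.append((tweet_id, score))
    # Highest score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy values, invented for this example only.
toy_probabilities = {"obama": 0.5, "nyc": 0.3, "spam": 0.1}
toy_tweets = [("t1", ["spam"]), ("t2", ["obama", "nyc"]), ("t3", ["nyc", "spam"])]
ranking = rank_tweets(["obama", "nyc"], toy_tweets, toy_probabilities)
```

With these toy values, the tweet that shares both article keywords ranks first and the tweet that shares none ranks last.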

Implementation

• MALLET

Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

http://mallet.cs.umass.edu/

Machine Learning Final Report.pdf

