
Project

Twitter Classification

Background
• Goal: Combine the viewing of RSS articles with related Twitter messages
Use machine learning to determine the most significant tweets based on the user's preferences.
Determine the best ML model for handling dynamic content
Process
• Article Processing
Clean Article Text
Keyword Extraction (Regular Expressions)
–Proper Nouns
–Twitter Hash Tags / User Names
• Keyword Analysis
• Twitter Querying
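
The keyword extraction step (regular expressions for proper nouns, hash tags, and user names) might look roughly like the sketch below. The slides do not give the actual expressions, so all three patterns and the name `extract_keywords` are assumptions. In particular, treating runs of capitalized words as proper nouns is a crude heuristic that also picks up ordinary sentence-initial words.

```python
import re

# Assumed patterns; the original slides do not show the real expressions.
HASHTAG_RE = re.compile(r"#\w+")
USERNAME_RE = re.compile(r"@\w+")
# Crude proper-noun heuristic: one or more consecutive capitalized words.
PROPER_NOUN_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_keywords(text):
    """Return hash tags, user names, and capitalized phrases found in text."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "usernames": USERNAME_RE.findall(text),
        "proper_nouns": PROPER_NOUN_RE.findall(text),
    }
```

Calling `extract_keywords` on cleaned article text returns three lists that could then feed the keyword analysis and Twitter querying steps.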
Multinomial Naive Bayes
• Training / Learning
Consider T to be the Set of Tweets used in Training
1.) Extract Keywords from Tweets (K)
2.) For Each Tweet in T
2a.) Calculate the Prior Based on previous User Feedback
2b.) For Each Extracted Keyword In T
3.) Aggregate Keyword Weights and compute the Conditional Probability
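
As a rough illustration of the training steps above, here is a minimal sketch of standard multinomial Naive Bayes training with Laplace smoothing. It is not the project's actual code. It assumes that user feedback arrives as a label per tweet (for example "liked" or "disliked") and that each tweet has already been reduced to a list of keywords. The function name `train`, the label names, and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_tweets, alpha=1.0):
    """Estimate log priors and log conditional probabilities.

    labeled_tweets: list of (keywords, label) pairs, where label is the
    assumed user feedback for that tweet.
    alpha: Laplace smoothing constant (an assumption, not from the slides).
    """
    label_counts = Counter()
    keyword_counts = defaultdict(Counter)
    vocab = set()
    for keywords, label in labeled_tweets:
        label_counts[label] += 1                # feeds the prior
        keyword_counts[label].update(keywords)  # aggregate keyword counts
        vocab.update(keywords)

    total = sum(label_counts.values())
    log_prior = {c: math.log(n / total) for c, n in label_counts.items()}

    log_cond = {}
    for c in label_counts:
        denom = sum(keyword_counts[c].values()) + alpha * len(vocab)
        log_cond[c] = {
            k: math.log((keyword_counts[c][k] + alpha) / denom) for k in vocab
        }
    return log_prior, log_cond, vocab
```

Working in log space avoids numeric underflow when many small probabilities are combined. The smoothing keeps keywords unseen under a label from receiving zero probability.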
• Classification
1. Extract the Keywords from the Article
2. For Each Tweet Being Classified
2a. Go Through Each Keyword in the Article
2b. Calculate the Conditional Probability
2c. Sum this value for all Keywords in a Tweet
3. Sort Tweets by this Value to get Ranking
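
One literal reading of the classification steps above is sketched below: each tweet is scored by summing the conditional probabilities of the article keywords it shares, and tweets are then sorted by that score. This is an interpretation, not the project's confirmed method, and it differs from textbook Naive Bayes, which would combine a log prior with summed log likelihoods. The function `rank_tweets`, the `cond_prob` table, and the choice to score unknown keywords as 0 are assumptions.

```python
def rank_tweets(article_keywords, tweets, cond_prob):
    """Rank tweets by summed P(keyword | relevant) over shared keywords.

    article_keywords: iterable of keywords extracted from the article.
    tweets: dict mapping a tweet id to that tweet's list of keywords.
    cond_prob: dict mapping keyword to an assumed P(keyword | relevant).
    """
    article = set(article_keywords)
    scores = {}
    for tweet_id, keywords in tweets.items():
        shared = article & set(keywords)
        # Keywords missing from the table contribute nothing (assumption).
        scores[tweet_id] = sum(cond_prob.get(k, 0.0) for k in shared)
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The result is a list of `(tweet_id, score)` pairs, with the tweets judged most relevant to the article first.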
Implementation
• MALLET

Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

http://mallet.cs.umass.edu/


Machine Learning Final Report.pdf
