Document Analysis Platform
Web app for gathering and reading news stories. Retrieve online articles about a topic, and a chatbot references them when answering your questions.
- Flask
- SQLite
- Weaviate Vector Database
- OpenAI API
- Google Search API
Functionality
- Seek and Store - Uses the Google Search API and BeautifulSoup to extract text from search results, then cleans the text and stores it in the databases.
- Q&A - Question answering over the stored articles with a GPT-powered chatbot, using vector search in Weaviate.
- NER - Visualize named-entity frequency across articles. Powered by Flair and Chart.js.
- Summarize - Enter a URL to see a summary of the article.
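The clean-and-store half of Seek and Store can be sketched roughly as below. This is a minimal illustration, not the app's actual code: it uses the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without dependencies, and the `articles` table, its columns, and the function names are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Return the page's visible text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def store_article(conn, url, html):
    """Clean one page and upsert it into a hypothetical `articles` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (url TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, body) VALUES (?, ?)",
        (url, clean_html(html)),
    )
    conn.commit()


# Example with an in-memory database and a made-up page.
conn = sqlite3.connect(":memory:")
page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Some   text.</p><script>track()</script></body></html>"
)
store_article(conn, "https://example.com/a", page)
print(conn.execute("SELECT body FROM articles").fetchone()[0])
```

Keying the table on the URL means re-running a search overwrites an article instead of duplicating it; whether the real app does this is an assumption.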
Topic Modeling via BERTopic and LDA
A tool for quantifying topic model performance. Compares BERTopic and Latent Dirichlet Allocation.
- BERTopic
- scikit-learn Pipelines
- Sentence-Transformers
- Topic Coherence / NPMI Score
- Matplotlib
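For reference, NPMI-based topic coherence can be computed along the following lines. This is a simplified sketch, not the tool's implementation: it assumes document-level co-occurrence (each document is a set of tokens) rather than a sliding window, and the function names and toy corpus are made up for illustration.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, documents):
    """NPMI of two words from document-level co-occurrence.

    NPMI(a, b) = log(P(a, b) / (P(a) * P(b))) / -log(P(a, b)), in [-1, 1].
    `documents` is a list of token sets (an assumption of this sketch).
    """
    n = len(documents)
    p_a = sum(word_a in doc for doc in documents) / n
    p_b = sum(word_b in doc for doc in documents) / n
    p_ab = sum(word_a in doc and word_b in doc for doc in documents) / n
    if p_ab == 0:
        return -1.0  # the words never co-occur: conventional lower bound
    if p_ab == 1:
        return 1.0  # both words appear in every document; avoids dividing by zero
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)


docs = [
    {"stock", "market", "bank"},
    {"stock", "market"},
    {"goal", "match"},
    {"goal", "match", "team"},
]
print(topic_coherence(["stock", "market", "bank"], docs))
print(topic_coherence(["stock", "goal", "team"], docs))
```

A topic whose top words tend to appear in the same documents scores closer to 1, while a topic mixing unrelated words scores closer to -1, which is what makes the score usable for comparing BERTopic and LDA outputs on the same corpus.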
Algorithms
Implementations of classic algorithms from Stanford's Coursera course.
- Sorting, search and randomized algorithms
- Graph search and shortest paths
- Data structures
- Time complexity analysis
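As an illustration of the shortest-paths item, here is a minimal heap-based Dijkstra sketch. It is not taken from the repository; the adjacency-list format and the example graph are assumptions.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from `source`.

    `graph` maps each node to a list of (neighbor, weight) pairs;
    weights must be nonnegative for the result to be correct.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 3)],
    "t": [],
}
print(dijkstra(graph, "s"))  # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```

Instead of a decrease-key operation, this version pushes duplicate entries and skips stale ones when they are popped, which gives O(m log n) time for a graph with n nodes and m edges.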
News Classification
Training and evaluating different classifier models on two news datasets: the BBC's 5-class dataset and HuffPost's 42-class dataset.
- Logistic Regression
- Random Forest
- Neural Classifier
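When comparing classifiers on a 5-class and a 42-class dataset, accuracy alone can hide weak performance on rare classes, so a per-class average such as macro-F1 is a common companion metric. The sketch below is a dependency-free illustration; which metrics the project actually reports is not stated here, and the labels are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero
        # because every label occurs in y_true or y_pred at least once.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


y_true = ["sport", "sport", "tech", "tech", "business"]
y_pred = ["sport", "tech", "tech", "tech", "business"]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Each class contributes equally to macro-F1 regardless of how many examples it has, so a model that ignores small classes is penalized more than it would be under accuracy.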
Information Extraction
Extracting information from text in the form of subject-verb-object (SVO) triples.
- spaCy
- Coreferee
- Textacy
News Clustering Study
Scraping news articles, vectorizing their contents and comparing their similarity.
- BeautifulSoup
- TF-IDF
- PCA
- Matplotlib
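The vectorize-and-compare step can be illustrated with a small pure-Python TF-IDF and cosine similarity. This is a sketch under assumptions, not the study's code: it uses whitespace tokenization and a smoothed IDF, leaves out the PCA and plotting steps, and the example headlines are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no term gets zero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "markets fall as rates rise",
    "rates rise and markets fall",
    "team wins the cup final",
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # high: the two finance headlines share terms
print(cosine(vectors[0], vectors[2]))  # 0.0: no terms in common
```

Articles about the same story share weighted terms and score near 1, while unrelated articles score near 0; projecting the vectors with PCA is then a way to view those groupings in two dimensions.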
Document Database
Takes input documents, tokenizes and encodes them with Sentence-Transformers, then inserts the embeddings into Milvus DB.
- NLTK Tokenizer
- Sentence-Transformers
- Milvus DB
- Docker