Current Projects

CollegeViz

CollegeViz is a web-based tool that visualizes college mobility data from two different sources and helps users see societal trends in how higher education affects economic opportunity. For instance, users can explore correlations between institution type, admission rate, and the mobility rate from the bottom 20% of the income distribution to the top 1%.

You can find this project on GitHub, a live demo on Heroku, and Harvard’s Vizathon, the event that started this project.

MuMoA

Multi-Model All-purpose ChatBot (“MuMoA”) simultaneously uses five NLP models to build a well-rounded bot that can guide users through a very specific topic with accurate, company- or organization-specific information while staying conversational. It also has a built-in “mood detector” so that it can respond to users more appropriately.

A typical chatbot today uses a single Natural Language Processing (“NLP”) model, usually a Machine Learning (“ML”) model, chosen to best suit its needs. Many NLP models are described in published research, and most of them are open source and free for anyone to use. However, most models are good in one area but weak in another. For instance, one model might be good at providing specific information (about a company or a topic such as Covid-19), but when users start chit-chatting or change the subject, the chatbot stops giving useful answers and keeps asking them to return to that narrow area. Other chatbots are very good at generic conversation (such as Google Assistant and Siri) but cannot provide detailed information about a specific topic or organization.
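MuMoA’s internal design isn’t described here, so the following is only a hypothetical sketch of one way a multi-model bot could combine its models: every model is asked, the most confident answer wins, and a mood check adjusts the tone. The `Model` interface, the confidence scores, the two toy models, and the keyword-based `detect_mood` are all illustrative assumptions standing in for real NLP models (MuMoA itself uses five).

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a model takes the user's message and returns
# (answer, confidence), with confidence between 0 and 1.
Model = Callable[[str], Tuple[str, float]]

# Toy word list; a real mood detector would be a trained model.
NEGATIVE_WORDS = {"angry", "upset", "frustrated", "sad", "annoyed"}


def detect_mood(message: str) -> str:
    """Toy keyword-based stand-in for a real mood detector."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return "negative" if words & NEGATIVE_WORDS else "neutral"


def respond(message: str, models: List[Model]) -> str:
    """Ask every model, keep the most confident answer, then adjust for mood."""
    candidates = [model(message) for model in models]
    answer, _confidence = max(candidates, key=lambda c: c[1])
    if detect_mood(message) == "negative":
        answer = "Sorry to hear that. " + answer
    return answer


# Two hypothetical toy models: a narrow topic-specific one and a chit-chat one.
def covid_faq_model(message: str) -> Tuple[str, float]:
    if "covid" in message.lower():
        return ("Here is our Covid-19 guidance.", 0.9)
    return ("I can only answer questions about Covid-19.", 0.1)


def chitchat_model(message: str) -> Tuple[str, float]:
    return ("Happy to chat! What's on your mind?", 0.5)


if __name__ == "__main__":
    bots = [covid_faq_model, chitchat_model]
    print(respond("Tell me about covid testing", bots))
    # -> Here is our Covid-19 guidance.
    print(respond("I'm upset, let's just chat", bots))
    # -> Sorry to hear that. Happy to chat! What's on your mind?
```

Picking the single most confident answer is the simplest combination rule; a real system might instead route by detected intent or blend answers.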

GitHub & Citizens Fight Dementia at Hacking 4 Community

Chatbot Evaluation Framework and Tools

Evaluating and comparing different chatbots usually involves a lot of user feedback and manual work.
I am developing a system that evaluates chatbots using two very different approaches.

Approach 1: Programmatically compare chatbot outputs to the dataset answers using a sentence-transformers model that maps sentences and paragraphs to a 768-dimensional dense vector space, then compute a “similarity score” (for example, the cosine similarity of the two vectors) between the bot answer and the dataset answer.
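A minimal sketch of the scoring step in Approach 1, assuming the similarity score is the cosine similarity between embeddings and that a bot’s overall score is the average over all questions. `score_bot` and `toy_encode` are hypothetical names; `toy_encode` (letter counts) only exists so the example runs without downloading a model. With the sentence-transformers library, `encode` could instead be `SentenceTransformer("all-mpnet-base-v2").encode`, one model that produces 768-dimensional embeddings, and the library’s `util.cos_sim` can replace the hand-written `cosine_similarity`.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero embedding as "no similarity"
    return dot / (norm_a * norm_b)


def score_bot(bot_answers: List[str], dataset_answers: List[str],
              encode: Callable[[str], Vector]) -> float:
    """Average similarity between each bot answer and its reference answer."""
    if not bot_answers or len(bot_answers) != len(dataset_answers):
        raise ValueError("need one dataset answer per bot answer")
    scores = [cosine_similarity(encode(bot), encode(ref))
              for bot, ref in zip(bot_answers, dataset_answers)]
    return sum(scores) / len(scores)


def toy_encode(text: str) -> List[float]:
    """Toy stand-in for a sentence embedding: letter counts for a-z."""
    text = text.lower()
    return [float(text.count(chr(c))) for c in range(ord("a"), ord("z") + 1)]


if __name__ == "__main__":
    bot = ["The capital of France is Paris."]
    ref = ["Paris is the capital of France."]
    print(round(score_bot(bot, ref, toy_encode), 3))  # -> 1.0
```

Because `encode` is passed in, the same scoring code works with any embedding model, which makes it easy to compare results across models.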

Approach 2: Automatically generate and collect all the bot answers, then create a polling website that randomly asks visitors which answer they think is best. Their votes are collected and used to score the chatbot systems accordingly.
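The core logic of Approach 2 could look roughly like the sketch below, under a few assumptions: answers are stored as question → {bot name: answer}, each poll shows one randomly chosen question with its answers shuffled, and a bot’s score is simply its share of all votes. The data layout, the function names, and the vote-share scoring are hypothetical choices; pairwise rating schemes such as Elo are another option.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data layout: question -> {bot name: that bot's answer}.
Polls = Dict[str, Dict[str, str]]


def next_poll(polls: Polls, rng: random.Random) -> Tuple[str, List[Tuple[str, str]]]:
    """Pick a random question and shuffle its answers so order gives nothing away.

    The bot names stay on the server; a visitor would only see the answer text.
    """
    question = rng.choice(sorted(polls))
    options = list(polls[question].items())
    rng.shuffle(options)
    return question, options


def score_bots(votes: List[str], bots: List[str]) -> Dict[str, float]:
    """Share of all votes won by each bot (0.0 when nobody has voted yet)."""
    counts = Counter(votes)
    total = len(votes)
    return {bot: counts[bot] / total if total else 0.0 for bot in bots}


if __name__ == "__main__":
    polls = {
        "What are your opening hours?": {"BotA": "9am to 5pm.", "BotB": "We open at 9."},
    }
    question, options = next_poll(polls, random.Random(0))
    print(question, [answer for _bot, answer in options])
    votes = ["BotA", "BotB", "BotA"]  # bot behind each answer visitors picked
    print(score_bots(votes, ["BotA", "BotB", "BotC"]))
```

Passing in a `random.Random` instance keeps polls reproducible in tests while still allowing a fresh, unseeded generator on the live site.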