Large-scale pretraining for dialogue
Updated Oct 17, 2022 - Python
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Large-scale pretrained models for goal-directed dialog
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Cleans Reddit Text Data 📜 🧹
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
Question classification on the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data
For reading from and writing to parallel data files in Python
Python script for sentiment analysis of Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.
A comprehensive repository of classical Persian poetry, curated from Ganjoor.net, designed for Natural Language Processing (NLP), machine learning applications, and literary research.
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
This repository implements a pipeline that extracts and stores various fields from the files of a large unstructured dataset. These fields are used for topic modeling (word clouds based on low-dimensional versions of embedding vectors, named-entity clustering, and document-topic incidences). The information is aggregated and visualised using FCA.
Just a bunch of experiments with embedded graph databases
A simple graphical representation of Zipf's Law using term frequencies calculated for three different text datasets.
IMDb-Scraping is for retrieving user-generated movie text reviews as well as relevant movie characteristics from imdb.com.