tm.plugin.risjbot

tm.plugin.risjbot provides a tm Source function for creating corpora from articles scraped by the RISJbot webcrawler. RISJbot is a Scrapy/Python project designed to collect the full text and metadata of news articles from the web, using sites' own sitemaps and RSS feeds as a source. It produces a number of output formats, including JSONLines files which this package can read into tm sources.
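For readers unfamiliar with the format, a JSONLines file stores one complete JSON object per line. The following is a minimal sketch of what such a file looks like and how it can be inspected directly in R. It assumes the jsonlite package is installed, and the field names (`url`, `headline`, `bodytext`) are hypothetical placeholders rather than RISJbot's actual schema.

```r
# Sketch only: the field names below are hypothetical, not RISJbot's schema.
library(jsonlite)

# A JSONLines file holds one complete JSON object per line.
path <- tempfile(fileext = ".jl")
writeLines(c(
  '{"url": "https://example.com/a", "headline": "First", "bodytext": "Text of the first article."}',
  '{"url": "https://example.com/b", "headline": "Second", "bodytext": "Text of the second article."}'
), path)

# stream_in() parses the file line by line and returns one data frame row
# per JSON object.
articles <- stream_in(file(path), verbose = FALSE)
nrow(articles)
names(articles)
```

This is only a way to look at the raw data; the package's Source function is what turns such a file into tm documents.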

To allow better integration with other tm data sources, field names in the RISJbot JSONLines file can be mapped arbitrarily to metadata field names in the resulting tm::PlainTextDocument.
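The idea of a field mapping can be illustrated in base R. This is a conceptual sketch, not the package's interface: the helper `rename_fields`, the example record, and the particular mapping are all hypothetical. The target names `heading`, `author`, and `origin` are standard metadata fields of a tm::PlainTextDocument.

```r
# Sketch only: `rename_fields`, the record, and the mapping are hypothetical
# illustrations, not part of tm.plugin.risjbot.

# One parsed article, as a named list keyed by scraper field names.
record <- list(
  headline = "Example",
  bylines  = "A. Writer",
  url      = "https://example.com/a"
)

# Mapping from source field name to the desired metadata field name.
fieldmap <- c(headline = "heading", bylines = "author", url = "origin")

rename_fields <- function(rec, map) {
  hit <- names(rec) %in% names(map)
  # Rename mapped fields; leave unmapped ones under their original names.
  names(rec)[hit] <- unname(map[names(rec)[hit]])
  rec
}

meta_fields <- rename_fields(record, fieldmap)
names(meta_fields)
```

Using shared metadata names in this way lets documents from different sources be filtered and compared on the same fields once they are combined in a single corpus.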

Installation

You can install the development version of tm.plugin.risjbot from GitHub with:

devtools::install_github("pmyteh/tm.plugin.risjbot")

Example

This example shows you how to create a tm::VCorpus object using this package's source function:

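library(tm.plugin.risjbot)
# Build a source from a RISJbot JSONLines output file
s <- RISJbotSource('input.jl')
# The tm:: prefix means this works even if tm itself is not attached
corp <- tm::VCorpus(s)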
