Mark Fullmer edited this page Oct 25, 2021 · 30 revisions

Welcome to the Corpus Developer Toolkit!

This wiki provides information on the corpus web software that runs Crow, the Corpus & Repository of Writing. It is intended to be a starting point or template for researchers developing their own web-based corpus platforms.

First step: read the executive summary to determine if this approach fits your corpus goals.

Contents

The "backend": Data storage & retrieval via the application programming interface (API)

The dataset that makes up the Corpus and Repository of Writing (Crow) is a large-scale learner corpus of English writing samples from university foundational writing courses, as well as pedagogical materials used in those courses. It is designed to contain tens of thousands of individual texts, searchable by word, phrase, or metadata.

  1. Overview: the website backend
  2. Importing corpus texts
  3. Importing repository materials
  4. Deploying to a server
  5. Updating software dependencies

The "frontend": Design and usability considerations for a corpus interface

The user interface for the Crow corpus is designed to cater to multiple audiences: corpus researchers, writing teachers, and students. Registration is required, with different tiers of access.

  1. Overview: the website frontend

Context & case studies

  1. Case study: Adapting the software for a multilingual corpus, MACAWS