Scrape Detailed Data #39

@Cwooper

Other Sites to Scrape

Extra information for each CRN is available by sending a POST to one of the endpoints listed below, each under base/searchResults/. Payload:

term: 202620
courseReferenceNumber: 20001
first: first

The response is HTML that must be parsed.
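As a rough sketch of the request described above, the form-encoded POST could be built like this. The host in `BASE_URL` is a placeholder (the issue only says "base"), the path segment is spelled `searchResults` on the assumption that this is the intended spelling, and the form-encoded content type is also an assumption. `build_detail_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host: the issue only refers to "base", so this is NOT the real URL.
BASE_URL = "https://example.invalid/base"


def build_detail_request(endpoint: str, term: str, crn: str) -> Request:
    """Build (but do not send) the POST for one detail endpoint of one CRN."""
    payload = urlencode({
        "term": term,
        "courseReferenceNumber": crn,
        "first": "first",
    }).encode("ascii")
    url = f"{BASE_URL}/searchResults/{endpoint}"
    # Assumption: the server accepts a standard form-encoded body.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=payload, headers=headers, method="POST")


req = build_detail_request("getCourseDescription", "202620", "20001")
```

Passing `req` to `urllib.request.urlopen` would send it; the returned body is the HTML fragment to parse.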

  • Course Description: getCourseDescription
    • Returns <section> course description... <span1 class="notvisible" /> <span2 class="notvisible" /> ... </section>, where the spans hold info about the course.
  • Prerequisites: getSyllabus
    • Returns <section> prereq info </section>
  • Detailed Prerequisites: getSectionPrerequisites
    • Returns <section> <h3>Catalog Prerequisites</h3> {sometimes a table labeled with class "basePreqTable"} </section>
  • Fees: getFees
  • Restrictions: getRestrictions
  • Attributes: getSectionAttributes
  • Enrollment/Waitlist: getEnrollmentInfo
  • Bookstore Links: getSectionBookstoreDetails
  • Corequisites: getCorequisites
  • Cross Listed Courses: getXlstSections
  • Class Details: getClassDetails
  • Catalog: getSectionCatalogDetails
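A minimal sketch of parsing one of these responses, using only the standard library. It separates the visible text of the `<section>` from the text inside elements marked `class="notvisible"`. The sample HTML and its contents are made up for illustration, and `SectionText` and `parse_section` are hypothetical names; real responses may be structured differently.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class SectionText(HTMLParser):
    """Collect visible text and text inside 'notvisible' elements separately."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.hidden = []
        self._hidden_depth = 0  # > 0 while inside a notvisible element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._hidden_depth or "notvisible" in classes:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            target = self.hidden if self._hidden_depth else self.visible
            target.append(text)


def parse_section(html: str):
    """Return (visible_text, hidden_texts) for one HTML fragment."""
    parser = SectionText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible), parser.hidden


# Made-up sample; not an actual server response.
sample = ('<section>Intro to systems.<br>'
          '<span class="notvisible">Credits: 5</span> More text.</section>')
text, hidden = parse_section(sample)
```

The same parser can be run on the output of any endpoint in the list, since each returns an HTML fragment; tables such as the one in getSectionPrerequisites would need extra handling for rows and cells.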

We want to store descriptions and prerequisites. The detailed prerequisites (getSectionPrerequisites) and the normal prerequisites (getSyllabus) are inconsistent on the API, likely because the department for each course is in charge of them. So we likely need to grab both; further testing is needed.

We will likely want to crawl backwards from the future to the past to get course descriptions and prerequisites. Many CRNs map to one course description/prerequisite. E.g., many CRNs, over many quarters -> "CSCI 247" -> getDescription. We will cache current/future terms first, and the most recent term's data overrides any older data. Each time a new term comes out, we will likely want to rescrape descriptions/prerequisites in case they changed.
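The newest-first crawl could look roughly like the sketch below. It assumes term codes are equal-length digit strings that sort chronologically (as "202620" suggests), and that some `fetch_details(term, crn)` callable exists that returns a dict of scraped fields. Both `build_course_cache` and `fetch_details` are hypothetical names, and the CRNs and terms in the demo are invented.

```python
def build_course_cache(sections, fetch_details):
    """Map each course to details from its most recent term.

    sections: iterable of (term, crn, course) tuples.
    fetch_details: callable (term, crn) -> dict of scraped fields.
    """
    cache = {}
    # Newest term first, so the first hit per course is the most recent one.
    for term, crn, course in sorted(sections, key=lambda s: s[0], reverse=True):
        if course in cache:
            continue  # a newer (or same-term) CRN already covered this course
        cache[course] = {"term": term, **fetch_details(term, crn)}
    return cache


# Demo with a stand-in fetcher that records which CRNs were requested.
calls = []


def fake_fetch(term, crn):
    calls.append((term, crn))
    return {"description": f"description from {term}"}


sections = [
    ("202540", "40001", "CSCI 247"),
    ("202620", "20001", "CSCI 247"),
    ("202620", "20002", "CSCI 247"),
    ("202540", "40002", "CSCI 301"),
]
cache = build_course_cache(sections, fake_fetch)
```

Only one request is made per course here, which keeps the number of POSTs far below the number of CRNs. Rerunning the function when a new term is published would refresh every course offered in that term.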

Labels

backend (Backend related change)
