aut-1.0.0

@ruebot ruebot released this 11 Jun 17:11
· 20 commits to main since this release
4655448

Documentation

Release Notes

Full Changelog

Implemented enhancements:

  • Remove HTTP headers and HTML on webpages() #538
  • Add domain column to webpages() #534
  • Replace Java ARC/WARC record processing library #494
  • Method to perform finer-grained selection of ARCs and WARCs #247
  • Unnecessary buffer copying #18

Fixed bugs:

  • Discard date RDD filter only takes a single string, not a list of strings. #532
  • Extract gzip data from transfer-encoded WARC #493
  • ARC reader string vs int error on record length #492

Closed issues:

  • java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Set$Set1 Set(liberal.ca) #529
  • Improve CommandLineApp.scala test coverage #262
  • Improve ExtractBoilerpipeText.scala test coverage #261
  • Improve ArchiveRecord.scala test coverage #260
  • Unit testing for RecordLoader #182
  • Improve ArchiveRecordWritable.java test coverage #76
  • Improve WarcRecordUtils.java test coverage #74
  • Improve ArcRecordUtils.java test coverage #73
  • Improve ExtractDate.scala test coverage #64
  • Remove org.apache.commons.httpclient #23

Merged pull requests: