Releases: archivesunleashed/aut
aut-1.2.0
aut-1.1.1
Documentation
Release Notes
Fixed bugs:
- DomainGraph should use YYYYMMDD not YYYYMMDDHHMMSS #544
Merged pull requests:
- Use YYYYMMDD for crawl_date for DomainGraphExtractor. #545 (ruebot)
- Bump jsoup from 1.14.2 to 1.15.3 #543 (dependabot[bot])
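The DomainGraph fix above (#544, #545) concerns date granularity: grouping links by a full YYYYMMDDHHMMSS timestamp puts nearly every capture in its own bucket, whereas grouping by YYYYMMDD aggregates per crawl day. The following is a minimal Python sketch of that idea, not the project's Scala code. The names `to_crawl_day`, `count_links_by_day`, and the sample rows are hypothetical.

```python
from collections import Counter


def to_crawl_day(timestamp: str) -> str:
    """Reduce a 14-digit YYYYMMDDHHMMSS timestamp to its YYYYMMDD prefix."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError(f"expected YYYYMMDDHHMMSS, got {timestamp!r}")
    return timestamp[:8]


def count_links_by_day(links):
    """Count (day, source, destination) triples.

    `links` is an iterable of (timestamp, source, destination) rows.
    """
    counts = Counter()
    for timestamp, src, dest in links:
        counts[(to_crawl_day(timestamp), src, dest)] += 1
    return counts


# Hypothetical sample rows, not taken from any real crawl.
sample_links = [
    ("20220315123045", "example.com", "example.org"),
    ("20220315180102", "example.com", "example.org"),
    ("20220316090000", "example.com", "example.org"),
]
daily_counts = count_links_by_day(sample_links)
```

With second-level timestamps the three sample rows would produce three separate counts of one. With day-level dates, the two captures from the same day collapse into a single count.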
aut-1.1.0
aut-1.0.0
Documentation
Release Notes
Implemented enhancements:
- Remove HTTP headers and HTML from webpages() #538
- Add domain column to webpages() #534
- Replace Java ARC/WARC record processing library #494
- Method to perform finer-grained selection of ARCs and WARCs #247
- Unnecessary buffer copying #18
Fixed bugs:
- Discard date RDD filter only takes a single string, not a list of strings. #532
- Extract gzip data from transfer-encoded WARC #493
- ARC reader string vs int error on record length #492
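The discard-date bug above (#532) is about a filter that accepted only one date string when callers wanted to pass several. A rough Python sketch of a filter that accepts either form is shown here. It is an illustration only and does not reproduce the aut API: the function name `discard_dates`, the `crawl_date` dictionary key, the exact-match semantics, and the sample records are all assumptions.

```python
def discard_dates(records, dates):
    """Drop records whose crawl_date equals any of the given dates.

    `dates` may be a single string or an iterable of strings.
    A bare string is wrapped first so it is not treated as a
    sequence of single characters.
    """
    unwanted = {dates} if isinstance(dates, str) else set(dates)
    return [r for r in records if r["crawl_date"] not in unwanted]


# Hypothetical records for illustration.
records = [
    {"url": "http://example.com/a", "crawl_date": "20080430"},
    {"url": "http://example.com/b", "crawl_date": "20091027"},
    {"url": "http://example.com/c", "crawl_date": "20101115"},
]

kept_single = discard_dates(records, "20080430")
kept_multiple = discard_dates(records, ["20080430", "20091027"])
```

Normalising the argument to a set up front keeps the filter itself identical for both call styles.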
Closed issues:
- java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Set$Set1 Set(liberal.ca) #529
- Improve CommandLineApp.scala test coverage #262
- Improve ExtractBoilerpipeText.scala test coverage #261
- Improve ArchiveRecord.scala test coverage #260
- Unit testing for RecordLoader #182
- Improve ArchiveRecordWritable.java test coverage #76
- Improve WarcRecordUtils.java test coverage #74
- Improve ArcRecordUtils.java test coverage #73
- Improve ExtractDate.scala test coverage #64
- Remove org.apache.commons.httpclient #23
Merged pull requests:
- Make webpages() consistent across aut and ARCH. #539 (ruebot)
- Update README #537 (ruebot)
- Fix codecov GitHub action. #536 (ruebot)
- Bump commons-compress from 1.14 to 1.21 #535 (dependabot[bot])
- Remove Java w/arc processing, and replace it with Sparkling. #533 (ruebot)
- Bump xercesImpl from 2.12.0 to 2.12.2 #527 (dependabot[bot])
aut-0.91.0
Documentation
Release Notes
Implemented enhancements:
- Include timestamp in crawl date #525
Merged pull requests:
aut-0.90.4
aut-0.90.3
Documentation
Release Notes
Fixed bugs:
- ExtractDomains returns non-Apex Domains #519
Merged pull requests:
- Update ExtractDomain to extract apex domains. #520 (ruebot)
- Bump jsoup from 1.13.1 to 1.14.2 #518 (dependabot[bot])
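The apex-domain fix above (#519, #520) means reducing a host such as www.example.com to example.com. A simplified Python sketch of the idea follows. It is not the project's implementation: a real solution consults the full Public Suffix List, whereas `MULTI_LABEL_SUFFIXES` here is a deliberately tiny, hypothetical stand-in, and `apex_domain` is an illustrative name. IP-address hosts are not handled.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny stand-in for the Public Suffix List.
MULTI_LABEL_SUFFIXES = {"co.uk", "gc.ca", "com.au"}


def apex_domain(url: str) -> str:
    """Return the registrable (apex) domain for a URL, or "" if it has no host."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) < 2:
        return host
    # Treat the last two labels as the suffix when they form a known
    # multi-label suffix; otherwise only the last label is the suffix.
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1
    # The apex domain is the suffix plus one more label to its left.
    return ".".join(labels[-(suffix_len + 1):])


apex_simple = apex_domain("https://www.archive.org/details/x")
apex_multi = apex_domain("http://news.bbc.co.uk/")
```

Keeping only the last two labels would be wrong for hosts under multi-label suffixes such as co.uk, which is why the suffix length is looked up before trimming.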
aut-0.90.2
aut-0.90.1
Documentation
Release Notes
Fixed bugs:
- crawl_date is not included on binary information jobs when documentation says it is #512
Merged pull requests:
aut-0.90.0
Documentation
Release Notes
Fixed bugs:
- Python implementation of .all() has .keepValidPages() incorrectly applied to it #502
- Extract hyperlinks from Wayback Machine #501
- Release 0.80.0 JAR produces error; built 0.80.1 fatjar built on repo works #495
Closed issues:
- Migrate CI infrastructure from TravisCI to GitHub Action #506
- Split tf into its own repo #498
- Change master branch to main branch #490
- GitHub action - Run isort and black on Python code #488
- Add scalafmt GitHub action #486
- Add Google Java Formatter as a GitHub action #484
- Packages build is often broken - should we support it? #483
- Implement SaveToDisk in Python #478
- Java 11 support #356
Merged pull requests:
- ars-cloud compatibility with aut and Java 11 #510 (ruebot)
- Update to Spark 3.0.1 #508 (ruebot)
- Replace TravisCI with GitHub Actions. #507 (ruebot)
- Bump junit from 4.12 to 4.13.1 #505 (dependabot[bot])
- Fix relative links extraction #504 (yxzhu16)
- Remove .keepValidPages() on .all() Python implementation. #503 (ruebot)
- Updates read.me to include citation section #500 (SamFritz)
- Remove tf project; resolves #498. #499 (ruebot)
- Add Python formatter GitHub Action. #489 (ruebot)
- Add scalafmt GitHub action and apply it to scala code. #487 (ruebot)
- Add Google Java Formatter as an action, and apply it. #485 (ruebot)
- Add Python implementation of SaveBytes. #482 (ruebot)
- Bump xercesImpl from 2.11.0 to 2.12.0 #481 (dependabot[bot])
- [Skip Travis] Trim README down given aut.docs.archivesunleashed.org #480 (ruebot)
- Spark 3.0.0 + Java 11 support. #375 (ruebot)