Issues: archivesunleashed/aut
Add EscapeHTML Function for ExtractLinks
enhancement
feature
#266
by ianmilligan1
was closed Sep 13, 2018
Add method for unknown extensions in binary extractions
DataFrames
enhancement
resolve before 0.18.0
Scala
#343
by ruebot
was closed Aug 18, 2019
Test aut with Apache Spark 2.4.0
discussion
enhancement
on hold
#295
by ruebot
was closed Jul 17, 2019
Spreadsheet binary object extraction
DataFrames
enhancement
feature
Scala
#303
by ruebot
was closed Aug 16, 2019
Doc binary object extraction
DataFrames
enhancement
feature
Scala
#304
by ruebot
was closed Aug 16, 2019
PDF binary object extraction
DataFrames
enhancement
feature
Scala
#302
by ruebot
was closed Aug 12, 2019
Video binary object extraction
DataFrames
enhancement
feature
Scala
#306
by ruebot
was closed Aug 13, 2019
Audio binary object extraction
DataFrames
enhancement
feature
Scala
#307
by ruebot
was closed Aug 13, 2019
Powerpoint binary object extraction
DataFrames
enhancement
feature
Scala
#305
by ruebot
was closed Aug 16, 2019
Method to perform finer-grained selection of ARCs and WARCs
enhancement
in progress
RA-Task
#247
by lintool
was closed May 24, 2022
Replace hashing of unique ids with .zipWithUniqueId()
enhancement
#243
by greebie
was closed Nov 22, 2018
Update PlainTextExtractor to just extract text
App
DataFrames
enhancement
Scala
#452
by ruebot
was closed Apr 22, 2020
Add alt text column to imageGraph (imageLinks)
DataFrames
enhancement
Scala
#420
by ruebot
was closed Feb 10, 2020
Discussion: Restyle UDFs in the context of DataFrames
DataFrames
enhancement
rdd
Scala
#425
by lintool
was closed Mar 18, 2020
UDFs that filter on url should also filter on src
DataFrames
enhancement
Scala
#418
by ruebot
was closed Feb 12, 2020
Replace Java ARC/WARC record processing library
enhancement
Java
#494
by ruebot
was closed May 24, 2022
Replace scala-uri library from ExtractDomain and just parse public_suffix_list.dat
clean-up
enhancement
#521
by ruebot
was closed Nov 1, 2021
Adding getCrawlYear in ArchiveRecords, resolves #104
enhancement
#105
by ianmilligan1
was merged Oct 26, 2017
PySpark support for core AUT functionality. #12, #13.
enhancement
#100
by MapleOx
was closed Dec 5, 2017
Changing keepDate to allow multiple dates, would close #108
enhancement
#161
by ianmilligan1
was merged Jan 8, 2018