nscheuner/ISCRAP

Quick Guide - iScrap - Subtitle Scraper & More

iScrap

Subtitle scraper and publisher - Academic project - HEIG-VD 2016

Out of the box it supports the following features:

  • Amazon EC2 instance creation and handling, with a 24h IP blacklist
  • Subtitle download over SSH; the error count is incremented on redirection
  • MovieCollection aggregator
  • IMDb and opensubtitles.org scraper; fetches the YouTube video ID of the movie trailer
  • Database seeding from the opensubtitles.org list
  • Create and update methods for WordPress posts
  • Handlebars template for WordPress posts
  • Subtitle publishing according to a website scope stored in the database
  • Image and zip file storage in the database
  • WordPress posts retrieve subtitles from the database
  • Comprehensive logs
  • Error handling
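
The 24h IP blacklist mentioned in the first feature can be pictured as a map from an IP address to the time it was blacklisted. The sketch below is only an illustration under that assumption: `IpBlacklist`, `add`, and `isBlocked` are hypothetical names and are not taken from the project's code.

```javascript
// Hypothetical sketch of a 24h IP blacklist; all names are illustrative only.
const DAY_MS = 24 * 60 * 60 * 1000;

class IpBlacklist {
  constructor(ttlMs = DAY_MS) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // ip -> timestamp (ms) when it was blacklisted
  }

  // Record that an IP was blocked at time `now` (defaults to the current time).
  add(ip, now = Date.now()) {
    this.entries.set(ip, now);
  }

  // True while the IP is still inside its blacklist window.
  isBlocked(ip, now = Date.now()) {
    const since = this.entries.get(ip);
    if (since === undefined) return false;
    if (now - since >= this.ttlMs) {
      this.entries.delete(ip); // entry expired, forget it
      return false;
    }
    return true;
  }
}
```

An instance handler could call `isBlocked(ip)` before reusing an address and request a fresh instance when it returns `true`.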

Demo

Check out the WordPress demo, with more than 10,000 movies posted: http://149.202.172.22

Tech specs

This project has been deployed on an Ubuntu 16.04 virtual server and should be fully compatible with it.
Database: MongoDB 3.2
PHP: 7.x

Installation

projectdir$ npm install app.js

Monkii override

A small override is needed to avoid a casting issue when a custom id is used. Replace this function, at line 53 in lib/collection:

function (str) {
  if (null == str) return this.col.id();
  return 'string' == typeof str ? this.col.id(str) : str;
};

with this:

function (str) { return str; };
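
To make the effect of the override concrete, here is a minimal sketch comparing the two functions. `fakeCol` is a hypothetical stand-in for the real collection object (whose `id()` comes from the MongoDB driver), and the custom id value is an assumption chosen for illustration.

```javascript
// Hypothetical stand-in for the collection; the real col.id() is provided by the driver.
const fakeCol = {
  id: (str) => ({ castFrom: str === undefined ? null : str }),
};

// Behaviour before the override: string ids are cast through col.id().
function originalId(col, str) {
  if (null == str) return col.id();
  return 'string' == typeof str ? col.id(str) : str;
}

// Behaviour after the override: the id is returned untouched.
function overriddenId(str) {
  return str;
}

const customId = 'tt0111161'; // example custom string id (an assumption)
console.log(typeof originalId(fakeCol, customId)); // object
console.log(typeof overriddenId(customId));        // string
```

With the override in place, a custom string id reaches the database as written instead of being cast first.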

Credential

YouTube API key (Google developer account) for YouTube trailer matching
WordPress user/password (JSON Basic Authentication needed)
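
As an illustration of how the WordPress credentials might be used, the sketch below builds an HTTP Basic Authentication header and the options for a post-creation request, without sending anything. The function names, the host, and the `/wp-json/wp/v2/posts` path are assumptions for the example, not taken from the project's code.

```javascript
// Build the HTTP Basic Authentication header value from a user name and password.
function basicAuthHeader(user, password) {
  const token = Buffer.from(`${user}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}

// Assemble (but do not send) the options for a post-creation request.
// Host and path are assumptions made for illustration.
function buildPostRequest(host, user, password) {
  return {
    method: 'POST',
    host: host,
    path: '/wp-json/wp/v2/posts',
    headers: {
      'Content-Type': 'application/json',
      Authorization: basicAuthHeader(user, password),
    },
  };
}

console.log(basicAuthHeader('user', 'pass')); // Basic dXNlcjpwYXNz
```

Basic Authentication sends the password in an easily decoded form, so it is best kept to HTTPS or a trusted network.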

Wordpress installation

A regular, up-to-date WordPress installation is sufficient, but the following plugins are required:
WP REST API
JSON Basic Authentication
MCE Table Buttons (only for design)
Also, to download subtitles from MongoDB in PHP 7, beware of the native driver change:
```php
\MongoClient -> \MongoDB\Client
\MongoCollection -> \MongoDB\Collection
```

Usage

By default, the app counts the iterations of its main loop and finishes after the first one:

if (count == 1) finished();
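
A minimal sketch of such a counted loop is shown below. `runMainLoop`, `step`, and `maxLoops` are hypothetical names; only `count` and `finished` appear in the line above, and the real app's loop may be structured differently.

```javascript
// Hypothetical sketch: run `step` repeatedly and call `finished` after `maxLoops` passes.
function runMainLoop(step, finished, maxLoops = 1) {
  let count = 0;
  while (count < maxLoops) {
    step(count); // one pass of the scraping/publishing work
    count += 1;
  }
  finished(count);
  return count;
}

// Default behaviour: a single pass, then finished() is called.
runMainLoop(
  (i) => console.log('loop', i),
  (n) => console.log('finished after', n, 'loop(s)')
);
```

Raising `maxLoops` in this sketch would let the app process several batches before stopping.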

ToDo

templateFr.html -> handle the case where there is no image
awsManager -> ? if blacklist
zipZupload -> handle the error if there are no files
dbSeeder:csv2json -> random(?) conversion error
dbSeeder -> huge file handling is not done here
lib/database -> update and clean the lib, implement $upsert, handle collection "subtitlesToHandleManualy"
