Scrape by reference #37

Open
wants to merge 8 commits into
base: main
18 changes: 18 additions & 0 deletions lib/uk_planning_scraper/authorities.csv
@@ -1,8 +1,10 @@
authority_name,url,tags
Aberdeen,https://publicaccess.aberdeencity.gov.uk/online-applications/search.do?action=advanced,scotland
Aberdeenshire,https://upa.aberdeenshire.gov.uk/online-applications/search.do?action=advanced,scotland
Adur and Worthing,https://planning.adur-worthing.gov.uk/online-applications/search.do?action=advanced,england
Allerdale,https://planning.allerdale.gov.uk/portal/servlets/ApplicationSearchServlet,england
Amber Valley,https://www.ambervalley.gov.uk/environment-and-planning/planning/development-management/planning-applications/view-a-planning-application.aspx,england
Argyll and Bute,https://publicaccess.argyll-bute.gov.uk/online-applications/search.do?action=advanced,scotland
Arun,https://www.arun.gov.uk/weekly-lists,england
Ashfield,https://www2.ashfield.gov.uk/cfusion/Planning/plan_findfile.cfm,england
Ashford,http://planning.ashford.gov.uk/planning/Default.aspx?new=true,england
@@ -25,18 +27,27 @@ Bury,https://planning.bury.gov.uk/online-applications/search.do?action=advanced,
Calderdale,https://portal.calderdale.gov.uk/online-applications/search.do?action=advanced,england westyorkshire
Camden,http://planningrecords.camden.gov.uk/Northgate/PlanningExplorer17/GeneralSearch.aspx,londonboroughs london
Cardiff,https://planningonline.cardiff.gov.uk/online-applications/search.do?action=advanced,wales
Carlisle,https://publicaccess.carlisle.gov.uk/online-applications/search.do?action=advanced,england
Cheshire West and Chester,https://pa.cheshirewestandchester.gov.uk/online-applications/search.do?action=advanced,chester cheshire england
City of London,http://www.planning2.cityoflondon.gov.uk/online-applications/search.do?action=advanced,london innerlondon northlondon england
Conwy,http://www.conwy.gov.uk/en/Resident/Planning-Building-Control-and-Conservation/Planning-Applications/Planning-Explorer.aspx,wales
Cornwall,http://planning.cornwall.gov.uk/online-applications/search.do?action=advanced,england
Croydon,http://publicaccess2.croydon.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
Cornwall,http://planning.cornwall.gov.uk/online-applications/search.do?action=advanced,england
Comhairle nan Eilean Siar,http://planning.cne-siar.gov.uk/PublicAccess/search.do?action=advanced,scotland
County Durham,https://publicaccess.durham.gov.uk/online-applications/search.do?action=advanced,england
Craven,https://publicaccess.cravendc.gov.uk/online-applications/search.do?action=advanced,england
Darlington,https://publicaccess.darlington.gov.uk/online-applications/search.do?action=advanced,england
Doncaster,https://planning.doncaster.gov.uk/online-applications/search.do?action=advanced,england southyorkshire
Dumfries and Galloway,https://eaccess.dumgal.gov.uk/online-applications/search.do?action=advanced,scotland
Ealing,https://pam.ealing.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
East Riding of Yorkshire,https://newplanningaccess.eastriding.gov.uk/newplanningaccess/search.do?action=advanced,england
East Lothian,https://pa.eastlothian.gov.uk/online-applications/search.do?action=advanced,scotland
East Lindsey,https://publicaccess.e-lindsey.gov.uk/online-applications/search.do?action=advanced,england
Edinburgh,https://citydev-portal.edinburgh.gov.uk/idoxpa-web/search.do?action=advanced,scotland
Enfield,https://planningandbuildingcontrol.enfield.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
Epsom and Ewell,http://eplanning.epsom-ewell.gov.uk/online-applications/search.do?action=advanced,surrey england
Fife,https://planning.fife.gov.uk/online/search.do?action=advanced,scotland
Glasgow,https://publicaccess.glasgow.gov.uk/online-applications/search.do?action=advanced,scotland
Greenwich,https://planning.royalgreenwich.gov.uk/online-applications/search.do?action=advanced,london innerlondon southlondon england londonboroughs
Hackney,http://planning.hackney.gov.uk/Northgate/PlanningExplorer/generalsearch.aspx,londonboroughs london
@@ -46,6 +57,7 @@ Haringey,http://www.planningservices.haringey.gov.uk/portal/servlets/Application
Harrow,http://www.harrow.gov.uk/planningsearch/lg/plansearch.page?org.apache.shale.dialog.DIALOG_NAME=planningsearch&Param=lg.Planning&searchType=detailed,london londonboroughs
Havering,http://development.havering.gov.uk/OcellaWeb/planningSearch,london londonboroughs eastlondon outerlondon
Hillingdon,http://planning.hillingdon.gov.uk/OcellaWeb/planningSearch,london londonboroughs westlondon outerlondon
Highland,https://wam.highland.gov.uk/wam/search.do?action=advanced,scotland
Hounslow,http://planning.hounslow.gov.uk/planning_search.aspx,london londonboroughs
Hull,https://www.hullcc.gov.uk/padcbc/publicaccess-live/search.do?action=advanced,england
Islington,http://planning.islington.gov.uk/northgate/planningexplorer/generalsearch.aspx,londonboroughs london
@@ -62,8 +74,10 @@ Merton,http://planning.merton.gov.uk/Northgate/PlanningExplorerAA/GeneralSearch.
Milton Keynes,https://publicaccess2.milton-keynes.gov.uk/online-applications/search.do?action=advanced,england
Newcastle upon Tyne,https://publicaccessapplications.newcastle.gov.uk/online-applications/search.do?action=advanced,england tyneandwear
Newham,https://pa.newham.gov.uk/online-applications/search.do?action=advanced,londonboroughs london londonboroughs
North Ayrshire,https://www.eplanning.north-ayrshire.gov.uk/OnlinePlanning/search.do?action=advanced,scotland
North East Lincolnshire,http://planninganddevelopment.nelincs.gov.uk/online-applications/search.do?action=advanced,england
North Lincolnshire,http://www.planning.northlincs.gov.uk/plan/search/,england
North Norfolk,https://idoxpa.north-norfolk.gov.uk/online-applications/search.do?action=advanced,england
North Somerset,https://planning.n-somerset.gov.uk/online-applications/search.do?action=advanced,england
Northern Ireland,http://epicpublic.planningni.gov.uk/publicaccess/search.do?action=advanced&searchType=Application,northernireland belfast
North Tyneside,https://idoxpublicaccess.northtyneside.gov.uk/online-applications/search.do?action=advanced,england tyneandwear
@@ -75,10 +89,13 @@ Peterborough,https://planpa.peterborough.gov.uk/online-applications/search.do?ac
Plymouth,https://planning.plymouth.gov.uk/online-applications/search.do?action=advanced,england
Poole,https://boppa.poole.gov.uk/online-applications/search.do?action=advanced,england
Portsmouth,http://publicaccess.portsmouth.gov.uk/online-applications/search.do?action=advanced,england
Purbeck,https://planningsearch.purbeck-dc.gov.uk/PlanAppSrch.aspx,england
Redbridge,http://planning.redbridge.gov.uk/swiftlg/apas/run/wphappcriteria.display,london londonboroughs
Richmond,http://www2.richmond.gov.uk/PlanData2/Planning_Report.aspx,london londonboroughs
Rhondda Cynon Taff,https://planningonline.rctcbc.gov.uk/online-applications/search.do?action=advanced,wales
Rochdale,http://publicaccess.rochdale.gov.uk/online-applications/search.do?action=advanced,england greatermanchester
Rutland,https://publicaccess.rutland.gov.uk/online-applications/search.do?action=advanced,england
Scottish Government S36,http://www.energyconsents.scot/ApplicationSearch.aspx?T=2,scotland
Salford,http://publicaccess.salford.gov.uk/publicaccess/search.do?action=advanced,england greatermanchester
Sefton,https://pa.sefton.gov.uk/online-applications/search.do?action=advanced,england merseyside liverpoolcityregion
Sheffield,https://planningapps.sheffield.gov.uk/online-applications/search.do?action=advanced,england
@@ -87,6 +104,7 @@ Southend-on-Sea,https://publicaccess.southend.gov.uk/online-applications/search.
South Downs,https://planningpublicaccess.southdowns.gov.uk/online-applications/search.do?action=advanced,nationalparks england
Southampton,https://planningpublicaccess.southampton.gov.uk/online-applications/search.do?action=advanced,england
South Gloucestershire,https://developments.southglos.gov.uk/online-applications/search.do?action=advanced,england
South Lanarkshire,https://publicaccess.southlanarkshire.gov.uk/online-applications/search.do?action=advanced,scotland
South Tyneside,http://planning.southtyneside.info/Northgate/PlanningExplorer/GeneralSearch.aspx,england tyneandwear
Southwark,https://planning.southwark.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
St. Helens,https://publicaccess.sthelens.gov.uk/online-applications/search.do?action=advanced,england merseyside liverpoolcityregion
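Review note: the new rows follow the existing `authority_name,url,tags` layout, with tags separated by spaces inside the third column. A minimal sketch of how such rows could be filtered by tag is below. The helper `names_tagged` is hypothetical and is not the gem's own loader; the sample rows are copied from this diff.

```ruby
require 'csv'

# Three rows copied from authorities.csv in this PR; the real file is longer.
SAMPLE = <<~ROWS
  authority_name,url,tags
  Fife,https://planning.fife.gov.uk/online/search.do?action=advanced,scotland
  Highland,https://wam.highland.gov.uk/wam/search.do?action=advanced,scotland
  Rutland,https://publicaccess.rutland.gov.uk/online-applications/search.do?action=advanced,england
ROWS

# Hypothetical helper (an assumption, not the gem's API): returns the names of
# all rows whose space-separated tags column contains the given tag.
def names_tagged(csv_text, tag)
  CSV.parse(csv_text, headers: true)
     .select { |row| row['tags'].to_s.split.include?(tag) }
     .map { |row| row['authority_name'] }
end

puts names_tagged(SAMPLE, 'scotland').inspect
```

Splitting the tags column on whitespace means a repeated tag on one row (as on the Newham line) does not change which authorities match.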
6 changes: 6 additions & 0 deletions lib/uk_planning_scraper/authority_scrape_params.rb
@@ -51,6 +51,12 @@ def decided_days(n)
self
end

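# Search for a single application by its council reference.
# Surrounding whitespace is stripped; returns self so calls can be chained.
def council_reference(s)
  check_class(s, String)
  @scrape_params[:council_reference] = s.strip
  self
end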

def applicant_name(s)
unless system == 'idox'
raise NoMethodError.new("applicant_name is only implemented for Idox. \
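Review note: `council_reference` follows the same chainable pattern as the other parameter methods (validate, store, return `self`). The sketch below isolates that pattern. `ScrapeParamsSketch` is a hypothetical stand-in class, and the body of `check_class` is an assumption, since its implementation is not part of this diff.

```ruby
# Hypothetical stand-in for the object that holds @scrape_params; only the
# validate/store/return-self shape mirrors the method added in this PR.
class ScrapeParamsSketch
  attr_reader :scrape_params

  def initialize
    @scrape_params = {}
  end

  def council_reference(s)
    check_class(s, String)
    @scrape_params[:council_reference] = s.strip
    self # returning self is what allows further chained calls
  end

  private

  # Assumed behaviour of check_class: reject anything that is not a klass.
  def check_class(value, klass)
    return if value.is_a?(klass)

    raise TypeError, "Expected #{klass}, got #{value.class}"
  end
end

params = ScrapeParamsSketch.new.council_reference("  BH2017/04225 \n")
puts params.scrape_params.inspect
```

In the gem itself, the spec in this PR chains the call as `UKPlanningScraper::Authority.named(authority_name).council_reference(council_reference)` and then calls `scrape`.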
131 changes: 74 additions & 57 deletions lib/uk_planning_scraper/idox.rb
@@ -41,6 +41,7 @@ def scrape_idox(params, options)
form.send(:"date(applicationDecisionStart)", params[:decided_from].strftime(date_format)) if params[:decided_from]
form.send(:"date(applicationDecisionEnd)", params[:decided_to].strftime(date_format)) if params[:decided_to]

form.send(:"searchCriteria\.reference", params[:council_reference])
form.send(:"searchCriteria\.description", params[:keywords])

# Some councils don't have the applicant name on their form, eg Bexley
@@ -115,68 +116,84 @@ def scrape_idox(params, options)

if res.code == '200' # That's a String not an Integer, ffs
# Parse the summary tab for this app

app.scraped_at = Time.now

# The Documents tab doesn't show if there are no documents (we get li.nodocuments instead)
# Bradford has #tab_documents but without the document count on it
app.documents_count = 0

if documents_link = res.at('.associateddocument a')
if documents_link.inner_text.match(/\d+/)
app.documents_count = documents_link.inner_text.match(/\d+/)[0].to_i
app.documents_url = base_url + documents_link[:href]
end
elsif documents_link = res.at('#tab_documents')
if documents_link.inner_text.match(/\d+/)
app.documents_count = documents_link.inner_text.match(/\d+/)[0].to_i
app.documents_url = base_url + documents_link[:href]
end
end

# We need to find values in the table by using the th labels.
# The row indexes/positions change from site to site (or even app to app) so we can't rely on that.

res.search('#simpleDetailsTable tr').each do |row|
key = row.at('th').inner_text.strip
value = row.at('td').inner_text.strip

case key
when 'Reference'
app.council_reference = value
when 'Alternative Reference'
app.alternative_reference = value unless value.empty?
when 'Planning Portal Reference'
app.alternative_reference = value unless value.empty?
when 'Application Received'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Registered'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Validated'
app.date_validated = Date.parse(value) if value.match(/\d/)
when 'Address'
app.address = value unless value.empty?
when 'Proposal'
app.description = value unless value.empty?
when 'Status'
app.status = value unless value.empty?
when 'Decision'
app.decision = value unless value.empty?
when 'Decision Issued Date'
app.date_decision = Date.parse(value) if value.match(/\d/)
when 'Appeal Status'
app.appeal_status = value unless value.empty?
when 'Appeal Decision'
app.appeal_decision = value unless value.empty?
else
puts "Error: key '#{key}' not found"
end # case
end # each row
parse_summary(app, res)
else
puts "Error: HTTP #{res.code}"
end # if
end # scrape summary tab for apps

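# Direct hit: no results were listed, but the page itself looks like a single
# application page (it has .addressCrumb), so parse it as the summary tab.
if apps.empty? && params[:council_reference] && page.at_css('.addressCrumb')
  app = Application.new
  app.council_reference = params[:council_reference]
  parse_summary(app, page)
  apps << app
end # direct hit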
apps
end # scrape_idox

def parse_summary(app, res)
base_url = @url.match(/(https?:\/\/.+?)\//)[1]

app.scraped_at = Time.now

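unless app.info_url
  # Only set info_url when the summary tab link is actually present;
  # otherwise it would end up as the bare base_url.
  summary_href = res.link_with(id: 'tab_summary')&.href
  app.info_url = "#{base_url}#{summary_href}" if summary_href
end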

# The Documents tab doesn't show if there are no documents (we get li.nodocuments instead)
# Bradford has #tab_documents but without the document count on it
app.documents_count = 0

documents_link = res.at('.associateddocument a') || res.at('#tab_documents')
if documents_link && (count = documents_link.inner_text[/\d+/])
  app.documents_count = count.to_i
  app.documents_url = base_url + documents_link[:href]
end

# We need to find values in the table by using the th labels.
# The row indexes/positions change from site to site (or even app to app) so we can't rely on that.
res.search('#simpleDetailsTable tr').each do |row|
key = row.at('th').inner_text.strip
value = row.at('td').inner_text.strip

case key
when 'Reference'
app.council_reference = value
when 'Alternative Reference'
app.alternative_reference = value unless value.empty?
when 'Planning Portal Reference'
app.alternative_reference = value unless value.empty?
when 'Application Received'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Registered'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Validated'
app.date_validated = Date.parse(value) if value.match(/\d/)
when 'Address'
app.address = value unless value.empty?
when 'Proposal'
app.description = value unless value.empty?
when 'Status'
app.status = value unless value.empty?
when 'Decision'
app.decision = value unless value.empty?
when 'Decision Issued Date'
app.date_decision = Date.parse(value) if value.match(/\d/)
when 'Appeal Status'
app.appeal_status = value unless value.empty?
when 'Appeal Decision'
app.appeal_decision = value unless value.empty?
else
puts "Error: unrecognised summary key '#{key}'"
end # case
end
end
end # class
end
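Review note: `parse_summary` looks values up by their `th` label because row positions vary between sites. The same idea can be written as a lookup table rather than a `case` statement, sketched below. The label-to-field pairs are copied from the diff. The helper `summary_from_rows` and its `[label, value]` input are hypothetical stand-ins for the `th`/`td` text of each row, and the sample date is illustrative rather than taken from a real record. One simplification: empty values are skipped for every text field, whereas the PR always assigns `Reference`.

```ruby
require 'date'

# Labels and target fields copied from the case statement in parse_summary.
TEXT_FIELDS = {
  'Reference' => :council_reference,
  'Alternative Reference' => :alternative_reference,
  'Planning Portal Reference' => :alternative_reference,
  'Address' => :address,
  'Proposal' => :description,
  'Status' => :status,
  'Decision' => :decision,
  'Appeal Status' => :appeal_status,
  'Appeal Decision' => :appeal_decision
}.freeze

DATE_FIELDS = {
  'Application Received' => :date_received,
  'Application Registered' => :date_received,
  'Application Validated' => :date_validated,
  'Decision Issued Date' => :date_decision
}.freeze

# Hypothetical helper: rows is an array of [label, value] string pairs.
def summary_from_rows(rows)
  rows.each_with_object({}) do |(key, value), summary|
    label = key.strip
    text = value.strip

    if (field = TEXT_FIELDS[label])
      summary[field] = text unless text.empty?
    elsif (field = DATE_FIELDS[label])
      # Same guard as the PR: only parse when the cell contains a digit.
      summary[field] = Date.parse(text) if text.match?(/\d/)
    else
      warn "Unrecognised summary label: '#{label}'"
    end
  end
end

rows = [
  ['Reference', ' BH2017/04225 '],
  ['Application Validated', '21 Dec 2017'], # illustrative date
  ['Status', 'Withdrawn'],
  ['Decision', '']
]
puts summary_from_rows(rows).inspect
```

Whether a table is preferable to the `case` statement is a style choice; the table keeps the list of known labels in one place, which may help as more councils with differing labels are added.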
35 changes: 18 additions & 17 deletions spec/authority_spec.rb
@@ -1,21 +1,19 @@
require 'spec_helper'

describe UKPlanningScraper::Authority do

describe '#named' do

let(:authority) { described_class.named(authority_name) }
subject(:authority) { UKPlanningScraper::Authority.named(name) }

context 'when authority exists' do
let(:authority_name) { 'Westminster' }
let(:name) { 'Westminster' }

it 'returns an authority' do
expect(authority).to be_a(UKPlanningScraper::Authority)
end
end

context 'when authority does not exist' do
let(:authority_name) { 'Westmonster' }
let(:name) { 'Westmonster' }

it 'raises an error' do
expect { authority }.to raise_error(UKPlanningScraper::AuthorityNotFound)
@@ -24,11 +22,10 @@
end

describe '#all' do
let(:all) { UKPlanningScraper::Authority.all }

let(:all) { described_class.all }

it 'returns all authorities' do
expect(all.count).to eq(112)
it 'returns more than 100 authorities' do
expect(all.count).to be > 100
end

it 'returns a list of authorities' do
@@ -40,18 +37,22 @@
end

describe '#tagged' do
let (:tagged_london) { described_class.tagged('london') }
let(:authority) { UKPlanningScraper::Authority.tagged(tag) }

context 'when tagged london' do
let(:tag) { 'london' }

it 'returns all London authorities' do
expect(tagged_london.count).to eq(35)
it 'returns all 35 London authorities' do
expect(authority.count).to eq(35)
end
end

let (:tagged_londonboroughs) { described_class.tagged('londonboroughs') }
context 'when tagged londonboroughs' do
let(:tag) { 'londonboroughs' }

it 'returns all London boroughs' do
expect(tagged_londonboroughs.count).to eq(32)
it 'returns all 32 London boroughs' do
expect(authority.count).to eq(32)
end
end

end

end
40 changes: 40 additions & 0 deletions spec/council_reference_spec.rb
@@ -0,0 +1,40 @@
require 'spec_helper'

describe UKPlanningScraper::Authority do
describe 'named+council_reference scrape' do
let(:scraper) { UKPlanningScraper::Authority.named(authority_name).council_reference(council_reference) }

context 'for an existing idox planning reference' do
let(:authority_name) { 'Brighton and Hove' }
let(:council_reference) { 'BH2017/04225' }
subject(:apps) {
VCR.use_cassette("#{self.class.description}") {
scraper.scrape
}
}

it 'returns an app (in the apps array)' do
expect(apps).not_to be_empty
end

it 'has a status of Withdrawn' do
expect(apps.first[:status]).to eql('Withdrawn')
end
end

context 'for a non-existant idox planning reference' do
let(:authority_name) { 'Brighton and Hove' }
let(:council_reference) { 'XYZ123' }
subject(:apps) {
VCR.use_cassette("#{self.class.description}") {
scraper.scrape
}
}

it 'returns an empty apps array' do
expect(apps).to be_empty
end
end
end

end
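Review note: the two contexts above correspond to the two outcomes of the `# direct hit` branch at the end of `scrape_idox`, as far as can be read from the diff. A known reference yields one application even though no results list was parsed, and an unknown reference yields an empty array. The sketch below models only that decision. The function `apps_for` and its keyword arguments are hypothetical, and `on_application_page` stands in for the `.addressCrumb` check.

```ruby
# Hypothetical model of the decision at the end of scrape_idox.
# results:             applications parsed from the search results list
# reference:           the council reference searched for, or nil
# on_application_page: whether the returned page looks like a single
#                      application page (the PR checks for '.addressCrumb')
def apps_for(results, reference:, on_application_page:)
  return results unless results.empty?
  return [] unless reference && on_application_page

  # Direct hit: build one application from the reference that was searched.
  [{ council_reference: reference }]
end

puts apps_for([], reference: 'BH2017/04225', on_application_page: true).inspect
puts apps_for([], reference: 'XYZ123', on_application_page: false).inspect
```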