Unable to set up `ExtractorChrome`, fails with `Can't create bean 'fetchProcessors'; Can't create bean 'crawlController'; Can't create bean 'extractorChrome'` #560

vnznznz · 2023-06-27T08:01:44Z

I'm trying to add the Chrome Extractor to the default crawl config according to the docs, but it fails with `Can't create bean 'fetchProcessors'; Can't create bean 'crawlController'; Can't create bean 'extractorChrome'`.

Relevant parts of the config:

  <!-- Chrome Extractor -->
  <bean id="extractorChrome" class="org.archive.modules.extractor.ExtractorChrome">
    <!-- Set your desired properties for the Chrome extractor -->
    <property name="executable" value="chromium-browser" />
    <property name="commandLineOptions">
      <list>
        <value>--headless=new</value>
        <value>--remote-allow-origins=*</value>
      </list>
    </property>
  </bean>

 <bean id="fetchProcessors" class="org.archive.modules.FetchChain">
  <property name="processors">
   <list>
    <!-- re-check scope, if so enabled... -->
    <ref bean="preselector"/>
    <!-- ...then verify or trigger prerequisite URIs fetched, allow crawling... -->
    <ref bean="preconditions"/>
    <!-- ...fetch if DNS URI... -->
    <ref bean="fetchDns"/>
    <!-- <ref bean="fetchWhois"/> -->
    <!-- ...fetch if HTTP URI... -->
    <ref bean="fetchHttp"/>

    <ref bean="extractorChrome" />

    <!-- ...extract outlinks from HTTP headers... -->
    <ref bean="extractorHttp"/>
    <!-- ...extract sitemap urls from robots.txt... -->
    <ref bean="extractorRobotsTxt"/>
    <!-- ...extract links from sitemaps... --> 
    <ref bean="extractorSitemap"/>
    <!-- ...extract outlinks from HTML content... -->
    <ref bean="extractorHtml"/>
    <!-- ...extract outlinks from CSS content... -->
    <ref bean="extractorCss"/>
    <!-- ...extract outlinks from Javascript content... -->
    <ref bean="extractorJs"/>
    <!-- ...extract outlinks from Flash content... -->
    <ref bean="extractorSwf"/>
   </list>
  </property>
 </bean>

 <!-- CRAWLCONTROLLER: Control interface, unifying context -->
 <bean id="crawlController" 
   class="org.archive.crawler.framework.CrawlController">
  <!-- <property name="maxToeThreads" value="25" /> -->
  <!-- <property name="pauseAtStart" value="true" /> -->
  <!-- <property name="runWhileEmpty" value="false" /> -->
  <!-- <property name="recorderInBufferBytes" value="524288" /> -->
  <!-- <property name="recorderOutBufferBytes" value="16384" /> -->
  <!-- <property name="scratchDir" value="scratch" /> -->
 </bean>

ato · 2023-06-27T08:10:23Z

Maintainer

There should be a larger stack trace in one of the log files with more details about why it failed to create the bean.
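
When scanning a long log for that detail, it can help to pull out only the `Caused by:` lines, since the last one in a Java stack trace is the innermost cause. The following is a minimal sketch, not part of Heritrix; the class and method names are hypothetical, and the sample lines are shortened versions of the trace quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the "Caused by:" entries from log lines.
class CausedByChain {
    private static final String MARKER = "Caused by: ";

    /** Returns the text after each "Caused by: " marker, outermost cause first. */
    static List<String> causes(List<String> logLines) {
        List<String> result = new ArrayList<>();
        for (String line : logLines) {
            int at = line.indexOf(MARKER);
            if (at >= 0) {
                // Drop the syslog-style prefix and keep only the exception text.
                result.add(line.substring(at + MARKER.length()).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample lines, modeled on the trace posted in this thread.
        List<String> sample = List.of(
            "heritrix[899796]: Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'fetchProcessors'",
            "heritrix[899796]:         at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)",
            "heritrix[899796]: Caused by: org.springframework.beans.factory.BeanCurrentlyInCreationException: Error creating bean with name 'extractorChrome'"
        );
        List<String> chain = causes(sample);
        // The last entry is the innermost (root) cause.
        System.out.println("root cause: " + chain.get(chain.size() - 1));
    }
}
```

To run it against a real log, the sample list could be replaced with `java.nio.file.Files.readAllLines(...)` on whichever log file contains the trace.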

Note that `ExtractorChrome` is part of `heritrix-contrib`, not the main distribution of Heritrix, so if you're not using contrib, that could be why. It's also a bit half-baked and not really suitable for production use yet. I should probably add a warning to the docs about that.
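
One quick way to rule out the missing-contrib case is to check whether the class can be loaded at all. This is a rough sketch, assuming it is compiled and run with the same classpath as the Heritrix installation (how that classpath is assembled is not covered here); `ContribCheck` and `isOnClasspath` are hypothetical names, and only the class name itself comes from the config above.

```java
// Hypothetical diagnostic: reports whether a class is visible on the classpath.
class ContribCheck {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, run no static initializers.
            Class.forName(className, false, ContribCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable (for example a missing dependency).
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.archive.modules.extractor.ExtractorChrome";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

A `false` result would point to the contrib jar not being present; a `true` result means the failure lies elsewhere, for example in how the beans are wired.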

2 replies

vnznznz Jun 27, 2023
Author

I solved it by setting `lazy-init="true"` on the bean! 🎉

  <!-- Chrome Extractor -->
  <bean id="extractorChrome" class="org.archive.modules.extractor.ExtractorChrome" lazy-init="true">
    <!-- Set your desired properties for the Chrome extractor -->
    <property name="executable" value="chromium-browser" />
    <property name="commandLineOptions">
      <list>
        <value>--headless=new</value>
        <value>--remote-allow-origins=*</value>
      </list>
    </property>
  </bean>

Here are parts of the stack trace, so future Google searches might find this thread:

Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]: Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'fetchProcessors' defined in URL [file:/home/wct/heritrix3/heritrix-3.4.0-20220727/./jobs/chrome/crawler-beans.cxml]: Cannot resolve reference to bean 'extractorChrome' while setting bean property 'processors' with key [4]; nested exception is org.springframework.beans.factory.BeanCurrentlyInCreationException: Error creating bean with name 'extractorChrome': Requested bean is currently in creation: Is there an unresolvable circular reference?
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:342)                                     
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:113)       
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveManagedList(BeanDefinitionValueResolver.java:428)                                  
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:173)                              
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1707)
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1452)                         
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:619)                                                                                                                          
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)                                                                                                                            
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)                            
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)                                                                                                                                      
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)                                                                                                                                                           
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)                                       
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)                                                                                                                                                   
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1389)                         
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1309)                                       
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.resolveMethodArguments(AutowiredAnnotationBeanPostProcessor.java:759)
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         ... 91 more                                                                                                                                                                                                                                                        
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]: Caused by: org.springframework.beans.factory.BeanCurrentlyInCreationException: Error creating bean with name 'extractorChrome': Requested bean is currently in creation: Is there an unresolvable circular reference?
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.beforeSingletonCreation(DefaultSingletonBeanRegistry.java:355)                                                                                                                           
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:227)                                                                                                                                      
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)                                                              
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)                                                                                                                                                             
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:330)                                                                                                                                    
Jun 27 08:32:53 rocky-8gb-fsn1-1 heritrix[899796]:         ... 106 more
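
The "unresolvable circular reference" message suggests a cycle in the bean dependency graph. As an illustration only, the sketch below runs a plain depth-first search over a hand-written graph. The edges are an assumption inferred from the order of the "Can't create bean" messages and the `<ref bean="extractorChrome" />` entry in `fetchProcessors`; they are not taken from the Heritrix source. The code does not model Spring's actual resolution, nor why `lazy-init` changes the outcome. `BeanCycle` and `findCycle` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: depth-first search for a cycle in a dependency graph.
class BeanCycle {

    /** Returns one cycle as a path (first and last element equal), or an empty list. */
    static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, new ArrayList<>(), done);
            if (cycle != null) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String bean, Map<String, List<String>> deps,
                                      List<String> path, Set<String> done) {
        int seenAt = path.indexOf(bean);
        if (seenAt >= 0) {
            // The bean is already on the current path, i.e. "currently in creation".
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(bean);
            return cycle;
        }
        if (done.contains(bean)) {
            return null; // fully explored earlier, no cycle reachable from here
        }
        path.add(bean);
        for (String dep : deps.getOrDefault(bean, List.of())) {
            List<String> cycle = visit(dep, deps, path, done);
            if (cycle != null) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(bean);
        return null;
    }

    public static void main(String[] args) {
        // Assumed edges, inferred from the error messages in this thread.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("extractorChrome", List.of("crawlController"));
        deps.put("crawlController", List.of("fetchProcessors"));
        deps.put("fetchProcessors", List.of("extractorChrome"));
        System.out.println(String.join(" -> ", findCycle(deps)));
    }
}
```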

Answer selected by vnznznz

vnznznz Jun 27, 2023
Author

But yeah @ato, as someone not used to the Java Spring way of doing things, setting up the contrib stuff like AMQP etc. was quite a challenge!
