org.apache.nutch.crawl
Class Injector

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.crawl.Injector
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class Injector
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

This class takes a flat file of URLs and adds them to the database of pages to be crawled. Useful for bootstrapping the system.


Nested Class Summary
static class Injector.InjectMapper
          Normalizes and filters injected URLs.
static class Injector.InjectReducer
          Combines multiple new entries for a URL into one.
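
The two nested classes describe a map step (normalize and filter each injected URL) and a reduce step (combine multiple entries for the same URL). The sketch below illustrates that idea in plain Java, without Hadoop or Nutch on the classpath. It is not the Nutch implementation: the class name InjectSketch, both method names, and the specific rules (lower-casing scheme and host, dropping the fragment, accepting only http and https) are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the map/reduce pair; not part of Nutch.
class InjectSketch {

    // "Map" step: returns a normalized URL, or null if the line is rejected.
    // The rules here are assumed for illustration only.
    static String normalizeAndFilter(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null; // blank line in the seed file
        }
        URI uri;
        try {
            uri = new URI(trimmed);
        } catch (URISyntaxException e) {
            return null; // unparsable entry
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return null;
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            return null; // assumed filter: web URLs only
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // treat "http://host" and "http://host/" as the same page
        }
        StringBuilder out = new StringBuilder();
        out.append(scheme).append("://").append(host.toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            out.append(':').append(uri.getPort());
        }
        out.append(path);
        if (uri.getRawQuery() != null) {
            out.append('?').append(uri.getRawQuery());
        }
        return out.toString(); // fragment is intentionally dropped
    }

    // "Reduce" step: duplicate entries for one URL collapse into a single entry,
    // keeping first-seen order.
    static List<String> inject(List<String> seedLines) {
        Set<String> unique = new LinkedHashSet<>();
        for (String line : seedLines) {
            String url = normalizeAndFilter(line);
            if (url != null) {
                unique.add(url);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

Calling InjectSketch.inject with the lines of a seed file returns one entry per distinct accepted URL. In the real class, the equivalent work is done by the mapper and reducer as part of a Hadoop job.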
 
Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
Injector()
           
Injector(org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
 void inject(org.apache.hadoop.fs.Path crawlDb, org.apache.hadoop.fs.Path urlDir)
           
static void main(String[] args)
           
 int run(String[] args)
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

Injector

public Injector()

Injector

public Injector(org.apache.hadoop.conf.Configuration conf)
Method Detail

inject

public void inject(org.apache.hadoop.fs.Path crawlDb,
                   org.apache.hadoop.fs.Path urlDir)
            throws IOException
Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
Exception


Copyright © 2006 The Apache Software Foundation