org.apache.nutch.indexer
Interface IndexingFilter
- All Superinterfaces:
- org.apache.hadoop.conf.Configurable, Pluggable
- All Known Implementing Classes:
- BasicIndexingFilter, CCIndexingFilter, LanguageIndexingFilter, MoreIndexingFilter, RelTagIndexingFilter
public interface IndexingFilter
- extends Pluggable, org.apache.hadoop.conf.Configurable
Extension point for indexing. Lets plugins add metadata fields to the
document being indexed. All plugins found that implement this extension
point are run sequentially on each parse.
Methods inherited from interface org.apache.hadoop.conf.Configurable:
getConf, setConf
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
Document filter(Document doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
throws IndexingException
- Adds fields or otherwise modifies the document that will be indexed for a
parse. Unwanted documents can be removed from indexing by returning a null value.
- Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page URL
datum - crawl datum for the page
inlinks - page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document
should be discarded)
- Throws:
IndexingException
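The contract described above (filters run one after another, and a null return discards the document) can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in rather than Nutch code: `Doc` replaces the real document class, `Filter` is a cut-down version of this interface that takes only a document and a URL string and omits `IndexingException`, and `runFilters` assumes the caller stops at the first null result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexingFilterSketch {

    /** Hypothetical stand-in for the indexed document: field name -> value. */
    static final class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    /**
     * Simplified stand-in for IndexingFilter. The real filter method also
     * receives Parse, CrawlDatum and Inlinks and declares IndexingException.
     */
    interface Filter {
        Doc filter(Doc doc, String url);
    }

    /**
     * Runs the filters in list order, feeding each one the result of the
     * previous one. Assumption: a null result ends the chain and means the
     * document should not be indexed.
     */
    static Doc runFilters(List<Filter> filters, Doc doc, String url) {
        for (Filter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null; // discarded by this filter
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Filter> filters = new ArrayList<>();
        // First filter: adds a field to the document and passes it on.
        filters.add((d, u) -> { d.fields.put("url", u); return d; });
        // Second filter: rejects unwanted pages by returning null.
        filters.add((d, u) -> u.endsWith(".tmp") ? null : d);

        Doc kept = runFilters(filters, new Doc(), "http://example.com/a.html");
        Doc dropped = runFilters(filters, new Doc(), "http://example.com/b.tmp");

        System.out.println(kept.fields.get("url"));
        System.out.println(dropped == null);
    }
}
```

A real plugin would implement `filter(Document, Parse, Text, CrawlDatum, Inlinks)` with the signature shown above and would also supply `getConf` and `setConf` from `Configurable`.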
Copyright © 2006 The Apache Software Foundation