FAST Static Ranking
May 23, 2012
FAST has two ways of updating the ranking of documents: static ranking and dynamic ranking. Static ranking is applied at crawl time, during pipeline processing, and is the same for a given document on every query executed on FAST. Dynamic ranking is applied at query time using XRANK expressions based on the search terms and on custom-defined rules.
To update the ranking using the static approach you need to:
- Add a custom relevancy field to your data source.
- Create a new pipeline stage to update the initial ranking value.
- Add the new pipeline stage to a pipeline.
- Run the crawler for your data source.
- Execute a query to test the results ranking.
Add a custom field to your data source
This field should represent the weight to be applied to the relevancy. There are several ways to calculate how relevant a document is to your usage profile. The most common is to monitor the queries executed and the results users click, and to compute statistics based on this data. Unfortunately, FAST doesn’t provide an out-of-the-box way to collect this type of data and update the ranking statistics. The only feature FAST ESP + Impulse provides for getting data from queries is Impulse Reporting, and it offers search result statistics only, not click tracking.
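As a rough illustration of the click-based approach, the sketch below turns a click log into a per-document boost that could be stored in the custom field. The log format (pairs of query and clicked document id), the function name compute_click_boosts and the max_boost scale are all assumptions made for this example. Since FAST does not supply click data, you would have to collect it yourself.

```python
from collections import Counter


def compute_click_boosts(click_log, max_boost=100000):
    """Map each clicked document id to a boost proportional to its clicks.

    click_log is a hypothetical list of (query, doc_id) tuples gathered by
    your own click tracking; max_boost is an arbitrary scale for this sketch.
    """
    clicks = Counter(doc_id for _query, doc_id in click_log)
    if not clicks:
        return {}
    top = max(clicks.values())
    # The most clicked document gets max_boost; the others scale linearly.
    return {doc_id: (count * max_boost) // top
            for doc_id, count in clicks.items()}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_log = [
        ("vacation policy", "doc-1"),
        ("vacation policy", "doc-1"),
        ("expense report", "doc-2"),
        ("holiday calendar", "doc-1"),
    ]
    for doc_id, boost in sorted(compute_click_boosts(sample_log).items()):
        print(doc_id, boost)
```

Linear scaling is only one option. A logarithmic scale or a cap per document may suit data where a few documents receive most of the clicks.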
Create a new pipeline stage to update the initial ranking value
To create a new pipeline stage you will need two files: a Python code file and an XML configuration file. The stage reads your custom field (customrankboost in this example) and changes the value of the hwboost field. The default value of the hwboost field is 500,000. The idea is to use the custom field to apply a boost or a penalty to hwboost.
Examples of the files:
- Python code file.
In this code, the value of the customrankboost field is added to the hwboost field. Your implementation can do whatever you need to update the initial ranking value.
The Python file has a .py extension and should be placed in the [FAST Root]\esp\lib\python2.3\processors\ folder.
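A minimal sketch of the boost logic such a stage might contain is shown here. It is not the FAST API: a plain dict stands in for the FAST document object, and the function name apply_rank_boost, the fallback to the default of 500,000 and the clamp at zero are assumptions for illustration. In a real stage this logic would be called from the stage's document-processing method, using FAST's own accessors to read and write the fields.

```python
DEFAULT_HWBOOST = 500000  # default hwboost value stated in this article


def apply_rank_boost(document, boost_field="customrankboost",
                     rank_field="hwboost"):
    """Add the custom boost (positive or negative) to the static rank.

    `document` is a plain dict standing in for the FAST document object.
    The new value is stored back on the document and also returned.
    """
    current = int(document.get(rank_field, DEFAULT_HWBOOST))
    try:
        boost = int(document.get(boost_field))
    except (TypeError, ValueError):
        # Missing or non-numeric boost: leave the rank unchanged.
        boost = 0
    # Assumption: a negative rank is not wanted, so clamp at zero.
    new_value = max(0, current + boost)
    document[rank_field] = new_value
    return new_value


if __name__ == "__main__":
    doc = {"customrankboost": "25000"}  # illustrative input
    print(apply_rank_boost(doc))
```

Keeping the arithmetic in a plain function makes it easy to test outside FAST before wiring it into a stage. The sketch sticks to basic syntax, but it has not been checked against the Python version bundled with FAST.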
- An XML pipeline configuration file.
The XML file should be placed in the [FAST Root]\esp\etc\processors\ folder.
Add the new pipeline stage to a pipeline
After the files are in place, restart the proc_servers processes running on the FAST box so that the stage becomes available to add to a FAST ESP pipeline.
Go to the FAST ESP Admin web site.
Go to the Document Processing tab and check if your pipeline stage is available.
On the Document Processing tab, edit a pipeline and add your custom stage to it.
Now your pipeline is ready to process the custom ranking field and change the initial ranking value for your documents.
Run the crawler for your data source
Make sure the custom field in your data source is populated, then run the crawler so that the documents pass through the pipeline containing the custom ranking stage.
Execute a query to test the results ranking
After the crawl is done and the documents have been processed, execute a query and test the ranking. You should notice a change in the ranking field of the results for the documents affected by the custom field in your data source.
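One way to make this check less manual is to record the ranking value of a few documents before and after the crawl and compare them. The sketch below assumes you have already copied those values into two dicts keyed by document id. How you read them from the query results depends on your setup and is not shown, and the name rank_changes is hypothetical.

```python
def rank_changes(before, after):
    """Return {doc_id: delta} for documents whose rank value changed.

    `before` and `after` are hypothetical dicts mapping a document id to the
    rank value read from the query results before and after the crawl.
    Documents that appear in only one of the two dicts are ignored.
    """
    changes = {}
    for doc_id, new_rank in after.items():
        old_rank = before.get(doc_id)
        if old_rank is not None and new_rank != old_rank:
            changes[doc_id] = new_rank - old_rank
    return changes


if __name__ == "__main__":
    # Made-up values, for illustration only.
    before = {"doc-1": 500000, "doc-2": 500000}
    after = {"doc-1": 525000, "doc-2": 500000}
    print(rank_changes(before, after))
```

Documents with a non-zero custom field should show up in the result, and documents without one should not.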