FAST Static Ranking

FAST has two ways of updating the ranking statistics of a document: static ranking and dynamic ranking. Static ranking is computed at crawling time, during pipeline processing, and is the same for a given document across all queries executed on FAST. Dynamic ranking is computed at query time using XRANK expressions based on the search terms and on custom-defined rules.

To update the ranking using the static approach you need to:

  • Add a custom relevancy field to your data source.
  • Create a new pipeline stage to update the initial ranking value.
  • Add the new pipeline stage to a pipeline.
  • Run the crawler for your data source.
  • Execute a query to test the results ranking.

Add a custom field to your data source

This field should represent the weight to be applied to the relevancy. There are several ways to calculate how relevant a document is to your usage profile. The most common one is to monitor the queries executed and the results clicked by the users, and to compute statistics based on this data. Unfortunately, FAST doesn’t provide an out-of-the-box way to collect this type of data and update the ranking statistics. The only feature FAST ESP + Impulse provides to get data from queries is Impulse Reporting, and it offers only search result statistics, not click tracking.
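Just as an illustration (nothing like this ships with FAST), a click-log aggregation can be as simple as the sketch below. It assumes you capture the query and clicked document id pairs yourself, for example in your search front end, and it turns the click counts into a customrankboost value per document. The log format and the scaling factor are assumptions for illustration only.

# Hypothetical sketch: turn click logs you collect yourself into a
# customrankboost value per document. FAST ESP + Impulse does not provide
# this; the log format (query<TAB>doc_id per line) and the scaling factor
# are assumptions for illustration only.
def compute_boosts(click_log_lines, max_boost=100000):
    clicks = {}
    for line in click_log_lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        doc_id = parts[1]
        clicks[doc_id] = clicks.get(doc_id, 0) + 1

    boosts = {}
    if clicks:
        top = max(clicks.values())
        for doc_id, count in clicks.items():
            # Scale click counts linearly into the boost range; tune to taste.
            boosts[doc_id] = int(max_boost * count / top)
    return boosts

if __name__ == "__main__":
    sample = ["red shoes\tDOC-001", "red shoes\tDOC-001", "blue shoes\tDOC-002"]
    print(compute_boosts(sample))  # DOC-001 gets 100000, DOC-002 gets 50000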

Create a new pipeline stage to update the initial ranking value

To create a new pipeline stage you will need two files: a Python code file and an XML configuration file. The stage needs to read your custom field (customrankboost in this example) and change the value of the hwboost field, whose default value is 500,000. The idea is to use the custom field to apply a boost or a penalty to hwboost.

Examples of the files:

Python code file:

In this code, the customrankboost field value is added to the hwboost field. Your implementation can do whatever you need to update the initial ranking value.
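Here is a minimal sketch of what such a stage can look like. The base class and the document accessor names (Processor.Processor, GetValue, Set) are written from memory of the ESP Python document processor API, so treat them as assumptions and compare them with one of the existing stages in the processors folder of your installation.

# Sketch of the stage logic only. The base class and the GetValue/Set
# accessors are assumptions about the ESP Python document processor API;
# compare with an existing stage in your installation before using this.
from processors import Processor

DEFAULT_HWBOOST = 500000  # the default initial static rank value mentioned above

class CustomRankBoost(Processor.Processor):

    def Process(self, docid, document):
        boost = document.GetValue('customrankboost', None)
        hwboost = document.GetValue('hwboost', None)
        try:
            if boost is None:
                boost = 0
            else:
                boost = int(boost)
            if hwboost is None:
                hwboost = DEFAULT_HWBOOST
            else:
                hwboost = int(hwboost)
        except ValueError:
            boost = 0
            hwboost = DEFAULT_HWBOOST
        # A positive customrankboost acts as a boost, a negative one as a penalty.
        document.Set('hwboost', str(hwboost + boost))
        return None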

The Python file has a .py extension and should be placed in the [FAST Root]\esp\lib\python2.3\processors\ folder.

XML pipeline configuration file:

The XML file should be placed in the [FAST Root]\esp\etc\processors\ folder.

Add the new pipeline stage to a pipeline

After the files are in place, restart the proc_servers processes running on the FAST box and the stage will be available to be added to a FAST ESP pipeline.

Go to the FAST ESP Admin web site.

Go to the Document Processing tab and check if your pipeline stage is available.

On the Document Processing tab, edit a pipeline and add your custom stage to it.

Now your pipeline is ready to process the custom ranking field and change the initial ranking value for your documents.

Run the crawler for your data source

Make sure the custom field in your data source is populated, then run the crawler so your documents go through the pipeline that contains the custom ranking stage.

Execute a query to test the results ranking

After the crawl is done and the documents have been processed, execute a query and check the ranking. You should notice a change in the ranking field of the results for the documents affected by your custom field in the data source.
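If you want to script that check, something along these lines works. The QRServer host, port, path, and parameters below are assumptions about a default installation, so copy the exact search URL your own environment exposes before using it.

# Sketch only: fetch raw results from the QRServer HTTP interface so you can
# compare the rank reported for a document before and after the custom stage
# runs. The URL below (host, port, path, parameters) is an assumption about a
# default installation -- use the search URL your own QRServer exposes.
import urllib

QUERY_URL = "http://fastbox:15100/cgi-bin/search?query=%s&hits=10"

def dump_results(term):
    data = urllib.urlopen(QUERY_URL % urllib.quote(term)).read()
    print(data)  # inspect the rank value reported for each hit

if __name__ == "__main__":
    dump_results("your test term")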

See you,

Amadeu.

FAST Impulse – User name or password is incorrect on BizMan

All of a sudden we started getting a message about user names and passwords being wrong on BizMan in a FAST ESP + Impulse environment.

The users were trying to access BizMan, which is the Impulse tool for creating and managing search profiles, facets, promotions, search terms and search result boosts.

The error message was “incorrect user name or password”. This led us to think somebody had changed his password and didn’t remember it. But then we checked, and all the users we knew of were having the same issue and seeing the same error message. Suspicious, isn’t it?

So we checked the Impulse BizMan database and, believe it or not, in the ImpulseBizMan.users table all the passwords are stored as clear text. We checked the passwords in the table and they all matched the ones we were using to try to log in to BizMan.

I checked all the log files in the ESP\var\logs\adminserver folder and found no error messages about the users’ login attempts.

Since the BizMan service/web site depends on the EML Server to get all its information from the Impulse BizMan database, I went to check the EML Server logs. There I found lots of entries about an invalid character in one of the category ids in the category tree. The message was:

ERROR 2012-01-11 11:49:55,823 (no.fast.ecomm.taxonomy.dao.impl.UpdateTaxonomyTask:55) - The value of attribute "id" associated with an element type "category" must not contain the '<' character.
no.fast.esp.taxonomy.util.TaxonomyParserException: The value of attribute "id" associated with an element type "category" must not contain the '<' character.
    at no.fast.esp.taxonomy.util.TaxonomyParser.readTaxonomy(TaxonomyParser.java:66)

I opened the category tree file and noticed it had been updated a few hours before. Good clue!

Reading the file, we found that one of the categories had an XML syntax error (a missing double quote):

<category id="categoryID><![CDATA[Category Name]]></category>

The correct syntax should be:

<category id="categoryID"><![CDATA[Category Name]]></category>
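Before pushing a fix like this, it is worth checking that the taxonomy file is well-formed so the EML Server never sees a broken tree again. A quick, generic check with Python’s standard library is enough (the file path below is just an example):

# Generic well-formedness check for the category tree file before the EML
# Server picks it up. The file path is an example -- point it at your own
# taxonomy file.
import sys
import xml.etree.ElementTree as ET

def validate(path):
    try:
        ET.parse(path)
    except ET.ParseError as err:
        print("Malformed XML in %s: %s" % (path, err))
        return False
    print("%s is well-formed XML" % path)
    return True

if __name__ == "__main__":
    path = "categorytree.xml"  # example name; use your real category tree file
    if len(sys.argv) > 1:
        path = sys.argv[1]
    validate(path)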

After fixing the file I had to restart the EML Server using the command:

nctrl restart emlserver

BizMan took a few minutes to reconnect to the EML Server, and once it did, all users could log in successfully again.

As a buddy of mine said: “Seems they randomized error messages on the FAST to make it harder for you to debug.” I completely agree.

See you,

Amadeu.

FAST Impulse – JDBC Connector Out of Memory Exception

We run a FAST ESP + Impulse implementation as our search platform. The main data load activities use the EXLT file format and the EXLT and JDBC connectors to load documents into the FAST index.

When we added a new content source to our loading process and created a new JDBC connector instance to load its data, we started getting an out of memory exception from the JDBC connector.

The error message I found in the connector’s log was:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.microsoft.sqlserver.jdbc.TDSPacket.<init>(Unknown Source)
    at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(Unknown Source)
    at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(Unknown Source)
    at com.microsoft.sqlserver.jdbc.TDSReader.readResponse(Unknown Source)
    at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(Unknown Source)
    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(Unknown Source)
    at com.fastsearch.components.jdbcconnector.JdbcDocumentIterator.<init>(JdbcDocumentIterator.java:256)
    at com.fastsearch.components.jdbcconnector.JdbcAccess.iterator(JdbcAccess.java:224)
    at com.fastsearch.components.jdbcconnector.JdbcConnector.processDocuments(JdbcConnector.java:835)
    at com.fastsearch.components.jdbcconnector.JdbcConnector.runConnector(JdbcConnector.java:803)
    at com.fastsearch.components.jdbcconnector.JdbcConnector.main(JdbcConnector.java:1897)
Exception in thread "SubmitterThread" java.lang.OutOfMemoryError: Java heap space
    at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:367)
    at org.apache.commons.httpclient.ContentLengthInputStream.close(ContentLengthInputStream.java:117)
    at java.io.FilterInputStream.close(FilterInputStream.java:159)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:176)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:140)
    at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1078)
    at com.fastsearch.esp.content.http.SessionFactory.releaseConnection(SessionFactory.java:259)
    at com.fastsearch.esp.content.http.Session.processCall(Session.java:477)
    at com.fastsearch.esp.content.http.Session.process(Session.java:372)
    at com.fastsearch.esp.content.http.Dispatcher.send(Dispatcher.java:1104)
    at com.fastsearch.esp.content.http.ContentManager.submitContentOperations(ContentManager.java:247)
    at com.fastsearch.esp.content.http.ContentManager.submitContentOperations(ContentManager.java:206)
    at com.fastsearch.esp.content.feeding.DocumentSubmitter.doSubmitBatch(DocumentSubmitter.java:279)
    at com.fastsearch.esp.content.feeding.DocumentSubmitter.submitBatch(DocumentSubmitter.java:258)
    at com.fastsearch.esp.content.feeding.DocumentSubmitter.run(DocumentSubmitter.java:170)
    at java.lang.Thread.run(Thread.java:595)

Full thread dump Java HotSpot(TM) Client VM (1.5.0_22-b03 mixed mode, sharing):

"DestroyJavaVM" prio=6 tid=0x00108510 nid=0xfb4 waiting on condition [0x00000000..0x000bfab0]

"BatchTimerThread" prio=6 tid=0x03876588 nid=0x1b00 waiting on condition [0x03e1f000..0x03e1fc30]
    at java.lang.Thread.sleep(Native Method)
    at com.fastsearch.esp.content.http.BatchTimer.run(BatchTimer.java:32)
    at java.lang.Thread.run(Thread.java:595)

"CallbackPollThread" prio=6 tid=0x035e33a8 nid=0xd24 in Object.wait() [0x03bbf000..0x03bbfcb0]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x24924490> (a java.lang.Object)
    at java.lang.Object.wait(Object.java:474)
    at com.fastsearch.esp.content.http.Session.waitForActiveBatches(Session.java:292)
    - locked <0x24924490> (a java.lang.Object)
    at com.fastsearch.esp.content.http.Session.run(Session.java:244)
    at java.lang.Thread.run(Thread.java:595)

"CallbackHandlerThread" prio=6 tid=0x033fa800 nid=0xec0 in Object.wait() [0x03d9f000..0x03d9fd30]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x249244f8> (a java.lang.Object)
    at java.lang.Object.wait(Object.java:474)
    at no.fast.util.SynchronizedQueue.dequeue(SynchronizedQueue.java:148)
    - locked <0x249244f8> (a java.lang.Object)
    at no.fast.util.SynchronizedQueue.dequeue(SynchronizedQueue.java:97)
    at com.fastsearch.esp.content.http.CallbackHandler.run(CallbackHandler.java:145)
    at java.lang.Thread.run(Thread.java:595)

"MultiThreadedHttpConnectionManager cleanup" daemon prio=6 tid=0x03610bb0 nid=0x1970 in Object.wait() [0x03c3f000..0x03c3fab0]
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:120)
    - locked <0x2488d058> (a java.lang.ref.ReferenceQueue$Lock)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1082)

"Thread-0" daemon prio=6 tid=0x034daa98 nid=0xd34 waiting on condition [0x03a2f000..0x03a2fbb0]
    at java.lang.Thread.sleep(Native Method)
    at org.apache.log4j.helpers.FileWatchdog.run(FileWatchdog.java:95)

"Low Memory Detector" daemon prio=6 tid=0x00f7ba68 nid=0x16bc runnable [0x00000000..0x00000000]

"CompilerThread0" daemon prio=10 tid=0x00faaf38 nid=0x161c waiting on condition [0x00000000..0x0326fa10]

"Signal Dispatcher" daemon prio=10 tid=0x00f7b0e0 nid=0xdc waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=8 tid=0x00f78de8 nid=0x1688 in Object.wait() [0x0316f000..0x0316fa30]
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:120)
    - locked <0x24833758> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:136)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00f78240 nid=0x1978 in Object.wait() [0x030ef000..0x030efab0]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x248337e0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:474)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x248337e0> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00fa2d48 nid=0x1ba0 runnable

"VM Periodic Task Thread" prio=10 tid=0x00f6af18 nid=0xefc waiting on condition

Reading the error message, we concluded the Java heap was running out of memory while processing the documents we were trying to load.

Checking the Java virtual machine configuration, we found that the maximum heap size for the Java processes was set to 64 MB.

We changed it to 300 MB specifically for the JDBC connector.

To do that, we changed the Java options in the connector.windows.conf file under the FAST_ESP_Root\components\jdbcconnector\bin folder.

The options parameter was set to: JAVA_OPTS=-Xmx300m

After that, we restarted the connector using the command: nctrl restart jdbc_connector_name

The documents immediately started being processed, with no error messages in the JDBC connector log.

See you,

Amadeu.

FAST Impulse – SQL Queries to Monitor the JDBC Connector Execution

If you run FAST ESP + Impulse and use the Impulse and JDBC connectors to load data into FAST, you probably need to monitor how the connector is running and processing documents for each of your collections.

I use the following SQL queries to get the current status of the items in the Impulse Items database (ImpulseItems schema).

This query shows the document counts per collection and per status:

SELECT collection_name, update_flag, COUNT(*) AS document_count
FROM ImpulseItems.status WITH (NOLOCK)
GROUP BY collection_name, update_flag
ORDER BY collection_name

This query shows the statuses of the documents for a specific collection:

SELECT update_flag, COUNT(*) AS document_count
FROM ImpulseItems.status WITH (NOLOCK)
WHERE collection_name = '[collection name]'
GROUP BY update_flag

The status is defined by the update_flag field.

The update_flag values mean the following:

  • less than -1: the document is being deleted by one of the JDBC connector instances. The number identifies the JDBC process defined in the NodeConf.xml file.
  • -1: the document should be deleted when the JDBC connector instance runs.
  • 0: no updates on the document / the document doesn’t need to be processed.
  • 1: the document has been updated and needs to be processed.
  • between 2 and 221: the document is being processed by one of the JDBC connector instances. The number identifies the JDBC process defined in the NodeConf.xml file.
  • 222: the document is being loaded by the EXLT connector.
  • 333: the document is locked by the EML Server.

It is important to run these queries with the NOLOCK query hint to avoid interfering with the execution of the connector (no blocking of the SQL Server processes).
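If you want to keep an eye on these numbers from a script instead of SQL Server Management Studio, the per-collection query is easy to wrap. Here is a sketch using pyodbc; the connection string, database name, and collection name are placeholders for your environment.

# Sketch: poll the Impulse Items status table and print document counts per
# update_flag for one collection. The connection string, database name and
# collection name are placeholders -- adjust them for your environment.
import pyodbc

CONN_STR = ("DRIVER={SQL Server};SERVER=your_sql_server;"
            "DATABASE=your_impulse_items_db;Trusted_Connection=yes;")

QUERY = """
SELECT update_flag, COUNT(*) AS document_count
FROM ImpulseItems.status WITH (NOLOCK)
WHERE collection_name = ?
GROUP BY update_flag
ORDER BY update_flag
"""

def report(collection_name):
    conn = pyodbc.connect(CONN_STR)
    try:
        cursor = conn.cursor()
        cursor.execute(QUERY, collection_name)
        for update_flag, document_count in cursor.fetchall():
            print("update_flag %s: %s documents" % (update_flag, document_count))
    finally:
        conn.close()

if __name__ == "__main__":
    report("your_collection_name")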

See you,

Amadeu.

How to clear a FAST Collection

I’ve struggled lots of times when I needed to clear a collection on FAST ESP + Impulse, because the most common commands available in the documentation for this task do not always work.
I’ve used several of them:

  • collection-admin -m clearcollection -n <collection name>
  • indexeradmin -a cleancollection <collectionname>

They seemed to work, but when I checked the collection’s detailed information I could still see the documents there, and I could also see them on the FAST Impulse Search page.

The only process to clear a collection that has always worked for me is:

  • Generate a list of the internal ids of all documents in the target collection and save it to a file.
    • indexerinfo reportcontents <collection name> > documents.txt
    • This saves the list of internal ids to the documents.txt file.
  • Run a command to delete all the documents listed in the file you generated in the previous step.
    • indexeradmin rdocs documents.txt <collection name> <execution number>
    • This command takes all the internal ids from documents.txt and sends them to the indexer admin for deletion. The execution number will normally be 1, but the indexeradmin rdocs command can only process 4000 documents at a time, so if you need to delete more than 4000 documents you may need to run the command several times, increasing the execution number each run (see the sketch below for a way to automate this).
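A small script can take care of the chunking and the increasing execution number for you. The sketch below simply wraps the indexeradmin command described above; verify the command syntax against your ESP version before running it, and run it on the FAST box so indexeradmin is on the path.

# Sketch: automate the repeated indexeradmin rdocs runs described above by
# splitting documents.txt into chunks of 4000 ids and increasing the
# execution number for each chunk. The command syntax comes from the steps
# above -- verify it against your ESP version. If indexeradmin is a batch
# script on Windows you may need to add shell=True to the call.
import subprocess

CHUNK_SIZE = 4000  # indexeradmin rdocs can only process 4000 documents at a time

def delete_collection_docs(collection, id_file="documents.txt"):
    with open(id_file) as f:
        ids = [line for line in f if line.strip()]

    for start in range(0, len(ids), CHUNK_SIZE):
        execution_number = start // CHUNK_SIZE + 1
        chunk_file = "documents_chunk_%d.txt" % execution_number
        with open(chunk_file, "w") as out:
            out.writelines(ids[start:start + CHUNK_SIZE])
        subprocess.check_call(
            ["indexeradmin", "rdocs", chunk_file, collection, str(execution_number)])

if __name__ == "__main__":
    delete_collection_docs("your_collection_name")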
It might take FAST a little while to delete all the documents, but it will work.

You can check the progress by going to the collection’s detailed information page in FAST Admin, or by running the command that shows the doc count for a collection:

indexerinfo -a doccount <collectionname>

See you around this world,

Amadeu.