Faceted search

One of the advantages of Solr is its ability to group results by the contents of a field. This ability is called faceting, and it helps with several everyday tasks: from counting the documents that share the same value in a field (such as the companies from the same city), through grouping by values and ranges, to autocomplete features built on faceting. In this section, I will show you how to handle some of the important and common faceting tasks.

Search based on the same value range

Suppose you have an application that lets users search for companies in Europe (for instance), and your customer wants to see, for each city, the number of companies found by the query that are located there. Just think how frustrating it would be to run several queries to do this. Don't panic: Solr makes this task much easier through faceting. Let me show you how to do it.

Let us assume that we have the following index structure, added to the field definition section of our schema.xml file; we will use the city field to do the faceting:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text" indexed="true" stored="true" />
    <field name="city" type="string" indexed="true" stored="true" />

And our example data looks like this:

    <add>
      <doc>
        <field name="id">1</field>
        <field name="name">Company 1</field>
        <field name="city">New York</field>
      </doc>
      <doc>
        <field name="id">2</field>
        <field name="name">Company 2</field>
        <field name="city">California</field>
      </doc>
      <doc>
        <field name="id">3</field>
        <field name="name">Company 3</field>
        <field name="city">New York</field>
      </doc>
    </add>

Let us suppose that a user searches for the word company. The query will look like this:

    http://localhost:8080/solr/select?q=name:company&facet=true&facet.field=city

The result produced by this query looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="facet">true</str>
          <str name="facet.field">city</str>
          <str name="q">name:company</str>
        </lst>
      </lst>
      <result name="response" numFound="3" start="0">
        <doc>
          <str name="city">New York</str>
          <str name="id">1</str>
          <str name="name">Company 1</str>
        </doc>
        <doc>
          <str name="city">California</str>
          <str name="id">2</str>
          <str name="name">Company 2</str>
        </doc>
        <doc>
          <str name="city">New York</str>
          <str name="id">3</str>
          <str name="name">Company 3</str>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields">
          <lst name="city">
            <int name="New York">2</int>
            <int name="California">1</int>
          </lst>
        </lst>
        <lst name="facet_dates"/>
      </lst>
    </response>

Note: Besides the normal results list, we got the faceting results with the numbers that we wanted.

The index structure and data are quite simple. The field to focus on is city: we want the number of companies that share each value of this field. We query Solr for the documents that have the word company in the name field, and we enable faceting with the facet=true parameter. The facet.field parameter tells Solr which field to use to calculate the faceting numbers.

Note: You can specify the facet.field parameter multiple times to get the faceting numbers for different fields in the same query.

As you can see in the results list, all types of faceting are grouped in the list with the name="facet_counts" attribute. The field-based faceting is grouped under the list with the name="facet_fields" attribute.
Every field that you specified using the facet.field parameter has its own list, whose name attribute is the same as the value of the parameter in the query (in our case, city). Finally, we see the results that we are interested in: pairs of a value (the name attribute) and the number of documents that have that value in the specified field.
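To make the structure of the field faceting response more concrete, here is a minimal Python sketch that builds the query URL and reads the counts out of the facet_fields list. It uses only the standard library and sends no request to Solr: the response it parses is a trimmed copy of the one shown above. The helper names build_facet_url and parse_facet_fields are hypothetical, chosen only for this illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Trimmed copy of the example response (documents omitted for brevity).
SAMPLE_RESPONSE = """
<response>
  <result name="response" numFound="3" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="city">
        <int name="New York">2</int>
        <int name="California">1</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>
"""


def build_facet_url(base_url, query, facet_fields):
    """Build a select URL with field faceting enabled (hypothetical helper)."""
    params = [("q", query), ("facet", "true")]
    # facet.field may be repeated, once for every field we want counts for.
    params += [("facet.field", name) for name in facet_fields]
    return base_url + "?" + urlencode(params)


def parse_facet_fields(response_xml):
    """Return {field: {value: count}} from the facet_fields list."""
    root = ET.fromstring(response_xml)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    counts = {}
    if fields is None:
        return counts
    for field in fields.findall("lst"):
        counts[field.get("name")] = {
            entry.get("name"): int(entry.text) for entry in field.findall("int")
        }
    return counts


if __name__ == "__main__":
    print(build_facet_url("http://localhost:8080/solr/select", "name:company", ["city"]))
    print(parse_facet_fields(SAMPLE_RESPONSE))
```

Note that urlencode percent-encodes the colon in name:company, so the generated URL looks slightly different from the hand-written one while carrying the same parameters.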

Filter your facet results

Imagine a situation where you need to search for books in your eStore or library. If that were all, the search would be very simple. Now add the requirement of showing how many books fall within a specific price range. Can Solr handle this? Yes, and here is how.

Suppose that we have the following index structure, added to the field definition section of our schema.xml file; we will use the price field to do the faceting:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text" indexed="true" stored="true" />
    <field name="price" type="float" indexed="true" stored="true" />

Here is our example data:

    <add>
      <doc>
        <field name="id">1</field>
        <field name="name">Book 1</field>
        <field name="price">70</field>
      </doc>
      <doc>
        <field name="id">2</field>
        <field name="name">Book 2</field>
        <field name="price">100</field>
      </doc>
      <doc>
        <field name="id">3</field>
        <field name="name">Book 3</field>
        <field name="price">210.95</field>
      </doc>
      <doc>
        <field name="id">4</field>
        <field name="name">Book 4</field>
        <field name="price">99.90</field>
      </doc>
    </add>

Let us assume that the user searches for a book and wishes to see the document counts for two price ranges: 60 to 100 and 200 to 250. Our query will look like this:

    http://localhost:8080/solr/select?q=name:book&facet=true&facet.query=price:[60 TO 100]&facet.query=price:[200 TO 250]

The result of our query would look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="facet">true</str>
          <arr name="facet.query">
            <str>price:[60 TO 100]</str>
            <str>price:[200 TO 250]</str>
          </arr>
          <str name="q">name:book</str>
        </lst>
      </lst>
      <result name="response" numFound="4" start="0">
        <doc>
          <str name="id">1</str>
          <str name="name">Book 1</str>
          <float name="price">70.0</float>
        </doc>
        <doc>
          <str name="id">2</str>
          <str name="name">Book 2</str>
          <float name="price">100.0</float>
        </doc>
        <doc>
          <str name="id">3</str>
          <str name="name">Book 3</str>
          <float name="price">210.95</float>
        </doc>
        <doc>
          <str name="id">4</str>
          <str name="name">Book 4</str>
          <float name="price">99.9</float>
        </doc>
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries">
          <int name="price:[60 TO 100]">3</int>
          <int name="price:[200 TO 250]">1</int>
        </lst>
        <lst name="facet_fields"/>
        <lst name="facet_dates"/>
      </lst>
    </response>

The index structure is quite simple and follows the same pattern as the one discussed earlier, so let's skip it here. The query is the part I would like you to pay special attention to. It starts as a standard query: we instruct Solr that we want all the documents that have the word book in the name field (the q=name:book parameter). Then we enable faceting by adding the facet=true parameter. With faceting enabled, we can pass queries to it and get back the number of documents that match each one; in our case, we want two price ranges: 60 to 100 and 200 to 250. We achieve this by adding the facet.query parameter twice, once per range. The first price range is defined as a standard range query (price:[60 TO 100]). The second is very similar, only with the values of the other price range (price:[200 TO 250]). The square brackets make both bounds inclusive, which is why Book 2, priced at exactly 100, is counted in the first range.
Note: The value passed to the facet.query parameter should be a Lucene query written using the default query syntax.

As you can see in the result list, the query faceting results are grouped under the <lst name="facet_queries"> XML tag, with names that exactly match the passed queries. Solr calculated the number of books in each price range correctly, which is exactly what we needed.
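As a rough illustration of query faceting from the client side, the following Python sketch builds a URL with a repeated facet.query parameter and extracts the counts from the facet_queries list. It relies only on the standard library and does not contact Solr: it parses a trimmed copy of the example response. The helper names build_facet_query_url and parse_facet_queries are hypothetical and exist only for this example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Trimmed copy of the example response (documents omitted for brevity).
SAMPLE_RESPONSE = """
<response>
  <result name="response" numFound="4" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries">
      <int name="price:[60 TO 100]">3</int>
      <int name="price:[200 TO 250]">1</int>
    </lst>
    <lst name="facet_fields"/>
    <lst name="facet_dates"/>
  </lst>
</response>
"""


def build_facet_query_url(base_url, query, facet_queries):
    """Build a select URL with one facet.query per entry (hypothetical helper)."""
    params = [("q", query), ("facet", "true")]
    # Each range becomes its own facet.query parameter.
    params += [("facet.query", fq) for fq in facet_queries]
    # urlencode escapes the spaces, colons, and brackets in the range queries.
    return base_url + "?" + urlencode(params)


def parse_facet_queries(response_xml):
    """Return {facet query: count} from the facet_queries list."""
    root = ET.fromstring(response_xml)
    queries = root.find("./lst[@name='facet_counts']/lst[@name='facet_queries']")
    if queries is None:
        return {}
    return {entry.get("name"): int(entry.text) for entry in queries.findall("int")}


if __name__ == "__main__":
    ranges = ["price:[60 TO 100]", "price:[200 TO 250]"]
    print(build_facet_query_url("http://localhost:8080/solr/select", "name:book", ranges))
    print(parse_facet_queries(SAMPLE_RESPONSE))
```

Because the keys in the parsed result are the facet queries themselves, a client can look up each count by the exact string it sent.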