Apache Solr is one of the best search engines I have used. It’s fast, simple to use, indexes reasonably quickly, and sports a great many features.
One of its useful features is dynamic fields, where you define fields and their types based on a field-name prefix or suffix. Along with the ability to store (and retrieve) the field content, this qualifies Solr for a near-NoSQL-DB title (at least in my book): you may store data with almost no schema, with or without strong data typing.
Interestingly, I was unable to find anywhere on the Net a way to retrieve the list of the indexed tags (that is, the field names actually present in the index). You need this, for example, if you want to provide faceting (drill-down) facilities but some or all of your fields are dynamic. Sure, you can retrieve and parse the schema definition, but that won’t give you the fields actually indexed.
I found a way to do this accidentally, while testing out the new version of Solr. All you have to do is query your data using CSV output formatting, like this:
and you’ll get a CSV list of indexed tags.
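A minimal Python sketch of such a request, assuming a default select handler at a hypothetical local URL and a match-all query with rows=0 so that only the CSV header line comes back. The parameter choice and the helper names field_list_url and parse_field_names are my illustration, so adjust them to your setup.

```python
import csv
import io
from urllib.parse import urlencode


def field_list_url(base_url):
    """Build a select URL that asks for CSV output and zero rows.

    base_url is a hypothetical endpoint such as
    "http://localhost:8983/solr/select"; adjust it to your installation.
    """
    # q=*:* matches everything; rows=0 is assumed here so that only
    # the CSV header line (the field names) is returned.
    params = {"q": "*:*", "rows": 0, "wt": "csv"}
    return base_url + "?" + urlencode(params)


def parse_field_names(csv_text):
    """Return the column names from the first (header) line of a CSV response."""
    reader = csv.reader(io.StringIO(csv_text))
    # An empty response yields an empty list instead of raising StopIteration.
    return next(reader, [])
```

Against a live server, the text passed to parse_field_names would come from fetching the URL, for example with urllib.request.urlopen(field_list_url(...)).read().decode("utf-8").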
The one piece of “bad news” is that you get the full list every time, no matter which query you use (with either &q= or &fq=). Does anyone know how to retrieve this list for a query-defined subset of the data?
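Until someone does, one possible client-side workaround is to fetch the matching documents as JSON (wt=json) and collect the field names yourself. The sketch below assumes the documents have already been read from the "docs" list of the JSON response. The helper name fields_in_docs is hypothetical, and only stored fields show up this way.

```python
def fields_in_docs(docs):
    """Collect the distinct field names present in a list of Solr documents.

    docs is assumed to be the list found under response["response"]["docs"]
    when a query is run with wt=json; each document is a plain dict.
    """
    names = set()
    for doc in docs:
        # Each document only carries the fields it actually has.
        names.update(doc.keys())
    return sorted(names)
```

This is far more expensive than the CSV header trick, since you have to page through every matching document, so it is only practical for reasonably small result sets.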
Wrong Type of Dynamic Field Tag
One thing that has bitten me a few times, and that can get confusing, is the question of which dynamic field type applies when a field name is matched by several patterns.
The Solr documentation tactfully avoids this question. But the commented schema example file says:
Longer patterns will be matched first. if equal size patterns both match, the first appearing in the schema will be used.
This is nice: if you have a tag structure such as “tag1”, “tag1-tag2”, “tag1-tag2-tag3”, etc., it can be handy for defining types inside the “hierarchy”. But it goes against the “first or last in the configuration wins” rule that you would typically expect.
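The rule quoted above can be sketched in a few lines of Python. This is only my reading of the schema comment, not Solr's actual implementation. The function names and the example patterns are hypothetical, and patterns are assumed to carry a single * at the start or the end.

```python
def matches(pattern, name):
    """Simple dynamic-field glob: a single '*' at the start or at the end."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def resolve_dynamic_field(name, dynamic_fields):
    """Pick the type for name: longest matching pattern wins, ties go to schema order.

    dynamic_fields is a list of (pattern, type) pairs in the order
    they appear in the schema.
    """
    best = None
    for pattern, field_type in dynamic_fields:
        if not matches(pattern, name):
            continue
        # Strictly-longer comparison keeps the earlier entry on equal length.
        if best is None or len(pattern) > len(best[0]):
            best = (pattern, field_type)
    return best[1] if best else None
```

With the patterns "tag1-*" (string) and "tag1-tag2-*" (int) declared in that order, the name "tag1-tag2-tag3" resolves to int, because the longer pattern wins even though it appears later in the schema.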