parasolr documentation¶
Code Documentation¶
Solr Client¶
Base and Exceptions¶
- class parasolr.solr.base.ClientBase(session=None)[source]¶
Base object with common communication methods for talking to Solr API.
- Parameters
session (Optional[Session]) – A python-requests requests.Session.
- build_url(solr_url, collection, handler)[source]¶
Return a URL for a handler, based on the base Solr URL and the core or collection name.
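For orientation, standard Solr endpoints follow the layout `<base url>/<core>/<handler>`, for example `http://localhost:8983/solr/core_name/select`. The sketch below shows that layout with a hypothetical `build_handler_url` helper; it is an illustration under that assumption, not parasolr's `build_url` implementation.

```python
def build_handler_url(solr_url, collection, handler):
    """Hypothetical helper: join base URL, core/collection, and handler.

    Empty parts (e.g. no collection for the CoreAdmin API) are skipped,
    and leading/trailing slashes are trimmed so they are not doubled.
    """
    parts = (solr_url, collection, handler)
    return "/".join(part.strip("/") for part in parts if part)


print(build_handler_url("http://localhost:8983/solr/", "core_name", "select"))
```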
- make_request(meth, url, headers=None, params=None, data=None, wrap=True, allowed_responses=None, **kwargs)[source]¶
Make an HTTP request to Solr. May optionally specify a list of allowed HTTP status codes for this request. Responses will be logged as errors if they are not in the list, but only responses with 200 OK status will be loaded as JSON.
- Parameters
meth (str) – HTTP method to use.
url (str) – URL to make request to.
params (Optional[dict]) – Params to use as form-fields or query-string params.
allowed_responses (Optional[list]) – HTTP status codes that are allowed for this request; if not set, defaults to 200 OK.
**kwargs (Any) – Any other kwargs for the request.
- Return type
Optional[AttrDict]
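The status-code rules described for make_request can be restated as a small decision function. This is a minimal sketch with a hypothetical `check_status` helper, not parasolr's code. It shows the two separate outcomes: whether a status is logged as an error, and whether the body is parsed as JSON (only 200 OK).

```python
import logging

logger = logging.getLogger(__name__)


def check_status(status_code, allowed_responses=None):
    """Hypothetical sketch of the documented status-code rules.

    Returns a tuple (logged_as_error, load_as_json).
    """
    allowed = allowed_responses if allowed_responses is not None else [200]
    logged_as_error = status_code not in allowed
    if logged_as_error:
        logger.error("Unexpected Solr response status: %s", status_code)
    # only 200 OK responses are parsed as JSON, even if other codes are allowed
    load_as_json = status_code == 200
    return logged_as_error, load_as_json
```

Allowing 404 is how an expected "not found" can avoid an error log entry while still returning no JSON data.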
- exception parasolr.solr.base.ImproperConfiguration[source]¶
Raised when a required setting is not present or is an invalid value.
Client and Search API¶
- class parasolr.solr.client.BaseResponse(response)[source]¶
Base Solr response class with fields common to standard and grouped results.
- _process_facet_counts(facet_counts)[source]¶
Convert facet_fields and facet_ranges to OrderedDict.
- Parameters
facet_counts (AttrDict) – Solr facet_counts field.
- Returns
Solr facet_counts field
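By default, Solr's JSON response writer returns facet.field counts as a flat list that alternates value and count, and range facets put the same kind of list under `counts`. The sketch below shows one way to turn those lists into ordered mappings. The function names are hypothetical, and parasolr's `_process_facet_counts` may differ in detail.

```python
from collections import OrderedDict


def pairs_to_ordered_dict(flat):
    """Convert Solr's flat [value, count, value, count, ...] list to an OrderedDict."""
    return OrderedDict(zip(flat[::2], flat[1::2]))


def process_facet_counts(facet_counts):
    """Hypothetical sketch: convert facet_fields and facet_ranges counts."""
    result = dict(facet_counts)  # shallow copy; the input is not modified
    result["facet_fields"] = {
        field: pairs_to_ordered_dict(values)
        for field, values in facet_counts.get("facet_fields", {}).items()
    }
    result["facet_ranges"] = {
        field: dict(info, counts=pairs_to_ordered_dict(info["counts"]))
        for field, info in facet_counts.get("facet_ranges", {}).items()
    }
    return result


sample = {
    "facet_fields": {"item_type_s": ["person", 10, "book", 3]},
    "facet_ranges": {
        "year": {"counts": ["1800", 5, "1900", 2], "gap": 100, "start": 1800, "end": 2000}
    },
}
processed = process_facet_counts(sample)
```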
- property expanded¶
expanded portion of the response, if collapse/expand results are enabled
- property highlighting¶
highlighting portion of the response, if highlighting was requested
- property params¶
parameters sent to solr in the request, as returned in response header
- property stats¶
stats portion of the response, if statistics were requested
- class parasolr.solr.client.GroupedResponse(response)[source]¶
Query response variant for grouped results.
- Parameters
response (
Dict
) – A Solr query response
- class parasolr.solr.client.ParasolrDict(*args, **kwargs)[source]¶
A subclass of attrdict.AttrDict that can convert itself to a regular dictionary.
- as_dict()[source]¶
Copy attributes from self as a dictionary, and recursively convert instances of ParasolrDict.
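Attribute-style dictionaries are convenient in code, but some consumers (templates, serializers, equality checks on type) expect plain dicts. The sketch below illustrates recursive conversion with a hypothetical `AttrDictSketch` class standing in for attrdict.AttrDict / ParasolrDict. As written, it converts nested dictionary values only and leaves lists untouched. Whether parasolr also descends into lists is not stated here.

```python
class AttrDictSketch(dict):
    """Hypothetical stand-in for attrdict.AttrDict / ParasolrDict."""

    def __getattr__(self, name):
        # allow attribute-style access to keys, e.g. resp.response.numFound
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err

    def as_dict(self):
        # copy into a plain dict, recursively converting nested instances
        return {
            key: value.as_dict() if isinstance(value, AttrDictSketch) else value
            for key, value in self.items()
        }


resp = AttrDictSketch(response=AttrDictSketch(numFound=2, docs=[]))
plain = resp.as_dict()
```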
- class parasolr.solr.client.QueryResponse(response)[source]¶
Thin wrapper to give access to Solr select responses.
- Parameters
response (
Dict
) – A Solr query response
- class parasolr.solr.client.SolrClient(solr_url, collection, commitWithin=None, session=None)[source]¶
Class to aggregate all of the other Solr APIs and settings.
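- Parameters
solr_url (str) – Base url for Solr.
collection (str) – Core or collection name.
commitWithin – commitWithin time in ms.
session (Optional[Session]) – A python-requests requests.Session.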
- collection = ''¶
core or collection name
- commitWithin = 1000¶
commitWithin time in ms
- core_admin_handler = 'admin/cores'¶
CoreAdmin API handler
- query(wrap=True, **kwargs)[source]¶
Perform a query with the specified kwargs.
- Parameters
**kwargs (Any) – Any valid Solr search parameters.
- Returns
A search QueryResponse.
- schema_handler = 'schema'¶
Schema API handler
- select_handler = 'select'¶
Select handler
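SolrClient.query() accepts any valid Solr search parameters as keyword arguments. The standard-library sketch below shows how such parameters look once encoded as a query string for the select handler. Repeated parameters such as fq are given as a list. This illustrates Solr request parameters only. It is not parasolr's request code, which may send them as form fields instead.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical keyword arguments, as might be passed to SolrClient.query()
params = {
    "q": "*:*",
    "fq": ["item_type_s:person", "birth_year:1900"],  # repeated parameter
    "rows": 10,
}

# doseq=True encodes each list item as its own fq=... pair
query_string = urlencode(params, doseq=True)
print(query_string)
```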
Schema API¶
Module with class and methods for the Solr Schema API.
- class parasolr.solr.schema.Schema(solr_url, collection, handler, session=None)[source]¶
Class for managing Solr Schema API
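- Parameters
solr_url (str) – Base url for Solr.
collection (str) – Core or collection name.
handler (str) – Handler for Schema API.
session (Optional[Session]) – A python-requests requests.Session.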
- get_schema()[source]¶
Get the full schema for a Solr collection or core.
- Return type
AttrDict
- Returns
Schema as returned by Solr.
- list_fields(fields=None, includeDynamic=False, showDefaults=False)[source]¶
Get a list of field definitions for a Solr collection or core.
Update API¶
- class parasolr.solr.update.Update(solr_url, collection, handler, commitWithin, session=None)[source]¶
API client for Solr update functionality.
- Parameters
solr_url (str) – Base url for Solr.
collection (str) – Core or collection name.
handler (str) – Handler for Update API.
commitWithin – commitWithin time in ms.
session (Optional[Session]) – A python-requests requests.Session.
Core Admin API¶
- class parasolr.solr.admin.CoreAdmin(solr_url, handler, session=None)[source]¶
API client for Solr core admin.
- Parameters
solr_url (str) – Base url for Solr.
handler (str) – Handler for CoreAdmin API.
session (Optional[Session]) – A python-requests requests.Session.
Schema¶
Solr schema configuration and management.
Extend SolrSchema for your project and configure the fields, field types, and copy fields you want defined in Solr. Fields should be defined using SolrField, and field types with SolrAnalyzer and SolrFieldType.
For example:
```python
from parasolr import schema


class MySolrSchema(schema.SolrSchema):
    '''Project Solr schema configuration'''

    # field declarations
    author = schema.SolrField('text_en')
    author_exact = schema.SolrStringField()
    title = schema.SolrField('text_en')
    title_nostem = schema.SolrStringField()
    subtitle = schema.SolrField('text_en')
    collections = schema.SolrField('text_en', multivalued=True)

    #: copy fields, for facets and variant search options
    copy_fields = {
        'author': 'author_exact',
        'collections': 'collections_s',
        'title': ['title_nostem', 'title_s'],
        'subtitle': 'subtitle_s',
    }
```
Copy fields should be a dictionary of source and destination fields; both single value and list are supported for destination.
If you want to define a custom field type, you can define an analyzer for use in one or more field type declarations:
```python
from parasolr import schema


class UnicodeTextAnalyzer(schema.SolrAnalyzer):
    '''Solr text field analyzer with unicode folding. Includes all standard
    text field analyzers (stopword filters, lower case, possessive, keyword
    marker, porter stemming) and adds ICU folding filter factory.
    '''
    tokenizer = 'solr.StandardTokenizerFactory'
    filters = [
        {"class": "solr.StopFilterFactory", "ignoreCase": True,
         "words": "lang/stopwords_en.txt"},
        {"class": "solr.LowerCaseFilterFactory"},
        {"class": "solr.EnglishPossessiveFilterFactory"},
        {"class": "solr.KeywordMarkerFilterFactory"},
        {"class": "solr.PorterStemFilterFactory"},
        # ICU folding requires Solr's ICU analysis-extras module
        {"class": "solr.ICUFoldingFilterFactory"},
    ]


class SolrTextField(schema.SolrTypedField):
    field_type = 'text_en'


class MySolrSchema(schema.SolrSchema):
    '''Schema configuration with custom field types'''
    text_en = schema.SolrFieldType('solr.TextField',
                                   analyzer=UnicodeTextAnalyzer)
    content = SolrTextField()
```
To update your configured solr core with your schema, run:
python manage.py solr_schema
This will automatically find your SolrSchema subclass and apply changes. See the solr_schema manage command documentation for more details.
- class parasolr.schema.SolrAnalyzer[source]¶
Class to declare a solr field analyzer with tokenizer and filters, for use with SolrFieldType.
- filters = None¶
list of the filters to apply
- tokenizer = None¶
string name of the tokenizer to use
- class parasolr.schema.SolrField(fieldtype, required=False, multivalued=False, default=None, stored=True)[source]¶
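A descriptor for declaring a solr field on a SolrSchema instance.
- Parameters
fieldtype (str) – Name of the Solr field type, e.g. text_en.
required (bool) – Whether the field is required.
multivalued (bool) – Whether the field can hold multiple values.
default – Default value for the field.
stored (bool) – Whether field values are stored.
- Raises
AttributeError – If __set__ is called.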
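A descriptor that raises AttributeError from `__set__` is a data descriptor, so assigning to the attribute on an instance fails instead of silently shadowing the field definition. The sketch below shows that pattern and a `get_field_names`-style lookup. `FieldSketch` and `SchemaSketch` are hypothetical classes. The dictionary returned by `__get__` uses Solr Schema API attribute names, which is an assumption about what such a descriptor would expose, not parasolr's exact output. This sketch only inspects the class's own attributes, not inherited ones.

```python
class FieldSketch:
    """Hypothetical read-only descriptor, loosely modeled on SolrField."""

    def __init__(self, fieldtype, required=False, multivalued=False, stored=True):
        self.fieldtype = fieldtype
        self.required = required
        self.multivalued = multivalued
        self.stored = stored

    def __set_name__(self, owner, name):
        # remember the attribute name the field was declared under
        self.name = name

    def __get__(self, obj, objtype=None):
        # return the field definition rather than a per-instance value
        return {
            "name": self.name,
            "type": self.fieldtype,
            "required": self.required,
            "multiValued": self.multivalued,
            "stored": self.stored,
        }

    def __set__(self, obj, value):
        raise AttributeError("Solr field definitions are read-only")


class SchemaSketch:
    title = FieldSketch("text_en")
    collections = FieldSketch("text_en", multivalued=True)

    @classmethod
    def get_field_names(cls):
        # look in the class __dict__ to see descriptor objects, not __get__ results
        return [name for name, attr in vars(cls).items() if isinstance(attr, FieldSketch)]
```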
- class parasolr.schema.SolrFieldType(field_class, analyzer, **kwargs)[source]¶
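A descriptor for declaring and configuring a solr field type on a SolrSchema instance.
- Parameters
field_class (str) – Solr field class, e.g. solr.TextField.
analyzer – SolrAnalyzer subclass to use for the field type.
**kwargs – Additional field type options, passed through to Solr.
- Raises
AttributeError – If __set__ is called.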
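In the Solr Schema API, a field type is created with an `add-field-type` command whose body names the type, its class, and an analyzer with a tokenizer and filters. The sketch below shows how an analyzer declaration and pass-through options could map onto that payload. `field_type_definition` and `SimpleAnalyzer` are hypothetical, and parasolr's generated JSON may differ.

```python
def field_type_definition(name, field_class, analyzer, **options):
    """Hypothetical sketch of a Solr Schema API field type definition."""
    definition = {
        "name": name,
        "class": field_class,
        "analyzer": {
            "tokenizer": {"class": analyzer.tokenizer},
            "filters": list(analyzer.filters),
        },
    }
    # any extra options are passed through unchanged
    definition.update(options)
    return definition


class SimpleAnalyzer:
    # hypothetical analyzer declaration with tokenizer and filters
    tokenizer = "solr.StandardTokenizerFactory"
    filters = [{"class": "solr.LowerCaseFilterFactory"}]


payload = {
    "add-field-type": field_type_definition(
        "text_en", "solr.TextField", SimpleAnalyzer, positionIncrementGap="100"
    )
}
```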
- class parasolr.schema.SolrSchema[source]¶
Solr schema configuration.
- classmethod configure_copy_fields(solr)[source]¶
Update configured Solr instance schema with copy fields.
- Parameters
solr (SolrClient) – A configured Solr instance.
- classmethod configure_fields(solr)[source]¶
Update the configured Solr instance schema to match the configured fields.
Calls configure_copy_fields() after new fields have been created and before old fields are removed, since an outdated copy field could prevent removal.
- Parameters
solr (SolrClient) – A configured Solr instance.
- Return type
AttrDefault
- Returns
attrdict.AttrDefault with counts for added, updated, and deleted fields.
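The ordering rule above can be sketched by comparing configured and existing field definitions and emitting Schema API steps in a safe order: add, replace, copy fields, then delete. `plan_field_commands` is a hypothetical helper. `add-field`, `replace-field`, and `delete-field` are Solr Schema API command names, while `configure-copy-fields` is only a placeholder for the copy field step.

```python
def plan_field_commands(configured, existing):
    """Hypothetical sketch: order schema changes so copy fields don't block deletes.

    Both arguments map field name -> field definition dict.
    """
    added = [n for n in configured if n not in existing]
    updated = [n for n in configured if n in existing and configured[n] != existing[n]]
    deleted = [n for n in existing if n not in configured]

    commands = [("add-field", n) for n in added]
    commands += [("replace-field", n) for n in updated]
    # placeholder step: copy fields are reconfigured before deletes, since an
    # outdated copy field could prevent a field from being removed
    commands.append(("configure-copy-fields", None))
    commands += [("delete-field", n) for n in deleted]

    counts = {"added": len(added), "updated": len(updated), "deleted": len(deleted)}
    return commands, counts


configured = {"title": {"type": "text_en"}, "author": {"type": "text_en"}}
existing = {"title": {"type": "string"}, "old_field": {"type": "string"}}
commands, counts = plan_field_commands(configured, existing)
```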
- classmethod configure_fieldtypes(solr)[source]¶
Update the configured Solr instance so the schema includes the configured field types, if any.
- Parameters
solr (SolrClient) – A configured Solr instance.
- Return type
AttrDefault
- Returns
attrdict.AttrDefault with counts for updated and added field types.
- copy_fields = {}¶
dictionary of copy fields to be configured; key is the source field, value is a destination field or list of fields
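Because a copy_fields value may be a single field name or a list, it helps to normalize it before building Schema API requests. The sketch below converts the mapping into Solr `add-copy-field` commands, whose `dest` accepts a list of fields. `copy_field_commands` is a hypothetical helper, not parasolr's implementation.

```python
def copy_field_commands(copy_fields):
    """Hypothetical sketch: turn a copy_fields mapping into Schema API commands."""
    commands = []
    for source, dest in copy_fields.items():
        # destination may be a single field name or a list of names
        dests = [dest] if isinstance(dest, str) else list(dest)
        commands.append({"add-copy-field": {"source": source, "dest": dests}})
    return commands


copy_fields = {"author": "author_exact", "title": ["title_nostem", "title_s"]}
commands = copy_field_commands(copy_fields)
```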
- classmethod get_configuration()[source]¶
Find a SolrSchema subclass for use as schema configuration. Currently only supports one schema configuration.
- classmethod get_field_names()[source]¶
Iterate over class attributes and return all that are instances of SolrField.
- classmethod get_field_types()[source]¶
Iterate over class attributes and return all that are instances of SolrFieldType.
- Returns
List of attributes that are SolrFieldType.
Indexing¶
Model-based indexing with Solr.
Items to be indexed in Solr should extend Indexable. The default implementation should work for most Django models; at a minimum you should extend Indexable.index_data() to include the information to be indexed in Solr. You may also customize Indexable.index_item_type() and Indexable.index_id().
To manually index content in Solr, see the index manage command documentation.
- class parasolr.indexing.Indexable[source]¶
Mixin for objects that are indexed in Solr. Subclasses must implement index_id and index methods.
If items_to_index returns something like a generator, which has no count method and cannot be counted with len, you should implement total_to_index to return the number of items to be indexed, for use with the Django index manage command.
- ID_SEPARATOR = '.'¶
id separator for auto-generated index ids
- classmethod all_indexables()[source]¶
Find all Indexable subclasses for indexing. Ignore abstract and proxy Indexable subclasses such as ModelIndexable.
- index_chunk_size = 150¶
number of items to index at once when indexing a large number of items
- index_data()[source]¶
Dictionary of data to index in Solr for this item. The default implementation adds index_id() and index_item_type().
- index_id()[source]¶
Solr identifier. By default, combines index_item_type() and id with ID_SEPARATOR.
- classmethod index_item_type()[source]¶
Label for this kind of indexable item. Must be unique across all Indexable items in an application. By default, uses Django model verbose name. Used in default index id and in index manage command.
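The documented defaults for index_id, index_item_type, and index_data can be mimicked in a few lines. The sketch below uses hypothetical `PersonSketch` and `IndexablePerson` classes. The item type is hard-coded instead of coming from a Django verbose name. The `id` and `item_type_s` keys follow the CHANGELOG note for 0.7 and are assumptions about the exact default data. The subclass shows the usual pattern of extending `index_data` with `super()`.

```python
class PersonSketch:
    """Hypothetical Indexable-like class illustrating the documented defaults."""

    ID_SEPARATOR = "."

    def __init__(self, pk):
        self.id = pk

    @classmethod
    def index_item_type(cls):
        # Django-based indexables default to the model verbose name
        return "person"

    def index_id(self):
        # combine item type and id with ID_SEPARATOR, e.g. "person.1"
        return "%s%s%s" % (self.index_item_type(), self.ID_SEPARATOR, self.id)

    def index_data(self):
        # default data: identifier plus item type
        return {"id": self.index_id(), "item_type_s": self.index_item_type()}


class IndexablePerson(PersonSketch):
    def __init__(self, pk, name):
        super().__init__(pk)
        self.name = name

    def index_data(self):
        # extend the default data with project-specific fields
        data = super().index_data()
        data["name_t"] = self.name
        return data
```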
- classmethod index_items(items, progbar=None)[source]¶
Indexable class method to index multiple items at once. Takes a list, queryset, or generator of Indexable items or dictionaries. Items are indexed in chunks, based on Indexable.index_chunk_size.
- Parameters
items – list, queryset, or generator of indexable objects or dictionaries
progbar – optional progressbar.Progressbar object to update when indexing items in chunks.
- Returns
Total number of items indexed
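Chunked indexing works with any iterable, including generators that cannot be counted up front, by pulling a fixed number of items at a time. The sketch below uses itertools.islice for this. `chunked` and `index_in_chunks` are hypothetical helpers, and `send` stands in for whatever call actually sends a chunk to Solr.

```python
from itertools import islice


def chunked(items, chunk_size=150):
    """Yield lists of up to chunk_size items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def index_in_chunks(items, send, chunk_size=150):
    """Hypothetical sketch: pass each chunk to `send`, return the total count."""
    total = 0
    for chunk in chunked(items, chunk_size):
        send(chunk)  # stand-in for the real call that sends documents to Solr
        total += len(chunk)
    return total


sent = []
count = index_in_chunks(({"id": str(i)} for i in range(7)), sent.append, chunk_size=3)
```

Because the generator has no length, the total is only known after iteration. This is why a separate total_to_index is useful for progress bars.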
- classmethod items_to_index()[source]¶
Get all items to be indexed for a single class of Indexable content. Subclasses can override this method to return a custom iterable, e.g. a Django QuerySet that takes advantage of prefetching. By default, returns all Django objects for a model. Raises NotImplementedError if that fails.
- classmethod prep_index_chunk(chunk)[source]¶
Optional method for any additional processing on chunks of items being indexed. Intended to allow adding prefetching on a chunk when iterating on Django QuerySets; since indexing uses an iterator, prefetching configured in items_to_index is ignored.
- remove_from_index()[source]¶
Remove the current object from Solr by identifier using index_id().
- solr = None¶
solr connection
QuerySet¶
Object-oriented approach to Solr searching and filtering modeled on Django's django.db.models.query.QuerySet. Supports iteration, slicing, counting, and a boolean check to see if a search has results.
Filter, search and sort methods return a new queryset, and can be chained. For example:
```python
SolrQuerySet(solrclient).filter(item_type_s='person') \
    .search(name='hem*') \
    .order_by('sort_name')
```
If you are working with Django you should use parasolr.django.SolrQuerySet, which will automatically initialize a new parasolr.django.SolrClient if one is not passed in.
- class parasolr.query.queryset.EmptySolrQuerySet(*args, **kwargs)[source]¶
Marker class that can be used to check if a given queryset is empty via isinstance():
```python
assert isinstance(SolrQuerySet().none(), EmptySolrQuerySet)  # True
assert isinstance(queryset, EmptySolrQuerySet)  # True if empty
```
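A marker class like this can be built with a metaclass that overrides `__instancecheck__`. Django uses this technique for its EmptyQuerySet. Whether parasolr uses exactly this mechanism is an assumption. The classes below are hypothetical and only demonstrate the pattern.

```python
class InstanceCheckMeta(type):
    def __instancecheck__(cls, instance):
        # treat any queryset flagged as empty as an "instance" of the marker class
        return isinstance(instance, QuerySetSketch) and instance.is_empty


class EmptyQuerySetSketch(metaclass=InstanceCheckMeta):
    """Marker class; never instantiated directly."""


class QuerySetSketch:
    def __init__(self, is_empty=False):
        self.is_empty = is_empty

    def none(self):
        # return a copy that is guaranteed to have no results
        return QuerySetSketch(is_empty=True)
```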
- class parasolr.query.queryset.SolrQuerySet(solr)[source]¶
A Solr queryset object that allows for object-oriented searching and filtering of Solr results. Allows search results to be paginated using slicing, count, and iteration.
- ANY_VALUE = '[* TO *]'¶
any value constant
- LOOKUP_SEP = '__'¶
lookup separator
- static _lookup_to_filter(key, value, tag='')[source]¶
Convert a keyword/value argument, with optional lookups separated by __ (including in and exists), into a Solr filter query. By convention, field names should NOT include double underscores. Accepts an optional tag argument to specify an exclude tag as needed.
- Return type
str
- Returns
A properly formatted Solr query string.
- _set_faceting_opts(query_opts)[source]¶
Configure faceting attributes directly on query_opts. Modifies dictionary directly.
- Return type
None
- _set_group_opts(query_opts)[source]¶
Configure grouping attributes on query_opts. Modifies dictionary directly.
- Return type
None
- _set_highlighting_opts(query_opts)[source]¶
Configure highlighting attributes on query_opts. Modifies dictionary directly.
- Return type
None
- _set_stats_opts(query_opts)[source]¶
Configure stats attributes directly on query_opts. Modifies dictionary directly.
- Return type
None
- also(*args, **kwargs)[source]¶
Use field limit option to return the specified fields, optionally providing aliases for them in the return. Works exactly the same way as only(), except that it does not replace any previously specified field limits.
- Return type
SolrQuerySet
- default_search_operator = 'AND'¶
by default, combine search queries with AND
- facet(*args, **kwargs)[source]¶
Request facets for specified fields. Returns a new SolrQuerySet with Solr faceting enabled and facet.field parameter set. Does not support ranged faceting.
Subsequent calls will reset the facet.field to the last set of args in the chain.
For example:
```python
qs = queryset.facet('person_type', 'age')
qs = qs.facet('item_type_s')
```
would result in item_type_s being the only facet field.
- Return type
SolrQuerySet
- facet_field(field, exclude='', **kwargs)[source]¶
Request faceting for a single field. Returns a new SolrQuerySet with Solr faceting enabled and the field added to the list of facet fields. Any keyword arguments will be set as field-specific facet configurations.
The exclude argument specifies a related filter query tag to exclude (Solr's ex local parameter) when generating counts for the facet.
- Return type
SolrQuerySet
- facet_range(field, **kwargs)[source]¶
Request range faceting for a single field. Returns a new SolrQuerySet with Solr range faceting enabled and the field added to the list of facet fields. Keyword arguments such as start, end, and gap will be set as field-specific facet configurations.
- Return type
SolrQuerySet
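In Solr request terms, field-specific facet options use the `f.<field>.facet.<option>` syntax, and a tag exclusion is written as an `{!ex=...}` local parameter on the facet field. This works together with a filter tagged via `{!tag=...}`. The sketch below shows one plausible mapping with hypothetical helper names. The exact parameters parasolr sends are not shown in this documentation.

```python
def facet_field_params(field, exclude="", **options):
    """Hypothetical sketch of Solr params for a single facet field."""
    local = "{!ex=%s}" % exclude if exclude else ""
    params = {"facet": "true", "facet.field": [local + field]}
    for key, value in options.items():
        # per-field options use Solr's f.<field>.facet.<option> syntax
        params["f.%s.facet.%s" % (field, key)] = value
    return params


def facet_range_params(field, **options):
    """Hypothetical sketch of Solr params for a range facet (start, end, gap...)."""
    params = {"facet": "true", "facet.range": [field]}
    for key, value in options.items():
        params["f.%s.facet.range.%s" % (field, key)] = value
    return params


field_params = facet_field_params("item_type_s", exclude="type", missing=True)
range_params = facet_range_params("year", start=1800, end=2000, gap=50)
```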
- filter(*args, tag='', **kwargs)[source]¶
Return a new SolrQuerySet with Solr filter queries added. Multiple filters can be combined either in a single method call, or they can be chained for the same effect. For example:
```python
queryset.filter(item_type_s='person').filter(birth_year=1900)
queryset.filter(item_type_s='person', birth_year=1900)
```
A tag may be specified for the filter to be used with facet.field exclusions:
```python
queryset.filter(item_type_s='person', tag='person')
```
To provide a filter that should be used unmodified, provide the exact string of your filter query:
```python
queryset.filter('birth_year:[1800 TO *]')
```
You can also filter using pre-defined lookups on a field, for example:
```python
queryset.filter(item_type_s__in=['person', 'book'])
queryset.filter(item_type_s__exists=False)
```
Currently supported field lookups:
- in: takes a list of values; supports '' or None to match on field not set
- exists: boolean filter to look for any value / no value
- range: range query. Takes a list or tuple of two values for the start and end of the range. Either value can be unset for an open-ended range (e.g. year__range=(1800, None))
- Return type
SolrQuerySet
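The lookups listed above map naturally onto standard Solr query syntax: OR-ed values, `[* TO *]` for any value (the ANY_VALUE constant), open-ended ranges with `*`, and `{!tag=...}` local parameters. The sketch below is a hypothetical `lookup_to_filter` that produces that syntax. It is not parasolr's `_lookup_to_filter`, whose exact output may differ. For brevity it does not handle the '' / None case of the in lookup.

```python
ANY_VALUE = "[* TO *]"
LOOKUP_SEP = "__"


def lookup_to_filter(key, value, tag=""):
    """Hypothetical sketch: convert one filter keyword argument to a Solr fq string."""
    field, _, lookup = key.partition(LOOKUP_SEP)
    if lookup == "in":
        # OR together quoted values; matching unset fields is not handled here
        query = "%s:(%s)" % (field, " OR ".join('"%s"' % val for val in value))
    elif lookup == "exists":
        query = "%s:%s" % (field, ANY_VALUE)
        if not value:
            # negate to find documents with no value for the field
            query = "-" + query
    elif lookup == "range":
        start, end = value
        # None means open-ended, which Solr writes as *
        query = "%s:[%s TO %s]" % (
            field,
            "*" if start is None else start,
            "*" if end is None else end,
        )
    else:
        query = "%s:%s" % (field, value)
    if tag:
        # local params must come first, e.g. {!tag=person}item_type_s:person
        query = "{!tag=%s}%s" % (tag, query)
    return query
```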
- get_facets()[source]¶
Return a dictionary of facet information included in the Solr response. Includes facet fields, facet ranges, etc. Facet field results are returned as an ordered dict of value and count.
- get_response(**kwargs)[source]¶
Query Solr and get the results for the current query and filter options. Populates result cache and returns the documents portion of the response.
- get_result_document(doc)[source]¶
Method to transform document results. Default behavior is to convert from attrdict to dict.
- get_results(**kwargs)[source]¶
Query Solr and get the results for the current query and filter options. Populates result cache and returns the documents portion of the response. (Note that this method is not currently compatible with grouping.)
- get_stats()[source]¶
Return a dictionary of stats information in Solr format or None on error.
- Return type
Optional[dict]
- group(field, **kwargs)[source]¶
Configure grouping. Takes arbitrary Solr group parameters and adds the group. prefix to them. Example use, grouping on a group_id field, limiting to three results per group, and sorting group members by an order field:
```python
queryset.group('group_id', limit=3, sort='order asc')
```
- Return type
SolrQuerySet
- highlight(field, **kwargs)[source]¶
Configure highlighting. Takes arbitrary Solr highlight parameters and adds the hl. prefix to them. Example use:
```python
queryset.highlight('content', snippets=3, method='unified')
```
- Return type
SolrQuerySet
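The prefixing described for group and highlight can be pictured as a simple key rewrite on top of Solr's switch parameters (`group=true` with `group.field`, `hl=true` with `hl.fl`). The helpers below are hypothetical. The CHANGELOG mentions per-field highlighting options, so parasolr may emit `f.<field>.hl.*` parameters rather than the global `hl.*` names this sketch produces.

```python
def prefixed(prefix, **options):
    """Add a Solr parameter prefix (e.g. 'group', 'hl') to option names."""
    return {"%s.%s" % (prefix, key): value for key, value in options.items()}


def group_params(field, **options):
    # hypothetical sketch of the parameters behind queryset.group(field, **options)
    params = {"group": "true", "group.field": field}
    params.update(prefixed("group", **options))
    return params


def highlight_params(field, **options):
    # hypothetical sketch of the parameters behind queryset.highlight(field, **options)
    params = {"hl": "true", "hl.fl": field}
    params.update(prefixed("hl", **options))
    return params
```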
- only(*args, replace=True, **kwargs)[source]¶
Use field limit option to return only the specified fields. Optionally provide aliases for them in the return. Subsequent calls will replace any previous field limits. Example:
```python
queryset.only('title', 'author', 'date')
queryset.only('title:title_t', 'date:pubyear_i')
```
- Return type
SolrQuerySet
- order_by(*args)[source]¶
Apply sort options to the queryset by field name. If the field name starts with -, sort is descending; otherwise ascending.
- Return type
SolrQuerySet
- query(**kwargs)[source]¶
Return a new SolrQuerySet with the results populated from Solr. Any options passed in via keyword arguments take precedence over query options on the queryset.
- Return type
SolrQuerySet
- query_opts()[source]¶
Construct query options based on current queryset configuration. Includes filter queries, start and rows, sort, and search query.
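A rough picture of what query options could contain follows. Search terms are joined with the default operator, filters become fq values, sort fields with a leading `-` become `desc`, and a slice becomes start/rows. Everything in this sketch is hypothetical, including the function names and the `*:*` fallback when no search is set. It is not the output of parasolr's `query_opts`.

```python
def sort_param(*fields):
    # '-name' sorts descending, anything else ascending
    return ",".join(
        "%s desc" % f[1:] if f.startswith("-") else "%s asc" % f for f in fields
    )


def slice_to_start_rows(s):
    # queryset[10:30] -> start=10, rows=20 (open-ended stop not handled here)
    start = s.start or 0
    return start, s.stop - start


def build_query_opts(search=(), filters=(), sort=(), page=slice(0, 10), operator="AND"):
    """Hypothetical sketch of assembling Solr query options from queryset state."""
    start, rows = slice_to_start_rows(page)
    opts = {
        "q": (" %s " % operator).join(search) or "*:*",
        "fq": list(filters),
        "start": start,
        "rows": rows,
    }
    if sort:
        opts["sort"] = sort_param(*sort)
    return opts


opts = build_query_opts(
    search=["hemingway", "name:hem*"],
    filters=["item_type_s:person"],
    sort=["-birth_year", "sort_name"],
    page=slice(10, 30),
)
```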
- raw_query_parameters(**kwargs)[source]¶
Add arbitrary raw parameters to be included in the query request, e.g. for variables referenced in join or field queries. Analogous to the input of the same name in the Solr web interface.
- Return type
SolrQuerySet
- search(*args, **kwargs)[source]¶
Return a new SolrQuerySet with search queries added. All queries will be combined with the default search operator when constructing the q parameter sent to Solr.
- Return type
SolrQuerySet
- stats(*args, **kwargs)[source]¶
Request stats for specified fields. Returns a new SolrQuerySet with Solr stats enabled and stats.field parameter set.
Subsequent calls will reset the stats.field to the last set of args in the chain.
For example:
```python
qs = queryset.stats('person_type', 'age')
qs = qs.stats('account_start_i')
```
would result in account_start_i being the only stats field. Any kwargs are passed as Solr parameters with a stats. prefix. You may also pass local parameters along with field names, e.g. {!ex=filterA}account_start_i.
- Return type
SolrQuerySet
- class parasolr.query.aliased_queryset.AliasedSolrQuerySet(*args, **kwargs)[source]¶
Extension of SolrQuerySet with support for aliasing Solr fields to more readable versions for use in code. To use, extend this class and define a dictionary of field_aliases with the same syntax you would use when calling only(). Those field aliases will be set as the default initial value for field_list, and aliases can be used in all extended methods.
- _unalias_kwargs_with_lookups(**kwargs)[source]¶
Convert alias names to solr fields for keys in kwargs, with support for __ lookups for filters.
- facet(*args, **kwargs)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.facet() to support using aliased field names in args.
- Return type
SolrQuerySet
- facet_field(field, exclude='', **kwargs)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.facet_field() to support using aliased field names for field parameter.
- Return type
SolrQuerySet
- field_aliases = {}¶
map of application-specific, readable field names to actual solr fields (e.g. if using dynamic field types)
- filter(*args, tag='', **kwargs)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.filter() to support using aliased field names for keyword argument keys.
- Return type
SolrQuerySet
- get_facets()[source]¶
Extend parasolr.query.queryset.SolrQuerySet.get_facets() to use aliased field names for facet and range facet keys.
- get_stats()[source]¶
Extend parasolr.query.queryset.SolrQuerySet.get_stats() to return aliased field names for field_list keys.
- group(field, **kwargs)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.group() to support using aliased field names in kwargs. (Note that sorting does not currently support aliased field names.)
- Return type
SolrQuerySet
- highlight(field, **kwargs)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.highlight() to support using aliased field names in kwargs.
- Return type
SolrQuerySet
- only(*args, **kwargs)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.only() to support using aliased field names for args (but not kwargs).
- Return type
SolrQuerySet
- order_by(*args)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.order_by() to support using aliased field names in sort arguments.
- Return type
SolrQuerySet
- stats(*args, **kwargs)[source]¶
Extend parasolr.query.queryset.SolrQuerySet.stats() to support using aliased field names in args.
- Return type
SolrQuerySet
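The aliasing idea can be illustrated with a small mapping. Keys in filter-style kwargs are translated from alias to Solr field while any `__` lookup suffix is kept. The default field list uses Solr's `alias:field` field-list syntax. The `field_aliases` values and helper names below are hypothetical examples, not parasolr internals.

```python
LOOKUP_SEP = "__"

# hypothetical aliases: readable name -> actual (dynamic) Solr field
field_aliases = {"name": "name_t", "birth_year": "birth_year_i"}


def unalias(name):
    # fall back to the name itself if it is not an alias
    return field_aliases.get(name, name)


def unalias_kwargs_with_lookups(**kwargs):
    """Hypothetical sketch: map aliased keys (with optional lookups) to Solr fields."""
    result = {}
    for key, value in kwargs.items():
        field, sep, lookup = key.partition(LOOKUP_SEP)
        result[unalias(field) + sep + lookup] = value
    return result


# default field list, using Solr's alias:field syntax
field_list = ["%s:%s" % (alias, field) for alias, field in field_aliases.items()]
```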
Django¶
Indexing¶
This module provides indexing support for Django models. Also see Indexable.
To use, add ModelIndexable as a mixin to the model class you want to be indexed. At minimum, you’ll want to extend the index_data method to include the data you want in the index:
```python
def index_data(self):
    index_data = super().index_data()
    # if there are some records that should not be included
    # return id only. This will blank out any previously indexed
    # values, and item will not be findable by type.
    # if not ...
    #     del index_data['item_type_s']
    #     return index_data

    # add values to index data
    index_data.update({
        # ... add your fields here
    })
    return index_data
```
You can optionally extend items_to_index() and index_item_type().
QuerySet¶
Provides SolrQuerySet subclasses that will automatically use SolrClient if no solr client is passed in.
- class parasolr.django.queryset.AliasedSolrQuerySet(solr=None)[source]¶
Combination of SolrQuerySet and AliasedSolrQuerySet.
- class parasolr.django.queryset.SolrQuerySet(solr=None)[source]¶
SolrQuerySet subclass that will automatically use SolrClient if no solr client is passed in.
- Parameters
solr (Optional[parasolr.solr.client.SolrClient])
Signals¶
This module provides on-demand reindexing of Django models when they change, based on Django signals. To use this signal handler, import it in the ready method of a Django app. This will automatically connect any configured signal handlers:
```python
from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        # import and connect signal handlers for Solr indexing
        from parasolr.django.signals import IndexableSignalHandler  # noqa: F401
```
To configure index dependencies, add a property on any ModelIndexable subclass with the dependencies and signals that should trigger reindexing. Example:
```python
class MyModel(ModelIndexable):
    index_depends_on = {
        'collections': {
            'post_save': signal_method,
            'pre_delete': signal_method
        }
    }
```
The keys of the dependency dict can be:
- an attribute on the indexable model (i.e., the name of a many-to-many relationship); this will bind an additional signal handler on the m2m relationship change.
- an attribute on a related model using django queryset notation (use this for a secondary many-to-many relationship, e.g. collections__authors)
- a string with the model name in app.ModelName notation, to find and load a model directly
The dictionaries for each related model or attribute should contain:
- as key, the name of the django.db.models.signals signal to bind
- as value, the signal handler to bind
Currently attribute lookup only supports many-to-many and reverse many-to-many relationships.
Typically you will want to bind post_save and pre_delete for many-to-many relationships.
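To make the three kinds of dependency keys concrete, the sketch below classifies each key and flattens an `index_depends_on` mapping into (key kind, signal name, handler) triples. `classify_dependency` and the example keys are hypothetical. This illustrates the configuration structure only, not how parasolr resolves models or binds handlers. `signal_method` here is a placeholder with the usual Django receiver signature.

```python
def classify_dependency(key):
    """Hypothetical sketch: classify an index_depends_on key."""
    if "." in key:
        # app.ModelName notation: load the model directly
        app_label, model_name = key.split(".", 1)
        return ("model", app_label, model_name)
    if "__" in key:
        # queryset notation for a secondary relationship, e.g. collections__authors
        return ("related_attribute", key.split("__"))
    # plain attribute, e.g. a many-to-many field on the indexable model
    return ("attribute", key)


def signal_method(sender, instance, **kwargs):
    """Placeholder handler with the usual Django signal receiver signature."""


index_depends_on = {
    "collections": {"post_save": signal_method, "pre_delete": signal_method},
    "collections__authors": {"post_save": signal_method},
    "myapp.Collection": {"pre_delete": signal_method},
}

bindings = [
    (classify_dependency(key)[0], signal_name, handler)
    for key, signals in index_depends_on.items()
    for signal_name, handler in signals.items()
]
```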
- class parasolr.django.signals.IndexableSignalHandler[source]¶
Signal handler for indexing Django model-based indexables. Automatically identifies and binds handlers based on configured index dependencies on indexable objects.
- static connect()[source]¶
Bind indexing signal handlers to save and delete signals for Indexable subclasses and any indexing dependencies.
- static handle_delete(sender, instance, **kwargs)[source]¶
Remove from index on delete if an instance of ModelIndexable.
- static handle_relation_change(sender, instance, action, **kwargs)[source]¶
Index on add, remove, and clear for ModelIndexable instances.
- static handle_save(sender, instance, **kwargs)[source]¶
Reindex on save if an instance of ModelIndexable.
Views¶
- class parasolr.django.views.SolrLastModifiedMixin(**kwargs)[source]¶
View mixin to add last modified headers based on Solr. By default, searches the entire Solr collection and returns the most recent last modified value (assumes a last_modified field). To filter for items specific to your view, either set solr_lastmodified_filters or implement get_solr_lastmodified_filters().
- dispatch(request, *args, **kwargs)[source]¶
Wrap the dispatch method to add a last modified header if one is available, then return a conditional response.
- get_solr_lastmodified_filters()[source]¶
Get filters for last modified Solr query. By default returns solr_lastmodified_filters.
- last_modified()[source]¶
Return last modified datetime.datetime from the specified Solr query.
- solr_lastmodified_filters = {}¶
solr query filter for getting last modified date
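The conditional-response logic behind such a mixin can be sketched with the standard library. Parse the Solr UTC timestamp, format it as an HTTP date for the Last-Modified header, and compare it with the client's If-Modified-Since header. The helper names are hypothetical. Inside Django, the mixin's dispatch handles this for you. The parser assumes Python 3.7+ and timestamps like `2021-03-01T12:00:00Z`.

```python
from datetime import datetime
from email.utils import format_datetime, parsedate_to_datetime


def solr_timestamp_to_datetime(value):
    # Solr returns UTC timestamps like 2021-03-01T12:00:00Z
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def last_modified_header(last_modified):
    # HTTP dates are always expressed in GMT
    return format_datetime(last_modified, usegmt=True)


def is_not_modified(last_modified, if_modified_since):
    """True if the client's cached copy (If-Modified-Since) is still current."""
    if not if_modified_since:
        return False
    # HTTP dates have one-second resolution, so drop microseconds before comparing
    return last_modified.replace(microsecond=0) <= parsedate_to_datetime(if_modified_since)


modified = solr_timestamp_to_datetime("2021-03-01T12:00:00Z")
header = last_modified_header(modified)
```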
Manage Commands¶
Solr schema¶
solr_schema is a custom manage command to update the configured schema definition for the configured Solr instance. Reports on the number of fields that are added or updated, and any that are out of date and were removed.
Example usage:
python manage.py solr_schema
Index¶
index is a custom manage command to index content into Solr. It should only be run after your schema has been configured via solr_schema.
By default, indexes all indexable content.
You can optionally index specific items by type or by index id. Default index types are generated based on model verbose names.
A progress bar will be displayed by default if there are more than 5 items to process. This can be suppressed via the --no-progress option.
You may optionally request the index or part of the index to be cleared before indexing, for use when index data has changed sufficiently that previous versions need to be removed.
Example usage:
```shell
# index everything
python manage.py index
# index specific items
python manage.py index person:1 person:2 location:2
# index one kind of item only
python manage.py index -i person
# suppress progressbar
python manage.py index --no-progress
# clear everything, then index everything
python manage.py index --clear all
# clear and then index one kind of item
python manage.py index --clear person --index person
# clear everything, index nothing
python manage.py index --clear all --index none
```
Getting Started¶
To use parasolr you need a Solr installation that you can connect to. Once you have Solr set up, use solr start to make sure it’s running, and then create a new core: solr create -c core_name.
To interact with Solr, use the SolrClient included in parasolr. It should be initialized with the URL for your Solr installation and the name of the core you want to query:
```python
from parasolr.solr.client import SolrClient

solr_url = "http://localhost:8983/solr"
solr_core = "core_name"
solr = SolrClient(solr_url, solr_core)
```
Now you can index some data. The index method takes a list of dictionaries; note that any content you include must be JSON-serializable. For example, to index data from a CSV file:
```python
import csv

# "data.csv" is a placeholder path for your own CSV file
with open("data.csv", newline="") as csvfile:
    solr.update.index([{
        "id": row["id"],
        "name": row["name"],
        "tags": row["tags"].split("|"),
        # etc ...
    } for row in csv.DictReader(csvfile)])
```
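Values such as datetimes and sets are not JSON-serializable, so they need converting before you pass documents to the index method. The sketch below uses a hypothetical `to_solr_value` helper. It turns datetimes into the UTC ISO-8601 form with a trailing Z that Solr date fields accept, and sets into sorted lists. Adapt it to whatever types your data contains.

```python
import json
from datetime import datetime, timezone


def to_solr_value(value):
    """Hypothetical helper: convert common non-JSON values before indexing."""
    if isinstance(value, datetime):
        # Solr date fields expect UTC ISO-8601 with a trailing Z
        # (note: astimezone treats a naive datetime as local time)
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, (set, frozenset)):
        # sets are not JSON-serializable; sort for a stable order
        return sorted(value)
    return value


raw_doc = {
    "id": "1",
    "tags": {"math", "computing"},
    "last_modified": datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc),
}
doc = {key: to_solr_value(value) for key, value in raw_doc.items()}
payload = json.dumps(doc)  # json.dumps(raw_doc) would raise TypeError
```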
To query the data you’ve indexed, initialize a SolrQuerySet, passing it the solr client you used before:
```python
from parasolr.query import SolrQuerySet

queryset = SolrQuerySet(solr)
queryset = queryset.search('search string').order_by('name')
results = queryset.get_results(rows=20)
```
results contains a list of dictionaries that you can manipulate or display as needed.
To remove records from your solr core, you can delete based on a query. For example, to delete all indexed items:
```python
solr.update.delete_by_query('*:*')
```
CHANGELOG¶
0.8.2¶
SolrQuerySet now supports Solr grouping via new group method and GroupedResponse
New class method prep_index_chunk on Indexable class, to support prefetching related objects when iterating over Django querysets for indexing
Include django view mixins in sphinx documentation
Dropped support for python 3.6; added python 3.9
Dropped support for Django 2.2; added Django 3.2
No longer tested against Solr 6.6
0.8.2¶
When subclassing SolrQuerySet, result documents can now be customized by extending get_result_document
0.8.1¶
Exclude proxy models when collecting indexable subclasses
0.8¶
Pytest fixture mock_solr_queryset now takes optional argument for extra methods to include in fluent interface
SolrQuerySet now supports highlighting on multiple fields via highlight method, with per-field highlighting options.
AliasedSolrQuerySet now correctly aliases fieldnames in highlighting results.
Adopted black & isort python style and configured pre-commit hook
0.7¶
Dropped support for Python 3.5
Now tested against Python 3.6, 3.8, Django 2.2 to 3.1, Solr 6 and Solr 8
Continuous integration migrated from Travis-CI to GitHub Actions
bugfix: in some cases, index script was wrongly detecting ModelIndexable subclasses as abstract and excluding them; this has been corrected
ModelIndexable now extends django.db.models.Model; existing code MUST be updated to avoid double-extending Model
Default index data has been updated to use a dynamic field item_type_s instead of item_type so that basic setup does not require customizing the solr schema.
ModelIndexable.get_related_model now supports ForeignKey relationships and django-taggit TaggableManager when identifying dependencies for binding signal handlers
0.6.1¶
bugfix: fix regression in SolrQuerySet get_stats in 0.6
0.6¶
Solr client now escalates 404 errors instead of logging with no exception
Schema field declarations now support the stored option
Schema field type declarations now pass through arbitrary options
New method total_to_index on parasolr.indexing.Indexable to better support indexing content that is returned as a generator
Access to expanded results now available on QueryResponse and SolrQuerySet
SolrQuerySet no longer wraps return results from get_stats and get_facets with QueryResponse
New last-modified view mixin for use with Django views parasolr.django.views.SolrLastModifiedMixin
New pytest fixture mock_solr_queryset to generate a Mock SolrQuerySet that simulates the SolrQuerySet fluent interface
0.5.4¶
Only enable pytest plugin when parasolr is in Django installed apps and a Solr connection is configured
0.5.3¶
Support default option when adding fields to solr schema
Add utility method to convert Solr timestamp to python datetime
0.5.2¶
bugfix: correct queryset highlighting so it actually works
Revise pytest plugin code to work on non-django projects
0.5.1¶
bugfix: SolrQuerySet improved handling for Solr errors
0.5¶
Support for on-demand indexing for Django models based on signals; see parasolr.django.signals; adds a Django-specific indexable class parasolr.django.indexing.ModelIndexable
pytest plugin to disconnect django signal handlers
Django pytest fixture for an empty solr
Adds an EmptySolrQuerySet class, as a simpler way to check for empty results
0.4¶
parasolr.query.SolrQuery additional support for stats:
  New method stats to enable stats for a set of field names.
  New method get_stats to return the entire stats response.
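In Solr's StatsComponent, stats are switched on with stats=true plus one stats.field parameter per field. The sketch below shows the query-string parameters that enabling stats for a set of field names corresponds to, assuming a hypothetical stats_params helper; it is not parasolr code.

```python
from urllib.parse import urlencode


def stats_params(*fields: str) -> str:
    """Hypothetical helper: query string enabling Solr stats for the given fields."""
    # stats=true turns on the StatsComponent; stats.field is repeated once per field
    params = {"stats": "true", "stats.field": list(fields)}
    # doseq=True expands the list into repeated key=value pairs
    return urlencode(params, doseq=True)


print(stats_params("year", "price"))
# stats=true&stats.field=year&stats.field=price
```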
0.3
parasolr.query.SolrQuerySet additional support for faceting:
  New method facet_field for more fine-grained facet feature control for a single facet field
  New method facet_range for enabling range faceting
  Supports tag and exclusion logic via tag option on facet_field method and exclude option on filter
  get_facets now returns the entire facet response, including facet fields, range facets, etc.
SolrQuerySet.filter() method now supports the following advanced lookups:
  in: filter on a list of values
  exists: filter on empty or not-empty
  range: filter on a numeric range
New method SolrQuerySet.also() that functions just like only() except it adds instead of replacing field limit options.
New parasolr.query.AliasedSolrQuerySet supports aliasing Solr fields to local names for use across all queryset methods and return values
parasolr.indexing.Indexable now provides items_to_index() method to support customizing retrieving items for indexing with index manage command.
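The in, exists and range lookups listed for 0.3 correspond to standard Solr query syntax. The sketch below shows one plausible translation into filter query (fq) strings; lookup_to_filter_query is a hypothetical name, it does no escaping of Solr special characters, and it is not parasolr's implementation.

```python
def lookup_to_filter_query(field: str, lookup: str, value) -> str:
    """Hypothetical translation of a lookup into a Solr filter query string."""
    if lookup == "in":
        # match any of the listed values: field:("a" OR "b")
        terms = " OR ".join(f'"{v}"' for v in value)
        return f"{field}:({terms})"
    if lookup == "exists":
        # [* TO *] matches documents with any value; a leading - negates it
        clause = f"{field}:[* TO *]"
        return clause if value else f"-{clause}"
    if lookup == "range":
        # inclusive range given as (start, end); None means open-ended (*)
        start, end = value
        start = "*" if start is None else start
        end = "*" if end is None else end
        return f"{field}:[{start} TO {end}]"
    raise ValueError(f"unsupported lookup: {lookup}")


print(lookup_to_filter_query("item_type_s", "in", ["book", "map"]))
# item_type_s:("book" OR "map")
print(lookup_to_filter_query("year", "range", (1900, None)))
# year:[1900 TO *]
```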
0.2
Subsequent calls to SolrQuerySet.only() now replace field limit options rather than adding to them.
New SolrQuerySet method raw_query_parameters
SolrQuerySet now has support for faceting via facet method to configure facets on the request and get_facets to retrieve them from the response.
Update ping method of parasolr.solr.admin.CoreAdmin so that a 404 response is not logged as an error.
Refactor parasolr.solr tests into submodules
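The difference between only(), which replaces the field list as of 0.2, and also(), which adds to it as of 0.3, can be illustrated with a small immutable stand-in. FieldLimitSketch is hypothetical: it models only the Solr fl parameter and returns a new object from each call, in the style of Django querysets, rather than reproducing SolrQuerySet.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FieldLimitSketch:
    """Hypothetical stand-in for a queryset's field limit (Solr fl) state."""

    field_list: Tuple[str, ...] = ()

    def only(self, *fields: str) -> "FieldLimitSketch":
        # replace whatever fields were requested before
        return replace(self, field_list=tuple(fields))

    def also(self, *fields: str) -> "FieldLimitSketch":
        # keep the existing fields and append the new ones
        return replace(self, field_list=self.field_list + tuple(fields))

    def fl_param(self) -> str:
        # Solr expects a comma-separated field list
        return ",".join(self.field_list)


print(FieldLimitSketch().only("id", "title").only("id").fl_param())  # id
print(FieldLimitSketch().only("id").also("title").fl_param())  # id,title
```

Returning a new object from each call means an intermediate queryset can be reused without later calls changing it.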
0.1.1
Fix travis-ci build for code coverage reporting.
0.1
Lightweight python library for Solr indexing, searching and schema management with optional Django integration.
Minimal Python Solr API client
Logic for updating and managing Solr schema
Indexable mixin for Django models
QuerySet for querying Solr in an object-oriented fashion similar to Django QuerySet
Django Solr client with configuration from Django settings
Django manage command to configure Solr schema
Django manage command to index subclasses of Indexable
pytest plugin for unit testing against a test Solr instance in Django
Basic Sphinx documentation
parasolr is a lightweight python library for Apache Solr indexing, searching and schema management with optional Django integration. It includes a Solr client (parasolr.solr.SolrClient). When used with Django, it provides management commands for updating your Solr schema configuration and indexing content.
Currently tested against Python 3.8 and 3.9, Solr 8.6.2, and Django 3.0-3.2, as well as without Django.
Installation
Install released version from pypi:
pip install parasolr
To install an unreleased version from GitHub:
pip install git+https://github.com/Princeton-CDH/parasolr@develop#egg=parasolr
To use with Django:
Add parasolr to INSTALLED_APPS
Configure SOLR_CONNECTIONS in your django settings:
SOLR_CONNECTIONS = {
    'default': {
        'URL': 'http://localhost:8983/solr/',
        'COLLECTION': 'name',
        # Any configSet in SOLR_ROOT/server/solr/configsets.
        # The default configset name is "_default" as of Solr 7.
        # For Solr 6, "basic_configs" is the default.
        'CONFIGSET': '_default'
    }
}
Define a SolrSchema with fields and field types for your project.
Run the solr_schema manage command to configure your schema; it will prompt to create the Solr core if it does not exist.
Note
The SolrSchema must be imported somewhere for it to be found automatically.
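Each SOLR_CONNECTIONS entry pairs a base URL with a collection name, and ClientBase.build_url returns a handler URL from a base url, collection and handler. A minimal sketch of that combination with urllib follows, assuming a hypothetical handler_url helper and the select handler as an example; it is not the library's code.

```python
from urllib.parse import urljoin


def handler_url(connection: dict, handler: str) -> str:
    """Hypothetical helper: build a handler URL from one SOLR_CONNECTIONS entry."""
    base = connection["URL"]
    # urljoin discards the last path segment unless the base ends with a slash
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, f"{connection['COLLECTION']}/{handler}")


default = {"URL": "http://localhost:8983/solr/", "COLLECTION": "name"}
print(handler_url(default, "select"))
# http://localhost:8983/solr/name/select
```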
Development instructions
This git repository uses git flow branching conventions.
Initial setup and installation:
Recommended: create and activate a Python 3.6 virtualenv:
python3 -m venv parasolr
source parasolr/bin/activate
Install the package with its dependencies as well as development dependencies:
pip install -e .
pip install -e '.[dev]'
Install pre-commit hooks
Install configured pre-commit hooks (currently black and isort):
pre-commit install
Styling was instituted in version 0.8; as a result, git blame
may not reflect the true author of a given line. To see a more accurate git blame, run the following command:
git blame <FILE> --ignore-revs-file .git-blame-ignore-revs
Or configure your git to always ignore the black revision commit:
git config blame.ignoreRevsFile .git-blame-ignore-revs
Unit testing
Unit tests are written with pytest but use some Django test classes for compatibility with Django test suites. Running the tests requires a minimal settings file for Django-required configurations.
Copy sample test settings and add a secret key:
cp ci/testsettings.py testsettings.py
python -c "import uuid; print('\nSECRET_KEY = \'%s\'' % uuid.uuid4())" >> testsettings.py
By default, parasolr expects Solr 8. If running tests with an earlier version of Solr, either explicitly change MAJOR_SOLR_VERSION in your local testsettings.py or set the environment variable:
export SOLR_VERSION=x.x.x
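One way a local testsettings.py could turn the SOLR_VERSION environment variable into a MAJOR_SOLR_VERSION value is sketched below. The major_solr_version helper is hypothetical; the sample settings in ci/testsettings.py may handle this differently.

```python
import os


def major_solr_version(default: int = 8) -> int:
    """Hypothetical helper: major version from SOLR_VERSION, e.g. "6.6.5" -> 6."""
    version = os.environ.get("SOLR_VERSION")
    if not version:
        # fall back to Solr 8, which parasolr expects by default
        return default
    return int(version.split(".")[0])


MAJOR_SOLR_VERSION = major_solr_version()
```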
To run the tests, either use the configured setup.py test command:
python setup.py test
Or install the test requirements and use pytest directly:
pip install -e '.[test]'
pytest
License
parasolr is distributed under the Apache 2.0 License.
©2019 Trustees of Princeton University. Permission granted via Princeton Docket #20-3619 for distribution online under a standard Open Source license. Ownership rights transferred to Rebecca Koeser provided software is distributed online via open source.