JSONAlchemy¶

JSONAlchemy provides an abstraction layer on top of your database to work with JSON objects, helping the administrators to define the data model of their site independent of the master format they are working with and letting the developers work in a controlled and uniform data environment.

Module Structure
Configuration
How it Works
How to Extend JSONAlchemy Behaviour
Invenio Use Cases
API Documentation
- Core
- Extensions

WIP, Finish with the empty sections:

Module Structure: Explain what each folder is for and maybe the namespaces.
Readers:
How it works: explain how everything works together, how the magic happens. Maybe a small example (record centric?).
How to Extend JSONAlchemy Behaviour
Invenio Use Cases: pointer to records, annotations and documents documentation (where the real ‘how to’ style documentation is placed for each of them).

Module Structure¶

Configuration¶

JSONAlchemy works with two different configuration files, one for the field definitions and the second one for the models.

Field Configuration Files¶

This is an example (it might not be 100% semantically correct) of the definition of the field ‘title’:

title:
    schema:
        {'title': {'type': 'dict', 'required': False}}
    creator:
        @legacy((("245", "245__","245__%"), ""),
                ("245__a", "title", "title"),
                ("245__b", "subtitle"),
                ("245__k", "form"))
        marc, '245..', {'title': value['a'], 'subtitle': value['b']}
        dc, 'dc:title', {'title': value}
        unimarc, '200[0,1]_', {'title': value['a'], 'subtitle': value['e']}
    producer:
        json_for_marc(), {'a': 'title', 'b': 'subtitle'}
        json_for_dc(), {'dc:title': ''}
        json_for_unimarc(), {'a': 'title', 'e': 'subtitle'}

A field definition is made up of several sections, each of them identified by its indentation (like in Python).

This example contains the three most common sections that a field can have: schema, creator and producer. Although there could be more sections, we will explain only the ones that Invenio provides. The aforementioned ones are inside the core of JSONAlchemy, while the rest are defined as extensions by Invenio. Be aware that new sections could come via extensions.

Each of these sections adds some information to the dictionary representing the field definition. For example, the dictionary generated for the field defined above would be something like:

{'aliases': [],
 'extend': False,
 'override': False,
 'pid': None,
 'producer': {
     'json_for_marc': [((), {'a': 'title', 'b': 'subtitle'})],
     'json_for_dc': [((), {'dc:title': ''})],
     'json_for_unimarc': [((), {'a': 'title', 'e': 'subtitle'})]},
 'rules': {
     'json': [
         {'decorators': {'after': {}, 'before': {}, 'on': {}},
         'function': <code object <module> at 0x10f173030, file "", line 1>,
         'source_format': 'json',
         'source_tags': ['title'],
         'type': 'creator'}],
     'marc': [
         {'decorators': {'after': {}, 'before': {}, 'on': {'legacy': None}},
         'function': <code object <module> at 0x10f10de30, file "", line 1>,
         'source_format': 'marc',
         'source_tags': ['245__'],
         'type': 'creator'}]}}

Only one field is shown here, but one file can contain any number of field definitions. Check out the atlantis.cfg file from the Invenio demo site to get a quick view of what the configuration file for your fields should look like.

For the BNF lovers, this is something close to the grammar used to parse this:

rule       ::= [pid | extend | override]
            json_id ["," aliases]":"
                body
json_id    ::= (letter|"_") (letter|digit|_)*
aliases    ::= json_id ["," aliases]

pid        ::= @persistent_identifier( level )
extend     ::= @extend
override   ::= @override

body       ::=(creator* | derived | calculated) (extensions)*

creator    ::= [decorators] format "," tag "," expr
derived    ::= [decorators] expr
calculated ::= [decorators] expr

Creator¶

The creator is one of the most important parts of the field definition: inside it, the content of the field is created, and the way this happens depends on its origin.

The creator section is the one used to define the fields that are coming directly from the input file and don’t depend on any type of calculation from another source. We also call this kind of field a real field.

This section can be made out of one or several lines, each one representing the translation of the field, from whatever the input format is, into JSON.

For example:

marc, '245..', {'title': value['a'], 'subtitle': value['b']}

This tells us that the rule applies to any field of the master format marc whose tag matches the regular expression 245.. (several regular expressions can be specified, separated by spaces), and that the transformation {'title': value['a'], 'subtitle': value['b']} will be applied to it.

The transformation must be a valid Python expression, as it will be evaluated as such. In it, the value of the field we are dealing with is available as value (typically a dictionary). This Python expression can also be a function call. The function can either be imported via the __import__() function or implemented in the /functions folder, the contents of which are imported automatically.
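As a rough illustration of the two paragraphs above, the following sketch matches a tag against the rule's regular expressions and then evaluates the transformation with value bound to the field content. The helper apply_creator_rule and the sample data are hypothetical; this is not JSONAlchemy's reader code, only a way to picture what a creator rule does.

```python
import re


def apply_creator_rule(tag, value, tag_regex, expression):
    """Return the translated field, or None if the tag does not match.

    Hypothetical helper that only mimics how a creator rule is described.
    """
    # A rule may list several space-separated regular expressions.
    patterns = tag_regex.split()
    if not any(re.fullmatch(pattern, tag) for pattern in patterns):
        return None
    # The transformation is a plain Python expression; `value` is the
    # content of the matched field (typically a dictionary).
    return eval(expression, {}, {"value": value})


marc_field = {"a": "Hamlet", "b": "a tragedy"}
rule = "{'title': value['a'], 'subtitle': value['b']}"
title = apply_creator_rule("245__", marc_field, "245..", rule)
```

Here title holds the JSON form of the field, while a tag such as 100__ would not match 245.. and the helper would return None.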

For each master format that we want to deal with we need a Reader; we will see later what that is and how to create one. Readers for JSON and MARC21 are provided by default with Invenio. See Readers for more information about readers and How to Extend JSONAlchemy Behaviour to learn how to write your own reader.

Along with each creator rule there could be one or more decorators (like in Python). The default decorators, and how to write more, are described later in the Decorators section.

Derived¶

When a field is derived from a source that is not the input file and needs to be calculated only when the source it depends on changes (this is expected to happen infrequently) it is called a derived virtual field.

An example of a virtual field could be something like this:

number_of_authors:
    derived:
        @depends_on('authors')
        len(self['authors'])

This section is similar to the previous one, creator, but in this case each line is just a valid python expression.
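A minimal sketch of how such an expression could be evaluated, assuming the JSON object is exposed to the expression under the name self. The function evaluate_derived and the sample record are hypothetical and stand in for whatever JSONAlchemy does internally.

```python
def evaluate_derived(record, expression):
    """Evaluate a derived rule against a record (hypothetical helper)."""
    # `self` inside the expression refers to the JSON object itself.
    return eval(expression, {}, {"self": record})


record = {"authors": ["Ellis, J.", "Smith, A."]}
record["number_of_authors"] = evaluate_derived(record, "len(self['authors'])")
```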

Calculated¶

Another type of virtual field is one whose value changes a lot over time, for example the number of comments that a record inside Invenio has or the number of reviews that a paper has.

In these cases we use calculated field definitions. Following the example of the number of comments, this could be its definition:

number_of_comments:
    calculated:
        @parse_first('recid')
        @memoize(30)
        get_number_of_comments(self.get('recid'))

The way that a calculated rule is defined is the same as for the derived fields.

One important point about the calculated fields is caching. One field could be:

Always cached - until someone (some other module) changes its name
Cached for a period of time - like in the example,
Not cached at all - so its value is calculated every time.

See the Decorators section for more information about this.
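The "cached for a period of time" option can be pictured with a small time-based cache. This is only a sketch under stated assumptions: the memoize below is a plain Python decorator written for this example, the clock parameter is added to make the behaviour reproducible, and in this sketch life_time=0 simply disables caching, which may differ from the real @memoize.

```python
import functools
import time


def memoize(life_time=0, clock=time.time):
    """Cache a function's result for life_time seconds (illustrative only)."""
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < life_time:
                return hit[1]  # still fresh: reuse the cached value
            result = func(*args)
            cache[args] = (now, result)
            return result

        return wrapper
    return decorator


calls = []
fake_now = [0.0]  # controllable clock so the example is deterministic


@memoize(30, clock=lambda: fake_now[0])
def get_number_of_comments(recid):
    calls.append(recid)
    return len(calls)


first = get_number_of_comments(1)   # computed
second = get_number_of_comments(1)  # served from the cache
fake_now[0] = 31.0
third = get_number_of_comments(1)   # cache expired, computed again
```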

Schema¶

Here we can specify the schema or structure that the field should follow. This is done using Cerberus (nicolaiarocci/cerberus); see its documentation on Read the Docs for how to use it.

JSONAlchemy only adds two things to the default cerberus:

The force boolean value that tells if the value of the field needs to be cast to type.
The default function (which has no parameters) that is used if the field has a default value.

An example of the schema section could be:

schema:
    {'uuid': {'type':'uuid', 'required': True, 'default': lambda: str(__import__('uuid').uuid4())}}
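To make the role of the default function more concrete, here is a sketch that fills in missing fields from such a schema. It deliberately does not use Cerberus; set_default_values is a hypothetical stand-alone function written for this example and only shows the idea of a parameterless default callable.

```python
import uuid

schema = {
    "uuid": {
        "type": "uuid",
        "required": True,
        "default": lambda: str(uuid.uuid4()),
    }
}


def set_default_values(document, schema):
    """Fill in missing fields using the schema's parameterless defaults."""
    for field, rules in schema.items():
        if field not in document and "default" in rules:
            document[field] = rules["default"]()
    return document


doc = set_default_values({}, schema)
```

A field that already has a value is left untouched; only missing fields receive the result of calling default.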

Description¶

This is a special section, as it can be used without the description: block:

uuid:
    """
    This is the main persistent identifier of a document and will be
    used internally as this, therefore the pid priority should always
    be '0'.
    """

recid:
    description:
        """Record main identifier. """

Both cases have the same syntax (triple-quoted strings à la Python) and the same end result.

Note

The docstrings are not used anywhere else but inside the configuration files, for now. The plan is to use them to build the site's data model documentation using Sphinx, so it is quite important to write them and keep them updated.

JSON¶

Not all the fields that we want to use have a JSON-friendly representation. Consider a date that we would like to use as a datetime object, yet we want to store it as a JSON object.

To solve this issue, we introduced the JSON section, which defines a pair of functions:

loads to load the JSON representation from the database into an object, and
dumps which does the opposite.

A clear example of that is the creation_date field:

creation_date:
    json:
        dumps, lambda d: d.isoformat()
        loads, lambda d: __import__('dateutil.parser').parser.parse(d)

Both functions take only one argument, which is the value of the field.
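The round trip can be sketched with the standard library alone. Note the assumptions: datetime.fromisoformat (Python 3.7+) is swapped in for dateutil here, and the two functions are plain Python stand-ins for the dumps and loads entries of the field definition.

```python
import json
from datetime import datetime


def dumps(value):
    """Turn the datetime into something JSON can store."""
    return value.isoformat()


def loads(value):
    """Rebuild the datetime from its stored form (stdlib, not dateutil)."""
    return datetime.fromisoformat(value)


creation_date = datetime(2014, 5, 17, 12, 30, 0)
stored = json.dumps({"creation_date": dumps(creation_date)})
restored = loads(json.loads(stored)["creation_date"])
```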

Producer¶

Generating a different output from a JSON object is not always easy: there might be implications among fields or rules. For this reason the producer section was introduced. The producer section can also be seen as a kind of documentation on how a field is exported to different formats and which formats those are.

This is an example of its use:

title:
    creator:
        marc, "245..", {'title': value['a'], 'subtitle': value['b']}
    producer:
        json_for_marc(), {'a': 'title', 'b': 'subtitle'}
        json_for_dc(), {'dc:title': 'title'}

Each rule inside the producer section follows the same pattern: first we specify the function that we want to use (what we want to produce), which should be placed inside the /producers folder. This is not a real function call, but only a way to specify which producer we will use and which parameter we would like to use for this field. In the case of the MARC21 producer we can put 245__ as parameter, so that only if title originated from a 245__ MARC21 field this function will be used to generate the output. This parameter could be used differently depending on each producer.

The second part, after the comma, is the rule that we will apply and it is typically a dictionary. In the case of the MARC21 producer we can put the full name of the field as key, 245__a, or just the subfield like in the example. The value for this key could be a function call, a subfield or even empty (if we want to use the entire field as a value).

For more information about the MARC21 producer please check JSON for MARC documentation.

Inside any JSONAlchemy object, like records or documents, there is a method, produce(producer_code, fields=None), that uses this and outputs a dictionary with a certain “flavor”. This new representation of the JSON object could be used elsewhere, for example in the formatter module, to generate the desired output in an easier way than only using the JSON object.
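The rule format described above can be pictured with a small mapping function. This is a sketch, not the real json_for_marc producer: json_for_marc_sketch is a hypothetical name, it ignores function calls as rule values, and it only covers the "subfield" and "empty" cases.

```python
def json_for_marc_sketch(field_value, rule):
    """Map JSON keys to output keys following a producer rule.

    Hypothetical stand-in for a producer; not the real json_for_marc.
    """
    output = {}
    for out_key, json_key in rule.items():
        if json_key == "":
            # An empty value means "use the entire field".
            output[out_key] = field_value
        elif json_key in field_value:
            output[out_key] = field_value[json_key]
    return output


title = {"title": "Hamlet", "subtitle": "a tragedy"}
marc_like = json_for_marc_sketch(title, {"a": "title", "b": "subtitle"})
dc_like = json_for_marc_sketch("Hamlet", {"dc:title": ""})
```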

Decorators¶

Like Python decorators, field decorators could be used either to add extra information to the field itself or to modify the translation process that creates the field.

There are two different types of field decorators: one decorates the entire field and the other decorates a single creator/derived/calculated rule. As with the sections in the field definition, new decorators can be defined to extend the current ones.

Field Decorators¶

This type of decorator is used outside of the field definition and affects the whole field, for example by adding some information to the dictionary that defines it.

Invenio provides three different field decorators:

@persistent_identifier(int): Identifies a field as a PID with a priority, which could later be accessed using the persistent_identifiers property.
@override: As its name points out, it allows us to completely override the field definition.
@extend: Allows us to extend an existing field with, for example, new creator rules.

Note

There are currently no extensions for this type of decorator. It is on the roadmap to allow each Invenio instance to extend these decorators with any others that it might need.

Rule Decorators¶

This other type of decorator applies to the creator/derived/calculated rules. For example:

authors:
    """List with all the authors, connected with main_author and rest_authors"""
    derived:
        @parse_first('_first_author', '_additional_authors')
        @connect('_first_author', sync_authors)
        @connect('_additional_authors', sync_authors)
        @only_if('_first_author' in self or '_additional_authors' in self)
        util_merge_fields_info_list(self, ['_first_author', '_additional_authors'])

These decorators are applied only if the derived rule of the field authors is applied.

The rule decorators are split into three different kinds depending on when they are evaluated: before the rule gets evaluated, during the evaluation of the rule and after the rule evaluation.

This is the list of rule decorators available in Invenio and what they are used for.

connect(field_name, handler=None)

This is a post-evaluation decorator that allows the connection between fields. This connection is bidirectional: if the connected field gets modified, then the decorated field also gets modified and vice versa.

The optional handler function will be called whenever there is any modification in any of the fields. The default behavior is to propagate the value across all the connected fields.

depends_on(*field_names)

This decorator acts before rule evaluation and tells JSONAlchemy whether the rule will be evaluated depending on the existence of the field_names inside the current JSON object.

If the fields are not in the JSON object and their rules have not been evaluated yet, then it will try to evaluate them before failing.
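The behaviour described above can be sketched as a pre-evaluation check. Everything here is hypothetical: dependencies_met, the pending_rules mapping and the sample record only illustrate the "try to evaluate the missing fields before failing" logic, not the actual decorator implementation.

```python
def dependencies_met(record, field_names, pending_rules):
    """Return True if every required field is, or can be made, available.

    Hypothetical helper: pending_rules maps a field name to a callable
    that computes that field from the record.
    """
    for name in field_names:
        if name in record:
            continue
        rule = pending_rules.get(name)
        if rule is None:
            return False  # nothing left to try for this field
        try:
            # Try to evaluate the missing field before giving up.
            record[name] = rule(record)
        except (KeyError, TypeError):
            return False
    return True


record = {"title": "Hamlet"}
pending = {"authors": lambda rec: ["Shakespeare, W."]}
ok = dependencies_met(record, ["authors"], pending)
missing = dependencies_met(record, ["abstract"], pending)
```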

legacy(master_format, legacy_field, matching)

An on-evaluation decorator that adds some legacy information to the rule that is being applied. The master format is not important if dealing with a creator rule (it will be derived from the rule); otherwise it needs to be specified. The matching argument is typically a tuple where we connect the legacy field with the subfields.

memoize(life_time=0)

This post-evaluation decorator only works with calculated fields. It creates a cached value of the field that is decorated for a determined time.

only_if_master_value(*boolean_expressions)

On-evaluation decorator that gives access to the current master value. It is typically used to evaluate one rule only if the master value matches a series of conditions.

The boolean expression could be any Python expression that evaluates to True or False.

only_if(*boolean_expressions)

Like the previous one, but in this case we don’t have access to the current master value, only to the current JSON object.

parse_first(*field_names)

This could be seen as a lighter version of depends_on. However, in this case the rule will be evaluated even if the field names are not inside the JSON object; it only triggers parsing the rules for the fields.

For more information about the decorators, and also about the other extensions, check the Parsers section.

Note

Be aware that, right now, the order of the decorators is not respected.

Model Configuration File¶

Readers¶

How it Works¶

How to Extend JSONAlchemy Behaviour¶

Invenio Use Cases¶

API Documentation¶

This documentation is automatically generated from JSONAlchemy’s source code.

Core¶

Bases¶

General extensions for JSON objects.

JSONAlchemy allows the developer to extend the behavior or capabilities of the JSON objects using extensions. For more information about how extensions work, check invenio.modules.jsonalchemy.jsonext.parsers.extension_model_parser.ExtensionModelParser.

class invenio.modules.jsonalchemy.bases.Versionable¶

Versionable behavior for JSONAlchemy models.

update()¶: Create a new revision of the object and save a link to the old one.

Errors¶

JSONAlchemy errors.

exception invenio.modules.jsonalchemy.errors.FieldParserException¶: Raised when some error happens parsing field definitions.

exception invenio.modules.jsonalchemy.errors.JSONAlchemyException¶: Base exception.

exception invenio.modules.jsonalchemy.errors.ModelParserException¶: Raised when some error happens parsing model definitions.

exception invenio.modules.jsonalchemy.errors.ReaderException¶: Raised when some error happens reading a blob.

Base Model and Field Parser¶

invenio.modules.jsonalchemy.parser._create_field_parser()¶

Create a parser that can handle field definitions.

BNF-like grammar:

rule       ::= [pid | extend | override]
               json_id ["," aliases]":"
                   body
json_id    ::= (letter|"_") (letter|digit|_)*
aliases    ::= json_id ["," aliases]

pid        ::= @persistent_identifier( level )
extend     ::= @extend
override   ::= @override
hidden     ::= @hidden

body       ::=(creator* | derived | calculated) (extensions)*

creator    ::= [decorators] format "," tag "," expr
derived    ::= [decorators] expr
calculated ::= [decorators] expr

To check the syntax of the parser extensions or decorators, please go to invenio.modules.jsonalchemy.jsonext.parsers.

invenio.modules.jsonalchemy.parser._create_model_parser()¶

Create a parser that can handle model definitions.

BNF-like grammar:

TODO

Note: Unlike the field configuration files, where you can specify more than one field inside each file, for the models only one definition is allowed per file.

class invenio.modules.jsonalchemy.parser.FieldParser(namespace)¶

Field definitions parser.

classmethod decorator_after_extensions()¶: TODO.

classmethod decorator_before_extensions()¶: TODO.

classmethod decorator_on_extensions()¶: TODO.

classmethod field_definition_model_based(field_name, model_name, namespace)¶

Get the real field definition based on the model name.

Based on a model name (and namespace) it gets the real field definition.

classmethod field_definitions(namespace)¶

Get all the field definitions from a given namespace.

If the namespace does not exist, it tries to create it first.

classmethod field_extensions()¶: Get the field parser extensions from the parser registry.

classmethod legacy_field_matchings(namespace)¶

Get all the legacy mappings for a given namespace.

If the namespace does not exist, it tries to create it first.

See:	guess_legacy_field_names()

classmethod reparse(namespace)¶

Reparse all the fields.

Invalidate the cached version of all the fields inside the given namespace and parse them again.

class invenio.modules.jsonalchemy.parser.ModelParser(namespace)¶

Record model parser.

classmethod model_definitions(namespace)¶

Get all the model definitions given a namespace.

If the namespace does not exist, it tries to create it first.

classmethod parser_extensions()¶: Get only the model parser extensions from the parser registry.

classmethod reparse(namespace)¶

Invalidate the cached version of all the models.

It does so inside the given namespace and parses them again.

classmethod resolve_models(model_list, namespace)¶

Resolve all the field conflicts.

From a given list of model definitions resolves all the field conflicts and returns a new model definition containing all the information from the model list. The field definitions are resolved from left-to-right.

Parameters:	model_list – It could be also a string, in which case the model definition is returned as it is.
Returns:	Dictionary containing the union of the model definitions.

Base Reader¶

class invenio.modules.jsonalchemy.reader.Reader(json, blob=None, **kwargs)¶

Base reader.

classmethod add(json, fields, blob=None, fetch_model_info=False)¶

Add the list of fields to the json structure.

If fields is None it adds all the possible fields from the current model.

Parameters:

json – Any `SmartJson` object
fields – Dict of fields to be added to the json structure containing field_name:json_id

classmethod process_model_info(json)¶

Process model information.

Fetches all the possible information about the current models and applies all the model extensions evaluate methods if any extension is used.

classmethod set(json, field, value=None, set_default_value=False)¶

Set new field value to json object.

When adding a new field to the json object finds as much information about it as possible and attaches it to the json object inside json['__meta_metadata__'][field].

Parameters:

json – Any `SmartJson` object
field – Name of the new field to be added
value – New value for the field (if not `None`)
set_default_value – If set to `True` looks for the default value if any and sets it.

static split_blob(blob, schema=None, **kwargs)¶

Specify how to split the blob by single record.

In case of several records inside the blob, this method specifies how to split them so that they can be processed one by one afterwards.

classmethod translate(blob, json_class, master_format='json', **kwargs)¶

Transform the incoming blob into a json structure (json_class).

It uses the rules described in the field and model definitions.

Parameters:

blob – incoming blob (like MARC)
json_class – Any subclass of `SmartJson`
master_format – Master format of the input blob.
kwargs – parameter to pass to json_class

Returns:	New object of `json_class` type containing the result of the translation

classmethod update(json, fields, blob=None, update_db=False)¶

Update the fields given from the json structure.

Parameters:

json – Any `SmartJson` object
blob – incoming blob (like MARC), if `None`, `json.get_blob` will be used to retrieve it if needed.
fields – List of fields to be updated, if `None` all fields will be updated.
save – If set to `True` a ‘soft save’ will be performed with the changes.

classmethod update_meta_metadata(json, blob=None, fields=None, section=None, keep_core_values=True, store_backup=True)¶

Update the meta-metadata for a given set of fields.

If it is None all fields will be used.

Registries¶

Storage Engine Interface¶

class invenio.modules.jsonalchemy.storage.Storage(model, **kargs)¶

Default storage engine interface.

create()¶: Create underlying empty storage.

drop()¶: Drop data from underlying storage.

get_field_values(ids, field, repetitive_values=True, count=False, include_recid=False, split_by=0)¶

Return a list of field values for field for the given ids.

Parameters:

ids – list (or iterable) of integers
repetitive_values – if set to True, returns all values even if they are doubled. If set to False, then return unique values only.
count – in combination with repetitive_values=False, adds to the result the number of occurrences of the field.
split – specifies the size of the output.

get_fields_values(ids, fields, repetitive_values=True, count=False, include_recid=False, split_by=0)¶

Return a dictionary of field values for field for the given ids.

As in get_field_values() but in this case returns a dictionary with each of the fields and the list of field values.

get_many(ids)¶: Return an iterable of json objects whose id is inside ids.

get_one(id)¶: Return the json matching the id.

save_many(jsons, ids=None)¶: Store many JSON objects, given as elements of the iterable jsons.

save_one(json, id=None)¶: Store one json in the storage system.

search(query)¶

Retrieve all entries which match the query JSON prototype document.

This method should not be used on storage engines without native JSON support (e.g., MySQL). Returns a cursor over the matched documents.

Parameters:	query – dictionary specifying the search prototype document

update_many(jsons, ids=None)¶: Update many json objects following the same rule as update_one.

update_one(json, id=None)¶

Update one JSON.

If id is None a field representing the id is expected inside the JSON object.
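To give a feel for the interface, here is a toy engine that keeps JSON objects in a dictionary. It is a sketch only: DictStorage is a hypothetical class, the "_id" field name is an assumption, and merging keys on update is a choice made for this example rather than documented behaviour.

```python
class DictStorage:
    """Toy storage engine keeping JSON objects in a dict (hypothetical)."""

    def __init__(self, id_field="_id"):
        self._data = {}
        self._id_field = id_field  # assumed name of the id field

    def save_one(self, json, id=None):
        key = json[self._id_field] if id is None else id
        self._data[key] = dict(json)

    def get_one(self, id):
        return self._data[id]

    def get_many(self, ids):
        return (self._data[i] for i in ids)

    def update_one(self, json, id=None):
        # If id is None, the id is expected inside the JSON object.
        key = json[self._id_field] if id is None else id
        self._data[key].update(json)


storage = DictStorage()
storage.save_one({"_id": 1, "title": "Hamlet"})
storage.save_one({"title": "Othello"}, id=2)
storage.update_one({"_id": 1, "title": "Macbeth"})
titles = [doc["title"] for doc in storage.get_many([1, 2])]
```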

Default Validator¶

class invenio.modules.jsonalchemy.validator.Validator(schema=None, transparent_schema_rules=True, ignore_none_values=False, allow_unknown=True)¶

Cerberus validator.

static force_type(document, field, type_)¶: Force field content to type.

Wrappers¶

JSONAlchemy wrappers.

class invenio.modules.jsonalchemy.wrappers.SmartJson(json=None, set_default_values=False, process_model_info=False, **kwargs)¶

Base class for Json structures.

additional_info¶: Shortcut to __meta_metadata__.__additional_info__.

continuable_errors¶: Shortcut to __meta_metadata__.__continuable_errors__.

dumps(without_meta_metadata=False, with_calculated_fields=False, clean=False, keywords=None, filter_hidden=False)¶

Create the JSON friendly representation of the current object.

Parameters:

without_meta_metadata – by default False, if set to True all the __meta_metadata__ will be removed from the output.
with_calculated_fields – by default the calculated fields are not dumped, if they are needed in the output set it to True
clean – if set to True all the keys starting with _ will be removed from the output
keywords – list of keywords to dump. if None, return all

Returns:

JSON friendly object
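The filtering options can be illustrated on a plain dictionary. The function dumps_sketch below is hypothetical and covers only the without_meta_metadata, clean and keywords parameters; it does not touch calculated fields or the per-field dumps functions.

```python
def dumps_sketch(data, without_meta_metadata=False, clean=False,
                 keywords=None):
    """Filter a dict the way the dumps parameters are described."""
    output = {}
    for key, value in data.items():
        if without_meta_metadata and key == "__meta_metadata__":
            continue
        if clean and key.startswith("_"):
            continue  # also drops __meta_metadata__, which starts with _
        if keywords is not None and key not in keywords:
            continue
        output[key] = value
    return output


record = {
    "title": "Hamlet",
    "_first_author": "Shakespeare, W.",
    "__meta_metadata__": {},
}
no_meta = dumps_sketch(record, without_meta_metadata=True)
cleaned = dumps_sketch(record, clean=True)
only_title = dumps_sketch(record, keywords=["title"])
```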

errors¶: Shortcut to __meta_metadata__.__errors__.

get(key, default=None, reset=False, **kwargs)¶: Like in dict.get.

get_blob(*args, **kwargs)¶

To be overridden in the specific class.

Should look for the original version of the file where the json came from.

items(without_meta_metadata=False)¶: Like in dict.items.

iteritems(without_meta_metadata=False)¶: Like in dict.items.

keys(without_meta_metadata=False)¶: Like in dict.keys.

loads(without_meta_metadata=False, with_calculated_fields=True, clean=False)¶

Create the BSON representation of the current object.

Parameters:

without_meta_metadata – if set to `True` all the `__meta_metadata__` will be removed from the output.
with_calculated_fields – by default the calculated fields are in the output, if they are not needed set it to `False`
clean – if set to `True` all the keys starting with `_` will be removed from the output

Returns:	JSON friendly object

meta_metadata¶: Shortcut to __meta_metadata__.

model_info¶: Shortcut to __meta_metadata__.__model_info__.

produce(producer_code, fields=None)¶

Create a different flavor of JSON depending on producer_code.

Parameters:

producer_code – One of the possible producers listed in the producer section inside the field definitions.
fields – List of fields that should be present in the output, if None all fields from self will be used.

Returns:	It depends on each producer, see producer folder inside jsonext, typically dict.

set_default_values(fields=None)¶

Set default value for the fields using the schema definition.

Parameters:	fields – List of fields to set the default value, if None all.

validate(validator=None)¶

Validate the current JSON content using Cerberus.

See: Cerberus (http://cerberus.readthedocs.org/en/latest).

Parameters:	validator – Validator to be used, if None `Validator`

class invenio.modules.jsonalchemy.wrappers.SmartJsonLD(json=None, set_default_values=False, process_model_info=False, **kwargs)¶

Utility class for JSON-LD serialization.

get_context(context)¶

Return the context definition identified by the parameter.

If the context is not found in the current namespace, the received parameter is returned as is, the assumption being that an IRI was passed.

Parameters:	context – context identifier

get_jsonld(context, new_context={}, format='full')¶

Return the JSON-LD serialization.

Param:	context the context to use for raw publishing; each SmartJsonLD instance is expected to have a default context associated.
Param:	new_context the context to use for formatted publishing, usually supplied by the client; used by the ‘compacted’, ‘framed’, and ‘normalized’ formats.
Param:	format the publishing format; can be ‘full’, ‘inline’, ‘compacted’, ‘expanded’, ‘flattened’, ‘framed’ or ‘normalized’. Note that ‘full’ and ‘inline’ are synonyms, referring to the document form which includes the context; for more information see: [http://www.w3.org/TR/json-ld/]

translate(context_name, context)¶

Translate object to fit given JSON-LD context.

Should not inject context as this will be done at publication time.

class invenio.modules.jsonalchemy.wrappers.StorageEngine(name, bases, dct)¶

Storage metaclass for parsing application config.

storage_engine¶

Return an instance of storage engine defined in application config.

It looks for the key “ENGINE” prefixed by __storagename__.upper(), for example:

class Dummy(SmartJson):
    __storagename__ = 'dummy'

will look for the key “DUMMY_ENGINE”; the key “DUMMY_`DUMMY_ENGINE.__name__.upper()`” should contain a dictionary with the keyword arguments of the engine defined in “DUMMY_ENGINE”.
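The key lookup can be sketched as follows. This is an illustration under assumptions: MemoryEngine and resolve_storage_engine are hypothetical names, and the configuration is assumed to hold the engine class itself (the real configuration may store something else, such as an import path).

```python
class MemoryEngine:
    """Placeholder engine class used only for this illustration."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def resolve_storage_engine(storagename, config):
    """Build the engine configured for a given __storagename__."""
    prefix = storagename.upper()
    engine_cls = config[prefix + "_ENGINE"]
    # e.g. "DUMMY_MEMORYENGINE" holds the engine's keyword arguments.
    kwargs_key = "{0}_{1}".format(prefix, engine_cls.__name__.upper())
    return engine_cls(**config.get(kwargs_key, {}))


config = {
    "DUMMY_ENGINE": MemoryEngine,
    "DUMMY_MEMORYENGINE": {"size": 10},
}
engine = resolve_storage_engine("dummy", config)
```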

Extensions¶

Engines¶

class invenio.modules.jsonalchemy.jsonext.engines.cache.CacheStorage(**kwargs)¶

Implement storage engine for Flask-Cache useful for testing.

create()¶: See create().

drop()¶: See drop().

get_field_values(ids, field, repetitive_values=True, count=False, include_recid=False, split_by=0)¶: See get_field_values().

get_fields_values(ids, fields, repetitive_values=True, count=False, include_recid=False, split_by=0)¶: See get_fields_values().

get_many(ids)¶: See get_many().

get_one(id)¶: See get_one().

save_many(jsons, ids=None)¶: See save_many().

save_one(data, id=None)¶: See save_one().

search(query)¶: See search().

update_many(jsons, ids=None)¶: See update_many().

update_one(data, id=None)¶: See update_one().

class invenio.modules.jsonalchemy.jsonext.engines.memory.MemoryStorage(**kwargs)¶

Implement in-memory storage engine.

create()¶: See create().

drop()¶: See drop().

get_field_values(ids, field, repetitive_values=True, count=False, include_recid=False, split_by=0)¶: See get_field_values().

get_fields_values(ids, fields, repetitive_values=True, count=False, include_recid=False, split_by=0)¶: See get_fields_values().

get_many(ids)¶: See get_many().

get_one(id)¶: See get_one().

save_many(jsons, ids=None)¶: See save_many().

save_one(data, id=None)¶: See save_one().

search(query)¶: See search().

update_many(jsons, ids=None)¶: See update_many().

update_one(data, id=None)¶: See update_one().

class invenio.modules.jsonalchemy.jsonext.engines.mongodb_pymongo.MongoDBStorage(model, **kwards)¶

Storage engine for MongoDB using the driver pymongo.

create()¶: See create().

drop()¶: See drop().

get_field_values(ids, field, repetitive_values=True, count=False, include_id=False, split_by=0)¶: See get_field_values().

get_fields_values(ids, fields, repetitive_values=True, count=False, include_id=False, split_by=0)¶: See get_fields_values().

get_many(ids)¶: See get_many().

get_one(id)¶: See get_one().

save_many(jsons, ids=None)¶: See save_many().

save_one(json, id=None)¶: See save_one().

search(query)¶: See search().

update_many(jsons, ids=None)¶: See update_many().

update_one(json, id=None)¶: See update_one().

class invenio.modules.jsonalchemy.jsonext.engines.sqlalchemy.SQLAlchemyStorage(model, **kwards)¶

Implement database backend for SQLAlchemy model storage.

create()¶: See create().

db¶: Return SQLAlchemy database object.

drop()¶: See drop().

get_field_values(recids, field, repetitive_values=True, count=False, include_recid=False, split_by=0)¶: See get_field_values().

get_fields_values(recids, fields, repetitive_values=True, count=False, include_recid=False, split_by=0)¶: See get_fields_values().

get_many(ids)¶: See get_many().

get_one(id)¶: See get_one().

model¶: Return SQLAlchemy model.

save_many(jsons, ids=None)¶: See save_many().

save_one(json, id=None)¶: See save_one().

search(query)¶: See search().

update_many(jsons, ids=None)¶: See update_many().

update_one(json, id=None)¶: See update_one().

Functions¶

Parsers¶

JSONAlchemy parsers.

class invenio.modules.jsonalchemy.jsonext.parsers.connect_parser.ConnectParser¶

Handles the @connect decorator:

authors:
    derived:
        @connect('creators', handler_function)
        @connect('contributors', handler_function)
        self.get_list('creators') + self.get_list('contributors')

The handler functions will receive as parameters self and the current value of the field.

classmethod add_info_to_field(json_id, info, args)¶: Simply returns the list with the tuples

classmethod create_element(rule, field_def, content, namespace)¶: Simply returns the list with the tuples

classmethod evaluate(json, field_name, action, args)¶: Applies the connect function with the json, field_name and action parameters if a function is available; otherwise it puts the content of the current field into the connected one.

classmethod parse_element(indent_stack)¶: Sets connect attribute to the rule

class invenio.modules.jsonalchemy.jsonext.parsers.depends_on_parser.DependsOnParser¶

Handle the @depends_on decorator:

authors:
    derived:
        @depends_on('creators', 'contributors')
        self.get_list('creators') + self.get_list('contributors')

classmethod create_element(rule, field_def, content, namespace)¶: Just returns the list with the field names

classmethod evaluate(reader, args)¶: Tries to apply the rules for each field; returns False if it fails on any of them.

class invenio.modules.jsonalchemy.jsonext.parsers.description_parser.DescriptionParser¶

Handle the description section in model and field definitions.

title:
    """Description on title"""

title:
    description:
        """Description on title"""

classmethod create_element(rule, namespace)¶: Simply return the description string.

classmethod evaluate(*args, **kwargs)¶

Evaluate parser.

This method accepts arbitrary arguments because the parser is used for both fields and models, and each of them calls it with a different signature. The method itself does nothing.

classmethod extend_model(current_value, new_value)¶: The description should remain the one from the child model.

classmethod inherit_model(current_value, base_value)¶: The description should remain the one from the child model.

classmethod parse_element(indent_stack)¶: Set the description attribute on the rule.

class invenio.modules.jsonalchemy.jsonext.parsers.extension_model_parser.ExtensionModelParser¶

Handles the extensions section in the model definitions:

fields:
    ....
    extensions:
        'invenio_records.api:RecordIter'
        'invenio.modules.jsonalchemy.bases:Versionable'

classmethod add_info_to_field(info)¶: Adds the list of extensions to the model information

classmethod create_element(rule, namespace)¶: Simply returns the list of extensions

classmethod evaluate(obj, args)¶: Extend the incoming object with the extensions listed in args

classmethod extend_model(current_value, new_value)¶: Same behavior as inherit_model()

classmethod inherit_model(current_value, base_value)¶: Extends the list of extensions with the new ones without repeating

classmethod parse_element(indent_stack)¶: Sets extensions attribute to the rule definition
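
The "without repeating" merge described for inherit_model() can be pictured with the following minimal sketch. The function name merge_extensions and the ordering (current values first, then any new ones from the base) are assumptions made for illustration, not the parser's actual code.

```python
def merge_extensions(current_value, base_value):
    """Return current_value extended with unseen items from base_value.

    Hypothetical helper: keeps the original order and never adds an
    extension that is already present.
    """
    merged = list(current_value)
    for extension in base_value:
        if extension not in merged:
            merged.append(extension)
    return merged


child = ['invenio_records.api:RecordIter']
base = ['invenio_records.api:RecordIter', 'package.module:OtherExtension']
print(merge_extensions(child, base))
```

The name 'package.module:OtherExtension' is a placeholder, not a real extension.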

class invenio.modules.jsonalchemy.jsonext.parsers.json_extra_parser.JsonExtraParser¶

JSON extension.

It parses something like this:

json:
    loads, function_to_load(field)
    dumps, function_to_dump(field)

The functions to load and dump must have one parameter which is the field to parse.

The main purpose of this extension is to be able to work inside the JSON object with non-JSON fields, such as dates. Following the example of dates, the loads function will take a string representing a date and transform it into a datetime object, whereas the dumps function should take this object and create a JSON-friendly representation, usually via datetime.isoformat.
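
As an illustration of the date case, a pair of load and dump functions could look like the sketch below. The names load_date and dump_date are hypothetical (they are not part of JSONAlchemy), and datetime.fromisoformat requires Python 3.7 or later.

```python
import json
from datetime import datetime


def load_date(field):
    """Turn the stored ISO 8601 string into a datetime object."""
    return datetime.fromisoformat(field)


def dump_date(field):
    """Turn a datetime object into a JSON-friendly ISO 8601 string."""
    return field.isoformat()


creation_date = datetime(2014, 5, 1, 12, 30)
stored = json.dumps({'creation_date': dump_date(creation_date)})
restored = load_date(json.loads(stored)['creation_date'])
```

Each function takes a single parameter, the field value, which matches the requirement stated above.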

classmethod add_info_to_field(json_id, rule)¶: Add to the field definition the path to get the json functions.

classmethod create_element(rule, namespace)¶: Create the dictionary with the dump and load functions.

classmethod evaluate(json, field_name, action, args)¶: Evaluate the dumps and loads functions depending on the action.

classmethod parse_element(indent_stack)¶: Set json_ext in the rule.

class invenio.modules.jsonalchemy.jsonext.parsers.legacy_parser.LegacyParser¶

Handle the @legacy decorator.

doi:
    creator:
        @legacy((("024", "0247_", "0247_%"), ""),
                ("0247_a", ""))
        marc, "0247_", get_doi(value)


files:
    calculated:
        @legacy('marc', ("8564_z", "comment"),
                ("8564_y", "caption", "description"),
                ("8564_q", "eformat"),
                ("8564_f", "name"),
                ("8564_s", "size"),
                ("8564_u", "url", "url")
               )
        @parse_first(('recid', ))
        {'url': 'http://example.org'}

classmethod create_element(rule, field_def, content, namespace)¶

Special case of decorator.

It creates the legacy rules dictionary and has no effect on the field definitions:

{'100'   : ['authors[0]'],
 '100__' : ['authors[0]'],
 '100__%': ['authors[0]'],
 '100__a': ['authors[0].full_name'],
 .......
}

classmethod evaluate(value, namespace, args)¶

Evaluate parser.

This is a special case: the real evaluation of the decorator has already happened before this method is called.

class invenio.modules.jsonalchemy.jsonext.parsers.memoize_parser.MemoizeParser¶

Handle the @memoize decorator.

number_of_comments:
    calculated:
        @memoize(300)
        get_number_of_comments(self['recid'])

This decorator works only with calculated fields, whose values can be handled in three different ways:

- No decorator is specified: the value of the field is calculated every time somebody asks for it and is not stored in the DB. This is useful for fields that return objects that can't be stored in the DB in a JSON-friendly manner, or for fields whose value changes often and whose calculation function is really light.
- The decorator is used without a lifetime, @memoize(): the value of the field is calculated when the record is created and stored in the DB. The client that modifies the data used to calculate the field is responsible for updating the field value in the DB. This should be used for fields that are typically updated by just a few clients, like bibupload, bibrank, etc.
- A lifetime is set with the decorator, @memoize(300): the field value is only calculated when somebody asks for it, and it is stored in a general cache (invenio.ext.cache) using the timeout from the decorator. This form should be used for fields whose value changes often and whose calculation function is not light. Keep in mind that the value somebody gets might be outdated. To avoid this, the client that modifies the data the value is calculated from could also invalidate the cache or modify the cached value. A good example is the field number_of_comments.

The cache engine used by this decorator can be set using CFG_JSONALCHEMY_CACHE in your instance configuration; by default invenio.ext.cache:cache is used. CFG_JSONALCHEMY_CACHE must be an importable string pointing to the cache object.
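
The lifetime-based behaviour can be sketched with a small in-memory cache. This is only an illustration under stated assumptions: the class LifetimeCache, its methods and the injectable clock are hypothetical and do not reflect the invenio.ext.cache API.

```python
import time

DEFAULT_TIMEOUT = -1  # a negative timeout means "never expires on its own"


class LifetimeCache:
    """Hypothetical cache that recomputes a value once its lifetime is over."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._store = {}  # key -> (value, stored_at, timeout)

    def get_or_compute(self, key, compute, timeout=DEFAULT_TIMEOUT):
        entry = self._store.get(key)
        if entry is not None:
            value, stored_at, entry_timeout = entry
            age = self._clock() - stored_at
            # Entries with a negative timeout are only dropped explicitly.
            if entry_timeout < 0 or age < entry_timeout:
                return value
        value = compute()
        self._store[key] = (value, self._clock(), timeout)
        return value

    def invalidate(self, key):
        """Explicitly drop a cached value, e.g. after the source data changed."""
        self._store.pop(key, None)
```

A client that modifies the underlying data can call invalidate() so that the next reader does not get an outdated value.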

DEFAULT_TIMEOUT = -1¶: Default timeout; -1 means the cache will not be invalidated unless this is explicitly requested.

classmethod add_info_to_field(json_id, info, args)¶: Set the timeout for the field

classmethod create_element(rule, field_def, content, namespace)¶

Try to evaluate the memoize value to int.

If it fails it sets the default value from DEFAULT_TIMEOUT.

classmethod evaluate(json, field_name, action, args)¶

Evaluate the parser.

When getting a JSON field, compare its timestamp with its lifetime and, if the lifetime is over, calculate its value again.

If the value of the field has changed since the last time, it gets updated in the DB.

classmethod parse_element(indent_stack)¶: Set memoize attribute to the rule.

class invenio.modules.jsonalchemy.jsonext.parsers.only_if_master_value_parser.OnlyIfMasterValueParser¶

Handle the @only_if_master_value decorator.

files_to_upload:
    creator:
        @only_if_master_value(is_local_url(value['u']),
                              is_available_url(value['u']))
        marc, "8564_", {'host_name': value['a'],
                        'access_number': value['b'],
                ........

classmethod create_element(rule, field_def, content, namespace)¶: Simply return the list of boolean expressions.

classmethod evaluate(value, namespace, args)¶

Evaluate args with the master value from the input.

Returns:	a boolean depending on evaluated `value`.

classmethod parse_element(indent_stack)¶: Set only_if_master_value attribute to the rule.

class invenio.modules.jsonalchemy.jsonext.parsers.only_if_parser.OnlyIfParser¶

Handle the @only_if decorator.

number_of_copies:
    creator:
        @only_if('BOOK' in self.get('collection.primary', []))
        get_number_of_copies(self.get('recid'))

classmethod evaluate(reader, args)¶

Evaluate parser.

This is a special case: the real evaluation of the decorator has already happened before this method is called.

class invenio.modules.jsonalchemy.jsonext.parsers.parse_first_parser.ParseFirstParser¶

Handle the @parse_first decorator.

author_aggregation:
    derived:
        @parse_first('creators', 'contributors')
        self.get_list('creators') + self.get_list('contributors')

classmethod evaluate(reader, args)¶: Try to parse the fields listed in args first; always return True.

class invenio.modules.jsonalchemy.jsonext.parsers.producer_parser.ProducerParser¶

Handles the producer section from a field definition.

An example of this section could be:

recid:
    producer:
        json_for_marc(), {'001': ''}

title:
    producer:
        json_for_marc(), {'a': 'title'}

creator:
    producer:
        json_for_marc('100__'), {....}
        json_for_marc('1001_'), {....}
        json_for_marc('100[^1][^_]'), {....}

The parameter passed to the producer can be used by the producer, for example, to decide whether the current producer rule applies depending on the tag from the master format. It is typically a string or a regex, but this should be double-checked against the producer implementation.

To view the list of available producers, check the producers folder inside jsonext or simply run:

>>> from invenio.modules.jsonalchemy.registry import producers
>>> dict(producers)

classmethod create_element(rule, namespace)¶: Prepare the list of producers with their names and parameters.

classmethod parse_element(indent_stack)¶: Set the list of producers in the producer attribute of the rule.

class invenio.modules.jsonalchemy.jsonext.parsers.schema_parser.SchemaParser¶

Parse the schema definitions for fields, using cerberus.

modification_date:
    schema:
        {'modification_date': {
            'type': 'datetime',
            'required': True,
            'default': lambda: __import__('datetime').datetime.now()}}

classmethod create_element(rule, namespace)¶: Just evaluate the content of the schema to a Python dictionary.

classmethod parse_element(indent_stack)¶: Set the schema attribute inside the rule.
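
The following sketch shows what "evaluating the content of the schema to a Python dictionary" means for the modification_date example, and why a callable default is useful. It is an illustration only: the use of eval, the variable names and the way the default is resolved are assumptions, not the parser's code, and eval should only ever be applied to trusted configuration.

```python
from datetime import datetime

# Text of the schema section, as it would appear in a field definition.
schema_source = """{'modification_date': {
    'type': 'datetime',
    'required': True,
    'default': lambda: __import__('datetime').datetime.now()}}"""

# The section is a Python expression, so evaluating it yields a plain dict.
schema = eval(schema_source)

# A callable default is resolved at use time, so every record gets a fresh value.
default = schema['modification_date']['default']
value = default() if callable(default) else default
print(type(value).__name__)
```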

Producers¶

JSON for MARC¶

MARC formatted as JSON producer.

This producer could be used in several ways.

It could preserve the input tag from marc:

title:
    ...
    producer:
        json_for_marc(), {'a': 'title'}

It will output the old marc tag followed by the subfield (the dictionary key), and the value of this key will be json['title']['title']. For example:

...
<datafield tag="245" ind1="1" ind2="2">
  <subfield code="a">Awesome title</subfield>
</datafield>
...

Will produce:

[..., {'24512a': 'Awesome title'}, ...]

It could also unify the input marc:

title:
    ...
    producer:
        json_for_marc(), {'245__a': 'title'}

Using the same example as before it will produce:

[..., {'245__a': 'Awesome title'}, ...]

The third way of using it is to create different outputs depending on the input tag. Let's say this time we have this field definition:

title:
    ...
    producer:
        json_for_marc('24512'), {'a': 'title'}
        json_for_marc('245__'), {'a': 'title', 'b': 'subtitle'}

The previous piece of MARC will produce the same output as before:

[..., {'24512a': 'Awesome title'}, ...]

But if we use this one:

...
<datafield tag="245" ind1=" " ind2=" ">
  <subfield code="a">Awesome title</subfield>
  <subfield code="b">Awesome subtitle</subfield>
</datafield>
...

This will produce:

[..., {'245__a': 'Awesome title'}, {'245__b': 'Awesome subtitle'}, ...]

This last approach should be used carefully: all the matching rules are applied, so the rules should not overlap (unless this is the desired behavior).
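
The three usages described above can be summarised with a simplified sketch. It assumes that a rule is a pair made of a tag pattern (an empty string when the producer is called without a parameter) and a key-to-subfield mapping, that the pattern is matched against the input tag as a regular expression, and that one-character keys are prefixed with the input tag. The function produce_marc is hypothetical and is not the producer's implementation.

```python
import re


def produce_marc(value, input_tag, rules):
    """Apply json_for_marc-style rules to one field value (illustrative only)."""
    output = []
    for tag_pattern, mapping in rules:
        # A rule with a tag pattern is skipped unless the input tag matches it.
        if tag_pattern and not re.match(tag_pattern, input_tag):
            continue
        for key, subfield in mapping.items():
            # A one-character key is a subfield code: prefix it with the input tag.
            out_key = input_tag + key if len(key) == 1 else key
            if subfield in value:
                output.append({out_key: value[subfield]})
    return output


title = {'title': 'Awesome title', 'subtitle': 'Awesome subtitle'}
rules = [('24512', {'a': 'title'}),
         ('245__', {'a': 'title', 'b': 'subtitle'})]

print(produce_marc(title, '245__', rules))
```

Because every matching rule contributes to the output, two rules whose patterns match the same tag would both emit entries, which is the overlap the warning above refers to.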

invenio.modules.jsonalchemy.jsonext.producers.json_for_marc.produce(self, fields=None)¶

Export the json in marc format.

Produces a list of dictionaries with all the possible marc tags as keys.

Parameters:	fields – list of fields to include in the output, if None or empty list all available tags will be included.

Readers¶

class invenio.modules.jsonalchemy.jsonext.readers.json_reader.JsonReader(json, blob=None, **kwargs)¶

JSON reader.

static split_blob(blob, schema=None, **kwargs)¶: When the blob contains several objects, this method specifies how to split them so that they can be processed one by one afterwards.

class invenio.modules.jsonalchemy.jsonext.readers.marc_reader.MarcReader(json, blob=None, **kwargs)¶

Marc reader.

guess_model_from_input()¶

Guess from the input Marc the model to be used in this record.

This is the simplest possible implementation: it just takes all 980__a tags and sets them as the list of models. The guess function can easily be changed by setting CFG_MARC_MODEL_GUESSER to an importable string.

static split_blob(blob, schema=None, **kwargs)¶

Split the blob using <record.*?>.*?</record> as pattern.

Note 1: Taken from invenio.legacy.bibrecord:create_records.

Note 2: Use the DOTALL flag to include newlines.
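
A minimal sketch of this splitting strategy, using only the standard re module. The function name split_records and the sample blob are made up for illustration; this is not the reader's actual code.

```python
import re

# DOTALL lets "." match newlines, so multi-line records are captured whole.
RECORD_PATTERN = re.compile(r"<record.*?>.*?</record>", re.DOTALL)


def split_records(blob):
    """Return every <record>...</record> chunk found in the blob, in order."""
    return RECORD_PATTERN.findall(blob)


blob = """<collection>
<record>
  <controlfield tag="001">1</controlfield>
</record>
<record>
  <controlfield tag="001">2</controlfield>
</record>
</collection>"""

for record in split_records(blob):
    print(record)
```

The non-greedy quantifiers keep each match from running past the first closing </record> tag.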