Uploader

Uploader API.

The following example shows how to use this API for a simple use case:

>>> from invenio.modules.uploader.api import run
>>> blob = open('./testsuite/data/demo_record_marc_data.xml').read()
>>> reader_info = dict(schema='xml')
>>> run('insert', blob, master_format='marc', reader_info=reader_info)
invenio.modules.uploader.api.run(name, input_file, master_format='marc', reader_info={}, **kwargs)

Entry point to run any of the modes of the uploader.

Parameters:
  • name (str) – Upload mode; see ~.config.UPLOADER_WORKFLOWS for more info.
  • input_file – Input in the master format, typically the content of an XML file.
  • master_format (str) – Input file format, for example marc.
  • reader_info (dict) – Any information relevant to the reader, for example character encoding or special characters.
  • kwargs
    • force:
    • pretend:
    • sync: False by default; if set to True, the whole process will be treated synchronously.
    • filename: original blob filename if it contains relative paths
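As a hedged sketch, the documented keyword arguments could be assembled before calling run as shown here. The helper name build_run_arguments is hypothetical and not part of the uploader, and passing the absolute path as filename is an assumption about how that option is meant to be used.

```python
import os


def build_run_arguments(path, mode='insert', sync=True):
    """Collect positional and keyword arguments for the uploader's run().

    Hypothetical helper, not part of invenio.modules.uploader.
    """
    # Read the whole input file; the documented example passes the file
    # content to run(), not the path.
    with open(path) as handle:
        blob = handle.read()
    kwargs = {
        'master_format': 'marc',
        'reader_info': {'schema': 'xml'},
        # Documented option: treat the whole process synchronously.
        'sync': sync,
        # Assumption: the original file location, so that relative paths
        # mentioned inside the blob can be resolved.
        'filename': os.path.abspath(path),
    }
    return (mode, blob), kwargs
```

With Invenio installed, the result could then be unpacked into the entry point, for example args, kwargs = build_run_arguments('./testsuite/data/demo_record_marc_data.xml') followed by run(*args, **kwargs).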

Tasks

Uploader Celery tasks.

tasks.translate(blob, master_format, kwargs=None)

Translate from the master_format to JSON.

Parameters:
  • blob – String containing the input file.
  • master_format – Format of the blob; it is used to decide which reader to use.
  • kwargs – Arguments to be used by the reader. See invenio.modules.jsonalchemy.reader.Reader.
Returns:

The blob and the JSON representation of the input file created by the reader.
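The reader selection described above can be pictured with a minimal sketch. It does not use the real jsonalchemy Reader: READERS, translate_sketch and the 'json' format are hypothetical stand-ins chosen so that the example runs without Invenio.

```python
import json

# Hypothetical reader registry: maps a master format name to a function
# that turns a blob into a JSON-compatible dict.
READERS = {
    'json': lambda blob, **kwargs: json.loads(blob),
}


def translate_sketch(blob, master_format, kwargs=None):
    """Return the blob together with its JSON representation."""
    kwargs = kwargs or {}
    try:
        # The master format decides which reader handles the blob.
        reader = READERS[master_format]
    except KeyError:
        raise ValueError('No reader for format %r' % master_format)
    return blob, reader(blob, **kwargs)
```

The returned (blob, json_record) pair has the same shape that run_workflow() expects for each entry of its records argument.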

tasks.run_workflow(records, name, **kwargs)

Run the uploader workflow itself.

Parameters:
  • records – List of tuples (blob, json_record) from translate().
  • name – Name of the workflow to be run.
  • kwargs – Additional arguments to be used by the tasks of the workflow.

Returns:

Typically the list of record IDs that have been processed, although this value can be modified by the post_tasks.
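The order of execution can be illustrated with a simplified sketch. The real task runs its tasks through a workflow engine; here a plain loop stands in for it, and run_workflow_sketch, reserve_id, keep_ids_only and the explicit workflows argument are all hypothetical names introduced for illustration.

```python
def run_workflow_sketch(records, name, workflows, **kwargs):
    """Run pre_tasks, tasks and post_tasks of the workflow called name."""
    workflow = workflows[name]
    for pre_task in workflow.get('pre_tasks', []):
        pre_task(records, **kwargs)
    # Stand-in for the workflow engine: apply every task to every record.
    for record in records:
        for task in workflow['tasks']:
            task(record)
    # Post tasks may change what is finally returned.
    for post_task in workflow.get('post_tasks', []):
        post_task(records, **kwargs)
    return records


def reserve_id(record):
    # record is a (blob, json_record) tuple; store a fake record ID
    # unless the JSON record already has one.
    record[1].setdefault('recid', len(record[0]))


def keep_ids_only(records, **kwargs):
    # Replace the tuples in place with their record IDs.
    records[:] = [json_record['recid'] for _, json_record in records]


WORKFLOWS = {
    'insert': {
        'pre_tasks': [],
        'tasks': [reserve_id],
        'post_tasks': [keep_ids_only],
    },
}

result = run_workflow_sketch(
    [('abc', {}), ('de', {'recid': 7})], 'insert', WORKFLOWS)
print(result)  # prints [3, 7]
```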

Uploader workflow tasks.

These are the main, common tasks that the uploader uses; they are used inside the workflows defined in workflows.

See: Simple workflows for Python

invenio.modules.uploader.uploader_tasks.create_records_for_workflow(records, **kwargs)

Create the record object from the JSON.

Parameters:
  • records – List of records to be processed.
  • kwargs

invenio.modules.uploader.uploader_tasks.legacy(step)

Update legacy bibxxx tables.

invenio.modules.uploader.uploader_tasks.manage_attached_documents(step)

Attach and process all the documents embedded in the input file.

invenio.modules.uploader.uploader_tasks.raise_(ex)

Helper task to raise an exception.
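One plausible shape for such a helper is a factory that returns a task which raises the given exception when executed. This is only a sketch: raise_sketch is a hypothetical name, and the (obj, eng) task signature is an assumption based on the workflow library's callback convention, not taken from the uploader source.

```python
def raise_sketch(ex):
    """Build a task that raises ex when the workflow reaches it."""
    def _raise(obj, eng):
        # obj and eng stand for the object being processed and the
        # engine running the workflow; both are ignored here.
        raise ex
    return _raise
```

A workflow could place raise_sketch(ValueError('...')) in a branch that must never be reached, so that the failure surfaces only when that step actually runs.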

invenio.modules.uploader.uploader_tasks.reserve_record_id(step)

Reserve a new record ID for the current object and store it in the object.

invenio.modules.uploader.uploader_tasks.retrieve_record_id_from_pids(step)

Retrieve the record identifier from a record using its PIDs.

If any PID matches one in the DB, the record ID found is set on the current record.
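The matching rule can be sketched as follows. The lookup table and the record layout are hypothetical: pid_index stands in for the PID store as a dict from (type, value) pairs to record IDs, and the 'pids' key is assumed only for this example.

```python
def retrieve_record_id_sketch(record, pid_index):
    """Set record['recid'] from the first PID already known to pid_index.

    pid_index is a hypothetical stand-in for the PID store: a dict that
    maps (pid_type, pid_value) tuples to record IDs.
    """
    for pid in record.get('pids', []):
        recid = pid_index.get((pid['type'], pid['value']))
        if recid is not None:
            # A known PID identifies the existing record.
            record['recid'] = recid
            return recid
    # No PID matched, so the record is left untouched.
    return None
```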

invenio.modules.uploader.uploader_tasks.return_recordids_only(records, **kwargs)

Extract only the record IDs from the records and return them.

Parameters:
  • records – Processed list of records.
  • kwargs

invenio.modules.uploader.uploader_tasks.save_master_format(step)

Put the master format into the bfmt DB table.

invenio.modules.uploader.uploader_tasks.save_record(step)

Save the record to the DB using its _save method.

invenio.modules.uploader.uploader_tasks.update_pidstore(step)

Save each PID present in the record to the PID storage.

invenio.modules.uploader.uploader_tasks.validate(step)

Validate the record.

Validate the record using the validate method present in each record and the validation mode, taken either from the command-line options or from UPLOADER_VALIDATION_MODE.

For the validation, the schema information from the field definition is used; see invenio.modules.jsonalchemy.jsonext.parsers.schema_parser.
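The choice of validation mode can be sketched as a simple fallback. The precedence (command-line option first, configuration second) is an assumption, and the option key 'validation_mode' and the mode names used in the usage note are placeholders rather than documented values.

```python
def resolve_validation_mode(options, config):
    """Pick the validation mode, preferring an explicit option.

    options and config are plain dicts standing in for the command-line
    options and the application configuration.
    """
    # Assumption: an explicit command-line option wins over the default
    # configured in UPLOADER_VALIDATION_MODE.
    mode = options.get('validation_mode')
    if mode is None:
        mode = config['UPLOADER_VALIDATION_MODE']
    return mode
```

For example, resolve_validation_mode({}, {'UPLOADER_VALIDATION_MODE': 'mode_a'}) falls back to the configured value, while passing {'validation_mode': 'mode_b'} as options overrides it.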

Workflows

invenio.modules.uploader.workflows

Every uploader workflow should be a Python dictionary containing three keys:

  • pre_tasks

    List of tasks that will be run before the actual workflow; each element of the list should be a callable.

  • tasks

    List of tasks to be run by the WorkflowEngine.

  • post_tasks

    Same as for pre_tasks but in this case they will be run after the workflow is done.
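A dictionary with the three keys listed above might look like the following sketch. The check_workflow helper and the count_records placeholder task are hypothetical and serve only to make the expected shape explicit.

```python
REQUIRED_KEYS = ('pre_tasks', 'tasks', 'post_tasks')


def check_workflow(workflow):
    """Return workflow unchanged if it has the three documented keys."""
    missing = [key for key in REQUIRED_KEYS if key not in workflow]
    if missing:
        raise KeyError('workflow is missing: ' + ', '.join(missing))
    # Pre and post tasks are described as lists of callables.
    for key in ('pre_tasks', 'post_tasks'):
        for task in workflow[key]:
            if not callable(task):
                raise TypeError('every %s entry must be callable' % key)
    return workflow


def count_records(records, **kwargs):
    # Placeholder pre/post task taking the records and extra arguments.
    return len(records)


example_workflow = check_workflow({
    'pre_tasks': [count_records],
    'tasks': [],  # the tasks for the workflow engine would be listed here
    'post_tasks': [count_records],
})
```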

An example function to be called after the workflow could be:

def return_recids_only(records, **kwargs):
    # Replace each entry in place with its record ID so the caller sees it.
    records[:] = [obj[1].get('recid') for obj in records]
    return records

These functions must always take the same parameters (as in the example above); the parameters receive the same values that run_workflow() gets.

Default workflows for inserting records using the uploader.

  • insert
  • insert.undo

class invenio.modules.uploader.workflows.insert.insert

Default insert workflow.

class undo

Default undo steps for the insert workflow.