
Uploader API.

The following example shows how to use this API for a simple use case:

>>> from invenio.modules.uploader.api import run
>>> blob = open('./testsuite/data/demo_record_marc_data.xml').read()
>>> reader_info = dict(schema='xml')
>>> run('insert', blob, master_format='marc', reader_info=reader_info)

invenio.modules.uploader.api.run(name, input_file, master_format='marc', reader_info={}, **kwargs)

Entry point to run any of the modes of the uploader.

  • name (str) – Upload mode; see ~.config.UPLOADER_WORKFLOWS for more info.
  • input_file – Input in the master format, typically the content of an XML file.
  • master_format (str) – Input file format, for example marc.
  • reader_info (dict) – Any information relevant to the reader, for example character encoding or special characters.
  • kwargs
    • force:
    • pretend:
    • sync: False by default; if set to True, the whole process is treated synchronously.
    • filename: original blob filename if it contains relative paths


Uploader Celery tasks.

tasks.translate(blob, master_format, kwargs=None)

Translate from the master_format to JSON.

  • blob – String containing the input file.
  • master_format – Format of the blob; it is used to decide which reader to use.
  • kwargs – Arguments to be used by the reader. See invenio.modules.jsonalchemy.reader.Reader

Returns: the blob and the JSON representation of the input file created by the reader.
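
The reader selection and the (blob, JSON) return value described above can be pictured with a small stand-alone sketch. It does not use Invenio or JSONAlchemy: READERS, read_marcxml and the shape of the resulting dictionary are hypothetical stand-ins chosen for illustration, and only the translate(blob, master_format, kwargs=None) signature comes from the documentation.

```python
import xml.etree.ElementTree as ET


def read_marcxml(blob, **reader_info):
    """Toy reader (hypothetical): map each MARC tag to a list of its fields."""
    record = {}
    for field in ET.fromstring(blob).iter():
        tag = field.get("tag")
        if tag is None:
            # The root element and subfields carry no "tag" attribute.
            continue
        if field.tag.endswith("controlfield"):
            value = (field.text or "").strip()
        else:
            # Data field: collect its subfields as {code: text}.
            value = {sf.get("code"): (sf.text or "").strip() for sf in field}
        record.setdefault(tag, []).append(value)
    return record


# Hypothetical registry: master format name -> reader callable.
READERS = {"marc": read_marcxml}


def translate(blob, master_format, kwargs=None):
    """Pick the reader for master_format and return (blob, json_record)."""
    reader = READERS[master_format]
    json_record = reader(blob, **(kwargs or {}))
    return blob, json_record


sample = (
    '<record>'
    '<controlfield tag="001">42</controlfield>'
    '<datafield tag="245" ind1=" " ind2=" ">'
    '<subfield code="a">A title</subfield>'
    '</datafield>'
    '</record>'
)
blob, json_record = translate(sample, "marc", {"schema": "xml"})
```

Returning the untouched blob next to the parsed record keeps the original input available to later steps that work on (blob, json_record) tuples.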

tasks.run_workflow(records, name, **kwargs)

Run the uploader workflow itself.

  • records – List of tuples (blob, json_record) from translate().
  • name – Name of the workflow to be run.
  • kwargs – Additional arguments to be used by the tasks of the workflow.

Returns: typically the list of record IDs that have been processed, although this value can be modified by the post_tasks.
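
As a rough illustration of this flow, the sketch below runs a named workflow over a list of (blob, json_record) tuples. It is not the Invenio implementation: WORKFLOWS, reserve_recid, keep_recids and run_workflow_sketch are hypothetical names, the real tasks are executed by a workflow engine rather than a plain loop, and the sketch assumes that post tasks change the records list in place.

```python
import itertools

_next_recid = itertools.count(1)


def reserve_recid(record):
    """Task (hypothetical): give the JSON record a recid if it has none."""
    _blob, json_record = record
    if "recid" not in json_record:
        json_record["recid"] = next(_next_recid)


def keep_recids(records, **kwargs):
    """Post task (hypothetical): replace the tuples with their record IDs."""
    # Slice assignment changes the caller's list in place.
    records[:] = [json_record.get("recid") for _blob, json_record in records]


# Hypothetical registry of workflows, looked up by name.
WORKFLOWS = {
    "insert": {
        "pre_tasks": [],
        "tasks": [reserve_recid],
        "post_tasks": [keep_recids],
    },
}


def run_workflow_sketch(records, name, **kwargs):
    """Run pre tasks, then every task on every record, then post tasks."""
    workflow = WORKFLOWS[name]
    for pre_task in workflow["pre_tasks"]:
        pre_task(records, **kwargs)
    for record in records:
        for task in workflow["tasks"]:
            task(record)
    for post_task in workflow["post_tasks"]:
        post_task(records, **kwargs)
    return records


records = [("<record/>", {}), ("<record/>", {"recid": 7})]
result = run_workflow_sketch(records, "insert")
```

Because the post task here reduces each tuple to its record ID, the returned value is a list of IDs; a different post task could leave another shape behind.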

Uploader workflow tasks.

These are the main/common tasks of the uploader; they are used inside the workflows defined in workflows.

See: Simple workflows for Python

invenio.modules.uploader.uploader_tasks.create_records_for_workflow(records, **kwargs)

Create the record object from the JSON.

Parameters: records – List of records to be processed.

Update legacy bibxxx tables.


Attach and process all the documents embedded in the input file.


Helper task to raise an exception.


Reserve a new record ID for the current object and store it in the object.


Retrieve the record identifier from a record using its PIDs.

If any PID matches one in the DB, the record ID found is set on the current record.
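
The matching rule can be sketched with an in-memory dictionary standing in for the database. Everything here is an assumption made for illustration: PID_STORE, the "pids" key, the type/value layout of each PID and the function name are hypothetical and do not reflect the actual record schema or PID storage.

```python
# Hypothetical stand-in for the PID storage: (pid_type, pid_value) -> recid.
PID_STORE = {("doi", "10.1234/abc"): 42}


def retrieve_recid_from_pids(json_record, pid_store=PID_STORE):
    """Set json_record['recid'] from the first PID found in the store."""
    for pid in json_record.get("pids", []):
        recid = pid_store.get((pid["type"], pid["value"]))
        if recid is not None:
            json_record["recid"] = recid
            return recid
    # No PID matched: the record is left untouched.
    return None


known = {
    "pids": [
        {"type": "isbn", "value": "000"},
        {"type": "doi", "value": "10.1234/abc"},
    ]
}
unknown = {"pids": [{"type": "doi", "value": "10.9999/zzz"}]}

retrieve_recid_from_pids(known)
retrieve_recid_from_pids(unknown)
```

A record whose PIDs are all unknown keeps no recid in this sketch, which leaves room for a separate step that reserves a new one.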

invenio.modules.uploader.uploader_tasks.return_recordids_only(records, **kwargs)

Extract only the record IDs from the records and return them.

Parameters:
  • records – Processed list of records.
  • kwargs

Put the master format into the bfmt DB table.


Save the record to the DB using its _save method.


Save each PID present in the record to the PID storage.


Validate the record.

Validate the record using the validate method present in each record and the validation mode, either from the command line options or from UPLOADER_VALIDATION_MODE.

For validation, the schema information from the field definitions is used; see invenio.modules.jsonalchemy.jsonext.parsers.schema_parser.



Every uploader workflow should be a Python dictionary containing three keys:

  • pre_tasks

    List of tasks to run before the actual workflow; each element of the list should be a callable.

  • tasks

    List of tasks to be run by the WorkflowEngine.

  • post_tasks

    Same as for pre_tasks but in this case they will be run after the workflow is done.
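
A small structural check makes the three-key layout concrete. This is only a sketch: check_workflow, REQUIRED_KEYS and log_start are hypothetical helpers that are not part of the uploader, and the entries of tasks are deliberately not inspected because their format is defined by the workflow engine.

```python
REQUIRED_KEYS = ("pre_tasks", "tasks", "post_tasks")


def check_workflow(workflow):
    """Return a list of problems found in a workflow dictionary."""
    problems = []
    for key in REQUIRED_KEYS:
        value = workflow.get(key)
        if value is None:
            problems.append("missing key: " + key)
        elif not isinstance(value, list):
            problems.append(key + " is not a list")
        elif key != "tasks" and not all(callable(t) for t in value):
            # Only pre_tasks and post_tasks are required to hold callables.
            problems.append(key + " contains a non-callable element")
    return problems


def log_start(records, **kwargs):
    """Example pre task (hypothetical): report how many records arrived."""
    print("processing %d records" % len(records))


good = {"pre_tasks": [log_start], "tasks": [], "post_tasks": []}
bad = {"pre_tasks": ["not callable"], "tasks": []}

good_problems = check_workflow(good)
bad_problems = check_workflow(bad)
```

Running such a check when a workflow is registered would surface a misspelled key or a non-callable entry before any record is processed.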

An example function to be called after the workflow could be:

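def return_recids_only(records, **kwargs):
    # Each element is a (blob, json_record) tuple; keep only the record IDs.
    # Slice assignment updates the caller's list in place, whereas a plain
    # assignment would only rebind the local name.
    records[:] = [obj[1].get('recid') for obj in records]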

These functions must always take the same parameters (like the one above), and those parameters have the values that run_workflow() receives.

Default workflows for inserting records using the uploader.

insert insert.undo

class invenio.modules.uploader.workflows.insert.insert

Default insert workflow.

class undo

Default undo steps for the insert workflow.