Uploader

Uploader API.

The following example shows how to use this API for a simple use case:

>>> from invenio.modules.uploader.api import run
>>> with open('./testsuite/data/demo_record_marc_data.xml') as f:
...     blob = f.read()
>>> reader_info = dict(schema='xml')
>>> run('insert', blob, master_format='marc', reader_info=reader_info)

invenio.modules.uploader.api.run(name, input_file, master_format='marc', reader_info={}, **kwargs)

Entry point to run any of the modes of the uploader.

Parameters:

name (str) – Upload mode; see config.UPLOADER_WORKFLOWS for more info.
input_file – Input in the master format, typically the content of an XML file.
master_format (str) – Input file format, for example marc.
reader_info (dict) – Any information relevant to the reader, for example the character encoding or special characters.
kwargs –
- force:
- pretend:
- sync: False by default; if set to True, the whole process is run synchronously.
- filename: original blob filename, if it contains relative paths.
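A minimal sketch of how these parameters could fit together, assuming run() translates the input with translate() and passes the resulting (blob, json_record) tuples to run_workflow(), as the task descriptions below suggest. Both task bodies are hypothetical stubs, and a ThreadPoolExecutor stands in for Celery when sync is False; this is not the actual Invenio implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor


# Hypothetical stubs standing in for tasks.translate and tasks.run_workflow.
def translate(blob, master_format, kwargs=None):
    return blob, {'master_format': master_format, 'size': len(blob)}


def run_workflow(records, name, **kwargs):
    return ['%s:%d' % (name, i) for i, _ in enumerate(records)]


# Hypothetical stand-in for the Celery workers used when sync is False.
_executor = ThreadPoolExecutor(max_workers=2)


def run(name, input_file, master_format='marc', reader_info=None, **kwargs):
    """Sketch only: translate the input, then run the named workflow."""
    reader_info = reader_info or {}  # avoid sharing a mutable default
    sync = kwargs.pop('sync', False)

    def pipeline():
        records = [translate(input_file, master_format, reader_info)]
        return run_workflow(records, name, **kwargs)

    if sync:
        return pipeline()  # run in the calling process
    return _executor.submit(pipeline)  # a Future, like an async task result
```

With sync=True the caller gets the workflow result directly; otherwise it gets a handle whose result() blocks until the background run finishes.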

Tasks

Uploader celery tasks.

tasks.translate(blob, master_format, kwargs=None)

Translate from the master_format to JSON.

Parameters:

blob – String containing the input file.
master_format – Format of the blob; it is used to decide which reader to use.
kwargs – Arguments to be used by the reader. See invenio.modules.jsonalchemy.reader.Reader.

Returns: The blob and the JSON representation of the input file created by the reader.
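To illustrate the (blob, JSON) return shape, here is a toy MARCXML reader. It is a hypothetical stand-in, not the jsonalchemy Reader: it ignores kwargs, only handles control and data fields, and the JSON layout it produces is an assumption.

```python
import xml.etree.ElementTree as ET

MARC_NS = '{http://www.loc.gov/MARC21/slim}'


def translate(blob, master_format, kwargs=None):
    """Toy reader: turn one MARCXML record into a flat JSON-like dict."""
    if master_format != 'marc':
        raise ValueError('no reader for %r' % master_format)
    root = ET.fromstring(blob)
    json_record = {}
    # Control fields (e.g. 001) map tag -> text.
    for field in root.iter(MARC_NS + 'controlfield'):
        json_record[field.get('tag')] = field.text
    # Data fields map tag -> list of {code: value} dicts, since a tag may
    # repeat; in this toy, a repeated subfield code keeps only its last value.
    for field in root.iter(MARC_NS + 'datafield'):
        subfields = {sf.get('code'): sf.text
                     for sf in field.iter(MARC_NS + 'subfield')}
        json_record.setdefault(field.get('tag'), []).append(subfields)
    return blob, json_record
```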

tasks.run_workflow(records, name, **kwargs)

Run the uploader workflow itself.

Parameters:

records – List of tuples (blob, json_record) from translate().
name – Name of the workflow to be run.
kwargs – Additional arguments to be used by the tasks of the workflow.

Returns: Typically the list of record IDs that have been processed, although the post_tasks may change this value.
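A minimal sketch of the pre_tasks, tasks, post_tasks sequence described under Workflows. It assumes UPLOADER_WORKFLOWS is a plain dict, that workflow tasks take (obj, eng) as in the workflow library, and that each post task's return value replaces the result; PlaceholderEngine, mark_seen and count_seen are hypothetical names, not the real WorkflowEngine or uploader tasks.

```python
class PlaceholderEngine:
    """Hypothetical stand-in for the WorkflowEngine passed to each task."""


def mark_seen(obj, eng):
    """Example task: flag the JSON part of one (blob, json) tuple."""
    obj[1]['seen'] = True


def count_seen(records, **kwargs):
    """Example post task: replace the records with how many were flagged."""
    return sum(1 for obj in records if obj[1].get('seen'))


# Hypothetical stand-in for config.UPLOADER_WORKFLOWS.
UPLOADER_WORKFLOWS = {
    'demo': {'pre_tasks': [], 'tasks': [mark_seen], 'post_tasks': [count_seen]},
}


def run_workflow(records, name, **kwargs):
    """Sketch: pre_tasks, then every task on every record, then post_tasks."""
    workflow = UPLOADER_WORKFLOWS[name]
    for pre_task in workflow['pre_tasks']:
        pre_task(records, **kwargs)
    eng = PlaceholderEngine()
    for obj in records:
        for task in workflow['tasks']:
            task(obj, eng)
    result = records
    for post_task in workflow['post_tasks']:
        # Each post task receives the previous result and may reshape it.
        result = post_task(result, **kwargs)
    return result
```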

Uploader workflow tasks.

These are the main common tasks that the uploader uses; they are used inside the workflows defined in invenio.modules.uploader.workflows.

See: Simple workflows for Python

invenio.modules.uploader.uploader_tasks.create_records_for_workflow(records, **kwargs)

Create the record object from the JSON.

Parameters:

records – List of records to be processed.
kwargs –
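A minimal sketch of what "create the record object from the JSON" could look like, assuming the task replaces each (blob, json) tuple in place so that later tasks see record objects. The Record class and its create() method are hypothetical, not the Invenio record API.

```python
class Record(dict):
    """Hypothetical minimal record object built from a JSON dict."""

    @classmethod
    def create(cls, json_record):
        return cls(json_record)


def create_records_for_workflow(records, **kwargs):
    # Replace each (blob, json) pair in place with (blob, Record), so the
    # caller's list now holds record objects for the following tasks.
    for i, (blob, json_record) in enumerate(records):
        records[i] = (blob, Record.create(json_record))
```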

invenio.modules.uploader.uploader_tasks.legacy(step)

Update the legacy bibxxx tables.

invenio.modules.uploader.uploader_tasks.manage_attached_documents(step)

Attach and process all the documents embedded in the input file.

invenio.modules.uploader.uploader_tasks.raise_(ex)

Helper task to raise an exception.

invenio.modules.uploader.uploader_tasks.reserve_record_id(step)

Reserve a new record ID for the current object and store it in the object.
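The tasks that take a step or ex argument read like task factories. A minimal sketch of that pattern, assuming (as in the workflow library) that the returned callable receives (obj, eng) and that obj is a (blob, json) tuple. The meaning of step is not documented here, so the sketch ignores it, and the itertools counter is a hypothetical stand-in for the database sequence.

```python
import itertools

# Hypothetical stand-in for the database sequence that hands out record IDs.
_next_recid = itertools.count(1)


def reserve_record_id(step):
    """Return a task that reserves a new record ID and stores it on obj."""
    def _reserve_record_id(obj, eng):
        obj[1]['recid'] = next(_next_recid)
    return _reserve_record_id


def raise_(ex):
    """Return a task that raises ``ex`` when the workflow reaches it."""
    def _raise_(obj, eng):
        raise ex
    return _raise_
```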

invenio.modules.uploader.uploader_tasks.retrieve_record_id_from_pids(step)

Retrieve the record identifier from a record using its PIDs.

If any PID matches one already in the DB, the record ID found is set on the current record.
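A minimal sketch of the lookup, assuming a hypothetical in-memory PID store keyed by (pid_type, pid_value) and a hypothetical 'pids' list in the record JSON; the real task queries the PID storage in the DB and its record layout may differ.

```python
# Hypothetical in-memory PID store: (pid_type, pid_value) -> record ID.
PID_STORE = {('doi', '10.1234/abc'): 7}


def retrieve_record_id_from_pids(step):
    """Return a task that copies the record ID of the first matching PID."""
    def _retrieve(obj, eng):
        json_record = obj[1]
        for pid in json_record.get('pids', []):
            recid = PID_STORE.get((pid['type'], pid['value']))
            if recid is not None:
                json_record['recid'] = recid
                return  # stop at the first match
    return _retrieve
```

Records whose PIDs are all unknown are left without a recid, which a later task such as reserve_record_id could fill in.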

invenio.modules.uploader.uploader_tasks.return_recordids_only(records, **kwargs)

Extract only the record IDs from the records and return them.

Parameters:

records – Processed list of records.
kwargs –

invenio.modules.uploader.uploader_tasks.save_master_format(step)

Store the master format in the bfmt DB table.

invenio.modules.uploader.uploader_tasks.save_record(step)

Save the record to the DB using its _save method.

invenio.modules.uploader.uploader_tasks.update_pidstore(step)

Save each PID present in the record to the PID storage.

invenio.modules.uploader.uploader_tasks.validate(step)

Validate the record.

Validate the record using its own validate method and the validation mode, taken either from the command-line options or from UPLOADER_VALIDATION_MODE.

The validation uses the schema information from the field definitions; see invenio.modules.jsonalchemy.jsonext.parsers.schema_parser.
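A minimal sketch of mode-dependent validation, under loudly stated assumptions: the mode name 'strict', the validation_mode attribute on the engine (standing in for a command-line option), and the DemoRecord class are all hypothetical; the real mode values and the jsonalchemy schema checks may differ.

```python
# Hypothetical default; the real UPLOADER_VALIDATION_MODE values may differ.
UPLOADER_VALIDATION_MODE = 'strict'


class DemoRecord(dict):
    """Hypothetical record whose validate() reports missing required fields."""

    required = ('title',)

    def validate(self):
        return {name: 'required field' for name in self.required if name not in self}


def validate(step):
    """Return a task that validates obj[1] according to the validation mode."""
    def _validate(obj, eng):
        # A mode set on the engine (e.g. from the command line) overrides the default.
        mode = getattr(eng, 'validation_mode', None) or UPLOADER_VALIDATION_MODE
        errors = obj[1].validate()
        if errors and mode == 'strict':
            raise ValueError('invalid record: %r' % (errors,))
        return errors  # in any other mode the errors are only reported
    return _validate
```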

Workflows

invenio.modules.uploader.workflows

Every uploader workflow should be a Python dictionary containing three keys:

pre_tasks

List of tasks to be run before the actual workflow; each element of the list should be a callable.

tasks

List of tasks to be run by the WorkflowEngine.

post_tasks

Same as pre_tasks, but these are run after the workflow is done.

An example of a function to be called after the workflow:

def return_recids_only(records, **kwargs):
    return [obj[1].get('recid') for obj in records]

These functions must always have the same parameters as the one above, and those parameters receive the values that run_workflow() gets.
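Putting the three keys together, a hypothetical workflow dictionary and a small structural check might look like the sketch below. The demo_workflow and check_workflow names are illustrative assumptions, not part of the uploader; return_recids_only mirrors the post task shown above.

```python
def return_recids_only(records, **kwargs):
    """Post task: keep only the record IDs."""
    return [obj[1].get('recid') for obj in records]


# Hypothetical workflow built from the three documented keys.
demo_workflow = {
    'pre_tasks': [],
    'tasks': [],  # workflow tasks (e.g. from uploader_tasks) would go here
    'post_tasks': [return_recids_only],
}

REQUIRED_KEYS = ('pre_tasks', 'tasks', 'post_tasks')


def check_workflow(workflow):
    """Raise if a workflow dict lacks a key or lists a non-callable task."""
    missing = [key for key in REQUIRED_KEYS if key not in workflow]
    if missing:
        raise ValueError('workflow is missing keys: %s' % ', '.join(missing))
    for key in REQUIRED_KEYS:
        for task in workflow[key]:
            if not callable(task):
                raise TypeError('%s contains a non-callable: %r' % (key, task))
    return True
```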

Default workflows for inserting records using the uploader.

insert, insert.undo

class invenio.modules.uploader.workflows.insert.insert

Default insert workflow.

class undo

Default undo steps for the insert workflow.