Uploader¶
Uploader API.
The following example shows how to use this API for a simple use case:
>>> from invenio.modules.uploader.api import run
>>> blob = open('./testsuite/data/demo_record_marc_data.xml').read()
>>> reader_info = dict(schema='xml')
>>> run('insert', blob, master_format='marc', reader_info=reader_info)
-
invenio.modules.uploader.api.
run
(name, input_file, master_format='marc', reader_info={}, **kwargs)¶ Entry point to run any of the modes of the uploader.
Parameters: - name (str) – Upload mode, see ~.config.UPLOADER_WORKFLOWS for more info.
- master_format (str) – Input file format, for example marc
- reader_info (dict) – Any kind of information relevant to the reader, for example character encoding or special characters.
- kwargs –
- force:
- pretend:
- sync: False by default; if set to True, the whole process will be treated synchronously
- filename: original blob filename if it contains relative paths
Input_file: Input master format, typically the content of an XML file.
Tasks¶
Uploader celery tasks.
-
tasks.
translate
(blob, master_format, kwargs=None)¶ Translate from the master_format to JSON.
Parameters: - blob – String containing the input file.
- master_format – Format of the blob; it will be used to decide which reader to use.
- kwargs – Arguments to be used by the reader.
See
invenio.modules.jsonalchemy.reader.Reader
Returns: The blob and the JSON representation of the input file created by the reader.
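The return shape described above can be illustrated with a small stand-alone sketch. It does not use Invenio: `toy_marc_reader`, `READERS` and `translate_sketch` are hypothetical names invented here, and the real task delegates the parsing to invenio.modules.jsonalchemy.reader.Reader.

```python
import xml.etree.ElementTree as ET


def toy_marc_reader(blob, **reader_info):
    """Hypothetical reader: extract controlfield 001 as the record id."""
    root = ET.fromstring(blob)
    record = {}
    for field in root.iter('controlfield'):
        if field.get('tag') == '001':
            record['recid'] = int(field.text)
    return record


# Hypothetical registry mapping a master format to a reader callable.
READERS = {'marc': toy_marc_reader}


def translate_sketch(blob, master_format, kwargs=None):
    """Return the blob together with its JSON-like representation."""
    reader = READERS[master_format]
    return blob, reader(blob, **(kwargs or {}))


blob = '<record><controlfield tag="001">42</controlfield></record>'
result = translate_sketch(blob, 'marc', {'schema': 'xml'})
```

Here `result` is a `(blob, json_record)` tuple, which is the element type that run_workflow expects in its records list.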
-
tasks.
run_workflow
(records, name, **kwargs)¶ Run the uploader workflow itself.
Parameters: - records – List of tuples (blob, json_record) from
translate()
- name – Name of the workflow to be run.
Param kwargs: Additional arguments to be used by the tasks of the workflow
Returns: Typically the list of record IDs that have been processed, although this value can be modified by the post_tasks.
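To make the input and output shapes concrete, the following sketch builds a records list by hand and derives the typical return value from it. The sample data and the `collect_record_ids` helper are illustrative assumptions, not part of the uploader.

```python
# Each element pairs the original blob with the JSON produced by the reader.
records = [
    ('<record>first</record>', {'recid': 1, 'title': 'First'}),
    ('<record>second</record>', {'recid': 2, 'title': 'Second'}),
]


def collect_record_ids(records):
    """Hypothetical helper: keep only the record id of each tuple."""
    return [json_record.get('recid') for _blob, json_record in records]


record_ids = collect_record_ids(records)
```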
Uploader workflow tasks.
These are the main, common tasks that the uploader uses; they are used
inside the workflows defined in workflows.
See: Simple workflows for Python
-
invenio.modules.uploader.uploader_tasks.
create_records_for_workflow
(records, **kwargs)¶ Create the record object from the json.
Parameters: records – List of records to be processed. Kwargs:
-
invenio.modules.uploader.uploader_tasks.
legacy
(step)¶ Update legacy bibxxx tables.
-
invenio.modules.uploader.uploader_tasks.
manage_attached_documents
(step)¶ Attach and treat all the documents embedded in the input file.
-
invenio.modules.uploader.uploader_tasks.
raise_
(ex)¶ Helper task to raise an exception.
-
invenio.modules.uploader.uploader_tasks.
reserve_record_id
(step)¶ Reserve a new record id for the current object and set it inside.
-
invenio.modules.uploader.uploader_tasks.
retrieve_record_id_from_pids
(step)¶ Retrieve the record identifier from a record using its PIDS.
If any PID matches one already in the DB, the record id found is set on the current record.
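The matching rule above can be sketched without a database. Everything in this example is an assumption made for illustration: the in-memory `PID_STORE`, the `pids` key and its `type`/`value` layout, and the `find_record_id` helper are not taken from Invenio.

```python
# Hypothetical in-memory PID storage: (pid_type, pid_value) -> record id.
PID_STORE = {('doi', '10.1000/xyz'): 7}


def find_record_id(record, pid_store):
    """Set 'recid' on the record if any of its PIDs is already known."""
    for pid in record.get('pids', []):
        key = (pid['type'], pid['value'])
        if key in pid_store:
            record['recid'] = pid_store[key]
            return record['recid']
    # No PID matched, so the record is left untouched.
    return None


known = {'pids': [{'type': 'doi', 'value': '10.1000/xyz'}]}
unknown = {'pids': [{'type': 'doi', 'value': '10.1000/other'}]}
known_id = find_record_id(known, PID_STORE)
unknown_id = find_record_id(unknown, PID_STORE)
```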
-
invenio.modules.uploader.uploader_tasks.
return_recordids_only
(records, **kwargs)¶ Retrieve from the records only the record ID to return them.
Parameters: records – Processed list of records Param kwargs:
-
invenio.modules.uploader.uploader_tasks.
save_master_format
(step)¶ Put the master format into the bfmt DB table.
-
invenio.modules.uploader.uploader_tasks.
save_record
(step)¶ Save the record to the DB using the _save method from it.
-
invenio.modules.uploader.uploader_tasks.
update_pidstore
(step)¶ Save each PID present in the record to the PID storage.
-
invenio.modules.uploader.uploader_tasks.
validate
(step)¶ Validate the record.
Validate the record using the validate method present in each record and the validation mode, either from the command line options or from UPLOADER_VALIDATION_MODE.
For the validation the schema information from the field definition is used, see invenio.modules.jsonalchemy.jsonext.parsers.schema_parser.
Workflows¶
invenio.modules.uploader.workflows¶
Every uploader workflow should be a Python dictionary containing three keys:
- pre_tasks
List of tasks which will be run before running the actual workflow; each element of the list should be a callable.
- tasks
List of tasks to be run by the WorkflowEngine.
- post_tasks
Same as for pre_tasks but in this case they will be run after the workflow is done.
An example function to be called after the workflow could be:
def return_recids_only(records, **kwargs):
    records = [obj[1].get('recid') for obj in records]
These functions must always take the same parameters (like the one above),
and those parameters receive the values that
run_workflow()
gets.
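A minimal sketch of a workflow dictionary with the three keys, together with a toy runner, might look as follows. The runner is a stand-in written for this example and is not the WorkflowEngine; the task names, the `(record, **kwargs)` task signature, and the rule that a post-task's return value replaces the result are all assumptions.

```python
def add_default_title(records, **kwargs):
    """Pre-task: called once with the full list of records."""
    for _blob, json_record in records:
        json_record.setdefault('title', kwargs.get('default_title', 'Untitled'))


def mark_processed(record, **kwargs):
    """Task: called for each (blob, json_record) tuple."""
    record[1]['processed'] = True


def return_recids(records, **kwargs):
    """Post-task: takes the same parameters as the pre-task."""
    return [obj[1].get('recid') for obj in records]


# Hypothetical workflow using the three keys described above.
toy_workflow = {
    'pre_tasks': [add_default_title],
    'tasks': [mark_processed],
    'post_tasks': [return_recids],
}


def run_toy_workflow(records, workflow, **kwargs):
    """Stand-in runner: pre_tasks, then tasks per record, then post_tasks."""
    for pre_task in workflow['pre_tasks']:
        pre_task(records, **kwargs)
    for record in records:
        for task in workflow['tasks']:
            task(record, **kwargs)
    result = records
    for post_task in workflow['post_tasks']:
        # Assumption: a post-task's return value replaces the result.
        result = post_task(result, **kwargs)
    return result


records = [('<record/>', {'recid': 1}), ('<record/>', {'recid': 2, 'title': 'B'})]
ids = run_toy_workflow(records, toy_workflow, default_title='Untitled')
```

Keeping the pre- and post-task signature identical means any callable that accepts `(records, **kwargs)` can be dropped into either list.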
Default workflows for inserting records using the uploader.