BibSched Admin Guide¶
Overview¶
BibSched, the bibliographic task scheduler, is the central unit of the system. It lets all other modules access the bibliographic database in a controlled manner, preventing sharing violations and ensuring that database update tasks execute coherently. The module comes with an administrative interface for monitoring the task queue and intervening manually, for example to reschedule queued tasks or change the task order.
You can run the administrative interface by doing:
$ bibsched
Note that in general you should run bibsched with the same rights as the Apache user of your system.
bibsched can run in two modes: automatic and manual. In automatic mode, it executes tasks automatically as they arrive in the waiting queue. In manual mode, the administrator has to launch the tasks manually.
Bibsched graphical interface¶
The bibsched interface is a text-mode graphical interface for displaying tasks. It has three views: one listing done tasks, one for scheduled, running and failed tasks, and a third for archived tasks. You can switch among these three views by pressing “1”, “2” or “3” respectively.
Use the arrow keys to move from one task to another.
Press “O” to see all the details of the selected task.
If the task is running or has already run, press “l” (lower case “l”) to view the standard output produced by the task, if any.
Press “L” (upper case “L”) to view the standard error produced by the task, if any.
Press “P” to clean the list of DONE tasks by archiving or deleting them.
Press “Q” to quit the interface.
Press “A” to switch from automatic to manual mode and vice versa.
Manual mode¶
In manual mode, the available actions depend on the status of the currently selected task.
Press “R” to run a waiting task.
Press “D” to delete a task that is not running.
Press “N” to change the priority of a waiting task (the equivalent of the UNIX renice command).
On a running task, press “K” to kill it immediately in case of emergency, “T” to stop it cleanly, or “S” to put it temporarily to sleep. A sleeping task can be woken up by pressing “W”. Note that to stop a task or put it to sleep, a signal is sent to the given bibtask, which acknowledges it and then stops or goes to sleep as soon as it considers it safe to do so.
On a failed task, press “K” to acknowledge the error. This is necessary if you wish to put bibsched back into automatic mode.
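The cooperative stop and sleep behaviour described above can be illustrated with a minimal Python sketch. This is not bibtask's actual code: the class and function names are hypothetical, and the choice of SIGTERM, SIGUSR1 and SIGUSR2 is an assumption made only for the example. The point is that a handler merely records the request, and the task acts on it at a point it considers safe (here, between two records).

```python
import signal
import time


class TaskControl:
    """Flags set by signal handlers and polled by the task at safe points."""

    def __init__(self):
        self.stop_requested = False
        self.sleep_requested = False

    def request_stop(self, signum, frame):
        # Only record the request; the task decides when to act on it.
        self.stop_requested = True

    def request_sleep(self, signum, frame):
        self.sleep_requested = True

    def wake_up(self, signum, frame):
        self.sleep_requested = False

    def install(self):
        # Hypothetical signal choice; must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)
        signal.signal(signal.SIGUSR1, self.request_sleep)
        signal.signal(signal.SIGUSR2, self.wake_up)


def run_task(records, control, process):
    """Process records one by one, honouring requests between records."""
    done = []
    for record in records:
        while control.sleep_requested and not control.stop_requested:
            time.sleep(0.1)  # sleeping: wait until woken up or stopped
        if control.stop_requested:
            break  # safe point: the previous record is fully processed
        done.append(process(record))
    return done
```

Killing a task, by contrast, interrupts it immediately and gives it no chance to reach such a safe point, which is why it is reserved for emergencies.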
Automatic mode¶
In automatic mode, bibsched takes care of launching tasks based on their priority and runtime schedule. The only available actions are those that let you inspect a given task (view its logs and options).
If you have configured bibsched to allow concurrent execution of bibtasks, it will launch compatible tasks concurrently (note that this feature is currently experimental). Bibupload tasks are always executed in chronological order, to preserve input consistency.
Bibsched maintenance¶
bibsched produces two log files, bibsched.log and bibsched.err, located under the usual log directory of your Invenio installation. The former contains all the actions (either automatic or manual) that bibsched has performed. The latter contains all the exceptional errors.
If a bibtask fails while bibsched is in automatic mode, bibsched stops by switching to manual mode and sends an email to the administrator (and an emergency SMS if it is configured to do so). Note that while there are failed bibtasks, bibsched refuses to be put back into automatic mode until each failed task is reinitialized or deleted, or its error is acknowledged.
Priorities¶
A task can be scheduled with a given priority, represented by an integer. When two or more tasks could be executed at a given time, the task with the highest priority is executed first.
When a task is running and is not a bibupload, the scheduler allows higher priority tasks that don’t conflict with it to run, first putting the running task to sleep if there are not enough resources.
If a task has a priority higher than 100 and other running tasks conflict with its execution (because they should not run concurrently with it), then those other tasks are stopped (unless they are bibuploads).
If the priority is less than -10, then the task will never be executed automatically.
Bibupload tasks are not affected by priority with respect to each other and will always be executed in the proper order.
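As a rough illustration of the thresholds above, the following Python sketch classifies a priority value and picks the next task from a waiting list. It is a simplified model, not bibsched's scheduler: the function names are hypothetical, tasks are reduced to (name, priority) pairs, and runtime schedules, resource limits and the special ordering of bibuploads are deliberately left out.

```python
NEVER_AUTO_BELOW = -10  # priority "less than -10": never run automatically
PREEMPT_ABOVE = 100     # priority "higher than 100": conflicting tasks are stopped


def describe_priority(priority):
    """Summarise how a priority value is treated in automatic mode."""
    if priority < NEVER_AUTO_BELOW:
        return "never executed automatically"
    if priority > PREEMPT_ABOVE:
        return "executed, stopping conflicting non-bibupload tasks"
    return "executed in priority order"


def pick_next(waiting):
    """Return the (name, priority) pair to launch next, or None."""
    eligible = [task for task in waiting if task[1] >= NEVER_AUTO_BELOW]
    if not eligible:
        return None
    # Highest priority wins; on ties the earliest entry in the list is kept.
    return max(eligible, key=lambda task: task[1])
```

For example, with waiting tasks bibindex (priority 0), webcoll (priority 5) and bibrank (priority -20), this model launches webcoll first and never launches bibrank on its own.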
Task logging¶
When executed, each task produces (if necessary) a pair of log files: one called bibsched_task_{task_id}.log and the other bibsched_task_{task_id}.err. A reschedulable task is assigned the same task_id each time it is rescheduled. That means that the log information of successive executions of the task is appended to the end of the already existing log files.
A log-rotation algorithm is applied when writing into the log file. By default each log will be no bigger than 1MB. After this limit is reached the log is rotated. Note that when viewing the log file inside the bibsched monitor interface, only the latest log will be displayed.
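The naming scheme and the size-based rotation can be sketched with Python's standard logging module. This is only an illustration under stated assumptions: the helper names and the backup count are hypothetical, and Invenio's own rotation code may work differently. RotatingFileHandler opens the file in append mode, which mirrors how successive runs of the same task_id add to the same file.

```python
import logging
import logging.handlers
import os


def task_log_paths(log_dir, task_id):
    """Return the (.log, .err) paths for a task, following the naming above."""
    base = os.path.join(log_dir, "bibsched_task_%d" % task_id)
    return base + ".log", base + ".err"


def open_task_logger(log_dir, task_id, max_bytes=1024 * 1024, backups=5):
    """Open a logger that appends to the task log and rotates it at ~1MB.

    The backup count is an assumption made for this sketch.
    """
    log_path, _ = task_log_paths(log_dir, task_id)
    logger = logging.getLogger("bibsched_task_%d" % task_id)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backups)
    logger.addHandler(handler)
    return logger
```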
Task concurrency¶
A recent experimental feature of bibsched is the concurrent execution of compatible tasks. The current rule for deciding whether two tasks are compatible is: “If two tasks have the same name (e.g. bibupload) then they’re incompatible.”
Sometimes you might want to treat two tasks as compatible even when they have the same name. To do so, add a name specification via the bibtask command line option --name. For example, you might want to distinguish a generic bibupload from a bibupload carrying only preformatting information. In that case, launch bibupload -N "bibformat" and it will be considered compatible with all the other bibuploads.
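The compatibility rule can be modelled in a few lines of Python. The sketch below assumes that the effective name of a task is the pair formed by the program name and the optional --name value; how bibsched stores and compares names internally is not stated here, so treat the representation (and both function names) as hypothetical.

```python
def effective_name(program, name=None):
    """Effective task name: the program plus the optional --name value."""
    return (program, name)


def compatible(task_a, task_b):
    """Two tasks are incompatible only if their effective names are equal."""
    return effective_name(*task_a) != effective_name(*task_b)
```

Under this model, two plain bibuploads are incompatible, while a bibupload launched with -N "bibformat" is compatible with a plain bibupload. Note that two tasks launched with the same program and the same --name value would again share a name and be treated as incompatible.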
Configuration¶
Bibsched can be tweaked by adjusting some variables in the usual invenio(-local).conf file. Please refer to the documentation associated with each variable inside this file.
Bibsched command line interface¶
Usage: /opt/invenio/bin/bibsched [options] [start|stop|restart|monitor|status]
The following commands are available for bibsched:
   start      start bibsched in background
   stop       stop running bibtasks and the bibsched daemon safely
   halt       halt running bibsched while keeping bibtasks running
   restart    restart a running bibsched
   monitor    enter the interactive monitor
   status     get report about current status of the queue
   purge      purge the scheduler queue from old tasks
Command options:
  -d, --daemon         Launch BibSched in the daemon mode (deprecated, use 'start')
General options:
  -h, --help           Print this help.
  -V, --version        Print version information.
Status options:
  -s, --status=LIST    Which BibTask status should be considered (default is Running,waiting)
  -S, --since=TIME     Since how long time to consider tasks e.g.: 30m, 2h, 1d (default is all)
  -t, --tasks=LIST     Comma separated list of BibTask to consider (default is all)
Purge options:
  -s, --status=LIST    Which BibTask status should be considered (default is DONE)
  -S, --since=TIME     Since how long time to consider tasks e.g.: 30m, 2h, 1d (default is 30 days)
  -t, --tasks=LIST     Comma separated list of BibTask to consider (default
                       is bibindex,bibreformat,webcoll,bibrank,inveniogc,bibupload,oairepositoryupdater)
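If you drive bibsched from a script, it can help to build the argument list in one place. The helper below is a hypothetical convenience, not part of Invenio; it only assembles the options documented above, and it places them before the command because that is the order shown in the usage line. The installation path is the one from the usage line and may differ on your system.

```python
BIBSCHED = "/opt/invenio/bin/bibsched"  # path as shown in the usage line


def bibsched_argv(command, status=None, since=None, tasks=None):
    """Build the argument list for a bibsched status or purge call."""
    argv = [BIBSCHED]
    if status:
        argv.append("--status=" + ",".join(status))
    if since:
        argv.append("--since=" + since)
    if tasks:
        argv.append("--tasks=" + ",".join(tasks))
    argv.append(command)  # options first, then the command
    return argv
```

The resulting list can be passed to subprocess.run, executed with the rights of the Apache user. For instance, bibsched_argv("purge", status=["DONE"], since="30d") describes a purge restricted to DONE tasks with a time window of 30d.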
Bibtasks command line interface¶
Each bibtask has a common command line interface in addition to its own task-specific options.
Scheduling options:
  -u, --user=USER           User name under which to submit this task.
  -t, --runtime=TIME        Time to execute the task. [default=now]
                            Examples: +15s, 5m, 3h, 2002-10-27 13:57:26.
  -s, --sleeptime=SLEEP     Sleeping frequency after which to repeat the task.
                            Examples: 30m, 2h, 1d. [default=no]
      --fixed-time          Avoid drifting of execution time when using --sleeptime
  -I, --sequence-id=SEQUENCE-ID
                            Sequence Id of the current process
  -L --limit=LIMIT          Time limit when it is allowed to execute the task.
                            Examples: 22:00-03:00, Sunday 01:00-05:00.
                            Syntax: [Wee[kday]] [hh[:mm][-hh[:mm]]].
  -P, --priority=PRI        Task priority (0=default, 1=higher, etc).
  -N, --name=NAME           Task specific name (advanced option).
General options:
  -h, --help                Print this help.
  -V, --version             Print version information.
  -v, --verbose=LEVEL       Verbose level (0=min, 1=default, 9=max).
      --profile=STATS       Print profile information. STATS is a comma-separated
                            list of desired output stats (calls, cumulative,
                            file, line, module, name, nfl, pcalls, stdname, time).
      --stop-on-error       In case of unrecoverable error stop the bibsched queue.
      --continue-on-error   In case of unrecoverable error don't stop the bibsched queue.
      --post-process=BIB_TASKLET_NAME[parameters]
                            Postprocesses the specified bibtasklet with the given
                            parameters between square brackets.
                            Example: --post-process "bst_send_email[fromaddr=
                            'foo@xxx.com', toaddr='bar@xxx.com', subject='hello',
                            content='help']"
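To make the --runtime and --sleeptime value formats concrete, here is a small Python sketch that interprets the examples given in the help text (+15s, 5m, 3h, 1d and an absolute timestamp). It is an illustration only: the function names are hypothetical, it is not the parser bibtask uses, and it does not cover the --limit syntax.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_shift(value):
    """Turn a relative value such as '+15s', '30m' or '1d' into a timedelta."""
    match = re.fullmatch(r"\+?(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError("unsupported time shift: %r" % value)
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def parse_runtime(value, now):
    """Resolve a --runtime style value to an absolute datetime."""
    try:
        # Absolute form, e.g. 2002-10-27 13:57:26
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Relative form, counted from the given reference time
        return now + parse_shift(value)
```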
BibSched Tasklets¶
If you have very particular needs that would otherwise require writing your own bibtask, and you are able to write a Python function, you can write a BibTaskLet that can be scheduled through the bibliographic scheduler.
Suppose that you have a Python function:
def foo(arg1, arg2='default'):
    pass
that you want to execute through the bibliographic scheduler. Just put the function in the /opt/cds-invenio/lib/python/invenio/bibsched_tasklets directory, in a file called e.g. bst_foo.py (the bst_ prefix and the .py extension are compulsory), and rename the function to bst_foo (the name of the function must be identical to the name of the file, without the extension).
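As an illustration, a complete bst_foo.py might look like the sketch below. The body is hypothetical: the message, the docstring wording and the return value are invented for the example, and only the file name, function name and signature follow the rules described above. The docstring is worth writing, since the tasklet listing displays it.

```python
# Hypothetical contents of bst_foo.py, placed in the bibsched_tasklets directory.


def bst_foo(arg1, arg2='default'):
    """
    Example tasklet that reports the arguments it received.
    @param arg1: a mandatory argument.
    @param arg2: an optional argument.
    """
    message = "bst_foo called with arg1=%s, arg2=%s" % (arg1, arg2)
    print(message)
    # Returning the message is only a convenience for this sketch.
    return message
```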
A BibTaskLet can be executed through the bibtasklet BibTask. E.g.:
$ # To list the available bibtasklets
$ sudo -u apache /opt/cds-invenio/bin/bibtasklet -l
Available tasklets:
╔══════════════════════════════════════════════════════════════════╗
║ def bst_fibonacci(n=30) ║
╠══════════════════════════════════════════════════════════════════╣
║ ║
║ Small tasklets that prints the the Fibonacci sequence for n. ║
║ @param n: how many Fibonacci numbers to print. ║
║ @type n: int ║
║ ║
╚══════════════════════════════════════════════════════════════════╝
╔════════════════════════════════════╗
║ def bst_foo(arg1, arg2='default') ║
╠════════════════════════════════════╣
║ ║
║ No documentation. ║
║ ║
╚════════════════════════════════════╝
Broken tasklets:
$ # To schedule a bibtasklet
$ sudo -u apache /opt/cds-invenio/bin/bibtasklet -T bst_foo -a "arg1=bar"
All the above bibtask options are available for any bibtasklet.
See /opt/cds-invenio/lib/python/invenio/bibsched_tasklets/bst_fibonacci.py for an example of what a bibtasklet looks like.