
Accessing Isis databases from Python, via WXIS

wxis is a module for working with CDS/ISIS databases from Python, via WXIS.

Alternative names: xispy, pywxis, wxispy.

A problem with the Python module's name: calling it wxis (file wxis.py) is very convenient, but in some situations it is easily confused with wxis, the program.

README

README file for wxis


Requirements:

    * Server:
      - web server with CGI support enabled
      - wxis (5.x or higher)
    * Client:
      - Python (tested with version 2.5)

Files:

    wxis-json-modules/  (put it somewhere in your web server's cgi-bin folder)
        _common.xis
        _display-record.xis
        control.xis
        delete.xis
        edit.xis
        extract.xis
        index.xis
        list.xis
        search.xis
        write.xis
        
    wxis: copy it to the same folder as the .xis files (or use a symlink to a different location, always below your cgi-bin folder)

    wxis/  (put this folder wherever you please, on the client)
        wxis.py
        config.py
        test/       (test database files)
            cds.iso
            cds.mst
            cds.xrf
            cds.fst
        
Note: you can test wxis on a standalone computer, but you can also work with a separate database server, i.e. web server + databases + wxis + *.xis files on one computer, and Python + *.py files on another. However, to keep things simple, the ''test'' module assumes that the database is local.
        
Edit ''config.py'' and adjust a few parameters.
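
As a rough illustration of what these parameters are for, the sketch below mirrors the way wxis.py (inside __doTask) combines HOST, PORT and PATH into the URL of the wxis CGI program. The function name build_wxis_url is hypothetical (the module builds the URL inline), and the values are the defaults shipped in config.py.

```python
# Hypothetical helper, for illustration only: wxis.py builds this URL inline.
def build_wxis_url(host, port, path):
    """Return the URL of the wxis CGI program."""
    return 'http://%s:%s%s' % (host, port, path)

# Default values from config.py
print(build_wxis_url('127.0.0.1', '80', '/cgi-bin/isis/wxis'))
```

Opening the resulting URL in a browser is a quick way to check that the web server can actually reach wxis before running the tests.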

Check permissions for the web server user in the ''test'' folder. This user must be able to write and create files there.
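
One possible way to check this is sketched below. The helper name can_write_and_create is hypothetical. Note that os.access() reports on the user running the script, so the result is only meaningful if the script is run as the web server user (whose account name depends on your system).

```python
import os

def can_write_and_create(folder):
    """True if the current user may create and modify files in folder."""
    # Creating files in a directory requires both write and execute (search)
    # permission on the directory itself.
    return os.path.isdir(folder) and os.access(folder, os.W_OK | os.X_OK)

# 'test' is the folder holding the test database files
print(can_write_and_create('test'))
```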
        
Run the command

    python wxis.py

Compare results with ...

wxis.py

Version: 2008-03-28 (plus some minor adjustments by user newacct, January 2010.)

# coding=utf-8
 
"""
wxis
A module for accessing CDS/ISIS databases through Bireme's WXIS. 
 
MIT License <http://www.opensource.org/licenses/mit-license.php>
 
(c) 2008 Fernando J. Gómez / INMABB / Conicet
 
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
 
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""
 
 
def rename_key(oldkey, newkey, dict):
    """
    Replaces oldkey by newkey in dictionary dict.
    """
    try:
        dict[newkey] = dict[oldkey]
        del dict[oldkey]
    except KeyError:
        pass
    return dict
 
def remote_call(url, data):
    """
    Opens a URL and returns the response.
    TO-DO: move 'proxies' to a config file? Use 'proxies={}' to avoid looking for proxies when wxis is in localhost. 
    """
    from urllib import urlopen
    try:
        fp = urlopen(url, data, proxies={})  # NOTE: 'POST' is implied when a second positional param ('data') is used
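    except IOError:
        # NOTE: this message is not a valid response, so __doTask() will
        # end up reporting it as a BadResponseError.
        return 'Error connecting to database server.'
        # TO-DO: raise a dedicated exception instead (the original 'raise'
        # placed after the return statement was unreachable).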
    else:
        return fp.read()
 
# Should be defined inside the class IsisDb?        
def get_status(resp):
    """Returns the value of Isis_Status."""
    return resp['meta']['Isis_Status']
 
# Should be defined inside the class?
# Should be called automatically from an option in the constructor IsisDb.__init__?        
#def createdb(name):
#    """Creates a database."""
#    db = IsisDb(name)
#    db.control(function='create', create='master')
#    # If there was an error creating the database, it's been already handled by __doTask()
#    return db
 
 
class IsisDb:
    """
    Gives access to a CDS/ISIS database through WXIS.
 
    TO-DO: Besides the 'name' attribute, a database may have other associated attributes, such as
    fst, actab, uctab, stw (and maybe gizmo).
    """
 
    def __init__(self, name, **args):
        self.name = name
 
        # An optional keyword parameter 'create' means "create this db". Examples:
        #           books = IsisDb('/path/to/books')              # check master existence, raise exception if it does not exist  
        #           users = IsisDb('/path/to/users', create=True) # create unconditionally, don't check existence
        if args.get('create'):
            self.__create()
        elif not self.__exists():
            raise DatabaseDoesNotExist, self.name
 
    def __str__(self):
        return self.__class__.__name__ + ': ' + self.name 
 
    def __create(self):
        """
        Creates a database (only the master file).
        """
        self.control(function='create', create='master')
        # NOTE: If an error occurs while attempting to create the database,
        # it's handled by __doTask().
 
    def __exists(self):
        """
        Checks if master file exists.
        """
        resp = self.get_status() 
        return resp['database']['status']['master'] != 'not found'
 
    def __doTask(self, script, params, content=None):
        """
        This is the base method: builds a URL and the POST data, calls wxis, checks
        its response for errors, and finally returns the response or raises an
        exception.
 
        Parameters:
            script    Name of the IsisScript to invoke.
            params    Input parameters for the script. 
            content   (Optional) Record content to be written.
        """ 
 
        from urllib import urlencode
        import config
 
        # Build the URL
        url = 'http://%s:%s%s' % (config.HOST, config.PORT, config.PATH)
 
        # Append extra parameters. Note: all parameters are sent using POST
        IsisScript = '%s/%s.xis' % (config.SCRIPT_DIR, script)
        params.update({'IsisScript': IsisScript, 'database': self.name})
        if content:
            params.update({'content': content})
        data = urlencode(params)
 
        # Get WXIS's response
        wxis_response = remote_call(url, data)
        #print wxis_response
 
        # Now try to catch errors in the response
        try:
            # Try to create a Python object (a dictionary) from the response
            response = eval(wxis_response)
        except SyntaxError:
            # Reasons for a syntax error:
            #   (a) WXIS died:  "WXIS|some error|...|...|" 
            #       Some examples:
            #           WXIS|file error|file open|Isis_Script|
            #           WXIS|fatal error|unavoidable|dbxopen: /home/fernando/tmp/bibliox.xrf (2)|
            #           WXIS|execution error|invalid value|-1|
            #       For a comprehensive list of errors, see these semi-official docs:
            #           * http://ibama2.ibama.gov.br/cnia2/cisis/mensagens%20de%20erro%20do%20wxis-mx.pdf
            #           * http://www.elysio.com.br/documentacao/manual_phl81.pdf
            #           * http://www.google.com.ar/search?q=%22de+erro+do+CISIS%22&filter=0
            #
            #   (b) WXIS sent an ill-formed response (e.g. missing comma, mismatched brackets)
            #
            #   Errors of type (a) can be detected using a regular expression.
            import re
            pattern = re.compile(r'(WXIS\|.+ error\|.+$)')
            match = pattern.search(wxis_response)
            if match:
                raise WxisHardError, match.group()
            else:
                # This covers reason (b)
                raise BadResponseError, wxis_response
        else:
            # OK, so the response is clean JSON... but still we may have a (clean) error message
            try:
                # Did the script complain? 
                reason = response['error']
            except KeyError:
                # There's no 'error' key in the response -- return the Python object
                return response
            else:
                # We have an error of the 'soft' kind
                raise WxisSoftError, reason  
 
 
    # The following seven methods correspond to the original wxis-modules scripts
    # or basic functions.
    # NOTE: index.xis, list.xis and search.xis expect an optional 'from' parameter,
    # but since 'from' is a Python keyword, we use 'start' instead,
    # e.g. db.index(start='BAR', count=10)
 
    # TO-DO: rename method to mfnrange()? 
    def do_list(self, **params):
        """
        Retrieves a range of records.
 
        Parameters:
            start    (Optional)
            to       (Optional)
            count    (Optional)
        """
        params = rename_key('start', 'from', params)
        return self.__doTask('list', params)
 
    def search(self, **params):
        """
        Performs a search using the inverted file.
 
        Parameters:
            query        The search expression. Queries must use the CISIS search language,
                         which is based on the standard CDS-ISIS search language.
                         See http://www.ius.bg.ac.yu/biblioteka/isis_search.html
            start        (Optional)
            to           (Optional)
            count        (Optional)
            totalonly    (Optional) Use totalonly=1 to request the total number of results (no records)
        """
        params = rename_key('start', 'from', params)
        return self.__doTask('search', params)
 
    # TO-DO: rename method to keyrange()? 
    def index(self, **params):
        """
        Retrieves a range of keys from the inverted file.
 
        Parameters:
            start    (Optional) Defaults to first key. 
            to       (Optional) Defaults to last key.
            count    (Optional) Defaults to 'no limit'.
        """
        params = rename_key('start', 'from', params)
        return self.__doTask('index', params)
 
    def edit(self, **params):
        """
        Attempts to lock a record to allow editing. Returns the record or
        raises an exception.
 
        Parameters:
            mfn       MFN of record.
            lockid    Record lock id.
        """
        resp = self.__doTask('edit', params)
        if get_status(resp) == '0':
            return resp
        else:
            raise LockedRecord, 'edit'
 
    def write(self, content=None, **params):
        """
        Attempts to write a record. Returns the record or raises an exception.
 
        Parameters:
            content    The record's content. Must be a tuple, or list,
                       of 2-tuples (tag, value).
            mfn        The record's MFN, or 'New' to add a new record.
            lockid     Record lock id.
 
        Example:
 
            fields = (
                ('100', 'Some value'),
                ('200', 'Another value')
            )
            db.write(mfn=291, content=fields, lockid='xx')  
        """
        if content:
            content = ''.join([
                "H%s %s %s" % (field[0], str(len(field[1])), field[1])
                for field in content
            ])
        resp = self.__doTask('write', params, content)
        if get_status(resp) == '0':
            return resp
        else:
            raise LockedRecord, 'write'
 
    def delete(self, **params):
        """
        Attempts to (logically) delete a record. Returns the record or raises
        an exception.
 
        Parameters:
            mfn       MFN of record.
            lockid    Record lock id.
        """
        resp = self.__doTask('delete', params)
        if get_status(resp) == '0':
            return resp
        else:
            raise LockedRecord, 'delete'
 
    def control(self, **params):
        """
        Creates new databases and performs several maintenance tasks on
        existing databases.
 
        Parameters:
            function    The control function to execute ('unlock', 'invert', 'status', 'create').
            create      If function='create', then create={'master'|'inverted'|'database'}
                        creates the specified type of file(s).
            unlock      If function='unlock', then unlock='control' unlocks
                        only the database's control record.
        """
        return self.__doTask('control', params)
 
    # And these are some convenient shortcuts
 
    def invert(self):
        """
        Generates the inverted file.
        """
        return self.control(function='invert')
    fullinv = invert
 
    def unlock(self):
        """
        Unlocks the master file and all locked records.

        This shortcut takes no parameters. To unlock only the database's
        control record, call control(function='unlock', unlock='control').
        """
        return self.control(function='unlock')
 
    def get_status(self):
        """
        Returns information about the current status of database files. 
        """
        return self.control(function='status')
 
 
    # This method was not available in wxis-modules, but is useful for cleaning
    # user-supplied queries.
    def extract(self, **params):
        """
        Returns the keys extracted from the passed data, using wxis's builtin
        mechanism, and optionally specifying custom stw, actab and uctab
        parameters. The method is in fact not associated with a specific
        IsisDb instance, though it could be useful to use the same stw, actab
        & uctab parameters used by the present IsisDb instance. 
 
        Parameters:
            data    The string from which to extract the keys. 
            tech    FST technique (4 to extract words).
        """
        return self.__doTask('extract', params)
 
 
 
# Exceptions
 
class IsisError(Exception):
    # Base class
    pass
 
#class ConnectionError(IsisError):
#    # For errors connecting with the server
#    def __str__(self):
#        return "Error while connecting to the database server"
 
class WxisHardError(IsisError):
    # For errors thrown by wxis (execution, fatal, file)
    def __init__(self, error):
        suggestion = ''
        if '|recread/xropn/w|' in error:
            suggestion = 'In other words, WXIS could not write to the disk. Check file and/or directory permissions for the web server user.'
        elif '|dbxopen:' in error:
            suggestion = 'In other words, WXIS could not open the database. Check that the files do exist and have read permissions for the web server user.'
        elif '|unavoidable|recisis0/xrf|' in error:
            suggestion = 'In other words, WXIS found problems trying to write something. Check database path and permissions for the web server user.'
        self.msg = "\n\n    %s\n\n%s" % (error, suggestion)
    def __str__(self):
        return self.msg
 
class WxisSoftError(IsisError):
    # For errors thrown by a script (missing parameter)
    def __init__(self, error):
        self.msg = error
    def __str__(self):
        return self.msg
 
class BadResponseError(IsisError):
    # For ill-formed responses (with no wxis error) preventing the use of eval()
    def __init__(self, resp):
        self.msg = "The database server returned an ill-formed response. Check commas, quotes, braces, and brackets:\n\n%s" % resp 
    def __str__(self):
        return self.msg 
 
class LockedRecord(IsisError):
    # Isis_Status different from 0 when attempting to edit, write or delete a record
    def __init__(self, action):
        self.msg = "Can't %s record -- Record is locked" % action 
    def __str__(self):
        return self.msg
 
class DatabaseDoesNotExist(IsisError):
    def __init__(self, dbname):
        self.msg = "The database %s could not be found" % dbname 
    def __str__(self):
        return self.msg
 
# NOTE: check what other specific error codes may be returned by WXIS, described
# in the documents cited above (Elysio, etc).
 
 
 
#########################################################################
# Tests
#########################################################################
"""
This is a simple test of the code, which also shows how to use the API.
 
TO-DO:
 
  * compare the actual output with the expected output, so that errors may be
    automatically detected.
 
  * create a database from textual data (e.g. the usual CDS as .id or .iso)
    Should we have an extra method, load_iso(), using wxis's <import> tag? Not sure,
    since importing/exporting a database should probably not be done through HTTP...
    But for a purely local test this would be no problem.
 
  * show use of actab, uctab, stw, gizmo?
 
  * besides calling wxis, also show how to manipulate the data in Python, i.e. how
    to replace the formatting language:
      - display a list of records
      - display record details
      - display database status
      - use templates ("$"-based substitutions) to format output:
        http://docs.python.org/lib/node40.html
      - also use the usual "%"-based substitutions 
 
  * special case: MARC records (using pymarc)
"""
 
"""
Original usage examples:
 
1) Browse index keys
 
    >>> db = IsisDb('/home/fer/bases/testdb')
    >>> res = db.index(count=10, start='za')
    >>> [term['Isis_Key'] for term in res['terms']]
    ['ZAANEN', 'ZABCZYK', 'ZABRODSKY', 'ZACKS', 'ZADACH', 'ZADACHA', 'ZADACHAKH', 'ZADACHI', 'ZADATCH', 'ZADEH']
 
2) Search -- TO-DO: simplify using functions
 
    >>> res = db.search(query='marsden')
    >>> import re
    >>> titles = [ unicode(re.sub('\^\w', ' ', field['value'][4:]), 'latin1') for rec in res['records'] for field in rec['fields'] if field['tag'] == '245' ]
    >>> titles.sort()
    >>> print '\n'.join([ '(%s)  %s' % (n, t) for (n, t) in zip(range(1, len(titles)+1), titles) ])
    (1)  A mathematical introduction to fluid mechanics / A. J. Chorin and J. E. Marsden.
    (2)  Algebraic aspects of integrable systems : in memory of Irene Dorfman / A. S. Fokas and I. M. Gelfand, editors.
    (3)  Análisis clásico elemental / Jerrold E. Marsden, Michael J. Hoffman ; versión en español, Oscar Alfredo Palmas Velasco ; colaboración técnica, José Antonio Cuesta Ruiz.
    (4)  Basic complex analysis / Jerrold E. Marsden, Michael J. Hoffman.
    (5)  Calculus / Jerrold Marsden, Alan Weinstein.
    (6)  Cálculo vectorial / Jerrold E. Marsden, Anthony J. Tromba ; traducción: Patricia Cifuentes Muñiz ... [et al.] ; revisión técnica: Eugenio Hernández Rodríguez.
    (7)  Integration algorithms and classical mechanics / Jerrold E. Marsden, George W. Patrick, William F. Shadwick, editors.
    (8)  New directions in applied mathematics : papers presented April 25/26, 1980, on the occasion of the Case centennial celebration / edited by Peter J. Hilton and Gail S. Young ; with contributions by Kenneth Baclawski ... [et al.].
    (9)  Student's guide to Calculus by J. Marsden and A. Weinstein. Volume 2 / Frederick H. Soon.
    (10)  Vector calculus / Jerrold E. Marsden, Anthony J. Tromba.
"""
 
 
def test():
    import os
    from pprint import pprint
 
    def display_status(db):
        resp = db.get_status()
        status = resp['database']['status']
        pprint(status)
 
    def display_records(resp):
        """A simple way to display records."""
        pprint(resp['records'])
 
    def section(msg):
        """Displays a header for each section of the test."""
        line = '-'*40
        print
        print line
        print msg.upper()
        print line 
 
    TEST_DB = 'cds'
    TEST_DIR = 'test'
 
    path = os.path.join(os.getcwd(), TEST_DIR)
    testdb = os.path.join(path, TEST_DB)
 
    # create an IsisDb instance
    db = IsisDb(testdb)   # IsisDb is defined in this module; there is no 'isis' name to qualify it with
 
    # check db status
    section('check db status')
    display_status(db)
 
    #####################################
    section('list some records')
    #####################################
    resp = db.do_list(start=10, count=2)
    display_records(resp)
 
    # create an FST, or use an existing one
 
    #####################################
    section('generate the inverted file')
    # TO-DO: specify actab, uctab, stw
    #####################################
    resp = db.invert()
    status = resp['database']['status']
    if status == 'inverted':   # why is this check here? should it be caught earlier, and throw an exception?
        print 'Database was inverted.'
    else:
        print 'Some error occurred, database was not inverted.'
 
    section('check db status')
    display_status(db)
 
    #####################################
    section('list some keys')
    #####################################
    resp = db.index(start='W', count=10)
    print [term['Isis_Key'] for term in resp['terms']]
 
    #####################################
    section('do a search')
    #####################################
    resp = db.search(query='water', count=2)
    display_records(resp)
 
    #####################################
    section('lock a record for editing')
    #####################################
    from time import strftime
    some_mfn = 10  # arbitrary
    mylockid = 'test %s' % strftime("%Y%m%d %H%M%S")
    try:
        resp = db.edit(mfn=some_mfn, lockid=mylockid)
        pprint(resp)
    except LockedRecord:
        print "Record %s is locked, can't be edited now." % some_mfn
 
    # TO-DO: attempt to edit, delete or write a locked record
 
    #####################################
    section('create a new record')
    #####################################
    fields = (
        ('100', 'Some value'),
        ('200', 'Another value')
    )
    try:
        resp = db.write(mfn='New', content=fields, lockid=mylockid)
    except IsisError, e:   # e.g. LockedRecord, WxisSoftError, WxisHardError
        print 'Record could not be written: %s' % e
        return   # the remaining sections need the new record's MFN

    # display the new record's MFN
    newmfn = resp['record']['mfn']
    print 'Record was saved. MFN: %s' % newmfn
 
    section('check db status')
    display_status(db)
 
    #####################################
    section('retrieve the new record')
    #####################################
    resp = db.do_list(start=newmfn, count=1)
    #resp = db.search(query='')
    display_records(resp)
 
    #####################################
    section('unlock records')
    #####################################
    resp = db.unlock()
    pprint(resp)
 
    section('check db status')
    display_status(db)
 
    #####################################
    section('delete the new record')
    #####################################
    try:
        resp = db.delete(mfn=newmfn, lockid=mylockid)
    except LockedRecord:
        print "Record %s is locked, can't be deleted now." % newmfn
    else:
        pprint(resp)
 
    section('check db status')
    display_status(db)
 
    # TODO: also show how to clean query using Python only
    #####################################
    section('clean a dirty query')
    #####################################
    query = ' water  plants '
    resp = db.extract(data=query)
    newquery = ' AND '.join(resp['terms'])
    resp = db.search(query=newquery)
    display_records(resp)
 
 
if __name__ == '__main__':
    test()
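
The error handling in __doTask() can be summarized with a standalone sketch. It reuses the regular expression from the listing, but parses the response with json.loads instead of eval(). That is a substitution made here, and it assumes that the *.xis scripts emit strict JSON, which the listing does not guarantee. The function classify_response and its return values are hypothetical; the module itself raises WxisHardError, BadResponseError or WxisSoftError instead. The sample messages in the comments come from the listing.

```python
import json
import re

# Same pattern used in __doTask() to detect messages such as
# "WXIS|file error|file open|Isis_Script|"
HARD_ERROR = re.compile(r'(WXIS\|.+ error\|.+$)')

def classify_response(text):
    """Return a (kind, payload) pair for a raw wxis response.

    kind is one of 'hard_error', 'bad_response', 'soft_error' or 'ok'.
    """
    try:
        # Assumption: the response is strict JSON (the module itself uses eval)
        response = json.loads(text)
    except ValueError:
        match = HARD_ERROR.search(text)
        if match:
            # wxis itself failed (file, fatal or execution error)
            return ('hard_error', match.group())
        # No wxis error message found: the response is simply ill-formed
        return ('bad_response', text)
    if isinstance(response, dict) and 'error' in response:
        # The script ran, but complained (e.g. about a missing parameter)
        return ('soft_error', response['error'])
    return ('ok', response)

print(classify_response('WXIS|execution error|invalid value|-1|')[0])
print(classify_response('{"error": "missing parameter"}')[0])
```

Parsing with a JSON parser avoids evaluating arbitrary text returned by the server, at the price of rejecting responses that are valid Python literals but not valid JSON.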

config.py

# -------------------------------------------------------------------
# Configuration file for module wxis.
# -------------------------------------------------------------------
 
 
# Host, port and path to access wxis via HTTP.
# Adjust according to your server.
# Example: if wxis is accessible through the URL
# "http://127.0.0.1:8000/cgi-bin/isis/wxis", then you have
#      HOST = '127.0.0.1'
#      PORT = '8000'
#      PATH = '/cgi-bin/isis/wxis'
HOST = '127.0.0.1'
PORT = '80'
PATH = '/cgi-bin/isis/wxis'
 
 
# Path of the directory where the *.xis files live.
# Use an absolute filesystem path, or (better) a path relative to wxis's
# location.
# For example, if the scripts are in directory 'py-wxis-modules' under the
# directory containing wxis, then you have SCRIPT_DIR = 'py-wxis-modules'
#
# IMPORTANT!! IF YOU CHANGE THIS VALUE, YOU MUST ALSO UPDATE THE *.xis FILES.
# (sorry, it seems to be a wxis limitation that requires hardcoded paths in
# <include> tags)
SCRIPT_DIR = 'py-wxis-modules'
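
To make the role of SCRIPT_DIR concrete, the sketch below mirrors how __doTask() turns a script name and its parameters into the POST data sent to wxis. It is written for Python 3 (urllib.parse), whereas the module targets Python 2 (urllib.urlencode). The helper name build_post_data and the database path are hypothetical.

```python
from urllib.parse import urlencode

SCRIPT_DIR = 'py-wxis-modules'  # value from config.py

def build_post_data(script, database, params):
    """Return the urlencoded POST body for one wxis call."""
    data = dict(params)  # copy, so the caller's dictionary is left untouched
    # wxis receives the script to run as the 'IsisScript' parameter
    data['IsisScript'] = '%s/%s.xis' % (SCRIPT_DIR, script)
    data['database'] = database
    return urlencode(data)

# Equivalent of db.index(start='BAR', count=10), after 'start' is renamed to 'from'
print(build_post_data('index', '/path/to/books', {'from': 'BAR', 'count': 10}))
```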