====== Accessing Isis databases from Python, via WXIS ======

**wxis** is a module for working with CDS/ISIS databases from Python, via WXIS.

Alternative names: **xispy**, **pywxis**, **wxispy**.

A problem with the name of the Python module: calling it //wxis// (file //wxis.py//) is very convenient, but in some situations it can be confused with //wxis// the program.

===== README =====

README file for wxis

Requirements:

  * Server:
    - web server with CGI support enabled
    - wxis (5.x or higher)
  * Client:
    - Python (tested with version 2.5)

Files:

<code>
wxis-json-modules/      (put it somewhere in your web server's cgi-bin folder)
    _common.xis
    _display-record.xis
    control.xis
    delete.xis
    edit.xis
    extract.xis
    index.xis
    list.xis
    search.xis
    write.xis
    wxis: copy it in the same folder as the .xis files (or use a symlink
          to a different location, always below your cgi-bin folder)

wxis/                   (put this folder wherever you please, on the client)
    wxis.py
    config.py
    test/               (test database files)
        cds.iso
        cds.mst
        cds.xrf
        cds.fst
</code>

Note: although you can test wxis on a standalone computer, you can also work with a separate database server, i.e. web server + databases + wxis + *.xis files living on one computer, and python + *.py files on another one. However, to make things easier, the ''test'' module assumes that the database is local.

Edit ''config.py'' and adjust a few parameters.

Check permissions for the web server user in the ''test'' folder. This user must be able to write and create files there.

Run the command ''python wxis.py''.

Compare results with ...

===== wxis.py =====

Version: 2008-03-28 (plus some minor adjustments by user //newacct//, January 2010.)

<code python>
# coding=utf-8

"""
wxis

A module for accessing CDS/ISIS databases through Bireme's WXIS.

MIT License

(c) 2008 Fernando J. Gómez / INMABB / Conicet

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""


def rename_key(oldkey, newkey, dict):
    """
    Replaces oldkey by newkey in dictionary dict.
    """
    try:
        dict[newkey] = dict[oldkey]
        del dict[oldkey]
    except KeyError:
        pass
    return dict


def remote_call(url, data):
    """
    Opens an URL and returns the response.

    TO-DO: move 'proxies' to a config file?
    Use 'proxies={}' to avoid looking for proxies when wxis is in localhost.
    """
    from urllib import urlopen
    try:
        # NOTE: 'POST' is implied when a second positional param ('data') is used
        fp = urlopen(url, data, proxies={})
    except IOError:
        return 'Error connecting to database server.'
        raise  # TO-DO: test this (unreachable while the 'return' above is in place)
    else:
        return fp.read()


# Should be defined inside the class IsisDb?
def get_status(resp):
    """Returns the value of Isis_Status."""
    return resp['meta']['Isis_Status']


# Should be defined inside the class?
# Should be called automatically from an option in the constructor IsisDb.__init__?
#def createdb(name):
#    """Creates a database."""
#    db = IsisDb(name)
#    db.control(function='create', create='master')
#    # If there was an error creating the database, it's been already handled by __doTask()
#    return db


class IsisDb:
    """
    Gives access to a CDS/ISIS database through WXIS.

    TO-DO: Besides the 'name' attribute, a database may have other associated
    attributes, such as fst, actab, uctab, stw (and maybe gizmo).
    """

    def __init__(self, name, **args):
        self.name = name
        # An optional keyword parameter 'create' means "create this db". Examples:
        #     books = IsisDb('/path/to/books')               # check master existence, raise exception if it does not exist
        #     users = IsisDb('/path/to/users', create=True)  # create unconditionally, don't check existence
        if args.get('create'):
            self.__create()
        elif not self.__exists():
            raise DatabaseDoesNotExist, self.name

    def __str__(self):
        return self.__class__.__name__ + ': ' + self.name

    def __create(self):
        """
        Creates a database (only the master file).
        """
        self.control(function='create', create='master')
        # NOTE: If an error occurs while attempting to create the database,
        # it's handled by __doTask().

    def __exists(self):
        """
        Checks if master file exists.
        """
        resp = self.get_status()
        return resp['database']['status']['master'] != 'not found'

    def __doTask(self, script, params, content=None):
        """
        This is the base method: builds an URL and the POST data, calls wxis,
        checks its response for errors, and finally returns the response or
        raises an exception.

        Parameters:
            script      Name of the IsisScript to invoke.
            params      Input parameters for the script.
            content     (Optional) Record content to be written.
        """
        from urllib import urlencode
        import config

        # Build the URL
        url = 'http://%s:%s%s' % (config.HOST, config.PORT, config.PATH)

        # Append extra parameters. Note: all parameters are sent using POST
        IsisScript = '%s/%s.xis' % (config.SCRIPT_DIR, script)
        params.update({'IsisScript': IsisScript, 'database': self.name})
        if content:
            params.update({'content': content})
        data = urlencode(params)

        # Get WXIS's response
        wxis_response = remote_call(url, data)
        #print wxis_response

        # Now try to catch errors in the response
        try:
            # Try to create a Python object (a dictionary) from the response
            response = eval(wxis_response)
        except SyntaxError:
            # Reasons for a syntax error:
            #   (a) WXIS died: "WXIS|some error|...|...|"
            #       Some examples:
            #           WXIS|file error|file open|Isis_Script|
            #           WXIS|fatal error|unavoidable|dbxopen: /home/fernando/tmp/bibliox.xrf (2)|
            #           WXIS|execution error|invalid value|-1|
            #       For a comprehensive list of errors, see these semi-official docs:
            #           * http://ibama2.ibama.gov.br/cnia2/cisis/mensagens%20de%20erro%20do%20wxis-mx.pdf
            #           * http://www.elysio.com.br/documentacao/manual_phl81.pdf
            #           * http://www.google.com.ar/search?q=%22de+erro+do+CISIS%22&filter=0
            #
            #   (b) WXIS sent an ill-formed response (e.g. missing comma, mismatched brackets)
            #
            # Errors of type (a) can be detected using a regular expression.
            import re
            pattern = re.compile(r'(WXIS\|.+ error\|.+$)')
            match = pattern.search(wxis_response)
            if match:
                raise WxisHardError, match.group()
            else:
                # This covers reason (b)
                raise BadResponseError, wxis_response
        else:
            # OK, so the response is clean JSON... but still we may have a (clean) error message
            try:
                # Did the script complain?
                reason = response['error']
            except KeyError:
                # There's no 'error' key in the response -- return the Python object
                return response
            else:
                # We have an error of the 'soft' kind
                raise WxisSoftError, reason

    # The following seven methods correspond to the original wxis-modules scripts
    # or basic functions.
    # NOTE: index.xis, list.xis and search.xis expect an optional 'from' parameter,
    # but since 'from' is a Python keyword, we use 'start' instead,
    # e.g. db.index(start='BAR', count=10)

    # TO-DO: rename method to mfnrange()?
    def do_list(self, **params):
        """
        Retrieves a range of records.

        Parameters:
            start       (Optional)
            to          (Optional)
            count       (Optional)
        """
        params = rename_key('start', 'from', params)
        return self.__doTask('list', params)

    def search(self, **params):
        """
        Performs a search using the inverted file.

        Parameters:
            query       The search expression. Queries must use the CISIS search
                        language, which is based on the standard CDS-ISIS search
                        language. See http://www.ius.bg.ac.yu/biblioteka/isis_search.html
            start       (Optional)
            to          (Optional)
            count       (Optional)
            totalonly   (Optional) Use totalonly=1 to request the total number
                        of results (no records)
        """
        params = rename_key('start', 'from', params)
        return self.__doTask('search', params)

    # TO-DO: rename method to keyrange()?
    def index(self, **params):
        """
        Retrieves a range of keys from the inverted file.

        Parameters:
            start       (Optional) Defaults to first key.
            to          (Optional) Defaults to last key.
            count       (Optional) Defaults to 'no limit'.
        """
        params = rename_key('start', 'from', params)
        return self.__doTask('index', params)

    def edit(self, **params):
        """
        Attempts to lock a record to allow editing.
        Returns the record or raises an exception.

        Parameters:
            mfn         MFN of record.
            lockid      Record lock id.
        """
        resp = self.__doTask('edit', params)
        if get_status(resp) == '0':
            return resp
        else:
            raise LockedRecord, 'edit'

    def write(self, content=None, **params):
        """
        Attempts to write a record.
        Returns the record or raises an exception.

        Parameters:
            content     The record's content. Must be a tuple, or list,
                        of 2-tuples (tag, value).
            mfn         The record's MFN, or 'New' to add a new record.
            lockid      Record lock id.

        Example:
            fields = (
                ('100', 'Some value'),
                ('200', 'Another value')
            )
            db.write(mfn=291, content=fields, lockid='xx')
        """
        if content:
            # Encode each field as "H<tag> <length> <value>" and concatenate
            content = ''.join([
                "H%s %s %s" % (field[0], str(len(field[1])), field[1])
                for field in content
            ])
        resp = self.__doTask('write', params, content)
        if get_status(resp) == '0':
            return resp
        else:
            raise LockedRecord, 'write'

    def delete(self, **params):
        """
        Attempts to (logically) delete a record.
        Returns the record or raises an exception.

        Parameters:
            mfn         MFN of record.
            lockid      Record lock id.
        """
        resp = self.__doTask('delete', params)
        if get_status(resp) == '0':
            return resp
        else:
            raise LockedRecord, 'delete'

    def control(self, **params):
        """
        Creates new databases and performs several tasks on existing databases.

        Parameters:
            function    The control function to execute
                        ('unlock', 'invert', 'status', 'create').
            create      If function='create', then
                        create={'master'|'inverted'|'database'}
                        creates the specified type of file(s).
            unlock      If function='unlock', then unlock='control' unlocks
                        only the database's control record.
        """
        return self.__doTask('control', params)

    # And these are some convenient shortcuts

    def invert(self):
        """
        Generates the inverted file.
        """
        return self.control(function='invert')

    fullinv = invert

    def unlock(self):
        """
        Unlocks the master file and all locked records.

        Parameters:
            unlock      (Optional) If unlock='control', only the database's
                        control record is unlocked; otherwise, also all locked
                        records are unlocked.
        """
        return self.control(function='unlock')

    def get_status(self):
        """
        Returns information about the current status of database files.
        """
        return self.control(function='status')

    # This method was not available in wxis-modules, but is useful for cleaning
    # user-supplied queries.
    def extract(self, **params):
        """
        Returns the keys extracted from the passed data, using wxis's builtin
        mechanism, and optionally specifying custom stw, actab and uctab
        parameters.

        The method is in fact not associated with a specific IsisDb instance,
        though it could be useful to use the same stw, actab & uctab parameters
        used by the present IsisDb instance.

        Parameters:
            data        The string from which to extract the keys.
            tech        FST technique (4 to extract words).
        """
        return self.__doTask('extract', params)


# Exceptions

class IsisError(Exception):
    # Base class
    pass

#class ConnectionError(IsisError):
#    # For errors connecting with the server
#    def __str__(self):
#        return "Error while connecting to the database server"

class WxisHardError(IsisError):
    # For errors thrown by wxis (execution, fatal, file)
    def __init__(self, error):
        suggestion = ''
        if '|recread/xropn/w|' in error:
            suggestion = 'In other words, WXIS could not write to the disk. Check file and/or directory permissions for the web server user.'
        elif '|dbxopen:' in error:
            suggestion = 'In other words, WXIS could not open the database. Check that the files do exist and have read permissions for the web server user.'
        elif '|unavoidable|recisis0/xrf|' in error:
            suggestion = 'In other words, WXIS found problems trying to write something. Check database path and permissions for the web server user.'
        self.msg = "\n\n    %s\n\n%s" % (error, suggestion)
    def __str__(self):
        return self.msg

class WxisSoftError(IsisError):
    # For errors thrown by a script (missing parameter)
    def __init__(self, error):
        self.msg = error
    def __str__(self):
        return self.msg

class BadResponseError(IsisError):
    # For ill-formed responses (with no wxis error) preventing the use of eval()
    def __init__(self, resp):
        self.msg = "The database server returned an ill-formed response. Check commas, quotes, braces, and brackets:\n\n%s" % resp
    def __str__(self):
        return self.msg

class LockedRecord(IsisError):
    # Isis_Status different from 0 when attempting to write a record
    def __init__(self, action):
        self.msg = "Can't %s record -- Record is locked" % action
    def __str__(self):
        return self.msg

class DatabaseDoesNotExist(IsisError):
    def __init__(self, dbname):
        self.msg = "The database %s could not be found" % dbname
    def __str__(self):
        return self.msg

# NOTE: check what other specific error codes may be returned by WXIS, described
# in the documents cited above (Elysio, etc).


#########################################################################
# Tests
#########################################################################

"""
This is a simple test of the code, which also shows how to use the API.

TO-DO:

    * compare the actual output with the expected output, so that errors may
      be automatically detected.

    * create a database from textual data (e.g. the usual CDS as .id or .iso)
      Should we have an extra method, load_iso(), using wxis's tag?
      Not sure, since importing/exporting a database should probably not
      be done through HTTP... But for a purely local test this would be no
      problem.

    * show use of actab, uctab, stw, gizmo?

    * besides calling wxis, also show how to manipulate the data in Python,
      i.e. how to replace the formatting language:
        - display a list of records
        - display record details
        - display database status
        - use templates ("$"-based substitutions) to format output:
          http://docs.python.org/lib/node40.html
        - also use the usual "%"-based substitutions

    * special case: MARC records (using pymarc)
"""

"""
Original usage examples:

1) Browse index keys

    >>> db = IsisDb('/home/fer/bases/testdb')
    >>> res = db.index(count=10, start='za')
    >>> [term['Isis_Key'] for term in res['terms']]
    ['ZAANEN', 'ZABCZYK', 'ZABRODSKY', 'ZACKS', 'ZADACH', 'ZADACHA', 'ZADACHAKH', 'ZADACHI', 'ZADATCH', 'ZADEH']

2) Search -- TO-DO: simplify using functions

    >>> res = db.search(query='marsden')
    >>> import re
    >>> titles = [ unicode(re.sub('\^\w', ' ', field['value'][4:]), 'latin1') for rec in res['records'] for field in rec['fields'] if field['tag'] == '245' ]
    >>> titles.sort()
    >>> print '\n'.join([ '(%s) %s' % (n, t) for (n, t) in zip(range(1, len(titles)+1), titles) ])
    (1) A mathematical introduction to fluid mechanics / A. J. Chorin and J. E. Marsden.
    (2) Algebraic aspects of integrable systems : in memory of Irene Dorfman / A. S. Fokas and I. M. Gelfand, editors.
    (3) Análisis clásico elemental / Jerrold E. Marsden, Michael J. Hoffman ; versión en español, Oscar Alfredo Palmas Velasco ; colaboración técnica, José Antonio Cuesta Ruiz.
    (4) Basic complex analysis / Jerrold E. Marsden, Michael J. Hoffman.
    (5) Calculus / Jerrold Marsden, Alan Weinstein.
    (6) Cálculo vectorial / Jerrold E. Marsden, Anthony J. Tromba ; traducción: Patricia Cifuentes Muñiz ... [et al.] ; revisión técnica: Eugenio Hernández Rodríguez.
    (7) Integration algorithms and classical mechanics / Jerrold E. Marsden, George W. Patrick, William F. Shadwick, editors.
    (8) New directions in applied mathematics : papers presented April 25/26, 1980, on the occasion of the Case centennial celebration / edited by Peter J. Hilton and Gail S. Young ; with contributions by Kenneth Baclawski ... [et al.].
    (9) Student's guide to Calculus by J. Marsden and A. Weinstein. Volume 2 / Frederick H. Soon.
    (10) Vector calculus / Jerrold E. Marsden, Anthony J. Tromba.
"""


def test():

    import os
    from pprint import pprint

    def display_status(db):
        resp = db.get_status()
        status = resp['database']['status']
        pprint(status)

    def display_records(resp):
        """A simple way to display records."""
        pprint(resp['records'])

    def section(msg):
        """Displays a header for each section of the test."""
        line = '-'*40
        print
        print line
        print msg.upper()
        print line

    TEST_DB = 'cds'
    TEST_DIR = 'test'

    path = os.path.join(os.getcwd(), TEST_DIR)
    testdb = os.path.join(path, TEST_DB)

    # create an IsisDb instance (the class is defined in this same module)
    db = IsisDb(testdb)

    # check db status
    section('check db status')
    display_status(db)

    #####################################
    section('list some records')
    #####################################
    resp = db.do_list(start=10, count=2)
    display_records(resp)

    # create an FST, or use an existing one

    #####################################
    section('generate the inverted file')
    # TO-DO: specify actab, uctab, stw
    #####################################
    resp = db.invert()
    status = resp['database']['status']
    if status == 'inverted':  # why is this check here? should it be caught earlier, and throw an exception?
        print 'Database was inverted.'
    else:
        print 'Some error occurred, database was not inverted.'

    section('check db status')
    display_status(db)

    #####################################
    section('list some keys')
    #####################################
    resp = db.index(start='W', count=10)
    print [term['Isis_Key'] for term in resp['terms']]

    #####################################
    section('do a search')
    #####################################
    resp = db.search(query='water', count=2)
    display_records(resp)

    #####################################
    section('lock a record for editing')
    #####################################
    from time import strftime
    some_mfn = 10  # arbitrary
    mylockid = 'test %s' % strftime("%Y%m%d %H%M%S")
    try:
        resp = db.edit(mfn=some_mfn, lockid=mylockid)
        pprint(resp)
    except LockedRecord:
        print "Record %s is locked, can't be edited now." % some_mfn

    # TO-DO: attempt to edit, delete or write a locked record

    #####################################
    section('create a new record')
    #####################################
    fields = (
        ('100', 'Some value'),
        ('200', 'Another value')
    )
    try:
        resp = db.write(mfn='New', content=fields, lockid=mylockid)
    except:  # what kind of exception??
        print 'Record could not be written'
    # display the new record's MFN or error msg
    newmfn = resp['record']['mfn']
    print 'Record was saved. MFN: %s' % newmfn

    section('check db status')
    display_status(db)

    #####################################
    section('retrieve the new record')
    #####################################
    resp = db.do_list(start=newmfn, count=1)
    #resp = db.search(query='')
    display_records(resp)

    #####################################
    section('unlock records')
    #####################################
    resp = db.unlock()
    pprint(resp)

    section('check db status')
    display_status(db)

    #####################################
    section('delete the new record')
    #####################################
    try:
        resp = db.delete(mfn=newmfn, lockid=mylockid)
    except LockedRecord:
        print "Record %s is locked, can't be deleted now." % newmfn
    pprint(resp)

    section('check db status')
    display_status(db)

    # TODO: also show how to clean query using Python only
    #####################################
    section('clean a dirty query')
    #####################################
    query = ' water plants '
    resp = db.extract(data=query)
    newquery = ' AND '.join(resp['terms'])
    resp = db.search(query=newquery)
    display_records(resp)


if __name__ == '__main__':
    test()
</code>

===== config.py =====

<code python>
# -------------------------------------------------------------------
# Configuration file for module wxis.
# -------------------------------------------------------------------

# Host, port and path to access wxis via HTTP.
# Adjust according to your server.
# Example: if wxis is accessible through the URL
# "http://127.0.0.1:8000/cgi-bin/isis/wxis", then you have
#     HOST = '127.0.0.1'
#     PORT = '8000'
#     PATH = '/cgi-bin/isis/wxis'
HOST = '127.0.0.1'
PORT = '80'
PATH = '/cgi-bin/isis/wxis'

# Path of the directory where the *.xis files live.
# Use an absolute filesystem path, or (better) a path relative to wxis's
# location.
# For example, if the scripts are in directory 'py-wxis-modules' under the
# directory containing wxis, then you have SCRIPT_DIR = 'py-wxis-modules'
#
# IMPORTANT!! IF YOU CHANGE THIS VALUE, YOU MUST ALSO UPDATE THE *.xis FILES.
# (sorry, it seems to be a wxis limitation that requires hardcoded paths in
# tags)
SCRIPT_DIR = 'py-wxis-modules'
</code>

{{tag>desarrollo isis python}}
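
A side note on how ''%%__doTask()%%'' in //wxis.py// sorts responses into hard, soft and ill-formed errors: the sketch below repeats that triage on plain strings, so it can be tried without a running server. It is only an illustration, not part of the module. The names ''parse_response'', ''HardError'', ''SoftError'' and ''BadResponse'' are hypothetical stand-ins for the module's own code and exception classes. It also swaps ''eval()'' for ''ast.literal_eval()'', which needs Python 2.6 or later and assumes that the scripts only ever send plain literals (dictionaries, lists, strings, numbers).

```python
import ast
import re

# Same pattern that wxis.py uses to recognise a crashed WXIS,
# e.g. "WXIS|file error|file open|Isis_Script|".
WXIS_ERROR = re.compile(r'(WXIS\|.+ error\|.+$)')


class HardError(Exception):
    """Hypothetical stand-in for WxisHardError (WXIS itself died)."""


class SoftError(Exception):
    """Hypothetical stand-in for WxisSoftError (a script complained)."""


class BadResponse(Exception):
    """Hypothetical stand-in for BadResponseError (ill-formed response)."""


def parse_response(text):
    """Returns the response as a dictionary, or raises one of the errors above."""
    try:
        # literal_eval() accepts only Python literals, so a response cannot
        # smuggle in code to be executed, which eval() would allow.
        response = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        match = WXIS_ERROR.search(text)
        if match:
            # Reason (a): WXIS died and sent its own error line
            raise HardError(match.group())
        # Reason (b): missing comma, mismatched brackets, etc.
        raise BadResponse(text)
    if not isinstance(response, dict):
        raise BadResponse(text)
    if 'error' in response:
        # Clean response, but the script reported a problem
        raise SoftError(response['error'])
    return response
```

For instance, ''%%parse_response("{'meta': {'Isis_Status': '0'}}")%%'' returns the dictionary, while the string ''%%WXIS|file error|file open|Isis_Script|%%'' ends in a ''HardError''. Whether the .xis scripts ever emit something that ''eval()'' accepts and ''literal_eval()'' rejects has not been checked here.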