====== Accessing Malete databases from Python ======

Having managed to work with CDS/ISIS databases from Python via WXIS, we now want to do something similar with Malete databases.

===== Pending tasks =====

==== Code adjustments ====

  * Check whether tag must be an int (what happens if it is a string such as '245'?). **WARNING**: numeric tags have a serious problem: a tag written with a leading //0//, e.g. //020//, is read by Python as an octal number (16 in this case). If any of its digits falls outside the range 0-7, e.g. //090//, an error is raised instead. So, to work safely with tags that have leading zeros, we seem to be forced to use strings. Pymarc 2.0 uses strings for tags. (Fortunately, [[http://www.python.org/dev/peps/pep-3127/|Python 3.0 would fix this problem]].)
  * if self.db and self.db.fdt and (tag in self.db.fdt): check what happens if fdt[tag] does not exist.
  * IsisServer.request: change the order of "if numOnly == 0:"?
  * def parse(self, text, repl=None): document the return value.
  * Review function calls that pass "None" parameters: use keywords. Example: req('W', None, None, something)
  * Add docstrings to the methods.

==== Other issues ====

  * Decide what the methods should return: JSON/dictionaries.
  * Exceptions? E.g. the database cannot be written.
  * Nonexistent database?
  * The methods that read records or mfns (read, query) should return the complete list, not 20 at a time.
  * See what can be done to simplify indexing, including an option to run it automatically after a write. Perhaps write() should take an ''index'' parameter, ''True'' by default, that triggers indexing after the write, using an "fst" associated with the database. Likewise, a delete() method would take an ''index'' parameter, ''True'' by default, to remove the corresponding keys from the index. Look into deleting the old keys when re-indexing an existing record.
Sample code:

<code python>
# Sketch of the proposed API: IsisDb has no delete() method yet.
def fst(r, delete=False):
    idx = []  # a list, since tuples have no append()/extend()
    if delete:
        idx.extend([0, 'd'])
    idx.extend([
        0, 's', 245, r.get(245),
        0, 'f', 100, r.get(100),
    ])
    return idx

r = IsisRec(...)
idx = IsisRec(fst(r))
db.write(r)
db.index(idx)

db.delete(r)
idx = IsisRec(fst(r, True))
db.index(idx)
</code>

  * Concurrent access to a record, locking.

===== First attempt: sockets =====

This is a first attempt at communication between Python and Malete:

<code python>
"""
Connects to malete server, sending queries.

Usage: python sock.py

Based on
  * Example 16.2. TCP Timestamp Client (tsTclnt.py) from Core Python Programming, 2nd ed.
  * Tutorial on Network Programming with Python
"""

from socket import *

def main():
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect(('127.0.0.1', 2042))
    flo = sock.makefile('r', 0)  # flo = file-like object

    # indicates that no more data is coming
    END_OF_RESPONSE = '\n'

    while True:
        query = raw_input('> Query: ')
        if not query:
            break
        msg = 'test.Q\t%s\n\n' % query
        sock.send(msg)
        for line in flo:  # is the file-like object automatically reset for every query?
            print line,   # should we print also the last, empty line?
            if line == END_OF_RESPONSE:
                break     # no more lines to read
    sock.close()

if __name__ == "__main__":
    main()
</code>

===== Port of Malete's PHP code =====

Version of: 2008-03-28

<code python>
# coding=utf-8
"""
malete

A module for accessing Malete databases. This is essentially a Python port
of the original PHP code included with the Malete distribution.
See http://malete.org/Doc/DownLoad

MIT License

(c) 2008 Fernando J. Gómez / INMABB / Conicet

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""

import os
import re
import socket
import sys  # used for the debug output in IsisServer.request

# FIELD mode replaces newlines with tabs.
# On deserializing, these tabs are not converted back to newline.
# Do not use if you need to retain newline information.
ISIS_REC_FIELD = '\t'  # ASCII Tab

# TEXT mode replaces newlines with vertical tabs.
# Vertical tabs are converted back to newlines only when explicitly
# deserializing in TEXT mode, since it's not transparent to binary data.
ISIS_REC_TEXT = '\v'  # ASCII Vertical Tab (VT)


# PHP has a strspn() function; this is an implementation in Python.
# Source: http://mail.python.org/pipermail/python-list/2003-November/237085.html
def strspn(s, t):
    # kinda slow way to construct the pattern, but it does correctly
    # handle the '-' and ']' cases. Usually one would write the regexp
    # directly and not try to get close to the C API.
    pat = re.compile("(" + "|".join(map(re.escape, t)) + ")*")
    m = pat.match(s)
    if not m:
        return 0
    return m.end()


class IsisRec():
    """
    An ISIS(/IIF/Z39.2/ISO2709)-style record in pure Python.

    This is only loosely connected to an Isis Database,
    most functions can be used without having a DB.
    """

    def __init__(self, *args):
        """
        Parameters:
            tag, value[, tag, value [...]]

        Example:
            r = malete.IsisRec(
                10, 'Value for field 10',
                20, 'Value for field 20'
            )
        """
        self.db = 0
        self.mfn = 0
        self.head = ''
        self.tag = []
        self.val = []
        if args:
            self.add(args)  # FIXME: args is a tuple, should be split

    def __len__(self):
        """Counts the fields."""
        return len(self.tag)

    def __str__(self):
        return '--\n%s--' % self.toString()

    def fdt(self, tag):
        """
        Tries to lookup non-numeric tags in the fdt.

        Parameters:
            tag  A numeric tag, or a field name to look up.
        """
        if not isinstance(tag, int):
            if self.db and self.db.fdt and (tag in self.db.fdt):
                tag = self.db.fdt[tag]
        return tag

    def get(self, tag):
        """
        Gets all values for a tag as a list.

        FIXME: tags with leading zeros are treated as octal, e.g.
            >>> tag = 020
            >>> tag
            16
            >>> print 0101
            65
        How can this situation be detected?

        Parameters:
            tag (int)  A numeric tag.
        """
        tag = self.fdt(tag)
        values = [v for (t, v) in zip(self.tag, self.val) if t == tag]
        return values

    def recs(self, db=None):
        """
        Returns a list of subrecords.

        Parameters:
            db  (Optional) A database, so that records know which db they belong to.
        """
        ret = []
        # clone lists, so we can use pop() safely
        tag = list(self.tag)
        val = list(self.val)
        while tag:
            t, v = tag.pop(0), val.pop(0)
            if t < 0:  # negative tag => -(number of fields in record + 1), see embed()
                # create a new record
                r = IsisRec()
                r.db = db
                r.head = v
                # TO-DO: r.mfn ??
                i = -int(t) - 1
                # add next i fields to the new record
                while i > 0 and tag:
                    i -= 1
                    t, v = tag.pop(0), val.pop(0)
                    r.tag.append(t)
                    r.val.append(v)
                ret.append(r)
        return ret

    def append(self, tag, val):
        """
        Appends a new field (tag-value pair) to the end of the record.

        TO-DO: check use of isinstance() in Python
        FIXME - is_numeric()

        Parameters:
            tag (int)  A numeric tag.
            val        The field's value.
        """
        if not isinstance(tag, int):
            tag = self.fdt(tag)
        # echo "0\tappending $tag ",gettype($val),"\n"
        if isinstance(val, str) or isinstance(val, int):  # or is_numeric(val)
            self.tag.append(tag)
            self.val.append(val)
        elif isinstance(val, list):
            for v in val:
                self.append(tag, v)
        elif isinstance(val, object):
            self.embed(val)
        return val

    def add(self, *args):
        """
        Adds a list to the record. Returns the number of added fields.
        See docs at Rec.php.

        Parameters:
            args  A list of the form [tag, value[, tag, value[...]]]

        Example:
            rec.add([100, 'Field 100', 200, 'Field 200'])
        """
        added = 0
        fdt = self.db and self.db.fdt or None
        # line omitted here
        # FIXME (tuples vs. lists) --- this works when called from __init__, but not in general
        args = list(args[0])
        while args:
            i = args.pop(0)
            if isinstance(i, int):
                if self.append(i, args.pop(0)) is not None:
                    added += 1
            elif isinstance(i, list):
                added += self.add(i)  # recursive add
            elif i == '-mfn':
                self.mfn = args.pop(0)
            elif i == '-db':
                self.db = args.pop(0)
                fdt = self.db.fdt
            elif fdt and i in fdt and isinstance(fdt[i], int):
                if self.append(fdt[i], args.pop(0)) is not None:
                    added += 1
            elif i == ISIS_REC_TEXT:
                added += self.parse(args.pop(0), ISIS_REC_TEXT)
            else:
                added += self.parse(i)
        return added  # NOTE: not in Rec.php

    def pack(self):
        # pack is not needed in Python, since del also shifts indices, leaving no 'holes'.
        pass

    def rm(self, pos):
        """
        Removes a field at the given pos.

        Parameters:
            pos (int)  The position (index) to remove.
        """
        del self.tag[pos]
        del self.val[pos]

    def delete(self, tag=None):
        """
        Removes all fields, or all fields with a given tag.

        Note: We use 'delete' since 'del' is a reserved keyword in Python.

        Parameters:
            tag  (Optional) Tag to be removed; if not present, all fields are removed.
        """
        if tag is None:
            self.tag = []
            self.val = []
        else:
            if not isinstance(tag, int):
                tag = self.fdt(tag)  # fdt is a method, so call it instead of subscripting
            # loop in reversed order, since rm() shifts the indices of later fields
            for i in reversed(range(len(self.tag))):
                if self.tag[i] == tag:
                    self.rm(i)

    def set(self, tag, *values):
        """
        Sets fields with tag to values.

        TO-DO: if only tag is given, with no values, it behaves like delete(tag). Is this correct?

        Parameters:
            tag (int)  A numeric tag.
            values     One or more values. See docs in Malete's Rec.php.
        """
        if not isinstance(tag, int):
            tag = self.fdt(tag)
        ary = None

        # isolate those indices in self.tag associated to tag, e.g. if there are 3 occs of tag '700'
        # in positions 6, 7, 9, then tag_positions = [6, 7, 9]
        tag_positions = [i for i, v in enumerate(self.tag) if v == tag]

        values = list(values)  # make the tuple a list

        while True:
            # First step: get the next value to set/add
            if ary:  # ary non empty
                value = ary.pop(0)
            else:
                if not values:
                    break
                value = values.pop(0)
                if isinstance(value, list):
                    ary = value
                    continue

            # Second step: do something using the value

            # if value is an integer, it has a special meaning
            if isinstance(value, int):
                # if value is the integer 0, processing stops
                # (i.e. remaining occurrences are left unchanged)
                if not value:
                    return
                # if value is a positive integer n, processing skips n occurrences
                # (leaving them unchanged)
                for i in range(value):
                    if tag_positions:
                        tag_positions.pop(0)
                continue

            # now value is finally a value to set/add
            if tag_positions:
                # the first len(values) occurrences are set to the provided values
                self.val[tag_positions.pop(0)] = value
                continue
            # if there are less than len(values) occurrences, the remaining values are appended
            self.append(tag, value)

        # if there are more than len(values) occurrences, the remaining occurrences are deleted
        # NOTE: after each call to self.rm() indices in self.tag are shifted (towards 0),
        # and thus tag_positions is not what we need.
        # To avoid this problem, loop in reversed order.
        for i in reversed(tag_positions):
            self.rm(i)

    def embed(self, other_rec):
        """
        Transparently embeds a record. Used from write() in IsisDb.

        Parameters:
            other_rec  IsisRec
        """
        i = len(other_rec)
        self.append(-i-1, other_rec.head)
        for t, v in zip(other_rec.tag, other_rec.val):
            self.tag.append(t)
            self.val.append(v)
            i -= 1
            if i == 0:
                break

    def toString(self, mode=ISIS_REC_TEXT):
        """
        Serializes record to a string.

        Parameters:
            mode  replacement value for newlines
        """
        s = ''
        if len(self.head):  # is it enough with "if self.head" ?
            if '0' <= self.head[0] <= '9':
                s += "W\t"
            s += self.head + '\n'
        for t, v in zip(self.tag, self.val):
            s += '%s\t%s\n' % (t, str(v).replace('\n', mode))  # str() because v may be numeric
        return s

    def parse(self, text, repl=None):
        """
        Parses a string representation of a record.
        Returns the number of fields parsed.

        Parameters:
            text
            repl  String to be converted back to newlines.
                  Use ISIS_REC_TEXT, if you know text is from toString(ISIS_REC_TEXT)
        """
        lines = text.split("\n")
        if lines and len(lines[0]):
            line = lines[0]
            if not '0' <= line[0] <= '9':
                self.head = line
                lines.pop(0)
        conv = 0  # explicit counter, so it is defined even if there are no field lines
        for line in lines:
            if '' == line:  # blank line or trailing newline
                continue
            dig = strspn(line, '0123456789-')
            t = dig and int(line[:dig]) or 0
            o = ("\t" == line[dig:dig+1])  # slice, so a line holding only a tag is safe
            v = line[dig+o:]
            if repl:
                v = v.replace(repl, "\n")
            self.tag.append(t)
            self.val.append(v)
            conv += 1
        return conv


class IsisDb():
    """
    This class represents a "database".

    It has a method for each of the standard Malete messages for databases:
    write, read, query, index, and terms.
    """

    def __init__(self, fdt=None, name=None, server=None):
        self.fdt = fdt
        self.name = name
        self.srv = server

    def req(self, type, arg, emb=None, lst=None, ct=0):
        """
        Internal helper to construct and send a request.

        Parameters:
            type  The type of message (R, W, Q, T, X)
            arg   Arguments to be added to the request's header
            emb   A list of IsisRecs to be embedded in the request's body
            lst   A list of parameters, to be added to the request's body as fields with tag 0
            ct    numOnly?
        """
        req = IsisRec()
        req.head = '%s.%s' % (self.name, type)
        if arg:
            req.head += '\t' + arg
        if emb:
            for r in emb:
                req.embed(r)
        if lst:
            for l in lst:
                req.append(0, l)
        return self.srv.request(req, ct)

    def query(self, expr=None, recs=True):
        """
        Parameters:
            expr  If None, fetch more results from previous query
            recs  If True, fetch a list of records, else of mfns
        """
        if expr and recs and '?' not in expr:
            expr += '?'  # force fetch records
        ret = self.req('Q', expr)  # ret is an IsisRec instance
        return recs and ret.recs(self) or ret.get(0)

    def read(self, mfn):
        """
        Read one or a list of mfns. Returns one or a list of records.

        Parameters:
            mfn  a single mfn, or a list of mfns
        """
        if isinstance(mfn, list):  # is mfn a list?
            ret = self.req('R', None, None, mfn)
            return ret.recs(self)
        else:
            #ret = self.req('R', None, None, list(mfn))
            ret = self.req('R', str(mfn))
            recs = ret.recs(self)
            return recs[0]

    def terms(self, start, to=None):
        if to is not None:
            start += '\t' + to
        ret = self.req('T', start)
        raw_list = ret.get(0)  # ["Count1\tTerm1", "Count2\tTerm2", ...]
        r = []
        for t in raw_list:
            data = t.split('\t')
            r.append({'key': data[1], 'count': data[0]})
        return r

    def write(self, rec):
        """
        Writes one or a list of records. Returns a list of mfns written.

        WARNING: check write permissions on the database files.

        Parameters:
            rec  a single IsisRec, or a list of IsisRecs
        """
        if not isinstance(rec, list):
            rec = [rec]  # make a list from a single element
        ret = self.req('W', None, rec)
        return ret.get(0)

    def index(self, req):
        """
        Unlike the other methods, this expects 'req' to be a prepared X request.
        However, name.X is prepended.
        Returns res.head, which should be a comment.
        """
        pfx = self.name + '.X'
        if req.head:
            req.head = pfx + '\t' + req.head
        else:
            req.head = pfx
        res = self.srv.request(req)
        return res.head


class IsisServer():
    """
    This class represents the connection to an Isis server.

    In general, a server is any object having a request function,
    accepting a single IsisRec parameter and returning an IsisRec.

    This implementation is based on a TCP or UNIX socket.

    See:
      * Example 16.2. TCP Timestamp Client (tsTclnt.py) from Core Python Programming, 2nd ed.
      * Tutorial on Network Programming with Python
      * Socket Programming HOWTO
    """

    def __init__(self, host=None, port=2042, pers=0):
        if not host:
            if 'ISIS_SERVER' in os.environ:
                host = os.environ['ISIS_SERVER']
            else:
                host = 'localhost'
        self.host = host
        self.port = port
        self.pers = pers  # persistent connection (in Python?)
        self.dbg = False
        self.open()

    def open(self):
        # Persistence??
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((self.host, self.port))
        except socket.error:
            print 'Error connecting to the Malete server. Check that it is running.'
            self.sock = None
        else:
            self.sock = sock.makefile('w', 0)  # file object associated with the socket
        return self.sock

    def request(self, req, numOnly=0):
        if not self.sock and not self.open():
            return None
        if self.dbg:
            sys.stderr.write("SEND\n" + req.toString(ISIS_REC_TEXT))  # toString: serializes record
        self.sock.write(req.toString(ISIS_REC_TEXT) + "\n")
        #self.sock.flush() needed??
        txt = ''
        if numOnly == 0:  # return the retrieved records
            for line in self.sock:
                if line != '\n':
                    if self.dbg:
                        sys.stderr.write("RETR " + line)
                    txt += line
                else:
                    break
            res = IsisRec()
            res.parse(txt, ISIS_REC_TEXT)  # de-serialize record
            if self.dbg:
                sys.stderr.write("GOT " + res.toString())
            return res
        else:  # return only the number of retrieved records
            inf = []  # stays empty if the response has no '#' line
            for line in self.sock:
                if line != '\n':
                    if line[0] == '#':
                        inf = line.split('\t')
                else:
                    break
            return len(inf) > 1 and inf[1] or 0


#########################################################################
# Tests
#########################################################################

def test():
    """
    Some tests ported from malete's demo.php.
    Tests involving record formatting have been excluded here.

    2008-03-26: Output coincides with that of the PHP demo.
    """

    def section(title):
        sep = '-'*40
        print '%s\n%s\n%s' % (sep, title.upper(), sep)

    fdt = {
        'title': 24,
        'author': 70,
        'keywords': 69
    }
    db = IsisDb(fdt, 'test')

    subs = 'initial\taParis\tbUnesco\tb\tc-1965'  # NOTE: this includes TABs!
    r = IsisRec(
        '-db', db,
        # first some lines from CDS, some using field names, some plain int tags
        'keywords', 'Paper on: ',
        'author', 'Magalhaes, A.C.',
        24, ' Controlled climate in the plant chamber',
        76, 'Les Politiques de la communication en Yougoslavie zfre',
        'author', 'Franco, C.M.',
        26, subs,
        # a field to test delete
        77, 'ave Caesar',
        # a field using tab as subfield separator
        42, "foo\tbar\tbaz",
        # a field containing newline
        99, "two\nlines",
        # a serialized record (as of toString) as parameter
        "70\tyet another author\n99two more\n99lines\na 0 field\n42\tthe\tanswer"
    )

    ############################################
    section('dump of record')
    ############################################
    print 'Record has %s fields' % len(r)
    print r
    r.delete(77)  # ... morituri te salutant

    ############################################
    section('embedding and TEXT mode')
    ############################################
    q = IsisRec(77, 'sunset strip')  # create a new record
    q.embed(r)  # embed r into the new record
    s = q.toString(ISIS_REC_TEXT)
    print 'Record embedded\n\n%s\n\n' % s
    q.delete()
    # restore from the string
    q.parse(s, ISIS_REC_TEXT)
    recs = q.recs()
    r = recs[0]
    r.db = db
    print 'Record restored\n\n%s\n\n' % r

    ############################################
    section('set operator')
    ############################################
    r.set('title', 'new title', 'second new title')
    r.set(99, 'now a oneliner')
    r.set('author', [1, 'Blanco', 0])
    print "\n%s\n" % r

    ############################################
    section('Server')
    ############################################
    db = IsisDb(fdt, 'test', IsisServer())
    if not db.srv.sock:
        print "could not contact server"
        return

    # terms beginning with 'a'
    terms = db.terms('a')
    print "got %s terms for 'a'" % len(terms)
    for t in terms:
        print "'%s' (%s)" % (t['key'], t['count'])

    # query reading records
    recs = db.query('plant water')
    print "\ngot %s records for query 'plant water'" % len(recs)
    for r in recs:
        print '%s\n' % r

    # query reading mfns
    query = 'plant + water + devel$'
    mfns = db.query(query, False)
    print "Query: '%s'" % query
    while mfns:
        print "got %s mfns: %s" % (len(mfns), ','.join(mfns))
        mfns = db.query(None, False)
    print

    print "reading 42, 43"
    recs = db.read([42, 43])
    for r in recs:
        print "\n%s" % r

    print "reading 42"
    r = db.read(42)
    print "\n%s\n" % r

    print "writing 42"
    r.append('author', 'one more author')
    print "\n%s\n" % r
    mfns = db.write(r)
    print "wrote %s mfns: %s\n" % (len(mfns), ','.join(mfns))

    print "writing 42 as new record"
    r.head = ''
    mfns = db.write(r)
    print "wrote %s mfns: %s\n" % (len(mfns), ','.join(mfns))

    print "indexing author fields as 70 in split mode"
    idx = IsisRec()
    idx.head = 's'
    idx.set(70, r.get('author'))
    print "\n%s\n" % idx
    res = db.index(idx)
    print "got %s\n" % res

    print "query 'one' near 'author'"
    mfns = db.query('one .. author', False)
    print "got %s mfns: %s" % (len(mfns), ','.join(mfns))


if __name__ == '__main__':
    test()
</code>

{{tag>python malete}}
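On the pending question of whether tags should be ints or strings, here is a minimal sketch of one possible approach. It assumes that tags are kept as zero-padded decimal strings and converted to int only where a number is really needed. The helper names ''normalize_tag'' and ''tag_number'' and the default width of 3 are hypothetical; they are not part of the module on this page, and they deliberately reject the negative tags used internally for embedded records.

```python
import re


def normalize_tag(tag, width=3):
    """Return tag as a zero-padded decimal string, e.g. 20 -> '020'.

    Hypothetical helper: accepts a non-negative int or a string of digits.
    """
    if isinstance(tag, int):
        if tag < 0:
            raise ValueError('negative tags are not field tags: %r' % (tag,))
        text = str(tag)
    else:
        text = str(tag).strip()
        # only plain ASCII digits are accepted, so field names are rejected
        if not re.match(r'[0-9]+\Z', text):
            raise ValueError('not a numeric tag: %r' % (tag,))
    return text.zfill(width)


def tag_number(tag):
    """Return the tag as an int, always parsed as decimal (never octal)."""
    return int(normalize_tag(tag), 10)
```

Because the tag arrives as a string, ''normalize_tag('020')'' and ''normalize_tag(20)'' both give '''020''', and ''tag_number('090')'' gives 90: ''int()'' with an explicit base of 10 reads leading zeros as decimal, so the octal-literal problem never arises.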