Accessing Malete databases from Python

Having managed to work with CDS/ISIS databases from Python via WXIS, we now want to do something similar with Malete databases.

Pending tasks

Code adjustments

  • Check whether tag must be an int (what happens if it is a string, '245'?). WARNING: numeric tags have a serious problem: a tag written with a leading zero, e.g. 020, is read by Python as an octal number (16 in this case), and if any digit falls outside the range 0-7, e.g. 090, the literal is an error. So to work safely with tags that have leading zeros we seem to be forced to use strings. Pymarc 2.0 uses strings for tags. (Fortunately, Python 3.0 would address this: it rejects such literals instead of silently reading them as octal.)
  • if self.db and self.db.fdt and (tag in self.db.fdt): see what happens if fdt[tag] does not exist
  • IsisServer.request: change the order of “if numOnly == 0:”?
  • def parse(self, text, repl=None): state the return value
  • Review function calls that pass “None” parameters: use keywords. Example: req('W', None, None, something)
  • Add docstrings to the methods.
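
The leading-zero problem in the first item of the list above can be reproduced without Malete. The sketch below is written for Python 3 and assumes a hypothetical helper, `normalize_tag`, which is not part of the module: it keeps tags as zero-padded strings, in the spirit of the string tags attributed to pymarc above. The padding width of 3 is also an assumption.

```python
def normalize_tag(tag, width=3):
    """Return a tag as a zero-padded string (hypothetical helper)."""
    if isinstance(tag, int):
        if tag < 0:
            raise ValueError("negative tags are not handled here")
        return str(tag).zfill(width)
    text = str(tag).strip()
    if not text.isdigit():
        raise ValueError("not a numeric tag: %r" % (tag,))
    return text.zfill(width)


def literal_is_valid(source):
    """Check whether a piece of source text compiles as a Python expression."""
    try:
        compile(source, "<tag>", "eval")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(normalize_tag(20))         # int tag, padded to three characters
    print(normalize_tag("090"))      # string tag, leading zero preserved
    print(int("020"))                # int() on a string parses decimal digits
    print(literal_is_valid("020"))   # whether 020 is accepted as a literal
```

With string tags, comparisons such as `t == tag` only work if every tag in a record has been normalized the same way, so the conversion would have to happen at every entry point (append, get, set, delete).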

Other issues

  • Decide what the methods should return: JSON/dictionaries
  • Exceptions? E.g. the database cannot be written.
  • Nonexistent database?
  • Methods that read records or mfns (read, query) should return the complete list, not 20 at a time.
  • See what can be done to simplify indexing, including an option to run it automatically after a write. Perhaps write() should take an index parameter defaulting to True, which triggers post-write indexing using an “fst” associated with the database. Similarly, a delete() method would take an index parameter defaulting to True to remove the corresponding keys from the index. Look into deleting old keys when re-indexing an existing record. Sample code:
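def fst(r, delete=False):
    # Collect the tag/value pairs for an indexing request.
    # A list is used (not a tuple) so that it can grow.
    idx = []
    if delete:
        idx = [0, 'd']
    idx.extend([
        0,   's',
        245, r.get(245),
        0,   'f',
        100, r.get(100),
    ])
    return idx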
 
r = IsisRec(...)
idx = IsisRec(fst(r))
db.write(r)
db.index(idx)
 
db.delete(r)
idx = IsisRec(fst(r, True))
db.index(idx)
  • Concurrent access to a record, locking.
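
One way to get the complete result list mentioned above is to keep asking for more results until none come back, which is how the test code of the port further down this page loops over mfns with db.query(None, False). The sketch below wraps that loop in a hypothetical function, `query_all`. `FakeDb` is a stand-in invented here so the example runs without a server; its page size of 20 comes from the note in the list above, not from the Malete documentation.

```python
def query_all(db, expr, recs=True):
    """Collect every result of a query by fetching page after page."""
    results = []
    page = db.query(expr, recs)
    while page:
        results.extend(page)
        page = db.query(None, recs)   # None asks for more of the previous query
    return results


class FakeDb(object):
    """Stand-in for IsisDb that serves results in pages (assumed: 20 per page)."""

    def __init__(self, hits, page_size=20):
        self.hits = list(hits)
        self.page_size = page_size
        self.pos = 0

    def query(self, expr=None, recs=True):
        if expr is not None:
            self.pos = 0              # a new expression restarts the result set
        page = self.hits[self.pos:self.pos + self.page_size]
        self.pos += len(page)
        return page


if __name__ == "__main__":
    db = FakeDb(range(1, 46))
    mfns = query_all(db, 'plant water', recs=False)
    print(len(mfns))
```

Whether this loop belongs inside query() itself or in a separate method is a design decision; keeping it separate leaves the paged behaviour available for very large result sets.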

First attempt: sockets

This is a first attempt at communication between Python and Malete:

"""
Connects to malete server, sending queries.
Usage: python sock.py 
 
Based on
 
    * Example 16.2. TCP Timestamp Client (tsTclnt.py) from Core Python Programming, 2nd ed.
    * Tutorial on Network Programming with Python <http://heather.cs.ucdavis.edu/~matloff/Python/PyNet.pdf>
"""
 
from socket import *
 
def main():
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect(('127.0.0.1', 2042))
    flo = sock.makefile('r', 0)  # flo = file-like object
 
    # indicates that no more data is coming
    END_OF_RESPONSE = '\n'
 
    while True:
        query = raw_input('> Query: ')
 
        if not query:
            break
        msg = 'test.Q\t%s\n\n' % query
        sock.sendall(msg)  # sendall() keeps sending until the whole message is out
        for line in flo:   # is the file-like object automatically reset for every query?
            print line,  # should we print also the last, empty line?
            if line == END_OF_RESPONSE:
                break  # no more lines to read
 
    sock.close()
 
 
 
if __name__ == "__main__":
    main()
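
The exchange in the script above follows a small framing rule: a request is a header line of the form `db.Q<TAB>expression` followed by an empty line, and a response is read line by line until an empty line arrives. The sketch below isolates those two steps as functions for Python 3. The names `build_query` and `read_response` are made up for this illustration, the response content is invented, and the response is read from any iterable of lines so that it can be tried without a running server.

```python
import io


def build_query(db_name, expr):
    """Build a query message: header line plus the terminating empty line."""
    return '%s.Q\t%s\n\n' % (db_name, expr)


def read_response(lines):
    """Return the lines of one response, stopping at the first empty line."""
    response = []
    for line in lines:
        if line == '\n':          # an empty line marks the end of the response
            break
        response.append(line.rstrip('\n'))
    return response


if __name__ == "__main__":
    print(repr(build_query('test', 'plant water')))
    # Simulated server output: two responses back to back (invented content).
    stream = io.StringIO('first\nsecond\n\nthird\n\n')
    print(read_response(stream))
    print(read_response(stream))
```

Because the reader consumes lines only up to the blank separator, a second call on the same in-memory stream picks up at the next response. Whether a socket's file-like object behaves the same way under buffering is the open question raised in the script's comments.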

Port of Malete's PHP code

Version of: 2008-03-28

# coding=utf-8
 
"""
malete
A module for accessing Malete databases.
This is essentially a Python port of the original PHP code included with
the Malete distribution. See http://malete.org/Doc/DownLoad
 
MIT License <http://www.opensource.org/licenses/mit-license.php>
 
(c) 2008 Fernando J. Gómez / INMABB / Conicet
 
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
 
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""
 
# FIELD mode replaces newlines with tabs.
# On deserializing, these tabs are not converted back to newline.
# Do not use if you need to retain newline information.
ISIS_REC_FIELD = '\t'  # ASCII Tab
 
# TEXT mode replaces newlines with vertical tabs.
# Vertical tabs are converted back to newlines only when explicitly
# deserializing in TEXT mode, since it's not transparent to binary data.
ISIS_REC_TEXT = '\v'  # ASCII Vertical Tab (VT)
 
 
# PHP has a strspn() function; this is an implementation in Python.
# Source: http://mail.python.org/pipermail/python-list/2003-November/237085.html
import re
import sys  # used by IsisServer.request() for debug output
def strspn(s, t):
    # kinda slow way to construct the pattern, but it does correctly
    # handle the '-' and ']' cases.  Usually one would write the regexp
    # directly and not try to get close to the C API.
    pat = re.compile(
        "(" + "|".join(map(re.escape, t)) + ")*"
    )
    m = pat.match(s)
    if not m:
        return 0
    return m.end()
 
 
class IsisRec():
    """
    An ISIS(/IIF/Z39.2/ISO2709)-style record in pure Python.
 
    This is only loosely connected to an Isis Database,
    most functions can be used without having a DB.
    """
 
    def __init__(self, *args):
        """
        Parameters:
            tag, value[, tag, value [...]]
 
        Example:
            r = malete.IsisRec(
                10, 'Value for field 10',
                20, 'Value for field 20'
            )    
        """
        self.db = 0
        self.mfn = 0
        self.head = ''
        self.tag = []
        self.val = []
        if args:
            self.add(args)    # args is passed as one tuple; add() unpacks it
 
    def __len__(self):
        """Counts the fields."""
        return len(self.tag)
 
    def __str__(self):
        return '--\n%s--' % self.toString()
 
    def fdt(self, tag):
        """
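        Tries to look up non-numeric tags (field names) in the db's fdt.
        Returns the tag unchanged if it is an int or cannot be resolved.
 
        Parameters:
            tag    An int tag, or a field name to be resolved via the fdt.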
        """
        if not isinstance(tag, int):
            if self.db and self.db.fdt and (tag in self.db.fdt):
                tag = self.db.fdt[tag]
        return tag
 
    def get(self, tag):
        """
        Gets all values for a tag as a list.
        FIXME: tags with leading zeros are treated as octal, e.g.
            >>> tag = 020
            >>> tag
            16
            >>> print 0101
            65
        How can this situation be detected?
 
        Parameters:
            tag    (int) A numeric tag.
        """
        tag = self.fdt(tag)
        values = [v for (t, v) in zip(self.tag, self.val) if t == tag]
        return values
 
    def recs(self, db=None):
        """
        Returns a list of subrecords.
 
        Parameters:
            db    (Optional) A database, so that records know which db they belong to.
        """
        ret = []
        # clone lists, so we can use pop() safely
        tag = list(self.tag)
        val = list(self.val)
 
        while tag:
            t, v = tag.pop(0), val.pop(0)
            if t < 0:  # negative tag => -(number of fields in record)
                # create a new record
                r = IsisRec()
                r.db = db
                r.head = v
                # TO-DO: r.mfn ?? 
                i = -int(t) - 1
                # add next i fields to the new record
                while i > 0 and tag:
                    i -= 1
                    t, v = tag.pop(0), val.pop(0)
                    r.tag.append(t)
                    r.val.append(v)
                    #print '%s -- %s' % (t, v)
                ret.append(r)
        return ret
 
    def append(self, tag, val):
        """
        Appends a new field (tag-value pair) to the end of the record.
        TO-DO: check use of isinstance() in Python
        FIXME - is_numeric()
 
        Parameters:
            tag    (int) A numeric tag.
            val    The field's value.
        """
        if not isinstance(tag, int):
            tag = self.fdt(tag)
        # echo "0\tappending $tag ",gettype($val),"\n"
        if isinstance(val, (basestring, int, long, float)):  # text or number (Python 2 types)
            self.tag.append(tag)
            self.val.append(val)
        elif isinstance(val, (list, tuple)):
            for v in val:
                self.append(tag, v)
        elif isinstance(val, IsisRec):  # only records can be embedded
            self.embed(val)
        return val
 
    def add(self, *args):
        """
        Adds a list to the record.
        Returns the number of added fields.
        See docs at Rec.php.
 
        Parameters:
            args    A list of the form [tag, value[, tag, value[...]]]
 
        Example:
            rec.add([100, 'Field 100', 200, 'Field 200'])
        """
        added = 0
        fdt = self.db and self.db.fdt or None
        # line omitted here
        # Accept both add([tag, value, ...]) and add(tag, value, ...).
        if len(args) == 1 and isinstance(args[0], (list, tuple)):
            args = list(args[0])
        else:
            args = list(args)
        while args:
            i = args.pop(0)
            #print i
            if isinstance(i, int):
                if self.append(i, args.pop(0)) is not None:
                    added += 1
            elif isinstance(i, list):
                added += self.add(i)  # recursive add
            elif i == '-mfn':
                self.mfn = args.pop(0)
            elif i == '-db':
                self.db = args.pop(0)
                fdt = self.db.fdt
            elif fdt and i in fdt and isinstance(fdt[i], int):
                if self.append(fdt[i], args.pop(0)) is not None:
                    added += 1
            elif i == ISIS_REC_TEXT:
                added += self.parse(args.pop(0), ISIS_REC_TEXT)
            else:
                added += self.parse(i)
        return added # NOTE: not in Rec.php
 
    def pack(self):
        pass  # pack is not needed in Python, since del() also shifts indices, leaving no 'holes'.
 
    def rm(self, pos):
        """
        Removes a field at the given pos.
 
        Parameters:
            pos    (int) The position (index) to remove.
        """
        del self.tag[pos]
        del self.val[pos]
 
    def delete(self, tag=None):
        """
        Removes all fields, or all fields with a given tag.
        Note: We use 'delete' since 'del' is a reserved keyword in Python.
 
        Parameters:
            tag    (Optional) Tag to be removed; if not present, all fields are
                   removed. 
        """
        if tag is None:
            self.tag = []
            self.val = []
        else:
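            if not isinstance(tag, int):
                tag = self.fdt(tag)  # fdt is a method, not a dictionary
            # Walk backwards: rm() shifts later indices towards 0.
            for i in reversed(range(len(self.tag))):
                if self.tag[i] == tag:
                    self.rm(i)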
 
    def set(self, tag, *values):
        """
        Sets fields with tag to values.
 
        TO-DO: if only tag is given, with no values, it behaves like delete(tag).
               Is this correct? 
 
        Parameters:
            tag       (int) A numeric tag.
            values    One or more values. See docs in Malete's Rec.php.
        """
        if not isinstance(tag, int):
            tag = self.fdt(tag)
        ary = None
        # isolate those indices in self.tag associated to tag, e.g. if there are 3 occs of tag '700'
        # in positions 6, 7, 9, then tag_positions = [6, 7, 9]
        tag_positions = [i for i, v in enumerate(self.tag) if v == tag]
        values = list(values)   # make the tuple a list
 
        while True:
 
            # First step: get the next value to set/add
 
            if ary:  # ary non empty
                value = ary.pop(0)
                #print "ary.pop(0): %s" % value
                #if not ary:   # the list is now empty
                #    ary = None
                #    continue
            else:
                if not values:
                    break
                value = values.pop(0)
                if isinstance(value, list):
                    ary = list(value)   # copy, so the caller's list is not consumed
                    continue
 
            #print "setting '%s'" % value
 
            # Second step: do something using the value
 
            # if value is an integer, it has an special meaning
            if isinstance(value, int):
                #print 'integer value: %s' % value
                # if value is the integer 0, processing stops (i.e. remaining occurrences are left unchanged)
                if not value:
                    #self.display()
                    return
                # if value is a positive integer n, processing skips n occurrences (letting them unchanged)
                #print 'value: %s' % value
                for i in range(value):
                    if tag_positions:
                        tag_positions.pop(0)
                continue
 
            # now value is finally a value to set/add
            #print "setting '%s'" % value
            if tag_positions:
                # the first len(values) occurrences are set to the provided values
                self.val[tag_positions.pop(0)] = value
                continue
            # if there are less than len(values) occurrences, the remaining values are appended
            self.append(tag, value)
 
        # if there are more than len(values) occurrences, the remaining occurrences are deleted
        # NOTE: after each call to self.rm() indices in self.tag are shifted (towards 0), and thus tag_positions is not what we need.
        # To avoid this problem, loop in reversed order.
        for i in reversed(tag_positions):
            #print 'removing pos. ' + str(i)
            self.rm(i)
 
        #self.display()
 
    def embed(self, other_rec):
        """
        Transparently embeds a record.
        Used from write() in IsisDb.
        Parameters:
            other_rec    IsisRec
        """
        i = len(other_rec)
        self.append(-i-1, other_rec.head)
        for t, v in zip(other_rec.tag, other_rec.val):
            self.tag.append(t)
            self.val.append(v)
            i -= 1
            if i == 0:
                break
 
    def toString(self, mode=ISIS_REC_TEXT):
        """
        Serializes record to a string.
        Parameter:
            mode  replacement value for newlines
        """
        s = ''
        if len(self.head):  # is it enough with "if self.head" ?
            if '0' <= self.head[0] <= '9':
                s += "W\t"
            s += self.head + '\n'
        for t, v in zip(self.tag, self.val):
            s += '%s\t%s\n' % (t, str(v).replace('\n', mode))   # str() because v may be numeric 
        return s
 
    def parse(self, text, repl=None):
        """
        Parses a string representation of a record and appends its fields.
        Returns the number of fields added.
        Parameters:
            text  The serialized record, one field per line.
            repl  String to be converted back to newlines. Use ISIS_REC_TEXT,
                  if you know text is from toString(ISIS_REC_TEXT)
        """
        lines = text.split("\n")
        if lines and len(lines[0]):
            line = lines[0]
            if not '0' <= line[0] <= '9':
                self.head = line
                lines.pop(0)
        added = 0
        for line in lines:
            if '' == line:  # blank line or trailing newline
                continue
            dig = strspn(line, '0123456789-')
            t = dig and int(line[:dig]) or 0
            o = ("\t" == line[dig:dig+1])  # slice: the line may end right after the tag
            v = line[dig+o:]
            if repl:
                v = v.replace(repl, "\n")
            self.tag.append(t)
            self.val.append(v)
            added += 1
        return added
 
 
class IsisDb():
    """
    This class represents a "database". It has a method for each of the standard
    Malete messages for databases: write, read, query, index, and terms.
    """
 
    def __init__(self, fdt=None, name=None, server=None):
        self.fdt = fdt
        self.name = name
        self.srv = server
 
    def req(self, type, arg, emb=None, lst=None, ct=0):
        """
        Internal helper to construct and send a request.
        Parameters:
            type    The type of message (R, W, Q, T, X)
            arg     Arguments to be added to the request's header
            emb     A list of IsisRecs to be embedded in the request's body
            lst     A list of parameters, to be added to the request's body as fields with tag 0 
            ct      numOnly?      
        """
        req = IsisRec()
        req.head = '%s.%s' % (self.name, type)
        if arg:
            req.head += '\t' + arg
        if emb:
            #print 'emb:', emb
            for r in emb:
                req.embed(r)
        if lst:
            for l in lst:
                req.append(0, l)
        #print "req:\n%s" % req
        return self.srv.request(req, ct)
 
    def query(self, expr=None, recs=True):
        """
        Parameters:
            expr  If None, fetch more results from previous query
            recs  If True, fetch a list of records, else of mfns
        """
        if expr and recs and '?' not in expr:
            expr += '?' # force fetch records
        ret = self.req('Q', expr)  # ret is an IsisRec instance
        if recs:
            return ret.recs(self)  # may be an empty list
        return ret.get(0)
 
    def read(self, mfn):
        """
        Read one or a list of mfns.
        Returns one or a list of records.
        Parameters:
            mfn    a single mfn, or a list of mfns
        """
        if isinstance(mfn, list):  # is mfn a list?
            ret = self.req('R', None, lst=mfn)
            return ret.recs(self)
        else:
            #ret = self.req('R', None, None, list(mfn))
            ret = self.req('R', str(mfn))
            recs = ret.recs(self)
            return recs[0]
 
    def terms(self, start, to=None):
        if to is not None:
            start += '\t' + to
        ret = self.req('T', start)
        #return ret.get(0)
        raw_list = ret.get(0)  # ["Count1\tTerm1", "Count2\tTerm2", ...]
        r = []
        for t in raw_list:
            data = t.split('\t')
            r.append({'key': data[1], 'count': data[0]})
        return r
 
    def write(self, rec):
        """
        Writes one or a list of records.
        Returns a list of mfns written.
        WARNING: check write permissions on the database files.
        Parameters:
            rec    a single IsisRec, or a list of IsisRecs
        """
        if not isinstance(rec, list):
            rec = [rec]   # make a list from a single element
        ret = self.req('W', None, emb=rec)
        return ret.get(0)
 
    def index(self, req):
        """
        Unlike the other methods, this expects 'req' to be a prepared X request.
        However, name.X is prepended.
        Returns res.head, which should be a comment.
        """
        pfx = self.name + '.X'
        if req.head:
            req.head = pfx + '\t' + req.head
        else:
            req.head = pfx
        res = self.srv.request(req)
        return res.head
 
 
class IsisServer():
    """
    This class represents the connection to an Isis server.
    In general, a server is any object having a request function,
    accepting a single IsisRec parameter and returning an IsisRec.
 
    This implementation is based on a TCP or UNIX socket.
 
    See:
        * Example 16.2. TCP Timestamp Client (tsTclnt.py) from Core Python Programming, 2nd ed.
        * Tutorial on Network Programming with Python <http://heather.cs.ucdavis.edu/~matloff/Python/PyNet.pdf>
        * Socket Programming HOWTO <http://www.amk.ca/python/howto/sockets/>
    """
    def __init__(self, host=None, port=2042, pers=0):
        if not host:
            import os
            if 'ISIS_SERVER' in os.environ:
                host = os.environ['ISIS_SERVER']
            else:
                host = 'localhost'
        self.host = host
        self.port = port
        self.pers = pers  # persistent connection (in Python?)
        self.dbg = False
        self.open()
 
    def open(self):
        # Persistence??
        import socket
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((self.host, self.port))
        except socket.error:
            print 'Error connecting to the Malete server. Check that it is running.'
            self.sock = None
        else:
            self.sock = sock.makefile('rw', 0)  # unbuffered file object, used for both writing and reading
        return self.sock
 
    def request(self, req, numOnly=0):
        if not self.sock and not self.open():
            return None
        if self.dbg:
            sys.stderr.write("SEND\n" + req.toString(ISIS_REC_TEXT))   # toString: serializes record
        self.sock.write(req.toString(ISIS_REC_TEXT) + "\n")
        #self.sock.flush()   needed??
        txt = ''
        if numOnly == 0:
            # return the retrieved records
            for line in self.sock:
                if line != '\n':
                    if self.dbg:
                        sys.stderr.write("RETR " + line)
                    txt += line
                else:
                    break
            res = IsisRec()
            res.parse(txt, ISIS_REC_TEXT)   # de-serialize record
            if self.dbg:
                sys.stderr.write("GOT " + res.toString())
            return res
        else:
            # return only the number of retrieved records
            count = 0  # stays 0 if the response has no '#' line
            for line in self.sock:
                if line == '\n':
                    break
                if line[0] == '#':
                    inf = line.rstrip('\n').split('\t')
                    if len(inf) > 1 and inf[1]:
                        count = inf[1]
            return count
 
 
 
#########################################################################
# Tests
#########################################################################
 
def test():
    """
    Some tests ported from malete's demo.php. Tests involving record formatting
    have been excluded here.
    2008-03-26: Output coincides with that of the PHP demo. 
    """
 
    def section(title):
        sep = '-'*40
        print '%s\n%s\n%s' % (sep, title.upper(), sep)
 
 
    fdt = {
        'title': 24,
        'author': 70,
        'keywords': 69
    }
    db = IsisDb(fdt, 'test')
    subs = 'initial	aParis	bUnesco	b<test=foo>	c-1965'   # NOTE: this includes TABs!
    r = IsisRec(
        '-db', db,
        # first some lines from CDS, some using field names, some plain int tags
        'keywords', 'Paper on: <plant physiology><moisture><temperature><wind><measurement and instruments><ecosystems>',
        'author', 'Magalhaes, A.C.',
        24, '<The> Controlled climate in the plant chamber',
        76, 'Les Politiques de la communication en Yougoslavie	zfre',
        'author', 'Franco, C.M.',
        26, subs,
        # a field to test delete
        77, 'ave Caesar',
        # a field using tab as subfield separator
        42, "foo\tbar\tbaz",
        # a field containing newline
        99, "two\nlines",
        # a serialized record (as of toString) as parameter
        "70\tyet another author\n99two more\n99lines\na 0 field\n42\tthe\tanswer"
    )
 
    ############################################
    section('dump of record')
    ############################################
    print 'Record has %s fields' % len(r)
    print r
 
    r.delete(77)  # ... morituri te salutant
 
    ############################################
    section('embedding and TEXT mode')
    ############################################
    q = IsisRec(77, 'sunset strip')  # create a new record
    q.embed(r)                       # embed r into the new record
    s = q.toString(ISIS_REC_TEXT)
    print 'Record embedded\n\n%s\n\n' % s
    q.delete()
    # restore from the string
    q.parse(s, ISIS_REC_TEXT)
    recs = q.recs()
    r = recs[0]
    r.db = db
    print 'Record restored\n\n%s\n\n' % r
 
    ############################################
    section('set operator')
    ############################################
    r.set('title', 'new title', 'second new title')
    r.set(99, 'now a oneliner')
    r.set('author', [1, 'Blanco', 0])
    print "\n%s\n" % r
 
    ############################################
    section('Server')
    ############################################
 
    db = IsisDb(fdt, 'test', IsisServer())
    if not db.srv.sock:
        print "could not contact server"
        return
 
    # terms beginning with 'a'
    terms = db.terms('a')
    print "got %s terms for 'a'" % len(terms)
    #for cnt, term in [t.split('\t') for t in terms]:
    for t in terms:
        print "'%s' (%s)" % (t['key'], t['count'])
 
    # query reading records
    recs = db.query('plant water')
    print "\ngot %s records for query 'plant water'" % len(recs)
    for r in recs:
        print '%s\n' % r
 
    # query reading mfns
    query = 'plant + water + devel$'
    mfns = db.query(query, False)
    print "Query: '%s'" % query
    while mfns:
        print "got %s mfns: %s" % (len(mfns), ','.join(mfns))
        mfns = db.query(None, False)
    print 
 
    print "reading 42, 43"
    recs = db.read([42, 43])
    for r in recs:
        print "\n%s" % r
 
    print "reading 42"
    r = db.read(42)
    print "\n%s\n" % r
 
    print "writing 42"
    r.append('author', 'one more author')
    print "\n%s\n" % r
    mfns = db.write(r)
    print "wrote %s mfns: %s\n" % (len(mfns), ','.join(mfns))
 
    print "writing 42 as new record"
    r.head = ''
    mfns = db.write(r)
    print "wrote %s mfns: %s\n" % (len(mfns), ','.join(mfns))
 
    print "indexing author fields as 70 in split mode"
    idx = IsisRec()
    idx.head = 's'
    idx.set(70, r.get('author'))
    print "\n%s\n" % idx
    res = db.index(idx)
    print "got %s\n" % res
 
    print "query 'one' near 'author'"
    mfns = db.query('one .. author', False)
    print "got %s mfns: %s" % (len(mfns), ','.join(mfns))
 
 
if __name__ == '__main__':
    test()
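
The line format used by toString() and parse() above can be tried in isolation. The sketch below is a simplified, standalone Python 3 rendering of that format, not the module's code: each field becomes `tag<TAB>value<NEWLINE>`, newlines inside a value are swapped for a vertical tab (TEXT mode) and swapped back on parsing. The function names `serialize` and `deserialize` are hypothetical, and the sketch ignores record heads, embedded records and lines without a tag.

```python
VT = '\v'   # TEXT mode marker, the same character as ISIS_REC_TEXT


def serialize(fields, mode=VT):
    """Turn (tag, value) pairs into 'tag<TAB>value' lines."""
    return ''.join(
        '%s\t%s\n' % (tag, str(value).replace('\n', mode))
        for tag, value in fields
    )


def deserialize(text, repl=VT):
    """Parse lines written by serialize() back into (tag, value) pairs."""
    fields = []
    for line in text.split('\n'):
        if not line:                      # trailing newline or blank line
            continue
        # Only the first tab separates tag and value; later tabs are data.
        tag, _, value = line.partition('\t')
        if repl:
            value = value.replace(repl, '\n')
        fields.append((int(tag), value))
    return fields


if __name__ == "__main__":
    fields = [(24, 'A title'), (99, 'two\nlines'), (42, 'foo\tbar\tbaz')]
    text = serialize(fields)
    print(repr(text))
    print(deserialize(text) == fields)
```

The text is split with split('\n') on purpose: str.splitlines() also treats the vertical tab as a line boundary, which would break a TEXT mode value into separate lines.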
python_y_malete.txt · Last modified: 11/02/2010 00:00 (external edit)