====== Accessing Malete databases from Python ======
Having managed to work with CDS/ISIS databases from Python via WXIS, we now want to do something similar with Malete databases.
===== Pending tasks =====
==== Code adjustments ====
* Check whether tag must be an int (what happens if it is a string, such as '245'?). **WARNING**: numeric tags have a serious problem. Python 2 reads an integer literal that starts with //0//, e.g. //020//, as an octal number (16 in this case), and if any digit falls outside the range 0-7, e.g. //090//, it raises a syntax error. So, to work safely with tags that have leading zeros, we seem to be forced to use strings. Pymarc 2.0 uses strings for tags. (Fortunately, [[http://www.python.org/dev/peps/pep-3127/|Python 3.0 removes the silent octal interpretation]]: there, a literal such as //020// is rejected as a syntax error.)
* if self.db and self.db.fdt and (tag in self.db.fdt): check what happens if fdt[tag] does not exist
* IsisServer.request: change the order of "if numOnly == 0:" ?
* def parse(self, text, repl=None): document the return value
* Review function calls that pass "None" positionally: use keyword arguments. Example: req('W', None, None, something)
* Add docstrings to the methods.
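On the tag-type question in the first item above, one option is to normalize every tag to a zero-padded string at the boundary of the API. The following is only a minimal sketch: the helper name ''normalize_tag'' and the three-digit width are assumptions made for illustration, not part of the module below, and it deliberately ignores fdt field names and the negative tags used for embedded records.

```python
def normalize_tag(tag, width=3):
    """Return a field tag as a zero-padded string, e.g. 20 -> '020'.

    Hypothetical helper: it accepts an int or a string of digits.
    """
    if isinstance(tag, int):
        if tag < 0:
            raise ValueError("negative tags are not plain field tags: %r" % (tag,))
        return str(tag).zfill(width)
    text = str(tag).strip()
    if not text.isdigit():
        raise ValueError("not a numeric tag: %r" % (tag,))
    return text.zfill(width)


# int() applied to a string always parses base 10, so leading zeros in
# string tags never trigger the octal reading described above.
decimal_value = int("020")

print(normalize_tag(20))
print(normalize_tag("090"))
```

With string tags the comparison ''t == tag'' in methods such as get() keeps working, provided tags are normalized both when stored and when looked up.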
==== Other issues ====
* Decide what the methods should return: JSON/dictionaries
* Exceptions? E.g. the database cannot be written to.
* Nonexistent database?
* The methods that read records or mfns (read, query) should return the complete list, not 20 at a time.
* See what can be done to simplify indexing, including an option to run it automatically after a write. Perhaps the write() method should take an ''index'' parameter, defaulting to ''True'', that triggers post-write indexing using an "fst" associated with the database. Similarly, a delete() method would take an ''index'' parameter, defaulting to ''True'', to remove the corresponding keys from the index. Consider how old keys get deleted when an existing record is re-indexed. Sample code:
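def fst(r, delete=False):
    # a list, not a tuple: tuples have no append()/extend()
    idx = []
    if delete:
        idx = [0, 'd']
    idx.extend([
        0, 's',
        245, r.get(245),
        0, 'f',
        100, r.get(100),
    ])
    return idx
# write a record, then index it
r = IsisRec(...)
idx = IsisRec(fst(r))
db.write(r)
db.index(idx)
# delete a record, then remove its keys from the index
db.delete(r)
idx = IsisRec(fst(r, True))
db.index(idx)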
* Concurrent access to a record; locking.
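The proposal above of a write() with an ''index'' parameter can be sketched as follows. Everything here is an assumption made for illustration: ''FakeDb'' is a stand-in that only records the calls it receives, records are plain dictionaries mapping a tag to a list of values, and the mode letters 'd', 's' and 'f' are simply copied from the sample code without verifying their meaning against Malete.

```python
class FakeDb(object):
    """Hypothetical stand-in for IsisDb; it only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def write(self, rec):
        self.calls.append(("write", rec))
        return ["1"]  # made-up mfn

    def index(self, entries):
        self.calls.append(("index", entries))
        return "# ok"  # made-up server comment


def fst(rec, delete=False):
    """Build index entries for a record held as a dict of tag -> values."""
    entries = [(0, "d")] if delete else []
    entries += [(0, "s")] + [(245, v) for v in rec.get(245, [])]
    entries += [(0, "f")] + [(100, v) for v in rec.get(100, [])]
    return entries


def write_indexed(db, rec, index=True):
    """Write a record and, unless index is False, index it right away."""
    mfns = db.write(rec)
    if index:
        db.index(fst(rec))
    return mfns


db = FakeDb()
record = {245: ["A title"], 100: ["An author"]}
mfns = write_indexed(db, record)
```

A matching delete wrapper would call ''fst(rec, delete=True)'' before or after removing the record; which order is safe is one of the open questions listed above.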
===== First attempt: sockets =====
This is a first attempt at communication between Python and Malete:
"""
Connects to malete server, sending queries.
Usage: python sock.py
Based on
* Example 16.2. TCP Timestamp Client (tsTclnt.py) from Core Python Programming, 2nd ed.
* Tutorial on Network Programming with Python
"""
from socket import *
def main():
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect(('127.0.0.1', 2042))
    flo = sock.makefile('r', 0)  # flo = file-like object
    # indicates that no more data is coming
    END_OF_RESPONSE = '\n'
    while True:
        query = raw_input('> Query: ')
        if not query:
            break
        msg = 'test.Q\t%s\n\n' % query
        sock.send(msg)
        for line in flo:  # is the file-like object automatically reset for every query?
            print line,  # should we print also the last, empty line?
            if line == END_OF_RESPONSE:
                break  # no more lines to read
    sock.close()
if __name__ == "__main__":
    main()
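The script above treats an empty line as the end of a response. That framing rule can be isolated in a small helper and tried out without a server. This is only a sketch in Python 3 syntax: the name ''read_response'' is an assumption, and the sample stream contents are made up rather than captured from Malete.

```python
import io


def read_response(flo):
    """Collect lines from a file-like object up to the first empty line."""
    lines = []
    for line in flo:
        if line == "\n":  # blank line: end of this response
            break
        lines.append(line.rstrip("\n"))
    return lines


# Two made-up responses back to back, each terminated by a blank line.
stream = io.StringIO(u"# 2\n42\n43\n\n# 1\n7\n\n")
first = read_response(stream)
second = read_response(stream)
print(first)
print(second)
```

With this in-memory stream the second call continues where the first one stopped. Whether the file object returned by ''sock.makefile()'' behaves the same way between queries is the question raised in the comments of the script, and it is not verified here.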
===== Port of Malete's PHP code =====
Version dated 2008-03-28
# coding=utf-8
"""
malete
A module for accessing Malete databases.
This is essentially a Python port of the original PHP code included with
the Malete distribution. See http://malete.org/Doc/DownLoad
MIT License
(c) 2008 Fernando J. Gómez / INMABB / Conicet
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""
# FIELD mode replaces newlines with tabs.
# On deserializing, these tabs are not converted back to newline.
# Do not use if you need to retain newline information.
ISIS_REC_FIELD = '\t' # ASCII Tab
# TEXT mode replaces newlines with vertical tabs.
# Vertical tabs are converted back to newlines only when explicitly
# deserializing in TEXT mode, since it's not transparent to binary data.
ISIS_REC_TEXT = '\v' # ASCII Vertical Tab (VT)
# PHP has a strspn() function; this is an implementation in Python.
# Source: http://mail.python.org/pipermail/python-list/2003-November/237085.html
import re
import sys  # used by IsisServer.request() for debug output
def strspn(s, t):
    # kinda slow way to construct the pattern, but it does correctly
    # handle the '-' and ']' cases. Usually one would write the regexp
    # directly and not try to get close to the C API.
    pat = re.compile(
        "(" + "|".join(map(re.escape, t)) + ")*"
    )
    m = pat.match(s)
    if not m:
        return 0
    return m.end()
class IsisRec():
    """
    An ISIS(/IIF/Z39.2/ISO2709)-style record in pure Python.
    This is only loosely connected to an Isis Database,
    most functions can be used without having a DB.
    """
    def __init__(self, *args):
        """
        Parameters:
            tag, value[, tag, value [...]]
        Example:
            r = malete.IsisRec(
                10, 'Value for field 10',
                20, 'Value for field 20'
            )
        """
        self.db = 0
        self.mfn = 0
        self.head = ''
        self.tag = []
        self.val = []
        if args:
            self.add(args)  # FIXME: args is a tuple, should be split
    def __len__(self):
        """Counts the fields."""
        return len(self.tag)
    def __str__(self):
        return '--\n%s--' % self.toString()
    def fdt(self, tag):
        """
        Tries to lookup non-numeric tags in the fdt.
        Parameters:
            tag     A numeric tag, or a field name defined in the fdt.
        """
        if not isinstance(tag, int):
            if self.db and self.db.fdt and (tag in self.db.fdt):
                tag = self.db.fdt[tag]
        return tag
    def get(self, tag):
        """
        Gets all values for a tag as a list.
        FIXME: tags with leading zeros are treated as octal, e.g.
            >>> tag = 020
            >>> tag
            16
            >>> print 0101
            65
        How can this situation be detected?
        Parameters:
            tag (int) A numeric tag.
        """
        tag = self.fdt(tag)
        values = [v for (t, v) in zip(self.tag, self.val) if t == tag]
        return values
    def recs(self, db=None):
        """
        Returns a list of subrecords.
        Parameters:
            db (Optional) A database, so that records know which db they belong to.
        """
        ret = []
        # clone lists, so we can use pop() safely
        tag = list(self.tag)
        val = list(self.val)
        while tag:
            t, v = tag.pop(0), val.pop(0)
            if t < 0:  # negative tag => -(number of fields in record)
                # create a new record
                r = IsisRec()
                r.db = db
                r.head = v
                # TO-DO: r.mfn ??
                i = -int(t) - 1
                # add next i fields to the new record
                while i > 0 and tag:
                    i -= 1
                    t, v = tag.pop(0), val.pop(0)
                    r.tag.append(t)
                    r.val.append(v)
                    #print '%s -- %s' % (t, v)
                ret.append(r)
        return ret
    def append(self, tag, val):
        """
        Appends a new field (tag-value pair) to the end of the record.
        TO-DO: check use of isinstance() in Python
        FIXME - is_numeric()
        Parameters:
            tag (int) A numeric tag.
            val The field's value.
        """
        if not isinstance(tag, int):
            tag = self.fdt(tag)
        # echo "0\tappending $tag ",gettype($val),"\n"
        if isinstance(val, str) or isinstance(val, int):  # or is_numeric(val)
            self.tag.append(tag)
            self.val.append(val)
        elif isinstance(val, list):
            for v in val:
                self.append(tag, v)
        elif isinstance(val, object):
            self.embed(val)
        return val
    def add(self, *args):
        """
        Adds a list to the record.
        Returns the number of added fields.
        See docs at Rec.php.
        Parameters:
            args A list of the form [tag, value[, tag, value[...]]]
        Example:
            rec.add([100, 'Field 100', 200, 'Field 200'])
        """
        added = 0
        fdt = self.db and self.db.fdt or None
        # line omitted here
        # Accept both add([tag, value, ...]) and add(tag, value, ...)
        if len(args) == 1 and isinstance(args[0], (list, tuple)):
            args = list(args[0])
        else:
            args = list(args)
        while args:
            i = args.pop(0)
            #print i
            if isinstance(i, int):
                if self.append(i, args.pop(0)) is not None:
                    added += 1
            elif isinstance(i, list):
                added += self.add(i)  # recursive add
            elif i == '-mfn':
                self.mfn = args.pop(0)
            elif i == '-db':
                self.db = args.pop(0)
                fdt = self.db.fdt
            elif fdt and i in fdt and isinstance(fdt[i], int):
                if self.append(fdt[i], args.pop(0)) is not None:
                    added += 1
            elif i == ISIS_REC_TEXT:
                added += self.parse(args.pop(0), ISIS_REC_TEXT)
            else:
                added += self.parse(i)
        return added  # NOTE: not in Rec.php
    def pack(self):
        pass  # pack is not needed in Python, since del() also shifts indices, leaving no 'holes'.
    def rm(self, pos):
        """
        Removes a field at the given pos.
        Parameters:
            pos (int) The position (index) to remove.
        """
        del self.tag[pos]
        del self.val[pos]
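    def delete(self, tag=None):
        """
        Removes all fields, or all fields with a given tag.
        Note: We use 'delete' since 'del' is a reserved keyword in Python.
        Parameters:
            tag (Optional) Tag to be removed; if not present, all fields are
                removed.
        """
        if tag is None:
            self.tag = []
            self.val = []
        else:
            if not isinstance(tag, int):
                tag = self.fdt(tag)  # fdt is a method, not a dictionary
            # rm() shifts the remaining indices towards 0, so collect the
            # positions first and remove them from the end backwards
            positions = [i for i, t in enumerate(self.tag) if t == tag]
            for i in reversed(positions):
                self.rm(i)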
    def set(self, tag, *values):
        """
        Sets fields with tag to values.
        TO-DO: if only tag is given, with no values, it behaves like delete(tag).
        Is this correct?
        Parameters:
            tag (int) A numeric tag.
            values One or more values. See docs in Malete's Rec.php.
        """
        if not isinstance(tag, int):
            tag = self.fdt(tag)
        ary = None
        # isolate those indices in self.tag associated to tag, e.g. if there are 3 occs of tag '700'
        # in positions 6, 7, 9, then tag_positions = [6, 7, 9]
        tag_positions = [i for i, v in enumerate(self.tag) if v == tag]
        values = list(values)  # make the tuple a list
        while True:
            # First step: get the next value to set/add
            if ary:  # ary non empty
                value = ary.pop(0)
                #print "ary.pop(0): %s" % value
                #if not ary: # the list is now empty
                #    ary = None
                #    continue
            else:
                if not values:
                    break
                value = values.pop(0)
                if isinstance(value, list):
                    ary = value
                    continue
            #print "setting '%s'" % value
            # Second step: do something using the value
            # if value is an integer, it has a special meaning
            if isinstance(value, int):
                #print 'integer value: %s' % value
                # if value is the integer 0, processing stops (i.e. remaining occurrences are left unchanged)
                if not value:
                    #self.display()
                    return
                # if value is a positive integer n, processing skips n occurrences (leaving them unchanged)
                #print 'value: %s' % value
                for i in range(value):
                    if tag_positions:
                        tag_positions.pop(0)
                continue
            # now value is finally a value to set/add
            #print "setting '%s'" % value
            if tag_positions:
                # the first len(values) occurrences are set to the provided values
                self.val[tag_positions.pop(0)] = value
                continue
            # if there are less than len(values) occurrences, the remaining values are appended
            self.append(tag, value)
        # if there are more than len(values) occurrences, the remaining occurrences are deleted
        # NOTE: after each call to self.rm() indices in self.tag are shifted (towards 0), and thus tag_positions is not what we need.
        # To avoid this problem, loop in reversed order.
        for i in reversed(tag_positions):
            #print 'removing pos. ' + str(i)
            self.rm(i)
        #self.display()
    def embed(self, other_rec):
        """
        Transparently embeds a record.
        Used from write() in IsisDb.
        Parameters:
            other_rec IsisRec
        """
        i = len(other_rec)
        self.append(-i-1, other_rec.head)
        for t, v in zip(other_rec.tag, other_rec.val):
            self.tag.append(t)
            self.val.append(v)
            i -= 1
            if i == 0:
                break
    def toString(self, mode=ISIS_REC_TEXT):
        """
        Serializes record to a string.
        Parameter:
            mode replacement value for newlines
        """
        s = ''
        if len(self.head):  # is it enough with "if self.head" ?
            if '0' <= self.head[0] <= '9':
                s += "W\t"
            s += self.head + '\n'
        for t, v in zip(self.tag, self.val):
            s += '%s\t%s\n' % (t, str(v).replace('\n', mode))  # str() because v may be numeric
        return s
    def parse(self, text, repl=None):
        """
        Parses a string representation of a record.
        Returns the number of fields parsed, which is what add() accumulates.
        NOTE: the original port returned the index of the last line instead,
        and failed when no lines were left after the head.
        Parameters:
            text The serialized record.
            repl String to be converted back to newlines. Use ISIS_REC_TEXT,
                if you know text is from toString(ISIS_REC_TEXT)
        """
        # need compact array in order to reliably know last index
        lines = text.split("\n")
        if lines and len(lines[0]):
            line = lines[0]
            if not '0' <= line[0] <= '9':
                self.head = line
                lines.pop(0)
        added = 0
        for line in lines:
            if '' == line:  # blank line or trailing newline
                continue
            dig = strspn(line, '0123456789-')
            t = dig and int(line[:dig]) or 0
            o = ("\t" == line[dig:dig+1])  # slice: safe when the line is a bare tag
            v = line[dig+o:]
            if repl:
                v = v.replace(repl, "\n")
            self.tag.append(t)
            self.val.append(v)
            added += 1
        return added
class IsisDb():
    """
    This class represents a "database". It has a method for each of the standard
    Malete messages for databases: write, read, query, index, and terms.
    """
    def __init__(self, fdt=None, name=None, server=None):
        self.fdt = fdt
        self.name = name
        self.srv = server
    def req(self, type, arg, emb=None, lst=None, ct=0):
        """
        Internal helper to construct and send a request.
        Parameters:
            type The type of message (R, W, Q, T, X)
            arg Arguments to be added to the request's header
            emb A list of IsisRecs to be embedded in the request's body
            lst A list of parameters, to be added to the request's body as fields with tag 0
            ct numOnly?
        """
        req = IsisRec()
        req.head = '%s.%s' % (self.name, type)
        if arg:
            req.head += '\t' + arg
        if emb:
            #print 'emb:', emb
            for r in emb:
                req.embed(r)
        if lst:
            for l in lst:
                req.append(0, l)
        #print "req:\n%s" % req
        return self.srv.request(req, ct)
    def query(self, expr=None, recs=True):
        """
        Parameters:
            expr If None, fetch more results from previous query
            recs If True, fetch a list of records, else of mfns
        """
        if expr and recs and '?' not in expr:
            expr += '?'  # force fetch records
        ret = self.req('Q', expr)  # ret is an IsisRec instance
        return recs and ret.recs(self) or ret.get(0)
    def read(self, mfn):
        """
        Read one or a list of mfns.
        Returns one or a list of records.
        Parameters:
            mfn a single mfn, or a list of mfns
        """
        if isinstance(mfn, list):  # is mfn a list?
            ret = self.req('R', None, None, mfn)
            return ret.recs(self)
        else:
            #ret = self.req('R', None, None, list(mfn))
            ret = self.req('R', str(mfn))
            recs = ret.recs(self)
            return recs[0]
    def terms(self, start, to=None):
        if to is not None:
            start += '\t' + to
        ret = self.req('T', start)
        #return ret.get(0)
        raw_list = ret.get(0)  # ["Count1\tTerm1", "Count2\tTerm2", ...]
        r = []
        for t in raw_list:
            data = t.split('\t')
            r.append({'key': data[1], 'count': data[0]})
        return r
    def write(self, rec):
        """
        Writes one or a list of records.
        Returns a list of mfns written.
        WARNING: check write permissions on the database files.
        Parameters:
            rec a single IsisRec, or a list of IsisRecs
        """
        if not isinstance(rec, list):
            rec = list((rec,))  # make a list from a single element
        ret = self.req('W', None, rec)
        return ret.get(0)
    def index(self, req):
        """
        Unlike the other methods, this expects 'req' to be a prepared X request.
        However, name.X is prepended.
        Returns res.head, which should be a comment.
        """
        pfx = self.name + '.X'
        if req.head:
            req.head = pfx + '\t' + req.head
        else:
            req.head = pfx
        res = self.srv.request(req)
        return res.head
class IsisServer():
    """
    This class represents the connection to an Isis server.
    In general, a server is any object having a request function,
    accepting a single IsisRec parameter and returning an IsisRec.
    This implementation is based on a TCP or UNIX socket.
    See:
    * Example 16.2. TCP Timestamp Client (tsTclnt.py) from Core Python Programming, 2nd ed.
    * Tutorial on Network Programming with Python
    * Socket Programming HOWTO
    """
    def __init__(self, host=None, port=2042, pers=0):
        if not host:
            import os
            if 'ISIS_SERVER' in os.environ:
                host = os.environ['ISIS_SERVER']
            else:
                host = 'localhost'
        self.host = host
        self.port = port
        self.pers = pers  # persistent connection (in Python?)
        self.dbg = False
        self.open()
    def open(self):
        # Persistence??
        import socket
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((self.host, self.port))
        except socket.error:
            print 'Error connecting to the Malete server. Check that it is running.'
            self.sock = None
        else:
            self.sock = sock.makefile('w', 0)  # file object associated with the socket
        return self.sock
    def request(self, req, numOnly=0):
        if not self.sock and not self.open():
            return None
        if self.dbg:
            sys.stderr.write("SEND\n" + req.toString(ISIS_REC_TEXT))  # toString: serializes record
        self.sock.write(req.toString(ISIS_REC_TEXT) + "\n")
        #self.sock.flush() needed??
        txt = ''
        if numOnly == 0:
            # return the retrieved records
            for line in self.sock:
                if line != '\n':
                    if self.dbg:
                        sys.stderr.write("RETR " + line)
                    txt += line
                else:
                    break
            res = IsisRec()
            res.parse(txt, ISIS_REC_TEXT)  # de-serialize record
            if self.dbg:
                sys.stderr.write("GOT " + res.toString())
            return res
        else:
            # return only the number of retrieved records
            inf = None  # stays None if the response has no '#' line
            for line in self.sock:
                if line != '\n':
                    if line[0] == '#':
                        inf = line.split('\t')
                else:
                    break
            return inf and len(inf) > 1 and inf[1] or 0
#########################################################################
# Tests
#########################################################################
def test():
    """
    Some tests ported from malete's demo.php. Tests involving record formatting
    have been excluded here.
    2008-03-26: Output coincides with that of the PHP demo.
    """
    def section(title):
        sep = '-'*40
        print '%s\n%s\n%s' % (sep, title.upper(), sep)
    fdt = {
        'title': 24,
        'author': 70,
        'keywords': 69
    }
    db = IsisDb(fdt, 'test')
    subs = 'initial aParis bUnesco b c-1965'  # NOTE: this includes TABs!
    r = IsisRec(
        '-db', db,
        # first some lines from CDS, some using field names, some plain int tags
        'keywords', 'Paper on: ',
        'author', 'Magalhaes, A.C.',
        24, ' Controlled climate in the plant chamber',
        76, 'Les Politiques de la communication en Yougoslavie zfre',
        'author', 'Franco, C.M.',
        26, subs,
        # a field to test delete
        77, 'ave Caesar',
        # a field using tab as subfield separator
        42, "foo\tbar\tbaz",
        # a field containing newline
        99, "two\nlines",
        # a serialized record (as of toString) as parameter
        "70\tyet another author\n99two more\n99lines\na 0 field\n42\tthe\tanswer"
    )
    ############################################
    section('dump of record')
    ############################################
    print 'Record has %s fields' % len(r)
    print r
    r.delete(77)  # ... morituri te salutant
    ############################################
    section('embedding and TEXT mode')
    ############################################
    q = IsisRec(77, 'sunset strip')  # create a new record
    q.embed(r)  # embed r into the new record
    s = q.toString(ISIS_REC_TEXT)
    print 'Record embedded\n\n%s\n\n' % s
    q.delete()
    # restore from the string
    q.parse(s, ISIS_REC_TEXT)
    recs = q.recs()
    r = recs[0]
    r.db = db
    print 'Record restored\n\n%s\n\n' % r
    ############################################
    section('set operator')
    ############################################
    r.set('title', 'new title', 'second new title')
    r.set(99, 'now a oneliner')
    r.set('author', [1, 'Blanco', 0])
    print "\n%s\n" % r
    ############################################
    section('Server')
    ############################################
    db = IsisDb(fdt, 'test', IsisServer())
    if not db.srv.sock:
        print "could not contact server"
        exit()
    # terms beginning with 'a'
    terms = db.terms('a')
    print "got %s terms for 'a'" % len(terms)
    #for cnt, term in [t.split('\t') for t in terms]:
    for t in terms:
        print "'%s' (%s)" % (t['key'], t['count'])
    # query reading records
    recs = db.query('plant water')
    print "\ngot %s records for query 'plant water'" % len(recs)
    for r in recs:
        print '%s\n' % r
    # query reading mfns
    query = 'plant + water + devel$'
    mfns = db.query(query, False)
    print "Query: '%s'" % query
    while mfns:
        print "got %s mfns: %s" % (len(mfns), ','.join(mfns))
        mfns = db.query(None, False)
    print
    print "reading 42, 43"
    recs = db.read([42, 43])
    for r in recs:
        print "\n%s" % r
    print "reading 42"
    r = db.read(42)
    print "\n%s\n" % r
    print "writing 42"
    r.append('author', 'one more author')
    print "\n%s\n" % r
    mfns = db.write(r)
    print "wrote %s mfns: %s\n" % (len(mfns), ','.join(mfns))
    print "writing 42 as new record"
    r.head = ''
    mfns = db.write(r)
    print "wrote %s mfns: %s\n" % (len(mfns), ','.join(mfns))
    print "indexing author fields as 70 in split mode"
    idx = IsisRec()
    idx.head = 's'
    idx.set(70, r.get('author'))
    print "\n%s\n" % idx
    res = db.index(idx)
    print "got %s\n" % res
    print "query 'one' near 'author'"
    mfns = db.query('one .. author', False)
    print "got %s mfns: %s" % (len(mfns), ','.join(mfns))
if __name__ == '__main__':
    test()
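The test above drains a query by calling ''db.query(None, False)'' until it returns nothing. The pending task of returning the complete result list, rather than one page at a time, could wrap that loop in a helper. This is a sketch under stated assumptions: ''query_all'' is a hypothetical name, and ''PagedDb'' is a made-up stand-in that imitates a paged query() so the loop can run without a server. Whether the same continuation works when fetching records instead of mfns has not been checked.

```python
class PagedDb(object):
    """Hypothetical stand-in for IsisDb: query() serves mfns page by page."""

    def __init__(self, mfns, page_size=20):
        self._mfns = list(mfns)
        self._page_size = page_size
        self._pos = 0

    def query(self, expr=None, recs=True):
        if expr is not None:  # a new expression restarts the result set
            self._pos = 0
        page = self._mfns[self._pos:self._pos + self._page_size]
        self._pos += len(page)
        return page


def query_all(db, expr, recs=False):
    """Run a query and keep fetching pages until an empty one comes back."""
    results = []
    page = db.query(expr, recs)
    while page:
        results.extend(page)
        page = db.query(None, recs)  # None: continue the previous query
    return results


db = PagedDb([str(n) for n in range(1, 46)])
all_mfns = query_all(db, 'plant + water + devel$')
print(len(all_mfns))
```

Collecting everything into one list is simple but holds the whole result set in memory; a generator that yields page by page would be an alternative for large result sets.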
{{tag>python malete}}