python_isisdb

Differences

This shows you the differences between two versions of the page.

python_isisdb [15/05/2009 11:40]
fernando created (moved from namespace notas)
python_isisdb [29/08/2009 00:00]
Line 1: Line 1:
-====== Towards a common interface to WXIS and Malete ====== 
  
- 
-We currently have two **IsisDb** classes in Python: one to access isis/wxis databases, another to access malete databases. We'd like to have a single **IsisDb** class that lets us work at a higher (more abstract) level, regardless of the underlying database.
- 
- 
-===== IsisDb methods ===== 
- 
-This table shows a rough "equivalence" between the methods of the two **IsisDb** classes. Parameters are given in brackets.
- 
-^wxis (xispy.py)^malete (malete.py)^preferred^
-|**mfnrange/doList**[from, to, count]|**read**[mfn]|**read**|
-|**search**[query, from, to, count, totals]|**query**[expr, recs] (from, to, count?)|?|
-|**keyrange/index**[from, to, count]|**terms**[from, to] (count?)|**terms**|
-|**update/write**|**write**[rec] (+**index**[idxrec])|**write**|
-|**update/edit**|**read**[mfn] + [lock?]| ? |
-|**update/delete**|**write** [empty rec] + [lock?]|**delete**|
-|**fullinv/invert**|**index** for each record?|**fullindex**, **invert**, **fullinv** ?|
-|**status**|**write** (//long write tests existence and writability of db//)| **status** |
-|**unlock**| ? | ? |
-|**create**| ? |**create**|
-|**extract**|? (the same method used to obtain idxrec)| ? |
- 
- 
-===== Technique ===== 
- 
-For an example of how to implement backend abstraction, see ''__init__.py'' in the ''django.db'' package, and explore the ''django.db'' folder.
- 
- 
-Idea for importing one of a number of options: 
- 
-<code>
-if option == 'A':
-    import A as db
-elif option == 'B':
-    import B as db
-</code>
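The same idea can be written with a lookup table instead of an if/elif chain, so that adding a backend means adding one entry. This is only a sketch: ''BACKEND_MODULES'' and ''load_backend'' are hypothetical names, and the standard-library modules ''json'' and ''pickle'' stand in for the real backend modules so that the example runs anywhere.

```python
import importlib

# Hypothetical mapping from option name to module name. 'json' and 'pickle'
# are stand-ins; the real entries would name the xispy and malete modules.
BACKEND_MODULES = {
    'A': 'json',
    'B': 'pickle',
}


def load_backend(option):
    """Import and return the module configured for ``option``."""
    try:
        module_name = BACKEND_MODULES[option]
    except KeyError:
        raise ValueError('Unknown backend: %r' % (option,))
    return importlib.import_module(module_name)


# The rest of the code only ever sees the name ``db``.
db = load_backend('A')
```

An unknown option fails early with a ''ValueError'', rather than leaving ''db'' undefined as the if/elif version would.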
- 
-===== Locking ===== 
- 
-How does concurrent access work in Malete? That is, what's the mechanism to avoid conflicting updates? 
- 
-See: 
-  * [[http://malete.org/Doc/Protocol|Protocol > messages and data]]
-  * [[http://malete.org/Doc/MultiProcess|MultiProcess]]
- 
- 
- 
-===== Database abstraction layers ===== 
- 
-Something to read about database abstraction layers: 
- 
-  * [[http://www.xaprb.com/blog/2006/08/13/four-types-of-database-abstraction-layers/|Four types of database abstraction layers]]
-  * [[http://jeremy.zawodny.com/blog/archives/002194.html|Database Abstraction Layers Must Die!]]
-  * [[http://wiki.python.org/moin/HigherLevelDatabaseProgramming|HigherLevelDatabaseProgramming]] (Python wiki)
- 
- 
- 
-===== Python examples (for relational databases) ===== 
- 
-  * [[http://www.python.org/dev/peps/pep-0249/|Python Database API Specification v2.0 (DB-API)]]
-
-  * [[http://pdo.neurokode.com/|Python Database Objects (PDO)]]
-
-  * [[https://storm.canonical.com|Storm]], an object-relational mapper (ORM) for Python developed at Canonical.
-
-  * [[http://www.sqlobject.org/|SQLObject]]
-
-  * [[http://www.sqlalchemy.org/|SQLAlchemy]]
-
-  * [[http://www.djangoproject.com/documentation/db-api/|Django Database API reference]]
- 
-See also: 
- 
-  * Core Python Programming: Chapter 21, Database Programming
- 
- 
-===== Miscellaneous notes ===== 
- 
-Malete offers some extra functionality. Would a common interface have to sacrifice some or all of it, or can that be avoided?
-
-Where are the main differences?
- 
-  * Query language 
-  * Indexing mechanism: FST replacement, char tables
-  * Index terms (length limit): probably not a problem 
- 
-For some of those Malete features not readily available in wxis, we could attempt some emulation (e.g. filters). 
- 
-What will these methods return? As in xispy, the methods should return JSON-like dictionary objects. xispy already does this; malete.py must be modified to do the same. I see no reason to keep, for example, the TAB as a separator in a response from Malete, but I'm not completely sure yet.
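As an illustration of dropping the TAB separator, a list of index terms could be converted as sketched below. The function name ''normalize_terms'' is hypothetical. The sketch assumes that each item is a string of the form ''term<TAB>postings'' and that the postings part is a plain integer count, which has not been checked against Malete's actual output. The optional ''count'' argument simply stops after that many terms.

```python
def normalize_terms(lines, count=None):
    """Turn 'term<TAB>postings' strings into a list of dictionaries.

    Assumption: each line holds one TAB, with an integer number of
    postings on its right-hand side.
    """
    terms = []
    for line in lines:
        if count is not None and len(terms) >= count:
            break  # stop once the requested number of terms is reached
        key, postings = line.split('\t', 1)
        terms.append({'key': key, 'postings': int(postings)})
    return terms


raw = ['A\t10', 'AB\t3', 'AG\t4']
print(normalize_terms(raw, count=2))
```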
- 
- 
-Using the common API, code would be written like this: 
- 
-<code python>
-# ** CONFIG (uncomment the desired option)
-DATABASE_BACKEND = 'wxis'
-#DATABASE_BACKEND = 'malete'
-# ** END CONFIG
-
-# create a database object
-db = IsisDb('/path/to/db')
-# create a raw query
-q = 'water and plants'
-# get the results
-res = db.search(query=q)  # list of records or list of mfns? total number (Isis_Total)?
-# process the results
-if res:
-    print("Found: %d results" % len(res))  # FIXME: len(res) may be less than total results
-    for r in sorted(res):  # sorted by what? To sort the result set, we need the *whole* set
-        pass  # what is r? an IsisRec? Just a dictionary?
-else:
-    print("No records found.")
-</code>
- 
-<code python>
-# Sketch of the common methods. "backend" is the configured backend name
-# ('malete' or 'wxis'). "self.db" is a placeholder for the backend-specific
-# object (from malete.py or xispy.py), so that e.g. read() does not call itself.
-
-def read(self, **params):
-    """
-    params (keyword arguments): start, to, count, list
-    If list is present, the other params are ignored.
-    Difference: Malete accepts an arbitrary sequence of mfns; wxis only adjacent mfns.
-    So if the mfns are not adjacent, wxis must be called several times, once for each interval of adjacent mfns.
-    We have two options here: modify list.xis so that it also accepts arbitrary lists, or keep list.xis as is
-    and call it repeatedly (less efficient!).
-    """
-    if backend == 'malete':
-        return self.db.read(params.get('list'))
-    elif backend == 'wxis':
-        return self.db.do_list(params.get('start'), params.get('to'), params.get('count'))
-
-def search(self, query, **params):   # params: start, to, count, total
-    """
-    TODO: Check compatibility of query syntax.
-    """
-    if backend == 'malete':
-        # returns mfns or records; TODO: the value for recs is still undecided
-        return self.db.query(expr=query, recs=None)
-    elif backend == 'wxis':
-        # returns mfns or records
-        return self.db.search(query=query, start=params.get('start'),
-                              to=params.get('to'), count=params.get('count'))
-
-def terms(self, **params):   # params: start, to, count
-    """
-    Malete does not accept a "count" parameter. But we can simulate it by requesting terms until the count limit is reached.
-    """
-    start, to, count = params.get('start'), params.get('to'), params.get('count')
-    if backend == 'malete':
-        # returns [] or [t1*TAB*p1 [, t2*TAB*p2 [, ...]]]  => modify the terms() method
-        return self.db.terms(start=start, to=to)
-    elif backend == 'wxis':
-        # returns [] or [{'key': t1, 'postings': p1} [, {'key': t2, 'postings': p2} [, ...]]]
-        return self.db.index(start=start, to=to, count=count)
-
-def write(self, rec):  # or a single param 'rec' (which is an IsisRec instance, and knows its mfn)
-    if backend == 'malete':
-        return self.db.write(rec)
-    elif backend == 'wxis':
-        mfn = rec.mfn
-        content = None  # TODO: which attribute of rec holds the content? (rec.???)
-        return self.db.write(mfn=mfn, content=content, lockid=None)  # TODO: lockid=??
-
-def delete(self, rec):
-    """
-    Malete has no special support for deleting records; writing an empty record has the same effect.
-    TODO: define a delete() method in malete.py? (Or should malete.py follow Malete's protocol very closely, without adding
-    this kind of convenience function?)
-    """
-    if backend == 'malete':
-        empty_rec = rec.mfn  # TODO: build the empty record (rec.mfn + EMPTY REC)
-        return self.db.write(empty_rec)
-    elif backend == 'wxis':
-        mfn = rec.mfn
-        return self.db.delete(mfn=mfn, lockid=None)  # TODO: lockid=??
-</code>
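If list.xis is kept as is, read() needs to split an arbitrary list of mfns into intervals of adjacent mfns and call wxis once per interval. A minimal sketch of that grouping step follows; ''adjacent_ranges'' is a hypothetical helper name, and the sketch sorts and de-duplicates the input, so the caller's original order is not preserved.

```python
def adjacent_ranges(mfns):
    """Group MFNs into (first, last) intervals of consecutive numbers."""
    ranges = []
    for mfn in sorted(set(mfns)):
        if ranges and mfn == ranges[-1][1] + 1:
            # mfn directly follows the current interval: extend it
            ranges[-1] = (ranges[-1][0], mfn)
        else:
            # gap found (or first mfn): start a new interval
            ranges.append((mfn, mfn))
    return ranges


print(adjacent_ranges([7, 1, 2, 3, 9, 8, 20]))
```

Each resulting interval would then map to one wxis call, so the cost grows with the number of gaps in the list rather than with its length.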
- 
- 
-===== Python data structures for Isis data ===== 
- 
-What are the simplest and most "natural" data structures for Isis data in Python? That is, just for storing the data, not for adding extra functionality (which requires custom objects). By "Isis data" I mean data sent from the database server (lists of records, single records, lists of index terms, status codes, error messages), or data sent to the database server (record content, parameters). The database server can be either Wxis or Malete.
- 
-A **list of index terms** may be stored as a **tuple of 2-tuples**, e.g. 
- 
-  >>> terms = (('A', 10), ('AB', 3), ('AG', 4))
-  >>> for term, count in terms:
-  ...   print('%3d -- %s' % (count, term))
-  ...  
-   10 -- A 
-    3 -- AB 
-    4 -- AG 
- 
-A tuple or list of dictionaries is also possible, but such verbosity seems unnecessary, unless we want to store extra pieces of information on each term, such as detailed posting information:
- 
-  >>> terms = ({"term": "A", "count": 10}, {"term": "AB", "count": 3}, {"term": "AG", "count": 4})
-  >>> for t in terms:
-  ...   print("%3d -- %s" % (t['count'], t['term']))
-  ...  
-   10 -- A 
-    3 -- AB 
-    4 -- AG 
- 
- 
-A **record** is basically a list of fields (including an optional leader), plus a record id or MFN. Depending on the context, a record may be considered mutable or immutable, so the list of fields may be respectively stored as a list or a tuple. Each field is a 2-tuple of the form (tag, value). Some alternatives are: 
- 
-  >>> r = [(0, 'Leader'), (10, 'An author'), (20, 'Some title'), (30, 'Pub. date')]
-
-  >>> r = ('002345', [(0, 'Leader'), (10, 'An author'), (20, 'Some title'), (30, 'Pub. date')])
-
-  >>> r = {"record_id": "002345", "fields": [(0, 'Leader'), (10, 'An author'), (20, 'Some title'), (30, 'Pub. date')]}
-
-  >>> r = {"record_id": "002345", "leader": "foobar", "fields": [(10, 'An author'), (20, 'Some title'), (30, 'Pub. date')]}
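One consequence of storing fields as a list of (tag, value) tuples is that field order is kept and the same tag may appear more than once, which a plain tag-to-value dictionary would not allow. Looking up a tag then means scanning the list. A minimal sketch, using the last of the alternatives above; ''field_values'' is a hypothetical helper, and the second author is added only for illustration.

```python
def field_values(record, tag):
    """Return every value stored under ``tag``, in record order."""
    return [value for (t, value) in record['fields'] if t == tag]


r = {"record_id": "002345", "leader": "foobar",
     "fields": [(10, 'An author'), (10, 'Another author'), (20, 'Some title')]}

print(field_values(r, 10))
```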
- 
- 
-A **list of records** can be stored as a tuple, or as a list in case we need to modify it (e.g. change its order): 
- 
-  >>>​ results = (rec_1, rec_2, ...., rec_n) 
-  >>>​ results = [rec_1, rec_2, ...., rec_n] 
- 
-Another useful object is a **query**, which would be more abstract, since it encapsulates both the //query expression// used to retrieve a set of records, and the //list of retrieved records// themselves. (See Django's //QuerySet// class.)
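A rough sketch of such a query object is given below. It is not Django's QuerySet, only an illustration of the idea: the class name ''Query'', its attributes, and the ''FakeDb'' stand-in are all hypothetical, and the only thing assumed of the database object is a ''search(query=...)'' method like the one proposed for the common API.

```python
class Query(object):
    """Pairs a query expression with the records it retrieves."""

    def __init__(self, db, expression):
        self.db = db
        self.expression = expression
        self._results = None  # nothing fetched yet

    def results(self):
        # Run the search only once, on first access, and cache the list.
        if self._results is None:
            self._results = list(self.db.search(query=self.expression))
        return self._results

    def __len__(self):
        return len(self.results())

    def __iter__(self):
        return iter(self.results())


class FakeDb(object):
    """Stand-in backend so the sketch runs without a real database."""

    def __init__(self):
        self.calls = 0

    def search(self, query):
        self.calls += 1
        return [{'record_id': '000001'}, {'record_id': '000002'}]


q = Query(FakeDb(), 'water and plants')
print(len(q), [rec['record_id'] for rec in q])
```

Because the result list is cached, asking for the length and then iterating triggers a single search on the backend.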
- 
-===== Perl and CDS/ISIS ===== 
- 
- 
-See http://search.cpan.org/~dpavlin/Biblio-Isis-0.24/lib/Biblio/Isis.pm
- 
-<file>
-Biblio::Isis - Read CDS/ISIS, WinISIS and IsisMarc database
-
-This module will read ISIS databases created by DOS CDS/ISIS, WinIsis or IsisMarc. It can be used as perl-only
-alternative to OpenIsis module which seems to depriciate it's old XS bindings for perl.
-
-It can create hash values from data in ISIS database (using to_hash), ASCII dump (using to_ascii) or just hash
-with field names and packed values (like ^asomething^belse).
-
-Unique feature of this module is ability to include_deleted records. It will also skip zero sized fields (OpenIsis
-has a bug in XS bindings, so fields which are zero sized will be filled with random junk from memory).
-
-It also has support for identifiers (only if ISIS database is created by IsisMarc), see to_hash.
-
-This module will always be slower than OpenIsis module which use C library. However, since it's written in perl,
-it's platform independent (so you don't need C compiler), and can be easily modified. I hope that it creates data
-structures which are easier to use than ones created by OpenIsis, so reduced time in other parts of the code
-should compensate for slower performance of this module (speed of reading ISIS database is rarely an issue).
-</file>
- 
-{{tag>python isis malete}}
python_isisdb.txt · Last modified: 29/08/2009 00:00 (external edit)