====== BECYT e-books ======

In July 2011, at the INMABB Library, we added to the catalog the records of about 1000 mathematics e-books published by Springer, which are part of the Biblioteca Electrónica de Ciencia y Tecnología, BECYT ([[http://inmabb-conicet.gob.ar/biblioteca/noticias/incorporamos-libros-electronicos-al-catalogo|see the news item]]).

This document describes the procedures we followed and points out some open issues.


===== Analysis of the records =====


Analysis of the MARC records for the Springer e-book collection in BECYT.

Fernando Gómez, July 2011.


Potentially useful documentation:

  * [[http://www.slideshare.net/dok205/batch-cataloging-a-case-study|Batch Cataloging : a Case Study of Loading Records for Lecture Notes in Computer Science]]
  * Provider-neutral ebook records (J. Rochkind), http://bibwild.wordpress.com/2010/09/07/further-adventures-in-provider-neutral-e-book-bibs/, http://bibwild.wordpress.com/2010/08/26/provider-neutral-ebook-records-help/. It deals with a very specific problem, but is still interesting because it shows what kinds of problems people face when importing batches of vendor-supplied records.


==== Obtaining the records ====

Zip file sent by Marcio Gama, Licensing Manager Brazil, Springer+Business and Media Inc., on 7 July 2011. According to him, the records correspond to the books acquired by BECYT.


==== Processing ====

zip > mrc > mkr > id > isis (quasi-Catalis)

For this conversion I use a couple of Python tools. The pymarc library:

  http://sourceforge.net/projects/pymarc/files/latest/download

and the script [[mkr2cat.py|mkr2cat.py]].

Running these commands yields an ISIS database with the records:

  python /ruta/al/archivo/MarcBreaker.py < foo.mrc > foo.mkr
  python /ruta/al/archivo/mkr2cat.py foo
  id2i foo.id create=foo


Leader: I use fields 1000 + offset (1005, 1006, etc.) so as not to interfere with the 9xx fields of the original records.

The mkr-to-id converter (mkr2cat.py) takes care of converting the character encoding from UTF-8 to Latin-1.

IMPORTANT: trim surplus whitespace! See e.g. field 950.


==== Analysis ====


=== Character encoding ===

All records have 'a' in LDR/09 (that is, Unicode).

  $ grep '=LDR' springer.mkr | awk '{print substr($0, 16, 1) }' | grep -v 'a'  # returns nothing

  $ file springer.mkr 
  springer.mkr: UTF-8 Unicode English text, with very long lines

=== mxf0 ===

Something fails on my PC when generating the HTML table: some rows come out with garbage. On the BC server it comes out fine.

See the table at http://catalis.uns.edu.ar/becyt-springer/springer-mxf0.html


=== Remarks on the fields ===

== LDR (leader) ==

Positions 05-11 are the same in all records:

  $ grep LDR springer.mkr | awk '{print substr($0,12,7)}' | sort | uniq 
  nmm a22

  * LDR/05 (Record status): n - New
  * LDR/06 (Type of record): m - Computer file
  * LDR/07 (Bibliographic level): m - Monograph/Item
  * LDR/09 (Character coding scheme): a - UCS/Unicode 
   
Positions 17-23 are also the same:

  $ grep LDR springer.mkr | awk '{print substr($0,24,7)}' | sort | uniq 
  5u 4500

  * LDR/17 (Encoding level): 5 - Partial (preliminary) level
  * LDR/18 (Descriptive cataloging form): u - Unknown
  * LDR/19 (Multipart resource record level): # - Not specified or not applicable

Note that LDR/18 makes no reference to AACR2.
   

== Field 001 ==

The ISBN-13 is used as the control number. There are no duplicates.

  $ mx springer "pft=v1/" now | wc -l
  9810
  $ mx springer "pft=v1/" now | sort | uniq | wc -l
  9810


== Field 003 ==

All identical; they contain the text 'Springer':
  $ mx springer "pft=v3/"  now | wc -l
  9810
  $ mx springer "pft=v3/" now | sort | uniq
  Springer


== Field 005 ==

This is the range of values:

  $ mx springer "pft=v5/" now | sort | head -n 1
  20070420181308.3
  $ mx springer "pft=v5/" now | sort | tail -n 1
  20110626112017.0

TODO: count the frequency of each year, 2007-2011
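
A minimal Python sketch of how that count could be done. It assumes the 005 values have already been dumped one per line (for instance from the output of ''mx springer "pft=v5/" now''); the function name ''count_years'' and the short sample list are illustrative only, and the sample holds just a few values copied from this page, not the whole file.

<code python>
from collections import Counter


def count_years(values):
    """Count 005 timestamps by their leading four-digit year."""
    years = Counter()
    for value in values:
        value = value.strip()
        # A 005 value starts with yyyymmdd; skip anything that does not.
        if len(value) >= 4 and value[:4].isdigit():
            years[value[:4]] += 1
    return years


# A few values copied from this page (not the full set of 9810).
sample = [
    "20070420181308.3",
    "20110626112017.0",
    "20090320210707.452",
    "20080412121406.372",
    "20080726091820.791",
]

for year, n in sorted(count_years(sample).items()):
    print(year, n)
</code>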

The mxf0 table suggests there is some junk:

  $ mx springer "pft=if size(v5) <> 16 then mfn,x3,v5/ fi" now
  000663   20090320210707.452
  000893   20080412121406.372
  001931   20080726091820.791
  002484   20081211185036.775
  002815   20090202084339.210
  003520   20090307113523.91
  004276   20080205134805.232
  008189   20080805073649.192
  008690   20080414162129.764
  009096   20080630155607.776
  009221   20080825054210.164


== Field 007 ==

All identical; they contain the string "''cr nn 008mamaa''".

  $ mx springer "pft=v7/" now | wc -l
  9810
  $ mx springer "pft=v7/" now | sort | uniq
  cr nn 008mamaa

Docs: http://www.loc.gov/marc/bibliographic/bd007.html, http://www.loc.gov/marc/bibliographic/bd007c.html

  * 007/00: c - Electronic resource
  * 007/01: r - Remote (Specific material designation)
  * 007/02: # - Undefined
  * 007/03: n - Not applicable (Color)
  * 007/04: n - Not applicable (Dimensions)
  * 007/05: # - No sound (silent)
  * 007/06-08: 008 - Exact bit depth (Image bit depth)
  * 007/09: m - Multiple file formats (File formats)
  * 007/10: a - Absent (Quality assurance target(s))
  * 007/11: m - Mixed (Antecedent/Source)
  * 007/12: a - Uncompressed (Level of compression)
  * 007/13: a - Access (Reformatting Quality)


== Field 008 ==

TODO: count frequencies for 008/00-01

008/06: "s" in all records

  $ mx springer "pft=v008*6.1/" now | sort | uniq
  s

008/07-10: date 1

  $ mx springer "pft=v008*7.4/" now | sort | uniq -c
  3048 2005
  3335 2006
  3427 2007

Compare with the values in 260$c:

  $ mx springer "pft=v260^c/" now | sort | uniq -c
       12 2005
     3036 2005.
     3335 2006.
        2 2007
     3425 2007.

008/11-14: date 2

  $ mx springer "pft=v008*11.4/" now | sort | uniq
  \\\\

008/15-17: place

  $ mx springer "pft=v008*15.3/" now | sort | uniq
  xx\

008/18-34: computer files, see http://www.loc.gov/marc/bibliographic/bd008c.html

008/26 (Type of computer file): j - Online system or service

  $ mx springer "pft=v008*18.17/" now | sort | uniq
  \\\\\\\\j\\\\\\\\

008/35-37: language

  $ mx springer "pft=v008*35.3/" now | sort | uniq -c
  9578 eng
   124 fre
   106 ita
     1 por
     1 spa

008/38-39:

  $ mx springer "pft=v008*38.2/" now | sort | uniq 
   d


== Field 020 ==

All records carry the ISBN-13. The field is not repeated.

TODO: validate the ISBNs.
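
The validation could be sketched in Python as follows. It applies the standard ISBN-13 check: weights 1 and 3 alternate over the thirteen digits and the weighted sum must be a multiple of 10. The function name ''is_valid_isbn13'' is hypothetical, and the sample values are ones that appear elsewhere on this page.

<code python>
def is_valid_isbn13(isbn):
    """Check the length, the characters and the check digit of an ISBN-13."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... starting from the first digit.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


for isbn in ["9780387713137", "978-0-387-76579-2", "9780387713138"]:
    print(isbn, is_valid_isbn13(isbn))
</code>

Note that this only checks internal consistency of the number; it says nothing about whether the ISBN belongs to the title in the record.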


== Field 100 ==

It is striking that 9800 of the 9810 records have a 100 field. That proportion, 0.999, is not normal for collections of scientific texts. For example, in the INMABB book database, of 766 records with 'Springer' in field 260, 670 have a 100 field, a proportion of 0.875. As can be verified by looking at 245$c, in many cases these are editors who should (according to AACR2) go in a 700 field.

Examples of the lack of authority control and of the inconsistent use of capitals:

  1#^aAbbass, Hussein.
  1#^aAbbass, Hussein A.
  
  1#^aAbidi, Mongi.
  1#^aAbidi, Mongi A.
  
  1#^aAcuña, Silvia T.
  1#^aAcuña, Silvia Teresita.
  
  1#^aALBER, YAKOV.
  
  1#^aAsama, Hajima.
  1#^aAsama, Hajime.
  
  1#^aBADER, MARKUS.
  
  1#^aBAINBRIDGE, WILLIAM SIMS.
  
  1#^aBourbaki.
  1#^aBourbaki, N.
  
  1#^aFienberg, Stephen.
  1#^aFienberg, Stephen E.
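
A rough Python sketch for surfacing candidate variants like these. It groups headings by surname plus the first initial of the forename, ignoring case and the trailing period. This heuristic is my own assumption, not an authority-control method: it will over-group people who share a surname and initial, and it misses pairs such as "Bourbaki." / "Bourbaki, N." where one form has no forename at all. The names ''name_key'' and ''candidate_variants'' are hypothetical.

<code python>
from collections import defaultdict


def name_key(heading):
    """Grouping key: (surname, first initial of forename), case-folded."""
    text = heading.strip().rstrip(".").casefold()
    surname, _, forename = text.partition(",")
    return (surname.strip(), forename.strip()[:1])


def candidate_variants(headings):
    """Return the groups that contain more than one distinct form."""
    groups = defaultdict(set)
    for heading in headings:
        groups[name_key(heading)].add(heading)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# $a values copied from the examples in this section.
sample = [
    "Abbass, Hussein.", "Abbass, Hussein A.",
    "Asama, Hajima.", "Asama, Hajime.",
    "ALBER, YAKOV.",
    "Bourbaki.", "Bourbaki, N.",
]

for key, forms in candidate_variants(sample).items():
    print(key, forms)
</code>

The output is only a list of candidates for a human to review.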


== Field 110 ==

The 10 records without a 100 field have a 110.

TODO: show them.


== Field 245 ==

All have ''^h[electronic resource]''.

Title-style capitalization is used (every significant word is capitalized). Examples:

  Physics and Radiobiology of Nuclear Medicine
  The History of Approximation Theory
  Introduction to Symplectic Dirac Operators


== Field 250 ==

In this field it is easy to see that the AACR2 prescriptions are not followed:

  $ mx springer "pft=v250^a/" lw=250 now | sort | uniq

A small sample of the output:

  2nd.
  2nd arranged and supplemented edition.
  2nd corrected and augmented edition.
  2nd ed.
  2nd edition.
  2nd Edition.
  2nd extended and revised edition.
  2nd Revised and Enlarged Edition.
  2nd, Revised and Enlarged Edition.
  2nd revised and extended edition.
  2nd revised Edition.
  2nd Revised Edition.


== Field 260 ==

A bit of everything here, sometimes far from AACR2. Examples of deviations:

  ##^aBerkeley, CA :^bAndy Budd, Simon Collison, Chris J. Davis, Michael Heilemann,
  John Oxton, David Powers, Richard Rutter, Phil Sherry,^c2006.
    
  ##^aBasel :^bBirkhäuser Verlag, P.O. Box 133, CH-4010 Basel, Switzerland,^c2005.
    
  ##^aBerlin, Heidelberg :^bMax-Planck-Gesellschaft zur Förderung der Wissenschaften
  e.V., to be exercised by Max-Planck-Institut für ausländisches öffentliches Recht
  und Völkerrecht, Heidelberg,^c2007.
    
  ##^aDordrecht :^bH. S. Ching, P. W. T. Poon and C. McNaught,^c2006.
    
  ##^aBerlin, Heidelberg :^bSpringer-Verlag Berlin · Heidelberg,^c2005.


== Field 300 ==

Everything is simple here (but based on which standard?):

  $ mx springer "pft=v300/" lw=250 now | sort | uniq
  ##^bv.: digital


== Field 440 ==

Another case of missing normalization. Example:

  $ mx springer "pft=v440/" lw=250 now | grep -i MEMS | sort | uniq
  #0^aMicrotechnology and Mems,^x1615-8326
  #0^aMicrotechnology and MEMS,^x1615-8326
  #0^aMicrotechnology And Mems,^x1615-8326


== Field 650 ==

Docs: http://www.loc.gov/marc/bibliographic/bd650.html

There are 87,497 occurrences in 9,810 records: an average of almost 9 terms per record!

Let's start by looking at the indicators:

  $ mx springer "pft=(v650.2/)" now | sort | uniq -c
    39929 #0
     9810 14
    37758 24

The first indicator is "Level of subject", and these are the values used:

  * # - No information provided
  * 1 - Primary
  * 2 - Secondary 

The second indicator is "Thesaurus", and these are the values used:

  * 0 - Library of Congress Subject Headings
  * 4 - Source not specified 

A striking feature is the use of multiple terms within a single occurrence of the field. Examples:

  Popular Science in Mathematics/Computer Science/Natural Science/Technology
  Quantum Optics, Quantum Electronics, Nonlinear Optics
  Roman Law/Law History/Canon Law
  Theoretical, Mathematical and Computational Physics
  Tribology, Corrosion and Coatings
  Vibration, Dynamical Systems, Control
  Waste Management/Waste Technology
  Waste Water Technology / Water Pollution Control / Water Management /

The same descriptor can appear more than once in a record, with different indicators:

  $ mx springer "pft=if v650:'0^aPhysics' and v650:'4^aPhysics' then mfn/ fi" now | wc -l
  765

  $ mx springer from=9679 count=1 "pft=(v650/)" | grep "\^aPhysics"
  #0^aPhysics
  14^aPhysics

Some occurrences of 650 have neither indicators nor subfields:

  $ mx springer "pft=(v650/)" now | sort | uniq -c | tail
  
   93 Aquatic Pollution
   63 Geosciences
   33 Policy, and Law
    6 Sciences

This is an odd case; I have not looked at it closely:

  33 and Law  (related to the one above?)

Other errors:

   53 ^aText processing (Computer science
    1 ^aText processing (Translators (Computer programs)


== Field 700 ==

No comments. Write something later, jointly with field 100.


== Field 710 ==

Present in every record. It is used only to name Springer:

  $ mx springer "pft=(v710/)" now | wc -l
  9810

  $ mx springer "pft=(v710/)" now | sort | uniq 
  2#^aSpringerLink (Online service)


== Field 773 ==

This is a notable case, since the use of the field does not seem to relate to its definition in MARC 21. Or am I missing something?

  $ mx springer "pft=(v773/)" now | sort | uniq -c
    51 0#^tSpringer e-books
  9759 0#^tSpringer eBooks

    
== Field 856 ==

Present once in each record. I expected no surprises here, but it is striking that there are not 9810 distinct values; that is, some records share the same 856.

  $ mx springer "pft=(v856/)" now | wc -l
  9810

  $ mx springer "pft=(v856/)" now | sort | uniq | wc -l
  9784

Do we have duplicates in some sense? Example:

  $ mx springer from=1931 "pft=v1/v20/v245/v250/v260/v856/"
  978-0-387-76579-2
  ##^a9780387713137
  10^aCultural Heritage and Human Rights^h[electronic resource] /^cedited by 
  Helaine Silverman, D. Fairchild Ruggles.
  ##^aNew York, NY :^bSpringer Science + Business Media, LLC,^c2007.
  40^uhttp://dx.doi.org/10.1007/978-0-387-71313-7

  $ mx springer from=2919 "pft=v1/v20/v245/v250/v260/v856/"
  978-0-387-71312-0
  ##^a9780387713137
  10^aCultural Heritage and Human Rights^h[electronic resource] /^cedited by 
  Helaine Silverman, D. Fairchild Ruggles.
  ##^aNew York, NY :^bSpringer-Verlag New York,^c2007.
  40^uhttp://dx.doi.org/10.1007/978-0-387-71313-7
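
In both records of this example the digits in 001 differ from those in 020, while 020 agrees with the tail of the DOI in 856. A small Python sketch of a consistency check along those lines follows. It assumes the URL has the form ''http://dx.doi.org/10.1007/<ISBN>'' seen in this example, which may not hold for every record; the function names are hypothetical.

<code python>
def only_digits(text):
    """Keep just the digits, dropping hyphens and any other characters."""
    return "".join(ch for ch in text if ch.isdigit())


def identifiers_agree(f001, f020a, url):
    """True when 001, 020$a and the last path segment of the URL match."""
    doi_tail = url.rsplit("/", 1)[-1]
    return only_digits(f001) == only_digits(f020a) == only_digits(doi_tail)


# The two records from the example (001, 020$a, 856$u).
records = [
    ("978-0-387-76579-2", "9780387713137", "http://dx.doi.org/10.1007/978-0-387-71313-7"),
    ("978-0-387-71312-0", "9780387713137", "http://dx.doi.org/10.1007/978-0-387-71313-7"),
]

for f001, f020a, url in records:
    print(f001, identifiers_agree(f001, f020a, url))
</code>

Records for which the check fails would be the first ones to inspect by hand.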

    
== Field 912 ==

Local field. I don't know what it encodes.

  $ mx springer "pft=(v912/)" now | sort | uniq -c
       92 ##^aZDB-2-BHS
      465 ##^aZDB-2-CMS
      326 ##^aZDB-2-CWD
      482 ##^aZDB-2-EES
     1125 ##^aZDB-2-ENG
     1594 ##^aZDB-2-LNC
       63 ##^aZDB-2-LNM
       72 ##^aZDB-2-LNP
      762 ##^aZDB-2-PHA
      629 ##^aZDB-2-SBE
      945 ##^aZDB-2-SBL
     2239 ##^aZDB-2-SCS
      755 ##^aZDB-2-SHU
      976 ##^aZDB-2-SMA
     1014 ##^aZDB-2-SME


== Field 950 ==

The subject area or collection according to Springer. There is inconsistency due to trailing spaces:

  $ mx springer "pft=('«'v950'»'/)" now | sort | uniq -c
        1 «##^aBehavioral Science (Springer-11640) »
       91 «##^aBehavioral Science (Springer-11640)»
        6 «##^aBiomedical and Life Sciences (Springer-11642) »
      939 «##^aBiomedical and Life Sciences (Springer-11642)»
      629 «##^aBusiness and Economics (Springer-11643)»
       20 «##^aChemistry and Materials Science (Springer-11644) »
      445 «##^aChemistry and Materials Science (Springer-11644)»
        4 «##^aComputer Science (Springer-11645) »
     2235 «##^aComputer Science (Springer-11645)»
        3 «##^aEarth and Environmental Science (Springer-11646) »
      479 «##^aEarth and Environmental Science (Springer-11646)»
        5 «##^aEngineering (Springer-11647) »
     1120 «##^aEngineering (Springer-11647)»
        8 «##^aHumanities, Social Science and Law (Springer-11648) »
      747 «##^aHumanities, Social Science and Law (Springer-11648)»
        6 «##^aMathematics and Statistics (Springer-11649) »
      970 «##^aMathematics and Statistics (Springer-11649)»
        6 «##^aMedicine (Springer-11650) »
     1008 «##^aMedicine (Springer-11650)»
        3 «##^aPhysics and Astronomy (Springer-11651) »
      759 «##^aPhysics and Astronomy (Springer-11651)»
      326 «##^aProfessional and Applied Computing (Springer-12059)»


And that's it; there are no more fields.
   
    
==== Selecting records ====

Which records are worth including in a given OPAC? The most obvious criterion is to use field 950 to filter by broad subject areas. That way we would find, for example, almost 1000 books on mathematics and statistics if we take "Mathematics and Statistics (Springer-11649)".

Another criterion, which may help refine the selection a bit, is to use the terms in field 650. Let's pick out some mathematical ones:

    Abstract Harmonic Analysis
    Algebra
    Algebraic Geometry
    Algebraic topology
    Algebraic Topology
    Algebra^xData processing
    Algorithm Analysis and Problem Complexity
    Algorithms
    Analysis ??
    Applications of Mathematics
    Appl.Mathematics/Computational Methods of Engineering
    Approximations and Expansions
    Arithmetic and Logic Structures
    Associative Rings and Algebras
    Biology^xMathematics
    Calculus of Variations and Optimal Control
    Calculus of Variations and Optimal Control, Optimization
    Calculus of Variations and Optimal Control; Optimization
    Category Theory, Homological Algebra
    Cell aggregation^xMathematics
    Coding and Information Theory
    Coding theory
    Combinatorics
    Commutative Rings and Algebras
    Complexity
    Computational complexity
    Computational Mathematics and Numerical Analysis
    Computer science^xMathematics
    Convex and Discrete Geometry
    Difference and Functional Equations
    Differentiable dynamical systems
    Differential Equations
    Differential equations, partial
    Differential Geometry
    Discrete groups
    Discrete Mathematics in Computer Science
    Distribution (Probability theory)
    Dynamical Systems and Ergodic Theory
    Field Theory and Polynomials
    Fourier analysis
    Fourier Analysis
    Functional analysis
    Functional Analysis
    Functional equations
    Functions of a Complex Variable
    Functions of complex variables
    Functions, special
    Game Theory, Economics, Social and Behav. Sciences
    Game Theory/Mathematical Methods
    General Algebraic Systems
    Genetics^xMathematics
    GeologyxMathematics
    Geometry
    Geometry, algebraic
    Global analysis
    Global Analysis and Analysis on Manifolds
    Global analysis (Mathematics)
    Global differential geometry
    Group theory
    Group Theory and Generalizations
    Harmonic analysis
    History of Mathematics
    Integral equations
    Integral Equations
    Integral Transforms
    Integral Transforms, Operational Calculus
    K-theory
    K-Theory
    Linear and Multilinear Algebras, Matrix Theory
    Logic
    Logic, Symbolic and mathematical
    Manifolds and Cell Complexes (incl. Diff.Topology)
    Math. Applications in Chemistry
    Math Applications in Computer Science
    Math. Applications in Geosciences
    Math. Appl. in Environmental Science
    Mathematical and Computational Physics
    Mathematical Applications in Earth Sciences
    Mathematical Biology in General
    Mathematical geography
    Mathematical Logic and Formal Languages
    Mathematical Logic and Foundations
    Mathematical Methods in Physics
    Mathematical Modeling and Industrial Mathematics
    Mathematical optimization
    Mathematical physics
    Mathematical Software
    Mathematical statistics
    Mathematics
    Mathematics Education
    Mathematics, general
    Mathematics of Computing
    Mathematics_^xHistory
    Mathematics_$xHistory
    Matrix theory
    Measure and Integration
    Non-associative Rings and Algebras
    Nonlinear Dynamics, Complex Systems, Chaos, Neural Networks
    Number theory
    Number Theory
    Numerical analysis
    Numerical Analysis
    Numerical and Computational Methods
    Numerical and Computational Methods in Engineering
    Numerical and Computational Physics
    Numeric Computing
    Operations research
    Operations Research/Decision Theory
    Operations Research, Mathematical Programming
    Operator theory
    Operator Theory
    Optimization
    Order, Lattices, Ordered Algebraic Structures
    Ordinary Differential Equations
    Partial Differential Equations
    Physiology^xMathematics
    Popular Science in Mathematics/Computer Science/Natural Science/Technology
    Potential Theory
    Potential theory (Mathematics)
    Probability and Statistics in Computer Science
    Probability Theory and Stochastic Processes
    Quality Control, Reliability, Safety and Risk  ??
    Quantum computing ??
    Quantum Computing, Information and Physics ??
    Real Functions
    Sequences (Mathematics)
    Sequences, Series, Summability
    Several Complex Variables and Analytic Spaces
    Special Functions
    Statistical Theory and Methods
    Statistics
    Statistics and Computing/Statistics Programs
    Statistics for Business/Economics/Mathematical Finance/Insurance
    Statistics for Engineering, Physics, Computer Science, Chemistry & 
    Statistics for Engineering, Physics, Computer Science, Chemistry and Earth 
    Statistics for Life Sciences, Medicine, Health Sciences
    Statistics for Social Science, Behavorial Science, Education, Public Policy, 
    Statistics, general
    Symbolic and Algebraic Manipulation  ??
    Theory of Computation ??
    Topological Groups
    Topological Groups, Lie Groups
    Topology
    Vibration, Dynamical Systems, Control ??


Task: analyze how much overlap there is between the use of these descriptors and the use of "Mathematics and Statistics" in field 950.
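
One way to quantify that overlap, sketched in Python. It assumes two sets of record numbers (MFNs) are available, one for records matching the descriptor list and one for records in the 950 collection; how those sets are extracted from the database is left open. The Jaccard index is my choice of summary measure, and the name ''overlap_report'' and the toy MFNs are hypothetical.

<code python>
def overlap_report(by_descriptor, by_collection):
    """Summarize how two sets of record numbers overlap."""
    a, b = set(by_descriptor), set(by_collection)
    both = a & b
    union = a | b
    return {
        "both": len(both),
        "descriptor_only": len(a - b),
        "collection_only": len(b - a),
        # Jaccard index: shared records over all records in either set.
        "jaccard": len(both) / len(union) if union else 0.0,
    }


# Toy data, not taken from the real database.
print(overlap_report({1, 2, 3, 4}, {3, 4, 5}))
</code>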

How do I retrieve all the records associated with that set of descriptors?

Here are some first tests based on field 650; this needs more careful analysis.

<code>
$ mx springer-clean "text/show=Biology^xMath"
mfn     92|tag   650|occ   3|Biology^xMath
650  "#0^aBiology^xMathematics"
..

$ mx springer-clean from=1206 "pft=(v650/)(mhl,v650*4/)"
...
#0^aBiology^xMathematics
...
Biology. Mathematics
...

$ cat springer.fst 
950 0 v950^a
650 0 (mhl,v650*4/)

$ mx springer-clean fst=@springer.fst fullinv=springer-clean now -all

$ ifkeys springer-clean | more
    20|ABDOMEN. SURGERY
    20|ABDOMINAL SURGERY
    20|ABSTRACT HARMONIC ANALYSIS
    14|ACCOUNTING/AUDITING
    64|ACOUSTICS
    10|ACUPUNCTURE
    ...
    72|WEIGHTS AND MEASURES
     1|WILDLIFE MANAGEMENT
    13|WOOD
    13|WOOD SCIENCE & TECHNOLOGY
   100|ZOOLOGY


$ mx springer-clean "MATH$/(650)" now -all
   1795  MATH$/(650)
   1795  Set #000000001
Hits=1795

$ mx springer-clean "MATHEM$/(650)" now -all
   1712  MATHEM$/(650)
   1712  Set #000000001
Hits=1712

$ mx springer-clean "MATHEM$/(650) and MATH$/(950)" now -all
   1712  MATHEM$/(650)
    976  MATH$/(950)
    935  Operation *
    935  Set #000000001
Hits=935

$ mx springer-clean "MATH$/(950)" now -all
    976  MATH$/(950)
    976  Set #000000001
Hits=976

$ mx springer-clean "ALGEB$" now -all
    229  ALGEB$
    229  Set #000000001
Hits=229

$ mx springer-clean "ALGEB$/(650)" now -all
    229  ALGEB$/(650)
    229  Set #000000001
Hits=229

$ mx springer-clean "ALGEB$ AND MATH$/(950)" now -all
    229  ALGEB$
    976  MATH$/(950)
    173  Operation *
    173  Set #000000001
Hits=173

$ mx springer-clean "ALGEB$ AND NOT MATH$/(950)" now -all
    229  ALGEB$
    976  MATH$/(950)
     56  Operation ^
     56  Set #000000001
Hits=56

$ mx springer-clean "ALGEB$ AND NOT MATH$/(950)" now "pft='950: 'v950^a/" | sort | uniq -c | awk '/950: / {print $0}'
     48 950: Computer Science (Springer-11645)
      2 950: Engineering (Springer-11647)
      6 950: Physics and Astronomy (Springer-11651)

$ mx springer-clean "ALGEB$ AND NOT MATH$/(950)" now "pft=v245^a/v950^a/#"

$ mx springer-clean "ALGEB$ AND NOT MATH$/(950) AND NOT COMPUT$/(950)" now "pft=v245^a/v950^a/#"
</code>


<code>
$ cat springer.fst 
950 0 v950^a
650 0 (mhl,v650*4/)
654 4 (mhl,v650*4/)

$ mx springer-clean fst=@springer.fst fullinv=springer-clean now -all

$ mx springer-clean "MATH$/(654)" now -all
   2234  MATH$/(654)
   2234  Set #000000001
Hits=2234

$ mx springer-clean "MATH$/(650)" now -all
   1795  MATH$/(650)
   1795  Set #000000001
Hits=1795

$ mx springer-clean "MATHEM$/(650)" now -all
   1712  MATHEM$/(650)
   1712  Set #000000001
Hits=1712

$ mx springer-clean "MATHEM$/(654)" now -all
   2172  MATHEM$/(654)
   2172  Set #000000001
Hits=2172

$ mx springer-clean "MATH$/(654) AND NOT MATHEM$/(654)" now -all
   2234  MATH$/(654)
   2172  MATHEM$/(654)
     62  Operation ^
     62  Set #000000001
Hits=62

$ mx springer-clean "variation$ and not MATH$/(950)" now lw=400 "pft=v245/(x3,v650*2/)v950^a/#"
</code>

Task: find out the correlation between the presence of the descriptors identified above and the presence of general descriptors such as //Mathematics// or //Statistics//.

==== Integration into the OPAC ====

This is the procedure I used to merge the Springer e-book records with the rest of the records in the INMABB catalog.

FIXME add the commands used to generate the ''springer'' database from the mrc file sent by Springer.

<code bash>
# delete fields 773 and 912
mx springer "proc='d773d912'" copy=springer now -all

# trim surplus whitespace (writes a log)
mxcp springer create=springer-clean clean log=springer-clean.log

# add subfields $y and $z to each 856 field
mx springer-clean "proc='d856a856~',v856,'^ySpringerLink, via BECYT^zAcceso desde instituciones autorizadas','~'" copy=springer-clean now -all

# build an inverted file on field 950
mx springer-clean "fst=950 0 v950^a" fullinv=springer-clean

# create a database with the mathematics records (according to 950)
mx springer-clean MATH$ create=springer-math now -all

# add a prefix to field 001 and append the records to the catalog
mx springer-math "proc='d001a001@springer:',v001,'@'" append=biblio now -all
</code>

==== Miscellaneous ideas ====


Build an OPAC just for these records.

Descriptor cloud.

Get cover images (via Google Books? via Springer? Amazon? Other?)


==== Errors in the data ====

I found these three titles with errors:

  * Groupes et algérbes de Lie => Groupes et algèbres de Lie
  * Groupes et algébras de Lie => Groupes et algèbres de Lie
  * Complemantarity => Complementarity

They appear this way not only in the MARC records but also on the Springer site. How does one report these errors to Springer so that they get fixed?


==== Missing data ====

I'm not sure "missing data" is the right name for this problem; one example is the lack of an access point for Frederick Mosteller in record 978-0-387-20271-6, corresponding to "Selected Papers of Frederick Mosteller / edited by Stephen E. Fienberg, David C. Hoaglin."


==== Batch cataloging ====

Below are some general notes on the problem of "batch cataloging".

Problems that come up:

    * Selecting and obtaining the appropriate bibliographic records.
    * Analysis and quality control of the records obtained.
        * Is the use of MARC tags correct?
        * Which cataloging rules were used?
        * Is the cataloging correct? (i.e., are the resources cataloged properly?)
    * Automatic modifications to the records before importing them. For example, removing or adding local 9xx fields, or adding text to the links in 856.
    * Authority control. Before or after import?
    * Detection of duplicates (or near-duplicates): within the set to be imported, and between the new records and the rest of the catalog.
    * Importing the records:
        * A) Into a separate "database" for administration, which is merged with the rest of the records only in the OPAC? Or,
        * B) Into the same "database" as the rest of the catalog records?
    * Subsequent manual corrections, maintenance of the records.
    * Maintenance as the provider (e.g. Springer) offers new sets of records, or corrected versions of records already obtained.
    * Obtaining cover images.