edu.iastate.jtm.jmed
Class MedlineParser

java.lang.Object
  extended by edu.iastate.jtm.jmed.StreamXMLParser
      extended by edu.iastate.jtm.jmed.MedlineParser

public class MedlineParser
extends StreamXMLParser

Extracts citation information from MEDLINE/PubMed XML files.

Author:
Jing Ding

Field Summary
 
Fields inherited from class edu.iastate.jtm.jmed.StreamXMLParser
unit
 
Constructor Summary
MedlineParser()
          Creates a new blank instance of MedlineParser in default MEDLINE mode.
MedlineParser(boolean pubmode)
           
MedlineParser(org.dom4j.io.SAXReader xp)
          Creates a new blank instance of MedlineParser in default MEDLINE mode, using the given SAXReader.
MedlineParser(org.dom4j.io.SAXReader xp, boolean pubmode)
           
 
Method Summary
 void extractPlainText(java.io.File infile, java.io.File outDir)
          Extract plain text abstracts from an XML file into a directory.
 void extractPlainText(java.io.File infile, java.lang.String outDir, int size)
          Extract plain text abstracts from an XML file into multiple directories.
 void extractPlainText(java.io.InputStream instream, java.io.File outDir)
          Extract plain text abstracts from an input stream into a directory.
 void extractPlainText(java.io.InputStream instream, java.lang.String outDir, int size)
          Extract plain text abstracts from an input stream into multiple directories.
 void extractPmids(java.io.File output)
          Extract all PMIDs to a single file.
 void extractPmids(java.lang.String prefix, int size)
          Extract all PMIDs to multiple files.
 boolean isPubMode()
           
static void main(java.lang.String[] args)
          For testing.
protected  boolean open(boolean parseHeader)
           
 void setAutoAdjustMode(boolean auto)
           
 void setPubMode(boolean mode)
           
 void splitXml(java.io.File infile, java.lang.String prefix, int size, boolean compress)
          Split a large XML file into small ones.
 void splitXml(java.io.InputStream instream, java.lang.String prefix, int size, boolean compress)
          Split a large XML stream into small files.
 
Methods inherited from class edu.iastate.jtm.jmed.StreamXMLParser
close, copyUnit, getHeader, getRootName, getSingleField, getUnit, getUnitCount, nextUnit, nextUnit, open, open, open, open, open, open, setIgnoreXmlError, setOutputPlain, setRefillZone, setRootMatcher, setSaxParser, setUnitMatcher, setVerbose
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MedlineParser

public MedlineParser(boolean pubmode)
              throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

MedlineParser

public MedlineParser()
              throws org.xml.sax.SAXException
Creates a new blank instance of MedlineParser in default MEDLINE mode.

Throws:
org.xml.sax.SAXException

MedlineParser

public MedlineParser(org.dom4j.io.SAXReader xp,
                     boolean pubmode)
              throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

MedlineParser

public MedlineParser(org.dom4j.io.SAXReader xp)
              throws org.xml.sax.SAXException
Creates a new blank instance of MedlineParser in default MEDLINE mode, using the given SAXReader.

Throws:
org.xml.sax.SAXException
Method Detail

open

protected boolean open(boolean parseHeader)
                throws java.io.IOException
Overrides:
open in class StreamXMLParser
Throws:
java.io.IOException

setAutoAdjustMode

public void setAutoAdjustMode(boolean auto)

setPubMode

public void setPubMode(boolean mode)

isPubMode

public boolean isPubMode()

extractPmids

public void extractPmids(java.io.File output)
                  throws org.dom4j.DocumentException,
                         java.io.IOException
Extract all PMIDs to a single file.

Parameters:
output - the output file.
Throws:
org.dom4j.DocumentException
java.io.IOException

extractPmids

public void extractPmids(java.lang.String prefix,
                         int size)
                  throws org.dom4j.DocumentException,
                         java.io.IOException
Extract all PMIDs to multiple files.

Parameters:
prefix - prefix of output filenames.
size - number of PMIDs in each file.
Throws:
org.dom4j.DocumentException
java.io.IOException
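
Example (illustrative only):
The following is a minimal sketch of the idea behind PMID extraction, not the code of MedlineParser itself. It uses the JDK's StAX reader instead of dom4j, and it assumes the input follows the usual MEDLINE layout in which each MedlineCitation element carries its own PMID as the first PMID child. The class name PmidSketch and both method names are hypothetical. The chunk helper mirrors the role of the size parameter: each chunk would go to one output file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; not part of edu.iastate.jtm.jmed.
class PmidSketch {

    /** Collects the first PMID found inside each MedlineCitation element. */
    public static List<String> extractPmids(String xml) throws XMLStreamException {
        List<String> pmids = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        boolean inCitation = false;
        boolean taken = false; // true once this citation's own PMID was read
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("MedlineCitation")) {
                    inCitation = true;
                    taken = false;
                } else if (name.equals("PMID") && inCitation && !taken) {
                    // Later PMID elements (e.g. in cross references) are skipped.
                    pmids.add(r.getElementText().trim());
                    taken = true;
                }
            } else if (ev == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("MedlineCitation")) {
                inCitation = false;
            }
        }
        r.close();
        return pmids;
    }

    /** Splits a list into consecutive chunks of at most size items. */
    public static List<List<String>> chunk(List<String> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            out.add(new ArrayList<>(items.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<MedlineCitationSet>"
                + "<MedlineCitation><PMID>111</PMID></MedlineCitation>"
                + "<MedlineCitation><PMID>222</PMID></MedlineCitation>"
                + "</MedlineCitationSet>";
        System.out.println(chunk(extractPmids(xml), 1));
    }
}
```

Streaming the input keeps memory use independent of file size, which matters for large MEDLINE files.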

extractPlainText

public void extractPlainText(java.io.File infile,
                             java.io.File outDir)
                      throws org.dom4j.DocumentException,
                             java.io.IOException
Extract plain text abstracts from an XML file into a directory.

Parameters:
infile - input XML file
outDir - output directory
Throws:
org.dom4j.DocumentException
java.io.IOException

extractPlainText

public void extractPlainText(java.io.File infile,
                             java.lang.String outDir,
                             int size)
                      throws org.dom4j.DocumentException,
                             java.io.IOException
Extract plain text abstracts from an XML file into multiple directories.

Parameters:
infile - input XML file
outDir - prefix of output directories
size - number of abstracts per directory
Throws:
org.dom4j.DocumentException
java.io.IOException

extractPlainText

public void extractPlainText(java.io.InputStream instream,
                             java.io.File outDir)
                      throws org.dom4j.DocumentException,
                             java.io.IOException
Extract plain text abstracts from an input stream into a directory.

Parameters:
instream - input stream
outDir - output directory
Throws:
org.dom4j.DocumentException
java.io.IOException

extractPlainText

public void extractPlainText(java.io.InputStream instream,
                             java.lang.String outDir,
                             int size)
                      throws org.dom4j.DocumentException,
                             java.io.IOException
Extract plain text abstracts from an input stream into multiple directories.

Parameters:
instream - input stream
outDir - prefix of output directories
size - number of abstracts per directory
Throws:
org.dom4j.DocumentException
java.io.IOException
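
Example (illustrative only):
The sketch below shows one way plain-text abstracts can be pulled out of MEDLINE XML. It is not the implementation of extractPlainText. It assumes abstracts live in AbstractText elements, which may contain inline markup, and it writes one file per citation named after the PMID. That file naming, the class name AbstractSketch, and the method names are assumptions made for illustration. It uses the JDK's StAX reader rather than dom4j.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; not part of edu.iastate.jtm.jmed.
class AbstractSketch {

    /** Maps each PMID to its abstract text; citations without an abstract are skipped. */
    public static Map<String, String> extractAbstracts(String xml) throws XMLStreamException {
        Map<String, String> out = new LinkedHashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        String pmid = null;
        StringBuilder text = null; // non-null while inside a MedlineCitation
        int depth = 0;             // nesting depth inside AbstractText, 0 = outside
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (depth > 0) {
                    depth++; // inline markup inside the abstract
                } else if (name.equals("MedlineCitation")) {
                    pmid = null;
                    text = new StringBuilder();
                } else if (name.equals("PMID") && text != null && pmid == null) {
                    pmid = r.getElementText().trim();
                } else if (name.equals("AbstractText") && text != null) {
                    if (text.length() > 0) {
                        text.append('\n'); // separate sections of a structured abstract
                    }
                    depth = 1;
                }
            } else if ((ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA)
                    && depth > 0) {
                text.append(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                if (depth > 0) {
                    depth--;
                } else if (r.getLocalName().equals("MedlineCitation") && text != null) {
                    if (pmid != null && text.length() > 0) {
                        out.put(pmid, text.toString());
                    }
                    text = null;
                }
            }
        }
        r.close();
        return out;
    }

    /** Writes each abstract to outDir/<PMID>.txt (the naming scheme is an assumption). */
    public static void writeAll(Map<String, String> abstracts, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        for (Map.Entry<String, String> e : abstracts.entrySet()) {
            Files.write(outDir.resolve(e.getKey() + ".txt"),
                    e.getValue().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<MedlineCitationSet><MedlineCitation><PMID>1</PMID>"
                + "<Article><Abstract><AbstractText>Sample text.</AbstractText></Abstract></Article>"
                + "</MedlineCitation></MedlineCitationSet>";
        writeAll(extractAbstracts(xml), Files.createTempDirectory("abstracts"));
    }
}
```

Tracking the nesting depth, rather than reading the element text in one call, lets the sketch keep the text of abstracts that contain inline tags.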

splitXml

public void splitXml(java.io.File infile,
                     java.lang.String prefix,
                     int size,
                     boolean compress)
              throws org.dom4j.DocumentException,
                     java.io.IOException
Split a large XML file into small ones.

Parameters:
infile - input XML file
prefix - prefix of output XML files
size - number of abstracts in each output file
compress - whether to compress the output files
Throws:
org.dom4j.DocumentException
java.io.IOException

splitXml

public void splitXml(java.io.InputStream instream,
                     java.lang.String prefix,
                     int size,
                     boolean compress)
              throws org.dom4j.DocumentException,
                     java.io.IOException
Split a large XML stream into small files.

Parameters:
instream - input stream
prefix - prefix of output XML files
size - number of abstracts in each output file
compress - whether to compress the output files
Throws:
org.dom4j.DocumentException
java.io.IOException
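
Example (illustrative only):
The sketch below illustrates the general shape of splitting a sequence of XML units into fixed-size output files, with optional compression. It is not the code of splitXml. The output naming (prefix plus a running index plus ".xml" or ".xml.gz"), the use of gzip for compression, the class name SplitSketch, and the method names are all assumptions made for illustration. The units are taken as already separated strings, so the parsing step is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not part of edu.iastate.jtm.jmed.
class SplitSketch {

    /** Writes units into files of at most size units each, wrapped in rootName. */
    public static List<Path> writeChunks(List<String> units, String rootName,
                                         String prefix, int size, boolean compress)
            throws IOException {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<Path> files = new ArrayList<>();
        int index = 0;
        for (int start = 0; start < units.size(); start += size, index++) {
            // Assumed naming scheme: <prefix><index>.xml or <prefix><index>.xml.gz
            Path p = Paths.get(prefix + index + (compress ? ".xml.gz" : ".xml"));
            try (OutputStream raw = Files.newOutputStream(p);
                 OutputStream os = compress ? new GZIPOutputStream(raw) : raw;
                 Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
                w.write("<" + rootName + ">\n");
                int end = Math.min(start + size, units.size());
                for (String unit : units.subList(start, end)) {
                    w.write(unit);
                    w.write('\n');
                }
                w.write("</" + rootName + ">\n");
            }
            files.add(p);
        }
        return files;
    }

    /** Reads a chunk file back as a string, decompressing if needed. */
    public static String readChunk(Path file, boolean compressed) throws IOException {
        try (InputStream raw = Files.newInputStream(file);
             InputStream in = compressed ? new GZIPInputStream(raw) : raw;
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            byte[] b = new byte[4096];
            for (int n; (n = in.read(b)) != -1; ) {
                buf.write(b, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split");
        List<String> units = Arrays.asList("<MedlineCitation/>", "<MedlineCitation/>");
        System.out.println(writeChunks(units, "MedlineCitationSet",
                dir.resolve("part").toString(), 1, true));
    }
}
```

Each output file repeats the root element so that it remains a well-formed XML document on its own.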

main

public static void main(java.lang.String[] args)
For testing.

Parameters:
args - the command line arguments