
General Architecture for Text Engineering

Programs to Extract the Annotations from Raw Documents – Part 2

March 26, 2013

We previously published Programs to Extract the Annotations from Raw Documents – Part 1. Here is the program that reads the annotated values from the XML file and stores them in the database: AnnotationImplementationPosTaggerDB.java. a> Logical stop words for GATE annotation processing – static String stop_words[] = {"few days", "Some people", "toes", "or run", "There", "weeks", "the pain", "Both the pain", "It", "the body", "the […]
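As a rough illustration of how a stop-word list like the one above might be applied, here is a minimal plain-Java sketch. It assumes that annotated strings are compared against the list by exact, case-sensitive match before they are stored; the class name StopWordFilter and the method filter are hypothetical and are not taken from AnnotationImplementationPosTaggerDB.java. Only the stop words that appear in full in the excerpt are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: drops annotated values that match a logical stop word.
class StopWordFilter {
    // Only the stop words fully visible in the excerpt; the original list is longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "few days", "Some people", "toes", "or run", "There",
            "weeks", "the pain", "Both the pain", "It", "the body"));

    // Returns the trimmed values that are non-empty and not stop words (exact match).
    static List<String> filter(List<String> annotatedValues) {
        List<String> kept = new ArrayList<>();
        for (String value : annotatedValues) {
            String trimmed = value.trim();
            if (!trimmed.isEmpty() && !STOP_WORDS.contains(trimmed)) {
                kept.add(trimmed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("fever", "the pain", " weeks ", "joint swelling");
        System.out.println(filter(raw)); // prints [fever, joint swelling]
    }
}
```

A set lookup keeps the check cheap even when the stop-word list grows; whether the original program matches case-sensitively is an assumption here.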


Programs to Extract the Annotations from Raw Documents – Part 1

March 25, 2013

In our work, after writing the JAPE files and populating the gazetteer (*.lst) files, we used two program files to process all the annotations into the database, from which we served the web application (Symptom Search). We will describe those two files line by line and attach the other files in the project with a broad description of the […]


Gate Ontology Update

March 24, 2013

Instructions for making an ontology in GATE Developer – (1) CREOLE plugins to be configured: Tools, Ontology, Ontology_Based_Gazetteer, Ontology_Tools, ANNIE. Place the attached disease_symptom.owl in “<<Gate Installation>>\plugins\Ontology_Tools\resources”. Click on Language Resource and add OWLIM Ontology (follow picture OI-01.jpg). Double-click the DiseaseSymptom ontology and see OI-02.jpg. How to create classes there: follow the P-01, P-02 and P-03 jpg files and […]


A medical application made with GATE

March 22, 2013

A piece of work we have done with GATE. The steps we took: 1> Extracted all the symptom information from our disease database application, then parsed it and extracted clean text without HTML tags. 2> We made a sample gazetteer from disease names. 3> We made a sample gazetteer with […]


Text Analysis with GATE – Part 7

March 22, 2013

JAPE: Regular Expressions over Annotations. JAPE is the Java Annotation Patterns Engine. JAPE provides finite state transduction over annotations based on regular expressions. JAPE is a version of the Common Pattern Specification Language. JAPE allows you to recognise regular expressions in annotations on documents. A regular language can only describe sets of strings, not graphs, and GATE’s […]


Creating and Running Application file in GATE

March 22, 2013

Running GATE based on a gazetteer – The working logic for running the GATE application was: 1> Take the .lst files in the gazetteer. 2> Map the lists.def file to the updated gazetteers. 3> Then put the PRs for processing in the GATE application file. 4> Write JAPE rules for taking the gazetteer values and adding some words to them. 5> Write […]
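To make step 2 more concrete, the sketch below parses one entry of a lists.def index in plain Java. It assumes the common colon-separated layout listFile:majorType:minorType, with the minor type optional, and it ignores any further optional fields. The class name ListsDefEntry and the sample entries symptom.lst and disease.lst are hypothetical and are not taken from the project.

```java
// Hypothetical holder for one line of a gazetteer lists.def index file.
class ListsDefEntry {
    final String listFile;   // e.g. a .lst file name
    final String majorType;  // major type assigned to matches from that list
    final String minorType;  // empty string when no minor type is given

    ListsDefEntry(String listFile, String majorType, String minorType) {
        this.listFile = listFile;
        this.majorType = majorType;
        this.minorType = minorType;
    }

    // Parses "file.lst:majorType[:minorType]"; fields after the third are ignored.
    static ListsDefEntry parse(String line) {
        String[] parts = line.trim().split(":");
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                    "Expected file.lst:majorType[:minorType], got: " + line);
        }
        String minor = parts.length > 2 ? parts[2] : "";
        return new ListsDefEntry(parts[0], parts[1], minor);
    }

    public static void main(String[] args) {
        ListsDefEntry entry = parse("symptom.lst:symptom:general");
        System.out.println(entry.listFile + " -> " + entry.majorType + "/" + entry.minorType);
    }
}
```

Checking each entry this way before running the application is one way to catch a .lst file that was updated but never mapped in lists.def.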


Text Analysis with GATE – Part 6

March 21, 2013

Components of GATE: GATE Documents. Documents are modelled as content, annotations and features. The content of a document can take any form in GATE. The features are <attribute, value> pairs stored in a Feature Map. Attributes are String values, while the values can be any Java object. The annotations are grouped in sets. A […]
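The document model described above can be pictured with a small plain-Java sketch. This is not the GATE API: the classes Document and Annotation below are hypothetical stand-ins, and the assumption that an annotation carries a type plus start and end character offsets is added for illustration. Features are modelled as a map from String attributes to arbitrary Java objects, and annotations are grouped into named sets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the content / annotations / features model.
class DocumentModelSketch {

    static class Annotation {
        final String type;
        final int start;  // inclusive character offset (assumed)
        final int end;    // exclusive character offset (assumed)
        final Map<String, Object> features = new HashMap<>();

        Annotation(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    static class Document {
        final String content;
        final Map<String, Object> features = new HashMap<>();
        // Annotations grouped in named sets.
        final Map<String, List<Annotation>> annotationSets = new HashMap<>();

        Document(String content) {
            this.content = content;
        }

        // Adds an annotation to the named set, creating the set on first use.
        Annotation annotate(String setName, String type, int start, int end) {
            if (start < 0 || start > end || end > content.length()) {
                throw new IllegalArgumentException("Offsets out of range");
            }
            Annotation a = new Annotation(type, start, end);
            annotationSets.computeIfAbsent(setName, k -> new ArrayList<>()).add(a);
            return a;
        }

        // Returns the span of content that the annotation covers.
        String textOf(Annotation a) {
            return content.substring(a.start, a.end);
        }
    }

    public static void main(String[] args) {
        Document doc = new Document("Fever and joint pain");
        doc.features.put("source", "example");        // String attribute, Object value
        Annotation a = doc.annotate("symptoms", "Symptom", 10, 20);
        a.features.put("severity", Integer.valueOf(2));
        System.out.println(a.type + ": " + doc.textOf(a)); // prints Symptom: joint pain
    }
}
```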


Text Analysis with GATE – Part 5

March 19, 2013

GATE Embedded: Integrating GATE-based language processing in applications using GATE Embedded (the GATE API): add $GATE_HOME/bin/gate.jar and the JAR files in $GATE_HOME/lib to the Java CLASSPATH ($GATE_HOME is the GATE root directory, stored as an environment variable in the OS). To initialise GATE, call gate.Gate.init(); We have worked with GATE in the following areas (We […]


Text Analysis with GATE – Part 4

March 18, 2013

GATE comes with various built-in components: Language Resources modelling Documents and Corpora, and various types of Annotation Schema. Processing Resources that are part of the ANNIE system. Gazetteers. Ontologies. Machine Learning resources. Parsers and taggers. Other miscellaneous resources. ANNIE: a Nearly-New Information Extraction System. ANNIE components are: 1 Document Reset PR. The document reset resource […]


Text Analysis with GATE – Part 3

March 17, 2013

The basic business of GATE is annotating documents. Core concepts are: the documents to be annotated; corpora comprising sets of documents, grouping documents for the purpose of running uniform processes across them; annotations that are created on documents; annotation types such as ‘Name’ or ‘Date’; annotation sets comprising groups of annotations; processing resources that manipulate and create […]


Text Analysis with GATE – Part 2

March 16, 2013

Collectively, the set of resources integrated with GATE is known as CREOLE – a Collection of REusable Objects for Language Engineering. While using GATE to develop language processing on some collection of documents, the developer uses GATE Developer and GATE Embedded to construct resources. This may involve programming, or the development of Language Resources such […]


Text Analysis with GATE – Part 1

March 15, 2013

In our work environment, we have done some projects related to Text Engineering. Apart from the theoretical aspects, we have used GATE (General Architecture for Text Engineering). Most of the content of this series of posts is in line with the GATE documentation. In this series, the current post is a simple introduction […]
