Skip to main content

SoBigData Event

11th GATE Training Course: Large Scale Text and Social Media Analytics with GATE

The focus will be on mining text and social media content with GATE. Many of the hands on exercises will be focused on analysing news articles, tweets, and other textual content.

The planned schedule is as follows (NOTE: may still be subject to minor timetabling changes).

Single track from Monday to Thursday (9am - 5pm):

  • Monday: Module 1: Basic Information Extraction with GATE
    • Intro to GATE + Information Extraction (IE)
    • Corpus Annotation and Evaluation
    • Writing Information Extraction Patterns with JAPE
  • Tuesday: Module 2: Using GATE for social media analysis
    • Challenges for analysing social media, GATE for social media
    • Twitter intro + JSON structure
    • Language identification, tokenisation for Twitter
    • POS tagging and Information Extraction for Twitter
  • Wednesday: Module 3: Crowdsourcing, GATE Cloud/MIMIR, and Machine Learning
    • Crowdsourcing annotated social media content with the GATE crowdsourcing plugin
    • GATE Cloud, deploying your own IE pipeline at scale (how to process 5 millions tweets in 30 mins)
    • GATE Mimir - how to index and search semantically annotated social media streams Challenges of opinion mining in social media
    • Training Machine Learning Models for IE in GATE
  • Thursday: Module 4: Advanced IE and Opinion Mining in GATE
    • Advanced Information Extraction
    • Useful GATE components (plugins)
    • Opinion mining components and applications in GATE

On Friday, there is a choice of modules (9am - 5pm):

  • Module 5: GATE for developers
    • Basic GATE Embedded
    • Writing your own plugin
    • GATE in production - multi-threading, web applications, etc.
  • Module 6: GATE Applications
    • Building your own applications
    • Examples of some current GATE applications: social media summarisation, visualisation, Linked Open Data for IE, and more