GSoC 2017: Week 5 & 6

Status update for the 5th and 6th weeks

Krishanu Konar

The results of the first evaluation came out last week, and thankfully I passed with flying colors. My mentors seemed happy with my work so far and asked me to keep it up!

So, it's back to business. In Week 5, my job was to build a tool that creates mapping rules and mapper functions according to the user's needs. This is the opposite of what I've been doing all month: instead of me (or any other developer) writing specialized rules for each domain, the tool generalizes the work to future domains. This is a big step toward making the project scalable.

During the evaluation week, I drew up a structured plan for a tool that lets users add custom rules and mapping functions to the list extractor, so that the extractor can use them alongside the existing pre-defined rules, and worked out how it would affect the current codebase.

Then I started implementing rulesGenerator(), which lets users do all of this. First, I coded a prototype of rulesGenerator that could create and modify rules. After testing it, I changed the existing list extractor so that, besides the pre-defined mapping_rules.py (which contains all the core mapping rules), it now reads two additional files, settings.json and custom_mappers.json, which contain user-defined mapping rules. This means the extractor can now run on previously unmapped domains too!

{
 "MAPPING": {
  "Writer": ["BIBLIOGRAPHY", "HONORS"],
  "EducationalInstitution": ["ALUMNI", "PROGRAMS_OFFERED", "STAFF"],
  "Actor": ["FILMOGRAPHY", "DISCOGRAPHY", "HONORS"],
  "Band": ["DISCOGRAPHY", "CONCERT_TOURS", "BAND_MEMBERS", "HONORS"],
  "PeriodicalLiterature": ["CONTRIBUTORS", "OTHER_LITERATURE_DETAILS", "HONORS", "BIBLIOGRAPHY"]
 }
}
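
A minimal sketch of how the extractor might combine the two sources of rules, assuming both the pre-defined rules in mapping_rules.py and the user's settings file expose a class-to-mapper-names dictionary shaped like the MAPPING object above. The names PREDEFINED_MAPPING, merge_mappings and the AWARDS mapper are hypothetical, and appending user entries (rather than letting them replace the core rules) is a design choice of this sketch, not necessarily what the list extractor does.

```python
import json

# Hypothetical stand-in for the core rules in mapping_rules.py
# (assumption: a dict shaped like the "MAPPING" object above).
PREDEFINED_MAPPING = {
    "Writer": ["BIBLIOGRAPHY", "HONORS"],
    "Actor": ["FILMOGRAPHY", "DISCOGRAPHY", "HONORS"],
}


def merge_mappings(predefined, user_defined):
    """Combine pre-defined and user-defined class -> mapper-name rules.

    Pre-defined mappers keep their order; user-defined names are appended
    only if not already present, and unknown classes are added as new keys.
    The input dicts are not modified.
    """
    merged = {cls: list(names) for cls, names in predefined.items()}
    for cls, names in user_defined.items():
        existing = merged.setdefault(cls, [])
        for name in names:
            if name not in existing:
                existing.append(name)
    return merged


# A user-supplied settings fragment, parsed from a string so the example
# is self-contained ("AWARDS" is a made-up mapper name).
user_settings = json.loads("""
{
  "MAPPING": {
    "Actor": ["FILMOGRAPHY", "AWARDS"],
    "Band": ["DISCOGRAPHY", "BAND_MEMBERS"]
  }
}
""")

rules = merge_mappings(PREDEFINED_MAPPING, user_settings["MAPPING"])
print(rules["Actor"])  # ['FILMOGRAPHY', 'DISCOGRAPHY', 'HONORS', 'AWARDS']
print(rules["Band"])   # ['DISCOGRAPHY', 'BAND_MEMBERS']
```

Keeping the core rules first means a user file can only extend a domain, never silently drop a pre-defined mapper.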

I added the newly structured mapping rules to the list extractor, which can now take an optional command-line argument to select the class of mapper functions. After that, I finished the custom mappers in rulesGenerator.py: they take settings describing how a triple-extracting mapper function should work and run the custom mapper accordingly, which expands the extractor's coverage. Below are sample settings that the mapping functions use for extraction.

{
 "Actor": {
  "years": true,
  "headers": {
   "de": ["Filmografie"],
   "en": ["filmography", "shows"]
  },
  "ontology": {
   "de": {
    "Darsteller": "Starring",
    "Regisseur": "Director"
   },
   "en": {
    "Actor": "Starring",
    "Director": "Director"
   }
  },
  "extractors": ["1", "2"]
 }
}
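
To make the settings above more concrete, here is a minimal Python sketch of a settings-driven mapper. It assumes that `headers` lists the section titles the mapper applies to, that `ontology` maps a keyword found in a list element to a property (falling back to the first configured one), and that `years` means years should be pulled out of each element. These interpretations, the function name custom_mapper, the triple layout and the `year` property are all assumptions; the `extractors` field is not modelled.

```python
import re

# The "Actor" settings shown above, as a Python dict.
ACTOR_SETTINGS = {
    "years": True,
    "headers": {"de": ["Filmografie"], "en": ["filmography", "shows"]},
    "ontology": {
        "de": {"Darsteller": "Starring", "Regisseur": "Director"},
        "en": {"Actor": "Starring", "Director": "Director"},
    },
    "extractors": ["1", "2"],  # not modelled in this sketch
}

# Four-digit years from 1800 to 2099 (assumed meaning of "years": true).
YEAR_RE = re.compile(r"\b(?:18|19|20)\d{2}\b")


def custom_mapper(resource, header, elements, settings, lang="en"):
    """Hypothetical settings-driven mapper returning (s, p, o) triples."""
    wanted = [h.lower() for h in settings["headers"].get(lang, [])]
    if not any(h in header.lower() for h in wanted):
        return []  # this section is not covered by the settings

    ontology = settings["ontology"].get(lang, {})
    default_prop = next(iter(ontology.values()), None)
    triples = []
    for element in elements:
        text = element.lower()
        # Use the property whose keyword occurs in the element, else the default.
        prop = next((p for word, p in ontology.items() if word.lower() in text),
                    default_prop)
        if prop is None:
            continue
        title = element.split("(")[0].strip()  # crude: text before the first "("
        triples.append((resource, prop, title))
        if settings.get("years"):
            for year in YEAR_RE.findall(element):
                triples.append((title, "year", year))  # hypothetical property
    return triples


triples = custom_mapper("Some_Actor", "Filmography",
                        ["Film A (1999)", "Film B (2003), Director"],
                        ACTOR_SETTINGS)
print(triples)
# [('Some_Actor', 'Starring', 'Film A'), ('Film A', 'year', '1999'),
#  ('Some_Actor', 'Director', 'Film B'), ('Film B', 'year', '2003')]
```

Keying both `headers` and `ontology` by language is what lets the same settings cover, say, English and German Wikipedia without a separate hand-written mapper for each.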

Next week, I'll code a general mapper function in mapper.py that uses the settings.json and custom_mappers.json files to create a fully user-defined list-extractor module! I'll also get in touch with Luca to discuss further improvements.
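
One way such a general mapper could dispatch between pre-defined and user-defined mappers, together with an optional command-line argument for choosing the class of mapper functions, is sketched below. Everything here is hypothetical: the `--mapper-class` flag, the registry, the custom-mapper format with a single `property` key, and the property names are placeholders, not the list extractor's actual interface.

```python
import argparse


# Hypothetical stand-in for a pre-defined mapper function in mapper.py.
def filmography(resource, elements):
    return [(resource, "starring", e) for e in elements]


PREDEFINED_MAPPERS = {"FILMOGRAPHY": filmography}


def make_custom_mapper(settings):
    """Build a mapper from user settings (here: a single configurable property)."""
    prop = settings.get("property", "related")

    def mapper(resource, elements):
        return [(resource, prop, e) for e in elements]

    return mapper


def select_mappers(mapper_class, mapping, custom_settings):
    """Return (name, mapper) pairs for a class: pre-defined first, else custom."""
    selected = []
    for name in mapping.get(mapper_class, []):
        if name in PREDEFINED_MAPPERS:
            selected.append((name, PREDEFINED_MAPPERS[name]))
        elif name in custom_settings:
            selected.append((name, make_custom_mapper(custom_settings[name])))
    return selected


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="List extractor (sketch)")
    parser.add_argument("resource")
    parser.add_argument("-c", "--mapper-class", dest="mapper_class", default=None,
                        help="class of mapper functions to use (optional)")
    return parser.parse_args(argv)


args = parse_args(["Some_Actor", "--mapper-class", "Actor"])
mapping = {"Actor": ["FILMOGRAPHY", "SHOWS"]}  # e.g. merged MAPPING rules
custom = {"SHOWS": {"property": "presenter"}}   # e.g. loaded from a user file
sections = {"FILMOGRAPHY": ["Film A"], "SHOWS": ["Show X"]}

triples = []
for name, mapper in select_mappers(args.mapper_class, mapping, custom):
    triples += mapper(args.resource, sections.get(name, []))
print(triples)
# [('Some_Actor', 'starring', 'Film A'), ('Some_Actor', 'presenter', 'Show X')]
```

Checking the pre-defined registry first means core mappers win over a user entry with the same name; reversing the order would let users override them instead.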

Finally, you can follow my project on GitHub here.
