GSoC 2017 : Week 9-12

Status update for the final few weeks

Krishanu Konar


The last few weeks have been really intense, and I wasn't able to finish the weekly blogs. I did pass my second evaluation and received some feedback on the work I'd been doing. This was the final phase of my project, and I still had a lot to do. With university resuming after the summer break, I didn't get much time to write the blogs, so I decided to compile everything into one post. So, for the final time, here we go.

For the past week I had been working on integrating the JSONpedia library directly into the extraction process instead of relying on the JSONpedia Live service. After I was done with the wrapper, I realized it wasn't working as expected and was throwing errors. After some debugging, I noticed a minor difference between the output of the JSONpedia Live service and the JSONpedia library, explained below.

The expected output after applying the required filters:

{
	"filter": "object_filter(@type=section,)>null",
	"result": [{ ......
          .......
}

Instead, the following output was observed:

{
	"filter": object_filter(@type=section,)>null,
	"result": [{ ......
          .......
}

The missing quotes around the filter value in the second snippet broke the JSON syntax, which in turn broke the existing code.
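A quick way to see why this breaks downstream parsing: Python's standard `json` module rejects the unquoted value outright. This is a minimal illustration of the symptom, not the actual extractor code:

```python
import json

# Output resembling the JSONpedia library's: the filter value is
# missing quotes, so this string is not valid JSON.
broken = '{"filter": object_filter(@type=section,)>null, "result": []}'

# Output resembling the JSONpedia Live service's: the value is quoted.
valid = '{"filter": "object_filter(@type=section,)>null", "result": []}'

try:
    json.loads(broken)
except json.JSONDecodeError as err:
    print("broken output rejected:", err.msg)

parsed = json.loads(valid)
print("valid output parsed, filter =", parsed["filter"])
```

Any JSON-compliant parser fails the same way, which is why the wrapper broke as soon as it switched from the Live service to the library.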

I contacted Michele, the creator of JSONpedia, for support on this issue. It took me some time to figure out where the bug was, since this was the first time I had worked with a codebase this big. It was overwhelming at first, but I slowly sifted through the code and eventually found the bug in JSONpedia's filters. I wrote a fix for the filter problem and sent a pull request to JSONpedia with it. Once the bug was fixed and the wrapper was working correctly, it was time for a test run, and the preliminary tests were a success!

Now it was time for some final tweaks. I made some small changes to the time period extractor so that extra parameters can be supplied alongside the ontologies for special cases (for example, releaseYear instead of activeYear for Movies or Musical Albums). Once that was done, I finalized the ontology classes/properties the extractor would use to extract triples. With everything in place, I ran the extractor to create a dataset for musicalArtist.
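The per-domain parameter idea can be sketched as a simple lookup: map each domain to the ontology property used for its time periods, with a fallback default. The property names releaseYear and activeYear come from the post itself, but the dictionary and function below are hypothetical, illustrative of the approach rather than the real extractor settings:

```python
# Hypothetical sketch: pick the ontology property for time periods
# based on the domain being extracted. The mapping is illustrative,
# not the actual configuration used in the project.
TIME_PERIOD_PROPERTY = {
    "Movie": "releaseYear",
    "MusicalAlbum": "releaseYear",
}

DEFAULT_PROPERTY = "activeYear"


def time_period_property(domain: str) -> str:
    """Return the ontology property used for time periods in `domain`."""
    return TIME_PERIOD_PROPERTY.get(domain, DEFAULT_PROPERTY)


print(time_period_property("Movie"))          # special case
print(time_period_property("MusicalArtist"))  # falls back to the default
```

The point of the design is that new special cases only require adding an entry to the mapping, without touching the extraction logic itself.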

The next week was all about documentation and bug fixing. Most of the project was done and the extractors were running, so it was time to write things up and add internal comments to my code. A few issues came up while the extractor was running, and those bugs were fixed. Once the internal documentation was done, I used Sphinx to create the external documentation. The final week (this week) was all about finishing up the project and formulating a summary report and results. After three months of working on this project, it was finally complete. Here are some sample results from running the extractor on different domains.

| Topic & Language | # Resources | # Statements | Evaluation Accuracy |
| --- | --- | --- | --- |
| Actors (2016) | 6,621 | 110,797 | 77% |
| Actors (2017) | 6,606 | 134,013 | 79.08% |
| Musical Artist | 52,759 | 1,340,800 | 75.77% |
| Band | 34,505 | 867,984 | 84.57% |
| University | 20,343 | 250,167 | 49.29% |
| Newspapers | 6,861 | 17,546 | 52.37% |

Now we wait for the final verdict! I had a discussion with my mentors; they were happy with the work that was done and congratulated me on it. I have a good feeling that I will successfully complete my project. Let's see what happens!

Fingers crossed.

You can follow my project on GitHub here.
