Nazca notebooks
We have just published the following IPython notebooks explaining how to perform record linkage and entity matching with Nazca:
During Euroscipy, the Logilab team presented an original approach for querying news using semantic information: "Rss feeds aggregator based on Scikits.learn and CubicWeb" by Vincent Michel. This work is based on two major pieces of software:
Based on these tools, we built a pure Python application to query the news:
Moreover, queries may be combined with semantic information from DBpedia:
All musical artists in the news:
DISTINCT Any E, R WHERE E appears_in_rss R, E has_type T, T label "musical artist"
All living office holder persons in the news:
DISTINCT Any E WHERE E appears_in_rss R, E has_type T, T label "office holder", E has_subject C, C label "Living people"
All news that talk about Barack Obama and any scientist:
DISTINCT Any R WHERE E1 label "Barack Obama", E1 appears_in_rss R, E2 appears_in_rss R, E2 has_type T, T label "scientist"
All news that talk about a drug:
Any X, R WHERE X appears_in_rss R, X has_type T, T label "drug"
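To make the join logic of these queries concrete, here is a minimal sketch in plain Python of what the first and third queries compute. It is not the application's code and does not use CubicWeb: the relations are modelled as sets of tuples, type labels are stored directly instead of going through a separate type entity, and every entity name, feed identifier and function name below is made up for illustration.

```python
# Toy, hypothetical data: these entities and feed items are invented
# and do not come from the original application.
appears_in_rss = {
    ("artist_a", "rss_1"),
    ("scientist_b", "rss_1"),
    ("scientist_b", "rss_2"),
    ("politician_c", "rss_2"),
    ("artist_d", "rss_3"),
}

# Simplification: the RQL queries go through "E has_type T, T label ...";
# here the label is attached to the entity directly.
has_type = {
    ("artist_a", "musical artist"),
    ("artist_d", "musical artist"),
    ("scientist_b", "scientist"),
    ("politician_c", "office holder"),
}


def entities_of_type_in_news(type_label):
    """Rough equivalent of:
    DISTINCT Any E, R WHERE E appears_in_rss R, E has_type T, T label <type_label>
    """
    typed = {entity for (entity, label) in has_type if label == type_label}
    # Using a set gives the DISTINCT behaviour for free.
    return {(entity, rss) for (entity, rss) in appears_in_rss if entity in typed}


def news_mentioning_entity_and_type(entity_label, type_label):
    """Rough equivalent of:
    DISTINCT Any R WHERE E1 label <entity_label>, E1 appears_in_rss R,
                         E2 appears_in_rss R, E2 has_type T, T label <type_label>
    """
    with_entity = {rss for (entity, rss) in appears_in_rss if entity == entity_label}
    with_type = {rss for (_, rss) in entities_of_type_in_news(type_label)}
    # Both conditions share the variable R, which amounts to an intersection.
    return with_entity & with_type


if __name__ == "__main__":
    print(sorted(entities_of_type_in_news("musical artist")))
    print(sorted(news_mentioning_entity_and_type("politician_c", "scientist")))
```

The point of the sketch is that a variable shared between two clauses (such as R above) acts as a join condition, which is why the third query reduces to a set intersection.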
Such a tool may be used for informetrics and news analysis. Feel free to download the complete slides of the presentation.
We are planning a one-day coding sprint on scikits.learn on April 1st.
Venues, or remote participation on IRC, are more than welcome!
More information can be found on the wiki:
https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events