I recently had to (remotely) debug an issue on Windows involving
PostgreSQL and PL/Python. The setup: two very similar computers, both with
Python 2.5 installed via Python(x,y) and PostgreSQL 8.3.8 installed via
the binary installer. On the first machine, create language
plpythonu; worked like a charm; on the other one, it failed with
C:\\Program Files\\Postgresql\\8.3\\plpython.dll: specified module
could not be found. This error is caused by the dynamic linker not finding
some DLL. Depends.exe showed that
plpython.dll looks for python25.dll (the one it was built
against in the 8.3.8 installer), and that DLL was present on the machine.
I'll spare you the various things we tried and jump directly to the
solution. After much head scratching, it turned out that the first
computer had TortoiseHg installed. This caused C:\\Program
Files\\TortoiseHg to be included in the system PATH environment
variable, and that directory contains python25.dll. C:\\Python25, on the
other hand, was only in the user's PATH environment variable on both
computers. As the database Windows service runs under a dedicated
local account (typically with login postgres), it does not have
C:\\Python25 in its PATH, but where TortoiseHg was installed, it
found the DLL in that other directory. So the solution was to add
C:\\Python25 to the system PATH.
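To check this kind of problem quickly, a small script can list which directories of a given PATH value actually contain the DLL. The sketch below is only an illustration: the helper name find_on_path is made up for this post, and it only looks at PATH, whereas the real Windows DLL search also considers other locations (the application directory, the system directories). The important point is to feed it the PATH seen by the service account, not the one of your own session.

```python
import os


def find_on_path(filename, path_value):
    """Return the directories listed in path_value that contain filename.

    path_value is a PATH-style string (entries separated by os.pathsep).
    Hypothetical helper: it mimics only the PATH part of the DLL lookup.
    """
    hits = []
    for directory in path_value.split(os.pathsep):
        # PATH entries are sometimes quoted or padded with spaces.
        directory = directory.strip().strip('"')
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Uses the PATH of the current process; run it as the service
    # account to see what the database server would find.
    for hit in find_on_path("python25.dll", os.environ.get("PATH", "")):
        print(hit)
```

An empty result when run under the service account, together with a non-empty one under your own login, points to the user PATH versus system PATH difference described above.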
During EuroSciPy, the Logilab team presented an original approach for querying news using semantic information: "Rss feeds aggregator based on Scikits.learn and CubicWeb" by Vincent Michel.
This work is based on two major pieces of software:
- CubicWeb, the pythonic semantic web framework, is used to store and query DBpedia information. CubicWeb is able to reconstruct links from rdf/nt files, and can easily execute complex queries in a database with more than 8 million entities and 75 million links when using a PostgreSQL backend.
- Scikits.learn is a cutting-edge Python toolbox for machine learning. It provides algorithms that are simple and easy to use.
Based on these tools, we built a pure Python application to query the news:
- Named Entities are extracted from RSS articles of a few mainstream English-language newspapers (New York Times, Reuters, BBC News, etc.). For each group of words in an article, we check if a DBpedia entry has the same label. If so, we create a semantic link between the article and the DBpedia entry.
- An occurrence matrix of "RSS Articles" times "Named Entities" is constructed and may be fed to several machine learning algorithms (MeanShift algorithm, Hierarchical Clustering) in order to provide original and informative views of recent events.
Moreover, queries may be used jointly with semantic information from DBpedia:
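The two steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the code of the presented application: the function names, the max_words limit and the in-memory set of labels (standing in for a lookup in the DBpedia data stored in CubicWeb) are all made up for the example.

```python
def extract_entities(text, labels, max_words=3):
    """Return the known labels matched by a group of up to max_words words."""
    words = text.split()
    found = set()
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            # Join consecutive words and drop surrounding punctuation.
            candidate = " ".join(words[start:start + size]).strip(".,;:!?\"'")
            if candidate in labels:
                found.add(candidate)
    return found


def occurrence_matrix(articles, labels):
    """Build a binary "articles" times "named entities" matrix.

    Returns the sorted entity list (column order) and one row per article.
    """
    entities = sorted(labels)
    column = {entity: index for index, entity in enumerate(entities)}
    matrix = []
    for text in articles:
        row = [0] * len(entities)
        for entity in extract_entities(text, labels):
            row[column[entity]] = 1
        matrix.append(row)
    return entities, matrix


if __name__ == "__main__":
    # Hypothetical labels and articles, for illustration only.
    labels = {"Barack Obama", "London"}
    articles = ["Barack Obama visited London.", "Rain in London today"]
    entities, matrix = occurrence_matrix(articles, labels)
    print(entities)
    for row in matrix:
        print(row)
```

Each row of the resulting matrix describes one article, so it can be handed as-is to a clustering algorithm such as the ones mentioned above.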
All musical artists in the news:
DISTINCT Any E, R WHERE E appears_in_rss R, E has_type T, T label "musical artist"
All living office holder persons in the news:
DISTINCT Any E WHERE E appears_in_rss R, E has_type T, T label "office holder", E has_subject C, C label "Living people"
All news that talk about Barack Obama and any scientist:
DISTINCT Any R WHERE E1 label "Barack Obama", E1 appears_in_rss R, E2 appears_in_rss R, E2 has_type T, T label "scientist"
All news that talk about a drug:
Any X, R WHERE X appears_in_rss R, X has_type T, T label "drug"
Such a tool may be used for informetrics and news analysis.
Feel free to download the complete slides of the presentation.
I was in Brussels for FOSDEM 2013. As with previous FOSDEMs, there were too many
interesting talks and people to see. Here is a summary of what I saw:
In the Mozilla room:
- The HTML5 PDF viewer pdfjs is impressive. The PDF specification is really
scary but this full-featured "native" viewer is able to render most of it
with very good performance. Have a look at the pdfjs demo!
- Firefox debug tools overview, with a specific focus on the Firefox OS emulator in
your browser.
- Introduction to webl10n: an internationalization format and library used in
Firefox OS. A successful mix that results in a format that is idiot-proof
enough for a duck to use, that relies on Unicode specifications to handle
complex pluralization rules and that allows cascading translation
definitions.
- Status of HTML5 video and audio support in Firefox. The topic looks like a
real headache but the team seems to be doing really well. Special mention
for the reverse demo effect: the speaker expected some formats to still be
unsupported, but someone else had apparently implemented them overnight.
- Last but not least, I gave a talk about the changeset evolution concept that
I'm putting in Mercurial. Thanks go to Feth for asking me his
not-scripted-at-all questions during this talk. (slides)
In the PostgreSQL room:
- Insightful talk about adding more event triggers to the PostgreSQL engine and how this may
become the perfect way to break your system.
- Full update on the capabilities of PostGIS 2.0. The PostGIS suite was already
impressive for storing and querying 2D data, but it now has impressive
capabilities regarding 3D data.
On Python-related topics:
- Victor Stinner has started two interesting projects to improve CPython
performance. The first one, astoptimizer, breaks some of the language
semantics to apply optimisations when compiling to byte code (lookup caching,
constant folding,…). The other, registervm, is a full redefinition of how the interpreter
handles references in byte code.
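To give an idea of what constant folding at the AST level means, here is a minimal sketch using the standard ast module. It is not astoptimizer's implementation: the ConstantFolder class and fold_constants function are names invented for this example, and only +, - and * on numeric literals are handled.

```python
import ast
import operator

# Operators this sketch knows how to fold; anything else is left untouched.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace binary operations on numeric literals with their result."""

    def visit_BinOp(self, node):
        # Fold the children first so nested expressions collapse bottom-up.
        self.generic_visit(node)
        func = _FOLDABLE.get(type(node.op))
        if (
            func is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=func(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source):
    """Parse an expression, fold its constants and return the new AST."""
    tree = ast.parse(source, mode="eval")
    tree = ConstantFolder().visit(tree)
    return ast.fix_missing_locations(tree)


if __name__ == "__main__":
    tree = fold_constants("x + 2 * 3")
    # The right operand is now a single constant instead of a multiplication.
    print(ast.dump(tree.body))
    print(eval(compile(tree, "<folded>", "eval"), {"x": 1}))
```

Folding literals like this is safe; the optimisations that break language semantics are the ones that also assume names such as builtins are never rebound.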
After FOSDEM, I crossed the Channel to attend a Mercurial sprint in London.
Expect more on this topic soon.