LinkLog: Python and Data Handling

Pipes and filters is a familiar pattern for people who manage data, and Yahoo Pipes popularized it. I have always wanted a programmable version of pipes and filters, and felt that a mini-language would help a lot.

Guess what? Today my Infostream alerts turned up two packages for creating pipes and filters: FilterPype and Joblib.

Pypes and Filters is a framework for working with data. The purpose of Pypes and Filters is to make it easy to manipulate streams of data by “filtering” the data through Filters that in turn form a Pipeline, or Pype.

Here is how FilterPype's Introduction page describes it.

FilterPype is being used for multi-level data analysis, but could be applied to many other areas where it is difficult to split up a system into small independent parts.

Some of its features:

  • Advanced algorithms broken down into simple data filter coroutines
  • Pipelines constructed from filters in the new FilterPype mini-language
  • Domain experts assemble pipelines with no Python knowledge required
  • Sub-pipelines and filters linked by automatic pipeline construction
  • All standard operations available: branching, joining and looping
  • Recursive coroutine pipes allowing calculation of e.g. factorials
  • Using it is like writing a synchronous multi-threaded program
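
To make the "filter coroutine" idea concrete, here is a minimal sketch using plain Python generators. It does not use FilterPype's API or its mini-language, which I have not tried yet; every name in it (coroutine, strip_blank, to_upper, collect, run_pipeline) is made up for illustration. Each filter receives an item, does one small job, and sends the result to the next filter.

```python
def coroutine(func):
    """Prime a generator-based coroutine so it is ready to receive values."""
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first `yield`
        return gen
    return start


@coroutine
def strip_blank(target):
    """Filter: drop empty or whitespace-only lines, pass the rest on."""
    while True:
        line = yield
        if line.strip():
            target.send(line)


@coroutine
def to_upper(target):
    """Filter: upper-case each line before passing it on."""
    while True:
        line = yield
        target.send(line.upper())


@coroutine
def collect(results):
    """Sink: append every item that reaches the end of the pipeline."""
    while True:
        item = yield
        results.append(item)


def run_pipeline(lines):
    """Wire the filters together and push each line through them."""
    results = []
    pipeline = strip_blank(to_upper(collect(results)))
    for line in lines:
        pipeline.send(line)
    return results


print(run_pipeline(["alpha", "", "   ", "beta"]))  # ['ALPHA', 'BETA']
```

Because each filter only knows about its target, filters can be reordered or swapped without touching the others, which is presumably what a pipeline mini-language builds on.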

Joblib is a set of tools to provide lightweight pipelining in Python. In particular, joblib offers:

  • transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
  • easy simple parallel computing
  • logging and tracing of the execution
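
A small sketch of the first two points, assuming joblib is installed. Memory, Parallel and delayed are joblib's own names; the functions slow_square and cube, the calls list and the temporary cache directory are my own placeholders, not anything from the joblib docs.

```python
import tempfile

from joblib import Memory, Parallel, delayed

# Disk cache in a throwaway directory (placeholder location for this sketch).
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

calls = []  # records each real execution of slow_square


def slow_square(x):
    """Stand-in for an expensive computation."""
    calls.append(x)
    return x * x


# Wrap the function so results are stored on disk and reused.
cached_square = memory.cache(slow_square)

first = cached_square(3)   # computed and written to the cache
second = cached_square(3)  # same argument: loaded from the cache, not recomputed


def cube(x):
    return x ** 3


# Run cube over several inputs with two workers; results keep input order.
# prefer="threads" keeps the sketch lightweight; drop it for CPU-bound work.
cubes = Parallel(n_jobs=2, prefer="threads")(delayed(cube)(i) for i in range(5))

print(first, second, calls, cubes)
```

For the third point, both Memory and Parallel accept a verbose argument that controls how much they report while running.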

I'm planning to give both a try. Have you used either of them?