FOIwiki:ScraperSync: Difference between revisions
Revision as of 15:11, 9 April 2012
ScraperSync is a tool for maintaining an FOIwiki page as a mirror of a dataset from ScraperWiki. It attempts to preserve markup and changes in the page while propagating changes from the dataset.
Dataset requirements
The dataset must have a table called "swdata" containing a column called "name". The contents of this column supply the text of each entry in a bulleted list in the wiki page.
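As a rough illustration of the required shape, the following sketch builds a small in-memory SQLite database with a "swdata" table and a "name" column, then turns each name into a bulleted list entry. The sample names, the `read_names` helper, the use of SQLite, and the `* ` wiki bullet prefix are assumptions made for the example; they are not taken from ScraperSync itself.

```python
import sqlite3


def read_names(conn):
    """Return the values of the "name" column of the "swdata" table."""
    rows = conn.execute("SELECT name FROM swdata").fetchall()
    return [row[0] for row in rows]


# Build a tiny in-memory dataset with the required shape (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swdata (name TEXT)")
conn.executemany(
    "INSERT INTO swdata (name) VALUES (?)",
    [("Example Council",), ("Example Health Board",)],
)

# Each name becomes the text of one bulleted list entry.
bullets = ["* " + name for name in read_names(conn)]
print("\n".join(bullets))
```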
List requirements
The start of the maintained list is marked by a special comment like this:
<!-- ScraperSync start { "scraper": "thing" } -->
After the word "start" is a JSON object which configures ScraperSync. It can contain the following entries:
- scraper: names the scraper from which the data should be pulled.
- sort: if set to name, sorts list entries by name.
ScraperSync understands several different list formats, and tries to remove markup and notes to work out what public authority is named by the item.
The maintained list is ended by another special comment:
<!-- ScraperSync end -->
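The two marker comments can be located and the configuration read with a regular expression and a JSON parser. This is a minimal sketch of that idea, not ScraperSync's actual parser: the function name `find_maintained_list` and the regular expressions are hypothetical, and the pattern assumes a flat JSON object like the one shown above.

```python
import json
import re

# Assumed patterns for the two marker comments; the JSON object is expected
# to be flat (no nested braces), as in the example marker.
START_RE = re.compile(r"<!--\s*ScraperSync start\s*(\{.*?\})\s*-->", re.DOTALL)
END_RE = re.compile(r"<!--\s*ScraperSync end\s*-->")


def find_maintained_list(page_source):
    """Return (config, list_text) for the maintained list, or None if absent."""
    start = START_RE.search(page_source)
    if start is None:
        return None
    # Only look for the end marker after the start marker.
    end = END_RE.search(page_source, start.end())
    if end is None:
        return None
    config = json.loads(start.group(1))
    list_text = page_source[start.end():end.start()]
    return config, list_text


page = (
    "Intro\n"
    '<!-- ScraperSync start { "scraper": "thing", "sort": "name" } -->\n'
    "* A\n"
    "* B\n"
    "<!-- ScraperSync end -->\n"
    "Outro\n"
)
config, list_text = find_maintained_list(page)
print(config["scraper"], config.get("sort"))
```

Text outside the two markers is never touched by this sketch, which matches the idea that only the maintained list is rewritten.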
Output
ScraperSync's main output is a new version of the page source. It attempts to:
- cross out bodies that no longer appear in the scraper
- add bodies to the list that appear in the scraper but not in the page
- mark as "gone" any bodies with the defunct tag in WhatDoTheyKnow
- link bodies to WhatDoTheyKnow if they aren't already linked
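The first two steps above amount to a comparison between the names on the page and the names in the scraper. The sketch below shows one way that comparison could work, assuming the names have already been extracted from the list items. The function `sync_list`, the `<s>` strike-through markup, and the `* ` bullet prefix are assumptions for illustration; marking defunct bodies and linking to WhatDoTheyKnow are left out because they depend on lookups not described here.

```python
def sync_list(page_names, scraper_names, sort_by_name=False):
    """Return wiki list lines reconciling the page with the scraper.

    Names on the page but not in the scraper are crossed out; names in the
    scraper but not on the page are appended.
    """
    scraper_set = set(scraper_names)
    seen = set()
    entries = []
    # Keep existing page entries first, then add scraper-only entries,
    # skipping duplicates.
    for name in list(page_names) + list(scraper_names):
        if name not in seen:
            seen.add(name)
            entries.append(name)
    if sort_by_name:
        entries.sort(key=str.lower)

    lines = []
    for name in entries:
        if name in scraper_set:
            lines.append("* " + name)
        else:
            # Assumed markup for "crossed out".
            lines.append("* <s>" + name + "</s>")
    return lines


result = sync_list(
    ["Alpha Council", "Beta Board"],   # names currently on the page
    ["Beta Board", "Gamma Trust"],     # names currently in the scraper
)
print("\n".join(result))
```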
Creating a new page
ScraperSync has a mode for creating an entirely new page from a dataset. The easiest way to use it is to run ScraperSync with no options (http://scraperwikiviews.com/run/foiwiki_scrapersync/), choose "Create new", and enter the URL name of the scraper to use as the source.