Scheduling is the art of deciding when to run your scraper. There are two schedulers available by default: SimpleScheduler and QuartzScheduler.
The SimpleScheduler is best described as the 'run it now' scheduler. When you run the OSCube Engine, it will immediately run the scraper once, and never again.
Xxx.scheduler=Simple
SimpleScheduler is the default option, so it does not need to be specified explicitly. It is most commonly used for testing, or when an external cron job triggers the runs. (TODO: There will be a testing mode for OSCube at a later date)
The QuartzScheduler lets you specify when you'd like the scraper to run: on a schedule written in the de facto standard UNIX cron syntax, only on startup (as with the SimpleScheduler), or on a repeating basis.
Xxx.scheduler=Quartz
Xxx.schedule=simple
Xxx.simple.interval=<integer milliseconds>
Xxx.simple.repeat=<integer number of times to repeat>
Confusingly, the schedule in this case is called simple (the name comes from the Quartz SimpleTrigger). The interval is the number of milliseconds between scrapes, and the optional repeat is the number of times to perform the scrape. If repeat is unspecified, the scrape repeats forever.
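As a concrete illustration of the interval arithmetic, a scraper that runs every 10 minutes could be configured roughly as follows. The values are made up for this example, and Xxx is the same placeholder prefix used above.

```properties
# Illustrative values only; Xxx is a placeholder prefix.
Xxx.scheduler=Quartz
Xxx.schedule=simple
# 10 minutes = 10 * 60 * 1000 milliseconds
Xxx.simple.interval=600000
# Arbitrary example value; omit this line to repeat forever
Xxx.simple.repeat=6
```

Whether the first run counts toward repeat is not stated here. In Quartz's own SimpleTrigger the repeat count excludes the initial firing, so it is worth checking the actual behaviour before relying on an exact total number of scrapes.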
Xxx.scheduler=Quartz
Xxx.schedule=cron
Xxx.cron=<cron string>
This uses Quartz's implementation of the UNIX cron system. Their documentation at http://www.opensymphony.com/quartz/tutorial.html#cronTriggers is probably the best place to read up on it. Scraping every 5 minutes would be:
Xxx.scheduler=Quartz
Xxx.schedule=cron
Xxx.cron=0 0/5 * * * ?
NOTE: Quartz adds a field to the front of the classic cron expression to represent seconds, and uses ? to leave one of the two day fields unspecified. In the usual five-field cron syntax, the example above would have been just */5 * * * *.
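A couple of further cron strings may help when reading the Quartz field order (seconds, minutes, hours, day of month, month, day of week). These are illustrative schedules chosen for this example, not ones taken from OSCube itself, and Xxx is again the placeholder prefix.

```properties
# Illustrative schedules only; Xxx is a placeholder prefix.
Xxx.scheduler=Quartz
Xxx.schedule=cron
# Once a day at 02:00:00
Xxx.cron=0 0 2 * * ?
# Alternative: every 30 minutes during the hours 08 to 17, Monday to Friday
# Xxx.cron=0 0/30 8-17 ? * MON-FRI
```

Only one Xxx.cron value is active at a time, so the second schedule is left commented out as an alternative.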