In the wdqs code we consume the RDF update stream from Kafka using the KafkaStreamConsumer class. A similar implementation should be written to work on top of HTTP EventStreams.
The features it must provide are (a rough sketch follows the list):
- implement the StreamConsumer interface
- offset handling and persistence (which Kafka provides out of the box when consuming from it directly)
- it knows what to do on the first run (e.g. infer the initial offset from the triple store itself by running `SELECT (MIN(?date) AS ?start) { wikibase:Dump schema:dateModified ?date } LIMIT 1`)
- it knows how to resume operations
- Adapt or add a new main to run it based on a set of parameters
- Use the same batching/compression technique (see PatchAccumulator)
- ideally populate the same set of metrics
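Below is a minimal, illustrative sketch of what such a consumer could look like, built only on the JDK HTTP client. The class and method names (HttpEventStreamConsumer, inferInitialStart, handleEvent), the use of the `since` parameter and `Last-Event-ID` header, and the crude JSON handling are assumptions for illustration; the real implementation would implement the existing StreamConsumer interface and hand events to PatchAccumulator.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative sketch of an HTTP EventStreams based consumer.
 * The real implementation would implement StreamConsumer and feed
 * events into PatchAccumulator; names and URLs here are assumptions.
 */
public class HttpEventStreamConsumer implements AutoCloseable {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI streamUri;       // EventStreams endpoint exposing the RDF update stream
    private final URI sparqlEndpoint;  // the triple store being updated, used to infer the initial offset
    private String lastEventId;        // persisted between runs, replaces Kafka's committed offsets

    public HttpEventStreamConsumer(URI streamUri, URI sparqlEndpoint, String persistedLastEventId) {
        this.streamUri = streamUri;
        this.sparqlEndpoint = sparqlEndpoint;
        this.lastEventId = persistedLastEventId;
    }

    /** First run: ask the triple store when the loaded dump was produced and start from there. */
    public Instant inferInitialStart() throws Exception {
        String query = "SELECT (MIN(?date) AS ?start) { wikibase:Dump schema:dateModified ?date } LIMIT 1";
        HttpRequest req = HttpRequest.newBuilder(URI.create(sparqlEndpoint
                + "?format=json&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)))
                .GET().build();
        String body = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        // Crude extraction of the xsd:dateTime literal; a real implementation would parse the JSON properly.
        Matcher m = Pattern.compile("\"value\"\\s*:\\s*\"([^\"]+)\"").matcher(body);
        if (!m.find()) {
            throw new IllegalStateException("Cannot infer initial offset from the triple store");
        }
        return Instant.parse(m.group(1));
    }

    /** Consume the SSE stream, resuming via Last-Event-ID or the 'since' timestamp. */
    public void run(Instant since) throws Exception {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(streamUri + "?since=" + since))
                .header("Accept", "text/event-stream");
        if (lastEventId != null) {
            builder.header("Last-Event-ID", lastEventId);
        }
        HttpResponse<java.io.InputStream> resp =
                client.send(builder.GET().build(), HttpResponse.BodyHandlers.ofInputStream());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(resp.body(), StandardCharsets.UTF_8))) {
            StringBuilder data = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("id: ")) {
                    lastEventId = line.substring(4);     // persist this to resume after a restart
                } else if (line.startsWith("data: ")) {
                    data.append(line, 6, line.length());
                } else if (line.isEmpty() && data.length() > 0) {
                    handleEvent(data.toString());        // one complete SSE event
                    data.setLength(0);
                }
            }
        }
    }

    private void handleEvent(String json) {
        // Here the event would be deserialized and handed to something like PatchAccumulator
        // so that batching/compression works the same way as with KafkaStreamConsumer.
    }

    @Override
    public void close() { /* flush the pending batch, persist lastEventId */ }
}
```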
AC:
- a triple store compatible with SPARQL 1.1 Update operations and loaded with a munged Wikidata dump can be updated outside of the WMF infrastructure using HTTP EventStreams.
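For illustration only, a hypothetical main showing how the sketch above could be wired up and run against a local triple store outside the WMF infrastructure; the parameter layout and the example URLs are assumptions, not the actual entry point or flags of the new/adapted main mentioned above:

```java
import java.net.URI;
import java.time.Instant;

public class HttpEventStreamUpdaterMain {
    public static void main(String[] args) throws Exception {
        // Hypothetical parameters: args[0] = SPARQL endpoint of the local triple store,
        // args[1] = HTTP EventStreams URL of the RDF update stream (exact stream name is an assumption),
        // args[2] = optional Last-Event-ID persisted by a previous run.
        URI sparqlEndpoint = URI.create(args[0]);   // e.g. http://localhost:9999/bigdata/namespace/wdq/sparql
        URI streamUri = URI.create(args[1]);
        String lastEventId = args.length > 2 ? args[2] : null;

        try (HttpEventStreamConsumer consumer =
                 new HttpEventStreamConsumer(streamUri, sparqlEndpoint, lastEventId)) {
            Instant start;
            if (lastEventId == null) {
                // First run: no persisted position, infer the start from the dump's dateModified.
                start = consumer.inferInitialStart();
            } else {
                // Resuming: the persisted Last-Event-ID is sent by the consumer; 'since' is only a fallback here.
                start = Instant.EPOCH;
            }
            consumer.run(start);
        }
    }
}
```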