[go: up one dir, main page]

Parallelisation for data-intensive applications over peer-to-peer networks

[thumbnail of WRAP_THESIS_Chen_2009.pdf] PDF
WRAP_THESIS_Chen_2009.pdf - Requires a PDF viewer.

Download (1MB)

Request Changes to record.

Abstract

In Data Intensive Computing, properties of the data that are the input for
an application decide running performance in most cases. Those properties include
the size of the data, the relationships inside data, and so forth. There is a
class of data intensive applications (BLAST, SETI@home, Folding@Home and
so on so forth) whose performances solely depend on the amount of input data.
Another important characteristic of those applications is that the input data can be
split into units and these units are not related to each other during the runs of the
applications. This characteristic helps this class of data intensive applications to
be parallelised in the way where the input data is split into units and application
runs on different computer nodes for certain portion of the units. SETI@home and
Folding@Home have been successfully parallelised over peer-to-peer networks.
However, they suffer from the problems of single point of failure and poor scalability.
In order to solve these problems, we choose BLAST as our example data
intensive applications and parallelise BLAST over a fully distributed peer-to-peer
network.
BLAST is a popular bioinformatics toolset which can be used to compare
two DNA sequences. The major usage of BLAST is searching a query of sequences
inside a database for their similarities so as to identify whether they are
new. When comparing single pair of sequences, BLAST is efficient. However,
due to growing size of the databases, executing BLAST jobs locally produces
prohibitively poor performance. Thus, methods for parallelising BLAST are
sought.
Traditional BLAST parallelisation approaches are all based on clusters.
Clusters employ a number of computing nodes and high bandwidth interlinks between
nodes. Cluster-based BLAST exhibits higher performance; nevertheless,
clusters suffer from limited resources and scalability problems. Clusters are expensive, prohibitively so when the growth of the sequence database are taken into
account. It involves high cost and complication when increasing the number of
nodes to adapt to the growth of BLAST databases. Hence a Peer-to-Peer-based
BLAST service is required.
This thesis demonstrates our parallelisation of BLAST over Peer-to-Peer
networks (termed ppBLAST), which utilises the free storage and computing resources
in the Peer-to-Peer networks to complete BLAST jobs in parallel. In order
to achieve the goal, we build three layers in ppBLAST each of which is responsible
for particular functions. The bottom layer is a DHT infrastructure with the
support of range queries. It provides efficient range-based lookup service and
storage for BLAST tasks. The middle layer is the BitTorrent-based database distribution.
The upper layer is the core of ppBLAST which schedules and dispatches
task to peers. For each layer, we conduct comprehensive research and the
achievements are presented in this thesis.
For the DHT layer, we design and implement our DAST-DHT. We analyse
balancing, maximum number of children and the accuracy of the range query.
We also compare the DAST with other range query methodology and state that if
the number of children is adjusted to more two, the performance of DAST overcomes
others. For the BitTorrent-like database distribution layer, we investigate
the relationship between the seeding strategies and the selfish leechers (freeriders
and exploiters). We conclude that OSS works better than TSS in a normal situation.

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): Parallel processing (Electronic computers), Peer-to-peer architecture (Computer networks), Program transformation (Computer programming)
Official Date: September 2009
Dates:
Date
Event
September 2009
Submitted
Institution: University of Warwick
Theses Department: Department of Computer Science
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Jarvis, Stephen
Extent: xii, 116 leaves : ill., charts
Language: eng
Persistent URL: https://wrap.warwick.ac.uk/3640/

Export / Share Citation


Request changes or add full text files to a record

Repository staff actions (login required)

View Item View Item