English Abstract
Due to recent trends in computer hardware, especially processors, parallel query execution is becoming increasingly important in modern XML databases. In addition, given the growing amount of data and the demand for fast query processing, making database operations efficient remains a challenging task. This thesis presents an approach for the efficient execution of XQuery queries in the PACD environment, based on a set of parallel execution methods.
This work considers three methods: Simultaneous Multiple-Branch Execution, Partitioned Single-Path Execution, and Hybrid Multiple-Branch Execution. The Simultaneous Multiple-Branch Execution method divides the branches of a query among processes to reduce the waiting time incurred by executing one branch after another. The Partitioned Single-Path Execution method applies to queries that consist of a single path, that is, a single sequence of SQL statements. It divides the states of the query among processes, which execute their states in parallel; the partial results are joined at the end of the query to produce the final result. The Hybrid Multiple-Branch Execution method combines the first two methods for queries with long branches. Because such queries contain branches, the Simultaneous Multiple-Branch Execution method is applied to them first. In addition, each branch is handled by more than one process, and these processes apply the Partitioned Single-Path Execution method within that branch. This reduces the workload of every branch by dividing its states among processes.
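The three methods described above can be illustrated with a minimal Python sketch. It is not the PACD implementation. All names (`partition`, `run_single_path`, `run_multi_branch`, `execute_state`) are hypothetical, threads stand in for the processes, and the sketch assumes that the states of a path can be evaluated independently and joined in order afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(states, n_workers):
    """Split states into contiguous chunks of near-equal size (by state count only)."""
    n_workers = max(1, min(n_workers, len(states)))
    size, extra = divmod(len(states), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(states[start:end])
        start = end
    return chunks


def run_chunk(chunk, execute_state):
    """Execute the states assigned to one worker, in order."""
    return [execute_state(state) for state in chunk]


def run_single_path(states, execute_state, n_workers):
    """Partitioned single-path execution: chunks run in parallel, results joined at the end."""
    chunks = partition(states, n_workers)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(lambda c: run_chunk(c, execute_state), chunks))
    # Join step: concatenate partial results in the original state order.
    return [result for part in partials for result in part]


def run_multi_branch(branches, execute_state, workers_per_branch=1):
    """Run all branches simultaneously.

    workers_per_branch == 1 corresponds to simultaneous multiple-branch execution;
    workers_per_branch > 1 additionally partitions each branch (the hybrid variant).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
        futures = [
            pool.submit(run_single_path, branch, execute_state, workers_per_branch)
            for branch in branches
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Hypothetical stand-in for executing one SQL statement (state).
    def fake_execute(state):
        return state.upper()

    print(run_single_path(["s1", "s2", "s3", "s4", "s5"], fake_execute, 2))
    print(run_multi_branch([["a", "b", "c"], ["d"]], fake_execute, 2))
```

In this sketch the split is based only on the number of states, so a chunk that contains expensive states can still dominate the running time.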
To test these methods, an experiment was conducted on three categories of databases (wide databases, e.g., DBLP; deep databases, e.g., TreeBank; and average databases, e.g., XMark) to measure the effect of database characteristics on execution time. The experiment used the same query types that were previously evaluated in the PACD environment, so that the effect of the proposed methods on these queries and on the overall performance of PACD could be compared. Applying the proposed methods to different query types over different datasets improved the execution time and, as a result, the overall performance of PACD. In future work, the query types that showed unexpected outcomes can be investigated further to obtain better results. In addition, factors such as memory consumption, CPU usage, and I/O workload should be taken into account. Finally, the states were distributed among processes based only on the number of states in each query; this distribution can be refined to take the workload of each state into account as a query optimization measure.