Problem description
Java Parallel File Processing
I have the following code:
import java.io.*;
import java.util.concurrent.*;

public class Example {
    public static void main(String[] args) {
        try {
            // Create two sample files
            FileOutputStream fos = new FileOutputStream("1.dat");
            DataOutputStream dos = new DataOutputStream(fos);
            for (int i = 0; i < 200000; i++) {
                dos.writeInt(i);
            }
            dos.close();
            FileOutputStream fos1 = new FileOutputStream("2.dat");
            DataOutputStream dos1 = new DataOutputStream(fos1);
            for (int i = 200000; i < 400000; i++) {
                dos1.writeInt(i);
            }
            dos1.close();

            Exampless.createArray(200000); // Create a shared array
            Exampless ex1 = new Exampless("1.dat");
            Exampless ex2 = new Exampless("2.dat");

            // Run in parallel to count the number of matches in the two files
            ExecutorService executor = Executors.newFixedThreadPool(2);
            long startTime = System.nanoTime();
            long endTime;
            Future<Integer> future1 = executor.submit(ex1);
            Future<Integer> future2 = executor.submit(ex2);
            int count1 = future1.get();
            int count2 = future2.get();
            endTime = System.nanoTime();
            long duration = endTime - startTime;
            System.out.println("duration with threads:" + duration);
            executor.shutdown();
            System.out.println("Matches: " + (count1 + count2));

            // Run the same two tasks one after another on the main thread
            startTime = System.nanoTime();
            ex1.call();
            ex2.call();
            endTime = System.nanoTime();
            duration = endTime - startTime;
            System.out.println("duration without threads:" + duration);
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
class Exampless implements Callable<Integer> {
    public static int[] arr = new int[20000]; // shared array
    public String _name;

    public Exampless(String name) {
        this._name = name;
    }

    // Fill the shared array with z, z + 1, ..., z + 19999
    static void createArray(int z) {
        for (int i = z; i < z + 20000; i++) {
            arr[i - z] = i;
        }
    }

    // Read the file and count the values that match the shared array
    public Integer call() {
        int cnt = 0;
        // try-with-resources closes the stream even if reading fails
        try (DataInputStream din = new DataInputStream(new FileInputStream(_name))) {
            for (int i = 0; i < 20000; i++) {
                int c = din.readInt();
                if (c == arr[i]) {
                    cnt++;
                }
            }
            return cnt;
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
        return -1;
    }
}
I am trying to count the number of matches between an array and the contents of two files. Although I run the work on two threads, the code does not perform well, because:
(time to read file 1 and then file 2 on a single thread) < (time to read file 1 and file 2 in parallel on two threads).
Can anyone help me solve this? (I have a 2-core CPU and the file size is approx. 1.5 GB.)
Reference solutions
Method 1:
In the first case you are reading one file sequentially, byte by byte, block by block. This is as fast as disk I/O can be, provided the file is not heavily fragmented. When you are done with the first file, the disk/OS finds the beginning of the second file and continues with very efficient, linear reading of the disk.
In the second case you are constantly switching between the first and the second file, forcing the disk to seek from one place to another. This extra seek time (approximately 10 ms per seek) is the root of your confusion.
Also note that access to a single disk is effectively serialized and your task is I/O bound, so splitting it across multiple threads cannot help as long as you are reading from the same physical disk. Your approach could only be justified if:
- each thread, besides reading from a file, was also performing some CPU-intensive or blocking operations that are slower than the I/O by an order of magnitude;
- the files are on different physical drives (a different partition is not enough) or on some RAID configurations;
- you are using an SSD.
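As an illustration of the first condition, here is a minimal sketch in which a single thread performs every disk read, so the reads stay sequential, while a separate pool handles the CPU-heavy part. The class name SingleReaderPipeline and the expensiveChecksum method are hypothetical stand-ins, not part of the original question, and Files.readAllBytes is only reasonable for files that fit comfortably in memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SingleReaderPipeline {

    // Hypothetical stand-in for CPU-intensive work done on each file's bytes.
    static long expensiveChecksum(byte[] data) {
        long h = 17;
        for (int round = 0; round < 50; round++) {
            for (byte b : data) {
                h = h * 31 + b;
            }
        }
        return h;
    }

    // Wraps the checked IOException so the method can be used in a lambda.
    static byte[] readAll(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<Long> process(List<Path> files) {
        // One I/O thread: files are read one after another, never interleaved.
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        // CPU work for file N can overlap with the read of file N + 1.
        ExecutorService cpuPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<CompletableFuture<Long>> pending = new ArrayList<>();
            for (Path file : files) {
                CompletableFuture<Long> result = CompletableFuture
                        .supplyAsync(() -> readAll(file), ioThread)
                        .thenApplyAsync(data -> expensiveChecksum(data), cpuPool);
                pending.add(result);
            }
            List<Long> results = new ArrayList<>();
            for (CompletableFuture<Long> f : pending) {
                results.add(f.join()); // results keep the order of the input list
            }
            return results;
        } finally {
            ioThread.shutdown();
            cpuPool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        System.out.println(process(files));
    }
}
```

If the per-file work is as cheap as comparing ints, this structure buys nothing over a plain sequential loop; it only pays off when the processing step dominates the read.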
Method 2:
As Tomasz pointed out, you will not get any benefit from multithreading the disk reads. You may get some speedup if you multithread the checks instead, i.e. load the data from the files into arrays sequentially and then let the threads run the comparisons in parallel. But considering the small size of your files (~80 KB) and the fact that you are just comparing ints, I doubt the performance improvement will be worth the effort.
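The "load sequentially, then check in parallel" idea could look roughly like the sketch below. The class name LoadThenCheck and its helper methods are made up for illustration; the file format is assumed to be the one from the question (ints written with DataOutputStream.writeInt).

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LoadThenCheck {

    // Reads the first `count` ints of a file on the calling thread.
    static int[] load(String fileName, int count) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(fileName)))) {
            int[] values = new int[count];
            for (int i = 0; i < count; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }

    // Pure CPU work: counts positions where both arrays hold the same value.
    static int countMatches(int[] data, int[] reference) {
        int n = Math.min(data.length, reference.length);
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] == reference[i]) {
                matches++;
            }
        }
        return matches;
    }

    static int countAll(List<String> fileNames, int[] reference) throws Exception {
        // Phase 1: a single thread reads the files one after another.
        List<int[]> loaded = new ArrayList<>();
        for (String name : fileNames) {
            loaded.add(load(name, reference.length));
        }
        // Phase 2: the in-memory comparisons run in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int[] data : loaded) {
                Callable<Integer> task = () -> countMatches(data, reference);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> r : results) {
                total += r.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Same reference values as Exampless.createArray(200000) in the question.
        int[] reference = new int[20000];
        for (int i = 0; i < reference.length; i++) {
            reference[i] = 200000 + i;
        }
        System.out.println("Matches: " + countAll(Arrays.asList(args), reference));
    }
}
```

Running it with the question's 1.dat and 2.dat as arguments counts matches against the same shared values. With data this small, the thread pool overhead may well outweigh the parallel comparison.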
Something that will definitely improve your execution speed is to avoid readInt(). Since you know you are comparing 20000 ints, read all 20000 ints into an array at once for each file (or at least in blocks), rather than calling readInt() 20000 times.
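A minimal sketch of such a bulk read, assuming the file was written with DataOutputStream.writeInt (big-endian, 4 bytes per int). The class name BulkIntReader, the method readInts and the demo file name are invented for this example. It fetches all bytes with one readFully() call and decodes them through a ByteBuffer:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class BulkIntReader {

    // Reads `count` ints with one bulk read instead of `count` readInt() calls.
    static int[] readInts(String fileName, int count) throws IOException {
        byte[] raw = new byte[count * Integer.BYTES];
        try (DataInputStream in = new DataInputStream(new FileInputStream(fileName))) {
            // Fills the whole array; throws EOFException if the file is too short.
            in.readFully(raw);
        }
        int[] values = new int[count];
        // ByteBuffer is big-endian by default, matching DataOutputStream.writeInt.
        ByteBuffer.wrap(raw).asIntBuffer().get(values);
        return values;
    }

    public static void main(String[] args) throws IOException {
        String fileName = "bulk-demo.dat"; // hypothetical demo file
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(fileName))) {
            for (int i = 0; i < 10; i++) {
                out.writeInt(i);
            }
        }
        int[] values = readInts(fileName, 10);
        System.out.println("Read " + values.length + " ints, last = " + values[values.length - 1]);
    }
}
```

For files in the gigabyte range, the same decoding step would be applied to a fixed-size buffer that is refilled in a loop, rather than to one array holding the whole file.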
(by Arpssss, Tomasz Nurkiewicz, onit)