Thursday, June 17, 2010

Digital Scanning: From Hard Copy to Soft Copy

Most papers and journals that we can gather are already in PDF or printed form. When studying these papers, we look at the graphs and analyze the trends. We run into a problem when we need the actual values plotted in a graph. This is where digital scanning and processing come in.

The idea of digital scanning and processing is that if the printed graph is accurately drawn to scale, we can find the values of the data points just by using ratio and proportion. We can relate the pixels of the scanned image to the scale of the actual graph, and so relate each pixel location to the physical quantity the graph represents.

Here we start by finding an old graph without the specific values of the data points. The graph I obtained comes from the Journal of Microbiology. By scanning this graph, we can now process the image to calculate the relationship of the pixel location to physical quantities.

Figure 1 Original Scanned Graph


Figure 2 Cropped Image of the Graph

First, we find the pixel positions of the first and last tick marks on the x and y axes. Using Paint, we find that the first tick mark on the x axis is at the 18th pixel and the last tick mark is at the 474th pixel. These correspond to the 23-year and 100-year tick marks, respectively. For the y axis, the first tick mark is at the 799th pixel, corresponding to an R value of -1, and the last tick mark is at the 0th pixel, corresponding to an R value of 6.

From here, we compute the units-per-pixel constant for each axis. The x axis spans 100 - 23 = 77 years and the y axis spans 6 - (-1) = 7 R values.

77 years / (474 pixels - 18 pixels) = 0.168859649 years / pixel (1)

7 R values / 799 pixels = 0.008760951 R values / pixel (2)
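The same two constants can be checked with a few lines of Python. This is only a minimal sketch: the helper name `units_per_pixel` and the variable names are my own choices for illustration, and the tick positions are the ones read off in Paint above.

```python
# Pixel positions of the first and last tick marks, as read off in Paint.
X_FIRST_PX, X_LAST_PX = 18, 474   # x axis: 23 years and 100 years
Y_BOTTOM_PX, Y_TOP_PX = 799, 0    # y axis: R = -1 and R = 6


def units_per_pixel(value_a, value_b, pixel_a, pixel_b):
    """Return the physical span between two ticks divided by their pixel span."""
    return abs(value_b - value_a) / abs(pixel_b - pixel_a)


years_per_pixel = units_per_pixel(23, 100, X_FIRST_PX, X_LAST_PX)  # equation 1
r_per_pixel = units_per_pixel(-1, 6, Y_BOTTOM_PX, Y_TOP_PX)        # equation 2

print(f"{years_per_pixel:.9f} years / pixel")
print(f"{r_per_pixel:.9f} R values / pixel")
```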

So, given a data point's x and y pixel coordinates, we can compute the year and R value it represents.

X-axis

(data point(x) - 18 pixels) * constant(x) + 23 years = # of years (3)

Y-axis

(799 pixels - data point(y)) * constant(y) - 1 R value = R value (4)

For the x axis, subtracting 18 pixels accounts for the offset of the first tick mark from the left edge of the image, and adding 23 years accounts for the starting year of the graph. For the y axis, pixel rows are counted from the top of the image downward, so the 0th pixel is at the top of the graph. By subtracting the data point's y pixel value from the pixel position of the bottom tick mark (799), we get the height of the data point above the base of the graph. Subtracting 1 R value accounts for the fact that the y axis starts at -1. The constants used are the values computed in equations 1 and 2.
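Equations 3 and 4 can be written as two small Python functions. This is a sketch under the calibration values given above; the function names `pixel_to_years` and `pixel_to_r` are hypothetical and are not part of any tool used in the activity.

```python
YEARS_PER_PIXEL = 77 / (474 - 18)  # equation 1
R_PER_PIXEL = 7 / 799              # equation 2


def pixel_to_years(x_px):
    """Equation 3: shift by the 18-pixel offset, scale, then add the 23-year start."""
    return (x_px - 18) * YEARS_PER_PIXEL + 23


def pixel_to_r(y_px):
    """Equation 4: measure height from the bottom tick (pixel 799), scale, then subtract 1."""
    return (799 - y_px) * R_PER_PIXEL - 1


# Sanity check: the tick marks used for calibration should map back to
# approximately 23 and 100 years, and to approximately -1 and 6 R.
print(pixel_to_years(18), pixel_to_years(474))
print(pixel_to_r(799), pixel_to_r(0))
```

Checking the calibration ticks first is a cheap way to catch a flipped y axis or a wrong offset before converting any data points.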

Now, to find the values of the data points, we read off their pixel locations and apply equations 3 and 4 to get the actual years and R values in the graph.

(a) Pixels          (b) Actual values
Years    R          Years       R
18       200        23          4.24781
59       90         29.92325    5.211514
178      230        50.01754    3.984981
267      440        65.04605    2.145181
326      490        75.00877    1.707134
474      615        100         0.612015

Table 1 Data points of the Ash series (triangular markers): (a) in pixels, (b) in actual values
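The conversion for Table 1 can be reproduced in Python as a check on the arithmetic. This is a minimal sketch: the pixel coordinates are copied from part (a) of the table, and the names `ash_pixels` and `to_actual` are assumptions made for this example.

```python
# Pixel coordinates (x, y) of the triangular markers, copied from Table 1(a).
ash_pixels = [(18, 200), (59, 90), (178, 230), (267, 440), (326, 490), (474, 615)]


def to_actual(x_px, y_px):
    """Apply equations 3 and 4 to one marker and return (years, R)."""
    years = (x_px - 18) * 77 / (474 - 18) + 23
    r_value = (799 - y_px) * 7 / 799 - 1
    return years, r_value


ash_actual = [to_actual(x, y) for x, y in ash_pixels]

# Print pixel coordinates next to the converted values, one marker per row.
for (x, y), (years, r_value) in zip(ash_pixels, ash_actual):
    print(f"{x:4d} {y:4d} {years:10.5f} {r_value:9.6f}")
```

The other three series can be converted the same way by swapping in their pixel lists.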

(a) Pixels          (b) Actual values
Years    R          Years       R
18       267        23          3.660826
59       420        29.92325    2.320401
178      410        50.01754    2.40801
267      627        65.04605    0.506884
326      550        75.00877    1.181477
474      630        100         0.480601

Table 2 Data points of the Organic Matter series (circular markers): (a) in pixels, (b) in actual values

(a) Pixels          (b) Actual values
Years    R          Years       R
18       276        23          3.581977
59       591        29.92325    0.822278
178      553        50.01754    1.155194
267      723        65.04605    -0.33417
326      672        75.00877    0.112641
474      673        100         0.10388

Table 3 Data points of the Water series (square markers): (a) in pixels, (b) in actual values

(a) Pixels          (b) Actual values
Years    R          Years       R
18       259        23          3.730914
59       460        29.92325    1.969962
178      440        50.01754    2.145181
267      610        65.04605    0.65582
326      585        75.00877    0.874844
474      643        100         0.366708

Table 4 Data points of the Water series (X markers): (a) in pixels, (b) in actual values

From the computed values of the data points, we can now plot the data and compare it with the original graph.

Figure 3 Reconstructed graph with superimposed original graph

We can see that the reconstructed data points match the original ones very closely. Small deviations in position can be seen, and these can be attributed to two things. One, the markers in the graph are large, so the exact position of each data point is uncertain by several pixels. Two, since the graph was photocopied before scanning, it is slightly distorted. This distortion shifts the pixel location of each point and therefore changes the computed value.
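To get a feel for how much the marker size matters, we can convert a pixel uncertainty into years and R values using the same constants. This is only a rough sketch: the 5-pixel uncertainty is a hypothetical figure chosen for illustration, not a measurement from the scanned graph, and `value_uncertainty` is a name made up for this example.

```python
YEARS_PER_PIXEL = 77 / (474 - 18)  # equation 1
R_PER_PIXEL = 7 / 799              # equation 2


def value_uncertainty(pixel_uncertainty):
    """Return the (years, R) spread that corresponds to a +/- pixel uncertainty."""
    return pixel_uncertainty * YEARS_PER_PIXEL, pixel_uncertainty * R_PER_PIXEL


# HYPOTHETICAL: assume a marker's centre can only be located to within 5 pixels.
d_years, d_r = value_uncertainty(5)
print(f"+/- {d_years:.2f} years, +/- {d_r:.3f} R")
```

Under that assumption the spread is a little under one year on the x axis and a few hundredths of an R value on the y axis.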

I therefore conclude that this method is effective in closely approximating the values in a graph. It is limited only by the quality of the initial graph in terms of resolution, clarity of printing, and precision. It is recommended that the graph to be scanned be on a separate sheet of paper rather than bound in a book or a journal.

For this exercise, I give myself a score of 10. I was able to complete the activity as well as repeat the process for the multiple data series available in the graph. I was also able to present the data in a complete form. Hence, I believe I deserve a perfect score.

I would also like to thank Arvin Mabilangan for the support and initial help in understanding the methodology.
