Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search

Saturday, June 26, 2010

Fit a set of (x,y) data points to a rectangular display

You are given a set of datapoints (x, y) with 0<x<N, 0<y<M. Try to map them in a rectangular display of size [0, S] x [0, T] with S<N and T<M. Make your own assumptions.

There are a few ways you can do this. Depends a lot upon what type of information you want preserved.

Solution 1: Trivial one. Here compute new value of x to floor(S/N)*x for each x. Compute y to be floor(T/M)*y. However this solution does not take into account the input "spread". For example if your inputs for 'x' is largely less than or equal to S but have say only one at N, you would have "compressed" all x to accommodate this single occurrence.

The following two solutions inspects the input to determine how the values of 'x' and 'y' are spread to determine what algorithm to use to make the mapping:

Many values in the input are in the range 0<x<S and fewer in the S,x<N range:

Solution 2: Use something similar to A-law or Mu-Law schemes used to represent speech in the audio world. Here values on the lower side tend to be mapped more accurately while those on the higher side get mapped more aggressively.

Many values of x are in the S<x<N range and fewer in the 0<x<S range.

A classic problem when writing plotting software.

ReplyDeleteWe assume that N/M = S/T if we want to keep the

aspect ratio unchanged.

Regardless of the aspect ratio, we should

transform the co-ordinates x,y of each datapoint

to x * S/N and y * T/M

There are a few ways you can do this. Depends a lot upon what type of information you want preserved.

ReplyDeleteSolution 1: Trivial one. Here compute new value of x to floor(S/N)*x for each x. Compute y to be floor(T/M)*y. However this solution does not take into account the input "spread". For example if your inputs for 'x' is largely less than or equal to S but have say only one at N, you would have "compressed" all x to accommodate this single occurrence.

The following two solutions inspects the input to determine how the values of 'x' and 'y' are spread to determine what algorithm to use to make the mapping:

Many values in the input are in the range 0<x<S and fewer in the S,x<N range:

Solution 2: Use something similar to A-law or Mu-Law schemes used to represent speech in the audio world. Here values on the lower side tend to be mapped more accurately while those on the higher side get mapped more aggressively.

Many values of x are in the S<x<N range and fewer in the 0<x<S range.

Solution 3: Use a 'inverse' mapping.