Discretization in the field of algorithms
In the field of algorithm design, discretization is a simple but very useful technique, which can be seen as a form of hashing.
In the field of algorithms, when some data is too large or the type is not supported, and cannot be easily processed as an array index, discretization can be used.
Discretization is essentially a type of hashing.
The data types that can be used for discretization include large integers, floating-point numbers, strings, and so on.
This article explains the discretization of large integers, which is widely used in the field of algorithm design.
Core idea
When we encounter an array with a very large value range, we only care about total/partial order relationship between array elements, not their actual values. However, a large value range can hinder our use of some good data structures, such as segment tree. In this case, we need to perform discretization on the array elements.
For example, if the array is [11, 23, 4567, 88888, 999999], it is actually equivalent to [1, 2, 3, 4, 5], as shown in Figure 1:
This is the technique of discretization. We put all the numbers that will be involved in comparison operations into a list, sort them, and then assign a new value to them from small to large.
After the discretization is completed, the complete/partial order relationship of the original array is preserved, but the upper and lower bounds of the elements are restricted, which makes it convenient for us to use some data structures.
Implementation
Discretizing an array and performing queries is a common use case.
There are two main steps:
Removing possible duplicate elements in array a (using deduplication)
Determining the discretized value of a[i] (using binary search)
Using the existing STL algorithms in C++, the code is as follows:
#include <algorithm>
#include <iostream>
using namespace std;
const int N = 1e5 + 7;
int n, a[N], b[N];
int main()
{
cin >> n;
for (int i = 1; i <= n; i++) // 1-indexed
{
cin >> a[i]; // a is the original array
b[i] = a[i]; // b is the array used for discretization.
}
sort(b + 1, b + 1 + n);
// "len" represents the number of distinct values after discretization
int len = unique(b + 1, b + 1 + n) - b - 1;
for (int i = 1; i <= n; i++)
// Use the std::lower_bound function to search for the rank
// (i.e., the new number) after discretization, 1-indexed
a[i] = lower_bound(b + 1, b + len + 1, a[i]) - b;
return 0;
}
Similarly, we can also perform discretization on vector:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<int> a; // a is the original vector
vector<int> b = a; // b is used for discretization.
sort(b.begin(), b.end());
b.erase(unique(b.begin(), b.end()), b.end());
auto get_id = [&](int v) { // Query the 1-indexed rank of a[i] after discretization.
return lower_bound(b.begin(), b.end(), v) - b.begin() + 1;
};
for (int i = 0; i <= (int)a.size() - 1; i++)
a[i] = get_id(a[i]);
}
Example
For an example of discretization, please refer to this article: Typical problems of Fenwick Tree (Binary Indexed Tree). Specifically, please see the corresponding paragraphs for “Leetcode 300. Longest Increasing Subsequence” and “Leetcode 493. Reverse Pairs”. No further explanation will be given here.
Conclusion
In the field of algorithm design, discretization is a simple but very useful technique.
Discretization can essentially be seen as a form of hashing, which ensures that elements maintains their original total/partial order relationship after being hashed.
Finally, if there are any mistakes in this article, please don’t hesitate to advise me. Thank you.