RANGER-5406: Support export policies in a segmented manner #741
yunyezhang-work wants to merge 1 commit into apache:ranger-2.3
@mneethiraj @kumaab

Thank you @yunyezhang-work for the patch! Please raise a PR for the
        return ret;
    }

    private List<RangerPolicy> cutRangerPolicyList(List<RangerPolicy> policyList, SearchFilter filter) {
Suggested name: getRangerPoliciesInRange
        int startIndex = filter.getBeginIndex();
        int pageSize = filter.getOffsetIndex();
        int toIndex = Math.min(startIndex + pageSize, totalCount);
        LOG.info("==>totalCount: " + totalCount + " startIndex: " + startIndex + " pageSize: " +pageSize + " toIndex: " + toIndex);
Avoid string concatenation, use String.format()
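As a rough sketch of the two suggestions on this hunk, the range computation could look like the following under the suggested name, with the log message built through `String.format()`. This is a standalone illustration, not the patch's code: the class name `PolicyRangeSketch` and the helper `rangeLogMessage` are hypothetical, the method is generic instead of being tied to `RangerPolicy` and `SearchFilter`, and returning an empty list for out-of-range input is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PolicyRangeSketch {
    // Builds the log message with String.format() instead of string concatenation.
    public static String rangeLogMessage(int totalCount, int startIndex, int pageSize, int toIndex) {
        return String.format("==> totalCount: %d startIndex: %d pageSize: %d toIndex: %d",
                totalCount, startIndex, pageSize, toIndex);
    }

    // Returns the elements in [startIndex, startIndex + pageSize), clamped to the list size.
    // Assumption: a negative start, a non-positive page size, or a start past the end
    // yields an empty list instead of an exception.
    public static <T> List<T> getRangerPoliciesInRange(List<T> policyList, int startIndex, int pageSize) {
        int totalCount = policyList.size();
        if (startIndex < 0 || pageSize <= 0 || startIndex >= totalCount) {
            return Collections.emptyList();
        }
        // long arithmetic avoids int overflow when pageSize is very large
        int toIndex = (int) Math.min((long) startIndex + pageSize, (long) totalCount);
        // copy the view so the result does not depend on the backing list
        return new ArrayList<>(policyList.subList(startIndex, toIndex));
    }

    public static void main(String[] args) {
        List<String> policies = Arrays.asList("p0", "p1", "p2", "p3", "p4", "p5", "p6");
        List<String> segment = getRangerPoliciesInRange(policies, 5, 5);
        System.out.println(rangeLogMessage(policies.size(), 5, 5, 5 + segment.size()));
        System.out.println(segment);
    }
}
```

The last segment is simply shorter than the requested size, which mirrors the `Math.min(startIndex + pageSize, totalCount)` line in the hunk.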
                LOG.info("Invalid or Unsupported sortType : " + sortType);
            }
        } else {
            LOG.info("Invalid or Unsupported sortBy property : " + sortBy);
Avoid string concat, check all references.
See: https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+Java+Style+Guide
agents-common/src/main/java/org/apache/ranger/plugin/util/SearchFilter.java
    public static final String UPDATE_TIME = "updateTime"; // sort
    public static final String START_INDEX = "startIndex";
    public static final String BEGIN_INDEX = "beginIndex";
    public static final String OFFSET_INDEX = "offsetIndex";
I think OFFSET is more meaningful than OFFSET_INDEX; an offset is not an index. What do you think?
    private int startIndex;
    private int maxRows = Integer.MAX_VALUE;
    private int beginIndex = -1;
    private int offsetIndex = -1;
Since you've added new fields to the SearchFilter class, don't forget to modify the copy constructor (public SearchFilter(SearchFilter other)) accordingly to ensure the new attributes are properly copied.
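A minimal sketch of what this comment asks for, assuming a trimmed-down stand-in for `SearchFilter` that models only the four paging fields shown in the hunk. The class names `SearchFilterCopySketch` and `Filter` are hypothetical, and the real copy constructor copies other state that is not shown here.

```java
public class SearchFilterCopySketch {
    // Hypothetical stand-in for SearchFilter; only the paging fields are modeled.
    public static class Filter {
        private int startIndex;
        private int maxRows = Integer.MAX_VALUE;
        private int beginIndex = -1;
        private int offsetIndex = -1;

        public Filter() {
        }

        // Copy constructor: the two new fields are copied alongside the existing ones.
        public Filter(Filter other) {
            if (other != null) {
                this.startIndex = other.startIndex;
                this.maxRows = other.maxRows;
                this.beginIndex = other.beginIndex;
                this.offsetIndex = other.offsetIndex;
            }
        }

        public int getBeginIndex() { return beginIndex; }
        public void setBeginIndex(int beginIndex) { this.beginIndex = beginIndex; }
        public int getOffsetIndex() { return offsetIndex; }
        public void setOffsetIndex(int offsetIndex) { this.offsetIndex = offsetIndex; }
    }

    public static void main(String[] args) {
        Filter original = new Filter();
        original.setBeginIndex(10);
        original.setOffsetIndex(5);

        Filter copy = new Filter(original);
        System.out.println(copy.getBeginIndex() + " " + copy.getOffsetIndex());
    }
}
```

Without the two extra assignments, a copied filter would silently fall back to the `-1` defaults and the segmentation parameters would be lost.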
    }

    public void setBeginIndex(int beginIndex) {
        this.beginIndex = beginIndex;
I think we should validate that beginIndex >= 0. What’s your opinion?
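One possible shape for that validation, offered as a sketch and not as the agreed fix: the setter rejects negative values, while the field keeps `-1` as its "not set" default, as in the hunk above. The class name `BeginIndexValidationSketch` is hypothetical, and throwing `IllegalArgumentException` is an assumption; the project may prefer a different way to report invalid input.

```java
public class BeginIndexValidationSketch {
    // -1 means "not set", matching the field default in the patch.
    private int beginIndex = -1;

    public int getBeginIndex() {
        return beginIndex;
    }

    // Rejects negative values so callers cannot request a range that starts before index 0.
    public void setBeginIndex(int beginIndex) {
        if (beginIndex < 0) {
            throw new IllegalArgumentException(
                    String.format("beginIndex must be >= 0, got %d", beginIndex));
        }
        this.beginIndex = beginIndex;
    }

    public static void main(String[] args) {
        BeginIndexValidationSketch filter = new BeginIndexValidationSketch();
        filter.setBeginIndex(3);
        System.out.println(filter.getBeginIndex());
        try {
            filter.setBeginIndex(-2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```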
What changes were proposed in this pull request?
In big data production environments, customers create a massive number of policies, often hundreds of thousands or even millions. Exporting the entire policy set in one pass for disaster recovery produces an enormous data volume, and importing it into the backup cluster is extremely slow. Our current experimental data shows that importing 10,000 policies via the API is very memory-intensive and takes approximately 15 minutes; importing 100,000 policies via the API takes 2.5 hours or even longer.
With an even larger number of policies, memory consumption increases significantly, and insufficient memory can interrupt the import. We therefore propose modifying the export API to support segmented export. This saves memory and keeps the data reliable when it is imported into other clusters for disaster recovery.
How was this patch tested?
To test this feature manually, send an HTTP request to Ranger. The following shell commands serve as an example:
Without the segmentation parameters, calling the export API getPoliciesInJson will export all policies. As shown in the figure, there are 18 policies in this environment for hdfs-xxx.

curl -u$USER:$PASSWORD -XGET "http://$RANGER_HOST:$RANGER_PORT/service/plugins/policies/exportJson?serviceName=$SERVICE&checkPoliciesExists=true" -v -o export.json

Adding the segmentation parameters will export the policies for the specified start and end position range. As shown in the figure, policies 1-5 of hdfs-xxx are exported.

curl -u$USER:$PASSWORD -XGET "http://$RANGER_HOST:$RANGER_PORT/service/plugins/policies/exportJson?serviceName=$SERVICE&checkPoliciesExists=true&beginIndex=$BEGIN_INDEX&offsetIndex=$OFFSET_INDEX" -v -o export_${BEGIN_INDEX}_${OFFSET_INDEX}.json
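To export a whole service in segments, a client would issue one such request per segment. The sketch below only builds the request URLs and makes several assumptions: `offsetIndex` is treated as the segment size (as the `pageSize = filter.getOffsetIndex()` line in the diff suggests), `beginIndex` is taken to be zero-based, the total policy count is known to the caller, and the service name is URL-safe. The class name `SegmentedExportUrls` and the host `ranger.example.com:6080` are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentedExportUrls {
    // Builds one exportJson URL per segment, covering indices [0, totalPolicies).
    // Assumptions: offsetIndex is the segment size, beginIndex is zero-based,
    // and serviceName needs no URL encoding.
    public static List<String> buildUrls(String baseUrl, String serviceName,
                                         int totalPolicies, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be > 0");
        }
        List<String> urls = new ArrayList<>();
        // long loop variable avoids int overflow for large totals and segment sizes
        for (long begin = 0; begin < totalPolicies; begin += segmentSize) {
            urls.add(String.format(
                    "%s/service/plugins/policies/exportJson?serviceName=%s"
                            + "&checkPoliciesExists=true&beginIndex=%d&offsetIndex=%d",
                    baseUrl, serviceName, begin, segmentSize));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder host and the 18-policy service from the example above.
        for (String url : buildUrls("http://ranger.example.com:6080", "hdfs-xxx", 18, 5)) {
            System.out.println(url);
        }
    }
}
```

Each URL can then be fetched with the same credentials as in the curl commands above and written to its own file, so no single response has to hold the full policy set.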