Commit 97aa41b

Merge pull request #775 from secureCodeBox/feature/git-repo-scanner-annotate-commit-id
Git-Repo-Scanner: Optionally include commit information in output
2 parents 4afb903 + 87e9680 commit 97aa41b

8 files changed: +130 -22 lines changed
scanners/git-repo-scanner/.helm-docs.gotmpl

Lines changed: 4 additions & 2 deletions
@@ -46,21 +46,22 @@ or
 For type GitHub you can use the following options:
 - `--organization`: The name of the GitHub organization you want to scan.
 - `--url`: The url of the api for a GitHub enterprise server. Skip this option for repos on <https://github.com>.
-- `--access-token`: Your personal GitHub access token.
+- `--access-token`: Your personal GitHub access token (needs full `repo` rights if you want to also find private repositories, otherwise `repo:status` and `public_repo` is sufficient).
 - `--ignore-repos`: A list of GitHub repository ids you want to ignore
 - `--obey-rate-limit`: True to obey the rate limit of the GitHub server (default), otherwise False
 - `--activity-since-duration`: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each
 with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
 - `--activity-until-duration`: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with
 optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
+- `--annotate-latest-commit-id`: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.
 
 For now only organizations are supported, so the option is mandatory. We **strongly recommend** providing an access token
 for authentication. If not provided the rate limiting will kick in after about 30 repositories scanned.
 
 #### GitLab
 For type GitLab you can use the following options:
 - `--url`: The url of the GitLab server.
-- `--access-token`: Your personal GitLab access token.
+- `--access-token`: Your personal GitLab access token (needs at least `read_api` and `read_repository` scopes).
 - `--group`: A specific GitLab group id you want to scan, including subgroups.
 - `--ignore-groups`: A list of GitLab group ids you want to ignore
 - `--ignore-repos`: A list of GitLab project ids you want to ignore
@@ -69,6 +70,7 @@ For type GitLab you can use the following options:
 with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
 - `--activity-until-duration`: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with
 optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
+- `--annotate-latest-commit-id`: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.
 
 
 For Gitlab, the url and the access token is mandatory. If you don't provide a specific group id, all projects

scanners/git-repo-scanner/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ description: A Helm chart for the git-repo-scanner that integrates with the secu
 type: application
 # version - gets automatically set to the secureCodeBox release version when the helm charts gets published
 version: v3.1.0-alpha1
-appVersion: "1.0"
+appVersion: "1.1"
 kubeVersion: ">=v1.11.0-0"
 
 keywords:

scanners/git-repo-scanner/README.md

Lines changed: 5 additions & 3 deletions
@@ -3,7 +3,7 @@ title: "Git Repo Scanner"
 category: "scanner"
 type: "Repository"
 state: "released"
-appVersion: "1.0"
+appVersion: "1.1"
 usecase: "Discover Git repositories"
 ---
 
@@ -62,21 +62,22 @@ or
 For type GitHub you can use the following options:
 - `--organization`: The name of the GitHub organization you want to scan.
 - `--url`: The url of the api for a GitHub enterprise server. Skip this option for repos on <https://github.com>.
-- `--access-token`: Your personal GitHub access token.
+- `--access-token`: Your personal GitHub access token (needs full `repo` rights if you want to also find private repositories, otherwise `repo:status` and `public_repo` is sufficient).
 - `--ignore-repos`: A list of GitHub repository ids you want to ignore
 - `--obey-rate-limit`: True to obey the rate limit of the GitHub server (default), otherwise False
 - `--activity-since-duration`: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each
 with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
 - `--activity-until-duration`: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with
 optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
+- `--annotate-latest-commit-id`: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.
 
 For now only organizations are supported, so the option is mandatory. We **strongly recommend** providing an access token
 for authentication. If not provided the rate limiting will kick in after about 30 repositories scanned.
 
 #### GitLab
 For type GitLab you can use the following options:
 - `--url`: The url of the GitLab server.
-- `--access-token`: Your personal GitLab access token.
+- `--access-token`: Your personal GitLab access token (needs at least `read_api` and `read_repository` scopes).
 - `--group`: A specific GitLab group id you want to scan, including subgroups.
 - `--ignore-groups`: A list of GitLab group ids you want to ignore
 - `--ignore-repos`: A list of GitLab project ids you want to ignore
@@ -85,6 +86,7 @@ For type GitLab you can use the following options:
 with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
 - `--activity-until-duration`: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with
 optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
+- `--annotate-latest-commit-id`: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.
 
 
 For Gitlab, the url and the access token is mandatory. If you don't provide a specific group id, all projects
 on the Gitlab server are going to be discovered.
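
The duration format described for `--activity-since-duration` and `--activity-until-duration` (an optionally signed sequence of decimal numbers with the unit suffixes 'm', 'h', 'd', 'w') can be illustrated with a small parser. This is only a sketch under those stated rules: `parse_duration` is a hypothetical helper written for this note, not the function the scanner actually uses.

```python
import re
from datetime import timedelta

# Hypothetical mapping from the documented unit suffixes to timedelta keywords.
_UNIT_TO_KWARG = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
# One token: a decimal number with optional fraction, followed by a unit.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)([mhdw])")


def parse_duration(text: str) -> timedelta:
    """Turn a string such as '1h' or '2h45m' into a timedelta."""
    body = text.strip()
    sign = 1
    if body.startswith(("+", "-")):
        sign = -1 if body[0] == "-" else 1
        body = body[1:]

    total = timedelta()
    pos = 0
    for match in _TOKEN.finditer(body):
        if match.start() != pos:
            # Something between two tokens did not match the grammar.
            raise ValueError(f"invalid duration: {text!r}")
        kwarg = _UNIT_TO_KWARG[match.group(2)]
        total += timedelta(**{kwarg: float(match.group(1))})
        pos = match.end()

    # Reject empty input and trailing characters that are not a full token.
    if pos == 0 or pos != len(body):
        raise ValueError(f"invalid duration: {text!r}")
    return sign * total


print(parse_duration("2h45m"))
```

Requiring every token to start where the previous one ended means malformed input such as `1h2` fails loudly instead of being partially accepted.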

scanners/git-repo-scanner/scanner/git_repo_scanner/__main__.py

Lines changed: 10 additions & 2 deletions
@@ -52,15 +52,17 @@ def process(args):
             group=args.group,
             ignored_groups=args.ignore_groups,
             ignore_repos=args.ignore_repos,
-            obey_rate_limit=args.obey_rate_limit
+            obey_rate_limit=args.obey_rate_limit,
+            annotate_latest_commit_id=args.annotate_latest_commit_id
         )
     elif args.git_type == 'github':
         scanner = GitHubScanner(
             url=args.url,
             access_token=args.access_token,
             organization=args.organization,
             ignore_repos=args.ignore_repos,
-            obey_rate_limit=args.obey_rate_limit
+            obey_rate_limit=args.obey_rate_limit,
+            annotate_latest_commit_id=args.annotate_latest_commit_id
         )
     else:
         logger.info('Argument error: Unknown git type')
@@ -146,6 +148,12 @@ def get_parser_args(args=None):
                         type=bool,
                         default=True,
                         required=False)
+    parser.add_argument('--annotate-latest-commit-id',
+                        help="Annotate the results with the latest commit hash of the main branch of the repository. "
+                             "Will result in up to two extra API hits per repository",
+                        type=bool,
+                        default=False,
+                        required=False)
     parser.add_argument('--activity-since-duration',
                         help='Return git repo findings with repo activity (e.g. commits) more recent than a specific '
                              'date expressed by a duration (now - duration)',
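
One caveat with the `type=bool` declaration in the hunk above: argparse applies `bool()` to the raw command-line string, and every non-empty string is truthy, so `--annotate-latest-commit-id False` still evaluates to `True`. The sketch below contrasts that behaviour with a stricter converter. `str_to_bool` and the option names `--naive` and `--strict` are hypothetical and exist only for this illustration; they are not part of the commit.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical converter accepting common spellings of true/false."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
# Mirrors the style used in the diff: bool() is applied to the string.
parser.add_argument("--naive", type=bool, default=False)
# Alternative: parse the text explicitly.
parser.add_argument("--strict", type=str_to_bool, default=False)

args = parser.parse_args(["--naive", "False", "--strict", "False"])
print(args.naive, args.strict)  # prints: True False
```

A plain `action="store_true"` flag would be another option, though it changes the command-line syntax from `--flag True` to `--flag`.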

scanners/git-repo-scanner/scanner/git_repo_scanner/abstract_scanner.py

Lines changed: 6 additions & 2 deletions
@@ -21,8 +21,9 @@ def process(self, start_time: Optional[datetime] = None, end_time: Optional[date
         raise NotImplementedError()
 
     def _create_finding(self, repo_id: str, web_url: str, full_name: str, owner_type: str, owner_id: str,
-                        owner_name: str, created_at: str, last_activity_at: str, visibility: str) -> FINDING:
-        return {
+                        owner_name: str, created_at: str, last_activity_at: str, visibility: str,
+                        last_commit_id: str = None) -> FINDING:
+        finding = {
             'name': f'{self.git_type} Repo',
             'description': f'A {self.git_type} repository',
             'category': 'Git Repository',
@@ -40,3 +41,6 @@ def _create_finding(self, repo_id: str, web_url: str, full_name: str, owner_type
                 'visibility': visibility
             }
         }
+        if last_commit_id is not None:
+            finding["attributes"]["last_commit_id"] = last_commit_id
+        return finding
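
The effect of the new optional parameter can be shown with a trimmed-down stand-in for `_create_finding`. This is a sketch, not the real method: `create_finding` and its reduced attribute set are hypothetical, and only the `last_commit_id` handling follows the diff.

```python
from typing import Any, Dict, Optional


def create_finding(repo_id: str, visibility: str,
                   last_commit_id: Optional[str] = None) -> Dict[str, Any]:
    """Simplified stand-in: builds a finding with a small attribute set."""
    finding: Dict[str, Any] = {
        "category": "Git Repository",
        "attributes": {
            "id": repo_id,          # hypothetical key, for illustration only
            "visibility": visibility,
        },
    }
    # As in the diff: the key is added only when a value was passed at all.
    if last_commit_id is not None:
        finding["attributes"]["last_commit_id"] = last_commit_id
    return finding


without_annotation = create_finding("1", "public")
empty_repo = create_finding("2", "private", "")
print("last_commit_id" in without_annotation["attributes"])
print(repr(empty_repo["attributes"]["last_commit_id"]))
```

Because the check is `is not None` rather than a truthiness test, an empty string (which the scanners in this commit pass when the lookup fails) still produces the attribute, so consumers can distinguish "annotation disabled" from "no commit found".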

scanners/git-repo-scanner/scanner/git_repo_scanner/github_scanner.py

Lines changed: 11 additions & 2 deletions
@@ -21,7 +21,7 @@ class GitHubScanner(AbstractScanner):
     LOGGER = logging.getLogger('git_repo_scanner')
 
     def __init__(self, url: Optional[str], access_token: Optional[str], organization: str, ignore_repos: List[int],
-                 obey_rate_limit: bool = True) -> None:
+                 obey_rate_limit: bool = True, annotate_latest_commit_id: bool = False) -> None:
         super().__init__()
         if not organization:
             raise argparse.ArgumentError(None, 'Organization required for GitHub connection.')
@@ -33,6 +33,7 @@ def __init__(self, url: Optional[str], access_token: Optional[str], organization
         self._organization = organization
         self._ignore_repos = ignore_repos
         self._obey_rate_limit = obey_rate_limit
+        self._annotate_latest_commit_id = annotate_latest_commit_id
         self._gh: Optional[github.Github] = None
 
     @property
@@ -125,6 +126,13 @@ def _setup_with_url(self):
             raise argparse.ArgumentError(None, 'Access token required for github enterprise authentication.')
 
     def _create_finding_from_repo(self, repo: Repository) -> FINDING:
+        latest_commit: str = None
+        if self._annotate_latest_commit_id:
+            try:
+                latest_commit = repo.get_commits()[0].sha
+            except Exception:
+                self.LOGGER.warn("Could not identify the latest commit ID - repository without commits?")
+                latest_commit = ""
         return super()._create_finding(
             str(repo.id),
             repo.html_url,
@@ -134,5 +142,6 @@ def _create_finding_from_repo(self, repo: Repository) -> FINDING:
             repo.owner.name,
             repo.created_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
             repo.updated_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
-            'private' if repo.private else 'public'
+            'private' if repo.private else 'public',
+            latest_commit
         )

scanners/git-repo-scanner/scanner/git_repo_scanner/gitlab_scanner.py

Lines changed: 20 additions & 2 deletions
@@ -23,7 +23,8 @@ def __init__(self, url: str,
                  group: Optional[int],
                  ignored_groups: List[int],
                  ignore_repos: List[int],
-                 obey_rate_limit: bool = True) -> None:
+                 obey_rate_limit: bool = True,
+                 annotate_latest_commit_id: bool = False) -> None:
         super().__init__()
         if not url:
             raise argparse.ArgumentError(None, 'URL required for GitLab connection.')
@@ -36,6 +37,7 @@ def __init__(self, url: str,
         self._ignored_groups = ignored_groups
         self._ignore_repos = ignore_repos
         self._obey_rate_limit = obey_rate_limit
+        self._annotate_latest_commit_id = annotate_latest_commit_id
         self._gl: Optional[gitlab.Gitlab] = None
 
     @property
@@ -47,6 +49,12 @@ def process(self, start_time: Optional[datetime] = None, end_time: Optional[date
 
         projects: List[Project] = self._get_projects(start_time, end_time)
         return self._process_projects(projects)
+
+    def _group_project_to_project(self, group_project):
+        # The GitLab API library gives us a GroupProject object, which has limited functionality.
+        # This function turns the GroupProject into a "real" project, which allows us to get the
+        # list of commits and include the SHA1 of the latest commit in the output later
+        return self._gl.projects.get(group_project.id, lazy=True)
 
     def _get_projects(self, start_time: Optional[datetime], end_time: Optional[datetime]):
         logger.info(f'Get GitLab repositories with last activity between {start_time} and {end_time}.')
@@ -103,6 +111,15 @@ def _create_finding_from_project(self, project: Project, index: int, total: int)
         logger.info(
             f'({index + 1}/{total}) Add finding for repo {project.name} with last activity at '
             f'{datetime.fromisoformat(project.last_activity_at)}')
+
+        # Retrieve the latest commit ID
+        latest_commit_id: str = None
+        if self._annotate_latest_commit_id:
+            try:
+                latest_commit_id = self._group_project_to_project(project).commits.list()[0].id
+            except Exception as e:
+                logger.warn("Could not identify the latest commit ID - repository without commits?")
+                latest_commit_id = ""
         return super()._create_finding(
             project.id,
             project.web_url,
@@ -112,5 +129,6 @@ def _create_finding_from_project(self, project: Project, index: int, total: int)
             project.namespace['name'],
             project.created_at,
             project.last_activity_at,
-            project.visibility
+            project.visibility,
+            latest_commit_id
         )
