Add option to use local LLMs and filter sensitive information #20
vishwamartur wants to merge 1 commit into Integuru-AI:main
Conversation
Related to Integuru-AI#18

Add measures to prevent sensitive information leakage and provide an option to use local LLMs.

* **create_har.py**
  - Add a filter to exclude sensitive information such as auth tokens and login credentials from the recorded network requests and cookies (see the first sketch below).
  - Update the `record_har_path` and `record_har_content` parameters to use the filtered data.
  - Add a function to filter sensitive information from requests.

* **integuru/__main__.py**
  - Add an option to use local LLMs instead of sending data to OpenAI.
  - Update the `call_agent` function to handle the new option for local LLMs.

* **integuru/util/LLM.py**
  - Add a method to set and use local LLMs (see the second sketch below).
  - Update the `get_instance` method to handle the new option for local LLMs.

* **integuru/util/har_processing.py**
  - Add measures to filter out sensitive information from HAR files.
  - Update the `parse_har_file` function to use the filtered data.
  - Add a function to filter sensitive information from request headers and body.
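For illustration, a minimal sketch of what such a HAR filter could look like. The names (`SENSITIVE_KEYS`, `filter_har_entries`, the file paths) are hypothetical and not taken from the PR's actual code:

```python
import json

# Hypothetical set of header/cookie names treated as sensitive;
# the PR's actual filter may cover a different set.
SENSITIVE_KEYS = {"authorization", "cookie", "set-cookie", "x-api-key"}

def filter_sensitive_headers(headers):
    """Drop headers whose name matches a known sensitive key."""
    return [h for h in headers if h.get("name", "").lower() not in SENSITIVE_KEYS]

def filter_har_entries(har):
    """Strip sensitive headers and cookies from every recorded entry."""
    for entry in har.get("log", {}).get("entries", []):
        for section in ("request", "response"):
            part = entry.get(section, {})
            part["headers"] = filter_sensitive_headers(part.get("headers", []))
            part["cookies"] = []  # drop recorded cookies entirely
    return har

if __name__ == "__main__":
    with open("network_requests.har") as f:  # hypothetical input path
        har = json.load(f)
    with open("network_requests.filtered.har", "w") as f:
        json.dump(filter_har_entries(har), f, indent=2)
```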
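And a second sketch of how `get_instance` could select a local model. This assumes a local server exposing an OpenAI-compatible API (for example, Ollama's endpoint at `http://localhost:11434/v1`); the flag name and defaults are illustrative, not integuru's actual interface:

```python
from openai import OpenAI

class LLMSingleton:
    """Sketch: one shared client, optionally pointed at a local server."""
    _instance = None

    @classmethod
    def get_instance(cls, use_local_llm=False):
        if cls._instance is None:
            if use_local_llm:
                # Any OpenAI-compatible local endpoint works here; the URL
                # and dummy key are placeholder values.
                cls._instance = OpenAI(
                    base_url="http://localhost:11434/v1", api_key="not-needed"
                )
            else:
                cls._instance = OpenAI()  # reads OPENAI_API_KEY from the env
        return cls._instance
```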
Appreciate the help, but the agent needs to see the full request, including the Authorization headers, to know if there's a "dynamic part" in the auth header, right? If you completely remove the Authorization header, what happens if the graph requires another request to get that header?
Thank you, @PredictiveManish, for the feedback! @alanalanlu, good point regarding the Authorization headers. To address this, instead of completely removing the header, we could implement a partial redaction approach: sensitive values like tokens would be partially masked rather than fully removed, allowing the agent to identify dynamic parts without exposing the full secret. I'll update the PR accordingly.
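A sketch of the partial redaction idea, assuming a helper along these lines (the name and the number of visible characters are placeholders):

```python
def partially_redact(value, visible=4):
    """Mask the middle of a secret, keeping a short prefix and suffix
    so the agent can still compare tokens across requests."""
    if len(value) <= visible * 2:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible * 2) + value[-visible:]

# partially_redact("sk-abc123def456") -> "sk-a*******f456"
```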
I don't think that would work either. The LLM will identify the masked auth token instead of the actual token and try to find that masked token in the response of a previous request, but there will be no matches, since we created the masked token ourselves. I think the best way to handle this is either to support local LLMs (which are not smart enough for code generation) or to keep a mapping from the masked token to the real token. For example: you mask the token while keeping a mapping of the masked token to the actual token, pass the request with the masked token into the LLM, the LLM emits the masked token, and then you map it back to the real one. There can be issues with this approach too, since the token can appear anywhere, such as in the path, and you can't hard-code for that.
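To make the mapping idea concrete, a sketch of the mask/unmask round trip (the names and placeholder format are hypothetical). Because the token can appear anywhere in the request, the masking has to run over the full serialized request text, not just a known header field:

```python
import secrets

token_map = {}  # placeholder -> real token

def mask(request_text, real_token):
    """Replace every occurrence of the real token with a random placeholder,
    remembering the mapping so it can be reversed later."""
    placeholder = f"MASKED_{secrets.token_hex(4)}"
    token_map[placeholder] = real_token
    return request_text.replace(real_token, placeholder)

def unmask(llm_output):
    """Substitute placeholders the LLM echoed back with the real tokens."""
    for placeholder, real_token in token_map.items():
        llm_output = llm_output.replace(placeholder, real_token)
    return llm_output
```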