Clean up lib; CloudScraper TLS bypass is production-ready #283
base: master
zinzied commented on Jun 10, 2025
- Final Clean Directory Structure
- Performance Optimizations:
- TLS Anti-Detection Features Confirmed Working
…ual capabilities of the cloudscraper library without false claims or unnecessary promotional content. It provides users with realistic expectations and practical information they can actually use
@zinzied Great work. By the way, the scraper currently hangs indefinitely in some cases. For example, trying to scrape a Stack Overflow URL gets stuck. Running with debug=True shows it has reached the maximum number of concurrent requests, even though only one request is currently being processed.
"Thanks for the additional context! Knowing you're using CloudScraper is really helpful, as it changes where the 'max concurrent requests' issue is likely to come from. When CloudScraper gets stuck with that error, even on a single request, it often points to a problem in how it interacts with Cloudflare, or in how its internal mechanisms handle the anti-bot process. A challenge may be taking too long to resolve, or a session may not be properly closed or released, leading CloudScraper to think it still has an active (but stuck) request. To help us pinpoint this, could you please provide a bit more detail?
In the meantime, here are a few CloudScraper-specific troubleshooting steps you might try:
Once we have more information, especially the debug output, we'll be in a much better position to diagnose why CloudScraper is getting stuck and reporting max concurrent requests. Thanks for your patience!"
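The failure mode suggested in the reply above, where a request is never released so the scraper keeps reporting max concurrent requests, can be illustrated with a minimal sketch. It assumes a hypothetical `RequestLimiter` built on `threading.BoundedSemaphore`; it is not CloudScraper's actual implementation. Releasing the slot in a `finally` block (here via a context manager) guarantees the count drops back even when a request raises or a challenge handler bails out early, and acquiring with a timeout fails fast instead of waiting forever.

```python
import threading
from contextlib import contextmanager


class RequestLimiter:
    """Hypothetical concurrency guard; not part of CloudScraper's API."""

    def __init__(self, max_concurrent: int = 1) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.active = 0  # number of requests currently holding a slot

    @contextmanager
    def slot(self, timeout: float = 5.0):
        # Fail fast instead of blocking forever when every slot is taken.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("max concurrent requests reached")
        with self._lock:
            self.active += 1
        try:
            yield
        finally:
            # Runs on every exit path, including exceptions, so a failed
            # request can never leak its slot.
            with self._lock:
                self.active -= 1
            self._slots.release()


limiter = RequestLimiter(max_concurrent=1)

try:
    with limiter.slot():
        raise RuntimeError("simulated challenge failure")
except RuntimeError:
    pass

# The slot was released, so the next request proceeds immediately.
with limiter.slot(timeout=0.1):
    print("second request acquired a slot; active =", limiter.active)
```

If the counter were decremented only on the success path, the simulated failure would leave `active` stuck at 1 and every later request would time out, which matches the symptom reported for a single request.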
@zinzied Thanks for the quick reply. I'm using the code directly from your master branch. Here is my code snippet:

Installation

```shell
pip install git+https://github.com/zinzied/cloudscraper.git@8f13e9a9d1b1d8ff9108f713e3f9c8462cd37dce
```

Code Snippet

```python
import cloudscraper

scraper = cloudscraper.create_scraper(browser={'custom': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36'}, debug=True)
response = scraper.get('https://stackoverflow.com/questions/381806/large-public-datasets', timeout=10)
print(response.status_code)
```

Debug Output
Note that with your latest commit (2b90912), the same code now gets stuck in an infinite loop:

Debug Output
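Until the loop itself is fixed, a caller can at least bound how long it waits. The `timeout=10` passed to `scraper.get(...)` is the requests timeout, which applies to individual connect and read operations rather than to the whole call, so a challenge-retry loop made of many short requests can still run forever. A minimal standard-library sketch; `call_with_deadline` is a hypothetical helper, not part of cloudscraper:

```python
import concurrent.futures
import time

# Shared worker pool. A timed-out call keeps running in its worker thread,
# because Python threads cannot be forcibly killed.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, *args, deadline: float, **kwargs):
    """Run fn(*args, **kwargs), but stop waiting after `deadline` seconds."""
    future = _pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only succeeds if the call has not started yet
        raise TimeoutError(f"call did not finish within {deadline}s") from None


if __name__ == "__main__":
    try:
        # Stand-in for a request that never returns in time.
        call_with_deadline(time.sleep, 1, deadline=0.1)
    except TimeoutError as exc:
        print(exc)
```

Usage would look like `call_with_deadline(scraper.get, url, timeout=10, deadline=60)`. This only bounds the caller's wait: the stuck call keeps consuming its worker, and `ThreadPoolExecutor` joins its threads at interpreter exit, so a truly infinite loop would still delay shutdown.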
…ion options for viewport simulation and behavioral patterns. Update README to reflect changes and provide advanced configuration guidance.
…lenge classes to use specific exceptions for improved clarity and logging. Adjust pyproject.toml and setup.py for cleanup and formatting consistency.
I used a proxy, TLS rotation, and sticky proxy IP settings. But after running for a while, it could no longer bypass Cloudflare; I think it was blocked by Cloudflare. Maybe the request's TLS fingerprint was blocked, or maybe it was something else. After I paused it for an hour or two and ran it again, it bypassed Cloudflare normally. Can you fix this?
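Since the block cleared after an hour or two, one plausible (unconfirmed) explanation is rate-based reputation attached to the proxy IP or TLS fingerprint. A common client-side mitigation is to back off exponentially with random jitter whenever a response looks blocked, instead of continuing to send requests. The sketch below assumes a caller-supplied `is_blocked(response)` predicate and injectable `sleep`/`rng` for testing; `backoff_delay` and `fetch_with_backoff` are hypothetical names, not CloudScraper features.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 600.0,
                  rng: Optional[random.Random] = None) -> float:
    """'Full jitter' backoff: a uniform draw from [0, min(cap, base * 2**attempt)]."""
    rng = rng if rng is not None else random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def fetch_with_backoff(fetch: Callable[[], T], is_blocked: Callable[[T], bool],
                       max_attempts: int = 5, base: float = 5.0, cap: float = 600.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Optional[random.Random] = None) -> T:
    """Call fetch() until is_blocked() is False, cooling down between blocked attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng if rng is not None else random.Random()
    response = fetch()
    for attempt in range(max_attempts - 1):
        if not is_blocked(response):
            return response
        # The upper bound of the delay doubles after each blocked attempt.
        sleep(backoff_delay(attempt, base, cap, rng))
        response = fetch()
    return response  # may still be blocked; the caller decides what to do next
```

A possible call is `fetch_with_backoff(lambda: scraper.get(url, timeout=10), lambda r: r.status_code in (403, 429, 503))`; treating those status codes as "blocked" is itself an assumption. Jitter spreads retries out so that several workers sharing one proxy do not all retry at the same moment.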
CloudScraper V3 Handler Enhancements

What This Project Accomplished

I set out to solve a specific problem: accessing prosportstransactions.com, which was blocked by Cloudflare's advanced protection. While I didn't achieve the original goal due to fundamental technical limitations, I made significant improvements to CloudScraper that benefit the entire community.

The Challenge

Modern websites like prosportstransactions.com use sophisticated protection mechanisms that operate at multiple layers:
What I Improved ✅

Enhanced Challenge Detection

I think I'll upgrade CloudScraper's ability to recognize modern Cloudflare challenges by adding support for:
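Challenge detection of the kind described in the comment above usually comes down to inspecting the status code, headers, and body of a response. A minimal heuristic sketch, assuming the `cf-mitigated: challenge` response header that Cloudflare documents for challenge responses, plus two body markers that are common on Cloudflare interstitial pages but neither exhaustive nor guaranteed stable. `looks_like_cloudflare_challenge` is a hypothetical name, not CloudScraper's API.

```python
from typing import Mapping

# Heuristic body markers (assumptions, not an official or complete list).
_BODY_MARKERS = (
    "/cdn-cgi/challenge-platform/",
    "<title>Just a moment...</title>",
)


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str], body: str) -> bool:
    """Best-effort check for a Cloudflare challenge page."""
    lowered = {key.lower(): value for key, value in headers.items()}
    # Strongest signal: Cloudflare marks challenge responses with this header.
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True
    # Otherwise require an error-like status served by Cloudflare...
    if status not in (403, 429, 503):
        return False
    if "cloudflare" not in lowered.get("server", "").lower():
        return False
    # ...and at least one interstitial marker in the HTML.
    return any(marker in body for marker in _BODY_MARKERS)
```

With a requests-style response this would be called as `looks_like_cloudflare_challenge(resp.status_code, resp.headers, resp.text)`. Checking the header first avoids scanning the body when Cloudflare already says what it is.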
- Introduced PerformanceProfiler, CodeBlockProfiler, MemoryOptimizer, RequestOptimizer, and PerformanceMonitor classes for enhanced performance tracking and optimization.
- Implemented a SessionManager for efficient session handling with automatic cleanup.
- Added ResponseCache for memory-efficient caching with LRU eviction.
- Created advanced usage examples demonstrating stealth mode, proxy rotation, performance monitoring, metrics collection, error handling, and session management.
- Developed async examples showcasing concurrent requests, batch processing, and performance comparison between sync and async operations.
- Established a test suite with fixtures for mocking responses, creating scraper instances, and testing various scenarios.
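LRU eviction, as in the ResponseCache mentioned above, keeps memory bounded by discarding the least recently used entry once a size limit is exceeded. A minimal sketch built on `collections.OrderedDict`; `LRUResponseCache` is a hypothetical stand-in and may differ from the PR's actual class (for example, it has no TTL and is not thread-safe).

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class LRUResponseCache(Generic[V]):
    """Minimal LRU cache; a hypothetical stand-in for the PR's ResponseCache."""

    def __init__(self, max_entries: int = 128) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be >= 1")
        self.max_entries = max_entries
        self._data: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: V) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._data)
```

A natural key for HTTP responses would be a tuple such as `("GET", url)`. `OrderedDict.move_to_end` and `popitem(last=False)` are both O(1), so every cache operation stays constant time.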
Unlike #295, this does not resolve the looping issue with TLS.
- Added a new TLS fingerprinting module with JA3 fingerprint randomization, cipher suite rotation, and SSL/TLS version negotiation.
- Introduced a CipherSuiteManager for managing cipher suite selection based on browser type.
- Developed a comprehensive TLSFingerprintingManager to handle fingerprint generation and SSL context creation.
- Enhanced the existing CloudScraper with new features for bypassing Cloudflare protections, including intelligent challenge detection and adaptive timing.
- Created a demonstration script showcasing various enhanced bypass scenarios and configurations.
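To see why cipher suite rotation changes a client's JA3 fingerprint, recall how JA3 is computed: five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) are written as decimal values joined by `-`, the fields are joined by `,`, GREASE values (RFC 8701) are skipped, and the resulting string is MD5-hashed. The sketch below computes JA3 from already-parsed values; the function names are illustrative, and it neither captures nor builds a real ClientHello.

```python
import hashlib
from typing import Iterable, Sequence, Tuple


def is_grease(value: int) -> bool:
    """RFC 8701 GREASE values look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # JA3 lists decimal values separated by '-', ignoring GREASE entries.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3(version: int, ciphers: Sequence[int], extensions: Sequence[int],
        curves: Sequence[int], point_formats: Sequence[int]) -> Tuple[str, str]:
    """Return the JA3 string and its MD5 hex digest for parsed ClientHello fields."""
    text = ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])
    return text, hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Three TLS 1.3 suites plus two ECDHE AES-128-GCM suites.
    ciphers = [4865, 4866, 4867, 49195, 49199]
    extensions, curves, formats = [0, 23, 65281, 10, 11], [29, 23, 24], [0]
    base_text, base_hash = ja3(771, ciphers, extensions, curves, formats)
    # Same hello with the cipher order changed, as rotation would produce.
    _, rotated_hash = ja3(771, list(reversed(ciphers)), extensions, curves, formats)
    print(base_text)
    print(base_hash != rotated_hash)
```

Because order is part of the string, any reordering yields a different hash. From Python, the order of TLS 1.2 suites can be influenced with `ssl.SSLContext.set_ciphers()`, but the standard `ssl` module does not appear to expose extension order, which limits how closely a real browser's JA3 can be matched that way.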
…tions in pyproject.toml and setup.py
- Increment solve depth counter to avoid infinite recursion
- Return original response if maximum solve depth is reached
- Decrement concurrent requests count upon loop protection with debug logging

perf(timing): optimize adaptive timing parameters and limits

- Reduce base delays and variance in human behavior timing profiles
- Apply more conservative delay increase for low success rates and failures
- Blend learned optimal timings with less influence and apply hard caps
- Cap learning of optimal timings and average response times to avoid extremes
- Lower avg_interval and variance ranges in traffic pattern obfuscation
- Decrease burst controller cooldown base and randomization ranges
- Adjust burst limits adaptively with tighter bounds on cooldowns
- Shorten session idle timing thresholds and reduce min intervals for sessions
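The loop protection in the commit message above can be illustrated with a small sketch: each challenge retry carries a depth counter, the original (still-challenged) response is returned once a maximum depth is reached, and the concurrent-request counter is decremented in a `finally` block so it is released on every exit path, including exceptions. `DepthGuardedSolver`, `FakeResponse`, and `max_solve_depth` are hypothetical names and the fork's actual code may differ; here the depth is passed as an argument rather than stored on the instance, which keeps concurrent calls from sharing one counter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeResponse:
    status_code: int
    is_challenge: bool


class DepthGuardedSolver:
    """Hypothetical sketch of recursion-limited challenge solving."""

    def __init__(self, fetch: Callable[[str], FakeResponse], max_solve_depth: int = 3) -> None:
        self.fetch = fetch
        self.max_solve_depth = max_solve_depth
        self.concurrent_requests = 0

    def get(self, url: str) -> FakeResponse:
        self.concurrent_requests += 1
        try:
            return self._request(url, depth=0)
        finally:
            # Released on success, on loop protection, and on exceptions.
            self.concurrent_requests -= 1

    def _request(self, url: str, depth: int) -> FakeResponse:
        response = self.fetch(url)
        if not response.is_challenge:
            return response
        if depth >= self.max_solve_depth:
            # Loop protection: hand back the original challenge response.
            return response
        # A real solver would submit the challenge answer here before retrying.
        return self._request(url, depth + 1)
```

With `max_solve_depth=3`, a site that always answers with a challenge is fetched four times (depths 0 through 3) before the challenge response is returned, and `concurrent_requests` is back to 0 afterwards.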
I tried to use this fork, as I like the robustness it offers, but there was an issue with recursion in the |