android-cloud-device-agent
Android Cloud Device Agent
Use this when Alex wants a Hermes skill, CLI, prototype, or research plan for controlling persistent Android devices in the cloud. The target pattern is a long-lived Android profile/device with stable state and device identity, controlled by APIs plus ADB/Appium and optionally guided by screenshots/vision.
Scope and Safety
- Keep the framing to legitimate testing, QA, app compatibility, account-owned workflows, and research.
- Prefer owned/pre-created test accounts over automated account creation.
- Do not build spam, evasion, account-farming, or platform-abuse workflows.
- Treat provider credentials, Google accounts, proxies, device IDs, and screenshots as sensitive.
Provider Selection Criteria
Prioritize providers that offer all of:
- Persistence: device/profile state survives stop/start and remains available as long as rented/paid.
- Programmatic control: ADB or equivalent APIs for screenshots, shell commands, input injection, app install/start/stop, and status.
- Device identity: stable Android/device serial, IMEI or equivalent, MAC/Bluetooth metadata, timezone/locale/region metadata.
- Monthly/unlimited billing: avoid per-minute billing for 24/7 use unless the run is only a short spike test.
- Google Play Services support: verify on the exact region/Android model before assuming Play Store flows work.
- Webhooks/task status: useful for long-running screenshots, RPA tasks, installs, and login/download operations.
Recommended Architecture
Default to running Hermes on the VPS/host and controlling the cloud Android device remotely; do not start by trying to run Hermes inside Android/Termux. Treat the Android device as the target environment and the host as the agent brain/tool runner.
Layer provider-specific APIs under stable Hermes primitives:
provider API -> lifecycle, screenshots, app management, billing/status
ADB -> shell, screencap, input tap/swipe/text, uiautomator dump, apk install
MobileRun/DroidRun -> natural-language mobile-agent tasks over an ADB-connected device
Appium/uiautomator2 -> higher-level element targeting when accessibility tree is usable
LLM vision loop -> screenshot + UI tree -> decide next action -> execute -> verify
See references/mobile-agent-frameworks.md for session notes on MobileRun/DroidRun, Appium fallback, GeeLark provisioning guidance, and first live verification commands. See references/geelark-account-onboarding.md for GeeLark signup URLs/selectors, Playwright/Xvfb inspection notes, and CAPTCHA-specific pitfalls from account onboarding.
Expose a local CLI or tool server with commands like:
android-device list
android-device create --provider geelark --name test-phone --android "Android 15" --region us --monthly
android-device start <phone_id>
android-device stop <phone_id>
android-device status <phone_id>
android-device screenshot <phone_id> --out /tmp/screen.png
android-device shell <phone_id> "pm list packages"
android-device adb-connect <phone_id>
android-device tap <phone_id> 320 800
android-device swipe <phone_id> 300 1100 300 200 600
android-device text <phone_id> "hello"
android-device install-apk <phone_id> ./app.apk
android-device open-app <phone_id> com.example.app
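A minimal sketch of that command surface using Python's argparse, assuming the CLI is written in Python. The argument names (phone_id, cmd, value, apk_path, package, duration_ms) are illustrative choices, and dispatch to a provider client is deliberately left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the android-device commands listed above."""
    parser = argparse.ArgumentParser(prog="android-device")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("list")

    create = sub.add_parser("create")
    create.add_argument("--provider", required=True)
    create.add_argument("--name", required=True)
    create.add_argument("--android")
    create.add_argument("--region")
    create.add_argument("--monthly", action="store_true")

    # Commands that take only a phone ID.
    for name in ("start", "stop", "status", "adb-connect"):
        sub.add_parser(name).add_argument("phone_id")

    shot = sub.add_parser("screenshot")
    shot.add_argument("phone_id")
    shot.add_argument("--out", required=True)

    shell = sub.add_parser("shell")
    shell.add_argument("phone_id")
    shell.add_argument("cmd")

    tap = sub.add_parser("tap")
    tap.add_argument("phone_id")
    tap.add_argument("x", type=int)
    tap.add_argument("y", type=int)

    # swipe takes start point, end point, then duration in milliseconds.
    swipe = sub.add_parser("swipe")
    swipe.add_argument("phone_id")
    for field in ("x1", "y1", "x2", "y2", "duration_ms"):
        swipe.add_argument(field, type=int)

    text = sub.add_parser("text")
    text.add_argument("phone_id")
    text.add_argument("value")

    install = sub.add_parser("install-apk")
    install.add_argument("phone_id")
    install.add_argument("apk_path")

    open_app = sub.add_parser("open-app")
    open_app.add_argument("phone_id")
    open_app.add_argument("package")

    return parser
```

Calling build_parser().parse_args() in an entry point yields a namespace whose command field can be routed to the provider API or ADB layer, which keeps provider-specific code behind the stable primitives.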
Vision-Control Loop
- Start/verify the device is running.
- Capture a screenshot with the provider screenshot API or adb exec-out screencap -p.
- Dump UI hierarchy when possible:
adb shell uiautomator dump /sdcard/window.xml
adb pull /sdcard/window.xml /tmp/window.xml
- Send screenshot plus UI tree to the model.
- Choose one action: tap, swipe, text, back/home, wait, open app, shell command.
- Execute via ADB/provider shell API.
- Re-screenshot and verify the state changed.
- Save a trace of screenshots, action JSON, package name, device id, and errors.
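The loop above can be sketched as follows. This is an assumption-laden outline, not a fixed design: the action dictionary shape, the "done" action, and the capture/decide/execute callables are all hypothetical, and the model call is left to the caller. The text action only handles spaces (the input tool reads %s as a space); other shell metacharacters would need escaping.

```python
from typing import Any, Callable, Dict, List

Action = Dict[str, Any]


def action_to_adb_args(action: Action) -> List[str]:
    """Translate one action dict into arguments for `adb -s <serial> ...`."""
    kind = action["type"]
    if kind == "tap":
        return ["shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        keys = ("x1", "y1", "x2", "y2", "duration_ms")
        return ["shell", "input", "swipe"] + [str(action[k]) for k in keys]
    if kind == "text":
        # `input text` treats %s as a space; nothing else is escaped here.
        return ["shell", "input", "text", action["text"].replace(" ", "%s")]
    if kind in ("back", "home"):
        return ["shell", "input", "keyevent", "KEYCODE_" + kind.upper()]
    raise ValueError(f"unsupported action type: {kind}")


def run_loop(
    capture: Callable[[], bytes],
    decide: Callable[[bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Dict[str, Any]]:
    """Run one action per step and record whether the screen changed."""
    trace: List[Dict[str, Any]] = []
    before = capture()
    for step in range(max_steps):
        action = decide(before)
        if action["type"] == "done":
            break
        execute(action)
        after = capture()
        trace.append({"step": step, "action": action, "changed": after != before})
        before = after
    return trace
```

Comparing raw screenshot bytes is a crude change signal; a real verifier would more likely compare the UI tree or the foreground package. The returned trace is the natural place to attach screenshot paths, device ID, and errors before writing it to disk.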
Mobile Agent Framework Integration
Prefer this order for turning a GeeLark phone into a Hermes-operated mobile agent:
- Direct GeeLark + ADB wrapper first: prove lifecycle, ADB, screenshots, Google packages, and persistence with deterministic commands.
- MobileRun/DroidRun second: run droidrun/mobilerun on the Hermes host/VPS against the ADB-connected GeeLark device for natural-language mobile control. This was the primary open-source project identified for “browser-use but mobile.”
- Appium/uiautomator2 fallback: use when structured element selection is more reliable than pure screenshot/vision actions.
- On-device Hermes/Termux only later: treat running Hermes inside Android as an experimental phase after host-driven control works.
GeeLark Provider Notes
GeeLark is a strong first prototype candidate because official docs expose persistent cloud-phone management, ADB, screenshots, shell execution, app management, and Google RPA endpoints. See references/geelark-cloud-phone-openapi.md for concise endpoint/pricing/payment notes. See references/geelark-account-onboarding.md before automating account signup or code-sending flows.
Recommended GeeLark flow:
- Confirm plan/API access and rent one monthly cloud phone, not per-minute, for persistent 24/7 testing.
- Create/list/select the phone; record cloud phone ID and equipment info.
- Start the phone.
- Enable ADB, wait about 3 seconds, fetch ADB IP/port/password, then connect.
- Verify primitives:
  - status returns started
  - screenshot returns a valid image
  - shell pm list packages works
  - ADB input tap changes UI
  - uiautomator dump returns XML
- Test Google Play only with owned/pre-created test accounts.
- Stop/restart later and confirm app/account state persists.
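Three of the primitive checks in this flow can be validated offline once the raw output is in hand. The sketch below assumes the screenshot is a PNG (as adb screencap -p produces), that pm list packages prints lines prefixed with package:, and that the uiautomator dump has a hierarchy root element. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Every PNG file starts with this fixed 8-byte signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """True if the bytes start with the PNG signature."""
    return data.startswith(PNG_MAGIC)


def has_packages(pm_output: str) -> bool:
    """True if at least one line looks like `package:<name>`."""
    return any(
        line.strip().startswith("package:") for line in pm_output.splitlines()
    )


def is_ui_dump(xml_text: str) -> bool:
    """True if the text parses as XML with a <hierarchy> root."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == "hierarchy"
```

These checks catch the common silent failures: an error page saved as screen.png, an empty shell response, or a dump file containing an error string instead of XML.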
Pricing Rule of Thumb
For persistent workers, calculate 30-day always-on cost before recommending a billing mode:
monthly_minutes = 60 * 24 * 30
payg_cost = per_minute_rate * monthly_minutes
If a monthly rental/unlimited plan exists, it is usually the right choice for long-lived Android agents.
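The rule of thumb as a small runnable comparison. The rates in the usage note are hypothetical placeholders, not any provider's actual pricing; substitute real figures from the provider's price list.

```python
MINUTES_PER_30_DAYS = 60 * 24 * 30  # 43,200 minutes


def payg_monthly_cost(per_minute_rate: float) -> float:
    """Cost of keeping one device on for 30 days at a per-minute rate."""
    return per_minute_rate * MINUTES_PER_30_DAYS


def cheaper_mode(per_minute_rate: float, monthly_price: float) -> str:
    """Return "monthly" or "payg", whichever costs less for 24/7 use."""
    if monthly_price <= payg_monthly_cost(per_minute_rate):
        return "monthly"
    return "payg"
```

For example, a hypothetical rate of 0.002 per minute comes to about 86.40 over 30 days, so a hypothetical 30.00 monthly plan would win; at 0.0005 per minute (about 21.60) pay-as-you-go would be cheaper.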
Implementation Checklist
- [ ] Provider API client with auth from env vars only.
- [ ] Lifecycle commands: list/create/start/stop/status.
- [ ] Screenshot command with local file output and expiry handling.
- [ ] Shell command wrapper with output/error capture.
- [ ] ADB connect helper and health check.
- [ ] Input primitives: tap, long press, swipe, text, back, home.
- [ ] App primitives: install APK, uninstall, list packages, open package/activity.
- [ ] UI hierarchy dump and parser.
- [ ] Optional Appium/uiautomator2 session creation.
- [ ] Vision loop with one-action-at-a-time execution and verification.
- [ ] Trace logging with screenshots and JSON actions.
- [ ] Guardrails for account credentials and platform ToS boundaries.
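For the UI hierarchy parser item, a minimal sketch that turns a uiautomator dump into tap coordinates. It assumes the usual dump layout, where each node element carries text, content-desc, and a bounds attribute of the form [x1,y1][x2,y2]. Matching on exact text only is a simplification.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def center_of(bounds: str) -> Tuple[int, int]:
    """Return the center point of a "[x1,y1][x2,y2]" bounds string."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unrecognized bounds: {bounds!r}")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


def find_tap_target(xml_text: str, label: str) -> Optional[Tuple[int, int]]:
    """Find the first node whose text or content-desc equals label."""
    root = ET.fromstring(xml_text)
    for node in root.iter("node"):
        if node.get("text") == label or node.get("content-desc") == label:
            return center_of(node.get("bounds", ""))
    return None
```

Returning None when nothing matches lets the caller fall back to the screenshot/vision path instead of tapping a guessed coordinate.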
Common Pitfalls
- Do not assume a “device farm” is persistent; BrowserStack/Sauce/AWS-style public devices are usually session-reset testing devices unless on expensive private/dedicated contracts.
- Do not recommend per-minute billing for 24/7 use without calculating monthly cost.
- Do not assume Google Play/Google login works on every cloud-phone Android version or region; verify on the exact model.
- Screenshot APIs may be asynchronous and links may expire; implement polling and local archival.
- ADB may require the phone to be started first and may require a provider-specific login command after adb connect.
- Coordinate tapping is brittle; use UI hierarchy/Appium when possible and screenshots as fallback.
- Keep provider tokens and test-account credentials out of command history, logs, and public files.
- GeeLark signup/code-send flows can initialize an Aliyun PUZZLE CAPTCHA that may not render or complete under headless/Xvfb automation; verify a real “code sent” countdown/response before telling Alex to check email.
- Do not bypass CAPTCHA or use solver services for signup. If the CAPTCHA blocks automation, ask Alex to complete that step manually or provide an interactive visible session, then continue with the verification code.
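For the asynchronous screenshot pitfall, a generic polling helper is usually enough. This is a sketch under assumptions: the fetch callable and the shape of its result (for example a status field and a download URL) are hypothetical and depend on the provider's task API.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch: Callable[[], Any],
    is_ready: Callable[[Any], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Call fetch() until is_ready(result) is true or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)
```

Once the task reports ready, download the image immediately and store it locally, since the returned link may expire. The sleep and clock parameters exist so the helper can be tested without real waiting.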
Verification
A prototype is not proven until all of these pass:
android-device status <id>
android-device screenshot <id> --out /tmp/screen.png
android-device shell <id> "pm list packages"
android-device adb-connect <id>
adb -s <serial> shell input tap 100 100
adb -s <serial> shell uiautomator dump /sdcard/window.xml
Then restart the cloud phone and confirm persistence:
- Install/open a test app.
- Stop the phone.
- Start it again.
- Confirm the app and account/session state are still present.
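Part of the persistence check can be automated by comparing pm list packages output captured before the stop and after the restart. The sketch below assumes the standard package:<name> line format; the function names are illustrative. It only shows that installed apps survived, so account/session state still needs a manual or app-specific check.

```python
from typing import Any, Dict, Set


def package_set(pm_output: str) -> Set[str]:
    """Extract package names from `pm list packages` output."""
    prefix = "package:"
    return {
        line.strip()[len(prefix):]
        for line in pm_output.splitlines()
        if line.strip().startswith(prefix)
    }


def persistence_report(before: Set[str], after: Set[str]) -> Dict[str, Any]:
    """Report packages that were present before the restart but not after."""
    missing = sorted(before - after)
    return {"persisted": not missing, "missing": missing}
```

Saving the report next to the restart timestamps in the trace gives a simple record of whether the provider's persistence claim held for this device.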