Broken with last CBA update #17

Open
jpillora opened this issue Dec 13, 2019 · 23 comments

Comments

@jpillora

The login process is now broken. Hopefully it's just the login to fix, and the rest of the functionality still works!

@jcwillox

Hi, I'm currently maintaining a Python API for CommBank (not publicly released yet). I've just updated it to support the new login process, so hopefully I can save you guys some time 👍.

The login process is similar to before, except that before you are redirected to the homepage (and logged in) you now need to manually submit a hidden form from the response; only then are you taken to the homepage.

CommBank has added an API for accounts now 😀 /retail/netbank/api/home/v1/accounts. This returns a JSON object with a list of the accounts, their balances etc. and a link to their transactions page. The parsing for the transactions page has stayed the same, except that when navigating to it for the first time you are required to submit a hidden form in the response, after which you are redirected to the transactions page.

The addition of the hidden forms is due to CommBank moving over to OpenID.

@jpillora
Author

jpillora commented Dec 16, 2019 via email

@jcwillox

jcwillox commented Jan 3, 2020

A little update: CommBank appears to have reverted the login process back to normal, so everything should work as it did before. Along with that, the API endpoint for accounts is no longer available 😢, so I guess it's back to web scraping 👍.

@jpillora
Author

jpillora commented Jan 3, 2020 via email

@jcwillox

I feel like I should update this thread with the new changes: CommBank has re-enabled that API I was talking about, which means you need to handle those hidden forms I mentioned previously.

Accounts API

https://www.commbank.com.au/retail/netbank/api/home/v1/accounts

Sample Python code to extract the relevant data:

[
    {
        "name": account["displayName"],
        "bsb": account["number"][:6],  # first 6 digits are the bsb.
        "number": account["number"][6:],  # rest is the account number.
        "balance": account["balance"][0]["amount"],
        "available_balance": account["availableFunds"][0]["amount"],
        "link": account["link"]["url"],
    }
    for account in response.json()["accounts"]
]
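
To run the comprehension above on its own, here is a minimal sketch that wraps it in a function. The name `extract_accounts` and the sample payload are made up for illustration; only the field names come from the snippet above, and the real response may carry more fields or different value types.

```python
from typing import Any


def extract_accounts(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the accounts JSON into simple dicts (hypothetical helper)."""
    return [
        {
            "name": account["displayName"],
            "bsb": account["number"][:6],  # first 6 digits are the BSB
            "number": account["number"][6:],  # rest is the account number
            "balance": account["balance"][0]["amount"],
            "available_balance": account["availableFunds"][0]["amount"],
            "link": account["link"]["url"],
        }
        for account in payload["accounts"]
    ]


# Made-up payload that only mirrors the fields used above; not real data.
SAMPLE_PAYLOAD = {
    "accounts": [
        {
            "displayName": "Example Account",
            "number": "06200012345678",
            "balance": [{"amount": 100.0}],
            "availableFunds": [{"amount": 90.0}],
            "link": {"url": "/retail/netbank/accounts/?account=example-id"},
        }
    ]
}

if __name__ == "__main__":
    print(extract_accounts(SAMPLE_PAYLOAD))
```

In practice `payload` would be `response.json()` from the accounts endpoint after a successful login.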

I was trying out mitmproxy the other day and was able to glean some of commbank's mobile APIs. Here's what I found:

All requests need the header Authorization: Bearer <token>.
I'm not completely sure where to get this token, but I believe you can extract it from the web login process.

All responses from the APIs are JSON formatted.

Get Balances (i.e. get accounts)

https://www.commbank.com.au/innovate/SimpleBalance/v1/balance

Get Transactions

Needs the header x-param-map-accountIdentifier: <product_code>+<bsb><account_number>

https://www.my.commbank.com.au/netbank/EnrichedTransactions/v0/transactions

Get Transaction Detail

Each transaction from the ../transactions endpoint has an id field; this needs to be used to get the detailed information for a transaction.

https://www.my.commbank.com.au/netbank/EnrichedTransactions/v0/transactions/accountIdentifier-<id>/details
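
As a rough sketch of how those pieces fit together, the helpers below only build the headers and URLs described above; they make no network calls. The function names are hypothetical, and two readings of the notes are assumptions: that the `+` in the header value is a literal character, and that `accountIdentifier-` in the detail URL is literal text followed by the transaction id. How to obtain the bearer token is still unknown.

```python
BALANCE_URL = "https://www.commbank.com.au/innovate/SimpleBalance/v1/balance"
TRANSACTIONS_URL = (
    "https://www.my.commbank.com.au/netbank/EnrichedTransactions/v0/transactions"
)


def auth_headers(token: str) -> dict[str, str]:
    """Header that every mobile API request is said to need."""
    return {"Authorization": f"Bearer {token}"}


def transactions_headers(
    token: str, product_code: str, bsb: str, account_number: str
) -> dict[str, str]:
    """Headers for the transactions endpoint (hypothetical helper).

    Assumes the '+' between product code and BSB is a literal character.
    """
    headers = auth_headers(token)
    headers["x-param-map-accountIdentifier"] = (
        f"{product_code}+{bsb}{account_number}"
    )
    return headers


def transaction_detail_url(transaction_id: str) -> str:
    """Detail URL for one transaction.

    Assumes 'accountIdentifier-' is literal text in front of the id.
    """
    return f"{TRANSACTIONS_URL}/accountIdentifier-{transaction_id}/details"
```

With a valid token these could be passed to any HTTP client, for example as the `headers` argument of a GET request to `TRANSACTIONS_URL`.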

@jpillora
Author

Awesome, thanks :)

@svict4

svict4 commented Sep 28, 2020

Hey @jcwillox is your python wrapper ready for release by any chance? 🤞

@jcwillox

@svict4 haha kinda, the code base isn't great as I wrote it a long time ago, but it does work. I spent a bit of time refactoring just now to make it a little more like jcwillox/up-bank-api. I'll upload the code to GitHub when I get the chance. I don't use CommBank that much anymore so it's probably not going to be maintained that well, and currently it only supports retrieving accounts and the first page of transactions, but maybe someone can improve upon it.

@svict4

svict4 commented Sep 29, 2020

Fair enough @jcwillox Up is the better bank 💪
Was simply looking for some prior art, so thanks for your previous comment on the various endpoints

@balupton

Maybe https://www.commbank.com.au/Developer/ will be of use; however, there are no docs on how to authenticate with the APIs.

@jcwillox

jcwillox commented Aug 8, 2022

It would be great if we could use those but unfortunately, you can't unless you're a business.

To access consumer APIs, you'll need to be accredited by the ACCC and get the customer's consent

For now we are stuck mimicking the NetBank login process and then using the following unofficial API endpoints.

https://www.commbank.com.au/retail/netbank/accounts/api/accounts
https://www.commbank.com.au/retail/netbank/accounts/api/transactions

@jpillora
Author

jpillora commented Oct 11, 2022 via email

@paytah232

@jcwillox Did you ever work out how to find that token during the login phase? I am scraping in PHP and could only see XSRF-TOKEN, but that doesn't seem to work.

@jcwillox

@paytah232 no sorry never looked into those mobile APIs any further. I just pushed the CommBank API wrapper I wrote in Python all those years ago so maybe that will help you, I believe it still works. It sounds like you're having issues with the login so maybe check out this function https://github.com/jcwillox/commbank-api/blob/9ed0f4d1107c28bddb45cc07d6d3be97cbf82178/commbank/client.py#L26-L69.

Here's the repo with the Python API client, it definitely needs some polish, but it might be helpful to people https://github.com/jcwillox/commbank-api.

@paytah232

@jcwillox Thanks Josh. Seems the hardest thing I'm finding with PHP implementation is scraping the crawler for a form that allows me to post data. From what I can tell, both of these scripts are creating a form and submitting it, but I don't think PHP can do that - at least, I haven't found a way yet, and I haven't found an existing form I can use once I get past login.

https://github.com/jcwillox/commbank-api/blob/0364bfccc03c3c1ff9d70e6529f6908e44bc0f40/commbank/client.py#L51-L56

node-cba-netbank/src/api.js

Lines 164 to 176 in 7feb68b

const form = Object.assign({}, response.form, {
  // fill the form
  ctl00$ctl00: 'ctl00$BodyPlaceHolder$updatePanelSearch|ctl00$BodyPlaceHolder$lbSearch',
  __EVENTTARGET: 'ctl00$BodyPlaceHolder$lbSearch',
  __EVENTARGUMENT: '',
  ctl00$BodyPlaceHolder$searchTypeField: '1',
  ctl00$BodyPlaceHolder$radioSwitchDateRange$field$: 'ChooseDates',
  ctl00$BodyPlaceHolder$dateRangeField: 'ChooseDates',
  ctl00$BodyPlaceHolder$fromCalTxtBox$field: from,
  ctl00$BodyPlaceHolder$toCalTxtBox$field: to,
  // Add this for partial update
  ctl00$BodyPlaceHolder$radioSwitchSearchType$field$: 'AllTransactions',
});

I can however get the 40 most recent transactions using the transactions api. I'm wondering if you found a way to pass a query or header to that api that allowed it to handle to/from dates, increase results, or get more results. I wouldn't even know where to start to try that myself.

I have a forked repo where I'm slowly working on this for a PHP implementation here:
https://github.com/paytah232/php-cba-netbank

@jcwillox

jcwillox commented May 8, 2023

@paytah232 seems this got lost in all my GitHub notifications; I'm just cleaning them out now.

The code you linked in my repo does a POST request to log in, which returns HTML. I then use an HTML parser (BeautifulSoup) to grab the form's action, which is the next URL, and to find all the inputs and pull out their names and values. Basically, the browser would auto-submit this form, but because we're headless we have to manually pull out the form data and send the request ourselves (parse_form is just a helper that extracts the data from any HTML form). Finally, in the section you highlighted, I submit that form by sending its data to the action URL.

This should definitely be possible in PHP. I'm sure there's some kind of HTML parsing library available, as that's the only thing you're missing.
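
To make the form-extraction step concrete, here is a minimal sketch using only Python's standard library `html.parser`. It is not the code from the linked repo (which uses BeautifulSoup); the name `parse_first_form` is hypothetical, and it assumes the form of interest is the first `<form>` in the response and that only `<input>` elements carry data.

```python
from html.parser import HTMLParser


class _FirstFormParser(HTMLParser):
    """Collects the first <form>'s action and its named <input> values."""

    def __init__(self) -> None:
        super().__init__()
        self.action = None
        self.data = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attributes.get("action")
        elif tag == "input" and self._in_form and attributes.get("name"):
            # Inputs without a value attribute are sent as empty strings.
            self.data[attributes["name"]] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms


def parse_first_form(html: str) -> dict:
    """Return {'action': ..., 'data': {...}} for the first form in html."""
    parser = _FirstFormParser()
    parser.feed(html)
    parser.close()
    return {"action": parser.action, "data": parser.data}
```

The same idea carries over to PHP or any other language with an HTML parser: find the form, read its action, and collect the name/value pairs of its inputs.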

@paytah232

@jcwillox No worries mate.

What html are you actually receiving from client.py?

I could be interpreting incorrectly, but you're using your action to parse the html that is received after the successful login. So, that should be the page listing your account information and you are finding any form, grabbing the action/data, and then creating a new form to return.
For whatever reason, I don't believe I am actually ever getting to this page where I can find a form to get data from. I don't think the original source finds its form (#aspnetForm), because it doesn't return any accounts and there is no form on the page. I end up skipping that and using the API after login because it was the only way I could get any data. So I am definitely logged in, but perhaps not correctly redirecting to the account page.

I also note that when I do try to redirect to the account page like this, I always get an error stating that the url doesn't exist. I am guessing that the 'dynamic' account link I received is now not valid (or has changed) for that session.

I also have seen a response page that asks for a 'Click to Confirm' prompt, which your wrapper doesn't cover. I'm wondering if there's a reason I'm getting that too...

Perhaps (offline) we could compare notes on stages of parsing in more detail?

@jcwillox

Looking at your code, the crawler you use looks to be either more advanced or actually controlling a headless browser. In my case, I have to manually parse those forms and construct the correct requests for each step of the login, but your crawler appears to actually allow you to click a button, which is cool, so you may not need to do the same process.

I've never tried redirecting to the accounts page so not sure why that wouldn't work but that sounds quite possible. But for getting the accounts/transactions the JSON API is definitely the way to go.

As for getting transactions, I use the link in the response from the accounts JSON endpoint, e.g. accounts[0].link.url, to create the URL for the transactions JSON endpoint. I do that here https://github.com/jcwillox/commbank-api/blob/main/commbank/client.py#L113-L115 e.g.

accounts[0].link.url='/retail/netbank/accounts/?account=<some-long-id>'
transactions URL='https://www.commbank.com.au/retail/netbank/accounts/api/transactions?account=<some-long-id>'
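
One way to do that mapping is sketched below. The function name `transactions_url` is hypothetical and this is not necessarily how the linked client does it; it simply assumes the account link always carries an `account` query parameter, as in the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TRANSACTIONS_API = (
    "https://www.commbank.com.au/retail/netbank/accounts/api/transactions"
)


def transactions_url(account_link: str) -> str:
    """Turn an account's link URL into the transactions JSON endpoint URL.

    Raises KeyError if the link has no 'account' query parameter.
    """
    query = parse_qs(urlsplit(account_link).query)
    account_id = query["account"][0]
    return f"{TRANSACTIONS_API}?{urlencode({'account': account_id})}"
```

For example, `transactions_url("/retail/netbank/accounts/?account=some-long-id")` yields the transactions endpoint with `?account=some-long-id` appended.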

Here's an example of how parse_form works during the login.
The first parse_form call gets passed a huge HTML document with an HTML form inside it. Here's part of it:

<head id="head">
   <title>
      NetBank - Log on to NetBank - Enjoy simple and secure online banking from Commonwealth Bank
   </title>
   <meta name="description" content="NetBank is here to simplify your banking life. You can manage all your accounts 
      from one place, and do your banking whenever or wherever it suits you." />
   <meta name="google-site-verification" 
      content="_Y1ecy6XcbQ3abYLk9glqe_Csuq0QakknnlXfW2Qrjo" />
   <link rel="canonical" 
      href="https://www.my.commbank.com.au/netbank/Logon/Logon.aspx" />
   <meta name="viewport" content="width=device-width,       
      initial-scale=1" />
   <link rel="stylesheet" type="text/css" 
      href="https://static.my.commbank.com.au/static/netbank/theme/fo/css/logon-merge.8397238ab0ae7a25ea1af4d375f2c3df.css"     
      rel-album="R700" />
</head>
<body id="body" class="logon">
   <form method="post" action="/netbank/Logon/Logon.aspx" onsubmit="javascript:return WebForm_OnSubmit();" id="form1"autocomplete="off">
   <div class="aspNetHidden">
      <input type="hidden" name="RID" id="RID" value="..." />
      <input type="hidden" name="SID" id="SID" value="..." />
      <input type="hidden" name="cid" id="cid" value="..." />
      <input type="hidden" name="rqid" id="rqid" value="..." />
      <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE"
         value="..." />
   </div>

This gets parsed into the following, I redacted the values as I can't remember if any of these are private. The actual response

{
    'action': '/netbank/Logon/Logon.aspx',
    'data': {
        'RID': '...',
        'SID': '...',
        'cid': '...',
        'rqid': '...',
        '__VIEWSTATE': '...',
        '__VIEWSTATEGENERATOR': '...',
        '__EVENTVALIDATION': '...',
        'JS': 'D'
    }
}

The second parse_form gets

<html>
   <head>
      <meta http-equiv='X-UA-Compatible' content='IE=edge' />
      <base target='_self'/>
   </head>
   <body>
      <form method='post' action='https://www.commbank.com.au/retail/netbank/identity/signin-oidc'>
         <input type='hidden' name='code' value='<redacted>' />
         <input type='hidden' name='scope' value='openid profile digital-platform netbank' />
         <input type='hidden' name='state' value='<redacted>' />
         <input type='hidden' name='session_state' value='<redacted>' />
         <noscript><button>Click to continue</button></noscript>
      </form>
      <script>window.addEventListener('load',
         function(){document.forms[0].submit();});
      </script>
   </body>
</html>

Which is parsed to

{
    'action': 'https://www.commbank.com.au/retail/netbank/identity/signin-oidc',
    'data': {
        'code': '<redacted>',
        'scope': 'openid profile digital-platform netbank',
        'state': '<redacted>',
        'session_state': '<redacted>'
    }
}
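
Once a form has been parsed into that shape, the remaining step is to post its data to the action URL. Note that the first form's action is relative while the second is absolute, so the action has to be resolved against the URL of the page it came from. The helper below is a sketch of that step only; `build_form_request` is a hypothetical name, and the actual POST is left to whatever HTTP client is in use.

```python
from urllib.parse import urljoin


def build_form_request(page_url: str, form: dict) -> tuple[str, dict]:
    """Return the absolute URL and body for re-submitting a parsed form.

    `form` is expected in the {'action': ..., 'data': {...}} shape shown
    above; `page_url` is the URL the HTML was fetched from.
    """
    target = urljoin(page_url, form["action"])
    return target, dict(form["data"])


if __name__ == "__main__":
    logon_page = "https://www.my.commbank.com.au/netbank/Logon/Logon.aspx"
    form = {"action": "/netbank/Logon/Logon.aspx", "data": {"JS": "D"}}
    print(build_form_request(logon_page, form))
```

With a session-based HTTP client, the returned pair would typically be used as the URL and form body of a POST made on the same session, so the cookies set during login are sent along.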

Perhaps (offline) we could compare notes on stages of parsing in more detail?

Sure 👍

@jpillora
Author

jpillora commented May 11, 2023

I’ve switched to headless chrome and do minimal parsing

I only fill out the login and click buttons, and here’s the trick: I add a ChromeDriver network listener and just wait for the JSON I want to come back during the click-around session.

@jcwillox

Yeah that sounds like a pretty solid method 👍. At least the CommBank login process that I've been using has been stable for a few years now, so I haven't had to do much maintenance.

@paytah232

Just as a random comment, I just don't think I have a good enough understanding of the crawler for PHP, so I haven't made any progress.

I did however find a 3rd party app that is relatively easily integratable for PHP, Python and Java - called Odoo.

It's free (for one app - I used the Accounting app), and provides API access to a BUNCH of banking institutions through other 3rd party providers, all claiming best in practice security (hopefully true).

I've got it set up and working for my PHP DB, and I'm pretty happy with it overall. Anyway, thanks for the help whilst I was trying to go down the crawler path!

@balupton

balupton commented May 31, 2023

Worth noting that cloudflare workers now support headless browsing, so that can be an avenue for achieving this: https://developers.cloudflare.com/browser-rendering/platform/puppeteer/

I'm also wondering if anyone has any experience with getting transactions from a Zip Pay account. I'm thinking of going headless, or using a proxy to observe the API requests.

Nifty about Odoo. There used to be an Australian company called Pocketbook that was also a transaction aggregator, but they shut down: https://help.getpocketbook.com/hc/en-us/articles/5184060829071-Alternative-Service-Providers

@jpillora
Author

jpillora commented May 31, 2023 via email
