feat(mizrahi): scrape extra info #890

Open
wants to merge 5 commits into
base: master
113 changes: 108 additions & 5 deletions src/scrapers/mizrahi.ts
Expand Up @@ -4,7 +4,7 @@ import { SHEKEL_CURRENCY } from '../constants';
import {
pageEvalAll, waitUntilElementDisappear, waitUntilElementFound, waitUntilIframeFound,
} from '../helpers/elements-interactions';
import { fetchPostWithinPage } from '../helpers/fetch';
import { fetchPost, fetchPostWithinPage } from '../helpers/fetch';
import { waitForUrl } from '../helpers/navigation';
import {
type Transaction,
Expand All @@ -13,13 +13,21 @@ import {
} from '../transactions';
import { BaseScraperWithBrowser, LoginResults, type PossibleLoginResults } from './base-scraper-with-browser';
import { ScraperErrorTypes } from './errors';
import { sleep } from '../helpers/waiting';

interface ScrapedTransaction {
RecTypeSpecified: boolean;
MC02PeulaTaaEZ: string;
MC02SchumEZ: number;
MC02AsmahtaMekoritEZ: string;
MC02TnuaTeurEZ: string;
MC02KodGoremEZ: string;
MC02SugTnuaKaspitEZ: string;
MC02AgidEZ: string;
MC02SeifMaralEZ: string;
MC02NoseMaralEZ: string;
MC02ShowDetailsEZ: string;
TransactionNumber: string;
}

interface ScrapedTransactionsResult {
Expand All @@ -37,6 +45,27 @@ interface ScrapedTransactionsResult {
};
}

interface ExtraTransactionDetail {
Label: string;
Value: string;
}

interface ExtraTransactionResult {
body: {
fields: [
[
{
Records: [
{
Fields: ExtraTransactionDetail[];
},
];
},
],
];
};
}

const BASE_WEBSITE_URL = 'https://www.mizrahi-tefahot.co.il';
const LOGIN_URL = `${BASE_WEBSITE_URL}/login/index.html#/auth-page-he`;
const BASE_APP_URL = 'https://mto.mizrahi-tefahot.co.il';
Expand All @@ -47,11 +76,14 @@ const TRANSACTIONS_REQUEST_URLS = [
`${BASE_APP_URL}/OnlinePilot/api/SkyOSH/get428Index`,
`${BASE_APP_URL}/Online/api/SkyOSH/get428Index`,
];
const TRANSACTION_DETAILS_REQUEST_URL = `${BASE_APP_URL}/Online/api/OSH/getMaherBerurimSMF`;
const PENDING_TRANSACTIONS_PAGE = '/osh/legacy/legacy-Osh-p420';
const PENDING_TRANSACTIONS_IFRAME = 'p420.aspx';
const CHANGE_PASSWORD_URL = /https:\/\/www\.mizrahi-tefahot\.co\.il\/login\/index\.html#\/change-pass/;
const DATE_FORMAT = 'DD/MM/YYYY';
const MAX_ROWS_PER_REQUEST = 10000000000;
const TRANSACTION_DETAILS_REQUEST_CONCURRENCY = 1;
const TRANSACTION_DETAILS_REQUEST_WAIT_TIME = 500; // ms
Collaborator
You can add an _MS suffix to the variable name and drop the comment.

Suggested change
const TRANSACTION_DETAILS_REQUEST_WAIT_TIME = 500; // ms
const TRANSACTION_DETAILS_REQUEST_WAIT_TIME_MS = 500;


const usernameSelector = '#emailDesktopHeb';
const passwordSelector = '#passwordIDDesktopHEB';
Expand Down Expand Up @@ -130,6 +162,70 @@ function convertTransactions(txns: ScrapedTransaction[]): Transaction[] {
});
}

async function getTransactionExtraScrap(record: ScrapedTransaction, headers: Headers): Promise<ExtraTransactionResult | null> {
const formattedPeulaDate = moment(record.MC02PeulaTaaEZ).format(DATE_FORMAT);
const data = {
inKodGorem: record.MC02KodGoremEZ,
inAsmachta: record.MC02AsmahtaMekoritEZ,
inSchum: record.MC02SchumEZ,
inNakvanit: record.MC02KodGoremEZ,
inSugTnua: record.MC02SugTnuaKaspitEZ,
inAgid: record.MC02AgidEZ,
inTarPeulaFormatted: formattedPeulaDate,
inTarErechFormatted: formattedPeulaDate,
inKodNose: record.MC02SeifMaralEZ,
inKodTatNose: record.MC02NoseMaralEZ,
inTransactionNumber: record.TransactionNumber,
};

try {
const res = await fetchPost(TRANSACTION_DETAILS_REQUEST_URL, data, headers);
return res;
} catch (e) {
console.error(`Error fetching extra transaction details for record ${JSON.stringify(record)}`, e);
}
return null;
}

function simplifyExtraTransactionResultsToMemo(extraResult: ExtraTransactionResult): string {
let memo = '';
extraResult.body.fields.forEach(field =>
field?.forEach(group =>
group?.Records.forEach(record =>
record?.Fields.forEach((fieldRecord) => {
memo += `${fieldRecord.Label} ${fieldRecord.Value}; `;
}),
),
),
);
return memo;
}
Comment on lines +190 to +202
Collaborator
I'm not sure we want to build the memo ourselves.

Could you add an example of the response object, so we can see what you are building here?
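
For reference, a minimal sketch of what simplifyExtraTransactionResultsToMemo would produce. The interfaces and the flattening logic follow the diff above (the tuple types are loosened to arrays for brevity). The sample labels and values are invented placeholders, not real bank output.

```typescript
interface ExtraTransactionDetail {
  Label: string;
  Value: string;
}

// Loosened from the tuple types in the diff so the sample can hold any number of entries.
interface ExtraTransactionResult {
  body: {
    fields: Array<Array<{ Records: Array<{ Fields: ExtraTransactionDetail[] }> }>>;
  };
}

// Same flattening logic as in the PR: every Label/Value pair becomes "Label Value; ".
function simplifyExtraTransactionResultsToMemo(extraResult: ExtraTransactionResult): string {
  let memo = '';
  extraResult.body.fields.forEach((field) =>
    field?.forEach((group) =>
      group?.Records.forEach((record) =>
        record?.Fields.forEach((fieldRecord) => {
          memo += `${fieldRecord.Label} ${fieldRecord.Value}; `;
        }),
      ),
    ),
  );
  return memo;
}

// Hypothetical response: labels and values are made up for illustration.
const sampleResult: ExtraTransactionResult = {
  body: {
    fields: [[{
      Records: [{
        Fields: [
          { Label: 'Payee:', Value: 'Example Ltd' },
          { Label: 'Reference:', Value: '12345' },
        ],
      }],
    }]],
  },
};

const sampleMemo = simplifyExtraTransactionResultsToMemo(sampleResult);
// sampleMemo === 'Payee: Example Ltd; Reference: 12345; '
```

Note the trailing "; " that the concatenation leaves at the end of the memo.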


async function getExtraScrap(originalRecords: ScrapedTransaction[], currentTxns: Transaction[], headers: Headers): Promise<Transaction[]> {
Collaborator
Rename to addExtraInfo.

const recordsWithDetails = originalRecords
.map((record, index) => ({ record, index }))
.filter(({ record }) => record.MC02ShowDetailsEZ === '1');
Comment on lines +205 to +207
Collaborator
Better to filter first and then map.
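
One caveat: a plain filter followed by map loses each record's position in originalRecords, which the merge step further down looks up by index. A possible way to filter and keep the original index in a single pass is flatMap. This is only a sketch: the Rec type and the sample rows are simplified stand-ins, not the real ScrapedTransaction.

```typescript
// Simplified stand-in for ScrapedTransaction (only the field used for filtering).
interface Rec {
  MC02ShowDetailsEZ: string;
}

const originalRecords: Rec[] = [
  { MC02ShowDetailsEZ: '0' },
  { MC02ShowDetailsEZ: '1' },
  { MC02ShowDetailsEZ: '1' },
];

// flatMap drops a row by returning [] and keeps it, together with its
// original index, by returning a one-element array.
const recordsWithDetails = originalRecords.flatMap((record, index) =>
  (record.MC02ShowDetailsEZ === '1' ? [{ record, index }] : []));

// The kept entries still point at positions 1 and 2 of originalRecords.
const keptIndexes = recordsWithDetails.map((entry) => entry.index);
```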


const promises = recordsWithDetails.map(({ record }) => getTransactionExtraScrap(record, headers));
let accounts: Array<ExtraTransactionResult | null> = [];
while (promises.length > 0) {
const currentPromises = promises.splice(0, TRANSACTION_DETAILS_REQUEST_CONCURRENCY);
accounts = accounts.concat(await Promise.all(currentPromises));
await sleep(TRANSACTION_DETAILS_REQUEST_WAIT_TIME);
}
Comment on lines +209 to +215
Collaborator
I think you're trying to do batch processing here, but you missed something.

An async function starts running as soon as it is called. When you call getTransactionExtraScrap(record, headers), the request is sent at the call site, not when the promise is awaited.

In your code, the promises array already holds every in-flight request, so the loop only awaits them one by one. It neither limits concurrency nor spaces the requests out.
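
A small sketch of the difference, under stated assumptions: fakeRequest is a hypothetical stand-in for getTransactionExtraScrap, and runInBatches is an illustrative helper that does not exist in the repository.

```typescript
// Records when each simulated request actually starts.
const startedIds: number[] = [];

// Stand-in for getTransactionExtraScrap: everything before the first
// await runs synchronously at call time.
async function fakeRequest(id: number): Promise<string> {
  startedIds.push(id);
  await Promise.resolve();
  return `details-${id}`;
}

// Eager: mapping straight to promises starts every request immediately.
const eagerPromises = [1, 2, 3].map((id) => fakeRequest(id));
const startedAfterMap = startedIds.length; // already 3, before anything is awaited

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Batched: the worker is only called inside the loop, so at most
// batchSize requests are in flight at any time.
async function runInBatches<T, R>(
  items: T[],
  batchSize: number,
  waitMs: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map((item) => worker(item)))));
    if (i + batchSize < items.length) {
      await sleep(waitMs); // pause only between batches
    }
  }
  return results;
}
```

In the PR this could be called with recordsWithDetails as the items, TRANSACTION_DETAILS_REQUEST_CONCURRENCY as the batch size, and a worker such as ({ record }) => getTransactionExtraScrap(record, headers), so that no request is created before its batch is reached.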


const txnsWithExtra = currentTxns.map((txn, i) => {
const extraDetailIndex = recordsWithDetails.findIndex(({ index }) => index === i);
const extraDetails = extraDetailIndex !== -1 ? accounts[extraDetailIndex] : undefined;
const currentTxn = { ...txn };
if (extraDetails) {
currentTxn.memo = simplifyExtraTransactionResultsToMemo(extraDetails);
}
return currentTxn;
});
Comment on lines +217 to +225
Collaborator
I guess you want to clone the current transaction. What do you think about this?

Suggested change
const txnsWithExtra = currentTxns.map((txn, i) => {
const extraDetailIndex = recordsWithDetails.findIndex(({ index }) => index === i);
const extraDetails = extraDetailIndex !== -1 ? accounts[extraDetailIndex] : undefined;
const currentTxn = { ...txn };
if (extraDetails) {
currentTxn.memo = simplifyExtraTransactionResultsToMemo(extraDetails);
}
return currentTxn;
});
const txnsWithExtra = currentTxns.map(({ ...currentTxn }, i) => {
const extraDetailIndex = recordsWithDetails.findIndex(({ index }) => index === i);
const extraDetails = extraDetailIndex !== -1 ? accounts[extraDetailIndex] : undefined;
if (extraDetails) {
currentTxn.memo = simplifyExtraTransactionResultsToMemo(extraDetails);
}
return currentTxn;
});
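
A minimal sketch of why the rest pattern in the parameter list works as a clone. The Txn type and the sample data are simplified placeholders, not the real Transaction type.

```typescript
// Simplified placeholder for the Transaction type.
interface Txn {
  description: string;
  memo?: string;
}

const currentTxns: Txn[] = [{ description: 'a' }, { description: 'b' }];

// `{ ...currentTxn }` in the parameter list makes a shallow copy of each
// element, so assigning to the copy leaves the original entries untouched.
const txnsWithExtra = currentTxns.map(({ ...currentTxn }, i) => {
  if (i === 0) {
    currentTxn.memo = 'extra details';
  }
  return currentTxn;
});
```

The copy is shallow: nested objects, if a transaction has any, are still shared between the original and the clone.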

return txnsWithExtra;
}

async function extractPendingTransactions(page: Frame): Promise<Transaction[]> {
const pendingTxn = await pageEvalAll(page, 'tr.rgRow', [], (trs) => {
return trs.map((tr) => Array.from(tr.querySelectorAll('td'), (td: HTMLTableDataCellElement) => td.textContent || ''));
Expand Down Expand Up @@ -228,24 +324,31 @@ class MizrahiScraper extends BaseScraperWithBrowser<ScraperSpecificCredentials>
throw new Error('Account number not found');
}

const headersMap: Record<string, any> = {};
const response = await Promise.any(TRANSACTIONS_REQUEST_URLS.map(async (url) => {
const request = await this.page.waitForRequest(url);
const data = createDataFromRequest(request, this.options.startDate);
const headers = createHeadersFromRequest(request);
headersMap[url] = createHeadersFromRequest(request);

return fetchPostWithinPage<ScrapedTransactionsResult>(this.page, url, data, headers);
return fetchPostWithinPage<ScrapedTransactionsResult>(this.page, url, data, headersMap[url]);
}));

const cookies = await this.page.cookies();
Collaborator
Page-level cookie API is deprecated. Use Browser.cookies or BrowserContext.cookies instead.
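
A sketch of how the header could be built independently of where the cookies come from. toCookieHeader and CookieLike are hypothetical names introduced for illustration. The comment about page.browserContext().cookies() assumes a Puppeteer version that exposes the context-level cookie API, and a context may hold cookies for more domains than the current page, so filtering may be needed.

```typescript
// Minimal structural type: only the fields needed to build the header.
interface CookieLike {
  name: string;
  value: string;
}

// Builds a Cookie request header value from a list of cookies.
function toCookieHeader(cookies: CookieLike[]): string {
  return cookies.map((cookie) => `${cookie.name}=${cookie.value}`).join('; ');
}

// In the scraper the list would come from the browser context instead of the page,
// e.g. `await this.page.browserContext().cookies()` (assumption, see the note above).
const exampleHeader = toCookieHeader([
  { name: 'session', value: 'abc' },
  { name: 'lang', value: 'he' },
]);
// exampleHeader === 'session=abc; lang=he'
```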

const headers = Object.values(headersMap)[0];
headers.Cookie = cookies.map((cookie) => `${cookie.name}=${cookie.value}`).join('; ');
Comment on lines +327 to +337
Collaborator
I don't like this headersMap solution, which ends up as just another headers object.

But I can't find another way right now.
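
One possible alternative, offered only as a sketch: let each racing callback resolve with both the response and the headers it used, so the winner of Promise.any carries its own headers and no shared map is needed. replayRequest, FetchOutcome and the example.test URLs are hypothetical stand-ins for the waitForRequest / fetchPostWithinPage sequence in the diff.

```typescript
type HeadersRecord = Record<string, string>;

interface FetchOutcome<T> {
  response: T;
  requestHeaders: HeadersRecord;
}

// Hypothetical stand-in for "wait for the request, rebuild its headers, replay it".
// URLs containing "Pilot" fail here to simulate an endpoint that is never hit.
async function replayRequest(url: string): Promise<FetchOutcome<string>> {
  if (url.includes('Pilot')) {
    throw new Error(`no request seen for ${url}`);
  }
  const requestHeaders: HeadersRecord = { 'x-source-url': url };
  return { response: `rows from ${url}`, requestHeaders };
}

const candidateUrls = [
  'https://example.test/OnlinePilot/api',
  'https://example.test/Online/api',
];

// Promise.any fulfils with the first successful outcome, headers included.
const winner: Promise<FetchOutcome<string>> = Promise.any(
  candidateUrls.map((url) => replayRequest(url)),
);
```

With this shape the caller could destructure the awaited result into response and requestHeaders, then add the Cookie header to requestHeaders before fetching the extra details.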


if (!response || response.header.success === false) {
throw new Error(`Error fetching transaction. Response message: ${response ? response.header.messages[0].text : ''}`);
}

const relevantRows = response.body.table.rows.filter((row) => row.RecTypeSpecified);
const oshTxn = convertTransactions(relevantRows);

const oshTxnWithExtra = this.options.additionalTransactionInformation ?
await getExtraScrap(relevantRows, oshTxn, headers) : oshTxn;

// workaround for a bug which the bank's API returns transactions before the requested start date
const startMoment = getStartMoment(this.options.startDate);
const oshTxnAfterStartDate = oshTxn.filter((txn) => moment(txn.date).isSameOrAfter(startMoment));
const oshTxnAfterStartDate = oshTxnWithExtra.filter((txn) => moment(txn.date).isSameOrAfter(startMoment));

const pendingTxn = await this.getPendingTransactions();
const allTxn = oshTxnAfterStartDate.concat(pendingTxn);
Expand Down