arelle.WebCache

See COPYRIGHT.md for copyright information.

For SEC EDGAR data access, see https://www.sec.gov/os/accessing-edgar-data (e.g., User-Agent: Sample Company Name AdminContact@<sample company domain>.com).
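As a rough illustration of the User-Agent convention mentioned above, the sketch below builds a request that declares a company name and contact address. It uses only the standard library and does not touch arelle; `build_edgar_request` and the `example.com` address are hypothetical placeholders, not part of this module.

```python
import urllib.request

# Placeholder identity (assumption): substitute a real company name and contact address.
USER_AGENT = "Sample Company Name AdminContact@example.com"


def build_edgar_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Hypothetical helper: return a Request that declares a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Building the request does not open a network connection.
request = build_edgar_request("https://www.sec.gov/")
```

Within arelle itself, the `httpUserAgent` property and the `HTTP_USER_AGENT` constant listed below are the documented places where the user agent appears; how they are set is not shown on this page.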

Module Contents

Classes

Functions

Data

API

arelle.WebCache._: arelle.typing.TypeGetText

None

arelle.WebCache.addServerWebCache

None

arelle.WebCache.DIRECTORY_INDEX_FILE

‘!~DirectoryIndex~!’

arelle.WebCache.FILE_LOCK_TIMEOUT

30

arelle.WebCache.INF

‘float(…)’

arelle.WebCache.RETRIEVAL_RETRY_COUNT

5

arelle.WebCache.HTTP_USER_AGENT

‘format(…)’

arelle.WebCache._XBRL_ORG_URL_PREFIXES

‘frozenset(…)’

arelle.WebCache.XBRL_ORG_CACHE_REDIRECTS

None

class arelle.WebCache.ProxyTuple

Bases: typing.NamedTuple

useOsProxy: bool

None

urlAddr: str | None

None

urlPort: str | None

None

user: str | None

None

password: str | None

None

classmethod coerce(value: Any) → arelle.WebCache.ProxyTuple | None
property authority: str
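To make the shape of this record concrete, here is a minimal stand-in with the same five fields. `ProxySketch` is a hypothetical class, and the `[user[:password]@]host[:port]` layout of its `authority` property is an assumption about what such a property might return, not the behaviour of `arelle.WebCache.ProxyTuple`.

```python
from typing import NamedTuple, Optional


class ProxySketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented ProxyTuple fields."""

    useOsProxy: bool
    urlAddr: Optional[str] = None
    urlPort: Optional[str] = None
    user: Optional[str] = None
    password: Optional[str] = None

    @property
    def authority(self) -> str:
        # Assumed layout: [user[:password]@]host[:port]; empty when no host is set.
        if not self.urlAddr:
            return ""
        host = f"{self.urlAddr}:{self.urlPort}" if self.urlPort else self.urlAddr
        if self.user:
            credentials = f"{self.user}:{self.password}" if self.password else self.user
            return f"{credentials}@{host}"
        return host


proxy = ProxySketch(False, "proxy.example.com", "8080", "alice", "s3cret")
```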
arelle.WebCache.proxyDirFmt(httpProxyTuple: arelle.WebCache.ProxyTuple | None) → dict[str, str] | None
arelle.WebCache.proxyTuple(url: str) → arelle.WebCache.ProxyTuple
arelle.WebCache.lastModifiedTime(headers: dict[str, str]) → float | None
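The signature of `lastModifiedTime` suggests it turns response headers into an optional timestamp. One way such a conversion could look, using only the standard library, is sketched below; `last_modified_epoch` is a hypothetical name and the header lookup and error handling are assumptions, not the arelle implementation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def last_modified_epoch(headers: dict) -> Optional[float]:
    """Hypothetical helper: parse a Last-Modified header into epoch seconds."""
    value = headers.get("Last-Modified") or headers.get("last-modified")
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable dates are treated as "no timestamp available".
        return None
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; interpret it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


example = last_modified_epoch({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
```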
class arelle.WebCache.WebCache(cntlr: arelle.Cntlr.Cntlr, httpProxyTuple: arelle.WebCache.ProxyTuple | None)

Initialization

default_timeout: float | int | None

None

property timeout: float | None
property recheck: str
property logDownloads: bool
saveUrlCheckTimes() → None
property noCertificateCheck: bool
property httpUserAgent: str
property httpsRedirect: bool
redirectFallback(matchPattern: regex.Pattern[str], replaceFormat: str) → None
resetProxies(httpProxyTuple: arelle.WebCache.ProxyTuple | None) → None
property opener: urllib.request.OpenerDirector
normalizeFilepath(filepath: str, url: str, cacheDir: str | None = None) → str

Perform any necessary transformations to filepath.

Parameters:
  • filepath – Filepath to normalize.

  • url – Original URL (for http/https redirect).

  • cacheDir – Cache root directory.

Returns:

Normalized filepath.

normalizeUrl(url: str | None, base: str | None = None) → Any
encodeForFilename(pathpart: str) → str
_fallbackRedirect(url: str, originalFilepath: str, cacheDir: str) → str

If the original URL does not map to an existing cache file, each fallback redirect pattern is checked to see whether modifying the URL yields a path to a file that does exist in the cache. If none is found, the original filepath is returned.

Parameters:
  • url – The requested URL.

  • originalFilepath – The original mapped filepath.

  • cacheDir – Cache root directory.

Returns:

An existing redirected path or the original filepath.
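The lookup described above can be sketched as a small loop over (pattern, replacement) pairs. This is an illustration only: `fallback_redirect` and `to_filepath` are hypothetical names, the standard `re` module stands in for `regex`, and filling the replacement format with the match groups is an assumption about how `replaceFormat` is applied.

```python
import os
import re
from typing import Callable, Iterable, Tuple


def fallback_redirect(
    url: str,
    original_filepath: str,
    fallbacks: Iterable[Tuple["re.Pattern[str]", str]],
    to_filepath: Callable[[str], str],
) -> str:
    """Hypothetical helper: return an existing redirected cache path, else the original."""
    if os.path.exists(original_filepath):
        return original_filepath
    for pattern, replace_format in fallbacks:
        match = pattern.match(url)
        if match is None:
            continue
        # Assumption: the replacement format is filled with the pattern's groups.
        candidate = to_filepath(replace_format.format(*match.groups()))
        if os.path.exists(candidate):
            return candidate
    # No pattern produced an existing file.
    return original_filepath
```

Passing the URL-to-path mapping in as `to_filepath` keeps the sketch independent of any particular cache layout.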

urlToCacheFilepath(url: str, cacheDir: str | None = None, useRedirectFallback: bool = True) → str

Converts url into the corresponding cache filepath in cacheDir.

Parameters:
  • url – URL to convert.

  • cacheDir – Cache root directory.

  • useRedirectFallback – Whether to use fallback redirects.

Returns:

Cache filepath.
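For intuition, a URL-to-path mapping of this kind might place the scheme, host, and path segments under the cache root. The sketch below is one such layout under stated assumptions: `url_to_cache_path` is a hypothetical function, and both the directory layout and the percent-encoding of segments are illustrative choices rather than the scheme arelle actually uses.

```python
import os
from urllib.parse import quote, urlsplit


def url_to_cache_path(url: str, cache_dir: str) -> str:
    """Hypothetical helper: map a URL to cache_dir/<scheme>/<host>/<path segments>."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if parts.query:
        # Assumption: keep the query as part of the final file name.
        last = segments.pop() if segments else ""
        segments.append(f"{last}?{parts.query}")
    # Percent-encode every component so it is safe to use as a file name.
    encoded = [quote(part, safe="") for part in [parts.scheme, parts.netloc, *segments]]
    return os.path.join(cache_dir, *encoded)


path = url_to_cache_path("https://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd", "cache")
```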

cacheFilepathToUrl(cacheFilepath: str, cacheDir: str | None = None) → str
getfilename(url: str | None, base: str | None = None, reload: bool = False, checkModifiedTime: bool = False, normalize: bool = False, filenameOnly: bool = False, allowTransformation: bool = True) → str | None
_checkIfNewerOnWeb(url: str, filepath: str) → bool
Parameters:
  • url – URL to retrieve web timestamp from

  • filepath – Filepath to retrieve local timestamp from

Returns:

static _getTimeString(timeValue: float) → str
Parameters:

timeValue – time in seconds since the epoch, in UTC

Returns:

UTC-formatted string representation of timeValue
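A conversion from epoch seconds to a UTC string can be written with the standard `time` module, as sketched here. `utc_time_string` is a hypothetical name and the output pattern is an assumption; the exact format produced by `_getTimeString` is not stated on this page.

```python
import time


def utc_time_string(time_value: float) -> str:
    """Hypothetical helper: format epoch seconds as a UTC timestamp string."""
    # time.gmtime interprets the value as UTC regardless of the local time zone.
    return time.strftime("%Y-%m-%dT%H:%M:%S UTC", time.gmtime(time_value))


epoch_start = utc_time_string(0)
```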

static _quotedUrl(url: str) → str
Parameters:

url

Returns:

url with scheme-specific-part quoted except for parameter separators
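The idea of quoting everything after the scheme while leaving separators intact can be approximated with `urllib.parse.quote`. In this sketch `quoted_url` is a hypothetical function and the sets of "safe" characters are assumptions chosen for illustration; fragments are not handled, and the arelle method may treat other characters differently.

```python
from urllib.parse import quote


def quoted_url(url: str) -> str:
    """Hypothetical helper: percent-quote the part after '://', keeping separators."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No scheme separator found; leave the input untouched.
        return url
    path, query_sep, query = rest.partition("?")
    # "%" is kept safe so already-escaped sequences are not escaped twice.
    quoted_path = quote(path, safe="/:@%")
    quoted_query = quote(query, safe="=&;%")
    return scheme + sep + quoted_path + query_sep + quoted_query


quoted = quoted_url("http://example.com/a b?x=1&y=2 3")
```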

static _getFileTimestamp(path: str) → float
_downloadFileWithLock(url: str, filepath: str, retrievingDueToRecheckInterval: bool = False, retryCount: int = 5) → bool
_downloadFile(url: str, filepath: str, retrievingDueToRecheckInterval: bool = False, retryCount: int = 5) → bool

Downloads the file at url to a temporary location before copying it to filepath.

Parameters:
  • url – Web resource to download.

  • filepath – End destination for downloaded file.

  • retrievingDueToRecheckInterval – Determines how errors are handled when download is part of a cache recheck.

  • retryCount – Number of times to retry download.

Returns:

Whether filepath should now be used.
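The download-to-temporary-then-place pattern described above can be sketched with the standard library. `download_via_temp` is a hypothetical function: it treats `retry_count` as the total number of attempts, moves the finished file with `os.replace` instead of copying it, and omits the recheck-interval handling, all of which are simplifying assumptions.

```python
import os
import shutil
import tempfile
import urllib.request


def download_via_temp(url: str, filepath: str, retry_count: int = 5, timeout: float = 30.0) -> bool:
    """Hypothetical helper: fetch url into a temp file, then move it to filepath."""
    dest_dir = os.path.dirname(os.path.abspath(filepath))
    os.makedirs(dest_dir, exist_ok=True)
    for _attempt in range(retry_count):
        # Write next to the destination so the final move stays on one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
        try:
            with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url, timeout=timeout) as response:
                shutil.copyfileobj(response, tmp)
            os.replace(tmp_path, filepath)
            return True
        except OSError:
            # urllib.error.URLError is a subclass of OSError; discard the partial file and retry.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
    return False
```

Writing to a temporary file first means a failed or interrupted transfer never leaves a truncated file at the destination path.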

internetRecheckFailedRecovery(url: str, err: str | Exception, timeNowStr: str) → None
reportProgress(blockCount: int, blockSize: int, totalSize: int) → None
clear() → None
getheaders(url: str) → dict[str, str]
geturl(url: str) → str | None
retrieve(url: str, filename: str | None = None, filestream: io.BytesIO | None = None, reporthook: collections.abc.Callable[[int, int, int], None] | None = None, data: bytes | None = None) → tuple[str | None, dict[str, str], bytes]