File Manager - Edit - /home/digitalm/venv/lib/python3.7/site-packages/pip/_vendor/chardet/__pycache__/charsetprober.cpython-37.pyc
Readable content of the compiled module (compiled from /tmp/pip-install-251nq386/pip/pip/_vendor/chardet/charsetprober.py):

Names referenced: logging, re, Optional, Union, LanguageFilter, ProbingState
Module constant: INTERNATIONAL_WORDS_PATTERN, a compiled bytes pattern: [a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?

class CharSetProber

    __init__(self, lang_filter) -> None
        Sets _state to ProbingState.DETECTING and active to True, stores lang_filter, and sets logger to logging.getLogger(__name__).

    reset(self) -> None
        Sets _state back to ProbingState.DETECTING.

    charset_name(self)
        Returns None.

    language(self)
        Raises NotImplementedError.

    feed(self, byte_str)
        Raises NotImplementedError.

    state(self)
        Returns self._state.

    get_confidence(self)
        Returns a float constant (value not legible in the dump).

    filter_high_byte_only(buf)
        Returns re.sub(b"([\x00-\x7F])+", b" ", buf).

    filter_international_words(buf)
        We define three types of bytes:
          alphabet: English alphabets [a-zA-Z]
          international: international characters [\x80-\xFF]
          marker: everything else [^a-zA-Z\x80-\xFF]
        The input buffer can be thought to contain a series of words delimited by markers. This function works to filter all words that contain at least one international character. All contiguous sequences of markers are replaced by a single space ASCII character.
        This filter applies to all scripts which do not use English characters.

    (final method; its name is not legible and the dump is cut off partway through its body)
        Returns a copy of ``buf`` that retains only the sequences of English alphabet and high byte characters that are not between <> characters.
        This filter can be applied to all scripts which contain both English characters and extended ASCII characters, but is currently only used by ``Latin1Prober``.
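The two filter docstrings recovered from the dump describe their behaviour precisely enough to sketch in plain Python. The block below is an illustration written from those docstrings and the legible fragments of the bytecode, not a decompilation. The name `remove_tag_spans` is a placeholder (the real method name is not legible in the dump), and the 0x80-0xFF "international" range is assumed from the partly garbled pattern.

```python
import re

# Assumption: "international" bytes are 0x80-0xFF, as the docstring's
# [..-ÿ] range suggests. A word is letters, one or more high bytes,
# letters, then at most one trailing marker byte.
INTERNATIONAL_WORDS_PATTERN = re.compile(
    b"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


def filter_international_words(buf):
    """Keep only words containing at least one high byte.

    A trailing ASCII marker after a kept word becomes a single space.
    """
    filtered = bytearray()
    for word in INTERNATIONAL_WORDS_PATTERN.findall(bytes(buf)):
        filtered.extend(word[:-1])
        last_char = word[-1:]
        # bytes.isalpha() is ASCII-only, so high bytes pass through
        # unchanged and only ASCII non-letters are replaced.
        if not last_char.isalpha() and last_char < b"\x80":
            last_char = b" "
        filtered.extend(last_char)
    return filtered


def remove_tag_spans(buf):
    """Hypothetical name: drop everything between '<' and '>'.

    Each kept run that is followed by a tag gets one space appended.
    """
    buf = bytes(buf)
    filtered = bytearray()
    in_tag = False
    prev = 0  # start of the current run outside any tag
    for curr in range(len(buf)):
        ch = buf[curr:curr + 1]
        if ch == b">":
            prev = curr + 1
            in_tag = False
        elif ch == b"<":
            if curr > prev and not in_tag:
                filtered.extend(buf[prev:curr])
                filtered.extend(b" ")
            in_tag = True
    if not in_tag:
        # Keep whatever follows the last closed tag.
        filtered.extend(buf[prev:])
    return filtered
```

For example, `filter_international_words(b"hello caf\xe9, world")` keeps only the word containing the high byte, and `remove_tag_spans(b"<p>caf\xe9</p>ok")` keeps the text outside the tags.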