internetnl_domain_analyse package

Submodules

internetnl_domain_analyse.domain_analyse_classes module

class internetnl_domain_analyse.domain_analyse_classes.DomainAnalyser(scan_data_key=None, cache_file_base='tables_df', cache_directory_base_name=None, tld_extract_cache_directory=None, output_file=None, reset=None, records_cache_info: RecordCacheInfo | None = None, internet_nl_filename=None, breakdown_labels=None, statistics: dict | None = None, default_scan=None, variables: dict | None = None, module_info: dict | None = None, weights=None, url_key='website_url', suffix_key='suffix', translations=None, module_key='module', variable_key='variable', sheet_renames=None, n_digits=None, write_dataframe_to_sqlite=False, statistics_to_xls=False, n_bins=100, mode=None, correlations=None, categories=None, dump_cache_as_sqlite=False)[source]

Bases: object

calculate_categories()[source]
calculate_correlations_and_scores()[source]
calculate_statistics()[source]
calculate_statistics_one_breakdown(group_by)[source]
check_if_cache_exist(mode: str)[source]
get_correct_categories_count()[source]

For each record, count how many categories are correct and return the result as a dataframe.

read_data()[source]
variable_dict2df(variables, module_info: dict | None = None) → DataFrame[source]

Convert the dictionary with variable info to a data frame.

Parameters:
  • variables – dict with variable info

  • module_info – dict with module information

Returns:

dataframe

write_data()[source]

Write the combined data frame to SQLite.

write_statistics()[source]
class internetnl_domain_analyse.domain_analyse_classes.DomainPlotter(scan_data, scan_data_key=None, default_scan=None, plot_info=None, show_plots=False, barh=False, image_directory=None, cache_directory=None, image_type='pdf', max_plots=None, tex_prepend_path=None, statistics=None, variables=None, cdf_plot=False, bar_plot=False, cor_plot=False, add_logo=True, cumulative=False, show_title=False, breakdown_labels=None, translations: dict | None = None, export_highcharts=False, highcharts_directory=None, correlations=None, tex_horizontal_shift=None, bovenschrift=True, variables_to_plot=None, exclude_variables=None, force_plots=False, latex_files=False, years_to_add_to_plot_legend=None, module_info=None, english=False)[source]

Bases: object

get_plot_cache(scan_data_key, plot_key, year_key)[source]
make_plots(add_logo=True)[source]
class internetnl_domain_analyse.domain_analyse_classes.ImageFileInfo(scan_data_key, cache_file_name_base='image_info', cache_directory='cache')[source]

Bases: object

add_entry(plot_key, plot_info, image_key, sub_image_label, file_name, tex_right_shift=None, section=None)[source]

Add a new entry.

fix_order(variables)[source]
read_cache()[source]

Read the cache.

write_cache()[source]

Write the cache.

class internetnl_domain_analyse.domain_analyse_classes.PlotInfo(variables_df, var_name, breakdown_name)[source]

Bases: object

get_plot_info()[source]

The variables dataframe can also explicitly specify the highcharts directory and the highcharts label per variable. Look those up here.

class internetnl_domain_analyse.domain_analyse_classes.RecordCacheInfo(records_cache_data: dict, year_key: str, stat_directory: str | None = None)[source]

Bases: object

get_cache_file_name()[source]

Retrieve the cache file name from the dictionary. If environment variables are given, base the directory on the matching environment variable. The names follow the pattern RECORDS_CACHE_DIR_20, RECORDS_CACHE_DIR_21 for 2020 and 2021, respectively.
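The naming scheme for the environment variables can be illustrated with a minimal sketch. It assumes that the suffix is the last two digits of the year and that a set environment variable takes precedence over a default directory. The helper name records_cache_dir and its default argument are hypothetical; the actual lookup in RecordCacheInfo may differ.

```python
import os
from pathlib import Path


def records_cache_dir(year: int, default: str = "cache") -> Path:
    """Return the cache directory for a year (hypothetical helper)."""
    # 2020 -> "RECORDS_CACHE_DIR_20", 2021 -> "RECORDS_CACHE_DIR_21"
    env_name = f"RECORDS_CACHE_DIR_{year % 100:02d}"
    # Assumption: the environment variable wins over the default directory
    return Path(os.environ.get(env_name, default))


# Example: only the 2021 directory is overridden through the environment
os.environ["RECORDS_CACHE_DIR_21"] = "/tmp/records_2021"
print(records_cache_dir(2021))
```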

get_cache_table_names()[source]

Get the table names of the cache files.

internetnl_domain_analyse.domain_analyse_classes.add_missing_years(plot_df, years_to_plot=None, jaar_level_name='Jaar', column=None)[source]

Add missing years.

Parameters:
  • plot_df – pd.DataFrame The DataFrame to plot

  • years_to_plot – list The years we want to plot

  • jaar_level_name – str The name of the index level that holds the years

  • column – str Name of the column, used in the error message

Returns:

pd.DataFrame
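What adding missing years could look like is sketched below. The sketch assumes a DataFrame with a single index level named 'Jaar' and fills the added years with NaN; the function name add_years is hypothetical, and the real add_missing_years may handle multi-level indices and the column argument differently.

```python
import pandas as pd


def add_years(plot_df: pd.DataFrame, years_to_plot: list,
              jaar_level_name: str = "Jaar") -> pd.DataFrame:
    """Hypothetical stand-in: make sure every requested year has a row."""
    # Union of the years already present and the years requested
    present = plot_df.index.get_level_values(jaar_level_name)
    all_years = sorted(set(present) | set(years_to_plot))
    # reindex adds the missing years as rows filled with NaN
    return plot_df.reindex(pd.Index(all_years, name=jaar_level_name))


# Example: 2021 is absent from the data but requested for the plot
df = pd.DataFrame({"score": [60.0, 75.0]},
                  index=pd.Index([2020, 2022], name="Jaar"))
filled = add_years(df, years_to_plot=[2020, 2021, 2022])
print(filled)
```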

internetnl_domain_analyse.domain_analyse_classes.calculate_histogram_per_breakdown(data: DataFrame, var_key: str, df_weights: Series, n_bins: int = 100) → dict[source]

Compute, for each breakdown of the data, the histogram belonging to var_key.

Parameters:
  • data (DataFrame) – The data with the breakdown on the index

  • var_key (str) – The name of the column for which the histogram is computed

  • df_weights (Series) – The weights used for the histogram

  • n_bins (int) – Number of bins in the histogram

Returns:

The histograms per breakdown

Return type:

dict
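A weighted histogram per breakdown can be sketched as follows. The sketch assumes a single index level holding the breakdown labels, weights stored in the same row order as the data, and a bin range shared by all breakdowns. The function name histogram_per_breakdown and the (counts, edges) tuples stored in the dict are assumptions, not the package's documented return layout.

```python
import numpy as np
import pandas as pd


def histogram_per_breakdown(data: pd.DataFrame, var_key: str,
                            df_weights: pd.Series, n_bins: int = 100) -> dict:
    """Hypothetical stand-in: one weighted histogram per breakdown label."""
    values = data[var_key].to_numpy(dtype=float)
    weights = df_weights.to_numpy(dtype=float)
    labels = data.index.to_numpy()
    # Shared bin range, so histograms of different breakdowns are comparable
    value_range = (values.min(), values.max())
    histograms = {}
    for label in pd.unique(labels):
        mask = labels == label
        # np.histogram sums the weights of the values falling in each bin
        counts, edges = np.histogram(values[mask], bins=n_bins,
                                     range=value_range, weights=weights[mask])
        histograms[label] = (counts, edges)
    return histograms


# Example: two breakdown labels on the index, one weight per record
data = pd.DataFrame({"score": [10.0, 90.0, 50.0]},
                    index=pd.Index(["gov", "gov", "edu"], name="sector"))
weights = pd.Series([1.0, 2.0, 3.0], index=data.index)
hist = histogram_per_breakdown(data, "score", weights, n_bins=2)
print(hist["gov"][0], hist["edu"][0])
```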

internetnl_domain_analyse.domain_analyse_classes.make_plot_cache_file_name(cache_directory, file_base, prefix)[source]

internetnl_domain_analyse.domain_plots module

class internetnl_domain_analyse.domain_plots.AxisLabel(label_properties, text_default=None, positie_default=None)[source]

Bases: object

Class to store the properties of an axis label.

set_properties()[source]
internetnl_domain_analyse.domain_plots.make_bar_plot(plot_df, plot_key, plot_variable, scan_data_key, module_name, question_name, image_directory, show_plots=False, add_logo=True, figsize=None, highcharts_height=None, image_type='pdf', reference_lines=None, xoff=0.02, yoff=0.02, show_title=False, barh=False, subplot_adjust=None, sort_values=False, y_max_bar_plot=None, y_spacing_bar_plot=None, translations=None, legend_position=None, legend_max_columns=None, box_margin=None, export_svg=False, export_highcharts=False, highcharts_directory=None, title=None, normalize_data=False, force_plot=False, enable_highcharts_legend=True, unit=None, english=False, bar_width=None)[source]
internetnl_domain_analyse.domain_plots.make_bar_plot_horizontal(plot_df, fig, axis, margin, plot_title, show_title, translations, reference_lines, line_iter, xoff, yoff, trans, y_spacing_bar_plot, y_max_bar_plot, legend_position, legend_max_columns, add_logo=True, unit=None, english=False, bar_width=None)[source]
internetnl_domain_analyse.domain_plots.make_bar_plot_stacked(year, plot_df, plot_key, plot_variable, scan_data_key, module_name, question_name, image_directory, show_plots=False, figsize=None, image_type='pdf', reference_lines=None, xoff=0.02, yoff=0.02, show_title=False, barh=False, subplot_adjust=None, sort_values=False, add_logo=True, y_max_bar_plot=None, y_spacing_bar_plot=None, translations=None, legend_position=None, box_margin=None, export_svg=False, export_highcharts=False, highcharts_directory=None, title=None, normalize_data=False, force_plot=False, enable_highcharts_legend=True, unit=None, english=False)[source]
internetnl_domain_analyse.domain_plots.make_bar_plot_vertical(plot_df, axis, plot_title, show_title, translations, reference_lines, line_iter, xoff, yoff, trans, add_logo=True, unit=None, english=False)[source]
internetnl_domain_analyse.domain_plots.make_cdf_plot(hist, grp_key, plot_key, scan_data_key, module_name=None, question_name=None, image_directory=None, show_plots=False, figsize=None, image_type=None, image_file_base=None, cummulative=False, reference_lines=None, xoff=None, yoff=None, y_max=None, y_spacing=None, translations=None, export_highcharts=None, export_svg=False, highcharts_info: dict | None = None, title: str | None = None, year: int | None = None, english=False)[source]
internetnl_domain_analyse.domain_plots.make_conditional_pdf_plot(categories, image_directory, show_plots=False, export_highcharts=False, highcharts_directory=None, cache_directory=None, english=False)[source]
internetnl_domain_analyse.domain_plots.make_conditional_score_plot(correlations, image_directory, show_plots=False, figsize=None, image_type='.pdf', export_svg=False, export_highcharts=False, highcharts_directory=None, title=None, cache_directory=None, english=False)[source]
internetnl_domain_analyse.domain_plots.make_heatmap(correlations, image_directory, show_plots=False, figsize=None, image_type='.pdf', export_svg=False, export_highcharts=False, highcharts_directory=None, title=None, cache_directory=None, english=False)[source]
internetnl_domain_analyse.domain_plots.make_verdeling_per_aantal_categorie(categories, image_directory, show_plots=False, export_highcharts=False, highcharts_directory=None, cache_directory=None, english=False)[source]
internetnl_domain_analyse.domain_plots.plot_score_per_count(scores, categories, highcharts_directory, im_file, show_plots, plot_title, x_label, y_label, english=False)[source]
internetnl_domain_analyse.domain_plots.plot_score_per_interval(scores, score_intervallen, index_labels, categories, highcharts_directory, im_file, show_plots, plot_title, x_label, y_label, english=False)[source]

internetnl_domain_analyse.domein_analyse module

internetnl_domain_analyse.domein_analyse.main()[source]
internetnl_domain_analyse.domein_analyse.parse_args()[source]

Parse command line parameters

Parameters:

args ([str]) – command line parameters as list of strings

Returns:

command line parameters namespace

Return type:

argparse.Namespace

internetnl_domain_analyse.domein_analyse.set_do_it_vlaggen(required_keys, chapter_info, recursive=False)[source]

For a section of your settings file, impose the do_it flags.

Parameters:
  • required_keys – list List of the items whose do_it flag you want to impose

  • chapter_info – the dictionary on which the flags are set.

  • recursive – bool If this is a recursive call, the values that are not in the list are not set to False

Returns: dict

The new dictionary.
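The flag logic described for set_do_it_vlaggen can be illustrated with a small sketch. It assumes that every entry of the settings section is itself a dict carrying a 'do_it' key, and it omits the descent into nested sections that a recursive call suggests. The function name set_do_it_flags and the example settings are hypothetical.

```python
import copy


def set_do_it_flags(required_keys: list, chapter_info: dict,
                    recursive: bool = False) -> dict:
    """Hypothetical stand-in for the behaviour described above."""
    # Work on a copy so the original settings stay untouched
    new_info = copy.deepcopy(chapter_info)
    for key, properties in new_info.items():
        if key in required_keys:
            properties["do_it"] = True
        elif not recursive:
            # Only a top-level call switches off the entries not listed
            properties["do_it"] = False
    return new_info


settings = {"bar_plot": {"do_it": False}, "cdf_plot": {"do_it": True}}
top_level = set_do_it_flags(["bar_plot"], settings)
nested = set_do_it_flags(["bar_plot"], settings, recursive=True)
print(top_level)
print(nested)
```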

internetnl_domain_analyse.latex_output module

class internetnl_domain_analyse.latex_output.ExampleEnvironment(*, options=None, arguments=None, start_arguments=None, **kwargs)[source]

Bases: Environment

A class representing a custom LaTeX environment.

This class represents a custom LaTeX environment named exampleEnvironment.

packages = OrderedSet([Package(Arguments('mdframed'), Options())])
class internetnl_domain_analyse.latex_output.SubFloat(arguments=None, options=None, *, extra_arguments=None)[source]

Bases: CommandBase

A class representing a custom LaTeX command.

This class represents a custom LaTeX command named exampleCommand.

packages = OrderedSet()
internetnl_domain_analyse.latex_output.make_latex_overview(image_info=None, variables=None, image_directory=None, cache_directory=None, image_files=None, tex_horizontal_shift='-2cm', tex_prepend_path=None, bovenschrift=False, module_info=None)[source]

Create a LaTeX output file containing all images.

Parameters:
  • module_info – class Information on the modules

  • cache_directory – obj:Path

  • image_info – object: ImageInfo

  • variables – dict with variable properties

  • image_directory – str

  • image_files – obj:Path

  • tex_prepend_path – str

  • tex_horizontal_shift – shift to the left

  • bovenschrift – boolean Add the caption above the figures

internetnl_domain_analyse.utils module

internetnl_domain_analyse.utils.add_derived_variables(tables, variables)[source]

Add the variables we defined in the settings files which do not exist yet, but are defined with an eval statement

Parameters:
  • tables – pd.DataFrame original table of variables

  • variables – pd.DataFrame properties of variables

Returns:

pd.DataFrame

internetnl_domain_analyse.utils.add_missing_groups(all_stats, group_by, group_by_original, missing_groups)[source]
internetnl_domain_analyse.utils.clean_all_suffix(dataframe, suffix_key, variables)[source]

Select the suffixes that we have defined.

Parameters:
  • dataframe – dataframe with tables, including a column with website extensions

  • suffix_key – the name of the column with website extensions

  • variables – dataframe with variable information. Must contain at least one variable, equal to the suffix_key, in which the categories are defined

Returns:

dataframe

internetnl_domain_analyse.utils.dump_data_frame_as_sqlite(dataframe, file_name)[source]

Dump the data as SQLite, making sure that duplicates are removed.
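A minimal sketch of such a dump, assuming that "duplicates" means fully duplicated rows and that the data goes into a single table. The function name dump_unique_rows and the table name "data" are hypothetical; the package function may deduplicate on the index or on columns instead.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

import pandas as pd


def dump_unique_rows(dataframe: pd.DataFrame, file_name,
                     table_name: str = "data") -> None:
    """Hypothetical stand-in: write a dataframe to SQLite without duplicate rows."""
    # Assumption: a duplicate is a row that is identical in every column
    unique_rows = dataframe.drop_duplicates()
    with closing(sqlite3.connect(file_name)) as connection:
        unique_rows.to_sql(table_name, connection,
                           if_exists="replace", index=False)
        connection.commit()


# Example: the second row repeats the first and is dropped before writing
df = pd.DataFrame({"website_url": ["a.nl", "a.nl", "b.nl"],
                   "score": [80, 80, 65]})
db_file = Path(tempfile.mkdtemp()) / "tables.sqlite"
dump_unique_rows(df, db_file)
with closing(sqlite3.connect(db_file)) as connection:
    stored = pd.read_sql("SELECT * FROM data", connection)
print(stored)
```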

internetnl_domain_analyse.utils.fill_booleans(tables, translations, variables)[source]
internetnl_domain_analyse.utils.get_all_clean_urls(urls, show_progress=False, cache_directory=None)[source]
internetnl_domain_analyse.utils.get_option_mask(question_df, variables, question_type, valid_options=None)[source]

Get the mask to filter the positive options from a question.

internetnl_domain_analyse.utils.get_windows_or_linux_value(value)[source]

Adjust the value if it is given as a dict with a windows and a linux field.
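The idea of a platform-dependent setting can be sketched as below. The sketch assumes that the dict uses the keys 'windows' and 'linux' and that every non-Windows system takes the 'linux' value; the function name pick_platform_value is hypothetical and the package function may apply different rules.

```python
import platform


def pick_platform_value(value):
    """Hypothetical stand-in: resolve a {'windows': ..., 'linux': ...} dict."""
    if isinstance(value, dict) and {"windows", "linux"} <= set(value):
        # Assumption: anything that is not Windows uses the linux entry
        key = "windows" if platform.system() == "Windows" else "linux"
        return value[key]
    # Any other value is passed through unchanged
    return value


cache_dir = pick_platform_value({"windows": "C:/cache", "linux": "/tmp/cache"})
plain = pick_platform_value("cache")
print(cache_dir, plain)
```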

internetnl_domain_analyse.utils.impose_variable_defaults(variables, module_info: dict | None = None, module_key: str | None = None)[source]

Impose default values on the variables data frame.

Parameters:
  • variables (pd.DataFrame) – Dataframe with the initial variables

  • module_info (pd.DataFrame) – Dataframe with information per module

  • module_key (str) – Key of the module in the dataframe

Returns:

Filled dataframe

Return type:

pd.DataFrame

internetnl_domain_analyse.utils.prepare_stat_data_for_write(all_stats, file_base, variables, module_key, variable_key, breakdown_labels=None, n_digits=3, connection=None)[source]
internetnl_domain_analyse.utils.read_tables_from_sqlite(filename: Path, table_names, index_name) → DataFrame[source]

Module contents