ph_ny_mat_sci/dz_les_2.ipynb
2022-07-12 13:29:12 +03:00

{
"cells": [
{
"cell_type": "markdown",
"id": "70f93a61",
"metadata": {},
"source": [
"## Topic: “Computing with NumPy”"
]
},
{
"cell_type": "markdown",
"id": "870fedf1",
"metadata": {},
"source": [
"###### Task 1\n",
"Import the NumPy library under the alias np.\n",
"Create a NumPy array named a of shape 5x2, i.e. with 5 rows and 2 columns. The first column must contain the numbers 1, 2, 3, 3, 1 and the second the numbers 6, 8, 11, 10, 7. Treat each column as a feature and each row as an observation. Then compute the mean of each feature using the mean method of the NumPy array. Store the result in the array mean_a, which must contain 2 elements.\n"
]
},
{
"cell_type": "code",
"execution_count": 249,
"id": "6b380c8a",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 250,
"id": "e456de05",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 6],\n",
" [ 2, 8],\n",
" [ 3, 11],\n",
" [ 3, 10],\n",
" [ 1, 7]])"
]
},
"execution_count": 250,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# each inner list is one feature; .T turns them into columns -> shape (5, 2)\n",
"a = np.array([[1, 2, 3, 3, 1], [6, 8, 11, 10, 7]]).T\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 251,
"id": "a03ab87d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2. , 8.4])"
]
},
"execution_count": 251,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_a = a.mean(axis=0)  # axis=0 averages over rows: one mean per column (feature)\n",
"mean_a"
]
},
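{
"cell_type": "markdown",
"id": "c0de0001",
"metadata": {},
"source": [
"A quick look at what axis means here. This sketch rebuilds a with the same data so that the cell runs on its own. axis=0 collapses the rows and leaves one mean per column (feature). axis=1 collapses the columns and leaves one mean per row (observation), which is not what Task 1 asks for."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0002",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# same data as in Task 1: 5 observations (rows) x 2 features (columns)\n",
"a = np.array([[1, 6], [2, 8], [3, 11], [3, 10], [1, 7]])\n",
"\n",
"per_feature = a.mean(axis=0)      # shape (2,): one mean per column\n",
"per_observation = a.mean(axis=1)  # shape (5,): one mean per row\n",
"print(per_feature.tolist())       # [2.0, 8.4]\n",
"print(per_observation.tolist())   # [3.5, 5.0, 7.0, 6.5, 4.0]"
]
},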
{
"cell_type": "markdown",
"id": "1055a5cc",
"metadata": {},
"source": [
"###### Task 2\n",
"Compute the array a_centered by subtracting from the values of array “a” the means of the corresponding features stored in mean_a. The computation must be done in a single operation. The resulting array must have shape 5x2."
]
},
{
"cell_type": "code",
"execution_count": 252,
"id": "f8e2828c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-1. , -2.4],\n",
" [ 0. , -0.4],\n",
" [ 1. , 2.6],\n",
" [ 1. , 1.6],\n",
" [-1. , -1.4]])"
]
},
"execution_count": 252,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_centered = a - mean_a  # broadcasting subtracts mean_a (shape (2,)) from every row of a (shape (5, 2))\n",
"a_centered"
]
},
{
"cell_type": "code",
"execution_count": 253,
"id": "433c0d3c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(5, 2)"
]
},
"execution_count": 253,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_centered.shape"
]
},
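{
"cell_type": "markdown",
"id": "c0de0003",
"metadata": {},
"source": [
"Subtracting mean_a from a directly is possible in a single operation because of NumPy broadcasting. Shapes are compared from the trailing dimension, so (5, 2) and (2,) are compatible, and mean_a is reused for every row. The sketch below rebuilds the arrays with the same data so that it runs on its own. It compares the result against an explicitly stretched copy made with np.broadcast_to. It also checks that each centered column sums to (numerically) zero, as centering should guarantee."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0004",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([[1, 6], [2, 8], [3, 11], [3, 10], [1, 7]])\n",
"mean_a = a.mean(axis=0)\n",
"\n",
"# the common shape NumPy broadcasts (5, 2) and (2,) to\n",
"print(np.broadcast(a, mean_a).shape)  # (5, 2)\n",
"\n",
"# explicit version: view mean_a repeated as a (5, 2) array, then subtract\n",
"stretched = np.broadcast_to(mean_a, a.shape)\n",
"a_centered = a - mean_a\n",
"print(np.array_equal(a_centered, a - stretched))  # True\n",
"\n",
"# centering removes the mean, so every column sums to ~0 (up to float error)\n",
"print(np.allclose(a_centered.sum(axis=0), 0))  # True"
]
},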
{
"cell_type": "markdown",
"id": "1b41f215",
"metadata": {},
"source": [
"###### Task 3\n",
"Compute the dot product of the columns of a_centered; the result is the scalar a_centered_sp. Then divide a_centered_sp by N-1, where N is the number of observations."
]
},
{
"cell_type": "code",
"execution_count": 254,
"id": "5b9124f0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8.0"
]
},
"execution_count": 254,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_centered_sp = a_centered[:,0] @ a_centered[:,1]\n",
"a_centered_sp"
]
},
{
"cell_type": "code",
"execution_count": 255,
"id": "09840068",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.0"
]
},
"execution_count": 255,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_centered_sp / (a.shape[0] - 1)"
]
},
{
"cell_type": "markdown",
"id": "eb344eb7",
"metadata": {},
"source": [
"###### Task 4**\n",
"The number obtained at the end of Task 3 is the covariance of the two features in array “a”. In Task 3 we divided the sum of products of the centered features by N-1 rather than by N, so the value obtained is an unbiased estimate of the covariance."
]
},
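{
"cell_type": "markdown",
"id": "c0de0005",
"metadata": {},
"source": [
"For reference, here is the same quantity written as a formula. Let $x$ and $y$ be the first and second columns of “a”, with means $\\bar x = 2$ and $\\bar y = 8.4$ (the entries of mean_a):\n",
"\n",
"$$\\operatorname{cov}(x, y) = \\frac{1}{N-1}\\sum_{i=1}^{N}(x_i - \\bar x)(y_i - \\bar y)$$\n",
"\n",
"The rows of a_centered give the products $(-1)(-2.4) = 2.4$, $0 \\cdot (-0.4) = 0$, $1 \\cdot 2.6 = 2.6$, $1 \\cdot 1.6 = 1.6$ and $(-1)(-1.4) = 1.4$. With $N = 5$:\n",
"\n",
"$$\\operatorname{cov}(x, y) = \\frac{2.4 + 0 + 2.6 + 1.6 + 1.4}{5 - 1} = \\frac{8}{4} = 2$$"
]
},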
{
"cell_type": "markdown",
"id": "7c02908a",
"metadata": {},
"source": [
"In this task, check this number by computing the covariance another way, with the function np.cov. Pass the transposed array “a” to np.cov as its argument m. In the resulting covariance matrix (a 2x2 NumPy array), the covariance we need is the element in the row with index 0 and the column with index 1."
]
},
{
"cell_type": "code",
"execution_count": 256,
"id": "d478c468",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.0"
]
},
"execution_count": 256,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.cov(a.T)[0, 1]  # np.cov treats each row of m as one variable, hence a.T"
]
},
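{
"cell_type": "markdown",
"id": "c0de0006",
"metadata": {},
"source": [
"np.cov also has parameters that make the transpose and the N-1 divisor explicit. This minimal sketch rebuilds a with the same data so that it runs on its own. rowvar=False tells np.cov that the variables are the columns, so no .T is needed. bias=True divides by N instead of N-1, which gives the biased estimate 8 / 5 = 1.6 instead of 2.0."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0007",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([[1, 6], [2, 8], [3, 11], [3, 10], [1, 7]])\n",
"\n",
"cov_t = np.cov(a.T)[0, 1]                  # rows of a.T are the variables\n",
"cov_cols = np.cov(a, rowvar=False)[0, 1]   # columns of a are the variables\n",
"cov_biased = np.cov(a, rowvar=False, bias=True)[0, 1]  # divides by N, not N-1\n",
"\n",
"print(np.isclose(cov_t, 2.0), np.isclose(cov_cols, 2.0))  # True True\n",
"print(np.isclose(cov_biased, 1.6))                        # True"
]
},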
{
"cell_type": "markdown",
"id": "b0b213c8",
"metadata": {},
"source": [
"## Topic: “Working with data in Pandas”"
]
},
{
"cell_type": "markdown",
"id": "101bae0d",
"metadata": {},
"source": [
"###### Task 1\n",
"Import the Pandas library under the alias pd. Create a dataframe authors with the columns author_id and author_name, containing the data [1, 2, 3] and ['Тургенев', 'Чехов', 'Островский'] respectively.\n",
"Then create a dataframe book with the columns author_id, book_title and price, containing respectively the data: \n",
"[1, 1, 1, 2, 2, 3, 3],\n",
"['Отцы и дети', 'Рудин', 'Дворянское гнездо', 'Толстый и тонкий', 'Дама с собачкой', 'Гроза', 'Таланты и поклонники'],\n",
"[450, 300, 350, 500, 450, 370, 290]."
]
},
{
"cell_type": "code",
"execution_count": 257,
"id": "ebe7d756",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 258,
"id": "36c112a2",
"metadata": {},
"outputs": [],
"source": [
"authors = pd.DataFrame({\n",
" \"author_id\": [1, 2, 3],\n",
" \"author_name\": ['Тургенев', 'Чехов', 'Островский']\n",
"})\n",
"book = pd.DataFrame({\n",
" \"author_id\": [1, 1, 1, 2, 2, 3, 3],\n",
" \"book_title\": ['Отцы и дети', 'Рудин', 'Дворянское гнездо', 'Толстый и тонкий', 'Дама с собачкой', 'Гроза', 'Таланты и поклонники'],\n",
" \"price\": [450, 300, 350, 500, 450, 370, 290]\n",
"})"
]
},
{
"cell_type": "markdown",
"id": "23d8fcee",
"metadata": {},
"source": [
"###### Task 2\n",
"Create the dataframe authors_price by joining the dataframes authors and book on the author_id column."
]
},
{
"cell_type": "code",
"execution_count": 259,
"id": "1bf02209",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>author_id</th>\n",
" <th>author_name</th>\n",
" <th>book_title</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Отцы и дети</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Рудин</td>\n",
" <td>300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Дворянское гнездо</td>\n",
" <td>350</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>Чехов</td>\n",
" <td>Толстый и тонкий</td>\n",
" <td>500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>Чехов</td>\n",
" <td>Дама с собачкой</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>Островский</td>\n",
" <td>Гроза</td>\n",
" <td>370</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>3</td>\n",
" <td>Островский</td>\n",
" <td>Таланты и поклонники</td>\n",
" <td>290</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" author_id author_name book_title price\n",
"0 1 Тургенев Отцы и дети 450\n",
"1 1 Тургенев Рудин 300\n",
"2 1 Тургенев Дворянское гнездо 350\n",
"3 2 Чехов Толстый и тонкий 500\n",
"4 2 Чехов Дама с собачкой 450\n",
"5 3 Островский Гроза 370\n",
"6 3 Островский Таланты и поклонники 290"
]
},
"execution_count": 259,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"authors_price = pd.merge(authors, book, on='author_id', how='inner')\n",
"authors_price"
]
},
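{
"cell_type": "markdown",
"id": "c0de0008",
"metadata": {},
"source": [
"Each author_id occurs once in authors and possibly several times in book, so this join is one-to-many. pd.merge can assert that through its validate parameter and raises pandas.errors.MergeError if the key is duplicated on the left side. This sketch rebuilds the two dataframes with the same data. The duplicated frame dup is a hypothetical bad input, included for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0009",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"authors = pd.DataFrame({'author_id': [1, 2, 3],\n",
"                        'author_name': ['Тургенев', 'Чехов', 'Островский']})\n",
"book = pd.DataFrame({'author_id': [1, 1, 1, 2, 2, 3, 3],\n",
"                     'book_title': ['Отцы и дети', 'Рудин', 'Дворянское гнездо', 'Толстый и тонкий',\n",
"                                    'Дама с собачкой', 'Гроза', 'Таланты и поклонники'],\n",
"                     'price': [450, 300, 350, 500, 450, 370, 290]})\n",
"\n",
"# passes: author_id is unique in authors (left) and repeated in book (right)\n",
"checked = pd.merge(authors, book, on='author_id', how='inner', validate='one_to_many')\n",
"print(checked.shape)  # (7, 4)\n",
"\n",
"# hypothetical bad input: the first author appears twice on the left\n",
"dup = pd.concat([authors, authors.iloc[[0]]], ignore_index=True)\n",
"try:\n",
"    pd.merge(dup, book, on='author_id', validate='one_to_many')\n",
"    dup_rejected = False\n",
"except pd.errors.MergeError as err:\n",
"    dup_rejected = True\n",
"    print('MergeError:', err)"
]
},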
{
"cell_type": "markdown",
"id": "c2eaa126",
"metadata": {},
"source": [
"###### Task 3\n",
"Create a dataframe top5 containing the rows of authors_price for the five most expensive books."
]
},
{
"cell_type": "code",
"execution_count": 260,
"id": "52d7df3e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>author_id</th>\n",
" <th>author_name</th>\n",
" <th>book_title</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Дворянское гнездо</td>\n",
" <td>350</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>Островский</td>\n",
" <td>Гроза</td>\n",
" <td>370</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Отцы и дети</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>Чехов</td>\n",
" <td>Дама с собачкой</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>Чехов</td>\n",
" <td>Толстый и тонкий</td>\n",
" <td>500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" author_id author_name book_title price\n",
"2 1 Тургенев Дворянское гнездо 350\n",
"5 3 Островский Гроза 370\n",
"0 1 Тургенев Отцы и дети 450\n",
"4 2 Чехов Дама с собачкой 450\n",
"3 2 Чехов Толстый и тонкий 500"
]
},
"execution_count": 260,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"top5 = authors_price.sort_values('price').tail(5)  # ascending sort: the last five rows are the most expensive\n",
"top5"
]
},
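{
"cell_type": "markdown",
"id": "c0de0010",
"metadata": {},
"source": [
"DataFrame.nlargest selects the same five rows but returns them with the most expensive book first. This sketch works on a hypothetical frame named prices that holds only the two columns it needs, rebuilt from the Task 1 data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0011",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical slice of authors_price: only book_title and price\n",
"prices = pd.DataFrame({\n",
"    'book_title': ['Отцы и дети', 'Рудин', 'Дворянское гнездо', 'Толстый и тонкий',\n",
"                   'Дама с собачкой', 'Гроза', 'Таланты и поклонники'],\n",
"    'price': [450, 300, 350, 500, 450, 370, 290],\n",
"})\n",
"\n",
"top5_desc = prices.nlargest(5, 'price')  # five largest prices, most expensive first\n",
"print(top5_desc['price'].tolist())        # [500, 450, 450, 370, 350]"
]
},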
{
"cell_type": "markdown",
"id": "d3beda82",
"metadata": {},
"source": [
"###### Task 4\n",
"Create a dataframe authors_stat from the data in authors_price. authors_stat must have four columns:\n",
"author_name, min_price, max_price and mean_price,\n",
"containing, respectively, the author's name and the minimum, maximum and mean price of that author's books."
]
},
{
"cell_type": "code",
"execution_count": 261,
"id": "ec5b3bc8",
"metadata": {},
"outputs": [],
"source": [
"grouped = authors_price.groupby('author_name')  # GroupBy object, not a dataframe yet\n"
]
},
{
"cell_type": "code",
"execution_count": 262,
"id": "e3688ea4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>author_name</th>\n",
" <th>min_price</th>\n",
" <th>max_price</th>\n",
" <th>mean_price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Островский</td>\n",
" <td>290</td>\n",
" <td>370</td>\n",
" <td>330.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Тургенев</td>\n",
" <td>300</td>\n",
" <td>450</td>\n",
" <td>366.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Чехов</td>\n",
" <td>450</td>\n",
" <td>500</td>\n",
" <td>475.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"   author_name  min_price  max_price  mean_price\n",
"0   Островский        290        370  330.000000\n",
"1     Тургенев        300        450  366.666667\n",
"2        Чехов        450        500  475.000000"
]
},
"execution_count": 262,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# named aggregation: each output column = one aggregation applied to price\n",
"authors_stat = (grouped['price']\n",
"                .agg(min_price='min', max_price='max', mean_price='mean')\n",
"                .reset_index())  # author_name moves from the index to a regular column\n",
"authors_stat"
]
},
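{
"cell_type": "markdown",
"id": "c0de0012",
"metadata": {},
"source": [
"Another way to build the same four columns: pass the aggregation names as a list and derive the column names with add_suffix. This sketch works on a hypothetical frame named df that holds only author_name and price, rebuilt from the Task 1 data. It also checks the sanity condition min ≤ mean ≤ max for every author."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0013",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical subset of authors_price: author_name and price only\n",
"df = pd.DataFrame({\n",
"    'author_name': ['Тургенев', 'Тургенев', 'Тургенев', 'Чехов', 'Чехов', 'Островский', 'Островский'],\n",
"    'price': [450, 300, 350, 500, 450, 370, 290],\n",
"})\n",
"\n",
"stats = (df.groupby('author_name')['price']\n",
"           .agg(['min', 'max', 'mean'])  # columns: min, max, mean\n",
"           .add_suffix('_price')         # -> min_price, max_price, mean_price\n",
"           .reset_index())               # author_name becomes a regular column\n",
"\n",
"print(stats.columns.tolist())  # ['author_name', 'min_price', 'max_price', 'mean_price']\n",
"ordered = (stats['min_price'] <= stats['mean_price']) & (stats['mean_price'] <= stats['max_price'])\n",
"print(ordered.all())  # True"
]
},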
{
"cell_type": "markdown",
"id": "f9b0606e",
"metadata": {},
"source": [
"###### Task 5**\n",
"Create a new column named cover in the dataframe authors_price; it will hold the cover type of each book, hard or soft. Fill the column with the data from the following list:\n",
"['твердая', 'мягкая', 'мягкая', 'твердая', 'твердая', 'мягкая', 'мягкая'].\n",
"Look up the documentation of the function pd.pivot_table using the question mark.\n",
"\n",
"For each author, compute the total price of their hardcover and softcover books. Use the function pd.pivot_table for this. The columns must be named \"твердая\" (hardcover) and \"мягкая\" (softcover), and the index must be the authors' surnames. Fill missing prices with zeros; import NumPy if needed.\n",
"Name the resulting dataframe book_info and save it in pickle format as \"book_info.pkl\". Then load the dataframe from this file and name it book_info2. Make sure that the dataframes book_info and book_info2 are identical."
]
},
{
"cell_type": "code",
"execution_count": 263,
"id": "dfb4e399",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>author_id</th>\n",
" <th>author_name</th>\n",
" <th>book_title</th>\n",
" <th>price</th>\n",
" <th>cover</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Отцы и дети</td>\n",
" <td>450</td>\n",
" <td>твердая</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Рудин</td>\n",
" <td>300</td>\n",
" <td>мягкая</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>Тургенев</td>\n",
" <td>Дворянское гнездо</td>\n",
" <td>350</td>\n",
" <td>мягкая</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>Чехов</td>\n",
" <td>Толстый и тонкий</td>\n",
" <td>500</td>\n",
" <td>твердая</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>Чехов</td>\n",
" <td>Дама с собачкой</td>\n",
" <td>450</td>\n",
" <td>твердая</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>Островский</td>\n",
" <td>Гроза</td>\n",
" <td>370</td>\n",
" <td>мягкая</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>3</td>\n",
" <td>Островский</td>\n",
" <td>Таланты и поклонники</td>\n",
" <td>290</td>\n",
" <td>мягкая</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" author_id author_name book_title price cover\n",
"0 1 Тургенев Отцы и дети 450 твердая\n",
"1 1 Тургенев Рудин 300 мягкая\n",
"2 1 Тургенев Дворянское гнездо 350 мягкая\n",
"3 2 Чехов Толстый и тонкий 500 твердая\n",
"4 2 Чехов Дама с собачкой 450 твердая\n",
"5 3 Островский Гроза 370 мягкая\n",
"6 3 Островский Таланты и поклонники 290 мягкая"
]
},
"execution_count": 263,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"authors_price[\"cover\"] = ['твердая', 'мягкая', 'мягкая', 'твердая', 'твердая', 'мягкая', 'мягкая']\n",
"authors_price"
]
},
{
"cell_type": "code",
"execution_count": 264,
"id": "f631cfe7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>cover</th>\n",
" <th>мягкая</th>\n",
" <th>твердая</th>\n",
" </tr>\n",
" <tr>\n",
" <th>author_name</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Островский</th>\n",
" <td>660</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Тургенев</th>\n",
" <td>650</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Чехов</th>\n",
" <td>0</td>\n",
" <td>950</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"cover        мягкая  твердая\n",
"author_name                 \n",
"Островский      660        0\n",
"Тургенев        650      450\n",
"Чехов             0      950"
]
},
"execution_count": 264,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# authors as rows, cover types as columns; missing (author, cover) pairs are filled with 0\n",
"book_info = pd.pivot_table(authors_price, values='price', index='author_name',\n",
"                           columns='cover', aggfunc='sum', fill_value=0)\n",
"book_info"
]
},
{
"cell_type": "code",
"execution_count": 271,
"id": "d265050d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>cover</th>\n",
" <th>мягкая</th>\n",
" <th>твердая</th>\n",
" </tr>\n",
" <tr>\n",
" <th>author_name</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Островский</th>\n",
" <td>660</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Тургенев</th>\n",
" <td>650</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Чехов</th>\n",
" <td>0</td>\n",
" <td>950</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"cover        мягкая  твердая\n",
"author_name                 \n",
"Островский      660        0\n",
"Тургенев        650      450\n",
"Чехов             0      950"
]
},
"execution_count": 271,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"book_info.to_pickle(\"book_info.pkl\")\n",
"book_info2 = pd.read_pickle(\"book_info.pkl\")\n",
"book_info2"
]
},
{
"cell_type": "markdown",
"id": "3228ed7f",
"metadata": {},
"source": [
"Check that the two dataframes are identical (same shape, values and column dtypes):"
]
},
{
"cell_type": "code",
"execution_count": 272,
"id": "f2e5762e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 272,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"book_info.equals(book_info2)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}