Commit 37ee417

markdown source builds
Auto-generated via {sandpaper}
Source : 84e75f6
Branch : main
Author : Julika Mimkes <[email protected]>
Time : 2024-03-19 12:33:19 +0000
Message : Merge pull request #169 from kaitlinnewson/formatting-fixes Copyedits and formatting fixes
1 parent 127ea6f commit 37ee417

13 files changed, +69 -73 lines changed

01-introduction.md

+17 -16
@@ -53,7 +53,7 @@ into a field that should contain a number. Understanding the nature of relationa
 databases, and using SQL, will help you in using databases in programming languages
 such as R or Python.
 
-Many web applications (including WordPress and ecommerce sites like Amazon) run on a SQL (relational) database. Understanding SQL is the first step in eventually building custom web applications that can serve data to users.
+Many web applications (including WordPress and e-commerce sites like Amazon) run on a SQL (relational) database. Understanding SQL is the first step in eventually building custom web applications that can serve data to users.
 
 ## Why are people working in library- and information-related roles well suited to SQL?
 
@@ -77,7 +77,7 @@ direct way of finding information.
 
 - You can use SQL to query your library database and explore new views that are not necessarily provided via library systems patron facing interfaces.
 
-- SQL can be used to keep an inventory of items, for instance, for a library's makerspace, or it can be used to track licenses for journals.
+- SQL can be used to keep an inventory of items, for instance, for a library's makerspace, or it can be used to track licences for journals.
 
 - For projects involving migrating and cleaning data from one system to another, SQL can be a handy tool.
 
@@ -108,34 +108,35 @@ Let's all open the database we downloaded via the setup in DB Browser for SQLite
 You can see the tables in the database by looking at the left hand side of the
 screen under Tables.
 
-To see the contents of a table, click on that table and then click on the Browse
-Data tab above the table data.
+To see the contents of a table, click on "Browse Data" then select the table in the "Table" dropdown in the upper left corner.
 
-If we want to write a query, we click on the Execute SQL tab.
+If we want to write a query, we click on the "Execute SQL" tab.
 
 There are two ways to add new data to a table without writing SQL:
 
 1. Enter data into a CSV file and append
 2. Click the "Browse Data" tab, then click the "New Record" button.
 
-The steps for adding data from a CSV file are:
+To add data from a CSV file:
 
-1. Choose "File" > "Import" > "Table" from CSV file...
-2. DB Browser for SQLite will prompt you if you want to add the data to the existing table.
+1. Choose "File" > "Import" > "Table from CSV file..."
+2. Select a CSV file to import
+3. Review the import settings and confirm that the column names and fields are correct
+4. Click "OK" to import the data. If the table name matches an existing table and the number of columns match, DB Browser will ask if you want to add the data to the existing table.
 
 ## Dataset Description
 
-The data we will be using consists of 5 csv files that contain tables of article titles, journals, languages, licenses, and publishers. The information in these tables are from a sample of 51 different journals published during 2015.
+The data we will use was created from 5 csv files that contain tables of article titles, journals, languages, licences, and publishers. The information in these tables are from a sample of 51 different journals published during 2015.
 
 **articles**
 
-- Contains individual article Titles and the associated citations and metadata
+- Contains individual article titles and the associated citations and metadata.
 - (16 fields, 1001 records)
-- Field names: `id`, `Title`, `Authors`, `DOI`, `URL`, `Subjects`, `ISSNs`, `Citation`, `LanguageID`, `LicenseID`, `Author_Count`, `First_Author`, `Citation_Count`, `Day`, `Month`, `Year`
+- Field names: `id`, `Title`, `Authors`, `DOI`, `URL`, `Subjects`, `ISSNs`, `Citation`, `LanguageID`, `LicenceID`, `Author_Count`, `First_Author`, `Citation_Count`, `Day`, `Month`, `Year`
 
 **journals**
 
-- Contains various journal Titles and associated metadata. The table also associates Journal Titles with ISSN numbers that are then referenced in the 'articles' table by the `ISSNs` field.
+- Contains various journal titles and associated metadata. The table also associates Journal Titles with ISSN numbers that are then referenced in the 'articles' table by the `ISSNs` field.
 - (5 fields, 51 records)
 - Field names: `id`, `ISSN-L`,`ISSNs`, `PublisherID`, `Journal_Title`
 
@@ -145,9 +146,9 @@ The data we will be using consists of 5 csv files that contain tables of article
 - (2 fields, 4 records)
 - Field names: `id`, `Language`
 
-**licenses**
+**licences**
 
-- ID table which associates License codes with id numbers. These id numbers are then referenced in the 'articles' table by the `LicenseID` field.
+- ID table which associates Licence codes with id numbers. These id numbers are then referenced in the 'articles' table by the `LicenceID` field.
 - (2 fields, 4 records)
 - Field names: `id`, `Licence`
 
@@ -163,14 +164,14 @@ The main data types that are used in doaj-article-sample database are `INTEGER`
 
 ## SQL Data Type Quick Reference
 
-Different database software/platforms have different names and sometimes different definitions of data types, so you'll need to understand the data types for any platform you are using. The following table explains some of the common data types and how they are represented in SQLite; [more details available on the SQLite website](https://www.sqlite.org/datatype3.html).
+Different database software/platforms have different names and sometimes different definitions of data types, so you'll need to understand the data types for any platform you are using. The following table explains some of the common data types and how they are represented in SQLite; [more details available on the SQLite website](https://www.sqlite.org/datatype3.html).
 
 | Data type | Details | Name in SQLite |
 | :--------------------- |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| :-------------------------------------------------------------------------------------------------------------------- |
 | boolean or binary | this variable type is often used to represent variables that can only have two values: yes or no, true or false. | doesn't exist - need to use integer data type and values of 0 or 1. |
 | integer | sometimes called whole numbers or counting numbers. Can be 1, 2, 3, etc., as well as 0 and negative whole numbers: -1, -2, -3, etc. | INTEGER |
 | float, real, or double | a decimal number or a floating point value. The largest possible size of the number may be specified. | REAL |
-| text or string | and combination of numbers, letters, symbols. Platforms may have different data types: one for variables with a set number of characters - e.g., a zip code or postal code, and one for variables with an open number of characters, e.g., an address or description variable. | TEXT |
+| text or string | any combination of numbers, letters, symbols. Platforms may have different data types: one for variables with a set number of characters - e.g., a zip code or postal code, and one for variables with an open number of characters, e.g., an address or description variable. | TEXT |
 | date or datetime | depending on the platform, may represent the date and time or the number of days since a specified date. This field often has a specified format, e.g., YYYY-MM-DD | doesn't exist - need to use built-in date and time functions and store dates in real, integer, or text formats. See [Section 2.2 of SQLite documentation](https://www.sqlite.org/datatype3.html#date_and_time_datatype) for more details. |
 | blob | a Binary Large OBject can store a large amount of data, documents, audio or video files. | BLOB |
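
As a rough illustration of the data type table in `01-introduction.md`, the sketch below shows one way SQLite can hold a boolean, a decimal number, and a date. The `loan_demo` table, its column names, and its rows are hypothetical and are not part of the doaj-article-sample database.

```sql
-- Hypothetical table for illustration only; not part of the lesson database.
CREATE TABLE loan_demo (
    id          INTEGER PRIMARY KEY,
    item_title  TEXT,
    is_overdue  INTEGER,   -- boolean stored as 0 (false) or 1 (true)
    fine_amount REAL,      -- decimal number
    due_date    TEXT       -- date stored as text in YYYY-MM-DD format
);

-- Placeholder rows, invented for this example
INSERT INTO loan_demo (item_title, is_overdue, fine_amount, due_date)
VALUES ('First placeholder item', 1, 2.50, '2024-03-19'),
       ('Second placeholder item', 0, 0.0, '2024-04-02');

-- typeof() reports how SQLite stored each value;
-- date() is one of SQLite's built-in date and time functions.
SELECT item_title,
       typeof(is_overdue)        AS overdue_type,
       typeof(fine_amount)       AS fine_type,
       typeof(due_date)          AS date_type,
       date(due_date, '+7 days') AS renewed_due_date
FROM loan_demo;
```

Run in the "Execute SQL" tab, `typeof()` should report `integer`, `real`, and `text` for the three columns. Storing the date as text works here because `date()` accepts the YYYY-MM-DD format.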

02-selecting-sorting-data.md

+3 -3
@@ -22,11 +22,11 @@ exercises: 5
 
 ## What is a query?
 
-A query is a question or request for data. For example, "How many journals does our library subscribe to?" When we query a database, we can ask the same question using a common language called Structured Query Language or SQL in what is called a statement. Some of the most useful queries - the ones we are introducing in this first section - are used to return results from a table that match specific criteria.
+A query is a question or request for data. For example, "How many journals does our library subscribe to?". When we query a database, we can ask the same question using Structured Query Language (SQL) in what is called a statement. Some of the most useful queries - the ones we are introducing in this first section - are used to return results from a table that match specific criteria.
 
 ## Writing my first query
 
-Let's start by opening DB Browser for SQLite and the doaj-article-sample database (see Setup). Choose `Browse Data` and the `articles` table. The articles table contains columns or fields such as `Title`, `Authors`, `DOI`, `URL`, etc.
+Let's start by opening DB Browser for SQLite and the doaj-article-sample database (see [Setup](/)). Click "Browse Data" and select the `articles` table in the "Table" dropdown menu. The articles table contains columns or fields such as `Title`, `Authors`, `DOI`, `URL`, etc.
 
 Let's write a SQL query that selects only the `Title` column from the `articles` table.
 
@@ -60,7 +60,7 @@ SELECT Title, Authors, ISSNs, Year, DOI
 FROM articles;
 ```
 
-Or we can select all of the columns in a table using the wildcard `*`.
+Or we can select all of the columns in a table using the wildcard `*`:
 
 ```sql
 SELECT *

03-filtering.md

+1 -1
@@ -20,7 +20,7 @@ exercises: 10
 
 ## Filtering
 
-SQL is a powerful tool for filtering data in databases based on a set of conditions. Let's say we only want data for a specific ISSN, for instance, for the *Acta Crystallographica* journal from the `articles` table. The journal has an ISSN code `2056-9890`. To filter by this ISSN code, we will use the `WHERE` clause.
+SQL is a powerful tool for filtering data in databases based on a set of conditions. Let's say we only want data for a specific ISSN, for instance, for the *Acta Crystallographica* journal from the `articles` table. The journal has an ISSN code `2056-9890`. To filter by this ISSN code, we will use the `WHERE` clause.
 
 ```sql
 SELECT *

04-ordering-commenting.md

+2 -2
@@ -69,7 +69,7 @@ WHERE (ISSNs IN ('2076-0787', '2077-1444', '2067-2764|2247-6202'));
 ```
 
 We started with something simple, then added more clauses one by one, testing
-their effects as we went along. For complex queries, this is a good strategy, to make sure you are getting what you want. Sometimes it might help to take a subset of the data that you can easily see in a temporary database to practice your queries on before working on a larger or more complicated database.
+their effects as we went along. For complex queries, this is a good strategy, to make sure you are getting what you want. Sometimes it might help to take a subset of the data that you can easily see in a temporary database to practice your queries on before working on a larger or more complicated database.
 
 When the queries become more complex, it can be useful to add comments to express to yourself, or to others, what you are doing with your query. Comments help explain the logic of a section and provide context for anyone reading the query. It's essentially a way of making notes within your SQL. In SQL, comments begin using <code class="language-plaintext highlighter-rouge">\--</code> and end at the end of the line. To mark a whole paragraph as a comment, you can enclose it with the characters /\* and \*/. For example, a commented version of the above query can be written as:
 
@@ -95,11 +95,11 @@ ON publishers.id = journals.PublisherId;
 ```
 
 To see the introduction and explanation of JOINS, please click to [Episode 6](06-joins-aliases.md).
-{: .sql}
 
 :::::::::::::::::::::::::::::::::::::::: keypoints
 
 - Queries often have the structure: SELECT data FROM table WHERE certain criteria are present.
+- Comments can make our queries easier to read and understand.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
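
The two comment styles described in `04-ordering-commenting.md` can be sketched as follows. To keep the example runnable on its own, it uses a small stand-in table, `articles_demo`, with invented placeholder rows; this table is an assumption for illustration and is not the lesson's `articles` table.

```sql
-- Hypothetical stand-in for the lesson's articles table
CREATE TABLE articles_demo (
    Title TEXT,
    ISSNs TEXT,
    Year  INTEGER
);

-- Placeholder rows, invented for this example
INSERT INTO articles_demo (Title, ISSNs, Year)
VALUES ('First placeholder title', '2076-0787', 2015),
       ('Second placeholder title', '2077-1444', 2015),
       ('Third placeholder title', '0000-0000', 2015);

/* A block comment can span several lines.
   This query keeps only the articles whose ISSN
   matches one of the two journals of interest. */
SELECT Title, ISSNs                          -- columns to return
FROM articles_demo                           -- table to read from
WHERE ISSNs IN ('2076-0787', '2077-1444');   -- filter on ISSN
```

SQLite ignores everything after `--` on a line and everything between `/*` and `*/`, so the comments do not change the result of the query.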

05-aggregating-calculating.md

+3 -2
@@ -1,5 +1,5 @@
 ---
-title: Aggregating & calculating values
+title: Aggregating and calculating values
 teaching: 15
 exercises: 5
 ---
@@ -108,7 +108,8 @@ In SQL, we can also perform calculations as we query the database. Also known as
 ```sql
 SELECT Title, ISSNs, Author_Count - 1 as CoAuthor_Count
 FROM articles
-ORDER BY Author_Count - 1 DESC;
+ORDER BY CoAuthor_Count DESC;
+
 ```
 
 In section [6\. Joins and aliases](06-joins-aliases.md) we are going to learn more about the SQL keyword `AS` and how to make use of aliases - in this example we simply used the calculation and `AS` to represent that the new column is different from the original SQL table data.

08-database-design.md

+5 -7
@@ -52,11 +52,11 @@ Database design involves a model or plan developed to determine how the data is
 
 ## Terminology
 
-<img src="fig/field-record-value.png" alt="Fields, Records, Values" width="500"/>
+![](fig/field-record-value.png){alt='Fields, Records, Values'}
 
 In the [Introduction to SQL](01-introduction.md) lesson, we introduced the terms "fields", "records", and "values". These terms are commonly used in databases while the "columns", "rows", and "cells" terms are more common in spreadsheets. Fields store a single kind of information (text, integers, etc.) related to one topic (title, author, year), while records are a set of fields containing specific values related to one item in your database (a book, a person, a library).
 
-To design a database, we must first decide what kinds of things we want to represent as tables. A table is the physical manifestation of a kind of "entity". An entity is the conceptual representation of the thing we want to store informtation about in the database, with each row containing information about one entity. An entity has "attributes" that describe it, represented as fields. For example, an article or a journal is an entity. Attributes would be things like the article title, or journal ISSN which would appear as fields.
+To design a database, we must first decide what kinds of things we want to represent as tables. A table is the physical manifestation of a kind of "entity". An entity is the conceptual representation of the thing we want to store information about in the database, with each row containing information about one entity. An entity has "attributes" that describe it, represented as fields. For example, an article or a journal is an entity. Attributes would be things like the article title, or journal ISSN which would appear as fields.
 
 To create relationships between tables later on, it is important to designate one column as a primary key. A primary key, often designated as PK, is one attribute of an entity that distinguishes it from the other entities (or records) in your table. The primary key must be unique for each row for this to work. A common way to create a primary key in a table is to make an 'id' field that contains an auto-generated integer that increases by 1 for each new record. This will ensure that your primary key is unique.
 
@@ -68,11 +68,11 @@ ERDs are helpful tools for visualising and structuring your data more efficientl
 
 ![](https://user-images.githubusercontent.com/30397506/115917162-6cc7ef00-a43b-11eb-97af-16fe50caa6a6.png){alt='Articles Database'}
 
-*Or you can view the [dbdiagram.io interactive version of the ERD](https://dbdiagram.io/d/5cc32b0cf7c5bb70c72fc530)*
+*Or you can view the [dbdiagram.io interactive version of the ERD](https://dbdiagram.io/d/5cc32b0cf7c5bb70c72fc530).*
 
 Relationships between entities and their attributes are represented by lines linking them together. For example, the line linking journals and publishers is interpreted as follows: The 'journals' entity is related to the 'publishers' entity through the attributes 'PublisherId' and 'id' respectively.
 
-Conceptually, we know that a journal has only one publisher but a publisher can publish many journals. This is known as a one-to-many relationship. In modeling relationships, we usually assign a unique identifier to the 'one' side of the relationship and use that same identifier to refer to that entity on the 'many' side. In 'publishers' table, the 'id' attribute is that unique identifier. We use that same identifier in the 'journals' table to refer to an individual publisher. That way, there is an unambiguous way for us to distinguish which journals are associated with which publisher in a way that keeps the integrity of the data (see the Normalization section below).
+Conceptually, we know that a journal has only one publisher but a publisher can publish many journals. This is known as a one-to-many relationship. In modeling relationships, we usually assign a unique identifier to the 'one' side of the relationship and use that same identifier to refer to that entity on the 'many' side. In 'publishers' table, the 'id' attribute is that unique identifier. We use that same identifier in the 'journals' table to refer to an individual publisher. That way, there is an unambiguous way for us to distinguish which journals are associated with which publisher in a way that keeps the integrity of the data (see [the Normalisation section](#normalisation) below).
 
 ## More Terminology
 
@@ -95,7 +95,7 @@ ERDs are helpful in normalising your data which is a process that can be used to
 
 In the example ERD above, creating a separate table for publishers and linking to it from the journals table via PK and FK identifiers allows us to normalise the data and avoid inconsistencies. If we used one table, we could introduce publisher name errors such as misspellings or alternate names as demonstrated below.
 
-![](fig/normalisation.png){alt='Introducting inconsistencies and normalising data'}
+![](fig/normalisation.png){alt='Introducing inconsistencies and normalising data'}
 
 There are a number of normal forms in the normalisation process that can help you reduce redundancy in database tables. [Study Tonight](https://www.studytonight.com/dbms/database-normalization.php) features tutorials where you can learn more about them.
 
@@ -129,5 +129,3 @@ Additional database design tutorials to consult from Lucidchart:
 - Database design is helpful for creating more efficient databases.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
-
-
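
The one-to-many relationship between publishers and journals described in `08-database-design.md` can be sketched with two small tables. This is a minimal sketch under stated assumptions: the `_demo` table names, the `Publisher` column name, and every row are invented for illustration, while `id`, `PublisherId`, and `Journal_Title` follow the field names used in the lesson.

```sql
-- Hypothetical demo tables modelled on the lesson's 'publishers' and 'journals' tables.
-- The Publisher column name and all rows are assumptions made for illustration.
CREATE TABLE publishers_demo (
    id        INTEGER PRIMARY KEY,   -- primary key: unique identifier on the 'one' side
    Publisher TEXT
);

CREATE TABLE journals_demo (
    id            INTEGER PRIMARY KEY,
    Journal_Title TEXT,
    PublisherId   INTEGER REFERENCES publishers_demo (id)   -- foreign key on the 'many' side
);

INSERT INTO publishers_demo (id, Publisher)
VALUES (1, 'Example Press'),
       (2, 'Sample University Library');

-- Two journals share publisher 1; one journal belongs to publisher 2
INSERT INTO journals_demo (Journal_Title, PublisherId)
VALUES ('Journal A', 1),
       ('Journal B', 1),
       ('Journal C', 2);

-- Count how many journals each publisher has
SELECT publishers_demo.Publisher, COUNT(journals_demo.id) AS Journal_Count
FROM publishers_demo
JOIN journals_demo
ON publishers_demo.id = journals_demo.PublisherId
GROUP BY publishers_demo.Publisher;
```

Because each publisher name is stored once in `publishers_demo`, a misspelling can only occur in one place. Note that SQLite enforces the `REFERENCES` constraint only when foreign key support is switched on with `PRAGMA foreign_keys = ON;`.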
