Commit 41133ed

feat: add -f codepage flag for input/output encoding
Implements the -f flag for specifying input/output file encoding:
- Format: codepage | i:codepage[,o:codepage] | o:codepage[,i:codepage]
- Use 65001 for UTF-8
- --list-codepages shows all supported encodings

Windows uses native MultiByteToWideChar/WideCharToMultiByte APIs for codepages not in the golang.org/x/text registry.
1 parent ca107b8 commit 41133ed

11 files changed: +1324 -5 lines changed
README.md

Lines changed: 74 additions & 0 deletions
@@ -151,6 +151,7 @@ The following switches have different behavior in this version of `sqlcmd` compa
 - To provide the value of the host name in the server certificate when using strict encryption, pass the host name with `-F`. Example: `-Ns -F myhost.domain.com`
 - More information about client/server encryption negotiation can be found at <https://docs.microsoft.com/openspecs/windows_protocols/ms-tds/60f56408-0188-4cd5-8b90-25c6f2423868>
 - `-u` The generated Unicode output file will have the UTF16 Little-Endian Byte-order mark (BOM) written to it.
+- `-f` Specifies the code page for input and output files. See [Code Page Support](#code-page-support) below for details and examples.
 - Some behaviors that were kept to maintain compatibility with `OSQL` may be changed, such as alignment of column headers for some data types.
 - All commands must fit on one line, even `EXIT`. Interactive mode will not check for open parentheses or quotes for commands and prompt for successive lines. The ODBC sqlcmd allows the query run by `EXIT(query)` to span multiple lines.
 - `-i` doesn't handle a comma `,` in a file name correctly unless the file name argument is triple quoted. For example:
@@ -255,6 +256,79 @@ To see a list of available styles along with colored syntax samples, use this co
 :list color
 ```

+### Code Page Support
+
+The `-f` flag specifies the code page for reading input files and writing output. This is useful when working with SQL scripts saved in legacy encodings or when output needs to be in a specific encoding.
+
+#### Format
+
+```
+-f codepage # Set both input and output to the same codepage
+-f i:codepage # Set input codepage only
+-f o:codepage # Set output codepage only
+-f i:codepage,o:codepage # Set input and output to different codepages
+-f o:codepage,i:codepage # Same as above (order doesn't matter)
+```
+
+#### Common Code Pages
+
+| Code Page | Name | Description |
+|-----------|------|-------------|
+| 65001 | UTF-8 | Unicode (UTF-8) - default for most modern systems |
+| 1200 | UTF-16LE | Unicode (UTF-16 Little-Endian) |
+| 1201 | UTF-16BE | Unicode (UTF-16 Big-Endian) |
+| 1252 | Windows-1252 | Western European (Windows) |
+| 932 | Shift_JIS | Japanese |
+| 936 | GBK | Chinese Simplified |
+| 949 | EUC-KR | Korean |
+| 950 | Big5 | Chinese Traditional |
+| 437 | CP437 | OEM United States (DOS) |
+
+#### Examples
+
+**Run a script saved in Windows-1252 encoding:**
+```bash
+sqlcmd -S myserver -i legacy_script.sql -f 1252
+```
+
+**Read UTF-16 input file and write UTF-8 output:**
+```bash
+sqlcmd -S myserver -i unicode_script.sql -o results.txt -f i:1200,o:65001
+```
+
+**Process a Japanese Shift-JIS encoded script:**
+```bash
+sqlcmd -S myserver -i japanese_data.sql -f 932
+```
+
+**Write output in Windows-1252 for legacy applications:**
+```bash
+sqlcmd -S myserver -Q "SELECT * FROM Products" -o report.txt -f o:1252
+```
+
+**List all supported code pages:**
+```bash
+sqlcmd --list-codepages
+```
+
+#### Notes
+
+- When no `-f` flag is specified, sqlcmd auto-detects UTF-8/UTF-16LE/UTF-16BE BOM (Byte Order Mark) in input files and switches to the appropriate decoder. If no BOM is present, UTF-8 is assumed.
+- UTF-8 input files with BOM are handled automatically.
+- On Windows, additional codepages installed on the system are available via the Windows API, even if not shown by `--list-codepages`.
+- Use `--list-codepages` to see the built-in code pages with their names and descriptions.
+
+#### Differences from ODBC sqlcmd
+
+| Aspect | ODBC sqlcmd | go-sqlcmd |
+|--------|-------------|-----------|
+| **Default encoding (no BOM, no `-f`)** | Windows ANSI code page (locale-dependent, e.g., 1252) | UTF-8 |
+| **UTF-16 codepages (1200, 1201)** | Rejected by `IsValidCodePage()` API | Accepted |
+| **BOM detection** | Yes (UTF-8, UTF-16 LE/BE) | Yes (identical behavior) |
+| **`--list-codepages`** | Not available | Available |
+
+**Migration note**: If you have UTF-8 encoded SQL scripts without a BOM that worked with ODBC sqlcmd on Windows, they should work identically or better with go-sqlcmd since go-sqlcmd defaults to UTF-8. However, if you have scripts in Windows ANSI encoding (e.g., Windows-1252) without a BOM, you may need to explicitly specify `-f 1252` with go-sqlcmd.
+
 ### Packages

 #### sqlcmd executable
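
The Notes in the README hunk above say that, without `-f`, input is checked for a UTF-8/UTF-16LE/UTF-16BE BOM and falls back to UTF-8 when none is found. A minimal standard-library sketch of that rule follows. `decodeInput` and `decodeUTF16` are hypothetical names chosen for illustration, not functions from this commit, and legacy code pages such as 1252 are left out because they need a decoder outside the standard library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeInput is a hypothetical helper (not the go-sqlcmd implementation).
// It strips a recognized BOM and returns the content as a Go string (UTF-8),
// along with the name of the encoding it selected.
func decodeInput(data []byte) (text string, encoding string) {
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		return string(data[3:]), "UTF-8"
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		return decodeUTF16(data[2:], binary.LittleEndian), "UTF-16LE"
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		return decodeUTF16(data[2:], binary.BigEndian), "UTF-16BE"
	default:
		// No BOM: assume UTF-8, as the README notes describe.
		return string(data), "UTF-8"
	}
}

// decodeUTF16 converts UTF-16 bytes in the given byte order to a string.
// A trailing odd byte, if any, is ignored in this sketch.
func decodeUTF16(b []byte, order binary.ByteOrder) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}
	return string(utf16.Decode(units))
}

func main() {
	// "SQL" encoded as UTF-16LE with a BOM.
	le := []byte{0xFF, 0xFE, 'S', 0, 'Q', 0, 'L', 0}
	text, enc := decodeInput(le)
	fmt.Println(enc, text)
}
```

An explicit `-f i:...` setting would presumably bypass or override this detection; the diff shown here does not include that part, so the sketch covers only the no-flag path.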

cmd/sqlcmd/sqlcmd.go

Lines changed: 29 additions & 0 deletions
@@ -82,6 +82,11 @@ type SQLCmdArguments struct {
 	ChangePassword string
 	ChangePasswordAndExit string
 	TraceFile string
+	CodePage string
+	// codePageSettings stores the parsed CodePageSettings after validation.
+	// This avoids parsing CodePage twice (in Validate and run).
+	codePageSettings *sqlcmd.CodePageSettings
+	ListCodePages bool
 	// Keep Help at the end of the list
 	Help bool
 }
@@ -171,6 +176,12 @@ func (a *SQLCmdArguments) Validate(c *cobra.Command) (err error) {
 			err = rangeParameterError("-t", fmt.Sprint(a.QueryTimeout), 0, 65534, true)
 		case a.ServerCertificate != "" && !encryptConnectionAllowsTLS(a.EncryptConnection):
 			err = localizer.Errorf("The -J parameter requires encryption to be enabled (-N true, -N mandatory, or -N strict).")
+		case a.CodePage != "":
+			if codePageSettings, parseErr := sqlcmd.ParseCodePage(a.CodePage); parseErr != nil {
+				err = localizer.Errorf(`'-f %s': %v`, a.CodePage, parseErr)
+			} else {
+				a.codePageSettings = codePageSettings
+			}
 		}
 	}
 	if err != nil {
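
The hunk above delegates parsing to `sqlcmd.ParseCodePage`, whose body is not part of this excerpt. As a rough illustration of the documented format (`codepage | i:codepage[,o:codepage] | o:codepage[,i:codepage]`), here is a minimal sketch. The type `codePageSettings`, its field names, and the function `parseCodePage` are assumptions made for this example, and the check against a registry of supported code pages is omitted, so a value like `99999` is accepted here even though the real flag rejects it. The error wording loosely mirrors the messages expected by the tests in this commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// codePageSettings is a hypothetical stand-in for sqlcmd.CodePageSettings.
// Field names are assumptions; zero means "not specified".
type codePageSettings struct {
	Input  int
	Output int
}

// parseCodePage is an illustrative parser for the -f argument format.
// It does not verify that the code page is actually supported.
func parseCodePage(arg string) (*codePageSettings, error) {
	s := &codePageSettings{}

	// Bare number: applies to both input and output.
	if !strings.Contains(arg, ":") {
		cp, err := strconv.Atoi(arg)
		if err != nil || cp <= 0 {
			return nil, fmt.Errorf("invalid codepage: %s", arg)
		}
		s.Input, s.Output = cp, cp
		return s, nil
	}

	// Prefixed form: i:<n> and/or o:<n>, comma-separated, in either order.
	for _, part := range strings.Split(arg, ",") {
		prefix, value, ok := strings.Cut(part, ":")
		if !ok {
			return nil, fmt.Errorf("invalid codepage: %s", part)
		}
		var target *int
		var kind string
		switch prefix {
		case "i":
			target, kind = &s.Input, "input"
		case "o":
			target, kind = &s.Output, "output"
		default:
			return nil, fmt.Errorf("invalid codepage: %s", part)
		}
		cp, err := strconv.Atoi(value)
		if err != nil || cp <= 0 {
			return nil, fmt.Errorf("invalid %s codepage: %s", kind, part)
		}
		*target = cp
	}
	return s, nil
}

func main() {
	for _, arg := range []string{"1252", "i:1200,o:65001", "o:65001,i:1200", "x:1252"} {
		s, err := parseCodePage(arg)
		if err != nil {
			fmt.Printf("%-16s error: %v\n", arg, err)
			continue
		}
		fmt.Printf("%-16s input=%d output=%d\n", arg, s.Input, s.Output)
	}
}
```

Parsing once in `Validate` and caching the result, as the commit does with the `codePageSettings` field, keeps `run` free of error handling for a value that has already been checked.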
@@ -239,6 +250,17 @@ func Execute(version string) {
 				listLocalServers()
 				os.Exit(0)
 			}
+			// List supported codepages
+			if args.ListCodePages {
+				fmt.Println(localizer.Sprintf("Supported Code Pages:"))
+				fmt.Println()
+				fmt.Printf("%-8s %-20s %s\n", "Code", "Name", "Description")
+				fmt.Printf("%-8s %-20s %s\n", "----", "----", "-----------")
+				for _, cp := range sqlcmd.SupportedCodePages() {
+					fmt.Printf("%-8d %-20s %s\n", cp.CodePage, cp.Name, cp.Description)
+				}
+				os.Exit(0)
+			}
 			if len(argss) > 0 {
 				fmt.Printf("%s'%s': Unknown command. Enter '--help' for command help.", sqlcmdErrorPrefix, argss[0])
 				os.Exit(1)
@@ -479,6 +501,8 @@ func setFlags(rootCmd *cobra.Command, args *SQLCmdArguments) {
 	rootCmd.Flags().BoolVarP(&args.EnableColumnEncryption, "enable-column-encryption", "g", false, localizer.Sprintf("Enable column encryption"))
 	rootCmd.Flags().StringVarP(&args.ChangePassword, "change-password", "z", "", localizer.Sprintf("New password"))
 	rootCmd.Flags().StringVarP(&args.ChangePasswordAndExit, "change-password-exit", "Z", "", localizer.Sprintf("New password and exit"))
+	rootCmd.Flags().StringVarP(&args.CodePage, "code-page", "f", "", localizer.Sprintf("Specifies the code page for input/output. Use 65001 for UTF-8. Format: codepage | i:codepage[,o:codepage] | o:codepage[,i:codepage]"))
+	rootCmd.Flags().BoolVar(&args.ListCodePages, "list-codepages", false, localizer.Sprintf("List supported code pages and exit"))
 }

 func setScriptVariable(v string) string {
@@ -817,6 +841,11 @@ func run(vars *sqlcmd.Variables, args *SQLCmdArguments) (int, error) {
 	defer s.StopCloseHandler()
 	s.UnicodeOutputFile = args.UnicodeOutputFile

+	// Apply codepage settings (already parsed and validated in Validate)
+	if args.codePageSettings != nil {
+		s.CodePage = args.codePageSettings
+	}
+
 	if args.DisableCmd != nil {
 		s.Cmd.DisableSysCommands(args.errorOnBlockedCmd())
 	}

cmd/sqlcmd/sqlcmd_test.go

Lines changed: 21 additions & 0 deletions
@@ -123,6 +123,22 @@ func TestValidCommandLineToArgsConversion(t *testing.T) {
 		{[]string{"-N", "true", "-J", "/path/to/cert2.pem"}, func(args SQLCmdArguments) bool {
 			return args.EncryptConnection == "true" && args.ServerCertificate == "/path/to/cert2.pem"
 		}},
+		// Codepage flag tests
+		{[]string{"-f", "65001"}, func(args SQLCmdArguments) bool {
+			return args.CodePage == "65001"
+		}},
+		{[]string{"-f", "i:1252,o:65001"}, func(args SQLCmdArguments) bool {
+			return args.CodePage == "i:1252,o:65001"
+		}},
+		{[]string{"-f", "o:65001,i:1252"}, func(args SQLCmdArguments) bool {
+			return args.CodePage == "o:65001,i:1252"
+		}},
+		{[]string{"--code-page", "1252"}, func(args SQLCmdArguments) bool {
+			return args.CodePage == "1252"
+		}},
+		{[]string{"--list-codepages"}, func(args SQLCmdArguments) bool {
+			return args.ListCodePages
+		}},
 	}

 	for _, test := range commands {
@@ -178,6 +194,11 @@ func TestInvalidCommandLine(t *testing.T) {
 		{[]string{"-N", "optional", "-J", "/path/to/cert.pem"}, "The -J parameter requires encryption to be enabled (-N true, -N mandatory, or -N strict)."},
 		{[]string{"-N", "disable", "-J", "/path/to/cert.pem"}, "The -J parameter requires encryption to be enabled (-N true, -N mandatory, or -N strict)."},
 		{[]string{"-N", "strict", "-F", "myserver.domain.com", "-J", "/path/to/cert.pem"}, "The -F and the -J options are mutually exclusive."},
+		// Codepage validation tests
+		{[]string{"-f", "invalid"}, `'-f invalid': invalid codepage: invalid`},
+		{[]string{"-f", "99999"}, `'-f 99999': unsupported codepage 99999`},
+		{[]string{"-f", "i:invalid"}, `'-f i:invalid': invalid input codepage: i:invalid`},
+		{[]string{"-f", "x:1252"}, `'-f x:1252': invalid codepage: x:1252`},
 	}

 	for _, test := range commands {
