
I have an input text file with a single line containing thousands of records one after another. I want to split it into records of 10 characters each.

**Input Record** - 
====================== Begin of data =========================
 abcdefghijklmnopqrstuvwxyz1234567890       <= Input file having all records on single line
====================== End of data   =========================

**Expected output** -
====================== Begin of data =========================
 abcdefghij            <= each line of 10 characters
 klmnopqrst
 uvwxyz1234
 567890
====================== End of data   =========================

Please help me to do this using batch script.

Try using Notepad++

It worked well in Notepad++ with the following find/replace regular expressions:

Find => (?-s).{10}
Replace => ${0}\r\n

The record length of 10 above was used only for simplicity; the actual record length is 800 bytes, and there are 50 thousand records on each line.

3 Comments

  • Probably doable in batch, but consider using a language with proper regex support (VBA, PowerShell, ...). How long can the original lines get? (Batch has a string-length limitation.)
  • The other thing you should clarify is the character encoding of the file. 800 bytes is not necessarily 800 characters, so the clarification is important.
  • The accepted answer is actually the non-working answer; you should reselect it.

3 Answers

@echo off
setlocal EnableDelayedExpansion

set "recLen=10"
set "chunk="
call :splitFile  < input.txt  > output.txt
goto :EOF


:splitFile

:nextChunk
rem Read next chunk and join to (remaining) previous one
set "newChunk="
set /P "newChunk="
if not defined newChunk goto EndOfFile
set "chunk=!chunk!!newChunk!"

rem Break the current chunk into records of the required size
:nextRec
   echo !chunk:~0,%recLen%!
   set "chunk=!chunk:~%recLen%!"
   if not defined chunk goto nextChunk
   if "!chunk:~%recLen%!" neq "" goto nextRec
rem recLen characters or fewer remain: keep them and join them to the next chunk read
goto nextChunk

:EndOfFile
if defined chunk echo !chunk!
exit /B

This method should work with records up to 1022 characters long. You can read further details about this method at this answer or this one.
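For reference, here is a minimal sketch of the trick the routine above relies on (the file name longline.txt is only a placeholder): consecutive set /P reads from the same redirected input continue along the same long line, roughly 1 KB at a time, so no variable ever comes close to the ~8 KB limit.

@echo off
setlocal EnableDelayedExpansion
rem Minimal sketch, assuming longline.txt holds one very long line:
rem the second "set /P" continues reading the SAME line from the point
rem where the first read stopped (about 1 KB per read).
< longline.txt (
    set /P "part1="
    set /P "part2="
)
echo First chunk starts with : !part1:~0,20!
echo Second chunk starts with: !part2:~0,20!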


2 Comments

Mmm... Did you test my code? I ask because you selected as Best Answer one that does not solve your problem... :/
I selected that answer as the correct answer because there is nothing wrong with its logic based on the information I gave in my first post. Only after trying it did I realise that 8K is the limit. Someone may still find it useful. I also tested your code and it is perfect! Thank you!
@ECHO OFF
SETLOCAL
rem The following settings for the directories and filenames are names
rem that I use for testing and deliberately includes spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.

SET "sourcedir=u:\your files"
SET "destdir=u:\your results"
SET "filename1=%sourcedir%\q78869753.txt"
SET "outfile=%destdir%\outfile.txt"

(
FOR /f "usebackq delims=" %%e IN ("%filename1%") DO SET "line=%%e"&CALL :sub
)>"%outfile%"

GOTO :EOF

:sub
IF NOT DEFINED line GOTO :eof
ECHO %line:~0,10%
SET "line=%line:~10%"
GOTO sub

Note that if the filename does not contain separators like spaces, then both usebackq and the quotes around %filename1% can be omitted.
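For example, with a space-free path (the path below is only illustrative) the loop line could be shortened to:

FOR /f "delims=" %%e IN (C:\somedir\q78869753.txt) DO SET "line=%%e"&CALL :sub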

You would need to change the values assigned to sourcedir and destdir to suit your circumstances. The listing uses a setting that suits my system.

I deliberately include spaces in names to ensure that the spaces are processed correctly.

I used a file named q78869753.txt containing your data plus some dummy data for my testing.

Produces the file defined as %outfile%

For documentation, see set /?, for /?, and call /? from the prompt, or the endless examples on SO.

8 Comments

"thousands of records on a single line" prevented me from offering a very similar solution. With a record length of 10, that would easily exceed the string length of ~8k
Record length 10 was taken as example but actually record length is 800 bytes and there are 50 thousand records in single line.
Magoo, Thank you for the solution. It is working for 10 records of 8000 bytes long record . For 8800 bytes long record, it is not working. How to resolve this issue as I mentioned above that my file is having 50000 records of 800 length in a single line. Please advise!
Batch has a limit of a little over 8000 bytes in a string variable. You're probably better-off using sed or (g)awk which are designed for the task. These utilities are available free-to-use, just use Google.
there's no need to use sed or awk. It can be easily done with built-in Windows tools without any line length limitation

It's simpler in PowerShell, where there's no such limit. For files that aren't very big, you can use this one-liner:

$ (Get-Content -Raw ./in.txt) -split '(.{10})' -ne '' | Set-Content out.txt

# Or the shortened version
$ (gc -Ra in.txt) -split '(.{10})' -ne '' >out.txt

Of course, it's better to write the entire script in PowerShell, but if you really can't, you can simply call it from cmd or a batch file like this:

powershell -C "(gc -Ra in.txt) -split '(.{10})' -ne '' >out.txt"

This method reads the whole file into memory and splits it into 10-character strings using the .{10} regex, so it won't work well for very large (multi-GB) files. For such huge files you can use this instead:

$ Get-Content -AsByteStream -ReadCount 10 ./in.txt | `
  ForEach-Object { [Text.Encoding]::ASCII.GetString($_) } | `
  Set-Content out.txt

# Or the shortened version
$ gc -A -Re 10 in.txt |% { [Text.Encoding]::ASCII.GetString($_) } >out.txt

This reads the input file as a byte stream and converts every group of 10 bytes to a string, so there is no limit on line length.
Remember to select the correct encoding of your files by replacing [Text.Encoding]::ASCII with
[Text.Encoding]::GetEncoding("windows-1252") (the default charset in US Windows), or
[Text.Encoding]::GetEncoding("iso-8859-1"), etc., depending on whether your input files are in CP1252, ISO-8859-1, or another single-byte encoding. You can simply check the encoding in Notepad++.
For UTF-8 and UTF-16 the encoders are [Text.Encoding]::UTF8 and [Text.Encoding]::Unicode, but the byte-stream approach won't work reliably for them, because a fixed byte count can split a variable-length (multibyte) character. You can use this solution instead:

Get-Content ./in.txt | ForEach-Object {
    $line = $_
    for ($i = 0; $i -lt $line.Length; $i += 10) {
        $line.Substring($i, [Math]::Min(10, $line.Length - $i))
    }
}

You can call it from cmd as above, or add some options like these to speed up startup. Note that -AsByteStream (-A) requires PowerShell 6+ (pwsh); with Windows PowerShell 5.1's powershell.exe, use -Encoding Byte instead.

pwsh -NoProfile -ExecutionPolicy Bypass -NoLogo -NonInteractive -Command "gc -A -Re 10 in.txt |% { [Text.Encoding]::ASCII.GetString($_) } >out.txt"

