Replacing some text

ianw1974 · July 6, 2011

Will be interesting to see if anyone knows how I can do this :)

I need to replace some text. The first 6 - 9 characters are numbers, then there are two letters, followed by four more characters which are a mix of numbers and letters. There is a possibility that the two letters that follow the first 6 - 9 characters could also occur in the last four, so I need to make sure I only replace the first instance, the one directly after the first 6 - 9 characters. I'm thinking of sed, but I'm not sure exactly how I can get it to find the 6 - 9 characters, and then replace the two letters with a new string.

Any ideas?

paul · July 6, 2011

mmm been at the pub tonight .. too many pints to think straight .. but this is definitely a sed and regex thing.

shall call back here tomorrow

ianw1974 · July 6, 2011

OK, cool, hope the pub was good. Could do with some :beer: myself.

paul · July 7, 2011

mmmm

s/^.*[0-9]{6,9}[a-zA-Z][a-zA-Z].*

match from the beginning anything, followed by 0-9 a minimum of 6 but a maximum of 9 times, then match an alpha character, then another alpha character, then match anything (the {6,9} is extended regex syntax, so run sed with -E)

a replace might look like this:

s/^.*[0-9]{6,9}[a-zA-Z][a-zA-Z].*/mytext/

something like that perhaps?
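As written, a pattern like the one above would replace everything it matches, digits included. A minimal sketch of one way to keep the digits and swap only the two letters is a capture group. It assumes one code per line, a sed that accepts -E, and a replacement string without "/" or "&"; the function name replace_pair and the sample values are made up for illustration.

```shell
# replace_pair NEW
# Reads lines on stdin and replaces the two letters that directly follow
# a leading run of 6-9 digits with NEW. Anything later on the line is untouched.
replace_pair() {
  sed -E "s/^([0-9]{6,9})[A-Za-z]{2}/\1$1/"
}

# Hypothetical sample data: the second code has "SA" again in its tail.
printf '%s\n' '123456SA3456' '098775443SA66SA' | replace_pair XX
```

Because the pattern is anchored with ^ and there is no g flag, a second "SA" in the last four characters is left alone.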

ianw1974 · July 7, 2011

Will have to try it. Problem is the text file has one single line of all these numbers separated by spaces. It's a pity each number isn't on a separate line, then it might be easier to parse the file. For example, it would look like this:

123456SA3456 098775443SA6666

and so on instead of having each number on a separate line. A colleague went and did it with perl, so I'll have to play with this with the file they sent me and see what I can do :)

I need to replace the SA in the middle.

SilverSurfer60 · July 7, 2011

Do you want to preserve the spaces?

Are the number of spaces always the same?

ianw1974 · July 8, 2011

No, the spaces identify the next number in the file. Instead of each number being on a new line, they are separated by a space - probably by the system that generated the file. As far as I'm aware, the space is always the same - a single space.

SilverSurfer60 · July 8, 2011

Depending on the length of the input line, a simple way of doing what you want is:

sed 's/ /\n/g'<input.txt >temp.txt #This will give you a list with a newline at the end of each entry.

sed 's/[a-zA-Z][a-zA-Z]/-Replaced-/ 1' <temp.txt >out.txt #Job done!

Sounds a bit too simple for me! There must be a gotcha somewhere. :)
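The same split, replace, join idea can be sketched as a single pipeline without temp files. This is only an illustration under some assumptions: entries are separated by single spaces, the letters to replace are literally "SA" right after 6-9 leading digits, and the replacement contains no "/" or "&". It uses tr for the split because not every sed treats \n in the replacement as a newline. The function name split_replace_join is hypothetical.

```shell
# split_replace_join NEW
# 1. tr puts each space-separated entry on its own line
# 2. sed replaces "SA" only where it directly follows 6-9 leading digits
# 3. paste joins the lines back into one space-separated line
split_replace_join() {
  tr ' ' '\n' | sed -E "s/^([0-9]{6,9})SA/\1$1/" | paste -sd ' ' -
}

# Hypothetical sample line in the format described in the thread.
printf '%s\n' '123456SA3456 098775443SA66SA' | split_replace_join -Replaced-
```

Anchoring on the digits, rather than matching any two letters, is what keeps a later "SA" in the same entry from being touched.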

SilverSurfer60 · July 16, 2011

Have you tried it Ian?

If so did it work or was it totally off the mark?

ianw1974 · July 16, 2011

I'm currently on hols at the minute but will have to give it a go. Just to clarify, is what you posted to replace the "SA" that is located somewhere in the middle? Because this is the only instance it can replace. It could appear in the last four characters and this one I wouldn't want to replace.

I won't be back in to check it until the end of the month :)

SilverSurfer60 · July 20, 2011

The only instance replaced is the first instance in the string.

The magic is the '1' flag at the end of the expression, as in sed 's/[a-zA-Z][a-zA-Z]/-Replaced-/ 1' <temp.txt >out.txt

Also the single quotes are part of the expression.

Edited July 20, 2011 by SilverSurfer60

ianw1974 · August 3, 2011

Hi, no that didn't work. It replaced only the first instance in the file as everything is on one long line. I'll try creating carriage returns and test again.

ianw1974 · August 3, 2011

No, didn't work with that either. Worked better, but replaced something completely different from what was intended. It was meant to replace the first instance of SA, but other text exists, and it searched for and matched any two letters. I need it to find just "SA" and replace this.

ianw1974 · August 3, 2011

There is text that appears before the first 6 to 9 numbers - this needs to be ignored.

It needs to locate the 6 - 9 numbers and replace the SA that follows this.

Therefore, any other instance of SA before these numbers is to be ignored and any SA that follows after the first instance of SA after the 6 - 9 numbers needs to be ignored.
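The rules above could be sketched as a single sed pass over the one long line. This is a guess at the data, not a tested solution for the real file: it assumes entries are separated by single spaces, that the text before the digits in each entry contains no digits or spaces, and that the replacement has no "/" or "&". The function name replace_after_digits and the sample entries are invented for illustration.

```shell
# replace_after_digits NEW
# For every space-separated entry on the line:
#   group 1 = start of line or the separating space
#   group 2 = leading non-digit text (may itself contain "SA"), kept as-is
#   group 3 = the run of 6-9 digits, kept as-is
# Only the "SA" directly after group 3 is replaced with NEW.
replace_after_digits() {
  sed -E "s/(^| )([^ 0-9]*)([0-9]{6,9})SA/\1\2\3$1/g"
}

# Hypothetical entries: "SA" appears before the digits and again in the tail.
printf '%s\n' 'SAX123456SA34SA ABC098775443SA6666' | replace_after_digits XX
```

The g flag lets the expression fire once per entry, while the "(^| )" prefix stops it from matching a second time inside the same entry.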

SilverSurfer60 · August 7, 2011

Not knowing exactly what your input is like I am guessing a little.

However I did paste the wrong Regex for the second sed.

The first is OK as it should split the input into a list of lines, each line with a \n where there is a space in the input.

That is sed 's/ /\n/g'<input.txt >temp

The second part should be sed 's/SA/-Replaced-/ 1' <temp >out.txt

Notice the absence of the 'g'

The '1' in the expression should replace the first occurrence of SA on each line.

Now if SA occurs before the 6-9 numbers that is a different situation.

Edited August 7, 2011 by SilverSurfer60
