Checkpoint Management
| Activity name | Checkpoint Management |
| Activity ID | 40 |
| Short Description | Create and manage config checkpoints |
| Difficulty | Advanced |
| Topology Nodes | Any SRL or SR OS node |
| References | LSO Framework, Workflow Development |
1. Objective
When you change live network devices, mistakes or unexpected side effects are always possible. Operators need known-good restore points (checkpoints of the running configuration) and a repeatable way to create, roll back to, or clean up those snapshots. Doing that only by hand, or only on the box without a consistent operational pattern, is slow, error-prone, and hard to scale across many nodes.
In this activity, you use Device Operations to create checkpoints and revert to one you already saved, and then extend those flows rather than relying only on one-off CLI steps.
2. Technology explanation
2.1 Device Operations (Operations Manager)
Device Management in NSP provides a set of tools for managing network devices throughout their lifecycle. These tools are grouped into Operations that can perform a wide range of tasks, including checking device configuration, taking backups, restoring data, and upgrading software.
Operations are carried out by workflows. Operations Manager oversees running those workflows, keeps track of their progress, and gives you control over when and how they run.
Each operation targets one or more devices. To run it, the workflow needs to know the device's ID (called ne-id). Extra fields come from the operation’s YANG model and mapping.
2.2 Workflow Manager (WFM)
Workflows are written in YAML using Mistral DSL v2: straightforward expressions, branching, and data passing between tasks. Skim the bundled examples first, then open the tasks you plan to change.
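To make that structure concrete, here is a minimal sketch of a Mistral DSL v2 workflow. The workflow name `checkpoint_demo`, the task names, and the `message` variable are hypothetical and are not part of the bundle. `std.noop` and `std.echo` are standard Mistral actions, and the `<% ... %>` blocks are YAQL expressions.

```yaml
version: '2.0'

# Hypothetical workflow name, for illustration only.
checkpoint_demo:
  type: direct
  input:
    - neId
  output:
    message: <% $.message %>
  tasks:
    # Publishes a value into the workflow context, then moves on.
    prepare:
      action: std.noop
      publish:
        message: <% "Checkpoint requested for " + $.neId %>
      on-success:
        - report
    # Reads the published value back from the context.
    report:
      action: std.echo
      input:
        output: <% $.message %>
```

The bundled workflows follow the same pattern of inputs, published variables, and `on-success` / `on-error` transitions, but with NSP-specific actions in place of the placeholders.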
2.3 Artifact Bundle
In NSP, an artifact bundle can group artifacts for several applications at once. In this activity, the bundle ships LSO device operations and the workflow assets those operations need.
You can publish an updated bundle without withdrawing or deleting the previous one as long as the bundle and its artifacts use higher version numbers. Expect install to fail if a new operation-type ships a YANG model that is incompatible with the model from the version already installed.
Note
The LSO deployer prevents Artifact Manager from uninstalling or removing the bundle until the operation is set to withdrawn. You should not need to uninstall anything here. The operation is meant to stay deployed while you add the enhancements in the tasks below.
3. Tasks
You should read these tasks from top to bottom before beginning the activity.
It is tempting to skip ahead, but tasks may require you to have completed previous tasks before tackling them.
Warning
Remember that you are using a shared NSP system. Include your group number in every workflow input that asks for Group ID (and in filters such as g<N>-p1 where applicable).
3.1 Quick start on NSP Web UI
| NE Session | ☰ → Network Search and Inventory → find your group's PE node (for example g7-pe1) → open the row context menu ⋮ → Open in NE Session. |
| NSP Help | ? icon at the top right for context-aware quick help and to open the Help Center. On some pages, the ? icon also links directly to related Help Center articles. |
| Operations Manager | ☰ → Device Management → All Operations |
| Workflow Manager | ☰ → Workflows |
| Artifacts | ☰ → Artifacts |
Note
- The operation bundle is unsigned, so you can update workflows without rebuilding and reinstalling the whole bundle each time.
- Installing an artifact is an admin-only task, and because of NSP access controls your user or group must still be granted access to the bundled workflows. Reach out to the team if you don't see your workflows.
- The supplied bundle already runs local checkpoint creation for SR Linux.
- Do not edit the operation artifact for this exercise; put your customizations in the workflows instead.
3.2 Checkpoint creation
Defining a valid checkpoint operation bundle by hand means a lot of structural boilerplate before you reach any useful logic. Use the LSO Skeleton Bundle Generator instead. It generates an artifact bundle in the format NSP expects, so you start from a working template and focus on the workflows in this activity.
- When you want to inspect the bundle locally, download and unpack the archive, then review the layout.
- When you trust the template, use the generator’s Upload to NSP to install the bundle.
What's interesting about the bundle?
% cd ne-checkpoint-revert-1.0.0-g00
% tree
.
├── content
│   ├── actions-revert-group00
│   │   ├── getFSCheckpoints-group00.action
│   │   └── getLCCheckpoints-group00.action
│   ├── ne-checkpoint-group00
│   │   ├── README.md
│   │   ├── ne-checkpoint-group00.json
│   │   └── ne-checkpoint-group00.yaml
│   ├── ne-revert-group00
│   │   ├── README.md
│   │   ├── ne-revert-group00.json
│   │   └── ne-revert-group00.yaml
│   └── operation-checkpoint
│       ├── ne-checkpoint-group00.yaml
│       ├── ne-checkpoint-group00.yang
│       └── operation_types.json
└── metadata.json
6 directories, 12 files
Explore the files:
- Map where you can view or edit code (Artifact Manager, Workflow Manager, Operations Manager / operation types) and note what is read-only versus editable.
- Trace how the artifact bundle is laid out: which folders map to operations, workflows, and supporting actions, and how NSP deploys them together.
- For one operation artifact, list the moving parts (YANG model, mapping, scripts or helpers, metadata) and describe how data flows from the Web UI into a workflow run.
- Read the mapping profile and explain how operation inputs and outputs bind to the underlying model fields.
- Compare workflow input and output definitions to the operation model and mapping, then confirm how that shows up in the Operations Manager forms you used earlier.
- On rollback-style flows, compare inputs such as "pick a backup operation" versus "raw path or filename". Trigger similar actions from different Web UI entry points and inspect the resulting workflow inputs to see why both styles sometimes exist.
- On SR Linux, follow where checkpoints land under /etc/opt/srlinux/checkpoint/ (path may vary slightly by release; use device state or workflow logs to confirm the exact file layout on your node).
3.2.1 Run the checkpoint operation (optional)
When the bundle status is Installed, you can try this optional run.
- Open Operations Manager and start + Operation.
- Select the checkpoint operation type that came from your bundle (the walk-through text uses ne-checkpoint-groupNN; your type name ends with your group id, for example ne-checkpoint-group07). Set an operation name, for example first-demo-checkpoint-07.
- Under Target NEs, select one or more SR Linux nodes. If the operation input offers Save to FS, you can toggle it, but the starter bundle does not implement saving checkpoints to the file server yet. Adding that path is part of the main exercise, so treat this optional run as on-device checkpoints only.
- Select Execute Immediately, then click Run. When the run finishes, the execution status should show completed.
- Review the operation result. From the operation execution row, open the ⋮ menu and choose Open in Workflows to jump to the underlying workflow execution and its details.
3.3 Revert to existing checkpoint
Alongside checkpoint creation, the bundle comes with a ready-to-use ne-revert workflow that performs the revert operation on SR Linux devices.
3.3.1 Run the revert operation (optional)
- Navigate to Workflow Manager and select Workflows from the drop-down.
- Open the workflow definition ne-revert-group00 (or your group’s name).
- Click Execute (play).
- Fill the inputs: pick one SR Linux node that already has a local checkpoint, choose the checkpoint file to revert to, and skip file-service options for this smoke test.
- Run, then use the quick-view icon on the execution to inspect steps, inputs, and outputs.
How to explore the code?
- In the Web UI, open Definition and skim every task: inputs, publishes, and branches.
- Decide where SR OS or "revert from NSP copy" logic should plug in before you edit.
- Compare workflow input/output definitions against the operation model and mapping, and see how they align with what the Web UI renders.
3.4 Extend to SR OS
3.4.1 Checkpoint Creation
On SR OS, the default execution stops because createSROScheckpoint is still a std.noop (no-operation) placeholder. To extend the flow for SR OS, your job is to:
- Find the SR OS CLI (or MD-CLI) steps that create a local checkpoint and where the .cfg (or equivalent) file ends up on disk.
- Replace createSROScheckpoint with nsp.managed_cli so those commands run on SR OS nodes.
- Test on one SR OS node and confirm the checkpoint file exists where you expect.
- Publish fields the rest of the workflow needs (same idea as SR Linux), e.g. sourcePath and checkpoint (watch spelling on sourcePath).
You can always refer to the SR OS documentation and to Workflow Development best practices.
Expanded hint
Updated definition for createSROScheckpoint task:
# SR OS dummy commit to create new config.cfg
createSROScheckpoint:
  action: nsp.managed_cli
  input:
    neId: <% $.neId %>
    stopOn: <% $.stop %>
    idleTimeout: 30
    closeSession: true
    cmds:
      - /!md-cli
      - configure private
      - commit
      - exit
  publish:
    sourcePath: cf3:/config.cfg
    checkpoint: config.cfg
  on-success:
    - CheckpointSuccess
  on-error:
    - CheckpointFailed
3.4.2 Revert
As the previous step showed, reverting from a local checkpoint file already works on SR Linux. On SR OS the revert is not implemented yet, so you need to add revertFromFileSROS as an nsp.managed_cli task (the action name is managed_cli, not manage_cli).
- Look up how SR OS applies a full-replace load from a file on the device.
- Note where .cfg checkpoints live for your software train.
- Implement revertFromFileSROS and validate on one SR OS node (confirm the running config matches what you expect after revert).
- Publish whatever downstream tasks need (loadFile, status strings, etc.).
You can always refer to the SR OS documentation and to [Workflow Development best practices](https://network.developer.nokia.com/learn/25_11/artifact-development/programming/workflows/wfm-workflow-development/wfm-best-practices/).
Expanded hint
Create a new task for revertFromFileSROS:
revertFromFileSROS:
  action: nsp.managed_cli
  input:
    neId: <% $.neId %>
    stopOn: <% $.stop %>
    idleTimeout: 30
    closeSession: true
    cmds:
      - /!md-cli
      - configure private
      - load full-replace <% $.loadFile %>
      - commit
      - exit
  publish:
    lsoInfo: "Revert from file <% $.loadFile %> successfully completed"
  publish-on-error:
    lsoInfo: "Revert from file <% $.loadFile %> failed"
  on-success:
    - RevertSuccess
  on-error:
    - RevertFailed
3.5 NSP File Service
3.5.1 Checkpoint creation
Now that you have seen how to create checkpoints on the NE, let's explore how to save them to the NSP file service.
- Create the target folder on the file service.
  - Reuse the same path pattern getDeviceInfo already publishes as dirName (vendor, family, ne-id, etc.), e.g. /lsom/checkpoint/<% task().result.content.get("ne-vendor") %>/<% task().result.content.get("ne-family").replace(" ","_") %>/<% task().result.content.get("ne-id") %>.
  - Call the directory API with nsp.https (directory create).
- Transfer the checkpoint file from the NE to that folder.
  - Use the predefined lso_transferFilesFromNe action with inputs such as:
    - neId: <% $.neId %>
    - sourcePath: <% $.sourcePath %>
    - destinationPath: <% $.dirName %>/<% $.timestamp %>
    - ftUuid: unique id for this transfer, e.g. <% $.neId+"-nsp_"+$.timestamp %>
- Rename the uploaded file so a later task can fetch a predictable name. Use the file-service rename API.
Expanded hint
Updated definition:
createBackupFolder:
  action: nsp.https
  input:
    method: POST
    url: https://file-service/nsp-file-service-app/rest/api/v1/directory?dirName=<% $.dirName %>/<% $.timestamp %>
    resultFilter: $.content.data.fileName
  publish-on-error:
    lsoStageError: <% task().result %>
    lsoInfo: "Failed: creating checkpoint directory on file-server"
  on-success:
    - transferCheckpoint
  on-error:
    - CheckpointFailed
transferCheckpoint:
  action: lso_transferFilesFromNe
  input:
    neId: <% $.neId %>
    sourcePath: <% $.sourcePath %>
    destinationPath: <% $.dirName %>/<% $.timestamp %>
    ftUuid: <% $.neId+"-nsp_"+$.timestamp %>
  publish:
    lsoInfo: "starting transferring file from node to server"
  publish-on-error:
    lsoInfo: "Files transfer from NE to File Server failed. Please check execution progress for details. Possible reasons for failure: FTP policy not assigned or has incorrect properties, file transfer failed due to interim failure, file service not functional, file transfer timeout, etc"
    lsoStageError: <% task().result.where((isDict($) and $.containsKey('errorType')) or not isDict($)).last() %>
    stage: "fileTransfer"
  on-success:
    - renameCheckpoint
  on-error:
    - CheckpointFailed
renameCheckpoint:
  action: nsp.https
  input:
    method: POST
    url: https://file-service/nsp-file-service-app/rest/api/v1/file/rename?sourceFilePath=<% $.dirName %>/<% $.timestamp %>/<% $.checkpoint %>&targetFilePath=<% $.dirName %>/<% $.timestamp %>/nsp_checkpoint.<% $.extension %>
  publish-on-error:
    lsoStageError: <% task().result %>
    lsoInfo: "Failed: renaming checkpoint file on file-server"
  on-success:
    - CheckpointSuccess
  on-error:
    - CheckpointFailed
Make sure you update the createSRLcheckpoint and createSROScheckpoint tasks so that they continue to the file-service tasks above only when required, for example when Save to FS is selected.
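One way to express "only when required" is a guarded transition. The sketch below assumes a boolean workflow input named `saveToFS`, which is a hypothetical name; use whatever name your operation mapping passes for Save to FS. The task's existing `input` and `publish` sections are not shown and stay as they are.

```yaml
createSROScheckpoint:
  action: nsp.managed_cli
  # input and publish sections stay unchanged (omitted here)
  on-success:
    # saveToFS is a hypothetical boolean input, not a name from the bundle.
    - createBackupFolder: <% $.saveToFS %>
    - CheckpointSuccess: <% not $.saveToFS %>
  on-error:
    - CheckpointFailed
```

The same two guarded transitions would apply to createSRLcheckpoint. Each transition fires only when its expression evaluates to true, so on-device-only runs skip the file-service tasks entirely.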
3.5.2 Revert
Copy a checkpoint from the file service back to the device, then feed that path into your existing revert tasks.
Step
- Transfer from file service to NE with lso_transferFilesToNe, for example:
  - neId: <% $.neId %>
  - sourcePath: "<% $.pathFS %>/nsp_checkpoint.<% $.fileExt %>" (adjust if your rename step used another basename; fileExt is json vs cfg for SR Linux vs SR OS as appropriate)
  - destinationPath: /tmp/ (or another agreed path on the box)
  - ftUuid: e.g. <% $.neId+"-nsp_"+$.timestamp %> (must stay unique per transfer)
Expanded hint
Updated definition:
transferCheckpoint:
  action: lso_transferFilesToNe
  input:
    neId: <% $.neId %>
    sourcePath: "<% $.pathFS %>/nsp_checkpoint.<% $.fileExt %>"
    destinationPath: "/tmp/"
    ftUuid: <% $.neId+"-nsp_"+$.timestamp %>
  publish:
    lsoInfo: "Checkpoint file transferred to NE successfully"
    loadFile: "/tmp/nsp_checkpoint.<% $.fileExt %>"
  publish-on-error:
    lsoInfo: "Failed to transfer checkpoint file to NE"
  on-success:
    - revertFromFileSRL
  on-error:
    - RevertFailed
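The definition above always continues to revertFromFileSRL. If the same transfer task should also serve SR OS nodes, the transition can be guarded. This sketch assumes that `fileExt` is `json` on SR Linux and `cfg` on SR OS, as described in the step above, and that your SR OS revert task is named revertFromFileSROS as in section 3.4.2. Treat it as a starting point, not as the bundle's actual logic.

```yaml
transferCheckpoint:
  action: lso_transferFilesToNe
  # input, publish and publish-on-error stay unchanged (omitted here)
  on-success:
    # Assumption: fileExt alone is enough to tell the two platforms apart.
    - revertFromFileSRL: <% $.fileExt = "json" %>
    - revertFromFileSROS: <% $.fileExt = "cfg" %>
  on-error:
    - RevertFailed
```

If your workflow already publishes a platform or family field from getDeviceInfo, branching on that field may be more robust than branching on the file extension.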
You can optionally update the createSRLcheckpoint task to remove the file from the device, if needed.
4. Summary
Congratulations! You have completed this activity. Take a moment to review what you achieved:
- Used Device Management operations to manage checkpoints for Nokia devices.
- Learned how to navigate and view the created configuration checkpoints using File Service.
- Understood what artifacts are and installed an artifact bundle in NSP.
- Extended the artifact bundle provided to support SR OS.
- Extended the artifact bundle provided to support file service integration.
- Explored the relationship of Device Operations, Workflows, Artifact Bundles and File Service.
- Looked into WFM design best practices.